DATA WAREHOUSE DESIGN

Post on 22-Apr-2018


<ul><li><p>1</p><p>DATA WAREHOUSE DESIGN</p><p>ICDE 2001 Tutorial</p><p>Stefano Rizzi, Matteo Golfarelli</p><p>DEIS - University of Bologna, Italy</p><p>2</p><p>Motivation</p><p>Building a data warehouse for an enterprise is a huge and complex task that requires accurate planning aimed at devising satisfactory answers to organizational and architectural questions. Despite the pressing demand for working solutions from enterprises and the wide range of advanced technologies offered by vendors, few attempts have been made to devise a specific methodology for data warehouse design. On the other hand, statistical reports on DW project failures state that a major cause lies in the absence of a global view of the design process: in other words, in the absence of a design methodology.</p><p>Summary</p><p>Introduction to Data Warehousing</p><p>Conceptual design of Data Warehouses</p><p>Workload-based logical design for ROLAP</p><p>Indexes for physical design</p></li><li><p>3</p><p>Introduction to Data Warehousing</p><p>Stefano Rizzi</p><p>4</p><p>Information Systems: profile and role</p><p>Information systems are rooted in the relationship between information, decision and control.</p><p>An IS should collect and classify information, by means of integrated and suitable procedures, in order to produce in time and at the right levels the synthesis to be used to support the decision-making process, as well as to administer and globally control the enterprise activity.</p></li><li><p>5</p><p>Information as a resource</p><p>Manufacturing system</p><p>Information system</p><p>Information</p><p>Finished product</p><p>Information is a resource of increasing value, required by managers to schedule and monitor enterprise activities effectively.</p><p>Information is the raw material that is
transformed by information systems, just as unfinished products are transformed by manufacturing systems.</p><p>6</p><p>Value of information</p><p>Amount</p><p>Value</p><p>Strategic directions</p><p>Reports</p><p>Selected information</p><p>Primary information sources</p><p>Information is an enterprise resource like capital, raw materials, plants and people; thus, it has a cost.</p><p>Hence, understanding the value of information is important.</p></li><li><p>7</p><p>Different kinds of information systems</p><p>Sales and marketing, Manufacturing, Finance, Accounting, Human resources</p><p>Operational: operational managers (TPS)</p><p>Knowledge: knowledge and data workers (OAS, KWS)</p><p>Management: middle managers (MIS, DSS)</p><p>Strategic: senior managers (ESS)</p><p>8</p><p>The Data Warehouse phenomenon</p><p>Usual complaints:</p><p>We have tons of data but we cannot access them!</p><p>How can people playing the same role produce substantially different results?</p><p>We want to slice and dice data in any possible way!</p><p>Show me only what is important!</p><p>Everyone knows some data are incorrect...</p><p>(R.
Kimball, The Data Warehouse Toolkit)</p></li><li><p>9</p><p>Data Warehousing</p><p>A collection of technologies and tools supporting the knowledge worker (executive, manager, analyst) in analysing data, aimed at decision making and at improving the knowledge assets of the enterprise.</p><p>Data Warehouse</p><p>At the core of the architecture of modern information systems, it is a data repository:</p><p>Oriented to subjects</p><p>Integrated and consistent</p><p>Representing temporal evolution</p><p>Non-volatile</p><p>The data warehouse is regularly refreshed, permanently growing, logically centralised and easily accessed by users, essentially read-only</p><p>10</p><p>Data Warehouse</p><p>External data</p><p>Operational data (relational, legacy)</p><p>ETL tools</p><p>Warehouse</p><p>Summary data</p><p>Access</p><p>Reporting tools</p><p>Analysis tools (OLAP)</p><p>Data mining</p><p>What-If analysis</p></li><li><p>11</p><p>Data Marts</p><p>Data Warehouse</p><p>Data mart</p><p>Client management</p><p>Geographical regions</p><p>Supplier management</p><p>Marketing</p><p>Finance</p><p>Replication and broadcasting</p><p>12</p><p>Subject vs Process</p><p>Emphasis on applications: reservations, charge, medical reports, admissions</p><p>Emphasis on subjects: patient, region, consumption</p></li><li><p>13</p><p>Integration and consistency</p><p>DB</p><p>DW</p><p>External data</p><p>Text files</p><p>Schema integration, extraction, transformation, cleaning, validation, filtering, loading</p><p>wrappers, mediators, loaders</p><p>14</p><p>Temporal evolution</p><p>OLTP: restricted historical content; time is often not included in keys; data are
updated</p><p>DW: rich historical content; time is included in keys; snapshots cannot be updated</p><p>Current values</p><p>Snapshot</p></li><li><p>15</p><p>Non-volatility</p><p>OLTP: insert, delete, update</p><p>DW: load, access</p><p>Huge data volumes: from 20 GB to some TB in a few years</p><p>In a DW, no advanced techniques for transaction management are required (unlike in OLTP systems)</p><p>Key issues are query throughput and resilience</p><p>16</p><p>DW vs. OLTP</p><p>DW: 90% ad hoc queries; mostly read access; hundreds of users; denormalised; supports historical versions; optimised for accesses involving most of the database; based on summary data</p><p>OLTP: 90% predefined transactions; read/write access; thousands of users; normalised; does not support historical versions; optimised for accesses involving a small fraction of the database; based on elemental data</p></li><li><p>17</p><p>ROLAP (Relational OLAP)</p><p>Intermediate-level server between a relational back-end server and the front-end client</p><p>Specialised middleware</p><p>Generation of SQL multi-statements for the back-end server</p><p>Query scheduling</p><p>MOLAP (Multidimensional OLAP)</p><p>Direct support of multi-dimensional views</p><p>Special data structures (e.g., multi-dimensional arrays)</p><p>Compression techniques</p><p>Intelligent disk/memory caching</p><p>Pre-computation</p><p>Complex analysis</p><p>18</p><p>The technological progress</p><p>Figure: timeline (1970, 1980, 1990, 2000) of progressive refinement from data to knowledge: statistics &amp; reporting, data warehousing, OLAP, data mining, pattern warehousing.</p><p>Source: Information Discovery</p></li><li><p>19</p><p>The Data Warehouse
Market</p><p>Figure: two charts over the years 1998 to 2002. First chart (vertical axis 0 to 4500): RDBMS, OLAP. Second chart (vertical axis 0 to 400): Data Marts, ETL, Data Quality, Metadata.</p><p>Source: Shilakes, Tylman - Enterprise Information Portals</p><p>20</p><p>The DW life-cycle</p><p>Objective definition and planning: clearly determine the scope, define the boundaries, estimate the size, choose the design approach, evaluate the benefits</p><p>Infrastructure design: choose the technologies and tools, analyse the architectural solutions, solve the management problems</p><p>Design and implementation of applications: iteratively add new data marts and applications to the warehouse</p></li><li><p>21</p><p>Bibliography</p><p>R. Barquin, S. Edelstein. Planning and Designing the Data Warehouse. Prentice Hall (1996).</p><p>S. Chaudhuri, U. Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record 26, 1 (1997).</p><p>G. Colliat. OLAP, relational and multidimensional database systems. SIGMOD Record 25, 3 (1996).</p><p>M. Demarest. The politics of data warehousing. Http://www.hevanet.com/demarest/marc/dwpol.html</p><p>U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth. Data mining and knowledge discovery in databases: an overview. Comm. of the ACM 39, 11 (1996).</p><p>W.H. Inmon. Building the Data Warehouse. John Wiley &amp; Sons (1996).</p><p>S. Kelly. Data Warehousing in Action. John Wiley &amp; Sons (1997).</p><p>R. Kimball. The Data Warehouse Toolkit. John Wiley &amp; Sons (1996).</p><p>R. Kimball, L. Reeves, M. Ross, W. Thornthwaite. The Data Warehouse Lifecycle Toolkit. John Wiley &amp; Sons (1998).</p><p>C. Shilakes, J. Tylman. Enterprise Information Portals. Http://www.sagemaker.com/company/downloads/eip/indepth.pdf</p><p>P. Vassiliadis.
Gulliver in the land of data warehousing: practical experiences and observations of a researcher. Proc. DMDW 2000 (2000).</p><p>J. Widom. Research Problems in Data Warehousing. Proc. CIKM (1995).</p><p>22</p><p>Conceptual modelling for Data Warehousing</p><p>Stefano Rizzi</p></li><li><p>23</p><p>Why a new conceptual model?</p><p>While it is universally recognised that a DW leans on a multidimensional model, there is no agreement on the approach to conceptual modelling.</p><p>On the other hand, an accurate conceptual design is the necessary foundation for building a good information system.</p><p>The Entity/Relationship model is widespread in enterprises, but...</p><p>"Entity relation data models [...] cannot be understood by users and they cannot be navigated usefully by DBMS software. Entity relation models cannot be used as the basis for enterprise data warehouses." (Kimball, 96)</p><p>24</p><p>The multidimensional data model</p><p>Figure: the Sales cube, with axes Store, Product and Time.</p><p>Number of Coke cans sold at BIGSTORES in London on 10/10/99</p><p>Number of Pepsi cans sold at all BIGSTORES on 10/10/99</p><p>Number of Fanta cans globally sold</p></li><li><p>25</p><p>Basic terminology</p><p>Fact (cube, target). It is a focus of interest for the decision-making process; typically, it models an event occurring in the enterprise world (sales, shipments, purchases). It is essential for a fact to have some dynamic aspects, i.e., to evolve somehow over time.</p><p>Measures (attributes, variables, metrics, properties). They are continuously valued (typically numerical) attributes which describe a fact from different points of view. For instance, each sale is measured by its revenue.</p><p>Dimensions. They are discrete attributes which determine the minimum granularity adopted to represent facts.
Typical dimensions for the sale fact are product, store and date.</p><p>Hierarchies (dimensions). They contain dimension attributes (levels, parameters) connected in a tree-like structure by many-to-one relationships (functional dependencies).</p><p>26</p><p>DW modelling in the literature</p><p>Gyssens, Lakshmanan 97</p><p>Agrawal et al. 95</p><p>Li, Wang 96</p><p>Cabibbo, Torlone 98</p><p>Datta, Thomas 97</p><p>Vassiliadis 98</p><p>Tryfona et al. 99</p><p>Hüsemann et al. 00</p><p>Sapia et al. 98</p><p>Franconi, Sattler 99</p><p>Golfarelli et al. 98</p></li><li><p>27</p><p>DW modelling in the literature</p><p>LOGICAL</p><p>CONCEPTUAL</p><p>Gyssens, Lakshmanan 97</p><p>Agrawal et al. 95</p><p>Li, Wang 96</p><p>Cabibbo, Torlone 98</p><p>Datta, Thomas 97</p><p>Vassiliadis 98</p><p>Tryfona et al. 99</p><p>Hüsemann et al. 00</p><p>Sapia et al. 98</p><p>Franconi, Sattler 99</p><p>Golfarelli et al. 98</p><p>28</p><p>DW modelling in the literature</p><p>GRAPHICAL</p><p>FORMAL</p><p>Gyssens, Lakshmanan 97</p><p>Agrawal et al. 95</p><p>Li, Wang 96</p><p>Cabibbo, Torlone 98</p><p>Datta, Thomas 97</p><p>Vassiliadis 98</p><p>Tryfona et al. 99</p><p>Hüsemann et al. 00</p><p>Sapia et al. 98</p><p>Franconi, Sattler 99</p><p>Golfarelli et al. 98</p></li><li><p>29</p><p>DW modelling in the literature</p><p>ALGEBRA</p><p>Gyssens, Lakshmanan 97</p><p>Agrawal et al. 95</p><p>Li, Wang 96</p><p>Cabibbo, Torlone 98</p><p>Datta, Thomas 97</p><p>Vassiliadis 98</p><p>Tryfona et al. 99</p><p>Hüsemann et al. 00</p><p>Sapia et al. 98</p><p>Franconi, Sattler 99</p><p>Golfarelli et al. 98</p><p>30</p><p>DW modelling in the literature</p><p>DESIGN</p><p>Gyssens, Lakshmanan 97</p><p>Agrawal et al. 95</p><p>Li, Wang 96</p><p>Cabibbo, Torlone 98</p><p>Datta, Thomas 97</p><p>Vassiliadis 98</p><p>Tryfona et al. 99</p><p>Hüsemann et al.
00</p><p>Sapia et al. 98</p><p>Franconi, Sattler 99</p><p>Golfarelli et al. 98</p></li><li><p>31</p><p>Conceptual models</p><p>Sapia, Blaschka, Höfling, Dinter (1998)</p><p>Figure labels: dimension level, roll-up relationship, fact relationship, attribute</p><p>32</p><p>Conceptual models (2)</p><p>Franconi, Sattler (1999)</p><p>Figure labels: dimension, target, property, level, aggregated entity</p></li><li><p>33</p><p>Conceptual models (3)</p><p>Hüsemann, Lechtenbörger, Vossen (2000)</p><p>Figure labels: dimension, dimension level, measure, property attribute, optional property attribute, optional, aggregation path, fact</p><p>34</p><p>The Dimensional Fact Model</p><p>The Dimensional Fact Model (DFM) is a graphical conceptual model for DWs that aims to:</p><p>Effectively support conceptual design;</p><p>Provide an environment where user queries can be formulated intuitively;</p><p>Enable communication between the designer and the final user in order to refine requirement specification;</p><p>Supply a stable platform for logical design;</p><p>Provide expressive and unambiguous documentation.</p><p>The DFM is independent of the target logical model (multidimensional or relational)</p></li><li><p>35</p><p>Three levels of conceptual documentation are provided:</p><p>Fact scheme: represents a fact of interest and the associated measures, dimensions and hierarchies.
</p><p>Data Mart scheme: summarizes the fact schemes which constitute each data mart and emphasizes the feasible connections between them.</p><p>Data Warehouse scheme: shows the different data marts, emphasizing their overlaps, the different profiles of the users accessing them, and the operational sources which feed them.</p><p>The Dimensional Fact Model (2)</p><p>Each documentation level is complemented by glossaries which explain the names adopted within the schemes, define a connection between the DW data and the operational sources, and express data volumes.</p><p>Data mart schemes are associated with the workload specification.</p><p>36</p><p>Fact schemes</p><p>A fact expresses a many-to-many relationship between its dimensions</p><p>Figure: the SALE fact scheme. Legend: fact, measure, dimension, dimension attribute, hierarchy. Measures: qty sold, revenue, unit price, no. of customers. Attribute labels: product, type, category, department, marketing group, brand, brand city, store, store city, county, state, sales manager, sale district, date, day of week, holiday, week, month, quarter, year.</p></li><li><p>37</p><p>Figure: the SALE fact scheme. Legend: non-dimension attribute, optionality. Attribute labels: address, phone, manager, diet, promotion, price reduction, cost, begin date, end date, ad type, state, category, type, quarter, month, store, store city, county, sales manager, year, sale district, date, holiday, day of week, marketing group, department, brand.</p><p>Measures: qty sold, revenue, unit price, no.
of customers</p><p>brand city</p><p>product</p><p>week</p><p>Fact schemes (2)</p><p>A non-dimension attribute contains additional information about a dimension attribute, and is typically connected to it by a one-to-one relationship. It cannot be used for aggregation.</p><p>Some links between attributes can be optional.</p><p>38</p><p>Fact schemes (3)</p><p>Convergence</p><p>Cross-dimension attributes</p><p>Additivity, non-additivity, non-aggregability</p><p>Overlap</p><p>Figure: the SALE fact scheme, annotated with the labels convergence, cross-dimension attribute and non-aggregability. Measures: qty sold, revenue, unit price, no. of customers. Attribute labels: product, type, category, department, marketing group, brand, brand city, diet, store, store city, store county, store state, manager, sale district, phone, address, promotion, ad type, price reduction, begin date, end date, date, day of week, week, month, quarter, year, fiscal week, fiscal month, fiscal quarter, fiscal year, V.A.T.</p></li><li><p>39</p><p>The SHIPMENTS fact scheme</p><p>Figure: the SHIPMENT TO STORES fact scheme. Measures: qty shipped, shipping cost. Attribute labels: marketing group, brand city, store state, store city, warehouse state, warehouse city, product, category, type, war...</p></li></ul>
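
The basic terminology introduced in the transcript (fact, measure, dimension, hierarchy) can be illustrated with a small sketch. The Python fragment below is not taken from the tutorial: the sample rows, the STORE_CITY mapping and the roll_up helper are hypothetical, chosen only to show how a sale fact with a "qty sold" measure might be sliced on a date and aggregated along a store to store city hierarchy.

```python
from collections import defaultdict

# Hypothetical SALE events: one row per (product, store, date) combination,
# carrying the measure "qty sold". All values are made up for illustration.
SALES = [
    {"product": "Coke",  "store": "BIGSTORE-1", "date": "1999-10-10", "qty_sold": 30},
    {"product": "Coke",  "store": "BIGSTORE-2", "date": "1999-10-10", "qty_sold": 20},
    {"product": "Pepsi", "store": "BIGSTORE-1", "date": "1999-10-10", "qty_sold": 10},
    {"product": "Pepsi", "store": "BIGSTORE-3", "date": "1999-10-11", "qty_sold": 5},
]

# Hypothetical hierarchy level: each store belongs to exactly one city
# (a many-to-one relationship, i.e. a functional dependency).
STORE_CITY = {
    "BIGSTORE-1": "London",
    "BIGSTORE-2": "London",
    "BIGSTORE-3": "Leeds",
}


def roll_up(rows, group_by, measure="qty_sold", parent_of=None):
    """Sum an additive measure after grouping rows by the given dimensions.

    parent_of optionally maps a dimension name to a dict that replaces each
    value with its parent in the hierarchy (e.g. store -> store city).
    """
    parent_of = parent_of or {}
    totals = defaultdict(int)
    for row in rows:
        key = tuple(
            parent_of[dim][row[dim]] if dim in parent_of else row[dim]
            for dim in group_by
        )
        totals[key] += row[measure]
    return dict(totals)


# Slice on one date, then aggregate by product and store city.
on_date = [r for r in SALES if r["date"] == "1999-10-10"]
by_product_city = roll_up(on_date, ["product", "store"], parent_of={"store": STORE_CITY})

# Drop the store dimension entirely: total per product on that date.
by_product = roll_up(on_date, ["product"])

print(by_product_city)
print(by_product)
```

Summation is used here because quantity sold is additive along every dimension; a measure such as unit price is not, and would need a different aggregation operator (for example an average) in place of the sum.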
