The Data Warehouse Environment - Building the Data Warehouse

Posted on 03-Apr-2018

TRANSCRIPT

<ul><li><p>7/29/2019 The Data Warehouse Environment - Building the Data WareHouse</p><p> 1/52</p><p>Building the Data Warehouse</p><p>by Inmon</p><p>Chapter 2: The Data Warehouse Environment</p><p>IT-Slideshares: http://it-slideshares.blogspot.com/</p></li><li><p>2. The Data Warehouse Environment</p><p>1. The Structure of the Data Warehouse</p><p>2. Subject Orientation</p><p>3. Day 1 to Day n Phenomenon</p><p>4. Granularity</p><p>5. Exploration and Data Mining</p><p>6. Living Sample Database</p><p>7. Partitioning as a Design Approach</p><p>8. Structuring Data in the Data Warehouse</p><p>9. Auditing and the Data Warehouse</p></li><li><p>2. The Data Warehouse Environment (cont.)</p><p>10. Data Homogeneity and Heterogeneity</p><p>11. Purging Warehouse Data</p><p>12. Reporting and the Architected Environment</p><p>13. The Operational Window of Opportunity</p><p>14. Incorrect Data in the Data Warehouse</p><p>15. Summary</p></li><li><p>2.0 Introduction: data warehouse characteristics</p><p>Subject-oriented with regard to decision support systems (DSS)</p><p>Integrated from multiple data sources</p><p>Non-volatile data archive</p><p>Time-variant collection of data in support of DSS reporting</p></li><li><p>2.1. Data warehouse characteristics</p></li><li><p>2.1. Data warehouse characteristics (cont.)</p></li><li><p>2.1. 
The Structure of the Data Warehouse</p></li><li><p>2.1. The Structure of the Data Warehouse (cont.)</p></li><li><p>2.2. Subject Orientation</p><p>The data warehouse is oriented to the major subject areas of the corporation that have been defined in the high-level corporate data model. Typical subject areas include the following:</p><p>Customer</p><p>Product</p><p>Transaction or activity</p><p>Policy</p><p>Claim</p><p>Account</p></li><li><p>2.2.1</p></li><li><p>2.2.2 Subject Orientation (cont.)</p></li><li><p>2.2.3 Subject Orientation (cont.)</p></li><li><p>2.2.4 Subject Orientation (cont.)</p></li><li><p>2.3. Day 1 to Day n Phenomenon</p><p>Data warehouses are not built all at once. Instead, a data warehouse is built in an orderly, iterative, step-at-a-time fashion.</p><p>The big bang approach to data warehouse development is simply an invitation to disaster and is never an appropriate alternative.</p></li><li><p>2.4. Granularity</p></li><li><p>2.4.1. 
The Benefits of Granularity</p><p>The granular data found in the data warehouse is the key to reusability.</p><p>Looking at the data in different ways is only one advantage of having a solid foundation.</p><p>The same granular data can be shaped to the specific needs of each DSS report, e.g., daily, monthly, quarterly, or yearly reports, or even multi-year trending reports.</p><p>Another related benefit of a low level of granularity is flexibility.</p><p>Another benefit of granular data is that it contains a history of activities and events across the corporation.</p><p>The largest benefit of a data warehouse foundation is that future, as-yet-unknown requirements can be accommodated.</p></li><li><p>2.4.2. An Example of Granularity</p></li><li><p>2.4.2.1</p></li><li><p>2.4.3. Dual Levels of Granularity</p></li><li><p>2.4.3.1 Telephone example</p></li><li><p>2.4.3.2 Telephone example (cont.)</p></li><li><p>2.4.3.3 Telephone example (cont.)</p></li><li><p>2.5. Exploration and Data Mining</p><p>Granular data in the data warehouse supports data marts.</p><p>It also supports the process of data mining or data exploration.</p><p>References</p><p>Exploration Warehousing: Turning Business Information into Business Opportunity (Hoboken, N.J.: Wiley, 2000)</p></li><li><p>2.6. 
Living Sample Database</p></li><li><p>2.7. Partitioning as a Design Approach</p><p>Proper partitioning can benefit the data warehouse in several ways:</p><p>Loading data</p><p>Accessing data</p><p>Archiving data</p><p>Deleting data</p><p>Monitoring data</p><p>Storing data</p></li><li><p>2.7.1. Partitioning of Data</p></li><li><p>2.7.1. Partitioning of Data (cont.)</p><p>The following are some of the tasks that cannot easily be performed when data resides in large physical units:</p><p>Restructuring</p><p>Indexing</p><p>Sequential scanning, if needed</p><p>Reorganization</p><p>Recovery</p><p>Monitoring</p></li><li><p>2.7.1. Partitioning of Data (cont.)</p><p>Data can be divided by many criteria, such as:</p><p>By date</p><p>By line of business</p><p>By geography</p><p>By organizational unit</p><p>By all of the above</p></li><li><p>2.7.1. 
Partitioning of Data (cont.)</p><p>As an example of how a life insurance company may choose to partition its data, consider the following physical units of data:</p><p>2000 health claims</p><p>2001 health claims</p><p>2002 health claims</p><p>1999 life claims</p><p>2000 life claims</p><p>2001 life claims</p><p>2002 life claims</p><p>2000 casualty claims</p><p>2001 casualty claims</p><p>2002 casualty claims</p></li><li><p>2.8. Structuring Data in the Data Warehouse</p></li><li><p>2.8. Structuring Data in the Data Warehouse (cont.)</p></li><li><p>2.8. Structuring Data in the Data Warehouse (cont.)</p></li><li><p>2.8. Structuring Data in the Data Warehouse (cont.)</p></li><li><p>2.8. Structuring Data in the Data Warehouse (cont.)</p></li><li><p>2.8. Structuring Data in the Data Warehouse (cont.)</p><p>There are many more ways to structure data within the data warehouse. The most common are these:</p><p>Simple cumulative</p><p>Rolling summary</p><p>Simple direct</p><p>Continuous</p></li><li><p>2.8. 
Structuring Data in the Data Warehouse (cont.)</p><p>At the key level, data warehouse keys are inevitably compound keys. There are two compelling reasons for this:</p><p>The date (year, year/month, year/month/day, and so on) is almost always part of the key.</p><p>Because data warehouse data is partitioned, the different components of the partitioning show up as part of the key.</p></li><li><p>2.8. Structuring Data in the Data Warehouse (cont.)</p></li><li><p>2.9. Auditing and the Data Warehouse</p><p>Data that otherwise would not find its way into the warehouse suddenly has to be there.</p><p>The timing of data entry into the warehouse changes dramatically when an auditing capability is required.</p><p>The backup and recovery restrictions for the data warehouse change drastically when an auditing capability is required.</p><p>Auditing data at the warehouse forces the granularity of data in the warehouse to be at the very lowest level.</p></li><li><p>2.10. Data Homogeneity and Heterogeneity</p></li><li><p>2.10. Data Homogeneity and Heterogeneity (cont.)</p></li><li><p>2.10. Data Homogeneity and Heterogeneity (cont.)</p><p>The data in the data warehouse is then subdivided by the following criteria:</p><p>Subject area</p><p>Table</p><p>Occurrences of data within a table</p></li><li><p>2.10. 
Data Homogeneity and Heterogeneity (cont.)</p></li><li><p>2.11. Purging Warehouse Data</p><p>There are several ways in which data is purged or the detail of data is transformed, including the following:</p><p>Data is added to a rolling summary file where detail is lost.</p><p>Data is transferred to a bulk storage medium from a high-performance medium such as DASD (direct access storage device).</p><p>Data is actually purged from the system.</p><p>Data is transferred from one level of the architecture to another, such as from the operational level to the data warehouse level.</p></li><li><p>2.12. Reporting and the Architected Environment</p></li><li><p>2.13. The Operational Window of Opportunity</p><p>The following are some suggestions as to how the operational window of archival data may look in different industries:</p><p>Insurance: 2 to 3 years</p><p>Bank trust processing: 2 to 5 years</p><p>Telephone customer usage: 30 to 60 days</p><p>Supplier/vendor activity: 2 to 3 years</p><p>Retail banking customer account activity: 30 days</p><p>Vendor activity: 1 year</p><p>Loans: 2 to 5 years</p><p>Retailing SKU activity: 1 to 14 days</p><p>Vendor activity: 1 week to 1 month</p><p>Airlines flight seat activity: 30 to 90 days</p><p>Vendor/supplier activity: 1 to 2 years</p><p>Public utility customer utilization: 60 to 90 days</p><p>Supplier activity: 1 to 5 years</p></li><li><p>2.14. Incorrect Data in the Data Warehouse</p><p>Suppose an entry of $5,000 was recorded in the data warehouse on July 2 when the correct value was $750, and the error is discovered on August 16. There are three ways to handle it:</p><p>Choice 1: Go back into the data warehouse for July 2 and find the offending entry. 
Then, using update capabilities, replace the value $5,000 with the value $750.</p><p>Choice 2: Enter offsetting entries.</p><p>Choice 3: Reset the account to the proper value on August 16.</p></li><li><p>2.14. Incorrect Data in the Data Warehouse (cont.)</p><p>Choice 1</p><p>The integrity of the data has been destroyed. Any report run between July 2 and August 16 can no longer be reconciled.</p><p>The update must be done in the data warehouse environment.</p><p>In many cases, there is not a single entry that must be corrected, but many, many entries.</p></li><li><p>2.14. Incorrect Data in the Data Warehouse (cont.)</p><p>Choice 2</p><p>Many entries may have to be corrected, not just one. Making a simple adjustment may not be an easy thing to do at all.</p><p>Sometimes the formula for correction is so complex that making an adjustment cannot be done.</p></li><li><p>2.14. Incorrect Data in the Data Warehouse (cont.)</p><p>Choice 3</p><p>The ability to simply reset an account as of one moment in time requires application and procedural conventions.</p><p>Such a resetting of values does not accurately account for the error that has been made.</p></li><li><p>2.15. Summary</p><p>1. The Structure of the Data Warehouse</p><p>2. Subject Orientation</p><p>3. Granularity</p><p>4. Exploration and Data Mining</p><p>5. Living Sample Database</p><p>6. Structuring Data in the Data Warehouse</p><p>7. Auditing and the Data Warehouse</p><p>8. Data Homogeneity and Heterogeneity</p><p>9. 
Purging Warehouse Data</p></li><li><p>2.15. Summary (cont.)</p><p>10. Reporting and the Architected Environment</p><p>11. The Operational Window of Opportunity</p><p>12. Incorrect Data in the Data Warehouse</p></li></ul>
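Sections 2.4.1, 2.8, and 2.11 describe how the same granular detail can be reshaped into different summaries, and how a rolling summary file loses detail as data ages. The Python sketch below illustrates two of the structures named in section 2.8, simple cumulative and rolling summary, under stated assumptions: the `(date, account, amount)` record layout, the function names, the sample data, and the "7 days of daily detail, then monthly totals" tiering are all hypothetical choices for illustration, not a specification from the chapter.

```python
from collections import defaultdict
from datetime import date

# Hypothetical granular record: (activity date, account id, amount in whole dollars).
Transaction = tuple[date, str, int]


def simple_cumulative(txns: list[Transaction]) -> dict[date, int]:
    """Simple cumulative structure: one summarized row per day of activity."""
    daily: dict[date, int] = defaultdict(int)
    for day, _account, amount in txns:
        daily[day] += amount
    return dict(sorted(daily.items()))


def rolling_summary(
    txns: list[Transaction], as_of: date, daily_window: int = 7
) -> tuple[dict[date, int], dict[tuple[int, int], int]]:
    """Rolling summary (assumed tiering): activity within `daily_window` days
    of `as_of` keeps daily detail; older activity is collapsed into
    (year, month) totals, so its day-level detail is lost."""
    recent: dict[date, int] = defaultdict(int)
    monthly: dict[tuple[int, int], int] = defaultdict(int)
    for day, _account, amount in txns:
        if (as_of - day).days < daily_window:
            recent[day] += amount
        else:
            monthly[(day.year, day.month)] += amount
    return dict(sorted(recent.items())), dict(sorted(monthly.items()))


# Made-up sample data for illustration only.
txns: list[Transaction] = [
    (date(2019, 5, 30), "A", 100),
    (date(2019, 6, 2), "A", 50),
    (date(2019, 6, 2), "B", 25),
    (date(2019, 7, 1), "A", 10),
    (date(2019, 7, 5), "B", 5),
]
print(simple_cumulative(txns))
recent, monthly = rolling_summary(txns, as_of=date(2019, 7, 5))
print(recent)   # only July 1 and July 5 keep daily detail
print(monthly)  # {(2019, 5): 100, (2019, 6): 75}
```

Both structures are derived from the same granular records, which is the reusability argument of section 2.4.1; only the rolling summary discards detail, and the grand total is preserved either way.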
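The life insurance example in section 2.7.1 splits claims into physical units by year and line of business, and section 2.8 notes that the partitioning components then show up as part of the compound key. A minimal Python sketch of that idea follows; the `Claim` record, the claim IDs, and the helper names are hypothetical, and the in-memory dictionary merely stands in for separate physical units of storage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    claim_id: str          # hypothetical identifier
    line_of_business: str  # "health", "life", or "casualty"
    year: int
    amount: int


def partition_key(claim: Claim) -> tuple[int, str]:
    """Physical unit a claim belongs to, e.g. (2001, "life") for '2001 life claims'."""
    return (claim.year, claim.line_of_business)


def compound_key(claim: Claim) -> tuple[int, str, str]:
    """Warehouse key: the date and line-of-business partitioning components
    appear in the key alongside the claim's own identifier."""
    return (claim.year, claim.line_of_business, claim.claim_id)


def partition(claims: list[Claim]) -> dict[tuple[int, str], list[Claim]]:
    """Group claims into separate physical units of data."""
    units: dict[tuple[int, str], list[Claim]] = defaultdict(list)
    for claim in claims:
        units[partition_key(claim)].append(claim)
    return dict(units)


def archive_year(units, year: int):
    """Archive every unit for `year` as a whole; the other units are left
    untouched and no individual rows need to be inspected."""
    archived = {key: rows for key, rows in units.items() if key[0] == year}
    kept = {key: rows for key, rows in units.items() if key[0] != year}
    return kept, archived


claims = [
    Claim("c1", "health", 2000, 500),
    Claim("c2", "life", 1999, 1_000),
    Claim("c3", "life", 2001, 200),
    Claim("c4", "health", 2000, 300),
]
units = partition(claims)
kept, archived = archive_year(units, 1999)
print(sorted(units))            # three physical units
print(sorted(archived))         # [(1999, 'life')]
print(compound_key(claims[0]))  # (2000, 'health', 'c1')
```

Archiving or deleting a year then operates on whole units, which is the kind of loading, archiving, and deleting benefit section 2.7 attributes to proper partitioning.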
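Section 2.14 contrasts updating the bad July 2 entry in place (Choice 1) with entering offsetting entries (Choice 2). The Python sketch below models Choice 2 with an append-only list of entries; the `Entry` record, the `balance` and `offset` helpers, the account name, the year 2019, and the $1,000 opening entry are hypothetical details added only to make the arithmetic concrete.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    posted: date
    account: str
    amount: int  # whole dollars, so the arithmetic stays exact


def balance(ledger: list[Entry], account: str, as_of: date) -> int:
    """Sum of every entry for `account` posted on or before `as_of`."""
    return sum(e.amount for e in ledger if e.account == account and e.posted <= as_of)


def offset(ledger: list[Entry], wrong: Entry, correct_amount: int, on: date) -> list[Entry]:
    """Choice 2: keep the wrong entry and append an offsetting entry dated `on`."""
    return ledger + [Entry(on, wrong.account, correct_amount - wrong.amount)]


# Hypothetical year, account, and opening entry: the slides give only the dates and amounts.
wrong = Entry(date(2019, 7, 2), "acct-1", 5_000)
ledger = [Entry(date(2019, 6, 1), "acct-1", 1_000), wrong]
corrected = offset(ledger, wrong, correct_amount=750, on=date(2019, 8, 16))

# Re-running a July 31 report still gives the figure originally reported (6000)...
print(balance(corrected, "acct-1", date(2019, 7, 31)))
# ...while balances from August 16 on reflect the correct $750 (1000 + 750 = 1750).
print(balance(corrected, "acct-1", date(2019, 8, 16)))
```

Under Choice 1, overwriting the July 2 entry would make the re-run July 31 report return 1750 instead of the 6000 originally reported, which is the reconciliation problem the slides describe. The sketch corrects only one entry; as the Choice 2 slide warns, real corrections may involve many entries or a correction formula too complex to express as a simple offset.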