The Data Warehouse Environment - Building the Data Warehouse

Upload: bondaigia

Post on 03-Apr-2018

  • 7/29/2019 The Data Warehouse Environment - Building the Data WareHouse

    Building the Data Warehouse

    by Inmon

    Chapter 2: The Data Warehouse Environment

    IT-Slideshares: http://it-slideshares.blogspot.com/

    2. The Data Warehouse Environment

    1. The Structure of the Data Warehouse

    2. Subject Orientation

    3. Day 1 to Day n Phenomenon

    4. Granularity

    5. Exploration and Data Mining

    6. Living Sample Database

    7. Partitioning as a Design Approach

    8. Structuring Data in the Data Warehouse

    9. Auditing and the Data Warehouse

    2. The Data Warehouse Environment (cont.)

    10. Data Homogeneity and Heterogeneity

    11. Purging Warehouse Data

    12. Reporting and the Architected Environment

    13. The Operational Window of Opportunity

    14. Incorrect Data in the Data Warehouse

    15. Summary

    2.0 Introduction: Data Warehouse Characteristics

    Subject-oriented with regard to decision support systems (DSS)

    Integrated from multiple data sources

    Non-volatile data archive

    Time-variant collection of data in support of DSS reports

    2.1. Data Warehouse Characteristics

    2.1. The Structure of the Data Warehouse

    2.2. Subject Orientation

    The data warehouse is oriented to the major subject areas of the corporation that have been defined in the high-level corporate data model. Typical subject areas include the following:

    Customer

    Product

    Transaction or activity

    Policy

    Claim

    Account

    2.2.1

    2.2.2 Subject Orientation (cont.)

    2.2.3 Subject Orientation (cont.)

    2.2.4 Subject Orientation (cont.)

    2.3. Day 1 to Day n Phenomenon

    Data warehouses are not built all at once. Instead, a data warehouse should be built in an orderly, iterative, step-at-a-time fashion.

    The "big bang" approach to data warehouse development is simply an invitation to disaster and is never an appropriate alternative.

    2.4. Granularity

    2.4.1. The Benefits of Granularity

    The granular data found in the data warehouse is the key to reusability.

    Looking at the data in different ways is only one advantage of having a solid foundation.

    Granular data can be shaped to the specific needs of each DSS report, e.g. daily, monthly, quarterly, yearly, or even multi-year trend reports.

    Another related benefit of a low level of granularity is flexibility.

    Another benefit of granular data is that it contains a history of activities and events across the corporation.

    The largest benefit of a data warehouse foundation is that future unknown requirements can be accommodated.
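The reusability point can be illustrated with a minimal Python sketch. It assumes a handful of made-up transaction records (the `TRANSACTIONS` list and the `summarize` helper are hypothetical, not part of the slides) and shows the same granular detail being reshaped into daily, monthly, quarterly, and yearly totals without any change to the stored data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical granular records: (transaction date, amount).
# The layout and the values are illustrative, not taken from the slides.
TRANSACTIONS = [
    (date(2002, 1, 15), 100.0),
    (date(2002, 1, 15), 50.0),
    (date(2002, 2, 3), 75.0),
    (date(2002, 4, 20), 200.0),
    (date(2003, 1, 9), 30.0),
]


def summarize(transactions, key_fn):
    """Total the amounts of granular records under an arbitrary grouping key."""
    totals = defaultdict(float)
    for day, amount in transactions:
        totals[key_fn(day)] += amount
    return dict(totals)


# The same detail serves several reports; only the grouping key changes.
daily = summarize(TRANSACTIONS, lambda d: d)
monthly = summarize(TRANSACTIONS, lambda d: (d.year, d.month))
quarterly = summarize(TRANSACTIONS, lambda d: (d.year, (d.month - 1) // 3 + 1))
yearly = summarize(TRANSACTIONS, lambda d: d.year)
```

A report with a grouping nobody anticipated only needs a new `key_fn`, which is one way to read the claim that future unknown requirements can be accommodated.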

    2.4.2. An Example of Granularity

    2.4.2.1

    2.4.3. Dual Levels of Granularity

    2.4.3.1 Telephone Example

    2.4.3.2 Telephone Example (cont.)

    2.4.3.3 Telephone Example (cont.)

    2.5. Exploration and Data Mining

    Granular data in the data warehouse supports data marts.

    It also supports the process of data mining or data exploration.

    References

    Exploration Warehousing: Turning Business Information into Business Opportunity (Hoboken, N.J.: Wiley, 2000)

    2.6. Living Sample Database

    2.7. Partitioning as a Design Approach

    Proper partitioning can benefit the data warehouse in several ways:

    Loading data

    Accessing data

    Archiving data

    Deleting data

    Monitoring data

    Storing data

    2.7.1. Partitioning of Data

    2.7.1. Partitioning of Data (cont.)

    Following are some of the tasks that cannot easily be performed when data resides in large physical units:

    Restructuring

    Indexing

    Sequential scanning, if needed

    Reorganization

    Recovery

    Monitoring

    2.7.1. Partitioning of Data (cont.)

    Data can be divided by many criteria, such as:

    By date

    By line of business

    By geography

    By organizational unit

    By all of the above

    2.7.1. Partitioning of Data (cont.)

    As an example of how a life insurance company may choose to partition its data, consider the following physical units of data:

    2000 health claims

    2001 health claims

    2002 health claims

    1999 life claims

    2000 life claims

    2001 life claims

    2002 life claims

    2000 casualty claims

    2001 casualty claims

    2002 casualty claims
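The insurance example partitions by date and by line of business at once. The following Python sketch shows one possible way to express that at the application level; the record fields (`claim_id`, `line`, `year`, `amount`) and the helper names are assumptions made for illustration, not something the slides prescribe.

```python
from collections import defaultdict

# Hypothetical claim records; the field names are assumptions for illustration.
CLAIMS = [
    {"claim_id": 1, "line": "health", "year": 2000, "amount": 120.0},
    {"claim_id": 2, "line": "health", "year": 2001, "amount": 80.0},
    {"claim_id": 3, "line": "life", "year": 1999, "amount": 5000.0},
    {"claim_id": 4, "line": "life", "year": 2000, "amount": 7000.0},
    {"claim_id": 5, "line": "casualty", "year": 2002, "amount": 300.0},
]


def partition_name(claim):
    """Name the physical unit a claim belongs to, e.g. '2000 health claims'."""
    return f"{claim['year']} {claim['line']} claims"


def partition(claims):
    """Group claims into separate physical units by year and line of business."""
    units = defaultdict(list)
    for claim in claims:
        units[partition_name(claim)].append(claim)
    return dict(units)


units = partition(CLAIMS)

# Archiving or purging one unit leaves every other unit untouched.
archived = units.pop("1999 life claims")
```

Because each unit is small and independent, tasks such as indexing, reorganization, or recovery can be applied to one unit at a time.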

    2.8 Structuring Data in the Data Warehouse

    2.8. Structuring Data in the Data Warehouse (cont.)

    There are many more ways to structure data within the data warehouse. The most common are these:

    Simple cumulative

    Rolling summary

    Simple direct

    Continuous
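The slides only name these structures. As a rough illustration of the first two, the Python sketch below assumes that "simple cumulative" means keeping one summary per day indefinitely and that "rolling summary" means folding a week of daily totals into a single weekly total and discarding the daily detail. That reading, the `RollingSummary` class, and the sample values are all assumptions of this sketch.

```python
from collections import defaultdict


def simple_cumulative(transactions):
    """Keep one summary per day; every day's entry is retained."""
    by_day = defaultdict(float)
    for day, amount in transactions:
        by_day[day] += amount
    return dict(by_day)


class RollingSummary:
    """Hold the current week's daily totals, then roll them into a weekly total."""

    def __init__(self, days_per_week=7):
        self.days_per_week = days_per_week
        self.daily = []   # daily totals for the week in progress
        self.weekly = []  # one total per completed week

    def add_day(self, total):
        self.daily.append(total)
        if len(self.daily) == self.days_per_week:
            # The daily detail is discarded once it has been rolled up.
            self.weekly.append(sum(self.daily))
            self.daily = []


# (day number, amount) pairs; the values are made up.
transactions = [(1, 10.0), (1, 5.0), (2, 20.0)]
cumulative = simple_cumulative(transactions)

rolling = RollingSummary()
for day_total in [1.0] * 9:  # nine days of activity
    rolling.add_day(day_total)
```

The trade-off is visible in the state each approach keeps: the cumulative form grows with every day, while the rolling form stays compact at the cost of losing detail.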

    2.8. Structuring Data in the Data Warehouse (cont.)

    At the key level, data warehouse keys are inevitably compounded keys. There are two compelling reasons for this:

    Date (year, year/month, year/month/day, and so on) is almost always a part of the key.

    Because data warehouse data is partitioned, the different components of the partitioning show up as part of the key.
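A compounded key of this kind might look like the following Python sketch. The `ClaimKey` fields are hypothetical; they simply combine a partitioning component (line of business), the date components, and an identifier from the source system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True, order=True)
class ClaimKey:
    """Compound warehouse key; the field names are hypothetical."""
    line_of_business: str  # partitioning component
    year: int              # date component, also usable for partitioning
    month: int             # date component
    day: int               # date component
    claim_id: int          # identifier carried over from the source system


def make_key(line_of_business, snapshot_date, claim_id):
    """Build the compound key for a snapshot taken on snapshot_date."""
    return ClaimKey(line_of_business, snapshot_date.year,
                    snapshot_date.month, snapshot_date.day, claim_id)


key = make_key("health", date(2002, 7, 2), 42)
warehouse = {key: {"amount": 750.0}}  # frozen dataclasses are hashable
```

Placing the partitioning and date components first means that sorting keys groups rows by physical unit and then by time.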

    2.9 Auditing and the Data Warehouse

    Data that otherwise would not find its way into the warehouse suddenly has to be there.

    The timing of data entry into the warehouse changes dramatically when an auditing capability is required.

    The backup and recovery restrictions for the data warehouse change drastically when an auditing capability is required.

    Auditing data at the warehouse forces the granularity of data in the warehouse to be at the very lowest level.

    2.10 Data Homogeneity and Heterogeneity

    2.10 Data Homogeneity and Heterogeneity (cont.)

    The data in the data warehouse then is subdivided by the following criteria:

    Subject area

    Table

    Occurrences of data within table

    2.11 Purging Warehouse Data

    There are several ways in which data is purged or the detail of data is transformed, including the following:

    Data is added to a rolling summary file where detail is lost.

    Data is transferred to a bulk storage medium from a high-performance medium such as DASD.

    Data is actually purged from the system.

    Data is transferred from one level of the architecture to another, such as from the operational level to the data warehouse level.

    2.12 Reporting and the Architected Environment

    2.13. The Operational Window of Opportunity

    The following are some suggestions as to how the operational window of archival data may look in different industries:

    Insurance: 2 to 3 years

    Bank trust processing: 2 to 5 years

    Telephone customer usage: 30 to 60 days

    Supplier/vendor activity: 2 to 3 years

    Retail banking customer account activity: 30 days

    Vendor activity: 1 year

    Loans: 2 to 5 years

    Retailing SKU activity: 1 to 14 days

    Vendor activity: 1 week to 1 month

    Airlines flight seat activity: 30 to 90 days

    Vendor/supplier activity: 1 to 2 years

    Public utility customer utilization: 60 to 90 days

    Supplier activity: 1 to 5 years
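One way to apply such windows is a simple age check that decides whether a record is still recent enough to stay in the operational system. The Python sketch below is only an illustration: the dictionary keys are made-up labels, only a few of the listed activities are included, the upper bound of each range is used, and a year is approximated as 365 days.

```python
from datetime import date, timedelta

# Upper bounds of a few of the suggested windows, in days.
# The keys are made-up labels; a year is approximated as 365 days.
OPERATIONAL_WINDOW_DAYS = {
    "telephone_customer_usage": 60,
    "retail_banking_account_activity": 30,
    "retailing_sku_activity": 14,
    "insurance": 3 * 365,
}


def in_operational_window(kind, record_date, today):
    """Return True if the record is still within the operational window."""
    window = timedelta(days=OPERATIONAL_WINDOW_DAYS[kind])
    return today - record_date <= window
```

For instance, `in_operational_window("retailing_sku_activity", date(2002, 7, 2), date(2002, 8, 16))` is false under these assumptions, so that record would be served from the data warehouse rather than from the operational system.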

    2.14. Incorrect Data in the Data Warehouse

    The choices below refer to a warehouse entry for July 2 that holds $5,000 where the correct value is $750, with the correction being made on August 16.

    Choice 1: Go back into the data warehouse for July 2 and find the offending entry. Then, using update capabilities, replace the value $5,000 with the value $750.

    Choice 2: Enter offsetting entries.

    Choice 3: Reset the account to the proper value on August 16.
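The three choices can be compared with a small Python sketch. It is a simplified model under stated assumptions: the year 2002 is invented (the slides give only month and day), the offsetting entries are taken to be -$5,000 and +$750 on the correction date, and the `entry` and `balance` helpers are hypothetical.

```python
from datetime import date

ERROR_DATE = date(2002, 7, 2)   # the year is an assumption of this sketch
FOUND_DATE = date(2002, 8, 16)


def entry(day, amount, kind="entry"):
    """Build one warehouse entry; kind is 'entry' or 'reset'."""
    return {"date": day, "kind": kind, "amount": amount}


def balance(entries, as_of):
    """Replay entries up to as_of; a 'reset' replaces the running total."""
    total = 0.0
    for e in sorted(entries, key=lambda e: e["date"]):
        if e["date"] > as_of:
            break
        total = e["amount"] if e["kind"] == "reset" else total + e["amount"]
    return total


original = [entry(ERROR_DATE, 5000.0)]

# Choice 1: rewrite the July 2 entry in place.
choice_1 = [entry(ERROR_DATE, 750.0)]

# Choice 2: keep the wrong entry and add offsetting entries on the correction date.
choice_2 = original + [entry(FOUND_DATE, -5000.0), entry(FOUND_DATE, 750.0)]

# Choice 3: keep the wrong entry and reset the value on the correction date.
choice_3 = original + [entry(FOUND_DATE, 750.0, kind="reset")]
```

All three agree from August 16 onward, but they differ for dates in between: after Choice 1 a query for mid-July returns $750 and no longer matches reports produced at the time, while Choices 2 and 3 still return $5,000 for that period.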

    2.14. Incorrect Data in the Data Warehouse (cont.)

    Choice 1

    The integrity of the data has been destroyed. Any report run between July 2 and August 16 can no longer be reconciled.

    The update must be done in the data warehouse environment.

    In many cases, there is not a single entry that must be corrected, but many, many entries that must be corrected.

    2.14. Incorrect Data in the Data Warehouse (cont.)

    Choice 2

    Many entries may have to be corrected, not just one. Making a simple adjustment may not be an easy thing to do at all.

    Sometimes the formula for correction is so complex that making an adjustment cannot be done.

    2.14. Incorrect Data in the Data Warehouse (cont.)

    Choice 3

    The ability to simply reset an account as of one moment in time requires application and procedural conventions.

    Such a resetting of values does not accurately account for the error that has been made.

    2.15. Summary

    1. The Structure of the Data Warehouse

    2. Subject Orientation

    3. Granularity

    4. Exploration and Data Mining

    5. Living Sample Database

    6. Structuring Data in the Data Warehouse

    7. Auditing and the Data Warehouse

    8. Data Homogeneity and Heterogeneity

    9. Purging Warehouse Data

    10. Reporting and the Architected Environment

    11. The Operational Window of Opportunity

    12. Incorrect Data in the Data Warehouse