TM 14-1 Copyright © 1999 Addison Wesley Longman, Inc.
Data Warehouse
Definition
• Data Warehouse: a collection of data used in support of management decision-making processes that is
  – subject-oriented (e.g. customers, patients),
  – integrated (consistent names, formats, etc.),
  – time-variant (has a time dimension, so it can be used for historical records), and
  – non-volatile (refreshed from the live system; cannot be updated by end users).
Need for Data Warehousing
• Separation of operational and informational systems and their data:
  – operational systems (used by the business in real time)
  – informational systems (support decision making)
• The data warehouse is created for the informational systems.
Examples of heterogeneous data in operational systems
• Inconsistent key
• Synonyms
• Free-form fields
• Inconsistent data
• Missing data
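The kinds of heterogeneity listed above can be illustrated with a minimal Python sketch. The two source records, the field names, and the helper functions are all hypothetical; they only show how an inconsistent key, synonyms, and missing data might be reconciled into one record.

```python
# Hypothetical records for the same customer held in two operational systems.
billing = {"cust_no": "000123", "name": "ACME Corp.", "phone": "(555) 010-2000"}
support = {"customer_id": 123, "company": "Acme Corp", "tel": None}

# Synonyms: different field names that mean the same thing (assumed mapping).
SYNONYMS = {"cust_no": "customer_id", "company": "name", "tel": "phone"}


def normalize_key(raw):
    """Inconsistent key: make '000123' and 123 compare equal."""
    return int(str(raw).strip())


def to_common_names(record):
    """Rename synonym fields to one agreed name per concept."""
    return {SYNONYMS.get(field, field): value for field, value in record.items()}


def reconcile(primary, secondary):
    """Merge two records for one customer, preferring the primary source."""
    a, b = to_common_names(primary), to_common_names(secondary)
    a["customer_id"] = normalize_key(a["customer_id"])
    b["customer_id"] = normalize_key(b["customer_id"])
    if a["customer_id"] != b["customer_id"]:
        raise ValueError("records describe different customers")
    # Missing data: fall back to the secondary source when the primary has no value.
    return {
        field: a.get(field) if a.get(field) is not None else b.get(field)
        for field in sorted(set(a) | set(b))
    }


merged = reconcile(billing, support)
print(merged)
```

The inconsistent spelling of the company name is resolved here simply by trusting the primary source; a real cleansing step would need an explicit rule for that choice.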
Factors Allowing Data Warehousing
• Relational DBMS.
• Advances in hardware: speed and storage capacity.
• End-user computing interfaces and tools.
Generic (two-level) data warehouse architecture
Two-level
1. Operational data.
2. Enterprise data warehouse (EDW) - single source of data for decision making.
Three-layer architecture
1. Operational data.
2. Enterprise data warehouse (EDW) - single source of data for decision making (reconciled data).
3. Data marts - limited scope; data selected from the EDW (derived data).
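As a rough illustration of how the three layers relate, the sketch below passes a few hypothetical sales records through each layer in Python. The source systems, field names, and functions are assumptions made for the example, not part of the original slides.

```python
# Layer 1: operational data, as two hypothetical source systems hold it.
store_a_sales = [{"sku": "P1", "amount": "10.50"}, {"sku": "P2", "amount": "4.00"}]
store_b_sales = [{"SKU": "p1", "amt": 7.25}]


def build_edw(store_a, store_b):
    """Layer 2 (reconciled data): one detailed, consistently formatted row per sale."""
    edw = [
        {"store": "A", "sku": r["sku"].upper(), "amount": float(r["amount"])}
        for r in store_a
    ]
    edw += [
        {"store": "B", "sku": r["SKU"].upper(), "amount": float(r["amt"])}
        for r in store_b
    ]
    return edw


def build_sales_mart(edw):
    """Layer 3 (derived data): totals per product, selected from the EDW only."""
    totals = {}
    for row in edw:
        totals[row["sku"]] = totals.get(row["sku"], 0.0) + row["amount"]
    return totals


edw = build_edw(store_a_sales, store_b_sales)
sales_mart = build_sales_mart(edw)
print(sales_mart)
```

Note that the data mart function never reads the operational records directly; it sees only the reconciled rows, which is the point of the middle layer.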
Reasons for the Three-Level Architecture
• EDW and data marts have different purposes and data architectures.
• Data transformation is complex and is best performed in two steps.
• Data marts customize decision support for different groups.
Data Characteristics
• Status data (before and after images) vs. event data (the database action resulting from the transaction).
Example of DBMS log entry
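The distinction between status and event data can be sketched in Python as follows. The `LogEntry` structure and its field names are hypothetical and do not reproduce the log format of any particular DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    transaction_id: int
    action: str          # event data: what the transaction did
    before_image: dict   # status data: the record before the action
    after_image: dict    # status data: the record after the action


def apply_update(record, changes, transaction_id):
    """Apply changes to a record and return the new record plus its log entry."""
    before = dict(record)
    after = {**record, **changes}
    return after, LogEntry(transaction_id, "UPDATE", before, after)


account = {"account_id": 7, "balance": 750}
account, entry = apply_update(account, {"balance": 500}, transaction_id=1)
print(entry)
```

The two images describe the state of the record at two points in time, while the action field describes what happened between them.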
Data Characteristics
• Transient data (changes are written over the existing record, e.g. a phone number) vs. periodic data (never changed once stored, e.g. accounting records).
• Fig. 14-6, 14-7.
Transient operational data
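A small Python sketch of the difference, using a hypothetical customer phone number: the transient record is overwritten in place, while the periodic version appends a dated row for every change. The `as_of` field and the `value_as_of` helper are assumptions made for the example.

```python
# Transient: the change overwrites the stored value; the old value is lost.
transient = {"customer_id": 1, "phone": "555-0100"}
transient["phone"] = "555-0199"

# Periodic: nothing is overwritten; each change is appended with a date.
periodic = [{"customer_id": 1, "phone": "555-0100", "as_of": "1999-01-05"}]
periodic.append({"customer_id": 1, "phone": "555-0199", "as_of": "1999-03-12"})


def value_as_of(history, date):
    """Return the latest row dated on or before `date` (ISO dates sort as text)."""
    candidates = [row for row in history if row["as_of"] <= date]
    return max(candidates, key=lambda row: row["as_of"]) if candidates else None


print(value_as_of(periodic, "1999-02-01"))
```

Only the periodic form can answer a question about the past, which is why it suits the time-variant nature of a warehouse.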
Reconciled Data Characteristics
• Detailed
• Historical
• Normalized
• Enterprise-wide
• Quality controlled
The Data Reconciliation Process
• Capture
  – Static - initial load.
  – Incremental - ongoing update.
• Scrub (data cleansing)
  – Pattern recognition and other artificial intelligence techniques.
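The capture and scrub steps might look roughly like the Python sketch below. The source rows, the `modified` field, and the function names are hypothetical, and the regular expression is only a very simple stand-in for the pattern recognition techniques mentioned above.

```python
import re

# Hypothetical operational rows with a last-modified date (ISO text).
source = [
    {"id": 1, "phone": "(555) 010-2000", "modified": "1999-01-05"},
    {"id": 2, "phone": "555.010.3000", "modified": "1999-02-10"},
    {"id": 3, "phone": "n/a", "modified": "1999-03-01"},
]


def capture_static(rows):
    """Static capture: take every row, as for an initial load."""
    return list(rows)


def capture_incremental(rows, last_capture):
    """Incremental capture: take only rows changed since the last capture."""
    return [row for row in rows if row["modified"] > last_capture]


def scrub_phone(raw):
    """Scrub: keep the digits; return None unless exactly ten remain."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


changed = capture_incremental(source, "1999-02-01")
cleaned = [{**row, "phone": scrub_phone(row["phone"])} for row in capture_static(source)]
print(cleaned)
```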
The Data Reconciliation Process
• Transform
  – Convert the data format from the source to the target system.
  – Record-Level Functions
    • Selection.
    • Joining.
    • Aggregation (for data marts).
  – Field-Level Functions
    • Single-field transformation, Fig. 14-9.
    • Multi-field transformation, Fig. 14-10.
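The record-level and field-level functions above can be sketched in Python on a handful of hypothetical order rows. The data, the lookup table, and the derived `total` field are assumptions made for the example and are not taken from the figures cited.

```python
orders = [
    {"order_id": 1, "cust_id": 10, "state": "CA", "qty": 2, "unit_price": 5.0, "status": "shipped"},
    {"order_id": 2, "cust_id": 11, "state": "NY", "qty": 1, "unit_price": 8.0, "status": "cancelled"},
    {"order_id": 3, "cust_id": 10, "state": "CA", "qty": 4, "unit_price": 2.5, "status": "shipped"},
    {"order_id": 4, "cust_id": 11, "state": "NY", "qty": 3, "unit_price": 2.0, "status": "shipped"},
]
customers = {10: "Acme", 11: "Globex"}
STATE_NAMES = {"CA": "California", "NY": "New York"}

# Record-level, selection: keep only the rows that meet a condition.
shipped = [o for o in orders if o["status"] == "shipped"]

# Record-level, joining: bring in the customer name from a second source.
joined = [{**o, "customer": customers[o["cust_id"]]} for o in shipped]

for row in joined:
    # Field-level, single-field: one source field to one target field (table lookup).
    row["state_name"] = STATE_NAMES[row["state"]]
    # Field-level, multi-field: several source fields to one target field.
    row["total"] = row["qty"] * row["unit_price"]

# Record-level, aggregation: summarize detail rows, as for a data mart.
totals = {}
for row in joined:
    totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["total"]

print(totals)
```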
The Data Reconciliation Process
• Load and Index
  – Refresh Mode
    • When the warehouse is first created.
    • Static data capture.
  – Update Mode
    • Ongoing update of the warehouse.
    • Incremental data capture.
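The two load modes, followed by indexing, might be sketched in Python as below. The function names and rows are hypothetical. The sketch also assumes that update mode appends new rows rather than overwriting old ones, so that history is preserved; the slides do not state this.

```python
def refresh_load(source_rows, load_date):
    """Refresh mode: build the warehouse table from a full (static) capture."""
    return [{**row, "loaded": load_date} for row in source_rows]


def update_load(warehouse, changed_rows, load_date):
    """Update mode: add only the rows from an incremental capture.

    Assumption: rows are appended, never overwritten, to keep history.
    """
    warehouse.extend({**row, "loaded": load_date} for row in changed_rows)
    return warehouse


def build_index(warehouse, field):
    """Index: map each value of `field` to the positions of its rows."""
    index = {}
    for position, row in enumerate(warehouse):
        index.setdefault(row[field], []).append(position)
    return index


warehouse = refresh_load(
    [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}], "1999-01-01"
)
update_load(warehouse, [{"id": 2, "city": "Chicago"}], "1999-02-01")
by_id = build_index(warehouse, "id")
print(by_id)
```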
Derived Data Characteristics (Data marts)
• Type of data
  – Detailed, possibly periodic.
  – Aggregated.
• Distributed to departmental servers.
• Implemented in star schema.
Star Schema
• Also called the dimensional model.
• Fact and dimension tables.
• Fig. 14-11, 14-12, 14-13 (following).
• Grain of a fact table - the level of detail of each record, typically the time period it covers.
• Multiple Fact Table - Fig. 14-14.
• Snowflake Schema - Fig. 14-15.
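A minimal star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_store`, `dim_period`) and the sample rows are hypothetical and are not those of the figures cited; the grain assumed here is one fact row per product, store, and month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_period  (period_key  INTEGER PRIMARY KEY, month TEXT);

-- Fact table: numeric measures keyed by one foreign key per dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    period_key  INTEGER REFERENCES dim_period(period_key),
    units_sold  INTEGER,
    PRIMARY KEY (product_key, store_key, period_key)
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store   VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO dim_period  VALUES (1, '1999-01'), (2, '1999-02');
INSERT INTO fact_sales  VALUES (1, 1, 1, 30), (1, 2, 1, 20), (2, 1, 1, 5), (1, 1, 2, 40);
""")

# A typical query joins the fact table to a dimension and aggregates a measure.
units_by_city = conn.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()

print(units_by_city)
```

In a snowflake schema, a dimension such as `dim_store` would itself be normalized into further tables (for example, a separate region table); that variation is not shown here.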
Components of a star schema
Star schema example
Star schema with sample data
Star schema with two fact tables
Example of snowflake schema
Types of Data Marts
• Dependent - Populated from the EDW.
• Independent - Data taken directly from the operational databases.
The User Interface
• The role of metadata.
• Traditional query and reporting tools.
• On-line analytical processing (OLAP)
  – The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.
The User Interface
  – Slicing a cube.
  – Pivot
    • Rotate the view for a particular data point to obtain another perspective.
    • E.g. take a value from the units column and obtain by-store values.
  – Drill-down - Fig. 14-17.
Slicing a data cube
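The three operations can be imitated in plain Python on a tiny cube of units sold. The cube contents, the dimension names, and the helper functions are hypothetical; an OLAP tool would offer these operations through its graphical interface rather than through code.

```python
# A tiny cube: units sold, keyed by (product, store, month).
cube = {
    ("Widget", "Austin", "1999-01"): 30,
    ("Widget", "Boston", "1999-01"): 20,
    ("Gadget", "Austin", "1999-01"): 5,
    ("Widget", "Austin", "1999-02"): 40,
}
DIMENSIONS = ("product", "store", "month")


def slice_cube(cells, dimension, value):
    """Slice: keep only the cells where one dimension has a fixed value."""
    axis = DIMENSIONS.index(dimension)
    return {key: units for key, units in cells.items() if key[axis] == value}


def total_by(cells, dimension):
    """Summarize the cells along one dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = {}
    for key, units in cells.items():
        totals[key[axis]] = totals.get(key[axis], 0) + units
    return totals


widget_slice = slice_cube(cube, "product", "Widget")

by_month = total_by(widget_slice, "month")  # one view of the slice
by_store = total_by(widget_slice, "store")  # pivot: same slice, viewed by store

# Drill-down: break the January total into its by-store detail.
january_detail = total_by(slice_cube(widget_slice, "month", "1999-01"), "store")

print(by_month, by_store, january_detail)
```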
The User Interface
• Data Mining
  – Knowledge discovery.
  – Search for patterns in the data.
  – Table 14-3, 14-4.
• Data Visualization
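As one illustration of what searching for patterns in the data can mean for data mining, the Python sketch below counts which pairs of items are bought together most often. The baskets and the `frequent_pairs` helper are hypothetical, and this is only one simple kind of pattern search, not a technique taken from the tables cited.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]


def frequent_pairs(baskets, min_count):
    """Count item pairs across baskets and keep those seen at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair appear in one canonical order.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


pairs = frequent_pairs(baskets, min_count=2)
print(pairs)
```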