data warehouse

TM 14- TM 14-1 Copyright © 1999 Addison Wesley Longman, Inc. Copyright © 1999 Addison Wesley Longman, Inc. Data Warehouse

Upload: nasim-farley

Post on 01-Jan-2016

Page 1: Data Warehouse

Data Warehouse

Page 2: Data Warehouse

Definition

• Data Warehouse: a collection of data used in support of management decision-making processes that is

– subject-oriented (e.g., customers, patients)

– integrated (consistent names, formats, etc.)

– time-variant (has a time dimension, so it may be used for historical records)

– non-volatile (refreshed from the live system; cannot be updated by end users)

Page 3: Data Warehouse

Need for Data Warehousing

• Separation of operational and informational systems and data:

– operational systems (used by the business in real time)

– informational systems (support decision making)

• The data warehouse is created for the informational systems.

Page 4: Data Warehouse

Examples of heterogeneous data in operational systems

Inconsistent key

Synonyms

Free-form fields

Inconsistent data

Missing data
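
The problems listed above can be illustrated with a minimal sketch that uses two hypothetical operational records for the same customer. The field names, the synonym map, and the key formats are all assumptions made for illustration; none of them come from the slides.

```python
# Two hypothetical operational systems describing the same customer.
billing = {"cust_no": "000123", "name": "ACME CORP", "phone": "555-0100"}
support = {"customer_id": 123, "client": "Acme Corp", "phone": None}

# Assumed synonym map: source field name -> warehouse field name.
SYNONYMS = {"cust_no": "customer_id", "client": "name"}

def reconcile(record):
    """Rename synonym fields, normalize the key, and flag missing data."""
    out = {}
    for field, value in record.items():
        out[SYNONYMS.get(field, field)] = value
    out["customer_id"] = int(out["customer_id"])   # inconsistent key formats
    out["name"] = out["name"].title()              # inconsistent data values
    out["missing"] = sorted(k for k, v in out.items() if v is None)
    return out

a, b = reconcile(billing), reconcile(support)
print(a["customer_id"] == b["customer_id"], a["name"] == b["name"], b["missing"])
```

After reconciliation the two records agree on key and name, and the missing phone number is reported rather than silently dropped.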

Page 5: Data Warehouse

Factors Allowing Data Warehousing

• Relational DBMS.

• Advances in hardware: speed and storage capacity.

• End-user computing interfaces and tools.

Page 6: Data Warehouse

Generic (two-level) data warehouse architecture

Two levels:

1. Operational data.

2. Enterprise data warehouse (EDW) - single source of data for decision making.

Page 7: Data Warehouse

Three-layer architecture

1. Operational data.

2. Enterprise data warehouse (EDW) - single source of data for decision making (reconciled data).

3. Data marts - limited scope; data selected from the EDW (derived data).
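
As a rough sketch, the three layers can be read as a two-step data flow: operational rows are reconciled into the EDW, and a data mart is then derived from the EDW. The row layout and the function names `build_edw` and `build_mart` are hypothetical and chosen only for this example.

```python
# Layer 1: operational data (hypothetical order rows from a live system).
operational = [
    {"order_id": 1, "store": "north", "amount": "19.50"},
    {"order_id": 2, "store": "NORTH", "amount": "5.25"},
    {"order_id": 3, "store": "south", "amount": "10.00"},
]

def build_edw(rows):
    """Layer 2: reconciled data, detailed rows with consistent formats."""
    return [
        {"order_id": r["order_id"], "store": r["store"].lower(),
         "amount": float(r["amount"])}
        for r in rows
    ]

def build_mart(edw_rows):
    """Layer 3: derived data, a limited-scope summary selected from the EDW."""
    totals = {}
    for r in edw_rows:
        totals[r["store"]] = totals.get(r["store"], 0.0) + r["amount"]
    return totals

edw = build_edw(operational)
sales_mart = build_mart(edw)
print(sales_mart)
```

The mart never reads the operational rows directly; it only sees data that has already been reconciled.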

Page 8: Data Warehouse

Reasons for the Three-Level Architecture

• EDW and data marts have different purposes and data architectures.

• Data transformation is complex and is best performed in two steps.

• Data marts provide customized decision support for different groups.

Page 9: Data Warehouse

Data Characteristics

Status data (before and after images) vs. event data (the database action resulting from the transaction).

Example of DBMS log entry
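
The difference between status and event data can be sketched with a hypothetical log entry that stores the event together with the status on either side of it. The `LogEntry` class and the account fields are assumptions for illustration, not the layout of any particular DBMS log.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log entry: one event plus the status on either side of it."""
    action: str   # event data: what the transaction did
    before: dict  # status data: before image
    after: dict   # status data: after image

def apply_withdrawal(account, amount):
    """Return the new status and the log entry describing the change."""
    new_account = {**account, "balance": account["balance"] - amount}
    entry = LogEntry(action=f"withdraw {amount}", before=account, after=new_account)
    return new_account, entry

account = {"id": 7, "balance": 750}
account, entry = apply_withdrawal(account, 50)
print(entry.before["balance"], entry.action, entry.after["balance"])
```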

Page 10: Data Warehouse

Data Characteristics

Transient data (changes are written over the record, e.g., a phone number) vs. periodic data (never changed once stored, e.g., accounting records).

Transient operational data

Fig. 14-6, 7.
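
A small sketch of the contrast, assuming a hypothetical customer record and an `as_of` date column: the transient record is overwritten in place, while the periodic version only ever gains rows, so earlier values stay available for historical queries.

```python
from datetime import date

# Transient: the change overwrites the record, so the old value is lost.
customer = {"id": 1, "phone": "555-0100"}
customer["phone"] = "555-0199"

# Periodic: rows are never changed; each change is added with its date.
phone_history = [
    {"id": 1, "phone": "555-0100", "as_of": date(1999, 1, 5)},
]
phone_history.append({"id": 1, "phone": "555-0199", "as_of": date(1999, 3, 2)})

def phone_as_of(history, customer_id, when):
    """Return the phone number in effect for a customer on a given date."""
    rows = [r for r in history if r["id"] == customer_id and r["as_of"] <= when]
    return max(rows, key=lambda r: r["as_of"])["phone"] if rows else None

print(phone_as_of(phone_history, 1, date(1999, 2, 1)))
```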

Page 11: Data Warehouse

Reconciled Data Characteristics

• Detailed

• Historical

• Normalized

• Enterprise-wide

• Quality controlled

Page 12: Data Warehouse

The Data Reconciliation Process

• Capture

– Static - initial load.

– Incremental - ongoing update.

• Scrub (data cleansing)

– Pattern recognition and other artificial intelligence techniques.
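
The slide mentions pattern recognition and other artificial intelligence techniques for scrubbing. The sketch below is far simpler: it uses one regular expression to clean captured phone numbers, as the most basic form of pattern-based cleansing. The target format and the function name `scrub_phone` are assumptions for this example.

```python
import re

# Assumed target format for this sketch: NNN-NNN-NNNN.
NON_DIGITS = re.compile(r"\D")

def scrub_phone(raw):
    """Return a phone number as NNN-NNN-NNNN, or None if it cannot be repaired."""
    digits = NON_DIGITS.sub("", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]          # drop a leading country code
    if len(digits) != 10:
        return None                  # leave for manual review
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

captured = ["(970) 555-0100", "1-970-555-0101", "555-0102", None]
scrubbed = [scrub_phone(p) for p in captured]
print(scrubbed)
```

Values that cannot be repaired come back as `None` so that they can be flagged instead of being loaded with a guessed value.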

Page 13: Data Warehouse

The Data Reconciliation Process

• Transform

– Convert the data format from the source to the target system.

– Record-Level Functions

  • Selection.

  • Joining.

  • Aggregation (for data marts).

– Field-Level Functions

  • Single-field transformation, Fig. 14-9.

  • Multi-field transformation, Fig. 14-10.
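
The record-level and field-level functions above can be sketched on a few hypothetical order and customer rows. The data, field names, and the particular transformations (upper-casing a state code, combining first and last name) are assumptions for illustration; the figures cited on the slide are not reproduced here.

```python
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 40.0, "status": "shipped"},
    {"order_id": 2, "cust_id": 11, "amount": 15.0, "status": "cancelled"},
    {"order_id": 3, "cust_id": 10, "amount": 25.0, "status": "shipped"},
]
customers = {10: {"first": "Ann", "last": "Lee", "state": "co"},
             11: {"first": "Bob", "last": "Ray", "state": "ut"}}

# Record-level: selection keeps only the rows that meet a condition.
selected = [o for o in orders if o["status"] == "shipped"]

# Record-level: joining combines rows from two sources on a shared key.
joined = [{**o, **customers[o["cust_id"]]} for o in selected]

# Field-level, single-field: one source field -> one target field (state).
# Field-level, multi-field: several source fields -> one target field (name).
transformed = [
    {"cust_id": r["cust_id"], "amount": r["amount"],
     "state": r["state"].upper(),
     "name": f'{r["last"]}, {r["first"]}'}
    for r in joined
]

# Record-level: aggregation summarizes detail rows (for data marts).
totals = {}
for r in transformed:
    totals[r["name"]] = totals.get(r["name"], 0.0) + r["amount"]
print(totals)
```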

Page 14: Data Warehouse

The Data Reconciliation Process

• Load and Index

– Refresh Mode

  • When the warehouse is first created.

  • Static data capture.

– Update Mode

  • Ongoing update of the warehouse.

  • Incremental data capture.
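
A minimal sketch of the two load modes followed by an index step, under stated assumptions: the `modified` field is a hypothetical change counter on each source row, and the warehouse is a plain list to which update mode only appends.

```python
def refresh_load(source_rows):
    """Refresh mode: fill the warehouse from a static capture of the source."""
    return [dict(r) for r in source_rows]

def update_load(warehouse, source_rows, last_load):
    """Update mode: append only the rows changed since the previous load."""
    changed = [dict(r) for r in source_rows if r["modified"] > last_load]
    warehouse.extend(changed)
    return len(changed)

def build_index(warehouse):
    """Index step: map each id to the positions of its rows in the warehouse."""
    index = {}
    for pos, row in enumerate(warehouse):
        index.setdefault(row["id"], []).append(pos)
    return index

source = [{"id": 1, "qty": 5, "modified": 1}, {"id": 2, "qty": 7, "modified": 1}]
warehouse = refresh_load(source)                  # initial load

source[0] = {"id": 1, "qty": 6, "modified": 2}    # the live system changes
added = update_load(warehouse, source, last_load=1)
index = build_index(warehouse)
print(added, index)
```

Appending rather than overwriting keeps the earlier row for id 1, which fits the non-volatile, historical nature of warehouse data.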

Page 15: Data Warehouse

Derived Data Characteristics (Data Marts)

• Type of data

– Detailed, possibly periodic.

– Aggregated.

• Distributed to departmental servers.

• Implemented in star schema.

Page 16: Data Warehouse

Star Schema

• Also called the dimensional model.

• Fact and dimension tables.

• Fig. 14-11, 14-12, 14-13 (following).

• Grain of a fact table - the level of detail of each record (e.g., the time period it covers).

• Multiple Fact Table - Fig. 14-14.

• Snowflake Schema - Fig. 14-15.
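
A small star schema can be sketched with Python's built-in `sqlite3` module: one fact table surrounded by dimension tables, queried by joining the fact table to a dimension and aggregating. The table names, columns, and sample rows are assumptions for illustration and are not copied from the figures.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
-- Fact table: one row per period, product, and store (the grain).
CREATE TABLE sales (
    period_key   INTEGER REFERENCES period(period_key),
    product_key  INTEGER REFERENCES product(product_key),
    store_key    INTEGER REFERENCES store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO period  VALUES (1, 'Jan'), (2, 'Feb');
INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO store   VALUES (1, 'Denver'), (2, 'Boulder');
INSERT INTO sales VALUES
    (1, 1, 1, 30, 300.0), (1, 2, 1, 10, 250.0),
    (2, 1, 1, 20, 200.0), (2, 1, 2, 5, 50.0);
""")

# A typical star query: join the fact table to a dimension and aggregate.
rows = con.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM sales AS f JOIN store AS s ON s.store_key = f.store_key
    GROUP BY s.city ORDER BY s.city
""").fetchall()
print(rows)
```

A snowflake variant would normalize a dimension further, for example by moving `city` into its own table referenced from `store`.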

Page 17: Data Warehouse

Components of a star schema

Page 18: Data Warehouse

Star schema example

Page 19: Data Warehouse

Star schema with sample data

Page 20: Data Warehouse

Star schema with two fact tables

Page 21: Data Warehouse

Example of snowflake schema

Page 22: Data Warehouse

Types of Data Marts

• Dependent - Populated from the EDW.

• Independent - Data taken directly from the operational databases.

Page 23: Data Warehouse

The User Interface

• The role of metadata.

• Traditional query and reporting tools.

• On-line analytical processing (OLAP)

– The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.

Page 24: Data Warehouse

The User Interface

– Slicing a cube.

– Pivot

  • Rotate the view for a particular data point to obtain another perspective.

  • E.g., take a value from the units column and obtain by-store values.

– Drill-down - Fig. 14-17.

Slicing a data cube
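
The three operations can be sketched on a tiny hypothetical cube stored as a dictionary keyed by (product, store, month). The data and the helper names `slice_cube` and `summarize` are assumptions for illustration; an OLAP tool would offer the same operations through a graphical interface.

```python
# Hypothetical cube: (product, store, month) -> units sold.
cube = {
    ("shoes", "north", "Jan"): 4, ("shoes", "south", "Jan"): 6,
    ("shoes", "north", "Feb"): 5, ("socks", "north", "Jan"): 9,
    ("socks", "south", "Feb"): 2,
}
DIMS = ("product", "store", "month")

def slice_cube(cube, dim, value):
    """Slice: fix one dimension at a single value and keep the rest."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def summarize(cube, dim):
    """Total the units for each value of one dimension."""
    i = DIMS.index(dim)
    totals = {}
    for k, v in cube.items():
        totals[k[i]] = totals.get(k[i], 0) + v
    return totals

shoes = slice_cube(cube, "product", "shoes")       # slice
by_month = summarize(shoes, "month")               # units per month for shoes
# Pivot: take the Jan value for shoes and view it by store instead.
jan_by_store = summarize(slice_cube(shoes, "month", "Jan"), "store")
# Drill-down: break one store's total into the finer product level.
north_total = summarize(cube, "store")["north"]
north_by_product = summarize(slice_cube(cube, "store", "north"), "product")
print(by_month, jan_by_store, north_total, north_by_product)
```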

Page 25: Data Warehouse

The User Interface

• Data Mining

– Knowledge discovery.

– Search for patterns in the data.

– Table 14-3, 4.

• Data Visualization
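
As one simple illustration of searching for patterns in the data, the sketch below counts which pairs of items appear together in hypothetical purchase baskets. This is a basic co-occurrence count chosen for the example; it is not taken from the tables cited on the slide, and the basket contents and the `min_support` threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk"}, {"bread", "butter", "milk"},
    {"butter", "milk"}, {"bread", "butter"}, {"bread", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same ordering.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```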