Populating Data Warehouse Structures

Upload: rayna-fackrell

Post on 31-Mar-2015


Page 1: Populating Data Warehouse Structures Examining the Star Schema Dimension Tables Dimension Table Fact Table Sales Star Schema

Populating Data Warehouse Structures

Page 2

Examining the Star Schema

[Figure: Sales Star Schema, showing the fact table and its dimension tables]
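A star schema like the one pictured can be sketched as table definitions. The following is a minimal illustration using Python's built-in sqlite3 module, not the SQL Server schema from the course. The table names (time_dim, customer_dim, product_dim, sales_fact) appear on later slides, but the column lists here are assumptions chosen for brevity.

```python
import sqlite3

# Hypothetical, simplified Sales star: three dimension tables and one fact table.
# Column lists are assumptions for illustration; the slides name only the tables.
DDL = """
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE customer_dim (cust_key INTEGER PRIMARY KEY, customer_id TEXT, customer_name TEXT);
CREATE TABLE product_dim  (prod_key INTEGER PRIMARY KEY, product_id TEXT, product_name TEXT);
CREATE TABLE sales_fact (
    cust_key       INTEGER NOT NULL REFERENCES customer_dim(cust_key),
    prod_key       INTEGER NOT NULL REFERENCES product_dim(prod_key),
    time_key       INTEGER NOT NULL REFERENCES time_dim(time_key),
    quantity_sales INTEGER NOT NULL,
    amount_sales   REAL    NOT NULL
);
"""

def create_sales_star(conn):
    """Create the dimension tables first, then the fact table that references them."""
    conn.executescript(DDL)

conn = sqlite3.connect(":memory:")
create_sales_star(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(tables)
```

Each fact row carries one foreign key per dimension plus the numeric measures, which is what gives the schema its star shape.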

Page 3

Implementing the Star Schema

1. Extract Data From Multiple Sources

2. Integrate, Transform, and Restructure Data

3. Load Data Into Dimension Tables and Fact Tables
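The three steps above can be sketched as three small functions. This is an illustrative outline in plain Python rather than a DTS package. The function names, the CSV layout, and the in-memory "table" are all assumptions made for the example.

```python
import csv
import io

def extract(csv_text, extra_rows):
    """Step 1: pull rows from two sources (CSV text and an in-memory list)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return rows + list(extra_rows)

def transform(rows):
    """Step 2: integrate into one shape: trimmed, upper-cased ids and integer quantities."""
    return [
        {"customer_id": r["customer_id"].strip().upper(), "qty": int(r["qty"])}
        for r in rows
    ]

def load(rows, table):
    """Step 3: append the cleaned rows to the target table (a plain list here)."""
    table.extend(rows)
    return len(rows)

csv_source = "customer_id,qty\nalfi,32\n vinet ,48\n"
other_source = [{"customer_id": "HANAR", "qty": "9"}]
fact_rows = []
loaded = load(transform(extract(csv_source, other_source)), fact_rows)
print(loaded, fact_rows)
```

Keeping the steps separate makes it easy to rerun only the failing stage when a load goes wrong.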

Page 4

The Star Schema Data Load

[Figure: DTS extracts data from heterogeneous data sources (the Northwind OLTP database, external files, and internal files) into the staging area, transforms it, and loads the star schemas (Sales Star, Inventory Star, Financial) in the Polaris data warehouse.]

Page 5

Verifying the Dimension Source Data

Verifying Accuracy of Source Data

Integrating data from multiple sources

Applying business rules

Checking structural requirements

Managing Invalid Data

Rejecting invalid data

Saving invalid data to a log

Correcting Invalid Data

Transforming data

Reassigning data values
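One way to picture these verification steps is a function that checks each row, corrects what it can, and writes everything else to a reject log. The sketch below is an assumption-laden illustration: the rule set (a name must be present, reg_id must be a known region) and the function name verify_rows are hypothetical, not taken from the slides.

```python
def verify_rows(rows, valid_regions):
    """Split rows into accepted rows and a reject log with one reason per row."""
    accepted, rejected = [], []
    for row in rows:
        name = (row.get("buyer_name") or "").strip()
        if not name:
            # Structural requirement (assumed): a buyer name must be present.
            rejected.append((row, "missing buyer_name"))
        elif row.get("reg_id") not in valid_regions:
            # Business rule (assumed): the region must be a known one.
            rejected.append((row, "unknown reg_id"))
        else:
            # Correction: keep the row, storing the trimmed name.
            accepted.append({**row, "buyer_name": name})
    return accepted, rejected

rows = [
    {"buyer_name": " Barr, Adam ", "reg_id": 2},
    {"buyer_name": "", "reg_id": 4},
    {"buyer_name": "Chai, Sean", "reg_id": 99},
]
accepted, rejected = verify_rows(rows, valid_regions={2, 4, 6})
print(accepted)
print([reason for _, reason in rejected])
```

Saving the rejected rows with a reason, rather than dropping them, lets someone correct and reload them later.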

Page 6

Dimension Data Load Examples:

Example 1: DTS splits buyer_name into buyer_first and buyer_last.

Source                        Destination
buyer_name      reg_id        buyer_first  buyer_last  reg_id
Barr, Adam      2             Adam         Barr        2
Chai, Sean      4             Sean         Chai        4
O’Melia, Erin   6             Erin         O’Melia     6
...             ...           ...          ...         ...

Example 2: DTS assigns the buyer_code U999 where the source has none.

Source                              Destination
buyer_code  buyer_last  reg_id      buyer_code  buyer_last  reg_id
(blank)     Barr        2           U999        Barr        2
A123        Chai        4           A123        Chai        4
B456        O’Melia     6           B456        O’Melia     6
...         ...         ...         ...         ...         ...

Example 3: two DTS transfers relate one four-row table to two two-row tables, one of which stores reg_id as Roman numerals.

buyer_name    reg_id
Barr, Adam    2
Chai, Sean    4
Smith, Jane   2
Paper, Anne   4

buyer_name    reg_id        buyer_name    reg_id
Barr, Adam    II            Smith, Jane   2
Chai, Sean    IV            Paper, Anne   4
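The transformations in these examples can be approximated with a few small functions. This is a sketch in plain Python, not DTS transformation script. The helper names are hypothetical, and the Roman-numeral conversion assumes that Roman reg_id values are meant to be normalized to integers, which the slide implies but does not state.

```python
# Only the Roman numerals that appear in the examples, plus VI for region 6.
ROMAN_TO_INT = {"II": 2, "IV": 4, "VI": 6}

def split_buyer_name(buyer_name):
    """Turn 'Last, First' into (first, last)."""
    last, first = [part.strip() for part in buyer_name.split(",", 1)]
    return first, last

def default_buyer_code(code, unknown="U999"):
    """Replace a missing or empty buyer code with the 'unknown' code."""
    return code if code else unknown

def normalize_reg_id(reg_id):
    """Return reg_id as an integer, converting known Roman numerals."""
    if isinstance(reg_id, str) and reg_id in ROMAN_TO_INT:
        return ROMAN_TO_INT[reg_id]
    return int(reg_id)

print(split_buyer_name("Barr, Adam"))
print(default_buyer_code(None), default_buyer_code("A123"))
print(normalize_reg_id("IV"), normalize_reg_id(2))
```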

Page 7

Maintaining Integrity of the Dimension

Assigning a Surrogate Key to Each Record

Defines the dimension’s primary key

Relates to the foreign key fields of the fact table

Loading One Record Per Application Key

Maintains uniqueness in the dimension

Depends on how you manage changing dimension data

Maintains integrity of the fact table
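As a rough illustration of these two rules, the loader below hands out a new surrogate key the first time it sees an application key and returns the existing key on every later load. The class name DimensionLoader and its fields are hypothetical. The starting key 201 and the ALFI record echo the customer_dim example used elsewhere in this deck.

```python
from itertools import count

class DimensionLoader:
    """Assigns surrogate keys and keeps one record per application key."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self.key_by_app_key = {}   # application key -> surrogate key
        self.rows = []             # the dimension records

    def load(self, app_key, attributes):
        # An application key that is already loaded keeps its surrogate key.
        if app_key in self.key_by_app_key:
            return self.key_by_app_key[app_key]
        surrogate = next(self._next_key)
        self.key_by_app_key[app_key] = surrogate
        self.rows.append({"key": surrogate, "app_key": app_key, **attributes})
        return surrogate

customers = DimensionLoader(start=201)
k1 = customers.load("ALFI", {"name": "Alfreds"})
k2 = customers.load("VINET", {})
k3 = customers.load("ALFI", {"name": "Alfreds"})  # duplicate application key
print(k1, k2, k3, len(customers.rows))
```

This sketch ignores changing attribute values; how those are handled depends on the slowly changing dimension type chosen.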

Page 8

Managing Changing Dimension Data

Dimensions with Changing Column Values

Inserts of new data

Updates of existing data

Slowly-Changing Dimension Design Solutions

Type 1: Overwrite the dimension record

Type 2: Write another dimension record

Type 3: Add attributes to the dimension record

Page 9

Type 1: Overwriting the Dimension Record

Existing record is changed

Product Dimension

          product key  product name  product size  product package  product dept  product cat  product subcat  ...
Before    001          Rice Puffs    10 oz.        Bag              Grocery       Dry Goods    Snacks          ...
After     001          Rice Puffs    12 oz.        Bag              Grocery       Dry Goods    Snacks          ...
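A Type 1 change amounts to an in-place update. The sketch below models the dimension as a dictionary keyed by product key. The function name apply_type1 and the reduced column set are assumptions for illustration.

```python
def apply_type1(dimension, product_key, changes):
    """Overwrite attributes in place; the prior values are not kept."""
    row = dimension[product_key]
    row.update(changes)
    return row

product_dim = {"001": {"product_name": "Rice Puffs", "product_size": "10 oz."}}
apply_type1(product_dim, "001", {"product_size": "12 oz."})
print(product_dim)
```

Because the old value is lost, every fact row that references key 001 now reports against the 12 oz. size, including sales made while the product was 10 oz.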

Page 10

Type 2: Writing Another Dimension Record

Adds a new record

Product Dimension

          product key  product name  product size  product package  product dept  product cat  product subcat  effective_date  ...
Before    001          Rice Puffs    10 oz.        Bag              Grocery       Dry Goods    Snacks          05-01-1995      ...
After     001          Rice Puffs    10 oz.        Bag              Grocery       Dry Goods    Snacks          05-01-1995      ...
          731          Rice Puffs    12 oz.        Bag              Grocery       Dry Goods    Snacks          10-15-1998      ...
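A Type 2 change can be sketched as appending a new version of the record under a new surrogate key. The product_id column used below to tie the versions together is a hypothetical application key, since the slide shows only the product key. The function name apply_type2 is likewise an assumption.

```python
from datetime import date

def apply_type2(rows, product_id, changes, effective_date, new_key):
    """Append a new version of the record; the existing version is left untouched."""
    current = max(
        (r for r in rows if r["product_id"] == product_id),
        key=lambda r: r["effective_date"],
    )
    new_row = {
        **current,
        **changes,
        "product_key": new_key,
        "effective_date": effective_date,
    }
    rows.append(new_row)
    return new_row

product_dim = [{
    "product_key": "001",
    "product_id": "RICE-PUFFS",   # hypothetical application key
    "product_name": "Rice Puffs",
    "product_size": "10 oz.",
    "effective_date": date(1995, 5, 1),
}]
apply_type2(product_dim, "RICE-PUFFS", {"product_size": "12 oz."},
            date(1998, 10, 15), new_key="731")
print([(r["product_key"], r["product_size"]) for r in product_dim])
```

Older fact rows keep pointing at key 001 and newer ones at key 731, so history is preserved at the cost of a growing dimension.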

Page 11

Type 3: Adding Attributes in the Dimension Record

Additional information is stored in an existing record

Product Dimension

Column                               Before        After
product key                          001           001
product name                         Rice Puffs    Rice Puffs
product size                         10 oz.        12 oz.
product package                      Bag           Bag
product dept                         Grocery       Grocery
product cat                          Dry Goods     Dry Goods
product subcat                       Snacks        Snacks
current product size date            05-01-1995    10-15-1998
previous product size                11 oz.        10 oz.
previous product size date           03-20-1994    05-01-1995
2nd previous product size            (null)        11 oz.
2nd previous product size date       (null)        03-20-1994
...                                  ...           ...
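A Type 3 change shifts the current value into the "previous" columns before storing the new one. The sketch below uses shortened, hypothetical column names for the size-history columns on this slide, and the function name apply_type3 is an assumption.

```python
def apply_type3(row, new_size, new_date):
    """Shift size history one slot back, then store the new current size."""
    # Oldest slot first, so that no value is overwritten before it is copied.
    row["second_previous_size"] = row["previous_size"]
    row["second_previous_size_date"] = row["previous_size_date"]
    row["previous_size"] = row["product_size"]
    row["previous_size_date"] = row["current_size_date"]
    row["product_size"] = new_size
    row["current_size_date"] = new_date
    return row

row = {
    "product_key": "001",
    "product_size": "10 oz.",
    "current_size_date": "05-01-1995",
    "previous_size": "11 oz.",
    "previous_size_date": "03-20-1994",
    "second_previous_size": None,
    "second_previous_size_date": None,
}
apply_type3(row, "12 oz.", "10-15-1998")
print(row)
```

The history depth is fixed by the number of columns: a further size change would push the 11 oz. value out of the record.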

Page 12

Verifying the Fact Table Source Data

Verifying Accuracy of Source Data

Integrating data from multiple sources

Applying business rules

Checking structural requirements

Managing Invalid Data

Rejecting invalid data

Saving invalid data to a log

Correcting Invalid Data

Transforming data

Reassigning data values

Page 13

Assigning Foreign Keys

Dimension Tables

customer_dim:  201  ALFI  Alfreds
product_dim:   25   123   Chai
time_dim:      134  1/1/2000

Source Data

customer id  product id  order date  quantity_sales  amount_sales
ALFI         123         1/1/2000    400             10,789

Sales Fact Data

cust_key  prod_key  time_key  quantity_sales  amount_sales
201       25        134       400             10,789
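The key assignment on this slide is a lookup from each application key in the source row to the matching surrogate key in the dimension. The sketch below uses the slide's values. Modeling each dimension as a dictionary, and the function name to_fact_row, are assumptions for illustration.

```python
# Application key -> surrogate key, using the values shown on the slide.
customer_dim = {"ALFI": 201}
product_dim = {"123": 25}
time_dim = {"1/1/2000": 134}

def to_fact_row(src):
    """Replace each application key with the dimension's surrogate key."""
    return {
        "cust_key": customer_dim[src["customer_id"]],
        "prod_key": product_dim[src["product_id"]],
        "time_key": time_dim[src["order_date"]],
        "quantity_sales": src["quantity_sales"],
        "amount_sales": src["amount_sales"],
    }

source_row = {
    "customer_id": "ALFI",
    "product_id": "123",
    "order_date": "1/1/2000",
    "quantity_sales": 400,
    "amount_sales": 10789,
}
fact_row = to_fact_row(source_row)
print(fact_row)
```

A source row whose key is missing from a dimension raises KeyError here; a real load would more likely route such rows to a reject log.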

Page 14

Defining Measures

Loading Measures from the Source System

Calculating Additional Measures

Source System Data

customer_id  product_id  price  qty
VINET        9GZ         .55    32
ALFI         1KJ         1.10   48
HANAR        0ZA         .98    9
...          ...         ...    ...

Fact Table Data

customer_key  product_key  qty  total_sales
100           512          32   17.60
238           207          48   52.80
437           338          9    8.82
...           ...          ...  ...
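In this example qty is loaded as is, while total_sales is calculated as price multiplied by qty. A minimal sketch of that calculation follows, using Decimal to avoid binary floating-point rounding on currency values. The function name total_sales is an assumption.

```python
from decimal import Decimal

def total_sales(price, qty):
    """Calculated measure: price times quantity, kept to two decimal places."""
    return (Decimal(price) * qty).quantize(Decimal("0.01"))

# (customer_id, product_id, price, qty) rows from the source system table.
source = [
    ("VINET", "9GZ", ".55", 32),
    ("ALFI", "1KJ", "1.10", 48),
    ("HANAR", "0ZA", ".98", 9),
]
totals = [total_sales(price, qty) for _, _, price, qty in source]
print([str(t) for t in totals])
```

The results match the total_sales column of the fact table data: 17.60, 52.80, and 8.82.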

Page 15

Maintaining Data Integrity

Adhering to the Fact Table Grain

A fact table can only have one grain

You must load a fact table with data at the same level of detail as defined by the grain

Enforcing Column Constraints

NOT NULL constraints

FOREIGN KEY constraints
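The effect of these column constraints can be demonstrated with Python's built-in sqlite3 module. This is an illustration on SQLite, not SQL Server, and the two-table schema is a cut-down assumption. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE product_dim (prod_key INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
CREATE TABLE sales_fact (
    prod_key       INTEGER NOT NULL REFERENCES product_dim(prod_key),
    quantity_sales INTEGER NOT NULL
);
INSERT INTO product_dim VALUES (25, 'Chai');
""")

def try_insert(prod_key, quantity):
    """Attempt one fact insert and report whether the constraints accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (prod_key, quantity))
        return "loaded"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert(25, 400),   # valid row
    try_insert(999, 10),   # no such product: FOREIGN KEY violation
    try_insert(25, None),  # missing measure: NOT NULL violation
]
row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(results, row_count)
```

Constraints of this kind catch orphaned keys and missing values, but they cannot detect rows loaded at the wrong grain; that check has to happen in the load logic.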

Page 16

Implementing Staging Tables

Centralize and Integrate Source Data

Break Up Complex Data Transformations

Facilitate Error Recovery

Staging Area: sales_stage, inventory_stage, market_stage, shipments_stage

Page 17

DTS Functionality

Accessing Heterogeneous Data Sources

Importing, Exporting, and Transforming Data

Creating Reusable Transformations and Functions

Automating Data Loads

Managing Metadata

Customizing and Extending Functionality

Page 18

Defining DTS Packages

Identifies Data Sources and Destinations

Defines Tasks or Actions

Implements Transformation Logic

Defines Order of Operations

Page 19

Identifying Package Components

Connections Access Data Sources and Destinations

Tasks Describe Data Transformations or Functions

Steps Define the Order of Task Operations or Workflow

Global Variables Store Data that Can Be Shared Across Tasks
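How these four components fit together can be pictured with a small conceptual model. The sketch below is plain Python and deliberately does not use or imitate the DTS Object Model: the Package class, its fields, and the two task functions are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Package:
    """Conceptual stand-in for a package; not the DTS Object Model."""
    connections: dict = field(default_factory=dict)       # name -> data source or destination
    tasks: dict = field(default_factory=dict)             # name -> callable taking the package
    steps: list = field(default_factory=list)             # task names in execution order
    global_variables: dict = field(default_factory=dict)  # values shared across tasks

    def run(self):
        # Steps define the workflow: run each named task in order.
        for name in self.steps:
            self.tasks[name](self)

def extract_task(pkg):
    # Read from the source connection and share the rows through a global variable.
    pkg.global_variables["rows"] = list(pkg.connections["source"])

def load_task(pkg):
    # Write the shared rows to the destination connection.
    pkg.connections["destination"].extend(pkg.global_variables["rows"])

pkg = Package(
    connections={"source": ["ALFI", "VINET"], "destination": []},
    tasks={"extract": extract_task, "load": load_task},
    steps=["extract", "load"],
)
pkg.run()
print(pkg.connections["destination"])
```

Here the steps form a simple sequence; a fuller model would let a step depend on the success or failure of an earlier one.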

Page 20

Creating Packages

Using the DTS Import/Export Wizard

Perform ad hoc table and data transfers

Develop a prototype package

Using DTS Package Designer

Edit packages created with the DTS Import/Export Wizard

Create packages with a wide range of functionality

Programming DTS Applications

Directly access the functionality of the DTS Object Model

Requires Microsoft Visual Basic or Microsoft Visual C++

Page 21

Using DTS to Populate the Sales Star

Populating the Sales Star Dimensions

Populating the Sales Star Fact Table

Page 22

Populating the Sales Star Dimensions

[Figure: DTS populates time_dim, customer_dim, and product_dim from three sources: tab-delimited product files, the Northwind OLTP database, and a SQL Server stored procedure.]

Page 23

Populating the Sales Star Fact Table

[Figure: DTS loads a sales data file into sales_stage, then populates sales_fact from sales_stage together with time_dim, customer_dim, and product_dim.]

Page 24

Designing Modular Packages

Creating Modular Packages

Simplify complex workflows

Create more readable packages

Produce smaller packages that are easier to debug

Using Outer Packages

Execute multiple packages within a single package

Combine modular packages into logical workflows

Reuse modular packages in different workflows

Execute packages in parallel

Page 25

Using DTS to Populate the Sales Star