Data Warehousing: BITS Lecture

BITS Pilani Pilani Campus Data Warehousing SS ZG515 PC Reddy Guest Faculty WILP, BITS Pilani

Upload: anshul-rohilla

Post on 18-Jan-2016

Page 1: Data Ware housing Bits lecture

BITS Pilani Pilani Campus

Data Warehousing SS ZG515

PC Reddy Guest Faculty – WILP, BITS Pilani

Page 2: Data Ware housing Bits lecture

Data Warehousing – Lecture 4 Dimensional Modeling

Page 3: Data Ware housing Bits lecture

Lecture 4 Outline

• Review Lecture 3

• Dimensional modeling

• Retail grocery store case study.

Page 4: Data Ware housing Bits lecture

The Dimensional Data Model

An alternative to the normalized data model

• Present information as simply as possible (easier to understand)

• Return query results as quickly as possible (efficient for queries)

• Track the underlying business processes (process focused)

Page 5: Data Ware housing Bits lecture

The Dimensional Data Model

• Contains the same information as the normalized model

• Has far fewer tables

• Grouped in coherent business categories

• Pre-joins hierarchies and lookup tables resulting in fewer join paths and fewer intermediate tables

• Normalized fact table with denormalized dimension tables.

Page 6: Data Ware housing Bits lecture

Fact Table

Measurements associated with a specific business process

• Grain: level of detail of the table

• Process events produce fact records

• Facts (attributes) are usually

– Numeric

– Additive

• Derived facts included

• Foreign (surrogate) keys refer to dimension tables (entities)

Page 7: Data Ware housing Bits lecture

Dimension Tables

Entities describing the objects of the process

• Conformed dimensions: shared across business processes

• Attributes are descriptive

– Text

– Numeric

• Surrogate keys

• 1:m with the fact table

• Null entries

• Date dimensions

Page 8: Data Ware housing Bits lecture

Bus Architecture

• An architecture that permits aggregating data across multiple marts

• Conformed dimensions and attributes

• Bus matrix

Page 9: Data Ware housing Bits lecture

Keys and Surrogate Keys

A surrogate key is a unique identifier for data warehouse records that replaces source primary keys (business/natural keys)

• Protect against changes in source systems

• Allow integration from multiple sources

• Enable rows that do not exist in source data

• Track changes over time (e.g. new customer instances when addresses change)

• Replace text keys with integers for efficiency
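The points above can be pictured with a minimal Python sketch of surrogate key assignment. The class name `SurrogateKeyMap`, the source system names, and the sample natural keys are all hypothetical; in practice this is usually handled by the ETL tool or a database sequence.

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out integer surrogate keys for (source system, natural key) pairs."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._assigned = {}

    def lookup(self, source, natural_key):
        """Return the surrogate key for this pair, assigning one on first sight."""
        pair = (source, natural_key)
        if pair not in self._assigned:
            self._assigned[pair] = next(self._next_key)
        return self._assigned[pair]


keys = SurrogateKeyMap()

# Two source systems with differently formatted text keys (hypothetical values).
crm_key = keys.lookup("crm", "C-31421")
billing_key = keys.lookup("billing", "31421")

# Looking up the same source key again returns the same surrogate key.
crm_key_again = keys.lookup("crm", "C-31421")

# A row that exists in no source system, e.g. an "unknown customer" placeholder.
unknown_key = keys.lookup("warehouse", "UNKNOWN")
```

Because the warehouse owns the mapping, a source system can renumber or reformat its own keys without disturbing fact rows that already carry the integer key.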

Page 10: Data Ware housing Bits lecture

Slowly Changing Dimensions

Attributes in a dimension that change more slowly than the fact granularity

• Type 1: Current only / overwrite the old value

• Type 2: All history / create a new dimensional record

• Type 3: Most recent few (rare) / create a “previous value” attribute

Note: rapidly changing dimensions usually indicate the presence of a business process that should be tracked as a separate dimension or as a fact table

Page 11: Data Ware housing Bits lecture

Slowly Changing Dimensions

Original dimension row

CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?
1552     31421     Jane Rider  3         F       N

Fact Table

Date       CustKey          ProdKey  Item Count  Amount
1/7/2004   1552             95       1           1,798.00
3/2/2004   1552             37       1           27.95
5/7/2005   1552             87       2           320.26
2/21/2006  2387 (not 1552)  42       1           19.95

Dimension with a slowly changing attribute

CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?  Eff       End
1552     31421     Jane Rider  3         F       N        1/7/2004  1/1/2006
2387     31421     Jane Rider  31        F       N        1/2/2006  12/31/9999
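A minimal Python sketch of the Type 2 change shown on this slide, using the Jane Rider values. The function names `apply_type2_change` and `key_as_of` are hypothetical, and the Gender and HomOwn? columns are omitted for brevity.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # "open-ended" end date, as on the slide

customer_dim = [
    {"CustKey": 1552, "BKCustID": 31421, "CustName": "Jane Rider",
     "CommDist": 3, "Eff": date(2004, 1, 7), "End": HIGH_DATE},
]


def apply_type2_change(dim, bk_id, changes, effective, new_key):
    """Expire the current row for bk_id and append a new row carrying `changes`."""
    current = next(r for r in dim
                   if r["BKCustID"] == bk_id and r["End"] == HIGH_DATE)
    current["End"] = effective - timedelta(days=1)
    new_row = {**current, **changes,
               "CustKey": new_key, "Eff": effective, "End": HIGH_DATE}
    dim.append(new_row)
    return new_row


def key_as_of(dim, bk_id, on):
    """Return the surrogate key whose Eff..End range covers the given date."""
    for row in dim:
        if row["BKCustID"] == bk_id and row["Eff"] <= on <= row["End"]:
            return row["CustKey"]
    raise LookupError(f"no dimension row for {bk_id} on {on}")


# Jane's CommDist changes from 3 to 31, effective 1/2/2006.
apply_type2_change(customer_dim, 31421, {"CommDist": 31},
                   effective=date(2006, 1, 2), new_key=2387)

key_2005 = key_as_of(customer_dim, 31421, date(2005, 5, 7))
key_2006 = key_as_of(customer_dim, 31421, date(2006, 2, 21))
```

Fact rows loaded for a given transaction date pick up whichever surrogate key was in effect on that date, which is why the 2/21/2006 purchase carries 2387.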

Page 12: Data Ware housing Bits lecture

Slowly Changing Dimensions

Original

ProductKey  Description  Category   SKU
21553       LeapPad      Education  LP2105

Type 1

ProductKey  Description  Category  SKU
21553       LeapPad      Toy       LP2105

Type 2

ProductKey  Description  Category   SKU
21553       LeapPad      Education  LP2105
44631       LeapPad      Toy        LP2105

Type 3

ProductKey  Description  Category  OldCat     SKU
21553       LeapPad      Toy       Education  LP2105

Hybrid

ProductKey  Description  Category   OldCat       SKU
21553       LeapPad      Education  Electronics  LP2105
44631       LeapPad      Toy        Education    LP2105
68122       LeapPad      Education  Electronics  LP2105
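The Type 1, Type 2, and Type 3 responses to the LeapPad category change can be compared side by side in a short Python sketch. The function names are hypothetical and the hybrid case is left out; each function returns a new list so the original row stays available for comparison.

```python
import copy

original = [
    {"ProductKey": 21553, "Description": "LeapPad",
     "Category": "Education", "SKU": "LP2105"},
]


def type1(rows, sku, new_category):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    out = copy.deepcopy(rows)
    for row in out:
        if row["SKU"] == sku:
            row["Category"] = new_category
    return out


def type2(rows, sku, new_category, new_key):
    """Type 2: keep the old row and add a new row under a new surrogate key."""
    out = copy.deepcopy(rows)
    latest = [row for row in out if row["SKU"] == sku][-1]
    out.append({**latest, "ProductKey": new_key, "Category": new_category})
    return out


def type3(rows, sku, new_category):
    """Type 3: keep a single prior value in an extra OldCat column."""
    out = copy.deepcopy(rows)
    for row in out:
        if row["SKU"] == sku:
            row["OldCat"] = row["Category"]
            row["Category"] = new_category
    return out


after_type1 = type1(original, "LP2105", "Toy")
after_type2 = type2(original, "LP2105", "Toy", new_key=44631)
after_type3 = type3(original, "LP2105", "Toy")
```

Only Type 2 changes the row count, which is why it is the one that requires surrogate keys distinct from the SKU.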

Page 13: Data Ware housing Bits lecture

Date Dimensions

• One row for every day for which you expect to have data for the fact table (perhaps generated in a spreadsheet and imported)

• Usually use a meaningful integer surrogate key (such as yyyymmdd: 20060926 for Sep. 26, 2006). Note: this order sorts correctly.

• Include rows for missing or future dates to be added later.
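As an alternative to a spreadsheet, a date dimension can be generated with a few lines of Python. This is a sketch only: the column names are illustrative, and fiscal and holiday attributes are left out because they depend on the organization's calendar.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def date_key(d):
    """Encode a date as a yyyymmdd integer, e.g. 2006-09-26 -> 20060926."""
    return d.year * 10000 + d.month * 100 + d.day


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "DateKey": date_key(current),
            "FullDate": current.isoformat(),
            "DayOfWeek": DAY_NAMES[current.weekday()],
            "DayOfMonth": current.day,
            "DayOfCalendarYear": current.timetuple().tm_yday,
            "MonthOfCalendarYear": current.month,
            "CalendarQuarter": (current.month - 1) // 3 + 1,
            "Year": current.year,
            "IsWeekday": current.weekday() < 5,
        })
        current += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2006, 1, 1), date(2006, 12, 31))
by_key = {row["DateKey"]: row for row in date_dim}
```

Since the key is built as year, then month, then day, sorting the integer keys sorts the rows chronologically.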

Page 14: Data Ware housing Bits lecture

More about dimensions

• Views for dimensions used for different purposes

– e.g. StartDate and EndDate

• Junk dimensions for flags and miscellaneous categories removed from the fact table

• Degenerate dimensions have no attributes

– Usually reserved for order number or something similar
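One common way to build a junk dimension is to enumerate every combination of the flag values once and give each combination a key. The sketch below assumes three hypothetical flags (`PaymentType`, `GiftWrap`, `OnlineOrder`); none of them come from the lecture.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
flag_domains = {
    "PaymentType": ["Cash", "Card", "Check"],
    "GiftWrap": ["Y", "N"],
    "OnlineOrder": ["Y", "N"],
}

# One junk-dimension row per combination of flag values.
junk_dim = [
    {"JunkKey": key, **dict(zip(flag_domains, combo))}
    for key, combo in enumerate(product(*flag_domains.values()), start=1)
]

# The fact load looks up a single JunkKey instead of storing three flag columns.
junk_lookup = {
    tuple(row[name] for name in flag_domains): row["JunkKey"]
    for row in junk_dim
}

key_for_cash_giftwrap_store = junk_lookup[("Cash", "Y", "N")]
```

The fact table then carries one small integer column in place of the individual flags.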

Page 15: Data Ware housing Bits lecture

Aggregates

• Precalculated summary tables

– Improve performance

– Record data at a coarser granularity

• State change summary that has one row per item.

• Access rows on each update.
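A precalculated summary table can be sketched as a roll-up from daily line items to one row per month and product. The sample rows and the function name `build_monthly_aggregate` are hypothetical.

```python
from collections import defaultdict

# Line-item facts: (DateKey, ProductKey, SalesQuantity, SalesDollarAmount)
pos_fact = [
    (20060926, 95, 2, 10.00),
    (20060926, 95, 1, 5.00),
    (20060927, 95, 3, 15.00),
    (20061002, 37, 1, 27.95),
]


def build_monthly_aggregate(rows):
    """Roll daily line items up to one row per (month, product)."""
    totals = defaultdict(lambda: [0, 0.0])
    for date_key, product_key, quantity, amount in rows:
        month_key = date_key // 100  # 20060926 -> 200609
        bucket = totals[(month_key, product_key)]
        bucket[0] += quantity
        bucket[1] += amount
    return {key: (qty, round(amt, 2)) for key, (qty, amt) in totals.items()}


monthly = build_monthly_aggregate(pos_fact)
```

Queries at monthly grain can read the two summary rows instead of scanning all four line items; the saving grows with the size of the base fact table.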

Page 16: Data Ware housing Bits lecture

Fact Tables

• Transaction

– Track processes at discrete points in time when they occur

• Periodic snapshot

– Cumulative performance over specific time intervals

• Accumulating snapshot

– Constantly updated over time. May include multiple dates representing stages.

Page 17: Data Ware housing Bits lecture

Case Study: Retail Grocery Store

• Process: Retail Sales

• Grain: POS line item

• Dimensions: Date, Store, Product, Promotion

• Facts: Sales Quantity, Sales Dollar Amount, Cost Dollar Amount, Gross Profit Dollar Amount.

Page 18: Data Ware housing Bits lecture

Star schema Model

Dimension tables

• DATE: DateKey, attributes

• STORE: StoreKey, attributes

• PROMOTION: PromotionKey, attributes

• PRODUCT: ProductKey, attributes

Fact table

• POS FACT: DateKey, ProductKey, StoreKey, PromotionKey, POSTransactionNumber, SalesQuantity, SalesDollarAmount, CostDollarAmount, GrossProfitDollarAmount
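The star schema above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names, the few dimension attributes shown, and all sample rows are hypothetical; only the fact table columns come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim      (DateKey INTEGER PRIMARY KEY, Month INTEGER, Year INTEGER);
CREATE TABLE store_dim     (StoreKey INTEGER PRIMARY KEY, StoreName TEXT);
CREATE TABLE product_dim   (ProductKey INTEGER PRIMARY KEY, Description TEXT, Department TEXT);
CREATE TABLE promotion_dim (PromotionKey INTEGER PRIMARY KEY, PromotionName TEXT);
CREATE TABLE pos_fact (
    DateKey      INTEGER REFERENCES date_dim(DateKey),
    ProductKey   INTEGER REFERENCES product_dim(ProductKey),
    StoreKey     INTEGER REFERENCES store_dim(StoreKey),
    PromotionKey INTEGER REFERENCES promotion_dim(PromotionKey),
    POSTransactionNumber    INTEGER,  -- degenerate dimension: no table of its own
    SalesQuantity           INTEGER,
    SalesDollarAmount       REAL,
    CostDollarAmount        REAL,
    GrossProfitDollarAmount REAL      -- derived fact: sales minus cost
);
""")

conn.execute("INSERT INTO date_dim VALUES (20060926, 9, 2006)")
conn.execute("INSERT INTO store_dim VALUES (1, 'Downtown')")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Milk 1L", "Dairy"), (2, "Bread", "Bakery")])
conn.execute("INSERT INTO promotion_dim VALUES (0, 'No promotion')")

# One fact row per POS line item (the grain of the case study).
conn.executemany("INSERT INTO pos_fact VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (20060926, 1, 1, 0, 1001, 2, 3.00, 2.00, 1.00),
    (20060926, 2, 1, 0, 1001, 1, 2.50, 1.50, 1.00),
    (20060926, 1, 1, 0, 1002, 1, 1.50, 1.00, 0.50),
])

# A typical star query: one join per dimension used, then group and sum.
by_department = conn.execute("""
    SELECT p.Department,
           SUM(f.SalesQuantity),
           SUM(f.SalesDollarAmount),
           SUM(f.GrossProfitDollarAmount)
    FROM pos_fact AS f
    JOIN product_dim AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.Department
    ORDER BY p.Department
""").fetchall()
```

Every query against the star follows the same shape: join the fact table to the dimensions being filtered or grouped on, then sum the additive facts.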

Page 19: Data Ware housing Bits lecture

Possible Date Attributes

• SQL date

• Full date description

• Day of week

• Day of month

• Day of calendar year

• Day of fiscal year

• Month of calendar year

• Month of fiscal year

• Calendar quarter

• Fiscal quarter

• Fiscal week

• Year

• Month

• Fiscal year

• Holiday?

• Holiday name

• Day of holiday

• Weekday?

• Selling season

• Major event

• etc.

Page 20: Data Ware housing Bits lecture

Possible Product Attributes

• Description

• SKU number

• Brand description

• Department

• Package type

• Package size

• Fat content

• Diet type

• Weight

• Weight units of measure

• Storage type

• Shelf unit type

• Shelf width

• Shelf height

• Shelf depth

• etc.

Page 21: Data Ware housing Bits lecture

Possible Store Attributes

• Store name

• Store number

• Street address

• City

• County

• State

• Zip

• Manager

• District

• Region

• Floor plan type

• Photo processing type

• Financial service type

• Square footage

• Selling square footage

• First open date

• Last remodel date

• etc.

Page 22: Data Ware housing Bits lecture

Factless Fact Tables

• To evaluate promotions that might have generated no sales, we need another approach.

• Promotion could generate another fact table (or could be considered a fact table in itself). That new fact table would have no additive attributes.
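The idea can be sketched in Python with a factless "promotion coverage" table that holds only keys and is compared against the sales facts. The table name, the function name, and the sample keys are hypothetical.

```python
# Factless fact table: one row per product on promotion in a store on a day.
# Columns: (DateKey, ProductKey, StoreKey, PromotionKey). No measures at all.
promotion_coverage = {
    (20060926, 1, 1, 7),
    (20060926, 2, 1, 7),
    (20060926, 3, 1, 7),
}

# Sales facts: the same four keys plus SalesQuantity.
pos_fact = [
    (20060926, 1, 1, 7, 2),
    (20060926, 3, 1, 7, 1),
]


def promoted_but_unsold(coverage, sales):
    """Return coverage rows that have no matching sales row."""
    sold = {row[:4] for row in sales}
    return sorted(coverage - sold)


unsold = promoted_but_unsold(promotion_coverage, pos_fact)
```

The sales fact table alone cannot answer this question, because an event that did not happen leaves no row behind.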

Page 23: Data Ware housing Bits lecture

Conformed Dimensions: Inventory Snapshot Model

• Process: Store inventory

• Grain: Daily inventory by product and store

• Dimensions: Date, product, store

• Fact: quantity-on-hand

Page 24: Data Ware housing Bits lecture

Dimensional Model

Dimension tables

• DATE: DateKey, attributes

• STORE: StoreKey, attributes

• PRODUCT: ProductKey, attributes

Fact table

• Inventory Fact: ProductKey, DateKey, StoreKey, QuantityOnHand, QuantitySold, ValueAtCost, ValueAtSellingPrice

Note: QuantityOnHand is semi-additive. It is additive across product and store, but not across date. The other attributes are additive.
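A small Python sketch of what semi-additive means for QuantityOnHand. The sample snapshot rows and function names are hypothetical.

```python
from collections import defaultdict

# Daily inventory snapshot: (DateKey, StoreKey, ProductKey, QuantityOnHand)
inventory_fact = [
    (20060925, 1, 95, 10), (20060925, 2, 95, 20),
    (20060926, 1, 95, 8),  (20060926, 2, 95, 16),
]


def on_hand_by_date(rows):
    """Summing across stores and products within one date is meaningful."""
    totals = defaultdict(int)
    for date_key, _store_key, _product_key, quantity in rows:
        totals[date_key] += quantity
    return dict(totals)


def average_on_hand(rows):
    """Across dates, average the per-date totals instead of summing them."""
    per_date = on_hand_by_date(rows)
    return sum(per_date.values()) / len(per_date)


per_date = on_hand_by_date(inventory_fact)
average = average_on_hand(inventory_fact)

# Summing every row gives 54, which counts the same stock on both days.
naive_sum = sum(row[3] for row in inventory_fact)
```

The chain never held 54 units; it held 30 on the first day and 24 on the second, for an average of 27.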

Page 25: Data Ware housing Bits lecture

Conformed Dimensions

Common dimensions for different processes should be the same.

• Note: Dimensions for roll-up or aggregated fact tables may add or eliminate attributes based on the aggregation.

Where attributes apply, they should mean the same thing.
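Conformed dimensions are what make it possible to line up facts from two processes. The sketch below combines retail sales and store inventory on their shared Date, Product, and Store keys; the sample values and the function name `drill_across` are hypothetical.

```python
# Each process summarized at the shared grain: (DateKey, ProductKey, StoreKey)
sales_quantity = {
    (20060926, 95, 1): 4,
    (20060926, 37, 1): 1,
}
quantity_on_hand = {
    (20060926, 95, 1): 8,
    (20060926, 37, 1): 0,
    (20060926, 87, 1): 5,
}


def drill_across(sales, inventory):
    """Merge two result sets on their conformed dimension keys."""
    keys = sorted(set(sales) | set(inventory))
    return [(key, sales.get(key, 0), inventory.get(key, 0)) for key in keys]


report = drill_across(sales_quantity, quantity_on_hand)
```

The merge only works because key 95 means the same product, and key 1 the same store, in both fact tables.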

Page 26: Data Ware housing Bits lecture

The Bus Matrix

Process               Date  Product  Store  Promotion  Warehouse  Vendor  Contract  Shipper
Retail Sales          X     X        X      X
Retail Inventory      X     X        X
Retail Deliveries     X     X        X
Warehouse Inventory   X     X                          X          X
Warehouse Deliveries  X     X                          X          X
Purchase Orders       X     X                          X          X       X         X
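A bus matrix can be held as a simple mapping from process to dimensions, which makes it easy to ask which dimensions must be conformed. In this sketch the Retail Sales and Retail Inventory rows follow the case study slides; the column placement for the other four processes is an assumption, since the slide text only preserves how many dimensions each process uses. The function names are hypothetical.

```python
bus_matrix = {
    "Retail Sales": {"Date", "Product", "Store", "Promotion"},
    "Retail Inventory": {"Date", "Product", "Store"},
    # Assumed placement of the X marks for the remaining rows:
    "Retail Deliveries": {"Date", "Product", "Store"},
    "Warehouse Inventory": {"Date", "Product", "Warehouse", "Vendor"},
    "Warehouse Deliveries": {"Date", "Product", "Warehouse", "Vendor"},
    "Purchase Orders": {"Date", "Product", "Warehouse", "Vendor",
                        "Contract", "Shipper"},
}


def processes_using(matrix, dimension):
    """List the processes whose row has an X in the given dimension column."""
    return sorted(p for p, dims in matrix.items() if dimension in dims)


def shared_dimensions(matrix, *processes):
    """Dimensions common to all the named processes (candidates for drill-across)."""
    return sorted(set.intersection(*(matrix[p] for p in processes)))


def conformed_dimensions(matrix):
    """Dimensions used by more than one process, which therefore must conform."""
    all_dims = set().union(*matrix.values())
    return sorted(d for d in all_dims if len(processes_using(matrix, d)) > 1)


sales_vs_inventory = shared_dimensions(bus_matrix, "Retail Sales", "Retail Inventory")
promotion_users = processes_using(bus_matrix, "Promotion")
must_conform = conformed_dimensions(bus_matrix)
```

Reading the matrix by column shows where the design effort goes: Date and Product appear in every row, so they have to be agreed on before any mart is built.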