BITS Pilani, Pilani Campus
Data Warehousing SS ZG515
PC Reddy Guest Faculty – WILP, BITS Pilani
Data Warehousing – Lecture 4 Dimensional Modeling
Lecture 4 Outline
• Review Lecture 3
• Dimensional modeling
• Retail grocery store case study.
The Dimensional Data Model
An alternative to the normalized data model
• Present information as simply as possible (easier to understand)
• Return query results as quickly as possible (efficient for queries)
• Track the underlying business processes (process focused)
The Dimensional Data Model
• Contains the same information as the normalized model
• Has far fewer tables
• Grouped in coherent business categories
• Pre-joins hierarchies and lookup tables resulting in fewer join paths and fewer intermediate tables
• Normalized fact table with denormalized dimension tables.
Fact Table
Measurements associated with a specific business process
• Grain: level of detail of the table
• Process events produce fact records
• Facts (attributes) are usually
– Numeric
– Additive
• Derived facts included
• Foreign (surrogate) keys refer to dimension tables (entities)
Dimension Tables
Entities describing the objects of the process
• Conformed dimensions are shared across processes
• Attributes are descriptive
– Text
– Numeric
• Surrogate keys
• 1:m with the fact table
• Null entries
• Date dimensions
Bus Architecture
• An architecture that permits aggregating data across multiple marts
• Conformed dimensions and attributes
• Bus matrix
Keys and Surrogate Keys
A surrogate key is a unique identifier for data warehouse records that replaces source primary keys (business/natural keys)
• Protect against changes in source systems
• Allow integration from multiple sources
• Enable rows that do not exist in source data
• Track changes over time (e.g. new customer instances when addresses change)
• Replace text keys with integers for efficiency
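The points above can be illustrated with a minimal sketch of surrogate key assignment. The class name `SurrogateKeyMap`, the source system names, and the natural key values are assumptions made up for this example; a real warehouse would normally persist the mapping in a key lookup table.

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out integer surrogate keys for (source_system, natural_key) pairs."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._keys = {}

    def key_for(self, source_system, natural_key):
        # The same source record always maps to the same surrogate key.
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


keys = SurrogateKeyMap()
crm_customer = keys.key_for("crm", "31421")          # text key from one source
billing_customer = keys.key_for("billing", "C-77")   # different source, different key format
unknown_customer = keys.key_for("warehouse", "UNKNOWN")  # row with no source record
```

Because the warehouse owns the integer keys, a source system can change its own key format without touching the fact tables.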
Slowly Changing Dimensions
Attributes in a dimension that change more slowly than the fact granularity
• Type 1: Current only / overwrite the old value
• Type 2: All history / create a new dimensional record
• Type 3: Most recent few (rare) / create a “previous value” attribute
Note: Rapidly changing dimensions usually indicate the presence of a business process that should be tracked as a separate dimension or as a fact table.
Slowly Changing Dimensions
Original dimension row
CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?
1552     31421     Jane Rider  3         F       N
Fact Table
Date       CustKey  ProdKey  Item Count  Amount
1/7/2004   1552     95       1           1,798.00
3/2/2004   1552     37       1           27.95
5/7/2005   1552     87       2           320.26
2/21/2006  2387     42       1           19.95
Dimension with a slowly changing attribute
CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?  Eff       End
1552     31421     Jane Rider  3         F       N        1/7/2004  1/1/2006
2387     31421     Jane Rider  31        F       N        1/2/2006  12/31/9999
Note: the 2/21/2006 fact row carries the new surrogate key 2387 rather than 1552, because the sale falls after the change that took effect on 1/2/2006.
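The effective and end dates in the dimension above determine which surrogate key a new fact row should receive. The following is a minimal sketch of that lookup; the function name `lookup_cust_key` and the dictionary layout are assumptions for illustration, while the key values and dates come from the example rows.

```python
from datetime import date

# The two versions of the customer from the example, keyed by business key 31421.
customer_dim = [
    {"cust_key": 1552, "bk_cust_id": 31421, "comm_dist": 3,
     "eff": date(2004, 1, 7), "end": date(2006, 1, 1)},
    {"cust_key": 2387, "bk_cust_id": 31421, "comm_dist": 31,
     "eff": date(2006, 1, 2), "end": date(9999, 12, 31)},
]


def lookup_cust_key(dim_rows, bk_cust_id, on_date):
    """Return the surrogate key of the dimension row in effect on on_date."""
    for row in dim_rows:
        if row["bk_cust_id"] == bk_cust_id and row["eff"] <= on_date <= row["end"]:
            return row["cust_key"]
    raise LookupError(f"no dimension row for {bk_cust_id} on {on_date}")


key_2005 = lookup_cust_key(customer_dim, 31421, date(2005, 5, 7))
key_2006 = lookup_cust_key(customer_dim, 31421, date(2006, 2, 21))
```

The far-future end date (12/31/9999) marks the current row, so the same range test works for both historical and current versions.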
Slowly Changing Dimensions
Original
ProductKey  Description  Category   SKU
21553       LeapPad      Education  LP2105
Type 1
ProductKey  Description  Category  SKU
21553       LeapPad      Toy       LP2105
Type 2
ProductKey  Description  Category   SKU
21553       LeapPad      Education  LP2105
44631       LeapPad      Toy        LP2105
Type 3
ProductKey  Description  Category  OldCat     SKU
21553       LeapPad      Toy       Education  LP2105
Hybrid
ProductKey  Description  Category   OldCat       SKU
21553       LeapPad      Education  Electronics  LP2105
44631       LeapPad      Toy        Education    LP2105
68122       LeapPad      Education  Electronics  LP2105
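The Type 1, Type 2 and Type 3 results above can be reproduced with a small sketch. The function names and the list-of-dictionaries representation are assumptions chosen for illustration; the product values are the ones from the example, and the hybrid case is left out.

```python
def apply_type1(rows, sku, new_category):
    """Type 1: overwrite the attribute, so the old value is lost."""
    return [dict(r, Category=new_category) if r["SKU"] == sku else dict(r)
            for r in rows]


def apply_type2(rows, sku, new_category, new_key):
    """Type 2: keep existing rows and add a new row under a new surrogate key."""
    current = [r for r in rows if r["SKU"] == sku][-1]
    new_row = dict(current, ProductKey=new_key, Category=new_category)
    return [dict(r) for r in rows] + [new_row]


def apply_type3(rows, sku, new_category):
    """Type 3: move the current value into OldCat, then overwrite Category."""
    return [dict(r, OldCat=r["Category"], Category=new_category)
            if r["SKU"] == sku else dict(r)
            for r in rows]


original = [{"ProductKey": 21553, "Description": "LeapPad",
             "Category": "Education", "SKU": "LP2105"}]

type1 = apply_type1(original, "LP2105", "Toy")
type2 = apply_type2(original, "LP2105", "Toy", new_key=44631)
type3 = apply_type3(original, "LP2105", "Toy")
```

Each function returns new rows rather than mutating its input, which makes it easy to compare the three outcomes side by side.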
Date Dimensions
• One row for every day for which you expect to have data for the fact table (perhaps generated in a spreadsheet and imported)
• Usually use a meaningful integer surrogate key (such as yyyymmdd, e.g. 20060926 for Sep. 26, 2006). Note: this order sorts correctly.
• Include rows for missing or future dates to be added later.
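As an alternative to a spreadsheet, the rows can be generated with a short script. This is a minimal sketch: the function name `build_date_dimension` and the column names are assumptions, and only a handful of calendar attributes are filled in (fiscal and holiday attributes depend on the organization and are omitted).

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # yyyymmdd integer key, e.g. 20060926; sorts in calendar order
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "sql_date": day.isoformat(),
            "day_of_week": DAY_NAMES[day.weekday()],
            "day_of_month": day.day,
            "day_of_calendar_year": day.timetuple().tm_yday,
            "month_of_calendar_year": day.month,
            "calendar_quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "is_weekday": day.weekday() < 5,
        })
        day += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2006, 9, 25), date(2006, 9, 27))
```

Passing an end date in the future produces the rows for dates whose facts will arrive later.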
More about dimensions
• Views for dimensions used for different purposes, e.g. StartDate and EndDate
• Junk dimensions for flags and miscellaneous categories removed from the fact table
• Degenerate dimensions have no attributes
– Usually reserved for order number or something similar
Aggregates
• Precalculated summary tables
– Improve performance
– Record data at a coarser granularity
• State change summary that has one row per item.
• Access rows on each update.
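A precalculated summary table at a coarser grain can be sketched as a simple roll-up. The sample rows, the column order, and the function name `build_daily_product_aggregate` are made up for this illustration; the example rolls line items up to one row per date and product, dropping the store.

```python
from collections import defaultdict

# Hypothetical line-item facts:
# (date_key, store_key, product_key, sales_quantity, sales_dollar_amount)
line_items = [
    (20060926, 1, 95, 2, 5.00),
    (20060926, 1, 95, 1, 2.50),
    (20060926, 2, 95, 4, 10.00),
    (20060927, 1, 37, 1, 3.25),
]


def build_daily_product_aggregate(rows):
    """Roll line items up to (date_key, product_key) totals."""
    totals = defaultdict(lambda: [0, 0.0])
    for date_key, _store_key, product_key, qty, amount in rows:
        bucket = totals[(date_key, product_key)]
        bucket[0] += qty
        bucket[1] += amount
    return {key: tuple(values) for key, values in totals.items()}


daily_product = build_daily_product_aggregate(line_items)
```

Queries at the daily product level can then read the two aggregate rows instead of scanning all four line items.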
Fact Tables
• Transaction
– Track processes at discrete points in time when they occur
• Periodic snapshot
– Cumulative performance over specific time intervals
• Accumulating snapshot
– Constantly updated over time. May include multiple dates representing stages.
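The accumulating snapshot differs from the other two types in that an existing row is revisited and updated. A minimal sketch follows, assuming an order process with three hypothetical stages; the stage names, the order number, and the function `record_stage` are not from the lecture.

```python
# Hypothetical stages of an order, each stored as a date key on the same row.
STAGES = ("order_date", "ship_date", "delivery_date")


def record_stage(snapshot, order_number, stage, date_key):
    """Update the single row for an order in place when it reaches a stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    row = snapshot.setdefault(order_number, dict.fromkeys(STAGES))
    row[stage] = date_key
    return row


orders = {}
record_stage(orders, 1001, "order_date", 20060926)
record_stage(orders, 1001, "ship_date", 20060928)
```

After both calls there is still one row for order 1001, with the delivery date left empty until that stage is reached. A transaction fact table would instead have appended a new row for each event.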
Case Study: Retail Grocery Store
• Process: Retail Sales
• Grain: POS line item
• Dimensions: Date, Store, Product, Promotion
• Facts: Sales Quantity, Sales Dollar Amount, Cost Dollar Amount, Gross Profit Dollar Amount.
Star schema Model
Dimension tables
DATE (DateKey, Attributes)
STORE (StoreKey, Attributes)
PROMOTION (PromotionKey, Attributes)
PRODUCT (ProductKey, Attributes)
Fact table
POS FACT (DateKey, ProductKey, StoreKey, PromotionKey, POSTransactionNumber, SalesQuantity, SalesDollarAmount, CostDollarAmount, GrossProfitDollarAmount)
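The star schema above can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table names (`date_dim`, `pos_fact`, and so on), the few dimension attributes shown, and all sample rows are invented for illustration; only the key and fact column names come from the model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim      (DateKey INTEGER PRIMARY KEY, FullDate TEXT);
CREATE TABLE store_dim     (StoreKey INTEGER PRIMARY KEY, StoreName TEXT);
CREATE TABLE product_dim   (ProductKey INTEGER PRIMARY KEY, Description TEXT, Department TEXT);
CREATE TABLE promotion_dim (PromotionKey INTEGER PRIMARY KEY, PromotionName TEXT);
CREATE TABLE pos_fact (
    DateKey      INTEGER REFERENCES date_dim(DateKey),
    ProductKey   INTEGER REFERENCES product_dim(ProductKey),
    StoreKey     INTEGER REFERENCES store_dim(StoreKey),
    PromotionKey INTEGER REFERENCES promotion_dim(PromotionKey),
    POSTransactionNumber    INTEGER,
    SalesQuantity           INTEGER,
    SalesDollarAmount       REAL,
    CostDollarAmount        REAL,
    GrossProfitDollarAmount REAL  -- derived fact: sales minus cost
);
-- Made-up sample rows
INSERT INTO date_dim VALUES (20060926, '2006-09-26');
INSERT INTO store_dim VALUES (1, 'Store A');
INSERT INTO product_dim VALUES (1, 'Milk', 'Dairy'), (2, 'Bread', 'Bakery');
INSERT INTO promotion_dim VALUES (0, 'No promotion');
INSERT INTO pos_fact VALUES
    (20060926, 1, 1, 0, 1001, 2, 5.00, 3.00, 2.00),
    (20060926, 2, 1, 0, 1001, 1, 2.50, 1.50, 1.00),
    (20060926, 1, 1, 0, 1002, 1, 2.50, 1.50, 1.00);
""")

# One join per dimension used; here only PRODUCT is needed.
profit_by_department = conn.execute("""
    SELECT p.Department, SUM(f.SalesQuantity), SUM(f.GrossProfitDollarAmount)
    FROM pos_fact AS f
    JOIN product_dim AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.Department
    ORDER BY p.Department
""").fetchall()
```

Every dimension is exactly one join away from the fact table, which is what keeps such queries short.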
Possible Date Attributes
• SQL date
• Full date description
• Day of week
• Day of month
• Day of calendar year
• Day of fiscal year
• Month of calendar year
• Month of fiscal year
• Calendar quarter
• Fiscal quarter
• Fiscal week
• Year
• Month
• Fiscal year
• Holiday?
• Holiday name
• Day of holiday
• Weekday?
• Selling season
• Major event
• etc.
Possible Product Attributes
• Description
• SKU number
• Brand description
• Department
• Package type
• Package size
• Fat content
• Diet type
• Weight
• Weight units of measure
• Storage type
• Shelf unit type
• Shelf width
• Shelf height
• Shelf depth
• etc.
Possible Store Attributes
• Store name
• Store number
• Street address
• City
• County
• State
• Zip
• Manager
• District
• Region
• Floor plan type
• Photo processing type
• Financial service type
• Square footage
• Selling square footage
• First open date
• Last remodel date
• etc.
Factless Fact Tables
• To evaluate promotions that might have generated no sales, we need another approach.
• Promotion could generate another fact table (or could be considered a fact table in itself). That new fact table would have no additive attributes.
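One way to picture such a table is as a set of key combinations with no measures at all. The sketch below assumes a hypothetical promotion coverage table with one row per product on promotion in a store on a day; the sample keys and the function name `promoted_but_unsold` are invented for illustration.

```python
# Factless fact table: keys only, no additive attributes.
# (date_key, store_key, product_key, promotion_key)
promotion_coverage = {
    (20060926, 1, 95, 7),
    (20060926, 1, 37, 7),
    (20060926, 1, 87, 7),
}

# Hypothetical sales facts:
# (date_key, store_key, product_key, promotion_key, quantity, amount)
pos_sales = [
    (20060926, 1, 95, 7, 2, 5.00),
]


def promoted_but_unsold(coverage, sales):
    """Return coverage rows that have no matching sales row."""
    sold = {(d, s, p, promo) for d, s, p, promo, _qty, _amt in sales}
    return sorted(coverage - sold)


unsold = promoted_but_unsold(promotion_coverage, pos_sales)
```

The sales fact table alone cannot answer this question, because a promotion that generated no sales leaves no row in it.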
Conformed Dimensions: Inventory Snapshot Model
• Process: Store inventory
• Grain: Daily inventory by product and store
• Dimensions: Date, product, store
• Fact: quantity-on-hand
Dimensional Model
Dimension tables
DATE (DateKey, Attributes)
STORE (StoreKey, Attributes)
PRODUCT (ProductKey, Attributes)
Fact table
Inventory Fact (ProductKey, DateKey, StoreKey, QuantityOnHand, QuantitySold, ValueAtCost, ValueAtSellingPrice)
Note: QuantityOnHand is semi-additive. It is additive across product and store, but not across date. The other attributes are additive.
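The semi-additive behaviour of QuantityOnHand can be shown with a few made-up snapshot rows. The sample quantities and the function names are assumptions for this sketch; averaging over dates is one common way to summarize a level across time, used here in place of a sum.

```python
# Hypothetical daily snapshots: (date_key, store_key, product_key, quantity_on_hand)
inventory_fact = [
    (20060925, 1, 95, 10),
    (20060925, 2, 95, 20),
    (20060926, 1, 95, 8),
    (20060926, 2, 95, 16),
]


def on_hand_by_date(rows):
    """Summing across stores (and products) within one date is meaningful."""
    totals = {}
    for date_key, _store_key, _product_key, qty in rows:
        totals[date_key] = totals.get(date_key, 0) + qty
    return totals


def average_on_hand(rows):
    """Across dates, average the daily totals instead of adding them."""
    per_day = on_hand_by_date(rows)
    return sum(per_day.values()) / len(per_day)


per_day_totals = on_hand_by_date(inventory_fact)
misleading_sum = sum(qty for *_keys, qty in inventory_fact)
average_level = average_on_hand(inventory_fact)
```

Adding all four rows gives 54 units, a figure that never existed on any shelf, while the daily totals (30 and 24) and their average (27) are meaningful.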
Conformed Dimensions
Common dimensions for different processes should be the same.
• Note: Dimensions for roll-up or aggregated fact tables may add or eliminate attributes based on the aggregation
Where attributes apply, they should mean the same thing.
The Bus Matrix
Process Date Product Store Promotion Warehouse Vendor Contract Shipper
Retail Sales X X X X
Retail Inventory X X X
Retail Deliveries X X X
Warehouse Inventory X X X X
Warehouse Deliveries X X X X
Purchase Orders X X X X X X
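A bus matrix can be represented as a mapping from each process to its set of dimensions, which makes the conformed dimensions easy to find. The sketch below includes only the two processes whose dimensions are spelled out in the case study (retail sales and retail inventory); the function name `shared_dimensions` is an assumption.

```python
# Process -> dimensions, taken from the retail sales and inventory models.
bus_matrix = {
    "Retail Sales": {"Date", "Product", "Store", "Promotion"},
    "Retail Inventory": {"Date", "Product", "Store"},
}


def shared_dimensions(matrix, *processes):
    """Return the dimensions that all the named processes have in common."""
    return set.intersection(*(matrix[p] for p in processes))


conformed = shared_dimensions(bus_matrix, "Retail Sales", "Retail Inventory")
```

The shared dimensions are the ones that must be conformed so that sales and inventory facts can be compared by date, product, and store.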