data warehouse design - jamkhomes.jamk.fi/~huojo/opetus/iio30120/dw_design.pdf · data warehouse...

18
IIO30120 Database Design Data warehouse design based on the book Hovi, Huotari, Lahdenmäki: Tietokantojen suunnittelu & indeksointi, Docendo (2003, 2005) chapter 8 © Jouni Huotari & Ari Hovi Some slides from http ://www.slideshare.net/idnats/data- warehousing-and-data-mining-presentation-725476 18.10.2015

Upload: phungthu

Post on 05-Aug-2018

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

IIO30120 Database Design

Data warehouse designbased on the book Hovi, Huotari, Lahdenmäki:

Tietokantojen suunnittelu & indeksointi, Docendo (2003, 2005) chapter 8

© Jouni Huotari & Ari Hovi

Some slides from http://www.slideshare.net/idnats/data-warehousing-and-data-mining-presentation-72547618.10.2015

Page 2: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

• What is a data warehouse?

• How the term Business Intelligence relates to data warehousing?

• How data mart differs from data warehouse?

• Why OLAP (online analytical processing) is important?

• What is the basic idea behind ETL?

For discussion

© Jouni Huotari & Ari Hovi2

18.10.2015

Page 3: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

What is Data Warehouse?

• Defined in many different ways, but not rigorously. A decision support database that is maintained separately from the

organization’s operational database

Support information processing by providing a solid platform of consolidated, historical data for analysis.

• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon

• Data warehousing: The process of constructing and using data warehouses

Page 4: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

To support fast summary queries, analysis, and reporting

• This is often difficult in operative databases, but some solutions exist

• can you mention any solutions?

It is important to

• Maintain history for seeing trends etc.

• Clean up and make data consistent from different data sources

• Make the structure clear and understandable

Basic design principles for data warehouses

© Jouni Huotari & Ari Hovi4

18.10.2015

Page 5: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

5

Example: a producer wants to know….

Which are ourlowest/highest margin

customers ?Who are my customers

and what products are they buying?

Which customersare most likely to go to the competition ?

What impact will new products/services

have on revenue and margins?

What product prom--otions have the biggest

impact on revenue?

What is the most effective distribution

channel?

18.10.2015

Page 6: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

back room and front room of a data warehouse

© Jouni Huotari & Ari Hovi6

© Kimball, Caserta:

The Data Warehouse ETL Toolkit

(2004)

18.10.2015

Page 7: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

7

Data Warehouse vs. Operational DBMS

• OLTP (on-line transaction processing)

– Major task of traditional relational DBMS

– Day-to-day operations: purchasing, inventory, banking, manufacturing,

payroll, registration, accounting, etc.

• OLAP (on-line analytical processing)

– Major task of data warehouse system

– Data analysis and decision making

• Distinct features (OLTP vs. OLAP):

– User and system orientation: customer vs. market

– Data contents: current, detailed vs. historical, consolidated

– Database design: ER + application vs. star + subject

– View: current, local vs. evolutionary, integrated

– Access patterns: update vs. read-only but complex queries

Page 8: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

• Star schema: A fact table in the middle connected to a set of dimension tables

• Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake

• Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

• Dimensions describe who, what, when, where and why for the facts.

Conceptual Modeling of Data Warehouses

8Octobe

r 18,

Page 9: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

Example of Star Schema

Page 10: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

10Octobe

r 18,

Another example of Star Schema

time_key

day

day_of_the_week

month

quarter

year

time

location_key

street

city

state_or_province

country

location

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_sales

Measures

item_key

item_name

brand

type

supplier_type

item

branch_key

branch_name

branch_type

branch

Page 11: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

11Octobe

r 18,

Example of Snowflake Schema

time_key

day

day_of_the_week

month

quarter

year

time

location_key

street

city_key

location

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_sales

Measures

item_key

item_name

brand

type

supplier_key

item

branch_key

branch_name

branch_type

branch

supplier_key

supplier_type

supplier

city_key

city

state_or_province

country

city

Page 12: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

12Octobe

r 18,

Example of Fact Constellation

time_key

day

day_of_the_week

month

quarter

year

time

location_key

street

city

province_or_state

country

location

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_sales

Measures

item_key

item_name

brand

type

supplier_type

item

branch_key

branch_name

branch_type

branch

Shipping Fact Table

time_key

item_key

shipper_key

from_location

to_location

dollars_cost

units_shipped

shipper_key

shipper_name

location_key

shipper_type

shipper

Page 13: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

13Octobe

r 18,

Multidimensional Data

• Sales volume as a function of product, month, and region

Pro

duct

Month

•Dimensions: Product, Location, Time

•Hierarchical summarization paths

Industry Region Year

Category Country Quarter

Product City Month Week

Office Day

Page 14: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

A Sample Data CubeTotal annual sales

of TV in U.S.A.Date

Cou

ntr

ysum

sumTV

VCRPC

1Qtr 2Qtr 3Qtr 4Qtr

U.S.A

Canada

Mexico

sum

Page 15: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

15

Browsing a Data Cube

• Visualization

• OLAP capabilities

• Interactive manipulation

Page 16: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

• Install http://www.olapcube.com/ to your virtual machine and play with the dimensions, create a cube and examine the result (dashboard)

• http://www.olapcube.com/help/writer/panorama/

• http://www.pentaho.com/testdrive

• http://www.databaseanswers.org/downloads/Data_Warehousing_by_Example.pdf

• Example (real) data: http://www.gapminder.org/data/

Exercises

© Jouni Huotari & Ari Hovi18.10.201516

Page 17: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

• BI refers to skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context (Wikipedia)

• The goal is to gain insight into the business by bringing together data, formatting it in a way that enables better analysis, and then providing tools that give users power—not just to examine and explore the data, but to quickly understand it. (Business Intelligence with Microsoft Office

PerformancePoint Server)

What is Business Intelligence (BI)?

18.10.201517

Page 18: Data warehouse design - Jamkhomes.jamk.fi/~huojo/opetus/IIO30120/DW_design.pdf · Data warehouse design based on the book Hovi, Huotari, ... process.”—W. H. Inmon ... Data Warehouse

• http://en.wikipedia.org/wiki/Data_warehouse

• http://en.wikipedia.org/wiki/Data_mart

• http://en.wikipedia.org/wiki/Star_schema

• http://en.wikipedia.org/wiki/Snowflake_schema

• http://en.wikipedia.org/wiki/OLAP

More information from wikipedia:

© Jouni Huotari & Ari Hovi18

18.10.2015