The Data Warehouse Chapter 6

Uploaded by: sheena-nash

Posted on 08-Jan-2018


Page 1: The Data Warehouse Chapter 6. 6.1 Operational Databases = transactional database  designed to process individual transaction quickly and efficiently

The Data Warehouse

Chapter 6

Page 2

6.1 Operational Databases

Operational (transactional) databases are designed to process individual transactions quickly and efficiently.

On-Line Transaction Processing (OLTP)

Data Warehouse
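A minimal sketch of the short, atomic unit of work an OLTP system processes, using Python's built-in `sqlite3` module with an in-memory database. The `account` table and the `transfer` function are hypothetical illustrations, not part of the chapter.

```python
import sqlite3

# Hypothetical operational table: one row per account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

transfer(conn, 1, 2, 30.0)
try:
    transfer(conn, 2, 1, 500.0)  # would overdraw account 2, so it is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM account"))
```

Because the failed second transfer is rolled back as a whole, the balances end at `{1: 70.0, 2: 80.0}`: each transaction either applies completely or not at all.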

Page 3

Building a database: Data Modeling, Normalization

• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships

ERD (Entity Relationship Diagram)
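The three relationship types can be sketched as tables in SQLite (Python's built-in `sqlite3`). The customer/profile/orders/product schema is a hypothetical example, not taken from the chapter: a shared primary key gives one-to-one, a plain foreign key gives one-to-many, and a junction table gives many-to-many.

```python
import sqlite3

# Hypothetical schema illustrating the three relationship types.
# (SQLite checks REFERENCES clauses only when PRAGMA foreign_keys = ON.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
-- One-to-one: the profile's primary key is also its foreign key,
-- so a customer can have at most one profile.
CREATE TABLE profile (customer_id INTEGER PRIMARY KEY REFERENCES customer,
                      email TEXT);
-- One-to-many: many orders point at one customer.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER NOT NULL REFERENCES customer);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Many-to-many: a junction table holds one row per (order, product) pair.
CREATE TABLE order_line (order_id INTEGER REFERENCES orders,
                         product_id INTEGER REFERENCES product,
                         quantity INTEGER,
                         PRIMARY KEY (order_id, product_id));
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bo');
INSERT INTO profile VALUES (1, 'ann@example.com');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO product VALUES (100, 'MP3'), (101, 'CD');
INSERT INTO order_line VALUES (10, 100, 2), (10, 101, 3), (11, 101, 1), (12, 100, 1);
""")

# A second profile for customer 1 violates the one-to-one constraint.
try:
    conn.execute("INSERT INTO profile VALUES (1, 'other@example.com')")
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True

orders_per_customer = dict(conn.execute(
    "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id"))
orders_per_product = dict(conn.execute(
    "SELECT product_id, COUNT(*) FROM order_line GROUP BY product_id"))
```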

Page 4

Figure 6.1 A simple entity-relationship diagram

[Entities: Vehicle-Type (Type ID, Make, Year) and Customer (Customer ID, Income Range)]

Page 5

Normalization

• First Normal Form (atomic values)
• Second Normal Form (no partial dependencies). Example: R (A, B, C, D, E)
• Third Normal Form (no transitive dependencies). Example: R (A, B, C, D, E)
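The slide's example relation R (A, B, C, D, E) lost its dependency arrows in extraction, so the functional dependencies below are hypothetical: key {A, B}, a partial dependency A → C, and a transitive chain AB → D → E. The sketch uses the standard attribute-closure algorithm to confirm the key, then flags dependencies that break 2NF and 3NF (simplified: "transitive" here means the left side contains no key attribute).

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Hypothetical dependencies for R(A, B, C, D, E).
FDS = [
    (frozenset("AB"), frozenset("D")),  # full dependency on the key
    (frozenset("A"), frozenset("C")),   # partial: C depends on part of the key
    (frozenset("D"), frozenset("E")),   # transitive: AB -> D -> E
]
KEY = frozenset("AB")

def partial_dependencies(key, fds):
    """FDs whose left side is a proper subset of the key (violate 2NF)."""
    return [(l, r) for l, r in fds if l < key and not r <= key]

def transitive_dependencies(key, fds):
    """FDs whose left side holds only non-key attributes (violate 3NF)."""
    return [(l, r) for l, r in fds if not (l & key) and not r <= key]
```

With these dependencies, moving A → C into its own table R1 (A, C) gives 2NF, and moving D → E into R2 (D, E) leaves R (A, B, D) in 3NF.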

Page 6

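The Relational Model

Order Form (Order No., Order Date, Customer No., Customer Name, Address, Product No., Product Name, Quantity, Unit Price)

ORDER FORM

Order No.:            Order Date:

Customer No.:         Customer Name:         Address:

Product No.   Product Name   Quantity   Unit Price   Amount
1111          MP3            2          60,000       120,000
2115          Blank CD       3          10,000        30,000

Total: 150,000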
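One possible normalized decomposition of the order form, sketched in SQLite. It assumes Product Name and Unit Price depend only on Product No., and Customer Name and Address only on Customer No.; the form does not state these dependencies. `'O1'` and `'C1'` are placeholder keys because the form's order and customer fields are blank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Customer facts depend only on the customer number.
CREATE TABLE customer (customer_no TEXT PRIMARY KEY, name TEXT, address TEXT);
-- Product name and unit price depend only on the product number (assumption).
CREATE TABLE product (product_no TEXT PRIMARY KEY, name TEXT, unit_price INTEGER);
-- Order header: one row per order.
CREATE TABLE orders (order_no TEXT PRIMARY KEY, order_date TEXT,
                     customer_no TEXT REFERENCES customer);
-- Order lines: quantity depends on the whole key (order_no, product_no).
CREATE TABLE order_line (order_no TEXT REFERENCES orders,
                         product_no TEXT REFERENCES product,
                         quantity INTEGER,
                         PRIMARY KEY (order_no, product_no));
-- Product rows and quantities come from the form; 'C1' and 'O1' are placeholders.
INSERT INTO product VALUES ('1111', 'MP3', 60000), ('2115', 'Blank CD', 10000);
INSERT INTO customer VALUES ('C1', NULL, NULL);
INSERT INTO orders VALUES ('O1', NULL, 'C1');
INSERT INTO order_line VALUES ('O1', '1111', 2), ('O1', '2115', 3);
""")

# Amount and Total are derived at query time rather than stored.
amounts = conn.execute("""
    SELECT l.product_no, l.quantity * p.unit_price
    FROM order_line AS l JOIN product AS p ON p.product_no = l.product_no
    WHERE l.order_no = 'O1' ORDER BY l.product_no""").fetchall()
(total,) = conn.execute("""
    SELECT SUM(l.quantity * p.unit_price)
    FROM order_line AS l JOIN product AS p ON p.product_no = l.product_no
    WHERE l.order_no = 'O1'""").fetchone()
```

The query reproduces the form's line amounts (120,000 and 30,000) and total (150,000) without storing any derived column.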

Page 7

Table 6.1a • Relational Table for Vehicle-Type

Type ID   Make        Year
4371      Chevrolet   1995
6940      Cadillac    2000
4595      Chevrolet   2001
2390      Cadillac    1997

Table 6.1b • Relational Table for Customer

Customer ID   Income Range ($)   Type ID
0001          70–90K             2390
0002          30–50K             4371
0003          70–90K             6940
0004          30–50K             4595
0005          70–90K             2390

Page 8

Table 6.2 • Join of Tables 6.1a and 6.1b

Customer ID   Income Range ($)   Type ID   Make        Year
0001          70–90K             2390      Cadillac    1997
0002          30–50K             4371      Chevrolet   1995
0003          70–90K             6940      Cadillac    2000
0004          30–50K             4595      Chevrolet   2001
0005          70–90K             2390      Cadillac    1997
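The join in Table 6.2 matches each customer's Type ID with the Vehicle-Type row carrying the same key. A minimal sketch in plain Python, indexing Table 6.1a by Type ID in a dictionary (in effect a hash join); the variable names are illustrative and income ranges use ASCII hyphens.

```python
# Table 6.1a indexed by Type ID -> (Make, Year)
vehicle_type = {
    "4371": ("Chevrolet", 1995),
    "6940": ("Cadillac", 2000),
    "4595": ("Chevrolet", 2001),
    "2390": ("Cadillac", 1997),
}

# Table 6.1b rows: (Customer ID, Income Range, Type ID)
customer = [
    ("0001", "70-90K", "2390"),
    ("0002", "30-50K", "4371"),
    ("0003", "70-90K", "6940"),
    ("0004", "30-50K", "4595"),
    ("0005", "70-90K", "2390"),
]

def join(customers, types):
    """Equi-join on Type ID: extend each customer row with Make and Year."""
    return [(cid, income, tid, *types[tid])
            for cid, income, tid in customers if tid in types]

table_6_2 = join(customer, vehicle_type)
```

Customers 0001 and 0005 share Type ID 2390, so the Cadillac/1997 pair appears twice in the result: one vehicle type relates to many customers.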

Page 9

6.2 Data Warehouse Design

OLTP                          Data Warehouse
Process oriented              Subject oriented
Normalized                    Denormalized
Day-to-day operations         Historical
Constant updates              Not subject to change (read-only)
Lowest level of granularity   Granularity is a design issue

Page 10

Figure 6.2 A data warehouse process model

[Components: Operational Database(s), External Data, ETL Routine (Extract/Transform/Load), Data Warehouse, Extract/Summarize Data, Dependent Data Mart, Independent Data Mart, Decision Support System, Report]
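The ETL routine in Figure 6.2 can be sketched as three small Python functions. The operational records, field layout, and the monthly summary key are hypothetical; a real routine would read from the operational database and write to warehouse tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational records:
# (transaction_id, card_no, iso_date, category, amount_in_cents)
operational_rows = [
    (1, "C1", "2002-01-15", "supermarket", 1450),
    (2, "C2", "2002-01-15", "travel", 2240),
    (3, "C1", "2002-02-03", "supermarket", 825),
]

def extract(rows):
    """Extract: copy raw rows out of the source system (here, an in-memory list)."""
    return list(rows)

def transform(rows):
    """Transform: parse dates, standardize labels, convert cents to dollars."""
    for _txn_id, card, iso_day, category, cents in rows:
        day = date.fromisoformat(iso_day)
        yield card, day.year, day.month, category.title(), cents / 100

def load(records, warehouse):
    """Load: add each amount into a summary keyed by (card, year, month, category)."""
    for card, year, month, category, amount in records:
        warehouse[(card, year, month, category)] += amount
    return warehouse

warehouse = load(transform(extract(operational_rows)), defaultdict(float))
```

Dropping the transaction ID and summarizing by month is one design choice: the warehouse keeps history at a coarser grain than the operational system.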

Page 11

Structuring the Data Warehouse:

• Fact table (dimension keys + facts)
• Dimension tables (not normalized; slowly changing dimensions)

(1) Multidimensional database

(2) Relational database in multidimensional format: star schema
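The slide names slowly changing dimensions without prescribing a technique. One widely used approach, usually called a Type 2 slowly changing dimension, keeps history by closing the current dimension row and appending a new version with its own surrogate key. A minimal sketch, with hypothetical class and field names:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class CardholderVersion:
    surrogate_key: int               # warehouse-assigned key referenced by fact rows
    cardholder_id: str               # natural key from the operational system
    income_range: str                # the slowly changing attribute
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_change(dim: List[CardholderVersion], cardholder_id: str,
                 income_range: str, as_of: date) -> None:
    """Type 2 update: close the current version and append a new one."""
    current = next((v for v in dim
                    if v.cardholder_id == cardholder_id and v.valid_to is None), None)
    if current is not None:
        if current.income_range == income_range:
            return  # attribute unchanged: keep the existing version
        current.valid_to = as_of
    # len(dim) + 1 is a valid surrogate only because rows are never deleted here.
    dim.append(CardholderVersion(len(dim) + 1, cardholder_id, income_range, as_of))

dim: List[CardholderVersion] = []
apply_change(dim, "C1", "50 - 70,000", date(2001, 1, 1))
apply_change(dim, "C1", "70 - 90,000", date(2002, 1, 1))
apply_change(dim, "C1", "70 - 90,000", date(2002, 6, 1))  # no-op: value unchanged
```

Old facts keep pointing at surrogate key 1, so reports on 2001 purchases still see the income range that applied then.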

Page 12

Figure 6.3 A star schema for credit card purchases

[Fact table: Cardholder Key, Purchase Key, Location Key, Time Key, Amount. Dimensions: Cardholder (Cardholder Key, Name, Gender, Income Range; e.g., 1 John Doe, 2 Sara Smith), Purchase (Purchase Key, Category: 1 Supermarket, 2 Travel & Entertainment, 3 Auto & Vehicle, 4 Retail, 5 Restaurant, 6 Miscellaneous), Location (Location Key, Street, City, State, Region; e.g., 425 Church St, Charleston, SC), Time (Time Key, Day, Month, Quarter, Year)]
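The star schema of Figure 6.3 can be sketched in SQLite with one fact table and four dimension tables. Purchase categories and cardholder names come from the figure; the location, time, and fact rows are hypothetical, and Gender and Income Range are left NULL because the figure does not tie them to a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cardholder_dim (cardholder_key INTEGER PRIMARY KEY, name TEXT,
                             gender TEXT, income_range TEXT);
CREATE TABLE purchase_dim (purchase_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city TEXT, state TEXT, region TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month TEXT,
                       quarter INTEGER, year INTEGER);
-- The fact table holds one foreign key per dimension plus the measure.
CREATE TABLE purchase_fact (
    cardholder_key INTEGER REFERENCES cardholder_dim,
    purchase_key   INTEGER REFERENCES purchase_dim,
    location_key   INTEGER REFERENCES location_dim,
    time_key       INTEGER REFERENCES time_dim,
    amount         REAL);
INSERT INTO purchase_dim VALUES (1, 'Supermarket'), (2, 'Travel & Entertainment'),
    (3, 'Auto & Vehicle'), (4, 'Retail'), (5, 'Restaurant'), (6, 'Miscellaneous');
INSERT INTO cardholder_dim VALUES (1, 'John Doe', NULL, NULL),
                                  (2, 'Sara Smith', NULL, NULL);
-- Hypothetical location, time and fact rows.
INSERT INTO location_dim VALUES (1, '425 Church St', 'Charleston', 'SC', NULL);
INSERT INTO time_dim VALUES (10, 15, 'Jan', 1, 2002);
INSERT INTO purchase_fact VALUES (1, 2, 1, 10, 14.50), (1, 2, 1, 10, 22.40),
                                 (2, 1, 1, 10, 8.25);
""")

# Star join: filter and group by dimension attributes, aggregate the fact.
totals = {(name, category): amount for name, category, amount in conn.execute("""
    SELECT c.name, p.category, SUM(f.amount)
    FROM purchase_fact AS f
    JOIN cardholder_dim AS c ON c.cardholder_key = f.cardholder_key
    JOIN purchase_dim   AS p ON p.purchase_key   = f.purchase_key
    JOIN time_dim       AS t ON t.time_key       = f.time_key
    WHERE t.year = 2002
    GROUP BY c.name, p.category""")}
```

Every query follows the same shape: the fact table sits in the middle and each dimension is one join away, which is what gives the schema its star form.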

Page 13

The Multidimensionality of the Star Schema

Figure 6.4 Dimensions of the fact table shown in Figure 6.3

[Axes: Cardholder (Ci), Purchase Key, Location Key, Time Key; a highlighted cell is labeled A(Ci, 1, 2, 10).]
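Each fact-table row can be read as one cell of a four-dimensional space whose coordinates are the dimension keys. A minimal sketch storing that space sparsely as a Python dictionary; the fact rows and the coordinate order (cardholder, purchase, location, time) are assumptions, not a decoding of the figure's A(Ci, 1, 2, 10) label.

```python
# Hypothetical fact rows: (cardholder_key, purchase_key, location_key, time_key, amount)
fact_rows = [
    (1, 2, 1, 10, 14.50),
    (1, 2, 3, 10, 22.40),
    (15, 4, 5, 11, 8.25),
]

# Sparse cube: only coordinates that actually occur are stored.
cube = {}
for cardholder, purchase, location, time, amount in fact_rows:
    key = (cardholder, purchase, location, time)
    cube[key] = cube.get(key, 0.0) + amount  # repeated coordinates accumulate

def cell(c, p, l, t):
    """Amount stored at cell (c, p, l, t); empty cells read as 0.0."""
    return cube.get((c, p, l, t), 0.0)
```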

Page 14

Additional Relational Schemas

• Snowflake schema: dimension tables are further subdivided (normalized)

• Constellation schema: several fact tables share dimension tables
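A minimal snowflake sketch in SQLite: the Location dimension of Figure 6.3 is subdivided into location, city, and state tables. The table names and the region value `3` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location dimension split into normalized pieces.
CREATE TABLE state_dim (state TEXT PRIMARY KEY, region INTEGER);
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT,
                       state TEXT REFERENCES state_dim);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city_dim);
INSERT INTO state_dim VALUES ('SC', 3);
INSERT INTO city_dim VALUES (1, 'Charleston', 'SC');
INSERT INTO location_dim VALUES (1, '425 Church St', 1);
""")

# Rebuilding the flat, star-style view of a location now needs two extra joins.
row = conn.execute("""
    SELECT l.street, c.city, s.state, s.region
    FROM location_dim AS l
    JOIN city_dim  AS c ON c.city_key = l.city_key
    JOIN state_dim AS s ON s.state = c.state""").fetchone()
```

The trade-off: city and state facts are stored once instead of on every location row, but queries pay for the additional joins.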

Page 15

Figure 6.5 A constellation schema for credit card purchases and promotions

[Two fact tables share the Cardholder and Time dimensions. Purchase fact table (Cardholder Key, Purchase Key, Location Key, Time Key, Amount) links to the Cardholder, Purchase, Location and Time dimensions. Promotion fact table (Cardholder Key, Promotion Key, Time Key, Response) links to the Cardholder, Promotion and Time dimensions. Promotion dimension: Promotion Key, Description, Cost (e.g., 1, watch promo, 15.25).]
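A minimal constellation sketch in SQLite: a purchase fact table and a promotion fact table share the cardholder dimension, so one query can relate promotion response to purchase totals. The promotion rows follow the figure's visible values as far as they can be read; the purchase facts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions shared by both fact tables.
CREATE TABLE cardholder_dim (cardholder_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month TEXT,
                       year INTEGER);
-- Dimension used only by the promotion facts.
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, description TEXT,
                            cost REAL);
CREATE TABLE purchase_fact (cardholder_key INTEGER, purchase_key INTEGER,
                            location_key INTEGER, time_key INTEGER, amount REAL);
CREATE TABLE promotion_fact (cardholder_key INTEGER, promotion_key INTEGER,
                             time_key INTEGER, response TEXT);
INSERT INTO cardholder_dim VALUES (1, 'John Doe'), (2, 'Sara Smith');
INSERT INTO promotion_dim VALUES (1, 'watch promo', 15.25);
INSERT INTO promotion_fact VALUES (1, 1, 5, 'Yes'), (2, 1, 5, 'No');
-- Hypothetical purchase facts.
INSERT INTO purchase_fact VALUES (1, 2, 1, 10, 14.50), (1, 2, 3, 10, 22.40),
                                 (2, 1, 1, 10, 8.25);
""")

# The correlated subquery sums purchases per cardholder, so combining the
# two fact tables does not multiply rows.
rows = conn.execute("""
    SELECT c.name, pr.response,
           (SELECT SUM(f.amount) FROM purchase_fact AS f
             WHERE f.cardholder_key = c.cardholder_key) AS total
    FROM cardholder_dim AS c
    JOIN promotion_fact AS pr ON pr.cardholder_key = c.cardholder_key
    WHERE pr.promotion_key = 1
    ORDER BY c.cardholder_key""").fetchall()
```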

Page 16

Decision Support: Analyzing the Warehouse Data

• Reporting data
• Analyzing data (multidimensional data analysis tools)
• Knowledge discovery (through data mining)

Page 17

6.3 On-line Analytical Processing (OLAP)

- Query-based methodology

- Supports data analysis in a multidimensional environment

- Storage methods:

(1) Relational data store (star schema)

(2) Multidimensional array data store
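A minimal sketch of the second storage method: dimension members are mapped to integer positions so that every cell has a fixed array address. The dimension members are a hypothetical subset, and plain nested lists stand in for a real array store.

```python
# Hypothetical dimension members; each member gets a fixed integer position.
months = ["Jan", "Feb", "Mar"]
categories = ["Supermarket", "Travel", "Vehicle"]
regions = ["One", "Two"]
month_ix = {m: i for i, m in enumerate(months)}
category_ix = {c: i for i, c in enumerate(categories)}
region_ix = {r: i for i, r in enumerate(regions)}

# Dense 3-D array: every (month, category, region) combination has a cell,
# even if no purchase ever lands in it.
cube = [[[0.0 for _ in regions] for _ in categories] for _ in months]

def add(month, category, region, amount):
    """Accumulate an amount into the cell addressed by the three members."""
    cube[month_ix[month]][category_ix[category]][region_ix[region]] += amount

def get(month, category, region):
    """Read a cell by direct indexing (no search or join needed)."""
    return cube[month_ix[month]][category_ix[category]][region_ix[region]]

add("Jan", "Travel", "Two", 20.0)
add("Jan", "Travel", "Two", 5.0)
cell_count = len(months) * len(categories) * len(regions)  # cells allocated
```

Direct indexing makes lookups cheap, but the array reserves space for all 18 combinations; a relational star schema stores only the rows that exist.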

Page 18

OLAP Operations

• Slice – a single-dimension operation
• Dice – a multidimensional operation
• Roll-up – a higher level of generalization
• Drill-down – a greater level of detail
• Rotation – view data from a new perspective
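Slice, dice, and rotation can be sketched over a small cube held as a dictionary keyed by (month, category, region). The cells and function names are hypothetical; roll-up and drill-down depend on a concept hierarchy and are left out here.

```python
from collections import defaultdict

# Hypothetical cube cells: (month, category, region) -> amount
cells = {
    ("Jan", "Supermarket", "One"): 100.0,
    ("Jan", "Vehicle", "Two"): 300.0,
    ("Feb", "Supermarket", "Two"): 150.0,
    ("Feb", "Travel", "One"): 80.0,
}
DIMS = ("month", "category", "region")

def slice_cube(cells, dim, value):
    """Slice: fix a single dimension to one value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}

def dice(cells, **allowed):
    """Dice: keep cells whose members lie in chosen sets on several dimensions."""
    wanted = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cells.items()
            if all(k[i] in vals for i, vals in wanted.items())}

def rotate(cells, row_dim, col_dim):
    """Rotation: view as a row_dim x col_dim grid, summing the remaining dimension."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    grid = defaultdict(float)
    for k, v in cells.items():
        grid[(k[r], k[c])] += v
    return dict(grid)
```

For example, `dice(cells, month=["Feb"], region=["One"])` keeps only the February Travel cell in Region One.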

Page 19

Figure 6.6 A multidimensional cube for credit card purchases

[Axes: Month (Jan. to Dec.), Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle), Region (One to Four). Highlighted cell: Month = Dec., Region = Two, Category = Vehicle, with Count = 110 and Amount = 6,720.]

Page 20

Concept Hierarchy

A mapping that allows attributes to be viewed at varying levels of detail.

Example (location): Street Address → City → State → Region
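The location hierarchy can be sketched as a chain of parent lookups. The mapping data are hypothetical: the street and city come from Figure 6.3, while the region assignment `"Three"` is an assumption.

```python
# Hypothetical hierarchy data: each level maps a value to its parent.
PARENT = {
    "street": {"425 Church St": "Charleston"},  # street address -> city
    "city": {"Charleston": "SC"},              # city -> state
    "state": {"SC": "Three"},                  # state -> region (assumed)
}
LEVELS = ("street", "city", "state", "region")  # most to least detailed

def generalize(value, from_level, to_level):
    """Map a value up the concept hierarchy to a coarser level."""
    i, j = LEVELS.index(from_level), LEVELS.index(to_level)
    if j < i:
        raise ValueError("to_level must be at least as coarse as from_level")
    for level in LEVELS[i:j]:
        value = PARENT[level][value]
    return value
```

Rolling data up to a coarser level means applying this mapping to every cell and summing cells that land on the same value.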

Page 21

Figure 6.8 Rolling up from months to quarters

[Axes: Time (Q1 to Q4), Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle), Region. Highlighted cell: Month = Oct./Nov./Dec., Region = One, Category = Supermarket.]
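Rolling up from months to quarters, as in Figure 6.8, sums the monthly cells that fall in each quarter; drilling down returns the month-level detail behind one quarter. The monthly amounts below are hypothetical values for a single Region/Category slice.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
QUARTER = {m: i // 3 + 1 for i, m in enumerate(MONTHS)}  # Jan-Mar -> 1, ..., Oct-Dec -> 4

# Hypothetical monthly totals for Region = One, Category = Supermarket.
monthly = {"Jan": 10.0, "Feb": 20.0, "Oct": 5.0, "Nov": 7.0, "Dec": 8.0}

def roll_up(monthly):
    """Roll-up: replace each month by its quarter and sum the amounts."""
    quarterly = defaultdict(float)
    for month, amount in monthly.items():
        quarterly[f"Q{QUARTER[month]}"] += amount
    return dict(quarterly)

def drill_down(monthly, quarter):
    """Drill-down: return the month-level cells behind one quarter."""
    q = int(quarter[1:])
    return {m: a for m, a in monthly.items() if QUARTER[m] == q}
```

Drill-down works only because the month-level data are kept; a cube that stored quarters alone could not be expanded again.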

Page 22

6.4 Excel Pivot Tables for Multidimensional Data Analysis

Page 23

Figure 6.15 A credit card promotion cube

[Axes: Watch Promo, Life Insurance Promo, Magazine Promo (each Yes/No). Highlighted cell: Watch Promo = No, Magazine Promo = Yes, Life Insurance Promo = Yes.]

Page 24

Figure 6.16 A pivot table with page variables for credit card promotions
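A page variable restricts the whole pivot table to one value of a chosen attribute before the rows and columns are counted. The sketch below imitates that in plain Python. The records, field names, and the choice of Watch Promo as the page variable are hypothetical, since the layout of Figure 6.16 is not recoverable from the text.

```python
from collections import Counter

# Hypothetical customer records: responses to three promotions.
records = [
    {"watch": "No",  "magazine": "Yes", "life_ins": "Yes"},
    {"watch": "Yes", "magazine": "Yes", "life_ins": "No"},
    {"watch": "No",  "magazine": "No",  "life_ins": "No"},
    {"watch": "No",  "magazine": "Yes", "life_ins": "Yes"},
]

def pivot_count(records, row_field, col_field, page=None):
    """Count records per (row value, column value); `page` = (field, value) filters first."""
    if page is not None:
        field, value = page
        records = [r for r in records if r[field] == value]
    return Counter((r[row_field], r[col_field]) for r in records)

# Rows = Magazine Promo, columns = Life Insurance Promo, page = Watch Promo "No".
table = pivot_count(records, "magazine", "life_ins", page=("watch", "No"))
```

Fixing the page variable is equivalent to taking a slice of the promotion cube and then viewing the remaining two dimensions as a grid.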