Building the Data Warehouse - Chapter 03: The Data Warehouse and Design
7/29/2019 Building the Data WareHouse - Chapter 03 The Data Warehouse and Design
1/95
Building the Data Warehouse
By Inmon
Chapter 3: The Data Warehouse and Design
http://it-slideshares.blogspot.com/
3.0 Introduction
There are two major components to building a data warehouse:
The design of the interface from operational systems
The design of the data warehouse itself
3.1 Beginning with Operational Data
(Encoding Transformation)
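Encoding transformation means that the same fact, encoded differently by different operational applications, is converted to one encoding as it enters the warehouse. The Python sketch below illustrates the idea; the application names and their codes (m/f, 1/0, x/y, male/female) are hypothetical examples, not mappings taken from a real system.

```python
# Hypothetical per-application encodings of the same attribute (gender).
# Source names and codes are illustrative assumptions.
SOURCE_ENCODINGS = {
    "appl_a": {"m": "M", "f": "F"},
    "appl_b": {"1": "M", "0": "F"},
    "appl_c": {"x": "M", "y": "F"},
    "appl_d": {"male": "M", "female": "F"},
}


def to_warehouse_code(source: str, raw_value: str) -> str:
    """Translate a source-specific code into the single warehouse encoding."""
    mapping = SOURCE_ENCODINGS[source]
    key = raw_value.strip().lower()
    if key not in mapping:
        # Unknown codes are reported instead of being passed through silently.
        raise ValueError(f"unmapped code {raw_value!r} from {source}")
    return mapping[key]


if __name__ == "__main__":
    for src, raw in [("appl_a", "f"), ("appl_b", "1"), ("appl_d", "Female")]:
        print(src, raw, "->", to_warehouse_code(src, raw))
```

Keeping the mapping in a table rather than in scattered conditionals makes each source's encoding visible and easy to extend when a new source is added.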
3.2 Process and Data Models and the Architected Environment (ct)
Data models are discussed in depth in the following section. Process models include techniques such as:
1. Functional decomposition
2. Context-level zero diagram
3. Data flow diagram
4. Structure chart
5. State transition diagram
6. HIPO chart
7. Pseudo code
3.3 The Data Warehouse and Data Models
3.3.1 The Data Warehouse Data Model
3.3.1 The Data Warehouse Data Model (ct)
3.3.1 The Data Warehouse Data Model (ct)
3.3.2 The Midlevel Data Model
3.3.2 The Midlevel Data Model (ct)
3.3.2 The Midlevel Data Model (ct)
3.3.2 The Midlevel Data Model (ct)
3.3.2 The Midlevel Data Model (ct)
Of particular interest is the case where a grouping of data has two types of lines emanating from it, as shown in Figure 3-17. The two lines leading to the right indicate that there are two types of criteria. One criterion is by activity type: either a deposit or a withdrawal. The other line indicates another activity type: either an ATM activity or a teller activity. Collectively, the two types of activity encompass the following transactions:
ATM deposit
ATM withdrawal
Teller deposit
Teller withdrawal
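Because the two criteria are independent, the four transactions are simply their cross product. A small Python sketch; the tuple names are hypothetical labels for the two lines in the figure.

```python
from itertools import product

# The two independent criteria leaving the grouping of data (names assumed).
CHANNELS = ("ATM", "Teller")
ACTIVITY_KINDS = ("deposit", "withdrawal")


def transaction_types() -> list:
    """Every combination of the two type criteria."""
    return [f"{channel} {kind}" for channel, kind in product(CHANNELS, ACTIVITY_KINDS)]


if __name__ == "__main__":
    print(transaction_types())
    # ['ATM deposit', 'ATM withdrawal', 'Teller deposit', 'Teller withdrawal']
```

Adding a third value to either criterion grows the set of transaction types multiplicatively, which is why the model draws the two criteria as separate lines instead of listing every combination.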
3.3.2 The Midlevel Data Model (ct)
The physical table entries that resulted came from the following two transactions:
An ATM withdrawal that occurred at 1:31 p.m. on January 2
A teller deposit that occurred at 3:15 p.m. on January 5
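One plausible physical representation, sketched in Python as an assumption (the column names are hypothetical), stores both criteria as ordinary columns of a single activity table, so each of the two transactions becomes one row. The source gives no year, so only month and day are kept.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ActivityRow:
    """One physical row; both type criteria are stored as ordinary columns."""
    date: str          # month and day only; the source gives no year
    time_of_day: time
    channel: str       # "ATM" or "Teller"
    kind: str          # "deposit" or "withdrawal"


# The two transactions described above.
ROWS = [
    ActivityRow("January 2", time(13, 31), "ATM", "withdrawal"),
    ActivityRow("January 5", time(15, 15), "Teller", "deposit"),
]


def rows_of_type(rows, channel: str, kind: str) -> list:
    """Select the rows matching one combination of the two criteria."""
    return [r for r in rows if r.channel == channel and r.kind == kind]


if __name__ == "__main__":
    for row in ROWS:
        print(row.date, row.time_of_day.strftime("%H:%M"), row.channel, row.kind)
```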
3.3.2 The Midlevel Data Model (ct)
3.3.3 The Physical Data Model
3.3.3 The Physical Data Model (ct)
3.3.3 The Physical Data Model (cont)
Note: This is not an issue of blindly transferring a large number of records from DASD to main storage. Instead, it is a more sophisticated issue of transferring, in bulk, records that have a high probability of being accessed.
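The note's point can be made concrete by counting block reads. The Python sketch below is a simplified model under stated assumptions (a fixed number of records per physical block, and one I/O per distinct block touched); it compares storing records in arrival order with clustering them by the key they are usually accessed by.

```python
RECORDS_PER_BLOCK = 4  # assumed physical block size, in records


def blocks_read(layout: list, wanted_key: str) -> int:
    """Count the distinct physical blocks holding records for wanted_key."""
    blocks = {i // RECORDS_PER_BLOCK for i, key in enumerate(layout) if key == wanted_key}
    return len(blocks)


# Hypothetical access keys of 12 records, in the order they arrived.
arrival_order = ["c1", "c2", "c3", "c1", "c2", "c3", "c1", "c2", "c3", "c1", "c2", "c3"]
# The same records, physically clustered by the key they are accessed by.
clustered = sorted(arrival_order)

if __name__ == "__main__":
    print("arrival order:", blocks_read(arrival_order, "c1"))  # 3 blocks
    print("clustered:    ", blocks_read(clustered, "c1"))      # 1 block
```

With clustering, one I/O brings in all four records a reader of `c1` is likely to need, which is the sense in which the bulk transfer is selective rather than blind.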
3.4 The Data Model and Iterative Development
Why is iterative development important?
The industry track record of success strongly suggests it.
The end user is unable to articulate many requirements until the first iteration is done.
Management will not make a full commitment until at least a few actual results are tangible and obvious.
Visible results must be seen quickly.
3.4 The Data Model and Iterative Development (ct)
3.4 The Data Model and Iterative Development (ct)
3.5 Normalization and Denormalization (ERD: Entity Relationship Diagram)
3.5 Normalization and Denormalization (Hash Algorithm: Better Search Ability)
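The slide credits a hash algorithm with better search ability. As a hedged illustration, not the book's specific scheme, hashing a key to a bucket number lets a reader go directly to the one bucket that can contain the record instead of scanning every record. The bucket count and record layout are assumptions.

```python
from typing import Optional
from zlib import crc32

NUM_BUCKETS = 8  # assumed number of buckets (physical areas)


def bucket_for(key: str) -> int:
    """Deterministically hash the key to a bucket number."""
    return crc32(key.encode("utf-8")) % NUM_BUCKETS


class HashedStore:
    """Records are placed in, and later read from, the bucket their key hashes to."""

    def __init__(self) -> None:
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def put(self, key: str, record: dict) -> None:
        self.buckets[bucket_for(key)].append((key, record))

    def get(self, key: str) -> Optional[dict]:
        # Only the one bucket the key hashes to is searched, not the whole store.
        for stored_key, record in self.buckets[bucket_for(key)]:
            if stored_key == key:
                return record
        return None


if __name__ == "__main__":
    store = HashedStore()
    store.put("acct-1001", {"balance": 250})
    store.put("acct-1002", {"balance": 75})
    print(store.get("acct-1002"))
```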
3.5 Normalization and Denormalization (Use of Redundant Data for Search Performance)
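Redundancy trades storage for fewer accesses: a description that would normally be read through a join is copied into each row that is frequently read with it. A minimal Python sketch with hypothetical part and order tables:

```python
# Normalized form: the part description lives only in the parts table.
parts = {"P100": {"description": "hex bolt"}, "P200": {"description": "washer"}}
orders = [
    {"order_id": 1, "part_no": "P100", "qty": 50},
    {"order_id": 2, "part_no": "P200", "qty": 10},
]


def describe_normalized(order: dict) -> str:
    # Needs a second lookup (a join) for every order that is read.
    return f'{order["qty"]} x {parts[order["part_no"]]["description"]}'


# Denormalized form: the description is copied redundantly into each order row
# when the row is loaded, so later reads need no second lookup.
orders_denormalized = [
    {**o, "description": parts[o["part_no"]]["description"]} for o in orders
]


def describe_denormalized(order: dict) -> str:
    return f'{order["qty"]} x {order["description"]}'
```

The price is that the copies must stay consistent with the parts table, a cost that is easier to bear when data is loaded in batches rather than updated in place.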
3.5 Normalization and Denormalization (Separating Data by Probability of Access)
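Separating data by probability of access splits a wide record so that frequently read fields sit together and rarely read fields are stored elsewhere, keeping the frequently accessed data compact. A Python sketch; which fields count as frequently accessed is an assumption for illustration.

```python
HOT_FIELDS = {"account_id", "balance", "status"}  # assumed frequently accessed fields


def split_by_access(record: dict) -> tuple:
    """Split one record into a frequently accessed part and a rarely accessed part.

    Both parts keep the key so they can be rejoined when needed.
    """
    hot = {k: v for k, v in record.items() if k in HOT_FIELDS}
    cold = {k: v for k, v in record.items() if k not in HOT_FIELDS}
    cold["account_id"] = record["account_id"]
    return hot, cold


if __name__ == "__main__":
    hot, cold = split_by_access(
        {"account_id": "A1", "balance": 10, "status": "open",
         "opened_by": "clerk 7", "branch_notes": "moved from branch 12"}
    )
    print(hot)
    print(cold)
```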
3.5 Normalization and Denormalization (Derived Data: What Is It?)
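Derived data is data calculated from other data, such as a period total, and stored so that each query does not have to recompute it. A minimal Python sketch over hypothetical transaction amounts (kept in cents to avoid floating-point rounding):

```python
from collections import defaultdict
from datetime import date

# Hypothetical detailed activity: (posting date, amount in cents).
transactions = [
    (date(2019, 1, 3), 12_000),
    (date(2019, 1, 17), -4_000),
    (date(2019, 2, 2), 7_550),
]


def derive_monthly_totals(rows) -> dict:
    """Compute (year, month) -> net amount once, at load time."""
    totals = defaultdict(int)
    for posted, amount in rows:
        totals[(posted.year, posted.month)] += amount
    return dict(totals)


# Stored alongside the detail; queries read this instead of re-summing the detail.
MONTHLY_TOTALS = derive_monthly_totals(transactions)
```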
3.5 Normalization and Denormalization (Data Indexing vs. Profiles: Why?)
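One reading of "indexing vs. profiles": an ordinary index tells where records are, while a profile is a small set of precomputed facts about the data, built as the data is loaded, that answers common questions without touching the detail. The Python sketch below is an illustration under that assumption; the profile contents (row count, total, largest rows) are hypothetical choices.

```python
import heapq
from collections import defaultdict


def build_index_and_profile(rows: list, top_n: int = 2):
    """Build, at load time, both an index and a profile of (account, amount) rows.

    index:   account -> positions of its rows (answers "where is it?")
    profile: precomputed facts about the load (answers "what is in it?")
    """
    index = defaultdict(list)
    for pos, (account, _amount) in enumerate(rows):
        index[account].append(pos)
    profile = {
        "row_count": len(rows),
        "total_amount": sum(amount for _account, amount in rows),
        "largest": heapq.nlargest(top_n, rows, key=lambda r: r[1]),
    }
    return dict(index), profile
```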
3.5 Normalization and Denormalization (Referential Data Integrity)
3.5.1 Snapshots in the Data Warehouse (Primary Data)
3.5.1 Snapshots in the Data Warehouse (Primary and Secondary Data)
3.6.1 Managing Reference Tables in a Data Warehouse
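Reference tables change over time, so a warehouse can keep each version with the date from which it applied; a lookup then returns the value in effect on the date of the record being interpreted. A minimal Python sketch; the code values, descriptions, and dates are hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Hypothetical history of a reference code: (effective-from date, description),
# kept in ascending order of effective date. Old versions are never overwritten.
REGION_CODE_HISTORY = {
    "NE": [(date(2015, 1, 1), "Northeast"), (date(2018, 7, 1), "New England")],
}


def describe_as_of(code: str, as_of: date) -> Optional[str]:
    """Return the description that was in effect for `code` on `as_of`."""
    versions = REGION_CODE_HISTORY.get(code, [])
    effective_dates = [start for start, _ in versions]
    i = bisect_right(effective_dates, as_of) - 1
    return versions[i][1] if i >= 0 else None


def add_version(code: str, effective: date, description: str) -> None:
    """Record a change by adding a new version instead of updating in place."""
    versions = REGION_CODE_HISTORY.setdefault(code, [])
    versions.append((effective, description))
    versions.sort()
```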
3.7 Cyclicity of Data: The Wrinkle of Time (ct)
3.8 Complexity of Transformation and Integration
As data passes from the operational, legacy environment to the data warehouse environment, it requires transformation and/or a change in technology:
Extraction of data from different source systems
Transformation of encoding rules and data types
Loading into the new environment
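The three steps can be sketched as a small pipeline. This Python example is illustrative only: the two source layouts, field names, date formats, and the in-memory "warehouse" list are assumptions.

```python
from datetime import date, datetime


# Extract: two hypothetical source systems with different layouts.
def extract_a() -> list:
    return [{"cust": "0042", "opened": "1997-03-15", "sex": "f"}]


def extract_b() -> list:
    return [{"customer_no": 77, "open_dt": "11/02/2001", "gender": "1"}]


# Transform: unify key types, date formats, and encodings.
def transform_a(r: dict) -> dict:
    return {"customer_id": int(r["cust"]),
            "opened": date.fromisoformat(r["opened"]),
            "gender": {"m": "M", "f": "F"}[r["sex"]]}


def transform_b(r: dict) -> dict:
    return {"customer_id": r["customer_no"],
            "opened": datetime.strptime(r["open_dt"], "%m/%d/%Y").date(),
            "gender": {"1": "M", "0": "F"}[r["gender"]]}


# Load: append the uniform rows to the target store.
def load(warehouse: list, rows) -> None:
    warehouse.extend(rows)


def run_etl() -> list:
    warehouse = []
    load(warehouse, (transform_a(r) for r in extract_a()))
    load(warehouse, (transform_b(r) for r in extract_b()))
    return warehouse
```

Each source gets its own transform function, so the differences between legacy systems stay isolated while everything loaded shares one layout.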
3.8 Complexity of Transformation and Integration (ct)
Data is cleansed as it passes from the operational environment to the data warehouse environment.
Multiple input sources of data exist and must be merged as they pass into the data warehouse.
When there are multiple input files, key resolution must be done before the files can be merged.
With multiple input files, the sequence of the files may not be the same or even compatible.
Multiple outputs may result. Data may be produced at different levels of summarization by the same data warehouse creation program.
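Key resolution and merging can be sketched as: convert each file's key into one common form, put both inputs into the same sequence, then merge them. In this Python sketch the two key formats are hypothetical; `heapq.merge` performs the single-pass merge of the two sorted inputs.

```python
import heapq

# Hypothetical inputs: file A keys look like "C-0042" and are in date order;
# file B keys are plain integers in no particular order.
file_a = [("C-0042", "2019-01-05", 120), ("C-0007", "2019-01-09", 80)]
file_b = [(7, "2019-01-02", 15), (42, "2019-01-03", 60)]


def resolve_key_a(raw: str) -> int:
    """File A carries keys like 'C-0042'; resolve them to the common integer key."""
    _prefix, digits = raw.split("-")
    return int(digits)


def normalized(records, key_fn) -> list:
    """Resolve keys, then sort into the common (key, date) sequence."""
    return sorted((key_fn(k), d, amount) for k, d, amount in records)


def merge_inputs(a, b) -> list:
    a_sorted = normalized(a, resolve_key_a)
    b_sorted = normalized(b, int)
    # Both inputs now share one key format and one sequence, so they can be merged.
    return list(heapq.merge(a_sorted, b_sorted))
```

Sorting each input separately mirrors the point that the files' sequences may differ; only once both are in the same sequence can they be merged in a single pass.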
3.8 Complexity of Transformation and Integration (ct)
The input record types must be converted:
Fixed-length records
Variable-length records
OCCURS DEPENDING ON
OCCURS clause
The semantic (logical) meaning of the data relationships in the old systems must be understood.
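Variable-length input often comes from COBOL layouts in which a count field governs how many times a group repeats (OCCURS DEPENDING ON). The Python sketch below parses one such record; the layout (a 6-character customer number, a 2-digit count, then that many 7-digit amounts, all as text) is a hypothetical example, not a layout from the source.

```python
def parse_record(line: str) -> dict:
    """Parse a record with a hypothetical COBOL-style layout:

    CUST-NO  PIC X(6)
    N        PIC 99
    AMOUNT   PIC 9(7) OCCURS 0 TO 99 TIMES DEPENDING ON N
    """
    cust_no = line[0:6]
    count = int(line[6:8])
    amounts = []
    offset = 8
    for _ in range(count):
        amounts.append(int(line[offset:offset + 7]))
        offset += 7
    if offset != len(line):
        raise ValueError("record length does not match its OCCURS count")
    return {"cust_no": cust_no, "amounts": amounts}


if __name__ == "__main__":
    print(parse_record("CUST01" + "02" + "0000150" + "0002000"))
```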
3.8 Complexity of Transformation and Integration (ct)
Data format conversion must be done. Conversion from EBCDIC to ASCII (or vice versa) must be spelled out.
Massive volumes of input must be accounted for.
The design of the data warehouse must conform to a corporate data model.
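Python's standard codecs include EBCDIC code pages such as `cp037` (US/Canada), so a text field can be converted explicitly in either direction; numeric fields stored as packed decimal (COMP-3) need their own conversion. A hedged sketch: which code page applies and which fields are packed are assumptions that must come from the actual source layouts.

```python
from decimal import Decimal


def ebcdic_to_str(raw: bytes, codepage: str = "cp037") -> str:
    """Decode an EBCDIC text field (the code page is an assumption per source)."""
    return raw.decode(codepage)


def str_to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """The reverse direction: encode text as EBCDIC."""
    return text.encode(codepage)


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field: two digits per byte,
    with the low nibble of the last byte holding the sign."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign_nibble not in (0xC, 0xD, 0xF):
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(map(str, digits)))
    if sign_nibble == 0xD:  # 0xD marks a negative value; 0xC and 0xF are positive
        value = -value
    return Decimal(value).scaleb(-scale)


if __name__ == "__main__":
    print(ebcdic_to_str(bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])))   # HELLO
    print(unpack_comp3(bytes([0x12, 0x34, 0x5D]), scale=2))       # -123.45
```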
3.9 Triggering the Data Warehouse Record
The basic business interaction that populates the data warehouse is called an event-snapshot interaction.
3.9.2 Components of the Snapshot
The snapshot placed in the data warehouse normally contains several components:
The unit of time that marks the occurrence of the event
The key that identifies the snapshot
The primary (nonkey) data that relates to the key
An artifact of the relationship (secondary data that has been incidentally captured as of the moment the snapshot was taken and placed in the snapshot)
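The four components map naturally onto a record type. A Python sketch; the field names, the order event, and the credit-rating artifact are hypothetical examples.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    event_time: datetime  # unit of time marking when the event occurred
    key: str              # identifies the snapshot
    primary: dict         # nonkey data that relates directly to the key
    # Artifact of a relationship: secondary data captured as of the event.
    secondary: dict = field(default_factory=dict)


def take_snapshot(event_time: datetime, customer_id: str,
                  order: dict, customer_record: dict) -> Snapshot:
    """Build a snapshot when an event (here, a hypothetical order) occurs."""
    return Snapshot(
        event_time=event_time,
        key=f"{customer_id}:{order['order_no']}",
        primary={"amount": order["amount"], "product": order["product"]},
        # Copied as of the moment of the event; later changes to the customer
        # record do not alter this snapshot.
        secondary={"credit_rating": customer_record["credit_rating"]},
    )
```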
3.10 Profile Records (ct)
3.12 Creating Multiple Profile Records
Individual call records can be used to create:
A customer profile record
A district traffic profile record
A line analysis profile record
and so forth.
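The same detailed records can be rolled up along different keys, each rollup producing a different kind of profile record. A Python sketch; the call-record fields (customer, district, line, minutes) and the profile contents are hypothetical.

```python
from collections import defaultdict

# Hypothetical individual call records.
calls = [
    {"customer": "C1", "district": "D9", "line": "L100", "minutes": 12},
    {"customer": "C1", "district": "D9", "line": "L100", "minutes": 3},
    {"customer": "C2", "district": "D9", "line": "L200", "minutes": 30},
]


def profile_by(records, key_field: str) -> dict:
    """Collapse many call records into one profile record per key value."""
    profiles = defaultdict(lambda: {"calls": 0, "minutes": 0})
    for rec in records:
        p = profiles[rec[key_field]]
        p["calls"] += 1
        p["minutes"] += rec["minutes"]
    return dict(profiles)


customer_profiles = profile_by(calls, "customer")
district_profiles = profile_by(calls, "district")
line_profiles = profile_by(calls, "line")
```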
3.13 Going from the Data Warehouse to the Operational Environment
3.15.1 An Airline Commission Calculation System (ct)
3.15.2 A Retail Personalization System
The retail sales representative could find out some other information about the customer:
The last type of purchase made
The market segment or segments to which the customer belongs
While engaging the customer in conversation, the sales representative may initiate remarks such as:
"I see it's been since February that we last heard from you."
"How was that blue sweater you purchased?"
"Did the problems you had with the pants get resolved?"
3.15.2 A Retail Personalization System (ct)
Periodically, the analysis program spins off a file to the operational environment that contains such information as the following:
Last purchase date
Last purchase type
Market analysis/segmenting
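A sketch of such an analysis program in Python, assuming hypothetical purchase-history rows and a segment assignment produced by a separate analysis. It keeps only the latest purchase per customer and produces one small record each, suitable for shipping to the operational environment.

```python
from datetime import date

# Hypothetical purchase history from the warehouse: (customer, date, purchase type).
history = [
    ("C1", date(2019, 1, 12), "pants"),
    ("C1", date(2019, 2, 8), "sweater"),
    ("C2", date(2018, 11, 30), "coat"),
]
# Hypothetical output of a separate market-segmentation analysis.
segments = {"C1": "segment-A", "C2": "segment-B"}


def build_operational_file(purchases, segment_of: dict) -> dict:
    """One small record per customer for the operational environment."""
    latest = {}
    for cust, when, kind in purchases:
        if cust not in latest or when > latest[cust][0]:
            latest[cust] = (when, kind)
    return {
        cust: {
            "last_purchase_date": when,
            "last_purchase_type": kind,
            "segment": segment_of.get(cust, "unassigned"),
        }
        for cust, (when, kind) in latest.items()
    }
```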
3.15.3 Credit Scoring (Based on Demographic Data)
The background check relies on the data warehouse. In truth, the check is an eclectic one, in which many aspects of the customer are investigated, such as the following:
Past payback history
Home/property ownership
Financial management
Net worth
Gross income
Gross expenses
Other intangibles
3.15.3 Credit Scoring (ct)
The analysis program is run periodically and produces a prequalified file for use in the operational environment. In addition to other data, the prequalified file includes the following:
Customer identification
Approved credit limit
Special approval limit
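A sketch of the periodic run in Python. The scoring rule, thresholds, and limit formulas below are entirely hypothetical placeholders; the source does not specify how the limits are derived, only which fields the prequalified file carries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomerFacts:
    customer_id: str
    on_time_payment_ratio: float  # past payback history, 0.0 to 1.0
    owns_home: bool
    net_worth: int
    gross_income: int
    gross_expenses: int


@dataclass(frozen=True)
class PrequalifiedEntry:
    customer_id: str
    approved_credit_limit: int
    special_approval_limit: int


def prequalify(c: CustomerFacts) -> Optional[PrequalifiedEntry]:
    # Hypothetical rule: require good payback history and positive cash flow.
    disposable = c.gross_income - c.gross_expenses
    if c.on_time_payment_ratio < 0.9 or disposable <= 0:
        return None
    base = disposable // 4 + c.net_worth // 20 + (5_000 if c.owns_home else 0)
    return PrequalifiedEntry(c.customer_id, base, base * 3 // 2)


def build_prequalified_file(customers) -> list:
    """The periodic batch run: one entry per customer who qualifies."""
    return [e for e in (prequalify(c) for c in customers) if e is not None]
```

The operational system then only reads this file, so an approval at the point of contact needs a single lookup rather than a fresh analysis of the warehouse.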
3.15.3 Credit Scoring (ct)
3.16 Indirect Use of Data Warehouse Data
3.16 Indirect Use of Data Warehouse Data (ct)
The online pre-analyzed data file:
Contains only a small amount of data per unit of data
May contain a large amount of data collectively (because there may be many units of data)
Contains precisely what the online clerk needs
Is not updated, but is periodically refreshed on a wholesale basis
Is part of the online high-performance environment
Is efficient to access
Is geared for access of individual units of data, not massive sweeps of data
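The properties in the list suggest a simple shape: a read-only keyed lookup for single-unit access that is replaced as a whole on each refresh rather than updated record by record. A Python sketch under that assumption; the class and method names are hypothetical.

```python
import threading
from types import MappingProxyType


class PreAnalyzedFile:
    """Online lookup of small, pre-analyzed records, refreshed wholesale."""

    def __init__(self, rows: dict) -> None:
        self._lock = threading.Lock()
        self._data = MappingProxyType(dict(rows))  # read-only view

    def lookup(self, key):
        # Access to one unit of data; no scans and no in-place updates.
        return self._data.get(key)

    def refresh(self, new_rows: dict) -> None:
        # Build the complete replacement first, then swap it in at once.
        replacement = MappingProxyType(dict(new_rows))
        with self._lock:
            self._data = replacement
```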
3.17 Star Joins
There are several very good reasons why normalization and a relational approach produce the optimal design for a data warehouse:
It produces flexibility.
It fits well with very granular data.
It is not optimized for any given set of processing requirements.
It fits very nicely with the data model.
3.17 Star Joins (ct)
3.17 Star Joins (ct)
3.17 Star Joins (ct)
3.17 Star Joins (ct)
3.17 Star Joins (ct)
3.17 Star Joins (ct)
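A star join arranges data as a central fact table, whose rows hold measurements keyed by dimension identifiers, surrounded by dimension tables that describe those identifiers; queries join the fact table to each dimension. The Python sketch below uses the standard library's `sqlite3` module; the table names, columns, and rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product,
                          store_id   INTEGER REFERENCES dim_store,
                          amount     INTEGER);
"""


def sales_by_category_and_region(conn: sqlite3.Connection) -> list:
    # The fact table is joined to each dimension on its key: a star join.
    return conn.execute(
        """
        SELECT p.category, s.region, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_store   s ON s.store_id   = f.store_id
        GROUP BY p.category, s.region
        ORDER BY p.category, s.region
        """
    ).fetchall()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "apparel"), (2, "shoes")])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(10, "east"), (20, "west")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 10, 40), (1, 10, 60), (2, 20, 25), (1, 20, 5)])
    rows = sales_by_category_and_region(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

The design is shaped around one family of queries (measurements sliced by their dimensions), which is the kind of optimization for a given set of processing requirements that the normalized design above deliberately avoids.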
3.18 Supporting the ODS
In general, there are four classes of ODS:
Class I: In a class I ODS, updates of data from the operational environment to the ODS are synchronous.
Class II: In a class II ODS, the updates between the operational environment and the ODS occur within a two-to-three-hour time frame.
Class III: In a class III ODS, the synchronization of updates between the operational environment and the ODS occurs overnight.
Class IV: In a class IV ODS, updates into the ODS from the data warehouse are unscheduled. Figure 3-56 shows this support.
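The four classes can be summarized as a small classification rule. In this Python sketch the inputs (where updates come from, and how many hours after the operational change they arrive) and the exact hour boundaries are assumptions based on the descriptions above.

```python
from enum import Enum
from typing import Optional


class OdsClass(Enum):
    I = "synchronous"
    II = "within two to three hours"
    III = "overnight"
    IV = "unscheduled, from the data warehouse"


def classify_ods(source: str, lag_hours: Optional[float]) -> OdsClass:
    """Classify an ODS feed by its source and update lag (boundaries assumed)."""
    if source == "warehouse":
        return OdsClass.IV
    if lag_hours is None:
        raise ValueError("lag is required for operational sources")
    if lag_hours == 0:
        return OdsClass.I
    if lag_hours <= 3:
        return OdsClass.II
    return OdsClass.III
```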
3.18 Supporting the ODS (ct)
The customer has been active for several years. The analysis of the transactions in the data warehouse is used to produce the following profile information about a single customer:
Customer name and ID
Customer volume: high/low
Customer profitability: high/low
Customer frequency of activity: very frequent/very infrequent
Customer likes/dislikes (fast cars, single malt scotch)
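A sketch of how the high/low attributes might be derived from a customer's transaction history in the warehouse. The cut-off values and field names are hypothetical; the source does not say how they are chosen, and likes/dislikes would come from other analysis, so they are omitted here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Txn:
    when: date
    amount: int   # revenue, in whole currency units
    margin: int   # profit on the transaction


def customer_profile(name: str, cust_id: str, txns: list,
                     volume_cutoff: int = 10_000, profit_cutoff: int = 1_000,
                     frequent_per_year: float = 12.0) -> dict:
    """Summarize years of transactions into a few profile attributes."""
    volume = sum(t.amount for t in txns)
    profit = sum(t.margin for t in txns)
    if txns:
        span_days = (max(t.when for t in txns) - min(t.when for t in txns)).days
    else:
        span_days = 0
    years = max(span_days / 365.25, 1.0)  # treat short histories as one year
    rate = len(txns) / years
    return {
        "name": name,
        "id": cust_id,
        "volume": "high" if volume >= volume_cutoff else "low",
        "profitability": "high" if profit >= profit_cutoff else "low",
        "frequency": "very frequent" if rate >= frequent_per_year else "very infrequent",
    }
```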
3.18 Supporting the ODS (ct)
3.19 Requirements and the Zachman Framework
3.19 Requirements and the Zachman Framework (ct)
Summary
Design of data warehouse
Corporate data model
Operational data model
Iterative approach, since requirements are not known a priori
Different SDLC approach
Data warehouse construction considerations
Data volume (large size)
Data latency (late arrival of data sets)
Requires transformation and an understanding of legacy systems
Data models (granularities)
Low level
Midlevel
High level
Structure of a typical record in the data warehouse
Time stamp, a surrogate key, direct data, secondary data
Summary (cont)
Reference tables must be managed in a time-variant manner
Data latency: the wrinkle of time
Data transformation is complex
Different architectures
Different technologies
Different encoding rules and complex logic
Creation of a data warehouse record is triggered by an event (activity)
A profile record is a composite representation of data (historical activities)
Star join is a preferred database design technique