bi session 10 2 lecture-06-conceptuall-model
-
7/27/2019 BI Session 10 2 Lecture-06-CONCEPTUALl-Model
1/27
Data Integration
Dr. N. P. Singh
Professor
Management Development Institute
Mehrauli Road, Sukhrali
Gurgaon - 122001
E-mail: [email protected]
-
Extract-Transform-Load (ETL)
[Figure: Sources → Extract → DSA (data staging area), where the data are transformed and cleaned → Load → DW]
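As a rough illustration (not taken from the lecture), the three stages can be sketched in Python, with in-memory lists standing in for the sources, the DSA and the DW. The function names, the `COST` attribute and the cleaning rule are assumptions chosen for the example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def extract(sources: List[List[Row]]) -> List[Row]:
    """Extract: copy the rows of every source into the data staging area (DSA)."""
    dsa: List[Row] = []
    for source in sources:
        dsa.extend(dict(row) for row in source)  # copies, so the sources stay untouched
    return dsa


def transform_and_clean(dsa: List[Row]) -> List[Row]:
    """Transform & clean inside the DSA: drop rows without a cost, round costs."""
    cleaned: List[Row] = []
    for row in dsa:
        if row.get("COST") is None:  # cleaning: reject incomplete rows
            continue
        row["COST"] = round(row["COST"], 2)  # transformation
        cleaned.append(row)
    return cleaned


def load(rows: List[Row], dw: List[Row]) -> None:
    """Load: append the prepared rows to the warehouse table."""
    dw.extend(rows)


s1 = [{"PKEY": 1, "COST": 10.004}, {"PKEY": 2, "COST": None}]
s2 = [{"PKEY": 3, "COST": 7.5}]
dw: List[Row] = []
load(transform_and_clean(extract([s1, s2])), dw)
print(dw)  # [{'PKEY': 1, 'COST': 10.0}, {'PKEY': 3, 'COST': 7.5}]
```

Keeping the DSA as a separate copy means a failed cleaning step never touches the sources or the warehouse.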
-
The lifecycle of a Data Warehouse and its ETL processes
[Figure: stages shown are Reverse Engineering of Sources & Requirements Collection; Conceptual Model for DW, Sources & Activities; Logical Design; Logical Model for DW, Sources & Activities; Tuning, Full Activity Description; Physical Model for DW, Sources & Activities; Software Construction; Software & SW Metrics; Administration of DW; Metrics]
-
Conceptual Model
Entities of our model:
Concepts
Attributes
Part-of Relationships
Transformations
Serial Composition of Transformations
Provider Relationships
Notes
ETL Constraints
Candidate Relationships
-
Conceptual Model
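[Figure: graphical notation of the model: concept, attribute, part of, transformation, serial composition, provider 1:1, provider N:M, Note, ETL_constraint, and a target concept with candidate1 … candidaten under {XOR}, one of which is marked as the active candidate]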
-
Conceptual Model
Concepts
defined by a name and a finite set of attributes
represent an entity in a source database or in the DW
Attributes
play the same role as in ER/dimensional models
a granular module of information
[Notation: attribute, concept]
We do not employ standard UML notation for concepts and attributes, because we need to treat attributes as first-class citizens of our model.
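To see what "first-class" buys, here is a minimal Python sketch (an assumption of this transcript, not the lecture's own formalism) in which attributes are objects in their own right. Because `Attribute` is a frozen, hashable dataclass, a provider relationship, note or constraint can point at a single attribute rather than at its whole concept. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Attribute:
    """A first-class attribute: hashable, so relationships can refer to it directly."""
    concept: str  # name of the concept it is part of
    name: str


@dataclass
class Concept:
    """A concept: a name plus a finite set of attributes."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def add_attribute(self, attr_name: str) -> Attribute:
        attr = Attribute(self.name, attr_name)
        self.attributes.append(attr)  # records the part-of relationship
        return attr


s1 = Concept("S1.PARTSUPP")
cost = s1.add_attribute("COST")
notes = {cost: "example note attached to one attribute"}  # keyed by the attribute itself
```

In standard UML, attributes live inside their class box and cannot be the endpoint of an association, which is exactly the limitation the slide points out.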
-
Conceptual Model
Part-of Relationships
defined by a finite set of attributes
emphasize the fact that a concept is composed of a set of attributes
[Notation: part of]
-
Conceptual Model
Example
Source 1
S1.PARTSUPP {PKEY, SUPPKEY, QTY, COST}
Data Warehouse
DW.PARTSUPP {PKEY, SUPPKEY, DATE, QTY, COST}
-
Conceptual Model
[Figure: S1.PARTSUPP with attributes PKey, SuppKey, Qty, Cost; DW.PARTSUPP with attributes PKey, SuppKey, Date, Qty, Cost]
-
Conceptual Model
Transformations
defined by a finite set of input/output attributes and a symbol
abstractions that represent parts, or full modules, of code executing a single task
[Notation: transformation]
two categories:
filtering or data cleaning operations (e.g., foreign key violations)
transformation operations (e.g., aggregation)
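One transformation from each category, sketched in Python under the assumption that rows are dictionaries: a filter that drops foreign key violations, and an aggregation that sums a measure per group. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]


def fk_filter(rows: List[Row], attr: str, reference_keys: Set[Any]) -> List[Row]:
    """Filtering: keep only rows whose `attr` value exists in the referenced table."""
    return [r for r in rows if r[attr] in reference_keys]


def aggregate_sum(rows: List[Row], keys: Tuple[str, ...], measure: str) -> List[Row]:
    """Transformation: group rows by `keys` and sum `measure` within each group."""
    totals: Dict[Tuple[Any, ...], float] = defaultdict(float)
    for r in rows:
        totals[tuple(r[k] for k in keys)] += r[measure]
    return [{**dict(zip(keys, group)), measure: total} for group, total in totals.items()]


rows = [
    {"PKEY": 1, "SUPPKEY": 10, "QTY": 5},
    {"PKEY": 1, "SUPPKEY": 10, "QTY": 3},
    {"PKEY": 2, "SUPPKEY": 99, "QTY": 4},  # supplier 99 does not exist
]
valid = fk_filter(rows, "SUPPKEY", reference_keys={10, 20})
result = aggregate_sum(valid, ("PKEY", "SUPPKEY"), "QTY")
print(result)  # [{'PKEY': 1, 'SUPPKEY': 10, 'QTY': 8.0}]
```

The filter only removes rows, while the aggregation changes the shape of the data; that difference is what separates the two categories.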
-
Conceptual Model
Provider Relationships
defined by a finite set of input/output attributes and an appropriate transformation
map a set of input attributes to a set of output attributes through a relevant transformation*
[Notation: provider N:M, provider 1:1]
* If the attributes are semantically and physically compatible, no transformation is required.
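A provider relationship can be sketched as a small Python object that maps named input attributes to named output attributes, optionally through a transformation. The `Provider` class, the row-as-dictionary representation and the MM/DD/YYYY and DD/MM/YYYY date formats are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Provider:
    """Maps input attributes to output attributes, optionally through a transformation."""
    inputs: List[str]
    outputs: List[str]
    # None means the attributes are compatible and values pass through unchanged
    transformation: Optional[Callable[..., Tuple[Any, ...]]] = None

    def apply(self, row: Dict[str, Any]) -> Dict[str, Any]:
        values = tuple(row[a] for a in self.inputs)
        if self.transformation is not None:
            values = self.transformation(*values)
        return dict(zip(self.outputs, values))


def us_to_eu_date(date: str) -> Tuple[str]:
    """American MM/DD/YYYY to European DD/MM/YYYY."""
    month, day, year = date.split("/")
    return (f"{day}/{month}/{year}",)


qty = Provider(["QTY"], ["QTY"])                           # 1:1, no transformation needed
date = Provider(["DATE"], ["DATE"], us_to_eu_date)         # 1:1 through a transformation
keys = Provider(["PKEY", "SUPPKEY"], ["PKEY", "SUPPKEY"])  # N:M, here 2:2

row = {"PKEY": 1, "SUPPKEY": 7, "QTY": 5, "DATE": "07/26/2019"}
print(qty.apply(row), date.apply(row), keys.apply(row))
```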
-
Conceptual Model
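[Figure: S1.PARTSUPP (PKey, SuppKey, Qty, Cost) populates DW.PARTSUPP (PKey, SuppKey, Date, Qty, Cost) through provider relationships, annotated with the transformations f (function application), SK (surrogate key assignment) and NN (not null)]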
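A minimal Python sketch of the S1 branch of this diagram. Which attribute each symbol acts on is an assumption here: NN is taken to guard Cost, f to convert Cost from dollars to euros at a made-up fixed rate, and SK to replace PKey with a surrogate from a lookup table. DATE is deliberately left out, since S1 has no such attribute. All names and values are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
USD_TO_EUR = 0.9  # hypothetical fixed rate, for illustration only


def s1_to_dw(rows: List[Row], sk_table: Dict[int, int]) -> List[Row]:
    """Populate DW.PARTSUPP from S1.PARTSUPP (DATE is not handled here)."""
    out: List[Row] = []
    for r in rows:
        if r["COST"] is None:  # NN: reject rows with a null cost
            continue
        out.append({
            "PKEY": sk_table[r["PKEY"]],  # SK: production key -> surrogate key
            "SUPPKEY": r["SUPPKEY"],      # 1:1 provider, no transformation
            "QTY": r["QTY"],              # 1:1 provider, no transformation
            "COST": round(r["COST"] * USD_TO_EUR, 2),  # f: dollars to euros
        })
    return out


s1 = [
    {"PKEY": 501, "SUPPKEY": 7, "QTY": 10, "COST": 100.0},
    {"PKEY": 502, "SUPPKEY": 7, "QTY": 3, "COST": None},
]
dw_rows = s1_to_dw(s1, {501: 1, 502: 2})
print(dw_rows)  # [{'PKEY': 1, 'SUPPKEY': 7, 'QTY': 10, 'COST': 90.0}]
```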
-
Conceptual Model
Notes
informal tags, exactly as in UML modeling
used for:
simple comments explaining design decisions
explanation of the semantics of the applied transformation
tracing of runtime constraints
[Notation: Note]
-
Conceptual Model
[Figure: S1.PARTSUPP (PKey, SuppKey, Qty, Cost) populates DW.PARTSUPP (PKey, SuppKey, Date, Qty, Cost) through f, SK and NN; a note on the Date attribute of DW.PARTSUPP reads "Date = SysDate()"]
-
Conceptual Model
ETL Constraints
defined by a finite set of attributes and a single transformation
express the fact that the data of a certain concept fulfill several requirements
[Notation: ETL_constraint]
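An ETL constraint is a check over the data of a concept. Below is a minimal Python sketch of a primary-key check; the key (PKEY, SUPPKEY, DATE) for DW.PARTSUPP, the function name and the sample rows are assumptions of this sketch.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def pk_violations(rows: List[Row], key: Tuple[str, ...]) -> List[Tuple[Any, ...]]:
    """ETL constraint: return the key values that occur more than once."""
    counts = Counter(tuple(r[a] for a in key) for r in rows)
    return [k for k, n in counts.items() if n > 1]


dw = [
    {"PKEY": 1, "SUPPKEY": 7, "DATE": "27/07/2019", "QTY": 10},
    {"PKEY": 1, "SUPPKEY": 7, "DATE": "27/07/2019", "QTY": 4},
    {"PKEY": 2, "SUPPKEY": 7, "DATE": "27/07/2019", "QTY": 1},
]
print(pk_violations(dw, ("PKEY", "SUPPKEY", "DATE")))  # [(1, 7, '27/07/2019')]
```

Like the transformation on the slide, the constraint involves a set of attributes (the key) and a single check applied to them.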
-
Conceptual Model
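[Figure: S1.PARTSUPP (PKey, SuppKey, Qty, Cost) populates DW.PARTSUPP (PKey, SuppKey, Date, Qty, Cost) through f, SK and NN; note "Date = SysDate()"; a PK (primary key) ETL constraint is attached to DW.PARTSUPP]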
-
Conceptual Model
Candidate Relationships
defined by a single candidate concept and a single target concept
used when a certain DW concept can be populated by any of a finite set of more than one candidate source concepts
Active Candidate Relationship
denotes the candidate that has been selected for the population of the target concept
a specialization of candidate relationships
[Notation: target concept linked to candidate1 … candidaten under {XOR}; active candidate]
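A sketch of the {XOR} semantics in Python, using the generic names candidate1 and candidate2 from the notation: the candidate set records every possible source for a target, while the active candidate is a single field, so choosing one replaces any earlier choice. The `CandidateSet` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateSet:
    """Candidate source concepts for one target concept; at most one is active ({XOR})."""
    target: str
    candidates: List[str]
    active: Optional[str] = None

    def activate(self, name: str) -> None:
        if name not in self.candidates:
            raise ValueError(f"{name} is not a candidate for {self.target}")
        self.active = name  # a single field, so only one candidate can be active at a time


choice = CandidateSet("target", ["candidate1", "candidate2"])
choice.activate("candidate2")
print(choice.active)  # candidate2
```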
-
Conceptual Model
[Figure: DW.PartSupp is populated from S1.PartSupp and S2.PartSupp through a union (U); Annual PartSupps and Recent PartSupps appear as candidates under {XOR}. Notes: "Necessary providers: S1 and S2" and "Due to accuracy and small size (< update window)"]
-
Conceptual Model
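[Figure: full scenario. S1.PARTSUPP (PKey, SuppKey, Qty, Cost) and S2.PARTSUPP (PKey, SuppKey, Date, Department, Qty, Cost) populate DW.PARTSUPP (PKey, SuppKey, Date, Qty, Cost). Transformations shown: $2€, American to European Date, an aggregation whose outputs are SUM(S2.Cost), SUM(S2.Qty), S2.Date, S2.PKey and S2.SuppKey, plus f, SK and NN; ETL constraint PK; note "Date = SysDate()"; candidates AnnualPartSupps and RecentPartSupps under {XOR}, with the notes "Due to accuracy and small size (< update window)" and "Necessary providers: S1 and S2"; and an annotation beginning "{Duration"]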
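The S2 branch and the final union can be sketched in Python as follows. This is an illustration under assumptions: the MM/DD/YYYY and DD/MM/YYYY formats, the grouping of the sums by (PKey, SuppKey, Date) with Department dropped, and all sample rows are hypothetical, and the S1 rows are taken as already processed by their own branch.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def us_to_eu_date(date: str) -> str:
    """American to European Date: MM/DD/YYYY becomes DD/MM/YYYY."""
    month, day, year = date.split("/")
    return f"{day}/{month}/{year}"


def s2_flow(rows: List[Row]) -> List[Row]:
    """Convert dates, then sum Cost and Qty per (PKey, SuppKey, Date); Department is dropped."""
    totals: Dict[Tuple[Any, Any, str], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for r in rows:
        key = (r["PKEY"], r["SUPPKEY"], us_to_eu_date(r["DATE"]))
        totals[key][0] += r["COST"]  # SUM(S2.Cost)
        totals[key][1] += r["QTY"]   # SUM(S2.Qty)
    return [
        {"PKEY": p, "SUPPKEY": s, "DATE": d, "COST": cost, "QTY": qty}
        for (p, s, d), (cost, qty) in totals.items()
    ]


s2 = [
    {"PKEY": 1, "SUPPKEY": 7, "DATE": "07/26/2019", "DEPARTMENT": "A", "COST": 10.0, "QTY": 2},
    {"PKEY": 1, "SUPPKEY": 7, "DATE": "07/26/2019", "DEPARTMENT": "B", "COST": 5.0, "QTY": 1},
]
# S1 rows assumed to be already processed by their own branch (f, SK, NN, Date = SysDate())
s1_ready = [{"PKEY": 2, "SUPPKEY": 7, "DATE": "27/07/2019", "COST": 4.0, "QTY": 3}]

dw = s1_ready + s2_flow(s2)  # U: union of the two provider flows
print(dw)
```

Converting the date before grouping is harmless here because the conversion is one-to-one, so it cannot merge or split groups.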
-
Conceptual Model: first attempts
[Figure: a first version of the full scenario diagram: S1.PARTSUPP (PKey, SuppKey, Qty, Cost), S2.PARTSUPP (PKey, SuppKey, Date, Qty, Cost; no Department attribute) and DW.PARTSUPP (PKey, SuppKey, Date, Qty, Cost), with $2€, American to European Date, SUM(S2.Cost), SUM(S2.Qty), f, SK, NN, PK, the note "Date = SysDate()", candidates AnnualPartSupps and RecentPartSupps under {XOR}, the notes "Due to accuracy and small size (< update window)" and "Necessary providers: S1 and S2", and an annotation beginning "{Duration"]
-
Instantiation & Specialization Layers
The key issues:
genericity
identification of a small set of generic constructs to capture all cases
usability
construction of a palette of frequently used types
-
Instantiation & Specialization Layers
Metamodel layer
a set of generic entities, able to represent any ETL scenario
involves the classes Concept, Attribute, Transformation, ETL Constraint and Relationship
Template layer
a set of built-in specializations of the entities of the Metamodel layer, specifically tailored for the most frequent elements of ETL scenarios
Schema layer
a specific ETL scenario
all the entities of the Schema layer are instances of the classes of the Metamodel layer
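One way to picture the three layers is ordinary class inheritance, sketched here in Python: metamodel entities are base classes, template entries are subclasses (IsA), and a concrete scenario is made of objects (InstanceOf). Treating DW.PartSupp as a fact table is an assumption for the example, and the class names are illustrative only.

```python
class Concept:  # Metamodel layer: a generic entity
    def __init__(self, name: str) -> None:
        self.name = name


class Transformation:  # Metamodel layer: a generic entity
    def __init__(self, name: str) -> None:
        self.name = name


class FactTable(Concept):  # Template layer: IsA specialization of Concept
    pass


class SurrogateKeyAssignment(Transformation):  # Template layer: IsA Transformation
    pass


# Schema layer: the entities of one specific scenario (InstanceOf links)
dw_partsupp = FactTable("DW.PartSupp")
s2_partsupp = Concept("S2.PartSupp")
sk = SurrogateKeyAssignment("SK")

# every schema entity is still an instance of a metamodel class
print(isinstance(dw_partsupp, Concept), isinstance(sk, Transformation))  # True True
```

This mirrors the slide's rule: whether an entity is built from a template or directly from the metamodel, it remains an instance of a Metamodel layer class.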
-
Instantiation & Specialization Layers
[Figure: three layers. Metamodel layer: Concept, Attribute, Transformation, ETL_Constraint, Relationship. Template layer, linked to the metamodel by IsA: Fact Table, ER Entity, ER Relationship, Dimension (concepts); American to European Date, $2€, Surrogate Key Assignment, Aggregation (transformations); Provider, Candidate, Part Of, Serial Composition (relationships). Schema layer, linked by InstanceOf: S2.PartSupp, DW.PartSupp, Candidate 1, Candidate 2, SK, f, f]
-
Instantiation & Specialization Layers
Template layer
Four groups of logical transformations:
Filters
Unary transformations
Binary transformations
Composite transformations
Two groups of physical transformations:
Transfer operations
File operations
-
Instantiation & Specialization Layers
Filters
Selection (σ)
Not null (NN)
Primary key violation (PK)
Foreign key violation (FK)
Unique value (UN)
Domain mismatch (DM)
Unary transformations
Push
Aggregation (γ)
Projection (π)
Function application (f)
Surrogate key assignment (SK)
Tuple normalization (N)
Tuple denormalization (DN)
Binary transformations
Union (U)
Join (⋈)
Diff
Update Detection (UPD)
Composite transformations
Slowly changing dimension (Type 1, 2, 3) (SCD-1/2/3)
Format mismatch (FM)
Data type conversion (DTC)
Switch (*)
Extended union (U)
File operations
EBCDIC to ASCII conversion (EB2AS)
Sort file (Sort)
Transfer operations
Ftp (FTP)
Compress/Decompress (Z/dZ)
Encrypt/Decrypt (Cr/dCr)
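Two entries of the palette, surrogate key assignment (SK) and update detection (UPD), can be sketched as plain Python functions. This is an illustration under assumptions (a dictionary as the surrogate-key lookup table, dictionaries keyed by primary key as snapshots), not the palette's actual implementation; the function names are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def assign_surrogate_keys(keys: List[Any], lookup: Dict[Any, int]) -> List[int]:
    """SK: replace production keys by surrogate keys, minting new ones for unseen keys."""
    result = []
    for k in keys:
        if k not in lookup:
            lookup[k] = len(lookup) + 1  # assumes existing surrogates are exactly 1..n
        result.append(lookup[k])
    return result


def detect_updates(old: Dict[Any, Any], new: Dict[Any, Any]) -> Tuple[Set[Any], Set[Any], Set[Any]]:
    """UPD: compare two snapshots keyed by primary key."""
    inserted = set(new) - set(old)
    deleted = set(old) - set(new)
    updated = {k for k in set(new) & set(old) if new[k] != old[k]}
    return inserted, deleted, updated


lookup = {"P-100": 1}
print(assign_surrogate_keys(["P-100", "P-200", "P-100"], lookup))  # [1, 2, 1]
print(detect_updates({1: 10, 2: 5}, {1: 12, 3: 7}))  # ({3}, {2}, {1})
```

The lookup table is mutated in place, so surrogates stay stable across successive loads as long as the same table is reused.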
-
Methodology
Step 1
Identification of the proper data stores
Step 2
Candidates and active candidates for the involved data stores
Step 3
Attribute mapping between the providers and the consumers
Step 4
Annotating the diagram with runtime constraints
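The four steps can be traced on the running PARTSUPP example with a small Python check: after Step 3 maps provider attributes to DW attributes and Step 4 adds annotations, every DW attribute should be populated by one or the other. The dictionaries, the choice of S1.PARTSUPP as the only active provider, and the `uncovered` helper are hypothetical.

```python
from typing import Dict, Set

# Step 1: the target data store and its attributes
dw_partsupp: Set[str] = {"PKEY", "SUPPKEY", "DATE", "QTY", "COST"}

# Step 2: the active provider chosen among the candidates (assumed here)
active_provider = "S1.PARTSUPP"

# Step 3: attribute mapping, provider attribute -> consumer (DW) attribute
mapping: Dict[str, str] = {"PKEY": "PKEY", "SUPPKEY": "SUPPKEY", "QTY": "QTY", "COST": "COST"}

# Step 4: annotations that supply or constrain values at runtime
notes: Dict[str, str] = {"DATE": "Date = SysDate()"}


def uncovered(target: Set[str], mapping: Dict[str, str], notes: Dict[str, str]) -> Set[str]:
    """DW attributes populated neither by a provider mapping nor by a note."""
    return target - set(mapping.values()) - set(notes)


print(uncovered(dw_partsupp, mapping, notes))  # set()
print(uncovered(dw_partsupp, mapping, {}))  # {'DATE'}
```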