bi session 10 2 lecture-06-conceptuall-model

Upload: vaibhav-gupta

Post on 14-Apr-2018


  • 7/27/2019 BI Session 10 2 Lecture-06-CONCEPTUALl-Model

    1/27

    Data Integration

    Dr. N. P. Singh

    Professor

    Management Development Institute

    Mehrauli Road, Sukhrali

    Gurgaon - 122001

    E-mail: [email protected]

    2/27

    Extract-Transform-Load (ETL)

    [Figure: ETL flow between Sources, DSA, and DW, with the steps Extract, Transform & Clean, and Load]

    3/27

    Extract-Transform-Load (ETL)

    Sources --(Extract)--> DSA (Transform & Clean) --(Load)--> DW

    4/27

    The lifecycle of a Data Warehouse and its ETL processes

    [Figure: lifecycle stages and the artifact each stage produces]

      Reverse Engineering of Sources & Requirements Collection -> Conceptual Model for DW, Sources & Activities

      Logical Design -> Logical Model for DW, Sources & Activities

      Tuning, Full Activity Description -> Physical Model for DW, Sources & Activities

      Software Construction -> Software & SW Metrics

      Administration of DW -> Metrics

    5/27

    Conceptual Model

    Entities of our model:

    Concepts

    Attributes

    Part-of Relationships

    Transformations

    Serial Composition of Transformations

    Provider Relationships

    Notes

    ETL Constraints

    Candidate Relationships

    6/27

    Conceptual Model

    [Figure: notation legend for concept, attribute, part of, provider 1:1, provider N:M, transformation, serial composition, ETL_constraint, Note, target, candidate 1 ... candidate n under {XOR}, and active candidate]

    7/27

    Conceptual Model

    Concepts

      a name and a finite set of attributes

      represent an entity in the source database or in the DW

    Attributes

      same role as in ER/dimensional models

      a granular module of information

    [Figure: notation for attribute and concept]

    We do not employ standard UML notation for concepts and attributes, because we need to treat attributes as first-class citizens of our model

    8/27

    Conceptual Model

    Part-of Relationships

      finite set of attributes

      emphasize the fact that a concept is composed of a set of attributes

    [Figure: notation for the part-of relationship]

    9/27

    Conceptual Model

    Example

    Source 1

    S1.PARTSUPP {PKEY, SUPPKEY, QTY, COST}

    Data Warehouse

    DW.PARTSUPP {PKEY, SUPPKEY, DATE, QTY, COST}

    10/27

    Conceptual Model

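    [Figure: concept S1.PARTSUPP with attributes PKey, SuppKey, Qty, Cost, and concept DW.PARTSUPP with attributes PKey, SuppKey, Date, Qty, Cost, each attribute linked to its concept by a part-of relationship]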
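One way to picture the concepts, attributes, and part-of relationships of this example is the minimal Python sketch below. The class names `Attribute` and `Concept` and the helper `add` are hypothetical and chosen only for illustration; the slides do not prescribe any implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """A granular module of information, kept as a first-class object."""
    name: str


@dataclass
class Concept:
    """A named entity of a source database or of the DW."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)

    def add(self, *names: str) -> "Concept":
        # Each added attribute stands for one part-of relationship.
        self.attributes.extend(Attribute(n) for n in names)
        return self

    def attribute_names(self) -> list[str]:
        return [a.name for a in self.attributes]


s1_partsupp = Concept("S1.PARTSUPP").add("PKEY", "SUPPKEY", "QTY", "COST")
dw_partsupp = Concept("DW.PARTSUPP").add("PKEY", "SUPPKEY", "DATE", "QTY", "COST")

# Attributes of the DW concept that the source concept cannot provide directly.
missing = set(dw_partsupp.attribute_names()) - set(s1_partsupp.attribute_names())
print(missing)  # {'DATE'}
```

The set difference shows why the DW side needs one attribute (DATE) that must come from somewhere other than S1.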

    11/27

    Conceptual Model

    Transformations

      a finite set of input/output attributes, a symbol

      abstractions that represent parts, or full modules, of code executing a single task

    [Figure: notation for a transformation]

      two categories:

        filtering or data cleaning operations (e.g., foreign key violations)

        transformation operations (e.g., aggregation)
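The two categories can be illustrated with a small Python sketch: one filter that rejects foreign key violations and one transformation that aggregates. The sample rows, the `known_suppliers` reference set, and the function names are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names follow the PARTSUPP example.
rows = [
    {"PKEY": 1, "SUPPKEY": 10, "QTY": 5, "COST": 2.0},
    {"PKEY": 1, "SUPPKEY": 10, "QTY": 3, "COST": 1.5},
    {"PKEY": 2, "SUPPKEY": 99, "QTY": 7, "COST": 4.0},  # supplier 99 is unknown
]
known_suppliers = {10, 20}  # assumed reference data for the foreign key check


def fk_filter(rows, column, valid_keys):
    """Filtering/cleaning: split rows into accepted rows and FK violations."""
    accepted = [r for r in rows if r[column] in valid_keys]
    rejected = [r for r in rows if r[column] not in valid_keys]
    return accepted, rejected


def aggregate(rows, group_by, sums):
    """Transformation: group rows and sum the given columns."""
    totals = defaultdict(lambda: dict.fromkeys(sums, 0))
    for r in rows:
        key = tuple(r[c] for c in group_by)
        for c in sums:
            totals[key][c] += r[c]
    return [dict(zip(group_by, key), **vals) for key, vals in totals.items()]


clean, rejected = fk_filter(rows, "SUPPKEY", known_suppliers)
summary = aggregate(clean, group_by=["PKEY", "SUPPKEY"], sums=["QTY", "COST"])
print(summary)  # [{'PKEY': 1, 'SUPPKEY': 10, 'QTY': 8, 'COST': 3.5}]
```

The filter only decides which rows pass; the aggregation changes the shape of the data. That difference is what separates the two categories.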

    12/27

    Conceptual Model

    Provider Relationships

      a finite set of input/output attributes, an appropriate transformation

      map a set of input attributes to a set of output attributes through a relevant transformation*

    [Figure: notation for provider 1:1 and provider N:M]

    * If the attributes are semantically and physically compatible, no transformation is required

    13/27

    Conceptual Model

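    [Figure: provider relationships from the attributes of S1.PARTSUPP (PKey, SuppKey, Qty, Cost) to the attributes of DW.PARTSUPP (PKey, SuppKey, Date, Qty, Cost), passing through the transformations f, SK, and NN]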

    14/27

    Conceptual Model

    Notes

      informal tags, exactly as in UML modeling

      used for:

        simple comments explaining design decisions

        explanation of the semantics of the applied transformation

        tracing of runtime constraints

    [Figure: notation for a Note]

    15/27

    Conceptual Model

    [Figure: the S1.PARTSUPP to DW.PARTSUPP mapping with transformations f, SK, and NN, now annotated with the note "Date = SysDate()"]
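A rough Python sketch of how the provider relationships in this diagram might be executed is given below. Which attribute each transformation touches is not recoverable from the transcript, so applying NN to COST and SK to PKEY is an assumption, as are the `surrogate_keys` lookup table and the function name `populate_dw`.

```python
from datetime import date

# Assumed lookup table mapping production keys to DW surrogate keys.
surrogate_keys = {"P-100": 1, "P-200": 2}


def populate_dw(source_rows, today=None):
    """Map S1.PARTSUPP rows to DW.PARTSUPP rows through NN, SK, and f."""
    today = today or date.today()
    dw_rows = []
    for row in source_rows:
        if row["COST"] is None:  # NN: reject rows with a null COST (assumed target)
            continue
        dw_rows.append({
            "PKEY": surrogate_keys[row["PKEY"]],  # SK: surrogate key assignment
            "SUPPKEY": row["SUPPKEY"],  # compatible attributes, no transformation
            "DATE": today,  # f, explained by the note Date = SysDate()
            "QTY": row["QTY"],
            "COST": row["COST"],
        })
    return dw_rows


s1_rows = [
    {"PKEY": "P-100", "SUPPKEY": 10, "QTY": 5, "COST": 2.0},
    {"PKEY": "P-200", "SUPPKEY": 20, "QTY": 1, "COST": None},
]
loaded = populate_dw(s1_rows, today=date(2019, 7, 27))
print(loaded)
```

Passing `today` explicitly keeps the sketch deterministic; in a real load the value would come from the system clock, as the note suggests.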

    16/27

    Conceptual Model

    ETL Constraints

      a finite set of attributes, a single transformation

      express the fact that the data of a certain concept must fulfill several requirements

    [Figure: notation for an ETL_constraint]

    17/27

    Conceptual Model

    [Figure: the S1.PARTSUPP to DW.PARTSUPP mapping with transformations f, SK, and NN and the note "Date = SysDate()", now extended with the ETL constraint PK on DW.PARTSUPP]
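As an illustration of what a PK constraint on the target concept asks for, the sketch below reports key values that occur more than once. The function name `pk_violations` and the choice of (PKEY, SUPPKEY, DATE) as key columns are assumptions for this example; the transcript does not show which attributes the constraint covers.

```python
from collections import Counter


def pk_violations(rows, key_columns):
    """Return the key values that occur more than once in rows."""
    counts = Counter(tuple(r[c] for c in key_columns) for r in rows)
    return [key for key, n in counts.items() if n > 1]


dw_rows = [
    {"PKEY": 1, "SUPPKEY": 10, "DATE": "2019-07-27", "QTY": 5, "COST": 2.0},
    {"PKEY": 1, "SUPPKEY": 10, "DATE": "2019-07-27", "QTY": 9, "COST": 3.0},
    {"PKEY": 2, "SUPPKEY": 10, "DATE": "2019-07-27", "QTY": 4, "COST": 1.0},
]

# The choice of key columns is an assumption made for this example.
print(pk_violations(dw_rows, ["PKEY", "SUPPKEY", "DATE"]))
# [(1, 10, '2019-07-27')]
```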

    18/27

    Conceptual Model

    Candidate Relationships

      a single candidate concept, a single target concept

      used when a certain DW concept can be populated by more than one candidate source concept

    Active Candidate Relationship

      a certain candidate that has been selected for the population of the target concept

      a specialization of candidate relationships

    [Figure: notation for target, candidate 1 ... candidate n under {XOR}, and active candidate]

    19/27

    Conceptual Model

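    [Figure: candidate concepts AnnualPartSupps and RecentPartSupps under an {XOR} constraint; S1.PartSupp and S2.PartSupp combined through a union (U) into DW.PartSupp]

    Notes in the figure:

      Necessary providers: S1 and S2

      Due to accuracy and small size (< update window)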
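The {XOR} constraint among candidates can be sketched in Python as "exactly one candidate is active at a time". The class `Candidate`, the function `select_active`, and the decision to activate RecentPartSupps are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    active: bool = False


def select_active(candidates, name):
    """Mark exactly one candidate as active, enforcing the {XOR} constraint."""
    if name not in {c.name for c in candidates}:
        raise ValueError(f"unknown candidate: {name}")
    for c in candidates:
        c.active = (c.name == name)
    return next(c for c in candidates if c.active)


candidates = [Candidate("AnnualPartSupps"), Candidate("RecentPartSupps")]
# Which candidate is selected is an assumption made for this example.
active = select_active(candidates, "RecentPartSupps")
print(active.name, sum(c.active for c in candidates))  # RecentPartSupps 1
```

Selecting a different candidate later deactivates the previous one, so the constraint holds after every call.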

    20/27

    Conceptual Model

    [Figure: full conceptual model of the example]

    Concepts:

      S1.PARTSUPP {PKey, SuppKey, Qty, Cost}

      S2.PARTSUPP {PKey, SuppKey, Date, Qty, Cost, Department}

      DW.PARTSUPP {PKey, SuppKey, Date, Qty, Cost}

    Transformations and constraints: American to European Date, $2, f, SK, NN, PK

    Aggregation labels: S2.PKey, S2.SuppKey, S2.Date, SUM(S2.Qty), SUM(S2.Cost)

    Candidates: AnnualPartSupps, RecentPartSupps {XOR}

    Notes in the figure:

      Date = SysDate()

      Necessary providers: S1 and S2

      Due to accuracy and small size (< update window)

      {Duration
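Two of the S2 transformations named in this diagram can be sketched in Python as follows. The date formats (MM/DD/YYYY to DD/MM/YYYY), the grouping by (PKey, SuppKey, Date), and the function names are assumptions; the transcript only preserves the labels of the figure.

```python
from collections import defaultdict
from datetime import datetime


def american_to_european(date_text):
    """Convert MM/DD/YYYY to DD/MM/YYYY (formats assumed for this sketch)."""
    return datetime.strptime(date_text, "%m/%d/%Y").strftime("%d/%m/%Y")


def aggregate_s2(rows):
    """Sum Qty and Cost per (PKey, SuppKey, Date), dropping Department."""
    totals = defaultdict(lambda: {"QTY": 0, "COST": 0.0})
    for r in rows:
        key = (r["PKEY"], r["SUPPKEY"], american_to_european(r["DATE"]))
        totals[key]["QTY"] += r["QTY"]
        totals[key]["COST"] += r["COST"]
    return dict(totals)


s2_rows = [
    {"PKEY": 1, "SUPPKEY": 10, "DATE": "07/27/2019", "DEPARTMENT": "A", "QTY": 5, "COST": 2.0},
    {"PKEY": 1, "SUPPKEY": 10, "DATE": "07/27/2019", "DEPARTMENT": "B", "QTY": 3, "COST": 1.5},
]
print(aggregate_s2(s2_rows))
# {(1, 10, '27/07/2019'): {'QTY': 8, 'COST': 3.5}}
```

Aggregating over Department is one plausible reason the S2 data needs SUM at all: S2 carries an attribute that DW.PARTSUPP does not.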

    21/27

    Conceptual Model: first attempts

    [Figure: the same conceptual model in an earlier draft, with S2.PARTSUPP shown as {PKey, SuppKey, Date, Qty, Cost}, the transformations American to European Date, $, f, SK, NN, PK, the aggregation labels SUM(S2.Cost) and SUM(S2.Qty), and the candidates AnnualPartSupps and RecentPartSupps under {XOR}]

    22/27

    Instantiation & Specialization Layers

    The key issues:

      genericity

        identification of a small set of generic constructs to capture all cases

      usability

        construction of a palette of frequently used types

    23/27

    Instantiation & Specialization Layers

    Metamodel layer

      a set of generic entities, able to represent any ETL scenario

      involves classes: Concept, Attribute, Transformation, ETL Constraint and Relationship

    Template layer

      a set of built-in specializations of the entities of the Metamodel layer, specifically tailored for the most frequent elements of ETL scenarios

    Schema layer

      a specific ETL scenario

      all the entities of the Schema layer are instances of the classes of the Metamodel layer
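The three layers map naturally onto subclassing (IsA) and instantiation (InstanceOf). The Python sketch below is one possible reading, not the authors' implementation; treating DW.PartSupp as a fact table is an assumption made for the example.

```python
class Concept:
    """Metamodel layer: generic class for any source or DW entity."""
    def __init__(self, name):
        self.name = name


class Transformation:
    """Metamodel layer: generic class for any transformation."""
    def __init__(self, name):
        self.name = name


# Template layer: built-in specializations (IsA) of the metamodel classes.
class FactTable(Concept):
    pass


class SurrogateKeyAssignment(Transformation):
    pass


# Schema layer: a specific scenario whose entities are instances (InstanceOf).
dw_partsupp = FactTable("DW.PartSupp")  # assumed to be a fact table
sk = SurrogateKeyAssignment("SK")

print(isinstance(dw_partsupp, Concept), isinstance(sk, Transformation))  # True True
```

Because template classes inherit from metamodel classes, every schema-layer object is also an instance of a metamodel class, which matches the last statement of the slide.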

    24/27

    Instantiation & Specialization Layers

    [Figure: the Metamodel, Template, and Schema layers connected by IsA and InstanceOf links]

    Metamodel Layer: Concept, Attribute, Transformation, ETL_Constraint, Relationship

    Concept specializations: Fact Table, Dimension, ER Entity, ER Relationship

    Transformation specializations: American to European Date, $2, Surrogate Key Assignment, Aggregation

    Relationship specializations: Provider, Candidate, Part Of, Serial Composition

    Schema Layer: S2.PartSupp, DW.PartSupp, Candidate 1, Candidate 2, SK, f

    25/27

    Instantiation & Specialization Layers

    Template layer

      Four groups of logical transformations

        Filters

        Unary transformations

        Binary transformations

        Composite transformations

      Two groups of physical transformations

        Transfer operations

        File operations

    26/27

    Instantiation & Specialization Layers

    Filters

      Selection (σ)
      Not null (NN)
      Primary key violation (PK)
      Foreign key violation (FK)
      Unique value (UN)
      Domain mismatch (DM)

    Unary transformations

      Push
      Aggregation (γ)
      Projection (π)
      Function application (f)
      Surrogate key assignment (SK)
      Tuple normalization (N)
      Tuple denormalization (DN)

    Binary transformations

      Union (U)
      Join (⋈)
      Diff (Δ)
      Update Detection (UPD)

    Composite transformations

      Slowly changing dimension (Type 1, 2, 3) (SDC-1/2/3)
      Format mismatch (FM)
      Data type conversion (DTC)
      Switch (σ*)
      Extended union (U)

    File operations

      EBCDIC to ASCII conversion (EB2AS)
      Sort file (Sort)

    Transfer operations

      Ftp (FTP)
      Compress/Decompress (Z/dZ)
      Encrypt/Decrypt (Cr/dCr)
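To make two of the binary transformations more concrete, here is a hedged Python sketch of Diff and Update Detection between an old and a new snapshot. The slides give only the names, so the exact semantics below (Diff returns rows with new keys, Update Detection returns rows whose key exists but whose values changed) are an assumption, as are the function names.

```python
def diff(new_rows, old_rows, key):
    """Diff: rows of new_rows whose key does not appear in old_rows."""
    old_keys = {tuple(r[c] for c in key) for r in old_rows}
    return [r for r in new_rows if tuple(r[c] for c in key) not in old_keys]


def update_detection(new_rows, old_rows, key):
    """UPD: rows whose key exists in old_rows but whose other values changed."""
    old_by_key = {tuple(r[c] for c in key): r for r in old_rows}
    changed = []
    for r in new_rows:
        previous = old_by_key.get(tuple(r[c] for c in key))
        if previous is not None and previous != r:
            changed.append(r)
    return changed


old = [{"PKEY": 1, "QTY": 5}, {"PKEY": 2, "QTY": 7}]
new = [{"PKEY": 1, "QTY": 5}, {"PKEY": 2, "QTY": 9}, {"PKEY": 3, "QTY": 1}]

print(diff(new, old, ["PKEY"]))              # [{'PKEY': 3, 'QTY': 1}]
print(update_detection(new, old, ["PKEY"]))  # [{'PKEY': 2, 'QTY': 9}]
```

Both functions take two inputs, which is what places them in the binary group rather than among the filters or unary transformations.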

    27/27

    Methodology

    Step 1

      Identification of the proper data stores

    Step 2

      Candidates and active candidates for the involved data stores

    Step 3

      Attribute mapping between the providers and the consumers

    Step 4

      Annotating the diagram with runtime constraints