planning & project management

Upload: fecaxeyivu

Post on 03-Jun-2018


  • 8/11/2019 Planning & Project Management

    Prof. Chandan Singhavi

    Requirements gathering

    Requirements definition document (with information packages)

    Data design

    Dimensional model

    Choosing the process: Selecting the subjects from the information packages for the first set of logical structures to be designed

    Choosing the grain: Determining the level of detail for the data in the data structures

    Identifying and conforming dimensions: Making sure that each particular data element in every business dimension is conformed, that is, defined consistently across all the structures

    Choosing the facts: Selecting the metrics or units of measure (e.g., product sales units, dollar sales, dollar revenue) to be included in the first set of structures

    Choosing the duration of the database: Determining how far back in time you should go for historical data

    Dimensional modeling is a logical design technique to structure the business dimensions and the metrics that are analyzed along these dimensions.

    It gets its name from the business dimensions.

    The model has also proved to provide high performance for queries and analysis.

    The information package is the foundation of the dimensional model.

    Reviewing the information package diagram, we notice three types of entities:

    Measurements or metrics

    Business dimensions

    Attributes for each business dimension

    The dimension tables represent the business dimensions; the facts are the metrics used for the analysis.

    The attributes in the dimension tables act as filters in queries.

    Each dimension table has an equal chance of being used in a query.

    Each dimension table has a direct relationship with the fact table in the middle.

    Each dimension table has a one-to-many relationship with the fact table. Such an organization looks like a STAR.

    Dimensional modeling should primarily facilitate queries and analysis.

    A typical query could be:

    How much in sales proceeds did the Jeep Cherokee, year 2000 model with standard options, generate in January 2000 at Big Sam Auto Dealership for buyers who own their homes and who took three-year leases financed by DaimlerChrysler Financing?
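
    To make the query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical, invented only for illustration: each business dimension in the question becomes a dimension table that constrains the query, and the fact table supplies the measure being summed.

```python
import sqlite3

def build_demo() -> sqlite3.Connection:
    """Create an in-memory star schema with a few hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product  (product_key INTEGER PRIMARY KEY, model TEXT, model_year INTEGER, options TEXT);
        CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
        CREATE TABLE dealer   (dealer_key INTEGER PRIMARY KEY, dealer_name TEXT);
        CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, owns_home INTEGER);
        CREATE TABLE payment  (payment_key INTEGER PRIMARY KEY, lease_years INTEGER, finance_company TEXT);
        CREATE TABLE auto_sales (
            product_key INTEGER, time_key INTEGER, dealer_key INTEGER,
            customer_key INTEGER, payment_key INTEGER, sale_amount REAL);
    """)
    conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?)",
                     [(1, "Jeep Cherokee", 2000, "standard"), (2, "Jeep Cherokee", 2000, "luxury")])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [(1, 1, 2000), (2, 2, 2000)])
    conn.executemany("INSERT INTO dealer VALUES (?, ?)",
                     [(1, "Big Sam Auto Dealership"), (2, "Other Dealer")])
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
    conn.executemany("INSERT INTO payment VALUES (?, ?, ?)",
                     [(1, 3, "DaimlerChrysler Financing"), (2, 5, "Other Bank")])
    conn.executemany("INSERT INTO auto_sales VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 1, 30000.0),  # matches every constraint
        (1, 1, 1, 3, 1, 32000.0),  # matches every constraint
        (1, 1, 1, 2, 1, 28000.0),  # buyer does not own a home
        (2, 1, 1, 1, 1, 35000.0),  # non-standard options
        (1, 2, 1, 1, 1, 31000.0),  # February, not January
        (1, 1, 2, 1, 1, 29000.0),  # different dealer
    ])
    return conn

# Every dimension table filters the rows; the fact table supplies the measure.
STAR_QUERY = """
    SELECT SUM(f.sale_amount)
    FROM auto_sales f
    JOIN product  p  ON f.product_key  = p.product_key
    JOIN time_dim t  ON f.time_key     = t.time_key
    JOIN dealer   d  ON f.dealer_key   = d.dealer_key
    JOIN customer c  ON f.customer_key = c.customer_key
    JOIN payment  pm ON f.payment_key  = pm.payment_key
    WHERE p.model = 'Jeep Cherokee' AND p.model_year = 2000 AND p.options = 'standard'
      AND t.month = 1 AND t.year = 2000
      AND d.dealer_name = 'Big Sam Auto Dealership'
      AND c.owns_home = 1
      AND pm.lease_years = 3 AND pm.finance_company = 'DaimlerChrysler Financing'
"""

def sales_proceeds(conn: sqlite3.Connection) -> float:
    """Run the star-join query and return the summed sale amount."""
    return conn.execute(STAR_QUERY).fetchone()[0]

print(sales_proceeds(build_demo()))  # 62000.0
```

    Notice that every constraint in the English question maps to a filter on exactly one dimension table, which is what makes the star layout query-centric.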

    Some criteria for combining the tables into a dimensional model:

    The model should provide the best data access.

    It must be query-centric, optimized for queries and analysis.

    It must show how the dimension tables interact with the fact table.

    It should be structured so that every dimension interacts equally with the fact table.

    It should allow drilling down or rolling up along dimension hierarchies.

    Definition: A simple database design in which dimensional data are separated from fact or event data (which describe individual business transactions).

    Also known as a dimensional model. Suitable for ad hoc queries. The simplest star schema consists of one fact table surrounded by several dimension tables.

    Fact table: Contains factual or quantitative data about a business, such as units sold or orders booked. The primary key of the fact table is a composite of the foreign keys that reference the dimension tables.
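
    A minimal sketch of this structure in SQLite, via Python's sqlite3 module, assuming a hypothetical retail schema (product, store, date_dim, sales_fact are illustrative names). It checks that the fact table's primary key is exactly the set of its foreign keys, and shows that the composite key enforces the grain of one row per product, store, and day.

```python
import sqlite3

# Hypothetical retail star schema: three dimension tables around one fact table.
STAR_DDL = """
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, product_name TEXT, brand TEXT);
CREATE TABLE store    (store_key   INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE date_dim (date_key    INTEGER PRIMARY KEY, full_date TEXT, month INTEGER, year INTEGER);
-- The fact table holds the measures plus one foreign key per dimension;
-- its primary key is the composite of those foreign keys.
CREATE TABLE sales_fact (
    product_key  INTEGER NOT NULL REFERENCES product(product_key),
    store_key    INTEGER NOT NULL REFERENCES store(store_key),
    date_key     INTEGER NOT NULL REFERENCES date_dim(date_key),
    units_sold   INTEGER,
    dollar_sales REAL,
    PRIMARY KEY (product_key, store_key, date_key)
);
"""

def key_columns(conn, table):
    """Return (primary-key columns in key order, foreign-key columns) for a table."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Column 5 of table_info is the 1-based position in the primary key (0 if not a key column).
    pk = [row[1] for row in sorted(info, key=lambda r: r[5]) if row[5] > 0]
    # Column 3 of foreign_key_list is the referencing ("from") column.
    fk = [row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
    return pk, fk

conn = sqlite3.connect(":memory:")
conn.executescript(STAR_DDL)
pk, fk = key_columns(conn, "sales_fact")
print(pk)                        # ['product_key', 'store_key', 'date_key']
print(sorted(fk) == sorted(pk))  # True

# SQLite does not enforce foreign keys unless PRAGMA foreign_keys = ON,
# so these demo fact rows need no matching dimension rows.
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 10, 25.0)")
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 5, 12.5)")
except sqlite3.IntegrityError:
    print("duplicate key combination rejected")
```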

    A key component of the dimensional model is the set of dimension tables.

    A STAR schema is a relational model, but it is not a normalized model. Its advantages:

    Easy for users to understand

    Optimizes navigation

    Most suitable for query processing

    Enables STARjoin and STARindex

    Over time, the fact table keeps growing; its changes come as new records or as updates.

    Dimension tables are more stable and less volatile.

    Slowly changing dimensions

    Type 1 changes: correction of errors

    Type 2 changes: preservation of history

    Type 3 changes: tentative soft revisions

    Most dimensions are generally constant over time; when they do change, they change slowly.

    The product key of the source record does not change.

    The description and other attributes change slowly over time.

    Overwriting the old values is not always appropriate.

    Type 1 changes, principles: The changes relate to the correction of errors.

    The change in the source system has no significance.

    The old values need not be preserved in the data warehouse.
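
    A minimal sketch of a type 1 change, assuming a hypothetical in-memory customer dimension (a list of dicts with a surrogate key, the source key, and descriptive attributes). The attribute is simply overwritten in place, so the old value is gone and no new row is created.

```python
# Hypothetical customer dimension rows: a surrogate key, the source (natural) key,
# and descriptive attributes.
customer_dim = [
    {"customer_key": 1, "customer_id": "C100", "name": "Jon Smiht", "state": "NY"},
]

def apply_type1(dimension, natural_key, attribute, new_value):
    """Type 1 change: overwrite the attribute in place, so no history is kept."""
    for row in dimension:
        if row["customer_id"] == natural_key:
            row[attribute] = new_value

apply_type1(customer_dim, "C100", "name", "Jon Smith")  # correct a data-entry error
print(customer_dim)
# [{'customer_key': 1, 'customer_id': 'C100', 'name': 'Jon Smith', 'state': 'NY'}]
```

    Because the surrogate key does not change, existing fact rows keep pointing at the corrected row.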

    Type 2 changes, principles: They relate to true changes in the source system, so there is a need to preserve history in the data warehouse.

    This type of change partitions the history in the data warehouse.

    Every change for the same attribute must be preserved.
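
    A minimal sketch of a type 2 change on a hypothetical customer dimension. The start_date, end_date, and is_current columns are an assumption (a common way to mark which row is valid when), not something stated on the slides. The current row is expired and a new row with a new surrogate key is appended, which is how the history gets partitioned.

```python
from datetime import date

# Hypothetical customer dimension; start_date, end_date and is_current are
# assumed bookkeeping columns marking which row is valid when.
customer_dim = [
    {"customer_key": 1, "customer_id": "C100", "state": "NY",
     "start_date": date(2018, 1, 1), "end_date": None, "is_current": True},
]

def apply_type2(dimension, natural_key, attribute, new_value, change_date):
    """Type 2 change: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in dimension
                   if r["customer_id"] == natural_key and r["is_current"])
    current["end_date"] = change_date
    current["is_current"] = False
    new_row = dict(current)  # copy the remaining attributes unchanged
    new_row.update({
        attribute: new_value,
        "customer_key": max(r["customer_key"] for r in dimension) + 1,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })
    dimension.append(new_row)
    return new_row["customer_key"]

new_key = apply_type2(customer_dim, "C100", "state", "CA", date(2019, 3, 1))
print(new_key)                             # 2
print([r["state"] for r in customer_dim])  # ['NY', 'CA']
```

    Fact rows loaded before the change keep customer_key 1 and later ones use customer_key 2, so analysis by state stays correct on both sides of the change.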

    Type 3 changes, principles: They usually relate to soft or tentative changes in the source system.

    There is a need to keep track of history with the old and new values of the changed attribute.

    They are used to compare performance across the transition.

    They provide the ability to track forward and backward.
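
    A minimal sketch of a type 3 change, assuming a hypothetical salesperson dimension row with "current" and "previous" columns for the territory attribute plus an effective-date column (all names are illustrative). No row is added; the old value moves into the "previous" column so results can be compared under either assignment.

```python
from datetime import date

# Hypothetical salesperson dimension row with "current" and "previous"
# columns for the attribute whose change we want to compare across.
salesperson = {
    "salesperson_key": 7,
    "name": "R. Kumar",
    "current_territory": "North",
    "previous_territory": None,
    "territory_effective_date": date(2017, 4, 1),
}

def apply_type3(row, attribute, new_value, effective_date):
    """Type 3 change: push the current value into the 'previous_' column, then overwrite.

    Only one prior value survives; a second change drops the oldest one.
    """
    row[f"previous_{attribute}"] = row[f"current_{attribute}"]
    row[f"current_{attribute}"] = new_value
    row[f"{attribute}_effective_date"] = effective_date

apply_type3(salesperson, "territory", "West", date(2019, 1, 1))
print(salesperson["current_territory"], salesperson["previous_territory"])  # West North
```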

    Large dimensions, multiple hierarchies

    Rapidly changing dimensions

    Junk dimensions

    Large dimensions are very deep (many rows) or very wide (many attributes)

    Examples: Customer, Product

    The following issues need to be addressed by using effective design methods, choosing proper indexes, and applying other optimization techniques:

    Population of very large dimension tables

    Browse performance of unconstrained dimensions, especially where the cardinality of the attributes is low

    Browsing time for cross-constrained values of dimension attributes

    Inefficiencies in fact table queries when large dimensions need to be used

    Additional rows created to handle type 2 slowly changing dimensions

    When type 2 changes are applied to rapidly changing attributes, the dimension table could be littered with a very large number of additional rows created every time there is an incremental load.

    An effective approach is to break the large dimension table into one or more simpler dimension tables.
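
    One common way to do the split is what Kimball calls a "mini-dimension": the rapidly changing attributes are banded and moved into a small separate dimension, and each fact row carries both the stable customer key and the mini-dimension key in effect at the time. The sketch below assumes hypothetical age and income band boundaries chosen only for illustration. Small changes that stay within a band create no new rows, and the mini-dimension grows with the number of band combinations rather than with customers times changes.

```python
# Hypothetical band boundaries, chosen only for illustration.
def age_band(age):
    return "<35" if age < 35 else ("35-54" if age < 55 else "55+")

def income_band(income):
    if income < 30_000:
        return "<30K"
    return "30K-60K" if income < 60_000 else "60K+"

class MiniDimension:
    """Holds one row (surrogate key) per distinct combination of banded attributes."""

    def __init__(self):
        self.rows = {}  # (age band, income band) -> mini-dimension key

    def key_for(self, age, income):
        combo = (age_band(age), income_band(income))
        if combo not in self.rows:
            self.rows[combo] = len(self.rows) + 1
        return self.rows[combo]

demographics = MiniDimension()
# The same customer observed at three incremental loads.
keys = [demographics.key_for(30, 45_000),   # first load
        demographics.key_for(31, 52_000),   # same bands: same key, no new row
        demographics.key_for(31, 70_000)]   # income band changed: new key
print(keys)                    # [1, 1, 2]
print(len(demographics.rows))  # 2
```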

    Miscellaneous flags and textual fields

    Choices:

    Exclude and discard all flags and texts.

    Place the flags and texts unchanged in the fact table.

    Make each flag and text a separate dimension table on its own.

    Keep only those flags and texts that are meaningful; group all the useful flags into a single "junk" dimension.

    These junk dimension attributes are useful for constraining queries based on flag/text values.
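
    A minimal sketch of the last choice, assuming three hypothetical low-cardinality flags. The junk dimension gets one row per combination of flag values (here 2 × 2 × 2 = 8 rows), and each fact row then stores a single junk key instead of three separate flag columns.

```python
from itertools import product

# Hypothetical miscellaneous flags left over after the main dimensions are modeled.
FLAGS = {
    "payment_type": ["cash", "credit"],
    "gift_wrap": ["Y", "N"],
    "promo_applied": ["Y", "N"],
}

def build_junk_dimension(flags):
    """Return a mapping from each combination of flag values to its junk dimension row."""
    names = list(flags)
    junk = {}
    for key, values in enumerate(product(*flags.values()), start=1):
        junk[values] = {"junk_key": key, **dict(zip(names, values))}
    return junk

JUNK_DIM = build_junk_dimension(FLAGS)

def junk_key_for(record):
    """Look up the single junk key that replaces the flag columns in a fact row."""
    return JUNK_DIM[tuple(record[name] for name in FLAGS)]["junk_key"]

print(len(JUNK_DIM))  # 8
print(junk_key_for({"payment_type": "credit", "gift_wrap": "N", "promo_applied": "N"}))  # 8
```

    Pre-building every combination keeps the table tiny and stable; an alternative is to add only the combinations that actually occur in the source data.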

    Options to normalize

    Advantages and disadvantages

    When to snowflake

    Snowflaking is a method of normalizing the dimension tables in a STAR schema.

    When you completely normalize all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle.
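
    A minimal sketch of snowflaking one dimension, assuming a hypothetical flat product dimension with product, brand, and category attributes. The brand and category values move into their own tables, so each is stored once and referenced by key. This is the small storage saving, paid for with extra joins when a query needs the category of a product.

```python
def snowflake_product(product_rows):
    """Normalize a flat product dimension into product -> brand -> category tables."""
    category_keys, brand_keys, products = {}, {}, []
    for row in product_rows:
        # setdefault returns the existing key, or stores and returns the next new one.
        category_key = category_keys.setdefault(row["category"], len(category_keys) + 1)
        brand_key = brand_keys.setdefault((row["brand"], category_key), len(brand_keys) + 1)
        products.append({"product_key": row["product_key"],
                         "product_name": row["product_name"],
                         "brand_key": brand_key})
    brands = [{"brand_key": k, "brand": b, "category_key": c}
              for (b, c), k in brand_keys.items()]
    categories = [{"category_key": k, "category": name}
                  for name, k in category_keys.items()]
    return products, brands, categories

# Hypothetical flat (star-style) product dimension.
flat_product_dim = [
    {"product_key": 1, "product_name": "Cola 1L", "brand": "Fizz", "category": "Beverages"},
    {"product_key": 2, "product_name": "Cola 2L", "brand": "Fizz", "category": "Beverages"},
    {"product_key": 3, "product_name": "Lemonade", "brand": "Zest", "category": "Beverages"},
    {"product_key": 4, "product_name": "Chips", "brand": "Crunch", "category": "Snacks"},
]
products, brands, categories = snowflake_product(flat_product_dim)
print(len(products), len(brands), len(categories))  # 4 3 2
```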

    Advantages:

    Small saving in storage space

    Normalized structures are easier to update and maintain

    Disadvantages:

    The schema is less intuitive, and end users are put off by the complexity

    Browsing through the contents is difficult

    Degraded query performance because of the additional joins

    Snowflaking is not generally recommended in a data warehouse environment, where query performance takes the highest priority.

    space Sub dimension

    Aggregate fact tables give a tremendous boost to performance.
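
    A common source of such a boost is an aggregate fact table: base facts are summarized once, ahead of time, to a coarser grain, so queries at that grain scan far fewer rows. A minimal sketch, assuming hypothetical daily product-level facts and a hypothetical product-to-brand lookup, rolling up to a (month, brand) aggregate:

```python
from collections import defaultdict

# Hypothetical base fact rows at the grain of (day, product): (date, product, units sold).
base_facts = [
    ("2019-01-05", "Cola 1L", 10),
    ("2019-01-05", "Cola 2L", 4),
    ("2019-01-20", "Cola 1L", 6),
    ("2019-01-20", "Chips", 3),
    ("2019-02-02", "Chips", 5),
]
# Hypothetical product -> brand lookup taken from the product dimension.
BRAND_OF = {"Cola 1L": "Fizz", "Cola 2L": "Fizz", "Chips": "Crunch"}

def aggregate_month_brand(facts, brand_of):
    """Roll base facts up to a (month, brand) aggregate fact table."""
    totals = defaultdict(int)
    for day, product_name, units in facts:
        totals[(day[:7], brand_of[product_name])] += units  # "YYYY-MM" is the month
    return dict(totals)

agg = aggregate_month_brand(base_facts, BRAND_OF)
print(len(base_facts), "->", len(agg))  # 5 -> 3
print(agg[("2019-01", "Fizz")])         # 20
```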

    Effect of sparsity on aggregation: When you go to higher levels of aggregation, the sparsity percentage moves up, so you have to pay attention to this problem.

    Aggregation options
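
    A small worked sketch of the sparsity effect, assuming "sparsity percentage" means the share of all possible dimension-key combinations that actually hold a fact row. The days, products, week mapping, and sales are hypothetical. Each daily fact fills one of 16 day-product cells (25%), but after rolling days up to weeks the same facts fill 4 of 8 cells (50%). The weekly aggregate therefore has as many rows as the daily table, so it gives no row-count saving here.

```python
def populated_percentage(fact_keys, *dimensions):
    """Percentage of all possible key combinations that hold at least one fact row."""
    possible = 1
    for members in dimensions:
        possible *= len(members)
    return 100.0 * len(set(fact_keys)) / possible

days = ["d1", "d2", "d3", "d4"]
products = ["p1", "p2", "p3", "p4"]
week_of = {"d1": "w1", "d2": "w1", "d3": "w2", "d4": "w2"}  # hypothetical calendar rollup
weeks = sorted(set(week_of.values()))

# Hypothetical daily facts: each product sells on only one day.
daily = [("d1", "p1"), ("d2", "p2"), ("d3", "p3"), ("d4", "p4")]
weekly = [(week_of[day], prod) for day, prod in daily]

print(populated_percentage(daily, days, products))    # 25.0
print(populated_percentage(weekly, weeks, products))  # 50.0
```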

    Almost all data warehouses contain multiple STAR schema structures (figure 11-16):

    Snapshot and transaction tables (figure 11-17)

    Core and custom tables (figure 11-18)

    Supporting the enterprise value chain

    Conforming dimensions

    Standardizing facts

    A conformed dimension is a comprehensive combination of attributes from the source systems after resolving all discrepancies and conflicts.

    A conformed dimension allows rollups across data marts.
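
    A minimal sketch of a rollup across two data marts, assuming a hypothetical orders mart and shipments mart that both use the same conformed product dimension. Because both fact tables reference the same product keys and brand values, each can be rolled up by brand separately and the results lined up side by side (often called drilling across).

```python
from collections import defaultdict

# Hypothetical conformed product dimension shared by both data marts.
PRODUCT_DIM = {
    1: {"product_name": "Cola 1L", "brand": "Fizz"},
    2: {"product_name": "Cola 2L", "brand": "Fizz"},
    3: {"product_name": "Chips", "brand": "Crunch"},
}
orders_fact = [(1, 100), (2, 50), (3, 40)]  # (product_key, units_ordered)
shipments_fact = [(1, 90), (3, 40)]         # (product_key, units_shipped)

def roll_up(facts, attribute):
    """Sum one fact table's measure by an attribute of the conformed dimension."""
    totals = defaultdict(int)
    for product_key, measure in facts:
        totals[PRODUCT_DIM[product_key][attribute]] += measure
    return totals

def drill_across(attribute):
    """Line up both marts' totals on the shared attribute values."""
    ordered, shipped = roll_up(orders_fact, attribute), roll_up(shipments_fact, attribute)
    return {value: (ordered.get(value, 0), shipped.get(value, 0))
            for value in sorted(set(ordered) | set(shipped))}

print(drill_across("brand"))  # {'Crunch': (40, 40), 'Fizz': (150, 90)}
```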
