Data Warehousing

Upload: biswadipsaha

Post on 10-Apr-2018

    Data Warehousing

    Venkataraj Jayaraj

    Copyright 2007 by Tata Consultancy Services. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted

    in any form or by any means electronic, mechanical, photocopying, recording, or otherwise without the permission of Tata Consultancy Services.

    TCS Confidential

    Data Warehousing

    Confidential & Proprietary

    Copyright 2008 The Nielsen Company

    Normal Reporting Architecture

    [Figure: each Source feeds its Reports directly.]

    Reports are generated directly from a source, without any transformations.

    Benefits:

    Represents current data

    Simple to design, and reports are easy to generate

    Drawbacks:

    No historical data, so it may not be useful in the decision-making process.

    Data Warehouse Architecture

    [Figure: Data Warehouse Architecture. Components shown: Source (Oracle, Teradata, DB2, SQL Server), Staging Area, Data Warehouse (Metadata, Raw Data, Summary Data), Data Mart, and the consumers Analysis, Reporting and Data Mining.]

    Benefits:

    Performance

    Report generation simplified

    Contains history

    Drawbacks:

    No current data

    Administration overhead

    Source:

    A database from which the data is extracted. Ex: Oracle, Teradata, Sybase, DB2

    Staging area:

    A temporary storage area used for processing the data.

    Metadata:

    Data about the data, or a description of the data.

    Data mart

    A data mart is a data warehouse for a specific domain.

    Data marts can be divided into two types:

    Independent data mart

    Dependent data mart

    Independent Data mart

    [Figure: Independent Data mart Architecture. Components shown: Source (Oracle, Teradata, DB2, SQL Server), Staging Area, Data Mart, Data Warehouse (Metadata, Raw Data, Summary Data), and the consumers Analysis, Reporting and Data Mining.]

    Such data marts extract the data directly from the source databases, and the data marts are then merged into the data warehouse.

    Advantages:

    Maximum utilization of resources (hardware, software, manpower)

    Easy maintenance

    Risk of failure is reduced

    Disadvantages:

    Total cost of development is very high

    Integration problems

    This approach is good for: large organizations

    Dependent Data mart

    [Figure: Dependent Data mart Architecture. Components shown: Source, Staging Area, Data Warehouse (Metadata, Raw Data, Summary Data), Data Mart, and the consumers Analysis, Reporting and Data Mining.]

    Such data marts extract their data from the data warehouse.

    Advantages:

    Total cost and time of development are very low

    No integration problems

    Disadvantages:

    Cannot make full use of the resources.

    This approach is good for:

    Small and medium-sized organizations

    New organizations
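    As a rough illustration of a dependent data mart, the sketch below derives a mart by selecting one domain's rows from an already loaded warehouse. The row layout, the column names and the build_dependent_mart helper are assumptions made for this example and do not come from the slides.

```python
# Hypothetical warehouse rows; the "domain" column is an assumption.
warehouse = [
    {"domain": "sales", "region": "East", "amount": 120.0},
    {"domain": "sales", "region": "West", "amount": 80.0},
    {"domain": "finance", "region": "East", "amount": 300.0},
]


def build_dependent_mart(warehouse_rows, domain):
    """Return the subset of warehouse rows that belongs to one domain."""
    return [row for row in warehouse_rows if row["domain"] == domain]


# The mart reads only from the warehouse, never from the source systems.
sales_mart = build_dependent_mart(warehouse, "sales")
print(len(sales_mart))
```

    Because the mart is filled from data that is already integrated in the warehouse, no separate integration step is needed for it.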

    What are Data Warehouses?

    Data warehouses store large volumes of data that are frequently used by decision support systems (DSS).

    A data warehouse is maintained separately from the organization's operational databases.

    Data warehouses are relatively static, with only infrequent updates.

    A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous, operational databases.

    Data Warehousing

    Is the enabling technology that facilitates improved business decision-making. It is a process, not a product.

    A technique for assembling and managing a wide variety of data from multiple operational systems for decision support and analytical processing.

    A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decisions.

    Subject Oriented Analysis

    [Figure: Transactional Storage is process oriented, shown as an Entry record with the fields Sales Rep, Quantity Sold, Prod Number, Date, Customer Name, Product Description, Unit Price and Mail Address. Data Warehouse Storage is subject oriented, organized by Sales, Customers and Products.]

    Integration of Data

    Encoding

    Transactional storage: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y

    Data warehouse storage: M, F

    Unit of Attributes

    Transactional storage: Appl. A - pipeline cm; Appl. B - pipeline inches; Appl. C - pipeline mcf

    Data warehouse storage: pipeline cm

    Physical Attributes

    Transactional storage: Appl. A - balance dec(13,2); Appl. B - balance PIC 9(9)V99; Appl. C - balance float

    Data warehouse storage: balance dec(13,2)

    Data Consistency

    Transactional storage: Appl. A - date (Julian); Appl. B - date (yymmdd); Appl. C - date (absolute)

    Data warehouse storage: date (Julian)
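    A minimal sketch of the kind of conversion this integration step implies is shown below. The slide does not say which source code corresponds to which warehouse value, so the mappings for Appl. B and Appl. C are assumptions, as is reading "Julian" as a year plus day-of-year string. The mcf unit is left out because the slide gives no conversion for it. All function names are hypothetical.

```python
from datetime import date, datetime

# Assumed mappings from each application's encoding to the warehouse's M/F.
GENDER_MAP = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},  # assumption: 1 means M, 0 means F
    "C": {"X": "M", "Y": "F"},  # assumption: X means M, Y means F
}

CM_PER_INCH = 2.54


def to_warehouse_gender(application, value):
    """Translate a source-specific gender code to the warehouse encoding."""
    return GENDER_MAP[application][str(value)]


def to_cm(value, unit):
    """Convert a pipeline length to centimetres, the warehouse unit."""
    if unit == "cm":
        return float(value)
    if unit == "inches":
        return float(value) * CM_PER_INCH
    raise ValueError(f"no conversion defined for unit {unit!r}")


def parse_yymmdd(text):
    """Parse a yymmdd string, as used by Appl. B, into a date."""
    return datetime.strptime(text, "%y%m%d").date()


def to_julian(day):
    """Format a date as YYYYDDD (assumed meaning of 'Julian' here)."""
    return day.strftime("%Y%j")


print(to_warehouse_gender("B", 1), to_cm(10, "inches"))
print(to_julian(parse_yymmdd("180410")))
```

    Keeping each rule in its own small function makes it easy to add another source application without touching the existing conversions.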

    Volatility of Data

    [Figure: Transactional Storage is volatile, with record-by-record data manipulation (insert, access, change, delete). Data Warehouse Storage is non-volatile, with mass load and access of data.]
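    The contrast between volatile and non-volatile storage can be sketched as follows. This is only an illustration under assumed names: operational stands in for a transactional store that is changed in place, and mass_load is a hypothetical helper that appends whole snapshots to a warehouse list without modifying earlier rows.

```python
# Transactional storage: one current record per key, changed in place.
operational = {"cust-1": {"balance": 100.0}}

# Warehouse storage: rows are appended in bulk and afterwards only read.
warehouse = []


def mass_load(store, snapshot, load_date):
    """Append a full snapshot; rows already in the store are never changed."""
    for key, record in snapshot.items():
        store.append({"key": key, "load_date": load_date, **record})


mass_load(warehouse, operational, "2008-01-01")

# A record-by-record change overwrites the old operational value...
operational["cust-1"]["balance"] = 75.0

# ...while the next load adds a new row and keeps the earlier one.
mass_load(warehouse, operational, "2008-01-02")

history = [row["balance"] for row in warehouse if row["key"] == "cust-1"]
print(history)
```

    The operational store only knows the latest balance, while the warehouse retains one row per load, which is what makes historical analysis possible.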

    To structure a data warehouse we can use dimensional modeling, which works with two kinds of data:

    1. Measurable Data (Measures)

    2. Dimension Data (Dimensions)

    Measurable Data

    Numeric data that can be used in mathematical operations and can be summarized and aggregated. Ex: net profit.

    Measurable data is required to evaluate the performance of a person, object, etc. For example, the net profit of a company can be used to evaluate the company's performance.

    Measurable data is analyzed from different angles, referred to as dimensions. At least two dimensions are required to evaluate a measure.
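    To make the idea of evaluating a measure from different angles concrete, here is a small sketch that sums a net profit measure over a chosen set of dimensions. The sample rows, the year and region dimensions and the summarize helper are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical rows: two dimensions (year, region) and one measure.
facts = [
    {"year": 2007, "region": "East", "net_profit": 10.0},
    {"year": 2007, "region": "West", "net_profit": 5.0},
    {"year": 2008, "region": "East", "net_profit": 12.0},
    {"year": 2008, "region": "East", "net_profit": 3.0},
]


def summarize(rows, dimensions, measure):
    """Sum a measure for every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[name] for name in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_year_region = summarize(facts, ("year", "region"), "net_profit")
by_year = summarize(facts, ("year",), "net_profit")
print(by_year_region)
print(by_year)
```

    The same measure gives different answers depending on which dimensions are used as the grouping key, which is the sense in which dimensions are angles on a measure.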

    Dimension Data

    An angle from which measures are evaluated is referred to as a dimension.

    A dimension can be a collection of sub-dimensions, referred to as levels.

    The sub-dimensions within a dimension are arranged in a hierarchical relation, which means that two sub-dimensions cannot be at the same level.
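    The levels of a dimension can be sketched with a time dimension whose levels are day, month and year. This particular hierarchy, the date-string format and the helper names are assumptions for the example, not something the slides specify.

```python
# Levels ordered from most detailed to most general (assumed hierarchy).
TIME_LEVELS = ("day", "month", "year")

# Hypothetical quantity sold per day, keyed by 'YYYY-MM-DD'.
sales_by_day = {"2008-03-14": 4, "2008-03-15": 6, "2008-04-01": 5}


def roll_up(day_key, level):
    """Map a day key to its member at the requested level."""
    year, month, _day = day_key.split("-")
    if level == "day":
        return day_key
    if level == "month":
        return f"{year}-{month}"
    if level == "year":
        return year
    raise ValueError(f"unknown level {level!r}")


def total_at(level):
    """Aggregate the daily quantities at one level of the hierarchy."""
    totals = {}
    for day_key, quantity in sales_by_day.items():
        member = roll_up(day_key, level)
        totals[member] = totals.get(member, 0) + quantity
    return totals


print(total_at("month"))
print(total_at("year"))
```

    Each day belongs to exactly one month and each month to exactly one year, so every level sits at its own position in the hierarchy.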

    Types of schemas

    Star schema

    Starflake schema

    Snowflake schema

    Star schema

    Measurable data sits in the center, surrounded by the different dimensions.

    Each dimension has only one level, so there is no hierarchy.

    No relation should be defined between two dimensions.

    A combination of measures with their related dimensions is referred to as a cube.
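    A star schema along these lines could be sketched with Python's built-in sqlite3 module: one central table of measures that references two single-level dimension tables, with no relation between the dimensions themselves. The table names, column names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Single-level dimensions: no relation between them.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);

    -- Central table holding the measures, keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        quantity_sold INTEGER,
        unit_price REAL
    );

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_customer VALUES (10, 'Acme'), (11, 'Globex');
    INSERT INTO fact_sales VALUES (1, 10, 3, 2.0), (1, 11, 1, 2.0), (2, 10, 5, 4.0);
    """
)

# Evaluate the revenue measure from the product angle.
revenue_by_product = conn.execute(
    """
    SELECT p.description, SUM(f.quantity_sold * f.unit_price)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.description
    ORDER BY p.description
    """
).fetchall()
print(revenue_by_product)
```

    Every query joins the central table to whichever dimension it needs, so adding a dimension means adding one table and one key column.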

    At the database level, a collection of measures becomes a table (referred to as a fact table).

    The levels (sub-dimensions) within a dimension also become tables at the database level (referred to as dimension tables).

    Database term              Data warehouse term

    Schema                     Cube

    -----                      Dimension

    Table (dimension table)    Level

    Table (fact table)         Measure

    Constraint                 Relation/hierarchy

    Database                   Data warehouse/data mart

    Columns                    Attributes

    Starflake schema

    Same as the star schema, but the cube has at least one dimension with two or more levels in a single hierarchy.

    Snowflake schema

    Same as the starflake schema, but the cube has at least one dimension with two or more levels under at least two hierarchies.

    ETL

    Extract, Transform, and Load (ETL) is a process in data warehousing that involves extracting data from outside sources, transforming it to fit business needs (which can include quality levels), and ultimately loading it into the end target, i.e. the data warehouse.

    An ETL process can be created using almost any programming language, but creating one from scratch is quite complex. ETL tools are available to help in the creation of ETL processes.

    A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization.
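    The three ETL steps can be sketched in a few lines of standard-library Python. This is a toy illustration, not a substitute for an ETL tool: the CSV text, the quality rule that drops rows without an amount, and the sales table are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source would be a file or database.
SOURCE_CSV = "customer,amount\n alice ,10.50\nBob,\ncarol,4\n"


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert amounts, drop incomplete rows."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed quality rule: skip rows with no amount
        clean.append((row["customer"].strip().title(), float(row["amount"])))
    return clean


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

    Even in this small form, the source format, the business rules and the target schema each live in a separate function, which is the separation that ETL tools provide at scale.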

    Some of the ETL Tools available in the Market are:

    Ab Initio

    Apatar

    BusinessObjects Data Integrator

    Clover.ETL

    DMExpress

    Data Junction

    Data Transformation Services

    IBM WebSphere DataStage

    Informatica

    LogiXML

    Pentaho

    Pervasive Data Integrator

    RODIN Data Asset Management

    SQL Server Integration Services

    Scriptella

    Sprog (software)

    Sunopsis

    Talend Open Studio
