bidw concepts

Upload: ajujan

Post on 10-Apr-2018


  • 8/8/2019 BIDW Concepts

    1/56


    Agenda

    Data warehousing overview

    Data warehouse vs. OLTP

    Data warehouse vs. data mart


    integration * intelligence * insight

    What is BI?

    Business intelligence (BI) is a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.

    BI applications include the activities of decision support, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.

    Example: Business Objects (www.businessobjects.com)


    BI in a Nutshell

    [Figure: diagram starting from raw data]


    A producer wants to know:

    Which are our lowest/highest margin customers?

    Who are my customers and what products are they buying?

    Which customers are most likely to go to the competition?

    What impact will new products/services have on revenue and margins?

    What product promotions have the biggest impact on revenue?

    What is the most effective distribution channel?


    Data, data everywhere, yet ...

    I can't find the data I need
      data is scattered over the network
      many versions, subtle differences

    I can't get the data I need
      need an expert to get the data

    I can't understand the data I found
      available data poorly documented

    I can't use the data I found
      results are unexpected
      data needs to be transformed from one form to another


    What is a Data Warehouse?

    A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context.

    [Barry Devlin]


    What are the users saying...

    Data should be integrated across the enterprise

    Summary data has a real value to the organization

    Historical data holds the key to understanding data over time

    What-if capabilities are required


    What is Data Warehousing?

    A process of transforming data into information and making it available to users in a timely enough manner to make a difference

    Data --> Information


    Evolution

    60s: Batch reports
      hard to find and analyze information
      inflexible and expensive, reprogram every new request

    70s: Terminal-based DSS and EIS (executive information systems)
      still inflexible, not integrated with desktop tools

    80s: Desktop data access and analysis tools
      query tools, spreadsheets, GUIs
      easier to use, but only access operational databases

    90s till now: Data warehousing with integrated OLAP engines and tools, real-time DW


    Data Warehouse

    A data warehouse is a
      subject-oriented
      integrated
      time-varying
      non-volatile
      accessible
    collection of data that is used primarily in organizational decision making.

    -- Bill Inmon, Building the Data Warehouse 1996


    Explorers, Farmers and Tourists

    Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data

    Farmers: Harvest information from known access paths

    Tourists: Browse information harvested by farmers


    Data Warehouse Architecture

    [Figure: source systems (relational databases, legacy data, purchased data, ERP systems), extraction and cleansing, optimized loader, data warehouse engine, metadata repository, analyze and query tools]


    Data Mining works with Warehouse Data

    Data Warehousing provides the Enterprise with a memory

    Data Mining provides the Enterprise with intelligence


    What makes data mining possible?

    Advances in the following areas are making data mining deployable:
      data warehousing
      better and more data (i.e., operational, behavioral, and demographic)
      the emergence of easily deployed data mining tools, and
      the advent of new data mining techniques.

    -- Gartner Group


    Why Separate Data Warehouse?

    Performance
      Operational databases are designed and tuned for known transactions and workloads.
      Complex OLAP queries would degrade performance for operational transactions.
      Special data organization, access and implementation methods are needed for multidimensional views and queries.

    Function
      Missing data: Decision support requires historical data, which operational databases do not typically maintain.
      Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.
      Data quality: Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled.


    Benefits of a Data Warehouse

    Reliable reporting

    Rapid access to data

    Integrated data

    Flexible presentation of data

    Better decision making


    So, what's different?


    Application-Orientation vs. Subject-Orientation

    Application-Orientation (operational database)
      Loans
      Credit Card
      Trust
      Savings

    Subject-Orientation (data warehouse)
      Customer
      Vendor
      Product
      Activity


    OLTP vs Data Warehouse

    OLTP                      Warehouse (DSS)
    Application oriented      Subject oriented
    Used to run business      Used to analyze business
    Detailed data             Summarized and refined
    Current, up to date       Snapshot data
    Isolated data             Integrated data
    Repetitive access         Ad-hoc access
    Clerical user             Knowledge user (manager)


    OLTP vs Data Warehouse

    OLTP                                     Data Warehouse
    Performance sensitive                    Performance relaxed
    Few records accessed at a time (tens)    Large volumes accessed at a time (millions)
    Read/update access                       Mostly read (batch update)
    No data redundancy                       Redundancy present
    Database size 100 MB - 100 GB            Database size 100 GB - few terabytes
    Thousands of users                       Hundreds of users


    To summarize ...

    OLTP systems are used to run a business

    The data warehouse helps to optimize the business


    Why Now?

    Data is being produced

    ERP provides clean data

    The computing power is available

    The computing power is affordable

    The competitive pressures are strong

    Commercial products are available


    Data Warehouses: Architecture, Design & Construction

    DW Architecture

    Loading, refreshing

    Structuring/Modeling

    DWs and Data Marts


    Stages in Data Warehousing Life Cycle


    Data Warehouse Architectures

    Generic Two-Level Architecture

    Independent Data Mart

    Dependent Data Mart and Operational Data Store

    All involve some form of extraction, transformation and loading (ETL)


    Generic two-level architecture

    [Figure: ETL into a single warehouse]

    One, company-wide warehouse

    Periodic extraction: data is not completely current in warehouse


    Independent Data Mart

    Data marts: mini-warehouses, limited in scope

    [Figure: a separate ETL flow per data mart]

    Separate ETL for each independent data mart

    Data access complexity due to multiple data marts


    Dependent data mart with operational data store

    [Figure: a single ETL flow, EDW, ODS, and dependent data marts]

    Single ETL for enterprise data warehouse (EDW)

    Simpler data access

    ODS provides option for obtaining current data

    Dependent data marts loaded from EDW


    The ETL Process

    Capture

    Scrub or data cleansing

    Transform

    Load

    ETL = Extract, transform, and load


    Steps in data reconciliation

    Capture = extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse

    Static extract = capturing a snapshot of the source data at a point in time

    Incremental extract = capturing changes that have occurred since the last static extract
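
The difference between a static and an incremental extract can be sketched in a few lines of Python. This is only an illustration: the `SOURCE_ROWS` data, the `modified` timestamp column, and both function names are assumptions made for the example, not part of the slides or of any ETL product.

```python
from datetime import datetime

# Hypothetical source rows; each carries a last-modified timestamp (an assumption).
SOURCE_ROWS = [
    {"id": 1, "name": "Ann", "modified": datetime(2019, 1, 5)},
    {"id": 2, "name": "Bob", "modified": datetime(2019, 2, 10)},
    {"id": 3, "name": "Cy", "modified": datetime(2019, 3, 1)},
]


def static_extract(rows):
    """Static extract: copy every source row as of this point in time."""
    return [dict(r) for r in rows]


def incremental_extract(rows, last_extract_time):
    """Incremental extract: copy only rows changed since the previous extract."""
    return [dict(r) for r in rows if r["modified"] > last_extract_time]


snapshot = static_extract(SOURCE_ROWS)
changes = incremental_extract(SOURCE_ROWS, datetime(2019, 2, 1))
print(len(snapshot), [r["id"] for r in changes])  # 3 [2, 3]
```

Real sources do not always expose a reliable modification timestamp; in that case change capture would have to rely on something else, such as logs or row comparison.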


    Steps in data reconciliation (continued)

    Scrub = cleanse: uses pattern recognition and AI techniques to upgrade data quality

    Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies

    Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
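
A small rule-based sketch of the scrub step follows. It deliberately uses plain lookup rules rather than the pattern recognition or AI techniques the slide mentions, and it covers only three of the listed problems (duplicate data, erroneous dates, inconsistent codes). The sample rows, the `GENDER_CODES` table, and the `scrub` function are hypothetical.

```python
from datetime import datetime

# Hypothetical raw customer rows with typical quality problems.
RAW = [
    {"cust_id": "C1", "gender": "M", "joined": "2018-04-10"},
    {"cust_id": "C1", "gender": "male", "joined": "2018-04-10"},  # duplicate
    {"cust_id": "C2", "gender": " F ", "joined": "2018-13-45"},   # erroneous date
    {"cust_id": "C3", "gender": "female", "joined": "2018-05-01"},
]

# Decoding table that reconciles inconsistent source codes (an assumption).
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def scrub(rows):
    """Return (clean_rows, error_log); rejected rows are logged, not dropped silently."""
    clean, errors, seen = [], [], set()
    for row in rows:
        cust_id = row["cust_id"].strip()
        if cust_id in seen:
            errors.append((cust_id, "duplicate"))
            continue
        try:
            joined = datetime.strptime(row["joined"], "%Y-%m-%d").date()
        except ValueError:
            errors.append((cust_id, "erroneous date"))
            continue
        gender = GENDER_CODES.get(row["gender"].strip().lower())
        if gender is None:
            errors.append((cust_id, "unknown code"))
            continue
        seen.add(cust_id)
        clean.append({"cust_id": cust_id, "gender": gender, "joined": joined})
    return clean, errors


clean_rows, error_log = scrub(RAW)
print([r["cust_id"] for r in clean_rows])  # ['C1', 'C3']
print(error_log)  # [('C1', 'duplicate'), ('C2', 'erroneous date')]
```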


    Steps in data reconciliation (continued)

    Transform = convert data from format of operational system to format of data warehouse

    Record-level:
      Selection: data partitioning
      Joining: data combining
      Aggregation: data summarization

    Field-level:
      single-field: from one field to one field
      multi-field: from many fields to one, or one field to many
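
The record-level and field-level transformations listed above can be illustrated with a short Python sketch. The `orders` and `customers` data and all field names are made up for the example.

```python
from collections import defaultdict

# Hypothetical operational data.
orders = [
    {"order_id": 1, "cust_id": "C1", "region": "EU", "amount_cents": 1250},
    {"order_id": 2, "cust_id": "C2", "region": "US", "amount_cents": 900},
    {"order_id": 3, "cust_id": "C1", "region": "EU", "amount_cents": 750},
]
customers = {
    "C1": {"first": "Ann", "last": "Lee"},
    "C2": {"first": "Bob", "last": "Ray"},
}

# Record-level selection: keep only the partition of interest.
eu_orders = [o for o in orders if o["region"] == "EU"]

# Record-level joining: combine each order with its customer record.
joined = [{**o, **customers[o["cust_id"]]} for o in eu_orders]

for row in joined:
    # Field-level, single-field: one source field becomes one target field.
    row["amount"] = row.pop("amount_cents") / 100
    # Field-level, multi-field: many source fields become one target field.
    row["full_name"] = f'{row.pop("first")} {row.pop("last")}'

# Record-level aggregation: summarize the detail rows per customer.
totals = defaultdict(float)
for row in joined:
    totals[row["full_name"]] += row["amount"]

print(dict(totals))  # {'Ann Lee': 20.0}
```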


    Steps in data reconciliation (continued)

    Load/Index = place transformed data into the warehouse and create indexes

    Refresh mode: bulk rewriting of target data at periodic intervals

    Update mode: only changes in source data are written to data warehouse
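
The two load modes can be contrasted with Python's standard `sqlite3` module. This is a sketch under assumptions: the `dim_customer` table, its columns, and the two function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (cust_id TEXT PRIMARY KEY, city TEXT)")
# The "index" part of load/index: support lookups by city.
conn.execute("CREATE INDEX idx_customer_city ON dim_customer (city)")


def refresh_load(conn, rows):
    """Refresh mode: bulk rewrite of the whole target table."""
    with conn:  # one transaction: delete everything, then reload
        conn.execute("DELETE FROM dim_customer")
        conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)


def update_load(conn, changed_rows):
    """Update mode: write only the rows that changed in the source."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO dim_customer VALUES (?, ?)", changed_rows
        )


refresh_load(conn, [("C1", "Pune"), ("C2", "Delhi")])
update_load(conn, [("C2", "Mumbai"), ("C3", "Chennai")])
loaded = conn.execute(
    "SELECT cust_id, city FROM dim_customer ORDER BY cust_id"
).fetchall()
print(loaded)  # [('C1', 'Pune'), ('C2', 'Mumbai'), ('C3', 'Chennai')]
```

Refresh mode is simpler to reason about, while update mode moves less data per run.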


    Data Warehouse vs. Data Marts

    What comes first ?


    Data Mart

    Data mart is:

    A functional segment of an enterprise, restricted for purposes of security, locality, performance, or business necessity, using modeling and information delivery techniques identical to data warehousing.


    Data Mart

    Why build a data mart?

    Allows an organization to visualize the large but focus on the small and attainable.

    Provides a platform for rapid delivery of an operational system.

    Minimizes risk.

    A corporate warehouse can be constructed from the union of the enterprise data marts.


    Data Mart- Approach

    Physical data warehouse (physical)

    Data warehouse --> data marts

    Data marts --> data warehouse

    Parallel data warehouse and data marts


    Top-down

    [Figure: source data (external data, operational data), staging area, data warehouse, data marts]

    Physical Data Warehouse: Data Warehouse --> Data Marts


    Bottom-up approach

    [Figure: source data (external data, operational data), staging area, data marts, data warehouse]

    Physical Data Warehouse: Data Marts --> Data Warehouse


    Hybrid

    [Figure: source data (external data, operational data), staging area, data warehouse, data marts]

    Physical Data Warehouse: Parallel Data Warehouse & Data Marts


    Schema Design

    Database organization
      must look like business
      must be recognizable by business user
      approachable by business user
      must be simple

    Schema Types
      Star Schema
      Fact Constellation Schema
      Snowflake Schema


    Conceptual Modeling of Data Warehouses

    Modeling data warehouses: dimensions & measures

    Star schema: A fact table in the middle connected to a set of dimension tables

    Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake

    Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation


    Dimension Tables

    Dimension tables
      Define business in terms already familiar to users
      Wide rows with lots of descriptive text
      Small tables (about a million rows)
      Joined to fact table by a foreign key
      Heavily indexed
      Typical dimensions: time periods, geographic region (markets, cities), products, customers, salesperson, etc.


    Fact Table

    Central table
      mostly raw numeric items
      narrow rows, a few columns at most
      large number of rows (millions to a billion)
      access via dimensions


    Example of Star Schema

    Sales Fact Table
      Keys: time_key, item_key, branch_key, location_key
      Measures: units_sold, dollars_sold, avg_sales

    Dimension tables
      time: time_key, day, day_of_the_week, month, quarter, year
      item: item_key, item_name, brand, type, supplier_type
      branch: branch_key, branch_name, branch_type
      location: location_key, street, city, province_or_state, country
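
A cut-down version of this star schema can be built and queried with Python's standard `sqlite3` module. The sketch keeps only two of the four dimensions, and the table names `time_dim` and `item_dim` and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
-- The fact table holds foreign keys to the dimensions plus numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Q1', 2019), (2, 'Q2', 2019);
INSERT INTO item_dim VALUES (10, 'Laptop', 'Acme'), (11, 'Phone', 'Acme');
INSERT INTO sales_fact VALUES
    (1, 10, 2, 2000.0), (1, 11, 5, 1500.0), (2, 10, 1, 1000.0);
""")

# Facts are accessed via a dimension: one join per dimension used.
sales_by_quarter = conn.execute("""
    SELECT t.quarter, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    GROUP BY t.quarter
    ORDER BY t.quarter
""").fetchall()
print(sales_by_quarter)  # [('Q1', 3500.0), ('Q2', 1000.0)]
```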


    Example of Snowflake Schema

    Sales Fact Table
      Keys: time_key, item_key, branch_key, location_key
      Measures: units_sold, dollars_sold, avg_sales

    Dimension tables
      time: time_key, day, day_of_the_week, month, quarter, year
      item: item_key, item_name, brand, type, supplier_key
        supplier: supplier_key, supplier_type
      branch: branch_key, branch_name, branch_type
      location: location_key, street, city_key
        city: city_key, city, province_or_state, country
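
In a snowflake schema a query has to follow the normalized hierarchy through an extra join. The sketch below covers only the item and supplier tables from the example, using Python's standard `sqlite3` module; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- supplier is normalized out of the item dimension.
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key)
);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item(item_key),
    units_sold INTEGER
);
INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'direct');
INSERT INTO item VALUES (10, 'Laptop', 1), (11, 'Phone', 2), (12, 'Tablet', 1);
INSERT INTO sales_fact VALUES (10, 2), (11, 5), (12, 4);
""")

# Two joins are needed to reach supplier_type: fact -> item -> supplier.
units_by_supplier_type = conn.execute("""
    SELECT s.supplier_type, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item AS i ON f.item_key = i.item_key
    JOIN supplier AS s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(units_by_supplier_type)  # [('direct', 5), ('wholesale', 6)]
```

In the star version of the same example, `supplier_type` sits directly in the item table, so the second join would not be needed.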


    Example of Fact Constellation

    Sales Fact Table
      Keys: time_key, item_key, branch_key, location_key
      Measures: units_sold, dollars_sold, avg_sales

    Shipping Fact Table
      Keys: time_key, item_key, shipper_key, from_location, to_location
      Measures: dollars_cost, units_shipped

    Dimension tables
      time: time_key, day, day_of_the_week, month, quarter, year
      item: item_key, item_name, brand, type, supplier_type
      branch: branch_key, branch_name, branch_type
      location: location_key, street, city, province_or_state, country
      shipper: shipper_key, shipper_name, location_key, shipper_type
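
Because the two fact tables share the item dimension, their measures can be compared per item. The sketch below uses Python's standard `sqlite3` module with a reduced schema and invented rows; each fact table is aggregated on its own before joining, so rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables share the same item dimension.
CREATE TABLE sales_fact (item_key INTEGER, units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER, units_shipped INTEGER);
INSERT INTO item VALUES (10, 'Laptop'), (11, 'Phone');
INSERT INTO sales_fact VALUES (10, 2), (10, 1), (11, 5);
INSERT INTO shipping_fact VALUES (10, 3), (11, 4);
""")

sold_vs_shipped = conn.execute("""
    SELECT i.item_name, s.sold, h.shipped
    FROM item AS i
    JOIN (SELECT item_key, SUM(units_sold) AS sold
          FROM sales_fact GROUP BY item_key) AS s
      ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS shipped
          FROM shipping_fact GROUP BY item_key) AS h
      ON h.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(sold_vs_shipped)  # [('Laptop', 3, 3), ('Phone', 5, 4)]
```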


    Dimensional model

    Visualise a dimensional model as a CUBE (a hypercube, because there can be more than 3 dimensions)

    Operations for OLAP

    Drill Down: move to a higher level of detail

    Roll Up: move to a more summarized level of data

    (The navigation path is determined by hierarchies within dimensions.)

    Slice: cuts through the cube so that users can focus on specific perspectives

    Dice: selects a sub-cube by restricting two or more dimensions (rotating the cube to another perspective is usually called pivot)
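
The four operations can be tried on a tiny in-memory cube. This is a sketch in plain Python, not how an OLAP engine works internally; the `CUBE` cells and helper names are hypothetical. Dice is shown in its common textbook sense of restricting several dimensions at once to obtain a sub-cube.

```python
from collections import defaultdict

# Each cell of the hypothetical cube: dimension values plus one measure.
CUBE = [
    {"year": 2019, "quarter": "Q1", "city": "Pune", "item": "Laptop", "units": 2},
    {"year": 2019, "quarter": "Q1", "city": "Delhi", "item": "Phone", "units": 5},
    {"year": 2019, "quarter": "Q2", "city": "Pune", "item": "Phone", "units": 3},
    {"year": 2019, "quarter": "Q2", "city": "Delhi", "item": "Laptop", "units": 1},
]


def aggregate(cells, dims):
    """Sum the measure over all cells, grouped by the given dimensions."""
    totals = defaultdict(int)
    for cell in cells:
        totals[tuple(cell[d] for d in dims)] += cell["units"]
    return dict(totals)


def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    return [c for c in cells if c[dim] == value]


def dice_cube(cells, allowed):
    """Dice: keep the sub-cube where each listed dimension is in its allowed set."""
    return [c for c in cells if all(c[d] in vals for d, vals in allowed.items())]


rolled_up = aggregate(CUBE, ["year"])                # summarized: per year
drilled_down = aggregate(CUBE, ["year", "quarter"])  # more detail: per quarter
pune_slice = slice_cube(CUBE, "city", "Pune")
diced = dice_cube(CUBE, {"city": {"Pune"}, "item": {"Phone"}})

print(rolled_up)     # {(2019,): 11}
print(drilled_down)  # {(2019, 'Q1'): 7, (2019, 'Q2'): 4}
```

Roll up and drill down move along the time hierarchy (year, then quarter), which is the navigation path the slide refers to.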


    Drill Down and Roll Up


    Slice and Dice


    Metadata Repository

    Administrative metadata

    source databases and their contents

    gateway descriptions

    warehouse schema, view & derived data definitions

    dimensions, hierarchies

    pre-defined queries and reports

    data mart locations and contents

    data partitions

    data extraction, cleansing, transformation rules, defaults

    data refresh and purging rules

    user profiles, user groups

    security: user authorization, access control


    Metadata Repository (continued)

    Business metadata
      business terms and definitions
      ownership of data
      charging policies

    Operational metadata
      data lineage: history of migrated data and sequence of transformations applied
      currency of data: active, archived, purged
      monitoring information: warehouse usage statistics, error reports, audit trails


    The BI/DW Lifecycle

    Source: http://www.atre.com/navigator/#3


    The BI/DW Lifecycle

    Source: http://www.atre.com


    Popular BI/DW Suites & Tools

    Categories shown on the slide: Full Suites, Reporting Tools, ETL Tools, Databases, Specialized Tools

    Oracle: LDMs & Database; Oracle Warehouse Builder; Oracle Discoverer & Oracle Reporting; BI Beans & JOLAP API

    Microsoft: Database; SQL Server Analysis Services; SQL Server Reporting Services; SQL Server Integration Services

    IBM: Logical Data Model & IBM DB2 Database; DB2 Cube Views; ETL (Ascential DataStage); DB2 Alphablox

    SAS 9, the BI Platform: Logical Data Model & SAS Database; SAS ETL; BI and Reporting; SAS Data Mining

    Teradata

    Redbrick

    Hyperion Essbase

    Oracle Express Server

    Informatica

    Ab initio

    Any database SQL language or any other programming language

    Cognos BI Suite

    BusinessObjects & Crystal

    Microstrategy

    Actuate

    Hyperion/Brio (acquired by Hyperion)

    SAP BW

    Peoplesoft EPM

    Embarcadero Suite

    Erwin

    Cognos PerformanceApps

    Planning & Budgeting