data warehouse

Chapter 2: Data Warehousing. Business Intelligence: A Managerial Approach (2nd Edition)

Upload: khaled-k-badr

Post on 28-Oct-2015


DESCRIPTION

Data Warehouse intro

TRANSCRIPT

Chapter 2: Data Warehousing

Business Intelligence: A Managerial Approach (2nd Edition)

Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall

Learning Objectives

Understand the basic definitions and concepts of data warehouses
Learn different types of data warehousing architectures and their comparative advantages and disadvantages
Describe the processes used in developing and managing data warehouses
Explain data warehousing operations
Explain the role of data warehouses in decision support
Explain data integration and the extraction, transformation, and load (ETL) processes
Describe real-time (a.k.a. right-time and/or active) data warehousing
Understand data warehouse administration and security issues

Opening Vignette

“DirecTV Thrives with Active Data Warehousing”

Company background
Problem description
Proposed solution
Results
Answer and discuss the case questions

Main Data Warehousing Topics

DW definition
Characteristics of DW
Data marts
ODS, EDW, metadata
DW framework
DW architecture and ETL process
DW development
DW issues

What is a Data Warehouse?

A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format

“The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”

Characteristics of DW

Subject oriented
Integrated
Time-variant (time series)
Nonvolatile
Summarized
Not normalized
Metadata
Web based, relational/multi-dimensional
Client/server
Real-time and/or right-time (active)

Data Mart

A departmental data warehouse that stores only relevant data

Dependent data mart: a subset that is created directly from a data warehouse
Independent data mart: a small data warehouse designed for a strategic business unit or a department
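A rough way to picture a dependent data mart is as a table derived directly from the warehouse. The sketch below uses Python's built-in sqlite3 module; the table names (warehouse_sales, mart_marketing), the columns, and the sample rows are hypothetical and not taken from the chapter.

```python
import sqlite3

# In-memory database standing in for the enterprise data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "east", 120.0),
        ("marketing", "west", 80.0),
        ("finance", "east", 200.0),
    ],
)

# Dependent data mart: a subset created directly from the warehouse,
# keeping only the rows relevant to one department.
conn.execute(
    "CREATE TABLE mart_marketing AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM mart_marketing ORDER BY region"
).fetchall()
print(mart_rows)
```

An independent data mart would instead be loaded from the source systems by its own ETL process, without passing through a shared warehouse table.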

Data Warehousing Definitions

Operational data stores (ODS): a type of database often used as an interim area for a data warehouse
Oper marts: an operational data mart
Enterprise data warehouse (EDW): a data warehouse for the enterprise
Metadata: data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use

DW Framework

[Figure: data warehouse framework. Data sources: ERP, legacy, POS, other OLTP/Web, external data. ETL process: select, extract, transform, integrate, load. Enterprise data warehouse with metadata. Replication into data marts (engineering, marketing, finance, ...), or the no-data-marts option. Access through API/middleware. Applications (visualization): routine business reporting; data/text mining; OLAP, dashboard, Web; custom-built applications.]

DW Architecture

Three-tier architecture
  1. Data acquisition software (back-end)
  2. The data warehouse that contains the data and software
  3. Client (front-end) software that allows users to access and analyze data from the warehouse
Two-tier architecture
  The first two tiers of the three-tier architecture are combined into one
Sometimes there is only one tier

DW Architectures

[Figure: three-tier architecture (Tier 1: client workstation; Tier 2: application server; Tier 3: database server) and two-tier architecture (Tier 1: client workstation; Tier 2: application and database server).]

A Web-based DW Architecture

[Figure: components shown are a client (Web browser), the Internet/intranet/extranet, a Web server, Web pages, an application server, and the data warehouse.]

Data Warehousing Architectures

Issues to consider when deciding which architecture to use:
  Which database management system (DBMS) should be used?
  Will parallel processing and/or partitioning be used?
  Will data migration tools be used to load the data warehouse?
  What tools will be used to support data retrieval and analysis?

Alternative DW Architectures

[Figure, panels (a) to (c):
(a) Independent Data Marts Architecture: source systems; staging area (ETL); independent data marts (atomic/summarized data); end-user access and applications.
(b) Data Mart Bus Architecture with Linked Dimensional Data Marts: source systems; staging area (ETL); dimensionalized data marts linked by conformed dimensions (atomic/summarized data); end-user access and applications.
(c) Hub and Spoke Architecture (Corporate Information Factory): source systems; staging area (ETL); normalized relational warehouse (atomic data); dependent data marts (summarized/some atomic data); end-user access and applications.]

Alternative DW Architectures

[Figure, panels (d) and (e):
(d) Centralized Data Warehouse Architecture: source systems; staging area (ETL); normalized relational warehouse (atomic/some summarized data); end-user access and applications.
(e) Federated Architecture: existing data warehouses, data marts, and legacy systems; data mapping/metadata; logical/physical integration of common data elements; end-user access and applications.]

Alternative DW Architectures

1. Independent Data Marts
2. Data Mart Bus Architecture
3. Hub-and-Spoke Architecture
4. Centralized Data Warehouse
5. Federated Data Warehouse

Each has pros and cons!

Teradata Corp. DW Architecture

[Figure: Teradata Corp. DW architecture]

Data Warehousing Architectures

Ten factors that potentially affect the architecture selection decision:

1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors

Data Integration and the Extraction, Transformation, and Load (ETL) Process

Data integration: integration that comprises three major processes: data access, data federation, and change capture
Enterprise application integration (EAI): a technology that provides a vehicle for pushing data from source systems into a data warehouse
Enterprise information integration (EII): an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases

Data Integration and the Extraction, Transformation, and Load (ETL) Process

Extraction, transformation, and load (ETL)

[Figure: data from a packaged application, a legacy system, other internal applications, and a transient data source pass through Extract, Transform, Cleanse, and Load into the data warehouse and the data mart.]
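The extract, transform, cleanse, and load steps can be sketched as a small pipeline. This is only an illustration using the Python standard library: the source data, the column names, the dw_revenue table, and the specific cleansing rule (dropping rows with missing revenue) are assumptions, not part of the chapter.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from a legacy
# system, a packaged application, or another internal source.
SOURCE_CSV = """customer,country,revenue
 Acme ,us,1000
Globex,US,
Initech,de,250.5
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: standardize formats (trim whitespace, upper-case country codes)."""
    return [
        {
            "customer": r["customer"].strip(),
            "country": r["country"].strip().upper(),
            "revenue": r["revenue"].strip(),
        }
        for r in rows
    ]

def cleanse(rows):
    """Cleanse: drop rows with missing revenue and convert the rest to float."""
    return [dict(r, revenue=float(r["revenue"])) for r in rows if r["revenue"]]

def load(rows, conn):
    """Load: write the cleaned rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_revenue "
        "(customer TEXT, country TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO dw_revenue VALUES (:customer, :country, :revenue)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(cleanse(transform(extract(SOURCE_CSV))), conn)
loaded = conn.execute(
    "SELECT customer, country, revenue FROM dw_revenue ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each stage as a separate function makes it easy to test the cleansing rules on their own before anything is written to the warehouse.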

ETL

Issues affecting the purchase of an ETL tool
  Data transformation tools are expensive
  Data transformation tools may have a long learning curve
Important criteria in selecting an ETL tool
  Ability to read from and write to an unlimited number of data sources/architectures
  Automatic capturing and delivery of metadata
  A history of conforming to open standards
  An easy-to-use interface for the developer and the functional user

Data Warehouse Development

Data warehouse development approaches
  Inmon Model: EDW approach (top-down)
  Kimball Model: data mart approach (bottom-up)
  Which model is best?
    There is no one-size-fits-all strategy to DW
    One alternative is the hosted warehouse
Data warehouse structure:
  The Star Schema vs. Relational
Real-time data warehousing?

Hosted Data Warehouses

Benefits:
  Requires minimal investment in infrastructure
  Frees up capacity on in-house systems
  Frees up cash flow
  Makes powerful solutions affordable
  Enables powerful solutions that provide for growth
  Offers better quality equipment and software
  Provides faster connections
  Enables users to access data remotely
  Allows a company to focus on core business
  Meets storage needs for large volumes of data

Representation of Data in DW

Dimensional modeling: a retrieval-based system that supports high-volume query access
Star schema: the most commonly used and the simplest style of dimensional modeling
  Contains a fact table surrounded by and connected to several dimension tables
  The fact table contains the attributes (numerical values, or measures) needed to perform decision analysis and query reporting
  Dimension tables contain classification and aggregation information about the values in the fact table
Snowflake schema: an extension of the star schema where the diagram resembles a snowflake in shape
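A star schema can be sketched with one fact table and two dimension tables. The example below uses Python's built-in sqlite3 module; the table and column names (fact_sales, dim_time, dim_product, units_sold) and the sample rows are illustrative assumptions loosely modeled on a sales example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: classification information used to group the facts.
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);

-- Fact table: numerical measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);

INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO dim_product VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (2, 1, 3);
""")

# A typical star-schema query: join the fact table to its dimensions
# and aggregate the measure by dimension attributes.
query = """
SELECT t.quarter, p.brand, SUM(f.units_sold)
FROM fact_sales AS f
JOIN dim_time AS t ON t.time_id = f.time_id
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY t.quarter, p.brand
ORDER BY t.quarter, p.brand
"""
units_by_quarter_brand = conn.execute(query).fetchall()
print(units_by_quarter_brand)
```

In a snowflake variant, a dimension such as dim_product would itself be normalized, for example by moving brand into its own table referenced by a key.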

Multidimensionality

Multidimensionality
  The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
Multidimensional presentation
  Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
  Measures: money, sales volume, head count, inventory profit, actual versus forecast
  Time: daily, weekly, monthly, quarterly, or yearly
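To make the idea of analyzing one measure by several dimensions concrete, here is a minimal sketch in plain Python. The records, the dimension names, and the summarize helper are hypothetical; a real system would run this kind of aggregation inside the warehouse or an OLAP engine.

```python
from collections import defaultdict

# Hypothetical sales records: four dimensions plus one measure (amount).
SALES = [
    {"region": "East", "product": "Widget", "salesperson": "Ann", "quarter": "Q1", "amount": 100},
    {"region": "East", "product": "Gadget", "salesperson": "Bob", "quarter": "Q1", "amount": 50},
    {"region": "West", "product": "Widget", "salesperson": "Ann", "quarter": "Q2", "amount": 70},
    {"region": "West", "product": "Widget", "salesperson": "Cy", "quarter": "Q2", "amount": 30},
]

def summarize(records, dimensions, measure="amount"):
    """Total the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return dict(totals)

# The same data viewed along different dimensions.
by_region = summarize(SALES, ["region"])
by_product_quarter = summarize(SALES, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```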

Star vs Snowflake Schema

[Figure, two panels:
Star schema: fact table SALES (UnitsSold, ...) with dimensions TIME (Quarter, ...), PEOPLE (Division, ...), PRODUCT (Brand, ...), and GEOGRAPHY (Country, ...).
Snowflake schema: fact table SALES (UnitsSold, ...) with dimensions DATE (Date, ...), PEOPLE (Division, ...), PRODUCT (LineItem, ...), and STORE (LocID, ...), plus the further dimension tables BRAND (Brand, ...), CATEGORY (Category, ...), LOCATION (State, ...), MONTH (M_Name, ...), and QUARTER (Q_Name, ...).]

Analysis of Data in DW

Online analytical processing (OLAP)
  Data-driven activities performed by end users to query the online system and to conduct analyses
  Data cubes, drill-down/roll-up, slice and dice, ...
OLAP activities
  Generating queries (query tools)
  Requesting ad hoc reports
  Conducting statistical and other analyses
  Developing multimedia-based applications

Analysis of Data Stored in DW: OLTP vs. OLAP

OLTP (online transaction processing)
  A system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, and POS
  The main focus is on efficiency of routine tasks
OLAP (online analytical processing)
  A system designed to address the need for information extraction by providing effective and efficient ad hoc analysis of organizational data
  The main focus is on effectiveness
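The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. The orders table and the record_order helper are hypothetical, and both workloads share one in-memory database here purely for brevity; in practice the analytical query would usually run against the warehouse rather than the operational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
)

def record_order(store, amount):
    """OLTP-style work: one small, short transaction per business event."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO orders (store, amount) VALUES (?, ?)", (store, amount)
        )

for store, amount in [("north", 20.0), ("north", 35.0), ("south", 15.0)]:
    record_order(store, amount)

# OLAP-style work: an ad hoc, read-only query that summarizes many rows.
sales_by_store = conn.execute(
    "SELECT store, SUM(amount) FROM orders GROUP BY store ORDER BY store"
).fetchall()
print(sales_by_store)
```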

OLAP vs. OLTP

OLAP Operations

Slice: a subset of a multidimensional array
Dice: a slice on more than two dimensions
Drill Down/Up: navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
Roll Up: computing all of the data relationships for one or more dimensions
Pivot: used to change the dimensional orientation of a report or an ad hoc query-page display
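The operations above can be sketched on a tiny in-memory cube. This is an illustration in plain Python, not how an OLAP server stores data: the cube contents, the dimension names, and the helper functions (slice_cube, dice_cube, roll_up, pivot) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, quarter) -> sales volume.
CUBE = {
    ("Widget", "East", "Q1"): 10,
    ("Widget", "East", "Q2"): 12,
    ("Widget", "West", "Q1"): 7,
    ("Gadget", "East", "Q1"): 4,
    ("Gadget", "West", "Q2"): 9,
}
DIMS = ("product", "region", "quarter")

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def roll_up(cube, keep):
    """Roll up: aggregate away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)

def pivot(cube, row_dim, col_dim):
    """Pivot: lay out two dimensions as rows and columns, summing the rest."""
    table = defaultdict(dict)
    for (r, c), v in roll_up(cube, [row_dim, col_dim]).items():
        table[r][c] = v
    return dict(table)

widget_slice = slice_cube(CUBE, "product", "Widget")
diced = dice_cube(CUBE, product={"Widget"}, region={"East", "West"}, quarter={"Q1"})
by_product = roll_up(CUBE, ["product"])
region_by_quarter = pivot(CUBE, "region", "quarter")
```

Drilling up corresponds to calling roll_up with fewer dimensions, and drilling down to calling it with more, until the full cell-level detail in CUBE is reached.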

OLAP

Slicing Operations on a Simple Three-Dimensional Data Cube

[Figure: a 3-dimensional OLAP cube with slicing operations. Axes: Product, Time, Geography. Cells are filled with numbers representing sales volumes. Slices shown: sales volumes of a specific product on variable time and region; sales volumes of a specific region on variable time and products; sales volumes of a specific time on variable region and products.]

Variations of OLAP

Multidimensional OLAP (MOLAP): OLAP implemented via a specialized multidimensional database (or data store) that summarizes transactions into multidimensional views ahead of time
Relational OLAP (ROLAP): the implementation of an OLAP database on top of an existing relational database
Database OLAP and Web OLAP (DOLAP and WOLAP); Desktop OLAP, ...
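The difference between the first two variations comes down to when the summaries are computed. The sketch below contrasts the two in Python: a ROLAP-style function aggregates with SQL at query time, while a MOLAP-style store holds totals built ahead of time. The sales table, the "ALL" marker for a summarized dimension, and both helper functions are hypothetical simplifications.

```python
import sqlite3
from collections import defaultdict

ROWS = [("Widget", "East", 10), ("Widget", "West", 7), ("Gadget", "East", 4)]

# ROLAP-style: data stay in relational tables and are aggregated at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)

def rolap_units(product):
    """Compute the total on demand with a SQL aggregate."""
    row = conn.execute(
        "SELECT SUM(units) FROM sales WHERE product = ?", (product,)
    ).fetchone()
    return row[0]

# MOLAP-style: summaries are built ahead of time into a multidimensional store.
molap_store = defaultdict(int)
for product, region, units in ROWS:
    molap_store[(product, region)] += units  # cell-level value
    molap_store[(product, "ALL")] += units   # pre-summarized over region

def molap_units(product):
    """Look up a total that was computed when the store was built."""
    return molap_store[(product, "ALL")]

print(rolap_units("Widget"), molap_units("Widget"))
```

The trade-off in this sketch is the usual one: the precomputed store answers quickly but must be rebuilt when the data change, while the relational query always reflects the current rows at the cost of computing the aggregate each time.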