
09. Data Warehouse (DW) & On-line Analytic Processing (OLAP)

Rev: Feb, 2013

Euiho (David) Suh, Ph.D.

POSTECH Strategic Management of Information and Technology Laboratory (POSMIT: http://posmit.postech.ac.kr)

Dept. of Industrial & Management Engineering, POSTECH

Contents

1 Data Warehouse

1) Introduction of Data Warehouse

2) Concepts for Data Warehouse

3) Difficulties and Trends

2 On-line Analytic Processing (OLAP)

1) Introduction of OLAP

2) Concepts for OLAP

3 Case Study

Definition of Data Warehouse

1. Data Warehouse
1) Introduction of Data Warehouse

■ Data Warehouse
– Stores static data that has been extracted from other databases in an organization
– Central source of data that has been cleaned, transformed, and cataloged
– Data is used for data mining, analytical processing, analysis, research, and decision support

■ A data warehouse is a collection of data in support of management’s decisions
– Integrated
– Non-volatile
– Time variant

[Figure: Scattered Information → Cleaned Data Warehouse → Query & Distribute to End User. Chart labels: Sales, HR, Cost, Finance, Bond, Customer; axis ticks 0, 50, 100]
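The flow from scattered information to a cleaned warehouse that end users query can be sketched in a few lines of Python. This is only an illustrative sketch using the standard `sqlite3` module; the source lists (`sales_system`, `branch_file`), the `sales_fact` table, and all values are made-up assumptions, not part of the original slides.

```python
import sqlite3

# Hypothetical "scattered" records from two separate operational sources.
sales_system = [("2013-01-05", " Seoul ", "120.5"), ("2013-01-06", "busan", "80")]
branch_file = [("2013-01-05", "SEOUL", "30.0"), ("2013-01-07", "Pohang", None)]


def clean(rows, source):
    """Standardize city names, convert amounts, and drop incomplete rows."""
    for day, city, amount in rows:
        if amount is None:
            continue  # cleaning step: skip records with missing values
        yield day, city.strip().title(), float(amount), source


# The warehouse is an in-memory database here, purely for illustration.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (day TEXT, city TEXT, amount REAL, source TEXT)"
)
for name, rows in (("sales_system", sales_system), ("branch_file", branch_file)):
    warehouse.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)", clean(rows, name)
    )
warehouse.commit()

# End users query the integrated store instead of each source system.
totals = warehouse.execute(
    "SELECT city, SUM(amount) FROM sales_fact GROUP BY city ORDER BY city"
).fetchall()
print(totals)
```

The two Seoul records, spelled differently in each source, end up under one key after cleaning, which is the point of integrating before loading.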

Data Warehouse Architecture

■ Data Warehouse architecture

[Figure: Data warehouse architecture, in two parts: "Building the Data Warehouse" and "Use of Data Warehouse". Labels: Source Data (External file, OLTP System, Back up file); Data Warehouse; Data Mart; MDB; RDB; Enterprise server; Workgroup server; Query, Reporting tool; OLAP tool; Datamining Application; EIS/DSS Application; Web browser; Slice/Dice; SQL (on the connections); Infra, Data integration and Administration; Application development, Data access & Use]

Data Warehouse Architecture

■ Technical architecture for a data warehousing system

[Figure: Components of a data warehousing system]
– Data Acquisition Component
– Design Component
– Data Manager Component
– Information Directory Component
– Data Delivery Component
– Middleware Component
– Data Access Component
– Management Component
– Data shown: source data, external data, external metadata, warehouse data, warehouse metadata

Introduction of Database

1. Data Warehouse
2) Concepts for Data Warehouse

■ Definition of database
– Integrated collection of logically related data elements

■ Common Database Structures (Types)
– Hierarchical
  • Early DBMS structure
  • Records arranged in tree-like structure
  • Relationships are one-to-many
– Network
  • Used in some mainframe DBMS packages
  • Many-to-many relationships
– Relational
  • Most widely used structure
  • Data elements are stored in tables
  • Row represents a record; column is a field
  • Can relate data in one file with data in another, if both files share a common data element
– Multidimensional
  • Variation of relational model
  • Uses multidimensional structures to organize data
  • Data elements are viewed as being in cubes
  • Popular for analytical databases that support Online Analytical Processing (OLAP)
– Object-Oriented
  • Store data together with the appropriate methods for accessing it, i.e. encapsulation
  • Information is represented in the form of objects as used in object-oriented programming

[Figures: Relational Structure; Object-Oriented Structure]
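As a small illustration of the relational structure, the sketch below relates two tables through a shared data element. It uses Python's standard `sqlite3` module; the `department` and `employee` tables, the `dept_id` key, and the sample names are assumptions made up for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Finance');
    INSERT INTO employee VALUES (1, 'Kim', 10), (2, 'Lee', 20), (3, 'Park', 10);
    """
)

# Each row is a record and each column is a field; dept_id is the common
# data element that relates the two tables.
rows = db.execute(
    """
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
    """
).fetchall()
print(rows)
```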

Metadata and Data Marts

■ Metadata
– Data about data (similar to a catalog card in a library)
– Defines the data in the data warehouse
– Makes it easier and faster to find data in the data warehouse

■ Data Marts
– Collection of databases
– Compared with a data warehouse, data marts are usually smaller and focus on a particular subject or department
– Data marts are subsets of a larger data warehouse

■ Data Warehouse vs. Data Mart
– Data in Data Warehouse
  • The data needs to be gathered from all the relevant transactional systems that produce it, cleansed and validated, and made available from a system-of-record that ensures the referential integrity of the data
– Data in Data Mart
  • The data needs to be presented in a structure that is intuitive to the users and facilitates their ability to query the data that is relevant to their needs
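One way to picture a data mart as a subject-specific subset of the warehouse is a view over a warehouse table, with a small metadata dictionary describing each store. This is a simplified sketch: real data marts are often separate physical stores, and the table, view, and descriptions below are hypothetical.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript(
    """
    CREATE TABLE warehouse_fact (
        day TEXT, department TEXT, account TEXT, amount REAL
    );
    INSERT INTO warehouse_fact VALUES
        ('2013-01-05', 'Sales',   'revenue',  200.0),
        ('2013-01-05', 'Finance', 'interest',  15.0),
        ('2013-01-06', 'Sales',   'revenue',  120.0),
        ('2013-01-06', 'HR',      'payroll',   90.0);

    -- Data mart: a smaller, department-specific subset of the warehouse.
    CREATE VIEW sales_mart AS
        SELECT day, account, amount
        FROM warehouse_fact
        WHERE department = 'Sales';
    """
)

# Metadata: data about the data, used to look up what each store contains.
metadata = {
    "warehouse_fact": "All cleaned transactions, every department",
    "sales_mart": "Sales department transactions only",
}

mart_rows = dw.execute(
    "SELECT day, amount FROM sales_mart ORDER BY day"
).fetchall()
print(metadata["sales_mart"], mart_rows)
```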

Information Flow

■ Data Warehouse built on top of DB

[Figure: Information flow. Labels: Internal / External Database (two sources), Data Warehouse, Metadata Repository, and Data Marts for Finance, Management Reporting, Accounting, Sales, Marketing]

Data Warehouse Components

■ Data Warehouse Components

Applications and Data Marts

■ Applications and Data Marts

Difficulties in implementing DW

1. Data Warehouse
3) Difficulties and Trends

■ Complete Alignment
– Make sure you have full involvement and buy-in from those who represent your users, the consumers of your data warehouse.

■ Iterative & Frequent Update
– Consider all aspects of the process: researching your data sources, capturing and transmitting that data to the data warehouse, transforming and loading it into the data warehouse, and accounting for its lineage.

■ Risk
– Make sure you develop a proper risk management plan.

Future Trends

■ Enterprise Data Warehouse
– The enterprise data warehouse, whether a single store or integrated data marts across a variety of platforms, yields a view of the operation previously unattainable
  by Don Hatcher, SAS

■ Real-time
– Organizations move to more real-time data transformation and seek to better leverage common metadata across applications
  by Allan Houpt, CA

■ Capacity
– The future of data warehousing is all about ever larger data warehouses - in fact I just read about a U.S. Government effort to create petabyte repositories
  by Roman Bukary, SAP Director of Market Strategy

Definition of OLAP

2. OLAP
1) Introduction of OLAP

■ OLAP (On-Line Analytical Processing)
– The dynamic enterprise analysis required to create, manipulate, animate and synthesize information from Enterprise Data Models
  * Providing OLAP: An IT Mandate, E.F. Codd (1993)
– FASMI (Fast Analysis of Shared Multidimensional Information)
  • This definition was first used in early 1995, and has not needed revision since
  * Pendse & Creeth (1995)

[Figure: FAST, ANALYSIS, SHARED, MULTIDIMENSIONAL, INFORMATION]

OLAP Architecture

■ OLAP Architecture

From OLTP to OLAP

2. OLAP
2) Concepts for OLAP

■ Data used in OLAP
– Sales data of June? (OLTP)
– Multi-dimensional data (having many features) (OLAP)

■ Direct Access: EUC (End-User Computing) Environment

■ From What to Why
– OLTP: Storing primitive data, supporting routine business operation (What)
– OLAP: Storing cumulative data, supporting business goal (Why)

[Figure: Information Source, Information Broker, Information Consumer]
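The difference between an OLTP-style question and an OLAP-style question can be sketched with two queries over the same data. This is an illustrative sketch only: the `sales` table, its columns, and the figures are assumptions, and both queries run on an ordinary relational table for simplicity.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (month TEXT, region TEXT, product TEXT, amount REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2013-05", "East", "Bond", 100.0),
        ("2013-06", "East", "Bond", 150.0),
        ("2013-06", "West", "Bond", 50.0),
        ("2013-06", "West", "Fund", 70.0),
    ],
)

# OLTP-style question ("what"): total sales for June.
june_total = db.execute(
    "SELECT SUM(amount) FROM sales WHERE month = '2013-06'"
).fetchone()[0]

# OLAP-style question (toward "why"): the same measure broken down along
# several dimensions at once, so that contributions can be compared.
by_region_product = db.execute(
    """
    SELECT region, product, SUM(amount)
    FROM sales
    WHERE month = '2013-06'
    GROUP BY region, product
    ORDER BY region, product
    """
).fetchall()
print(june_total, by_region_product)
```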

OLTP vs. OLAP

■ OLTP vs. OLAP

                 OLTP                             OLAP
Definition       On-Line Transaction Processing   On-Line Analytical Processing
Objective        Operational                      Analytical
Focus            Daily repetitious work           Decision support in organization
Developer        Computer expert                  End-user
User             Simple operator                  Special analyst
Storing          Current value                    Summarized and consolidated data
Use              Repetitive                       Unstructured
Response         Immediate                        Delayed
Data             Updated                          Summarized
Update           Field                            Recomputation
Amount of Data   Small                            Much
Data Structure   Complex                          Simple
Database         RDB                              MDB
Data period      Past, Current                    Past, Current, Future
Query type       Regular                          Irregular, Analytical

Enterprise IT Architecture

■ OLTP/OLAP Enterprise IT Architecture

Data Warehouse vs. OLAP Server

■ Data Warehouse vs. OLAP Server

                     Data Warehouse                    OLAP Server
Objective            Ready for all kinds of retrieval  Specialized retrieval
Characteristics      Data Storage                      Computation Engine
Query Type           Read only                         Read/Write
Response             Flexible                          Consistent, rapid
Content              Historical, present               Historical, present, future
Data Structure       Plain                             Multi-dimensional
Amount of Data       Huge, much detail                 Much, detail
Development period   A few months to years             A few weeks to months

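Two types of OLAP

■ MOLAP (Multidimensional OLAP)
[Figure: Clients send a Query to MD Processing running on an MDBMS and receive a Respond]

■ ROLAP (Relational OLAP)
[Figure: Clients send a Query to MD Processing, which sends SQL to an RDBMS; the SQL Respond goes back through MD Processing, and the clients receive a Respond]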
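The ROLAP arrangement, where a multidimensional processing layer turns a client request into SQL for a relational DBMS, might look roughly like the sketch below. The `build_sql` helper, the `sales` table, and the allowed dimension names are hypothetical; real ROLAP servers are far more elaborate.

```python
import sqlite3

ALLOWED_COLUMNS = {"region", "product", "month"}  # hypothetical dimensions


def build_sql(dimensions, filters):
    """Translate a multidimensional request into one SQL GROUP BY query."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    unknown = (set(dimensions) | set(filters)) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    select_list = ", ".join(dimensions)
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    sql = (
        f"SELECT {select_list}, SUM(amount) FROM sales "
        f"WHERE {where} GROUP BY {select_list} ORDER BY {select_list}"
    )
    return sql, list(filters.values())


# The RDBMS side: an ordinary relational table with made-up rows.
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE sales (month TEXT, region TEXT, product TEXT, amount REAL)")
rdbms.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2013-06", "East", "Bond", 150.0),
        ("2013-06", "West", "Bond", 50.0),
        ("2013-06", "West", "Fund", 70.0),
    ],
)

# Client request: sales by region, restricted to June.
sql, params = build_sql(["region"], {"month": "2013-06"})
result = rdbms.execute(sql, params).fetchall()
print(sql)
print(result)
```

Column names are checked against a fixed set before being placed in the SQL text, while filter values are passed as parameters.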

From RDB to MDB

■ Basic Data Structure of MDB & RDB
– RDB: OLTP, Data Warehouse
  • Table
  • Record (row), Field (column)
– MDB: OLAP
  • Cube
  • Dimension
  • Hierarchy

■ RDB as OLAP Server
– Cannot handle and represent multi-dimensional relationships well
– Cannot summarize data well

■ MDB as OLAP Server
– Gives many managerial viewpoints
– EUC (End-User Computing)
– Supports analysis functionality
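The cube, dimension, and hierarchy vocabulary can be illustrated with a tiny in-memory cube. This is a sketch under assumptions: the three dimensions (city, month, product), the month-to-quarter hierarchy, and all values are invented for the example, and a plain dictionary stands in for a real multidimensional database.

```python
from collections import defaultdict

# Cells of a small cube keyed by (city, month, product); values are made up.
cells = {
    ("Seoul", "2013-01", "Bond"): 10.0,
    ("Seoul", "2013-02", "Bond"): 20.0,
    ("Busan", "2013-01", "Bond"): 5.0,
    ("Busan", "2013-01", "Fund"): 7.0,
}


def quarter(month):
    """Map 'YYYY-MM' to 'YYYY-Qn' (time hierarchy: month -> quarter)."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def slice_cube(cube, product):
    """Slice: fix the product dimension and keep the remaining two."""
    return {(c, m): v for (c, m, p), v in cube.items() if p == product}


def roll_up_to_quarter(cube):
    """Roll up: aggregate the time dimension from month to quarter."""
    totals = defaultdict(float)
    for (city, month, product), value in cube.items():
        totals[(city, quarter(month), product)] += value
    return dict(totals)


bond_slice = slice_cube(cells, "Bond")
quarterly = roll_up_to_quarter(cells)
print(bond_slice)
print(quarterly)
```

Slicing and rolling up are each a single pass over the cells here, whereas the same questions on a relational table would need explicit filtering and grouping.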


Reference

■ Euiho Suh, “EIS_DSS_OLAP_DW (PPT Slide)”, POSMIT Lab. (POSTECH Strategic Management of Information and Technology Laboratory)

■ Euiho Suh, “OLAP (PPT Slide)”, POSMIT Lab. (POSTECH Strategic Management of Information and Technology Laboratory)

■ O’Brien & Marakas, “Introduction to Information Systems – Sixteenth Edition”, McGraw-Hill, Chapter 5