© 2005 BearingPoint, Inc.

DW Introduction

Data Warehouse Introduction

Kenneth Domantay - Senior Manager

Data and Knowledge Management


Table of Contents

DW Introduction

DW Architectures

DW Implementation Considerations


“The Data Warehouse”

- Origin to Architecture -


Definition:

Data Warehousing is a process, not a product. It is an approach to “properly” assemble, validate, consolidate and manage data from various sources, allowing business questions to be answered that could not be answered before.

It evolves through iterations; it is not a one-time process.


Characteristics of a Data Warehouse

A Data Warehouse:

• Is subject-oriented rather than application-oriented

• Contains integrated data

• Is nonvolatile: few, if any, updates or deletes

• Contains detail and summary data

• Contains current and historical data

• Is time variant: spans a range of time periods
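The nonvolatile and time-variant properties can be illustrated with a small in-memory sketch. It assumes a hypothetical `SnapshotStore` class and a `customer_id` key; a real warehouse would keep dated load records in the database rather than in Python objects.

```python
from datetime import date

# Minimal sketch of a nonvolatile, time-variant store (hypothetical class).
# Each load appends a dated snapshot; rows are never updated or deleted,
# so the state "as of" any past load date can be reconstructed.
class SnapshotStore:
    def __init__(self):
        self._rows = []  # append-only list of (load_date, record)

    def load(self, load_date, records):
        """Append a batch of records stamped with its load date."""
        for rec in records:
            self._rows.append((load_date, dict(rec)))

    def as_of(self, when):
        """Return the latest version of each customer loaded on or before `when`."""
        latest = {}
        for load_date, rec in sorted(self._rows, key=lambda row: row[0]):
            if load_date <= when:
                latest[rec["customer_id"]] = rec
        return latest


store = SnapshotStore()
store.load(date(2005, 1, 31), [{"customer_id": 1, "segment": "Retail"}])
store.load(date(2005, 2, 28), [{"customer_id": 1, "segment": "Private"}])
print(store.as_of(date(2005, 1, 31))[1]["segment"])  # Retail
print(store.as_of(date(2005, 3, 1))[1]["segment"])   # Private
```

Because the January row is never overwritten, the February load adds a new version instead of replacing the old one.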


Contrasting Environments

Aspect: OLTP vs. Data Warehousing

• Business Need: Transactional vs. Analytical

• Query: Simple vs. Complex

• Timeframe: Point-in-Time vs. Historical

• Business Question: Known vs. Unknown

• Environment: Static vs. Dynamic

• Usage: Structured vs. Unstructured
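To make the OLTP versus warehouse contrast concrete, the sketch below runs both kinds of query against the same hypothetical `orders` table in an in-memory SQLite database (SQLite is only a stand-in): a simple, known, point-in-time lookup and an analytical aggregation over history.

```python
import sqlite3

# Hypothetical orders table shared by both styles of query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 10, '2004-11-03', 120.0),
  (2, 11, '2004-12-15',  80.0),
  (3, 10, '2005-01-20', 200.0),
  (4, 12, '2005-02-02',  50.0);
""")

def order_amount(order_id):
    """OLTP style: simple, known, point-in-time lookup of one record."""
    row = conn.execute("SELECT amount FROM orders WHERE order_id = ?", (order_id,)).fetchone()
    return row[0]

def revenue_by_year():
    """Warehouse style: aggregate over history, grouped by time period."""
    rows = conn.execute(
        "SELECT substr(order_date, 1, 4) AS yr, SUM(amount) "
        "FROM orders GROUP BY yr ORDER BY yr"
    ).fetchall()
    return dict(rows)

print(order_amount(3))    # 200.0
print(revenue_by_year())  # {'2004': 200.0, '2005': 250.0}
```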


Cost Justification Examples of a DW

Business Drivers

• Improved management information: spreads and margins, asset utilization, asset quality, overhead control

• Improved customer quality: profit and risk

• Improved marketing effectiveness: focused, efficient, proactive, planned

• Etc.

Technology Drivers

• Offloading volumes from mainframes: reduced costs, improved responsiveness

• Reduced maintenance effort and costs: reduced development costs

• Etc.


Data Warehouse Architectures

- Enterprise
- Data Mart
- ODS
- Active DW


Enterprise - DATA WAREHOUSE ARCHITECTURE

Layers: SOURCE DATA LAYER → DATA ACQUISITION LAYER → DATA MANAGEMENT LAYER → USER ACCESS LAYER

Components: Internal Data, External Data and Data Entry (ASCII, Excel etc.) → MOM (message-oriented middleware) → Staging Area or ODS → Data Warehouse → Datamarts → Reports, OLAP, Data Mining, Knowledge Discovery etc. for Business Users

Roles: Source System Analyst, Data Acquisition Developer, Business Analyst, Data Modeler, DBA, OLAP Developer


Data Marts



Data Marts - Characteristics

• Departmental use

• Normally point-solution based (e.g. profitability)

• Normally not enterprise level (i.e. few subject areas)

• Focus on one or a few problems

• Easiest to build

• Often confused with Multidimensional Databases (MDDB)


Departmental Data Marts

Advantages
• Intuitive data navigation functionality
• Faster query performance

Issues
• No single version of the truth
• Lacks an enterprise model
• Limited cross-application analysis
• Little or no data transformation
• Limited data set


Independent versus Dependent Data Mart

An independent data mart is an isolated copy of existing data from operational and/or external systems, specially organized to serve a specific purpose. It typically serves a department or a specific group of users.

No DW involved

A dependent data mart is an integral subset of a data warehouse, organized by subject area to enhance access and performance. It sources its data from the Enterprise Data Warehouse, so the data remains consistent with the data warehouse to which it is connected.

DW involved

Hub & Spoke
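A dependent data mart can be sketched as a subject-area subset derived only from the warehouse. The table names (`edw_fact`, `sales_mart`) and the data are hypothetical, and SQLite stands in for the warehouse DBMS; the point is that a mart built this way reconciles with its source by construction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise warehouse fact table spanning several subject areas.
CREATE TABLE edw_fact (subject_area TEXT, region TEXT, period TEXT, amount REAL);
INSERT INTO edw_fact VALUES
  ('SALES', 'North', '2005-01', 100.0),
  ('SALES', 'South', '2005-01', 150.0),
  ('SALES', 'North', '2005-02', 120.0),
  ('HR',    'North', '2005-01',  40.0);

-- Dependent mart: sourced only from the EDW, restricted to one subject area
-- and pre-aggregated for the sales department.
CREATE TABLE sales_mart AS
  SELECT region, period, SUM(amount) AS sales
  FROM edw_fact
  WHERE subject_area = 'SALES'
  GROUP BY region, period;
""")

def reconciles():
    """The mart total should equal the EDW total for the same subject area."""
    mart_total = conn.execute("SELECT SUM(sales) FROM sales_mart").fetchone()[0]
    edw_total = conn.execute(
        "SELECT SUM(amount) FROM edw_fact WHERE subject_area = 'SALES'"
    ).fetchone()[0]
    return mart_total == edw_total

print(conn.execute("SELECT COUNT(*) FROM sales_mart").fetchone()[0])  # 3
print(reconciles())  # True
```

An independent mart, by contrast, would be loaded straight from operational extracts, so nothing guarantees this reconciliation.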


A success story becomes a failure

3+ years later = re-architect / redo

Clients may spend millions of dollars recreating the current/past problem

Easy to start; harder as subject areas are added

4+ subject areas = problems

* Data Mart Consolidation Sales Opportunity


Why Data Marts Fail - General

• You need to know the questions and answers up front; new or changed requirements are hard to accommodate or extend

• Little to no data transformation
– Dirty data is the biggest challenge
– How do you know how dirty your data is?
– Issues are left for later and for someone else (9 to 12 months later)

• Tools are usually relied on to hide the problems, until it’s too late

• Architecture does not support long-term corporate goals (non-enterprise solution)

• Higher total cost of ownership to maintain, change and fix


Operational Data Store (ODS)



ODS Vs. DW - High level

Operational Data Store
- Subject oriented (normally)
- Integrated
- May contain operational data
- Volatile (updates are normal)
- Limited historical data
- Only detailed data

Data Warehouse
- Subject oriented
- Integrated
- Should not contain operational data
- Nonvolatile (updates are not normal)
- Historical data (1+ years)
- Detailed and summary data

At first glance it may appear that the data warehouse and the ODS are the same thing. They are decidedly not the same thing.
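The difference between an ODS and a DW shows up in how the same feed is stored. In this sketch (hypothetical feed and names), the ODS keeps only the current balance per account, overwriting in place, while the warehouse appends every dated version, so only the warehouse can answer history questions.

```python
# Hypothetical daily balance feed: (account_id, business_date, balance)
feed = [
    ("A1", "2005-01-01", 100.0),
    ("A2", "2005-01-01", 50.0),
    ("A1", "2005-01-02", 130.0),
]

ods = {}        # ODS: current value only, updated in place (volatile)
warehouse = []  # DW: every dated version kept (nonvolatile, historical)

for account_id, business_date, balance in feed:
    ods[account_id] = balance                               # overwrite
    warehouse.append((account_id, business_date, balance))  # append only

def balance_history(account_id):
    """History questions can only be answered from the warehouse copy."""
    return [(d, b) for a, d, b in warehouse if a == account_id]

print(ods)                    # {'A1': 130.0, 'A2': 50.0}
print(balance_history("A1"))  # [('2005-01-01', 100.0), ('2005-01-02', 130.0)]
```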


Active or Near-Real-Time Data Warehouse

(Next Generation)


What is an Active Data Warehouse?

• Active Data Warehousing
– Also for in-the-field / “tactical” decision makers
– Day-to-day decision making
– Tactical focus with strategic implications
– Real-time to near-real-time ETL and access (fact vs. fiction: watch for cutting corners)

• Traditional Data Warehousing
– More for “strategic” decision makers
– Long-term decision making
– Strategic focus

Business needs both strategic and tactical decision support capabilities.


Active Vs. Traditional DW

Traditional DW (Static) vs. Active DW (Dynamic):

• Decisions: strategic decisions focus vs. also drives tactical decisions

• Analysis: highly parameterized reporting, often using pre-built summary tables or data marts vs. complex data mining to discover new hypotheses rather than confirm prior ones

• Usage: limited feedback loop or event-based usage vs. high feedback loop and event-based activity

• Measurement: results sometimes hard to measure vs. results measured with operations

• Data currency: daily, weekly or monthly is acceptable and summaries are often appropriate vs. within minutes, where only comprehensive detailed data is acceptable

• Users: power users, knowledge workers, internal users vs. operational staff, call centers, external users


Active Data Warehouse Process/Flow

• Operational Systems feed the Active Data Warehouse through Extraction & Transformation and Load, as continuous data feeds (event triggers, etc.)

• The Active Data Warehouse supports automated data analysis, pre-defined queries, ad-hoc analysis and data mining

• Results support tactical decisions; consumers shown include Finance, EIS and the Web Site

• Actions are triggered automatically from event definitions and pre-defined rules (action triggered, automated)
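The rule-driven part of this flow can be sketched as a list of pre-defined rules evaluated against each incoming event. The rule names, thresholds and action codes are hypothetical; a production active DW would more likely rely on database triggers, a rules engine or messaging middleware.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    action: str  # hypothetical action code returned to an operational system

# Hypothetical pre-defined rules; real thresholds would come from the business.
RULES = [
    Rule("large_withdrawal",
         lambda e: e.get("type") == "withdrawal" and e.get("amount", 0) > 10_000,
         "ALERT_FRAUD_TEAM"),
    Rule("stock_out",
         lambda e: e.get("type") == "stock" and e.get("on_hand") == 0,
         "REORDER"),
]

def process(event: dict) -> List[str]:
    """Evaluate every rule against one incoming event; return triggered actions."""
    return [rule.action for rule in RULES if rule.condition(event)]

for event in [
    {"type": "withdrawal", "amount": 15_000},
    {"type": "stock", "on_hand": 0},
    {"type": "withdrawal", "amount": 200},
]:
    print(event, "->", process(event))
```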


DW Implementation Considerations

Implementation Methodology – MIKE

1 – Discovery: Requirements Identification / Gathering
2 – Data Modeling
3 – Database Design
4 – ETL
5 – Information Reporting / Access / OLAP
6 – Data Mining
7 – Data Quality
8 – Metadata
9 – Infrastructure


MIKE Methodology Overview


Requirements Gathering / Identification

1 – Discovery


Business Discovery – High level

Business Discovery is a process through which organizations determine and validate:

• Key business objectives

• Key issues that influence the achievement of the business objectives

• Applications
– Business need for the application
– Potential for payback

• Implementation list of priorities: which CSFs, KPIs and reports to start with and end with


Information Discovery – Mid level

Information Discovery is a process through which organizations clarify scope and gain direction by:

• Reviewing business requirements results / priorities
• Identifying information requirements for pain points
• Assessing and validating data requirements, issues and gaps
• Designing a high-level business model
• Discussing project constraints and issues
• Determining which pain points to pursue for the initial project

Functional specifications can be driven from the results.


2 - Data Modeling


What is a Logical Data Model?

• A picture of the organization's data and the relationships between them (boxes, attributes and lines)

• A process to break down the complexity of an organization's data into manageable portions

• A tool used by “Modelers” to collect, discuss and validate data and relationships with “Business Users”

• One of the “first steps” in the creation of anything related to a Data Warehouse

• The blueprint for the construction of a database (the Physical Data Model, PDM)

Example entities: Customer, Loan Facility, Collateral, Loan Drawdown, Credit Check, Application, Transaction, Accounts, Loan Delinquent History


Why create an LDM?

Portable method to identify, validate and integrate requirements

Helps resolve problems when integrating data from multiple systems:

• Lack of common corporate definitions/standards (what does it mean?)
• Data redundancy between different systems, which causes inaccurate and inconsistent reporting (goal: one fact in one place)
• Identifying whether data is supposed to exist: relationship validation process (nulls)
• Referential integrity (optional vs. mandatory, 0:1:M)
• Inadequate or nonexistent metadata
• Single view of the customer and related information
• Full vs. partial view of requirements (current / future planning)

Bonus = decreased development and maintenance time and cost (cheap to add/change upstream rather than downstream in the SDLC)
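Relationship rules captured in the LDM (mandatory vs. optional, 1:M) eventually become enforceable constraints. This sketch uses SQLite as a stand-in, with hypothetical `customer` and `account` tables: `NOT NULL` plus a foreign key makes the customer relationship mandatory, while `opened_on` stays optional. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (
  customer_id INTEGER PRIMARY KEY,
  name        TEXT NOT NULL
);

-- Mandatory 1:M relationship: every account belongs to exactly one customer.
CREATE TABLE account (
  account_id  INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
  opened_on   TEXT  -- optional attribute: NULL allowed
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'David Jones')")
conn.execute("INSERT INTO account VALUES (100, 1, NULL)")  # valid: customer exists

def try_insert_account(account_id, customer_id):
    """Return 'accepted' or 'rejected' depending on the integrity rules."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, '2005-01-01')",
                     (account_id, customer_id))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_insert_account(101, 1))     # accepted
print(try_insert_account(102, 999))   # rejected: no such customer
print(try_insert_account(103, None))  # rejected: relationship is mandatory
```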


Logical Data Modeling - LDM

3NF example entities: Application CIF, Extension CIF, Facility Account, Application Facility Account, Customer Account Relationship, Application Transaction, Application Monthly Summary, Application Daily Summary

Star schema example: FACTS (Sales $, Revenue, Volume) surrounded by DIMENSIONS (Time, Geography, Product, Currency)

3NF - General (Common) vs. Star Schema - Customized (Complex)
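A star schema for the slide's example might look like the following: one fact table holding the measures (sales amount, revenue, volume) keyed to the Time, Geography, Product and Currency dimensions. Table and column names and the sample rows are hypothetical, and SQLite is used only so the sketch runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for the facts.
CREATE TABLE dim_time      (time_id     INTEGER PRIMARY KEY, month   TEXT);
CREATE TABLE dim_geography (geo_id      INTEGER PRIMARY KEY, region  TEXT);
CREATE TABLE dim_product   (product_id  INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE dim_currency  (currency_id INTEGER PRIMARY KEY, code    TEXT);

-- Fact table: one row per time/geography/product/currency combination,
-- holding the numeric measures.
CREATE TABLE fact_sales (
  time_id      INTEGER REFERENCES dim_time,
  geo_id       INTEGER REFERENCES dim_geography,
  product_id   INTEGER REFERENCES dim_product,
  currency_id  INTEGER REFERENCES dim_currency,
  sales_amount REAL,
  revenue      REAL,
  volume       INTEGER
);

INSERT INTO dim_time      VALUES (1, '2005-01'), (2, '2005-02');
INSERT INTO dim_geography VALUES (1, 'Hong Kong'), (2, 'Singapore');
INSERT INTO dim_product   VALUES (1, 'Savings'), (2, 'Loan');
INSERT INTO dim_currency  VALUES (1, 'HKD');
INSERT INTO fact_sales VALUES
  (1, 1, 1, 1, 1000.0, 80.0, 10),
  (1, 2, 2, 1,  500.0, 40.0,  5),
  (2, 1, 2, 1,  700.0, 60.0,  7);
""")

def volume_by_region():
    """Typical star query: join the fact to one dimension and aggregate."""
    sql = """
      SELECT g.region, SUM(f.volume)
      FROM fact_sales AS f
      JOIN dim_geography AS g ON f.geo_id = g.geo_id
      GROUP BY g.region
      ORDER BY g.region
    """
    return dict(conn.execute(sql).fetchall())

print(volume_by_region())  # {'Hong Kong': 17, 'Singapore': 5}
```

The 3NF alternative would spread the same data over many normalized entities; the star layout trades that generality for simpler, faster analytical joins.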


3 - Data Base Design


Transpose LDM to PDM

• Translate the logical data model into a PDM design

• Identify and design database tables, columns and indexes

• Identify and design views (pre-canned SQL statements)

• Pursue capacity and “performance” assessment (later)

• Optimize the PDM design for specific, actual reasons (later)
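The LDM-to-PDM steps (tables, columns, indexes, views) can be sketched for one logical entity, Loan Facility. The physical names, the choice of index and the `ACTIVE` status filter are assumptions for illustration, with SQLite as a stand-in DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical entity "Loan Facility" translated into a physical table.
CREATE TABLE loan_facility (
  facility_id  INTEGER PRIMARY KEY,
  customer_id  INTEGER NOT NULL,
  limit_amount REAL    NOT NULL,
  status       TEXT    NOT NULL
);

-- Index designed for an expected access path: facilities by customer.
CREATE INDEX ix_loan_facility_customer ON loan_facility (customer_id);

-- View: a pre-canned SQL statement that hides the status filter from users.
CREATE VIEW v_active_facilities AS
  SELECT facility_id, customer_id, limit_amount
  FROM loan_facility
  WHERE status = 'ACTIVE';

INSERT INTO loan_facility VALUES
  (1, 10, 50000, 'ACTIVE'),
  (2, 10, 20000, 'CLOSED'),
  (3, 11, 75000, 'ACTIVE');
""")

def active_limit_total():
    return conn.execute("SELECT SUM(limit_amount) FROM v_active_facilities").fetchone()[0]

def index_names():
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    return [name for (name,) in rows]

print(active_limit_total())  # 125000.0
print(index_names())         # ['ix_loan_facility_customer']
```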


4 - Extraction Transformation Load

(ETL)


DW ETL Flow

Operational Systems → Extract → Transform → Load → Informational Processing (Data Warehouse: subject-oriented, historical)

Transforming legacy data to data warehouse data:

• Extract, integrate, summarize, filter, convert
• Set default values, restructure, reformat, establish time variance, create consistency

60-80% of time and effort is spent during ETL
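The transformation steps can be sketched as a small Python pipeline: convert and reformat each legacy row, set default values, filter invalid rows, then summarize by month and branch. The record layout, the `DD/MM/YYYY` source format, the `UNKNOWN` default and the "negative amount is invalid" rule are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical legacy extract: DD/MM/YYYY dates and amounts stored as text.
legacy_rows = [
    {"acct": "001", "date": "31/01/2005", "amt": "100.50", "branch": "HK"},
    {"acct": "002", "date": "31/01/2005", "amt": "",       "branch": None},
    {"acct": "003", "date": "28/02/2005", "amt": "75.00",  "branch": "KLN"},
    {"acct": "004", "date": "28/02/2005", "amt": "-1",     "branch": "HK"},
]

def transform(row):
    """Convert and reformat one row, setting defaults for missing values."""
    return {
        "account_id": int(row["acct"]),  # convert text to integer
        # reformat the date into a monthly period (establishes time variance)
        "month": datetime.strptime(row["date"], "%d/%m/%Y").strftime("%Y-%m"),
        "amount": float(row["amt"]) if row["amt"] else 0.0,  # default value
        "branch": row["branch"] or "UNKNOWN",                 # default value
    }

def run_etl(rows):
    """Transform, filter and summarize; the returned dict stands in for the load target."""
    cleaned = [transform(r) for r in rows]
    valid = [r for r in cleaned if r["amount"] >= 0]  # filter: drop negative amounts
    summary = defaultdict(float)
    for r in valid:
        summary[(r["month"], r["branch"])] += r["amount"]  # summarize
    return dict(summary)

for key, total in sorted(run_etl(legacy_rows).items()):
    print(key, total)
```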


Transformation challenges

Data Integrity
• No time basis for data
• Inconsistent extract criteria
• Missing / inconsistent data
• No common source of data
• Uncontrolled use of external data

Reduced Productivity
• Extended time to do analysis
• Customized extract programs
• Tedious activities for IT staff


5a - Information Reporting


Decision Making Evolution Process

• STAGE 1 – WHAT happened? Pre-defined reports. Example: “Show Nokia Hand Phone models 20% or more below plan”

• STAGE 2 – WHY did it happen? Ad hoc queries. Example: “Show Nokia Hand Phone models 20% or more below plan having zero inventory”

• STAGE 3 – WHAT WILL happen? Analytical modeling.

Ad hoc queries provide the value (Stage 1); evolving queries are more complex (Stage 2); data mining and forecasting are evolutionary (Stage 3).
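The two example questions can be expressed as SQL. The `model_sales` table, model names and figures below are hypothetical (SQLite is a stand-in); the second query adds the zero-inventory condition to the first, mirroring how Stage 2 questions evolve from Stage 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE model_sales (
  model TEXT PRIMARY KEY, plan_units INTEGER, actual_units INTEGER, inventory INTEGER
);
INSERT INTO model_sales VALUES
  ('Model A', 1000, 750,   0),
  ('Model B', 1000, 950,   0),
  ('Model C',  500, 300, 120),
  ('Model D',  800, 640,   0);
""")

# Stage 1: models 20% or more below plan.
# actual <= 0.8 * plan is written as 5 * actual <= 4 * plan to stay in integers.
STAGE_1 = """
  SELECT model FROM model_sales
  WHERE 5 * actual_units <= 4 * plan_units
  ORDER BY model
"""

# Stage 2: the same question, narrowed to models with zero inventory.
STAGE_2 = """
  SELECT model FROM model_sales
  WHERE 5 * actual_units <= 4 * plan_units AND inventory = 0
  ORDER BY model
"""

def run(sql):
    return [row[0] for row in conn.execute(sql)]

print(run(STAGE_1))  # ['Model A', 'Model C', 'Model D']
print(run(STAGE_2))  # ['Model A', 'Model D']
```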


Categorizations - Information Reporting

• EIS: Strategic

• DSS - OLAP: Tactical

• Application/Operational Reporting: Operational

(by level of the organization)


EIS Perspective

• Performance against strategic objectives is measured and reported:

– According to how the executive works

– Involves pre-defined requirements and formats

– “Minimal” hands-on computer activity is required

– Graphical performance indicators

– Performance trend information

– Measurement data and bar charts


DSS - Perspective

• Performance against strategic objectives

- Minimal pre-defined format

- Report writing capabilities

- Analytic features: ad-hoc query functionality, cross-dimensional calculations, filtering

– Hands-on computer activity required!


5b - Information Access


Information Access Tools - Categories

Information access tools can be classified into the following types:

1 - Reporting & query tools (e.g. multidimensional tools, spreadsheets, data visualization, development tools such as Visual Basic and PowerBuilder, EIS)

2 - Decision Support Systems (DSS) with:
• Multidimensional OLAP - MOLAP
• Relational OLAP - ROLAP
• Desktop OLAP - DOLAP
• Hybrid OLAP - HOLAP

3 - Executive Information Systems (EIS)

“Business Intelligence tools help organizations provide end users with improved access to data, enhancing their decision-making ability.”


Information Access - Performance Ladder

Tool efficiency, from the top of the ladder to the bottom:
• EIS (OLAP) on MDDB
• DSS (OLAP) on MDDB
• DSS (OLAP) on relational
• Report (SQL) access tool

Queries / reporting data access activity “within tool”: static = higher performance; dynamic = low to medium performance


5c – OLAP

(On-Line Analytical Processing)


OLAP - General Characteristics

• Allows users to view data in multiple dimensions (multidimensional analysis)

• Data is logically organized as multidimensional arrays (cubes)

• Architected to quickly manipulate and display data in different combinations using an OLAP engine

[Figure: a cube with Customer, Product and Month dimensions (sample members such as Smith and Miller; R/3, R/2, BW and APO; 01.’98 to 03.’98), shown alongside sample monthly charts]
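The cube idea can be sketched in plain Python: cells keyed by (customer, product, month), with roll-up (aggregating dimensions away) and slice (fixing one dimension). The member names echo the figure labels, but the values are hypothetical; a real OLAP engine precomputes and indexes these aggregates.

```python
from collections import defaultdict

DIMENSIONS = ("customer", "product", "month")

# Hypothetical cube cells: (customer, product, month) -> revenue.
cube = {
    ("Smith",  "R/3", "1998-01"): 40,
    ("Smith",  "BW",  "1998-01"): 10,
    ("Miller", "R/3", "1998-01"): 25,
    ("Miller", "R/3", "1998-02"): 30,
    ("Smith",  "BW",  "1998-02"): 15,
}

def roll_up(cells, keep):
    """Aggregate away every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[p] for p in positions)] += value
    return dict(totals)

def slice_cube(cells, dim, member):
    """Fix one dimension to a single member, e.g. month == '1998-01'."""
    p = DIMENSIONS.index(dim)
    return {key: value for key, value in cells.items() if key[p] == member}

print(roll_up(cube, ["product"]))                                   # {('R/3',): 95, ('BW',): 25}
print(roll_up(slice_cube(cube, "month", "1998-01"), ["customer"]))  # {('Smith',): 50, ('Miller',): 25}
```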


How does OLAP fit?

Data Warehouse / Data Mart

OLAP

e-intelligence

Balanced Scorecard


6 - Data Mining


Data Mining

Data Mining = concept and/or technology

• Provides insight and understanding: identifies patterns, relationships, rules

• Predictive analysis: builds forecasting models from generated rules

Example rule: When Age > 35 & married & ... then buys gold card - 40%; needs schools plan - 32%; needs family a/c - 63%
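A rule such as “Age > 35 & married → buys gold card” is usually judged by its support and confidence. The sketch below computes both over hypothetical customer records (chosen so that the confidence matches the slide's 40%). It evaluates a given rule rather than discovering rules, which is what mining algorithms such as Apriori do.

```python
# Hypothetical customer records (illustrative only).
customers = [
    {"age": 42, "married": True,  "gold_card": True},
    {"age": 51, "married": True,  "gold_card": False},
    {"age": 38, "married": True,  "gold_card": True},
    {"age": 45, "married": True,  "gold_card": False},
    {"age": 60, "married": True,  "gold_card": False},
    {"age": 29, "married": False, "gold_card": True},
]

def rule_stats(rows, antecedent, consequent):
    """Support and confidence of the rule `antecedent -> consequent`.

    support    = share of all rows matching both sides
    confidence = share of antecedent rows that also match the consequent
    """
    matching = [r for r in rows if antecedent(r)]
    both = [r for r in matching if consequent(r)]
    support = len(both) / len(rows) if rows else 0.0
    confidence = len(both) / len(matching) if matching else 0.0
    return support, confidence

support, confidence = rule_stats(
    customers,
    antecedent=lambda r: r["age"] > 35 and r["married"],
    consequent=lambda r: r["gold_card"],
)
print(f"support={support:.2f}, confidence={confidence:.0%}")  # support=0.33, confidence=40%
```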


Data Mining - Considerations

Data mining
• Gets answers to questions that are difficult to “visualize”
• Can be manual (using OLAP tools) or use purpose-built tools (data mining tools)
• Solutions, e.g. market segmentation, decision trees (understanding the decision process)
• Various techniques available (statistical, neural networks, genetic algorithms etc.)

Issues
• Availability of clean data is the key
• Normally considered “after” the Data Warehouse is mature


7 - Data Quality


Problems with Legacy Data

Data fragmented across multiple systems and platformsExtensive data redundancy between different application systemsLack of corporate data standardsInadequate or nonexistent meta dataUser perception of data quality not based on factsUser perception that a warehouse will fix data problemsMissing data from operational systemsData Integrity - Inconsistent,Incorrect, Incompatible Etc..


Data Integrity Problems - Examples

• Same HK ID number, different name spellings: David Jones; David Johns; David G. Jones etc.

• Use of old (non-standard) address codes: HK, H.K., Hong Kong, SARHK, etc.

• Multiple ways to denote a company name: BP, BP Ltd, Bearing Point

• Different account numbers generated by different applications for the same customer

• Invalid product codes collected at point of sale
– Manual entry leads to mistakes
– “In case of a problem use 999999999”

• Required fields left blank
– No enforcement of data collection rules
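Several of these checks can be automated as a first data-profiling pass. The sketch below standardizes location codes, flags placeholder and missing values, and finds IDs recorded with conflicting names. The synonym table, placeholder list, required fields and sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical reference data and rules.
LOCATION_SYNONYMS = {"HK": "HK", "H.K.": "HK", "HONG KONG": "HK", "SARHK": "HK"}
PLACEHOLDER_CODES = {"999999999"}  # "in case of a problem" values
REQUIRED_FIELDS = ("hk_id", "name", "product_code")

records = [
    {"hk_id": "A123456", "name": "David Jones", "location": "H.K.",      "product_code": "100200300"},
    {"hk_id": "A123456", "name": "David Johns", "location": "Hong Kong", "product_code": "999999999"},
    {"hk_id": "B654321", "name": "Mary Chan",   "location": "SARHK",     "product_code": ""},
]

def standardize_location(value):
    """Map non-standard codes to one value; leave unknown codes unchanged."""
    return LOCATION_SYNONYMS.get(value.strip().upper(), value)

def profile(rows):
    """Return (issues, conflicting names per ID) for a batch of records."""
    issues = []
    names_by_id = defaultdict(set)
    for i, row in enumerate(rows):
        names_by_id[row["hk_id"]].add(row["name"])
        for field in REQUIRED_FIELDS:
            if not row.get(field):
                issues.append((i, f"missing {field}"))
        if row.get("product_code") in PLACEHOLDER_CODES:
            issues.append((i, "placeholder product code"))
    conflicts = {k: sorted(v) for k, v in names_by_id.items() if len(v) > 1}
    return issues, conflicts

print([standardize_location(r["location"]) for r in records])  # ['HK', 'HK', 'HK']
issues, conflicts = profile(records)
print(issues)     # [(1, 'placeholder product code'), (2, 'missing product_code')]
print(conflicts)  # {'A123456': ['David Johns', 'David Jones']}
```

Profiling like this only measures how dirty the data is; deciding which spelling is correct still needs business rules or data stewards.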


8 - Metadata


Metadata

• Data about data (134584 = data; the definition of Customer ID = metadata)

• Defines data structures, definitions of measures, transformation rules etc.

• Used to understand how and what data is stored
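A metadata repository entry might record the definition, source-to-target mapping and transformation rule for a column. The `ColumnMetadata` structure, its field names and the `CIF.CUST_NO` source field are hypothetical, meant only to show how metadata gives meaning to a bare value such as 134584.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMetadata:
    business_name: str
    definition: str
    source: str          # source system field
    target: str          # warehouse table.column
    transformation: str  # rule applied during ETL
    steward: str         # accountable business owner

# Hypothetical repository entry for the warehouse column holding 134584.
CATALOG = {
    "dw.customer.customer_id": ColumnMetadata(
        business_name="Customer ID",
        definition="Unique identifier assigned to a customer across all source systems",
        source="CIF.CUST_NO",
        target="dw.customer.customer_id",
        transformation="Strip leading zeros and cast to integer",
        steward="Customer Data Management",
    ),
}

def describe(column, value):
    """Use metadata to explain what a bare data value means and where it came from."""
    meta = CATALOG[column]
    return f"{value} is a {meta.business_name}: {meta.definition} (source: {meta.source})"

print(describe("dw.customer.customer_id", 134584))
```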


Metadata – Many Islands

Each tool keeps its own separate metadata (M):

• ETL development activity (automated ETL tool)
• Data model tool
• Data stores (DB2, Oracle, etc.)
• MDDB
• DSS
• EIS
• Data mining


Meta Data - Views

Business View
• Business subject areas and business items
• DSS queries, DSS reports, DSS access tools, DSS views
• Stewards, locations, keywords, synonyms

Technical View (construction environment)
• Source and target data stores (DW / data mart / MDDB) and data store columns
• Tables, mapping groups, columns, transformations
• Modeling, physical design, mapping, DDL/DML, transformation (technically enabled)

Supported by ETL automated tools / a metadata repository


9 - INFRASTRUCTURE

(Ongoing)


System Management & Operations

Organisational issues
• Roles, ops procedures, security, changes etc.

Systems Management
• Resource management/utilisation
• Configuration & change management
• Distributed data warehouse
• Operations management (archive/purge, backup etc.)

Policies
• SLAs & charge-back
• Data flowback policy
• Time lag on updates (OLTP vs. DSS)
• Responsibilities and roles


The END!

PCCW - KSD