data-ware housing


Upload: profnilesh-magar

Post on 29-Nov-2014


Page 1: Data-ware Housing

Data-ware Housing

Page 2: Data-ware Housing

Introduction

Page 3: Data-ware Housing

Definition:

Simplest perception: No more than a collection of the key pieces of information used to manage & direct the business for the most profitable outcome.

Precise definition: It concentrates on the data. Data should be subject-oriented, consistent across sources & so on.

Pearson’s definition: It is more than vast data; it is also the process involved in getting that data from source to table & from table to analysts.

** In other words **

“A DWH is the data (meta/fact/dimension/aggregation) and the process managers (load/warehouse/query) that make information available, enabling people to make informed decisions.”

Page 4: Data-ware Housing

Data warehousing architecture:

A DWH must be architected to support three major driving factors:

1) Populating the DWH.
2) Day-to-day management of the DWH.
3) The ability to cope with requirement evolution.

Page 5: Data-ware Housing

Typical process flow within a DWH

[Figure: process flow diagram. Labelled stages: Source, Extract & load, Warehouse (data transformation and movement), Query, User, Archive data.]

Page 6: Data-ware Housing

Processes:

1. Extract & load the data.
2. Clean & transform the data into a form that can cope with large data volumes & provide good query performance.
3. Back up & archive the data.
4. Manage queries & direct them to the appropriate data sources.

Page 7: Data-ware Housing

Extract & load process:

Operational data: suitable for the operational system; may have been modified & extended over the years to support performance.

DWH: the data is reconstructed.

Page 8: Data-ware Housing

1) Extract & load process:

a. Controlling the process: determine when to start extracting the data, run the transformations, run the consistency checks & so on. Eg: retail sales analysis.
b. When to initiate the extract: the data should be in a consistent state, taken at the same instant of time. Eg: telecom.
c. Loading the data: load into a temporary data store, then perform clean-up & consistency checks. Eg: current subscriber & current event DBs.
d. Copy management tools & data clean-up: coding.

Page 9: Data-ware Housing

2) Clean & transform process:

a. Clean & transform the data into a structure that speeds up queries.
b. Partition the data in order to speed up queries, optimize hardware performance & simplify the management of the DWH.

Page 10: Data-ware Housing

Clean & transform process (continued):

a. Clean & transform the data into a structure that speeds up queries.

• Make sure the data is consistent within itself. Eg: within a single row.
• Make sure the data is consistent with other data within the same source.
• Make sure the data is consistent with data in the other source systems.
• Make sure the data is consistent with the information already in the warehouse.
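The four consistency levels above can be sketched as simple checks. This is a minimal illustration, not the deck's own method: the record layout (`order_id`, `customer_id`, `qty`, `unit_price`, `total`) and both function names are hypothetical.

```python
# Hypothetical record layout: each source row is a dict.

def row_is_self_consistent(row):
    """Consistent within itself: the stored total matches qty * unit_price."""
    return row["total"] == row["qty"] * row["unit_price"]

def missing_references(rows, key, known_keys):
    """Return the values of `key` that do not appear in `known_keys`.

    The same check serves the other three levels; only `known_keys` changes:
    keys from the same source, from another source system, or from the warehouse.
    """
    return sorted({row[key] for row in rows} - set(known_keys))

orders = [
    {"order_id": 1, "customer_id": "C1", "qty": 2, "unit_price": 5, "total": 10},
    {"order_id": 2, "customer_id": "C9", "qty": 1, "unit_price": 7, "total": 8},
]
warehouse_customers = {"C1", "C2"}  # assumed to be already in the warehouse

bad_rows = [r["order_id"] for r in orders if not row_is_self_consistent(r)]
unknown_customers = missing_references(orders, "customer_id", warehouse_customers)
```

Rows that fail either check would be held back for clean-up rather than loaded.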

Page 11: Data-ware Housing

3) Backup & archive process:

Back up regularly, so that the DWH can recover from data loss or failure.

In archiving, older data is removed from the system.
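Archiving can be sketched as moving rows older than a cutoff date out of the live table. The example below is a minimal sketch using Python's built-in `sqlite3` module; the table names `sales` and `sales_archive`, the column layout and the cutoff date are assumptions made for illustration.

```python
import sqlite3

def archive_older_than(conn, cutoff):
    """Copy rows dated before `cutoff` into sales_archive, then delete them from sales."""
    with conn:  # one transaction: the copy and the delete succeed or fail together
        conn.execute(
            "INSERT INTO sales_archive SELECT * FROM sales WHERE sale_date < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))
    return cur.rowcount  # number of rows removed from the live table

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.execute("CREATE TABLE sales_archive (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2012-03-01", 10.0), ("2013-06-15", 20.0), ("2014-11-29", 30.0)],
)

moved = archive_older_than(conn, "2014-01-01")
```

Dates are stored as ISO strings here so that plain string comparison orders them correctly.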

Page 12: Data-ware Housing

4) Query management process:

Directing each query to the most effective data source.

Page 13: Data-ware Housing

Process Architecture

Page 14: Data-ware Housing

Process                | Function                                                                         | System manager
Extract & load         | Extracts & loads the data, performing simple transformations before & during load | Load manager
Clean & transform data | Transforms & manages the data                                                    | Warehouse manager
Backup & archive       | Backs up & archives the data warehouse                                           | Warehouse manager
Query management       | Directs & manages queries                                                        | Query manager

Page 15: Data-ware Housing

[Figure: Architecture of a data warehouse. Labelled components: operational data sources, load manager, warehouse manager, detailed information, summary information, metadata, query manager, and front-end tools (data dipper, OLAP tools). Caption along the bottom: Data, Information, Decision.]

Page 16: Data-ware Housing

Load Manager

The system component that performs all the operations necessary to support the extract and load process.

It is built from off-the-shelf tools, bespoke coding, C programs & shell scripts.

Its size & complexity vary from DWH to DWH: the larger the degree of overlap between source systems, the larger the load manager will be.

Third-party tools cover at most 20 to 25% of the total system functionality.

Page 17: Data-ware Housing

Load Manager Architecture

1) Extract the data from the source systems.
2) Fast load the extracted data into a temporary data store.
3) Perform simple transformations into a structure similar to the one in the data warehouse.

Each of these functions has to operate automatically & recover from any errors it encounters, to a very large extent with no human intervention.
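The requirement that the three functions run automatically and recover from errors can be illustrated with a small controlling routine. This is a sketch under stated assumptions: `run_with_retry`, `run_load` and the stand-in steps are hypothetical names, and simple retrying is only one possible recovery strategy.

```python
def run_with_retry(step, attempts=3):
    """Run one load-manager step, retrying on failure without human intervention."""
    last_error = None
    for _ in range(attempts):
        try:
            return step()
        except Exception as exc:  # a real system would catch narrower errors and log them
            last_error = exc
    raise RuntimeError(f"step failed after {attempts} attempts") from last_error

def run_load(extract, fast_load, transform):
    """Chain the three functions in order: extract, fast load, simple transform."""
    raw = run_with_retry(extract)
    staged = run_with_retry(lambda: fast_load(raw))
    return run_with_retry(lambda: transform(staged))

# Hypothetical stand-ins: the extract fails once, then succeeds.
calls = {"extract": 0}

def flaky_extract():
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise IOError("source system not ready")
    return ["1,apple", "2,pear"]

result = run_load(
    flaky_extract,
    fast_load=lambda lines: [line.split(",") for line in lines],
    transform=lambda rows: [(int(i), name) for i, name in rows],
)
```

The first extract attempt fails and is retried automatically, so the load still completes.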

Page 18: Data-ware Housing

Extract data from source systems

In order to get hold of the source data, it has to be transferred from the source systems and made available to the DWH.

ASCII files are transferred by FTP across the LAN.

Current gateway technology operates too slowly to compete with FTP.

Page 19: Data-ware Housing

Fast Load

Data should be loaded into the warehouse in the fastest possible time, in order to minimize the total load window.

This becomes critical as the number of data sources increases and the time window shrinks.

In practice it is more effective to load the data into a relational DB prior to applying transformations & checks. (ASCII)

Page 20: Data-ware Housing

Simple Transformation

Before or during the load there will be an opportunity to perform simple transformations on the data.

Here we perform those transformations that do not require complex logic or the use of relational set operators.

Eg: retail management system:

1) Strip out all the columns that are not required in the DWH.
2) Convert all the values to the required data types.
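The two example transformations above can be sketched in a few lines of Python. The sample extract, its column names (`sale_id`, `till_operator`, `qty`, `price`) and the `REQUIRED` mapping are hypothetical and exist only to make the example runnable.

```python
import csv
import io

# Hypothetical extract from a retail system; `till_operator` is not needed in the DWH.
SOURCE = "sale_id,till_operator,qty,price\n1,anna,2,4.50\n2,raj,1,12.00\n"

# Columns kept in the DWH, mapped to the type each value is converted to.
REQUIRED = {"sale_id": int, "qty": int, "price": float}

def simple_transform(text):
    """Strip unneeded columns and convert each kept value to its required type."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: cast(row[col]) for col, cast in REQUIRED.items()} for row in reader]

rows = simple_transform(SOURCE)
```

Neither step needs a join or any other set operation, which is what keeps these transformations "simple" enough to run before or during the load.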

Page 21: Data-ware Housing

Load Manager Architecture

[Figure: Load manager architecture. Load manager components: controlling process, stored procedures, copy management tools, fast loader. Data stores: file structures, temporary data store, warehouse structure.]

Page 22: Data-ware Housing

Warehouse Manager

The system component that performs all the operations necessary to support the warehouse management process.

It is built from third-party system management tools, bespoke coding, C programs & shell scripts.

As with the load manager, the size & complexity of the warehouse manager vary between specific solutions. Unlike the load manager, the complexity of the warehouse manager is driven by the extent to which the operational management of the DWH has been automated.

Third-party tools cover at most 40% of the total system functionality.

Page 23: Data-ware Housing

Warehouse Manager Architecture

1) Analyze the data to perform consistency & referential integrity checks.
2) Transform & merge the source data in the temporary data store into the published DWH.
3) Create indexes, business views, partition views & so on.
4) Generate denormalizations if appropriate.
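Step 1 above, the referential integrity check, is often expressed as an anti-join between staged facts and a dimension table. The following is a minimal sketch with Python's `sqlite3` module; the tables `stage_sales` and `dim_product` and their contents are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stage_sales (sale_id INTEGER, product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'apple'), (2, 'pear');
    INSERT INTO stage_sales VALUES (10, 1, 3.0), (11, 2, 4.0), (12, 7, 5.0);
""")

# Staged facts whose product_id has no matching dimension row (orphans).
orphans = conn.execute("""
    SELECT s.sale_id
    FROM stage_sales AS s
    LEFT JOIN dim_product AS p ON p.product_id = s.product_id
    WHERE p.product_id IS NULL
    ORDER BY s.sale_id
""").fetchall()
```

Any sale returned by the query refers to a product the warehouse does not know about and would need to be resolved before the merge in step 2.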

Page 24: Data-ware Housing

Warehouse Manager Architecture

[Figure: Warehouse manager architecture. Warehouse manager components: controlling process, stored procedures, backup/recovery tool, SQL scripts. Data stores: temporary data store, starflake schema, summary tables.]

Page 25: Data-ware Housing

Using temporary destination tables:

Once the data is in the temporary store, the next step is to create a set of tables identical to the destination tables in the DWH.

Ex: if the data in the DWH is highly partitioned….

As we are about to execute substantial consistency checks, the data should not be loaded until it has been cleaned up.

If a consistency check fails

Although relational databases provide some form of rollback, in practice it is easier to load the data into a temporary area, clean it up & then publish it to the DWH.
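The load, clean, then publish pattern described above can be sketched as follows. This is an illustration only: the tables `sales` and `stage_sales`, the single check (no missing or negative amounts) and the `publish` function are assumptions, not part of the original deck.

```python
import sqlite3

def publish(conn):
    """Move staged rows into the published table only if every check passes."""
    bad = conn.execute(
        "SELECT COUNT(*) FROM stage_sales WHERE amount IS NULL OR amount < 0"
    ).fetchone()[0]
    if bad:
        return False  # published table stays untouched; staging needs cleaning first
    with conn:  # publish and clear staging in one transaction
        conn.execute("INSERT INTO sales SELECT * FROM stage_sales")
        conn.execute("DELETE FROM stage_sales")
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, amount REAL)")
# The temporary table mirrors the structure of the destination table.
conn.execute("CREATE TABLE stage_sales (sale_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO stage_sales VALUES (?, ?)", [(1, 9.5), (2, -1.0)])

first_try = publish(conn)  # rejected: one staged row has a negative amount
conn.execute("DELETE FROM stage_sales WHERE amount < 0")  # clean-up step
second_try = publish(conn)
published = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Because the failed first attempt never touches `sales`, no rollback of the published data is needed.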

Page 26: Data-ware Housing

Complex Transformation

Reconcile data

Page 27: Data-ware Housing

Transform into a starflake schema:

Transform the data into a form suitable for decision support queries.

Transform it into a form in which the bulk of the factual data lies in the center.

Star schema, snowflake schema, starflake schema.
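A plain star schema, the simplest of the three forms, can be sketched with a central fact table and two dimension tables. The table and column names below (`fact_sales`, `dim_store`, `dim_product`) are hypothetical, and the example uses Python's `sqlite3` module purely for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables surround the fact table.
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    -- The fact table in the centre holds the bulk of the data.
    CREATE TABLE fact_sales (
        store_id   INTEGER REFERENCES dim_store(store_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_store VALUES (1, 'North'), (2, 'South');
    INSERT INTO dim_product VALUES (1, 'Fruit'), (2, 'Dairy');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.0);
""")

# A typical decision-support query: total sales by region.
by_region = conn.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_store AS d USING (store_id)
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
```

A snowflake schema would further normalize the dimension tables, and a starflake schema mixes the two approaches.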

Page 28: Data-ware Housing

Create indexes & views:

One would expect the index creation time to be significant, even if we only need to create indexes against a fact table partition.

Because of this, most relational technologies have facilities to create indexes in parallel, distributing the load across the hardware & significantly reducing the elapsed time.

Indexes also add to the overhead of inserting a row into a table.

Page 29: Data-ware Housing

Generate the summaries:

The warehouse manager has to create a set of aggregations to speed up query performance.

These are generated automatically.
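Generating an aggregation automatically can be as simple as rebuilding a summary table from the detailed facts after each load. The sketch below assumes a hypothetical `fact_sales` table and a monthly summary named `sum_sales_month`; a real warehouse manager would likely refresh incrementally rather than rebuild.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (sale_month TEXT, product_id INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2014-10', 1, 10.0), ('2014-10', 1, 2.5), ('2014-11', 1, 4.0);
""")

def refresh_summary(conn):
    """Rebuild the monthly aggregation from the detailed fact table."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS sum_sales_month")
        conn.execute("""
            CREATE TABLE sum_sales_month AS
            SELECT sale_month, product_id, SUM(amount) AS total
            FROM fact_sales
            GROUP BY sale_month, product_id
        """)

refresh_summary(conn)
summary = conn.execute(
    "SELECT sale_month, total FROM sum_sales_month ORDER BY sale_month"
).fetchall()
```

Queries that only need monthly totals can then read the small summary table instead of scanning the fact table.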

Page 30: Data-ware Housing

Query manager:

The system component that performs all the operations necessary to support the query management process.

It is built from user access tools, specialist data warehousing monitoring tools, native database facilities, bespoke coding, C programs & shell scripts.

Its size & complexity vary between specific solutions.

Unlike the load manager, the complexity of the query manager is driven by the extent to which the facilities are provided by user access tools or native DB facilities.

Page 31: Data-ware Housing

Query Manager Architecture

1. Direct queries to the appropriate tables.
2. Schedule the execution of the user queries.
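Both query manager functions can be sketched together: a lookup that directs a query to a summary table when one fits, and a priority queue that schedules execution. Everything named here (`TABLE_FOR_GRAIN`, `direct`, `QueryScheduler`, the table names) is hypothetical, and priority ordering is just one possible scheduling policy.

```python
import heapq

# Hypothetical catalogue: which table can answer a query at each grain.
TABLE_FOR_GRAIN = {"month": "sum_sales_month", "day": "fact_sales"}

def direct(grain):
    """Direct a query to the summary table when one matches, else to the detail table."""
    return TABLE_FOR_GRAIN.get(grain, "fact_sales")

class QueryScheduler:
    """Run queued queries in priority order (lower number runs first)."""

    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker keeps submission order for equal priorities

    def submit(self, priority, name, grain):
        heapq.heappush(self._queue, (priority, self._counter, name, direct(grain)))
        self._counter += 1

    def run_order(self):
        order = []
        while self._queue:
            _, _, name, table = heapq.heappop(self._queue)
            order.append((name, table))
        return order

scheduler = QueryScheduler()
scheduler.submit(2, "ad hoc drill-down", "day")
scheduler.submit(1, "monthly report", "month")
plan = scheduler.run_order()
```

The monthly report runs first and reads the summary table, while the drill-down falls through to the detailed fact table.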