data warehouse

37
Data Warehousing Presentation by Prof. Jyotindra Zaveri 1 Data Warehousing J. Zaveri

Upload: digital-vivekananda-digital-library-by-jyoti

Post on 02-Dec-2014

124 views

Category:

Documents


2 download

DESCRIPTION

A process of transforming data into information and making it available to users in a timely enough manner to make a difference - used by finance and insurance industry - Historical data held the key to understanding data over time - Presentation introducing basic concept of data warehouse by Prof Jyoti -

TRANSCRIPT

Page 1: Data Warehouse

Data Warehousing

Presentation by Prof. Jyotindra Zaveri

1 Data Warehousing J. Zaveri

Page 2: Data Warehouse

Data, Data everywhere

yet ...

� I can’t find the data I need

� data is scattered over the network

� many versions, subtle differences

� I can’t get the data I need

Data Warehousing

� I can’t get the data I need

need an expert to get the data

� I can’t understand the data I found

available data poorly documented

� I can’t use the data I found

results are unexpected

data needs to be transformed from one form to other

J. Zaveri2

Page 3: Data Warehouse

What is Data Warehousing?

A process of transforming data into information and

Information

information and making it available to users in a timely enough manner to make a difference

Data

3 Data Warehousing J. Zaveri

Page 4: Data Warehouse

Why Data Warehouse? [DW]� For organizational learning to take data

from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW).Warehousing (DW).

� DW allows an organization (enterprise) to remember what it has noticed about its data.

� Data Mining techniques make use of the data in a DW.

4 Data Warehousing J. Zaveri

Page 5: Data Warehouse

Data Warehouse

�A data warehouse is a copy of transaction data specifically

structured for querying, analysis and reporting

�The data warehouse contains a copy of the transactions which are �The data warehouse contains a copy of the transactions which are

not updated or changed later by the transaction system.

�This data is specially structured, and may have been transformed

when it was copied into the data warehouse.

5 Data Warehousing J. Zaveri

Page 6: Data Warehouse

Data Warehouse

Customers Orders

Enterprise“Database”

Transactions

Etc…

Vendors Etc…

DataWarehouse

Copied, organizedsummarized

6 Data Warehousing

Data Mining

Data Miners

J. Zaveri

Page 7: Data Warehouse

Data Warehouse Architecture

Optimized Loader

ExtractionCleansing

RelationalDatabases

ERPSystems

Data Warehouse Engine Analyze

Query

Metadata RepositoryLegacyData

Purchased Data

Systems

7 Data Warehousing

J. Zaveri

Page 8: Data Warehouse

Data warehouseMergeClean

DirectQuery

Reportingtools

MiningtoolsOLAP

Decision support tools

RelationalDBMS+

Crystal reportsIntelligent Miner

Online Analytic Processing

Mumbai branch Delhi branch Pune branchCensusdata

Operational data

Detailed transactionaldata

Data warehouseCleanSummarize

DBMS+e.g. Oracle

GISdata

8 Data Warehousing J. Zaveri

Page 9: Data Warehouse

Application Areas

Industry Application

Finance Credit Card Analysis

Insurance Claims, Fraud Analysis

Telecommunication Call record analysisTelecommunication Call record analysis

Transport Logistics management

Consumer goods promotion analysis

Data Service providers Value added data

Utilities Power usage analysis

9 Data Warehousing J. Zaveri

Page 10: Data Warehouse

Data Mart

� A Data Mart is a smaller, more focused Data Warehouse – a

mini-warehouse.

10 Data Warehousing J. Zaveri

Page 11: Data Warehouse

Data Warehouse to Data Mart

Data Mart

DecisionSupport

Information

DataWarehouse Data Mart

Data Mart

DecisionSupport

Information

DecisionSupport

Information

11 Data Warehousing J. Zaveri

Page 12: Data Warehouse

Data Warehouse definition

A single, complete and consistent store of data obtained from a variety obtained from a variety of different sources made available to end users in a what they can understand and use in a business context.

Data Warehousing 12J. Zaveri

Page 13: Data Warehouse

Which are ourlowest/highest margin

Which are ourlowest/highest margin

customers ?Who are my customers Who are my customers

and what products are they buying?

What is the most What is the most effective distribution

Why Data Warehousing?

are they buying?

to the competition ?

Which customersare most likely to go to the competition ?

What impact will

and margins?

What impact will new products/services

have on revenue and margins?

What product prom--options have the biggest

What product prom--options have the biggest

impact on revenue?

effective distribution channel?

Data Warehousing1

3 J. Zaveri

Page 14: Data Warehouse

Decision Support� Used to manage and control business

� Data is historical or point-in-time

� Optimized for inquiry rather than update

� Use of the system is loosely defined and can be ad-hoc

Used by managers and end-users to understand the business

Data Warehousing14

� Used by managers and end-users to understand the business and make judgments

J. Zaveri

Page 15: Data Warehouse

What are the users saying...

� Data should be integrated across the enterprise

� Summary data had a real value to the organization

Data Warehousing15

� Historical data held the key to understanding data over time

� What-if capabilities are required

J. Zaveri

Page 16: Data Warehouse

Data Warehousing --

It is a process

� Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible

Data Warehousing16

were not previous possible

� A decision support database maintained separately from the organization’s operational database

J. Zaveri

Page 17: Data Warehouse

Traditional RDBMS used for OLTP � Database Systems have been used traditionally for OLTP

� clerical data processing tasks

� detailed, up to date data

� structured repetitive tasks

� read/update a few records

Data Warehousing17

� read/update a few records

� isolation, recovery and integrity are critical

� Will call these operational systems

J. Zaveri

Page 18: Data Warehouse

OLTP vs Data WarehouseOLTP vs Data WarehouseOLTP vs Data WarehouseOLTP vs Data Warehouse

� OLTP� Application Oriented

� Used to run business

� Clerical User

Detailed data

� Warehouse (DSS)� Subject Oriented

� Used to analyze business

� Manager/Analyst

� Summarized and refined

Data Warehousing18

� Detailed data

� Current up to date

� Isolated Data

� Repetitive access by small transactions

� Read/Update access

� Summarized and refined

� Snapshot data

� Integrated Data

� Ad-hoc access using large queries

� Mostly read access (batch update)

J. Zaveri

Page 19: Data Warehouse

From the Data Warehouse to Data

Marts

IndividuallyStructured

Less

Information

Data Warehousing19

DepartmentallyStructured

Data WarehouseOrganizationallyStructured

More

HistoryNormalizedDetailed

DataJ. Zaveri

Page 20: Data Warehouse

Users have different views of Data

OLAP

Tourists: Browse information harvestedby farmers

Data Warehousing20

Organizationallystructured

Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data

Farmers: Harvest informationfrom known access paths

J. Zaveri

Page 21: Data Warehouse

Walmart Case Study

� Founded by Sam Walton

� One the largest Super Market Chains in the US

Data Warehousing21

� Walmart: 3000+ Retail Stores

� SAM's Clubs 200+Wholesalers Stores

J. Zaveri

Page 22: Data Warehouse

Old Retail Paradigm

� Walmart� Inventory Management

� Merchandise Accounts Payable

� Purchasing

� Supplier Promotions: National,

� Suppliers � Accept Orders

� Promote Products

� Provide special Incentives

� Monitor and Track The

Data Warehousing22

� Supplier Promotions: National, Region, Store Level

� Monitor and Track The Incentives

� Bill and Collect Receivables

� Estimate Retailer Demands

J. Zaveri

Page 23: Data Warehouse

New (Just-In-Time) Retail Paradigm� No more deals� Shelf-Pass Through (POS Application)

� One Unit Price� Suppliers paid once a week on ACTUAL items sold

� Walmart Manager� Daily Inventory Restock

Data Warehousing23

� Daily Inventory Restock� Suppliers (sometimes Same Day) ship to Walmart

� Warehouse-Pass Through� Stock some Large Items

� Delivery may come from supplier� Distribution Center

� Supplier’s merchandise unloaded directly onto Walmart Trucks

J. Zaveri

Page 24: Data Warehouse

Information as a Strategic Weapon� Daily Summary of all Sales Information� Regional Analysis of all Stores in a logical area� Specific Product Sales� Specific Supplies Sales� Trend Analysis, etc.

Data Warehousing24

� Trend Analysis, etc.� Wal-Mart uses information when negotiating with

� Suppliers� Advertisers etc.

J. Zaveri

Page 25: Data Warehouse

Data Granularity in Warehouse� Summarized data stored

� reduce storage costs

� reduce CPU usage

� increases performance since smaller number of records to be processed

Data Warehousing25

processed

� design around traditional high level reporting needs

� tradeoff with volume of data to be stored and detailed usage of data

J. Zaveri

Page 26: Data Warehouse

Granularity in Warehouse� Solution is to have dual level of granularity

� Store summary data on disks� 95% of DSS processing done against this data

� Store detail on tapes� 5% of DSS processing against this data

Data Warehousing26J. Zaveri

Page 27: Data Warehouse

Levels of Granularity

Operational

accountactivity date

accountmonth

# transwithdrawalsdepositsmonthly account

Banking Example

Data Warehousing27

60 days ofactivity

activity dateamounttellerlocationaccount bal

depositsaverage bal

amountactivity date

amountaccount bal

monthly accountregister -- up to 10 years

Not all fieldsneed be archived

J. Zaveri

Page 28: Data Warehouse

Data Integration Across Sources

Trust Credit cardSavings Loans

Data Warehousing28

Same data different name

Different data Same name

Data found here nowhere else

Different keyssame data

J. Zaveri

Page 29: Data Warehouse

Data Transformation

Sequential Legacy Relational ExternalOperational/Source Data

Data Transformation

Accessing Capturing Extracting House holding FilteringReconciling Conditioning Loading Validating Scoring

Data Warehousing29

� Data transformation is the foundation for achieving single version of the truth

� Major concern for IT� Data warehouse can fail if appropriate data

transformation strategy is not developed

J. Zaveri

Page 30: Data Warehouse

Data Integrity Problems

� Same person, different spellings� Agarwal, Agrawal, Aggarwal etc...

� Multiple ways to denote company name� Persistent Systems, PSPL, Persistent Pvt. LTD.

� Use of different names� Mumbai, Bombay

Data Warehousing30

� Mumbai, Bombay

� Different account numbers generated by different applications for the same customer

� Required fields left blank

� Invalid product codes collected at point of sale� manual entry leads to mistakes� “in case of a problem use 9999999”

J. Zaveri

Page 31: Data Warehouse

Data Transformation Terms

� Extracting

� Conditioning

� Scrubbing

� Merging

� Enrichment

� Scoring

� Loading

� Validating

Data Warehousing31

Merging

� House holding

Validating

� Delta Updating

J. Zaveri

Page 32: Data Warehouse

Data Transformation Terms� House holding

� Identifying all members of a household (living at the same address)

� Ensures only one mail is sent to a household

� Can result in substantial savings: 1 million catalogues at Rs. 50

Data Warehousing32

� Can result in substantial savings: 1 million catalogues at Rs. 50 each costs Rs. 50 million . A 2% savings would save Rs. 1 million

J. Zaveri

Page 33: Data Warehouse

Deploying Data Warehouses

� What business information keeps you in business today? What business information can put you out of business tomorrow?

� What business information should be a mouse click away?

Data Warehousing33

click away?

� What business conditions are the driving the need for business information?

J. Zaveri

Page 34: Data Warehouse

Cultural Considerations

� Not just a technology project

� New way of using information to support daily activities and decision making

� Care must be taken to prepare organization for change

Data Warehousing34

for change

� Must have organizational backing and support

J. Zaveri

Page 35: Data Warehouse

Data Mining works with

Warehouse Data

� Data Warehousing provides the Enterprise with a memory

Data Warehousing 35

� Data Mining provides the Enterprise with intelligence

J. Zaveri

Page 36: Data Warehouse

Join Prof. Zaveri on Facebook and [email protected]

eMail [email protected]

eLearning site http://www.dnserp.com

Follow on Twitter @followERP

Connect on LinkedIn http://in.linkedin.com/in/jyotindrazaveri

Connect on Facebook http://www.facebook.com/jyotindra

Subscribe YouTube http://www.youtube.com/dnserp

J. ZaveriData WarehousingSlide # 36

Page 37: Data Warehouse

Thank You

� Question / Answersession

� Please clarify your doubts

Data Warehousing37 J. Zaveri