data warehouse & mining
DESCRIPTION
Data Warehouse & Mining. Dr. Abdul Basit Siddiqui Assistant Professor FUIEMS (Lecture Slides Week # 2). Why a Data Warehouse (DWH)?. Data recording and storage is growing: Almost every industry has huge amount of operational data. - PowerPoint PPT PresentationTRANSCRIPT
Dr. Abdul Basit Siddiqui Assistant Professor
FUIEMS(Lecture Slides Week # 2)
Why a Data Warehouse (DWH)?Data recording and storage is growing:
Almost every industry has huge amount of operational data.
Careful use/analysis of historic information may result in excellent prediction for the future: Knowledge worker wants to turn available data into useful
information. This information is used by them to support strategic
decision making.Gives total view of the organization:
It is a platform for consolidated historical data for analysis. It stores data of good quality so that knowledge worker
can make correct decisions.Intelligent decision-support is required for decision-
making.
Data Warehouse & Mining- Spring 2012
Why a Data Warehouse? (Contd.)
From business perspective:It is latest marketing weapon.Helps to keep customers by learning more
about their needs. Valuable tool in today’s competitive fast
evolving world.
Data Warehouse & Mining- Spring 2012
Reason-I: Why a Data Warehouse (DWH)?
Data sets are growing:
How Much Data is that?1 MB 220 or 106 bytes Small novel 3½ Disk.
1 GB 230 or 109 bytesPaper reams that could fill the back of a pickup van.
1 TB 240 or 1012 bytes50,000 trees chopped and converted into paper and printed.
2 PB 1 PB = 250 or 1015 bytes Academic research libraries across USA.
5 EB 1 EB = 260 or 1018 bytesAll words ever spoken by the Human Beings.
Data Warehouse & Mining- Spring 2012
Reason-I: Why a Data Warehouse (DWH)?
Size of Data Sets are going up.Cost of Data Storage is coming down.
The amount of data average business collects and stores is doubling every year.
Total hardware and software cost to store and manage 1 MB of data: 1990: $ 15 2002: ¢ 15 (down 100 times) 2010: < ¢ 1 (down 150 times)
A few examples: Wall Mart: 24+ TB Finance Telecom: 100+ TB CERN: Upto 20 PB by 2006 Stanford Linear Accelerator Center (SLAC): 500 TB Telenor, Ufone, Mobilink, Warid, Zong ???
Data Warehouse & Mining- Spring 2012
Caution!
A Warehouse of Data is NOT a
Data Warehouse.
Data Warehouse & Mining- Spring 2012
Caution!
Sizeis NOT
Everything.
Data Warehouse & Mining- Spring 2012
Reason-2: Why a Data Warehouse (DWH)?
DBMS Approach List of all items that were
sold last month?
List of all makeup items purchased by Sassi?
The total sales of the last month grouped by branch?
How many sales transactions occurred during the month of January?
Intelligent Enterprise Which items sell together?
Which items to stock?
Where and how to place the items? What discounts to offer?
How best to target customers to increase sales at a branch?
Which customers are most likely to respond to my next promotional campaign, and why?
Data Warehouse & Mining- Spring 2012
Businesses demand Intelligence (BI). Complex questions from integrated data. “Intelligent Enterprise”
Reason-3: Why a Data Warehouse (DWH)?
Businesses want much more …What happened?Why it happened?What will happen?What is happening?What do you want to happen?
Data Warehouse & Mining- Spring 2012
What is a Data Warehouse?
A complete repository of historical corporate data
extracted from transaction systems that is available for ad-hoc access by knowledge
workers.
Data Warehouse & Mining- Spring 2012
What is a Data Warehouse?Transaction System:
Management Information System (MIS)Could be typed sheets (NOT transaction system)
Ad-Hoc Access:Does not have a certain access patternQueries not known in advanceDifficult to write SQL in advance
Knowledge Workers:Typically NOT IT literate (Executives, Analysts,
Managers)NOT clerical workersDecision makers
Data Warehouse & Mining- Spring 2012
What is a Data Warehouse?Inmons’s Definition:
A Data Warehouse is: Subject-oriented Integrated Time-variant Nonvolatile
Collection of data in support of management’s decision making process.
Data Warehouse & Mining- Spring 2012
Another View of a DWH
Data Warehouse & Mining- Spring 2012
Subject Oriented
Integrated
Time Variant
Non Volatile
Subject-orientedData Warehouse is organized around subjects such as sales,
product, customer. It focuses on modeling and analysis of data for decision makers.Excludes data not useful in decision support process.
Data Warehouse & Mining- Spring 2012
Integration
Data Warehouse is constructed by integrating multiple heterogeneous sources.
Data Preprocessing are applied to ensure consistency.
Data Warehouse & Mining- Spring 2012
RDBMS
LegacySystem
DataWarehouse
Flat File Data ProcessingData Transformation
Time-variant
Provides information from historical perspective e.g. past 5-10 years.
Every key structure contains either implicitly or explicitly an element of time.
Data Warehouse & Mining- Spring 2012
Nonvolatile
Data once recorded cannot be updated.Data Warehouse requires two operations in
data accessingInitial loading of dataAccess of data
Data Warehouse & Mining- Spring 2012
load
access
Summary: What is a Data Warehouse?
It is a blend of many technologies, the basic concept being:Take all data from different operational systemsIf necessary, add relevant data from industryTransform all data and bring into a uniform
formatIntegrate all data as a single entityStore data in a format supporting easy access
for decision supportCreate performance enhancing indicesImplement performance enhancement joinsRun ad-hoc queries with slow selectivity
Data Warehouse & Mining- Spring 2012
Benefits of Data Warehouse
High returns on investment.Substantial competitive advantage.Increased productivity of corporate decision-
makers.Fast reporting for decision making process.Reduced reporting load on transactional systems.Making institutional data more user-friendly and
accessible for knowledge workers.Integrated data from different source systems.Enabled ‘point-in-time’ analysis and trending over
time.Helps in identifying and resolving data integrity
issues, either in the warehouse itself or in the source systems that collect the data.
Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different?
1. Decision making is Ad-Hoc
Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different?
2. Different patterns of hardware utilization
Data Warehouse & Mining- Spring 2012
Bus Service vs. Train
Data Warehouse: How is it Different?
3. Combines operational and historic data Don’t do data entry into a DWH. OLTP or ERP
are the source systems. OLTP systems don’t keep history, cannot get
balance statement more than a year old. DWH keep historical data, even of bygone
customers. Why? In the context of bank, want to know why the
customer left? What are the events that led to his/her leaving?
Why? Customer retention
Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different?
How much history? Depends on:
Industry Cost of storing historical data Economic value of historical data
Industry and history Telecom calls are much much more as compared to
bank transactions 18 months
Retailers interested in analyzing yearly seasonal patterns
65 weeks, why? Insurance companies want to do actuary analysis, use
the historical data in order to predict risk 7 years
Hence NOT a complete repository of data.
Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different?
How much history?
Economic value of data vs. storage cost
Data Warehouse a complete repository of data?
Data Warehouse & Mining- Spring 2012