Data Warehousing, OLAP, and Data Mining
OLAP, and OLTP
2
Introduction
• Data, data, data…everywhere!
• Information…that’s another story!
• Especially, the right information @ the right time!
• Data warehousing's goal is to make the right information available @ the right time
• Data warehousing is a data store (e.g., a database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes
3
Different Goal
• Aggregation, summarization and exploration
• Of historical data
• To help management make informed decisions
Product              Branch         Time                 Price
Coke (0.5 gallon)    Convoy Street  2006-03-01 09:00:01  $1.00
Pepsi (0.5 gallon)   UTC            2006-03-01 09:00:01  $1.03
Coke (1 gallon)      UTC            2006-03-01 09:00:02  $1.50
Altoids              Costa Verde    2006-03-01 09:01:33  $0.30
...
• Find the total sales for each product and month
• Find the percentage change in the total monthly sales for each product
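The two queries above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and apart from the first row the sample records are made up, not taken from the slide.

```python
from collections import defaultdict

# Hypothetical sales records: (product, timestamp, price).
# Only the first row appears on the slide; the others are invented.
sales = [
    ("Coke (0.5 gallon)", "2006-03-01 09:00:01", 1.00),
    ("Coke (0.5 gallon)", "2006-03-15 10:12:00", 1.00),
    ("Coke (0.5 gallon)", "2006-04-02 11:30:00", 1.00),
    ("Coke (0.5 gallon)", "2006-04-09 08:45:10", 1.00),
    ("Coke (0.5 gallon)", "2006-04-20 17:05:42", 1.00),
]

def monthly_totals(rows):
    """Total sales for each (product, 'YYYY-MM') pair."""
    totals = defaultdict(float)
    for product, timestamp, price in rows:
        month = timestamp[:7]  # 'YYYY-MM' prefix of the timestamp
        totals[(product, month)] += price
    return dict(totals)

def monthly_pct_change(totals):
    """Percentage change versus the previous recorded month, per product."""
    by_product = defaultdict(list)
    for (product, month), total in totals.items():
        by_product[product].append((month, total))
    changes = {}
    for product, series in by_product.items():
        series.sort()  # 'YYYY-MM' strings sort chronologically
        for (_, prev), (month, cur) in zip(series, series[1:]):
            changes[(product, month)] = (cur - prev) / prev * 100.0
    return changes

totals = monthly_totals(sales)
changes = monthly_pct_change(totals)
```

Note that the sketch compares each month with the previous month that has any sales, which is not necessarily the previous calendar month.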
4
OLAP and OLTP
• OLTP - Online Transaction Processing system (relies solely on relational databases); works on one record at a time
• OLAP - Online Analytical Processing system (a class of technologies designed for ad hoc data access and analysis); deals with summarized data
5
6
Different Requirements
                     OLTP                                             OLAP
Tasks                Day-to-day operation                             High-level decision support
Size of database     Gigabytes                                        Terabytes
Time span            Recent, up-to-date                               Spanning over months / years
Size of working set  Tens of records, accessed through primary keys   Consolidated data from multiple databases
Workload             Structured / repetitive                          Ad-hoc, exploratory queries
Performance          Transaction throughput                           Query latency
• OLTP – On-Line Transaction Processing
• OLAP – On-Line Analytical Processing
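To make the workload difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and rows are assumptions for illustration only; a real warehouse would not normally share one table with the operational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, product TEXT, branch TEXT, sold_at TEXT, price REAL)"
)
# Hypothetical rows, loosely modeled on the sales table shown earlier.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Coke", "UTC", "2006-03-01", 1.5),
        (2, "Pepsi", "UTC", "2006-03-01", 1.0),
        (3, "Coke", "Convoy Street", "2006-03-02", 1.5),
        (4, "Coke", "UTC", "2006-04-01", 1.5),
    ],
)

# OLTP-style access: one record, reached through its primary key.
oltp_row = conn.execute(
    "SELECT product, price FROM sales WHERE id = ?", (2,)
).fetchone()

# OLAP-style access: scan and summarize many records at once.
olap_rows = conn.execute(
    "SELECT product, substr(sold_at, 1, 7), COUNT(*), SUM(price) "
    "FROM sales "
    "GROUP BY product, substr(sold_at, 1, 7) "
    "ORDER BY product, substr(sold_at, 1, 7)"
).fetchall()
```

The first query touches a single row and is the kind of operation measured by transaction throughput; the second reads the whole table and is the kind measured by query latency.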
7
Data Warehouse
[Diagram: customers, vendors, orders, transactions, etc. in the enterprise “database” are copied, organized, and summarized into the data warehouse, which is then used for data mining.]
Data miners:
• “Farmers” – they know
• “Explorers” – unpredictable
8
General Architecture for Data Warehousing
• Source systems
• Extraction, (Clean), Transformation, & Load (ETL)
• Central repository
• Metadata repository
• Data marts
• Operational feedback
• End users (business)
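The path from source systems through ETL into the central repository can be sketched as below. This is a toy illustration, not a description of any particular ETL tool: the two source systems, their field names, and the target table are all hypothetical.

```python
import sqlite3

# Two hypothetical source systems with different record layouts.
orders_system = [
    {"customer": " Alice ", "amount": "10.50"},
    {"customer": "bob", "amount": "4.00"},
    {"customer": "", "amount": "3.00"},  # missing customer name
]
web_shop = [{"cust_name": "ALICE", "total_cents": 250}]

def extract():
    """Pull raw records from every source, tagged with their origin."""
    for rec in orders_system:
        yield "orders", rec
    for rec in web_shop:
        yield "web", rec

def clean(records):
    """Drop records whose customer name is blank."""
    for source, rec in records:
        name = rec.get("customer", rec.get("cust_name", ""))
        if name.strip():
            yield source, rec

def transform(records):
    """Map each source layout onto one common schema."""
    for source, rec in records:
        if source == "orders":
            yield (rec["customer"].strip().lower(),
                   round(float(rec["amount"]) * 100), source)
        else:
            yield (rec["cust_name"].strip().lower(),
                   rec["total_cents"], source)

def load(conn, rows):
    """Write the unified rows into the central repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(customer TEXT, amount_cents INTEGER, source TEXT)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(clean(extract())))

totals = dict(warehouse.execute(
    "SELECT customer, SUM(amount_cents) FROM purchases GROUP BY customer"
))
```

Once loaded, the data from both systems can be queried together, which is the point of bringing disparate sources into one repository.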
9
Where does OLAP fit in?
10
OLAP Overview
• Interactive, exploratory analysis of multidimensional data to discover patterns
[Figure: data plotted along the dimensions age, gender, and accidents]
11
OLAP Architecture
12
Server Options
• Single processor
• Symmetric multiprocessor (SMP)
• Massively parallel processor (MPP)
13
OLAP Server Options
• Multi-dimensional OLAP (MOLAP)
– ‘A k-dimensional matrix based on a non relational storage structure.’ [Agrawal et al]
• Relational OLAP (ROLAP)
– ‘A relational back-end wherein operations of the data are translated to relational queries.’ [Agrawal et al]
• Hybrid OLAP (HOLAP)
– Integration of MOLAP with ROLAP.
• Desktop OLAP (DOLAP)
– Simplified versions of MOLAP or ROLAP.
• ZOLAP
– Speak with your chemist (normally only prescribed for death march victims)
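The MOLAP and ROLAP definitions above can be contrasted with a small sketch. It is an assumption-laden toy, not how any OLAP server is implemented: the 2-dimensional array, the dimension members, the unit counts, and both function names are made up for illustration.

```python
import sqlite3

# Hypothetical dimension members and unit counts.
products = ["TV", "VCR"]
quarters = ["1Qtr", "2Qtr"]

# MOLAP-style: a dense k-dimensional array (here k = 2);
# cube[p][q] holds the units sold for products[p] in quarters[q].
cube = [
    [5, 7],  # TV
    [3, 0],  # VCR
]

def molap_total_for_product(product):
    """Answer the query by indexing straight into the array."""
    return sum(cube[products.index(product)])

# ROLAP-style: the same facts stored as rows in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quarter TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(p, q, cube[i][j])
     for i, p in enumerate(products)
     for j, q in enumerate(quarters)],
)

def rolap_total_for_product(product):
    """Answer the same query by translating it to a relational query."""
    row = conn.execute(
        "SELECT SUM(units) FROM sales WHERE product = ?", (product,)
    ).fetchone()
    return row[0]
```

Both functions return the same total; the difference lies in where the data lives and how the operation is carried out.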
14
OLAP – Online Analytical Processing
• A definition:
• Data representation is in the form of a CUBE
• OLAP goes beyond SQL with its analysis capabilities
• Key feature of OLAP: Relevant multi-dimensional views such as products, time, geography
15
OLAP Cube - 1
16
OLAP Cube - 2
17
OLAP Cube - 3
• Star Structure (quite common)
[Diagram: star schema with a central Facts table linked to the Product, Time, Region, and Channel dimensions. Labels shown in the diagram: Week, Year; Model, Type, Color; Nation, District, Dealer; Revenue, Expenses, Units.]
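A star structure is typically queried by joining the fact table to whichever dimension tables the question needs. The sketch below uses Python's sqlite3 module; the assignment of columns to tables, the key columns, and all rows are assumptions, since the slide only shows the labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (column grouping is assumed, not taken from the slide).
CREATE TABLE product (product_id INTEGER PRIMARY KEY,
                      model TEXT, type TEXT, color TEXT);
CREATE TABLE region  (region_id INTEGER PRIMARY KEY,
                      nation TEXT, district TEXT, dealer TEXT);

-- Central fact table: foreign keys to the dimensions plus the measures.
CREATE TABLE facts (week INTEGER,
                    product_id INTEGER REFERENCES product(product_id),
                    region_id  INTEGER REFERENCES region(region_id),
                    channel TEXT,
                    revenue INTEGER, expenses INTEGER, units INTEGER);

-- Hypothetical sample rows.
INSERT INTO product VALUES (1, 'M100', 'Sedan', 'Red');
INSERT INTO product VALUES (2, 'M200', 'Truck', 'Blue');
INSERT INTO region  VALUES (1, 'USA', 'West', 'Dealer A');
INSERT INTO region  VALUES (2, 'Canada', 'East', 'Dealer B');
INSERT INTO facts VALUES (1, 1, 1, 'Retail', 100, 60, 2);
INSERT INTO facts VALUES (1, 2, 1, 'Retail', 300, 200, 1);
INSERT INTO facts VALUES (2, 1, 2, 'Online', 50, 30, 1);
INSERT INTO facts VALUES (2, 1, 1, 'Online', 150, 90, 3);
""")

# Revenue and units by product type and nation: one join per dimension used.
rows = conn.execute("""
    SELECT p.type, r.nation, SUM(f.revenue), SUM(f.units)
    FROM facts AS f
    JOIN product AS p ON p.product_id = f.product_id
    JOIN region  AS r ON r.region_id  = f.region_id
    GROUP BY p.type, r.nation
    ORDER BY p.type, r.nation
""").fetchall()
```

Keeping the measures in one narrow fact table and the descriptive attributes in the dimension tables is what gives the schema its star shape.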
18
A Sample Data Cube
[Figure: cube with the dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum). The highlighted cell is the total annual sales of TV in U.S.A.]
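The "sum" entries along each axis of the sample cube are roll-ups over that dimension. A minimal sketch of how such a cube could be built from raw facts follows; the sales figures are invented, and `build_cube` is a hypothetical helper, not part of any OLAP product.

```python
from collections import defaultdict
from itertools import product as cartesian

# Hypothetical facts: (date, product, country, sales).
facts = [
    ("1Qtr", "TV", "U.S.A", 10),
    ("2Qtr", "TV", "U.S.A", 20),
    ("1Qtr", "VCR", "U.S.A", 5),
    ("1Qtr", "TV", "Canada", 7),
]

def build_cube(rows):
    """Aggregate every fact into all cells it contributes to.

    For each dimension the fact either keeps its own member or rolls up
    into the special member "sum", so one fact with three dimensions
    feeds 2**3 = 8 cells.
    """
    cube = defaultdict(int)
    for *dims, value in rows:
        for key in cartesian(*[(member, "sum") for member in dims]):
            cube[key] += value
    return dict(cube)

cube = build_cube(facts)

# "Total annual sales of TV in U.S.A.": roll up Date, keep Product and Country.
tv_usa_annual = cube[("sum", "TV", "U.S.A")]
```

Precomputing every roll-up makes lookups trivial but grows quickly with the number of dimensions, so this approach is only practical for small examples like this one.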
19
OLAP Cube - 5
Three-Dimensional Cube Display
[Figure: pivot-table display with Region (North) as the page, Sales (Redblob, Blueblob, Total) as the columns, and Year (1996, 1997, Total) as the rows.]
20
OLAP Cube - 6
Six-Dimensional Cube
Dimension         Example
Brand             Mt. Airy
Store             Atlanta
Customer segment  Business
Product group     Desks
Period            January
Variable          Units sold
21
Rotation (Pivot Table)
22
Drill Down
23
OLAP Examples
• http://perso.wanadoo.fr/bernard.lupin/english/example.htm
• Excel Pivot Table example (similar to OLAP cube)
24
Sample of OLAP products
Just a snippet from http://www.olapreport.com/ProductsIndex.htm ; not an endorsement
25
Data Mining versus OLAP
26
Data Mining versus OLAP
• OLAP - Online Analytical Processing
– Provides you with a very good view of what is happening, but cannot predict what will happen in the future or why it is happening
27
Results of Data Mining Include:
• Forecasting what may happen in the future
• Classifying people or things into groups by recognizing patterns
• Clustering people or things into groups based on their attributes
• Associating what events are likely to occur together
• Sequencing what events are likely to lead to later events
28
Thanks for listening.