2. DATA WAREHOUSE AND OLAP
Database and Data warehouse
From Database to Data Warehouse
Data Warehouse: combines and reorganizes current and historical data from multiple data sources into a central store for decision-making purposes.
Much larger than a DB. Why?
Data Warehouse
Definition
“A subject-oriented, integrated, time-variant and
nonvolatile (non-updatable) collection of data in
support of decision-making process.”
◦ Subject? Customer, product, sales, etc.
◦ Integrated? From many sources
◦ Time-variant? Historical
◦ Nonvolatile? Accumulated
Data Warehouse
Data mart
◦ A subset of a data warehouse relevant to a specific purpose
Two major applications of a data warehouse
◦ OLAP: Multidimensional analysis of DW data
◦ Data Mining: Knowledge discovery from DW data
ETL = Extraction, transformation, and loading
◦ A process that extracts information from internal and external
databases, transforms the information using a common set of
enterprise definitions, and loads the information into a data
warehouse
ETL
• Standardization
• Removing redundancies
• Missing data? Fill in or discard
• Incorrect data? Discard or correct
• Outlier? Smooth or discard
• Attribute reduction
• Data compression (e.g., histogram, regression)
• Generalization/specialization
• Summation
• Normalization (e.g., between 0 and 1)
• Attribute construction
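A few of the transformation steps listed above can be sketched in Python. This is only an illustration under stated assumptions: the records, the field names (`region`, `sales`), and the choice to discard missing values rather than fill them in are all hypothetical.

```python
# Hypothetical raw records from two sources; all values are made up.
raw = [
    {"region": "northeast", "sales": 120.0},
    {"region": "NorthEast ", "sales": 120.0},   # duplicate after standardization
    {"region": "West", "sales": None},          # missing value
    {"region": "Central", "sales": 80.0},
    {"region": "Southeast", "sales": 200.0},
]

def standardize(rec):
    """Standardization: trim and lowercase the region name."""
    return {"region": rec["region"].strip().lower(), "sales": rec["sales"]}

def etl(records):
    # Transform: standardize, then discard records with missing sales.
    cleaned = [standardize(r) for r in records]
    cleaned = [r for r in cleaned if r["sales"] is not None]

    # Remove redundancies: keep the first record per (region, sales) pair.
    seen, unique = set(), []
    for r in cleaned:
        key = (r["region"], r["sales"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Normalization: min-max scale sales into the range 0-1.
    lo = min(r["sales"] for r in unique)
    hi = max(r["sales"] for r in unique)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    for r in unique:
        r["sales_norm"] = (r["sales"] - lo) / span
    return unique

warehouse = etl(raw)
```

With this sample input, three records survive: the duplicate and the record with missing sales are dropped, and the smallest and largest sales values map to 0 and 1.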
DW is Multidimensional
DB is relational: databases contain information in a
series of tables (relations)
In a data warehouse or data mart, information is
multidimensional: it contains layers of columns and
rows
[Figure: layered tables labeled by region (Northeast, Southeast, Central, West), quarter (Quarter1 to Quarter4), and product (Gizmo, Widget)]
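The idea of "layers of columns and rows" can be sketched as a nested structure with one region-by-quarter table per product. Which label sits on which axis is an assumption here, and the stored values are placeholders, not data from the slides.

```python
regions = ["Northeast", "Southeast", "Central", "West"]
quarters = ["Quarter1", "Quarter2", "Quarter3", "Quarter4"]
products = ["Gizmo", "Widget"]

# One layer (a region x quarter table) per product; values are placeholders.
cube = {
    p: {r: {q: 0 for q in quarters} for r in regions}
    for p in products
}

# Record a hypothetical sale: 5 Gizmos in the West in Quarter2.
cube["Gizmo"]["West"]["Quarter2"] += 5

def cell(product, region, quarter):
    """Look up one cell by its three coordinates."""
    return cube[product][region][quarter]

n_cells = len(products) * len(regions) * len(quarters)
```

Each cell is addressed by three coordinates instead of the two (row, column) of a single relational table.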
Multidimensional Data
OLTP and OLAP
OLTP
◦ Real-time processing of transactions such as sales, flight
reservations, cancellations, etc.
◦ OLTP works on a database
◦ Online processing ≈ real-time processing (as opposed to batch processing)
OLAP
◦ OLAP provides online multidimensional analysis functionality
working on a data warehouse or data mart.
◦ A data warehouse and an OLAP or mining system are needed to analyze
patterns, trends, or outliers!
◦ What is required for the following analyses?
Effects of an oil price increase on car manufacturers?
Cell phone usage patterns of college students in urban areas?
OLTP and OLAP
                 OLTP (DB)                       OLAP (DW)
users            clerk, IT professional          knowledge worker
#users           thousands                       hundreds
function         operations/transaction          decision support
                 processing
DB design        application-oriented            subject-oriented
data             current (up-to-date),           historical, summarized,
                 detailed, dispersed over        multidimensional, integrated
                 applications
usage            repetitive                      on occasion
access           updating (read/write),          loading and accessing
                 indexing on primary key
unit of work     short, simple transaction       complex analytic query
data size        100 MB-GB (each DB)             100 GB-TB (many sources,
                                                 historical)
quality metric   transaction throughput          query throughput
OLAP using DW Example
Example: Sales volume is a function of month, product, and customer.
1. What is the sales amount for Digital Camera bought by Fred Smith in Feb. 2005?
2. What if you want to know the temporal sales trends of Fred Smith?
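The two questions above differ in shape: the first reads a single cell, the second fixes the customer and aggregates over products for each month. A minimal sketch, assuming sales are stored as a mapping from (month, product, customer) to an amount; every figure and the second customer name are invented for illustration.

```python
# Hypothetical fact records: (month, product, customer) -> sales amount.
sales = {
    ("2005-01", "Digital Camera", "Fred Smith"): 300.0,
    ("2005-02", "Digital Camera", "Fred Smith"): 450.0,
    ("2005-02", "Memory Card", "Fred Smith"): 40.0,
    ("2005-02", "Digital Camera", "Jane Doe"): 450.0,
    ("2005-03", "Memory Card", "Fred Smith"): 25.0,
}

# Question 1: a single cell of the cube.
q1 = sales.get(("2005-02", "Digital Camera", "Fred Smith"), 0.0)

# Question 2: fix the customer, sum over products, keep the month dimension.
def monthly_trend(customer):
    trend = {}
    for (month, _product, cust), amount in sales.items():
        if cust == customer:
            trend[month] = trend.get(month, 0.0) + amount
    return dict(sorted(trend.items()))

q2 = monthly_trend("Fred Smith")
```

The second question needs historical data across many months, which is what a data warehouse keeps and an operational database often does not.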
From Tables to Data Cube
A DW stores data in data cubes (not in relations)
A data cube maintains both dimension tables and fact
tables
◦ Dimension tables: attribute structures such as item (item_name,
brand, type), time (day, week, month, quarter, year), etc.
◦ Fact tables: values such as dollars_sold, unit_sold, etc.
Data Cube and Cuboid
◦ Data cube: composed of cuboids
◦ Apex cuboid: the topmost 0-D cuboid, the highest level of
summarization
◦ Base cuboid: the n-D cuboid, where n is the number of dimensions
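The split between dimension tables and fact tables described above can be sketched as follows. The surrogate keys (`item_id`, `time_id`), the brand names, and all values are hypothetical; only the attribute and measure names come from the slide.

```python
# Dimension tables: attribute structures keyed by a surrogate id.
item_dim = {
    1: {"item_name": "Digital Camera", "brand": "BrandA", "type": "electronics"},
    2: {"item_name": "Widget", "brand": "BrandB", "type": "hardware"},
}
time_dim = {
    10: {"day": 14, "month": 2, "quarter": 1, "year": 2005},
    11: {"day": 3, "month": 5, "quarter": 2, "year": 2005},
}

# Fact table: foreign keys into the dimensions plus numeric measures.
fact_sales = [
    {"item_id": 1, "time_id": 10, "dollars_sold": 450.0, "unit_sold": 1},
    {"item_id": 2, "time_id": 10, "dollars_sold": 30.0, "unit_sold": 3},
    {"item_id": 2, "time_id": 11, "dollars_sold": 50.0, "unit_sold": 5},
]

def dollars_by(dim_table, key_field, attribute):
    """Sum dollars_sold grouped by one attribute of one dimension."""
    totals = {}
    for row in fact_sales:
        value = dim_table[row[key_field]][attribute]
        totals[value] = totals.get(value, 0.0) + row["dollars_sold"]
    return totals

by_quarter = dollars_by(time_dim, "time_id", "quarter")
by_brand = dollars_by(item_dim, "item_id", "brand")
```

The fact table holds only keys and measures; any attribute of a dimension (quarter, brand, type) can then serve as a grouping level.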
Data Cube = A Collection of Cuboids
0-D (apex) cuboid: all
1-D cuboids: quarter; product; country; supplier
2-D cuboids: (quarter, prod); (quarter, country); (quarter, supplier); (prod, country); (prod, supplier); (country, supplier)
3-D cuboids: (quarter, prod, country); (quarter, prod, supplier); (quarter, country, supplier); (prod, country, supplier)
4-D (base) cuboid: (quarter, prod, country, supplier)
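The lattice above is every subset of the four dimensions, so it can be enumerated mechanically. A short sketch, assuming no concept hierarchies within a dimension (with hierarchies the number of cuboids would be larger):

```python
from itertools import combinations

dimensions = ["quarter", "prod", "country", "supplier"]

def cuboids(dims):
    """Return {k: [k-D cuboids]} for every subset of the dimensions."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboids(dimensions)
total = sum(len(level) for level in lattice.values())

for k, level in lattice.items():
    print(f"{k}-D: {len(level)} cuboid(s)")
```

For n dimensions this gives 2^n cuboids, here 1 + 4 + 6 + 4 + 1 = 16, with the empty tuple standing for the apex cuboid "all".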
Typical OLAP Operations
Roll up
◦ Drill up (aggregation): summarize data
◦ Climbing up the cube: dimension reduction
Drill down
◦ Roll down (decomposition): reverse of roll-up
◦ Going down the cube: introducing new dimensions
Slicing
◦ Fixing the value of a dimension (to look at a single slice of the cube)
Pivoting
◦ View data from a different perspective by reorienting the cube
Computation
◦ Precompute aggregates at each cuboid over vast amounts of data in advance
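The operations above can be sketched on a tiny cube stored as a mapping from coordinate tuples to a measure. The dimension names, the data, and the helper functions `roll_up`, `slice_`, and `pivot` are all hypothetical, written only to illustrate the ideas.

```python
# Hypothetical base cuboid: (month, product, region) -> sales volume.
base = {
    ("Jan", "Gizmo", "West"): 10,
    ("Jan", "Widget", "West"): 4,
    ("Feb", "Gizmo", "West"): 7,
    ("Feb", "Gizmo", "Central"): 3,
}
DIMS = ("month", "product", "region")

def roll_up(cube, dims, keep):
    """Aggregate away every dimension not listed in keep."""
    idx = [dims.index(d) for d in keep]
    out = {}
    for key, value in cube.items():
        new_key = tuple(key[i] for i in idx)
        out[new_key] = out.get(new_key, 0) + value
    return out

def slice_(cube, dims, dim, value):
    """Fix one dimension to a single value and drop it from the key."""
    i = dims.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def pivot(cube, order):
    """Reorient a cube by permuting the positions of its key."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}

by_month_product = roll_up(base, DIMS, ("month", "product"))
apex = roll_up(base, DIMS, ())
west = slice_(base, DIMS, "region", "West")
product_first = pivot(by_month_product, (1, 0))
```

Drill down is the reverse direction: going from `by_month_product` back to `base`, which is only possible because the finer-grained data is still stored.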
Typical OLAP Operations
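Sales volume as a function of product, month, and measure
0-D cuboid: all
1-D cuboids: month; product; measure
2-D cuboids: (month, product); (month, measure); (product, measure)
3-D cuboid: (month, product, measure)
Roll up: move toward all
Drill down: move toward (month, product, measure)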
OLAP Requirements
Types of Analysis
◦ What-if Analysis, Sensitivity Analysis, Goal Seeking Analysis, Rank
Analysis, Exception Analysis, Prediction, etc.
OLAP Requirements
◦ FASMI: Fast Analysis of Shared Multidimensional Information
◦ Fast: responses within about 1, 2, 5, or 30 seconds, depending on the type of task
◦ Analysis: various analysis tools
◦ Shared: supports many users and purposes
◦ Multi-dimensional Information: many attributes, hierarchical
attributes.
OLAP Types
OLAP Data storage
◦ Stored in Cubes or Relations
Types of OLAP (according to data storages)
◦ MOLAP (Multidimensional OLAP)
Stores data in cubes and performs operations such as drill-down, roll-up,
slicing, etc., for analysis
Very fast, but pre-computation is required
◦ ROLAP (Relational OLAP)
Uses data directly from a relational DB (current and historical data)
Very scalable
◦ Hybrid OLAP (HOLAP) (e.g., Microsoft SQL Server)
Flexibility, e.g., low level: relational, high level: array
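The trade-off between the two main storage styles can be sketched with Python's standard `sqlite3` module. This is a toy contrast, not how any particular OLAP product works: the table name, columns, and rows are hypothetical.

```python
import sqlite3

rows = [  # hypothetical (quarter, product, amount) sales records
    ("Q1", "Gizmo", 100.0),
    ("Q1", "Widget", 50.0),
    ("Q2", "Gizmo", 70.0),
]

# ROLAP style: keep the data in relations and aggregate at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

def rolap_total_by_quarter():
    cur = conn.execute(
        "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter ORDER BY quarter"
    )
    return dict(cur.fetchall())

# MOLAP style: precompute the aggregate once, then answer by lookup.
molap_by_quarter = {}
for quarter, _product, amount in rows:
    molap_by_quarter[quarter] = molap_by_quarter.get(quarter, 0.0) + amount

def molap_total(quarter):
    return molap_by_quarter[quarter]
```

The lookup is fast but must be rebuilt when data changes; the SQL query always sees current rows but pays the aggregation cost on every call.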
Part II Summary
Database
◦ Data, information, knowledge, wisdom
◦ File and database
◦ Relational DBMS
◦ SQL
Data warehouse
◦ Data warehouse
◦ Data cubes
◦ OLTP and OLAP