1 on-line analytic processing warehousing data cubes

27
1 On-Line Analytic Processing Warehousing Data Cubes

Upload: margaretmargaret-hicks

Post on 13-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 On-Line Analytic Processing Warehousing Data Cubes

1

On-Line Analytic Processing

Warehousing

Data Cubes

Page 2: 1 On-Line Analytic Processing Warehousing Data Cubes

2

Overview

• Traditional database systems are tuned to many, small, simple queries.

• Some new applications use fewer, more time-consuming, complex analytic queries.

• New architectures have been developed to handle analytic queries efficiently.

Page 3: 1 On-Line Analytic Processing Warehousing Data Cubes

3

OLTP

• Most database operations involve On-Line Transaction Processing (OTLP).– Short, simple, frequent queries and/or

modifications, each involving a small number of tuples.

– ExamplesAnswering queries from a Web interfacesales at cash registersselling airline tickets

Page 4: 1 On-Line Analytic Processing Warehousing Data Cubes

4

OLAP

• On-Line Analytic Processing (or A for “application”) queries: – Few, but complex queries– May query a large amount of data and run for

hours– Do not depend on having an absolutely up-to-

date database.

Page 5: 1 On-Line Analytic Processing Warehousing Data Cubes

Example: OLAP Application

• Analysts at Wal-Mart look for items with increasing sales in some region recently.Sales(saledate,item,store,qty)

Items(item,size,color)

Stores(store,city,provice)

SELECT item,city,SUM(qty)

FROM Sales NATURAL JOIN Stores

WHERE saledate >= ‘2009-1-1’

GROUP BY item,city;

Page 6: 1 On-Line Analytic Processing Warehousing Data Cubes

6

Data Warehouse

• It’s better for OLAP applications to take place in a separate copy of the master database.

• Analysis may involve data from various sources across the enterprise.

• Data warehouse is the most common form of data integration.– Copy sources into a single DB (warehouse) and try to

keep it up-to-date.

– Usual method: periodic reconstruction of the warehouse, perhaps overnight.

Page 7: 1 On-Line Analytic Processing Warehousing Data Cubes

7

Common Architecture

• Databases at store branches handle OLTP.

• Local store databases copied to a central warehouse overnight.

• Analysts use the warehouse for OLAP.

Page 8: 1 On-Line Analytic Processing Warehousing Data Cubes

Star Schemas

• A star schema is a common organization for data at a warehouse. It consists of:– Fact table : a very large accumulation of facts

such as sales.Often “insert-only.”

– Dimension tables : smaller, generally static information about the entities involved in the facts.

Page 9: 1 On-Line Analytic Processing Warehousing Data Cubes

9

Example

• Suppose we want to record in a warehouse information about sales of products:– the store where the item is sold– the item sold– the customer who bought the item– the time when the item is sold– the price

• The fact table is a relation:Sales(store, item, customer, timeID, price)

Page 10: 1 On-Line Analytic Processing Warehousing Data Cubes

10

Example(cont.)

• The dimension tables include information about stores, items, customers and time “dimensions”:

Stores(store,city,province)

Items(item, size, color, manf)

Customers(customer, addr, phone)

Time(timeID, day, week, month, year)

Page 11: 1 On-Line Analytic Processing Warehousing Data Cubes

Visualization: Star Schema

11

Dimension Table Items Dimension Table Time

Dimension Table Customers

Dimension Table Stores

Fact Table - Sales

Dimension Attrs. Dependent Attr.

Page 12: 1 On-Line Analytic Processing Warehousing Data Cubes

12

Dimension/Dependent Attributes

• Two classes of fact-table attributes:– Dimension attributes: the key of a dimension

table.Foreign key for fact table.

– Dependent attributes: a value determined by the dimension attributes of the tuple.More often called “measure” attributes.

Page 13: 1 On-Line Analytic Processing Warehousing Data Cubes

13

Example: Dependent Attribute

• price is the dependent attribute of our example Sales relation.– Other dependent attributes can also be present,

e.g., quantity.

• It is determined by the combination of dimension attributes: store, item, customer and time attributes.

Page 14: 1 On-Line Analytic Processing Warehousing Data Cubes

14

Approaches to Implementation

• ROLAP = relational OLAP: Use relational DBMS to support star schemas.

• MOLAP = multidimensional OLAP: Use a specialized multidimensional data structure.– e.g., data cube

• HOLAP = hybrid OLAP: Use both the above.

Page 15: 1 On-Line Analytic Processing Warehousing Data Cubes

15

Data Cube

• OLAP data can be modelled in a multidimensional space manner.

• Keys of dimension tables are the dimensions of a hypercube.– Example: for the Sales data, the four

dimensions are store, item, customer and time.

• Dependent attributes (e.g., price) appear as points within the multidimensional space.

Page 16: 1 On-Line Analytic Processing Warehousing Data Cubes

Visualization: Data Cube

16

price

store

item

customer

Page 17: 1 On-Line Analytic Processing Warehousing Data Cubes

17

Data Cube w/ Aggregations

• Raw-data cube: original data in the fact table.

• Formal data cube: also includes points that represent aggregation (typically SUM) of the raw-data grouped in all subsets of dimensions.– Precomputed aggregations– Critical for fast response upon an analytic

query.

Page 18: 1 On-Line Analytic Processing Warehousing Data Cubes

Visualization: Formal Data Cube

18

price

store

item

customer

SUM o

ver

all c

usto

mer

s

Page 19: 1 On-Line Analytic Processing Warehousing Data Cubes

Tuple w/ Aggregate Components

• Think of each dimension as having an additional value *.– Stands for “all”.

• A point with one or more *’s in its coordinates aggregates over the dimensions with the *’s.

• Example: Sales(‘Shop-1’, ‘TV’, *, *) holds the sum of prices, over all customers and all time, of the TV sets sold at Shop-1.

19

Page 20: 1 On-Line Analytic Processing Warehousing Data Cubes

Building Data Cube in SQL

• In SQL:1999SELECT store,item,customer,SUM(price)

FROM Sales

GROUP BY CUBE(store,item,customer);

– Group by 23 subsets of the three dimensions

– Use NULL for the “*”

– In SQL Server: GROUP BY … WITH CUBE

• To store the cube:CREATE MATERIALIZED VIEW myCube AS

… cube-generating statement here …

Lu Chaojun, SJTU

Page 21: 1 On-Line Analytic Processing Warehousing Data Cubes

Variant of CUBE

• ROLLUP operator:SELECT store,item,customer,SUM(price)

FROM Sales

GROUP BY ROLLUP(store,item,customer);

– Group by 4 subsets of the three dimensions: {store, item, customer}, {store, item}, {store}, {}.

– In SQL Server: GROUP BY … WITH ROLLUP

Lu Chaojun, SJTU

Page 22: 1 On-Line Analytic Processing Warehousing Data Cubes

Operations on Cube: Dicing

• Dicing and Slicing– Each dimension is partitioned at some level of

granularity.e.g., “store” dimension may be partitioned by store,

by city, by province. “time” dimension may be partitioned by day, by week, by month, by year.

– A choice of partition for each dimension “dices” the cube.

– A choice of partition for one dimension generate a “slices” of the cube.

Lu Chaojun, SJTU

Page 23: 1 On-Line Analytic Processing Warehousing Data Cubes

Example: Slicing/Dicing

SELECT city, color, SUM(price)

FROM (((Sales NATURAL JOIN Stores)

NATURAL JOIN Items)

NATURAL JOIN Times)

WHERE year = 2009

GROUP BY city, color;– Slice in time dimension, and dice in store and

item dimension.

Lu Chaojun, SJTU

Page 24: 1 On-Line Analytic Processing Warehousing Data Cubes

24

Operations on Cube: Roll-up

• Roll-up– Aggregate along one or more dimensions.– From finer gruanularity to coarser granularity:

Going up the dimension hierarchy.Reducing dimensions.

• Example– Given sales data of each store, roll it up into

aggregated data by cities. – Or simply omit the store dimension.

Page 25: 1 On-Line Analytic Processing Warehousing Data Cubes

25

Operations on Cube: Drill-down

• Drill-down– “de-aggregate”: break an aggregate into its

constituents.– From coarser gruanularity to finer granularity:

Going down the dimension hierarchy.Adding dimensions.

• Example: having found that Shop-1 doesn’t sell TV well, break down its TV sales by particular size, or by the Time dimension.

Page 26: 1 On-Line Analytic Processing Warehousing Data Cubes

Example: Roll-up/Drill-down

26

TV PC Refrige

Shop-1

45 33 30

Shop-2

50 36 42

Shop-3

38 31 40

Qty by store/itemQty by province/item

Roll upstore by province

Qty by city/item

Drill downstore by city

TV PC Refrige

Jiangsu 133

100

112

TV PC Refrige

Nanjing 60 36 80

Suzhou 73 74 32

Page 27: 1 On-Line Analytic Processing Warehousing Data Cubes

End