![Page 1: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/1.jpg)
Data Models for Warehouse
Session-12/13
Data Management for Decision Support
![Page 2: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/2.jpg)
Data Models
Data models: relations; stars & snowflakes; cubes
Operators: slice & dice; roll-up, drill-down; pivoting; other
![Page 3: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/3.jpg)
Data Models
Star schemas are database schemas that exploit the structure of data for decision support queries. Queries in a DSS tend to:

- examine a set of factual transactions (POS transactions, customer events);
- analyze those facts in a variety of ways (e.g., POS transactions by week or by store).

For example, in a retail store the POS transaction is at the center, described by:

- Product information: SKU, hierarchy (section, department, BU)
- Time information: day, week, month, year
- Stores: store ID, hierarchy (region, city, locality)
- Suppliers: supplier ID, location, discounts
![Page 4: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/4.jpg)
Data Models
(Figure: Sales Transactions at the center, linked to Products, Time, Suppliers and Stores.)
Information is split into two classes: factual information and reference information.
![Page 5: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/5.jpg)
FACT DATA
Fact data records information about factual events that occurred in the business: POS transactions, phone calls, banking transactions.
Typically about 70% of warehouse data is fact data. It is important to identify and define its structure correctly in the first place, because restructuring is an expensive process.
The detailed content of the fact is derived from the business requirements.
Recorded facts do not change, as they are events of the past.
![Page 6: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/6.jpg)
Dimension Data
Information that is used for analyzing the elemental data, for example, product hierarchy, time periods, customers, stores.
It is the reference data used for analysis of facts.
Organizing the information in separate reference tables offers better query performance.
It differs from fact data in that it changes over time, due to changes in the business or reorganization.
It should be structured to permit rapid changes.
![Page 7: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/7.jpg)
FACT and Dimensions

| | Fact table | Dimension table |
|---|---|---|
| Size | Millions to billions of rows | Tens to millions of rows |
| Keys | Multiple foreign keys | One primary key |
| Content | Numeric | Textual description |
| Change | Does not change | Frequently modified |

![Page 8: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/8.jpg)
Decision Support Queries
Examples:

- Average number of sales of Haldiram per store over the last month (various types within the brand)
- Projected sales of Deepavali gift packs against the actual sales
- The top 20% of customers (by spending) over the last quarter
- Customers with an average balance in excess of Rs. 25,000 over the past year

==> Each of these queries is based on factual data.
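As a minimal sketch of how the third query reduces to aggregating fact rows, the Python below ranks customers by total spend. It assumes a hypothetical list of (customer_id, amount) facts already restricted to the last quarter; the data and the `top_customers` helper are illustrative only.

```python
import math
from collections import defaultdict

# Hypothetical fact rows: (customer_id, amount) for purchases in the last quarter.
spend_facts = [
    ("c1", 1200.0), ("c2", 300.0), ("c1", 800.0), ("c3", 50.0),
    ("c4", 4000.0), ("c5", 150.0), ("c2", 250.0), ("c3", 75.0),
]

def top_customers(facts, fraction=0.2):
    """Return the top `fraction` of customers ranked by total spend."""
    totals = defaultdict(float)
    for customer, amount in facts:
        totals[customer] += amount              # aggregate facts per customer
    ranked = sorted(totals, key=totals.get, reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))  # keep at least one customer
    return ranked[:k]

print(top_customers(spend_facts))  # ['c4']
```

The query never touches a "customer spend" table: it is computed from the individual transactions, which is why the transactions are modeled as the fact.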
![Page 9: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/9.jpg)
Decision Support Queries
Examples:

| Query | Fact (transaction) | Attributes |
|---|---|---|
| Sales of Haldiram | POS transaction | Quantity sold, product, store, date, time, revenue realized |
| Customer spend | Membership card transaction | Customer ID, store, transaction value, date and time |
| Account balance | Account transactions | Customer, account number, type of transaction, amount |

![Page 10: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/10.jpg)
Star Schema
The star schema is a data-modeling technique used to map multidimensional decision support data into a relational database.
Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.
Four components: facts, dimensions, attributes, and attribute hierarchies.
![Page 11: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/11.jpg)
A Simple Star Schema
![Page 12: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/12.jpg)
Star Schema
Facts: Facts are numeric measurements (values) that represent a specific business aspect or activity.
The fact table contains facts that are linked through their dimensions.
Facts can be computed or derived at run time (metrics).
Dimensions: Dimensions are qualifying characteristics that provide additional perspectives on a given fact.
Dimensions are stored in dimension tables.
![Page 13: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/13.jpg)
Identifying Facts and Dimensions
1. Identify the elemental transaction
2. Determine the key dimensions
3. Check whether a candidate fact is actually a dimension
4. Check whether a candidate dimension is actually a fact
![Page 14: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/14.jpg)
Identification: Step 1
Examine the enterprise model and identify the transactions that are of interest, driven by business requirements analysis.
These will be transactions that describe events fundamental to the business, e.g., calls in telecom, account transactions in banking.
For each potential fact, ask: is this information operated upon by a business process? Consider daily sales versus POS transactions: even if the system reports daily sales, the POS transaction may be the fact.
The limits of what is currently recorded should not influence the warehouse design.
![Page 15: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/15.jpg)
Identification: Step 1

| Sector | Business | Fact table |
|---|---|---|
| Retail | Sales | POS transaction |
| Retail | Shrinkage | Stock movement and position |
| Retail banking | Customer profiling | Customer events |
| Retail banking | Profitability | Account transactions |
| Insurance | Product profitability | Claims and receipts |
| Telecom | Call analysis | Call events |
| Telecom | Customer analysis | Customer events (install, disconnect, payment) |

![Page 16: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/16.jpg)
Identification: Step 2
Look at the logical model to find the entities associated with the entities in the fact table. List all such logically associated entities.
These are candidate reference entities; the task is to find key dimension entities that may not be directly associated.
For example, in retail banking, account transactions are the candidate fact table and the account is a candidate reference entity. The customer is only indirectly related to the transaction, although it is a better choice.
Analyze account transactions by account? Or analyze how customers use our services? You store both relationships, but the customer becomes a dimension.
![Page 17: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/17.jpg)
Identification: Step 3
Check that the fact is not actually a denormalized dimension table. Consider the following:
house details, cable laid, salesperson's visit, connected to the service, promotional material sent, subscription cancelled, …
House details look like a candidate fact, but the items above are operational events. Typical reports: number of connections quarter-to-date; time lag between laying the cable and subscription.
![Page 18: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/18.jpg)
Identification: Step 4
Check that a dimension is not actually a fact. A lot depends on the DSS requirements:
a customer can be a fact or a dimension; promotions can be a fact or a dimension.
Ask questions using the other dimensions: by how many other dimensions can I view this entity?
Can I view promotions by time? By product? By store? By supplier?
If the answer to these questions is yes, then it is a fact.
![Page 19: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/19.jpg)
Star Schema
Attributes: Each dimension table contains attributes. Attributes are often used to search, filter, or classify facts. Dimensions provide descriptive characteristics about the facts through their attributes.
Possible Attributes for Sales Dimensions
![Page 20: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/20.jpg)
Three Dimensional View Of Sales
![Page 21: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/21.jpg)
Slice And Dice View Of Sales
![Page 22: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/22.jpg)
Star Schema
Attribute Hierarchies
Attributes within dimensions can be ordered in a well-defined attribute hierarchy.
The attribute hierarchy provides a top-down data organization that is used for two main purposes:
Aggregation
Drill-down/roll-up data analysis
![Page 23: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/23.jpg)
A Location Attribute Hierarchy
![Page 24: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/24.jpg)
Attribute Hierarchies In Multidimensional Analysis
![Page 25: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/25.jpg)
Star Schema
Star Schema Representation
Facts and dimensions are normally represented by physical tables in the data warehouse database.
The fact table is related to each dimension table in a many-to-one (M:1) relationship.
Fact and dimension tables are related by foreign keys and are subject to the primary/foreign key constraints.
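A minimal sketch of this representation, using Python's built-in `sqlite3` module: the table and column names are hypothetical (a cut-down retail sales star), and SQLite is used only because it ships with Python. The fact table carries one foreign key per dimension, and the constraint rejects a fact row that points at a non-existent store.

```python
import sqlite3

# Hypothetical, cut-down version of a retail sales star schema.
DDL = """
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_desc   TEXT, city  TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT, brand TEXT);
CREATE TABLE time_dim    (period_key  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- One foreign key per dimension plus numeric measures; many fact rows
-- can reference the same dimension row (M:1).
CREATE TABLE sales_fact (
    store_key   INTEGER NOT NULL REFERENCES store_dim(store_key),
    product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
    period_key  INTEGER NOT NULL REFERENCES time_dim(period_key),
    dollars     REAL,
    units       INTEGER,
    PRIMARY KEY (store_key, product_key, period_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)
conn.execute("INSERT INTO store_dim VALUES (1, 'Store 1', 'City A')")
conn.execute("INSERT INTO product_dim VALUES (10, 'Product 10', 'Brand X')")
conn.execute("INSERT INTO time_dim VALUES (202401, 2024, 1)")
conn.execute("INSERT INTO sales_fact VALUES (1, 10, 202401, 120.0, 6)")  # valid keys

try:
    # store_key 99 does not exist in store_dim, so the FK constraint rejects the row
    conn.execute("INSERT INTO sales_fact VALUES (99, 10, 202401, 50.0, 2)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```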
![Page 26: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/26.jpg)
Star Schema For Sales
![Page 27: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/27.jpg)
Orders Star Schema
![Page 28: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/28.jpg)
The Multi-Dimensional Model
Typical queries: “Sales by product line over the past six months”; “Sales by store between 1990 and 1995”
(Figure: a fact table for measures whose key columns Prod Code, Time Code and Store Code join it to the dimension tables Product Info, Time Info, Store Info, . . .; Sales Qty is the numerical measure.)
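The two example queries are star joins: the fact table is joined to dimension tables through its key columns and then aggregated. A minimal sketch with `sqlite3`, assuming hypothetical table contents and a running month number in the time dimension (`month_no`) so that "the past six months" becomes a simple range filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (prod_code  INTEGER PRIMARY KEY, product_line TEXT);
CREATE TABLE time_dim    (time_code  INTEGER PRIMARY KEY, month_no INTEGER);
CREATE TABLE store_dim   (store_code INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE sales_fact  (prod_code INTEGER, time_code INTEGER,
                          store_code INTEGER, sales_qty INTEGER);
INSERT INTO product_dim VALUES (1, 'Snacks'), (2, 'Snacks'), (3, 'Sweets');
INSERT INTO time_dim    VALUES (100, 1), (101, 5), (102, 9);
INSERT INTO store_dim   VALUES (7, 'Store A'), (8, 'Store B');
INSERT INTO sales_fact  VALUES (1, 100, 7, 10), (2, 101, 8, 5), (3, 101, 7, 4),
                               (1, 102, 7, 20), (3, 102, 8, 6);
""")

CURRENT_MONTH = 9  # hypothetical "now", as a running month number

# Star join: fact table -> dimension tables via key columns, then aggregate.
by_line = conn.execute("""
    SELECT p.product_line, SUM(f.sales_qty)
    FROM sales_fact f
    JOIN product_dim p ON f.prod_code = p.prod_code
    JOIN time_dim    t ON f.time_code = t.time_code
    WHERE t.month_no > ?          -- the past six months
    GROUP BY p.product_line
    ORDER BY p.product_line
""", (CURRENT_MONTH - 6,)).fetchall()

by_store = conn.execute("""
    SELECT s.store_name, SUM(f.sales_qty)
    FROM sales_fact f JOIN store_dim s ON f.store_code = s.store_code
    GROUP BY s.store_name ORDER BY s.store_name
""").fetchall()

print(by_line)   # [('Snacks', 25), ('Sweets', 10)]
print(by_store)  # [('Store A', 34), ('Store B', 11)]
```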
![Page 29: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/29.jpg)
Dimensional Modeling
Dimensions are organized into hierarchies, e.g., Time dimension: days → weeks → quarters; Product dimension: product → product line → brand
Dimensions have attributes
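A minimal sketch of rolling daily facts up the Time hierarchy, assuming hypothetical daily sales and ISO calendar weeks (`datetime.date.isocalendar`). Each level is derived directly from the day rather than from the week, because an ISO week can straddle two quarters; that choice is an assumption of this sketch, not something the slide prescribes.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts at the lowest level of the Time hierarchy.
daily_sales = {
    date(2024, 1, 1): 10,
    date(2024, 1, 3): 5,
    date(2024, 1, 9): 7,
    date(2024, 4, 2): 4,
}

def iso_week(d):
    year, week, _ = d.isocalendar()   # ISO year, ISO week number, weekday
    return (year, week)

def quarter(d):
    return (d.year, (d.month - 1) // 3 + 1)

def roll_up(facts, level):
    """Sum daily facts into the hierarchy level computed by `level`."""
    totals = defaultdict(int)
    for day, amount in facts.items():
        totals[level(day)] += amount
    return dict(totals)

print(roll_up(daily_sales, iso_week))  # {(2024, 1): 15, (2024, 2): 7, (2024, 14): 4}
print(roll_up(daily_sales, quarter))   # {(2024, 1): 22, (2024, 2): 4}
```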
![Page 30: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/30.jpg)
Dimension Hierarchies
Store dimension: Stores → District → Region → Total
Product dimension: Products → Brand → Manufacturer → Total
![Page 31: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/31.jpg)
ROLAP: Dimensional Modeling Using Relational DBMS
- Special schema design: star, snowflake
- Special indexes: bitmap, multi-table join
- Special tuning: maximize query throughput
- Proven technology (relational model, DBMS) that tends to outperform specialized MDDBs, especially on large data sets
- Products: IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
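To illustrate the "bitmap" index idea, here is a toy sketch in Python: one bitmap per distinct column value, with a conjunctive filter answered by ANDing bitmaps. The `BitmapIndex` class and the column data are hypothetical; real RDBMS bitmap indexes add compression and on-disk storage that this sketch omits.

```python
class BitmapIndex:
    """Toy bitmap index: one bitmap (a Python int) per distinct column value."""

    def __init__(self, column_values):
        self.n_rows = len(column_values)
        self.bitmaps = {}
        for row_id, value in enumerate(column_values):
            # set bit `row_id` in the bitmap for this value
            self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row_id)

    def eq(self, value):
        """Bitmap of rows where column == value (0 if the value never occurs)."""
        return self.bitmaps.get(value, 0)

    def row_ids(self, bitmap):
        return [i for i in range(self.n_rows) if (bitmap >> i) & 1]

# Hypothetical low-cardinality columns of a 5-row fact table.
region = BitmapIndex(["North", "South", "North", "East", "North"])
product = BitmapIndex(["p1", "p1", "p2", "p1", "p1"])

# WHERE region = 'North' AND product = 'p1'  ->  bitwise AND of two bitmaps
matches = region.eq("North") & product.eq("p1")
print(region.row_ids(matches))  # [0, 4]
```

Bitmaps pay off for low-cardinality dimension attributes, where a few bitwise operations replace scanning the fact table.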
![Page 32: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/32.jpg)
MOLAP: Dimensional Modeling Using the Multi Dimensional Model
- MDDB: a special-purpose data model
- Facts stored in multi-dimensional arrays
- Dimensions used to index the array
- Sometimes on top of a relational DB
- Products: Pilot, Arbor Essbase, Gentia
![Page 33: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/33.jpg)
Star Schema (in RDBMS)
![Page 34: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/34.jpg)
Star Schema Example
![Page 35: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/35.jpg)
Star Schema with Sample Data
![Page 36: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/36.jpg)
The “Classic” Star Schema
- A single fact table, with detail and summary data
- The fact table primary key has only one key column per dimension
- Each key is generated
- Each dimension is a single, highly denormalized table
- Benefits: easy to understand, easy to define hierarchies, reduces the number of physical joins, low maintenance, very simple metadata
- Drawbacks: summary data in the fact table yields poorer performance for summary levels; huge dimension tables are a problem

(Figure: classic star schema)

- Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
- Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
- Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
- Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer, Level
![Page 37: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/37.jpg)
The “Classic” Star Schema
The biggest drawback: dimension tables must carry a level indicator for every record, and every query must use it. In the example below, without the level constraint, keys for all stores in the NORTH region, including the aggregates for region and district, will be pulled from the fact table, resulting in wrong (double-counted) results.
Example: SELECT A.STORE_KEY, A.PERIOD_KEY, A.DOLLARS FROM Fact_Table A
WHERE A.STORE_KEY IN (SELECT B.STORE_KEY FROM Store_Dimension B WHERE B.REGION = 'North' AND B.LEVEL = 2)
AND ... (further constraints)
Level is needed whenever aggregates are stored with detail facts.
![Page 38: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/38.jpg)
The “Level” Problem
Level is a problem because it creates potential for error: if the query builder, human or program, forgets about it, perfectly reasonable-looking WRONG answers can occur.
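A small `sqlite3` sketch of the failure mode, under assumed data: store-level detail and the district and region aggregates share one fact table, and a hypothetical textual `level` column ('STORE', 'DISTRICT', 'REGION') stands in for the slide's numeric level codes. Forgetting the level constraint triple-counts the North region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Classic star: store detail AND district/region aggregates share one
-- fact table; the hypothetical level column tells them apart.
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, region TEXT, level TEXT);
CREATE TABLE fact_table (store_key INTEGER, period_key INTEGER, dollars REAL);
INSERT INTO store_dim VALUES (1, 'North', 'STORE'), (2, 'North', 'STORE'),
                             (10, 'North', 'DISTRICT'), (100, 'North', 'REGION');
INSERT INTO fact_table VALUES (1, 1, 50), (2, 1, 30),  -- store detail
                              (10, 1, 80),              -- district aggregate
                              (100, 1, 80);             -- region aggregate
""")

QUERY = """
    SELECT SUM(dollars) FROM fact_table
    WHERE store_key IN (SELECT store_key FROM store_dim
                        WHERE region = 'North' {level_filter})
"""
forgot_level = conn.execute(QUERY.format(level_filter="")).fetchone()[0]
with_level = conn.execute(QUERY.format(level_filter="AND level = 'STORE'")).fetchone()[0]
print(forgot_level, with_level)  # 240.0 80.0
```

Both queries look reasonable; only the second returns the true North total.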
One alternative: the FACT CONSTELLATION model...
![Page 39: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/39.jpg)
The “Fact Constellation” Schema
(Figure: fact constellation schema)

- Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
- District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
- Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
- Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
- Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Sequence
- Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer

![Page 40: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/40.jpg)
The “Fact Constellation” Schema
In a fact constellation, aggregate tables are created separately from the detail; therefore it is impossible to pick up, for example, store detail when querying the District Fact Table.
Major advantage: no need for the “Level” indicator in the dimension tables, since no aggregated data is stored with lower-level detail.
Disadvantage: dimension tables are still very large in some cases, which can slow performance; the front end must be able to detect the existence of aggregate facts, which requires more extensive metadata.
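The "front end must detect aggregate facts" requirement can be sketched as a small aggregate navigator driven by metadata. Everything here is hypothetical: the `GRAIN_ORDER` list, the table names in `FACT_TABLES`, and the fallback rule (use the nearest finer table and roll up from it when no table exists at the requested level).

```python
# Hypothetical metadata: which fact table stores which grain of the Store hierarchy.
GRAIN_ORDER = ["store", "district", "region"]      # finest -> coarsest
FACT_TABLES = {"store": "Fact_Table", "district": "District_Fact_Table",
               "region": "Region_Fact_Table"}

def choose_fact_table(requested_level, available=FACT_TABLES):
    """Return the coarsest fact table that is not coarser than the request.

    If the exact level has no aggregate table, fall back to a finer one
    (the query then has to roll up from that table).
    """
    finest_to_requested = GRAIN_ORDER[: GRAIN_ORDER.index(requested_level) + 1]
    for level in reversed(finest_to_requested):
        if level in available:
            return available[level]
    raise LookupError(f"no fact table at or below {requested_level!r}")

print(choose_fact_table("district"))  # District_Fact_Table
# Without a region aggregate, a region query falls back to the district table.
print(choose_fact_table("region", {"store": "Fact_Table",
                                   "district": "District_Fact_Table"}))
```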
![Page 41: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/41.jpg)
Another Alternative to “Level”
Fact Constellation is a good alternative to the Star, but when dimensions have very high cardinality, the sub-selects in the dimension tables can be a source of delay.
An alternative is to normalize the dimension tables by attribute level, with each smaller dimension table pointing to an appropriate aggregated fact table, the “Snowflake Schema” ...
![Page 42: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/42.jpg)
The “Snowflake” Schema
(Figure: snowflake schema)

- Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
- District table: District_ID, District Desc., Region_ID
- Region table: Region_ID, Region Desc., Regional Mgr.
- Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
- District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
- Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price

![Page 43: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/43.jpg)
The “Snowflake” Schema
- No LEVEL in dimension tables
- Dimension tables are normalized by decomposing at the attribute level
- Each dimension table has one key for each level of the dimension’s hierarchy
- The lowest-level key joins the dimension table to both the fact table and the lower-level attribute table

How does it work? The best way is for the query to be built by understanding which summary levels exist, finding the proper snowflaked attribute tables, constraining there for keys, and then selecting from the fact table.
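A minimal `sqlite3` sketch of that two-step query for "North region sales for period 1 at district level". Table and column names follow the figure loosely but are assumptions, as is the sample data: first constrain the snowflaked attribute tables to get the district keys, then select from the District Fact Table with those keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region_dim    (region_id INTEGER PRIMARY KEY, region_desc TEXT, regional_mgr TEXT);
CREATE TABLE district_dim  (district_id INTEGER PRIMARY KEY, district_desc TEXT,
                            region_id INTEGER REFERENCES region_dim(region_id));
CREATE TABLE district_fact (district_id INTEGER, product_key INTEGER, period_key INTEGER,
                            dollars REAL, units INTEGER, price REAL);
INSERT INTO region_dim    VALUES (1, 'North', 'Mgr A'), (2, 'South', 'Mgr B');
INSERT INTO district_dim  VALUES (11, 'D-11', 1), (12, 'D-12', 1), (21, 'D-21', 2);
INSERT INTO district_fact VALUES (11, 1, 1, 100, 10, 10), (12, 1, 1, 60, 6, 10),
                                 (21, 1, 1, 90, 9, 10);
""")

# Step 1: constrain in the snowflaked attribute tables to get the keys.
district_ids = [row[0] for row in conn.execute(
    "SELECT d.district_id FROM district_dim d "
    "JOIN region_dim r ON d.region_id = r.region_id "
    "WHERE r.region_desc = ? ORDER BY d.district_id", ("North",))]

# Step 2: select from the aggregate fact table at that level using those keys.
placeholders = ",".join("?" * len(district_ids))
total = conn.execute(
    f"SELECT SUM(dollars) FROM district_fact "
    f"WHERE district_id IN ({placeholders}) AND period_key = ?",
    (*district_ids, 1)).fetchone()[0]

print(district_ids, total)  # [11, 12] 160.0
```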
![Page 44: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/44.jpg)
The “Snowflake” Schema
Additional features: The original Store Dimension table, completely de-normalized, is kept intact, since certain queries can benefit by its all-encompassing content.
In practice, start with a Star Schema and create the “snowflakes” with queries. This eliminates the need to create separate extracts for each table, and referential integrity is inherited from the dimension table.
Advantage: Best performance when queries involve aggregation
Disadvantage: Complicated maintenance and metadata, explosion in the number of tables in the database
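One way to "create the snowflakes with queries" is sketched below with `sqlite3` views over the intact, denormalized store dimension; the column names and rows are hypothetical. Views always stay consistent with the star dimension; if the snowflake tables are meant to speed up queries, a materialized copy (`CREATE TABLE ... AS SELECT ...`) is the usual alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized star-schema Store Dimension (kept intact).
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY, store_desc TEXT,
    district_id INTEGER, district_desc TEXT,
    region_id INTEGER, region_desc TEXT, regional_mgr TEXT);
INSERT INTO store_dim VALUES
    (1, 'Store 1', 11, 'D-11', 1, 'North', 'Mgr A'),
    (2, 'Store 2', 11, 'D-11', 1, 'North', 'Mgr A'),
    (3, 'Store 3', 21, 'D-21', 2, 'South', 'Mgr B');
-- Snowflake attribute tables derived by query rather than by separate extracts.
CREATE VIEW district_dim AS
    SELECT DISTINCT district_id, district_desc, region_id FROM store_dim;
CREATE VIEW region_dim AS
    SELECT DISTINCT region_id, region_desc, regional_mgr FROM store_dim;
""")

districts = conn.execute("SELECT * FROM district_dim ORDER BY district_id").fetchall()
regions = conn.execute("SELECT * FROM region_dim ORDER BY region_id").fetchall()
print(districts)  # [(11, 'D-11', 1), (21, 'D-21', 2)]
print(regions)    # [(1, 'North', 'Mgr A'), (2, 'South', 'Mgr B')]
```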
![Page 45: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/45.jpg)
Advantages of ROLAP Dimensional Modeling
- Defines complex, multi-dimensional data with a simple model
- Reduces the number of joins a query has to process
- Allows the data warehouse to evolve with relatively low maintenance

HOWEVER! Star schemas and relational DBMSs are not the magic solution: query optimization is still problematic.
![Page 46: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/46.jpg)
Aggregates
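Table sale:

| prodId | storeId | date | amt |
|---|---|---|---|
| p1 | s1 | 1 | 12 |
| p2 | s1 | 1 | 11 |
| p1 | s3 | 1 | 50 |
| p2 | s2 | 1 | 8 |
| p1 | s1 | 2 | 44 |
| p1 | s2 | 2 | 4 |

Add up amounts for day 1. In SQL: SELECT SUM(amt) FROM sale WHERE date = 1
Result: 81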
![Page 47: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/47.jpg)
Aggregates
Add up amounts by day. In SQL: SELECT date, SUM(amt) FROM sale GROUP BY date
Result (ans):

| date | sum |
|---|---|
| 1 | 81 |
| 2 | 48 |

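Both aggregate queries can be run as-is against the slides' `sale` table. A minimal sketch using Python's built-in `sqlite3` (the choice of SQLite is an assumption; any SQL engine would do):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# Scalar aggregate: one number for the whole selection.
day1_total = conn.execute("SELECT SUM(amt) FROM sale WHERE date = 1").fetchone()[0]
# Grouped aggregate: one row per date.
by_day = conn.execute("SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date").fetchall()

print(day1_total)  # 81
print(by_day)      # [(1, 81), (2, 48)]
```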
![Page 48: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/48.jpg)
Another Example
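Add up amounts by day and product. In SQL: SELECT prodId, date, SUM(amt) FROM sale GROUP BY date, prodId

| prodId | date | amt |
|---|---|---|
| p1 | 1 | 62 |
| p2 | 1 | 19 |
| p1 | 2 | 48 |

Going from totals by day to totals by day and product is a drill-down; the reverse direction is a roll-up.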
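A short sketch of the drill-down and roll-up relationship, again on the slides' `sale` data in `sqlite3`: grouping by (date, prodId) is the finer view, and summing those rows per date recovers the by-day totals.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# Drill-down: add prodId to the grouping to get finer-grained totals.
by_day_product = conn.execute(
    "SELECT prodId, date, SUM(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Roll-up: drop prodId again by summing the finer totals per date.
by_day = defaultdict(int)
for prod_id, day, amount in by_day_product:
    by_day[day] += amount

print(by_day_product)  # [('p1', 1, 62), ('p2', 1, 19), ('p1', 2, 48)]
print(dict(by_day))    # {1: 81, 2: 48}
```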
![Page 49: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/49.jpg)
Aggregates
- Operators: sum, count, max, min, median, avg
- “HAVING” clause
- Using the dimension hierarchy: average by region (within store), maximum by month (within date)
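A minimal `sqlite3` sketch of the last two bullets on the `sale` table. The store → region mapping (s1 in region A; s2, s3 in region B) is borrowed from the "Aggregation Using Hierarchies" example; the `store_region` table and the threshold of 100 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER);
INSERT INTO sale VALUES ('p1','s1',1,12), ('p2','s1',1,11), ('p1','s3',1,50),
                        ('p2','s2',1,8),  ('p1','s1',2,44), ('p1','s2',2,4);
-- Store -> region level of the store hierarchy (hypothetical table).
CREATE TABLE store_region (storeId TEXT PRIMARY KEY, region TEXT);
INSERT INTO store_region VALUES ('s1','A'), ('s2','B'), ('s3','B');
""")

# Average sale amount by region, using the hierarchy table.
avg_by_region = conn.execute("""
    SELECT r.region, AVG(s.amt)
    FROM sale s JOIN store_region r ON s.storeId = r.storeId
    GROUP BY r.region ORDER BY r.region
""").fetchall()

# HAVING filters whole groups after aggregation (WHERE filters rows before it).
big_products = conn.execute("""
    SELECT prodId, SUM(amt) FROM sale
    GROUP BY prodId HAVING SUM(amt) > 100
""").fetchall()

print(avg_by_region)  # region A: 67/3, region B: 62/3
print(big_products)   # [('p1', 110)]
```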
![Page 50: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/50.jpg)
ROLAP vs. MOLAP
ROLAP: Relational On-Line Analytical Processing
MOLAP: Multi-Dimensional On-Line Analytical Processing
![Page 51: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/51.jpg)
The MOLAP Cube
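Fact table view:

| prodId | storeId | amt |
|---|---|---|
| p1 | s1 | 12 |
| p2 | s1 | 11 |
| p1 | s3 | 50 |
| p2 | s2 | 8 |

Multi-dimensional cube (dimensions = 2):

| | s1 | s2 | s3 |
|---|---|---|---|
| p1 | 12 | | 50 |
| p2 | 11 | 8 | |
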
![Page 52: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/52.jpg)
3-D Cube
Fact table view: the same sale table (prodId, storeId, date, amt) used in the Aggregates examples.
Multi-dimensional cube (dimensions = 3):
day 1:

| | s1 | s2 | s3 |
|---|---|---|---|
| p1 | 12 | | 50 |
| p2 | 11 | 8 | |

day 2:

| | s1 | s2 | s3 |
|---|---|---|---|
| p1 | 44 | 4 | |
| p2 | | | |

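A minimal sketch of the MOLAP idea: the fact rows are loaded into a dense 3-D array, with each dimension's values mapped to array positions. Plain nested lists stand in for a real multi-dimensional store, and empty cells hold 0 here rather than being left blank.

```python
SALE = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]
DAYS, PRODUCTS, STORES = [1, 2], ["p1", "p2"], ["s1", "s2", "s3"]

def build_cube(rows):
    """Dense array cube[day][product][store]; cells with no sales hold 0."""
    cube = [[[0] * len(STORES) for _ in PRODUCTS] for _ in DAYS]
    for prod, store, day, amt in rows:
        # each dimension value is mapped to an array position (the "index")
        cube[DAYS.index(day)][PRODUCTS.index(prod)][STORES.index(store)] += amt
    return cube

cube = build_cube(SALE)
print(cube[0])  # day 1: [[12, 0, 50], [11, 8, 0]]
print(cube[1])  # day 2: [[44, 4, 0], [0, 0, 0]]
```

The dense layout makes cell lookup a direct index, at the cost of storing every empty cell; sparse cubes are one reason ROLAP tends to scale better on large data sets.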
![Page 53: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/53.jpg)
Example
(Figure: a sales cube with dimensions Store (NY, SF, LA), Product (Juice, Milk, Coke, Cream, Soap, Bread) and Time (M, T, W, Th, F, S, S); one cell records that 56 units of bread were sold in LA on M.)

- Dimensions: Time, Product, Store
- Attributes: Product (upc, price, …), Store …
- Hierarchies: Product → Brand → …; Day → Week → Quarter; Store → Region → Country
- Roll-up examples: day to week, product to brand, store to region
![Page 54: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/54.jpg)
Cube Aggregation: Roll-up
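Input: the 3-D cube (day 1 and day 2 faces). Summing over days:

| | s1 | s2 | s3 |
|---|---|---|---|
| p1 | 56 | 4 | 50 |
| p2 | 11 | 8 | |

Summing further:

- over products (per store): s1 67, s2 12, s3 50
- over stores (per product): p1 110, p2 19
- over everything: 129

Each step is a roll-up; drill-down goes in the opposite direction.
Example: computing sums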
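The sums above can be sketched with one generic roll-up helper that keeps only the chosen dimensions and adds up `amt` over the rest. The positional encoding of dimensions (`PROD`, `STORE`, `DAY`) is an assumption of this sketch.

```python
from collections import defaultdict

SALE = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]
PROD, STORE, DAY = 0, 1, 2   # positions of the dimension values in each row

def roll_up(rows, keep):
    """Sum amt over every dimension whose position is not listed in `keep`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in keep)] += row[3]
    return dict(totals)

over_days = roll_up(SALE, (PROD, STORE))  # the 2-D table summed over days
per_store = roll_up(SALE, (STORE,))       # s1 67, s2 12, s3 50
per_product = roll_up(SALE, (PROD,))      # p1 110, p2 19
grand_total = roll_up(SALE, ())           # {(): 129}
print(per_store, per_product, grand_total)
```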
![Page 55: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/55.jpg)
Cube Operators for Roll-up
Same cube and sums as above, labeled with cube operators (* = aggregated over that dimension; arguments in the order store, product, day):

- `sale(s1,*,*)` = 67
- `sale(s2,p2,*)` = 8
- `sale(*,*,*)` = 129
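A direct reading of the operator notation as a Python function, assuming the argument order (store, product, day) implied by `sale(s2,p2,*)`; `ALL` is a hypothetical name for the `*` wildcard.

```python
SALE = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]  # (prodId, storeId, date, amt)
ALL = "*"  # wildcard: aggregate over every value of that dimension

def sale(store, prod, day, rows=SALE):
    """Cube cell sale(store, prod, day); any argument may be ALL."""
    return sum(amt for p, s, d, amt in rows
               if store in (ALL, s) and prod in (ALL, p) and day in (ALL, d))

print(sale("s1", ALL, ALL))   # 67
print(sale("s2", "p2", ALL))  # 8
print(sale(ALL, ALL, ALL))    # 129
```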
![Page 56: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/56.jpg)
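Extended Cube
Each face gets a * (total) row and column. Summed over days:

| | s1 | s2 | s3 | * |
|---|---|---|---|---|
| p1 | 56 | 4 | 50 | 110 |
| p2 | 11 | 8 | | 19 |
| * | 67 | 12 | 50 | 129 |

day 1:

| | s1 | s2 | s3 | * |
|---|---|---|---|---|
| p1 | 12 | | 50 | 62 |
| p2 | 11 | 8 | | 19 |
| * | 23 | 8 | 50 | 81 |

day 2:

| | s1 | s2 | s3 | * |
|---|---|---|---|---|
| p1 | 44 | 4 | | 48 |
| p2 | | | | |
| * | 44 | 4 | | 48 |

For example, `sale(*,p2,*)` = 19 is the p2 total over all stores and days.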
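Materializing the extended cube means precomputing every cell, including all `*` totals. A minimal sketch: each fact row contributes to 2^3 = 8 cells, one for every way of replacing its coordinates with `*`. Many SQL engines expose the same idea as `GROUP BY CUBE`; plain Python is used here so the sketch stays self-contained.

```python
from collections import defaultdict
from itertools import product

SALE = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]  # (prodId, storeId, date, amt)
ALL = "*"

def extended_cube(rows):
    """Materialize every cell (store, prod, day), including * totals."""
    cube = defaultdict(int)
    for p, s, d, amt in rows:
        # Each fact contributes to 2**3 = 8 cells: each coordinate is either
        # its own value or ALL.
        for cell in product((s, ALL), (p, ALL), (d, ALL)):
            cube[cell] += amt
    return dict(cube)

cube = extended_cube(SALE)
print(cube[(ALL, "p2", ALL)])  # 19
print(cube[(ALL, ALL, 1)])     # 81
print(cube[("s1", ALL, 1)])    # 23
```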
![Page 57: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/57.jpg)
Aggregation Using Hierarchies
Hierarchy: store → region → country (store s1 in region A; stores s2, s3 in region B).
Rolling the day 1/day 2 cube up over days, and from store to region:

| | region A | region B |
|---|---|---|
| p1 | 56 | 54 |
| p2 | 11 | 8 |

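A minimal sketch of aggregation along the store → region hierarchy: each store is mapped to its region (the mapping stated on the slide) and the amounts are summed over days.

```python
from collections import defaultdict

SALE = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]  # (prodId, storeId, date, amt)
STORE_TO_REGION = {"s1": "A", "s2": "B", "s3": "B"}  # store -> region level

def roll_up_to_region(rows):
    """Sum over days and replace each store by its region."""
    totals = defaultdict(int)
    for prod, store, _day, amt in rows:
        totals[(prod, STORE_TO_REGION[store])] += amt
    return dict(totals)

by_region = roll_up_to_region(SALE)
print(by_region[("p1", "B")])  # 54 (s2: 4 + s3: 50)
```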
![Page 58: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/58.jpg)
Slicing
Slicing the day 1/day 2 cube at TIME = day 1 gives:

| | s1 | s2 | s3 |
|---|---|---|---|
| p1 | 12 | | 50 |
| p2 | 11 | 8 | |

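A minimal sketch of slice (fix one dimension) and its companion dice (restrict several dimensions to subsets) on the slides' `sale` rows; the function names are hypothetical.

```python
from collections import defaultdict

SALE = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]  # (prodId, storeId, date, amt)

def slice_cube(rows, day):
    """Slice: fix TIME = day and keep a 2-D (product, store) view."""
    view = defaultdict(int)
    for prod, store, d, amt in rows:
        if d == day:
            view[(prod, store)] += amt
    return dict(view)

def dice_cube(rows, products, stores, days):
    """Dice: keep only rows whose values fall in the chosen subsets."""
    return [r for r in rows if r[0] in products and r[1] in stores and r[2] in days]

day1 = slice_cube(SALE, 1)
print(day1)  # {('p1', 's1'): 12, ('p2', 's1'): 11, ('p1', 's3'): 50, ('p2', 's2'): 8}
subcube = dice_cube(SALE, {"p1"}, {"s1", "s2"}, {1, 2})
```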
![Page 59: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/59.jpg)
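Slicing & Pivoting
Sales ($ millions) at time d1, with store and product on the rows:

| Store | Product | d1 |
|---|---|---|
| s1 | Electronics | 5.2 |
| s1 | Toys | 1.9 |
| s1 | Clothing | 2.3 |
| s1 | Cosmetics | 1.1 |
| s2 | Electronics | 8.9 |
| s2 | Toys | 0.75 |
| s2 | Clothing | 4.6 |
| s2 | Cosmetics | 1.5 |

Pivoted so that products are rows and stores are columns (d1, $ millions):

| Product | Store s1 | Store s2 |
|---|---|---|
| Electronics | 5.2 | 8.9 |
| Toys | 1.9 | 0.75 |
| Clothing | 2.3 | 4.6 |
| Cosmetics | 1.1 | 1.5 |
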
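Pivoting only rotates the view: the same values are regrouped with products as rows and stores as columns. A minimal sketch using the d1 figures from the slide; the `pivot` helper is hypothetical.

```python
# Sales ($ millions) at time d1, as (store, product, value) rows from the slide.
SALES_D1 = [
    ("s1", "Electronics", 5.2), ("s1", "Toys", 1.9),
    ("s1", "Clothing", 2.3), ("s1", "Cosmetics", 1.1),
    ("s2", "Electronics", 8.9), ("s2", "Toys", 0.75),
    ("s2", "Clothing", 4.6), ("s2", "Cosmetics", 1.5),
]

def pivot(rows):
    """Rotate (store, product, value) rows into {product: {store: value}}."""
    table = {}
    for store, prod, value in rows:
        table.setdefault(prod, {})[store] = value
    return table

pivoted = pivot(SALES_D1)
for prod, by_store in pivoted.items():
    print(f"{prod:<12}{by_store['s1']:>6}{by_store['s2']:>6}")
```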
![Page 60: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/60.jpg)
Summary of Operations
- Aggregation (roll-up): aggregate (summarize) data to the next higher dimension element, e.g., from total sales by city and year to total sales by region and year
- Navigation to detailed data (drill-down)
- Selection (slice): defines a subcube, e.g., sales where city = 'Gainesville' and date = '1/15/90'
- Calculation and ranking, e.g., top 3% of cities by average income
- Visualization operations (e.g., pivot)
- Time functions, e.g., time average
![Page 61: Data Models for Warehouse Session-12/13 Data Management for Decision Support](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56649eb35503460f94bba6ef/html5/thumbnails/61.jpg)
Query & Analysis Tools
- Query building
- Report writers (comparisons, growth, graphs, …)
- Spreadsheet systems
- Web interfaces
- Data mining