1
DynaMat: A Dynamic View Management System for Data Warehouses
Vicky :: Cao Hui Ping
Sherman :: Chow Sze Ming
CTH :: Chong Tsz Ho
Ronald :: Woo Lok Yan
Ken :: Yiu Man Lung
2
Outline
Introduction
Background
DynaMat
Experiments
Conclusions
References
3
Introduction
On-Line Analytical Processing (OLAP)
Why OLAP? It is a dominant factor in decision-support applications
Ad-hoc, data-intensive queries
Costly multi-way joins and aggregations
Materialized views
Why materialize views?
The amount of data in a data warehouse is very large
OLAP queries are complex and costly
OLAP query results are often summary data
A materialized view is a set of redundant entities stored in the data warehouse to accelerate OLAP queries.
4
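Introduction (cont.)
Basic rule for materializing views
Given a space restriction, select suitable views to materialize.
[Figure: queries are answered either from the data warehouse or from the materialized views]
Not all data can be stored redundantly: how many views should be materialized, and which ones?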
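One simple way to make the "given a space restriction" rule concrete is a greedy, knapsack-style heuristic: rank candidate views by estimated benefit per unit of space and take them while they fit. This is a generic illustration, not the selection algorithm of DynaMat or of any particular paper; the candidate names, sizes and benefits are made-up numbers.

```python
from typing import List, NamedTuple, Tuple

class CandidateView(NamedTuple):
    name: str      # group-by attributes of the view (hypothetical)
    size: int      # storage needed to materialize it (hypothetical units)
    benefit: int   # estimated query-cost saving if materialized (hypothetical)

def select_views(candidates: List[CandidateView],
                 space_budget: int) -> Tuple[List[str], int]:
    """Greedy heuristic: take views with the best benefit per unit of space
    while they still fit in the budget; views that do not fit are skipped."""
    chosen: List[str] = []
    used = 0
    ranked = sorted(candidates, key=lambda v: v.benefit / v.size, reverse=True)
    for view in ranked:
        if used + view.size <= space_budget:
            chosen.append(view.name)
            used += view.size
    return chosen, used

candidates = [
    CandidateView("product,year", size=40, benefit=120),
    CandidateView("product", size=5, benefit=30),
    CandidateView("country,year", size=30, benefit=45),
    CandidateView("product,country,year", size=100, benefit=150),
]
print(select_views(candidates, space_budget=50))  # (['product', 'product,year'], 45)
```

Because the choice is made once, before any query arrives, it is exactly the kind of static selection that the next slides argue against.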
5
Background
Research topics on materialized views:
Storing summary data as materialized views
Computing and updating views efficiently
Static selection of views: decide in advance which views should be materialized, and materialize them before the queries arrive
Static!
6
Background (cont.)
Limitations of static view selection
Many queries cannot be answered from the materialized data, because query patterns change
Updates are costly, because the data changes over time
The administrator must (a heavy workload!):
Monitor query patterns
Recalibrate the views by rerunning the queries
Automated view selection
Dynamic view management: DynaMat
7
DynaMat
Characteristics:
Dynamically materializes information at different granularities
Combines view selection and view maintenance in a single framework
Topics: system overview, view pool organization, directory index, query execution, pool maintenance
8
System Overview
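Components, in two phases: on-line query and off-line update
View pool V: stores the materialized data
Directory index: supports sub-linear search in V
On-line query: determine whether the materialized data can be used to answer the query
Off-line update: maintain the view pool
[Figure: DynaMat system architecture, with numbered processing steps]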
9
View Pool Organization
Multi-range query (MRQ): represented as a hyper-plane, i.e. an n-vector $q = \{R_1, R_2, \ldots, R_n\}$
n: number of group-by attributes
Each $R_i$ is the full range of the attribute's domain, a single value, or an empty range
Example on the fact table F(product, country, year, sales), with domains Product (p1, p35), Country (c1, c30), Year (1995, 2000):
Select product, year, sum(sales)
From F
Where product = 'p1'
Group by product, year
Hyper-plane: $q = \{p_1, (1995, 2000)\}$
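The mapping from an MR query to its hyper-plane can be sketched as follows, assuming products and countries are encoded as integers (pk becomes k) and ranges are inclusive (low, high) pairs, with a single value v written as (v, v). The function name and the DOMAINS table are hypothetical, and the sketch only handles equality predicates; range predicates and the empty-range case are left out for brevity.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (low, high); a single value v is (v, v)

# Domains from the slide's example, with pk and ck encoded as the integer k.
DOMAINS: Dict[str, Range] = {
    "product": (1, 35),
    "country": (1, 30),
    "year": (1995, 2000),
}

def mr_query_hyperplane(group_by: List[str],
                        equals: Dict[str, int]) -> Tuple[Range, ...]:
    """Build {R_1, ..., R_n}, one range per group-by attribute: a single
    value if the WHERE clause fixes the attribute, else its full domain."""
    hyperplane = []
    for attr in group_by:
        value: Optional[int] = equals.get(attr)
        hyperplane.append((value, value) if value is not None else DOMAINS[attr])
    return tuple(hyperplane)

# SELECT product, year, SUM(sales) FROM F WHERE product = 'p1' GROUP BY product, year
q = mr_query_hyperplane(["product", "year"], {"product": 1})
print(q)  # ((1, 1), (1995, 2000)), i.e. {p1, (1995, 2000)}
```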
10
View Pool Organization (cont.)
MRF (Multidimensional Range Fragments)
Each fragment can also be represented by a hyper-plane
The fragment is the basic logical unit in the pool
The view pool holds many fragments
Fact table F:
Product Year Country Sales
P1 1997 C1 30
P1 1997 C2 50
P1 1999 C1 40
P1 1999 C3 60
P2 1997 C1 40
P2 1998 C2 50
P2 1998 C3 30
Fragment f (an MRF, with country aggregated to All):
Product Year Country Sales
P1 1997 All 80
P1 1999 All 100
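A fragment can be modelled as its hyper-plane plus its aggregated rows. The sketch below is an assumption about representation, not DynaMat's storage format: it encodes product Pk as the integer k, rebuilds fragment f from the rows of F above, and tags it with the hyper-plane {p1, (1995, 2000)}. The Fragment class and build_fragment function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (low, high)

@dataclass
class Fragment:
    """An MRF: the hyper-plane it covers plus its aggregated rows."""
    hyperplane: Tuple[Range, ...]              # (product range, year range)
    rows: List[tuple] = field(default_factory=list)

def build_fragment(fact_rows, product_range: Range, year_range: Range) -> Fragment:
    """Aggregate F over the hyper-plane, grouping by (product, year)."""
    lo_p, hi_p = product_range
    lo_y, hi_y = year_range
    totals = defaultdict(int)
    for product, year, _country, sales in fact_rows:
        if lo_p <= product <= hi_p and lo_y <= year <= hi_y:
            totals[(product, year)] += sales   # country collapses to 'All'
    rows = sorted((p, y, "All", s) for (p, y), s in totals.items())
    return Fragment((product_range, year_range), rows)

# F from the slide as (product, year, country, sales), with Pk encoded as k.
F = [(1, 1997, "C1", 30), (1, 1997, "C2", 50), (1, 1999, "C1", 40),
     (1, 1999, "C3", 60), (2, 1997, "C1", 40), (2, 1998, "C2", 50),
     (2, 1998, "C3", 30)]

f = build_fragment(F, (1, 1), (1995, 2000))
print(f.hyperplane)  # ((1, 1), (1995, 2000))
print(f.rows)        # [(1, 1997, 'All', 80), (1, 1999, 'All', 100)]
```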
11
Directory Index
Facilitates the search in the view pool
The directory index is an R-tree built on the fragments' hyper-planes
Each fragment corresponds to one entry in the directory index
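[Figure: the hyper-planes of q_1 and q_2 drawn as rectangles in the Product × Year plane (Product axis marks p1, p10, p15; Year axis marks 1995, 1997, 2000)]
$q_1 = \{(p_1, p_{35}), 1997\}$
$q_2 = \{(p_{10}, p_{15}), (1995, 2000)\}$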
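The interface of the directory index can be sketched with one entry per fragment, keyed by its hyper-plane. For clarity this hypothetical DirectoryIndex uses a linear scan with a box-intersection test as a stand-in for the R-tree; a real R-tree answers the same query but prunes whole subtrees whose bounding boxes miss the query box. Products are encoded as integers (pk becomes k).

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]        # inclusive (low, high)
Box = Tuple[Range, ...]        # a hyper-plane: one range per dimension

def intersects(a: Box, b: Box) -> bool:
    """True if two hyper-planes overlap in every dimension."""
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

class DirectoryIndex:
    """Stand-in for the R-tree: maps fragment id -> hyper-plane."""

    def __init__(self) -> None:
        self.entries: Dict[str, Box] = {}

    def insert(self, fragment_id: str, box: Box) -> None:
        self.entries[fragment_id] = box

    def search(self, query: Box) -> List[str]:
        """Ids of fragments whose hyper-plane overlaps the query box."""
        return [fid for fid, box in self.entries.items() if intersects(box, query)]

index = DirectoryIndex()
index.insert("q1", ((1, 35), (1997, 1997)))    # q1 = {(p1, p35), 1997}
index.insert("q2", ((10, 15), (1995, 2000)))   # q2 = {(p10, p15), (1995, 2000)}
print(index.search(((20, 30), (1995, 2000))))  # ['q1']
```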
12
Query Execution
Query steps:
From the MR query, compute its hyper-plane
Search the view pool through the directory index
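Example: a new query $q_3 = \{p_{13}, (1995, 2000)\}$ is searched against the directory index, which holds
$q_1 = \{(p_1, p_{35}), 1997\}$
$q_2 = \{(p_{10}, p_{15}), (1995, 2000)\}$
[Figure: q_3 and the stored fragments (labelled f2, f3) in the Product × Year plane]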
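For the example above, q1 only overlaps q3 at year 1997, while q2 covers q3 completely. The sketch below keeps only fragments whose hyper-plane contains the query's hyper-plane, assuming the stored fragments are at the same group-by granularity as the query so that containment is enough to answer it by filtering; the function names are hypothetical and the linear scan again stands in for the R-tree search.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]        # inclusive (low, high)
Box = Tuple[Range, ...]        # (product range, year range); pk encoded as k

def contains(outer: Box, inner: Box) -> bool:
    """True if hyper-plane `outer` covers `inner` in every dimension."""
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer, inner))

def mr_query_to_box(product: int, years: Range) -> Box:
    """Hyper-plane of {p_k, (y1, y2)}: one product and a year range."""
    return ((product, product), years)

# Stored fragments, keyed by the query that produced them.
pool: Dict[str, Box] = {
    "q1": ((1, 35), (1997, 1997)),    # {(p1, p35), 1997}
    "q2": ((10, 15), (1995, 2000)),   # {(p10, p15), (1995, 2000)}
}

def covering_fragments(query: Box) -> List[str]:
    """Fragments that can answer the query on their own."""
    return [fid for fid, box in pool.items() if contains(box, query)]

q3 = mr_query_to_box(13, (1995, 2000))
print(covering_fragments(q3))  # ['q2']
```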
13
Query Execution (cont.)
Query cases:
One fragment f matches the query exactly: retrieve f and return it to the user
No exact match, but several fragments can be used to answer the query: choose the best fragment to answer it
The query cannot be answered from the view pool: execute it directly on the DW
In the latter two cases, the query results are passed to the Admission Control Entity (ACE)
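The three cases can be sketched as a small planning function. Two points are assumptions, not taken from the slides: "best fragment" is read as the smallest covering fragment (its size, here the number of product × year cells, serves as a cost proxy), and the function simply reports whether the result should go to the ACE. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[int, int]
Box = Tuple[Range, ...]        # (product range, year range); pk encoded as k

def contains(outer: Box, inner: Box) -> bool:
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer, inner))

def plan_query(query: Box,
               pool: Dict[str, Tuple[Box, int]]) -> Tuple[str, Optional[str], bool]:
    """pool maps fragment id -> (hyper-plane, size).
    Returns (case, fragment id or None, send result to ACE?)."""
    # Case 1: exact match, return the stored fragment as-is.
    for fid, (box, _size) in pool.items():
        if box == query:
            return "exact", fid, False
    # Case 2: some fragments cover the query; pick the cheapest (assumed: smallest).
    usable = [(size, fid) for fid, (box, size) in pool.items() if contains(box, query)]
    if usable:
        _size, best = min(usable)
        return "from_fragment", best, True
    # Case 3: nothing usable, run the query on the data warehouse.
    return "from_dw", None, True

pool = {
    "f1": (((1, 35), (1997, 1997)), 35),
    "f2": (((10, 15), (1995, 2000)), 36),
    "f3": (((1, 15), (1995, 2000)), 90),
}
print(plan_query(((13, 13), (1995, 2000)), pool))  # ('from_fragment', 'f2', True)
```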
14
Pool Maintenance
Admission Control Entity (ACE)
Two cases require maintenance:
New query results arrive
Data in the base relations change
Space bound and time bound
Space bound: when the view pool reaches the predefined space limit $W_{space}$, fragments are replaced
Time bound: the system restricts the time window $W_{time}$ available for refreshing the fragments
A goodness measure determines whether a fragment is good enough to keep in the pool
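The slides do not define the goodness measure, so the sketch below assumes one plausible choice for illustration: query cost saved per unit of space (hits × recomputation cost / size). The two bound checks follow the slide: total fragment size against $W_{space}$, and total refresh cost against $W_{time}$. The statistics and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FragmentStats:
    name: str
    size: int              # space the fragment occupies in the pool
    hits: int              # times it answered a query (hypothetical)
    recompute_cost: int    # cost of recomputing it from the DW (hypothetical)
    update_cost: int       # cost of refreshing it in the update phase (hypothetical)

def goodness(f: FragmentStats) -> float:
    """Assumed goodness: query cost saved per unit of space."""
    return f.hits * f.recompute_cost / f.size

def within_space_bound(pool: List[FragmentStats], w_space: int) -> bool:
    return sum(f.size for f in pool) <= w_space

def within_time_bound(pool: List[FragmentStats], w_time: int) -> bool:
    return sum(f.update_cost for f in pool) <= w_time

pool = [
    FragmentStats("f1", size=35, hits=4, recompute_cost=70, update_cost=5),
    FragmentStats("f2", size=36, hits=1, recompute_cost=72, update_cost=6),
]
print([goodness(f) for f in pool])  # [8.0, 2.0]
print(within_space_bound(pool, w_space=64), within_time_bound(pool, w_time=20))  # False True
```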
15
Pool Maintenance (cont.)
Pool maintenance during queries
A new query result is stored in the view pool if the pool has enough space
If the pool hits the space constraint, the replacement algorithm is called: if goodness(f_new) > goodness(f_victim), f_victim is evicted
The process repeats until there is enough space for the new query result
The father pointers are maintained as well
[Figure: f_victim, with goodness(f_victim) < goodness(f_new), is evicted to make room for f_new, the new query result; fragments f1, f2]
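The replacement loop can be sketched as below. Victims are examined from the lowest goodness upwards and evicted only while goodness(f_new) > goodness(f_victim). Two design choices here are assumptions, not taken from the slides: victims are collected first and evicted only if enough space can actually be freed (so a rejected result leaves the pool untouched), and father-pointer maintenance is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical pool: fragment id -> (size, goodness).
Pool = Dict[str, Tuple[int, float]]

def admit(pool: Pool, new_id: str, new_size: int, new_goodness: float,
          w_space: int) -> Tuple[bool, List[str]]:
    """Try to store a new query result; returns (admitted, evicted ids)."""
    free = w_space - sum(size for size, _g in pool.values())
    victims: List[str] = []
    # Candidate victims are examined from the lowest goodness upwards.
    for fid, (size, g) in sorted(pool.items(), key=lambda kv: kv[1][1]):
        if free >= new_size:
            break                      # enough room already
        if g >= new_goodness:
            break                      # every remaining fragment is at least as good
        victims.append(fid)
        free += size
    if free < new_size:
        return False, []               # reject; the pool is left unchanged
    for fid in victims:
        del pool[fid]
    pool[new_id] = (new_size, new_goodness)
    return True, victims

pool: Pool = {"f1": (40, 5.0), "f2": (30, 1.0), "f3": (20, 2.0)}
print(admit(pool, "f_new", 35, 3.0, w_space=100))  # (True, ['f2'])
print(admit(pool, "f_big", 80, 1.5, w_space=100))  # (False, [])
```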
16
Pool Maintenance (cont.)
Pool maintenance during updates
Condition: data in the base relations change
Steps:
For each fragment f, compute its minimum update cost UC(f)
Get all necessary deltas, i.e. the changes applied to the DW
Use the directory index to locate the fragments
Compute dV and update each f by querying dV
Total update cost: $UC(V) = \sum_{f \in V} UC(f)$
If UC(V) exceeds the time bound, evict fragments from the view pool in non-ascending order of their cost
Delta:
Product Year Country Sales
P30 1999 C1 30
P1 2000 C2 50
P4 1999 C1 40
P1 1999 C6 60
Delta tuples within $dV = \{(p_1, p_{35}), (1995, 2000), (c_1, c_{10})\}$:
Product Year Country Sales
P30 1999 C1 30
P4 1999 C1 40
P1 2000 C2 50
P1 1999 C6 60
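The time-bound step can be sketched as follows: compute $UC(V)$ as the sum of the per-fragment costs and, while it exceeds the time window, evict fragments in non-ascending order of UC(f), as the slide states. The cost values are hypothetical and computing UC(f) itself is out of scope here.

```python
from typing import Dict, List

def fit_update_window(update_costs: Dict[str, int], w_time: int) -> List[str]:
    """update_costs maps fragment id -> UC(f). Evicts fragments in
    non-ascending order of UC(f) until UC(V) = sum of UC(f) <= w_time."""
    total = sum(update_costs.values())          # UC(V)
    evicted: List[str] = []
    for fid, cost in sorted(update_costs.items(), key=lambda kv: kv[1], reverse=True):
        if total <= w_time:
            break
        evicted.append(fid)
        total -= cost
    for fid in evicted:
        del update_costs[fid]
    return evicted

costs = {"f1": 4, "f2": 10, "f3": 3}
print(fit_update_window(costs, w_time=8))  # ['f2']
print(sum(costs.values()))                 # 7
```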
17
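Pool Maintenance (cont.)
[Figure: fragments q_1 and q_2 in the Product × Year plane (Product axis marks p1, p10, p15; Year axis marks 1995, 1997, 2000)]
$q_1 = \{p_{20}, (1995, 2000)\}$
$q_2 = \{(p_{10}, p_{15}), (1995, 2000)\}$
Delta:
Product Year Country Sales
P30 1999 C1 30
P1 2000 C2 50
P4 1999 C1 40
P1 1999 C6 60
Delta tuples within $dV = \{(p_1, p_{20}), (1995, 2000), (c_1, c_{10})\}$:
Product Year Country Sales
P4 1999 C1 40
P1 2000 C2 50
P1 1999 C6 60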
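Restricting the delta to dV is a per-dimension range test on (product, year, country). The sketch below encodes Pk and Ck as the integer k and reproduces both slides: with dV = {(p1, p35), (1995, 2000), (c1, c10)} all four delta tuples are kept, and with dV = {(p1, p20), (1995, 2000), (c1, c10)} the P30 tuple is dropped. The function name is hypothetical; how dV itself is derived is not modelled.

```python
from typing import List, Tuple

Range = Tuple[int, int]                    # inclusive (low, high)
Row = Tuple[int, int, int, int]            # (product, year, country, sales)

DELTA: List[Row] = [
    (30, 1999, 1, 30),   # P30 1999 C1 30
    (1, 2000, 2, 50),    # P1 2000 C2 50
    (4, 1999, 1, 40),    # P4 1999 C1 40
    (1, 1999, 6, 60),    # P1 1999 C6 60
]

def restrict_to(delta: List[Row], dv: Tuple[Range, Range, Range]) -> List[Row]:
    """Keep delta tuples whose (product, year, country) lies inside dV."""
    return [row for row in delta
            if all(lo <= value <= hi for value, (lo, hi) in zip(row[:3], dv))]

dv_slide16 = ((1, 35), (1995, 2000), (1, 10))
dv_slide17 = ((1, 20), (1995, 2000), (1, 10))
print(len(restrict_to(DELTA, dv_slide16)))  # 4
print(restrict_to(DELTA, dv_slide17))       # [(1, 2000, 2, 50), (4, 1999, 1, 40), (1, 1999, 6, 60)]
```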
18
Experiments
Measure: Detailed Cost Savings Ratio (DCSR)
$DCSR = \frac{\sum_i s_i}{\sum_i c_i}$
$c_i$: cost of answering query i directly on the DW
$s_i$: cost saved when query i is answered from the view pool
The greater the DCSR, the better the performance
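Assuming the ratio is DCSR = Σ s_i / Σ c_i over the query workload, it can be computed as below. The three-query workload is hypothetical; its savings illustrate an exact match (the full cost is saved), a query answered from a larger fragment (part of the cost is saved) and a query sent to the DW (nothing is saved).

```python
from typing import Sequence

def dcsr(costs: Sequence[float], savings: Sequence[float]) -> float:
    """DCSR = sum(s_i) / sum(c_i) over a query workload."""
    if len(costs) != len(savings):
        raise ValueError("need exactly one saving per query")
    total_cost = sum(costs)
    return sum(savings) / total_cost if total_cost else 0.0

# Hypothetical workload of three queries.
costs = [100, 60, 40]    # c_i: cost of each query on the DW
savings = [100, 30, 0]   # s_i: exact match, larger fragment, answered by the DW
print(dcsr(costs, savings))  # 0.65
```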
19
Experiments (cont.)
Comparison with the optimal static view selection
1 fact table: 6 dimensions, 20 million records
Updates: 40 sets × 100 thousand records
Time constraint: 2% of the full data cube
Queries: 40 sets × 500 MR queries
20
Conclusion
DynaMat: a view management system that
Dynamically materializes the results of incoming queries
Exploits them for future queries
Takes both time and space constraints into account
Performs better than static methods
21
References
Y. Kotidis, N. Roussopoulos. DynaMat: A Dynamic View Management System for Data Warehouses. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 371-382, Philadelphia, Pennsylvania, June 1999.
Y. Kotidis, N. Roussopoulos. A Case for Dynamic View Management. ACM Transactions on Database Systems, 26(4), pages 388-423, 2001.
Original presentation by the author, http://www.cs.umd.edu/~kotidis/Publications/Sigmod99
22
Thanks!
Q&A?