USC - CSCI585 - Spring 2008 - Farnoush Banaei-Kashani - 4/9/2008
OLAP (Online Analytical Processing): Dynamic Data Cubes
Excerpt from Presentation by S. Geffner

Content
• Introduction to Multidimensional Databases (from A.R. 20 and 21)
• Focus Application: OLAP
• Prefix-Sum Data Cube (from A.R. 16)
• Dynamic Data Cube (from A.R. 17)
• Iterative Data Cube (from A.R. 18)
• Wavelet-based approaches
  • Compact Data Cube (from A.R. 19)
  • ProPolyne (from A.R. 22 and 23)

The Dynamic Data Cube
S. Geffner, D. Agrawal, and A. El Abbadi

Outline
• Problem Description
• Dynamic Data Cube
• Improving Update
• Conclusion

Problem Description: with original array A

Original array A (size = N^2):

Index |  0   1   2   3   4   5   6   7
------+--------------------------------
  0   |  3   5   1   2   2   4   6   3
  1   |  7   3   2   6   8   7   1   2
  2   |  2   4   2   3   3   3   4   5
  3   |  3   2   1   5   3   5   2   8
  4   |  4   2   1   3   3   4   7   1
  5   |  2   3   3   6   1   8   5   2
  6   |  4   5   2   7   1   9   3   3
  7   |  2   4   2   2   3   1   9   1

Problem Description: with original array A
• Complexity:
  • Arbitrary range queries: O(n^d)
  • Update: O(1)
where:
  d: number of dimensions
  n: size of each dimension
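
As a minimal sketch of the naive approach (the function names are hypothetical, and A[i][j] is assumed to hold the cell at index (i, j) of the example array), a range query visits every cell in the range, while an update overwrites a single cell:

```python
# Example array A from the slides; A[i][j] is the cell at index (i, j).
A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def range_sum(a, r1, c1, r2, c2):
    """Sum a[r1..r2][c1..c2] (inclusive) by visiting every cell: O(n^d)."""
    return sum(a[r][c] for r in range(r1, r2 + 1) for c in range(c1, c2 + 1))


def update(a, r, c, value):
    """Overwrite a single cell: O(1)."""
    a[r][c] = value


print(range_sum(A, 0, 0, 7, 7))  # 230 (the whole array)
print(range_sum(A, 2, 1, 5, 6))  # 83
```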

Problem Description: with prefix-sum array P

Precomputed array P, derived from the original array A. Each cell of P holds the sum of all cells of A from index (0, 0) up to and including that cell:

Index |   0    1    2    3    4    5    6    7
------+----------------------------------------
  0   |   3    8    9   11   13   17   23   26
  1   |  10   18   21   29   39   50   57   62
  2   |  12   24   29   40   53   67   78   88
  3   |  15   29   35   51   67   86   99  117
  4   |  19   35   42   61   80  103  123  142
  5   |  21   40   50   75   95  126  151  172
  6   |  25   49   61   93  114  154  182  206
  7   |  27   55   69  103  127  168  205  230

Query with Prefix-sum?
[Figure: prefix-sum array P, size = N^2]
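
A range query on P can be sketched as follows. This is an illustrative implementation with hypothetical function names, assuming the two-dimensional case and the example array A: P is built in one pass, and any range sum is then obtained from at most four cells of P by inclusion-exclusion.

```python
A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def build_prefix(a):
    """P[r][c] = sum of a[0..r][0..c], inclusive."""
    p = [[0] * len(a[0]) for _ in a]
    for r, row in enumerate(a):
        running = 0  # sum of the current row up to column c
        for c, value in enumerate(row):
            running += value
            p[r][c] = running + (p[r - 1][c] if r > 0 else 0)
    return p


def range_sum(p, r1, c1, r2, c2):
    """Sum of a[r1..r2][c1..c2] from at most four cells of P: O(1)."""
    total = p[r2][c2]
    if r1 > 0:
        total -= p[r1 - 1][c2]
    if c1 > 0:
        total -= p[r2][c1 - 1]
    if r1 > 0 and c1 > 0:
        total += p[r1 - 1][c1 - 1]  # subtracted twice above, so add it back
    return total


P = build_prefix(A)
print(P[5][6])                   # 151
print(range_sum(P, 2, 1, 5, 6))  # 83
```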

Update with Prefix-sum?
[Figure: prefix-sum array P (size = N^2) with the updated cell marked (*) in row 2; every cell of P whose indices are both greater than or equal to those of the updated cell must change as well]

Problem Description: with prefix-sum array P
• Complexity:
  • Arbitrary range queries: O(1)
  • Update in the worst case: O(n^d)
where:
  d: number of dimensions
  n: size of each dimension
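
The update cost can be made concrete with a small sketch (hypothetical function name, two-dimensional case assumed). Changing one cell of A by delta means patching every cell of P that includes it, so the number of cells touched depends on where the update falls:

```python
def update_prefix(p, r, c, delta):
    """Apply 'A[r][c] += delta' to the prefix-sum array p in place.

    Returns the number of cells of p that had to be rewritten.
    """
    touched = 0
    for i in range(r, len(p)):
        for j in range(c, len(p[i])):
            p[i][j] += delta
            touched += 1
    return touched


n = 8
P = [[0] * n for _ in range(n)]   # prefix sums of an all-zero 8 x 8 array
print(update_prefix(P, 7, 7, 1))  # 1  (best case: the last cell)
print(update_prefix(P, 0, 0, 1))  # 64 (worst case: all n^d cells)
```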

Solution: Dynamic Data Cube
• As with the prefix-sum approach, the DDC can also be used to obtain COUNT (a special case of SUM) and AVERAGE (SUM/COUNT).
• Generalization: the method is applicable to any binary operator “+” that has an inverse operator “-” such that a + b - b = a
• We focus on SUM
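
To illustrate the generalization, here is a one-dimensional sketch with a hypothetical helper, make_range_query. It assumes the operator is associative and commutative and that inv undoes it. XOR is used only as an example of an operator that is its own inverse; it does not appear in the slides.

```python
from itertools import accumulate
import operator


def make_range_query(values, op, inv):
    """Return query(lo, hi) over values[lo..hi] (inclusive) using prefix aggregates.

    Assumes op is associative and commutative, and inv(op(a, b), b) == a.
    """
    prefix = list(accumulate(values, op))

    def query(lo, hi):
        return prefix[hi] if lo == 0 else inv(prefix[hi], prefix[lo - 1])

    return query


row = [3, 5, 1, 2, 2, 4, 6, 3]  # first row of the example array A
range_sum = make_range_query(row, operator.add, operator.sub)
range_count = make_range_query([1] * len(row), operator.add, operator.sub)
range_xor = make_range_query(row, operator.xor, operator.xor)

print(range_sum(2, 5))                      # 9
print(range_sum(2, 5) / range_count(2, 5))  # 2.25 (AVERAGE = SUM / COUNT)
print(range_xor(2, 5))                      # 5
```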

Basic Data Structure: Overlay Box
• Overlay Box
  • Step 1: A set of disjoint rectangles of equal size that completely partition the cells of array A
  • Step 2: Each box stores exactly k^d - (k-1)^d values, where
      k: the length of the overlay box in each dimension
      d: number of dimensions

Overlay Box: Step 1
[Figure: original array A, to be partitioned into disjoint overlay boxes of equal size]
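
Overlay Box: Step 2
n = 8, k = n/2 = 4, number of values in each overlay box = k^d - (k-1)^d = 4^2 - 3^2 = 7

Values stored by the four overlay boxes (a dot marks a cell for which nothing is stored):

Index |   0    1    2    3 |   4    5    6    7
------+--------------------+--------------------
  0   |   .    .    .   11 |   .    .    .   15
  1   |   .    .    .   29 |   .    .    .   33
  2   |   .    .    .   40 |   .    .    .   48
  3   |  15   29   35   51 |  16   35   48   66
------+--------------------+--------------------
  4   |   .    .    .   10 |   .    .    .   15
  5   |   .    .    .   24 |   .    .    .   31
  6   |   .    .    .   42 |   .    .    .   47
  7   |  12   26   34   52 |   8   30   54   61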
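
The stored values can be reproduced with a short sketch. The function and variable names are hypothetical, and the two-dimensional case is assumed. Each box keeps cumulative sums along its last row and its last column, and both sequences end in the box subtotal.

```python
from itertools import accumulate

A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def overlay_box(a, r0, c0, k):
    """Stored values of the k x k overlay box whose first cell is (r0, c0).

    Returns (along_row, along_col):
      along_row[j] = sum of a[r0..r0+k-1][c0..c0+j]  (kept in the box's last row)
      along_col[i] = sum of a[r0..r0+i][c0..c0+k-1]  (kept in the box's last column)
    Both lists end in the box subtotal, so 2k - 1 distinct values are stored.
    """
    col_totals = [sum(a[r0 + i][c0 + j] for i in range(k)) for j in range(k)]
    row_totals = [sum(a[r0 + i][c0 + j] for j in range(k)) for i in range(k)]
    return list(accumulate(col_totals)), list(accumulate(row_totals))


along_row, along_col = overlay_box(A, 0, 0, 4)
print(along_row)  # [15, 29, 35, 51]
print(along_col)  # [11, 29, 40, 51]

k, d = 4, 2
print(k**d - (k - 1)**d)  # 7
```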

Dynamic Data Cube
• A tree structure that recursively partitions the original array A into overlay boxes
• Each overlay box contains information about the relative sums of the corresponding region of A
• The overlay boxes are organized into a tree that recursively partitions array A into non-overlapping regions
• Each node forms its children by dividing its range in each dimension in half

How to Construct DDC?
[Figure: empty 8 x 8 grid with indices 0 to 7 in each dimension]

How to Construct DDC? Level 2
n = 8, k = n/2 = 4, number of values in each overlay box = k^d - (k-1)^d
[Figure: empty 8 x 8 grid with indices 0 to 7 in each dimension]

How to Construct DDC? Level 2
n = 8, k = n/2 = 4, number of values in each overlay box = k^d - (k-1)^d
[Figure: the four level-2 overlay boxes (k = 4) with the same stored values as in "Overlay Box: Step 2"]
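
How to Construct DDC? Level 1
The level-2 box covering indices 0 to 3 in each dimension (k = n/2 = 4) is divided into four level-1 overlay boxes with k = n/4 = 2.
Number of values in each level-1 overlay box = k^d - (k-1)^d = 3

Values stored by the four level-1 overlay boxes (a dot marks a cell for which nothing is stored):

Index |   0    1 |   2    3
------+----------+----------
  0   |   .    8 |   .    3
  1   |  10   18 |   3   11
------+----------+----------
  2   |   .    6 |   .    5
  3   |   5   11 |   3   11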

How to Construct DDC? Level 0
[Figure: the level-1 overlay boxes (k = n/4 = 2) for indices 0 to 3, each divided into level-0 overlay boxes with k = n/8 = 1; a level-0 box stores a single value, the original cell of A]

The DDC Hierarchy
[Figure: the three-level tree for the 8 x 8 example]
• Level 2 (root node): four overlay boxes with k = 4
• Level 1: overlay boxes with k = 2, four per level-2 box
• Level 0 (leaf nodes): overlay boxes with k = 1, which hold the original cells of A
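
The recursive construction can be sketched as follows. This is a minimal illustration rather than the authors' implementation: build_ddc and the dict-based node layout are assumptions, the array is taken to be square with a power-of-two side, and only the two-dimensional case is covered.

```python
from itertools import accumulate

A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def build_ddc(a, r0=0, c0=0, size=None):
    """Build the node covering the size x size region whose first cell is (r0, c0).

    A node is a dict mapping the offset (qr, qc) of each of its four overlay
    boxes to (along_row, along_col, child); child is None at the leaf level.
    """
    size = len(a) if size is None else size  # assumed: a power of two >= 2
    k = size // 2                            # side length of this node's overlay boxes
    node = {}
    for qr in (0, k):
        for qc in (0, k):
            top, left = r0 + qr, c0 + qc
            col_totals = [sum(a[top + i][left + j] for i in range(k)) for j in range(k)]
            row_totals = [sum(a[top + i][left + j] for j in range(k)) for i in range(k)]
            child = build_ddc(a, top, left, k) if k > 1 else None
            node[qr, qc] = (list(accumulate(col_totals)), list(accumulate(row_totals)), child)
    return node


root = build_ddc(A)
along_row, along_col, child = root[4, 4]  # level 2: box covering indices 4 to 7
print(along_row, along_col)               # [8, 30, 54, 61] [15, 31, 47, 61]
print(child[0, 2][0], child[0, 2][1])     # level 1: [12, 15] [8, 15]
print(child[0, 2][2][1, 0][0])            # level 0: [5], the original cell (5, 6)
```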

How to Query DDC?
[Figure: the level-2 overlay boxes with the target cell of the query marked (*) in row 5]

How to Query DDC? (cont’d)
[Figure: the level-2 (root node) and level-1 overlay boxes, with the target cell marked (*) at both levels]
Prefix sum for the target cell: 51 + 48 + 24 + 16 + 12 = 151
• 51, 48, and 24 are read from the level-2 overlay boxes
• 16 and 12 are read from the level-1 overlay boxes inside the level-2 box that contains the target cell
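
The query walk can be sketched as follows. This is an illustrative implementation under stated assumptions, not the presenters' code: the class DDCNode, its methods, and the helper range_sum are hypothetical names, and the array is assumed to be two-dimensional with a power-of-two side. At each node, a box contributes its subtotal, one stored row or column value, or nothing; only the box that contains the target cell is descended into.

```python
from itertools import accumulate

A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


class DDCNode:
    """One tree node: four overlay boxes of side k, plus one child node per box."""

    def __init__(self, a, r0=0, c0=0, size=None):
        size = len(a) if size is None else size  # assumed: a power of two >= 2
        self.k = k = size // 2
        self.boxes = {}     # (qr, qc) -> (along_row, along_col)
        self.children = {}  # (qr, qc) -> DDCNode; empty at the leaf level (k == 1)
        for qr in (0, k):
            for qc in (0, k):
                top, left = r0 + qr, c0 + qc
                cols = [sum(a[top + i][left + j] for i in range(k)) for j in range(k)]
                rows = [sum(a[top + i][left + j] for j in range(k)) for i in range(k)]
                self.boxes[qr, qc] = (list(accumulate(cols)), list(accumulate(rows)))
                if k > 1:
                    self.children[qr, qc] = DDCNode(a, top, left, k)

    def prefix(self, r, c):
        """Sum of the cells (0, 0)..(r, c) of this node's region, inclusive."""
        k, total = self.k, 0
        for (qr, qc), (along_row, along_col) in self.boxes.items():
            if r < qr or c < qc:
                continue                     # box lies outside the target region
            rows_full = r >= qr + k - 1      # region spans every row of the box
            cols_full = c >= qc + k - 1      # region spans every column of the box
            if rows_full and cols_full:
                total += along_row[-1]       # the box subtotal
            elif rows_full:
                total += along_row[c - qc]
            elif cols_full:
                total += along_col[r - qr]
            else:                            # target cell is inside this box
                total += self.children[qr, qc].prefix(r - qr, c - qc)
        return total


def range_sum(root, r1, c1, r2, c2):
    """Arbitrary range sum from at most four prefix queries."""
    total = root.prefix(r2, c2)
    if r1 > 0:
        total -= root.prefix(r1 - 1, c2)
    if c1 > 0:
        total -= root.prefix(r2, c1 - 1)
    if r1 > 0 and c1 > 0:
        total += root.prefix(r1 - 1, c1 - 1)
    return total


root = DDCNode(A)
print(root.prefix(5, 6))            # 151 = 51 + 48 + 24 + 16 + 12
print(range_sum(root, 2, 1, 5, 6))  # 83
```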

How to Update DDC?
[Figure: the DDC hierarchy (levels 2, 1, and 0) before the update; the cell to be updated (*) holds 5 and its new value is 6]

How to Update DDC? (cont’d)
[Figure: the same hierarchy after the update; one overlay box changes at each level]
• Level 2 (root node): 54 -> 55, 61 -> 62, 31 -> 32, 47 -> 48
• Level 1: 12 -> 13, 15 -> 16
• Level 0 (leaf node): 5 -> 6
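
A sketch of the update walk, under the same assumptions as any basic two-dimensional DDC with a power-of-two side. The names DDCNode, update, and prefix are hypothetical, and the update is expressed as a delta (+1 for the change from 5 to 6). At each level only the box containing the cell is touched, but within that box every stored value at or after the cell's position changes.

```python
from itertools import accumulate

A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


class DDCNode:
    """One tree node: four overlay boxes of side k, plus one child node per box."""

    def __init__(self, a, r0=0, c0=0, size=None):
        size = len(a) if size is None else size  # assumed: a power of two >= 2
        self.k = k = size // 2
        self.boxes = {}     # (qr, qc) -> (along_row, along_col)
        self.children = {}  # (qr, qc) -> DDCNode; empty at the leaf level (k == 1)
        for qr in (0, k):
            for qc in (0, k):
                top, left = r0 + qr, c0 + qc
                cols = [sum(a[top + i][left + j] for i in range(k)) for j in range(k)]
                rows = [sum(a[top + i][left + j] for j in range(k)) for i in range(k)]
                self.boxes[qr, qc] = (list(accumulate(cols)), list(accumulate(rows)))
                if k > 1:
                    self.children[qr, qc] = DDCNode(a, top, left, k)

    def update(self, r, c, delta):
        """Add delta to cell (r, c) of this node's region."""
        k = self.k
        qr, qc = (r // k) * k, (c // k) * k  # the one box that contains the cell
        along_row, along_col = self.boxes[qr, qc]
        for j in range(c - qc, k):           # row values at or after the cell's column
            along_row[j] += delta
        for i in range(r - qr, k):           # column values at or after the cell's row
            along_col[i] += delta
        if k > 1:
            self.children[qr, qc].update(r - qr, c - qc, delta)

    def prefix(self, r, c):
        """Sum of the cells (0, 0)..(r, c) of this node's region, inclusive."""
        k, total = self.k, 0
        for (qr, qc), (along_row, along_col) in self.boxes.items():
            if r < qr or c < qc:
                continue
            rows_full, cols_full = r >= qr + k - 1, c >= qc + k - 1
            if rows_full and cols_full:
                total += along_row[-1]
            elif rows_full:
                total += along_row[c - qc]
            elif cols_full:
                total += along_col[r - qr]
            else:
                total += self.children[qr, qc].prefix(r - qr, c - qc)
        return total


root = DDCNode(A)
root.update(5, 6, 1)                      # cell (5, 6): 5 -> 6
print(root.boxes[4, 4])                   # ([8, 30, 55, 62], [15, 32, 48, 62])
print(root.children[4, 4].boxes[0, 2])    # ([13, 16], [8, 16])
print(root.prefix(5, 6))                  # 152
```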

Query and Update Complexity with DDC
• Query cost for arbitrary range queries: O(log n)
• Update cost in the worst case has two components:
  1. The O(log n) hierarchy traversal cost
     • Whichever cell is updated, only one overlay box is updated at each tree level
  2. The cost of updating the values in the relevant overlay box at each level
• The second cost can be high, depending on the position of the updated cell. Hence, the total cost may become as bad as O(n)
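
The effect of cell placement can be checked by counting the stored values that an update touches. The sketch below uses a hypothetical helper and assumes the basic two-dimensional DDC on an n x n array with n a power of two:

```python
def values_touched(n, r, c):
    """Count the stored overlay-box values that change when cell (r, c) is updated."""
    touched = 0
    k = n // 2                     # box side at the root level
    while k >= 1:
        off_r, off_c = r % k, c % k            # position of the cell inside its box
        # Row values from the cell's column onward, plus column values from the
        # cell's row onward; the shared subtotal is counted once.
        touched += (k - off_c) + (k - off_r) - 1
        k //= 2
    return touched


print(values_touched(8, 7, 7))  # 3  (one value per level)
print(values_touched(8, 5, 6))  # 7
print(values_touched(8, 0, 0))  # 11 (worst case, grows linearly with n)
```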

Improving Update: Problem
• The high update cost of the overlay boxes is a consequence of the dependencies between successive row sum values: each row sum is cumulative, so changing one value changes every value after it
[Figure: a sequence of row sum values X1, X2, ..., X8]

Improving Update: Solution
• Instead of storing the row sum values in an array, store them separately in a Cumulative B Tree (BC Tree)

How to Construct BC Tree?
• Each leaf of the BC tree corresponds to one row-sum cell
• Interior nodes of the BC tree maintain subtree sums (STS)
• For each key in an interior node, the STS stores the sum of the leaf values in the subtree on the left branch of that key
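
How to Construct BC Tree?
Row sum values of the overlay box (cells 1 to 6): 14, 23, 33, 45, 53, 66
Each leaf stores the difference between successive row sums (for example, 23 - 14 = 9 and 33 - 23 = 10):
• Leaf 1: value 14
• Leaf 2: value 9
• Leaf 3: value 10
• Leaf 4: value 12
• Leaf 5: value 8
• Leaf 6: value 13
Interior nodes:
• Root: Key 4 (STS: 33)
• Left child of the root: Key 2 (STS: 14), Key 3 (STS: 9)
• Right child of the root: Key 5 (STS: 12), Key 6 (STS: 8)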

Query with BC Tree?
• Traverse the tree using the cell’s index as the key
• When descending to a node’s right branch, add every preceding STS in the node whose key is less than or equal to the query key
Example (row sum of cell 5): 33 (Key 4) + 12 (Key 5) + 8 (Leaf 5) = 53
Key 6 is not added because it does not precede key 5

Update with BC Tree?
• Reflect the change using a bottom-up method
Example (the value of leaf 3 changes from 10 to 15):
• Key 4 STS: 33 -> 38
• Key 3 is not updated because leaf 3 is not in its left subtree
• The row sum values of cells 3 to 6 become 38, 50, 58, 71
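
The construction, query, and update of the example tree can be sketched as below. This is a simplified, static illustration and not the BC tree as specified by the authors: the class BcTree, the fanout parameter, and the per-node total field are assumptions made for the sketch, there is no node splitting or rebalancing, and the update is written top-down for brevity although it changes the same STS entries as the bottom-up method.

```python
from bisect import bisect_right


class BcTree:
    """Static tree over leaf values; each key keeps the sum of the subtree to its left."""

    def __init__(self, values, lo=0, hi=None, fanout=3):
        hi = len(values) if hi is None else hi
        self.keys, self.sts, self.children = [], [], []
        if hi - lo == 1:
            self.total = values[lo]            # leaf: one row-sum cell
            return
        chunk = 1                              # number of leaves under each child
        while chunk * fanout < hi - lo:
            chunk *= fanout
        for start in range(lo, hi, chunk):
            child = BcTree(values, start, min(start + chunk, hi), fanout)
            if self.children:
                self.keys.append(start + 1)               # 1-based index of the child's first leaf
                self.sts.append(self.children[-1].total)  # sum of the subtree left of this key
            self.children.append(child)
        self.total = sum(ch.total for ch in self.children)  # helper field, not in the slides

    def prefix(self, index):
        """Row sum of cell `index` (1-based): the sum of leaves 1..index."""
        if not self.children:
            return self.total
        branch = bisect_right(self.keys, index)    # number of keys <= index
        return sum(self.sts[:branch]) + self.children[branch].prefix(index)

    def update(self, index, delta):
        """Add delta to leaf `index`, adjusting only the STS values that include it."""
        self.total += delta
        if self.children:
            branch = bisect_right(self.keys, index)
            if branch < len(self.sts):
                self.sts[branch] += delta          # leaf is in the left subtree of this key
            self.children[branch].update(index, delta)


tree = BcTree([14, 9, 10, 12, 8, 13])
print(tree.keys, tree.sts)                    # [4] [33]
print(tree.prefix(5))                         # 53
tree.update(3, 5)                             # leaf 3: 10 -> 15
print([tree.prefix(i) for i in range(3, 7)])  # [38, 50, 58, 71]
```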

Query and Update Complexity with Improved DDC
• A two-dimensional overlay box has two groups of row sum values, each of which is one-dimensional
• In general, an overlay box of d dimensions has d groups of row sum values, and each group is (d-1)-dimensional
• Hence, both query and update complexity are O(log^d n)

Conclusion

Method           Query        Update
Naïve approach   O(n^d)       O(1)
Prefix sum       O(1)         O(n^d)
DDC              O(log^d n)   O(log^d n)