A New Point Access Method based on Wavelet Trees
Nieves R. Brisaboa,
Miguel R. Luaces,
Diego Seco
Database Laboratory, University of A Coruña, A Coruña, Spain
Gonzalo Navarro
Department of Computer Science, University of Chile, Santiago, Chile
Gramado - SeCoGIS 2009 2 11th November, 2009
Outline
- Motivation
- Compressed Data Structures
- PW-Tree
- Experiments
- Conclusions and Future Work
Motivation
- Spatial indexes are a key component in GIS
  - Large collections of geographic data
  - Geographic operations are very complex
  - Sequential search is not feasible
- Spatial index classification (by indexable objects)
  - Point Access Methods (PAMs), e.g. the K-d-tree family
  - Spatial Access Methods (SAMs), e.g. the R-tree family
Motivation (cont.)
- Typical requirements of spatial indexes:
  - Dynamic operations: inserts, deletes, updates, …
  - Secondary storage management
  - Space consumption is a less important issue
- Nowadays, some of these requirements have changed
  - Static data collections are useful in many domains
  - Memory hierarchy evolution
    - Reduction of the main memory cost
    - New levels (flash memory)
- Our goal is a new point access method
  - Static geographic data collections
  - Main memory: compact
  - Efficiency similar to classical indexes
Compressed Data Structures
- Same features as classical data structures, with low storage cost
- Based on two very efficient bit vector operations: rank and select
- Rank: rank_b(B, i) returns the number of times bit b appears in the prefix B[1, i]

  i =  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
  B =  0  1  0  0  1  1  0  1  1  0  0  0  0  0  0  1  0  0  0  0  0

  rank1(B, 6) = 3
  rank0(B, 16) = 10
Compressed Data Structures
- Select: select_b(B, j) returns the position i of the j-th appearance of bit b in B[1, n]

  i =  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
  B =  0  1  0  0  1  1  0  1  1  0  0  0  0  0  0  1  0  0  0  0  0

  select1(B, 2) = 5
  select0(B, 9) = 14
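
The rank and select definitions can be checked against the slide examples with a minimal Python sketch. It scans the bit vector linearly, which is only meant to make the semantics concrete; practical compressed structures answer both operations much faster using small auxiliary directories, and nothing here reflects the authors' implementation. The function names `rank` and `select` and their argument order are assumptions made for this illustration.

```python
# Bit vector from the slides. Python lists are 0-based; the functions below
# expose the 1-based positions used on the slides.
B = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]


def rank(bits, b, i):
    """Number of occurrences of bit b in the prefix bits[1..i] (inclusive)."""
    return sum(1 for x in bits[:i] if x == b)


def select(bits, b, j):
    """1-based position of the j-th occurrence of bit b, or None if absent."""
    seen = 0
    for pos, x in enumerate(bits, start=1):
        if x == b:
            seen += 1
            if seen == j:
                return pos
    return None


print(rank(B, 1, 6))    # 3
print(rank(B, 0, 16))   # 10
print(select(B, 1, 2))  # 5
print(select(B, 0, 9))  # 14
```

Note that select is the inverse of rank in the sense that rank(B, b, select(B, b, j)) = j whenever the j-th occurrence exists.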
PW-tree
- Abstraction
  - N points distributed in a two-dimensional space
  - Construction of an N x N matrix
  - One point for each row i and one for each column j

[Figure: 16 x 16 matrix with one point (o) in each row and each column. The row of the point in each column is listed below.]

Column   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Row     15  1  4 11 16 12 10 13  8  7  3  5  2 14  6  9
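
The slides do not say how arbitrary coordinates are turned into a matrix with exactly one point per row and per column. One plausible way, sketched here as an assumption, is to sort the points by x to assign columns and to replace each y by its rank to assign rows. The helper name `to_rank_space` is hypothetical, and the sketch assumes all x values are distinct and all y values are distinct.

```python
def to_rank_space(points):
    """Return the Row array: entry c-1 is the row of the point in column c.

    Assumes distinct x coordinates and distinct y coordinates (an assumption
    of this sketch, not something stated on the slides).
    """
    # Column order: indices of the points sorted by x.
    by_x = sorted(range(len(points)), key=lambda k: points[k][0])
    # Row numbers: 1-based rank of each y coordinate.
    y_rank = {y: r for r, y in enumerate(sorted(p[1] for p in points), start=1)}
    return [y_rank[points[k][1]] for k in by_x]


# Three hypothetical points (x, y).
pts = [(2.5, 10.0), (0.1, 7.0), (9.0, 8.5)]
print(to_rank_space(pts))  # [1, 3, 2]
```

The result is a permutation of 1..N, which is the kind of sequence the Column/Row table above represents.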
PW-tree
- Wavelet tree construction
  - The root covers the row interval [1, 16] and holds the sequence of rows in column order
  - Each row is marked with one bit: rows in [1, 8] → 0, rows in [9, 16] → 1
  - Rows marked 0 go to the left child and rows marked 1 to the right child, keeping their order; the process is repeated in each child

[1, 16]    rows: 15  1  4 11 16 12 10 13  8  7  3  5  2 14  6  9
           bits:  1  0  0  1  1  1  1  1  0  0  0  0  0  1  0  1

[1, 8]     rows:  1  4  8  7  3  5  2  6
           bits:  0  0  1  1  0  1  0  1
[9, 16]    rows: 15 11 16 12 10 13 14  9
           bits:  1  0  1  0  0  1  1  0

[1, 4]     rows:  1  4  3  2
           bits:  0  1  1  0
[5, 8]     rows:  8  7  5  6
           bits:  1  1  0  0
[9, 12]    rows: 11 12 10  9
           bits:  1  1  0  0
[13, 16]   rows: 15 16 13 14
           bits:  1  1  0  0

[1, 2]     rows:  1  2     bits: 0 1
[3, 4]     rows:  4  3     bits: 1 0
[5, 6]     rows:  5  6     bits: 0 1
[7, 8]     rows:  8  7     bits: 1 0
[9, 10]    rows: 10  9     bits: 1 0
[11, 12]   rows: 11 12     bits: 0 1
[13, 14]   rows: 13 14     bits: 0 1
[15, 16]   rows: 15 16     bits: 0 1

Leaves:    1 2 | 3 4 | 5 6 | 7 8 | 9 10 | 11 12 | 13 14 | 15 16
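
The construction on this slide can be reproduced with a short recursive sketch. It is a pointer-based toy, assuming a node is a tuple `(bits, left, right)` and a leaf is `None`; these names and the representation are choices made for the illustration, not the layout used by the authors.

```python
# Row of the point in each column, as listed on the slides.
ROWS = [15, 1, 4, 11, 16, 12, 10, 13, 8, 7, 3, 5, 2, 14, 6, 9]


def build(seq, lo, hi):
    """Build the wavelet tree of seq over the row interval [lo, hi].

    Returns None for a single-row interval (leaf), else (bits, left, right).
    """
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    # Rows in the lower half [lo, mid] get 0, rows in (mid, hi] get 1.
    bits = [0 if v <= mid else 1 for v in seq]
    left = build([v for v in seq if v <= mid], lo, mid)
    right = build([v for v in seq if v > mid], mid + 1, hi)
    return (bits, left, right)


def levels(root):
    """Bitmaps of the tree, level by level, left to right."""
    out, frontier = [], [root]
    while any(n is not None for n in frontier):
        out.append([n[0] for n in frontier if n is not None])
        frontier = [c for n in frontier if n is not None for c in (n[1], n[2])]
    return out


root = build(ROWS, 1, 16)
for depth, bitmaps in enumerate(levels(root)):
    print(depth, ["".join(map(str, bm)) for bm in bitmaps])
```

The first printed level is the root bitmap 1001111100000101, matching the slide, and the tree has four levels of bitmaps for 16 rows.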
PW-tree
- Obtain the row of the point that is in column 8
  - Top-down traversal of the wavelet tree, using rank at each level
  - Root [1, 16], B = 1 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1: bit 8 is 1, rank1(B, 8) = 6, so go to position 6 of the right child
  - Node [9, 16], B' = 1 0 1 0 0 1 1 0: bit 6 is 1, rank1(B', 6) = 3, so go to position 3 of the right child
  - Node [13, 16], B'' = 1 1 0 0: bit 3 is 0, rank0(B'', 3) = 1, so go to position 1 of the left child
  - Node [13, 14], B''' = 0 1: bit 1 is 0, rank0(B''', 1) = 1, which reaches the leaf for row 13
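
The top-down traversal can be sketched in Python as follows. The tree representation (`(bits, left, right)` tuples, `None` for leaves) and the function names are assumptions of this sketch, and `rank` is a naive linear scan rather than a constant-time structure.

```python
ROWS = [15, 1, 4, 11, 16, 12, 10, 13, 8, 7, 3, 5, 2, 14, 6, 9]


def build(seq, lo, hi):
    """Wavelet tree over rows [lo, hi]: None for a leaf, else (bits, left, right)."""
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    bits = [0 if v <= mid else 1 for v in seq]
    return (bits,
            build([v for v in seq if v <= mid], lo, mid),
            build([v for v in seq if v > mid], mid + 1, hi))


def rank(bits, b, i):
    """Occurrences of bit b in bits[1..i] (1-based, inclusive)."""
    return sum(1 for x in bits[:i] if x == b)


def row_of_column(root, lo, hi, col):
    """Follow the bit at the current position down to a leaf; return its row."""
    node, pos = root, col
    while node is not None:
        bits, left, right = node
        mid = (lo + hi) // 2
        b = bits[pos - 1]
        pos = rank(bits, b, pos)      # position inside the chosen child
        if b == 0:
            node, hi = left, mid
        else:
            node, lo = right, mid + 1
    return lo


root = build(ROWS, 1, 16)
print(row_of_column(root, 1, 16, 8))  # 13
```

Only the bitmaps are consulted during the descent; the original sequence is used here just to build the tree.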
PW-tree
- Obtain the column of the point that is in row 6
  - Bottom-up traversal of the wavelet tree, using select at each level
  - Node [5, 6], B''' = 0 1: row 6 is the first 1, select1(B''', 1) = 2
  - Node [5, 8], B'' = 1 1 0 0: [5, 6] is the left (0) child, select0(B'', 2) = 4
  - Node [1, 8], B' = 0 0 1 1 0 1 0 1: [5, 8] is the right (1) child, select1(B', 4) = 8
  - Root [1, 16], B = 1 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1: [1, 8] is the left (0) child, select0(B, 8) = 15, so the point is in column 15
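
The bottom-up traversal can be sketched recursively: the call descends to the leaf of the requested row and applies select on the way back up. As before, the tuple-based tree and the function names are assumptions of this sketch, and it relies on every row occurring exactly once (a permutation).

```python
ROWS = [15, 1, 4, 11, 16, 12, 10, 13, 8, 7, 3, 5, 2, 14, 6, 9]


def build(seq, lo, hi):
    """Wavelet tree over rows [lo, hi]: None for a leaf, else (bits, left, right)."""
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    bits = [0 if v <= mid else 1 for v in seq]
    return (bits,
            build([v for v in seq if v <= mid], lo, mid),
            build([v for v in seq if v > mid], mid + 1, hi))


def select(bits, b, j):
    """1-based position of the j-th occurrence of bit b in bits."""
    seen = 0
    for pos, x in enumerate(bits, start=1):
        if x == b:
            seen += 1
            if seen == j:
                return pos
    raise ValueError("fewer than j occurrences of b")


def column_of_row(node, lo, hi, row):
    """Position of `row` in this node's sequence; at the root this is its column."""
    if node is None:
        return 1                       # a leaf holds exactly one row
    bits, left, right = node
    mid = (lo + hi) // 2
    if row <= mid:
        return select(bits, 0, column_of_row(left, lo, mid, row))
    return select(bits, 1, column_of_row(right, mid + 1, hi, row))


root = build(ROWS, 1, 16)
print(column_of_row(root, 1, 16, 6))  # 15
```

For row 6 the select calls made while returning are select1 = 2, select0 = 4, select1 = 8 and select0 = 15, the same sequence shown on the slide.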
PW-tree
- Solve the range query q:{r[12,16], c[6,10]} (rows 12 to 16, columns 6 to 10)
  - Root [1, 16], B = 1 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1, column range [6, 10]
    - Right child [9, 16]: rank1(B, 6-1)+1 = 4 and rank1(B, 10) = 6, so the range becomes [4, 6]
    - Left child [1, 8]: discarded, because [1, 8] does not overlap [12, 16]
  - Node [9, 16], B' = 1 0 1 0 0 1 1 0, range [4, 6]
    - Right child [13, 16]: rank1(B', 4-1)+1 = 3 and rank1(B', 6) = 3, so the range becomes [3, 3]
    - Left child [9, 12]: rank0(B', 4-1)+1 = 2 and rank0(B', 6) = 3, so the range becomes [2, 3]
  - Node [13, 16], B'' = 1 1 0 0, position 3: rank0(B'', 3) = 1
  - Node [13, 14], B''' = 0 1, position 1: rank0(B''', 1) = 1, which reaches the leaf for row 13
  - Below node [9, 12]: the child [9, 10] is discarded, because [9, 10] does not overlap [12, 16]; the child [11, 12] leads to row 12
  - Result of q: (13, 8) and (12, 6), given as (row, column)
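
The range query can be sketched as a recursion that projects the column range onto each child with rank, prunes children whose row interval does not overlap the query rows, and maps the surviving positions back to columns with select on the way up. The tuple-based tree, the naive `rank` and `select`, and the function names are assumptions of this sketch rather than the authors' implementation.

```python
ROWS = [15, 1, 4, 11, 16, 12, 10, 13, 8, 7, 3, 5, 2, 14, 6, 9]


def build(seq, lo, hi):
    """Wavelet tree over rows [lo, hi]: None for a leaf, else (bits, left, right)."""
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    bits = [0 if v <= mid else 1 for v in seq]
    return (bits,
            build([v for v in seq if v <= mid], lo, mid),
            build([v for v in seq if v > mid], mid + 1, hi))


def rank(bits, b, i):
    """Occurrences of bit b in bits[1..i] (1-based, inclusive)."""
    return sum(1 for x in bits[:i] if x == b)


def select(bits, b, j):
    """1-based position of the j-th occurrence of bit b in bits."""
    return [p for p, x in enumerate(bits, start=1) if x == b][j - 1]


def range_query(node, lo, hi, p1, p2, r1, r2):
    """Return (row, position) pairs with position in [p1, p2] and row in [r1, r2].

    Positions are relative to this node's sequence; at the root they are columns.
    """
    if p1 > p2 or hi < r1 or lo > r2:      # empty range or no row overlap: prune
        return []
    if node is None:                       # leaf: a single row
        return [(lo, p) for p in range(p1, p2 + 1)]
    bits, left, right = node
    mid = (lo + hi) // 2
    found = []
    for b, child, clo, chi in ((0, left, lo, mid), (1, right, mid + 1, hi)):
        q1, q2 = rank(bits, b, p1 - 1) + 1, rank(bits, b, p2)
        for row, pos in range_query(child, clo, chi, q1, q2, r1, r2):
            found.append((row, select(bits, b, pos)))  # map back to this node
    return found


root = build(ROWS, 1, 16)
print(sorted(range_query(root, 1, 16, 6, 10, 12, 16)))  # [(12, 6), (13, 8)]
```

At the root, the projected ranges are [4, 6] for the right child and [3, 4] for the left child; the left child is pruned immediately because its rows [1, 8] cannot intersect [12, 16].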
PW-tree
- Solve the range query q:{r[12,16], c[6,10]}
  - Point identifiers must be returned
  - An ordered array stores the relation between rows (or columns) and identifiers
  - The wavelet tree solutions are used to access this ordered array and obtain the identifiers

Column   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Id      65 45 43 34 78 86 98 10 44 12 14 24 28 99 84 20

Wavelet tree solution: (12, 6) and (13, 8), that is, identifiers 86 (column 6) and 10 (column 8)
PW-tree
- Two variants of this structure:
  - DPW-tree
    - Point identifiers are stored in the same order as the tree leaves
    - The algorithm always needs to reach these leaves
  - UPW-tree
    - Point identifiers are stored in the same order as the root node
    - The first tree traversal can be stopped without reaching the leaves
    - A second ascending traversal is necessary
Experiments (space)

Structure   Total                                      Bytes per point
PW-tree     20N + (N lg N x 1.375)/8                   23.69
R-tree      20N + 36N/(M-1)                            21.24
K-d-tree    20N + 16(2^(h-1) + (N mod 2^⌊lg N⌋))       36.00

Notes:
• R-tree: M = 30 (best experimental performance)
• K-d-tree: h = ⌈lg N⌉
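
The space formulas can be evaluated with a small Python sketch. Two assumptions are made here: the flattened K-d-tree term is read as 2^(h-1) + (N mod 2^⌊lg N⌋), which is the reading that reproduces the 36.00 bytes per point in the table, and the sample values of N are hypothetical because the slide does not state the N behind the per-point figures. The function names are illustrative only.

```python
import math


def pw_tree_bytes(n):
    """20N + (N lg N x 1.375) / 8, as given in the table."""
    return 20 * n + (n * math.log2(n) * 1.375) / 8


def r_tree_bytes(n, m=30):
    """20N + 36N / (M - 1), with M = 30 as in the notes."""
    return 20 * n + 36 * n / (m - 1)


def kd_tree_bytes(n):
    """20N + 16(2^(h-1) + (N mod 2^floor(lg N))), h = ceil(lg N).

    The exponent h-1 is an assumed reading of the flattened slide text.
    """
    h = math.ceil(math.log2(n))
    return 20 * n + 16 * (2 ** (h - 1) + n % 2 ** math.floor(math.log2(n)))


n = 1000  # hypothetical collection size, not taken from the slides
print(round(r_tree_bytes(n) / n, 2))   # 21.24
print(round(kd_tree_bytes(n) / n, 2))  # 36.0
print(round(pw_tree_bytes(n) / n, 2))  # depends on N through lg N
```

The R-tree and K-d-tree costs per point do not depend on N under these formulas (for N that is not a power of two), whereas the PW-tree cost per point grows with lg N.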
Results (time): Uniform distribution
[Figure: four plots, one per selectivity (0.01%, 0.1%, 1%, 10%), showing query time in ms (logarithmic scale) against the number of points (x 10^6) for UPW-tree, DPW-tree, R*-tree, STR R-tree and K-d-tree.]
Results (time): Zipf distribution
[Figure: query time in ms against selectivity (0.001%, 0.01%, 0.1%, 1%, 10%) for UPW-tree, DPW-tree, R*-tree, STR R-tree and K-d-tree.]
Results (time): Gauss distribution
[Figure: query time in ms against selectivity (0.001%, 0.01%, 0.1%, 1%, 10%) for UPW-tree, DPW-tree, R*-tree, STR R-tree and K-d-tree.]
Results (time): North East dataset (123,593 postal addresses)
[Figure: query time in ms against selectivity (0.001%, 0.01%, 0.1%, 1%, 10%) on the NE dataset for UPW-tree, DPW-tree, R*-tree, STR R-tree and K-d-tree.]
Results (time): Geonames gazetteer (2,693,569 populated places)
[Figure: query time in ms against selectivity (0.001%, 0.01%, 0.1%, 1%, 10%) on the Geonames dataset for UPW-tree, DPW-tree, R*-tree, STR R-tree and K-d-tree.]
Conclusions and Future Work
- Conclusions:
  - A new PAM based on compressed data structures (wavelet tree, rank, select)
  - Two variants (DPW-tree, UPW-tree)
  - Good experimental performance
- Future Work:
  - Algorithms to solve other queries (k-NN, spatial join)
  - Support for dynamic operations
  - New spatial compressed data structures:
    - Spatial access methods based on wavelet trees
    - Balanced representation of a K-d-tree