The Optimal-Location Query
Donghui Zhang, Northeastern University
Coauthors: Yang Du, Tian Xia
Motivation
• “What is the optimal location in Boston area to build a new McDonald’s store?”
• Optimality: maximize the number of customers who think the new store is closer to them.
Formal Definition
• Given a set S of sites, a set O of weighted objects, and a query range Q,
• Find a location l ∈ Q which maximizes
Σ o.weight over all o ∈ O s.t. ∀ s ∈ S, d(o, l) ≤ d(o, s).
• We consider the L1 distance:
d((x1, y1), (x2, y2)) = |x1 - x2| + |y1 - y2|
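The definition above can be read as a brute-force evaluation of one candidate location. The following is a minimal sketch, assuming ties count in favor of the new location (≤; switch to < for a strict reading). The coordinates are hypothetical and were chosen only so that the weights 4, 3, 5, 6 of the example reproduce the influences 11 and 5; they are not taken from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def l1(a: Point, b: Point) -> float:
    """L1 (Manhattan) distance |x1 - x2| + |y1 - y2|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def influence(l: Point, sites: List[Point],
              objects: List[Tuple[Point, float]]) -> float:
    """Total weight of the objects o with d(o, l) <= d(o, s) for every site s."""
    total = 0.0
    for position, weight in objects:
        if all(l1(position, l) <= l1(position, s) for s in sites):
            total += weight
    return total

# Hypothetical layout: two sites and four weighted objects (weights 4, 3, 5, 6).
sites = [(0, 0), (10, 0)]
objects = [((1, 1), 4), ((9, 1), 3), ((4, 4), 5), ((8, 3), 6)]

influence_l1 = influence((6, 4), sites, objects)  # attracts the objects of weight 5 and 6
influence_l2 = influence((3, 5), sites, objects)  # attracts only the object of weight 5
print(influence_l1, influence_l2)
```

Evaluating a single location is cheap; the difficulty addressed in the rest of the talk is that Q contains infinitely many candidate locations.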
Example
[Figure: objects o1:4, o2:3, o3:5, o4:6 (label:weight), sites s1 and s2, and the query range Q.]
Example
[Figure: the same objects and sites, with a candidate location l1 inside Q.]
The “Influence” of l1 is 5+6=11.
[Figure: the same example with a second candidate location l2.]
The Influence of l2 is 5.
Content
• Problem Definition
• Straightforward Solution
• Problem Transformation
• The R-tree-based Solution
• The OL-tree
• The VOL-tree
• Performance
Using the RNN Algorithm…
[Figure: the example with candidate location l1.]
The RNNs of l1 are O3 and O4.
Straightforward Solution
Compute the influence for every location in Q.
Problematic: infinite number of candidates!
nn_buffer of an Object
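• A new site built at any location within an object’s nn_buffer would be closer to that object than its current nearest site.
• Under the L1 distance, the nn_buffer is a diamond.
[Figure: objects O1:4, O2:3, O3:5, O4:6 and sites S1, S2, with the nn_buffer of O4 drawn as a diamond.]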
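As a rough illustration of the nn_buffer, the sketch below computes the diamond of one object as a center plus an L1 radius (the distance to the nearest existing site) and tests whether a location falls inside it. The function names and coordinates are hypothetical, and boundary points are counted as inside, which is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def l1(a: Point, b: Point) -> float:
    """L1 distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def nn_buffer_radius(obj: Point, sites: List[Point]) -> float:
    """L1 distance from the object to its nearest existing site."""
    return min(l1(obj, s) for s in sites)

def nn_buffer_corners(obj: Point, sites: List[Point]) -> List[Point]:
    """The four corners of the diamond centered at the object."""
    r = nn_buffer_radius(obj, sites)
    x, y = obj
    return [(x - r, y), (x + r, y), (x, y - r), (x, y + r)]

def in_nn_buffer(l: Point, obj: Point, sites: List[Point]) -> bool:
    """True if a site at l would be at least as close as the nearest site."""
    return l1(obj, l) <= nn_buffer_radius(obj, sites)

sites = [(0, 0), (10, 0)]
o4 = (8, 3)  # hypothetical position for an object
print(nn_buffer_radius(o4, sites), nn_buffer_corners(o4, sites))
print(in_nn_buffer((6, 4), o4, sites), in_nn_buffer((3, 5), o4, sites))
```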
Problem Transformation
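• Find a location with maximum overlap among the objects’ nn_buffers (overlap = total weight of the nn_buffers covering the location).
[Figure: the nn_buffers of O1:4, O2:3, O3:5, O4:6 around sites S1 and S2, and the query range Q. Any location in the marked region of maximum overlap within Q is an optimal location!]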
The Rotated Coordinates
• Rotate the coordinate axes by 45°.
• All nn_buffers become axis-parallel squares.
• Focus on the rotated coordinates.
[Figure: original axes X, Y and axes X', Y' rotated by 45°; a point o has coordinates (x, y) and (x', y').]
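The rotation can be sketched as follows. Assuming the axes are rotated counter-clockwise by 45°, a point (x, y) gets the coordinates x' = (x + y)/√2 and y' = (y - x)/√2, and a diamond of L1 radius r becomes an axis-parallel square with half-side r/√2. The helper names are illustrative only.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)

SQRT2 = math.sqrt(2)

def rotate45(p: Point) -> Point:
    """Coordinates of p in the frame whose axes are rotated by 45 degrees."""
    x, y = p
    return ((x + y) / SQRT2, (y - x) / SQRT2)

def nn_buffer_square(obj: Point, radius: float) -> Rect:
    """Axis-parallel square (rotated frame) of a diamond with the given L1 radius."""
    cx, cy = rotate45(obj)
    h = radius / SQRT2  # half of the square's side length
    return (cx - h, cy - h, cx + h, cy + h)

def contains(square: Rect, p_rotated: Point) -> bool:
    """Closed containment test in the rotated frame."""
    return (square[0] <= p_rotated[0] <= square[2]
            and square[1] <= p_rotated[1] <= square[3])

square = nn_buffer_square((8, 3), 5)  # hypothetical object at (8, 3), radius 5
inside = contains(square, rotate45((6, 4)))   # L1 distance 3 from the object
outside = contains(square, rotate45((3, 5)))  # L1 distance 7 from the object
print(square, inside, outside)
```

The diamond corner (13, 3) maps to a corner of the square, which is a quick way to check the transformation by hand.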
The R-tree-based Solution
• Store the objects in an R-tree.
• Retrieve the objects whose nn_buffers intersect Q.
• Plane sweep to find a region which has maximum overlap.
Two Contributions
1. Object retrieval:
– Store point objects,
– but retrieve nn_buffers in increasing order of lower X.
2. Plane sweep:
– Straightforwardly: O(n²).
– Our method: O(n log n).
Best-first Retrieval
• Keep a heap of index entries + objects.
• Sorted in increasing order of nn_buffer’s lower X.
• While heap is not empty, pop an entry.
• If an object is popped, send it to the plane sweep.
• If an index entry is popped, push its children (those intersecting Q).
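The retrieval loop above can be sketched with a binary heap. This is a simplified in-memory stand-in, not the disk-based R-tree of the talk: the `Leaf` and `Node` classes are hypothetical, leaves store nn_buffers directly instead of point objects, and the bounding box of an index entry is recomputed on the fly. The key of an index entry is the smallest lower X of any nn_buffer below it, so objects leave the heap in increasing order of lower X.

```python
import heapq
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Leaf:
    name: str
    buffer: Rect   # nn_buffer in rotated coordinates
    weight: float

@dataclass
class Node:
    children: List[Union["Node", Leaf]]

def bounds(entry: Union[Node, Leaf]) -> Rect:
    """Bounding box of all nn_buffers below an entry (recomputed for brevity)."""
    if isinstance(entry, Leaf):
        return entry.buffer
    boxes = [bounds(c) for c in entry.children]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def best_first(root: Node, q: Rect) -> Iterator[Leaf]:
    """Yield the objects whose nn_buffers intersect q, by increasing lower X."""
    tick = 0  # tie-breaker so the heap never compares entries themselves
    heap = [(bounds(root)[0], tick, root)]
    while heap:
        _, _, entry = heapq.heappop(heap)
        if isinstance(entry, Leaf):
            yield entry  # would be handed to the plane sweep
            continue
        for child in entry.children:
            box = bounds(child)
            if intersects(box, q):
                tick += 1
                heapq.heappush(heap, (box[0], tick, child))

a = Leaf("O1", (0, 0, 4, 4), 4)
b = Leaf("O2", (6, 1, 9, 4), 3)
c = Leaf("O3", (2, 3, 7, 8), 5)
d = Leaf("O4", (5, 5, 11, 11), 6)
root = Node([Node([a, c]), Node([b, d])])
order = [leaf.name for leaf in best_first(root, (1, 1, 8, 8))]
print(order)
```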
Naïve Plane Sweep
[Figure: the squares O1:4, O2:3, O3:5, O4:6 in the X-Y plane; their edges cut the Y axis at 2, 5, 8, 9, 12.]
Y boundaries:  -∞   2   5   8   9   12   +∞
Overlap:          0   5   12   7   3   0
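Not Efficient! O(n²)
Before:
Y boundaries:  -∞   2   5   8   9   12   +∞
Overlap:          0   5   12   7   3   0
Suppose next insertion: add 2 to the Y-range [2,11].
After (a new boundary 11 splits the interval from 9 to 12):
Y boundaries:  -∞   2   5   8   9   11   12   +∞
Overlap:          0   7   14   9   5   3   0
A single insertion may have to update O(n) intervals, so n insertions cost O(n²).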
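The naive bookkeeping can be sketched as a sorted list of Y boundaries with one overlap value per interval. The class name is hypothetical; the example reproduces the insertion from the slide (add 2 to the Y-range from 2 to 11) and shows why one insertion may touch every interval.

```python
import bisect
from typing import List

class NaiveSweepStatus:
    """Sorted Y boundaries plus one overlap value per interval between them."""

    def __init__(self) -> None:
        self.bounds: List[float] = []   # sorted Y boundaries
        self.values: List[float] = [0]  # values[k]: overlap between bounds[k-1] and bounds[k]

    def _split(self, y: float) -> int:
        """Make y a boundary (if it is not one yet) and return its index."""
        i = bisect.bisect_left(self.bounds, y)
        if i == len(self.bounds) or self.bounds[i] != y:
            self.bounds.insert(i, y)
            self.values.insert(i + 1, self.values[i])  # both halves keep the old value
        return i

    def add(self, y_lo: float, y_hi: float, w: float) -> None:
        """Add w to every interval between y_lo and y_hi: O(n) in the worst case."""
        i = self._split(y_lo)
        j = self._split(y_hi)
        for k in range(i + 1, j + 1):
            self.values[k] += w

status = NaiveSweepStatus()
status.bounds = [2, 5, 8, 9, 12]
status.values = [0, 5, 12, 7, 3, 0]
status.add(2, 11, 2)
print(status.bounds)
print(status.values, max(status.values))
```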
The aSB-tree
Extended from the SB-tree [YW01]:
• keeps max overlap information at index entries.
• handles a query range Q.
Example with two levels, before the insertion:
Root:    Y boundaries  -∞   5   9   +∞                 values  0   0   0
Leaves:  Y boundaries  -∞   2   5   8   9   12   +∞    values  0   5   12   7   3   0
Suppose next insertion: add 2 to the Y range [2,11].
• The root interval from 5 to 9 is fully covered, so +2 is recorded once at the root and the leaf below it is not touched.
• Only the two partially covered leaves are updated: the interval from 2 to 5 becomes 7, and a new boundary 11 splits the interval from 9 to 12 into values 5 and 3.
After the insertion:
Root:    Y boundaries  -∞   5   9   +∞                      values  0   2   0
Leaves:  Y boundaries  -∞   2   5   8   9   11   12   +∞    values  0   7   12   7   5   3   0
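To illustrate how an O(n log n) plane sweep is possible, the sketch below swaps in a standard in-memory segment tree (range add, global max) for the aSB-tree. It is not the aSB-tree itself: all Y coordinates must be known in advance, the query range Q is ignored, and rectangles are treated as half-open boxes, which is an assumption. Like the aSB-tree, it records an added weight once at a fully covered node and keeps the maximum overlap at inner nodes.

```python
from typing import List, Optional, Tuple

class MaxAddTree:
    """Segment tree over n slots supporting range add and global max."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.added = [0.0] * (4 * n)  # weight recorded at a fully covered node
        self.best = [0.0] * (4 * n)   # max overlap inside the node's slots

    def update(self, lo: int, hi: int, w: float,
               node: int = 1, l: int = 0, r: Optional[int] = None) -> None:
        """Add w to slots lo..hi-1 in O(log n)."""
        if r is None:
            r = self.n
        if hi <= l or r <= lo:
            return
        if lo <= l and r <= hi:  # fully covered: record once, do not descend
            self.added[node] += w
            self.best[node] += w
            return
        m = (l + r) // 2
        self.update(lo, hi, w, 2 * node, l, m)
        self.update(lo, hi, w, 2 * node + 1, m, r)
        self.best[node] = self.added[node] + max(self.best[2 * node],
                                                 self.best[2 * node + 1])

    def max_value(self) -> float:
        return self.best[1]

def max_overlap(rects: List[Tuple[float, float, float, float, float]]) -> float:
    """rects: (x_lo, y_lo, x_hi, y_hi, weight), treated as half-open boxes."""
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    if len(ys) < 2:
        return 0.0
    index = {y: i for i, y in enumerate(ys)}
    events = []
    for x_lo, y_lo, x_hi, y_hi, w in rects:
        events.append((x_lo, 1, index[y_lo], index[y_hi], w))
        events.append((x_hi, 0, index[y_lo], index[y_hi], -w))
    events.sort()  # at equal X, removals (flag 0) come before insertions (flag 1)
    tree = MaxAddTree(len(ys) - 1)
    best = 0.0
    for _, _, lo, hi, w in events:
        tree.update(lo, hi, w)
        best = max(best, tree.max_value())
    return best

squares = [(0, 0, 4, 4, 4), (6, 1, 9, 4, 3), (2, 3, 7, 8, 5), (5, 5, 11, 11, 6)]
result = max_overlap(squares)
print(result)
```

In the hypothetical input, the squares of weight 5 and 6 overlap, and no location is covered by more weight than that.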
The OL-tree
• Idea: partition the space, and keep max overlapped region for each partition!
• Like a k-d-B-tree.
• An nn_buffer may have multiple copies.
• Stores nn_buffers.
[Figure: an nn_buffer overlapping four partitions, numbered 1 to 4.]
1: add to fullcover. 2, 3, 4: recursively insert.
Stored Information
• Index entry has, besides range:
– fullcover: total weight of nn_buffers fully covering the whole area;
– localmax: among the nn_buffers inserted into the sub-tree, max overlap.
– maxrange: the region where localmax occurred.
• Leaf entry:
– A rectangle and its weight.
[Figure: example OL-tree; each entry is shown as (range, fullcover, localmax); sub-trees omitted.]
(rroot, 0, 9)
    (r1, 0, 4)
    (r2, 1, 4)
    (r3, 2, 7)
        (r31, 4, 3)
        (r32, 2, 3)
        (r33, 1, 2)
For the entry (r3, 2, 7):
• fullcover: 2 nn_buffers fully cover r3.
• localmax: among those inserted, max overlap is 7.
• maxrange: where localmax occurred.
Query Processing
• Start with root, insert index entries into heap.
• Sorting key: upper bound of real max overlap in the sub-tree.
– localmax + fullcovers of the entries on the path from the root.
– Accurate if Q intersects with maxrange.
Example, for the entry (r33, 1, 2): real max overlap = 0+2+1 + localmax = 5.
• Keep a running value: max overlap M.
• Pruning 1: Q intersects with maxrange (the upper bound is then exact, so the sub-tree need not be examined).
• Pruning 2: upper bound of max overlap < M.
• r2 is pruned since Q intersects r2.maxrange. M = 0+1+4 = 5.
• r1 is pruned since the upper bound of overlap = 4 < M.
Sometimes, we need to examine a leaf node. Plane sweep it!
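The query procedure can be sketched on the example tree. In this simplified sketch, the geometric tests are replaced by two hypothetical boolean flags per entry (whether Q intersects the entry's range and whether Q intersects its maxrange), and the flag values are assumptions chosen to match the example: Q touches r1 and r2 but not r3. The fullcover and localmax values are those of the example tree.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Entry:
    name: str
    fullcover: float
    localmax: float
    q_hits_range: bool = True      # hypothetical: Q intersects the entry's range
    q_hits_maxrange: bool = False  # hypothetical: Q intersects the entry's maxrange
    children: List["Entry"] = field(default_factory=list)

def upper_bound(entry: Entry, inherited: float) -> float:
    """localmax + fullcovers on the path from the root (inherited = ancestors' part)."""
    return inherited + entry.fullcover + entry.localmax

def query(root: Entry) -> Tuple[float, List[Tuple[str, str]]]:
    m = 0.0   # running max overlap M
    log = []  # what happened to each examined entry
    tick = 0  # tie-breaker for the heap
    heap = [(-upper_bound(root, 0.0), tick, 0.0, root)]
    while heap:
        neg_ub, _, inherited, e = heapq.heappop(heap)
        ub = -neg_ub
        if ub < m:
            log.append((e.name, "pruned: upper bound < M"))
        elif e.q_hits_maxrange:
            m = max(m, ub)  # the bound is exact here
            log.append((e.name, "pruned: Q intersects maxrange"))
        elif not e.children:
            log.append((e.name, "examine leaf node with a plane sweep"))
        else:
            for c in e.children:
                if c.q_hits_range:
                    tick += 1
                    below = inherited + e.fullcover
                    heapq.heappush(heap, (-upper_bound(c, below), tick, below, c))
    return m, log

r31, r32, r33 = Entry("r31", 4, 3), Entry("r32", 2, 3), Entry("r33", 1, 2)
r1 = Entry("r1", 0, 4)
r2 = Entry("r2", 1, 4, q_hits_maxrange=True)
r3 = Entry("r3", 2, 7, q_hits_range=False, children=[r31, r32, r33])
rroot = Entry("rroot", 0, 9, children=[r1, r2, r3])

m, log = query(rroot)
print(m, log)
print(upper_bound(r33, rroot.fullcover + r3.fullcover))
```

The run reproduces the two pruning steps of the example: r2 sets M to 5, and r1 is then discarded because its bound of 4 is below M.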
OL-tree → VOL-tree
• OL-tree is not practical
– worst-case space complexity O(n²)
– complex re-organization
• How to improve?
– Only keep a few top levels of the OL-tree.
==> Virtual OL-tree!
VOL-tree
Example
If Q is here, perform range search on the R-tree.
Comparison with R-tree Approach
• The R-tree approach examines all nn_buffers intersecting with Q.
• By using a small, in-memory VOL-tree, the new approach can prune the search space.
Challenge
• With dynamic updates, keeping localmax and maxrange exact is expensive.
[Figure: an nn_buffer being inserted.] To insert an nn_buffer here, recompute!
Solution
• Index entry: (range, fullcover, maxrange, lowermax, uppermax), i.e. localmax is replaced by lowermax and uppermax.
• lowermax ≤ localmax ≤ uppermax
• Any location in maxrange has overlap = lowermax.
• At a location outside maxrange, the overlap can be more than lowermax, but not more than uppermax.
Update
• Case 1: the new nn_buffer does not intersect with maxrange: increase uppermax.
• Case 2: the new nn_buffer intersects with maxrange: increase uppermax and lowermax.
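The two update cases can be sketched as follows. The class is hypothetical and covers only nn_buffers that partially overlap an entry (buffers covering the whole range would go to fullcover instead). Shrinking maxrange to the intersection in case 2 is an assumption of this sketch, made so that every location in maxrange still has overlap equal to lowermax; rectangles that only touch along an edge are treated as not intersecting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)

def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Common area of two rectangles, or None if they do not overlap."""
    x_lo, y_lo = max(a[0], b[0]), max(a[1], b[1])
    x_hi, y_hi = min(a[2], b[2]), min(a[3], b[3])
    if x_lo < x_hi and y_lo < y_hi:
        return (x_lo, y_lo, x_hi, y_hi)
    return None

@dataclass
class VolEntry:
    maxrange: Rect
    lowermax: float
    uppermax: float

    def insert(self, buffer: Rect, weight: float) -> None:
        self.uppermax += weight  # both cases: the true maximum grows by at most weight
        common = intersection(self.maxrange, buffer)
        if common is not None:   # case 2: the new nn_buffer intersects maxrange
            self.lowermax += weight
            self.maxrange = common  # assumption: keep overlap = lowermax inside maxrange

entry = VolEntry(maxrange=(2, 2, 4, 4), lowermax=7, uppermax=7)
entry.insert((5, 5, 8, 8), 3)  # case 1: no intersection with maxrange
after_case1 = (entry.maxrange, entry.lowermax, entry.uppermax)
entry.insert((3, 3, 9, 9), 2)  # case 2: intersects maxrange
after_case2 = (entry.maxrange, entry.lowermax, entry.uppermax)
print(after_case1, after_case2)
```

Neither case needs to look at the nn_buffers already stored below the entry, which is what makes the update cheap compared with recomputing localmax.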
Query
• Similar to the OL-tree.
• To compute upper bound of max overlap, use uppermax.
• When Q intersects maxrange, may or may not prune.
Setup
• Digital Chart from the R-tree Portal.
– O: 24,493 populated places.
– S: 9,203 cultural landmarks.
• Page size: 1KB. Buffer size: 256 pages.
• Object R-tree: 753 pages.
• Pentium IV Dell PC, 3.2GHz.
• Java.
• Measure total I/O of 100 random queries.
Size of the VOL-tree
Small Query Area
Large Query Area
Varying Buffer Size
Effect of Update
Conclusions
• Introduced the optimal-location query.
• Proposed three solutions.
• The VOL-tree approach is the best.
• More improvement with larger query area. (5% query area = 6 times improvement.)
• More updates decrease the improvement. (50% updates = no improvement.) But can bulk-load.