The Optimal-Location Query Donghui Zhang Northeastern University Coauthors: Yang Du, Tian Xia



Page 1

The Optimal-Location Query

Donghui Zhang, Northeastern University

Coauthors: Yang Du, Tian Xia

Page 2

Motivation

• “What is the optimal location in the Boston area to build a new McDonald’s store?”

• Optimality: maximize the number of customers for whom the new store is closer than any existing store.

Page 3

Formal Definition

• Given a set S of sites, a set O of weighted objects, and a query range Q,

• Find a location $l \in Q$ which maximizes

$\sum_{o \in O} o.\mathit{weight}$ s.t. $\forall s \in S,\ d(o, l) < d(o, s)$.

• We consider the $L_1$ distance:

$|x_1 - x_2| + |y_1 - y_2|$
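The definition above can be checked by brute force for a single candidate location. This is a minimal sketch, not the authors' code: the function names and the coordinates are hypothetical, and "closer" is taken as strictly closer.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def influence(location, sites, objects):
    """Total weight of objects strictly closer to `location` than to every site.

    `sites` is a list of (x, y); `objects` is a list of ((x, y), weight).
    """
    return sum(
        weight
        for point, weight in objects
        if all(l1(point, location) < l1(point, s) for s in sites)
    )


# Hypothetical coordinates, chosen only for illustration.
sites = [(0, 0), (10, 0)]
objects = [((1, 1), 4), ((9, 1), 3), ((5, 4), 5), ((5, 6), 6)]
print(influence((5, 5), sites, objects))  # 11
```

Only the two objects near (5, 5) switch to the new location, so its influence is 5 + 6 = 11.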


Page 5

Example

[Figure: four weighted objects o1:4, o2:3, o3:5, o4:6 (label:weight), two sites s1 and s2, and the query range Q.]

Page 6

Example

[Figure: objects o1:4, o2:3, o3:5, o4:6, sites s1 and s2, query range Q, and a candidate location l1 inside Q, with distance labels.]

The “Influence” of l1 is 5+6=11.

Page 7

Example

[Figure: objects o1:4, o2:3, o3:5, o4:6, sites s1 and s2, query range Q, and two candidate locations l1 and l2 inside Q, with distance labels.]

The “Influence” of l1 is 5+6=11.

The Influence of l2 is 5.

Page 8

Content

• Problem Definition

• Straightforward Solution

• Problem Transformation

• The R-tree-based Solution

• The OL-tree

• The VOL-tree

• Performance

Page 9

Using the RNN Algorithm…

[Figure: objects o1:4, o2:3, o3:5, o4:6, sites s1 and s2, and the candidate location l1, with distance labels.]

The RNNs (reverse nearest neighbors) of l1 are O3 and O4.

Page 10

Straightforward Solution

[Figure: objects o1:4, o2:3, o3:5, o4:6 and sites s1 and s2.]

Compute the influence for every location in Q.

Problematic: infinite number of candidates!


Page 12

nn_buffer of an Object

• A new site built at any location inside an object's nn_buffer would be closer to that object than its current nearest site.

• Under the L1 distance, the nn_buffer is a diamond.

[Figure: objects O1:4, O2:3, O3:5, O4:6 and sites S1, S2, with the nn_buffer of O4 drawn.]
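As a small illustration of the nn_buffer idea, the sketch below represents a buffer as an L1 ball: the object is the center and the distance to its nearest site is the radius. The representation, the function names and the coordinates are assumptions made for this example.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nn_buffer(obj, sites):
    """Return (center, radius): the L1 ball around obj that reaches its nearest site."""
    return obj, min(l1(obj, s) for s in sites)


def in_nn_buffer(location, buffer):
    """True if a new site at `location` would be strictly closer than the nearest site."""
    center, radius = buffer
    return l1(center, location) < radius


sites = [(0, 0), (10, 0)]       # hypothetical coordinates
buf = nn_buffer((5, 6), sites)  # nearest site is at L1 distance 11
print(buf)                          # ((5, 6), 11)
print(in_nn_buffer((5, 5), buf))    # True
print(in_nn_buffer((0, 0), buf))    # False: an existing site lies on the boundary
```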

Page 13

Problem Transformation

• Find a location with maximum overlap among the objects’ nn_buffers.

[Figure: the nn_buffers of O1:4, O2:3, O3:5, O4:6 with sites S1, S2 and the query range Q; the region of maximum overlap inside Q is marked.]

Any location here is an optimal location!

Page 14

The Rotated Coordinate

• Rotate the coordinate 45°.

• All nn_buffers become axis-parallel squares.

• Focus on the rotated coordinate.

[Figure: the original axes X, Y and the rotated axes X', Y' at 45°, with a point o shown at (x, y) and (x', y').]
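The effect of the rotation can be sketched in a few lines. The mapping below is an assumption chosen for convenience: it rotates by 45° and also scales by √2, so that coordinates stay integral and an L1 radius r becomes a square of half-side r. The function names are hypothetical.

```python
def rotate(p):
    """Map (x, y) to (u, v) = (x + y, y - x): a 45-degree rotation scaled by sqrt(2)."""
    x, y = p
    return (x + y, y - x)


def l1(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def buffer_to_square(center, radius):
    """An L1 diamond (center, radius) as an axis-parallel square (u_lo, v_lo, u_hi, v_hi)."""
    u, v = rotate(center)
    return (u - radius, v - radius, u + radius, v + radius)


a, b = (5, 6), (0, 0)
print(l1(a, b), chebyshev(rotate(a), rotate(b)))  # 11 11
print(buffer_to_square((5, 6), 11))               # (0, -10, 22, 12)
```

In the rotated coordinates the L1 distance turns into a max-of-differences distance, which is why every diamond becomes a square.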


Page 16

The R-tree-based Solution

• Store the objects in an R-tree.

• Retrieve the objects whose nn_buffers intersect Q.

• Plane sweep to find a region which has maximum overlap.
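To make the goal of the last step concrete, here is a brute-force stand-in for the plane sweep. It is not the sweep itself. It assumes the nn_buffers are already axis-parallel rectangles in rotated coordinates, treats them as closed, and assumes positive weights. It relies on the fact that a maximum-overlap point can always be found at a (lower X, lower Y) pair of the clipped rectangles. All names are hypothetical.

```python
def max_overlap(rects, q):
    """rects: list of (x_lo, y_lo, x_hi, y_hi, weight); q: (x_lo, y_lo, x_hi, y_hi).

    Returns (best_weight, point) for the heaviest overlap inside q.
    """
    clipped = []
    for x1, y1, x2, y2, w in rects:
        cx1, cy1 = max(x1, q[0]), max(y1, q[1])
        cx2, cy2 = min(x2, q[2]), min(y2, q[3])
        if cx1 <= cx2 and cy1 <= cy2:  # keep only rectangles that reach into q
            clipped.append((cx1, cy1, cx2, cy2, w))

    best, where = 0, (q[0], q[1])
    for x in sorted({r[0] for r in clipped}):
        for y in sorted({r[1] for r in clipped}):
            total = sum(w for x1, y1, x2, y2, w in clipped
                        if x1 <= x <= x2 and y1 <= y <= y2)
            if total > best:
                best, where = total, (x, y)
    return best, where


# Hypothetical rectangles; the last one lies outside the query range.
rects = [(0, 0, 4, 4, 4), (2, 2, 6, 6, 3), (3, 1, 5, 5, 5), (10, 10, 12, 12, 6)]
print(max_overlap(rects, (0, 0, 8, 8)))      # (12, (3, 2))
print(max_overlap(rects, (0, 0, 2.5, 2.5)))  # (7, (2, 2))
```

This enumeration is cubic in the number of rectangles, so it is only useful as a reference answer for testing a real sweep.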

Page 17

Two Contributions

1. Object retrieval:
– Store point objects,
– but retrieve nn_buffers in increasing order of lower X.

2. Plane sweep:
– Straightforwardly: $O(n^2)$.
– Our method: $O(n \log n)$.

Page 18

Best-first Retrieval

• Keep a heap of index entries + objects.

• Sorted in increasing order of nn_buffer’s lower X.

• While the heap is not empty, pop an entry.

• If an object is popped, send it to the plane sweep.

• If an index entry is popped, push its children (those intersecting Q).
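The loop above can be sketched with Python's `heapq`. The `Node` class is a hypothetical in-memory stand-in for an R-tree index entry. For simplicity it stores the nn_buffer rectangles directly, whereas the slides store point objects and derive the buffers.

```python
import heapq
from itertools import count


def rect_of(entry):
    """Bounding rectangle (x_lo, y_lo, x_hi, y_hi) of a node or of a leaf object."""
    return entry.rect if isinstance(entry, Node) else entry[:4]


class Node:
    """Hypothetical stand-in for an R-tree index entry over nn_buffers."""

    def __init__(self, children):
        self.children = children
        boxes = [rect_of(c) for c in children]
        self.rect = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                     max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def best_first(root, q):
    """Yield leaf objects (x_lo, y_lo, x_hi, y_hi, weight) in increasing lower X."""
    tie = count()  # tie-breaker so the heap never compares a Node with a tuple
    heap = [(rect_of(root)[0], next(tie), root)]
    while heap:
        _, _, entry = heapq.heappop(heap)
        if isinstance(entry, Node):
            for child in entry.children:
                if intersects(rect_of(child), q):
                    heapq.heappush(heap, (rect_of(child)[0], next(tie), child))
        else:
            yield entry  # this is where the object goes to the plane sweep


a, b = (0, 0, 2, 2, 4), (5, 1, 7, 3, 3)
c, d = (1, 4, 3, 6, 5), (20, 20, 22, 22, 6)
root = Node([Node([b, d]), Node([a, c])])
print(list(best_first(root, (0, 0, 10, 10))))  # a, c, b in that order; d is outside Q
```

A node's key is the smallest lower X below it, so it is never larger than the keys of its children, and objects therefore leave the heap in sorted order.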

Page 19

Naïve Plane Sweep

[Figure: the nn_buffers of O1:4, O2:3, O3:5 and O4:6 in the X-Y plane, with the Y axis cut into intervals at the buffer boundaries.]

Overlap per Y interval:

Y boundaries:  -∞ | 2 | 5 | 8 | 9 | 12 | +∞
Overlap:          0 | 5 | 12 | 7 | 3 | 0

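Page 20

Not Efficient! $O(n^2)$

Before:

Y boundaries:  -∞ | 2 | 5 | 8 | 9 | 12 | +∞
Overlap:          0 | 5 | 12 | 7 | 3 | 0

Suppose next insertion: add 2 to the Y-range [2,11].

After (+2 on every interval inside [2,11], with a new boundary at 11):

Y boundaries:  -∞ | 2 | 5 | 8 | 9 | 11 | 12 | +∞
Overlap:          0 | 7 | 14 | 9 | 5 | 3 | 0

A single insertion can touch O(n) intervals, so n insertions cost $O(n^2)$.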
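The flat interval list and its costly update can be sketched as follows. The class name is hypothetical, and the half-open interval convention is an assumption. The starting state is copied from the slide.

```python
import bisect


class NaiveIntervalSums:
    """Sorted Y boundaries; values[i] is the overlap on [bounds[i], bounds[i + 1])."""

    def __init__(self):
        self.bounds = [float("-inf")]
        self.values = [0]

    def _split(self, y):
        """Ensure y is a boundary and return its index."""
        i = bisect.bisect_left(self.bounds, y)
        if i == len(self.bounds) or self.bounds[i] != y:
            self.bounds.insert(i, y)
            self.values.insert(i, self.values[i - 1])  # new piece inherits the old value
        return i

    def add(self, lo, hi, weight):
        """Add weight on [lo, hi): touches every interval in between, O(n) per call."""
        i = self._split(lo)
        j = self._split(hi)
        for k in range(i, j):
            self.values[k] += weight

    def max_overlap(self):
        return max(self.values)


s = NaiveIntervalSums()
s.bounds = [float("-inf"), 2, 5, 8, 9, 12]  # state shown on the slide
s.values = [0, 5, 12, 7, 3, 0]
s.add(2, 11, 2)
print(s.bounds[1:])  # [2, 5, 8, 9, 11, 12]
print(s.values)      # [0, 7, 14, 9, 5, 3, 0]
```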

Page 21

The aSB-tree

Index level:

Y boundaries:  -∞ | 5 | 9 | +∞
Values:           0 | 0 | 0

Leaf level:

Y boundaries:  -∞ | 2 | 5 | 8 | 9 | 12 | +∞
Values:           0 | 5 | 12 | 7 | 3 | 0

Extended from the SB-tree [YW01]:

• keeps max overlap information at index entries.

• handles a query range Q.

Page 22

The aSB-tree

Suppose next insertion: add 2 to the Y range [2,11].

[Figure: the +2 update arrives at the index level (boundaries -∞ | 5 | 9 | +∞, values 0 | 0 | 0); the leaf level still reads boundaries -∞ | 2 | 5 | 8 | 9 | 12 | +∞, values 0 | 5 | 12 | 7 | 3 | 0.]

Page 23

The aSB-tree

Suppose next insertion: add 2 to the Y range [2,11].

[Figure: the index entry between 5 and 9 lies inside [2,11] and absorbs the +2, so the index values become 0 | 2 | 0; the update is passed down (+2, +2) to the two partially covered neighbors. The leaf values are still 0 | 5 | 12 | 7 | 3 | 0.]

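Page 24

The aSB-tree

Suppose next insertion: add 2 to the Y range [2,11].

[Figure: final state.]

Index level:

Y boundaries:  -∞ | 5 | 9 | +∞
Values:           0 | 2 | 0

Leaf level:

Y boundaries:  -∞ | 2 | 5 | 8 | 9 | 11 | 12 | +∞
Values:           0 | 7 | 12 | 7 | 5 | 3 | 0

The leaf values 12 and 7 are unchanged: their +2 is recorded once, at the index entry above them.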
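The same idea, an update stored once at a fully covered index entry plus a max kept per sub-tree, can be illustrated with an ordinary binary segment tree. This is a swapped-in structure and not the aSB-tree: it assumes the Y boundaries are known in advance and numbered as slots. The class and method names are hypothetical.

```python
class RangeAddMaxTree:
    """Binary segment tree over n slots: range add, max over a slot range."""

    def __init__(self, n):
        self.n = n
        self.add = [0] * (4 * n)   # weight stored at the node itself
        self.best = [0] * (4 * n)  # max in the sub-tree, including this node's add

    def update(self, lo, hi, w, node=1, left=0, right=None):
        """Add w to slots lo..hi-1 in O(log n)."""
        if right is None:
            right = self.n
        if hi <= left or right <= lo:
            return
        if lo <= left and right <= hi:  # fully covered: record here, do not descend
            self.add[node] += w
            self.best[node] += w
            return
        mid = (left + right) // 2
        self.update(lo, hi, w, 2 * node, left, mid)
        self.update(lo, hi, w, 2 * node + 1, mid, right)
        self.best[node] = self.add[node] + max(self.best[2 * node],
                                               self.best[2 * node + 1])

    def query(self, lo, hi, node=1, left=0, right=None):
        """Max value over slots lo..hi-1 (a query range on the Y axis)."""
        if right is None:
            right = self.n
        if hi <= left or right <= lo:
            return float("-inf")
        if lo <= left and right <= hi:
            return self.best[node]
        mid = (left + right) // 2
        return self.add[node] + max(self.query(lo, hi, 2 * node, left, mid),
                                    self.query(lo, hi, 2 * node + 1, mid, right))

    def max_overlap(self):
        return self.best[1]


# Slots between the boundaries -inf | 2 | 5 | 8 | 9 | 11 | 12 | +inf (11 pre-split).
tree = RangeAddMaxTree(7)
for slot, value in enumerate([0, 5, 12, 7, 3, 3, 0]):
    tree.update(slot, slot + 1, value)
tree.update(1, 5, 2)       # add 2 to the Y range [2, 11)
print(tree.max_overlap())  # 14
print(tree.query(4, 7))    # 5
```

Each update visits O(log n) nodes, which is where the improvement from $O(n^2)$ to $O(n \log n)$ over n insertions comes from.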


Page 26

The OL-tree

• Idea: partition the space, and keep the max overlapped region for each partition!

• Like a k-d-B-tree.

• An nn_buffer may have multiple copies.

• Stores nn_buffers.

[Figure: an nn_buffer overlapping four partitions, labeled 1 to 4.]

1: add to fullcover.

2, 3, 4: recursively insert.

Page 27

Stored Information

• Index entry has, besides range:
– fullcover: total weight of nn_buffers fully covering the whole area;
– localmax: among the nn_buffers inserted into the sub-tree, max overlap.
– maxrange: the region where localmax occurred.

• Leaf entry:
– A rectangle and its weight.
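The routing rule from the OL-tree slides (add to fullcover when the partition is fully covered, otherwise insert recursively) might look like the sketch below. It is deliberately partial: maintaining localmax and maxrange is left out, node splitting is ignored, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


def covers(a, b):
    """True if rectangle a fully covers rectangle b; both are (x_lo, y_lo, x_hi, y_hi)."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]


def overlaps(a, b):
    """True if a and b share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass
class Entry:
    region: tuple                                 # the entry's range
    fullcover: float = 0                          # weight of buffers covering all of region
    children: list = field(default_factory=list)  # child index entries, if any
    buffers: list = field(default_factory=list)   # leaf entries: (rectangle, weight)


def insert(entry, rect, weight):
    if not overlaps(rect, entry.region):
        return
    if covers(rect, entry.region):
        entry.fullcover += weight   # case "1" on the slide
    elif entry.children:
        for child in entry.children:
            insert(child, rect, weight)  # cases "2, 3, 4": recurse
    else:
        entry.buffers.append((rect, weight))  # one copy per partially covered leaf


quads = [Entry((0, 0, 4, 4)), Entry((4, 0, 8, 4)), Entry((0, 4, 4, 8)), Entry((4, 4, 8, 8))]
root = Entry((0, 0, 8, 8), children=quads)
insert(root, (0, 0, 6, 6), 3)
print([q.fullcover for q in quads])     # [3, 0, 0, 0]
print([len(q.buffers) for q in quads])  # [0, 1, 1, 1]
```

The second printed line shows why an nn_buffer may have multiple copies: it is stored once in every leaf it only partially covers.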

Page 28

[Figure: an example OL-tree over the partitioned space, sub-trees omitted. Entries are written as (range, fullcover, localmax).]

(r_root, 0, 9)
    (r1, 0, 4)
    (r2, 1, 4)
    (r3, 2, 7)
        (r31, 4, 3)
        (r32, 2, 3)
        (r33, 1, 2)

Page 29

[Figure: the same example OL-tree, sub-trees omitted: (r_root, 0, 9) with children (r1, 0, 4), (r2, 1, 4) and (r3, 2, 7); r3 has children (r31, 4, 3), (r32, 2, 3) and (r33, 1, 2).]

For the entry (r3, 2, 7):

• fullcover: 2 nn_buffers fully cover r3.

• localmax: among those inserted, max overlap is 7.

• maxrange: where localmax occurred.

Page 30

Query Processing

• Start with root, insert index entries into heap.

• Sorting key: upper bound of real max overlap in the sub-tree.
– localmax + fullcover of the entry and of its ancestor entries.
– Accurate if Q intersects with maxrange.

Page 31

[Figure: the same example OL-tree, sub-trees omitted: (r_root, 0, 9) with children (r1, 0, 4), (r2, 1, 4) and (r3, 2, 7); r3 has children (r31, 4, 3), (r32, 2, 3) and (r33, 1, 2). The localmax of r33 is highlighted.]

Real max overlap in r33 = 0+2+1 + localmax = 5 (fullcover of r_root, r3 and r33, plus r33's localmax of 2).
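The arithmetic can be replayed for every entry of the example tree. The nested-tuple encoding and the function name below are assumptions for this sketch. The fullcover and localmax numbers are the ones shown on the slides.

```python
# (name, fullcover, localmax, children), taken from the example tree on the slides.
tree = ("r_root", 0, 9, [
    ("r1", 0, 4, []),
    ("r2", 1, 4, []),
    ("r3", 2, 7, [
        ("r31", 4, 3, []),
        ("r32", 2, 3, []),
        ("r33", 1, 2, []),
    ]),
])


def upper_bounds(entry, inherited=0, out=None):
    """Map each entry name to localmax + fullcover of the entry and of its ancestors."""
    if out is None:
        out = {}
    name, fullcover, localmax, children = entry
    covered = inherited + fullcover
    out[name] = covered + localmax
    for child in children:
        upper_bounds(child, covered, out)
    return out


print(upper_bounds(tree))
# {'r_root': 9, 'r1': 4, 'r2': 5, 'r3': 9, 'r31': 9, 'r32': 7, 'r33': 5}
```

The stored values are also consistent from the bottom up: each localmax equals the largest fullcover + localmax among the entry's children (7 = 4 + 3 for r3, 9 = 2 + 7 for the root).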

Page 32

Query Processing

• Start with root, insert index entries into heap.

• Sorting key: upper bound of real max overlap in the sub-tree.
– localmax + fullcover of the entry and of its ancestor entries.
– Accurate if Q intersects with maxrange.

• Keep a running value: max overlap M.

• Pruning 1: Q intersects with maxrange. The upper bound is then reached inside Q, so update M and do not descend.

• Pruning 2: upper bound of max overlap < M.

Page 33

[Figure: the same example OL-tree with a query range Q drawn on it: (r_root, 0, 9) with children (r1, 0, 4), (r2, 1, 4) and (r3, 2, 7); r3 has children (r31, 4, 3), (r32, 2, 3) and (r33, 1, 2). Sub-trees omitted.]

• r2 is pruned since Q intersects r2.maxrange. M = 0+1+4=5.

• r1 is pruned since the upper bound of overlap = 4 < M.

Page 34

[Figure: the same example OL-tree, sub-trees omitted: (r_root, 0, 9) with children (r1, 0, 4), (r2, 1, 4) and (r3, 2, 7); r3 has children (r31, 4, 3), (r32, 2, 3) and (r33, 1, 2).]

Sometimes, we need to examine a leaf node. Plane sweep it!

Page 35

OL-tree → VOL-tree

• OL-tree is not practical
– worst-case space complexity $O(n^2)$
– complex re-organization

• How to improve?
– Only keep a few top levels of the OL-tree.

==> Virtual OL-tree!

Page 36

VOL-tree

Page 37

Example

If Q is here, perform range search on the R-tree.

Page 38

Comparison with R-tree Approach

• The R-tree approach examines all nn_buffers intersecting with Q.

• By using a small, in-memory VOL-tree, the new approach can prune the search space.

Page 39

Challenge

• With dynamic updates, keeping localmax and maxrange up to date is expensive.

[Figure] To insert an nn_buffer here, recompute!

Page 40

Solution

• Index entry: (range, fullcover, maxrange, lowermax, uppermax), with lowermax and uppermax taking the place of localmax.

• lowermax ≤ localmax ≤ uppermax

Page 41

Solution

• Index entry: (range, fullcover, maxrange, lowermax, uppermax), with lowermax and uppermax taking the place of localmax.

• lowermax ≤ localmax ≤ uppermax

• Any location in maxrange has overlap = lowermax.

• At a location outside maxrange, the overlap can be more than lowermax, but < uppermax.

Page 42

Update

• Case 1: the new nn_buffer does not intersect with maxrange: increase uppermax.

• Case 2: it intersects with maxrange: increase uppermax and lowermax.
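The two update cases can be written down directly. One step in this sketch is an assumption not stated on the slides: in Case 2, maxrange is shrunk to its intersection with the new nn_buffer, so that every location in maxrange still has overlap equal to lowermax. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VolEntry:
    maxrange: tuple  # (x_lo, y_lo, x_hi, y_hi)
    lowermax: float
    uppermax: float


def intersection(a, b):
    """Common rectangle of a and b, or None if they share no positive area."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def apply_insert(entry, rect, weight):
    """Update the bounds of one index entry for a newly inserted nn_buffer."""
    entry.uppermax += weight          # both cases raise the upper bound
    common = intersection(entry.maxrange, rect)
    if common is not None:            # Case 2: the buffer reaches into maxrange
        entry.lowermax += weight
        entry.maxrange = common       # assumption: keep the invariant on maxrange


e = VolEntry(maxrange=(2, 2, 4, 4), lowermax=5, uppermax=7)
apply_insert(e, (10, 10, 12, 12), 3)  # Case 1
print(e.lowermax, e.uppermax)         # 5 10
apply_insert(e, (3, 3, 6, 6), 2)      # Case 2
print(e.lowermax, e.uppermax, e.maxrange)  # 7 12 (3, 3, 4, 4)
```

Neither case needs a recomputation over the sub-tree, at the price of a gap between lowermax and uppermax that grows with updates.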

Page 43

Query

• Similar to the OL-tree.

• To compute upper bound of max overlap, use uppermax.

• When Q intersects maxrange, may or may not prune (lowermax is only a lower bound on the max overlap in the sub-tree).


Page 45

Setup

• Digital Chart from the R-tree Portal.
– O: 24,493 populated places.
– S: 9,203 cultural landmarks.

• Page size: 1 KB. Buffer size: 256 pages.

• Object R-tree: 753 pages.

• Pentium IV Dell PC, 3.2 GHz.

• Java.

• Measure total I/O of 100 random queries.

Page 46

Size of the VOL-tree

Page 47

Small Query Area

Page 48

Large Query Area

Page 49

Varying Buffer Size

Page 50

Effect of Update

Page 51

Conclusions

• Introduced the optimal-location query.

• Proposed three solutions.

• The VOL-tree approach is the best.

• More improvement with larger query area. (5% query area = 6 times improvement.)

• More updates decrease the improvement. (50% updates = no improvement.) But can bulk-load.