SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries



Post on 23-Feb-2016

Page 1: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Never Stand Still Faculty of Engineering Computer Science and Engineering

SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Shiyu Yang¹, Muhammad Aamir Cheema²,¹, Xuemin Lin¹,³, Ying Zhang⁴,¹

¹ The University of New South Wales, Australia

² Monash University, Australia

³ East China Normal University, China

⁴ University of Technology, Sydney, Australia

Page 2: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

School of Computer Science and Engineering

Introduction

• k Nearest Neighbor Query

– Find every facility that is one of the k closest facilities to the query user.

• Reverse k Nearest Neighbor Query

– Find every user for which the query facility is one of the k closest facilities.

• RkNNs are the potential customers of a facility

[Figure: users u1, u2, u3 and facilities f1, f2, f3; example with k = 1]
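The RkNN definition above can be illustrated with a brute-force sketch. This is not the SLICE algorithm, only a direct reading of the definition; the function name `rknn_brute_force` and the representation of points as `(x, y)` tuples are assumptions made for this example.

```python
import math

def rknn_brute_force(query, facilities, users, k):
    """Return the users for which `query` is one of their k closest facilities.

    `facilities` holds the other facilities (the query facility is excluded).
    """
    result = []
    for u in users:
        d_q = math.dist(u, query)
        # Count the facilities that are strictly closer to u than the query is.
        closer = sum(1 for f in facilities if math.dist(u, f) < d_q)
        if closer < k:
            result.append(u)
    return result
```

This checks every user against every facility, which is the cost that the pruning techniques on the following slides try to avoid.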

Page 3: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Related Work

Pruning / Verification

• Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)

• Region-based: Six-regions (SIGMOD 2000)

[Figure: Six-regions (SIGMOD 2000), TPL (VLDB 2004), FINCH (VLDB 2008), Boost (SIGMOD 2010), InfZone (ICDE 2011)]

Page 4: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Related Work

• Regions-based Pruning: Six-regions (SIGMOD 2000)

1. Divide the whole space centred at the query q into six equal regions.

2. Find the k-th nearest facility of q in each region.

3. The k-th nearest facility of q in each region defines the area that can be pruned.

The user points that cannot be pruned should be verified by a range query.

[Figure: query q, facilities a, b, c, d and users u1, u2; k = 2]
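The three Six-regions steps can be sketched as follows. This is a minimal in-memory illustration, not the original implementation: the function names, the choice of the positive x-axis as the first region boundary, and the plain lists in place of a spatial index are all assumptions.

```python
import math

def region_index(q, p):
    """Index (0..5) of the 60-degree region around q that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle // (math.pi / 3)), 5)

def six_regions_candidates(q, facilities, users, k):
    """Return the users that survive Six-regions pruning (they still need verification)."""
    # Steps 1 and 2: group facility distances by region and find the k-th nearest in each.
    per_region = [[] for _ in range(6)]
    for f in facilities:
        per_region[region_index(q, f)].append(math.dist(f, q))
    radius = []
    for dists in per_region:
        dists.sort()
        radius.append(dists[k - 1] if len(dists) >= k else math.inf)
    # Step 3: a user farther from q than the k-th nearest facility of its region is pruned.
    return [u for u in users if math.dist(u, q) <= radius[region_index(q, u)]]
```

The surviving users are only candidates; each would still be checked with a range query, as the slide notes.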

Page 5: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Related Work

• Half-space Pruning: the space that is contained in at least k half-spaces can be pruned

– TPL (VLDB 2004)

1. Find the nearest facility f in the unpruned area.

2. Draw the bisector between q and f, and prune using the resulting half-space.

3. Iteratively access the nearest facility in the unpruned area.

[Figure: query q and facilities a, b, c, d; k = 2]
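The half-space pruning rule can be sketched as a point test. A point lies in the half-space of facility f exactly when it is closer to f than to q, so no explicit bisector geometry is needed here. The function name is hypothetical, and the sketch ignores the iterative facility access order that TPL uses.

```python
import math

def pruned_by_half_spaces(p, q, facilities, k):
    """True if p lies in at least k half-spaces, i.e. on the facility side
    of the bisector between q and f for at least k facilities f."""
    count = 0
    for f in facilities:
        if math.dist(p, f) < math.dist(p, q):
            count += 1
            if count >= k:
                return True
    return False
```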

Page 6: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Related Work

• Half-space Pruning: InfZone (ICDE 2011)

1. The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning.

2. A point p is an RkNN of q if and only if p lies inside the unpruned area.

3. No verification phase.

Half-space pruning is expensive, especially when k is large.

[Figure: query q and facilities a, b, c, d; k = 2]

Page 7: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Related Work: Regions-based vs. Half-space

                     Regions-based    Half-space    SLICE
Pruning cost         O(m log k)       O(km²)        O(m log m)
Pruning power        Low              High          High
Verification cost    Range query      O(log m)      O(k)

m is the number of facilities considered for pruning.

Can regions-based pruning do better?

Page 8: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Notations

• Partition: P

• Subtended angle: ∠a

• Maximum (minimum) subtended angle w.r.t. P: θmax (θmin)

• Upper (lower) arc

– Center: q

– Radius: $\frac{dist(f,q)}{2\cos(\theta_{\max})}$ for the upper arc, $\frac{dist(f,q)}{2\cos(\theta_{\min})}$ for the lower arc

[Figure: query q, facility f, point p, and partition P with the angles θmin and θmax and the upper and lower arcs of f]
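The notation above can be made concrete with a short sketch that computes θmin, θmax and the two arc radii for one facility and one partition. It assumes angles are measured from q in radians and that a partition is given by a start angle and a width. All names are hypothetical, and returning infinity when the angle reaches 90° is a convention chosen for this example.

```python
import math

TWO_PI = 2 * math.pi

def angular_distance(a, b):
    """Smallest angle between two directions, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def in_partition(angle, start, width):
    """True if direction `angle` falls inside the partition [start, start + width]."""
    return (angle - start) % TWO_PI <= width

def subtended_range(f_angle, start, width):
    """(theta_min, theta_max) between facility direction f_angle and the points of the partition."""
    d_start = angular_distance(f_angle, start)
    d_end = angular_distance(f_angle, start + width)
    theta_min = 0.0 if in_partition(f_angle, start, width) else min(d_start, d_end)
    # If the direction opposite to f lies in the partition, the maximum is 180 degrees.
    opposite_inside = in_partition(f_angle + math.pi, start, width)
    theta_max = math.pi if opposite_inside else max(d_start, d_end)
    return theta_min, theta_max

def arc_radii(dist_fq, theta_min, theta_max):
    """(lower, upper) arc radii; infinity when the angle is 90 degrees or more."""
    lower = dist_fq / (2 * math.cos(theta_min)) if theta_min < math.pi / 2 else math.inf
    upper = dist_fq / (2 * math.cos(theta_max)) if theta_max < math.pi / 2 else math.inf
    return lower, upper
```

For a facility at distance 2 in direction 0 and the partition spanning 0° to 60°, this gives θmin = 0 and θmax = 60°, so the lower arc has radius 1 and the upper arc has radius 2.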

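Page 9: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Observation -- Pruning

• A facility f prunes every point p ∈ P for which dist(p,q) > $\frac{dist(f,q)}{2\cos(\theta_{\max})}$ (the radius of the upper arc), provided θmax < 90°.

• Let a = dist(p,f), b = dist(p,q), c = dist(f,q), and let θ ≤ θmax be the angle between f and p at q. We can prove a < b.

– $a^2 = b^2 + c^2 - 2bc\cos(\theta)$

– $b > \frac{dist(f,q)}{2\cos(\theta_{\max})} = \frac{c}{2\cos(\theta_{\max})}$

– $c^2 - 2bc\cos(\theta) < c^2 - 2\cdot\frac{c}{2\cos(\theta_{\max})}\cdot c\cos(\theta) = c^2\left(1 - \frac{\cos(\theta)}{\cos(\theta_{\max})}\right) \le 0$, hence $a^2 < b^2$.

• A facility f prunes the area outside its upper arc in every partition P for which θmax < 90°.

[Figure: triangle formed by q, f and p with sides a, b, c and angle θ; partition P with θmax and the upper arc of radius $\frac{dist(f,q)}{2\cos(\theta_{\max})}$]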
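The observation can be spot-checked numerically. The sketch below places q at the origin and f on the positive x-axis, then samples points of a partition that lie beyond the upper arc and confirms that each one is closer to f than to q. The partition bounds, the sample count and the function name are arbitrary choices for this illustration, and a sampled check is not a proof.

```python
import math

def check_upper_arc_pruning(c=1.0, start_deg=10.0, end_deg=50.0, steps=40):
    """Sample points beyond the upper arc and test dist(p, f) < dist(p, q).

    Assumes the partition [start_deg, end_deg] lies within (-90, 90) degrees
    of the direction of f, so that theta_max < 90 degrees.
    """
    q = (0.0, 0.0)
    f = (c, 0.0)  # f on the positive x-axis, so theta is the polar angle of p
    theta_max = math.radians(max(abs(start_deg), abs(end_deg)))
    radius = c / (2 * math.cos(theta_max))  # radius of the upper arc
    for i in range(steps + 1):
        theta = math.radians(start_deg + (end_deg - start_deg) * i / steps)
        for scale in (1.001, 1.5, 10.0):
            b = radius * scale  # dist(p, q), strictly beyond the upper arc
            p = (b * math.cos(theta), b * math.sin(theta))
            if not math.dist(p, f) < math.dist(p, q):
                return False
    return True
```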

Page 10: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Comparison with Six-regions

                     Six-regions                 SLICE
Partitions pruned    One                         Every partition with θmax < 90°
No. of partitions    6                           Any
Area pruned          Beyond radius dist(f,q)     Beyond radius $\frac{dist(f,q)}{2\cos(\theta_{\max})}$

[Figure: query q and facility f, with the area pruned by Six-regions vs. SLICE]

Page 11: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Pruning Algorithm

• Divide the space into t partitions.

• Compute the upper arc of each facility in each partition.

• The area outside the k-th smallest upper arc (the bounding arc, with radius rB) in each partition can be pruned.

• Users in the pruned area can be pruned.

• Users in the unpruned area will be verified by accessing significant facilities.

[Figure: query q, facilities f1, f2 and users u1, u2; k = 2]
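The pruning steps above can be sketched in memory as follows. This is an illustration under stated assumptions, not the authors' implementation: partitions are equal angular slices starting at the positive x-axis, all facilities are scanned instead of being accessed through an index, and every function name is hypothetical.

```python
import heapq
import math

TWO_PI = 2 * math.pi

def angular_distance(a, b):
    """Smallest angle between two directions, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def in_arc(angle, start, width):
    """True if direction `angle` falls inside [start, start + width]."""
    return (angle - start) % TWO_PI <= width

def upper_arc_radius(q, f, start, width):
    """Radius of f's upper arc in one partition; infinity if theta_max >= 90 degrees."""
    phi = math.atan2(f[1] - q[1], f[0] - q[0])
    if in_arc(phi + math.pi, start, width):
        return math.inf  # the direction opposite to f lies in the partition
    theta_max = max(angular_distance(phi, start), angular_distance(phi, start + width))
    if theta_max >= math.pi / 2:
        return math.inf
    return math.dist(f, q) / (2 * math.cos(theta_max))

def bounding_arcs(q, facilities, k, t):
    """For each of the t partitions, the k-th smallest upper arc radius (r_B)."""
    width = TWO_PI / t
    arcs = []
    for i in range(t):
        radii = [upper_arc_radius(q, f, i * width, width) for f in facilities]
        smallest = heapq.nsmallest(k, radii)
        arcs.append(smallest[-1] if len(smallest) == k else math.inf)
    return arcs

def candidates(q, facilities, users, k, t):
    """Users that are not pruned, i.e. that lie inside the bounding arc of their partition."""
    width = TWO_PI / t
    arcs = bounding_arcs(q, facilities, k, t)
    kept = []
    for u in users:
        angle = math.atan2(u[1] - q[1], u[0] - q[0]) % TWO_PI
        i = min(int(angle // width), t - 1)
        if math.dist(u, q) <= arcs[i]:
            kept.append(u)
    return kept
```

A user outside the bounding arc of its partition is pruned by at least k facilities, so it cannot be an RkNN of q; the users returned here still have to be verified.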

Page 12: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Significant Facility Verification

• Significant facility:

– A facility f that prunes at least one point p ∈ P lying inside the bounding arc of P.

[Figure: partition P around q with bounding arc rB and points M, N; a significant facility cannot be in the red area]

• Verification for a candidate

– Regions-based: issuing a range query for each candidate (high I/O cost)

– SLICE: accessing significant facilities (O(k)) (no additional I/O cost)
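The SLICE verification step can be sketched as below, assuming the significant facilities of each partition were already collected during pruning. How that list is built is not shown here; the dictionaries keyed by partition index and the function name are assumptions made for this example.

```python
import math

def verify_candidates(q, candidates, significant, k):
    """Return the candidates that are RkNNs of q.

    candidates:  dict mapping partition index -> list of candidate users
    significant: dict mapping partition index -> list of significant facilities
    """
    answers = []
    for part, users in candidates.items():
        facilities = significant.get(part, [])
        for u in users:
            d_q = math.dist(u, q)
            # Only the significant facilities of u's partition are checked.
            closer = sum(1 for f in facilities if math.dist(u, f) < d_q)
            if closer < k:
                answers.append(u)
    return answers
```

Each candidate is compared only against the facilities stored for its own partition, which is where the O(k) verification cost on the slide comes from.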

Page 13: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Theoretical Analyses

• Number of significant facilities

– 2.34k (θ → 0)

– 9k (θ = 60°)

• I/O Cost

– Pruning phase: same as a circular range query centered at q with radius 2rB

– Verification phase: same as a circular range query centered at q with radius rB

• More analyses can be found in the paper

Page 14: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Experiments

• Data sets:

– Synthetic data: size 50000, 100000, 150000 or 200000; distribution uniform or normal

– Real data: 175,812 points in North America

• Algorithms:

– Six-regions, InfZone and SLICE

– Page size is 4KB and the number of buffers for Six-regions is 10

– Number of partitions for SLICE is 12

Page 15: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Experiments

• Effect of different values of k

[Figures: I/O cost and CPU cost]

Page 16: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Experiments

• Effect of data distribution

• Effect of % users

Page 17: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Experiments

• Effect of partitions

• Number of significant facilities

[Figures: x-axes are the number of partitions and the value of k]

Page 18: SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Thanks! Q&A