Never Stand Still Faculty of Engineering Computer Science and Engineering
SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries
Shiyu Yang 1, Muhammad Aamir Cheema 2,1, Xuemin Lin 1,3, Ying Zhang 4,1
1 The University of New South Wales, Australia
2 Monash University, Australia
3 East China Normal University, China
4 University of Technology, Sydney, Australia
School of Computer Science and Engineering2
Introduction
• k Nearest Neighbor (kNN) Query
– Find the k closest facilities to the query user.
• Reverse k Nearest Neighbor (RkNN) Query
– Find every user for which the query facility is one of the k closest facilities.
• RkNNs are the potential customers of a facility
[Figure: users u1, u2, u3 and facilities f1, f2, f3; k = 1]
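The two query types can be contrasted with a brute-force sketch. This is only an illustration of the definitions, not the method of the talk; the function names and the convention that `facilities` excludes the query facility are assumptions made here.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def knn(user, facilities, k):
    """kNN: the k facilities closest to `user`."""
    return sorted(facilities, key=lambda f: dist(user, f))[:k]

def rknn(query, facilities, users, k):
    """RkNN: every user for which `query` is one of its k closest facilities.

    `facilities` holds the other facilities (the query facility excluded).
    A user is an RkNN when fewer than k facilities are strictly closer to it
    than the query facility is.
    """
    result = []
    for u in users:
        closer = sum(1 for f in facilities if dist(u, f) < dist(u, query))
        if closer < k:
            result.append(u)
    return result

# Example with k = 1: (3, 0) is closer to the facility at (4, 0) than to q,
# so it is not an RkNN of q.
q = (0, 0)
print(rknn(q, [(4, 0), (0, 5)], [(1, 0), (3, 0), (0, 2)], k=1))
```

This costs one distance computation per user and facility pair, which is the cost that pruning is meant to avoid.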
Related Work
• Pruning
– Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
– Regions-based: Six-regions (SIGMOD 2000)
• Verification
– Six-regions (SIGMOD 2000)
– TPL (VLDB 2004)
– FINCH (VLDB 2008)
– Boost (SIGMOD 2010)
– InfZone (ICDE 2011)
Related Work
• Regions-based Pruning: Six-regions (SIGMOD 2000)
1. Divide the whole space centred at the query q into six equal regions.
2. Find the k-th nearest facility of q in each region.
3. The k-th nearest facility of q in each region defines the area that can be pruned.
• The user points that cannot be pruned must be verified by a range query.
[Figure: six regions around q with facilities a, b, c, d and users u1, u2; k = 2]
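The three steps above can be sketched as follows. This is a minimal in-memory illustration, assuming the regions are the 60° sectors starting at angle 0 and that a region with fewer than k facilities prunes nothing; the function names are hypothetical.

```python
import math

def region_of(q, p):
    """Index (0..5) of the 60-degree region around q that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle / (math.pi / 3)), 5)

def six_regions_candidates(q, facilities, users, k):
    """Return the users that six-regions pruning cannot discard."""
    dists = [[] for _ in range(6)]
    for f in facilities:
        dists[region_of(q, f)].append(math.dist(q, f))
    # Pruning radius per region: distance from q to the k-th nearest facility
    # in that region, or infinity when the region holds fewer than k facilities.
    radius = [sorted(d)[k - 1] if len(d) >= k else math.inf for d in dists]
    # A user farther from q than the radius of its region is pruned.
    return [u for u in users if math.dist(q, u) <= radius[region_of(q, u)]]

q = (0, 0)
facilities = [(1, 0.1), (2, 0.2)]
users = [(0.5, 0.1), (3, 0.5), (-1, 0)]
print(six_regions_candidates(q, facilities, users, k=1))
```

The surviving users are only candidates; each one still has to be verified, for example with the range query mentioned on the slide.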
Related Work
• Half-space Pruning: the space that is contained by k half-spaces can be pruned
– TPL (VLDB 2004)
1. Find the nearest facility f in the unpruned area.
2. Draw the bisector between q and f and prune using the half-space.
3. Iteratively access the nearest facility in the unpruned area.
[Figure: bisectors between q and facilities a, b, c, d; k = 2]
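The half-space test itself is easy to state in code. The sketch below only checks whether a single point falls inside at least k half-spaces; it does not reproduce TPL's index traversal, and the function names are assumptions.

```python
import math

def in_half_space(p, q, f):
    """True if p lies on f's side of the perpendicular bisector of q and f."""
    return math.dist(p, f) < math.dist(p, q)

def pruned_by_half_spaces(p, q, facilities, k):
    """True if p is contained in at least k of the facilities' half-spaces."""
    return sum(in_half_space(p, q, f) for f in facilities) >= k

q = (0, 0)
facilities = [(2, 0), (0, 2)]
print(pruned_by_half_spaces((2, 2), q, facilities, k=2))   # inside both half-spaces
print(pruned_by_half_spaces((2, -1), q, facilities, k=2))  # inside only one
```

Checking a point against m bisectors costs O(m) here; pruning whole index entries against the intersection of many half-spaces is where the higher cost of this family of methods comes from.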
Related Work
• Half-space Pruning: InfZone (ICDE 2011)
1. The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning.
2. A point p is an RkNN of q if and only if p lies inside the unpruned area.
3. No verification phase.
• Half-space pruning is expensive, especially when k is large.
[Figure: influence zone of q formed by the bisectors of facilities a, b, c, d; k = 2]
Related Work: Regions-based vs. Half-space
                     Regions-based    Half-space    SLICE
Pruning cost         O(m log k)       O(km^2)       O(m log m)
Pruning power        Low              High          High
Verification cost    Range query      O(log m)      O(k)
• m is the number of facilities considered for pruning
• Can regions-based pruning do better?
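Notations
• Partition: P
• Subtended angle: ∠a
• Maximum (minimum) subtended angle w.r.t. P: $\theta_{\max}$ ($\theta_{\min}$)
• Upper (lower) arc
– Center: q
– Radius: $\frac{\mathrm{dist}(f,q)}{2\cos(\theta_{\max})}$ for the upper arc, $\frac{\mathrm{dist}(f,q)}{2\cos(\theta_{\min})}$ for the lower arc
[Figure: partition P at q with facility f, point p, angles θmin and θmax, and the upper and lower arcs]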
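The notation can be made concrete with a short sketch. It assumes a partition is given as an angular interval [lo, hi] in radians measured at q, and that an arc whose angle reaches 90° is treated as having infinite radius; the function names are hypothetical.

```python
import math

TWO_PI = 2 * math.pi

def angular_gap(a, b):
    """Smallest absolute difference between two directions, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def subtended_range(q, f, lo, hi):
    """(theta_min, theta_max) between the ray q->f and partition P = [lo, hi]."""
    phi = math.atan2(f[1] - q[1], f[0] - q[0]) % TWO_PI
    anti = (phi + math.pi) % TWO_PI          # direction opposite to f
    gaps = (angular_gap(phi, lo), angular_gap(phi, hi))
    theta_min = 0.0 if lo <= phi <= hi else min(gaps)
    theta_max = math.pi if lo <= anti <= hi else max(gaps)
    return theta_min, theta_max

def arc_radius(q, f, theta):
    """dist(f, q) / (2 cos(theta)); infinite when theta is 90 degrees or more."""
    c = math.cos(theta)
    return math.dist(f, q) / (2 * c) if c > 1e-12 else math.inf

q, f = (0, 0), (2, 0)
t_min, t_max = subtended_range(q, f, math.radians(30), math.radians(60))
print(arc_radius(q, f, t_max))  # upper arc radius
print(arc_radius(q, f, t_min))  # lower arc radius
```

For q = (0, 0), f = (2, 0) and a partition spanning 30° to 60°, this gives θmin = 30° and θmax = 60°, so the upper arc radius is about 2 and the lower arc radius is about 1.155.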
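Observation -- Pruning
• A facility f prunes every point p ∈ P for which $\mathrm{dist}(p,q) > \frac{\mathrm{dist}(f,q)}{2\cos(\theta_{\max})}$ (the upper arc), provided $\theta_{\max} < 90^\circ$.
• Let a = dist(p,f), b = dist(p,q), c = dist(f,q), and let θ be the angle between f and p at q. We can prove a < b.
– $a^2 = b^2 + c^2 - 2bc\cos(\theta)$
– $b > \frac{\mathrm{dist}(f,q)}{2\cos(\theta_{\max})} = \frac{c}{2\cos(\theta_{\max})}$
– $c^2 - 2bc\cos(\theta) < c^2 - 2\,\frac{c}{2\cos(\theta_{\max})}\,c\cos(\theta) = c^2\left(1 - \frac{\cos(\theta)}{\cos(\theta_{\max})}\right) \le 0$, because $\theta \le \theta_{\max}$
• Facility f prunes the area outside the upper arc of f for every partition P for which $\theta_{\max} < 90^\circ$.
[Figure: triangle q, f, p with sides a, b, c, angle θ at q, and the upper arc of radius $\frac{\mathrm{dist}(f,q)}{2\cos(\theta_{\max})}$ in partition P]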
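The observation can be sanity-checked numerically. The sketch below samples angles θ up to θmax and distances b beyond the upper arc, then checks a < b through the law of cosines. The sampling ranges, seed, and function name are arbitrary choices made here, and a passing check is not a proof.

```python
import math
import random

def check_observation(c, theta_max, trials=10000, seed=0):
    """Sample points outside the upper arc and test dist(p,f) < dist(p,q).

    c is dist(f, q); theta_max must be below 90 degrees (in radians).
    """
    rng = random.Random(seed)
    upper = c / (2 * math.cos(theta_max))        # radius of the upper arc
    for _ in range(trials):
        theta = rng.uniform(0, theta_max)        # any angle allowed in P
        b = upper * (1 + rng.uniform(1e-6, 5))   # strictly outside the arc
        a_sq = b * b + c * c - 2 * b * c * math.cos(theta)
        if not a_sq < b * b:                     # a < b must hold
            return False
    return True

print(check_observation(c=1.0, theta_max=math.radians(60)))
```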
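Comparison with Six-regions
                     Six-regions              SLICE
Partitions pruned    One                      Every partition with $\theta_{\max} < 90^\circ$
No. of partitions    6                        Any
Area pruned          $\mathrm{dist}(f,q)$     $\frac{\mathrm{dist}(f,q)}{2\cos(\theta)}$
[Figure: area pruned by a facility f around q, Six-regions vs. SLICE]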
Pruning Algorithm
• Divide the space into t partitions.
• Compute the upper arc of each facility in each partition.
• The area outside the k-th smallest upper arc ($r_B$) in each partition can be pruned.
• Users in the pruned area can be pruned.
• Users in the unpruned area will be verified by accessing significant facilities.
[Figure: partitions around q with facilities f1, f2 and users u1, u2; k = 2]
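The pruning steps above can be sketched in memory as follows. This assumes equal angular partitions starting at angle 0 and scans every facility for every partition, whereas the talk works over an index with bounded I/O; the function names are hypothetical.

```python
import math

TWO_PI = 2 * math.pi

def gap(a, b):
    """Smallest absolute difference between two directions, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def direction(q, p):
    """Direction of p as seen from q, in [0, 2*pi)."""
    return math.atan2(p[1] - q[1], p[0] - q[0]) % TWO_PI

def upper_arc(q, f, lo, hi):
    """Radius dist(f,q) / (2 cos(theta_max)) of f's upper arc in [lo, hi]."""
    phi = direction(q, f)
    anti = (phi + math.pi) % TWO_PI
    theta_max = math.pi if lo <= anti <= hi else max(gap(phi, lo), gap(phi, hi))
    c = math.cos(theta_max)
    return math.dist(f, q) / (2 * c) if c > 1e-12 else math.inf

def bounding_arcs(q, facilities, k, t=12):
    """r_B per partition: the k-th smallest upper arc radius."""
    width = TWO_PI / t
    arcs = []
    for i in range(t):
        radii = sorted(upper_arc(q, f, i * width, (i + 1) * width)
                       for f in facilities)
        arcs.append(radii[k - 1] if len(radii) >= k else math.inf)
    return arcs

def candidates(q, facilities, users, k, t=12):
    """Users lying inside the bounding arc of their partition."""
    arcs = bounding_arcs(q, facilities, k, t)
    width = TWO_PI / t
    keep = []
    for u in users:
        i = min(int(direction(q, u) / width), t - 1)
        if math.dist(u, q) <= arcs[i]:   # beyond r_B, k facilities prune u
            keep.append(u)
    return keep
```

The default t = 12 mirrors the number of partitions used for SLICE in the experiments. Users returned by `candidates` are not yet answers; they still go through the verification step.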
Significant Facility Verification
• Significant facility:
– A facility f that prunes at least one point p ∈ P lying inside the bounding arc of P.
– A significant facility cannot be in the red area.
[Figure: partition P at q with bounding arc of radius $r_B$, points M and N, and the red area]
• Verification for a candidate
– Regions-based: issuing a range query for each candidate (high I/O cost)
– SLICE: accessing significant facilities, O(k) (no additional I/O cost)
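Verification against significant facilities can be sketched as below. The significance test used here, that the lower arc radius of f in P is smaller than r_B, is an assumption derived from the definition and the lower arc notation rather than something stated on the slide; the function names are also hypothetical.

```python
import math

TWO_PI = 2 * math.pi

def gap(a, b):
    """Smallest absolute difference between two directions, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def lower_arc(q, f, lo, hi):
    """Radius dist(f,q) / (2 cos(theta_min)) of f's lower arc in [lo, hi]."""
    phi = math.atan2(f[1] - q[1], f[0] - q[0]) % TWO_PI
    theta_min = 0.0 if lo <= phi <= hi else min(gap(phi, lo), gap(phi, hi))
    c = math.cos(theta_min)
    return math.dist(f, q) / (2 * c) if c > 1e-12 else math.inf

def significant_facilities(q, facilities, lo, hi, r_b):
    """Assumed test: f is significant for P when its lower arc is inside r_B."""
    return [f for f in facilities if lower_arc(q, f, lo, hi) < r_b]

def is_rknn(u, q, significant, k):
    """A candidate u is an RkNN if fewer than k listed facilities are closer."""
    closer = sum(1 for f in significant if math.dist(u, f) < math.dist(u, q))
    return closer < k
```

Because the lower arc radius is at least dist(f, q) / 2, every facility passing this test lies within distance 2 r_B of q, so a candidate inside the bounding arc only needs to be compared against facilities in that range.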
Theoretical Analyses
• Number of significant facilities
– 2.34k (θ → 0)
– 9k (θ = 60°)
• I/O Cost
– Pruning phase: same as a circular range query centered at q with radius $2r_B$
– Verification phase: same as a circular range query centered at q with radius $r_B$
• More analyses can be found in the paper
Experiments
• Data sets
– Synthetic data: size 50,000, 100,000, 150,000 or 200,000; distribution Uniform or Normal
– Real data: 175,812 points in North America
• Algorithms: Six-regions, InfZone and SLICE
– Page size is 4KB and the number of buffers for Six-regions is 10
– Number of partitions for SLICE is 12
Experiments
• Effect of different values of k
[Figures: I/O cost and CPU cost]
Experiments
• Effect of data distribution
• Effect of % users
Experiments
• Effect of partitions
[Figure: x-axis is the number of partitions]
• Number of significant facilities
[Figure: x-axis is the value of k]
Thanks!
Q&A