hypersphere dominance: an optimal approach



Page 1: Hypersphere  Dominance: An Optimal Approach

Hypersphere Dominance: An Optimal Approach

Cheng Long, Raymond Chi-Wing Wong, Bin Zhang, Min Xie
The Hong Kong University of Science and Technology

Prepared by Cheng Long
Presented by Cheng Long

24 June, 2014

Page 2: Hypersphere  Dominance: An Optimal Approach

Hyperspheres

A hypersphere in a d-dimensional space
  (center, radius)
  the set of all points that have their distances from the center bounded by the radius

2D: a disk
3D: a ball

[Figure: a hypersphere with center $c$ and radius $r$]
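The (center, radius) description above maps directly onto a small data type. The following is a minimal sketch; the class name `Hypersphere` and the method `contains` are illustrative names, not taken from the slides.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hypersphere:
    """Hypothetical container for a hypersphere given as (center, radius)."""
    center: Sequence[float]  # the center c, a point in d-dimensional space
    radius: float            # the radius r, assumed non-negative

    def contains(self, point: Sequence[float]) -> bool:
        # A point belongs to the hypersphere when its distance from
        # the center is bounded by the radius.
        return math.dist(self.center, point) <= self.radius


disk = Hypersphere(center=(0.0, 0.0), radius=1.0)       # 2D: a disk
ball = Hypersphere(center=(0.0, 0.0, 0.0), radius=2.0)  # 3D: a ball
```

The same class works for any number of dimensions because `math.dist` accepts points of any (equal) length.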

Page 3: Hypersphere  Dominance: An Optimal Approach

Hyperspheres are commonly used

Uncertain databases
  the location of an uncertain object
Spatial databases
  SS-tree, SS+-tree, M-tree, VP-tree and SR-tree

SS-tree: similar to R-tree with hyperrectangles replaced by hyperspheres

[Figure: layout of 8 objects, A-H, and the SS-tree based on A-H]

Page 4: Hypersphere  Dominance: An Optimal Approach

Motivating example

Scenario
  Ada has her location uncertain, but constrained in a disk $S_a$.
  Bob has his location uncertain, but constrained in a disk $S_b$.
  Connie has her location uncertain, but constrained in a disk $S_q$.

Question
  Is Ada always closer to Connie than Bob?

[Figure: two layouts of the disks $S_a$ (Ada), $S_b$ (Bob) and $S_q$ (Connie); the answer is "No" for one layout and "Yes" for the other]

For this specification of the locations, Ada is closer to Connie than Bob.
In fact, for all specifications of the locations, Ada is closer to Connie than Bob.

Page 5: Hypersphere  Dominance: An Optimal Approach

Hypersphere dominance: definition

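Definition 1: Hypersphere dominance
  Given $S_a$, $S_b$, and $S_q$, it decides whether
  $\forall a \in S_a, \forall b \in S_b, \forall q \in S_q: \mathit{Dist}(a, q) < \mathit{Dist}(b, q)$

Dominance condition
  Yes: $\mathit{Dom}(S_a, S_b, S_q) = \mathit{true}$
  No: $\mathit{Dom}(S_a, S_b, S_q) = \mathit{false}$

Basic operator used in many queries
  Probabilistic RkNN query [Lian and Chen, VLDBJ'09]
  AkNN query [Emrich et al., SSDBM'10]
  kNN query [Long et al., SIGMOD'14]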
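To make the definition concrete, the sketch below searches for a counterexample to the dominance condition by random sampling. It assumes the condition is the strict inequality Dist(a, q) < Dist(b, q) for every a in Sa, b in Sb and q in Sq. The function names and the (center, radius) tuple layout are illustrative. Sampling can only refute dominance, never prove it, so this is a testing aid and not a decision procedure.

```python
import math
import random


def sample_in_ball(center, radius, rng):
    """Draw one point from the hypersphere (center, radius).

    The radial part is not uniform in volume; that is enough for illustration.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(x * x for x in direction)) or 1.0
    scale = radius * rng.random() / norm
    return [c + scale * x for c, x in zip(center, direction)]


def find_counterexample(sa, sb, sq, trials=10000, seed=0):
    """Return (a, b, q) with Dist(a, q) >= Dist(b, q), or None if none was sampled."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = sample_in_ball(*sa, rng)
        b = sample_in_ball(*sb, rng)
        q = sample_in_ball(*sq, rng)
        if math.dist(a, q) >= math.dist(b, q):
            return a, b, q
    return None


# Each hypersphere is a (center, radius) pair.
ada = ((0.0, 0.0), 1.0)
bob = ((10.0, 0.0), 1.0)
connie = ((-5.0, 0.0), 1.0)
```

With these disks, `find_counterexample(ada, bob, connie)` finds nothing because Ada is always closer, while swapping `ada` and `bob` yields a counterexample immediately.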

Page 6: Hypersphere  Dominance: An Optimal Approach

Hypersphere dominance: existing solutions--overview

MinMax [Roussopoulos et al., SIGMOD Record'95; Hjaltason and Samet, TODS'99]
MBR [Emrich et al., SIGMOD'10]
GP [Lian and Chen, VLDBJ'09]
Trigonometric [Emrich et al., SSDBM'10]

Page 7: Hypersphere  Dominance: An Optimal Approach

Hypersphere dominance: existing solutions--MinMax (1)

$\mathit{MaxDist}(S_a, S_b)$
  Definition: the maximum distance between a point in $S_a$ and a point in $S_b$
  $\mathit{MaxDist}(S_a, S_b) = \mathit{Dist}(c_a, c_b) + r_a + r_b$

$\mathit{MinDist}(S_a, S_b)$
  Definition: the minimum distance between a point in $S_a$ and a point in $S_b$
  $\mathit{MinDist}(S_a, S_b) = 0$ ($S_a$ and $S_b$ overlap)
  $\mathit{MinDist}(S_a, S_b) = \mathit{Dist}(c_a, c_b) - r_a - r_b$ ($S_a$ and $S_b$ do not overlap)

[Figure: $S_a$ (center $c_a$, radius $r_a$) and $S_b$ (center $c_b$, radius $r_b$), illustrating $\mathit{MaxDist}(S_a, S_b)$, $\mathit{MinDist}(S_a, S_b)$, and the overlapping case $\mathit{MinDist}(S_a, S_b) = 0$]

Page 8: Hypersphere  Dominance: An Optimal Approach

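Hypersphere dominance: existing solutions--MinMax (2)

MinMax
  Compute $\mathit{MaxDist}(S_a, S_q)$
  Compute $\mathit{MinDist}(S_b, S_q)$
  If $\mathit{MaxDist}(S_a, S_q) < \mathit{MinDist}(S_b, S_q)$
    Return $\mathit{true}$
  Else
    Return $\mathit{false}$

[Figure: two layouts of $S_a$, $S_b$ and $S_q$, with the bisector shown]

Case $\mathit{MaxDist}(S_a, S_q) < \mathit{MinDist}(S_b, S_q)$: $\mathit{Dom}(S_a, S_b, S_q) = \mathit{true}$ and MinMax returns $\mathit{true}$ (correct)
Case $\mathit{MaxDist}(S_a, S_q) > \mathit{MinDist}(S_b, S_q)$: $\mathit{Dom}(S_a, S_b, S_q) = \mathit{true}$ but MinMax returns $\mathit{false}$ ("false negative")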
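The MinMax test and its false negative can be reproduced with a short sketch. The function names are illustrative. The example configuration is an assumption chosen for this illustration: Sa and Sb are points, and Sq is a large disk lying entirely on Sa's side of their bisector.

```python
import math


def max_dist(c1, r1, c2, r2):
    return math.dist(c1, c2) + r1 + r2


def min_dist(c1, r1, c2, r2):
    return max(0.0, math.dist(c1, c2) - r1 - r2)


def minmax_dominates(ca, ra, cb, rb, cq, rq):
    """MinMax: report dominance only if MaxDist(Sa, Sq) < MinDist(Sb, Sq)."""
    return max_dist(ca, ra, cq, rq) < min_dist(cb, rb, cq, rq)


# Sa = point (0, 0), Sb = point (10, 0); their bisector is the line x = 5.
# Sq (center (-10, 0), radius 9) spans x in [-19, -1], so every q is closer
# to Sa and the dominance condition holds. MinMax compares 19 against 11.
print(minmax_dominates((0.0, 0.0), 0.0, (10.0, 0.0), 0.0, (-10.0, 0.0), 9.0))  # False
```

Shrinking the radius of Sq to 1 makes MaxDist(Sa, Sq) = 11 and MinDist(Sb, Sq) = 19, and MinMax then returns true.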

Page 9: Hypersphere  Dominance: An Optimal Approach

Hypersphere dominance: existing solutions--Insufficiency

Criteria of a method:
  1. Correctness: No false positive
  2. Soundness: No false negative
  3. Efficiency: runs in O(d) where d is the number of dimensions

Methods                   Correct?  Sound?  Efficient?
MinMax                    Yes       No      Yes
MBR                       Yes       No      Yes
GP                        Yes       No      Yes
Trigonometric             No        Yes     Yes
Our approach (Hyperbola)  Yes       Yes     Yes

Our approach is the only one which is correct, sound and efficient!

Page 10: Hypersphere  Dominance: An Optimal Approach

Our approach: major idea

Step 1: pre-checking
  For cases where it is easy to decide whether the dominance condition is true
  Make the decision directly

Step 2: dominance checking
  For cases where it is difficult to decide directly whether the dominance condition is true
  Derive an equivalent condition of the dominance condition which is easier to decide
  Make the decision

Page 11: Hypersphere  Dominance: An Optimal Approach

Our approach: pre-checking

Step 1: Pre-checking:
  If $S_a$ and $S_b$ overlap
    Return $\mathit{false}$
  If $S_b$ and $S_q$ overlap
    Return $\mathit{false}$

[Figure: two layouts of $S_a$, $S_b$ and $S_q$]
  $S_a$ and $S_b$ overlap: $\mathit{Dom}(S_a, S_b, S_q) = \mathit{false}$
  $S_b$ and $S_q$ overlap: $\mathit{Dom}(S_a, S_b, S_q) = \mathit{false}$
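Both pre-checks follow from picking a shared point. If Sa and Sb share a point p, then a = b = p gives Dist(a, q) = Dist(b, q). If Sb and Sq share a point p, then b = q = p gives Dist(b, q) = 0. A minimal sketch follows, with illustrative function names and the assumption that touching hyperspheres count as overlapping:

```python
import math
from typing import Optional


def overlap(c1, r1, c2, r2) -> bool:
    """Two hyperspheres share a point iff their centers are within r1 + r2."""
    return math.dist(c1, c2) <= r1 + r2


def pre_check(ca, ra, cb, rb, cq, rq) -> Optional[bool]:
    """Return False when pre-checking already decides Dom = false, else None."""
    if overlap(ca, ra, cb, rb):
        return False
    if overlap(cb, rb, cq, rq):
        return False
    return None  # undecided: continue with dominance checking
```

Returning `None` for the undecided case keeps the pre-check separate from the dominance-checking step that follows it.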

Page 12: Hypersphere  Dominance: An Optimal Approach

Our approach: dominance checking (1)

Step 2: Dominance checking:
  Derive an equivalent condition of the dominance condition and check whether the derived condition is true

Dominance condition: $\forall a \in S_a, \forall b \in S_b, \forall q \in S_q: \mathit{Dist}(a, q) < \mathit{Dist}(b, q)$

Equivalent condition (1):

Proof of the equivalence between Condition (1) and Condition (2):
  "=>": By contradiction
  "<=":

Page 13: Hypersphere  Dominance: An Optimal Approach

Our approach: dominance checking (2)

Equivalent condition (2): $\forall q \in S_q: \mathit{MaxDist}(q, S_a) < \mathit{MinDist}(q, S_b)$

  $\mathit{MaxDist}(q, S_a) = \mathit{Dist}(q, c_a) + r_a + 0 = \mathit{Dist}(q, c_a) + r_a$
  $\mathit{MinDist}(q, S_b) = \mathit{Dist}(q, c_b) - r_b - 0 = \mathit{Dist}(q, c_b) - r_b$

Equivalent condition (3): $\forall q \in S_q: \mathit{Dist}(q, c_b) - \mathit{Dist}(q, c_a) > r_a + r_b$
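For a single query point q, the two expansions above give a test that needs only the centers and radii. The sketch below assumes the per-point condition is MaxDist(q, Sa) < MinDist(q, Sb); the function name is illustrative.

```python
import math


def dominates_for_point(q, ca, ra, cb, rb):
    """Check MaxDist(q, Sa) < MinDist(q, Sb) for one point q."""
    max_dist_q_sa = math.dist(q, ca) + ra  # farthest point of Sa from q
    # Nearest point of Sb from q. The value is negative when q lies inside
    # Sb, in which case the comparison below fails, as it should.
    min_dist_q_sb = math.dist(q, cb) - rb
    return max_dist_q_sa < min_dist_q_sb
```

Rearranging the inequality gives Dist(q, cb) - Dist(q, ca) > ra + rb, which depends on q only through its distances to the two centers.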

Page 14: Hypersphere  Dominance: An Optimal Approach

Our approach: dominance checking (3)

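Equivalent condition (3): $\forall q \in S_q: \mathit{Dist}(q, c_b) - \mathit{Dist}(q, c_a) > r_a + r_b$

Space partitioning:
  Boundary $P$: $\{x \mid \mathit{Dist}(x, c_b) - \mathit{Dist}(x, c_a) = r_a + r_b\}$
  Region $R_a$: $\{x \mid \mathit{Dist}(x, c_b) - \mathit{Dist}(x, c_a) > r_a + r_b\}$
  Region $R_b$: $\{x \mid \mathit{Dist}(x, c_b) - \mathit{Dist}(x, c_a) < r_a + r_b\}$

Equivalent condition (4): $S_q$ is in Region $R_a$ (every $q \in S_q$ is in Region $R_a$)

[Figure: $S_a$ (center $c_a$), $S_b$ (center $c_b$) and $S_q$ (center $c_q$); Boundary $P$ separates Region $R_a$ from Region $R_b$]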

Page 15: Hypersphere  Dominance: An Optimal Approach

Our approach: dominance checking (4)

Equivalent condition (4): $S_q$ is in Region $R_a$

Equivalent condition (5): $c_q$ is in Region $R_a$ and $\min_{x \in P} \mathit{Dist}(c_q, x) > r_q$

[Figure: $S_a$ (center $c_a$), $S_b$ (center $c_b$) and $S_q$ (center $c_q$, radius $r_q$) with Boundary $P$, Region $R_a$ and Region $R_b$; $c_q$ is in Region $R_a$ and $\min_{x \in P} \mathit{Dist}(c_q, x) > r_q$, so $S_q$ is in Region $R_a$]

Page 16: Hypersphere  Dominance: An Optimal Approach

Our approach (2)

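Equivalent condition (5): $c_q$ is in Region $R_a$ and $\min_{x \in P} \mathit{Dist}(c_q, x) > r_q$

Space partitioning:
  Boundary $P$: $\{x \mid \mathit{Dist}(x, c_b) - \mathit{Dist}(x, c_a) = r_a + r_b\}$
  Region $R_a$: $\{x \mid \mathit{Dist}(x, c_b) - \mathit{Dist}(x, c_a) > r_a + r_b\}$
  Region $R_b$: $\{x \mid \mathit{Dist}(x, c_b) - \mathit{Dist}(x, c_a) < r_a + r_b\}$

Compute $\min_{x \in P} \mathit{Dist}(c_q, x)$
  constraint: $x \in P$
  objective: minimize $\mathit{Dist}(c_q, x)$
  We use the Lagrange Multiplier (LM) method. Details can be found in the paper.

Correct and sound: The condition (3) is equivalent to the dominance condition
Efficient: Each condition transformation takes O(d) time and the cost of LM is also O(d)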
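The closed-form Lagrange Multiplier solution is not reproduced in this transcript. As a stand-in, the sketch below approximates the minimum numerically and then applies pre-checking and condition (5). It assumes the boundary P is the set of points x with Dist(x, cb) - Dist(x, ca) = ra + rb, which is one sheet of a hyperboloid with foci ca and cb. Because P is rotationally symmetric about the line through ca and cb, the search can be carried out in the 2D plane containing that line and cq. The function names, the sampling resolution and the parametrisation are illustrative choices, and the result is an approximation rather than the O(d) exact method of the paper.

```python
import math


def min_dist_to_boundary(ca, ra, cb, rb, cq, samples=20000):
    """Approximate min over x in P of Dist(cq, x) by sampling the hyperbola branch.

    Assumes P = {x : Dist(x, cb) - Dist(x, ca) = ra + rb} and that
    Sa and Sb do not overlap (pre-checking has passed).
    """
    focal = math.dist(ca, cb)
    c = focal / 2.0      # half the distance between the foci ca and cb
    a = (ra + rb) / 2.0  # semi-major axis of the hyperbola
    if c <= a:
        raise ValueError("Sa and Sb overlap; pre-checking returns false")
    b = math.sqrt(c * c - a * a)
    # 2D coordinates of cq: u runs along the axis from the midpoint of the
    # foci towards ca, v >= 0 is the offset of cq from that axis.
    mid = [(x + y) / 2.0 for x, y in zip(ca, cb)]
    axis = [(x - y) / focal for x, y in zip(ca, cb)]
    rel = [x - m for x, m in zip(cq, mid)]
    u = sum(r * e for r, e in zip(rel, axis))
    v = math.sqrt(max(0.0, sum(r * r for r in rel) - u * u))

    def dist_at(t):
        # Branch nearer to ca: (a cosh t, b sinh t); t >= 0 suffices as v >= 0.
        return math.hypot(a * math.cosh(t) - u, b * math.sinh(t) - v)

    # The closest point is no farther than the vertex (t = 0), which bounds t.
    t_max = math.asinh((v + dist_at(0.0)) / b)
    return min(dist_at(t_max * i / samples) for i in range(samples + 1))


def dominates(ca, ra, cb, rb, cq, rq):
    """Decide Dom(Sa, Sb, Sq) via pre-checking and condition (5), numerically."""
    if math.dist(ca, cb) <= ra + rb or math.dist(cb, cq) <= rb + rq:
        return False  # pre-checking
    if math.dist(cq, cb) - math.dist(cq, ca) <= ra + rb:
        return False  # cq is not in Region Ra
    return min_dist_to_boundary(ca, ra, cb, rb, cq) > rq
```

For two points Sa = (0, 0) and Sb = (10, 0), the boundary is the bisector x = 5, so a query disk centered at (-10, 0) is dominated for radius 9 but not for radius 16. Near the decision threshold the grid approximation can misjudge, which the exact method avoids.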

Page 17: Hypersphere  Dominance: An Optimal Approach

Empirical study: set-up

Datasets:
  Real datasets: NBA, Color, Texture, and Forest
  Synthetic datasets
Algorithms:
  MinMax, MBR, GP, Trigonometric, Hyperbola (our method)
Measures:
  precision = TP/(TP+FP)
  recall = TP/(TP+FN)
  running time

A correct method has the precision always equal to 1
A sound method has the recall always equal to 1

Criteria of a method:
  1. Correctness: No false positive (FP)
  2. Soundness: No false negative (FN)
  3. Efficiency: runs in O(d) where d is the number of dimensions
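The two quality measures can be computed from a method's answers and the true dominance results. This is a minimal sketch; the function name and the convention of reporting 1.0 for an empty denominator are assumptions made for this illustration.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for boolean dominance answers.

    predicted[i] is a method's answer and actual[i] the true result for
    the i-th (Sa, Sb, Sq) instance.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # true positives
    fp = sum(1 for p, a in pairs if p and not a)  # false positives
    fn = sum(1 for p, a in pairs if not p and a)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# A method with one false negative and no false positive:
# precision is 1 (correct), recall is 2/3 (not sound).
print(precision_recall([True, False, False, True], [True, True, False, True]))
```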

Page 18: Hypersphere  Dominance: An Optimal Approach

Empirical study: results (precision, NBA)

All algorithms except Trigonometric have precisions = 1.

Methods        Correct?  Sound?  Efficient?
MinMax         Yes       No      Yes
MBR            Yes       No      Yes
GP             Yes       No      Yes
Trigonometric  No        Yes     Yes
Our approach   Yes       Yes     Yes

Page 19: Hypersphere  Dominance: An Optimal Approach

Empirical study: results (recall, NBA)

Only our approach (Hyperbola) and Trigonometric have recalls = 1.

Methods        Correct?  Sound?  Efficient?
MinMax         Yes       No      Yes
MBR            Yes       No      Yes
GP             Yes       No      Yes
Trigonometric  No        Yes     Yes
Our approach   Yes       Yes     Yes

Page 20: Hypersphere  Dominance: An Optimal Approach

Empirical study: results (running time, NBA)

MinMax < GP < Hyperbola (our method) < MBR < Trigonometric

Page 21: Hypersphere  Dominance: An Optimal Approach

Conclusion

First solution for the hypersphere dominance problem which is correct, sound and efficient for any dimension
An application study: kNN
Experiments

Page 22: Hypersphere  Dominance: An Optimal Approach

Q & A


Page 23: Hypersphere  Dominance: An Optimal Approach

The following slides are for backup use only


Page 24: Hypersphere  Dominance: An Optimal Approach

Hyperspheres in uncertain databases

Song and Roussopoulos [SSTD'01]
Cheng et al. [TKDE'04]
Chen and Cheng [ICDE'07]
Beskales et al. [PVLDB'08]

Page 25: Hypersphere  Dominance: An Optimal Approach

Our approach (1)

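Major idea:
  Derive an equivalent condition of the dominance condition and check whether the derived condition is true

Definition 1: Hypersphere dominance
  Given $S_a$, $S_b$, and $S_q$, it decides whether
  $\forall a \in S_a, \forall b \in S_b, \forall q \in S_q: \mathit{Dist}(a, q) < \mathit{Dist}(b, q)$

Dominance condition
  Yes: $\mathit{Dom}(S_a, S_b, S_q) = \mathit{true}$
  No: $\mathit{Dom}(S_a, S_b, S_q) = \mathit{false}$

Equivalent condition (1):

Equivalent condition (2):

Equivalent condition (3):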

Page 26: Hypersphere  Dominance: An Optimal Approach

An application study: kNN query

kNN query:
  Given a set D of hyperspheres, $S_1$, $S_2$, ..., $S_n$, a query hypersphere $S_q$, and an integer $k$,
  the query finds a set of hyperspheres in D each of which is not dominated by $S_k$ wrt $S_q$, where $S_k$ is the hypersphere in D with the k-th smallest maximum distance from $S_q$.

Solution:
  A best-first search algorithm based on SS-tree
  Some pruning strategies
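The query definition can be illustrated with a brute-force scan. This is not the SS-tree based best-first search from the slide. The function names are illustrative, and the default dominance test is MinMax, used here only as a stand-in. Since MinMax can miss true dominance, the scan may keep more candidates than an exact dominance test would.

```python
import math


def max_dist(c1, r1, c2, r2):
    return math.dist(c1, c2) + r1 + r2


def min_dist(c1, r1, c2, r2):
    return max(0.0, math.dist(c1, c2) - r1 - r2)


def minmax_dominates(sa, sb, sq):
    """Stand-in dominance test; each argument is a (center, radius) pair."""
    return max_dist(*sa, *sq) < min_dist(*sb, *sq)


def knn_candidates(spheres, sq, k, dominates=minmax_dominates):
    """Return the hyperspheres not dominated by the k-th one wrt sq."""
    # The pivot is the hypersphere with the k-th smallest maximum distance from sq.
    ranked = sorted(spheres, key=lambda s: max_dist(*s, *sq))
    pivot = ranked[k - 1]
    return [s for s in spheres if not dominates(pivot, s, sq)]


# Three points on a line, query point at the origin, k = 2.
D = [((1.0,), 0.0), ((2.0,), 0.0), ((5.0,), 0.0)]
print(knn_candidates(D, ((0.0,), 0.0), k=2))
```

Passing a different function as `dominates` swaps in another dominance test without changing the scan.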

Page 27: Hypersphere  Dominance: An Optimal Approach

Illustration 1: 2D space, $S_a$ and $S_b$ are two points (i.e., $r_a = 0$, $r_b = 0$)

[Figure: $S_a$ ($c_a$), $S_b$ ($c_b$) and $S_q$ with center $c_q$; Boundary $P$ separates Region $R_a$ from Region $R_b$]