Flexible Aggregate Similarity Search
Yang Li1, Feifei Li2, Ke Yi3, Bin Yao2, Min Wang4
1 Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
2 Department of Computer Science, Florida State University, USA
3 Department of Computer Science, Hong Kong University of Science and Technology
4 HP Labs China
Outline
1 Motivation and Problem Formulation
2 Basic Aggregate Similarity Search
3 Flexible Aggregate Similarity Search
4 Experiments
Yang Li, Feifei Li, Ke Yi, Bin Yao, Min Wang Flexible Aggregate Similarity Search
Introduction and motivation
Similarity search (a.k.a. nearest neighbor search, or NN search) is a fundamental tool for retrieving the data most relevant to user input when working with massive datasets; it has been extensively studied.
However, users are often interested in retrieving objects that are similar to a group Q of query objects, rather than to just one.
Aggregate similarity search (Ann) may also need to deal with data in high dimensions.
Given an aggregation σ, a similarity/distance function d, a dataset P, and any query group Q, define for any p:

rp = σ{d(p, Q)} = σ{d(p, q1), . . . , d(p, q|Q|)},

the aggregate similarity distance of p. Find p∗ ∈ P having the smallest rp value (rp∗ = r∗).
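As a concrete illustration, this definition can be evaluated by brute force (a sketch assuming Euclidean d and an in-memory P; the names `agg_dist` and `ann` are illustrative, not from the paper):

```python
import math

def agg_dist(p, Q, sigma):
    """Aggregate similarity distance r_p = sigma over {d(p, q) : q in Q}."""
    return sigma(math.dist(p, q) for q in Q)

def ann(P, Q, sigma):
    """Return (p*, r*): the point of P with the smallest aggregate distance."""
    return min(((p, agg_dist(p, Q, sigma)) for p in P), key=lambda t: t[1])

P = [(0.0, 0.0), (2.0, 1.0), (5.0, 5.0)]
Q = [(1.0, 0.0), (2.0, 2.0)]
ann(P, Q, max)  # (2, 1) wins under sigma = max
ann(P, Q, sum)  # and also under sigma = sum
```

This linear scan costs O(|P| · |Q|) distance evaluations, which is exactly what the index-based methods below try to avoid.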
Figure: Aggregate similarity search in Euclidean space, for σ = max and σ = sum, over a dataset P = {p1, . . . , p6} and a group Q of query points. With σ = max: p∗ = p3, r∗ = d(p3, q1). With σ = sum: p∗ = p4, r∗ = Σq∈Q d(p4, q).
Outline
1 Motivation and Problem Formulation
2 Basic Aggregate Similarity Search
3 Flexible Aggregate Similarity Search
4 Experiments
Existing methods for Ann
R-tree method: branch and bound principle [PSTM04, PTMH05].
Some other heuristics to further improve the pruning.
Can be extended to other metric space using M-tree [RBTFT08].
Limitations:
No bound on the query cost.
Query cost increases quickly as the dataset becomes larger and/or the dimensionality goes higher.
[PSTM04]: Group Nearest Neighbor Queries. In ICDE, 2004.
[PTMH05]: Aggregate nearest neighbor queries in spatial databases. In TODS, 2005.
[RBTFT08]: A Novel Optimization Approach to Efficiently Process Aggregate Similarity Queries in Metric Access Methods. In CIKM, 2008.
Our approach for σ = max: Amax1

We proposed Amax1 (TKDE’10):
B(c, rc) is a ball centered at c with radius rc;
MEB(Q) is the minimum enclosing ball of a set of points Q;
nn(c, P) is the nearest neighbor of a point c in the dataset P.

Amax1:
1. compute B(c, rc) = MEB(Q);
2. return p = nn(c, P).
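The two steps of Amax1 can be sketched as follows, using the core-set iteration of [KMY03] for an approximate MEB center and a brute-force scan for nn(c, P) (both are stand-ins for whatever MEB and NN subroutines are actually available; the function names are illustrative):

```python
import math

def meb_center(Q, iters=1000):
    """Approximate the center of MEB(Q) with the core-set iteration
    (Badoiu-Clarkson): repeatedly step toward the current farthest point."""
    c = list(Q[0])
    for i in range(1, iters + 1):
        far = max(Q, key=lambda q: math.dist(c, q))   # farthest point from c
        c = [cj + (fj - cj) / (i + 1) for cj, fj in zip(c, far)]
    return tuple(c)

def amax1(P, Q):
    """Amax1: one (approximate) MEB computation, then one NN query."""
    c = meb_center(Q)                                 # step 1: B(c, rc) = MEB(Q)
    return min(P, key=lambda p: math.dist(p, c))      # step 2: nn(c, P)
```

In practice the brute-force `min` would be replaced by an index such as a BBD-tree or LSB-tree, which is where the α- and β-approximation factors in the theorems below come from.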
Our approach for σ = max: Amax1

An algorithm that returns (p, rp) for Ann(Q, P) is a c-approximation iff r∗ ≤ rp ≤ c · r∗.

In low dimensions, the BBD-tree [AMNSW98] gives (1 + ε)-approximate NN search; in high dimensions, the LSB-tree [TYSK10] gives (2 + ε)-approximate NN search with high probability; and a (1 + ε)-MEB algorithm exists even in high dimensions [KMY03].

Theorem
Amax1 is a √2-approximation in any dimension d, given (exact) nn(c, P) and MEB(Q).

Theorem
In any dimension d, given an α-approximate MEB algorithm and a β-approximate NN algorithm, Amax1 is a √(α² + β²)-approximation.

[AMNSW98]: An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions. In JACM, 1998.
[TYSK10]: Efficient and Accurate Nearest Neighbor and Closest Pair Search in High Dimensional Space. In TODS, 2010.
[KMY03]: Approximate Minimum Enclosing Balls in High Dimensions Using Core-Sets. In JEA, 2003.
Our approach for σ = sum: Asum1

We proposed Asum1 (TKDE’10):
1. let gm be the geometric median of Q;
2. return nn(gm, P).
![Page 23: Flexible Aggregate Similarity Search › ~lifeifei › papers › fannslides.pdf · Flexible Aggregate Similarity Search Yang Li1, Feifei Li2, Ke Yi3, Bin Yao2, Min Wang4 1Department](https://reader031.vdocuments.mx/reader031/viewer/2022041109/5f0e42687e708231d43e6081/html5/thumbnails/23.jpg)
Our approach for σ = sum: Asum1

Using the Weiszfeld algorithm (iteratively re-weighted least squares), gm can be computed to arbitrary precision efficiently.

Both Amax1 and Asum1 can easily be extended to kAnn search while the bounds are maintained.

Theorem
Asum1 is a 3-approximation in any dimension d, given the (exact) geometric median and nn(gm, P).

Theorem
In any dimension d, given a β-approximate NN algorithm, Asum1 is a 3β-approximation.
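The Weiszfeld iteration mentioned above, and the resulting Asum1, can be sketched as follows (a minimal illustration; `geometric_median` and `asum1` are assumed names, and the brute-force scan stands in for an index-based NN query):

```python
import math

def geometric_median(Q, iters=200, eps=1e-9):
    """Weiszfeld iteration (iteratively re-weighted least squares):
    approximate gm = argmin_x sum_{q in Q} d(x, q)."""
    x = [sum(col) / len(Q) for col in zip(*Q)]            # start at the centroid
    for _ in range(iters):
        w = [1.0 / max(math.dist(x, q), eps) for q in Q]  # inverse-distance weights
        s = sum(w)
        x = [sum(wi * q[j] for wi, q in zip(w, Q)) / s for j in range(len(x))]
    return tuple(x)

def asum1(P, Q):
    """Asum1: one geometric-median computation, then one NN query."""
    gm = geometric_median(Q)
    return min(P, key=lambda p: math.dist(p, gm))         # nn(gm, P)
```

The `eps` clamp guards against division by zero when an iterate lands on (or very near) a query point, which is the standard fix for Weiszfeld's degenerate case.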
Outline
1 Motivation and Problem Formulation
2 Basic Aggregate Similarity Search
3 Flexible Aggregate Similarity Search
4 Experiments
Definition of flexible aggregate similarity search
Flexible aggregate similarity search (Fann): given a support φ ∈ (0, 1], find an object in P that has the best aggregate similarity to (any) φ|Q| of the query objects (our work in SIGMOD’11).
Figure: Fann in Euclidean space with σ = max and φ = 0.4: p∗ = p4, r∗ = d(p4, q3).
Exact methods for Fann
For ∀p ∈ P, rp = σ(p,Qpφ), where Qp
φ is p’s φ|Q| NNs in Q.
The R-tree method, with the branch and bound principle, can still be applied based on this observation.

In high dimensions, take the brute-force search (BFS) approach: for each p ∈ P, find Qpφ and calculate rp.
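The BFS approach can be sketched directly from the definition (assuming Euclidean d and σ applied to the φ|Q| smallest distances; `fann_bfs` is an illustrative name):

```python
import math

def fann_bfs(P, Q, phi, sigma):
    """Exact Fann by brute force: aggregate each p over only its
    phi*|Q| nearest query points, then return the best p."""
    k = max(1, round(phi * len(Q)))          # size of Q_phi^p
    best = None
    for p in P:
        dists = sorted(math.dist(p, q) for q in Q)
        rp = sigma(dists[:k])                # sigma over Q_phi^p
        if best is None or rp < best[1]:
            best = (p, rp)
    return best
```

This costs O(|P| · |Q| log |Q|), so it is only a baseline; the approximate methods below reduce the number of points of P ever examined.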
Approximate methods for σ = sum: Asum

For each qi ∈ Q: find pi = nn(qi, P), then pi’s φ|Q| nearest neighbors Qpiφ in Q, and compute rpi = Σq∈Qpiφ d(pi, q). Repeat this for every qi ∈ Q, and return the p with the smallest rp.

Example (φ = 0.4, |Q| = 5, φ|Q| = 2): p2 = nn(q1, P), and rp2 = d(p2, q1) + d(p2, q2).
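The loop above can be sketched as follows (brute-force NN in place of an index; `fann_asum` is an illustrative name):

```python
import math

def fann_asum(P, Q, phi):
    """Asum (sigma = sum): for each q_i in Q, take the candidate
    p_i = nn(q_i, P), score it against its phi|Q| nearest query
    points, and return the best-scoring candidate."""
    k = max(1, round(phi * len(Q)))
    best = None
    for qi in Q:
        p = min(P, key=lambda x: math.dist(x, qi))        # p_i = nn(q_i, P)
        rp = sum(sorted(math.dist(p, q) for q in Q)[:k])  # sum over Q_phi^p
        if best is None or rp < best[1]:
            best = (p, rp)
    return best
```

Note that only |Q| candidates are ever scored, one NN query per query point, which is the cost the theorems below bound.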
Approximation quality of Asum

Theorem
In any dimension d, given an exact NN algorithm, Asum is a 3-approximation.

Theorem
In any dimension d, given a β-approximate NN algorithm, Asum is a (β + 2)-approximation.

Asum only needs |Q| NN searches in P...
Asum still needs |Q| NN searches in P!
An improvement to Asum

Randomly select a subset of Q!

Theorem
For any 0 < ε, λ < 1, executing the Asum algorithm on only a random subset of f(φ, ε, λ) points of Q returns a (3 + ε)-approximate answer to Fann search in any dimension, with probability at least 1 − λ, where

f(φ, ε, λ) = log λ / log(1 − φε/3) = O(log(1/λ)/(φε)).

For |Q| = 1000, φ = 0.4, λ = 10%, ε = 0.5, only 33 NN searches are needed in any dimension (much less in practice; 1/φ is enough!).

Independent of dimensionality, |P|, and |Q|!
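The sample-size bound can be checked numerically (a sketch; `sample_size` is an illustrative name):

```python
import math

def sample_size(phi, eps, lam):
    """f(phi, eps, lam) = log(lam) / log(1 - phi*eps/3): how many query
    points to sample so Asum stays (3+eps)-approximate w.p. >= 1 - lam."""
    return math.log(lam) / math.log(1 - phi * eps / 3)

# The slide's example: phi = 0.4, eps = 0.5, lam = 0.1
sample_size(0.4, 0.5, 0.1)   # about 33, regardless of |Q| or the dimension
```

Because |Q| never enters the formula, the sample size stays the same whether |Q| is 1000 or 10^6.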
Approximate methods for σ = max: Amax

(Figure: dataset P and a query group Q of five points; running example with φ = 0.4, |Q| = 5, φ|Q| = 2, σ = max.)

Q^q_φ: the top φ|Q| NNs of q in Q, including q itself.

For each qi ∈ Q: let ci be the center of MEB(Q^qi_φ) (in the example, MEB(Q^q1_φ) = MEB({q1, q2}) with center c1); find p = nn(ci, P) (here p3 = nn(c1, P)); then evaluate rp over Q^p_φ, the top φ|Q| NNs of p in Q (here Q^p3_φ = {q2, q5}, giving rp3).

Repeat this for every qi ∈ Q, and return the p with the smallest rp.

Identical to Asum, except that p = nn(ci, P) is used instead of p = nn(qi, P), where ci is the center of MEB(Q^qi_φ).
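The Amax procedure above can be sketched with brute-force building blocks. This is an illustrative simplification, not the paper's implementation: linear scans replace index-based NN search, and the MEB center is approximated by the midpoint of the farthest pair (exact only for two points).

```python
import math
from itertools import combinations

def dist(a, b):
    return math.dist(a, b)

def nn(x, pts):
    # nearest neighbor of x among pts (linear scan stands in for an index)
    return min(pts, key=lambda p: dist(x, p))

def knn(x, pts, k):
    # the k points of pts closest to x (includes x itself if x is in pts)
    return sorted(pts, key=lambda q: dist(x, q))[:k]

def meb_center(pts):
    # Approximate MEB center: midpoint of the farthest pair.
    # Exact for two points; Amax proper uses the true MEB center.
    if len(pts) == 1:
        return pts[0]
    a, b = max(combinations(pts, 2), key=lambda ab: dist(*ab))
    return tuple((x + y) / 2 for x, y in zip(a, b))

def amax(P, Q, phi):
    k = max(1, int(phi * len(Q)))
    best, best_r = None, float("inf")
    for qi in Q:
        ci = meb_center(knn(qi, Q, k))               # center of MEB(Q_phi^qi)
        p = nn(ci, P)                                # p = nn(ci, P)
        rp = max(dist(p, q) for q in knn(p, Q, k))   # r_p over Q_phi^p
        if rp < best_r:
            best, best_r = p, rp
    return best, best_r
```

In the paper, nn(ci, P) would be answered by an index over P and the exact MEB center would be used; both substitutions here are for brevity.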
Approximation quality of Amax

Theorem. In any dimension d, given an exact NN algorithm, Amax is a (1 + 2√2)-approximation.

Theorem. In any dimension d, given a β-approximate NN algorithm, Amax is a ((1 + 2√2)β)-approximation.

Amax needs only |Q| MEB computations and |Q| NN searches in P... but that is still |Q| of each!
An improvement to Amax

Randomly select a subset of Q!

Theorem. For any 0 < λ < 1, executing the Amax algorithm on only a random subset of f(φ, λ) points of Q returns a (1 + 2√2)-approximate answer to the Fann query in any dimension with probability at least 1 − λ, where

f(φ, λ) = log λ / log(1 − φ) = O(log(1/λ)/φ).

For |Q| = 1000, φ = 0.4, λ = 10%, only 5 MEB computations and NN searches are needed, in any dimension (even fewer in practice: a sample of size 1/φ is enough!).

Independent of the dimensionality, |P|, and |Q|!
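The same numeric check applies to the f(φ, λ) bound above (the function name is ours); for φ = 0.4 and λ = 10%, rounding up gives the 5 MEB/NN computations quoted on the slide:

```python
import math

def amax_sample_size(phi, lam):
    # f(phi, lam) = log(lam) / log(1 - phi), rounded up to a whole sample size
    return math.ceil(math.log(lam) / math.log(1.0 - phi))

print(amax_sample_size(0.4, 0.1))  # prints 5
```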
Other issues

All algorithms for Fann can be extended to work for top-k Fann.

Most algorithms work in any metric space; the exception is Amax, which works in a metric space only when the MEB is properly defined.
Outline
1 Motivation and Problem Formulation
2 Basic Aggregate Similarity Search
3 Flexible Aggregate Similarity Search
4 Experiments
Experiments: setup and datasets

Experiments were performed on a Linux machine with 4GB of RAM and an Intel Xeon 2GHz CPU.

Datasets:

- 2 dimensions: the Texas (TX) points-of-interest and road-network dataset from the OpenStreetMap project: 14 million points (the other 49 states are available as well).
- 2-6 dimensions: synthetic datasets of random clusters (RC).
- High dimensions: datasets from http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.data.html, http://yann.lecun.com/exdb/mnist/, and http://www.scl.ece.ucsb.edu/datasets/index.htm

| dataset | number of points | dimensionality |
|---|---|---|
| TX | 14,000,000 | 2 |
| RC | synthetic | 2-6 |
| Color | 68,040 | 32 |
| MNIST | 60,000 | 50 |
| Cortina | 1,088,864 | 74 |

We report the average of 40 independent queries, as well as the 5%-95% interval.

A sampling rate of 1/φ is enough for both Asum and Amax!
High dimensions: query cost, all datasets

(Figure: approximation ratio rp/r* of ASUM and AMAX on MNIST, Color, and Cortina; y-axis from 1 to 1.8.)

(Figure: IO of BFS, ASUM, and AMAX on MNIST, Color, and Cortina; log-scale y-axis, 10^0 to 10^5.)
The end
Thank You
Q and A
Existing methods for Ann

R-tree method: the branch-and-bound principle [PSTM04, PTMH05].

(Figure: an R-tree over points p1, ..., p12, with leaf nodes N3-N6 under internal nodes N1 and N2; mindist(q, N1) and maxdist(q, N1) are shown for a query point q.)

For a query point q and an MBR node Ni:

∀p ∈ Ni: mindist(q, Ni) ≤ d(p, q) ≤ maxdist(q, Ni).

For a query group Q and σ = max:

∀p ∈ Ni: max_{q∈Q} mindist(q, Ni) ≤ rp ≤ max_{q∈Q} maxdist(q, Ni).

For a query group Q and σ = sum:

∀p ∈ Ni: Σ_{q∈Q} mindist(q, Ni) ≤ rp ≤ Σ_{q∈Q} maxdist(q, Ni).

[PSTM04]: Group Nearest Neighbor Queries. In ICDE, 2004.
[PTMH05]: Aggregate nearest neighbor queries in spatial databases. In TODS, 2005.
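The mindist/maxdist pruning bounds above can be written down directly for an axis-aligned MBR; a minimal sketch under the assumption of Euclidean distance (function names are ours):

```python
import math

def mindist(q, lo, hi):
    # smallest possible distance from q to any point in the MBR [lo, hi]:
    # per dimension, distance to the nearest face (0 if q is inside the range)
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))

def maxdist(q, lo, hi):
    # largest possible distance from q to any point in the MBR:
    # per dimension, distance to the farther of the two faces
    return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2
                         for x, l, h in zip(q, lo, hi)))

def group_bounds(Q, lo, hi, sigma=sum):
    # lower/upper bounds on r_p for every p inside the MBR
    # (sigma = sum or sigma = max, as on the slide)
    return (sigma(mindist(q, lo, hi) for q in Q),
            sigma(maxdist(q, lo, hi) for q in Q))
```

A node whose lower bound already exceeds the best r_p found so far can be pruned without being visited.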
The List algorithm for Fann

The List algorithm works in any dimension:

For every p ∈ P and every qi ∈ Q, let ai = d(p, qi); this yields one sorted list per query point, where ai,j (the jth entry of list i) is produced by the jth NN of qi in P.

rp = σ(p, Q^p_φ) is monotone w.r.t. the ai's, so the TA algorithm [FLN01] can be applied over these |Q| lists.

[FLN01]: Optimal Aggregation Algorithms for Middleware. In PODS, 2001.
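The List idea above can be sketched as a TA-style search over |Q| sorted distance lists. This is a simplified illustration for σ = sum (where rp is the sum of the φ|Q| smallest ai values), not the paper's exact algorithm:

```python
import math

def dist(a, b):
    return math.dist(a, b)

def fann_sum_list(P, Q, phi):
    """TA-style sketch of the List algorithm for Fann with sigma = sum."""
    m, n = len(Q), len(P)
    k = max(1, int(phi * m))

    def rp(j):  # random access: sum over the phi|Q| nearest query points
        return sum(sorted(dist(P[j], q) for q in Q)[:k])

    # one sorted-access list per query point q_i: indices of P by distance to q_i
    lists = [sorted(range(n), key=lambda j, i=i: dist(P[j], Q[i]))
             for i in range(m)]
    best, best_rp, seen = None, float("inf"), set()
    for depth in range(n):
        frontier = []
        for i in range(m):
            j = lists[i][depth]
            frontier.append(dist(P[j], Q[i]))
            if j not in seen:
                seen.add(j)
                r = rp(j)
                if r < best_rp:
                    best, best_rp = j, r
        # any unseen p has a_i >= frontier[i] in every list, so its r_p is at
        # least the sum of the k smallest frontier values: stop early (TA rule)
        if best_rp <= sum(sorted(frontier)[:k]):
            break
    return P[best], best_rp
```

In practice the lists are not fully materialized: entry ai,j is produced on demand by an incremental jth-NN search around qi.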
Experiments: defaults

Default values:

| Symbol | Definition | Default |
|---|---|---|
| M | \|Q\| | 200 |
| φ | support | 0.5 |
| A | query group volume | 5% (of the entire data space) |

Points in a query group follow a random cluster distribution.

Default values for low dimensions:

| Symbol | Definition | Default |
|---|---|---|
| N | \|P\| | 2,000,000 |
| d | dimensionality | 2 |

Dataset TX when d = 2; dataset RC when varying d from 2 to 6.

Values for high dimensions:

| Symbol | Definition | Default |
|---|---|---|
| N | \|P\| | 200,000 |
| d | dimensionality | 30 |

Default dataset: Cortina.

We report the average of 40 independent, randomly generated queries, as well as the 5%-95% interval.

A sampling rate of 1/φ is enough for both Asum and Amax!
![Page 82: Flexible Aggregate Similarity Search › ~lifeifei › papers › fannslides.pdf · Flexible Aggregate Similarity Search Yang Li1, Feifei Li2, Ke Yi3, Bin Yao2, Min Wang4 1Department](https://reader031.vdocuments.mx/reader031/viewer/2022041109/5f0e42687e708231d43e6081/html5/thumbnails/82.jpg)
Low dimensions: approximation quality
0 0.1 0.3 0.5 0.7 0.9 11
1.2
1.4
1.6
1.8
φ
r p/r*
ASUMAMAX
Yang Li, Feifei Li, Ke Yi, Bin Yao, Min Wang Flexible Aggregate Similarity Search
![Page 83: Flexible Aggregate Similarity Search › ~lifeifei › papers › fannslides.pdf · Flexible Aggregate Similarity Search Yang Li1, Feifei Li2, Ke Yi3, Bin Yao2, Min Wang4 1Department](https://reader031.vdocuments.mx/reader031/viewer/2022041109/5f0e42687e708231d43e6081/html5/thumbnails/83.jpg)
Low dimensions: approximation quality
1 2 3 4 51
1.1
1.2
1.3
1.4
1.5
N:X106
r p/r*
ASUMAMAX
Yang Li, Feifei Li, Ke Yi, Bin Yao, Min Wang Flexible Aggregate Similarity Search
![Page 84: Flexible Aggregate Similarity Search › ~lifeifei › papers › fannslides.pdf · Flexible Aggregate Similarity Search Yang Li1, Feifei Li2, Ke Yi3, Bin Yao2, Min Wang4 1Department](https://reader031.vdocuments.mx/reader031/viewer/2022041109/5f0e42687e708231d43e6081/html5/thumbnails/84.jpg)
Low dimensions: query cost, vary M
1 2 3 4 510
0
101
102
103
104
M:X102
IO
BFS R−treeSUM R−treeMAX
ASUM AMAX
Yang Li, Feifei Li, Ke Yi, Bin Yao, Min Wang Flexible Aggregate Similarity Search
![Page 85: Flexible Aggregate Similarity Search › ~lifeifei › papers › fannslides.pdf · Flexible Aggregate Similarity Search Yang Li1, Feifei Li2, Ke Yi3, Bin Yao2, Min Wang4 1Department](https://reader031.vdocuments.mx/reader031/viewer/2022041109/5f0e42687e708231d43e6081/html5/thumbnails/85.jpg)
Low dimensions: query cost, vary M
1 2 3 4 510
−4
10−2
100
102
104
M:X102
runn
ing
time(
seco
nds)
BFS R−treeSUM R−treeMAX
ASUM AMAX
Yang Li, Feifei Li, Ke Yi, Bin Yao, Min Wang Flexible Aggregate Similarity Search
Low dimensions: query cost, vary N

[Figure: IO cost (10^0 to 10^4, log scale) vs. N (×10^6, 1 to 5), comparing BFS, R-tree SUM, R-tree MAX, ASUM, and AMAX.]

[Figure: running time in seconds (10^-4 to 10^4, log scale) vs. N (×10^6, 1 to 5), comparing BFS, R-tree SUM, R-tree MAX, ASUM, and AMAX.]
Low dimensions: query cost, vary φ

[Figure: IO cost (10^0 to 10^4, log scale) vs. φ (0.1 to 1), comparing BFS, R-tree SUM, R-tree MAX, ASUM, and AMAX.]

[Figure: running time in seconds (10^-4 to 10^4, log scale) vs. φ (0.1 to 1), comparing BFS, R-tree SUM, R-tree MAX, ASUM, and AMAX.]
Low dimensions: query cost, vary d

[Figure: IO cost (10^0 to 10^4, log scale) vs. dimensionality d (2 to 6), comparing BFS, R-tree SUM, R-tree MAX, ASUM, and AMAX.]

[Figure: running time in seconds (10^-4 to 10^4, log scale) vs. dimensionality d (2 to 6), comparing BFS, R-tree SUM, R-tree MAX, ASUM, and AMAX.]
High dimensions: approximation quality

[Figure: approximation ratio r_p/r* (1 to 1.3) vs. φ (0.1 to 1), comparing ASUM and AMAX.]

[Figure: approximation ratio r_p/r* (1 to 1.3) vs. N (×10^5, 1 to 5), comparing ASUM and AMAX.]

[Figure: approximation ratio r_p/r* (1 to 1.3) vs. dimensionality d (10 to 50), comparing ASUM and AMAX.]
High dimensions: query cost, vary M

[Figure: IO cost (10^2 to 10^4, log scale) vs. M (4 to 512), comparing BFS, List-iDist SUM, List-iDist MAX, List-Lsb SUM, List-Lsb MAX, ASUM, and AMAX.]

[Figure: running time in seconds (10^-4 to 10^2, log scale) vs. M (4 to 512), comparing BFS, List-iDist SUM, List-iDist MAX, List-Lsb SUM, List-Lsb MAX, ASUM, and AMAX.]
High dimensions: query cost, vary N

[Figure: IO cost (10^1 to 10^4, log scale) vs. N (×10^5, 1 to 5), comparing BFS, ASUM, and AMAX.]

[Figure: running time in seconds (10^-4 to 10^2, log scale) vs. N (×10^5, 1 to 5), comparing BFS, ASUM, and AMAX.]
High dimensions: query cost, vary φ

[Figure: IO cost (10^1 to 10^4, log scale) vs. φ (0.1 to 1), comparing BFS, ASUM, and AMAX.]

[Figure: running time in seconds (10^-4 to 10^2, log scale) vs. φ (0.1 to 1), comparing BFS, ASUM, and AMAX.]
High dimensions: query cost, vary d

[Figure: IO cost (10^1 to 10^4, log scale) vs. dimensionality d (10 to 50), comparing BFS, ASUM, and AMAX.]

[Figure: running time in seconds (10^-4 to 10^2, log scale) vs. dimensionality d (10 to 50), comparing BFS, ASUM, and AMAX.]