large relational databases (almost) for free k nearest neighbor …lifeifei/papers/knn.pdf ·...

79
1-1 K Nearest Neighbor Queries and KNN-Joins in Large Relational Databases (Almost) for Free Bin Yao, Feifei Li, Piyush Kumar Florida State University

Upload: others

Post on 03-Jun-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

1-1

K Nearest Neighbor Queries and KNN-Joins inLarge Relational Databases (Almost) for Free

Bin Yao, Feifei Li, Piyush Kumar

Florida State University

Page 2: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

2-1

Introduction

KNN queries and KNN-Joins:spatial databases, pattern recognition, DNA sequencing.

Page 3: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

2-2

Introduction

KNN queries and KNN-Joins:spatial databases, pattern recognition, DNA sequencing.

Our goal: design relational algorithms for KNN and KNN-Joins.

Page 4: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

2-3

Introduction

KNN queries and KNN-Joins:spatial databases, pattern recognition, DNA sequencing.

Our goal: design relational algorithms for KNN and KNN-Joins.

Augmented with ad-hoc query conditions and optimized bythe query optimizer.

Readily applied on relational databases without updating theengine.

Do it in SQL!

Page 5: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

3-1

Challenge and benefit in designingrelational algorithms

The main challenge:A query optimizer cannot optimize user-defined functions (UDF).

Page 6: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

3-2

Challenge and benefit in designingrelational algorithms

The main challenge:A query optimizer cannot optimize user-defined functions (UDF).

SELECT TOP k * FROM Address A, Restaurant RWHERE R.Type=’Italian’ AND R.Wine=’French’ORDER BY Euclidean (A.X, A.Y, R.X, R.Y)

Page 7: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

3-3

Challenge and benefit in designingrelational algorithms

The main challenge:A query optimizer cannot optimize user-defined functions (UDF).

SELECT TOP k * FROM Address A, Restaurant RWHERE R.Type=’Italian’ AND R.Wine=’French’ORDER BY Euclidean (A.X, A.Y, R.X, R.Y)

Page 8: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

4-1

Previous work on kNN and kNN-Join

iDistance for high dimensions.

Exact kNN solution:R-tree for low dimensions

Page 9: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

4-2

Previous work on kNN and kNN-Join

iDistance for high dimensions.

Balanced box decomposition tree

Exact kNN solution:R-tree for low dimensions

LSB-tree

Approximate kNN solution:

Locality sensitive hashing

Medrank

Page 10: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

4-3

Previous work on kNN and kNN-Join

iDistance for high dimensions.

Balanced box decomposition tree

the iJoin algorithm

Exact kNN solution:R-tree for low dimensions

LSB-tree

Approximate kNN solution:

Locality sensitive hashing

Medrank

the Gorder algorithm

kNN-Join solution:

Page 11: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

5-1

Problem formulation

Data set P stored in table RP : {pid, Y1, · · · , Yd, A1, · · · , Ah}.Query set Q stored in table RQ: {qid,X1, · · · , Xd, B1, · · · , Bg}.

Page 12: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

5-2

Problem formulation

KNN queries: let A = kNN(q,RP ),

(A ⊆ RP ) ∧ (|A| = k) ∧ (∀a ∈ A,∀r ∈ RP −A, |a, q| ≤ |r, q|).

Data set P stored in table RP : {pid, Y1, · · · , Yd, A1, · · · , Ah}.Query set Q stored in table RQ: {qid,X1, · · · , Xd, B1, · · · , Bg}.

Page 13: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

5-3

Problem formulation

KNN queries: let A = kNN(q,RP ),

(A ⊆ RP ) ∧ (|A| = k) ∧ (∀a ∈ A,∀r ∈ RP −A, |a, q| ≤ |r, q|).

KNN-Join:for ∀s ∈ Q, produce k pairs (s, r), for ∀r ∈ kNN(s, Rp).

Data set P stored in table RP : {pid, Y1, · · · , Yd, A1, · · · , Ah}.Query set Q stored in table RQ: {qid,X1, · · · , Xd, B1, · · · , Bg}.

Page 14: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

5-4

Problem formulation

KNN queries: let A = kNN(q,RP ),

(A ⊆ RP ) ∧ (|A| = k) ∧ (∀a ∈ A,∀r ∈ RP −A, |a, q| ≤ |r, q|).

KNN-Join:for ∀s ∈ Q, produce k pairs (s, r), for ∀r ∈ kNN(s, Rp).

Approximate k nearest neighbors:Suppose q’s kth nn from P is p∗ and r∗ = |q, p∗|,p be the kth NN of q for some kNN algorithm A and rp = |q, p|,(p, rp) ∈ Rd × R is (1 + ε)-approximate solution of kNN ifr∗ ≤ rp ≤ (1 + ε)r∗.

Data set P stored in table RP : {pid, Y1, · · · , Yd, A1, · · · , Ah}.Query set Q stored in table RQ: {qid,X1, · · · , Xd, B1, · · · , Bg}.

Page 15: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

6-1

Z-value and Z-order curve

z-value of a point:For point (2, 6), binary representation is (010, 110), z-value is011100 = 28.

Page 16: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

6-2

Z-value and Z-order curve

z-value of a point:For point (2, 6), binary representation is (010, 110), z-value is011100 = 28.

A well-known approach:

Page 17: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

6-3

Z-value and Z-order curve

z-value of a point:For point (2, 6), binary representation is (010, 110), z-value is011100 = 28.

A well-known approach:

Map points in a multi-dimensional space into one dimensionby using z-values.

Page 18: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

6-4

Z-value and Z-order curve

z-value of a point:For point (2, 6), binary representation is (010, 110), z-value is011100 = 28.

A well-known approach:

Map points in a multi-dimensional space into one dimensionby using z-values.

Translate the kNN search into one dimensional range searchon the z-values.

Page 19: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-1

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

Page 20: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-2

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

q

Page 21: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-3

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

q

Page 22: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-4

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

q

Page 23: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-5

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

Our idea: produce α randomly shifted copies of the input dataset (P 0, . . . , Pα) and repeat the one dimensional range search(γ = O(k) points up and down next to the q) for each copy.

Page 24: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-6

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

q

Page 25: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-7

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

q

−→v

Page 26: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-8

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

−→vq

Page 27: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-9

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

−→vq

γ = 2

Page 28: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-10

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

Our idea: produce α randomly shifted copies of the input dataset (P 0, . . . , Pα) and repeat the one dimensional range search(γ = O(k) points up and down next to the q) for each copy.

Retrieve the kNN from the unioned candidates of the α copies.

Page 29: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

7-11

Approximation by random shifts

Z-values preserve the spatial locality, but not always the case.

Our idea: produce α randomly shifted copies of the input dataset (P 0, . . . , Pα) and repeat the one dimensional range search(γ = O(k) points up and down next to the q) for each copy.

Retrieve the kNN from the unioned candidates of the α copies.

Theorem 1:Using α = O(1) and γ = O(k), zχ-kNN guaranteesan expected constant factor approximate kNN result withO(logf

NB + k/B) number of page accesses (clustered index

on z-values ).

Page 30: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

8-1

Approximation algorithm

Candidates C = ∅;For i = 0, . . . , α {

Find zip as the successor of zq+vi in P i;

Let Ci be γ points up and down next to zip in P i;

For each point p in Ci, let p = p− vi;C = C

⋃Ci;

}Let Aχ = kNN(q, C) and output Aχ.

zχ-kNN (point q, point sets {P 0, . . . , Pα})

Page 31: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

8-2

Approximation algorithm

Candidates C = ∅;For i = 0, . . . , α {

Find zip as the successor of zq+vi in P i;

Let Ci be γ points up and down next to zip in P i;

For each point p in Ci, let p = p− vi;C = C

⋃Ci;

}Let Aχ = kNN(q, C) and output Aχ.

zχ-kNN (point q, point sets {P 0, . . . , Pα})

Page 32: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

8-3

Approximation algorithm

Candidates C = ∅;For i = 0, . . . , α {

Find zip as the successor of zq+vi in P i;

Let Ci be γ points up and down next to zip in P i;

For each point p in Ci, let p = p− vi;C = C

⋃Ci;

}Let Aχ = kNN(q, C) and output Aχ.

zχ-kNN (point q, point sets {P 0, . . . , Pα})

Page 33: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

8-4

Approximation algorithm

Candidates C = ∅;For i = 0, . . . , α {

Find zip as the successor of zq+vi in P i;

Let Ci be γ points up and down next to zip in P i;

For each point p in Ci, let p = p− vi;C = C

⋃Ci;

}Let Aχ = kNN(q, C) and output Aχ.

zχ-kNN (point q, point sets {P 0, . . . , Pα})

Page 34: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

8-5

Approximation algorithm

Candidates C = ∅;For i = 0, . . . , α {

Find zip as the successor of zq+vi in P i;

Let Ci be γ points up and down next to zip in P i;

For each point p in Ci, let p = p− vi;C = C

⋃Ci;

}Let Aχ = kNN(q, C) and output Aχ.

zχ-kNN (point q, point sets {P 0, . . . , Pα})

Page 35: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

9-1

SQL statement for approximation algorithm

SELECT TOP k * FROM(SELECT TOP γ + 1 * FROM RP ,

(SELECT TOP 1 zval FROM RPWHERE RP .zval ≥ q.zvalORDER BY RP .zval ASC ) AS T

WHERE RP .zval≥T.zvalORDER BY RP .zval ASC

UNIONSELECT TOP γ * FROM RPWHERE RP .zval < T.zvalORDER BY RP .zval DESC ) AS C

ORDER BY Euclidean(q.X1,q.X2,C.Y1,C.Y2)

Page 36: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

9-2

SQL statement for approximation algorithm

SELECT TOP k * FROM(SELECT TOP γ + 1 * FROM RP ,

(SELECT TOP 1 zval FROM RPWHERE RP .zval ≥ q.zvalORDER BY RP .zval ASC ) AS T

WHERE RP .zval≥T.zvalORDER BY RP .zval ASC

UNIONSELECT TOP γ * FROM RPWHERE RP .zval < T.zvalORDER BY RP .zval DESC ) AS C

ORDER BY Euclidean(q.X1,q.X2,C.Y1,C.Y2)

Page 37: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

9-3

SQL statement for approximation algorithm

SELECT TOP k * FROM(SELECT TOP γ + 1 * FROM RP ,

(SELECT TOP 1 zval FROM RPWHERE RP .zval ≥ q.zvalORDER BY RP .zval ASC ) AS T

WHERE RP .zval≥T.zvalORDER BY RP .zval ASC

UNIONSELECT TOP γ * FROM RPWHERE RP .zval < T.zvalORDER BY RP .zval DESC ) AS C

ORDER BY Euclidean(q.X1,q.X2,C.Y1,C.Y2)

Page 38: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

9-4

SQL statement for approximation algorithm

SELECT TOP k * FROM(SELECT TOP γ + 1 * FROM RP ,

(SELECT TOP 1 zval FROM RPWHERE RP .zval ≥ q.zvalORDER BY RP .zval ASC ) AS T

WHERE RP .zval≥T.zvalORDER BY RP .zval ASC

UNIONSELECT TOP γ * FROM RPWHERE RP .zval < T.zvalORDER BY RP .zval DESC ) AS C

ORDER BY Euclidean(q.X1,q.X2,C.Y1,C.Y2)

Page 39: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

10-1

Exact KNN retrieval: naive solution

The exact kNN points are enclosed by the approximate kthnearest neighbor ball.

Page 40: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

10-2

Exact KNN retrieval: naive solution

SELECT TOP k * FROM RPWHERE Euclidean(q.X1,q.X2,RP .Y1,RP .Y2)≤ rad(p,Aχ)ORDER BY Euclidean(q.X1,q.X2,RP .Y1,RP .Y2)

The exact kNN points are enclosed by the approximate kthnearest neighbor ball.

Page 41: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

10-3

Exact KNN retrieval: naive solution

SELECT TOP k * FROM RPWHERE Euclidean(q.X1,q.X2,RP .Y1,RP .Y2)≤ rad(p,Aχ)ORDER BY Euclidean(q.X1,q.X2,RP .Y1,RP .Y2)

The exact kNN points are enclosed by the approximate kthnearest neighbor ball.

Can we do better?

Page 42: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

11-1

Exact KNN retrieval

Mk = 3

Page 43: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

11-2

Exact KNN retrieval

Mk = 3

approximate kth nn ball

Page 44: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

11-3

Exact KNN retrieval

Mk = 3

approximate kth nn ball

exact kth nn ball

Page 45: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

11-4

Exact KNN retrieval

Mk = 3

approximate kth nn ball

exact kth nn ballapproximatekth nn box

Page 46: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

11-5

Exact KNN retrieval

Lemma 4: For a rectangular box M and its lower-left and upper-right corner points δ`, δh, ∀p ∈M , zp ∈ [z`, zh], where zp standsfor the z-value of a point p and z`, zh correspond to the z-valuesof δ` and δh respectively.

Mk = 3

approximate kth nn ball

exact kth nn ballapproximatekth nn box

Page 47: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

11-6

Exact KNN retrieval

Corollary 1: Let z` and zh be the z-values of δ` and δhpoints of M(Aχ). For all p ∈ A, zp ∈ [z`, zh].

Mk = 3

approximate kth nn ball

exact kth nn ballapproximatekth nn box

Page 48: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

11-7

Exact KNN retrieval

Corollary 1: Let z` and zh be the z-values of δ` and δhpoints of M(Aχ). For all p ∈ A, zp ∈ [z`, zh].

Mk = 3

approximate kth nn ball

exact kth nn ballapproximatekth nn box

Page 49: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

12-1

Exact KNN retrieval

Let γ` and γh denote the left and right γ-th points close tothe query point, if zγ`

≤ z` and zγh≥ zh in at least one of

the α tables, Aχ = A

Page 50: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

12-2

Exact KNN retrieval

Let γ` and γh denote the left and right γ-th points close tothe query point, if zγ`

≤ z` and zγh≥ zh in at least one of

the α tables, Aχ = A

Page 51: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

13-1

Exact KNN retrieval

If not, we can find A by doing a range query with [zj` , zjh] on any

of the α tables. Ideally, we use the table with smallest [zj` , zjh].

Page 52: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

13-2

Exact KNN retrieval

If not, we can find A by doing a range query with [zj` , zjh] on any

of the α tables. Ideally, we use the table with smallest [zj` , zjh].

Page 53: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

13-3

Exact KNN retrieval

If not, we can find A by doing a range query with [zj` , zjh] on any

of the α tables. Ideally, we use the table with smallest [zj` , zjh].

Page 54: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

13-4

Exact KNN retrieval

If not, we can find A by doing a range query with [zj` , zjh] on any

of the α tables. Ideally, we use the table with smallest [zj` , zjh].

Page 55: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

13-5

Exact KNN retrieval

If not, we can find A by doing a range query with [zj` , zjh] on any

of the α tables. Ideally, we use the table with smallest [zj` , zjh].

Page 56: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

14-1

KNN-join, higher dimensions and updates

Our approach can easily and efficiently support join queries.

Page 57: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

14-2

KNN-join, higher dimensions and updates

Deal with data in any dimension: without changing the frame-work; for large dimensionality (say d > 20), using LSH-basedmethod.

Our approach can easily and efficiently support join queries.

Page 58: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

14-3

KNN-join, higher dimensions and updates

Deal with data in any dimension: without changing the frame-work; for large dimensionality (say d > 20), using LSH-basedmethod.

Updates: for deletion, delete record r based on its pid fromall talbes R0, . . . ,Rα; for insertion, calculate the z-values ofthe point for all randomly shifted versions, insert them intocorresponding tables.

Our approach can easily and efficiently support join queries.

Page 59: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

15-1

Experiment Setup

All algorithms are implemented in Microsoft SQL Server 2005.Experiments are conducted on an Intel Xeon CPU @ 2.33GHz.The memory of the SQL Server is set to 1.5GB.

Page 60: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

15-2

Experiment Setup

All algorithms are implemented in Microsoft SQL Server 2005.Experiments are conducted on an Intel Xeon CPU @ 2.33GHz.The memory of the SQL Server is set to 1.5GB.

Real data sets: points representing the road-networks for statesin USA

Page 61: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

15-3

Experiment Setup

All algorithms are implemented in Microsoft SQL Server 2005.Experiments are conducted on an Intel Xeon CPU @ 2.33GHz.The memory of the SQL Server is set to 1.5GB.

Real data sets: points representing the road-networks for statesin USA

Two synthetic data sets: uniform points and random clusteredpoints.

Page 62: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

15-4

Experiment Setup

All algorithms are implemented in Microsoft SQL Server 2005.Experiments are conducted on an Intel Xeon CPU @ 2.33GHz.The memory of the SQL Server is set to 1.5GB.

Real data sets: points representing the road-networks for statesin USA

Two synthetic data sets: uniform points and random clusteredpoints.

Compare against the Medrank and iDistance algorithms (im-plemented by SQL statement and store precedure).

Page 63: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

16-1

Experiment Setup

The default experimental parameters are summarized below

Symbol Definition Default Valuek number of neighbors 10N size of points set 1,000,000α randomly shifted copies 2γ number of points up and down 2kd dimensionality 2

Page 64: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

17-1

Results for the kNN query: approximation quality

Page 65: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

18-1

Results for the kNN query: approximation quality

Page 66: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

18-2

Results for the kNN query: approximation quality

Page 67: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

18-3

Results for the kNN query: approximation quality

Page 68: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

19-1

Results for the kNN query: running time

Page 69: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

19-2

Results for the kNN query: running time

UN

Page 70: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

19-3

Results for the kNN query: running time

California

Page 71: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

19-4

Results for the kNN query: running time

UN

Page 72: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

19-5

Results for the kNN query: running time

California

Page 73: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

19-6

Results for the kNN query: running time

Page 74: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

20-1

Conclusions

Presented a constant approximation for the kNN query, with loga-rithmic page accesses in any fixed dimension and extended it to theexact solution, both using just O(1) random shifts.

Page 75: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

20-2

Conclusions

Presented a constant approximation for the kNN query, with loga-rithmic page accesses in any fixed dimension and extended it to theexact solution, both using just O(1) random shifts.

All the algorithms can be implemented by SQL operators in rela-tional databases.

Page 76: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

20-3

Conclusions

Presented a constant approximation for the kNN query, with loga-rithmic page accesses in any fixed dimension and extended it to theexact solution, both using just O(1) random shifts.

Our approach naturally supports kNN-Joins.

All the algorithms can be implemented by SQL operators in rela-tional databases.

Page 77: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

20-4

Conclusions

Presented a constant approximation for the kNN query, with loga-rithmic page accesses in any fixed dimension and extended it to theexact solution, both using just O(1) random shifts.

Our approach naturally supports kNN-Joins.

No changes are required for different dimensions, and the up-date is trivial.

All the algorithms can be implemented by SQL operators in rela-tional databases.

Page 78: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

20-5

Conclusions

Presented a constant approximation for the kNN query, with loga-rithmic page accesses in any fixed dimension and extended it to theexact solution, both using just O(1) random shifts.

Our approach naturally supports kNN-Joins.

No changes are required for different dimensions, and the up-date is trivial.

Study other related, interesting queries in this framework, e.g.,the reverse nearest neighbor queries.

Examine the relational algorithms to the data space other thanthe Lp-norms, such as the road networks.

Future research:

All the algorithms can be implemented by SQL operators in rela-tional databases.

Page 79: Large Relational Databases (Almost) for Free K Nearest Neighbor …lifeifei/papers/KNN.pdf · 2011-08-09 · Introduction KNN queries and KNN-Joins: spatial databases, pattern recognition,

21-1

The End

THANK Y OU

Q and A