fast algorithm for nearest neighbor search based on a lower bound tree yong-sheng chen yi-ping hung...
TRANSCRIPT
Fast Algorithm for Nearest Neighbor Search Based on a
Lower Bound Tree
Yong-Sheng Chen Yi-Ping Hung Chiou-Shann Fuh
8th International Conference on Computer Vision
2
OutlineIntroductionMultilevel Structure and LB-TreeAgglomerative ClusteringData TransformationWinner-Update SearchConclusions
3
Nearest Neighbor Search Problem
isi
i pq 1min argˆ
q
ip
Given a fixed set of s points in Rd, P={p1,p2,…,ps}
For a query point q, find in P the point, pî, closest to q
1p
2p3p
sp
D1
D2
Dd
4
Applications of Nearest Neighbor Search
Object (pattern) recognition Image matching
Stereo matchingData compression
Block motion estimationVector quantization
Information retrieval in database systems Image and video databasesDNA sequence databases
5
Space partition methodsk-d tree, 1975R-tree, 1984SS-tree, 1996SR-tree, 1997Pyramid, 1998
Elimination-based methodsBranch and bound, 1975Projection elimination, 1975, 1987Threshold rejection, 1997
Literature Review
6
Central Idea of This WorkWe skip distance calculation for a point in data
base by calculating and comparing its distance lower bound.Distance lower bound was derive from Minkowski’s
inequality.Computational cost of the distance lower bound is l
ess than that of the distance itself.Optimal search is guaranteed.
7
Central Idea of This Work Reduce the number of distance lower bounds
actually calculated by using an LB-tree constructed in the preprocessing stage the winner-update search strategy for traversing the
constructed LB-tree
Tighten distance lower bounds by constructing the LB-tree with an agglomerative
clustering method transforming each data point with
Wavelet transform KL transform (principal component analysis)
8
Proposed Algorithm for NN Search
Sample points in database
data transformation
Query point
data transformation
search by using winner-update strategy
agglomerative clustering
multilevel structure
Lower-bound tree
9
Multilevel Structure of a Point
For a point p=[p1, p2, …, pd] in Rd, d=2L, we denote its multilevel structure: {p0, p1, …, pL}
876543213 ppppppppp
43212 ppppp
211 ppp
10 pp
43212 qqqq q
876543213 qqqqqqqqq
211 qq q
10 q q
Llll ,...,022
,qpqp
2
00 qp
2
11 qp
2
22 qp
2
33 qp
EX
10
Lower Bound TreeMultilevel structure
s of every points in the database, p1,p2,…,ps
Idea of node reductionSelect representati
vesHierarchical, agglo
merative clustering
11
Hierarchical, Agglomerative Clustering
D1
D2
Ex:
r1 r2
12
Lower Bound Tree Representative fo
r each cluster mean point ml
j*
distance rlj* of the
farthest point in the cluster from ml
j*
Each point p*l satisfies
l
j
l
j
lr **
2
* mp
13
Distance Lower Bound Using LB-Tree For each point p*, we can derive the lower bound
of its distance to the query point q:
l
j
ll
jr **
2 qm
2
*
2
* llqpqp
p*l
ql
2
*
2**
l
j
lll
jmpqm
mlj*
rlj*
l
j
ll
j
ll
jrd ***
2),( qmqmLB
14
Data TransformationWhy Data Transformation?
Make anterior dims more discriminative than posterior dims.
Lower bound can be tightened.By data content, two transformations used i
n this work:Haar wavelet transform(autocorrelated data)KL transform (principal component analysis)
(object recognition)
15
Winner-Update Search for LB-Tree Traversal
8 2
3 2
9 63
4 7
Given a query point q
Calculate dLB for all the nodes in level 0Choose the node having the minimum dLB as the temporary winner
While the winner is not at the bottom level
Replace the winner node with its childrenCalculate dLB for each new child nodeChoose the node having the minimum dLB as the temporary winner
Output the final winner
16
Other Query TypesWinner-update algorithm can be easily
extended to support other useful query types:Progressive search for k-nearest neighborsSearch for k-nearest neighbors within a
distance thresholdSearch for neighbors close to the nearest
neighbor
17
Conclusions Fast nearest neighbor search techniques, includin
g Lower bound tree Winner-update search strategy Data transformation
Wavelet transform Principal component analysis
Various useful query types According to our experiments on Nayar’s object re
cognition database, our algorithm can be more than one thousand faster than the full search algorithm.
18
Thank you.