Approximate NN Queries on Streams with Guaranteed Error/Performance Bounds

Nick Koudas @ AT&T Labs-Research; Beng Chin Ooi, Kian-Lee Tan, Rui Zhang @ National University of Singapore


Post on 31-Dec-2015



TRANSCRIPT

Page 1: Approximate NN queries on Streams with Guaranteed Error/performance Bounds

Approximate NN queries on Streams with Guaranteed Error/performance Bounds

Nick Koudas @ AT&T Labs-Research

Beng Chin Ooi, Kian-Lee Tan, Rui Zhang @ National University of Singapore

Page 2

Problem

• Problem: kNN search.
• Environment: data stream (one scan; memory constraint).
• Approximate solution: e-approximate kNN (ekNN).
• Motivation: applications in which absolute error is preferable or more straightforward.

IP: 137.132.48.120, 137.132.48.121

Page 3

• Two optimization problems:
– Memory optimization for a given error bound: given an error bound e, use as little memory as possible to answer ekNN queries.
– Error minimization for a given memory size: given a fixed amount of memory, achieve the best accuracy for ekNN queries.
• Requirements:
– One-scan algorithm.
– Satisfies the constraints.
– Efficient updates and query processing.

Page 4

A Framework

• Divide the space into equal square-shaped cells.
• Maintain at most K points in each cell.
• For any k ≤ K, the absolute error of the kNN distance is bounded by $d_M$, the maximum distance within a cell. For Euclidean distance, $d_M = \sqrt{d}/u$, where d is the dimensionality and u is the number of cells each dimension is divided into.
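The bound can be checked with a minimal sketch. It assumes the data space is normalized to the unit hypercube, which the slide does not state but which the formula $\sqrt{d}/u$ implies (cells of side 1/u). The function names `max_cell_distance` and `cell_of` are illustrative, not from the slides.

```python
import math

def max_cell_distance(d: int, u: int) -> float:
    """Diagonal of a cell with side 1/u in d dimensions (Euclidean d_M)."""
    return math.sqrt(d) / u

def cell_of(point, u: int):
    """Grid coordinates of the cell containing a point of the unit hypercube."""
    # min(...) keeps a coordinate of exactly 1.0 inside the last cell.
    return tuple(min(int(x * u), u - 1) for x in point)

# Two points that fall in the same cell are never farther apart than d_M.
p, q = (0.26, 0.51), (0.49, 0.74)
assert cell_of(p, 4) == cell_of(q, 4)
assert math.dist(p, q) <= max_cell_distance(2, 4)
```

Doubling u halves the bound, which is why the grid resolution is the knob that trades memory for error.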

Page 5

Maintenance of the Points: aDaptive Indexing on Streams by space-filling Curves (DISC)

• Cells are not explicitly maintained, only points.

• Cells linearized according to Z-curve.

• Z-value of the cell is the key of a point.

• Points maintained in a B*-tree.

• An efficient merge-cell algorithm is possible.
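To make the linearization concrete, here is a small sketch of computing a cell's Z-value by bit interleaving and using it as the sort key for points. It is an illustration under assumptions: a sorted Python list stands in for the leaf level of the B*-tree, the data space is taken to be the unit hypercube, and the names `z_value` and `insert_point` are hypothetical.

```python
import bisect

def z_value(cell, m: int) -> int:
    """Interleave the m low bits of each cell coordinate (Z-order / Morton key)."""
    d = len(cell)
    z = 0
    for bit in range(m):
        for dim, c in enumerate(cell):
            z |= ((c >> bit) & 1) << (bit * d + dim)
    return z

def insert_point(index, point, m: int) -> None:
    """Insert (z, point) into a list kept sorted by Z-value.

    The list is only a stand-in for the B*-tree; no cell object is stored,
    the cell is implied by the key.
    """
    u = 1 << m  # 2**m cells per dimension
    cell = tuple(min(int(x * u), u - 1) for x in point)
    bisect.insort(index, (z_value(cell, m), point))

index = []
for p in [(0.9, 0.9), (0.1, 0.1), (0.3, 0.6)]:
    insert_point(index, p, m=2)
print(index)  # points appear in Z-curve order of their cells
```

Because points of one cell share a key, they sit next to each other in key order, which is what makes per-cell counting and merging cheap.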

Page 6

Algorithm: Build index

• m: the order of the Z-curve; there are $2^m$ cells in each dimension.
• If e is given, we require $\sqrt{d}/2^{m_e} \le e$, so we get $m_e \ge \log_2(\sqrt{d}/e)$. $m_e$ is an integer, so $m_e = \lceil \log_2(\sqrt{d}/e) \rceil$.
• If a memory constraint is given, set a large enough m.
• Build index
– Initialize m.
– Read a record P, calculate its Z-value, search the B*-tree and find Nc, the number of existing points in the cell P belongs to.
– If Nc < K
• Insert P into the B*-tree.
– Else
• Discard one point and insert P.
– If memory runs out (this only happens for the error minimization problem)
• Merge cells and let m = m - 1.
– Go back to Step 2 (read the next record).
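The steps above can be sketched as follows for the memory-optimization case (error bound e given). This is a simplified illustration, not the authors' implementation: a dictionary keyed by cell coordinates stands in for the B*-tree keyed by Z-value, the unit hypercube is assumed as the data space, and the slide does not say which point is discarded from a full cell, so evicting the oldest one is an assumption. The memory-exhaustion branch (merge cells, m = m - 1) is left out.

```python
import math
from collections import defaultdict

def order_for_error(d: int, e: float) -> int:
    """Smallest integer m with sqrt(d) / 2**m <= e."""
    return max(0, math.ceil(math.log2(math.sqrt(d) / e)))

def build_index(stream, m: int, K: int):
    """One scan over the stream, keeping at most K points per cell."""
    u = 1 << m                      # 2**m cells per dimension
    cells = defaultdict(list)       # stand-in for the B*-tree
    for p in stream:
        key = tuple(min(int(x * u), u - 1) for x in p)
        bucket = cells[key]
        if len(bucket) < K:         # Nc < K: just insert P
            bucket.append(p)
        else:                       # cell is full: discard one, insert P
            bucket.pop(0)           # assumption: evict the oldest point
            bucket.append(p)
    return cells

m = order_for_error(d=2, e=0.1)     # sqrt(2)/2**4 is about 0.088 <= 0.1
cells = build_index([(0.1, 0.1), (0.11, 0.12), (0.9, 0.9)], m, K=1)
print(m, dict(cells))
```

Rounding up to the next integer order is what guarantees the bound; the price is that the actual error bound is usually somewhat tighter than the requested e.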

Page 7

Algorithm: Merge Cells

• General Merge-Cell
– Applies to any structure.
– For each new cell, find all the points of the old cells in it, and merge them.

• Bulk Merge-Cell
– Applies only to DISC.
– Scans all the leaf pages once.
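One way to see why a single scan can suffice for DISC: with bit-interleaved Z-values, lowering the order from m to m - 1 maps each key z to z >> d, and this mapping preserves sort order, so the 2^d old cells that form one new cell are contiguous in the leaves. The sketch below illustrates that idea only. It assumes a sorted list of (z, point) pairs in place of the leaf pages, keeps the first K points of each merged cell (the slides do not say which points survive), and uses the hypothetical name `bulk_merge`.

```python
from itertools import groupby

def bulk_merge(entries, d: int, K: int):
    """Merge cells from order m to order m-1 in one sequential pass.

    entries: list of (z, point) sorted by z at order m.
    Returns a list sorted by the new key with at most K points per cell.
    """
    merged = []
    # z >> d drops the lowest bit of every coordinate, i.e. the key of
    # the enclosing cell one level up; sorted input keeps groups contiguous.
    for new_z, group in groupby(entries, key=lambda entry: entry[0] >> d):
        kept = [p for _, p in group][:K]   # assumption: keep the first K
        merged.extend((new_z, p) for p in kept)
    return merged

leaves = [(0, "a"), (1, "b"), (3, "c"), (4, "d"), (9, "e")]
print(bulk_merge(leaves, d=2, K=2))
```

A general merge, by contrast, has to look up the old cells of each new cell separately because it cannot rely on this ordering.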

Page 8

Algorithm: kNN search

• W: a window query centered at the center of the cell that Q is in, with gradually increasing side length s.
• Find the kNN to Q within W.
– If the kNN distance is no larger than the distance between Q and the nearest side of W, the search terminates.
– Else increase s by 1/u.
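The expanding-window search can be sketched as below. Several things are assumptions made for illustration: a plain list of stored points is filtered by brute force where the real structure would run a window query on the index, the data space is the unit hypercube, the initial side length is taken to be 1/u, and a stop condition is added for the case where W already covers the whole space. The name `knn_search` is hypothetical.

```python
import heapq
import math

def knn_search(points, q, k: int, u: int):
    """Return up to k stored points nearest to q, nearest first."""
    step = 1.0 / u
    # Center of the cell that contains q.
    center = [(min(int(x * u), u - 1) + 0.5) * step for x in q]
    s = step                                   # assumed initial side length
    while True:
        lo = [c - s / 2 for c in center]
        hi = [c + s / 2 for c in center]
        inside = [p for p in points
                  if all(l <= x <= h for l, x, h in zip(lo, p, hi))]
        best = heapq.nsmallest(k, inside, key=lambda p: math.dist(p, q))
        # Distance from q to the nearest side of W: every point outside W
        # is at least this far from q.
        margin = min(min(x - l, h - x) for l, x, h in zip(lo, q, hi))
        if len(best) == k and math.dist(best[-1], q) <= margin:
            return best
        if all(l <= 0.0 and h >= 1.0 for l, h in zip(lo, hi)):
            return best                        # W covers the whole space
        s += step                              # grow the window by 1/u

pts = [(0.1, 0.1), (0.5, 0.5), (0.52, 0.5), (0.9, 0.9)]
print(knn_search(pts, (0.505, 0.51), k=2, u=4))
```

The termination test is what makes the answer exact with respect to the stored points; the approximation error comes only from the points that were discarded while building the index.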

Page 9

Experiments

Page 10

Questions?