Approximate NN Queries on Streams with Guaranteed Error/Performance Bounds
DESCRIPTION
Approximate NN Queries on Streams with Guaranteed Error/Performance Bounds. Nick Koudas @ AT&T Labs-Research; Beng Chin Ooi, Kian-Lee Tan, Rui Zhang @ National University of Singapore. Problem: kNN search. Environment: data stream (one scan; memory constraint).
TRANSCRIPT
![Page 1: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/1.jpg)
Approximate NN Queries on Streams with Guaranteed Error/Performance Bounds
Nick Koudas @ AT&T Labs-Research
Beng Chin Ooi, Kian-Lee Tan, Rui Zhang @ National University of Singapore
![Page 2: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/2.jpg)
Problem
• Problem: kNN search.
• Environment: data stream (one scan; memory constraint).
• Approximate solution: e-approximate kNN (ekNN).
• Motivation: applications in which absolute error is preferable or more straightforward.
IP: 137.132.48.120, 137.132.48.121, …
![Page 3: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/3.jpg)
• Two optimization problems:
  – Memory optimization for a given error bound: given an error bound e, use as little memory as possible to answer ekNN queries.
  – Error minimization for a given memory size: given a fixed amount of memory, achieve the best accuracy for ekNN queries.
• Requirements:
  – One-scan algorithm.
  – Satisfies the constraints.
  – Efficient updates and query processing.
![Page 4: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/4.jpg)
A Framework
• Divide space into equal square-shaped cells.
• Maintain at most K points in each cell.
• For any k ≤ K, the absolute error of the kNN distance is bounded by $d_M$, the maximum distance within a cell. For Euclidean distance, $d_M = \sqrt{d}/u$, where d is the dimensionality and u is the number of cells each dimension is divided into.
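The bound can be made concrete with a small sketch. It assumes the data space is normalized to the unit hypercube $[0, 1]^d$ (so each cell has side 1/u); the function names `max_cell_distance` and `cell_of` are illustrative and not from the slides.

```python
import math

def max_cell_distance(d: int, u: int) -> float:
    """Euclidean diagonal of a cell with side 1/u in d dimensions: sqrt(d) / u."""
    return math.sqrt(d) / u

def cell_of(point, u: int):
    """Grid coordinates of the cell containing `point`.

    Assumes every coordinate lies in [0, 1]; a coordinate of exactly 1.0
    is clamped into the last cell.
    """
    return tuple(min(int(x * u), u - 1) for x in point)
```

For example, with d = 2 and u = 4 the cell diagonal is $\sqrt{2}/4 \approx 0.354$, which is the worst-case absolute error under this framework.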
![Page 5: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/5.jpg)
Maintenance of the Points: aDaptive Indexing on Streams by space-filling Curves (DISC)
• Cells are not explicitly maintained, only points.
• Cells linearized according to Z-curve.
• Z-value of the cell is the key of a point.
• Points maintained in a B*-tree.
• An efficient merge-cell algorithm is possible.
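One common way to compute a Z-value is to interleave the bits of the cell's grid coordinates. The sketch below assumes that convention (most significant bit first, dimensions in order); the slides do not specify the bit order, and `z_value` is an illustrative name.

```python
def z_value(cell, m: int) -> int:
    """Interleave the m low-order bits of each grid coordinate in `cell`.

    `cell` holds integer coordinates in [0, 2**m). Bits are taken from the
    most significant down, cycling through the dimensions in order.
    """
    z = 0
    for bit in range(m - 1, -1, -1):
        for c in cell:
            z = (z << 1) | ((c >> bit) & 1)
    return z
```

With this bit order, dropping the last d bits of a Z-value (`z >> d`) yields the Z-value of the enclosing cell one order coarser, which is what makes merging cells cheap on a structure sorted by Z-value.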
![Page 6: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/6.jpg)
Algorithm: Build index
• m: the order of the Z-curve, giving $2^m$ cells in each dimension.
• If e is given, set $\sqrt{d}/2^{m_e} = e$, which gives $m_e = \log_2(\sqrt{d}/e)$. $m_e$ must be an integer, so $m_e = \lceil \log_2(\sqrt{d}/e) \rceil$.
• If a memory constraint is given, set a large enough m.
• Build index
  – Initialize m.
  – Read a record P, calculate its Z-value, search the B*-tree and find Nc, the number of existing points in the cell P belongs to.
  – If Nc < K
    • Insert P into the B*-tree.
  – Else
    • Discard one point and insert P.
  – If memory runs out //this only happens for the error minimization problem
    • Merge cells and let m = m-1.
  – Go back to Step 2 (read next record).
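The build loop can be sketched as follows. This is a minimal illustration under several assumptions that are not in the slides: data lies in the unit hypercube, a Python dictionary keyed by Z-value stands in for the B*-tree, the discarded point is the oldest one in the cell, and memory is measured as a point count (`max_points`). The names `order_for_error`, `z_value`, and `StreamIndex` are hypothetical.

```python
import math
from collections import defaultdict

def order_for_error(e: float, d: int) -> int:
    """Smallest integer m with sqrt(d) / 2**m <= e."""
    return max(0, math.ceil(math.log2(math.sqrt(d) / e)))

def z_value(point, m: int) -> int:
    """Z-value of the order-m cell containing `point` (coordinates in [0, 1])."""
    u = 1 << m
    cell = [min(int(x * u), u - 1) for x in point]
    z = 0
    for bit in range(m - 1, -1, -1):
        for c in cell:
            z = (z << 1) | ((c >> bit) & 1)
    return z

class StreamIndex:
    def __init__(self, d: int, m: int, K: int, max_points=None):
        self.d, self.m, self.K = d, m, K
        self.max_points = max_points      # None: no memory constraint
        self.cells = defaultdict(list)    # Z-value -> points (B*-tree stand-in)
        self.size = 0

    def insert(self, p):
        bucket = self.cells[z_value(p, self.m)]
        if len(bucket) >= self.K:
            bucket.pop(0)                 # assumed policy: discard the oldest
            self.size -= 1
        bucket.append(p)
        self.size += 1
        # Memory ran out: coarsen the grid until the points fit (or m hits 0).
        while (self.max_points is not None
               and self.size > self.max_points and self.m > 0):
            self._merge_cells()

    def _merge_cells(self):
        merged = defaultdict(list)
        for z, pts in self.cells.items():
            merged[z >> self.d].extend(pts)   # parent cell one order coarser
        # Keep at most K points per merged cell (assumed: the most recent K).
        self.cells = defaultdict(
            list, {z: pts[-self.K:] for z, pts in merged.items()})
        self.m -= 1
        self.size = sum(len(pts) for pts in self.cells.values())
```

As a usage note, `order_for_error(0.1, 2)` evaluates to 4, since $\sqrt{2}/2^4 \approx 0.088 \le 0.1$ while $\sqrt{2}/2^3 \approx 0.177$ exceeds the bound. The sketch assumes K ≥ 1 and `max_points` ≥ K.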
![Page 7: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/7.jpg)
Algorithm: Merge Cells
• General Merge-Cell
  – Applies to any structure.
  – For each new cell, find all the points of the old cells in it, and merge them.
• Bulk Merge-Cell
  – Applies only to DISC.
  – Scans all the leaf pages once.
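The single-scan idea behind the bulk merge can be sketched on a list of entries sorted by Z-value, which stands in for the leaf pages of the tree read in key order. The sketch assumes the Z-value interleaves coordinate bits so that `z >> d` is the key of the coarser cell, and that the first K points of each merged cell are kept; neither detail is stated in the slides, and `bulk_merge` is an illustrative name.

```python
from itertools import groupby, islice

def bulk_merge(entries, d: int, K: int):
    """Merge cells in one pass over `entries`, a list of (z, point) sorted by z.

    Because the input is sorted, all entries that fall into the same coarser
    cell (same z >> d) are adjacent, so one sequential scan is enough.
    Returns the merged entries, still sorted by their new key.
    """
    merged = []
    for new_z, group in groupby(entries, key=lambda entry: entry[0] >> d):
        merged.extend((new_z, p) for _, p in islice(group, K))
    return merged
```

The general merge would instead look up the old cells covered by each new cell individually, which works on any structure but does not exploit the key ordering.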
![Page 8: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/8.jpg)
Algorithm: kNN search
• W: a window query centered at the center of the cell Q is in, with gradually increasing side length s.
• Find the kNN to Q within W.
  – If the kNN distance is no larger than the distance from Q to the nearest side of W, the search terminates;
  – Else increase s by 1/u.
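The expanding-window search can be sketched as below. The sketch assumes a unit data space, an initial side length of 1/u (the query's own cell), and a plain list scan in place of a window query on the index; it also stops once W covers the whole space, a safeguard the slides do not mention. `knn_search` is an illustrative name.

```python
import math

def knn_search(points, q, k: int, u: int):
    """Return up to k nearest stored points to q using a growing window W."""
    step = 1.0 / u
    # Center of the cell that contains q.
    center = [(min(int(x * u), u - 1) + 0.5) * step for x in q]
    s = step                                   # assumed initial side length
    while True:
        half = s / 2
        lo = [c - half for c in center]
        hi = [c + half for c in center]
        inside = [p for p in points
                  if all(l <= x <= h for l, x, h in zip(lo, p, hi))]
        inside.sort(key=lambda p: math.dist(p, q))
        best = inside[:k]
        # Distance from q to the nearest side of W.
        border = min(min(x - l, h - x) for l, x, h in zip(lo, q, hi))
        covers_all = all(l <= 0.0 and h >= 1.0 for l, h in zip(lo, hi))
        if covers_all or (len(best) == k
                          and math.dist(best[-1], q) <= border):
            return best
        s += step                              # enlarge W and retry
```

The termination test is safe because any point outside W is at least `border` away from q, so it cannot be closer than the current k-th neighbor.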
![Page 9: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/9.jpg)
Experiments
![Page 10: Approximate NN queries on Streams with Guaranteed Error/performance Bounds](https://reader035.vdocuments.mx/reader035/viewer/2022081002/56812cf0550346895d91bc74/html5/thumbnails/10.jpg)
Questions?