answering metric skyline queries by pm-tree

12
Answering Metric Skyline Queries by PM- tree Tomáš Skopal, Jakub Lokoč Department of Software Engineering, FMP, Charles University in Prague

Upload: nerita

Post on 23-Feb-2016

44 views

Category:

Documents


0 download

DESCRIPTION

Answering Metric Skyline Queries by PM-tree. Tomáš Skopal, Jakub Lokoč Department of Software Engineering , FMP, Charles University in Prague. Similarity search. content - based similarity search single- example queries range query kNN query multi - example queries - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Answering Metric Skyline Queries by PM-tree

Answering Metric Skyline Queries by PM-tree

Tomáš Skopal, Jakub LokočDepartment of Software Engineering, FMP,

Charles University in Prague

Page 2: Answering Metric Skyline Queries by PM-tree

Similarity search

• content-based similarity search

• single-example queries– range query– kNN query

• multi-example queries– combination of single-example queries is not sufficient– should support

• partial matching• compromise

– metric skylineDATESO 2010, Štědronín - Plazy

Page 3: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

Metric skyline query (MSQ)• traditional skyline operator

– linearly order-ed attribute domains

– dominance relation• + MDDRs (minimum

dominating-dominatedrectangles)

– static schema• metric skyline

– multi-example query (not just operator)– attributes specified at query time – ith attribute = distance of database object to ith

query example Qi

– result set interpretation: objects similar to all query examples yet distinct (dissimilar to each other)

– dynamic schema, cannot be reduced to the classic skyline operator for efficient skyline processing• i.e., the coordinate system is established at query time

Page 4: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

Generic algorithm for a hierarchic metric index

• branch-and-bound algorithm (originally developed for R-tree and classic/spatial skyline operator)

• dynamic mapping of the metric space into L1 vector space (examples)

• heuristics: data/regions processed in L1 order guarantee no false dismissals– a priority heap is used, storing index entries equipped

by MDDRs to be inspected (higher priority = lower L1 order of MDDR)

The algorithm:0) The entry of the entire index is pushed on the heap (e.g., M-tree root node).1) An entry with the lowest L1 distance of its MDDR is popped from the heap.2) If the entry contains just one data object (e.g., entry in an M-tree leaf), it is added to the

skyline set, while removing all entries from the heap dominated by the entry. Jump to 1.3) If the entry is a region (e.g., entry in an M-tree inner node), its child node is fetched. The

MDDRs of the child node’s entries are checked for dominance by the already determined skyline set, while the dominated ones are filtered from further processing.

4) The MDDRs of the non-filtered child entries are derived, while those not dominated by the current skyline set are pushed into the heap. Jump to 1.

L1 L1

Page 5: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

M-tree• metric index based on B+-tree• inner node contains routing entries

– ball regions (object and radius) + distance to parent region + pointer to subtree

• leaf node contains ground entries– object + distance to parent region

• 2 types of filtering by querying– parent filtering (cheap)

• stored distance to parent is used– basic filtering (expensive)

• distance computation needed

Page 6: Answering Metric Skyline Queries by PM-tree

MSQ implementation using M-tree

DATESO 2010, Štědronín - Plazy

• uses the generic algorithm enhanced by specific M-tree MDDRs, mapping the M-tree regions from metric space into L1 vector space (dimensions are distances of data/regions to the query examples Qi)

• 2 types of M-tree MDDR– Par-MDDR

• the mapped oversized region ball (using the distance to parent)

– B-MDDR• the mapped

region ball

Page 7: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

PM-tree• combination of M-tree and pivot tables (LAESA)• M-tree balls reduced by rings centered in global pivots Pi

• routing and ground entries store also the ring radii• enhanced filtering

– cheaply in pivot space (mapping of data/balls into L∞ vector space)• mapping of the query object into the pivot space is the only extra computation costs

– if not filtered out in pivot space, regular M-tree filtering

Page 8: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

Paper contribution: PM-tree MSQ implementation

• B-MDDR, Par-MDDR (inherited from M-tree) • Piv-MDDR

– using PM-tree rings the MDDR can be tightened – for each dimension (example Qi) the maximal lower bound and minimal

upper bound distance to the region is found (to the rings intersection)• pivot skyline

– skyline initialized by pivots mapped to the L1 space

– heavy optimization(reduction of heap size)

• deferred heap processing– reinsertions into heap to save distance computations

Page 9: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

Experiments• subset of the CoPhIR database, one million 76-dimensional tuples

representing 2 MPEG7 features on flickr images, Euclidean distance used• Polygons database, 250k 30-dimensional tuples representing 5-15 vertex

2D polygons, Hausdorff distance used• average over 200 metric skyline queries• each metric skyline query defined by 2-5 query examples

Page 10: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

Experiments

Page 11: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

Experiments

Page 12: Answering Metric Skyline Queries by PM-tree

DATESO 2010, Štědronín - Plazy

Conclusions

• PM-tree based metric skyline query implementation– up to 2x faster in terms of distance computations and I/O

cost (wrt original M-tree implementation)– up to 20x faster in terms of heap operations– needs up to 20x less space for the heap

Thank you for your attention!

Questions?