Chap 2 – Dynamic Versions of R-trees


Upload: hendry-chen

Post on 11-May-2015


Chap 2 – Dynamic Versions of R-trees
R-Trees: Theory and Applications

Advisor: Kun-Ta Chuang
Student: Bo-Heng Chen

Abstract

• Introduction
• Assorted R-tree Variants
• Node Splitting
• Branch Grafting
• Compact R-trees & cR-trees
• Deviating Variations
• Summary

Introduction

• Mentor
  – Volker Gaede
    • http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/g/Gaede:Volker.html
  – Oliver Günther, U. Potsdam
    • http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/g/G=uuml=nther:Oliver.html

(cont.)

• Dynamic versions of the R-tree
  – The objects are inserted on a one-by-one basis
• Focus on how the assorted R-tree variants handle
  – dynamic insertions
  – dynamic splits

R+-tree

• The original R-tree has two important disadvantages
  – The execution of a point location query in an R-tree may lead to the investigation of several paths from the root to the leaf level
    • This characteristic may lead to performance deterioration, specifically when the overlap of the MBRs is significant
  – A few large rectangles may increase the degree of overlap significantly
    • This leads to performance degradation during range query execution, due to empty space
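
The first disadvantage can be illustrated with a minimal sketch. The `Node` class, the `(xmin, ymin, xmax, ymax)` rectangle layout, and the sample data are assumptions made for this example, not part of the original slides. Because the MBRs of the two leaves overlap, a point query has to descend into both of them.

```python
from dataclasses import dataclass, field

def contains(r, p):
    """True if rectangle r = (xmin, ymin, xmax, ymax) contains point p = (x, y)."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

@dataclass
class Node:
    mbr: tuple
    children: list = field(default_factory=list)  # internal node: child Nodes
    objects: list = field(default_factory=list)   # leaf node: (mbr, oid) pairs

def point_query(node, p, visited):
    """Return the oids whose MBR contains p and record every visited node."""
    visited.append(node.mbr)
    hits = [oid for mbr, oid in node.objects if contains(mbr, p)]
    for child in node.children:
        if contains(child.mbr, p):  # overlapping siblings may both qualify
            hits += point_query(child, p, visited)
    return hits

# Two leaves whose MBRs overlap in the square (4, 4)-(6, 6)
leaf_a = Node(mbr=(0, 0, 6, 6), objects=[((1, 1, 2, 2), "a")])
leaf_b = Node(mbr=(4, 4, 10, 10), objects=[((4, 4, 6, 6), "b")])
root = Node(mbr=(0, 0, 10, 10), children=[leaf_a, leaf_b])

visited = []
result = point_query(root, (5, 5), visited)
```

Here the query point lies in the overlap region, so both leaves are read although only one of them holds a matching object.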

(cont.)

• The R+-tree was proposed as a structure that
  – avoids visiting multiple paths during point location queries
    • Thus, the retrieval performance could be improved
  – avoids MBR overlapping of internal nodes
    • R+-trees do not allow overlapping of MBRs at the same tree level
    • To achieve this, a specific object's entries may be duplicated and redundantly stored in several nodes

(cont.)

Object d is stored in two leaf nodes, B and C.
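
The duplication can be sketched as follows. The leaf regions and object rectangles are made-up values chosen so that the outcome mirrors the statement above; the helper name `intersects` is also an assumption of this sketch.

```python
def intersects(a, b):
    """True if rectangles a and b share interior area (touching borders do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

# Hypothetical disjoint leaf regions, as the R+-tree requires at one level
leaf_regions = {"B": (0, 0, 5, 10), "C": (5, 0, 10, 10)}

# Hypothetical object MBRs; d straddles the border between B and C
objects = {"c": (1, 1, 3, 3), "d": (4, 4, 7, 6), "e": (6, 7, 9, 9)}

# Every leaf whose region intersects an object receives a copy of its entry
leaves = {name: [] for name in leaf_regions}
for oid, mbr in objects.items():
    for name, region in leaf_regions.items():
        if intersects(mbr, region):
            leaves[name].append(oid)
```

The price of non-overlapping regions is that d now occupies space in both leaves, and both copies must be maintained on updates.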

(cont.)

• R+-tree – Insert
  – [Figure: inserting an object m; panels "Scenario 1" and "Scenario 2"]
  – Call Insert(m, A.ptr)
  – Call Insert(m, D.ptr)
  – Call SplitNode(B)
  – Perform appropriate tree reorganization to reflect changes

(cont.)

• R+-tree – Delete
  – All copies of an object's MBR must be removed from the corresponding leaf nodes
  – [Figure: deleting an object m stored under entry A and entry D]
  – Call Delete(m, A.ptr) and Delete(m, D.ptr)
  – Remove m from A.ptr and from D.ptr
  – Calculate the new MBR of the node
  – Adjust the MBR of the parent node accordingly

(cont.)

• Main difference between the R+-tree splitting algorithm and that of the R-tree
  – In the R-tree, upward propagation is sufficient to guarantee the structure's integrity
  – In the R+-tree, downward propagation may be necessary, in addition to the upward propagation

R*-tree

• As already discussed, the R-tree is based solely on the area minimization of each MBR
• The criteria considered by the R*-tree are the following
  – Minimization of the area covered by each MBR
  – Minimization of overlap between MBRs
  – Minimization of MBR margins (perimeters)
  – Maximization of storage utilization
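
The three geometric criteria can be written down as small helper functions. This is a sketch assuming two-dimensional rectangles stored as `(xmin, ymin, xmax, ymax)` tuples; the function names are chosen for this example only.

```python
def area(r):
    """Area covered by rectangle r."""
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r):
    """Margin (perimeter) of rectangle r."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap(a, b):
    """Area of the intersection of a and b, or 0 if they are disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

# Same area, very different margins: the margin criterion favours the square
square = (0, 0, 4, 4)
strip = (0, 0, 16, 1)
```

The comparison of `square` and `strip` shows why area alone is not enough: both cover 16 units, but the elongated strip has more than twice the margin.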

(cont.)

• R*-tree – Insert
  – For the insertion of a new entry, we have to decide which branch to follow at each level of the tree
    • This algorithm is called ChooseSubtree
  – At internal levels, ChooseSubtree chooses the entry whose MBR needs the least area enlargement to cover the new entry k
  – For the leaf nodes, ChooseSubtree instead considers the overlap minimization criterion, motivated by the experimental results in [19]

[19] N. Beckmann, H.P. Kriegel, R. Schneider and B. Seeger: "The R*-tree: an Efficient and Robust Access Method for Points and Rectangles", Proceedings ACM SIGMOD Conference on Management of Data, pp. 322-331, Atlantic City, NJ, 1990.
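
A minimal sketch of the two ChooseSubtree criteria follows. It assumes rectangles as `(xmin, ymin, xmax, ymax)` tuples and a flat list of sibling MBRs; the function name, the `children_are_leaves` flag, the tie-breaking order, and the sample data are assumptions of this sketch rather than the exact formulation in [19].

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def choose_subtree_entry(entries, k, children_are_leaves):
    """Return the index of the entry that should receive the new rectangle k."""
    def area_cost(i):
        # least area enlargement, ties broken by smallest area
        return (area(union(entries[i], k)) - area(entries[i]), area(entries[i]))

    def overlap_cost(i):
        # least increase of overlap with the sibling entries, then area_cost
        grown = union(entries[i], k)
        others = [e for j, e in enumerate(entries) if j != i]
        before = sum(overlap(entries[i], e) for e in others)
        after = sum(overlap(grown, e) for e in others)
        return (after - before,) + area_cost(i)

    cost = overlap_cost if children_are_leaves else area_cost
    return min(range(len(entries)), key=cost)

entries = [(0, 0, 4, 4), (3, 5, 20, 6)]
k = (1, 5, 2, 9)
by_area = choose_subtree_entry(entries, k, children_are_leaves=False)
by_overlap = choose_subtree_entry(entries, k, children_are_leaves=True)
```

In this example the two criteria disagree: entry 0 needs less area enlargement, but growing it would create overlap with entry 1, so the overlap criterion picks entry 1.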

(cont.)

• R*-tree – Insert
  – In case ChooseSubtree selects a leaf that cannot accommodate the new entry, the R*-tree does not immediately resort to node splitting
  – Example: assume that N1 has overflowed
    • Entry b is selected for reinsertion, as its centroid is the farthest from the centroid of N1
  – Reinsertion is a costly operation. Therefore, only one application of reinsertion is permitted for each level of the tree
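
The selection of reinsertion candidates can be sketched as below. The rectangle layout, the entry coordinates, and the `fraction` parameter (how large a share of the entries is removed) are assumptions of this sketch; only the "farthest centroid" rule comes from the slide.

```python
import math

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def centroid(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_for_reinsertion(entries, fraction=0.3):
    """Return the names of the entries whose centroids lie farthest from the node's centroid."""
    node_center = centroid(mbr_of(list(entries.values())))
    by_distance = sorted(
        entries,
        key=lambda name: math.dist(centroid(entries[name]), node_center),
        reverse=True,
    )
    count = max(1, int(fraction * len(entries)))  # assumed share of entries to reinsert
    return by_distance[:count]

# Hypothetical overflowing node N1; b is the outlier
n1 = {"a": (0, 0, 2, 2), "b": (9, 9, 10, 10), "c": (4, 0, 6, 2), "d": (8, 0, 10, 2)}
to_reinsert = pick_for_reinsertion(n1)
```

Removing the outlier shrinks the MBR of N1, and reinserting it from the root gives it a chance to land in a better-fitting node.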

(cont.)

• When overflow cannot be handled by reinsertion, node splitting is performed
  – The split considers every division of the sorted list of entries that ensures that each node is at least 40% full
  – The final division is the one that has the minimum overlap between the MBRs of the resulting nodes
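
A simplified sketch of this step is shown below. It sorts along the x axis only and breaks ties by total area; the choice of the split axis and the additional sort orders of the full algorithm are deliberately left out, and all names and sample data are assumptions of this sketch.

```python
import math

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def split_sorted(entries, min_fill=0.4):
    """Split an overflowing node (M+1 entries) along x at the division with the least overlap."""
    capacity = len(entries) - 1
    m = max(1, math.ceil(min_fill * capacity))    # minimum entries per resulting node
    ordered = sorted(entries, key=lambda r: (r[0], r[2]))
    best = None
    for cut in range(m, len(ordered) - m + 1):    # every division that respects min_fill
        g1, g2 = ordered[:cut], ordered[cut:]
        cost = (overlap(mbr_of(g1), mbr_of(g2)), area(mbr_of(g1)) + area(mbr_of(g2)))
        if best is None or cost < best[0]:
            best = (cost, g1, g2)
    return best[1], best[2]

entries = [(6, 0, 7, 1), (0, 0, 1, 1), (7, 1, 8, 2), (1, 1, 2, 2), (6, 1, 7, 2)]
group1, group2 = split_sorted(entries)
```

With five entries and a capacity of four, only the divisions 2/3 and 3/2 are allowed, and the first one wins because its two MBRs do not overlap.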

(cont.)

• R*-tree – Delete
  – Deletion in the R*-tree is performed with the deletion algorithm of the original R-tree

The Hilbert R-tree

• Hybrid structure based on the R-tree and the B+-tree
• An entry e of an internal node is a triplet (mbr, H, p)
  – mbr is the MBR that encloses all the objects in the corresponding subtree
  – H is the maximum Hilbert value of the subtree
  – p is the pointer to the next level
• Entries in leaf nodes are exactly the same as in R-trees, R+-trees, and R*-trees and are of the form (mbr, oid)
  – mbr is the MBR of the object
  – oid is the corresponding object identifier
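
The triplet can be sketched in code as follows. The sketch assumes an n x n integer grid (n a power of two), takes the Hilbert value of an object to be that of the center of its MBR, and uses a standard bit-manipulation routine for the curve; the function names and the `"leaf-1"` pointer are placeholders.

```python
def hilbert_value(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve filling an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def internal_entry(leaf_entries, p, n=4):
    """Build the triplet (mbr, H, p) for a leaf holding (mbr, oid) entries."""
    rects = [mbr for mbr, _ in leaf_entries]
    mbr = (min(r[0] for r in rects), min(r[1] for r in rects),
           max(r[2] for r in rects), max(r[3] for r in rects))
    h = max(hilbert_value(n, (r[0] + r[2]) // 2, (r[1] + r[3]) // 2) for r in rects)
    return (mbr, h, p)

leaf = [((0, 0, 2, 2), "a"), ((2, 2, 4, 4), "b")]
entry = internal_entry(leaf, "leaf-1")
```

Sorting on the H component is what lets the structure borrow the ordered-node behaviour of the B+-tree while the mbr component still supports spatial pruning.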

(cont.)

• Hilbert R-tree – Insert

(cont.)

• An important characteristic of the Hilbert R-tree that is missing from other variants
  – There exists an order of the nodes at each tree level, respecting the Hilbert order of the MBRs
• Instead of splitting a node immediately after its capacity has been exceeded, we try to store some entries in sibling nodes
  – A split takes place only if all siblings are also full
  – This unique property of the Hilbert R-tree helps considerably in storage utilization increase, and avoids unnecessary split operations
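
The deferred split can be sketched on a single tree level. In this sketch each node is reduced to a sorted list of Hilbert values, `s` is the assumed number of cooperating siblings (including the overflowing node), and the function name is hypothetical.

```python
def handle_overflow(level, i, capacity, s=2):
    """Resolve the overflow of node i by redistribution, splitting only if all s siblings are full.

    level is a list of nodes in Hilbert order; each node is a sorted list of Hilbert values.
    """
    lo = max(0, min(i, len(level) - s))     # window of cooperating siblings around node i
    hi = min(len(level), lo + s)
    pooled = sorted(h for node in level[lo:hi] for h in node)
    parts = hi - lo
    if len(pooled) > parts * capacity:      # every cooperating sibling is full: split
        parts += 1
    base, extra = divmod(len(pooled), parts)
    redistributed, start = [], 0
    for j in range(parts):                  # spread the entries evenly, keeping Hilbert order
        size = base + (1 if j < extra else 0)
        redistributed.append(pooled[start:start + size])
        start += size
    return level[:lo] + redistributed + level[hi:]

# Capacity 3: the sibling has room, so no new node is created
no_split = handle_overflow([[1, 2, 3, 4], [7, 9]], 0, capacity=3)
# Capacity 3: the sibling is full too, so two nodes become three
with_split = handle_overflow([[1, 2, 3, 4], [7, 8, 9]], 0, capacity=3)
```

In the second call two full nodes turn into three partially filled ones, which is why utilization stays higher than with an immediate one-to-two split.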

Linear Node Splitting

• Criteria of this algorithm
  – distribute the objects between the two nodes as evenly as possible
  – minimize the overlap between them
  – minimize the total coverage
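
One way to apply the three criteria in order is sketched below. The assignment rule (each rectangle goes to the node border it lies closer to, once per axis) is an assumption about how the candidate splits are generated; the slide itself only states the criteria. Rectangle layout and names are likewise assumptions.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def linear_split(rects):
    """Pick the x-axis or y-axis split by evenness, then overlap, then total coverage."""
    xmin, ymin, xmax, ymax = mbr_of(rects)
    left, right, bottom, top = [], [], [], []
    for r in rects:  # one pass over the entries, hence linear time
        (left if r[0] - xmin < xmax - r[2] else right).append(r)
        (bottom if r[1] - ymin < ymax - r[3] else top).append(r)

    def cost(pair):
        g1, g2 = pair
        if not g1 or not g2:                 # degenerate split: rank it last
            return (len(rects), 0, 0)
        a, b = mbr_of(g1), mbr_of(g2)
        return (max(len(g1), len(g2)), overlap(a, b), area(a) + area(b))

    return min([(left, right), (bottom, top)], key=cost)

rects = [(0, 0, 1, 4), (0, 5, 1, 9), (9, 0, 10, 4), (9, 5, 10, 9)]
group1, group2 = linear_split(rects)
```

In the example both candidate splits are equally even and overlap-free, so the decision falls to the third criterion: the left/right split covers 18 units in total, the bottom/top split 80.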

Optimal Node Splitting

• Three node splitting algorithms were proposed by Guttman to handle a node overflow
  – linear algorithm
    • More time efficient, but fails to determine an optimal rectangle bipartition
  – exponential algorithm
    • Achieves the optimal bipartitioning of the rectangles, at the expense of increased splitting cost
  – quadratic algorithm
    • The best compromise between efficiency and bipartition optimality
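
The cost of the exponential algorithm is easy to see in a brute-force sketch. It enumerates every bipartition that respects a minimum fill and keeps the one with the smallest total MBR area; the rectangle layout, the function name, and the `min_fill` parameter are assumptions of this sketch.

```python
from itertools import combinations

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def exhaustive_split(rects, min_fill):
    """Try every bipartition with at least min_fill entries per side; keep the smallest total area."""
    n = len(rects)
    best = None
    for size in range(min_fill, n - min_fill + 1):
        for combo in combinations(range(n), size):
            if 0 not in combo:               # fix entry 0 in group 1 to skip mirrored duplicates
                continue
            g1 = [rects[i] for i in combo]
            g2 = [rects[i] for i in range(n) if i not in combo]
            cost = area(mbr_of(g1)) + area(mbr_of(g2))
            if best is None or cost < best[0]:
                best = (cost, g1, g2)
    return best[1], best[2]

rects = [(0, 0, 1, 1), (5, 5, 6, 6), (1, 1, 2, 2), (6, 6, 7, 7)]
group1, group2 = exhaustive_split(rects, min_fill=2)
```

The number of candidate bipartitions grows exponentially with the node capacity, which is the splitting cost the slide refers to.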

Branch Grafting

• Objectives
  – achieve better-shaped R-trees and reduce the total number of nodes
    • Both these factors can improve performance during query processing

(cont.)

• Example
  – [Figure: branch grafting example; figure labels H and C]

Compact R-trees

• Motivation
  – R-trees, R+-trees, and R*-trees suffer from the storage utilization problem, which is around 70% in the average case
  – Therefore, the authors improve the insertion mechanism of R-trees to obtain a more compact R-tree structure, with no penalty on performance during queries

(cont.)

• Among the M+1 entries of an overflowing node during insertions, a set of M entries is selected to remain in this node, under the constraint that the resulting MBR is the minimum possible
• Then the remaining entry is inserted to a sibling that
  – has available space
  – has an MBR that is enlarged as little as possible
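
The heuristic can be sketched as follows. The sketch measures "minimum possible MBR" by area, assumes non-empty siblings, and returns `None` for the sibling index when no sibling has room (in which case a split would be needed); names and sample data are assumptions.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def compact_overflow(entries, siblings, capacity):
    """Keep the M entries with the smallest MBR; choose a sibling for the remaining one.

    Returns (kept, moved, index of the chosen sibling or None).
    """
    # Removing which single entry leaves the smallest MBR behind?
    evict = min(range(len(entries)),
                key=lambda i: area(mbr_of(entries[:i] + entries[i + 1:])))
    moved = entries[evict]
    kept = entries[:evict] + entries[evict + 1:]

    candidates = [j for j, s in enumerate(siblings) if len(s) < capacity]
    if not candidates:
        return kept, moved, None             # no room anywhere: the caller has to split
    target = min(candidates,
                 key=lambda j: area(mbr_of(siblings[j] + [moved])) - area(mbr_of(siblings[j])))
    return kept, moved, target

entries = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (6, 0, 7, 1)]   # M + 1 = 4 entries
siblings = [[(8, 0, 9, 1), (9, 1, 10, 2)], [(0, 8, 1, 9)]]
kept, moved, target = compact_overflow(entries, siblings, capacity=3)
```

Because an overflow usually ends with one entry moving to a sibling instead of a new node being created, nodes fill up much further before a split becomes unavoidable.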

(cont.)

• Performance evaluation results have shown that the storage utilization of the new heuristic is between 97% and 99%

• A direct impact of the storage utilization improvement is the fact that fewer tree nodes are required to index a given dataset

• Moreover, less time is required to build the tree by individual insertions, because of the reduced number of split operations required

cR-trees

The empirical studies provided in the paper illustrate that the query performance of the cR-tree was competitive with that of the R*-tree and much better than that of the R-tree

Deviating Variations

• Sphere-tree
  – uses minimum bounding spheres instead of MBRs
• Cell-tree
  – uses minimum bounding polygons designed to accommodate arbitrary shape objects
• P-tree (Polyhedral tree)
  – uses minimum bounding polygons instead of MBRs
• QR-tree
  – hybrid access method composed of a Quadtree and a forest of R-trees

PR-trees

• A provably asymptotically optimal variation of the R-tree
• Height-balanced tree
  – i.e., all its leaves are at the same level
• Query performance
  – real data
    • PR-trees perform similarly to existing R-tree variants
  – extreme data (very skewed data)
    • PR-trees outperform all other variants, due to their guaranteed worst-case performance

LR-trees

• The LR-tree is an index structure based on the logarithmic dynamization method
• Example
  – base B=2, capacity c=4
  – 11 items

(cont.)

• 11 = 1011₂
• 12 = 1100₂
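
The binary representations hint at how the logarithmic method organizes the data. The sketch below shows the generic method with base 2 only: block i is either empty or holds 2^i items, and an insertion merges blocks like a carry in a binary counter. It does not model the capacity c or the internal layout of an actual LR-tree, and the function name is an assumption.

```python
def insert(blocks, item):
    """Insert one item; blocks[i] is either empty or holds exactly 2**i items."""
    carry = [item]
    i = 0
    while i < len(blocks) and blocks[i]:   # merge full blocks upward, like a binary carry
        carry += blocks[i]
        blocks[i] = []
        i += 1
    if i == len(blocks):
        blocks.append([])
    blocks[i] = carry                      # in a real index this block would be rebuilt in bulk

blocks = []
for item in range(1, 12):                  # 11 items
    insert(blocks, item)
sizes_11 = [len(b) for b in reversed(blocks)]

insert(blocks, 12)                         # the 12th item triggers two merges
sizes_12 = [len(b) for b in reversed(blocks)]
```

Read from the largest block down, the block sizes spell out the binary digits of the item count: 8, 0, 2, 1 for 11 items and 8, 4, 0, 0 for 12.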

Summary

• Evidently, the original R-tree, proposed by Guttman, has influenced all the subsequent variations of dynamic R-tree structures
• The R*-tree followed an engineering approach and evaluated several factors that affect the performance of the R-tree
  – it is considered the most robust variant and has found numerous applications, in both research and commercial systems

(cont.)

• The empirical study has shown that the Hilbert R-tree can perform better than the other variants in some cases
• The PR-tree is the first approach that offers guaranteed worst-case performance and overcomes the degenerated cases when almost the entire tree has to be traversed
  – Despite its more complex building algorithm, it has to be considered the best variant reported so far

English

• In this chapter, we are further focusing on the family of R-trees by enlightening the similarities and differences, advantages and disadvantages of the variations in a more exhaustive manner. (P.15)

• We presented dynamic versions of the R-tree, where the objects are inserted on a one-by-one basis, as opposed to the case where a special packing technique can be applied to insert an a priori known static set of objects into the structure by optimizing the storage overhead and retrieval performance. (P.15)

(cont.)

• As already discussed, the R-tree is based solely on the area minimization of each MBR.(P.18)

• The Hilbert R-tree [105] is a hybrid structure based on the R-tree and the B+-tree (P.20)

• Instead of splitting a node immediately after its capacity has been exceeded, we try to store some entries in sibling nodes(P.22)

(cont.)

• Therefore, the best compromise between efficiency and bipartition optimality is the quadratic algorithm.(P.24)

• In particular, the objects of an overflowing node are optimally separated in two sets.

• The motivation behind the proposed approach is that R-trees, R+-trees, and R*-trees suffer from the storage utilization problem

(cont.)

• It is worth mentioning that the PR-tree, although a variant that deviates from other existing ones, is the first approach that offers guaranteed worst-case performance and overcomes the degenerated cases when almost the entire tree has to be traversed.(P.34)

(cont.)

• Therefore, despite its more complex building algorithm, it has to be considered the best variant reported so far.