R-Trees: A Dynamic Index Structure for Spatial Data Antonin Guttman

Post on 05-Jan-2016

Page 1: R-Trees: A Dynamic Index Structure for Spatial Data

R-Trees: A Dynamic Index Structure for Spatial Data

Antonin Guttman

Page 2: R-Trees: A Dynamic Index Structure for Spatial Data

R-Tree: Why, What … ?

Why do we need R-Trees?
What are R-Trees?
How do I perform operations?
Alternatives? Why not a B+ tree?

Page 3: R-Trees: A Dynamic Index Structure for Spatial Data

Properties of R-Trees

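Height balanced: all leaves are on the same level
2 types of nodes: internal nodes and leaves
Leaves point to disk pages
Records in the leaves point to actual data objects
For a max capacity of M entries per node, the min occupancy m is at most M/2
Completely dynamic: inserts and deletes can be mixed with searches, with no periodic reorganization
Guaranteed fan-out of at least m (M/2 when m = M/2)
Every leaf record holds the smallest bounding box of its data object
Root has at least two children unless it is a leaf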
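The properties above can be turned into a small consistency check. This is only a sketch under assumptions that are not in the slides: a node is a plain dict with a hypothetical `"leaf"` flag and an `"entries"` list of `(rectangle, payload)` pairs, and `check` is an illustrative name.

```python
def check(node, m, M, is_root=True):
    """Return the height of the subtree; raise AssertionError if a property fails."""
    n = len(node["entries"])
    assert n <= M, "node holds more than M entries"
    if is_root:
        # The root is exempt from the minimum, but needs two children unless it is a leaf.
        assert node["leaf"] or n >= 2, "non-leaf root needs at least two children"
    else:
        assert n >= m, "node holds fewer than m entries"
    if node["leaf"]:
        return 1
    # Height balance: every child subtree must have the same height.
    heights = {check(child, m, M, False) for _, child in node["entries"]}
    assert len(heights) == 1, "leaves are not all on the same level"
    return heights.pop() + 1


leaf_a = {"leaf": True, "entries": [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")]}
leaf_b = {"leaf": True, "entries": [((5, 5, 6, 6), "c"), ((7, 7, 9, 9), "d")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf_a), ((5, 5, 9, 9), leaf_b)]}
height = check(root, m=2, M=4)
```

The check covers occupancy and balance only; it does not verify that each rectangle is the tightest one possible.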

Page 4: R-Trees: A Dynamic Index Structure for Spatial Data

R-Trees: The Structure.

Internal nodes: (rectangle, child pointer)
– Rectangle is N-dimensional.
– It covers all rectangles contained in the child node that the pointer refers to.

Leaf nodes: (MBR, tuple-identifier)
– MBR is the minimum bounding rectangle of the data object.
– Tuple-identifier is a pointer to the data object.
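The two entry layouts might be modelled as follows. This is a minimal sketch, not the paper's implementation: it is restricted to two dimensions, and the names `Rect`, `Entry`, `Node` and `mbr` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), 2-D case only


@dataclass
class Entry:
    rect: Rect                       # MBR of the child node or of the data object
    child: Optional["Node"] = None   # set in internal-node entries
    tuple_id: Optional[int] = None   # set in leaf entries


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def mbr(rects):
    """Smallest rectangle enclosing all the given rectangles."""
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))


leaf = Node(is_leaf=True, entries=[Entry((0, 0, 1, 1), tuple_id=7),
                                   Entry((2, 1, 3, 4), tuple_id=8)])
# The internal entry's rectangle is the MBR of everything in the child node.
root = Node(is_leaf=False,
            entries=[Entry(mbr([e.rect for e in leaf.entries]), child=leaf)])
```

A single-child root is used here only to keep the example short.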

Page 5: R-Trees: A Dynamic Index Structure for Spatial Data

R-tree of order 4

Page 6: R-Trees: A Dynamic Index Structure for Spatial Data

Example

[Figure: tree diagram with entries a through l on one level and m, n, o, p on the other]

Page 7: R-Trees: A Dynamic Index Structure for Spatial Data

Example

[Figure: rectangles a, b, c, d and m, with the tree diagram of entries a through l and m, n, o, p]

Page 8: R-Trees: A Dynamic Index Structure for Spatial Data

Example

[Figure: rectangles a, b, c, d, m and e, f, n, with the tree diagram of entries a through l and m, n, o, p]

Page 9: R-Trees: A Dynamic Index Structure for Spatial Data

Example

[Figure: rectangles a through i with m, n, o, p, with the tree diagram of entries a through l and m, n, o, p]

Page 10: R-Trees: A Dynamic Index Structure for Spatial Data

R-Trees: Operations

Inserts
Deletes
Updates (delete and re-insert)
Queries/Searches
– Names of all the roads in a 1 sq km area?
– Which buildings would be encountered between Roger’s Hall and Reitz Union?
– Give me all rectangles that are contained in the input rectangle.
– Give me all rectangles intersecting this rectangle.
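The last two queries can be sketched as one recursive search with a pluggable predicate. The dict-based node layout (`"leaf"`, `"entries"`) and the function names are assumptions for illustration, and rectangles are 2-D tuples `(xmin, ymin, xmax, ymax)`.

```python
def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if inner lies completely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def search(node, query, match=intersects):
    """Collect payloads of leaf entries whose rectangle satisfies match(query, rect)."""
    hits = []
    for rect, payload in node["entries"]:
        if node["leaf"]:
            if match(query, rect):
                hits.append(payload)
        elif intersects(query, rect):
            # Only subtrees whose rectangle overlaps the query can hold answers.
            hits.extend(search(payload, query, match))
    return hits


leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "c"), ((7, 7, 9, 9), "d")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 5, 9, 9), leaf2)]}

query = (0.5, 0.5, 5.5, 5.5)
overlapping = search(root, query)              # intersection query
enclosed = search(root, query, match=contains)  # containment query
```

Unlike a B+-tree lookup, the search may have to descend into several subtrees because sibling rectangles can overlap.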

Page 11: R-Trees: A Dynamic Index Structure for Spatial Data

Insert

Similar to insertion into a B+-tree, but the new entry may go into any leaf; a leaf splits when its capacity is exceeded.
– Which leaf to insert into? (Choose Leaf)
– How to split a node? (Node Split)
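In Guttman's paper, Choose Leaf descends from the root and at each level follows the entry whose rectangle needs the least enlargement to include the new rectangle, breaking ties by smallest area. A minimal sketch of that rule, assuming a hypothetical dict-based node layout and 2-D rectangle tuples:

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(rect, new):
    """Extra area rect would need in order to cover new as well."""
    return area(union(rect, new)) - area(rect)


def choose_leaf(node, new_rect):
    """Follow the least-enlargement entry at each level until a leaf is reached."""
    while not node["leaf"]:
        _, node = min(node["entries"],
                      key=lambda e: (enlargement(e[0], new_rect), area(e[0])))
    return node


leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "c"), ((7, 7, 9, 9), "d")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 5, 9, 9), leaf2)]}

target = choose_leaf(root, (4, 4, 4.5, 4.5))
```

The sketch only selects the leaf; a full insert would also add the entry, split the leaf if it overflows, and enlarge the rectangles on the way back up.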

Page 12: R-Trees: A Dynamic Index Structure for Spatial Data

Insert: Choose Leaf

[Figure: rectangles m, n, o and p]

Page 13: R-Trees: A Dynamic Index Structure for Spatial Data

Insert: Choose Leaf

[Figure: rectangle m]

Page 14: R-Trees: A Dynamic Index Structure for Spatial Data

Insert: Choose Leaf

[Figure: rectangle n]

Page 15: R-Trees: A Dynamic Index Structure for Spatial Data

Insert: Choose Leaf

[Figure: rectangle o]

Page 16: R-Trees: A Dynamic Index Structure for Spatial Data

Insert: Choose Leaf

[Figure: rectangle p]

Page 17: R-Trees: A Dynamic Index Structure for Spatial Data

Node Splitting

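Quadratic method
– Select as seeds the pair of rectangles that would waste the most area if placed in the same group.
– Grow the two groups from the seeds, each time assigning the rectangle with the strongest preference for one group.

Linear method
– Select as seeds the two rectangles with the greatest separation along any dimension (x or y), normalized by the extent of the whole set.
– Assign the remaining rectangles in arbitrary order, each to the group that needs less enlargement.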
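The quadratic method can be sketched as below. It follows the PickSeeds and PickNext steps as described in Guttman's paper, but the function names, the 2-D tuple representation and the exact tie-breaking order are assumptions of this sketch.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def growth(cover, r):
    """Area a group's covering rectangle would gain by taking r."""
    return area(union(cover, r)) - area(cover)


def pick_seeds(rects):
    """Indices of the pair that wastes the most area if put in one group."""
    pairs = [(i, j) for i in range(len(rects)) for j in range(i + 1, len(rects))]
    return max(pairs, key=lambda p: area(union(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))


def quadratic_split(rects, m):
    """Split an overflowing entry list into two groups of at least m entries each."""
    i, j = pick_seeds(rects)
    groups, covers = [[rects[i]], [rects[j]]], [rects[i], rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]
    while rest:
        # If a group needs every remaining entry to reach m, hand them all over.
        for g in (0, 1):
            if len(groups[g]) + len(rest) <= m:
                groups[g].extend(rest)
                rest = []
        if not rest:
            break
        # PickNext: the entry with the greatest difference in enlargement.
        nxt = max(rest, key=lambda r: abs(growth(covers[0], r) - growth(covers[1], r)))
        rest.remove(nxt)
        # Prefer least enlargement, then smaller area, then fewer entries.
        g = min((0, 1), key=lambda k: (growth(covers[k], nxt), area(covers[k]),
                                       len(groups[k])))
        groups[g].append(nxt)
        covers[g] = union(covers[g], nxt)
    return groups


rects = [(0, 0, 1, 1), (1, 1, 2, 2), (8, 8, 9, 9), (9, 9, 10, 10), (0, 1, 1, 2)]
left, right = quadratic_split(rects, m=2)
```

The pairwise seed search is what makes the method quadratic in the number of entries; the linear method replaces it with a single pass per dimension.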

Page 18: R-Trees: A Dynamic Index Structure for Spatial Data

Delete

Search for the rectangle.
If the rectangle is found, remove it.
If the node is deficient (fewer than the minimum number of entries),
– Remove the node and put its remaining entries in a re-insert queue.
– Adjust the parent rectangle if needed.
– Continue this till you reach the root.
– Re-insert the queued entries in such a way that all internal nodes remain above the leaf nodes, i.e. entries from internal nodes go back in at their original level.
Adjust the rectangles, making them smaller.
Alternative: combine with a sibling, as in a B-tree.
– But re-insertion shows similar performance and is simple to implement.
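The delete steps can be sketched as follows, stopping short of the re-insertion itself. The dict-based nodes, the mutable `[rectangle, payload]` entries and all function names are assumptions of this sketch; the returned queue records the level each orphaned entry came from so that a separate insert routine could put it back at the same height.

```python
def mbr(rects):
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def find_leaf(node, rect, item, path=()):
    """Return the root-to-leaf path to the leaf holding (rect, item), or None."""
    path = path + (node,)
    if node["leaf"]:
        hit = any(e[0] == rect and e[1] == item for e in node["entries"])
        return path if hit else None
    for r, child in node["entries"]:
        if contains(r, rect):
            found = find_leaf(child, rect, item, path)
            if found:
                return found
    return None


def delete(root, rect, item, m):
    """Remove one record; return (new_root, re-insert queue of (level, entry))."""
    path = find_leaf(root, rect, item)
    if path is None:
        return root, []
    leaf = path[-1]
    leaf["entries"] = [e for e in leaf["entries"]
                       if not (e[0] == rect and e[1] == item)]
    queue = []
    for depth in range(len(path) - 1, 0, -1):        # walk from the leaf up to the root
        node, parent = path[depth], path[depth - 1]
        slot = next(e for e in parent["entries"] if e[1] is node)
        if len(node["entries"]) < m:
            # Deficient node: drop it and queue its entries (level 0 = leaf level).
            parent["entries"] = [e for e in parent["entries"] if e is not slot]
            queue.extend((len(path) - 1 - depth, e) for e in node["entries"])
        else:
            slot[0] = mbr([e[0] for e in node["entries"]])   # shrink the parent rectangle
    if not root["leaf"] and len(root["entries"]) == 1:
        root = root["entries"][0][1]                 # a one-child root is replaced by its child
    return root, queue


leaf1 = {"leaf": True, "entries": [[(0, 0, 1, 1), "a"], [(2, 2, 3, 3), "b"]]}
leaf2 = {"leaf": True, "entries": [[(5, 5, 6, 6), "c"], [(7, 7, 9, 9), "d"],
                                   [(6, 6, 7, 7), "e"]]}
root = {"leaf": False, "entries": [[(0, 0, 3, 3), leaf1], [(5, 5, 9, 9), leaf2]]}

root1, queue1 = delete(root, (7, 7, 9, 9), "d", m=2)    # leaf2 stays, its rectangle shrinks
root2, queue2 = delete(root1, (0, 0, 1, 1), "a", m=2)   # leaf1 becomes deficient
```

After the second call, the entry for "b" sits in the queue waiting to be re-inserted, which is the step this sketch leaves out.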

Page 19: R-Trees: A Dynamic Index Structure for Spatial Data

Performance Tests

R-trees implemented in C under UNIX on a VAX 11/780, run on 2D data (1057 rectangles) for 5 page sizes
– Linear node split was faster than quadratic, as expected.
– CPU time was unchanged across page sizes, indicating that when one side became full all split algorithms simply put everything else in the other side.
– Delete is affected by the fill factor.
– Search is insensitive to the fill factor and the split algorithm used.
– Storage space is a function of the fill factor, page size and split algorithm.
– All split algorithms came within 10% of the exhaustive split algorithm, which finds the best split.

Page 20: R-Trees: A Dynamic Index Structure for Spatial Data

Performance: 2nd Innings

Same configuration but on various data sizes: 1057, 2238, 3295 and 4559 rectangles.
– Low CPU cost, close to 150 microseconds.
– Comparable performance of split algorithms.
– Most space was used by the leaf nodes.

Page 21: R-Trees: A Dynamic Index Structure for Spatial Data

Conclusions from the paper.

R-trees perform well for spatial data objects of non-zero size.
With a smaller node size, the structure can be used as an in-memory spatial data index.
– CPU performance of the in-memory R-tree index is comparable and there is no I/O cost.
Linear split was almost as good as the others.
– It was fast.
– Node split quality was a bit off-target, but it did not hurt the search performance noticeably.
Possible use with abstract data types and abstract indexes to streamline handling of spatial data.