Trees for spatial indexing

Upload: robyn-rogers

Post on 25-Dec-2015


Page 1: Trees for spatial indexing. Tree (data structure) Introduction B-Tree,B+-Tree,B*-Tree Spatial Access Method (SAM) vs Point Access Method (PAM) Buddy-Tree,

Trees for spatial indexing


Tree (data structure)

• Introduction
• B-Tree, B+-Tree, B*-Tree
• Spatial Access Method (SAM) vs Point Access Method (PAM)
• Buddy-Tree, UB-Tree (8 slides)
• R-Tree
• X-Tree, TV-Tree


Pantheon Problem

• 200’000’000 points are in a database.
• Indexing in a B-Tree is not sufficient. We want to optimize range queries.
• Which indexing method should we use?
• What is the best structure?


Pantheon


What kind of data structure ?

The structure depends on the kind of data:
• Point access method: a data structure and associated algorithms primarily to search for points defined in multidimensional space.
– k-d tree
– quadtree
– UB-tree
– buddy tree
• Spatial access method: a data structure to search for lines, polygons, etc.
– D-tree
– P-tree
– R+-tree
– R-tree
– R*-tree


Types of queries in spatial data

'Geometry' refers to a point, line, box or other two- or three-dimensional shape. The kinds of queries we need are:

• Distance(geometry, geometry)
• Equals(geometry, geometry)
• Disjoint(geometry, geometry)
• Intersects(geometry, geometry)
• Touches(geometry, geometry)
• Crosses(geometry, geometry)
• Overlaps(geometry, geometry)
• Contains(geometry, geometry)
• Several other operations performed on only one geometry, such as length, area and centroid
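A few of these predicates can be sketched for the simplest geometry, an axis-aligned box. This is only an illustration: the `Box` class and the function names are hypothetical, boundary contact is counted as an intersection, and real spatial libraries define these predicates for arbitrary shapes with more careful boundary rules.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; a point is a box with xmin == xmax and ymin == ymax."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    # The boxes share at least one point (touching edges count here).
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def disjoint(a: Box, b: Box) -> bool:
    return not intersects(a, b)


def contains(a: Box, b: Box) -> bool:
    # Simplified: b lies entirely inside a, boundaries included.
    return (a.xmin <= b.xmin and b.xmax <= a.xmax and
            a.ymin <= b.ymin and b.ymax <= a.ymax)


def distance(a: Box, b: Box) -> float:
    # Gap between the boxes along each axis (0 if they overlap on that axis).
    dx = max(a.xmin - b.xmax, b.xmin - a.xmax, 0.0)
    dy = max(a.ymin - b.ymax, b.ymin - a.ymax, 0.0)
    return math.hypot(dx, dy)
```

For example, `distance(Box(0, 0, 1, 1), Box(4, 5, 6, 7))` is 5.0, because the gaps along the two axes are 3 and 4.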


Introduction

• Some Definitions :

– Node : A node may contain a value or a condition or represent a separate data structure or a tree of its own. Each node in a tree has 0 or more child nodes. A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent.

– Root nodes : The topmost node in a tree is called the root node. Being the topmost node, the root node will not have parents. Every node in a tree can be seen as the root node of the subtree rooted at that node.

– Leaf nodes : Nodes at the bottommost level of the tree are called leaf nodes. Since they are at the bottommost level, they do not have any children.


Tree of the trees

[Figure: diagram relating the index structures covered: B-Tree, B+, B*, R-Tree, R*-Tree, X, TV, UB-Tree, Buddy; grouped under Spatial Access Method (SAM) vs Point Access Method (PAM).]


Common Operations

• Enumerating all the items
• Searching for an item
• Adding a new item at a certain position on the tree
• Deleting an item
• Removing a whole section of a tree (called pruning)
• Adding a whole section to a tree (called grafting)
• Finding the root for any node
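The operations listed above can be sketched on a generic tree with parent pointers. The `Node` class and its method names are hypothetical and chosen only for illustration; they are not tied to any of the index structures discussed later.

```python
class Node:
    """Generic tree node with a parent pointer and an ordered list of children."""

    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def add(self, child):
        """Add a node (or graft a whole subtree) under this node."""
        child.parent = self
        self.children.append(child)
        return child

    def prune(self):
        """Remove this node and its whole subtree from the tree."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def walk(self):
        """Enumerate all items in pre-order."""
        yield self.value
        for child in self.children:
            yield from child.walk()

    def find(self, value):
        """Search for an item; return its node or None."""
        if self.value == value:
            return self
        for child in self.children:
            hit = child.find(value)
            if hit is not None:
                return hit
        return None

    def root(self):
        """Follow parent pointers up to the root."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node
```

Deleting a single item is left out on purpose: how the children of a deleted node are re-attached depends on the specific tree.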


B-Tree

• A B-tree is a tree data structure that keeps data sorted and allows insertions and deletions in logarithmic amortized time. It is most commonly used in databases and filesystems.

• In a 2-3 B-tree (often simply 2-3 tree), each internal node may have only 2 or 3 child nodes.

• Each internal node's elements act as separation values which divide its subtrees.
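How the separation values steer a lookup can be sketched as follows. The `BTreeNode` layout is an assumption made for this example (sorted keys, and one more child than keys in internal nodes); insertion and node splitting are not shown.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted separation values
        self.children = children or []  # empty for a leaf, else len(keys) + 1 subtrees


def search(node, key):
    """Return True if key is stored in the subtree rooted at node."""
    while True:
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:            # reached a leaf without finding the key
            return False
        node = node.children[i]          # subtree between keys[i-1] and keys[i]


# A small 2-3 tree: the root has two keys and therefore three children.
root = BTreeNode([10, 20], [BTreeNode([1, 5]), BTreeNode([12]), BTreeNode([25, 30])])
```

With this tree, `search(root, 12)` descends into the middle child because 10 < 12 < 20.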


B+-Tree

• A B+ tree is a variation on a B-tree. In a B+ tree, in contrast to a B-tree, all data is saved in the leaves. Internal nodes contain only keys and tree pointers. All leaves are at the same lowest level. Leaf nodes are also linked together as a linked list to make range queries easy.
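The benefit of the linked leaves for range queries can be sketched like this. The `Leaf` class and helper names are hypothetical, and the scan starts at the first leaf for simplicity; a real B+ tree would first descend through the internal nodes to the leaf that contains the lower bound.

```python
class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted keys stored in this leaf
        self.values = values  # data records, same order as keys
        self.next = None      # link to the next leaf in key order


def link(leaves):
    """Chain the leaves into a linked list and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start_leaf, lo, hi):
    """Yield (key, value) pairs with lo <= key <= hi by following the leaf links."""
    leaf = start_leaf
    while leaf is not None:
        for key, value in zip(leaf.keys, leaf.values):
            if key > hi:
                return        # keys are sorted, so nothing further can match
            if key >= lo:
                yield key, value
        leaf = leaf.next


head = link([Leaf([1, 3], ["v1", "v3"]),
             Leaf([5, 7], ["v5", "v7"]),
             Leaf([9, 11], ["v9", "v11"])])
```

Once the first matching leaf is reached, the scan never goes back up into the internal nodes.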


R-Tree

• Extends the B+-Tree.

• Every non-leaf node contains entries of the form (cp, rectangle), where cp is the address of a child node and rectangle is the minimum bounding rectangle (MBR) of all rectangles in that child node.

• Leaf nodes contain entries of the form (dataObject, rectangle).

• We use the term directory rectangle for the MBR of the underlying rectangles.
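The entry layout can be sketched as follows. Rectangles are `(xmin, ymin, xmax, ymax)` tuples and the `RNode` class and helper functions are hypothetical; the tree is built by hand here, whereas a real R-Tree builds it through insertion and node splitting.

```python
def mbr(rects):
    """Minimum bounding rectangle of a list of (xmin, ymin, xmax, ymax) tuples."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class RNode:
    def __init__(self, entries, leaf):
        # Leaf: entries are (dataObject, rectangle).
        # Non-leaf: entries are (child node, directory rectangle).
        self.entries = entries
        self.leaf = leaf


def make_leaf(objects):
    return RNode(list(objects), leaf=True)


def make_directory(children):
    # The directory rectangle of a child is the MBR of the child's rectangles.
    return RNode([(c, mbr([r for _, r in c.entries])) for c in children], leaf=False)


def search(node, query):
    """Return all data objects whose rectangle overlaps the query rectangle."""
    hits = []
    for item, rect in node.entries:
        if overlaps(rect, query):
            if node.leaf:
                hits.append(item)
            else:
                hits.extend(search(item, query))  # only descend where the MBR overlaps
    return hits
```

Directory rectangles of different children may overlap, so a search can have to follow several paths.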


R-Tree properties

• Let M be the maximum number of entries that fit in one node and let m be a parameter specifying the minimum number of entries in a node (2 ≤ m ≤ M/2). An R-Tree satisfies the following properties:
– The root has at least two children unless it is a leaf.
– Every non-leaf node has between m and M children unless it is the root.
– Every leaf node contains between m and M entries unless it is the root.
– All leaves appear on the same level.
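These structural properties can be checked mechanically. The sketch below assumes a hypothetical node layout (a dict with a `leaf` flag and an `entries` list) and ignores the bounding rectangles, since only the entry counts and the leaf levels matter here.

```python
def check_rtree(node, m, M, is_root=True):
    """Return the height of the subtree if it satisfies the properties, else raise ValueError.

    Hypothetical layout: {"leaf": bool, "entries": [...]}; the entries of a
    non-leaf node are its child nodes.
    """
    n = len(node["entries"])
    if n > M:
        raise ValueError("node overflows: more than M entries")
    if is_root:
        if not node["leaf"] and n < 2:
            raise ValueError("a non-leaf root needs at least two children")
    elif n < m:
        raise ValueError("node underflows: fewer than m entries")
    if node["leaf"]:
        return 1
    # All children must report the same height, i.e. all leaves are on one level.
    heights = {check_rtree(child, m, M, is_root=False) for child in node["entries"]}
    if len(heights) != 1:
        raise ValueError("leaves are not all on the same level")
    return 1 + heights.pop()
```

A root with two leaves holding 2 and 3 entries passes with m = 2 and M = 4, while a leaf with a single entry below the root is rejected.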


PAM’s

• The basic principle of all multidimensional PAMs is to partition the data space into page regions. We classify PAMs according to 3 properties :

Rectangular    Avoid empty-space    Disjoint    PAM

x x UB-Tree

x Twin-grid file

x x x Buddy-Tree


Buddy-Tree

• The Buddy-Tree uses concepts similar to those of the R-Tree.

• But it is extended and has more interesting properties:
– It does not partition empty space.
– Insertion and deletion of a record are restricted to exactly one path.
– It does not allow overlap in the directory nodes.


Buddy-Tree : Formal Definition

• The nodes of the tree-directory consist of a collection of entries {E1,…,Ek}, k ≥ 2.

• Each entry Ei, 1 ≤ i ≤ k, is given by a tuple Ei = (Ri, pi), where Ri is a d-dimensional rectangle and pi is a pointer referring to a subtree or to a data page containing all the records of the file which are in the rectangle Ri.

• The set of rectangles in a directory node must be a regular B-partition.


B-Rectangle, B-partition

• Given two d-dimensional rectangles R, S with R ⊆ S, R is called a B-rectangle of S iff it can be generated by successive halving of S.

• The B-region of R, written B(R), is the smallest B-rectangle such that R ⊆ B(R).

• Such a B-region also exists for a union of rectangles R1 ∪ R2 ∪ … ∪ Rk, k ≥ 1.

• A set of d-dimensional rectangles {R1,…,Rk}, k ≥ 1, is called a B-partition of the data space D iff B(Ri) ∩ B(Rj) = Ø for all i ≠ j.
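The B-region can be computed by halving the data space for as long as the rectangle still fits into one half. The sketch below assumes that each axis may be halved independently, which makes the result a product of per-axis intervals; rectangles are lists of `(lo, hi)` pairs, and `b_region` and `max_depth` are names chosen for this example.

```python
def b_region(rect, space, max_depth=32):
    """Smallest rectangle obtained by successive halving of `space` that contains `rect`.

    rect, space: lists of (lo, hi) intervals, one per dimension.
    max_depth bounds the halving for degenerate (point-like) intervals.
    """
    region = []
    for (lo, hi), (s_lo, s_hi) in zip(rect, space):
        for _ in range(max_depth):
            mid = (s_lo + s_hi) / 2
            if hi <= mid:       # rect lies in the lower half on this axis
                s_hi = mid
            elif lo >= mid:     # rect lies in the upper half on this axis
                s_lo = mid
            else:               # rect straddles the midpoint: stop halving
                break
        region.append((s_lo, s_hi))
    return region


space = [(0, 8), (0, 8)]
print(b_region([(1, 3), (5, 6)], space))  # [(0, 4.0), (5.0, 6.0)]
```

On the first axis the interval (1, 3) fits into (0, 4) but straddles the next midpoint 2; on the second axis (5, 6) can be reached exactly by three halvings.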


The Buddies

• Let V = {R1,…,Rk} be a B-partition, k > 1, and let S, T ∈ V, S ≠ T.

• The rectangles S, T are called buddies iff B(S ∪ T) ∩ B(R) = Ø for all R ∈ V\{S,T}.

[Figure: two example partitions, one in which S,T are buddies and one in which S,T are NOT buddies.]
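The buddy condition can be tested directly from its definition. This sketch makes several assumptions: rectangles are lists of `(lo, hi)` pairs, each axis is halved independently when computing B-regions, B(S ∪ T) is taken as the B-region of the bounding box of S and T, and two regions that only share a boundary count as having an empty intersection. All function names are hypothetical.

```python
def b_region(rect, space, max_depth=32):
    """Smallest rectangle obtained by successive halving of `space` that contains `rect`."""
    region = []
    for (lo, hi), (s_lo, s_hi) in zip(rect, space):
        for _ in range(max_depth):
            mid = (s_lo + s_hi) / 2
            if hi <= mid:
                s_hi = mid
            elif lo >= mid:
                s_lo = mid
            else:
                break
        region.append((s_lo, s_hi))
    return region


def interiors_overlap(a, b):
    # True if the rectangles share more than a boundary.
    return all(a_lo < b_hi and b_lo < a_hi for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def are_buddies(s, t, partition, space):
    """Check B(S u T) n B(R) = empty for every other rectangle R of the partition."""
    union_box = [(min(a[0], b[0]), max(a[1], b[1])) for a, b in zip(s, t)]
    merged = b_region(union_box, space)
    return all(not interiors_overlap(merged, b_region(r, space))
               for r in partition if r != s and r != t)


space = [(0, 4), (0, 4)]
bottom_left = [(0, 2), (0, 2)]
bottom_right = [(2, 4), (0, 2)]
top_left = [(0, 2), (2, 4)]
top_right = [(2, 4), (2, 4)]
quadrants = [bottom_left, bottom_right, top_left, top_right]
```

Here the two bottom quadrants are buddies because their merged region is the bottom half, while two diagonal quadrants are not: merging them would produce the whole space, which overlaps the other two quadrants.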


Dynamic behavior

• To obtain efficient dynamic behavior, it must be possible to merge pages without destroying the order preservation.
• For this, the regions of the pages must be buddies.
• In the buddy-tree, the set of rectangles in a directory node must be a regular B-partition.
• We say that a B-partition is regular iff all B-rectangles B(Ri), 1 ≤ i ≤ k, can be represented in a kd-trie.
• A kd-trie is a binary tree where the internal nodes consist of an axis and 2 pointers referring to subtrees.


Example

• This B-partition is regular because it can be represented by a kd-trie.

[Figure: a B-partition with rectangles s, t1, t2, t3 and the corresponding kd-trie.]


UB-Tree (Universal B-Tree)

• Methods with guaranteed good performance exist only for one-dimensional data. The UB-Tree can handle multidimensional data.

• We can implement the UB-Tree on top of any database system (by preprocessing techniques).


UB-Tree (Universal B-Tree)[2]

• Basic Concepts
– Area: first we partition a cube C of dimension n into $2^n$ subcubes numbered sc(i) for i = 1, 2, …, $2^n$.
– For example, in 2 dimensions:

sc(1) sc(2) sc(3) sc(4)

$\mathrm{Area}_C(k) := \bigcup_{i=1}^{k} sc(i)$ for $k = 0, 1, \dots, 2^n$

$\mathrm{Area}_C(k.j) := \mathrm{Area}_C(k) \cup \mathrm{Area}_{sc(k+1)}(j)$

[Figure: Area(3)]


Concept of Address

An address α is a sequence $i_1.i_2.\dots.i_l$ where $i_j \in \{0, 1, \dots, 2^n\}$.

For example, this area has address 0.3, written alpha(A) = 0.3.


Definitions and lemmas

• Region: the difference of two areas.

• Address of a pixel: the address of the area defined by including the pixel as the last and smallest subcube contained in this area.

• There is a one-to-one map between the Cartesian coordinates (x1,x2,…,xn) of an n-dimensional pixel and its address α.

• alpha(cart(α)) = α
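One common way to realise such a one-to-one map is bit interleaving (Z-order). The sketch below is a simplification under stated assumptions: subcubes are numbered from 0 at every level, the first coordinate supplies the most significant bit of each digit, and the address is the list of subcube indices from the coarsest to the finest level. The exact numbering convention of the slides may differ.

```python
def alpha(coords, levels):
    """Map integer pixel coordinates (each in [0, 2**levels)) to a list of
    subcube indices, one per level, coarsest level first."""
    address = []
    for level in range(levels - 1, -1, -1):
        digit = 0
        for c in coords:
            # Take the bit of each coordinate at this level and pack them together.
            digit = (digit << 1) | ((c >> level) & 1)
        address.append(digit)
    return address


def cart(address, n):
    """Inverse of alpha: rebuild the n Cartesian coordinates from an address."""
    coords = [0] * n
    for digit in address:
        for d in range(n):
            bit = (digit >> (n - 1 - d)) & 1
            coords[d] = (coords[d] << 1) | bit
    return tuple(coords)


print(alpha((3, 5), 3))    # [1, 2, 3]
print(cart([1, 2, 3], 2))  # (3, 5)
```

Comparing addresses lexicographically then gives a linear order on the pixels, which is what allows multidimensional data to be stored in an ordinary one-dimensional B-Tree.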


Definitions and lemmas[2]

• A point (x1,x2,…,xn) with address Γ = alpha(x1,x2,…,xn) belongs to the unique region(β,δ) satisfying β < Γ ≤ δ.

region(0.1,3)


Range Queries

• The query is defined by an interval for each dimension. An interval may be unbounded, up to (-∞,+∞).

• The query is the Cartesian product of the intervals for all dimensions, called the query box.
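A query box can be represented as one interval per dimension, with infinite bounds for unrestricted dimensions. The helper name `in_query_box` is hypothetical, and the intervals are treated as closed.

```python
import math


def in_query_box(point, box):
    """True if every coordinate lies in the interval given for its dimension."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))


# Restrict the first dimension to [2, 5]; leave the second one unrestricted.
box = [(2, 5), (-math.inf, math.inf)]
print(in_query_box((3, 1000), box))  # True
print(in_query_box((6, 0), box))     # False
```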


Range queries (2)

• Definition: we call all subcubes of level s of a cube brothers.
• Those with a smaller address are younger and those with a larger address are older.


Range queries (3)


Complexity of UB-Tree

• N is the number of objects, k = M/2. Let Q be the number of objects intersecting the query box q. Let r be the number of regions intersecting q.

• Point query: O(log_k N)

• Range query: r · O(log_k N). For points only it is: (N·Q/M) · O(log_k N)

• Point insertion: O(log_k N)
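To get a feeling for the logarithmic term, it can be evaluated for the 200’000’000 points of the Pantheon problem. The sketch assumes that M is the number of entries per page and picks M = 200 purely as a hypothetical value; constant factors hidden by the O-notation are ignored.

```python
import math


def levels(n_objects, page_capacity):
    """Rough number of tree levels: ceil(log_k N) with k = M / 2."""
    k = page_capacity / 2
    return math.ceil(math.log(n_objects, k))


# N = 200,000,000 objects, hypothetical page capacity M = 200, so k = 100.
print(levels(200_000_000, 200))  # 5
```

Since log base 100 of 200,000,000 is about 4.15, a point query or insertion touches on the order of five pages under these assumptions.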


Spatial Access Method

• Spatial indexes are used by spatial databases to optimize spatial queries. Indexes used by non-spatial databases cannot effectively handle features such as how far two points differ and whether points fall within a spatial area of interest.

• TV-Tree

• X-Tree


TV-Tree (Telescopic-Vector tree)

• The basis of the TV-tree is the use of dynamically contracting and extending feature vectors (as in classification).


TV-tree

• We also have a hierarchical structure:

• The objects are clustered into leaf nodes of the tree, and the minimum bounding region (MBR) is stored in the parent node.

• Parents are recursively grouped, until the root is formed.

• At the top levels it is optimal because it uses only a few basic features.


TV-tree

• The TV-tree can be applied to a tree whose nodes describe bounding regions of any shape (cubes, spheres, rectangles, etc.).


Telescoping function

• The telescoping problem can be described as follows.

• Given an n × 1 feature vector x and an m × n (m ≤ n) contraction matrix A_m, the product A_m x is an m-contraction of x.

• A sequence of such matrices A_m with m = 1, … describes a telescoping function provided that the following condition is satisfied: if the m1-contractions of two vectors x and y are equal, then so are their respective m2-contractions, for every m2 ≤ m1.
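The simplest family of matrices that satisfies this condition is truncation: A_m consists of the first m rows of the n × n identity matrix, so the m-contraction keeps the first m features. This choice is an assumption made for the example, and the function names are hypothetical.

```python
def contraction_matrix(m, n):
    """First m rows of the n x n identity matrix (truncation to m features)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(m)]


def contract(matrix, x):
    """Matrix-vector product A_m x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def satisfies_condition(x, y):
    """Check for one pair x, y: equal m1-contractions imply equal m2-contractions, m2 <= m1."""
    n = len(x)
    equal = [contract(contraction_matrix(m, n), x) == contract(contraction_matrix(m, n), y)
             for m in range(1, n + 1)]
    # equal[m1 - 1] must imply equal[m2 - 1] for every m2 <= m1.
    return all(all(equal[:m1]) for m1 in range(1, n + 1) if equal[m1 - 1])


x = [3, 1, 4, 1]
y = [3, 1, 5, 9]
print(contract(contraction_matrix(2, 4), x))  # [3, 1]
print(contract(contraction_matrix(2, 4), y))  # [3, 1]
```

The two vectors agree in their 1- and 2-contractions and differ from the 3-contraction on, which is the situation in which a tree can separate them only after extending the feature vector.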


Multiple shapes

• We can use, for example, a sphere, because it needs only a center and a radius r. It represents the set of points with Euclidean distance ≤ r from the center.

• The Euclidean distance is a special case of the Lp metrics with p = 2.

• For the L1 metric (Manhattan distance), the sphere is a diamond shape.

• The TV-tree works with any Lp-sphere.
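Membership in an Lp-sphere follows directly from the definition of the metric. The function names below are chosen for this example.

```python
def lp_distance(x, y, p):
    """Lp distance between two points: (sum |x_i - y_i|^p)^(1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def in_lp_sphere(point, center, radius, p):
    return lp_distance(point, center, p) <= radius


# The same point can be inside the Euclidean sphere but outside the L1 diamond.
print(in_lp_sphere((1, 1), (0, 0), 1.5, p=2))  # True
print(in_lp_sphere((1, 1), (0, 0), 1.5, p=1))  # False
```

With p = 2 the distance of (1, 1) from the origin is about 1.41, with p = 1 it is 2, which illustrates how the diamond is contained in the circle of the same radius.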


TMBR (Telescopic Minimum Bounding Region)

• Each node in the TV-Tree represents the MBR (an Lp-sphere) of all its descendants.

• Each region is represented by a center (a vector determined by the telescoping vectors representing the objects) and a scalar radius.

• We use the term TMBR to denote an MBR with such a telescopic vector as a center.