![Page 1: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/1.jpg)
![Page 2: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/2.jpg)
Spatial data requires special data structures, similar to B-trees
![Page 3: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/3.jpg)
Types of spatial data Example of spatial and geometric data –
splines and Voronoi diagrams One-dimensional index – interval trees R-trees T-tree variants
![Page 4: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/4.jpg)
Point Data◦ Points in a multidimensional space◦ E.g., Raster data such as satellite imagery,
where each pixel stores a measured value◦ E.g., Feature vectors extracted from text
Region Data◦ Objects have spatial extent with location and
boundary◦ DB typically uses geometric approximations
constructed using line segments, polygons, etc., called vector data.
![Page 5: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/5.jpg)
Different types of sampling are used to collect data
![Page 6: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/6.jpg)
![Page 7: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/7.jpg)
![Page 8: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/8.jpg)
![Page 9: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/9.jpg)
![Page 10: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/10.jpg)
Low-pass filter – the value for the cell is computed as average of other cells
High-pass-continuous surface –low pass
![Page 11: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/11.jpg)
Window size has effect on filtering
![Page 12: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/12.jpg)
![Page 13: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/13.jpg)
Spatial Range Queries◦ Find all cities within 50 miles of Madison◦ Query has associated region (location, boundary)◦ Answer includes ovelapping or contained data regions
Nearest-Neighbor Queries◦ Find the 10 cities nearest to Madison◦ Results must be ordered by proximity
Spatial Join Queries◦ Find all cities near a lake◦ Expensive, join condition involves regions and proximity
![Page 14: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/14.jpg)
Geographic Information Systems (GIS)◦ E.g., ESRI’s ArcInfo; OpenGIS Consortium◦ Geospatial information◦ All classes of spatial queries and data are common
Computer-Aided Design/Manufacturing◦ Store spatial objects such as surface of airplane fuselage◦ Range queries and spatial join queries are common
Multimedia Databases◦ Images, video, text, etc. stored and retrieved by content◦ First converted to feature vector form; high
dimensionality◦ Nearest-neighbor queries are the most common
![Page 15: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/15.jpg)
B+ trees are fundamentally single-dimensional indexes. When we create a composite search key B+ tree, e.g.,
an index on <age, sal>, we effectively linearize the 2-dimensional space since we sort entries first by age and then by sal.
Consider entries:<11, 80>, <12, 10><12, 20>, <13, 75>
11 12 13
70605040302010
80
B+ treeorder
![Page 16: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/16.jpg)
A multidimensional index clusters entries so as to exploit “nearness” in multidimensional space.
Keeping track of entries and maintaining a balanced index structure presents a challenge!
Consider entries:<11, 80>, <12, 10><12, 20>, <13, 75>
Spatialclusters
70605040302010
80
B+ treeorder
11 12 13
![Page 17: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/17.jpg)
Spatial queries (GIS, CAD).◦ Find all hotels within a radius of 5 miles from the
conference venue.◦ Find the city with population 500,000 or more that is
nearest to Kalamazoo, MI.◦ Find all cities that lie on the Nile in Egypt.◦ Find all parts that touch the fuselage (in a plane
design). Multidimensional range queries.
◦ 50 < age < 55 AND 80K < sal < 90K
![Page 18: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/18.jpg)
Similarity queries (content-based retrieval).◦ Given a face, find the five
most similar faces/expressions.
![Page 19: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/19.jpg)
Geometric, 1-dimensional tree Interval is defined by (x1,x2) Split at the middle (5), again at the middle
(3,7), again at the middle (2,8) All intervals intersecting a middle point are
stored at the corresponding root.
1 2 3 4 5 6 7 8 9
(4,6) (4,8)
(6,9)
(7.5,8.5)
(2,4)
![Page 20: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/20.jpg)
Finding intervals – by finding x1, x2 against the nodes
Find interval containing specific value – from the root
Sort intervals within each node of the tree according to their coorsinates
Cost of the “stabbing query”– finding all intervals containing the specified value is O(log n + k), where k is the number of reported intervals.
![Page 21: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/21.jpg)
Constructs the minimal bounding box (mbb)
Check validity (predicate) on mbb Refinement step verifies if actual
objects satisfy the predicate.
![Page 22: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/22.jpg)
An index based on spatial location needed.◦ One-dimensional indexes don’t support
multidimensional searching efficiently. ◦ Hash indexes only support point queries; want
to support range queries as well.◦ Must support inserts and deletes gracefully.
Ideally, want to support non-point data as well (e.g., lines, shapes).
The R-tree meets these requirements, and variants are widely used today.
![Page 23: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/23.jpg)
The R-tree is a tree-structured index that remains balanced on inserts and deletes.
Each key stored in a leaf entry is intuitively a box, or collection of intervals, with one interval per dimension.
Example in 2-D:
X
Y
Root ofR Tree
Leaf level
![Page 24: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/24.jpg)
Leaf entry = < n-dimensional box, rid >◦ This is Alternative (2), with key value being a box.◦ Box is the tightest bounding box for a data object.
Non-leaf entry = < n-dim box, ptr to child node >◦ Box covers all boxes in child node (in fact, subtree).
All leaves at same distance from root. Nodes can be kept 50% full (except root).
◦ Can choose a parameter m that is <= 50%, and ensure that every node is at least m% full.
![Page 25: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/25.jpg)
R8R9
R10
R11
R12
R17
R18
R19
R13
R14
R15
R16
R1
R2
R3
R4
R5
R6
R7
Leaf entry
Index entry
Spatial objectapproximated by bounding box R8
![Page 26: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/26.jpg)
R1 R2
R3 R4 R5 R6 R7
R8 R9 R10 R11R12 R13R14 R15R16 R17R18R19
![Page 27: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/27.jpg)
Start at root.1. If current node is non-leaf, for each entry <E, ptr>, if box E overlaps Q, search subtree identified by ptr.2. If current node is leaf, for each entry <E, rid>, if E overlaps Q, rid identifies an object that might overlap Q.
Note: May have to search several subtrees at each node!(In contrast, a B-tree equality search goes to just one leaf.)
![Page 28: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/28.jpg)
It is convenient to store boxes in the R-tree as approximations of arbitrary regions, because boxes can be represented compactly.
But why not use convex polygons to approximate query regions more accurately?◦ Will reduce overlap with nodes in tree, and reduce
the number of nodes fetched by avoiding some branches altogether.
◦ Cost of overlap test is higher than bounding box intersection, but it is a main-memory cost, and can actually be done quite efficiently. Generally a win.
![Page 29: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/29.jpg)
Start at root and go down to “best-fit” leaf L.◦ Go to child whose box needs least enlargement to
cover B; resolve ties by going to smallest area child.
If best-fit leaf L has space, insert entry and stop. Otherwise, split L into L1 and L2.◦ Adjust entry for L in its parent so that the box now
covers (only) L1.◦ Add an entry (in the parent node of L) for L2. (This
could cause the parent node to recursively split.)
![Page 30: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/30.jpg)
The entries in node L plus the newly inserted entry must be distributed between L1 and L2.
Goal is to reduce likelihood of both L1 and L2 being searched on subsequent queries.
Idea: Redistribute so as to minimize area of L1 plus area of L2.
Exhaustive algorithm is too slow; quadratic and linear heuristics are popular in research. GOOD SPLIT!
BAD!
![Page 31: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/31.jpg)
The R* tree uses the concept of forced reinserts to reduce overlap in tree nodes. When a node overflows, instead of splitting:◦ Remove some (say, 30% of the) entries and reinsert them
into the tree. ◦ Could result in all reinserted entries fitting on some existing
pages, avoiding a split. R* trees also use a different heuristic, minimizing box
perimeters rather than box areas during insertion. Another variant, the R+ tree, avoids overlap by
inserting an object into multiple leaves if necessary.◦ Searches now take a single path to a leaf, at cost of
redundancy.
![Page 32: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/32.jpg)
The Generalized Search Tree (GiST) abstracts the “tree” nature of a class of indexes including B+ trees and R-tree variants.◦ Striking similarities in insert/delete/search and even
concurrency control algorithms make it possible to provide “templates” for these algorithms that can be customized to obtain the many different tree index structures.
◦ B+ trees are so important (and simple enough to allow further specialization) that they are implemented specially in all DBMSs.
◦ GiST provides an alternative for implementing other tree indexes.
![Page 33: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/33.jpg)
Typically, high-dimensional datasets are collections of points, not regions.◦ E.g., Feature vectors in multimedia applications.◦ Very sparse
Nearest neighbor queries are common.◦ R-tree becomes worse than sequential scan for most
datasets with more than a dozen dimensions. As dimensionality increases contrast (ratio of
distances between nearest and farthest points) usually decreases; “nearest neighbor” is not meaningful.
![Page 34: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/34.jpg)
Spatial data management has many applications, including GIS, CAD/CAM, multimedia indexing.◦ Point and region data◦ Overlap/containment and nearest-neighbor
queries Many approaches to indexing spatial data
◦ R-tree approach is widely used in GIS systems◦ Other approaches include Grid Files, Quad trees,
and techniques based on “space-filling” curves.◦ For high-dimensional datasets, unless data has
good “contrast”, nearest-neighbor may not be well-separated
![Page 35: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/35.jpg)
Deletion consists of searching for the entry to be deleted, removing it, and if the node becomes under-full, deleting the node and then re-inserting the remaining entries.
Overall, works quite well for 2 and 3 D datasets. Several variants (notably, R+ and R* trees) have been proposed; widely used.
Can improve search performance by using a convex polygon to approximate query shape (instead of a bounding box) and testing for polygon-box intersection.
![Page 36: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/36.jpg)
"Print Gallery," by M.C. Escher. Curious about the blank spot in the middle of Escher’s 1956 lithograph, Hendrik Lenstra set out to learn whether the artist had encountered a mathematical problem he couldn’t solve. ©2002 Cordon Art B.V., Baarn, Holland. All rights reserved.
![Page 37: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/37.jpg)
Fixed grid: Stored as a 2D array,
each entry contains a link to a list of points (object) stored in a grid.
a,b
![Page 38: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/38.jpg)
Too many points in one grid cell:Solution A –overflow (linked list)Solution B- Split the cell and increase index!
![Page 39: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/39.jpg)
Rectangles may share different grid cells Rectangle duplicates are stored Grid cells are of fixed size
![Page 40: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/40.jpg)
In a grid file, the index is dynamically increased in size when overflow happens.
The space is split by a vertical or a horizontal line, and then further subdivided when overflow happens!
Index is dynamically growing Boundaries of cells of different sizes are
stores, thus point and stabbing queries are easy
![Page 41: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/41.jpg)
Instead of using an array as an index, use tree!
Quadtree decomposition – cells are indexed by using quaternary B-tree.
All cells are squares, not polygons. Search in a tree is faster!
![Page 42: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/42.jpg)
First three levels of a quad tree
![Page 43: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/43.jpg)
Project #32: PICTURE REPRESENTATION USING QUAD TREES, McGill University:
8 x 8 pixel picture
represented in a quad
tree
![Page 44: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/44.jpg)
Example of a grid file
![Page 45: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/45.jpg)
B+ index – actual references to rectangles are stored in the leaves, saving more space+ access time
Label nodes according to Z or “pi” order
![Page 46: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/46.jpg)
Level of detail increases as the number of quadtree decompositions increases!
Decompositions have indexes of a form:00,01,02,03,10,11,12,13, 2,300301 ,302 ,303 ,31 ,32 ,33 ◦ Stores as Bplus tree
![Page 47: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/47.jpg)
![Page 48: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/48.jpg)
![Page 49: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/49.jpg)
What is spatial data structure? What is the difference between grid and
grid file? Explain how z or p ordering works? Define interval trees Provide example of R-tree List R-tree variants How spatial index structure differs from
regular B+ tree?
![Page 50: Spatial data requires special data structures, similar to B-trees](https://reader031.vdocuments.mx/reader031/viewer/2022020106/56649d745503460f94a54e8e/html5/thumbnails/50.jpg)
Text 1 instructor’s resources McGill University web space Wikepedia (z order images) Face recognition research SPARCS lab project on image processing