gis spatial data structures

© Jörg-Rüdiger Sack Course Notes School of Computer Science Computational Aspects of GIS Carleton University Jörg-Rüdiger Sack School of Computer Science, Carleton University Ottawa, Canada K1S 5B6, [email protected] 95.5204 : Spatial Data Structures for GIS

Post on 18-Nov-2014

Page 1: Gis Spatial Data Structures

© Jörg-Rüdiger Sack Course Notes School of Computer Science Computational Aspects of GISCarleton University

Jörg-Rüdiger Sack

School of Computer Science, Carleton University

Ottawa, Canada K1S 5B6, [email protected]

95.5204: Spatial Data Structures for GIS

Page 2: Gis Spatial Data Structures

Geometric Objects

A geometric object is an object that has a geometric component, i.e., the

• location and

• shape

of the object in space.

In addition, there is the attribute component, which we ignore for the discussion in this chapter.

Page 3: Gis Spatial Data Structures

Example

Planar subdivisions, for example, are collections of polygons that represent towns or municipal regions.

The geometric information about the location of the place is stored through the polygon.

(Non-geometric information such as name, size, … is also stored.)

Page 4: Gis Spatial Data Structures

Operations

Many operations need to be carried out on geometric objects. These include:

• point in polygon (point location)

• traversal of a subregion (window queries)

• intersection tests

• …

• other operations include:

– distance, containment, intersection

Page 5: Gis Spatial Data Structures

Operations cont’d

1. Objects are stored on disc; examining, i.e., retrieving, all objects is extremely inefficient!

2. Checking each object is time-consuming (even after retrieval), as the geometry may be complex.

Idea: support spatial queries on geometric objects by realizing a filter, i.e., first provide a superset of the solution set and then refine that set to the correct solution.

Page 6: Gis Spatial Data Structures

Filter

Sometimes this approach is referred to as

coarse filter

fine filter

where the coarse filter refers to the retrieval of a subset of adjacent objects,

followed by the fine filter, which analyzes geometric properties of the objects.

Page 7: Gis Spatial Data Structures

The Idea of a Filter

Create a bounding box for 2-d geometric objects.

Bounding box := smallest axis-parallel rectangle containing the geometric object

The database search key for the geometric object is now that of the bounding box.

There are many data structures for multi-dimensional

For d-dimensional objects, let U_i = universe in the ith dimension. Then

U = U_1 × U_2 × U_3 × … × U_d is the d-dimensional universe containing all geometric objects.

Page 8: Gis Spatial Data Structures

Filter cont’d

Let G be a particular set of geometric objects. Each g ∈ G is described as:

– g.b: d-dim bounding box

– g.rest: other attributes that are not relevant for the search

g = (b, rest)

b = (l_1, r_1, l_2, r_2, …, l_d, r_d) is the d-dim interval

[l_1, r_1] × … × [l_d, r_d], where b.l_i is the left and b.r_i is the right interval boundary of the ith interval.

We use g.l_i for g.b.l_i and g.r_i for g.b.r_i.
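The notation above can be made concrete with a small sketch. The class and function names below (`BoundingBox`, `GeoObject`, `bounding_box`) are illustrative assumptions, not part of the course notes; the sketch only shows one possible in-memory representation of g = (b, rest).

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """d-dimensional interval [l_1, r_1] x ... x [l_d, r_d]."""
    lows: Tuple[float, ...]    # (l_1, ..., l_d)
    highs: Tuple[float, ...]   # (r_1, ..., r_d)


@dataclass
class GeoObject:
    """g = (b, rest): bounding box plus attributes irrelevant to the search."""
    b: BoundingBox
    rest: Any = None

    def l(self, i: int) -> float:
        """g.l_i, shorthand for g.b.l_i (i is 1-based, as in the notes)."""
        return self.b.lows[i - 1]

    def r(self, i: int) -> float:
        """g.r_i, shorthand for g.b.r_i (i is 1-based)."""
        return self.b.highs[i - 1]


def bounding_box(points: Sequence[Sequence[float]]) -> BoundingBox:
    """Smallest axis-parallel box containing all the given points."""
    dims = range(len(points[0]))
    lows = tuple(min(p[k] for p in points) for k in dims)
    highs = tuple(max(p[k] for p in points) for k in dims)
    return BoundingBox(lows, highs)
```

For a polygon given by its vertices, `bounding_box(vertices)` would give the search key, while the polygon itself would go into `rest`.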

Page 9: Gis Spatial Data Structures

Example

[Figure: a bounding box in the plane, with the interval [l_1, r_1] along dim 1 and the interval [l_2, r_2] along dim 2.]

Page 10: Gis Spatial Data Structures

The Task

Task: find a secondary storage structure S supporting the following operations:

(1) Range query

(2) Search

(3) Insert

(4) Remove (delete)

These are defined more formally next.

Page 11: Gis Spatial Data Structures

Rangequery

Rangequery (w, S(G))

for a range w and G stored in S

report all objects g in G with g.b ∩ w ≠ Ø

assumption: two rectangles that only intersect at a boundary do not intersect, i.e.,

intersection (A,B) := closure (interior of A ∩ interior of B)
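As a minimal sketch of this definition, the code below tests whether the interiors of two boxes overlap and answers a range query by brute force. It is a linear scan over an in-memory list, not the secondary storage structure S the notes are after; the `Box` layout and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# A box is a tuple of (l_i, r_i) pairs, one per dimension (an assumed layout).
Box = Tuple[Tuple[float, float], ...]


def interiors_intersect(a: Box, b: Box) -> bool:
    """True if the interiors of a and b overlap in every dimension.

    Boxes that only touch along a boundary are treated as disjoint.
    """
    return all(al < br and bl < ar for (al, ar), (bl, br) in zip(a, b))


def range_query(w: Box, boxes: List[Box]) -> List[int]:
    """Report the indices of all boxes whose interior meets the range w."""
    return [i for i, b in enumerate(boxes) if interiors_intersect(b, w)]
```

The strict inequalities are what implement the assumption that rectangles meeting only at a boundary do not intersect.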

Page 12: Gis Spatial Data Structures

Rangequery cont’d

[Figure: objects numbered 1 to 7 and a query range; the range query reports 1, 6, 3, 5.]

Page 13: Gis Spatial Data Structures

Search

Search (b, S(G))

for bounding box b and G stored in S

report all objects g in G with g.b = b

Page 14: Gis Spatial Data Structures

search - example

[Figure: objects g and g’ with a query box.]

The object g (blue) has a bounding box matching the query box.

Page 15: Gis Spatial Data Structures

Insert

Insert (g, S(G))

S(G) := S(G ∪ {g}): add g to G and store it in S

Page 16: Gis Spatial Data Structures

Remove (Delete)

Remove (Delete) (b, S(G))

remove the object g if g.b = b: S(G) := S(G \ {g}), i.e., remove g from G and store the result

Page 17: Gis Spatial Data Structures

Comments

1. While uniqueness of the keys is the underlying assumption, non-unique keys do not pose any serious implementation difficulties.

2. For insert, search and delete

the key is spatial, but

the spatial location is not referenced

-> this can be handled by traditional secondary data structures such as B-trees, dynamic hashing, …

e.g., map the 2d key components into one 1-dimensional key (lexicographic)
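The lexicographic mapping can be sketched as follows. A sorted Python list stands in for a B-tree here, and the bounding box is flattened into the tuple (l_1, r_1, …, l_d, r_d), which Python compares lexicographically. The class name `BoxIndex` and this key layout are assumptions for illustration only.

```python
import bisect
from typing import List, Tuple

Key = Tuple[float, ...]   # (l_1, r_1, ..., l_d, r_d), compared lexicographically


class BoxIndex:
    """Stand-in for a B-tree: a sorted list of bounding-box keys."""

    def __init__(self) -> None:
        self._keys: List[Key] = []

    def insert(self, b: Key) -> None:
        """Insert key b, keeping the list sorted."""
        bisect.insort(self._keys, b)

    def search(self, b: Key) -> bool:
        """Exact-match search by binary search on the 1-d key order."""
        i = bisect.bisect_left(self._keys, b)
        return i < len(self._keys) and self._keys[i] == b

    def remove(self, b: Key) -> bool:
        """Remove one occurrence of key b; report whether it was present."""
        i = bisect.bisect_left(self._keys, b)
        if i < len(self._keys) and self._keys[i] == b:
            del self._keys[i]
            return True
        return False
```

Exact-match operations work because they never ask where a box lies relative to other boxes. Two boxes that are close in space can be far apart in this order, which is why the same trick does not help for range queries.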

Page 18: Gis Spatial Data Structures

Comments

Thus searches can be handled!

Problem: Queries of type Rangequery

They depend on the spatial location, and the above storage schemes show serious deficiencies.

Page 19: Gis Spatial Data Structures

Objective

Find data structures for geometric objects such as points, polygons, etc. that allow efficient retrieval.

Primary concern:

When accessing data, long chains of pointers that cross disk block boundaries must be avoided!

Game: design data structures with

– small internal memory access structure

– efficient dynamic updates

Page 20: Gis Spatial Data Structures

Basic Concepts

Basic Concepts for spatial structures

Access time: DRAM (dynamic random access memory) chips for personal computers have access times of 50 to 150 nanoseconds (billionths of a second).

Fast hard disk drives for personal computers boast access times of about 9 to 15 milliseconds.

Note that this is roughly 100,000 times slower than average DRAM (about 10 ms versus about 100 ns).

Page 21: Gis Spatial Data Structures

Basic Concepts

Ratios of this order of magnitude are typical.

Typical numbers are:

Memory access time (seconds): 10^-7 … 10^-6

Disc access time (seconds): 10^-2 … 10^-1

Ratio disc/memory access time: 10^4 … 10^5

Page 22: Gis Spatial Data Structures

Basic Concepts

Typical size of transfer unit (bits):

Memory: 10 … 10^2

Disc: 10^4 … 10^5

Ratio disc/memory transfer size: 10^2 … 10^3

Page 23: Gis Spatial Data Structures

Basic Concepts

The time for an operation is thus determined by the time to retrieve the data + the time required to carry out the local computation.

For many operations, the number of disc accesses is the dominating factor. However, there are geometric problems where the internal computations are also costly.
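A back-of-the-envelope sketch of this cost model follows. It uses the lower-end figures quoted earlier in these notes (about 10^-2 s per disc access and about 10^-7 s per memory access); the function name and the example workload are hypothetical.

```python
def operation_time(disc_accesses: int,
                   disc_access_time: float = 1e-2,
                   local_compute_time: float = 0.0) -> float:
    """Estimated time (seconds) = retrieval time + local computation time."""
    return disc_accesses * disc_access_time + local_compute_time


# Hypothetical query: 20 block reads plus one million in-memory steps
# at 1e-7 s each. The disc term (about 0.2 s) outweighs the
# computation term (about 0.1 s).
retrieval = operation_time(20)
total = operation_time(20, local_compute_time=1_000_000 * 1e-7)
```

Under these assumed numbers, saving a handful of block reads is worth as much as saving hundreds of thousands of in-memory steps.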

Page 24: Gis Spatial Data Structures

Objective

Find data structures for geometric objects such as points, polygons, etc. that allow efficient retrieval.

Primary concern:

When accessing data, long chains of pointers that cross disk block boundaries must be avoided!

Game: design data structures with

– small internal memory access structure

– efficient dynamic updates

Page 25: Gis Spatial Data Structures

Proximity

Data on discs are seen to be organized in BLOCKS.

A block is a unit of data that is retrieved in one shot from a disc.

A block contains many data items; these should all be useful for the algorithm and its execution.

1. Local maintenance of proximity: objects stored in the same block are physically close in space.

2. Global maintenance of proximity: objects stored in adjacent blocks are physically close.

Page 26: Gis Spatial Data Structures

Proximity

Especially the last point is very difficult to achieve.

There is no perfect data organization!

Even small improvements in data organization yield noticeable speed-ups.

Page 27: Gis Spatial Data Structures

Central issue

Organizing the embedding space versus organizing its content.

We will discuss data organizations that depend on the data and, mostly, those that depend on the space.

This is the key distinction between spatial and non-spatial data structures.

Page 28: Gis Spatial Data Structures

Non-spatial data structures

Data structures for non-spatial data: any search structure that you may have encountered, for example the binary search tree.

• searches are comparative

• structures exist and are readily available, also balanced ones

– AVL trees, 2-3 trees, red-black trees

These are excellent search structures, also for statistical queries including median and percentiles.

Page 29: Gis Spatial Data Structures

Non-spatial data structures

Such data structures are not designed for, nor can they efficiently handle:

• general location queries

– nearest neighbour

– identify clusters in data

Page 30: Gis Spatial Data Structures

Review of address computation schemes

1. Hashing

2. radix trees

3. tries

these assign an address of a storage cell to any key value x

(course notes)

Page 31: Gis Spatial Data Structures

k-d trees

k-d trees were invented by Bentley ’75

as generalizations of search trees, i.e., they are comparative

other relevant structures:

Lueker ’78, Lee & Wong ’77, Willard ’78, Bentley ’79, Bentley and Maurer ’80

Page 32: Gis Spatial Data Structures

k-d trees

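An example:

[Figure: a k-d tree whose root is labelled x : 50 and whose children are labelled y : 15 and y : 4. The levels cycle through dim 1, dim 2, dim 3, …, dim d, and then start again with dim 1, dim 2.]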
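To make the cycling through dimensions concrete, here is a minimal in-memory sketch of a k-d tree with an exact-match search. It builds the tree from a static point set by median splits, which is an assumption made to keep the example short; it is not a reconstruction of Bentley's original insertion procedure, and the names `Node`, `build` and `contains` are illustrative.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point: Point, dim: int) -> None:
        self.point = point          # point stored at this node
        self.dim = dim              # dimension compared at this level
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting at the median (static input)."""
    if not points:
        return None
    dim = depth % len(points[0])    # cycle through dim 1, dim 2, ..., dim d
    points = sorted(points, key=lambda p: p[dim])
    mid = len(points) // 2
    node = Node(points[mid], dim)
    node.left = build(points[:mid], depth + 1)
    node.right = build(points[mid + 1:], depth + 1)
    return node


def contains(node: Optional[Node], q: Point) -> bool:
    """Exact-match search: compare one coordinate per level."""
    if node is None:
        return False
    if node.point == q:
        return True
    d = node.dim
    if q[d] < node.point[d]:
        return contains(node.left, q)
    if q[d] > node.point[d]:
        return contains(node.right, q)
    # Tie on the split coordinate: the point may be on either side.
    return contains(node.left, q) or contains(node.right, q)
```

Median splitting gives logarithmic height only because all points are known in advance; keeping the tree balanced under insertions and deletions is the hard part.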

Page 33: Gis Spatial Data Structures

k-d trees

Problems:

• it is hard to balance these structures, i.e., get log height

• 1-d is easy

• space partitioning created lacks regularity

• difficult neighbour queries

Page 34: Gis Spatial Data Structures

First approaches

First approaches to spatial data structures

• were based on the existing search structures

• organized the data stored!

• not the space in which the data was embedded

Page 35: Gis Spatial Data Structures

Filter illustration for a rectangular space partitioning

[Figure: a query q over a rectangular partition of the space, with the labels: query q, query cells, hit, drop, ignored, not retrieved.]

Report all objects that intersect q.

The oval is examined and then dropped.

Page 36: Gis Spatial Data Structures

Comment

Spatial data structures cover the space with cells.

Each cell is stored on disc and therefore is associated with a disc block or blocks.

Page 37: Gis Spatial Data Structures

Three-phase model

Three steps:

1. Cell addressing: for a given query, find all “cells” of the partitioning that could contain elements relevant to the query.

2. Coarse filter: retrieve the elements found in Step 1 from disc.

3. Fine filter: examine the elements (Step 2) to see whether they fit the query.
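The three phases can be sketched on a uniform grid. Everything concrete here is an assumption for illustration: the fixed cell size `CELL`, the dictionary standing in for disc blocks, and a fine filter that only tests bounding boxes (a real fine filter would examine the actual geometry).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x_lo, y_lo, x_hi, y_hi)
CELL = 10.0                                # assumed cell side length


def cells_of(b: Box) -> List[Tuple[int, int]]:
    """Phase 1, cell addressing: grid cells that the box b may overlap."""
    x0, y0, x1, y1 = b
    return [(i, j)
            for i in range(int(x0 // CELL), int(x1 // CELL) + 1)
            for j in range(int(y0 // CELL), int(y1 // CELL) + 1)]


def overlaps(a: Box, b: Box) -> bool:
    """True if the interiors of the two boxes overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


class Grid:
    def __init__(self) -> None:
        # Each cell stands in for one disc block (or a few).
        self.blocks: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        self.boxes: List[Box] = []

    def insert(self, b: Box) -> int:
        """Store box b in every cell it may overlap; return its object id."""
        oid = len(self.boxes)
        self.boxes.append(b)
        for c in cells_of(b):
            self.blocks[c].append(oid)
        return oid

    def query(self, q: Box) -> List[int]:
        candidates = set()
        for c in cells_of(q):                          # phase 1
            candidates.update(self.blocks.get(c, []))  # phase 2: coarse filter
        # Phase 3, fine filter: drop candidates that do not fit the query.
        return sorted(o for o in candidates if overlaps(self.boxes[o], q))
```

Objects in cells the query does not touch are never retrieved, and objects in touched cells that miss the query are examined and then dropped.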

Page 38: Gis Spatial Data Structures

Tree-based schemes

Work has been done on the internal memory data structures segment trees and range trees,

and on how they can be extended to external storage.

This is not covered here. It could be a good topic for a class presentation.

Page 39: Gis Spatial Data Structures

Three philosophies

1. Space driven:

1. multi-dimensional linear hashing
2. space-filling curves
3. ...

2. Data driven:

1. k-d-B-trees
2. …

3. Combinations:

1. grid file and its variants
2. BANG file, …

Page 40: Gis Spatial Data Structures

Linear hashing

Viewed as a spatial data structure: partition the 1-d data space into intervals.

Interval sizes are half of the previous ones; the addressing scheme is simple.

[Figure: successive halving of the 1-d data space into intervals, with block addresses 0 to 7.]

Page 41: Gis Spatial Data Structures

doubling

Doubling typically adds a bit to the front (or back) of the bit string created thus far.

E.g., in some of the schemes you would see

0 1

00 10 01 11

(In the figure, an arrow marks the added bit.)

This means that when you run out of space, a piece of the same size is appended, resulting in a doubling of the space used. However, address calculations are simple!
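Why the address calculation stays simple can be seen in a small sketch. It assumes integer keys and addresses formed from the low-order bits of the key, which is the classical hash-style variant; an order-preserving scheme would take the bits from the other end of the key. The function names are illustrative.

```python
def address(key: int, level: int) -> int:
    """Bucket address using the `level` low-order bits of the key."""
    return key % (1 << level)


def split_targets(addr: int, level: int) -> tuple:
    """Where the records of bucket `addr` can go after doubling.

    Doubling from 2**level to 2**(level + 1) buckets adds one bit in
    front of the address: the records stay in `addr` (new bit 0) or
    move to `addr + 2**level` (new bit 1).
    """
    return addr, addr + (1 << level)
```

With two buckets (level 1), bucket 0 splits into 00 and 10 and bucket 1 into 01 and 11, matching the bit strings shown above.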

Page 42: Gis Spatial Data Structures

MOLHPE

Multidimensional Order Preserving Linear Hashing

[Figure: successive splits of the 2-d data space, with block addresses 0 to 7.]

Note the alternation of the split dimension: the 1st split is by x, the 2nd split is by y, and the 3rd split is again by x. Note also that each block is split.

Page 43: Gis Spatial Data Structures

z-hashing

Dynamic z-hashing

[Figure: successive splits of the 2-d data space under dynamic z-hashing, with block addresses 0 to 7.]

Note that the addressing function differs from the one given above. The reason is that proximity between adjacent blocks is better maintained.

Page 44: Gis Spatial Data Structures

space-filling curves

The above schemes define a traversal of the space.

Here we list other space filling curves that are typically used.

They have different properties and studies have been carried out on them.

E.g., Peano, z-ordering and Hilbert

Page 45: Gis Spatial Data Structures

space-filling curves

[Figure: the Hilbert curve and the Z-order curve (G.M. Morton).]

Page 46: Gis Spatial Data Structures

Z-order

The z-order of a point with coordinates x, y is obtained by bit-wise interleaving of the x and y bits.

Ex.: y = 2 = 010, x = 5 = 101

25 = 0 1 1 0 0 1 (bits interleaved as y x y x y x)
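The interleaving can be written out in a few lines. This sketch follows the bit order of the example (each y bit placed just above the corresponding x bit); the function name `z_order` and the explicit `bits` parameter are assumptions for illustration.

```python
def z_order(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y, y bit first (higher)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bit goes to even position
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bit goes to odd position
    return z
```

With x = 5 and y = 2 on 3 bits this gives the bit string 011001, i.e., 25, as in the example.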

Page 47: Gis Spatial Data Structures

Z-order

The z-order of a point with coordinates x, y is obtained by bit-wise interleaving of the x and y bits.

Range queries are possible.

Slight care needs to be taken to find successors of a point in z-order.

Page 48: Gis Spatial Data Structures

Hilbert curve: mapping

Range queries are more natural, but the successor function is more difficult than with z-ordering.
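For comparison with the z-order, here is a sketch of one common way to map a grid cell to its position along a Hilbert curve. It is the widely used iterative rotate-and-accumulate formulation, not necessarily the construction shown on the slides, and the curve's orientation (start at (0, 0), end at (n - 1, 0)) is one of several conventions. The function name is illustrative; n is assumed to be a power of two.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        if ry == 0:                    # rotate/flip into the sub-square's frame
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Unlike the z-order, consecutive positions along this curve are always neighbouring cells, which is the sense in which range queries are more natural; the price is the rotation step, which makes the mapping and the successor function less direct than plain bit interleaving.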

Page 49: Gis Spatial Data Structures

Hilbert curve cont’d

[Figure: direction in which to draw the elements of the Hilbert curve.]

Page 50: Gis Spatial Data Structures

Peano