2007-04-13

Spatial DBMS issues

1

OTB Research Institute for Housing, Urban and Mobility Studies

Spatial DBMS Research

GISt lunch meeting

Wilko Quak

Overview

• Introduction to DBMS Query Processing
• Benchmarking a spatial DBMS
• The GeoInfoNed project
• MonetDB
• Discussion

Introduction to DBMS query processing

• Slides borrowed from Dr. Yang He

Query processing overview

• Review relational algebra
• Query processing
  • introduction
  • stages of query processing
  • query optimisation
  • relational algebra tree

Relational algebra (1)

• a relational language proposed by Codd
• implementable
• basis of high-level (SQL) query execution
• a collection of simple, 'low-level' operations used to manipulate relations
• input is one or more relations
• output is one relation

Relational algebra (2)

• Relational operations
  • unary operators
    • Restrict (Select) σ
    • Project π
  • binary operators
    • Cartesian product ×
    • Union ∪
    • Intersection ∩
    • Difference -
    • Join ⋈
    • Divide ÷

Example relations

• e.g. two relations Student and Registration

Student ( SID, Name, Gender )
Registration ( SID, CID, Mark )

Student

SID   Name   Gender
S1    Kate   F
S2    John   M
S3    Kate   F
S4    Fred   M

Registration

SID   CID   Mark
S1    C1    65
S1    C2    45
S2    C2    80
S2    C4    60
S3    C1    50
S3    C2    75
S4    C3    70

Query examples (1)

• e.g. “Identify all male students”
• in SQL

SELECT SID, Name, Gender
FROM Student
WHERE Gender = 'M';

• in relational algebra (Select)

σ Gender='M' ( Student )

SID   Name   Gender
S2    John   M
S4    Fred   M
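The selection above can be tried out with a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS (the slides do not name one for this example). The rows are copied from the Student relation; the ORDER BY clause is an addition that only makes the output order deterministic.

```python
import sqlite3

# Rows copied from the Student relation on the "Example relations" slide.
STUDENTS = [
    ("S1", "Kate", "F"),
    ("S2", "John", "M"),
    ("S3", "Kate", "F"),
    ("S4", "Fred", "M"),
]


def select_male_students():
    """Run the slide's selection query against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (SID TEXT, Name TEXT, Gender TEXT)")
    conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", STUDENTS)
    rows = conn.execute(
        "SELECT SID, Name, Gender FROM Student "
        "WHERE Gender = 'M' ORDER BY SID"  # ORDER BY added for a stable order
    ).fetchall()
    conn.close()
    return rows


male_students = select_male_students()

# The same restriction written directly over the tuples (index 2 is Gender).
restricted = [row for row in STUDENTS if row[2] == "M"]

print(male_students)
print(restricted == male_students)
```

Both forms keep whole rows and only filter on the predicate, which is what distinguishes Select from Project.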

Query examples (2)

• e.g. “List student’s name and gender.”
• in SQL

SELECT Name, Gender
FROM Student;

• in relational algebra (Project)

π Name,Gender ( Student )

Name   Gender
Kate   F
John   M
Fred   M
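One detail worth checking by hand: the result table lists Kate/F once, which is the set semantics of the relational-algebra projection, whereas a plain SQL SELECT keeps duplicate rows. The sketch below, again assuming SQLite as a stand-in DBMS, compares the query as written with a SELECT DISTINCT variant (the DISTINCT keyword is an addition, not part of the slide).

```python
import sqlite3

# Rows copied from the Student relation on the "Example relations" slide.
STUDENTS = [
    ("S1", "Kate", "F"),
    ("S2", "John", "M"),
    ("S3", "Kate", "F"),
    ("S4", "Fred", "M"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (SID TEXT, Name TEXT, Gender TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", STUDENTS)

# Query as written on the slide: duplicates are kept, so Kate/F appears twice.
bag_rows = conn.execute("SELECT Name, Gender FROM Student").fetchall()

# DISTINCT removes duplicates and matches the three-row result table.
set_rows = conn.execute("SELECT DISTINCT Name, Gender FROM Student").fetchall()
conn.close()

print(len(bag_rows), len(set_rows))
```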

Queries examples (3)

• e.g. “Show student ID, name, their course ID and marks”
• in SQL

SELECT s.SID, Name, CID, Mark
FROM Student s, Registration r
WHERE s.SID = r.SID;

• in relational algebra (Project, Natural Join)

( π SID,Name (Student) ) ⋈ ( π SID,CID,Mark (Registration) )

or

π SID,Name,CID,Mark ( Student ⋈ Registration )
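To see that the two algebra expressions give the same answer, here is a small sketch with hand-written project and natural_join helpers. The helper names and the list-of-dicts representation are illustrative assumptions; the join is a naive nested loop, not how a DBMS would implement it.

```python
# Rows copied from the "Example relations" slide.
STUDENT = [
    {"SID": "S1", "Name": "Kate", "Gender": "F"},
    {"SID": "S2", "Name": "John", "Gender": "M"},
    {"SID": "S3", "Name": "Kate", "Gender": "F"},
    {"SID": "S4", "Name": "Fred", "Gender": "M"},
]
REGISTRATION = [
    {"SID": "S1", "CID": "C1", "Mark": 65},
    {"SID": "S1", "CID": "C2", "Mark": 45},
    {"SID": "S2", "CID": "C2", "Mark": 80},
    {"SID": "S2", "CID": "C4", "Mark": 60},
    {"SID": "S3", "CID": "C1", "Mark": 50},
    {"SID": "S3", "CID": "C2", "Mark": 75},
    {"SID": "S4", "CID": "C3", "Mark": 70},
]


def project(relation, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in shared)
    ]


# Expression 1: project each relation first, then join.
plan1 = natural_join(
    project(STUDENT, ["SID", "Name"]),
    project(REGISTRATION, ["SID", "CID", "Mark"]),
)

# Expression 2: join first, then project the result.
plan2 = project(
    natural_join(STUDENT, REGISTRATION),
    ["SID", "Name", "CID", "Mark"],
)

print(plan1 == plan2, len(plan1))
```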

Queries in relational algebra

• A user query may require several operations to be performed
  • relational algebra is a procedural language, so query operations are evaluated in the order specified
• A complex query can be executed in different ways; because efficiency is an important DBMS requirement, an efficient one should be chosen: query optimisation

Query processing

• Four stages involved in query processing
  • query decomposition or parsing
  • query optimization
  • code generation
  • runtime query execution

Query optimization (1)

• refers to the activity of choosing an efficient execution strategy or plan for processing a query
• rule-based and cost-based strategies
• database statistics in the system catalog are used for cost estimation
• is a prime objective of query processing

Query optimization (3)

• In query processing, disk access takes the most time
• The main objective of query optimisation is to minimize the number of disk accesses
• Many DBMSs use heuristic rules for query optimization
  • e.g. “Perform selection and projection operations as early as possible to reduce the cardinality of the relation and the subsequent processing of that relation”

Query processing – an example

• e.g. “Show student ID, name, their course ID and marks”
• in SQL

SELECT s.SID, Name, CID, Mark
FROM Student s, Registration r
WHERE s.SID = r.SID;

• it can be transformed into a relational algebra query

( π SID,Name (Student) ) ⋈ ( π SID,CID,Mark (Registration) )

or

π SID,Name,CID,Mark ( Student ⋈ Registration )

The first one is better: it needs much less disk access than the second.
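As a rough illustration of why projecting first is cheaper, the sketch below counts attribute values fed into and produced by the join for each plan, using the sizes of the example relations. Counting values is only an assumed proxy for disk access, not a real I/O cost model, and the function name is hypothetical.

```python
# Sizes taken from the example relations: 4 students, 7 registrations,
# and every registration matches exactly one student.
STUDENT_ROWS = 4
REGISTRATION_ROWS = 7
JOIN_ROWS = 7


def values_moved(student_attrs, registration_attrs, shared_attrs=1):
    """Attribute values read by the join plus values in its output."""
    join_input = (STUDENT_ROWS * student_attrs
                  + REGISTRATION_ROWS * registration_attrs)
    # The shared join attribute (SID) appears once in each output row.
    output_width = student_attrs + registration_attrs - shared_attrs
    return join_input + JOIN_ROWS * output_width


# Plan 1: Student is projected to (SID, Name) before the join.
project_first = values_moved(student_attrs=2, registration_attrs=3)

# Plan 2: the full Student rows (SID, Name, Gender) go into the join.
join_first = values_moved(student_attrs=3, registration_attrs=3)

print(project_first, join_first)
```

On this tiny example the gap is small because Student has only one extra attribute; the saving grows with the width of the columns that the projection discards.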

Relational algebra query tree (2)

• e.g. ( π SID,Name (Student) ) ⋈ ( π SID,CID,Mark (Registration) )

Root:                 ⋈
Intermediate nodes:   π SID,Name     π SID,CID,Mark
Leaf nodes:           Student        Registration

Spatial Query processing

In spatial query processing the operator is a spatial operator; otherwise it is the same as non-spatial query processing:
• Spatial Select
  • Find all objects within given rectangle [99.99%]
• Spatial Join (overlay in GIS terms)
  • Find all restaurants within national parks
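The two spatial operators can be sketched as follows, assuming points for restaurants and axis-aligned rectangles for parks. All names and coordinates are hypothetical, and both operators are brute-force scans; a spatial DBMS would normally use a spatial index such as an R-tree instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle, boundaries included."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


# Hypothetical data: restaurants as points, national parks as rectangles.
RESTAURANTS = {
    "Cafe A": (2.0, 3.0),
    "Bistro B": (8.0, 8.0),
    "Diner C": (5.5, 1.0),
}
PARKS = {
    "North Park": Rect(7.0, 7.0, 10.0, 10.0),
    "South Park": Rect(1.0, 0.0, 6.0, 4.0),
}


def spatial_select(points, window):
    """Spatial select: names of all points inside the query rectangle."""
    return sorted(
        name for name, (x, y) in points.items() if window.contains(x, y)
    )


def spatial_join(points, regions):
    """Spatial join: all (point, region) pairs where the region contains the point."""
    return sorted(
        (pname, rname)
        for pname, (x, y) in points.items()
        for rname, rect in regions.items()
        if rect.contains(x, y)
    )


selected = spatial_select(RESTAURANTS, Rect(0.0, 0.0, 6.0, 6.0))
joined = spatial_join(RESTAURANTS, PARKS)
print(selected)
print(joined)
```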

DBMS Benchmarking

• Categorization of DBMS usage
• Implications for benchmarking
• Benchmark choices

Categories of DBMS users

• Static usage:
  • Predefined queries with changing parameters
  • Queries can be hand optimized
• Dynamic usage (browsing):
  • Many different queries
  • Query optimizer is important
• Access via object-relational mapping (e.g. Hibernate)
  • Not discussed here

All categories need different benchmarking

Benchmarking static DBMS usage

Notes:
• Critical factor is testing the ‘query processor’.
• Query optimizer is not important.

Benchmark:
• Make a small set of simple queries that each test one operation
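A benchmark of this kind could be sketched as below: a handful of queries that each exercise one operation, timed separately. SQLite, the points table, and the three queries are stand-in assumptions chosen so the sketch runs anywhere; they are not the benchmark used in the talk.

```python
import sqlite3
import time


def time_query(conn, sql, repeats=5):
    """Best wall-clock time in seconds over several runs of one query."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical test table: 10,000 points on a 100 x 100 grid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points (x, y) VALUES (?, ?)",
    [(i % 100, i // 100) for i in range(10_000)],
)

# Each query is meant to exercise a single operation.
BENCHMARK = {
    "scan": "SELECT COUNT(*) FROM points",
    "range select": (
        "SELECT id FROM points "
        "WHERE x BETWEEN 10 AND 20 AND y BETWEEN 10 AND 20"
    ),
    "sort": "SELECT id FROM points ORDER BY x, y",
}

timings = {name: time_query(conn, sql) for name, sql in BENCHMARK.items()}
conn.close()

for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Taking the best of several runs is one common way to reduce noise; the absolute numbers depend entirely on the machine and the DBMS.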

Benchmarking dynamic DBMS usage

Notes:
• Critical factor is testing the ‘query optimizer’.
• Very hard to get quality reproducible results.
• It is very hard to assess the quality of the query optimizer, but a small test set might give some insight:

select city.name, river.name
from city, river
where city.inhabitants > X
  and distance(city.geometry, river.geometry) < Y;
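The query above is a useful probe because the optimizer has to decide which predicate to evaluate first. The sketch below, on hypothetical data with rivers reduced to single points, counts how many distance computations each order needs; the function names and the values chosen for X and Y are assumptions for illustration.

```python
import math

# Hypothetical data: (name, inhabitants, position); rivers as single points.
CITIES = [
    ("A", 900_000, (0.0, 0.0)),
    ("B", 50_000, (1.0, 1.0)),
    ("C", 300_000, (10.0, 0.0)),
    ("D", 20_000, (10.0, 1.0)),
]
RIVERS = [("R1", (0.0, 1.0)), ("R2", (10.0, 2.0))]


def distance(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def distance_first(x, y):
    """Compute the distance for every city/river pair, then check inhabitants."""
    calls, result = 0, []
    for cname, inhabitants, cpos in CITIES:
        for rname, rpos in RIVERS:
            calls += 1
            if distance(cpos, rpos) < y and inhabitants > x:
                result.append((cname, rname))
    return result, calls


def filter_first(x, y):
    """Discard small cities first, then compute distances for the rest."""
    calls, result = 0, []
    for cname, inhabitants, cpos in CITIES:
        if inhabitants <= x:
            continue
        for rname, rpos in RIVERS:
            calls += 1
            if distance(cpos, rpos) < y:
                result.append((cname, rname))
    return result, calls


X, Y = 100_000, 3.0
slow_result, slow_calls = distance_first(X, Y)
fast_result, fast_calls = filter_first(X, Y)
print(slow_result == fast_result, slow_calls, fast_calls)
```

Which order wins depends on X and Y: with a very selective distance predicate and a spatial index, starting from the distance test could be the better plan, which is exactly what makes the query a test of the optimizer.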

Other benchmarking considerations

• Functionality
• Usability
• Update behaviour

GeoInfoNed – RGI-232

GeoInfoNed -- What and Why

Build a spatially enabled DBMS because:
• A DBMS is at the core of many systems. If you improve the core, the whole system improves.
• There is a need for an (open source) experimentation platform for Geo DBMS research.

Who

• CWI – Leading DBMS experts with MonetDB

• TUDelft/OTB – Knowledge of spatial processes

• CycloMedia – Huge dataset and interesting problems

• RWS/AGI – Large and diverse datasets and interesting problems

How

• At CWI there is the MonetDB DBMS. First we will extend it with basic spatial types (according to OpenGIS).

• Together with our ‘Problem Holder’ partners we will find directions for more extensions.

• MonetDB already has support for: Image Data, XML storage and querying etc.

Example

• Is there a relationship between traffic accidents and objects near the road?

GeoInfoNed

MonetDB Introduction*

• Hardware trends
• MonetDB design considerations
• MonetDB architecture

*Slides borrowed from CWI

Hardware Trends

50% p/year:
- cpu speed
- mem size
- mem bandwidth
- disk bandwidth

1% p/year:
- mem latency

10% p/year:
- disk latency
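To get a feel for what these rates imply, the short calculation below compounds them over a number of years. It assumes the percentages are annual improvement rates that compound, and the ten-year horizon is an arbitrary choice for illustration.

```python
# Annual improvement rates as listed on the slide.
RATES = {
    "cpu speed": 0.50,
    "mem size": 0.50,
    "mem bandwidth": 0.50,
    "disk bandwidth": 0.50,
    "mem latency": 0.01,
    "disk latency": 0.10,
}


def improvement(rate, years):
    """Compound improvement factor after the given number of years."""
    return (1 + rate) ** years


YEARS = 10  # hypothetical horizon
factors = {name: improvement(rate, YEARS) for name, rate in RATES.items()}

# How much faster bandwidth improves than latency over the same period.
gap = factors["mem bandwidth"] / factors["mem latency"]

for name, factor in factors.items():
    print(f"{name}: x{factor:.2f}")
print(f"bandwidth/latency gap: x{gap:.1f}")
```

Under these assumptions bandwidth improves by a factor of roughly 58 in ten years while memory latency improves by about 10%, so the relative cost of each random access keeps growing.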

Latency is the enemy!

• Commercial DBMS products (Oracle, DB2, SQL Server) stem from OLTP roots
  • focus on minimizing random I/Os => depend on latency!
• MonetDB: built for bulk access
  • optimize CPU and memory performance

Latency is one of the killing factors in Friso’s simplicial homology implementation

MonetDB design considerations

• Multi-model database kernel support
• Extensible data types, operators, accelerators
• Database hot-set is memory resident
• Simple data structures are better
• Index management should be automatic
• Do not replicate the operating system
• Optimize when you know the situation
• Cooperative transaction management

MonetDB product family

[Diagram with the labels: End-user application; JDBC, ODBC, PHP, Python, Perl, C-mapi lib; MAPI protocol; SQL, XQuery; Monet kernels]

Here a MATLAB interface and Frank’s life would be easier

MonetDB - Physical data organization

• Binary Association Tables

ID   Day      Discount
10   4/4/98   0.195
11   9/4/98   0.065
12   1/2/98   0.175
13   7/2/98   0

OID   ID
100   10
101   11
102   12
103   13
104   14

OID   Day
100   4/4/98
101   9/4/98
102   1/2/98
103   7/2/98
104   1/2/99

OID   Discount
100   0.195
101   0.065
102   0.175
103   0
104   0.065
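The decomposition into Binary Association Tables can be mimicked in a few lines. This is only a sketch of the idea using plain Python lists; the function names are hypothetical and it says nothing about how MonetDB actually stores or processes BATs. The values are read from the three BATs on the slide.

```python
# Values read from the three BATs on the slide (ID, Day, Discount).
ROWS = [
    (10, "4/4/98", 0.195),
    (11, "9/4/98", 0.065),
    (12, "1/2/98", 0.175),
    (13, "7/2/98", 0.0),
    (14, "1/2/99", 0.065),
]
COLUMNS = ["ID", "Day", "Discount"]


def decompose(rows, columns, first_oid=100):
    """Split rows into one (OID, value) table per column."""
    bats = {name: [] for name in columns}
    for oid, row in enumerate(rows, start=first_oid):
        for name, value in zip(columns, row):
            bats[name].append((oid, value))
    return bats


def reconstruct(bats, columns):
    """Rebuild the rows by looking up the same OID in every BAT."""
    lookups = {name: dict(bats[name]) for name in columns}
    oids = [oid for oid, _ in bats[columns[0]]]
    return [tuple(lookups[name][oid] for name in columns) for oid in oids]


bats = decompose(ROWS, COLUMNS)

# A query on one attribute only needs to touch that attribute's BAT.
discounted_oids = [oid for oid, value in bats["Discount"] if value > 0.1]

print(bats["Day"])
print(discounted_oids)
```

The point of the layout is visible in the last query: scanning Discount never reads the ID or Day values, which suits the bulk, bandwidth-oriented access that the earlier slides argue for.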

Discussion
