Spatial DBMS issues. Spatial DBMS Research, GISt lunch meeting. Wilko Quak, OTB Research Institute for Housing, Urban and Mobility Studies, 2007-04-13.


Page 1: Spatial DBMS Research

Spatial DBMS issues

GISt lunch meeting

Wilko Quak

OTB Research Institute for Housing, Urban and Mobility Studies

2007-04-13

Page 2: Overview

• Introduction to DBMS query processing
• Benchmarking a spatial DBMS
• The GeoInfoNed project
• MonetDB
• Discussion

Page 3: Introduction to DBMS query processing

• Slides borrowed from Dr. Yang He

Page 4: Query processing overview

• Review of relational algebra
• Query processing
  • introduction
  • stages of query processing
  • query optimisation
  • relational algebra tree

Page 5: Relational algebra (1)

• a relational language proposed by Codd
• implementable
• basis of high-level (SQL) query execution
• a collection of simple, 'low-level' operations used to manipulate relations
  • input is one or more relations
  • output is one relation

Page 6: Relational algebra (2)

• Relational operations
  • unary operators
    • Restrict (Select) σ
    • Project π
  • binary operators
    • Cartesian product ×
    • Union ∪
    • Intersection ∩
    • Difference −
    • Join ⋈
    • Divide ÷

Page 7: Example relations

• e.g. two relations Student and Registration

Student ( SID, Name, Gender )
Registration ( SID, CID, Mark )

Student

SID  Name  Gender
S1   Kate  F
S2   John  M
S3   Kate  F
S4   Fred  M

Registration

SID  CID  Mark
S1   C1   65
S1   C2   45
S2   C2   80
S2   C4   60
S3   C1   50
S3   C2   75
S4   C3   70

Page 8: Query examples (1)

Select

• e.g. “Identify all male students”
• in SQL

SELECT SID, Name, Gender
FROM Student
WHERE Gender = 'M';

• in relational algebra

σ_{Gender='M'} ( Student )

Result:

SID  Name  Gender
S2   John  M
S4   Fred  M
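The Select (restrict) operator can be sketched in a few lines of Python. This is only an illustration of the semantics, not how a DBMS implements it; the list-of-dicts representation and the `select` helper are assumptions made for the example.

```python
# Student relation from the slides, modelled as a list of dict rows
# (an illustrative representation, not a DBMS storage format).
STUDENT = [
    {"SID": "S1", "Name": "Kate", "Gender": "F"},
    {"SID": "S2", "Name": "John", "Gender": "M"},
    {"SID": "S3", "Name": "Kate", "Gender": "F"},
    {"SID": "S4", "Name": "Fred", "Gender": "M"},
]


def select(relation, predicate):
    """Restrict: keep the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


# sigma_{Gender='M'}(Student)
male_students = select(STUDENT, lambda row: row["Gender"] == "M")
```

The input is one relation and the output is again a relation, so the result can be fed directly into another operator.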

Page 9: Query examples (2)

Project

• e.g. “List students' names and genders.”
• in SQL

SELECT Name, Gender
FROM Student;

• in relational algebra

π_{Name, Gender} ( Student )

Result:

Name  Gender
Kate  F
John  M
Fred  M

Note: the projection removes the duplicate (Kate, F) row; the SQL above needs SELECT DISTINCT to return the same three rows.
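A minimal Python sketch of the Project operator, assuming relations are lists of dict rows (the `project` helper is hypothetical). Because relational algebra works on sets, the two Kate rows collapse into one, which a plain SQL `SELECT` without `DISTINCT` would not do.

```python
STUDENT = [
    {"SID": "S1", "Name": "Kate", "Gender": "F"},
    {"SID": "S2", "Name": "John", "Gender": "M"},
    {"SID": "S3", "Name": "Kate", "Gender": "F"},
    {"SID": "S4", "Name": "Fred", "Gender": "M"},
]


def project(relation, attributes):
    """Project: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # set semantics: each combination appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# pi_{Name, Gender}(Student)
names_and_genders = project(STUDENT, ["Name", "Gender"])
```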

Page 10: Query examples (3)

• e.g. “Show student ID, name, their course ID and marks”
• in SQL

SELECT s.SID, Name, CID, Mark
FROM Student s, Registration r
WHERE s.SID = r.SID;

• in relational algebra

( π_{SID, Name} (Student) ) ⋈ ( π_{SID, CID, Mark} (Registration) )

or

π_{SID, Name, CID, Mark} ( Student ⋈ Registration )

where π is Project and ⋈ is Natural Join
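The two relational algebra expressions for this query can be checked against each other with a small Python sketch. The `project` and `natural_join` helpers below are illustrative assumptions; the join is a plain nested loop, which is only one of several ways a DBMS might execute it.

```python
STUDENT = [dict(zip(("SID", "Name", "Gender"), t)) for t in [
    ("S1", "Kate", "F"), ("S2", "John", "M"),
    ("S3", "Kate", "F"), ("S4", "Fred", "M"),
]]
REGISTRATION = [dict(zip(("SID", "CID", "Mark"), t)) for t in [
    ("S1", "C1", 65), ("S1", "C2", 45), ("S2", "C2", 80), ("S2", "C4", 60),
    ("S3", "C1", 50), ("S3", "C2", 75), ("S4", "C3", 70),
]]


def project(relation, attributes):
    """Project: keep the named attributes, dropping duplicate rows."""
    unique = []
    for row in relation:
        slim = {a: row[a] for a in attributes}
        if slim not in unique:
            unique.append(slim)
    return unique


def natural_join(left, right):
    """Natural join: pair rows that agree on every shared attribute."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]


# Project first, then join.
plan_a = natural_join(project(STUDENT, ["SID", "Name"]),
                      project(REGISTRATION, ["SID", "CID", "Mark"]))
# Join first, then project.
plan_b = project(natural_join(STUDENT, REGISTRATION),
                 ["SID", "Name", "CID", "Mark"])
```

Both plans return the same seven rows; they differ only in how much data flows through the join.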

Page 11: Queries in relational algebra

• A user query may require several operations to be performed
  • relational algebra is a procedural language, so query operations are evaluated in the order specified
  • a complex query can be executed in different ways; because efficiency is an important DBMS requirement, an efficient one should be chosen (query optimisation)

Page 12: Query processing

• Four stages involved in query processing
  • query decomposition or parsing
  • query optimization
  • code generation
  • runtime query execution

Page 13: Query optimization (1)

• refers to the activity of choosing an efficient execution strategy or plan for processing a query
  • rule-based and cost-based strategies
  • database statistics in the system catalog are used for cost estimation
• is a prime objective of query processing

Page 14: Query optimization (3)

• In query processing, disk access takes the most time
• The main objective of query optimisation is to minimize the number of disk accesses
• Many DBMSs use heuristic rules for query optimization
  • e.g. “Perform selection and projection operations as early as possible to reduce the cardinality of the relation and the subsequent processing of that relation”
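The "select early" heuristic can be illustrated with a rough Python sketch on the Student and Registration relations. The number of row pairs a nested-loop join compares is used here as a stand-in for disk accesses; that proxy, the tuple layout, and the `join_on_sid` helper are assumptions for this example only.

```python
# Rows as tuples: Student (SID, Name, Gender), Registration (SID, CID, Mark).
STUDENT = [("S1", "Kate", "F"), ("S2", "John", "M"),
           ("S3", "Kate", "F"), ("S4", "Fred", "M")]
REGISTRATION = [("S1", "C1", 65), ("S1", "C2", 45), ("S2", "C2", 80),
                ("S2", "C4", 60), ("S3", "C1", 50), ("S3", "C2", 75),
                ("S4", "C3", 70)]


def join_on_sid(students, registrations):
    """Nested-loop join on SID; also counts how many row pairs were compared."""
    pairs_compared = 0
    joined = []
    for s in students:
        for r in registrations:
            pairs_compared += 1
            if s[0] == r[0]:
                joined.append(s + r[1:])
    return joined, pairs_compared


# Plan 1: join everything, then select the male students.
all_joined, cost_late = join_on_sid(STUDENT, REGISTRATION)
result_late = [row for row in all_joined if row[2] == "M"]

# Plan 2: select the male students first, then join the smaller relation.
males = [s for s in STUDENT if s[2] == "M"]
result_early, cost_early = join_on_sid(males, REGISTRATION)
```

Both plans give the same three rows, but the early selection halves the number of compared pairs (14 instead of 28) on this data.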

Page 15: Query processing – an example

• e.g. “Show student ID, name, their course ID and marks”
• in SQL

SELECT s.SID, Name, CID, Mark
FROM Student s, Registration r
WHERE s.SID = r.SID;

• it can be transformed into a relational algebra query

( π_{SID, Name} (Student) ) ⋈ ( π_{SID, CID, Mark} (Registration) )

or

π_{SID, Name, CID, Mark} ( Student ⋈ Registration )

The first one is better: much less disk access than the second

Page 16: Relational algebra query tree (2)

• e.g. ( π_{SID, Name} (Student) ) ⋈ ( π_{SID, CID, Mark} (Registration) )

Root: ⋈
Intermediate nodes: π_{SID, Name}, π_{SID, CID, Mark}
Leaf nodes: Student, Registration
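A query tree like this one can be evaluated bottom-up: leaves return stored relations, intermediate nodes apply an operator to their child, and the root produces the final result. The sketch below assumes a nested-tuple encoding of the tree and an `evaluate` function, both invented for illustration.

```python
TABLES = {
    "Student": [dict(zip(("SID", "Name", "Gender"), t)) for t in [
        ("S1", "Kate", "F"), ("S2", "John", "M"),
        ("S3", "Kate", "F"), ("S4", "Fred", "M"),
    ]],
    "Registration": [dict(zip(("SID", "CID", "Mark"), t)) for t in [
        ("S1", "C1", 65), ("S1", "C2", 45), ("S2", "C2", 80), ("S2", "C4", 60),
        ("S3", "C1", 50), ("S3", "C2", 75), ("S4", "C3", 70),
    ]],
}


def evaluate(node):
    """Evaluate a query tree bottom-up: leaves first, root last."""
    kind = node[0]
    if kind == "table":      # leaf node: a stored relation
        return TABLES[node[1]]
    if kind == "project":    # intermediate node: unary operator
        attrs, child = node[1], evaluate(node[2])
        unique = {tuple(row[a] for a in attrs) for row in child}
        return [dict(zip(attrs, t)) for t in sorted(unique)]
    if kind == "join":       # root of this tree: binary natural join
        left, right = evaluate(node[1]), evaluate(node[2])
        shared = [a for a in left[0] if a in right[0]]
        return [{**l, **r} for l in left for r in right
                if all(l[a] == r[a] for a in shared)]
    raise ValueError(f"unknown node kind: {kind}")


TREE = ("join",
        ("project", ["SID", "Name"], ("table", "Student")),
        ("project", ["SID", "CID", "Mark"], ("table", "Registration")))
result = evaluate(TREE)
```

An optimizer would rewrite `TREE` into an equivalent but cheaper tree before handing it to an evaluator of this kind.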

Page 17: Spatial Query processing

In spatial query processing the operator is a spatial operator; otherwise it is the same as non-spatial query processing:

• Spatial Select
  • Find all objects within a given rectangle [99.99%]
• Spatial Join (overlay in GIS terms)
  • Find all restaurants within national parks
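The two spatial operators can be sketched in Python under strong simplifying assumptions: restaurants are points, parks are axis-aligned rectangles, and every name and coordinate below is made up. A real spatial DBMS would use true geometries and a spatial index rather than these exhaustive loops.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x, y):
        """True if the point (x, y) lies inside or on the rectangle."""
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


# Hypothetical data: restaurants as points, parks as rectangles.
RESTAURANTS = {"Lakeside": (2.0, 3.0), "Summit": (8.0, 8.5), "Downtown": (20.0, 1.0)}
PARKS = {"North Park": Rect(0, 0, 5, 5), "High Park": Rect(7, 7, 10, 10)}


def spatial_select(points, window):
    """Spatial select: all objects within a given rectangle."""
    return sorted(name for name, (x, y) in points.items()
                  if window.contains(x, y))


def spatial_join(points, regions):
    """Spatial join: all (point, region) pairs where the region contains the point."""
    return sorted((p, r) for p, (x, y) in points.items()
                  for r, rect in regions.items() if rect.contains(x, y))


in_window = spatial_select(RESTAURANTS, Rect(0, 0, 10, 10))
restaurants_in_parks = spatial_join(RESTAURANTS, PARKS)
```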

Page 18: DBMS Benchmarking

• Categorization of DBMS usage
• Implications for benchmarking
• Benchmark choices

Page 19: Categories of DBMS users

• Static usage:
  • Predefined queries with changing parameters
  • Queries can be hand optimized
• Dynamic usage (browsing):
  • Many different queries
  • Query optimizer is important
• Access via object-relational mapping (e.g. Hibernate)
  • Not discussed here

Each category needs a different kind of benchmarking

Page 20: Benchmarking static DBMS usage

Notes:
• Critical factor is testing the ‘query processor’.
• Query optimizer is not important

Benchmark:
• Make a small set of simple queries that each test one operation
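A benchmark of this kind could look like the following sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for the DBMS under test; the `city` table, the two queries, and the `best_time` helper are assumptions for illustration, and the absolute timings mean nothing outside the machine they were measured on.

```python
import sqlite3
import time


def best_time(conn, sql, params=(), repeats=5):
    """Run one query several times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical test data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, inhabitants INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [(f"city{i}", i * 100) for i in range(10_000)])

# Each query exercises a single operation.
BENCHMARK = {
    "select": ("SELECT name FROM city WHERE inhabitants > ?", (500_000,)),
    "aggregate": ("SELECT COUNT(*) FROM city", ()),
}
results = {name: best_time(conn, sql, params)
           for name, (sql, params) in BENCHMARK.items()}
```

Taking the minimum over several runs is one common way to reduce noise from caching and other processes.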

Page 21: Benchmarking dynamic DBMS usage

Notes:
• Critical factor is testing the ‘query optimizer’.
• Very hard to get quality reproducible results.
• It is very hard to assess the quality of the query optimizer, but a small test set might give some insight:

select city.name, river.name
from city, river
where city.inhabitants > X
  and distance(city.geometry, river.geometry) < Y;
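Why this query is a useful optimizer test can be sketched in Python: the inhabitants filter is cheap, the distance predicate is expensive, and the order in which they are applied changes how often the expensive one runs. The data, the point-shaped rivers, and the `run` function are all assumptions made for illustration.

```python
import math

# Hypothetical data: (name, inhabitants, position) and (name, position).
CITIES = [("A", 900_000, (0, 0)), ("B", 50_000, (1, 1)),
          ("C", 20_000, (9, 9)), ("D", 700_000, (10, 10))]
RIVERS = [("R1", (0, 1)), ("R2", (10, 12)), ("R3", (50, 50))]


def run(cities, rivers, min_inhabitants, max_distance, filter_first):
    """Evaluate the query; return the result and the number of distance computations."""
    distance_calls = 0
    result = []
    for name, inhabitants, city_pos in cities:
        if filter_first and inhabitants <= min_inhabitants:
            continue  # cheap attribute test prunes the city before any geometry work
        for river_name, river_pos in rivers:
            distance_calls += 1
            near = math.dist(city_pos, river_pos) < max_distance
            if near and inhabitants > min_inhabitants:
                result.append((name, river_name))
    return result, distance_calls


good_plan, good_calls = run(CITIES, RIVERS, 100_000, 3, filter_first=True)
naive_plan, naive_calls = run(CITIES, RIVERS, 100_000, 3, filter_first=False)
```

Both plans return the same pairs, but the naive one computes twice as many distances on this data. Which order is better depends on X and Y, and that choice is what the optimizer has to get right.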

Page 22: Other benchmarking considerations

• Functionality
• Usability
• Update behaviour

Page 23: GeoInfoNed – RGI-232

Page 24: GeoInfoNed -- What and Why

Build a spatially enabled DBMS because:
• A DBMS is at the core of many systems. If you improve the core, the whole system improves.
• There is a need for an (open source) experimentation platform for Geo DBMS research.

Page 25: Who

• CWI – Leading DBMS experts with MonetDB

• TUDelft/OTB – Knowledge of spatial processes

• CycloMedia – Huge dataset and interesting problems

• RWS/AGI – Large and diverse datasets and interesting problems

Page 26: How

• At CWI there is the MonetDB DBMS. First we will extend it with basic spatial types (according to OpenGIS).

• Together with our ‘Problem Holder’ partners we will find directions for more extensions.

• MonetDB already has support for image data, XML storage and querying, etc.

Page 27: Example (GeoInfoNed)

• Is there a relationship between traffic accidents and objects near the road?

Page 28: MonetDB Introduction*

• Hardware trends
• MonetDB design considerations
• MonetDB architecture

*Slides borrowed from CWI

Page 29: Hardware Trends

50% p/year:
- cpu speed
- mem size
- mem bandwidth
- disk bandwidth

1% p/year:
- mem latency

10% p/year:
- disk latency
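The effect of these rates is easiest to see when they are compounded over several years. The short Python calculation below assumes the percentages are annual improvement rates that compound; the 10-year horizon is an arbitrary choice for illustration.

```python
# Annual improvement rates taken from the slide.
RATES = {
    "cpu speed": 0.50,
    "mem size": 0.50,
    "mem bandwidth": 0.50,
    "disk bandwidth": 0.50,
    "mem latency": 0.01,
    "disk latency": 0.10,
}


def improvement(rate, years):
    """Compound improvement factor after the given number of years."""
    return (1 + rate) ** years


years = 10  # assumed horizon
factors = {name: improvement(rate, years) for name, rate in RATES.items()}

# How much faster bandwidth has improved than latency over the same period.
memory_gap = factors["mem bandwidth"] / factors["mem latency"]
disk_gap = factors["disk bandwidth"] / factors["disk latency"]
```

Under these assumptions, the 50% items improve by a factor of about 57.7 in ten years, while memory latency improves by only about 1.1, so the cost of a random access relative to a bulk transfer keeps growing.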

Page 30: Latency is the enemy!

• Commercial DBMS products (Oracle, DB2, SQL Server) stem from OLTP roots
  • focus on minimizing random I/Os => depend on latency!
• MonetDB: built for bulk access
  • optimize CPU and memory performance

Latency is one of the killing factors in Friso’s simplicial homology implementation

Page 31: MonetDB design considerations

• Multi-model database kernel support
• Extensible data types, operators, accelerators
• Database hot-set is memory resident
• Simple data structures are better
• Index management should be automatic
• Do not replicate the operating system
• Optimize when you know the situation
• Cooperative transaction management

Page 32: MonetDB product family

[Figure: MonetDB product family. Labels, top to bottom: End-user application; JDBC, ODBC, PHP, Python, Perl, C-mapi lib; MAPI protocol; SQL, XQuery; Monet kernels]

A MATLAB interface here would make Frank’s life easier

Page 33: MonetDB - Physical data organization

• Binary Association Tables

Relational table:

ID  Day     Discount
10  4/4/98  0.195
11  9/4/98  0.065
12  1/2/98  0.175
13  7/2/98  0

Stored as one binary table per column:

OID  ID
100  10
101  11
102  12
103  13
104  14

OID  Day
100  4/4/98
101  9/4/98
102  1/2/98
103  7/2/98
104  1/2/99

OID  Discount
100  0.195
101  0.065
102  0.175
103  0
104  0.065
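The decomposition into binary tables can be mimicked in Python to show the idea. This is a sketch of vertical decomposition in general, not MonetDB's actual storage code; `decompose`, `reconstruct`, and the starting OID of 100 are assumptions taken from or invented around the slide's example.

```python
# The four rows visible in the relational table on the slide.
ROWS = [
    {"ID": 10, "Day": "4/4/98", "Discount": 0.195},
    {"ID": 11, "Day": "9/4/98", "Discount": 0.065},
    {"ID": 12, "Day": "1/2/98", "Discount": 0.175},
    {"ID": 13, "Day": "7/2/98", "Discount": 0.0},
]


def decompose(rows, first_oid=100):
    """Split rows into one (OID, value) table per column."""
    bats = {column: [] for column in rows[0]}
    for oid, row in enumerate(rows, start=first_oid):
        for column, value in row.items():
            bats[column].append((oid, value))
    return bats


def reconstruct(bats, oid):
    """Rebuild one row by looking up the same OID in every column table."""
    return {column: dict(pairs)[oid] for column, pairs in bats.items()}


bats = decompose(ROWS)

# A query on one attribute only has to scan that attribute's table.
total_discount = sum(value for _, value in bats["Discount"])
```

Scanning a single column touches far less data than scanning full rows, which fits the bulk-access design goal stated earlier in the deck.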

Page 34: Discussion