2007-04-13
Spatial DBMS issues
1
OTB Research Institute for Housing, Urban and Mobility Studies
Spatial DBMS Research
GISt lunch meeting
Wilko Quak
Overview
• Introduction to DBMS Query Processing
• Benchmarking a spatial DBMS
• The GeoInfoNed project
• MonetDB
• Discussion
Introduction to DBMS query processing
• Slides borrowed from Dr. Yang He
Query processing overview
• Review relational algebra
• Query processing
  • introduction
  • stages of query processing
  • query optimisation
  • relational algebra tree
Relational algebra (1)
• a relational language proposed by Codd
• implementable
• basis of high-level (SQL) query execution
• a collection of simple, 'low-level' operations used to manipulate relations
  • input is one or more relations
  • output is one relation
Relational algebra (2)
• Relational operations
  • unary operators
    • Restrict (Select) $\sigma$
    • Project $\pi$
  • binary operators
    • Cartesian product $\times$
    • Union $\cup$
    • Intersection $\cap$
    • Difference $-$
    • Join $\bowtie$
    • Divide $\div$
Example relations
• e.g. two relations Student and Registration
Student ( SID, Name, Gender )
Registration ( SID, CID, Mark )

Student
SID  Name  Gender
S1   Kate  F
S2   John  M
S3   Kate  F
S4   Fred  M

Registration
SID  CID  Mark
S1   C1   65
S1   C2   45
S2   C2   80
S2   C4   60
S3   C1   50
S3   C2   75
S4   C3   70
Query examples (1): Select
• e.g. “Identify all male students”
• in SQL
SELECT SID, Name, Gender
FROM Student
WHERE Gender='M';
• in relational algebra
$\sigma_{Gender='M'}(Student)$
• result
SID  Name  Gender
S2   John  M
S4   Fred  M
Query examples (2): Project
• e.g. “List students' names and genders.”
• in SQL
SELECT Name, Gender
FROM Student;
• in relational algebra
$\pi_{Name, Gender}(Student)$
• result
Name  Gender
Kate  F
John  M
Fred  M
Query examples (3): Project and Natural Join
• e.g. “Show student ID, name, their course ID and marks”
• in SQL
SELECT s.SID, Name, CID, Mark
FROM Student s, Registration r
WHERE s.SID = r.SID;
• in relational algebra
$\pi_{SID, Name}(Student) \bowtie \pi_{SID, CID, Mark}(Registration)$
or
$\pi_{SID, Name, CID, Mark}(Student \bowtie Registration)$
where $\pi$ is Project and $\bowtie$ is Natural Join
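The three operators used in the examples above can be sketched in a few lines of Python. This is a minimal illustration, not how a DBMS implements them: relations are modelled as lists of dictionaries, and the helper names `select`, `project` and `natural_join` are assumptions introduced here.

```python
# Example relations, modelled as lists of dicts (an assumption made for
# illustration; a real DBMS stores relations very differently).
STUDENT = [
    {"SID": "S1", "Name": "Kate", "Gender": "F"},
    {"SID": "S2", "Name": "John", "Gender": "M"},
    {"SID": "S3", "Name": "Kate", "Gender": "F"},
    {"SID": "S4", "Name": "Fred", "Gender": "M"},
]
REGISTRATION = [
    {"SID": "S1", "CID": "C1", "Mark": 65},
    {"SID": "S1", "CID": "C2", "Mark": 45},
    {"SID": "S2", "CID": "C2", "Mark": 80},
    {"SID": "S2", "CID": "C4", "Mark": 60},
    {"SID": "S3", "CID": "C1", "Mark": 50},
    {"SID": "S3", "CID": "C2", "Mark": 75},
    {"SID": "S4", "CID": "C3", "Mark": 70},
]


def select(relation, predicate):
    """Restrict: keep the tuples for which predicate(tuple) is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Project: keep only the named attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in shared)
    ]


males = select(STUDENT, lambda row: row["Gender"] == "M")
names = project(STUDENT, ["Name", "Gender"])
marks = project(natural_join(STUDENT, REGISTRATION), ["SID", "Name", "CID", "Mark"])
```

Note that `project` removes duplicate tuples, which is why the projection result has three rows rather than four; in SQL the same result requires SELECT DISTINCT.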
Queries in relational algebra
• A user query may require several operations to be performed
  • relational algebra is a procedural language, so query operations are evaluated in the order specified
• A complex query can be executed in different ways; because efficiency is an important DBMS requirement, an efficient way should be chosen (query optimisation)
Query processing
• Four stages involved in query processing
  • query decomposition or parsing
  • query optimization
  • code generation
  • runtime query execution
Query optimization (1)
• refers to the activity of choosing an efficient execution strategy or plan for processing a query
• rule-based and cost-based strategies
• database statistics in the system catalog are used for cost estimation
• is a prime objective of query processing
Query optimization (3)
• In query processing, disk access takes the most time
• The main objective of query optimisation is to minimize the number of disk accesses
• Many DBMSs use heuristic rules for query optimization
  • e.g. “Perform selection and projection operations as early as possible to reduce the cardinality of the relation and the subsequent processing of that relation”
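The heuristic can be illustrated with a toy cost model. The sketch below assumes a naive nested-loop join and counts tuple comparisons as a stand-in for disk accesses; both the cost measure and the example query ("marks of male students") are assumptions chosen for illustration, using the Student and Registration relations from the earlier example.

```python
# (SID, Name, Gender) and (SID, CID, Mark) tuples from the example relations.
STUDENT = [
    ("S1", "Kate", "F"), ("S2", "John", "M"),
    ("S3", "Kate", "F"), ("S4", "Fred", "M"),
]
REGISTRATION = [
    ("S1", "C1", 65), ("S1", "C2", 45), ("S2", "C2", 80), ("S2", "C4", 60),
    ("S3", "C1", 50), ("S3", "C2", 75), ("S4", "C3", 70),
]


def nested_loop_join(students, registrations):
    """Join on SID; return (joined tuples, number of tuple comparisons)."""
    comparisons = 0
    joined = []
    for sid, name, gender in students:
        for reg_sid, cid, mark in registrations:
            comparisons += 1
            if sid == reg_sid:
                joined.append((sid, name, gender, cid, mark))
    return joined, comparisons


def is_male(row):
    """Gender is the third field in both Student and joined tuples."""
    return row[2] == "M"


# Plan A: join first, select afterwards.
joined_a, cost_a = nested_loop_join(STUDENT, REGISTRATION)
result_a = [row for row in joined_a if is_male(row)]

# Plan B: select first (as the heuristic suggests), then join.
males = [row for row in STUDENT if is_male(row)]
result_b, cost_b = nested_loop_join(males, REGISTRATION)

print(cost_a, cost_b)  # 28 14
```

Both plans return the same three tuples, but selecting first halves the number of comparisons in this toy setting because the join sees two students instead of four.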
Query processing – an example
• e.g. “Show student ID, name, their course ID and marks”
• in SQL
SELECT s.SID, Name, CID, Mark
FROM Student s, Registration r
WHERE s.SID = r.SID;
• it can be transformed into a relational algebra query
$\pi_{SID, Name}(Student) \bowtie \pi_{SID, CID, Mark}(Registration)$
or
$\pi_{SID, Name, CID, Mark}(Student \bowtie Registration)$
The first one is better: it needs much less disk access than the second
Relational algebra query tree (2)
• e.g. $\pi_{SID, Name}(Student) \bowtie \pi_{SID, CID, Mark}(Registration)$
Root: $\bowtie$
Intermediate nodes: $\pi_{SID, Name}$, $\pi_{SID, CID, Mark}$
Leaf nodes: Student, Registration
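A query tree like this can be evaluated bottom-up: leaves return stored relations, and each inner node applies its operator to the results of its children. The sketch below is one hypothetical way to represent that in Python; the class names `Leaf`, `Project` and `Join` are assumptions, and the data is the Student and Registration example from earlier.

```python
from dataclasses import dataclass

STUDENT = [
    {"SID": "S1", "Name": "Kate", "Gender": "F"},
    {"SID": "S2", "Name": "John", "Gender": "M"},
    {"SID": "S3", "Name": "Kate", "Gender": "F"},
    {"SID": "S4", "Name": "Fred", "Gender": "M"},
]
REGISTRATION = [
    {"SID": "S1", "CID": "C1", "Mark": 65},
    {"SID": "S1", "CID": "C2", "Mark": 45},
    {"SID": "S2", "CID": "C2", "Mark": 80},
    {"SID": "S2", "CID": "C4", "Mark": 60},
    {"SID": "S3", "CID": "C1", "Mark": 50},
    {"SID": "S3", "CID": "C2", "Mark": 75},
    {"SID": "S4", "CID": "C3", "Mark": 70},
]


@dataclass
class Leaf:
    """Leaf node: a stored relation."""
    rows: list

    def evaluate(self):
        return self.rows


@dataclass
class Project:
    """Intermediate node: keep the named attributes, drop duplicates."""
    child: object
    attributes: list

    def evaluate(self):
        out = []
        for row in self.child.evaluate():
            reduced = {a: row[a] for a in self.attributes}
            if reduced not in out:
                out.append(reduced)
        return out


@dataclass
class Join:
    """Natural join of the two child results on their shared attributes."""
    left: object
    right: object

    def evaluate(self):
        result = []
        for l in self.left.evaluate():
            for r in self.right.evaluate():
                shared = l.keys() & r.keys()
                if all(l[a] == r[a] for a in shared):
                    result.append({**l, **r})
        return result


# Root: join; intermediate nodes: the two projections; leaves: the relations.
tree = Join(
    Project(Leaf(STUDENT), ["SID", "Name"]),
    Project(Leaf(REGISTRATION), ["SID", "CID", "Mark"]),
)
rows = tree.evaluate()
```

Calling `evaluate()` on the root triggers the projections before the join, so the Gender attribute never reaches the join operator.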
Spatial Query processing
In spatial query processing the operator is a spatial operator; otherwise it is the same as non-spatial query processing:
• Spatial Select
  • Find all objects within given rectangle [99.99%]
• Spatial Join (overlay in GIS terms)
  • Find all restaurants within national parks
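Both operators can be sketched with axis-aligned rectangles. The code below is a brute-force illustration that assumes points for objects and rectangles for the query window and the parks; the coordinates and names are made up, and a real spatial DBMS would use a spatial index and true polygon geometry rather than testing every pair.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    x: float
    y: float


class Rect(NamedTuple):
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains(rect, point):
    """True if the point lies inside or on the border of the rectangle."""
    return rect.xmin <= point.x <= rect.xmax and rect.ymin <= point.y <= rect.ymax


def spatial_select(points, window):
    """Spatial select: all objects within the given rectangle."""
    return [p for p in points if contains(window, p)]


def spatial_join(points, rects):
    """Spatial join: every (point, rectangle) pair where the point is inside."""
    return [(p.name, r.name) for p in points for r in rects if contains(r, p)]


# Hypothetical data: three restaurants and two rectangular "parks".
RESTAURANTS = [Point("A", 1, 1), Point("B", 5, 5), Point("C", 9, 2)]
PARKS = [Rect("North", 0, 0, 2, 2), Rect("East", 8, 0, 10, 4)]

in_window = spatial_select(RESTAURANTS, Rect("window", 0, 0, 6, 6))
pairs = spatial_join(RESTAURANTS, PARKS)
```

Here `in_window` contains restaurants A and B, and `pairs` links A to the North park and C to the East park.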
DBMS Benchmarking
• Categorization of DBMS usage
• Implications for benchmarking
• Benchmark choices
Categories of DBMS users
• Static usage:
  • Predefined queries with changing parameters
  • Queries can be hand optimized
• Dynamic usage (browsing):
  • Many different queries
  • Query optimizer is important
• Access via object-relational mapping (e.g. Hibernate)
  • Not discussed here
All categories need different benchmarking
Benchmarking static DBMS usage
Notes:
• Critical factor is testing the ‘query processor’.
• Query optimizer is not important
Benchmark:
• Make a small set of simple queries that each test one operation
Benchmarking dynamic DBMS usage
Notes:
• Critical factor is testing the ‘query optimizer’.
• Very hard to get quality reproducible results.
• It is very hard to assess the quality of the query optimizer, but a small test set might give some insight:
select city.name, river.name
from city, river
where city.inhabitants > X
  and distance(city.geometry, river.geometry) < Y;
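One way to run such a test query repeatedly is sketched below with Python's built-in `sqlite3` module. SQLite has no spatial types, so this sketch reduces each geometry to a single point and registers a hypothetical `distance` function; the table contents, the parameter values used for X and Y, and the choice to report the median of several runs are all assumptions for illustration.

```python
import math
import sqlite3
import statistics
import time

conn = sqlite3.connect(":memory:")
# SQLite has no geometry type: each geometry is reduced to one (x, y) point,
# and distance() is a user-defined function, not a built-in.
conn.create_function(
    "distance", 4, lambda x1, y1, x2, y2: math.hypot(x1 - x2, y1 - y2)
)
conn.executescript("""
    CREATE TABLE city (name TEXT, inhabitants INTEGER, x REAL, y REAL);
    CREATE TABLE river (name TEXT, x REAL, y REAL);
    INSERT INTO city VALUES ('Alpha', 500000, 0, 0), ('Beta', 20000, 1, 1),
                            ('Gamma', 900000, 50, 50);
    INSERT INTO river VALUES ('R1', 3, 4), ('R2', 60, 60);
""")

QUERY = """
    SELECT city.name, river.name
    FROM city, river
    WHERE city.inhabitants > ?
      AND distance(city.x, city.y, river.x, river.y) < ?
    ORDER BY city.name, river.name
"""


def run_benchmark(inhabitants, max_distance, repeats=5):
    """Run the query several times; return the rows and the median time."""
    timings = []
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(QUERY, (inhabitants, max_distance)).fetchall()
        timings.append(time.perf_counter() - start)
    return rows, statistics.median(timings)


rows, median_seconds = run_benchmark(100000, 15)
```

Varying the two parameters changes which predicate is more selective, which is the kind of situation where the order chosen by an optimizer matters; reporting a median rather than a single run is one simple way to dampen timing noise.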
Other benchmarking considerations
• Functionality
• Usability
• Update behaviour
GeoInfoNed – RGI-232
GeoInfoNed -- What and Why
Build a spatially enabled DBMS because:
• A DBMS is at the core of many systems. If you improve the core, the whole system improves.
• There is a need for an (open source) experimentation platform for Geo DBMS research.
Who
• CWI – Leading DBMS experts with MonetDB
• TUDelft/OTB – Knowledge of spatial processes
• CycloMedia – Huge dataset and interesting problems
• RWS/AGI – Large and diverse datasets and interesting problems
How
• At CWI there is the MonetDB DBMS. First we will extend it with basic spatial types (according to OpenGIS).
• Together with our ‘Problem Holder’ partners we will find directions for more extensions.
• MonetDB already has support for: Image Data, XML storage and querying etc.
Example
• Is there a relationship between traffic accidents and objects near the road?
GeoInfoNed
MonetDB Introduction*
• Hardware trends
• MonetDB design considerations
• MonetDB architecture
*Slides borrowed from CWI
Hardware Trends
50% p/year:
- cpu speed
- mem size
- mem bandwidth
- disk bandwidth
1% p/year:
- mem latency
10% p/year:
- disk latency
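Compounded over a decade, these rates diverge sharply. The short calculation below takes the percentages above at face value and assumes they hold steady; the ten-year horizon is an assumption chosen for illustration.

```python
# Annual improvement rates as listed under "Hardware Trends".
RATES = {
    "cpu speed": 0.50,
    "mem size": 0.50,
    "mem bandwidth": 0.50,
    "disk bandwidth": 0.50,
    "mem latency": 0.01,
    "disk latency": 0.10,
}

YEARS = 10  # assumed horizon, not taken from the slides


def improvement_factor(rate, years):
    """Compound growth: how many times better after the given number of years."""
    return (1 + rate) ** years


factors = {name: improvement_factor(rate, YEARS) for name, rate in RATES.items()}
for name, factor in factors.items():
    print(f"{name}: x{factor:.2f} after {YEARS} years")
```

Under these assumptions CPU speed and bandwidth improve by a factor of roughly 58, disk latency by about 2.6, and memory latency by only about 10%.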
Latency is the enemy!
• Commercial DBMS products (Oracle, DB2, SQL Server) stem from OLTP roots
  • focus on minimizing random I/Os => depend on latency!
• MonetDB: built for bulk access
  • optimize CPU and memory performance
Latency is one of the killing factors in Friso’s simplicial homology implementation
MonetDB design considerations
• Multi-model database kernel support
• Extensible data types, operators, accelerators
• Database hot-set is memory resident
• Simple data structures are better
• Index management should be automatic
• Do not replicate the operating system
• Optimize when you know the situation
• Cooperative transaction management
MonetDB product family
[Figure: product family diagram. Labels: End-user application; JDBC, ODBC, PHP, Python, Perl, C-mapi lib; MAPI protocol; SQL, XQuery; Monet kernels]
Here a MATLAB interface and Frank’s life would be easier
MonetDB - Physical data organization
• Binary Association Tables

Relational table
ID  Day     Discount
10  4/4/98  0.195
11  9/4/98  0.065
12  1/2/98  0.175
13  7/2/98  0

Binary Association Tables (one per column)
OID  ID
100  10
101  11
102  12
103  13
104  14

OID  Day
100  4/4/98
101  9/4/98
102  1/2/98
103  7/2/98
104  1/2/99

OID  Discount
100  0.195
101  0.065
102  0.175
103  0
104  0.065
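The decomposition shown above can be sketched as follows. This only illustrates the idea of one binary (OID, value) table per column; the function names are hypothetical, the starting OID of 100 simply mirrors the example, and MonetDB's actual storage layout is more involved.

```python
# The first four rows of the example table, in row-wise form.
ROWS = [
    {"ID": 10, "Day": "4/4/98", "Discount": 0.195},
    {"ID": 11, "Day": "9/4/98", "Discount": 0.065},
    {"ID": 12, "Day": "1/2/98", "Discount": 0.175},
    {"ID": 13, "Day": "7/2/98", "Discount": 0.0},
]


def decompose(rows, first_oid=100):
    """Split a row-wise table into one list of (OID, value) pairs per column."""
    bats = {}
    for oid, row in enumerate(rows, start=first_oid):
        for column, value in row.items():
            bats.setdefault(column, []).append((oid, value))
    return bats


def reconstruct(bats, oid):
    """Rebuild one row by looking up the same OID in every column's table."""
    return {column: dict(pairs)[oid] for column, pairs in bats.items()}


bats = decompose(ROWS)
row_102 = reconstruct(bats, 102)
```

A query that only needs the Discount column can then scan `bats["Discount"]` without touching ID or Day, which fits the bulk-access design goal mentioned earlier.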
Discussion