Source: homepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...

Advanced Database Techniques

Martin.Kersten @ cwi.nl
Stefan Manegold @ cwi.nl
Sandor Heman @ cwi.nl
Jennie Zhang @ cwi.nl
Romulo Goncalves @ cwi.nl

Administrative details
• The website evolves during the course
• Exam material is marked explicitly
• Lab work deadlines are strict
• Email is the preferred way to communicate
• Tomorrow the assistants will be available in person between 11:00-12:00, room REC-P.123

Relational systems
• A database system should simplify the organization, validation, sharing, and bookkeeping of information
• Prerequisite knowledge
  – Relational data model and algebra
  – Data structures (B-tree, hash)
  – Operating system concepts
  – Using a SQL database system
• What is your practical experience? [Ruby on Rails expertise needed]

Applications
• Bread-and-butter applications?
  – Web-shop
  – Banking systems
  – Inventory systems
  – Production systems
  – Shopping systems
  – Government systems
  – Health systems
  – Multimedia systems
  – Science systems …

Advanced Applications
• Bread-and-butter applications?
  – Banking systems
    • What happens if you install a stock trading system that should handle >100K transactions/minute?
    • How to derive trading advice using compute-intensive applications
    • How to warn thousands of users about their trading opportunity
  – … Need for parallel, distributed main-memory database technology …

Advanced application requirements
• Bread-and-butter applications
  – Inventory applications
    • How to install a battlefield inventory system?
    • How to deliver goods just in time?
    • How to keep track of moving objects/persons?
• … need for sensor-based database support and RFID tags … need for a new DBMS? …

Advanced Applications
• Production systems
  – How to interact with component suppliers
  – How to manage the production workflow
  – How to avoid bad production steps
  – How to maintain a database with 12000 tables (SAP)
• … need for interoperability between autonomous systems … datamining and knowledge discovery …

Advanced Applications
• Health information systems
  – How to monitor your health over 30 years
  – How to enable quick response to a heart attack
• … need for interoperable database systems …

[Figure sequence: The Ambient Home. Labels: HELP; 911 called.]

[Figure sequence: MonetDB DataCell, a shared tuple space using an SQL DBMS. Labels: receptors, nucleus, emitters; HELP; 911 called; Recall, Aggregate, Keep, Forget.]

SQL work load
-- SQL-queries
insert into hospital select 'John', * from medic where temp > 40.0;
insert into epd select * from medic where temp >= 38.0;
delete from medic;

[Figure labels: Recall, Aggregate, Keep, Forget; Start, End]
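The three statements on this slide can be tried out on any SQL engine. The following is a minimal sketch using Python's built-in sqlite3 module, not MonetDB DataCell itself. The table layouts (`ts`, `temp`, `patient` columns) and the sample readings are assumptions made for illustration; the slides do not show the schema.

```python
import sqlite3

def drain_basket(conn):
    """Run the slide's three statements as one atomic step."""
    with conn:  # commits on success, rolls back on error
        conn.execute("insert into hospital select 'John', * from medic where temp > 40.0")
        conn.execute("insert into epd select * from medic where temp >= 38.0")
        conn.execute("delete from medic")

conn = sqlite3.connect(":memory:")
# Hypothetical schema: medic holds incoming readings, hospital and epd are targets.
conn.executescript("""
    create table medic(ts integer, temp real);
    create table hospital(patient text, ts integer, temp real);
    create table epd(ts integer, temp real);
""")
conn.executemany("insert into medic values (?, ?)",
                 [(1, 36.8), (2, 38.5), (3, 40.3)])
conn.commit()

drain_basket(conn)
hospital_rows = conn.execute("select * from hospital").fetchall()
epd_rows = conn.execute("select * from epd order by ts").fetchall()
medic_left = conn.execute("select count(*) from medic").fetchone()[0]
print(hospital_rows, epd_rows, medic_left)
```

Running the statements inside one transaction matters here: the final delete must only remove readings that the two inserts have already seen.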

Query optimization
The queries in a datacell have
- a soft/hard deadline
- strong flow dependency

The operands to the queries are small tables:
- empty
- single value
- a few values

Traditional query optimizers are biased towards large operands.

Query optimization
Challenges:
• How to optimize the individual SQL programs to select the proper QEP?
• How to weave the collection of SQL programs to create an optimal multi-query version?
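To make the weaving question concrete: both inserts in the earlier SQL work load scan medic, and every row with temp > 40.0 also satisfies temp >= 38.0. A hand-woven version can therefore scan medic once and route each row to its targets. The sketch below does this in Python with sqlite3; it is an illustration of the idea under an assumed schema, not the DataCell optimizer's actual strategy.

```python
import sqlite3

def woven_drain(conn):
    """Single scan of medic feeding both targets (hypothetical hand-woven plan)."""
    with conn:
        # One scan with the weaker predicate covers both queries.
        rows = conn.execute(
            "select ts, temp from medic where temp >= 38.0").fetchall()
        conn.executemany("insert into epd values (?, ?)", rows)
        # The stronger predicate is applied to the already fetched rows.
        conn.executemany("insert into hospital values ('John', ?, ?)",
                         [r for r in rows if r[1] > 40.0])
        conn.execute("delete from medic")

conn = sqlite3.connect(":memory:")
# Assumed schema; the slides do not define the tables.
conn.executescript("""
    create table medic(ts integer, temp real);
    create table hospital(patient text, ts integer, temp real);
    create table epd(ts integer, temp real);
""")
conn.executemany("insert into medic values (?, ?)",
                 [(1, 36.8), (2, 38.5), (3, 40.3)])
conn.commit()

woven_drain(conn)
woven_hospital = conn.execute("select * from hospital").fetchall()
woven_epd = conn.execute("select * from epd order by ts").fetchall()
woven_left = conn.execute("select count(*) from medic").fetchone()[0]
print(woven_hospital, woven_epd, woven_left)
```

With operands of only a few rows, saving one scan is a small gain per run, but the program runs continuously, so the fixed per-query overhead dominates and is worth removing.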

Advanced Applications
• Multimedia Systems
  – Narrow/broad casting, selective dissemination of volumetric information
  – Searching in multimedia storage
• … need for P2P infrastructure … search facilities over feature spaces …

Advanced applications
• Government systems
  – Security
    • Biometric data management issues, finger/image matching
  – Public safety
    • Forensics, manipulate complex objects using proprietary algorithms
• … need for extensible database technology … need to support unstructured data …

Advanced Applications
• Science systems
  – The new accelerator at CERN
    • How to handle >1 PByte files
  – The Sloan Digital SkyServer schema is 200 pages and the catalogued data 2.5 TByte
    • How to query this efficiently
  – … need for P2P and … a novel way to organize data …

LOFAR central processor specs
• Streaming Data
  – Input: 320 Gbit/s
  – Internally within correlator: 20 Tbit/s
  – Into storage: 25 Gbit/s = 250 TByte/day
  – Final products: 1-3 TByte/day
• High Performance Computing
  – Correlation: 15 Tflops
  – Pre-processing and filtering: 5 Tflops
  – Off-line processing (calibration, analysis): 5-10 Tflops
  – Visualisation, control, scheduling etc.: 2 Tflops
• Storage
  – On-line temporal storage: 500 TByte
  – Archive: PByte range of data stored in Grid
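The storage figure can be checked with a few lines of arithmetic. The sketch below converts 25 Gbit/s into bytes per day using both decimal and binary prefixes; the slide's 250 TByte/day is a rounded value that sits between the two. The variable names are illustrative only.

```python
GBIT = 1e9                       # bits per gigabit (decimal prefix)
SECONDS_PER_DAY = 24 * 60 * 60   # 86400

def bytes_per_day(gbit_per_s):
    """Convert a sustained rate in Gbit/s to bytes per day."""
    return gbit_per_s * GBIT / 8 * SECONDS_PER_DAY

storage_bytes = bytes_per_day(25)
tb_per_day = storage_bytes / 1e12       # decimal terabytes
tib_per_day = storage_bytes / 2**40     # binary tebibytes
reduction = 320 / 25                    # input rate versus rate into storage

print(f"{tb_per_day:.0f} TB/day, {tib_per_day:.1f} TiB/day, "
      f"reduction factor {reduction:.1f}")
```

So the correlator and filtering stages cut the input stream by roughly a factor of 13 before anything reaches storage, and the final products are smaller again by about two orders of magnitude.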

Technological challenges
• Data is often not structured as tables
  – XML and XQuery
• Data does not always fit on one system
  – Distributed and parallel databases
• Querying is more like world-wide searching
  – Continuous and streaming queries
• A database tells more than facts
  – Datamining and knowledge discovery

Code bases
• Database management systems are BIG software systems
  – Oracle, SQL-server, DB2: >1 M lines
  – PostgreSQL: 300 K lines
  – MySQL: 500 K lines
  – MonetDB: 200-800 K lines
  – SQLite: 40 K lines
• Programmer teams for DBMS kernels range from a few to a few hundred

Performance components
• Hardware platform
• Data structures
• Algebraic optimizer
• SQL parser
• Application code
  – What is the total cost of execution?
  – How many tasks can be performed/minute?
  – How good is the optimizer?
  – What is the overhead of the data structures?

Not all are equal

Test                          MonetDB  PostgreSQL  MySQL  SQLite  SQLite nosync
1000 inserts transactions        0.27        4.30   0.15   13.06           0.22
25000 inserts 1 transaction      6.71        4.91   2.18    0.94           1.42
100 range selects                0.18        3.62   2.76    2.49           2.52
100 string range selects         2.15       13.40   4.64    3.36           3.37
5000 range index selects         5.22        4.61   1.27    1.12           1.16
1000 updates                     0.43        1.73   8.41    0.63           0.63
25000 updates with index         8.33       18.79   8.13    3.52           3.10
25000 updates on text           10.32       48.13   6.98    2.40           1.72
Insert from select               0.65       61.36   1.53    2.78           1.59
Delete on text index             0.32        1.50   0.97    4.00           0.56
Delete with index                0.22        1.31   2.26    2.06           0.75
Big insert after delete          0.36       13.16   1.81    3.21           1.48
Big delete and small insert      0.93        4.55   1.70    0.61           0.40
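The gap between the two SQLite columns on the row with one transaction per insert suggests that commit cost, and in particular syncing to disk, dominates that test. This is a plausible reading, not something the slide states. The sketch below is a small black-box harness in Python with sqlite3 that runs both insert patterns against a file-backed database; the function name, table layout, and row count are assumptions, and absolute timings depend entirely on the machine and file system.

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(n, one_transaction):
    """Insert n rows into a fresh file-backed database; return (rows, seconds)."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        conn.execute("create table t(a integer, b text)")
        conn.commit()
        start = time.perf_counter()
        if one_transaction:
            with conn:  # a single commit at the end
                for i in range(n):
                    conn.execute("insert into t values (?, ?)", (i, "row"))
        else:
            for i in range(n):
                conn.execute("insert into t values (?, ?)", (i, "row"))
                conn.commit()  # one commit per row
        elapsed = time.perf_counter() - start
        count = conn.execute("select count(*) from t").fetchone()[0]
        conn.close()
    return count, elapsed

count_many, secs_many = timed_inserts(200, one_transaction=False)
count_one, secs_one = timed_inserts(200, one_transaction=True)
print(f"per-row commits: {secs_many:.3f}s, single transaction: {secs_one:.3f}s")
```

Both runs store the same rows; only the transaction boundaries differ, which is exactly the kind of difference a black-box comparison has to control for.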


Not all are equal

Why does it take so long to build a 10Mx2 table?
How long will it take to do 10Mx32 on SQLserver Beta 2?

Gaining insight
• Study the code base (inspection + profiling)
  – Often not accessible outside development lab
• Study individual techniques (data structures + simulation)
  – Focus of most PhD research in DBMS
    • Detailed knowledge becomes available, but ignores the total cost of execution.
• Study as a functional black box
  – Analyse a small application framework

The Jack The Ripper Project
• Study the snippet of the database technology and design an XQuery and SQL application
• What is the schema?
• What are the queries?
• What are unorthodox solutions?

Learning points
• Poor knowledge of relational databases? Read the chapters on SQL and relational algebra. Knowledge of data structures comes in handy.
• Database systems are much more than administrative bookkeeping systems

Learning points

– Advanced applications challenge the technology provided by a DBMS

– Many techniques do not easily scale in size, complexity, or functionality

– The effectiveness of a DBMS is determined by many tightly interlocked components