d.r. cheriton school of computer sciencelp (rewriting) ps a s o physical schema and query plans...

40
KR in Database Systems Implementation (or Life beyond Lite Logics and CQ/UCQ) David Toman D.R. Cheriton School of Computer Science Joint work with Alexander Hudek and Grant Weddell David Toman (et al.) KR in DBMS Implementation 1 / 18

Upload: others

Post on 14-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

KR in Database Systems Implementation(or Life beyond Lite Logics and CQ/UCQ)

David Toman

D.R. Cheriton School of Computer Science

Joint work with Alexander Hudek and Grant Weddell

David Toman (et al.) KR in DBMS Implementation 1 / 18

Page 2: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Data and Constraints: the Database Recap

The Textbook ViewData represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforces by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ (in our logic)

where V is a (new) relational symbol and ϕ is a query.

Much Bigger Deal: Physical Data IndependenceLogical Symbols user (visible) relations/tablesMappingPhysical Symbols data structures (indices)

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 18

Page 3: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Data and Constraints: the Database Recap

The Textbook ViewData represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforces by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ (in our logic)

where V is a (new) relational symbol and ϕ is a query.

Much Bigger Deal: Physical Data IndependenceLogical Symbols user (visible) relations/tablesMappingPhysical Symbols data structures (indices)

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 18

Page 4: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Data and Constraints: the Database Recap

The Textbook ViewData represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforces by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ (in our logic)

where V is a (new) relational symbol and ϕ is a query.

Much Bigger Deal: Physical Data IndependenceLogical Symbols user (visible) relations/tablesMappingPhysical Symbols data structures (indices)

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 18

Page 5: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Data and Constraints: the Database Recap

The Textbook ViewData represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforces by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ (in our logic)

where V is a (new) relational symbol and ϕ is a query.

Much Bigger Deal: Physical Data IndependenceLogical Symbols user (visible) relations/tablesMapping C/C++ gooPhysical Symbols data structures (indices)

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 18

Page 6: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Data and Constraints: the Database Recap

The Textbook ViewData represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforces by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ (in our logic)

where V is a (new) relational symbol and ϕ is a query.

Much Bigger Deal: Physical Data IndependenceLogical Symbols user (visible) relations/tablesMapping constraints (+ a minimal runtime)Physical Symbols data structures (indices)

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 18

Page 7: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

The KR Way

Queries and OntologiesQueries are answered not only w.r.t. explicit data (A)

but also w.r.t. background knowledge (T ) under OWA⇒ Ontology-based Data Access (OBDA)

ExampleSocrates is a MAN (explicit data)Every MAN is MORTAL (ontology)

List all MORTALs⇒ {Socrtes} (query)

How do we answer queries?Using logical implication (to define certain answers):

Ans(Q,A, T ) := {Q(a1, . . . ,ak ) | T ∪ A |= Q(a1, . . . ,ak )}⇒ answers are ground Q-atoms logically implied by A ∪ T .

David Toman (et al.) KR in DBMS Implementation OBDA Basics 3 / 18

Page 8: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

The KR Way

Queries and OntologiesQueries are answered not only w.r.t. explicit data (A)

but also w.r.t. background knowledge (T ) under OWA⇒ Ontology-based Data Access (OBDA)

ExampleSocrates is a MAN (explicit data)Every MAN is MORTAL (ontology)

List all MORTALs⇒ {Socrtes} (query)

How do we answer queries?Using logical implication (to define certain answers):

Ans(Q,A, T ) := {Q(a1, . . . ,ak ) | T ∪ A |= Q(a1, . . . ,ak )}⇒ answers are ground Q-atoms logically implied by A ∪ T .

David Toman (et al.) KR in DBMS Implementation OBDA Basics 3 / 18

Page 9: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Complexity

Good/Standard NewsLOGSPACE/PTIME (data complexity) for query answering:

(U)CQ andDL-Lite/EL⊥/CFD∀nc/“rules”-lite (Horn)

Bad Newsno negative queries/sub-queriesno negations in ABoxno closed-world assumptioncounter-intuitive query answers

David Toman (et al.) KR in DBMS Implementation OBDA Basics 4 / 18

Page 10: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Complexity

Good/Standard NewsLOGSPACE/PTIME (data complexity) for query answering:

(U)CQ andDL-Lite/EL⊥/CFD∀nc/“rules”-lite (Horn)

Bad Newsno negative queries/sub-queriesno negations in ABoxno closed-world assumptioncounter-intuitive query answers

David Toman (et al.) KR in DBMS Implementation OBDA Basics 4 / 18

Page 11: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Difficulties: Unintuitive Answers

ExampleEMP(Sue)

EMP v ∃PHONENUM (or ∀x .EMP(x)→ ∃y .PHONENUM(x , y))

User: Does Sue have a phone number?Information System: YES

User: OK, tell me Sue’s phone number!Information System: (no answer)

User:

David Toman (et al.) KR in DBMS Implementation OBDA Basics 5 / 18

Page 12: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Difficulties: Unintuitive Answers

ExampleEMP(Sue)

EMP v ∃PHONENUM (or ∀x .EMP(x)→ ∃y .PHONENUM(x , y))

User: Does Sue have a phone number?Information System: YES

User: OK, tell me Sue’s phone number!Information System: (no answer)

User:

David Toman (et al.) KR in DBMS Implementation OBDA Basics 5 / 18

Page 13: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Difficulties: Unintuitive Answers

ExampleEMP(Sue)

EMP v ∃PHONENUM (or ∀x .EMP(x)→ ∃y .PHONENUM(x , y))

User: Does Sue have a phone number?Information System: YES

User: OK, tell me Sue’s phone number!Information System: (no answer)

User:

David Toman (et al.) KR in DBMS Implementation OBDA Basics 5 / 18

Page 14: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Difficulties: Unintuitive Answers

ExampleEMP(Sue)

EMP v ∃PHONENUM (or ∀x .EMP(x)→ ∃y .PHONENUM(x , y))

User: Does Sue have a phone number?Information System: YES

User: OK, tell me Sue’s phone number!Information System: (no answer)

User:

David Toman (et al.) KR in DBMS Implementation OBDA Basics 5 / 18

Page 15: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What to do?

Definability and RewritingQueries range-restricted FOL (a.k.a. SQL)Ontology/Schema range-restricted FOL Σ := ΣL ∪ ΣLP ∪ ΣP

Data CWA (complete information)

users: looks like a single model (of the logical schema)implementation: many models

but definable queries answer the same in each of them

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 18

Page 16: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What to do?

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SA

Ontology/Schema range-restricted FOL Σ := ΣL ∪ ΣLP ∪ ΣP

Data CWA (complete information for SA symbols)

ΣL SL ϕoo Logical Schemaand User Queries

ΣLP (rewriting)

��ΣP SA ⊆ SP ψoo Physical Schema

and Query Plans

users: looks like a single model (of the logical schema)implementation: many models

but definable queries answer the same in each of them

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 18

Page 17: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What to do?

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SA

Ontology/Schema range-restricted FOL Σ := ΣL ∪ ΣLP ∪ ΣP

Data CWA (complete information for SA symbols)

users: looks like a single model (of the logical schema)implementation: many models

but definable queries answer the same in each of them

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 18

Page 18: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What to do?

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SA

Ontology/Schema range-restricted FOL Σ := ΣL ∪ ΣLP ∪ ΣP

Data CWA (complete information for SA symbols)

users: looks like a single model (of the logical schema)implementation: many models

but definable queries answer the same in each of them

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

©2011

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 18

Page 19: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

GRAND UNIFIED APPROACH

TO QUERY COMPILATION

PART I: WHAT CAN IT DO?

David Toman (et al.) KR in DBMS Implementation 7 / 18

Page 20: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What can this do?

GOALGenerate query plans that compete with hand-written programs in C

1 linked data structures, pointers, . . .2 access to search structures (index access and selection),3 hash-based access to data (including hash-joins),4 multi-level storage (aka disk/remote/distributed files), . . .5 materialized views (FO-definable),6 updates through logical schema (needs id invention!), . . .

. . . all without having to code (too much) in C/C++ !

David Toman (et al.) KR in DBMS Implementation What can it do? 8 / 18

Page 21: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What can this do?

GOALGenerate query plans that compete with hand-written programs in C

1 linked data structures, pointers, . . .2 access to search structures (index access and selection),3 hash-based access to data (including hash-joins),4 multi-level storage (aka disk/remote/distributed files), . . .5 materialized views (FO-definable),6 updates through logical schema (needs id invention!), . . .

. . . all without having to code (too much) in C/C++ !

David Toman (et al.) KR in DBMS Implementation What can it do? 8 / 18

Page 22: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Lists and Pointers1 Logical Schema

employee works department

number oo // enumber number//

name dnumber name

salary manager

oo

2 Physical Design: a linked list of emp records pointing to dept records.record emp of

integer numstring nameinteger salaryreference dept

record dept ofinteger numstring namereference manager

3 Access Paths: empfile/1/0, emp-num/2/1, . . . (but no deptfile)4 Integrity Constraints (many), e.g.,

∀x , y , z.employee(x , y , z)→ ∃w .empfile(w) ∧ emp-num(w , x),∀a, x .empfile(a) ∧ emp-num(a, x)→ ∃y , z.employee(x , y , z), . . .

David Toman (et al.) KR in DBMS Implementation What can it do? 9 / 18

Page 23: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What can this do: navigating pointers

Example queries:1 List all employee numbers and names (∃z,w .employee(x , y , z,w)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v ,w .department(x , z,u) ∧ employee(u, y , v ,w)):

.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 10 / 18

Page 24: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What can this do: navigating pointers

Example queries:1 List all employee numbers and names (∃z,w .employee(x , y , z,w)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v ,w .department(x , z,u) ∧ employee(u, y , v ,w)):

.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 10 / 18

Page 25: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What can this do: navigating pointers

Example queries:1 List all employee numbers and names (∃z,w .employee(x , y , z,w)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v ,w .department(x , z,u) ∧ employee(u, y , v ,w)):

∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.

. . . needs duplicate elimination during projection.

.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 10 / 18

Page 26: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What can this do: navigating pointers

Example queries:1 List all employee numbers and names (∃z,w .employee(x , y , z,w)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v ,w .department(x , z,u) ∧ employee(u, y , v ,w)):

∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.

. . . needs duplicate elimination during projection.

∃a,b,d .empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.

. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 10 / 18

Page 27: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What can this do: navigating pointers

Example queries:1 List all employee numbers and names (∃z,w .employee(x , y , z,w)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v ,w .department(x , z,u) ∧ employee(u, y , v ,w)):

∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

∃a,b,d .empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 10 / 18

Page 28: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

What can this do: two-level store

The access path empfile is refined by emppages/1/0 and emprecords/2/1:emppages returns (sequentially) disk pages containing emp records, andemprecords given a disc page, returns emp records in that page.

5 List all employees with the same name(∃z,u, v ,w , t .employee(x1, z,u, v) ∧ employee(x2, z,w , t)):

∃y , z,w , v ,p,q.emppages(p) ∧ emppages(q)∧ emprecords(p, y) ∧ emp-num(y , x1) ∧ emp-name(y ,w)∧ emprecords(q, z) ∧ emp-num(z, x2) ∧ emp-name(z, v)

∧ compare(w , v).

⇒ this plan implements the block nested loops join algorithm.

. . . many more examples inMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

.

David Toman (et al.) KR in DBMS Implementation What can it do? 11 / 18

Page 29: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

GRAND UNIFIED APPROACH

TO QUERY COMPILATION

PART II: HOW DOES IT WORK?

David Toman (et al.) KR in DBMS Implementation 12 / 18

Page 30: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

The Plan

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)

ΣL SL ϕoo (Logical Schema)

ΣLP (rewriting)

��ΣP SA ⊆ SP ψoo (Physical Schema)

David Toman (et al.) KR in DBMS Implementation How does it work? 13 / 18

Page 31: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Query Plans via Interpolation

IDEA #1:Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.

atomic formula 7→ access pathconjunction 7→ nested loops joinexistential quantifier 7→ projection (annotated w/ duplicate info)disjunction 7→ concatenationnegation 7→ simple complement

⇒ reduces correctness of ψ w.r.t. the user query ϕ to Σ |= ϕ↔ ψ

IDEA #2:Use interpolation to search for ψ:

extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗

⇒ Beth Definability of ϕ over Σ and SA resolves the existence of ψ(except for binding patterns)

David Toman (et al.) KR in DBMS Implementation How does it work? 14 / 18

Page 32: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Query Plans via Interpolation

IDEA #1:Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.

⇒ reduces correctness of ψ w.r.t. the user query ϕ to Σ |= ϕ↔ ψ

IDEA #2:Use interpolation to search for ψ:

extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗

⇒ Beth Definability of ϕ over Σ and SA resolves the existence of ψ(except for binding patterns)

David Toman (et al.) KR in DBMS Implementation How does it work? 14 / 18

Page 33: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Query Plans via Interpolation

IDEA #1:Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.

⇒ reduces correctness of ψ w.r.t. the user query ϕ to Σ |= ϕ↔ ψ

IDEA #2:Use interpolation to search for ψ:

extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗

⇒ Beth Definability of ϕ over Σ and SA resolves the existence of ψ(except for binding patterns)

David Toman (et al.) KR in DBMS Implementation How does it work? 14 / 18

Page 34: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Query Plans via Interpolation

IDEA #1:Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.

⇒ reduces correctness of ψ w.r.t. the user query ϕ to Σ |= ϕ↔ ψ

IDEA #2:Use interpolation to search for ψ:

extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗

⇒ Beth Definability of ϕ over Σ and SA resolves the existence of ψ(except for binding patterns)

David Toman (et al.) KR in DBMS Implementation How does it work? 14 / 18

Page 35: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Engineering Issues

Subformula (structural) Property: not enough rewritings (plans)

• ΣL ∪ ΣR ∪ ΣLR |= ϕL → ϕR where ΣLR = {∀x̄ .PL ↔ P ↔ PR | P ∈ SA}

Alternative Proofs/Plans: backtracking is too slow

• conditional formulæ: ϕ[C] where C is a set of (ground) literals over SA

• logical (non-backtrackable) conditional tableau (T L,T R)

• cost-based plan enumeration based on closing sets in (T L,T R) and ΣLR

Non-logical Features: dealing with duplicates et al.

• Q[∃x .Q1] 7→ Q[∃x .Q1] if Σ ∪ {Q[] ∧Q1[y1/x ] ∧Q1[y2/x ]} |= y1 ≈ y2

• Q[Q1 ∨Q2] 7→ Q[Q1 ∨Q2] if Σ ∪ {Q[]} |= Q1 ∧Q2 → ⊥

⇒ CFDInc description logic approximation of Σ (PTIME reasoning).

. . . for details seeMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

.

David Toman (et al.) KR in DBMS Implementation How does it work? 15 / 18

Page 36: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Engineering Issues

Subformula (structural) Property: not enough rewritings (plans)

• ΣL ∪ ΣR ∪ ΣLR |= ϕL → ϕR where ΣLR = {∀x̄ .PL ↔ P ↔ PR | P ∈ SA}

Alternative Proofs/Plans: backtracking is too slow

• conditional formulæ: ϕ[C] where C is a set of (ground) literals over SA

• logical (non-backtrackable) conditional tableau (T L,T R)

• cost-based plan enumeration based on closing sets in (T L,T R) and ΣLR

Non-logical Features: dealing with duplicates et al.

• Q[∃x .Q1] 7→ Q[∃x .Q1] if Σ ∪ {Q[] ∧Q1[y1/x ] ∧Q1[y2/x ]} |= y1 ≈ y2

• Q[Q1 ∨Q2] 7→ Q[Q1 ∨Q2] if Σ ∪ {Q[]} |= Q1 ∧Q2 → ⊥

⇒ CFDInc description logic approximation of Σ (PTIME reasoning).

. . . for details seeMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

.

David Toman (et al.) KR in DBMS Implementation How does it work? 15 / 18

Page 37: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Engineering Issues

Subformula (structural) Property: not enough rewritings (plans)

• ΣL ∪ ΣR ∪ ΣLR |= ϕL → ϕR where ΣLR = {∀x̄ .PL ↔ P ↔ PR | P ∈ SA}

Alternative Proofs/Plans: backtracking is too slow

• conditional formulæ: ϕ[C] where C is a set of (ground) literals over SA

• logical (non-backtrackable) conditional tableau (T L,T R)

• cost-based plan enumeration based on closing sets in (T L,T R) and ΣLR

Non-logical Features: dealing with duplicates et al.

• Q[∃x .Q1] 7→ Q[∃x .Q1] if Σ ∪ {Q[] ∧Q1[y1/x ] ∧Q1[y2/x ]} |= y1 ≈ y2

• Q[Q1 ∨Q2] 7→ Q[Q1 ∨Q2] if Σ ∪ {Q[]} |= Q1 ∧Q2 → ⊥

⇒ CFDInc description logic approximation of Σ (PTIME reasoning).

. . . for details seeMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

.

David Toman (et al.) KR in DBMS Implementation How does it work? 15 / 18

Page 38: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Summary of the Approach

1 FO (DLFDE) tableau based interpolation algorithm⇒ enumeration of plans factored from reasoning⇒ range-restricted queries and constraints→ ground terms only⇒ extra-logical binding patterns and cost model

2 Post processing (using CFDInc approximation)⇒ duplicate elimination elimination⇒ cut insertion

3 Run time⇒ library of common data structures+schema constraints

or an interface to a legacy system⇒ finger data structures to simulate merge joins et al.

David Toman (et al.) KR in DBMS Implementation Summary 16 / 18

Page 39: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Research Directions and Open Issues

1 Dealing with ordered data? (merge-joins etc.: we have a partial solution)

2 Decidable schema languages (decidable interpolation problem)?

3 More powerful schema languages (inductive types, etc.)?

4 Beyond FO Queries/Views (e.g., count/sum aggregates)?

5 Coding extra-logical bits (e.g., binding patterns, postprocessing, etc. )in the schema itself?

6 Standard Designs (a plan can always be found as in SQL)?

7 Explanation(s) of non-definability?

8 Fine(r)-grained updates?

9 . . .

. . . and, as always, performance, performance, performance!

David Toman (et al.) KR in DBMS Implementation Summary 17 / 18

Page 40: D.R. Cheriton School of Computer ScienceLP (rewriting) PS A S o Physical Schema and Query Plans users: looks like a single model (of the logical schema) implementation: many models

Message from our Sponsors

Database Group at the University of Waterloo7 professors, affiliated faculty, postdocs, 30+ graduate students, . . .wide range of research interests

Advanced query processing/Knowledge representationSystem aspects of database systems and Distributed data managementData quality/Managing uncertain data/Data miningNew(-ish) domains (text, streaming, graph data/RDF, OLAP)

research sponsored by governments, and local/global companiesNSERC/CFI/OIT and Google, IBM, SAP, OpenText, . . .

part of a School of CS with 75+ professors, 300+ grad students, etc.AI&ML, Algorithms&Data Structures, PL, Theory, Systems, . . .

Cheriton School of Computer Science has been ranked #18 in CS bythe world by US News and World Report (#1 in Canada).

. . . and we are always looking for good graduate students (MMath/PhD)⇒ comes with full support over multiple years

David Toman (et al.) KR in DBMS Implementation Summary 18 / 18