Download - SD202: Evaluation algorithms
![Page 1: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/1.jpg)
SD202: Evaluation algorithms
Louis Jachiet
Louis JACHIET 1 / 53
![Page 2: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/2.jpg)
Introduction
Query⇒How to compute
the result of a query?
Louis JACHIET 2 / 53
![Page 3: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/3.jpg)
Introduction
Query⇒
How to compute
the result of a query?
Louis JACHIET 2 / 53
![Page 4: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/4.jpg)
Introduction
Query⇒
How to compute
the result of a query?
Louis JACHIET 2 / 53
![Page 5: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/5.jpg)
Introduction
Query⇒How to compute
the result of a query?
Louis JACHIET 2 / 53
![Page 6: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/6.jpg)
Introduction
Query⇒How to compute efficiently
the result of a query?
Louis JACHIET 2 / 53
![Page 7: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/7.jpg)
Remember the relational algebra?
A dialect comprising:
• Relation R• RENAME(t,a,b)
• DROP(t,a)
• FILTER(t,cond)
• PRODUCT(t,t’)
• UNION
And also:
• JOIN(t,t’,cond,type)
• AGGREGATE(t,cols, exprs)
• and other operators
Louis JACHIET 3 / 53
![Page 8: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/8.jpg)
Query evaluation in a nutshell
SQL query
Relational Algebra term
Output
Data
Translation
Execution
Is it that simple?
Of course, no :)
Louis JACHIET 4 / 53
![Page 9: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/9.jpg)
Query evaluation in a nutshell
SQL query
Relational Algebra term
Output
Data
Translation
Execution
Is it that simple?
Of course, no :)
Louis JACHIET 4 / 53
![Page 10: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/10.jpg)
Query evaluation in a nutshell
SQL query Relational Algebra term
Query Optimizer
Best query plan Compiled query Output
Data
Compilation Execution
Louis JACHIET 5 / 53
![Page 11: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/11.jpg)
Today’s lesson
We will cover several topics
• Logical optimization
• A model of computation time
• Efficient retrieval using indexes
• Several algorithms to joins
• A general method for query optimization
• Some advanced topics (e.g. transactions)
Louis JACHIET 6 / 53
![Page 12: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/12.jpg)
Logical optimization of queries
![Page 13: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/13.jpg)
Comparing relational algebra terms
Consider the two following terms, are they computing the
same thing? Is one more efficient than the other?
FILTER(
JOIN(Room, Movie),
title="Le Bonheur"
)
JOIN(Room,
FILTER(Movie,
title="Le Bonheur")
)
Here JOIN is the natural join (i.e. on shared attributes)
Louis JACHIET 7 / 53
![Page 14: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/14.jpg)
Comparing relational algebra terms
What about those two terms?
JOIN(
JOIN(Room, Movie),
OscarMovies
)
JOIN(
Room,
JOIN(OscarMovies,Movie)
)
Louis JACHIET 8 / 53
![Page 15: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/15.jpg)
Equivalence rule
We can devise a set of rewriting rules
• JOIN(a,JOIN(b,c)) → JOIN(JOIN(a,b),c)
Associativity
• JOIN(a,b) → JOIN(b,a)
Commutativity
• FILTER(JOIN(a,b),f) → JOIN(FILTER(a,f),b)
Distributivity when f operates on attributes in a
• And many other rules. . .
Louis JACHIET 9 / 53
![Page 16: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/16.jpg)
A RA term optimizer
Algorithm
• Start with an RA term ϕ
• Set oldplans = ∅, plans = {ϕ}• While oldplans 6= plans
• oldplans = plans
• plans = plans ∪{ϕ′ | where ϕ′ is a rewriting of ϕ ∈ plans}• Return the most efficient ϕ ∈ plans
In practice, exploring the full plan space
is prohibitively expensive for large queries.
Also, how can we determine the most efficient ϕ?
Louis JACHIET 10 / 53
![Page 17: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/17.jpg)
A RA term optimizer
Algorithm
• Start with an RA term ϕ
• Set oldplans = ∅, plans = {ϕ}• While oldplans 6= plans
• oldplans = plans
• plans = plans ∪{ϕ′ | where ϕ′ is a rewriting of ϕ ∈ plans}• Return the most efficient ϕ ∈ plans
In practice, exploring the full plan space
is prohibitively expensive for large queries.
Also, how can we determine the most efficient ϕ?
Louis JACHIET 10 / 53
![Page 18: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/18.jpg)
A RA term optimizer
Algorithm
• Start with an RA term ϕ
• Set oldplans = ∅, plans = {ϕ}• While oldplans 6= plans
• oldplans = plans
• plans = plans ∪{ϕ′ | where ϕ′ is a rewriting of ϕ ∈ plans}• Return the most efficient ϕ ∈ plans
In practice, exploring the full plan space
is prohibitively expensive for large queries.
Also, how can we determine the most efficient ϕ?
Louis JACHIET 10 / 53
![Page 19: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/19.jpg)
Cost model on RA terms
Determining the efficiency of an RA term ϕ depends on:
• a cost-model for each RA operator
• a cardinality estimation for each subterm appearing in ϕ
This simple model ignores the fact one RA operator (e.g. JOIN)
can be performed with different algorithms... more on that later!
Louis JACHIET 11 / 53
![Page 20: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/20.jpg)
Cost model on RA terms
Determining the efficiency of an RA term ϕ depends on:
• a cost-model for each RA operator
• a cardinality estimation for each subterm appearing in ϕ
This simple model ignores the fact one RA operator (e.g. JOIN)
can be performed with different algorithms... more on that later!
Louis JACHIET 11 / 53
![Page 21: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/21.jpg)
A cost model to estimate
computation time
![Page 22: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/22.jpg)
Modeling computation time
Basic model
Count the number of operations.
We know it is correct (up to a constant)
Data access time
CPU L1 cache ≤1ns
CPU L2 cache 5ns
RAM access 0.1µs
SSD seek 0.1ms
HDD seek 10ms
I/O speed (vague numbers!)
CPU caches 50GB/s
RAM 10GB/s
SSD 500MB/s
HDD sequential 100MB/s
HDD random 30MB/s
Looking only up to a constant factor does not give the full picture!
Louis JACHIET 12 / 53
![Page 23: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/23.jpg)
Modeling computation time
Basic model
Count the number of operations.
We know it is correct (up to a constant)
Data access time
CPU L1 cache ≤1ns
CPU L2 cache 5ns
RAM access 0.1µs
SSD seek 0.1ms
HDD seek 10ms
I/O speed (vague numbers!)
CPU caches 50GB/s
RAM 10GB/s
SSD 500MB/s
HDD sequential 100MB/s
HDD random 30MB/s
Looking only up to a constant factor does not give the full picture!
Louis JACHIET 12 / 53
![Page 24: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/24.jpg)
Modeling computation time
Basic model
Count the number of operations.
We know it is correct (up to a constant)
Data access time
CPU L1 cache ≤1ns
CPU L2 cache 5ns
RAM access 0.1µs
SSD seek 0.1ms
HDD seek 10ms
I/O speed (vague numbers!)
CPU caches 50GB/s
RAM 10GB/s
SSD 500MB/s
HDD sequential 100MB/s
HDD random 30MB/s
Looking only up to a constant factor does not give the full picture!
Louis JACHIET 12 / 53
![Page 25: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/25.jpg)
Modeling computation time
Basic model
Count the number of operations.
We know it is correct (up to a constant)
Data access time
CPU L1 cache ≤1ns
CPU L2 cache 5ns
RAM access 0.1µs
SSD seek 0.1ms
HDD seek 10ms
I/O speed (vague numbers!)
CPU caches 50GB/s
RAM 10GB/s
SSD 500MB/s
HDD sequential 100MB/s
HDD random 30MB/s
Looking only up to a constant factor does not give the full picture!
Louis JACHIET 12 / 53
![Page 26: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/26.jpg)
A more precise model of time
Count the different kinds of operations separately:
• number of seeks × tseek
• number of blocks read sequentially × tread
• number of blocks written × twrite
• number of CPU operations × tcpu
• size of block sblock (typically a few kB).
Louis JACHIET 13 / 53
![Page 27: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/27.jpg)
Simple query evaluation
SELECT * FROM myTable
Not much can be done to optimize this FULL SCAN (N sequential
reads).
SELECT * FROM myTable
WHERE field=”value”
Can we do better than a full scan? It depends!
Louis JACHIET 14 / 53
![Page 28: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/28.jpg)
Simple query evaluation
SELECT * FROM myTable
Not much can be done to optimize this FULL SCAN (N sequential
reads).
SELECT * FROM myTable
WHERE field=”value”
Can we do better than a full scan?
It depends!
Louis JACHIET 14 / 53
![Page 29: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/29.jpg)
Simple query evaluation
SELECT * FROM myTable
Not much can be done to optimize this FULL SCAN (N sequential
reads).
SELECT * FROM myTable
WHERE field=”value”
Can we do better than a full scan? It depends!
Louis JACHIET 14 / 53
![Page 30: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/30.jpg)
SELECT * FROM myTable WHERE field=”value”
Algorithm 1
Read the table sequentially.
⇒ N sequential reads.
Algorithm 2
Precompute list L with the posi-
tion of relevant records. For each
element of L, retrieve it.
⇒ L non sequential reads.
Algorithm 2 is better only when the condition is very selective!
Louis JACHIET 15 / 53
![Page 31: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/31.jpg)
SELECT * FROM myTable WHERE field=”value”
Algorithm 1
Read the table sequentially.
⇒ N sequential reads.
Algorithm 2
Precompute list L with the posi-
tion of relevant records. For each
element of L, retrieve it.
⇒ L non sequential reads.
Algorithm 2 is better only when the condition is very selective!
Louis JACHIET 15 / 53
![Page 32: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/32.jpg)
SELECT * FROM myTable WHERE field=”value”
Algorithm 1
Read the table sequentially. ⇒ N sequential reads.
Algorithm 2
Precompute list L with the posi-
tion of relevant records. For each
element of L, retrieve it.
⇒ L non sequential reads.
Algorithm 2 is better only when the condition is very selective!
Louis JACHIET 15 / 53
![Page 33: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/33.jpg)
SELECT * FROM myTable WHERE field=”value”
Algorithm 1
Read the table sequentially. ⇒ N sequential reads.
Algorithm 2
Precompute list L with the posi-
tion of relevant records. For each
element of L, retrieve it.
⇒ L non sequential reads.
Algorithm 2 is better only when the condition is very selective!
Louis JACHIET 15 / 53
![Page 34: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/34.jpg)
SELECT * FROM myTable WHERE field=”value”
Algorithm 1
Read the table sequentially. ⇒ N sequential reads.
Algorithm 2
Precompute list L with the posi-
tion of relevant records. For each
element of L, retrieve it.
⇒ L non sequential reads.
Algorithm 2 is better only when the condition is very selective!
Louis JACHIET 15 / 53
![Page 35: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/35.jpg)
How to get the estimation for tseek, twrite, tread, tcpu, etc.
• use default values
• use default values based on hardware
• detect values at boot
• learn values
• ...
Default values are generally OK but can often be tuned for optimal
performance.
Louis JACHIET 16 / 53
![Page 36: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/36.jpg)
How to get the estimation for tseek, twrite, tread, tcpu, etc.
• use default values
• use default values based on hardware
• detect values at boot
• learn values
• ...
Default values are generally OK but can often be tuned for optimal
performance.
Louis JACHIET 16 / 53
![Page 37: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/37.jpg)
Indexes
![Page 38: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/38.jpg)
Simple query evaluation
SELECT * FROM myTable WHERE field=”value”
How to do better than a full scan?
With indexes!
Different types of indexes:
• Hash
for equality tests only
• B-tree
for arbitrary comparisons
• GIN
for case-insensitive search, n-grams, prefixes, JSON field
lookup, etc.
• GiST
for non ordered data (e.g. geographical data, date inter-
vals, etc.)
• . . .
Louis JACHIET 17 / 53
![Page 39: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/39.jpg)
Simple query evaluation
SELECT * FROM myTable WHERE field=”value”
How to do better than a full scan?
With indexes!
Different types of indexes:
• Hash
for equality tests only
• B-tree
for arbitrary comparisons
• GIN
for case-insensitive search, n-grams, prefixes, JSON field
lookup, etc.
• GiST
for non ordered data (e.g. geographical data, date inter-
vals, etc.)
• . . .
Louis JACHIET 17 / 53
![Page 40: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/40.jpg)
Simple query evaluation
SELECT * FROM myTable WHERE field=”value”
How to do better than a full scan?
With indexes!
Different types of indexes:
• Hash
for equality tests only
• B-tree
for arbitrary comparisons
• GIN
for case-insensitive search, n-grams, prefixes, JSON field
lookup, etc.
• GiST
for non ordered data (e.g. geographical data, date inter-
vals, etc.)
• . . .
Louis JACHIET 17 / 53
![Page 41: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/41.jpg)
Simple query evaluation
SELECT * FROM myTable WHERE field=”value”
How to do better than a full scan?
With indexes!
Different types of indexes:
• Hash for equality tests only
• B-tree for arbitrary comparisons
• GIN for case-insensitive search, n-grams, prefixes, JSON field
lookup, etc.
• GiST for non ordered data (e.g. geographical data, date inter-
vals, etc.)
• . . .
Louis JACHIET 17 / 53
![Page 42: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/42.jpg)
Hash indexes
Data
An array A of size K containing lists of pair (key,pointer).
Check key x
Compute hash(x) and iterate over the elements of A(hash(x)).
Add key x to pointer p
Append (x , p) to the list A(hash(x)).
Louis JACHIET 18 / 53
![Page 43: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/43.jpg)
Hash indexes
A B A B A B A B
Cab 11 Alice 42 Bob 3 Dacy 34
H P
0 Bob
1
2 Cab
3
4 Alice
Dacy
What is the time to retrieve or add data?
(use c the average number of collisions)
Louis JACHIET 19 / 53
![Page 44: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/44.jpg)
Hash indexes
A B A B A B A B
Cab 11 Alice 42 Bob 3 Dacy 34
H P
0 Bob
1
2 Cab
3
4 Alice
Dacy
What is the time to retrieve or add data?
(use c the average number of collisions)
Louis JACHIET 19 / 53
![Page 45: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/45.jpg)
Dense hash indexes
A B A B A B A B
Cab 11 Alice 42 Bob 3 Dacy 34
H P1 P2 P3 P4
0 Bob
1
2 Cab Dacy Ewen Fiona
3
4 Alice
Gael
Louis JACHIET 20 / 53
![Page 46: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/46.jpg)
Denser hash indexes
A B A B A B A B A B
Cab 11 Alice 42 Bob 3 Dacy 34 Ewen 102
H P1 P2 P3 P4
0
1
2
3
4
Louis JACHIET 21 / 53
![Page 47: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/47.jpg)
Pros & Cons of the denser hash indexes
Pros
• Generally the fastest index (but with a small margin)
• Especially good on large datasets with complex types like strings
• Small space overhead
Cons
• At least two seeks required
• Cannot retrieve data in sorted order
• Can only check equality
Louis JACHIET 22 / 53
![Page 48: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/48.jpg)
B-trees
B-trees
Just like binary search trees but B-ary trees.
Bob Ewen
Alice Ann
Cab Dacy
Fiona Gael
Louis JACHIET 23 / 53
![Page 49: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/49.jpg)
Binary Search Trees (BST)
def search(key, node):
if node is None or key == node.key:
return node
if key < node.key:
return search(key, node.left)
return search(key, node.right)
Insertion
Insertion code is similar to search but where we
make sure that the tree stays balanced.
Deletion
Deletion is a bit tricky...
8
3 10
1 6 14
4 7 13
Louis JACHIET 24 / 53
![Page 50: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/50.jpg)
B-trees
Data
A search tree where all nodes (except the root) have an arity
between B/2 and B.
Check existence of key x or retrieve tuples with x ∈ [a; b]
Recursively go down the tree. If we neglect CPU time this is
logB(N) seeks. For large B we can expect 3 or 4 seeks at most.
Insert key x
Same as looking up except when a node contains strictly more
than B elements, it is split into two. The splitter is added to the
parent which can recursively trigger a splitting up to the root.
Delete key x
Same as insertion except we merge with neighbors when the node
is too empty.Louis JACHIET 25 / 53
![Page 51: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/51.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
12
Louis JACHIET 26 / 53
![Page 52: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/52.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
12 24
Louis JACHIET 26 / 53
![Page 53: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/53.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
12 24 52
Louis JACHIET 26 / 53
![Page 54: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/54.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
12 24 52 67
Louis JACHIET 26 / 53
![Page 55: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/55.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
12 24 52 67 89
Louis JACHIET 26 / 53
![Page 56: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/56.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
52
12 24
67 89
Louis JACHIET 26 / 53
![Page 57: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/57.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
52
12 24
55 67 89
Louis JACHIET 26 / 53
![Page 58: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/58.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
52
12 24 30
55 67 89
Louis JACHIET 26 / 53
![Page 59: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/59.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
52
12 20 24 30
55 67 89
Louis JACHIET 26 / 53
![Page 60: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/60.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
52
5 12 20 24 30
55 67 89
Louis JACHIET 26 / 53
![Page 61: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/61.jpg)
Exercise on B-trees
Describe the state of an initially empty B-tree with B = 4
after the following insertions:
• 12
• 52
• 24
• 67
• 89
• 55
• 30
• 20
• 5
20 52
5 12
24 30
55 67 89
Louis JACHIET 26 / 53
![Page 62: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/62.jpg)
Pros & Cons of the B-tree indexes
Pros
• Quite fast (especially on simple types)
• Retrieve data in order with very few seeks
• Can be used for prefixes / subkeys; a B-tree on (x , y) can be
used to test whether a given x exists or retrieve the correspond-
ing y .
Cons
• Larger number of seeks required
• Larger space overhead (especially for complex data types)
Generally the default index structure used!
Louis JACHIET 27 / 53
![Page 63: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/63.jpg)
Creating indexes
CREATE INDEX title_idx ON movies (name);
CREATE INDEX title_idx ON movies (lower(name)) USING hash;
CREATE UNIQUE INDEX ON movies (year,name);
Louis JACHIET 28 / 53
![Page 64: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/64.jpg)
Inverted Index
Inverted Index query
Given a set of sets X1, . . . ,Xn and an item i report the integers e
such that i ∈ Xe .
You typically find them in cookbooks (recipes by ingredient)
Implementation
Typically a B-tree or a hash index giving for each i the set of e
such that i ∈ Xe .
Usage
They can be used for indexing JSON fields or getting records
where some word appears in a string.
Louis JACHIET 29 / 53
![Page 65: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/65.jpg)
Inverted Index
Inverted Index query
Given a set of sets X1, . . . ,Xn and an item i report the integers e
such that i ∈ Xe .
You typically find them in cookbooks (recipes by ingredient)
Implementation
Typically a B-tree or a hash index giving for each i the set of e
such that i ∈ Xe .
Usage
They can be used for indexing JSON fields or getting records
where some word appears in a string.
Louis JACHIET 29 / 53
![Page 66: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/66.jpg)
Inverted Index
Inverted Index query
Given a set of sets X1, . . . ,Xn and an item i report the integers e
such that i ∈ Xe .
You typically find them in cookbooks (recipes by ingredient)
Implementation
Typically a B-tree or a hash index giving for each i the set of e
such that i ∈ Xe .
Usage
They can be used for indexing JSON fields or getting records
where some word appears in a string.
Louis JACHIET 29 / 53
![Page 67: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/67.jpg)
Usage of Generalized Inverted Indexes in PostgreSQL
CREATE TABLE movies (
name VARCHAR(255), tags VARCHAR(16)[] );
INSERT INTO movies (name, tags)
values ('Star Wars',
'{"action", "adventure", "fantasy"}');
CREATE INDEX movie_tags ON movies USING gin (tags);
SELECT * FROM movies WHERE 'action' = ANY (tags);
SELECT * FROM movies
WHERE '{"action", "fantasy"}' && tags;
Louis JACHIET 30 / 53
![Page 68: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/68.jpg)
Usage of Generalized Inverted Indexes in PostgreSQL
CREATE INDEX lyrics_idx ON songs
USING GIN (to_tsvector('english',lyrics));
SELECT name FROM songs
WHERE to_tsvector('english',lyrics) @@ to_tsquery('make');
-- name
-- -------------------------
-- Never Gonna Give You Up
-- It's Friday!
Louis JACHIET 31 / 53
![Page 69: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/69.jpg)
Bitmap indexes
Principle
For each value v store a bit array of length n (the number of
records) where the i-th bit is set when the record has value v
Typical use case
• (Rarely) as an index built for values of low cardinality (gender,
binary values, enum)
• (Very often) built for intermediate results of complex conditions
(e.g. x < 10 or y > 42 and z < 1).
Can be built for pages instead of individual tuples.
Louis JACHIET 32 / 53
![Page 70: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/70.jpg)
Bitmap indexes
Principle
For each value v store a bit array of length n (the number of
records) where the i-th bit is set when the record has value v
Typical use case
• (Rarely) as an index built for values of low cardinality (gender,
binary values, enum)
• (Very often) built for intermediate results of complex conditions
(e.g. x < 10 or y > 42 and z < 1).
Can be built for pages instead of individual tuples.
Louis JACHIET 32 / 53
![Page 71: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/71.jpg)
Generalized Search Tree based indexes (GiST)
Principle
A GiST is a tree shaped index but where you cannot “split”
efficiently into two non overlapping subsets.
Examples
Shapes in 2D, date intervals, text, etc.
Key idea
Split into two (or more) subsets with an overlapping.
Consequence
Any search will go down on several parts of the tree, hence it is not
as efficient as a B-tree!
Louis JACHIET 33 / 53
![Page 72: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/72.jpg)
Generalized Search Tree based indexes (GiST)
Principle
A GiST is a tree shaped index but where you cannot “split”
efficiently into two non overlapping subsets.
Examples
Shapes in 2D, date intervals, text, etc.
Key idea
Split into two (or more) subsets with an overlapping.
Consequence
Any search will go down on several parts of the tree, hence it is not
as efficient as a B-tree!
Louis JACHIET 33 / 53
![Page 73: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/73.jpg)
Generalized Search Tree based indexes (GiST)
Principle
A GiST is a tree shaped index but where you cannot “split”
efficiently into two non overlapping subsets.
Examples
Shapes in 2D, date intervals, text, etc.
Key idea
Split into two (or more) subsets with an overlapping.
Consequence
Any search will go down on several parts of the tree, hence it is not
as efficient as a B-tree!
Louis JACHIET 33 / 53
![Page 74: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/74.jpg)
Generalized Search Tree based indexes (GiST)
Principle
A GiST is a tree shaped index but where you cannot “split”
efficiently into two non overlapping subsets.
Examples
Shapes in 2D, date intervals, text, etc.
Key idea
Split into two (or more) subsets with an overlapping.
Consequence
Any search will go down on several parts of the tree, hence it is not
as efficient as a B-tree!
Louis JACHIET 33 / 53
![Page 75: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/75.jpg)
Examples of GiST indexing
From postgis website
Louis JACHIET 34 / 53
![Page 76: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/76.jpg)
In conclusion
Queries containing patterns like FILTER(R,x = v)
can often be drastically improved.
It is also used to speed up key constraints.
Louis JACHIET 35 / 53
![Page 77: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/77.jpg)
Computing joins
![Page 78: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/78.jpg)
Joins
SELECT * FROM tableA, tableB WHERE idA=idB ;
SELECT * FROM tableA
WHERE EXISTS (
SELECT 1 FROM tableB
WHERE idA=idB) ;
SELECT * FROM tableA LEFT JOIN tableB ON idA=idB ;
Louis JACHIET 36 / 53
![Page 79: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/79.jpg)
Computing joins
Joins can be computed with many different algorithms
• Nested loop join
• Block nested loop joins
• Index-join
• Sort-merge join
• Hash join
• Indexed-join
• and many others...
Louis JACHIET 37 / 53
![Page 80: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/80.jpg)
Nested loop join
The simplest join
for t1 in tableA:
for t2 in tableB:
if condition(t1,t2):
output(t1,t2)
tableB will be read |tableA| times...
Louis JACHIET 38 / 53
![Page 81: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/81.jpg)
Nested loop join
The simplest join
for t1 in tableA:
for t2 in tableB:
if condition(t1,t2):
output(t1,t2)
tableB will be read |tableA| times...
Louis JACHIET 38 / 53
![Page 82: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/82.jpg)
Block nested loop join
A twist on the nested loop join:
for b1 in tableA.blocks
for b2 in tableB.blocks
for t1 in b1:
for t2 in b2:
if condition(t1,t2):
output(t1,t2)
Same number of operations but each block is loaded once.
Louis JACHIET 39 / 53
![Page 83: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/83.jpg)
Index-join
Key idea
Imagine we have an index on tB(id) and we have the following
query:
SELECT * FROM tA, tB WHERE tA.id=tB.id
We can use the index on tB to retrieve the matching tuples from
tA.
for t1 in tA
for t2 in tB.idIndex(t1.id)
output(t1,t2)
Louis JACHIET 40 / 53
![Page 84: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/84.jpg)
Index-join
Key idea
Imagine we have an index on tB(id) and we have the following
query:
SELECT * FROM tA, tB WHERE tA.id=tB.id
We can use the index on tB to retrieve the matching tuples from
tA.
for t1 in tA
for t2 in tB.idIndex(t1.id)
output(t1,t2)
Louis JACHIET 40 / 53
![Page 85: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/85.jpg)
Index-join
What if the join condition is tA.id = tB.id AND tA.x>tB.x?
We can use the index on a weakened condition:
for t1 in tA
for t2 in tB.idIndex(t1.id)
if t1.x > t2.x:
output(t1,t2)
Or more generally:
for t1 in tableA
for t2 in tableB.index[t1.value]:
if condition(t1,t2):
output(t1,t2)
Louis JACHIET 41 / 53
![Page 86: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/86.jpg)
Index-join
What if the join condition is tA.id = tB.id AND tA.x>tB.x?
We can use the index on a weakened condition:
for t1 in tA
for t2 in tB.idIndex(t1.id)
if t1.x > t2.x:
output(t1,t2)
Or more generally:
for t1 in tableA
for t2 in tableB.index[t1.value]:
if condition(t1,t2):
output(t1,t2)
Louis JACHIET 41 / 53
![Page 87: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/87.jpg)
Index-join
What if the join condition is tA.id = tB.id AND tA.x>tB.x?
We can use the index on a weakened condition:
for t1 in tA
for t2 in tB.idIndex(t1.id)
if t1.x > t2.x:
output(t1,t2)
Or more generally:
for t1 in tableA
for t2 in tableB.index[t1.value]:
if condition(t1,t2):
output(t1,t2)
Louis JACHIET 41 / 53
![Page 88: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/88.jpg)
Sort-merge-join
Another algorithm for the intersection of two lists
lA.sort()
lB.sort()
while len(lA)>0 and len(lB)>0:
if lA[-1] = lB[-1]:
output(lA[-1],lB[-1])
if lA[-1] >= lB[-1]:
lA.pop()
else:
lB.pop()
Louis JACHIET 42 / 53
![Page 89: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/89.jpg)
Sort-merge-join
More generally, for a sort-merge-join
• Get both inputs in sorted order
• Advance in both inputs by increasing order
• Output matching couples
Handles more than equality
Comparisons, max, min, . . . .
Louis JACHIET 43 / 53
![Page 90: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/90.jpg)
Hash-join
How to find the intersection of two sets A and B?
d = hashset()
res = []
for a in A:
d.add(a)
for b in B:
if b in d:
res.append(b)
Louis JACHIET 44 / 53
![Page 91: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/91.jpg)
Hash-join
More generally, for a join with a condition C containing a set
of equalities x1 = y1 ∧ · · · ∧ xk = yk :
• For each tuple t1 in tableA store it in a hash table with key
(t1.x1, . . . , t1.xk),
• For each tuple t2 in tableB look up the tuples t ′ in the hash
table with key (t2.y1, . . . , t2.yk),
• If C (t1, t2) output (t1, t2).
Louis JACHIET 45 / 53
![Page 92: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/92.jpg)
Indexed-join
Simple idea
Pre-compute and index the join result.
Allows for very fast join but not without drawbacks
• Not available in most databases
• Slow to update
• Can eat up a lot of space
Only useful for very frequent joins with very few updates on
the joined tables.
Louis JACHIET 46 / 53
![Page 93: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/93.jpg)
Indexed-join
Simple idea
Pre-compute and index the join result.
Allows for very fast join but not without drawbacks
• Not available in most databases
• Slow to update
• Can eat up a lot of space
Only useful for very frequent joins with very few updates on
the joined tables.
Louis JACHIET 46 / 53
![Page 94: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/94.jpg)
Query evaluation in a nutshell
SQL query Relational Algebra term
Query Optimizer
Best plan Compiled query
Data
Execution
Louis JACHIET 47 / 53
![Page 95: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/95.jpg)
Query evaluation in a nutshell
SQL query Relational Algebra term
Best plan Compiled query
Data
Logical plansPhysical plansGraded plans Logical plansPhysical plansGraded plans Logical plansPhysical plansGraded plans Logical plansPhysical plansGraded plans Logical plansPhysical plansGraded plans
Cost
Execution
Louis JACHIET 47 / 53
![Page 96: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/96.jpg)
Query evaluation in a nutshell
SQL query Relational Algebra term
Best plan Compiled query
Data
Logical plansPhysical plansGraded plans Logical plansPhysical plansGraded plans Logical plansPhysical plansGraded plans Logical plansPhysical plansGraded plans Logical plansPhysical plansGraded plans
Cost
Execution
Louis JACHIET 47 / 53
![Page 97: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/97.jpg)
Plans in Postgres
![Page 98: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/98.jpg)
EXPLAIN
Most databases allow the user to see the query plan used
Generally with the EXPLAIN COMMAND.
jachiet=> EXPLAIN SELECT name FROM songs WHERE to_tsvector('english',lyrics) @@ to_tsquery('make');
QUERY PLAN
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on songs (cost=8.25..12.76 rows=1 width=32)
Recheck Cond: (to_tsvector('english'::regconfig, lyrics) @@ to_tsquery('make'::text))
-> Bitmap Index Scan on lyrics_idx (cost=0.00..8.25 rows=1 width=0)
Index Cond: (to_tsvector('english'::regconfig, lyrics) @@ to_tsquery('make'::text))
Interesting to understand why some queries are slow:
• maybe the query is more complex than you thought?
• maybe your are missing some indexes?
• maybe Postgres selects a dumb query plan?
Louis JACHIET 48 / 53
![Page 99: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/99.jpg)
EXPLAIN
Most databases allow the user to see the query plan used
Generally with the EXPLAIN COMMAND.
jachiet=> EXPLAIN SELECT name FROM songs WHERE to_tsvector('english',lyrics) @@ to_tsquery('make');
QUERY PLAN
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on songs (cost=8.25..12.76 rows=1 width=32)
Recheck Cond: (to_tsvector('english'::regconfig, lyrics) @@ to_tsquery('make'::text))
-> Bitmap Index Scan on lyrics_idx (cost=0.00..8.25 rows=1 width=0)
Index Cond: (to_tsvector('english'::regconfig, lyrics) @@ to_tsquery('make'::text))
Interesting to understand why some queries are slow:
• maybe the query is more complex than you thought?
• maybe your are missing some indexes?
• maybe Postgres selects a dumb query plan?
Louis JACHIET 48 / 53
![Page 100: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/100.jpg)
EXPLAIN
Most databases allow the user to see the query plan used
Generally with the EXPLAIN COMMAND.
jachiet=> EXPLAIN SELECT name FROM songs WHERE to_tsvector('english',lyrics) @@ to_tsquery('make');
QUERY PLAN
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on songs (cost=8.25..12.76 rows=1 width=32)
Recheck Cond: (to_tsvector('english'::regconfig, lyrics) @@ to_tsquery('make'::text))
-> Bitmap Index Scan on lyrics_idx (cost=0.00..8.25 rows=1 width=0)
Index Cond: (to_tsvector('english'::regconfig, lyrics) @@ to_tsquery('make'::text))
Interesting to understand why some queries are slow:
• maybe the query is more complex than you thought?
• maybe your are missing some indexes?
• maybe Postgres selects a dumb query plan?
Louis JACHIET 48 / 53
![Page 101: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/101.jpg)
EXPLAIN ANALYZE
Adding ANALYZE makes the engine run the query and gather
statistics
=> EXPLAIN ANALYZE SELECT name FROM songs WHERE to_tsvector('english',lyrics) @@ to_tsquery('make');
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on songs (cost=0.00..2.02 rows=1 width=32) (actual time=1.412..3.126 rows=2 loops=1)
Filter: (to_tsvector('english'::regconfig, lyrics) @@ to_tsquery('make'::text))
Planning Time: 0.175 ms
Execution Time: 3.154 ms
Louis JACHIET 49 / 53
![Page 102: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/102.jpg)
Effect of the cardinality estimation
jachiet=> EXPLAIN SELECT * FROM students WHERE age = 20 ;
QUERY PLAN
-----------------------------------------------------------------------------------
Bitmap Heap Scan on students (cost=39.89..107.62 rows=1498 width=13)
Recheck Cond: (age = 20)
-> Bitmap Index Scan on students_age_idx (cost=0.00..39.52 rows=1498 width=0)
Index Cond: (age = 20)
(4 rows)
jachiet=> EXPLAIN SELECT * FROM students WHERE age = 33 ;
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using students_age_idx on students (cost=0.29..8.22 rows=1 width=13)
Index Cond: (age = 33)
(2 rows)
Louis JACHIET 50 / 53
![Page 103: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/103.jpg)
Understanding Postgres plans
The various Postgres operators
• SeqScan explore the full table
• Index Scan explore the relevant part of a table using an index
• Index Only Scan explore the index
• Bitmap Heap Scan like a SeqScan but might skip some of the
disk pages using a bitmap index
• Bitmap Index Scan builds a bitmap with an index
• Bitmap Or / Bitmap And builds a bitmap as the Or/And of
two bitmaps
• Filter, Nested loop, Hash join, Merge join, self explana-
tory.
• Materialize we need to talk about the execution...
Louis JACHIET 51 / 53
![Page 104: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/104.jpg)
Execution of query plans
Database engines prefer to “stream” tuples rather than computing
everything. For instance, for a filter:
Dofor t in subExpression:
if cond(t):
yield t
Don’tl = compute(subExpression)
return [e for e in l if cond(e)]
It avoids the materialization of large intermediate datasets
But it is sometimes unavoidable...
Louis JACHIET 52 / 53
![Page 105: SD202: Evaluation algorithms](https://reader031.vdocuments.mx/reader031/viewer/2022013018/61d0aa1946bab642fc594f71/html5/thumbnails/105.jpg)
Conclusion
• Queries in databases are automatically optimized
• It works well up
Louis JACHIET 53 / 53