query evaluation and optimization

64
Query Evaluation and Optimization Jan Chomicki University at Buffalo Jan Chomicki () Query Evaluation and Optimization 1 / 21

Upload: others

Post on 18-Jan-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Query Evaluation and Optimization

Query Evaluation and Optimization

Jan ChomickiUniversity at Buffalo

Jan Chomicki () Query Evaluation and Optimization 1 / 21

Page 2: Query Evaluation and Optimization

Evaluating σE (R)

Pick the option of least estimated cost.

Any E

Linear (sequential) scan.

E is A = cif file sorted on A, use binary search

if file has an index on A, use the index

if file hashed on A, use hashing

E is Aθc for θ ∈ {<,≤, >,≥}if file sorted on A, then:

I use binary searchI if there is an index on A, use the index

Jan Chomicki () Query Evaluation and Optimization 2 / 21

Page 3: Query Evaluation and Optimization

Evaluating σE (R)

Pick the option of least estimated cost.

Any E

Linear (sequential) scan.

E is A = cif file sorted on A, use binary search

if file has an index on A, use the index

if file hashed on A, use hashing

E is Aθc for θ ∈ {<,≤, >,≥}if file sorted on A, then:

I use binary searchI if there is an index on A, use the index

Jan Chomicki () Query Evaluation and Optimization 2 / 21

Page 4: Query Evaluation and Optimization

Evaluating σE (R)

Pick the option of least estimated cost.

Any E

Linear (sequential) scan.

E is A = cif file sorted on A, use binary search

if file has an index on A, use the index

if file hashed on A, use hashing

E is Aθc for θ ∈ {<,≤, >,≥}if file sorted on A, then:

I use binary searchI if there is an index on A, use the index

Jan Chomicki () Query Evaluation and Optimization 2 / 21

Page 5: Query Evaluation and Optimization

Evaluating σE (R)

Pick the option of least estimated cost.

Any E

Linear (sequential) scan.

E is A = cif file sorted on A, use binary search

if file has an index on A, use the index

if file hashed on A, use hashing

E is Aθc for θ ∈ {<,≤, >,≥}if file sorted on A, then:

I use binary searchI if there is an index on A, use the index

Jan Chomicki () Query Evaluation and Optimization 2 / 21

Page 6: Query Evaluation and Optimization

Evaluating σE (R)

Pick the option of least estimated cost.

Any E

Linear (sequential) scan.

E is A = cif file sorted on A, use binary search

if file has an index on A, use the index

if file hashed on A, use hashing

E is Aθc for θ ∈ {<,≤, >,≥}if file sorted on A, then:

I use binary searchI if there is an index on A, use the index

Jan Chomicki () Query Evaluation and Optimization 2 / 21

Page 7: Query Evaluation and Optimization

Complex selection conditions

E is E1 ∧ E2

use E1 or E2 to narrow down the search

use composite index (if available)

use indexes for E1 and E2 to obtain sets of RowIds, then intersect those sets

use bitmap indexes for E1 and E2 to obtain bitvectors, then use boolean AND

use a combination of bitmap and other kinds of indexes

E is E1 ∨ E2

E1 or E2 cannot be used separately to narrow down the search

use indexes for E1 and E2 to obtain sets of RowIds, then take the union ofthose sets

use bitmap indexes for E1 and E2 to obtain bitvectors, then use boolean OR

use a combination of bitmap and other kinds of indexes

Jan Chomicki () Query Evaluation and Optimization 3 / 21

Page 8: Query Evaluation and Optimization

Complex selection conditions

E is E1 ∧ E2

use E1 or E2 to narrow down the search

use composite index (if available)

use indexes for E1 and E2 to obtain sets of RowIds, then intersect those sets

use bitmap indexes for E1 and E2 to obtain bitvectors, then use boolean AND

use a combination of bitmap and other kinds of indexes

E is E1 ∨ E2

E1 or E2 cannot be used separately to narrow down the search

use indexes for E1 and E2 to obtain sets of RowIds, then take the union ofthose sets

use bitmap indexes for E1 and E2 to obtain bitvectors, then use boolean OR

use a combination of bitmap and other kinds of indexes

Jan Chomicki () Query Evaluation and Optimization 3 / 21

Page 9: Query Evaluation and Optimization

Complex selection conditions

E is E1 ∧ E2

use E1 or E2 to narrow down the search

use composite index (if available)

use indexes for E1 and E2 to obtain sets of RowIds, then intersect those sets

use bitmap indexes for E1 and E2 to obtain bitvectors, then use boolean AND

use a combination of bitmap and other kinds of indexes

E is E1 ∨ E2

E1 or E2 cannot be used separately to narrow down the search

use indexes for E1 and E2 to obtain sets of RowIds, then take the union ofthose sets

use bitmap indexes for E1 and E2 to obtain bitvectors, then use boolean OR

use a combination of bitmap and other kinds of indexes

Jan Chomicki () Query Evaluation and Optimization 3 / 21

Page 10: Query Evaluation and Optimization

Evaluating R ./A=B

S

Nested loops

foreach t ∈ R do

foreach s ∈ S do

if t[A] = s[B] then output (t, s)

Index join (S has an index on B)

foreach t ∈ R do

lookup S using the index

and the value t[A]

Sort-merge join (R sorted on A, S sorted on B)

scan files in sorted order

matching A-values in Rwith B-values in S

Jan Chomicki () Query Evaluation and Optimization 4 / 21

Page 11: Query Evaluation and Optimization

Evaluating R ./A=B

S

Nested loops

foreach t ∈ R do

foreach s ∈ S do

if t[A] = s[B] then output (t, s)

Index join (S has an index on B)

foreach t ∈ R do

lookup S using the index

and the value t[A]

Sort-merge join (R sorted on A, S sorted on B)

scan files in sorted order

matching A-values in Rwith B-values in S

Jan Chomicki () Query Evaluation and Optimization 4 / 21

Page 12: Query Evaluation and Optimization

Evaluating R ./A=B

S

Nested loops

foreach t ∈ R do

foreach s ∈ S do

if t[A] = s[B] then output (t, s)

Index join (S has an index on B)

foreach t ∈ R do

lookup S using the index

and the value t[A]

Sort-merge join (R sorted on A, S sorted on B)

scan files in sorted order

matching A-values in Rwith B-values in S

Jan Chomicki () Query Evaluation and Optimization 4 / 21

Page 13: Query Evaluation and Optimization

Evaluating R ./A=B

S

Nested loops

foreach t ∈ R do

foreach s ∈ S do

if t[A] = s[B] then output (t, s)

Index join (S has an index on B)

foreach t ∈ R do

lookup S using the index

and the value t[A]

Sort-merge join (R sorted on A, S sorted on B)

scan files in sorted order

matching A-values in Rwith B-values in S

Jan Chomicki () Query Evaluation and Optimization 4 / 21

Page 14: Query Evaluation and Optimization

Hash-join

create new hashed files HR and HS;foreach t ∈ R do

i := h(t[A]);store t in bucket ri of HR;

foreach t ∈ S do

i := h(t[A]);store t in bucket si of HS;

for each bucket ri of HRread ri and build an in-memory hash index

for each tuple t in the bucket si of Sprobe the hash index to find and output

the tuples w such that w [A] = t[A]

the hash function used to build the hash index and to probe it has to bedifferent than h

each bucket of HR has to fit in main memory

Jan Chomicki () Query Evaluation and Optimization 5 / 21

Page 15: Query Evaluation and Optimization

Hash-join

create new hashed files HR and HS;foreach t ∈ R do

i := h(t[A]);store t in bucket ri of HR;

foreach t ∈ S do

i := h(t[A]);store t in bucket si of HS;

for each bucket ri of HRread ri and build an in-memory hash index

for each tuple t in the bucket si of Sprobe the hash index to find and output

the tuples w such that w [A] = t[A]

the hash function used to build the hash index and to probe it has to bedifferent than h

each bucket of HR has to fit in main memory

Jan Chomicki () Query Evaluation and Optimization 5 / 21

Page 16: Query Evaluation and Optimization

Hash-join

create new hashed files HR and HS;foreach t ∈ R do

i := h(t[A]);store t in bucket ri of HR;

foreach t ∈ S do

i := h(t[A]);store t in bucket si of HS;

for each bucket ri of HRread ri and build an in-memory hash index

for each tuple t in the bucket si of Sprobe the hash index to find and output

the tuples w such that w [A] = t[A]

the hash function used to build the hash index and to probe it has to bedifferent than h

each bucket of HR has to fit in main memory

Jan Chomicki () Query Evaluation and Optimization 5 / 21

Page 17: Query Evaluation and Optimization

Oracle: selection and join

Equality condition A = c

unique index on A: INDEX UNIQUE SCAN

nonunique index on A: INDEX RANGE SCAN

Range condition on A

any index on A: INDEX RANGE SCAN

Conjunction

both indexes: AND-EQUAL (intersection of RowId sets)

Join1 SORT JOIN/MERGE JOIN (sort-merge)

2 NESTED LOOPS (nested loop or index join)

3 HASH JOIN

Jan Chomicki () Query Evaluation and Optimization 6 / 21

Page 18: Query Evaluation and Optimization

Oracle: selection and join

Equality condition A = c

unique index on A: INDEX UNIQUE SCAN

nonunique index on A: INDEX RANGE SCAN

Range condition on A

any index on A: INDEX RANGE SCAN

Conjunction

both indexes: AND-EQUAL (intersection of RowId sets)

Join1 SORT JOIN/MERGE JOIN (sort-merge)

2 NESTED LOOPS (nested loop or index join)

3 HASH JOIN

Jan Chomicki () Query Evaluation and Optimization 6 / 21

Page 19: Query Evaluation and Optimization

Oracle: selection and join

Equality condition A = c

unique index on A: INDEX UNIQUE SCAN

nonunique index on A: INDEX RANGE SCAN

Range condition on A

any index on A: INDEX RANGE SCAN

Conjunction

both indexes: AND-EQUAL (intersection of RowId sets)

Join1 SORT JOIN/MERGE JOIN (sort-merge)

2 NESTED LOOPS (nested loop or index join)

3 HASH JOIN

Jan Chomicki () Query Evaluation and Optimization 6 / 21

Page 20: Query Evaluation and Optimization

Oracle: selection and join

Equality condition A = c

unique index on A: INDEX UNIQUE SCAN

nonunique index on A: INDEX RANGE SCAN

Range condition on A

any index on A: INDEX RANGE SCAN

Conjunction

both indexes: AND-EQUAL (intersection of RowId sets)

Join1 SORT JOIN/MERGE JOIN (sort-merge)

2 NESTED LOOPS (nested loop or index join)

3 HASH JOIN

Jan Chomicki () Query Evaluation and Optimization 6 / 21

Page 21: Query Evaluation and Optimization

Oracle: selection and join

Equality condition A = c

unique index on A: INDEX UNIQUE SCAN

nonunique index on A: INDEX RANGE SCAN

Range condition on A

any index on A: INDEX RANGE SCAN

Conjunction

both indexes: AND-EQUAL (intersection of RowId sets)

Join1 SORT JOIN/MERGE JOIN (sort-merge)

2 NESTED LOOPS (nested loop or index join)

3 HASH JOIN

Jan Chomicki () Query Evaluation and Optimization 6 / 21

Page 22: Query Evaluation and Optimization

Query optimization

Stages:

1 Parsing, validation, view expansion.

2 Logical plan selection (using algebraic laws).

3 Physical plan selection (cost-based)

Jan Chomicki () Query Evaluation and Optimization 7 / 21

Page 23: Query Evaluation and Optimization

Query optimization

Stages:

1 Parsing, validation, view expansion.

2 Logical plan selection (using algebraic laws).

3 Physical plan selection (cost-based)

Jan Chomicki () Query Evaluation and Optimization 7 / 21

Page 24: Query Evaluation and Optimization

Algebraic laws

Assumption

Tuples are mappings from attribute names to domain values.

Join (and Cartesian product)

E1 1 E2 = E2 1 E1

(E1 1 E2) 1 E3 = E1 1 (E2 1 E3).

Cascading

πA1,...,An(πA1,...,An,...,An+k(E )) = πA1,...,An(E )

σF1(σF2(E )) = σF1∧F2(E )

Jan Chomicki () Query Evaluation and Optimization 8 / 21

Page 25: Query Evaluation and Optimization

Algebraic laws

Assumption

Tuples are mappings from attribute names to domain values.

Join (and Cartesian product)

E1 1 E2 = E2 1 E1

(E1 1 E2) 1 E3 = E1 1 (E2 1 E3).

Cascading

πA1,...,An(πA1,...,An,...,An+k(E )) = πA1,...,An(E )

σF1(σF2(E )) = σF1∧F2(E )

Jan Chomicki () Query Evaluation and Optimization 8 / 21

Page 26: Query Evaluation and Optimization

Algebraic laws

Assumption

Tuples are mappings from attribute names to domain values.

Join (and Cartesian product)

E1 1 E2 = E2 1 E1

(E1 1 E2) 1 E3 = E1 1 (E2 1 E3).

Cascading

πA1,...,An(πA1,...,An,...,An+k(E )) = πA1,...,An(E )

σF1(σF2(E )) = σF1∧F2(E )

Jan Chomicki () Query Evaluation and Optimization 8 / 21

Page 27: Query Evaluation and Optimization

Algebraic laws

Assumption

Tuples are mappings from attribute names to domain values.

Join (and Cartesian product)

E1 1 E2 = E2 1 E1

(E1 1 E2) 1 E3 = E1 1 (E2 1 E3).

Cascading

πA1,...,An(πA1,...,An,...,An+k(E )) = πA1,...,An(E )

σF1(σF2(E )) = σF1∧F2(E )

Jan Chomicki () Query Evaluation and Optimization 8 / 21

Page 28: Query Evaluation and Optimization

Commuting projections

With selection

πA1,...,An(σF (E )) = σF (πA1,...,An(E ))

if F does not involve any attributes othet than A1, . . . ,An.

With Cartesian product

πA1,...,An,An+1,...,An+k(E1 × E2) =

πA1,...,An(E1)× πAn+1,...,An+k(E2)

where A1, . . . ,An come from E1 and An+1, . . . ,An+k come from E2.

With union

πA1,...,An(E1 ∪ E2) = πA1,...,An(E1) ∪ πA1,...,An(E2).

Jan Chomicki () Query Evaluation and Optimization 9 / 21

Page 29: Query Evaluation and Optimization

Commuting projections

With selection

πA1,...,An(σF (E )) = σF (πA1,...,An(E ))

if F does not involve any attributes othet than A1, . . . ,An.

With Cartesian product

πA1,...,An,An+1,...,An+k(E1 × E2) =

πA1,...,An(E1)× πAn+1,...,An+k(E2)

where A1, . . . ,An come from E1 and An+1, . . . ,An+k come from E2.

With union

πA1,...,An(E1 ∪ E2) = πA1,...,An(E1) ∪ πA1,...,An(E2).

Jan Chomicki () Query Evaluation and Optimization 9 / 21

Page 30: Query Evaluation and Optimization

Commuting projections

With selection

πA1,...,An(σF (E )) = σF (πA1,...,An(E ))

if F does not involve any attributes othet than A1, . . . ,An.

With Cartesian product

πA1,...,An,An+1,...,An+k(E1 × E2) =

πA1,...,An(E1)× πAn+1,...,An+k(E2)

where A1, . . . ,An come from E1 and An+1, . . . ,An+k come from E2.

With union

πA1,...,An(E1 ∪ E2) = πA1,...,An(E1) ∪ πA1,...,An(E2).

Jan Chomicki () Query Evaluation and Optimization 9 / 21

Page 31: Query Evaluation and Optimization

Commuting projections

With selection

πA1,...,An(σF (E )) = σF (πA1,...,An(E ))

if F does not involve any attributes othet than A1, . . . ,An.

With Cartesian product

πA1,...,An,An+1,...,An+k(E1 × E2) =

πA1,...,An(E1)× πAn+1,...,An+k(E2)

where A1, . . . ,An come from E1 and An+1, . . . ,An+k come from E2.

With union

πA1,...,An(E1 ∪ E2) = πA1,...,An(E1) ∪ πA1,...,An(E2).

Jan Chomicki () Query Evaluation and Optimization 9 / 21

Page 32: Query Evaluation and Optimization

Commuting selections

With Cartesian product

σF (E1 × E2) = σF (E1)× E2

if F involves only the attributes of E1.

With union

σF (E1 ∪ E2) = σF (E1) ∪ σF (E2).

With set difference

σF (E1 − E2) = σF (E1)− σF (E2).

Jan Chomicki () Query Evaluation and Optimization 10 / 21

Page 33: Query Evaluation and Optimization

Commuting selections

With Cartesian product

σF (E1 × E2) = σF (E1)× E2

if F involves only the attributes of E1.

With union

σF (E1 ∪ E2) = σF (E1) ∪ σF (E2).

With set difference

σF (E1 − E2) = σF (E1)− σF (E2).

Jan Chomicki () Query Evaluation and Optimization 10 / 21

Page 34: Query Evaluation and Optimization

Commuting selections

With Cartesian product

σF (E1 × E2) = σF (E1)× E2

if F involves only the attributes of E1.

With union

σF (E1 ∪ E2) = σF (E1) ∪ σF (E2).

With set difference

σF (E1 − E2) = σF (E1)− σF (E2).

Jan Chomicki () Query Evaluation and Optimization 10 / 21

Page 35: Query Evaluation and Optimization

Commuting selections

With Cartesian product

σF (E1 × E2) = σF (E1)× E2

if F involves only the attributes of E1.

With union

σF (E1 ∪ E2) = σF (E1) ∪ σF (E2).

With set difference

σF (E1 − E2) = σF (E1)− σF (E2).

Jan Chomicki () Query Evaluation and Optimization 10 / 21

Page 36: Query Evaluation and Optimization

Cost-based query optimization

Logical query plan

expression in relational algebra

produced by rewrite rules that are obtained from the algebraic laws.

Physical query plan

operators: different implementations of relational algebra operators

table-scan, index-scan, sort-scan, sort.

A physical query plan of least estimated cost is produced from a logical query plan.

Jan Chomicki () Query Evaluation and Optimization 11 / 21

Page 37: Query Evaluation and Optimization

Cost-based query optimization

Logical query plan

expression in relational algebra

produced by rewrite rules that are obtained from the algebraic laws.

Physical query plan

operators: different implementations of relational algebra operators

table-scan, index-scan, sort-scan, sort.

A physical query plan of least estimated cost is produced from a logical query plan.

Jan Chomicki () Query Evaluation and Optimization 11 / 21

Page 38: Query Evaluation and Optimization

Choices to be made

1 order and grouping of joins

2 choosing an algorithm for each operator occurrence

3 adding other operators like sorting

4 materialization vs. pipelining

Assumptions

relation ≡ file

cost ≡ size of intermediate relations

Jan Chomicki () Query Evaluation and Optimization 12 / 21

Page 39: Query Evaluation and Optimization

Choices to be made

1 order and grouping of joins

2 choosing an algorithm for each operator occurrence

3 adding other operators like sorting

4 materialization vs. pipelining

Assumptions

relation ≡ file

cost ≡ size of intermediate relations

Jan Chomicki () Query Evaluation and Optimization 12 / 21

Page 40: Query Evaluation and Optimization

Choices to be made

1 order and grouping of joins

2 choosing an algorithm for each operator occurrence

3 adding other operators like sorting

4 materialization vs. pipelining

Assumptions

relation ≡ file

cost ≡ size of intermediate relations

Jan Chomicki () Query Evaluation and Optimization 12 / 21

Page 41: Query Evaluation and Optimization

Cost estimation using size estimates

Size estimatesaccurate

easy to compute

consistent

Notation

B(R): number of blocks in relation R

T (R): number of tuples in relation R

V (R,A): number of distinct values R has in attribute A (can be generalizedto sets of attributes)

Jan Chomicki () Query Evaluation and Optimization 13 / 21

Page 42: Query Evaluation and Optimization

Cost estimation using size estimates

Size estimatesaccurate

easy to compute

consistent

Notation

B(R): number of blocks in relation R

T (R): number of tuples in relation R

V (R,A): number of distinct values R has in attribute A (can be generalizedto sets of attributes)

Jan Chomicki () Query Evaluation and Optimization 13 / 21

Page 43: Query Evaluation and Optimization

Cost estimation using size estimates

Size estimatesaccurate

easy to compute

consistent

Notation

B(R): number of blocks in relation R

T (R): number of tuples in relation R

V (R,A): number of distinct values R has in attribute A (can be generalizedto sets of attributes)

Jan Chomicki () Query Evaluation and Optimization 13 / 21

Page 44: Query Evaluation and Optimization

Relational algebra operators

Projection

T (πX (R)) = T (R)

Selection

T (σA=c(R)) = T (R)/V (R,A)

T (σA<c(R) = T (R)/3

T (σC1∧C2(R) = T (σC1(σC2(R)))

T (σC1∨C2(R)) = min(T (R),T (σC1(R)) + T (σC2(R)))

Those estimates may be inconsistent.

Jan Chomicki () Query Evaluation and Optimization 14 / 21

Page 45: Query Evaluation and Optimization

Relational algebra operators

Projection

T (πX (R)) = T (R)

Selection

T (σA=c(R)) = T (R)/V (R,A)

T (σA<c(R) = T (R)/3

T (σC1∧C2(R) = T (σC1(σC2(R)))

T (σC1∨C2(R)) = min(T (R),T (σC1(R)) + T (σC2(R)))

Those estimates may be inconsistent.

Jan Chomicki () Query Evaluation and Optimization 14 / 21

Page 46: Query Evaluation and Optimization

Relational algebra operators

Projection

T (πX (R)) = T (R)

Selection

T (σA=c(R)) = T (R)/V (R,A)

T (σA<c(R) = T (R)/3

T (σC1∧C2(R) = T (σC1(σC2(R)))

T (σC1∨C2(R)) = min(T (R),T (σC1(R)) + T (σC2(R)))

Those estimates may be inconsistent.

Jan Chomicki () Query Evaluation and Optimization 14 / 21

Page 47: Query Evaluation and Optimization

Relational algebra operators

Projection

T (πX (R)) = T (R)

Selection

T (σA=c(R)) = T (R)/V (R,A)

T (σA<c(R) = T (R)/3

T (σC1∧C2(R) = T (σC1(σC2(R)))

T (σC1∨C2(R)) = min(T (R),T (σC1(R)) + T (σC2(R)))

Those estimates may be inconsistent.

Jan Chomicki () Query Evaluation and Optimization 14 / 21

Page 48: Query Evaluation and Optimization

Natural join R(X ,Y ) 1 S(Y ,Z )

Simplifying assumptions

containment: V (R,Y ) ≤ V (S ,Y ) implies πY (R) ⊆ πY (S)

preservation: V (R 1 S ,A) = V (R,A).

Y consists of one attribute A

T (R 1 S) =T (R)T (S)

max(V (R,A),V (S ,A)).

Y consists of n attributes A1, . . . ,An

T (R 1 S) =T (R)T (S)∏

j=1,...,n max(V (R,Aj),V (S ,Aj)).

Jan Chomicki () Query Evaluation and Optimization 15 / 21

Page 49: Query Evaluation and Optimization

Natural join R(X ,Y ) 1 S(Y ,Z )

Simplifying assumptions

containment: V (R,Y ) ≤ V (S ,Y ) implies πY (R) ⊆ πY (S)

preservation: V (R 1 S ,A) = V (R,A).

Y consists of one attribute A

T (R 1 S) =T (R)T (S)

max(V (R,A),V (S ,A)).

Y consists of n attributes A1, . . . ,An

T (R 1 S) =T (R)T (S)∏

j=1,...,n max(V (R,Aj),V (S ,Aj)).

Jan Chomicki () Query Evaluation and Optimization 15 / 21

Page 50: Query Evaluation and Optimization

Natural join R(X ,Y ) 1 S(Y ,Z )

Simplifying assumptions

containment: V (R,Y ) ≤ V (S ,Y ) implies πY (R) ⊆ πY (S)

preservation: V (R 1 S ,A) = V (R,A).

Y consists of one attribute A

T (R 1 S) =T (R)T (S)

max(V (R,A),V (S ,A)).

Y consists of n attributes A1, . . . ,An

T (R 1 S) =T (R)T (S)∏

j=1,...,n max(V (R,Aj),V (S ,Aj)).

Jan Chomicki () Query Evaluation and Optimization 15 / 21

Page 51: Query Evaluation and Optimization

Multiway join S = R1 1 · · · 1 Rk

Assumption

A1, . . .An are the attributes that appear in more than one relation.

A-factor M(A)

Product of V (Ri ,A) for every relation Ri in which A appears, except for thesmallest V (Ri ,A).

Multiway join size

T (R1 1 · · · 1 Rk) =

∏i=1,...,k T (Ri )∏j=1,...,n M(Aj)

.

Jan Chomicki () Query Evaluation and Optimization 16 / 21

Page 52: Query Evaluation and Optimization

Multiway join S = R1 1 · · · 1 Rk

Assumption

A1, . . .An are the attributes that appear in more than one relation.

A-factor M(A)

Product of V (Ri ,A) for every relation Ri in which A appears, except for thesmallest V (Ri ,A).

Multiway join size

T (R1 1 · · · 1 Rk) =

∏i=1,...,k T (Ri )∏j=1,...,n M(Aj)

.

Jan Chomicki () Query Evaluation and Optimization 16 / 21

Page 53: Query Evaluation and Optimization

Multiway join S = R1 1 · · · 1 Rk

Assumption

A1, . . .An are the attributes that appear in more than one relation.

A-factor M(A)

Product of V (Ri ,A) for every relation Ri in which A appears, except for thesmallest V (Ri ,A).

Multiway join size

T (R1 1 · · · 1 Rk) =

∏i=1,...,k T (Ri )∏j=1,...,n M(Aj)

.

Jan Chomicki () Query Evaluation and Optimization 16 / 21

Page 54: Query Evaluation and Optimization

Properties

consistency: join size estimate is independent of the order and the groupingof the joins

accuracy: better estimates exist, for example based on histograms

Statisticscomputed periodically, on-demand or incrementally

do not have to be very accurate

Jan Chomicki () Query Evaluation and Optimization 17 / 21

Page 55: Query Evaluation and Optimization

Properties

consistency: join size estimate is independent of the order and the groupingof the joins

accuracy: better estimates exist, for example based on histograms

Statisticscomputed periodically, on-demand or incrementally

do not have to be very accurate

Jan Chomicki () Query Evaluation and Optimization 17 / 21

Page 56: Query Evaluation and Optimization

Properties

consistency: join size estimate is independent of the order and the groupingof the joins

accuracy: better estimates exist, for example based on histograms

Statisticscomputed periodically, on-demand or incrementally

do not have to be very accurate

Jan Chomicki () Query Evaluation and Optimization 17 / 21

Page 57: Query Evaluation and Optimization

Computing the best plan for R1 1 · · · 1 Rk

Constructing a table D(Size,LeastCost,BestPlan)

Base cases:I One relation R: D(T (R), 0,R)I One join Ri 1 Rj : D(T (Ri 1 Rj), 0,Ri 1 Rj) if T (Ri ) ≤ T (Rj)

Inductive case ./R

:

1 R = {Ri1 , . . . ,Rim} for m < k2 for every j ∈ {i1, . . . , im}, retrieve the tuple D(T ( ./

i 6=j,Ri∈RRi ),Cj ,Ej)

3 find p ∈ {i1, . . . , im} such that Cp + T (( ./i 6=p,Ri∈R

Ri )) is minimal

4 let Best = ( ./i 6=p,Ri∈R

Ri )

5 return D(T (Best 1 Rp),Cp + T (Best),Ep 1 Rp)

How many plans are considered?

Jan Chomicki () Query Evaluation and Optimization 18 / 21

Page 58: Query Evaluation and Optimization

Computing the best plan for R1 1 · · · 1 Rk

Constructing a table D(Size,LeastCost,BestPlan)

Base cases:I One relation R: D(T (R), 0,R)I One join Ri 1 Rj : D(T (Ri 1 Rj), 0,Ri 1 Rj) if T (Ri ) ≤ T (Rj)

Inductive case ./R

:

1 R = {Ri1 , . . . ,Rim} for m < k2 for every j ∈ {i1, . . . , im}, retrieve the tuple D(T ( ./

i 6=j,Ri∈RRi ),Cj ,Ej)

3 find p ∈ {i1, . . . , im} such that Cp + T (( ./i 6=p,Ri∈R

Ri )) is minimal

4 let Best = ( ./i 6=p,Ri∈R

Ri )

5 return D(T (Best 1 Rp),Cp + T (Best),Ep 1 Rp)

How many plans are considered?

Jan Chomicki () Query Evaluation and Optimization 18 / 21

Page 59: Query Evaluation and Optimization

Computing the best plan for R1 1 · · · 1 Rk

Constructing a table D(Size,LeastCost,BestPlan)

Base cases:I One relation R: D(T (R), 0,R)I One join Ri 1 Rj : D(T (Ri 1 Rj), 0,Ri 1 Rj) if T (Ri ) ≤ T (Rj)

Inductive case ./R

:

1 R = {Ri1 , . . . ,Rim} for m < k2 for every j ∈ {i1, . . . , im}, retrieve the tuple D(T ( ./

i 6=j,Ri∈RRi ),Cj ,Ej)

3 find p ∈ {i1, . . . , im} such that Cp + T (( ./i 6=p,Ri∈R

Ri )) is minimal

4 let Best = ( ./i 6=p,Ri∈R

Ri )

5 return D(T (Best 1 Rp),Cp + T (Best),Ep 1 Rp)

How many plans are considered?

Jan Chomicki () Query Evaluation and Optimization 18 / 21

Page 60: Query Evaluation and Optimization

Refinements

More sophisticated cost measure

calculating the cost of using different algorithms for the same operation

keeping multiple plans, together with their costs, to account for interestingjoin orders.

Greedy algorithm1 start with a pair of relations whose estimated join size is minimal

2 keep adding further relations with minimal estimated join size

Jan Chomicki () Query Evaluation and Optimization 19 / 21

Page 61: Query Evaluation and Optimization

Refinements

More sophisticated cost measure

calculating the cost of using different algorithms for the same operation

keeping multiple plans, together with their costs, to account for interestingjoin orders.

Greedy algorithm1 start with a pair of relations whose estimated join size is minimal

2 keep adding further relations with minimal estimated join size

Jan Chomicki () Query Evaluation and Optimization 19 / 21

Page 62: Query Evaluation and Optimization

Completing the evaluation plan

Operations

TableScan(Relation)

SortScan(Relation, Attributes)

IndexScan(Relation, Condition)

Filter(Condition)

Sort(Attributes)

different algorithms for the basic operations

Pipelining vs. materialization.

Jan Chomicki () Query Evaluation and Optimization 20 / 21

Page 63: Query Evaluation and Optimization

Completing the evaluation plan

Operations

TableScan(Relation)

SortScan(Relation, Attributes)

IndexScan(Relation, Condition)

Filter(Condition)

Sort(Attributes)

different algorithms for the basic operations

Pipelining vs. materialization.

Jan Chomicki () Query Evaluation and Optimization 20 / 21

Page 64: Query Evaluation and Optimization

Jan Chomicki () Query Evaluation and Optimization 21 / 21