temple university – cis dept. cis661 – principles of data management v. megalooikonomou query...

93
Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Upload: jocelyn-phebe-walters

Post on 03-Jan-2016

225 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Temple University – CIS Dept. CIS661 – Principles of Data Management

V. MegalooikonomouQuery Optimization

(based on slides by C. Faloutsos at CMU)

Page 2: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

General Overview - rel. model Relational model - SQL Functional Dependencies &

Normalization Physical Design Indexing Query optimization Transaction processing

Page 3: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Overview of a DBMSDBAcasual

user

DML parser

buffer mgr

trans. mgr

DMLprecomp.

DDL parser

catalogData-files

Naïve user

Page 4: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies

Page 5: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Why Q-opt? SQL: ~declarative good q-opt -> big difference

eg., seq. Scan vs B-tree index, on P=1,000 pages

Page 6: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg.,

parse tree) … into ‘canonical form’ (syntactic

q-opt) generate alternative plans estimate cost; pick best

Page 7: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt - example

select name

from STUDENT, TAKES

where c-id=‘CIS661’ and

STUDENT.ssn=TAKES.ssn

STUDENT TAKES

STUDENT TAKES

Canonical form

Page 8: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt - example

STUDENT TAKES

Index; seq. scan

Hash join; merge join;

nested loops;

Page 9: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies

Page 10: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Equivalence of expressions A.k.a.: syntactic q-opt In short: perform selections and

projections early More details:

see transformation rules in text

Page 11: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Equivalence of expressions Q: How to prove a transformation

rule?

A: use TRC, to show that LHS = RHS, e.g.: )2()1()21(

?

RRRR PPP

)2()1()21(?

RRRR PPP

Page 12: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Equivalence of expressions

))()2())(1(

)()21(

)()21(

)2()1()21(?

tPRttPRt

tPRtRt

tPRRt

LHSt

RRRR PPP

Page 13: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Equivalence of expressions

QED

RHSt

RRt

RtRt

tPRttPRt

RRRR

PP

PP

PPP

)2()1(

))2(())1((

))()2())(1(

...

)2()1()21(?

Page 14: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Equivalence of expressions Q: how to disprove a rule??

)2()1()21(?

RRRR AAA

Page 15: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Equivalence of expressions Selections

perform them early break a complex predicate, and push

simplify a complex predicate (‘X=Y and Y=3’) -> ‘X=3 and Y=3’

))...)((...()( 21^...2^1 RR pnpppnpp

Page 16: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Equivalence of expressions Projections

perform them early (but carefully…) Smaller tuples Fewer tuples (if duplicates are

eliminated) project out all attributes except the

ones requested or required (e.g., joining attr.)

Page 17: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Equivalence of expressions Joins

Commutative , associative

Q: n-way join - how many diff. orderings? … Exhaustive enumeration too slow…

RSSR

)()( TSRTSR

Page 18: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg.,

parse tree) … into ‘canonical form’ (syntactic

q-opt) generate alt. plans estimate cost; pick best

Page 19: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

19

Cost estimation Eg., find ssn’s of students with an

‘A’ in CIS661 (using seq. scanning) How long will a query take?

CPU (but: small cost; decreasing; tough to estimate)

Disk (mainly, # block transfers) How many tuples will qualify? (what statistics do we need to

keep?)

Page 20: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation

Statistics: for each relation ‘r’ we keep nr : # tuples; Sr : size of tuple in

bytes …

Sr

#1#2

#3

#nr

Page 21: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation Statistics: for each

relation ‘r’ we keep … V(A,r): number of

distinct values of attr. ‘A’

(recently, histograms, too)

Sr

#1#2

#3

#nr

Page 22: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Derivable statistics fr: blocking factor =

max# records/block (=?? )

br: # blocks (=?? ) SC(A,r) = selection

cardinality = avg# of records with A=given (=?? )

fr

Sr

#1

#2

#br

Page 23: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Derivable statistics fr: blocking factor = max#

records/block (= B/Sr ; B: block size in bytes)

br: # blocks (= nr / fr )

Page 24: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Derivable statistics SC(A,r) = selection cardinality =

avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...) – eg: 30,000 students, 10 colleges – how many students in CST?

Page 25: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Additional quantities we need:

For index ‘i’: fi: average fanout - degree (~50-100) HTi: # levels of index ‘i’ (~2-3)

~ log(#entries)/log(fi) LBi: # blocks at leaf level

HTi

Page 26: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Statistics Where do we store them? How often do we update them?

Page 27: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg.,

parse tree) … into ‘canonical form’ (syntactic q-

opt) generate alt. plans

selections; sorting; projections joins

estimate cost; pick best

Page 28: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation Selections – eg.,

select * from TAKESwhere grade =

‘A’ Plans? …

fr

Sr

#1

#2

#br

Page 29: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation Plans?

seq. scan binary search

(if sorted & consecutive)

index search if an index

exists

fr

Sr

#1

#2

#br

Page 30: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation

seq. scan – cost? br (worst case) br/2 (average, if

we search for primary key)

fr

Sr

#1

#2

#br

Page 31: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation

binary search – cost?if sorted and

consecutive: ~log(br) + SC(A,r)/fr (=#blocks

spanned by qualified tuples)

-1

fr

Sr

#1

#2

#br

Page 32: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation

estimation of selection cardinalities SC(A,r):

non-trivial – details later

fr

Sr

#1

#2

#br

Page 33: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation

method#3: index – cost? levels of index + blocks w/ qual. tuples

fr

Sr

#1

#2

#br

...

case#1: primary key

case#2: sec. key – clustering index

case#3: sec. key – non-clust. index

Page 34: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation

method#3: index – cost? levels of index + blocks w/ qual. tuples

fr

Sr

#1

#2

#br

..

case#1: primary key – cost:

HTi + 1

HTi

Page 35: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation

method#3: index - cost? levels of index + blocks w/ qual. tuples

fr

Sr

#1

#2

#br

case#2: sec. key – clustering index

OR prim. index on non-key

…retrieve multiple records

HTi + SC(A,r)/fr

HTi

Page 36: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation

method#3: index – cost? levels of index + blocks w/ qual. tuples

fr

Sr

#1

#2

#br

...

case#3: sec. key – non-clust. index

HTi + SC(A,r)

(actually, pessimistic...)

Page 37: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation – arithmetic examples find accounts with branch-name =

‘Perryridge’ account(branch-name, balance, ...)

Page 38: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Arithm. examples – cont’d n-account = 10,000 tuples f-account = 20 tuples/block V(balance, account) = 500 distinct

values V(branch-name, account) = 50

distinct values for branch-index: fanout fi = 20

Page 39: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Arithm. examples Q1: cost of seq. scan? A1: 500 disk accesses Q2: assume a clustering index on

branch-name – cost?

Page 40: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Cost estimation + plan generation

method#3: index – cost? levels of index + blocks w/ qual.

tuples

fr

Sr

#1

#2

#br

case#2: sec. key – clustering index

HTi + SC(A,r)/frHTi

Page 41: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Arithm. examples A2:

HTi + SC(branch-name, account)/f-account

HTi: 50 values, with index fanout 20 -> HT=2 levels (log(50)/log(20) = 1+)

SC(..)= # qualified records = nr/V(A,r) = 10,000/50 = 200 tuples SC/f: spanning 200/20 blocks = 10 blocks

Page 42: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Arithm. examples A2 final answer: 2+10= 12 block

accesses (vs. 500 block accesses of seq.

scan) footnote: in all fairness

seq. disk accesses: ~2msec or less random disk accesses: ~10msec

Page 43: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg.,

parse tree) … into ‘canonical form’ (syntactic q-

opt) generate alternative plans

selections; sorting; projections joins

estimate cost; pick best

Page 44: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

General Overview - rel. model Relational model - SQL Functional Dependencies &

Normalization Physical Design Indexing Query optimization Transaction processing

Page 45: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg., parse

tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans

selections (simple; complex predicates) sorting; projections joins

estimate cost; pick best

Page 46: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Reminder – statistics: for each relation ‘r’ we keep

nr : # tuples; Sr : size of tuple in bytes V(A,r): number of distinct values of attr.

‘A’ fr: blocking factor br: number of blocks SC(A,r): selection cardinality (avg.# of

records with A=given)

Page 47: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections we saw simple predicates

(A=constant; eg., ‘name=Smith’) how about more complex predicates,

like ‘salary > 10K’ ‘age = 30 and job-code=“analyst” ’

what is their selectivity?

Page 48: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections – complex predicates

selectivity sel(P) of predicate P :== fraction of tuples that qualifysel(P) = SC(P) * nr

Page 49: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections – complex predicates

eg., assume that V(grade, TAKES)=5 distinct values

simple predicate P: A=constant sel(A=constant) = 1/V(A,r) eg., sel(grade=‘B’) = 1/5

(what if V(A,r) is unknown??)grade

count

AF

Page 50: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections – complex predicates range query: sel( grade >= ‘C’)

sel(A>a) = (Amax – a) / (Amax – Amin)

grade

count

AF

Page 51: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections - complex predicates negation: sel( grade != ‘C’)

sel( not P) = 1 – sel(P) (Observation: selectivity =~

probability)

grade

count

AF

‘P’

Page 52: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections – complex predicates

conjunction: sel( grade = ‘C’ and course = ‘CIS661’) sel(P1 and P2) = sel(P1) * sel(P2) INDEPENDENCE ASSUMPTION

P1 P2

Page 53: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections – complex predicates

disjunction: sel( grade = ‘C’ or course = ‘CIS661’) sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1 and P2) = sel(P1) + sel(P2) – sel(P1)*sel(P2) INDEPENDENCE ASSUMPTION, again

P1 P2

Page 54: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections – complex predicates

disjunction: in generalsel(P1 or P2 or … Pn) =1 - (1- sel(P1) ) * (1 - sel(P2) ) * … (1 - sel(Pn))

P1 P2

Page 55: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Selections – summary sel(A=constant) = 1/V(A,r) sel( A>a) = (Amax – a) / (Amax – Amin) sel(not P) = 1 – sel(P) sel(P1 and P2) = sel(P1) * sel(P2) sel(P1 or P2) = sel(P1) + sel(P2) –

sel(P1)*sel(P2)

UNIFORMITY and INDEPENDENCE ASSUMPTIONS

Page 56: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg.,

parse tree) … into ‘canonical form’ (syntactic q-

opt) generate alt. plans

selections (simple; complex predicates) sorting; projections joins

estimate cost; pick best

Page 57: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Sorting Assume br blocks of rel. ‘r’, and only M (<br) buffers in main memory Q1: how to sort (‘external sorting’)? Q2: cost?

...

12

M

1

br

...

‘r’

Page 58: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Sorting Q1: how to sort (‘external sorting’)? A1:

create sorted runs of size M merge

...

12

M

1

br

...

‘r’

Page 59: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Sorting create sorted runs of size M (how many?) merge them (how?)

M

... ...

Page 60: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Sorting create sorted runs of size M merge first M-1 runs into a sorted run of (M-1) *M, ...

M

... ...…..

Page 61: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Sorting How many steps we need to do? ‘i’, where M*(M-1)^i > br How many reads/writes per step? br+br

(each step reads every block once and writes it once)

M

... ...…..

Page 62: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Sorting In short, excluding the final ‘write’, we need ceil(log(br/M) / log(M-1)) * 2 * br + br

M

... ...…..

Page 63: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg.,

parse tree) … into ‘canonical form’ (syntactic q-

opt) generate alt. plans

selections (simple; complex predicates) sorting; projections, aggregations joins

estimate cost; pick best

Page 64: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Projection - dupl. elimination

eg., select distinct c-idfrom TAKES

How?Pros and cons?

Page 65: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Set operations

eg., select * from REGULAR-STUDENTunionselect * from SPECIAL-STUDENT

How?Pros and cons?

Page 66: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Aggregations

eg., select ssn, avg(grade) from TAKES

How?

Page 67: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg., parse

tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans

selections; sorting; projections, aggregations joins

2-way joins n-way joins

estimate cost; pick best

Page 68: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins output size estimation: r JOIN s nr, ns tuples each case#1: cartesian product (R, S

have no common attribute) #of output tuples=??

Page 69: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins output size estimation: r JOIN s case#2: r(A,B), s(A,C,D), A is cand.

key for ‘r’ #of output tuples=??

r(A, ...)

s(A, ......)nr

ns

<=ns

Page 70: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins output size estimation: r JOIN s case#3: r(A,B), s(A,C,D), A is cand.

key for neither (is it possible??) #of output tuples=??

r(A, ...)

s(A, ......)nr

ns

Page 71: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins #of output tuples= nr * ns/V(A,s) or ns * nr/V(A,r) (whichever is less)

r(A, ...)

s(A, ......)nr

ns

Page 72: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg., parse

tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans

selections; sorting; projections, aggregations joins

2-way joins - output size estimation; algorithms n-way joins

estimate cost; pick best

Page 73: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins algorithm(s) for r JOIN s? nr, ns tuples each

r(A, ...)

s(A, ......)nr

ns

Page 74: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins Algorithm #0: (naive) nested loop

(SLOW!)for each tuple tr of r

for each tuple ts of sprint, if they match

r(A, ...)

s(A, ......)nr

ns

Page 75: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins Algorithm #0: why is it bad? how many disk accesses (‘br’ and

‘bs’ are the number of blocks for ‘r’ and ‘s’)?r(A, ...)

s(A, ......)nr

ns

nr*bs + br

Page 76: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins Algorithm #1: Blocked nested-loop join

read in a block of r read in a block of s

print matching tuples

r(A, ...)

s(A, ......)nr,

brns records, bs blocks

cost: br + br * bs

Page 77: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins Arithmetic example:

nr = 10,000 tuples, br = 1,000 blocks ns = 1,000 tuples, bs = 200 blocks

r(A, ...)

s(A, ......)10,000

1,000 1,000 records,

200 blocks

alg#0: 2,001,000 d.a.

alg#1: 201,000 d.a.

Page 78: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins Observation1: Algo#1: asymmetric:

cost: br + br * bs - reverse roles: cost= bs + bs*br

Best choice?

r(A, ...)

s(A, ......)nr,

brns records, bs blocks

smallest relation in outer loop

Page 79: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joinsObservation2 [NOT IN BOOK]:

what if we have ‘k’ buffers available?

r(A, ...)

s(A, ......)nr,

brns records, bs blocks

read in ‘k-1’ blocks of ‘r’

read in a block of ‘s’

print matching tuples

Page 80: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins Cost?

r(A, ...)

s(A, ......)nr,

brns records, bs blocks

read in ‘k-1’ blocks of ‘r’

read in a block of ‘s’

print matching tuples

br + br/(k-1) * bs

what if br=k-1?what if we assign k-1 blocks to inner?)

Page 81: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins Observation3: can we get rid of the ‘br’ term?

cost: br + br * bs

r(A, ...)

s(A, ......)nr,

brns records, bs blocks

A: read the inner relation backwards half of the times! Q: cons?

Page 82: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins Other algorithm(s) for r JOIN s? nr, ns tuples each

r(A, ...)

s(A, ......)nr

ns

Page 83: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins - other algo’s sort-merge

sort ‘r’; sort ‘s’; merge sorted versions (good, if one or both are already sorted)

r(A, ...)

s(A, ......)nr

ns

Page 84: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

2-way joins - other algo’s sort-merge - cost: ~ 2* br * log(br) + 2* bs * log(bs) + br + bs needs temporary space (for sorted versions) gives output in sorted order

r(A, ...)

s(A, ......)nr

ns

Page 85: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

use an existing index, or even build one on the fly

cost: br + nr * c (c: look-up cost)

2-way joins - other algo’s

r(A, ...)

nr

s(A, ......)

ns

Page 86: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

hash join: hash ‘r’ into (0, 1, ..., ‘max’) buckets hash ‘s’ into buckets (same hash function) join each pair of matching buckets

2-way joins - other algo’s

r(A, ...)s(A, ......)

0

1

max

Page 87: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

how to join each pair of partitions Hr-i, Hs-i ?

A: build another hash table for Hs-i, and probe (look up) it with each tuple of Hr-i

2-way joins - hash join details

r(A, ...)s(A, ......)

Hr-0

0

1

max

Hs-0

Page 88: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

what if Hs-i is too large to fit in main-memory?

A: recursive partitioning more details (overflows, hybrid hash

joins): in book cost of hash join? (under certain

assumptions) 2(br + bs) + (br + bs) + 4* max

2-way joins - hash join details

Page 89: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg., parse

tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans

selections; sorting; projections, aggregations joins

2-way joins - output size estimation; algorithms n-way joins

estimate cost; pick best

Page 90: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

r1 JOIN r2 JOIN ... JOIN rn typically, break problem into 2-way

joins

n-way joins

Page 91: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

System R: break query in query blocks simple queries (ie., no joins): look at stats n-way joins: left-deep join trees; ie., only

one intermediate result at a time pros: smaller search space; pipelining cons: may miss optimal

2-way joins: nested-loop and sort-merge

Structure of query optimizers:

r1 r2 r3 r4

Page 92: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

More heuristics by Oracle, Sybase and Starburst (-> DB2) : in book

In general: q-opt is very important for large databases.

(‘explain select <sql-statement>’ gives plan)

Structure of query optimizers:

Page 93: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)

Q-opt steps bring query in internal form (eg.,

parse tree) … into ‘canonical form’ (syntactic q-

opt) generate alt. plans

selections (simple; complex predicates) sorting; projections joins

estimate cost; pick best