pessimistic query optimization: tighter upper bounds for ... · pessimistic query optimization:...

56
Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska Dan Suciu University of Washington [walter,magda,suciu]@cs.washington.edu July 19th, 2019 Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 1 / 27

Upload: others

Post on 18-Mar-2020

23 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Pessimistic Query Optimization: Tighter UpperBounds for Intermediate Join Cardinalities

Walter Cai Magdalena Balazinska Dan Suciu

University of Washington

[walter,magda,suciu]@cs.washington.edu

July 19th, 2019

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 1 / 27

Page 2: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Systematic Underestimation

Query optimizers assume:I UniformityI Independence

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 2 / 27

Page 3: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

1 Background: Cardinality Bounds

2 Tightened Cardinality Bounds

3 Evaluation

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 3 / 27

Page 4: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

1 Background: Cardinality Bounds

2 Tightened Cardinality Bounds

3 Evaluation

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 3 / 27

Page 5: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Example Query (SQL)

SELECT

*

FROM

pseudonym ,

cast,

movie_companies ,

company_name

WHERE

pseudonym.person id = cast.person id AND

cast.movie id id = movie_companies.movie id AND

movie_companies.company id = company_name.id;

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 4 / 27

Page 6: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Example Query (Join Graph & Datalog)

movie companies

cast

pseudonym

company name

Figure: Join Graph.

Q(x, y, z,w) Dpseudo(x, y),

cast(y, z),

mc(z,w),

cn(w)

y 7→person

z 7→movie

w 7→company

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 5 / 27

Page 7: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Review: Entropy

Take random variable X :

h(X) = −∑

a

P(X = a) · log(P(X = a))

Multiple variables:

h(X ,Y) = −∑a,b

P(X = a,Y = b) · log(P(X = a,Y = b))

Conditional entropy:

h(X |Y) = −∑a,b

P(X = a,Y = b) · log(P(X = a,Y = b)P(Y = b)

)

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 6 / 27

Page 8: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Review: Entropy

X ∼ P(x1, . . . , xn)

I Fact: h(X) ≤ log(n)I h(X) = log(n) iff P is uniform

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 7 / 27

Page 9: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Connection to Entropy

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

I Create random variable for each attribute.

x → X , y → Y , z → Z , w → W

I Let (X ,Y ,Z ,W) be uniformly distributed over true output of Q .

h(X ,Y ,Z ,W) = log∣∣∣∣Q(x, y, z,w)

∣∣∣∣exp

(h(X ,Y ,Z ,W)

)=

∣∣∣∣Q(x, y, z,w)∣∣∣∣

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 8 / 27

Page 10: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Entropic Bound

∣∣∣∣Q(x, y, z,w)∣∣∣∣ = exp

(h(X ,Y ,Z ,W)

)

I Suffices to bound h(X ,Y ,Z ,W).

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 9 / 27

Page 11: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Entropic Bound

h(X ,Y ,Z ,W) ≤ h(X |Y) + h(Y ,Z) + h(W |Z)

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 10 / 27

Page 12: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Entropic Bound

∣∣∣∣Q(x, y, z,w)∣∣∣∣ = exp(h(X ,Y ,Z ,W)) ≤ exp(h(X |Y) + h(Y ,Z) + h(W |Z))

h(Y ,Z) ≤ log(count(cast))

h(X |Y) ≤ log(max degree(pseudonym))

h(W |Z) ≤ log(max degree(movie companies))

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 27

Page 13: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Entropic Bound

∣∣∣∣Q(x, y, z,w)∣∣∣∣ = exp(h(X ,Y ,Z ,W)) ≤ exp(h(X |Y) + h(Y ,Z) + h(W |Z))

h(Y ,Z) ≤ log ccasth(X |Y) ≤ log dy

pseudo

h(W |Z) ≤ log dzmc

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 27

Page 14: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Cardinality Bound

∣∣∣∣Q(x, y, z,w)∣∣∣∣ = exp(h(X ,Y ,Z ,W))

≤ exp( h(X |Y)︸ ︷︷ ︸≤log dy

pseudo

+ h(Y ,Z)︸ ︷︷ ︸≤log ccast

+ h(W |Z)︸ ︷︷ ︸≤log dz

mc

)

≤ dypseudo

· ccast · dzmc

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 12 / 27

Page 15: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Many Entropic Bounds

h(X ,Y ,Z ,W) ≤ ...

h(X ,Y) + h(Z |Y) + h(W |Z)h(X ,Y) + h(Z |Y) + h(W)h(X ,Y) + h(Z ,W)h(X ,Y) + h(Z |W) + h(W)h(X |Y) + h(Y ,Z) + h(W |Z)h(X |Y) + h(Y |Z) + h(Z ,W)h(X |Y) + h(Y |Z) + h(Z |W) + h(Z)

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 13 / 27

Page 16: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Entropic Bounds

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

∣∣∣∣Q(x, y, z,w)∣∣∣∣ ≤ min

cpseudo · dycast · d

zmc

cpseudo · dycast · ccn

cpseudo · cmccpseudo · dw

mc · ccndypseudo

· ccast · dzmc

dypseudo

· ccast · ccndypseudo

· dzcast · cmc

dypseudo

· dzcast · d

wmc · ccn

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 14 / 27

Page 17: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Background: Cardinality Bounds

Entropic Bounds

Neat! But is it useful?I Short answer: ‘No’. (Not yet, anyway)

I Bounds still too loose (overestimation)I Need to tighten

I How to tighten? Partitioning

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 15 / 27

Page 18: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tightened Cardinality Bounds

1 Background: Cardinality Bounds

2 Tightened Cardinality Bounds

3 Evaluation

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 16 / 27

Page 19: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tightened Cardinality Bounds

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

pseudonym cast movie companies company name

pseudo cast mc cn

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27

Page 20: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tightened Cardinality Bounds

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

pseudonym cast movie companies company name

pseudo[0,1] pseudo[1,1]

pseudo[0,0] pseudo[1,0]

cast[0,1] cast[1,1]

cast[0,0] cast[1,0]

mc[0,1] mc[1,1]

mc[0,0] mc[1,0] cn[0]

cn[1]

I Value based hashing

I Analagous to hash join

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27

Page 21: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tightened Cardinality Bounds

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

pseudonym cast movie companies company name

pseudo[0,1]

cast[1,0]

mc[0,1] cn[1]

I Value based hashing

I Analagous to hash join

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27

Page 22: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tightened Cardinality Bounds

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

pseudonym cast movie companies company name

pseudo[0,1] pseudo[1,1]

pseudo[0,0] pseudo[1,0]

cast[0,1] cast[1,1]

cast[0,0] cast[1,0]

mc[0,1] mc[1,1]

mc[0,0] mc[1,0] cn[0]

cn[1]

I Q(D): query evaluated on database D

I Q(D[J]): query evaluated on parition D[J]Q(D) =

⋃J

Q(D[J])

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27

Page 23: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tightened Cardinality Bounds

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

pseudonym cast movie companies company name

pseudo[0,1] pseudo[1,1]

pseudo[0,0] pseudo[1,0]

cast[0,1] cast[1,1]

cast[0,0] cast[1,0]

mc[0,1] mc[1,1]

mc[0,0] mc[1,0] cn[0]

cn[1]

I Bound each partition D[J]

I Sum will be bound on full database

Q(D) =⋃

J

Q(D[J])∣∣∣∣Q(D)∣∣∣∣ ≤∑

J

bound(Q(D[J]))

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 17 / 27

Page 24: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tightened Cardinality Bounds

Partition Bounding

∣∣∣∣Q(D)∣∣∣∣ ≤ ∑

J∈{0,1}4min

cpseudo[J] · dycast[J] · d

zmc[J]

cpseudo[J] · dycast[J] · ccn[J]

cpseudo[J] · cmc[J]cpseudo[J] · dw

mc[J] · ccn[J]dypseudo[J] · ccast[J] · d

zmc[J]

dypseudo[J] · ccast[J] · ccn[J]

dypseudo[J] · d

zcast[J] · cmc[J]

dypseudo[J] · d

zcast[J] · d

wmc[J] · ccn[J]

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 18 / 27

Page 25: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tightened Cardinality Bounds

Optimizations

I Bound Formula GenerationI Partition Budgeting

I Combats exponential runtime w.r.t. hash sizeI Non-monotonic behaviour

I Filter Predicates

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 19 / 27

Page 26: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

1 Background: Cardinality Bounds

2 Tightened Cardinality Bounds

3 Evaluation

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 20 / 27

Page 27: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Join Order Benchmark1

I Built on the IMDb datasetI 113 queriesI 33 unique topoplogiesI Skew!I Correlation!I Complex selection predicates!

1How Good Are Query Optimizers, Really? Leis et al. VLDB 2015.Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 21 / 27

Page 28: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Bound Relative Error Versus Postgres Relative Error

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 22 / 27

Page 29: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Bound Q-Error Versus Postgres Q-Error

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 23 / 27

Page 30: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Plan Execution Runtime (With Foreign Keys Indexes)

Figure: Linear scale runtime improvements over JOB queries.

I Total runtime:I Postgres: 3,190 secondsI Tightened Bound: 1,832 seconds

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 24 / 27

Page 31: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Plan Execution Runtime (With Foreign Keys Indexes)

Figure: Linear scale runtime improvements over JOB queries.

I Total runtime:I Postgres: 3,190 secondsI Tightened Bound: 1,832 seconds

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 24 / 27

Page 32: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Plan Execution Runtime (No Foreign Key Indexes)

Figure: Linear scale plan execution time over JOB queries.

I Total runtime (including 1 hour cutoff for default Postgres):I Postgres: 21,125 secondsI Tightened Bound: 2,216 seconds

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 25 / 27

Page 33: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Plan Execution Runtime (No Foreign Key Indexes)

Figure: Linear scale plan execution time over JOB queries.

I Total runtime (including 1 hour cutoff for default Postgres):I Postgres: 21,125 secondsI Tightened Bound: 2,216 seconds

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 25 / 27

Page 34: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Takeaways

I Gains for very slow queries.I On par for fast queries.I Pessimistic but robust query optimization.

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 26 / 27

Page 35: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Evaluation

Acknowledgements

I Thank you to Jenny, Tomer, Laurel, Brandon, Jingjing, Tobin, Leilani,and Guna!

I This research is supported by NSF grant AITF 1535565 and III1614738.

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 27 / 27

Page 36: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Googleplus Microbenchmark Examples

SELECT COUNT(*)FROM

community_44 AS t0,community_44 AS t1,community_44 AS t2,community_44 AS t3

WHEREt0.object = t1.subject ANDt1.object = t2.subject ANDt2.object = t3.subject ANDt0.subject % 512 = 89 ANDt3.object % 512 = 174;

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 1 / 12

Page 37: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Googleplus Microbenchmark Examples

SELECT COUNT(*)FROM

community_30 AS t0,community_30 AS t1,community_30 AS t2,community_30 AS t3,community_30 AS t4

WHEREt0.object = t1.subject ANDt0.object = t2.subject ANDt0.object = t3.subject ANDt3.object = t4.subject ANDt0.subject % 256 = 49 ANDt1.object % 256 = 213 ANDt2.object % 256 = 152 ANDt4.object % 256 = 248;

AND ci.movie_id = mc.movie_id;

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 1 / 12

Page 38: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Googleplus Progressive Bound Tightness

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 2 / 12

Page 39: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Example of Non-monotonic Behavior

Q (x, y, z,w) :-R (z, y) ,S (y, z) ,T (z,w)

x y0 00 11 01 1︸ ︷︷ ︸

R

./

y z0 01 02 13 1︸ ︷︷ ︸

S

./

z w0 01 12 23 3︸ ︷︷ ︸

T

=

x y z w0 0 0 00 1 0 01 0 0 01 1 0 0︸ ︷︷ ︸

Q

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 3 / 12

Page 40: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Example of Non-monotonic Behavior

x y0 00 11 01 1︸ ︷︷ ︸

R

./

y z0 01 02 13 1︸ ︷︷ ︸

S

./

z w0 01 12 23 3︸ ︷︷ ︸

T

=

x y z w0 0 0 00 1 0 01 0 0 01 1 0 0︸ ︷︷ ︸

Q

∣∣∣Q(x, y, z,w)∣∣∣ ≤ min

cR · d

yS · d

zT

dyR · cS · dz

T

dyR · d

zS · cT

cR · cT

cR(0) · dyS(0,0) · d

zT (0) = 4 · 1 · 1 = 4

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 4 / 12

Page 41: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Example of Non-monotonic Behavior

x y0 00 11 01 1︸ ︷︷ ︸

R

./

y z0 01 02 13 1︸ ︷︷ ︸

S

./

z w0 01 12 23 3︸ ︷︷ ︸

T

=

x y z w0 0 0 00 1 0 01 0 0 01 1 0 0︸ ︷︷ ︸

Q

∣∣∣Q(x, y, z,w)∣∣∣ ≤ min

cR · d

yS · d

zT

dyR · cS · dz

T

dyR · d

zS · cT

cR · cT

cR(0) · dyS(0,0) · d

zT (0) = 4 · 1 · 1 = 4

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 4 / 12

Page 42: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Example of Non-monotonic Behavior

x y0 00 11 01 1︸ ︷︷ ︸

R

./

y z0 01 02 13 1︸ ︷︷ ︸

S

./

z w0 01 12 23 3︸ ︷︷ ︸

T

=

x y z w0 0 0 00 1 0 01 0 0 01 1 0 0︸ ︷︷ ︸

Q

∣∣∣Q(x, y, z,w)∣∣∣ ≤ min

cR · d

yS · d

zT

dyR · cS · dz

T

dyR · d

zS · cT

cR · cT

cR(0) · dyS(0,0) · d

zT (0) = 4 · 1 · 1 = 4

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 4 / 12

Page 43: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Example of Non-monotonic Behavior

I Define hash function hash(ui) = i%2.

hash(0) = hash(2) = 0

hash(1) = hash(3) = 1

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 5 / 12

Page 44: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Partitioned Relations

hash(y), hash(z) = . . .

0, 0 −→0 01 0

./ 0 0 ./0 02 2

=0 0 0 01 0 0 0

0, 1 −→0 01 0

./ 2 1 ./1 13 3

= ∅

1, 0 −→0 11 1

./ 1 0 ./0 02 2

=0 1 0 01 1 0 0

1, 1 −→0 11 1

./ 3 1 ./1 13 3

= ∅

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 6 / 12

Page 45: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

0, 0 −→0 01 0

./ 0 0 ./0 02 2

=0 0 0 01 0 0 0

0, 1 −→0 01 0

./ 2 1 ./1 13 3

= ∅

1, 0 −→0 11 1

./ 1 0 ./0 02 2

=0 1 0 01 1 0 0

1, 1 −→0 11 1

./ 3 1 ./1 13 3

= ∅

∑i,j∈{0,1}

min

cR(i) · dy

S(i,j) · dzT (j)

dyR(i) · cS(i,j) · dz

T (j)

dyR(i) · d

zS(i,j) · cT (j)

cR(i) · cT (j)

=∑i,j∈{0,1}

min

2 · 1 · 1

2 · 1 · 1

2 · 1 · 2

2 · 2

=∑i,j∈{0,1}

2 = 8

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 7 / 12

Page 46: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

0, 0 −→0 01 0

./ 0 0 ./0 02 2

=0 0 0 01 0 0 0

0, 1 −→0 01 0

./ 2 1 ./1 13 3

= ∅

1, 0 −→0 11 1

./ 1 0 ./0 02 2

=0 1 0 01 1 0 0

1, 1 −→0 11 1

./ 3 1 ./1 13 3

= ∅

∑i,j∈{0,1}

min

cR(i) · dy

S(i,j) · dzT (j)

dyR(i) · cS(i,j) · dz

T (j)

dyR(i) · d

zS(i,j) · cT (j)

cR(i) · cT (j)

=∑i,j∈{0,1}

min

2 · 1 · 1

2 · 1 · 1

2 · 1 · 2

2 · 2

=∑i,j∈{0,1}

2 = 8

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 7 / 12

Page 47: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

0, 0 −→0 01 0

./ 0 0 ./0 02 2

=0 0 0 01 0 0 0

0, 1 −→0 01 0

./ 2 1 ./1 13 3

= ∅

1, 0 −→0 11 1

./ 1 0 ./0 02 2

=0 1 0 01 1 0 0

1, 1 −→0 11 1

./ 3 1 ./1 13 3

= ∅

∑i,j∈{0,1}

min

cR(i) · dy

S(i,j) · dzT (j)

dyR(i) · cS(i,j) · dz

T (j)

dyR(i) · d

zS(i,j) · cT (j)

cR(i) · cT (j)

=∑i,j∈{0,1}

min

2 · 1 · 1

2 · 1 · 1

2 · 1 · 2

2 · 2

=∑i,j∈{0,1}

2 = 8

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 7 / 12

Page 48: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Exponential Growth

I Sketch size (number of buckets) exponential in hash size.I Exponent = number of attributes in relation.

I Number of elements to sum up exponential in hash size.I Exponent = number of attributes in entire query.

I Non-monotonic bound behavior

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 8 / 12

Page 49: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Tuning Bucket Allocation

I Larger hash size =⇒ more information =⇒ tighter bounds, right?I Partitioning unconditionally covered attributes: yes.I Partitioning conditionally covered attributes: no.I Non-monotonic tradeoff space.

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 9 / 12

Page 50: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Non-Linearity of Degree Statistic

I Count is linear with respect to disjoint union!

count(A) + count(B) = count(A ∪ B)

I Degree is not...

degree(A) + degree(B) ≥ degree(A ∪ B)

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 10 / 12

Page 51: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

pseudo cast movie companies company name

cnpseudo cast mc

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 12

Page 52: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

cpseudo · dycast · d

zmc

pseudo cast movie companies company name

cnpseudo cast mc

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 12

Page 53: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

cpseudo · dycast · d

zmc

pseudo cast movie companies company name

cn

pseudo[0

]

pseudo[1

]

pseudo[2

]

pseudo[3

]

cast[3]

cast[2]

cast[1]

cast[0]

mc

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 12

Page 54: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

dypseudo

· ccast · dzmc

pseudo cast movie companies company name

cnpseudo cast mc

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 12

Page 55: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Q(x, y, z,w) D pseudo(x, y), cast(y, z),mc(z,w), cn(w)

dypseudo

· ccast · dzmc

pseudo cast movie companies company name

cnpseudo[0] pseudo[1]

cast[0,0] cast[1,0]

cast[0,1] cast[1,1]

mc[0]

mc[1]

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 11 / 12

Page 56: Pessimistic Query Optimization: Tighter Upper Bounds for ... · Pessimistic Query Optimization: Tighter Upper Bounds for Intermediate Join Cardinalities Walter Cai Magdalena Balazinska

Reformultated Bound Formula

I Old:

∣∣∣∣Q(D)∣∣∣∣ ≤ ∑

J∈partition indexes

minb∈

bounding formulas

b(Q(D[J]))

I New:

∣∣∣∣Q(D)∣∣∣∣ ≤ min

b∈bounding formulas

∑J∈

partition indexesw.r.t. b

b(Q(D[J]))

Cai, Balazinska, Suciu (UW) Pessimistic Query Optimization July 19th, 2019 12 / 12