data analytics using deep learning•first implementation of a query optimizer. people argued that...

150
DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #09:QUERY OPTIMIZATION

Upload: others

Post on 12-Jul-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

DATA ANALYTICS

USING DEEP LEARNING

GT 8803 // FALL 2019 // JOY ARULRAJ

L E C T U R E # 0 9 : Q U E R Y O P T I M I Z A T I O N

Page 2: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

a d m i n i s t r i v i a

• Reminders– Assignment 1: postponed to next Monday

– Sign up for discussion slots on Thursday

– Proposal presentations on next Wednesday

2

Page 3: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

L A S T C L A S S

• Query execution models– Tuple-at-a-time

– Operator-at-a-time

– Vector-at-a-time

3

SELECT A.id, B.valueFROM A, BWHERE A.id = B.idAND B.value > 100

A B

A.id=B.id

value>100

A.id, B.value

s

p

Page 4: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

L A S T C L A S S

• Access methods– Sequential scan

– Index scan

– Multi-index scan

4

101 102 103 104

Page 5: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

L A S T C L A S S

• Access methods– Sequential scan

– Index scan

– Multi-index scan

5

101 102 103 104

Scan Direction

Page 6: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

L A S T C L A S S

• Access methods– Sequential scan

– Index scan

– Multi-index scan

6

101 102 103 104

Scan Direction

Page 7: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

L A S T C L A S S

• Visual Query Execution Engine– Filtering classifier, Sampling

7

OLD PLAN

NEW PLAN

Page 8: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

T O D A Y ’ s A G E N D A

• Relational Algebra Equivalences

• Plan Cost Estimation

• Plan Enumeration

• Visual Query Optimizer

8

Page 9: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

RELATIONALALGEBRAEQUIVALENCES

9

Page 10: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

A N A T O M Y O F A D A T A B A S E S Y S T E M

Connection Manager + Admission Control

Query Parser

Query Optimizer

Query Executor

Lock Manager (Concurrency Control)

Access Methods (or Indexes)

Buffer Pool Manager

Log Manager

Memory Manager + Disk Manager

Networking Manager

10

QueryTransactional

Storage Manager

Query Processor

Shared Utilities

Process Manager

Source: Anatomy of a Database System

Page 11: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

Q U E R Y O P T I M I Z A T I O N

• Remember that SQL is declarative.– User tells the DBMS what answer they want, not

how to get the answer.

• There can be a big difference in performance

based on plan is used:– 1.3 hours vs. 0.45 seconds

11

Page 12: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

I B M S Y S T E M R

• First implementation of a query optimizer.

People argued that the DBMS could never

choose a query plan better than what a

human could write.

• A lot of the concepts from System R’s

optimizer are still used today.

12

Page 13: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

Q U E R Y O P T I M I Z A T I O N

• Rule-based Optimizer– Rewrite the query to remove inefficient things.

– Does not require a cost model.

• Cost-based Optimizer– Use a cost model to evaluate multiple equivalent

plans and pick the one with the lowest cost.

13

Page 14: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

14

Page 15: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

15

SQL Query

Parser

Page 16: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

16

SQL Query

Parser

AbstractSyntax

TreeBinder

Page 17: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

17

SQL Query

Parser

AbstractSyntax

Tree

SystemCatalog

BinderName→Internal ID

Page 18: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

18

SQL Query

Parser

AbstractSyntax

TreeAnnotated

AST

SystemCatalog

Rewriter(Optional)

BinderName→Internal ID

Page 19: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

19

SQL Query

Parser

AbstractSyntax

TreeAnnotated

AST

SystemCatalog

Rewriter(Optional)

Binder OptimizerAnnotated

AST

Name→Internal ID

Page 20: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

20

SQL Query

Parser

AbstractSyntax

TreeAnnotated

AST

SystemCatalog

Rewriter(Optional)

Binder OptimizerAnnotated

AST

Name→Internal ID

Page 21: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

21

SQL Query

Parser

AbstractSyntax

TreeAnnotated

AST

CostModel

SystemCatalog

Rewriter(Optional)

Binder OptimizerAnnotated

AST

Name→Internal ID

Page 22: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

Q U E R Y O P T I M I Z A T I O N : O V E R V I E W

22

SQL Query

Parser

AbstractSyntax

TreeAnnotated

AST

Query Plan

CostModel

SystemCatalog

Rewriter(Optional)

Binder OptimizerAnnotated

AST

Name→Internal ID

Page 23: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

Q U E R Y O P T I M I Z A T I O N I S N P - H A R D

• This is the hardest part of building a DBMS.

• If you are good at this, you will get paid.

• People are starting to look at employing ML

to improve the accuracy and efficacy of

optimizers.

23

Page 24: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

R E L A T I O N A L A L G E B R A E Q U I V A L E N C E S

• Two relational algebra expressions are

equivalent if they generate the same set of

tuples.– The DBMS can identify better query plans without

a cost model.

– This is often called query rewriting.

24

Page 25: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P R E D I C A T E P U S H D O W N

25

student enrolled

s.sid=e.sid

grade='A'

s.name,e.cid

s

p

SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'

Page 26: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P R E D I C A T E P U S H D O W N

26

student enrolled

s.sid=e.sid

grade='A'

s.name,e.cid

s

p

student enrolled

s.sid=e.sid

grade='A'

s.name,e.cid

s

p

SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'

Page 27: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

R E L A T I O N A L A L G E B R A E Q U I V A L E N C E S

27

πname, cid(σgrade='A'(student⋈enrolled))

πname, cid(student⋈(σgrade='A'(enrolled)))

=

SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'

Page 28: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

R E L A T I O N A L A L G E B R A E Q U I V A L E N C E S

• Selections:– Perform filters as early as possible.

– Reorder predicates so that the DBMS applies the

most selective one first.

– Break a complex predicate, and push down

σp1∧p2∧…pn(R) = σp1(σp2(…σpn(R)))

• Simplify a complex predicate – (X=Y AND Y=3) → X=3 AND Y=3

28

Page 29: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

R E L A T I O N A L A L G E B R A E Q U I V A L E N C E S

• Projections:– Perform them early to create smaller tuples and

reduce intermediate results (if duplicates are

eliminated)

– Project out all attributes except the ones requested

or required (e.g., joining keys)

29

Page 30: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P R O J E C T I O N P U S H D O W N

30

student enrolled

s.sid=e.sid

grade='A'

s.name,e.cid

s

p

SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'

Page 31: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P R O J E C T I O N P U S H D O W N

31

student enrolled

s.sid=e.sid

grade='A'

s.name,e.cid

s

p

student enrolled

s.sid=e.sid

grade='A'

s.name,e.cid

s

p

sid,cidpsid,namep

SELECT s.name, e.cidFROM student AS s, enrolled AS eWHERE s.sid = e.sidAND e.grade = 'A'

Page 32: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

32

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 33: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

33

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 34: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

34

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;X

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 35: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

35

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

SELECT * FROM A WHERE 1 = 1;

X

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 36: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

36

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

SELECT * FROM A WHERE 1 = 1;

X

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 37: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

37

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

SELECT * FROM A WHERE 1 = 1;SELECT * FROM A;

X

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 38: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

• Impossible / Unnecessary Predicates

• Join Elimination

38

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

SELECT A1.*FROM A AS A1 JOIN A AS A2ON A1.id = A2.id;

SELECT * FROM A WHERE 1 = 1;SELECT * FROM A;

X

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 39: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

• Impossible / Unnecessary Predicates

• Join Elimination

39

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

SELECT A1.*FROM A AS A1 JOIN A AS A2ON A1.id = A2.id;

SELECT * FROM A WHERE 1 = 1;SELECT * FROM A;

X

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 40: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

• Impossible / Unnecessary Predicates

• Join Elimination

40

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

SELECT * FROM A WHERE 1 = 1;

SELECT * FROM A;

SELECT * FROM A;

X

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 41: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

41

Source: Lukas Eder

SELECT * FROM A AS A1WHERE EXISTS(SELECT * FROM A AS A2

WHERE A1.id = A2.id);

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 42: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

42

Source: Lukas Eder

SELECT * FROM A AS A1WHERE EXISTS(SELECT * FROM A AS A2

WHERE A1.id = A2.id);

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 43: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

43

Source: Lukas Eder

SELECT * FROM A;

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 44: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

• Ignoring Projections

• Merging Predicates

44

Source: Lukas Eder

SELECT * FROM AWHERE val BETWEEN 1 AND 100

OR val BETWEEN 50 AND 150;

SELECT * FROM A;

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 45: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

• Ignoring Projections

• Merging Predicates

45

Source: Lukas Eder

SELECT * FROM AWHERE val BETWEEN 1 AND 100

OR val BETWEEN 50 AND 150;

SELECT * FROM A;

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 46: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M O R E E X A M P L E S

• Ignoring Projections

• Merging Predicates

46

Source: Lukas Eder

SELECT * FROM AWHERE val BETWEEN 1 AND 150;

SELECT * FROM A;

CREATE TABLE A (id INT PRIMARY KEY,val INT NOT NULL );

Page 47: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

R E L A T I O N A L A L G E B R A E Q U I V A L E N C E S

• Joins:– Commutative, associative

R⋈S = S⋈R

( R⋈S )⋈T = R⋈ ( S⋈T )– How many different orderings are there for an n-

way join?

47

Page 48: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

R E L A T I O N A L A L G E B R A E Q U I V A L E N C E S

• How many different orderings are there for an

n-way join?

• Catalan number ≈4n

– Exhaustive enumeration will be too slow.

• We’ll see in a second how an optimizer limits

the search space.

48

Page 49: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

PLANCOSTESTIMATION

49

Page 50: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C O S T E S T I M A T I O N

• How long will a query take?– CPU: Small cost; tough to estimate

– Disk: # of block transfers

– Memory: Amount of DRAM used

• How many tuples will be read/written?

• What statistics do we need to keep?

50

Page 51: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S T A T I S T I C S

• The DBMS stores internal statistics about

tables, attributes, and indexes in its internal

catalog.

• Different systems update them at different

times.

• Manual invocations:– Postgres/SQLite: ANALYZE

– SQL Server: UPDATE STATISTICS

51

Page 52: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S T A T I S T I C S

• For each relation R, the DBMS maintains the

following information:– NR: Number of tuples in R.

– V(A,R): Number of distinct values for attribute A.

52

Page 53: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

D E R I V A B L E S T A T I S T I C S

53

Page 54: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

D E R I V A B L E S T A T I S T I C S

• The selection cardinality SC(A,R) is the

average number of records with a value for an

attribute A given NR / V(A,R)

• Note that this assumes data uniformity.– 10,000 students, 10 colleges – how many students

in SCS?

54

Page 55: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S T A T I S T I C S

55

Page 56: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S T A T I S T I C S

56

SELECT * FROM people WHERE id = 123

Page 57: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S T A T I S T I C S

• Equality predicates on unique keys are easy to

estimate.

• What about more complex predicates? What

is their selectivity?

57

SELECT * FROM people WHERE id = 123

SELECT * FROM people WHERE val > 1000

SELECT * FROM people WHERE age = 30AND status = 'Lit'

Page 58: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C O M P L E X P R E D I C A T E S

• The selectivity (sel) of a predicate P is the

fraction of tuples that qualify.

• Formula depends on type of predicate:– Equality

– Range– Negation

– Conjunction– Disjunction

58

Page 59: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C O M P L E X P R E D I C A T E S

• The selectivity (sel) of a predicate P is the

fraction of tuples that qualify.

• Formula depends on type of predicate:– Equality

– Range– Negation

– Conjunction– Disjunction

59

Page 60: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

60

SELECT * FROM people WHERE age = 2

Page 61: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Assume that V(age,people) has five distinct

values (0–4) and NR = 5

• Equality Predicate: A=constant– sel(A=constant) = SC(P) / NR

– Example: sel(age=2) =

61

SELECT * FROM people WHERE age = 2

Page 62: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Assume that V(age,people) has five distinct

values (0–4) and NR = 5

• Equality Predicate: A=constant– sel(A=constant) = SC(P) / NR

– Example: sel(age=2) =

62

0 1 2 3 4

co

un

t

age

SELECT * FROM people WHERE age = 2

Page 63: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Assume that V(age,people) has five distinct

values (0–4) and NR = 5

• Equality Predicate: A=constant– sel(A=constant) = SC(P) / NR

– Example: sel(age=2) =

63

0 1 2 3 4

co

un

t

age

V(age,people)=5

SELECT * FROM people WHERE age = 2

Page 64: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Assume that V(age,people) has five distinct

values (0–4) and NR = 5

• Equality Predicate: A=constant– sel(A=constant) = SC(P) / NR

– Example: sel(age=2) =

64

0 1 2 3 4

co

un

t

age

V(age,people)=5

SC(age=2)=1

SELECT * FROM people WHERE age = 2

Page 65: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Assume that V(age,people) has five distinct

values (0–4) and NR = 5

• Equality Predicate: A=constant– sel(A=constant) = SC(P) / NR

– Example: sel(age=2) =

65

0 1 2 3 4

co

un

t

age

V(age,people)=5

SC(age=2)=1

SELECT * FROM people WHERE age = 2

1/5

Page 66: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Range Query:– sel(A>=a) = (Amax– a) / (Amax– Amin)

– Example: sel(age >= 2)

66

SELECT * FROM people WHERE age >= 2

Page 67: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Range Query:– sel(A>=a) = (Amax– a) / (Amax– Amin)

– Example: sel(age >= 2)

67

SELECT * FROM people WHERE age >= 2

Page 68: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Range Query:– sel(A>=a) = (Amax– a) / (Amax– Amin)

– Example: sel(age >= 2)

68

agemin = 0 agemax = 4

SELECT * FROM people WHERE age >= 2

Page 69: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Range Query:– sel(A>=a) = (Amax– a) / (Amax– Amin)

– Example: sel(age >= 2)

69

= (4–2) / (4–0)

= 1/2

agemin = 0 agemax = 4

SELECT * FROM people WHERE age >= 2

Page 70: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

70

SELECT * FROM people WHERE age != 2

Page 71: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

71

SC(age=2)=1

SELECT * FROM people WHERE age != 2

Page 72: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

72

SC(age!=2)=2 SC(age!=2)=2

SELECT * FROM people WHERE age != 2

Page 73: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

73

= 1 – (1/5) = 4/5

SC(age!=2)=2 SC(age!=2)=2

SELECT * FROM people WHERE age != 2

Page 74: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

0 1 2 3 4

co

un

t

age

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Negation Query:– sel(not P) = 1 – sel(P)

– Example: sel(age != 2)

• Observation: Selectivity ≈ Probability

74

= 1 – (1/5) = 4/5

SC(age!=2)=2 SC(age!=2)=2

SELECT * FROM people WHERE age != 2

Page 75: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Conjunction: – sel(P1 ⋀ P2) = sel(P1) · sel(P2)

– sel(age=2 ⋀ name LIKE 'A%')

• This assumes that the predicates are

independent.

75

SELECT * FROM people WHERE age = 2AND name LIKE 'A%'

P1 P2

Page 76: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Conjunction: – sel(P1 ⋀ P2) = sel(P1) · sel(P2)

– sel(age=2 ⋀ name LIKE 'A%')

• This assumes that the predicates are

independent.

76

SELECT * FROM people WHERE age = 2AND name LIKE 'A%'

P1 P2

Page 77: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Conjunction: – sel(P1 ⋀ P2) = sel(P1) · sel(P2)

– sel(age=2 ⋀ name LIKE 'A%')

• This assumes that the predicates are

independent.

77

SELECT * FROM people WHERE age = 2AND name LIKE 'A%'

P1 P2

Page 78: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Disjunction: – sel(P1 ⋁ P2)

= sel(P1)+ sel(P2)–sel(P1⋁P2)

= sel(P1)+ sel(P2)–sel(P1)· sel(P2)

– sel(age=2 OR name LIKE 'A%')

• This again assumes that the

selectivities are independent.

78

P1 P2

SELECT * FROM people WHERE age = 2OR name LIKE 'A%'

Page 79: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S E L E C T I O N S – C O M P L E X P R E D I C A T E S

• Disjunction: – sel(P1 ⋁ P2)

= sel(P1)+ sel(P2)–sel(P1⋁P2)

= sel(P1)+ sel(P2)–sel(P1)· sel(P2)

– sel(age=2 OR name LIKE 'A%')

• This again assumes that the

selectivities are independent.

79

P1 P2

SELECT * FROM people WHERE age = 2OR name LIKE 'A%'

Page 80: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

R E S U L T S I Z E E S T I M A T I O N F O R J O I N S

• Given a join of R and S, what is the range of

possible result sizes in # of tuples?

• In other words, for a given tuple of R, how

many tuples of S will it match?

80

Page 81: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

R E S U L T S I Z E E S T I M A T I O N F O R J O I N S

• General case: Rcols⋂Scols={A} where A is not a

key for either table.– Match each R-tuple with S-tuples:

estSize ≈ NR · NS / V(A,S)

– Symmetrically, for S:

estSize ≈ NR · NS / V(A,R)

• Overall: – estSize ≈ NR · NS / max({V(A,S), V(A,R)})

81

Page 82: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C O S T E S T I M A T I O N S

• Our formulas are nice but we assume that

data values are uniformly distributed.

82

0

2

4

6

8

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Uniform Approximation

Page 83: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C O S T E S T I M A T I O N S

• Our formulas are nice but we assume that

data values are uniformly distributed.

83

0

2

4

6

8

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Uniform Approximation

Distinct values of attribute

# of occurrences

Page 84: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C O S T E S T I M A T I O N S

• Our formulas are nice but we assume that

data values are uniformly distributed.

84

0

2

4

6

8

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Non-Uniform Approximation

Page 85: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C O S T E S T I M A T I O N S

• Our formulas are nice but we assume that

data values are uniformly distributed.

85

0

2

4

6

8

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Non-Uniform Approximation

Bucket #1Count=8

Bucket #2Count=4

Bucket #3Count=15

Bucket #4Count=3

Bucket #5Count=14

Page 86: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C O S T E S T I M A T I O N S

• Our formulas are nice but we assume that

data values are uniformly distributed.

86

Bucket #1Count=8

Bucket #2Count=4

Bucket #3Count=15

Bucket #4Count=3

Bucket #5Count=14

0

5

10

15

1-3 4-6 7-9 10-12 13-15

Non-Uniform Approximation

Bucket Ranges

Page 87: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

H I S T O G R A M S W I T H Q U A N T I L E S

• A histogram type wherein the "spread" of

each bucket is same.

87

0

2

4

6

8

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Equi-width Histogram (Quantiles)

Page 88: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

H I S T O G R A M S W I T H Q U A N T I L E S

• A histogram type wherein the "spread" of

each bucket is same.

88

0

2

4

6

8

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Equi-width Histogram (Quantiles)

Bucket #1Count=12

Bucket #2Count=12

Bucket #3Count=9

Bucket #4Count=12

Page 89: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

H I S T O G R A M S W I T H Q U A N T I L E S

• A histogram type wherein the "spread" of

each bucket is same.

89

0

5

10

15

1-5 6-8 9-13 14-15

Equi-width Histogram (Quantiles)

Page 90: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

H I S T O G R A M S W I T H Q U A N T I L E S

• A histogram type wherein the "spread" of

each bucket is same.

90

0

5

10

15

1-5 6-8 9-13 14-15

Equi-width Histogram (Quantiles)

Page 91: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S A M P L I N G

• Modern DBMSs also collect samples from

tables to estimate selectivities.

• Update samples when the underlying tables

changes significantly.

91

1 billion tuples

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 56 Rested

1002 Kanye 40 Weird

1003 Tupac 25 Dead

1004 Bieber 23 Crunk

1005 Andy 37 Lit

Page 92: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S A M P L I N G

• Modern DBMSs also collect samples from

tables to estimate selectivities.

• Update samples when the underlying tables

changes significantly.

92

1 billion tuples

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 56 Rested

1002 Kanye 40 Weird

1003 Tupac 25 Dead

1004 Bieber 23 Crunk

1005 Andy 37 Lit

Page 93: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S A M P L I N G

• Modern DBMSs also collect samples from

tables to estimate selectivities.

• Update samples when the underlying tables

changes significantly.

93

1 billion tuples

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 56 Rested

1002 Kanye 40 Weird

1003 Tupac 25 Dead

1004 Bieber 23 Crunk

1005 Andy 37 Lit

1001 Obama 56 Rested

1003 Tupac 25 Dead

1005 Andy 37 Lit

Table Sample

Page 94: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S A M P L I N G

• Modern DBMSs also collect samples from

tables to estimate selectivities.

• Update samples when the underlying tables

changes significantly.

94

1 billion tuplessel(age>50) =

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 56 Rested

1002 Kanye 40 Weird

1003 Tupac 25 Dead

1004 Bieber 23 Crunk

1005 Andy 37 Lit

1001 Obama 56 Rested

1003 Tupac 25 Dead

1005 Andy 37 Lit

Table Sample

Page 95: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S A M P L I N G

• Modern DBMSs also collect samples from

tables to estimate selectivities.

• Update samples when the underlying tables

changes significantly.

95

1 billion tuplessel(age>50) =

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 56 Rested

1002 Kanye 40 Weird

1003 Tupac 25 Dead

1004 Bieber 23 Crunk

1005 Andy 37 Lit

1001 Obama 56 Rested

1003 Tupac 25 Dead

1005 Andy 37 Lit

Table Sample

Page 96: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S A M P L I N G

• Modern DBMSs also collect samples from

tables to estimate selectivities.

• Update samples when the underlying tables

changes significantly.

96

1 billion tuples1/3sel(age>50) =

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 56 Rested

1002 Kanye 40 Weird

1003 Tupac 25 Dead

1004 Bieber 23 Crunk

1005 Andy 37 Lit

1001 Obama 56 Rested

1003 Tupac 25 Dead

1005 Andy 37 Lit

Table Sample

Page 97: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

O B S E R V A T I O N

• Now that we can (roughly) estimate the

selectivity of predicates, what can we actually

do with them?

97

Page 98: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

PLANENUMERATION

98

Page 99: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

Q U E R Y O P T I M I Z A T I O N

• After performing rule-based rewriting, the

DBMS will enumerate different plans for the

query and estimate their costs.– Single table.

– Multiple tables.

• It chooses the best plan it has seen for the

query after exhausting all plans or some

timeout.

99

Page 100: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

S I N G L E - T A B L E Q U E R Y P L A N N I N G

• Pick the best access method.– Sequential Scan

– Binary Search (clustered indexes)

– Index Scan

• Simple heuristics are often good enough for

this.

• OLTP queries are especially easy.

100

Page 101: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

O L T P Q U E R Y P L A N N I N G

• Query planning for OLTP queries is easy

because they are sargable.– Search Argument Able

– It is usually just picking the best index.

– Joins are almost always on foreign key relationships

with a small cardinality.

– Can be implemented with simple heuristics.

101

Page 102: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M U L T I - T A B L E Q U E R Y P L A N N I N G

• As number of joins increases, number of

alternative plans grows rapidly– We need to restrict search space.

• Fundamental decision in System R: only left-

deep join trees are considered.– Modern DBMSs do not always make this

assumption anymore.

102

Page 103: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M U L T I - T A B L E Q U E R Y P L A N N I N G

• Fundamental decision in System R: Only

consider left-deep join trees.

103

A B

C

D

A B

C

D

⨝⨝

A BC D

Page 104: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M U L T I - T A B L E Q U E R Y P L A N N I N G

• Fundamental decision in System R: Only

consider left-deep join trees.

104

A B

C

D

A B

C

D

⨝⨝

A BC DX X

Page 105: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M U L T I - T A B L E Q U E R Y P L A N N I N G

• Fundamental decision in System R: Only

consider left-deep join trees.

• Allows for fully pipelined plans where

intermediate results are not written to temp

files.– Not all left-deep trees are fully pipelined.

105

Page 106: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M U L T I - T A B L E Q U E R Y P L A N N I N G

106

Page 107: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

M U L T I - T A B L E Q U E R Y P L A N N I N G

• Enumerate the orderings– Example: Left-deep tree #1, Left-deep tree #2…

• Enumerate the plans for each operator– Example: Hash, Sort-Merge, Nested Loop…

• Enumerate the access paths for each table– Example: Index #1, Index #2, Seq Scan…

• Use dynamic programming to reduce the

number of cost estimations.

107

Page 108: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

D Y N A M I C P R O G R A M M I N G

108

• • •

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

RST

Page 109: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

D Y N A M I C P R O G R A M M I N G

109

SortMerge JoinR.a=S.a

SortMerge JoinT.b=S.b

Hash JoinT.b=S.b

• • •

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a SELECT * FROM R, S, T

WHERE R.a = S.aAND S.b = T.b

RST

Page 110: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

D Y N A M I C P R O G R A M M I N G

110

SortMerge JoinR.a=S.a

SortMerge JoinT.b=S.b

Hash JoinT.b=S.b

• • •

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a SELECT * FROM R, S, T

WHERE R.a = S.aAND S.b = T.b

Cost: 300

Cost: 400

Cost: 280

Cost: 200

RST

Page 111: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

D Y N A M I C P R O G R A M M I N G

111

Hash JoinT.b=S.b

• • •

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a SELECT * FROM R, S, T

WHERE R.a = S.aAND S.b = T.b

Cost: 300

Cost: 200

RST

Page 112: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

D Y N A M I C P R O G R A M M I N G

112

Hash JoinT.b=S.b

• • •

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a

Hash JoinS.b=T.b

SortMerge JoinS.b=T.b

SortMerge JoinS.a=R.a

Hash JoinS.a=R.a

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

Cost: 300

Cost: 200

Cost: 450

Cost: 300

Cost: 400

Cost: 380

RST

Page 113: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

D Y N A M I C P R O G R A M M I N G

113

Hash JoinT.b=S.b

• • •

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a

Hash JoinS.b=T.b

SortMerge JoinS.a=R.a

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

Cost: 300

Cost: 200

Cost: 300

Cost: 380

RST

Page 114: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

D Y N A M I C P R O G R A M M I N G

114

Hash JoinT.b=S.b

• • •

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ TSortMerge JoinS.a=R.a

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

Cost: 200

Cost: 300

RST

Page 115: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N E X A M P L E

• How to generate plans for

search algorithm:– Enumerate relation orderings

– Enumerate join algorithm choices

– Enumerate access method choices

• No real DBMSs does it this way.

It’s actually more messy…

115

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

Page 116: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #1: Enumerate table orderings

116

T R

S ⨝

S T

R ×

R S

T

R S

T ⨝

S R

T ×

S T

R

Page 117: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #1: Enumerate table orderings

117

T R

S ⨝

S T

R ×

R S

T

R S

T ⨝

S R

T ×

S T

R

Prune plans with cross-products immediately!

Page 118: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #1: Enumerate table orderings

118

T R

S ⨝

S T

R ×

R S

T

R S

T ⨝

S R

T ×

S T

R

X

XPrune plans with cross-products immediately!

Page 119: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #1: Enumerate table orderings

119

T R

S ⨝

S T

R ×

R S

T

R S

T ⨝

S R

T ×

S T

R

X

XPrune plans with cross-products immediately!

Page 120: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #2: Enumerate join algorithm choices

120

R S

T

Page 121: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #2: Enumerate join algorithm choices

121

R S

T

R S

TNLJ

NLJ

R S

THJ

NLJ

R S

TNLJ

HJ

R S

T

HJ

HJ

Page 122: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #2: Enumerate join algorithm choices

122

R S

T

Do this for the other plans.

R S

TNLJ

NLJ

R S

THJ

NLJ

R S

TNLJ

HJ

R S

T

HJ

HJ

Page 123: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #2: Enumerate join algorithm choices

123

R S

T

Do this for the other plans.

R S

TNLJ

NLJ

R S

THJ

NLJ

R S

TNLJ

HJ

R S

T

HJ

HJ

Page 124: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #3: Enumerate access method choices

124

R S

T

HJ

HJ

Page 125: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #3: Enumerate access method choices

125

R S

T

HJ

HJ

HJ

HJ

SeqScan SeqScan

SeqScan

HJ

HJ

SeqScan IndexScan(S.b)

SeqScan

Page 126: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

C A N D I D A T E P L A N S

• Step #3: Enumerate access method choices

126

R S

T

HJ

HJ

Do this for the other plans.

HJ

HJ

SeqScan SeqScan

SeqScan

HJ

HJ

SeqScan IndexScan(S.b)

SeqScan

Page 127: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

P O S T G R E S Q U E R Y O P T I M I Z E R

• Examines all types of join trees– Left-deep, Right-deep, bushy

• Two optimizer implementations:– Traditional Dynamic Programming Approach

– Genetic Query Optimizer (GEQO)

• Postgres uses the traditional algorithm when

# of tables in query is less than 12 and

switches to GEQO when there are 12 or more.

127

Page 128: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

128

1st Generation

R S

T

NL

NL

T R

S

NL

HJ

S R

T

HJ

HJ

Page 129: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

129

1st Generation

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

Cost:200

Cost:100

Page 130: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

130

Best:100

1st Generation

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

Cost:200

Cost:100

Page 131: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

131

Best:100

1st Generation

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:2

00

Cost:100

Page 132: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

132

Best:100

1st Generation

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:2

00

Cost:100

Page 133: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

133

Best:100

1st Generation 2nd Generation

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:2

00

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

Page 134: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

134

Best:100

1st Generation 2nd Generation

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:2

00

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

Cost:80

Cost:200

Cost:110

Page 135: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

135

1st Generation 2nd GenerationBest:80

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:2

00

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

Cost:80

Cost:200

Cost:110

Page 136: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

136

1st Generation 2nd GenerationBest:80

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:2

00

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

X

Cost:80

Cost:200

Cost:110

Page 137: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

P O S T G R E S Q U E R Y O P T I M I Z E R

137

1st Generation 2nd Generation 3rd Generation

Best:80

R S

T

NL

NLCost:3

00

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:2

00

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

X

Cost:80

Cost:200

Cost:110

R S

T

HJ

HJ

R S

T

HJ

HJ

R T

S

HJ

HJ

Cost:90

Cost:160

Cost:120

Page 138: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2018

VISUALQUERYOPTIMIZER

138

Page 139: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

V I S U A L Q U E R Y O P T I M I Z A T I O N

• Queries only contain a complex predicate

• Optimization techniques– BlazeIt (Stanford): Rule-based optimization

– PP (Microsoft Research): Cost-based optimization

139

SELECT frameID, vehType, vehColorFROM PROCESS(inputVideo) WHERE vehType=SUV ∧ vehColor=red;

Page 140: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

V I S U A L Q U E R Y O P T I M I Z A T I O N

• Queries only contain a complex predicate

• Optimization techniques– BlazeIt (Stanford): Rule-based optimization

– PP (Microsoft Research): Cost-based optimization

140

SELECT frameID, vehType, vehColorFROM PROCESS(inputVideo) WHERE vehType=SUV ∧ vehColor=red;

Page 141: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

B L A Z E I T : R U L E - B A S E D O P T I M I Z E R

• Example: Content-based selection for red buses.– Train a specialized NN to filter frames with buses

– But the NN may not be accurate on every frame

– Call the object detection model on uncertain frames

– To account for this error rate, it uses held-out set of

frames to estimate the selectivity and error rate.

• Given an error budget, the optimizer selects

between the filters and uses rule-based

optimization to select the fastest query plan

141

Page 142: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

B L A Z E I T : R U L E - B A S E D O P T I M I Z E R

• Example: Choosing a filter

• Consider two possible filters for redness:– F1 : A filter which returns true if the over 80% of the

pixels have a red-channel value of at least 200– F2: A filter that returns the average of the red-

channel values

142

Page 143: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

B L A Z E I T : R U L E - B A S E D O P T I M I Z E R

• In estimating thresholds at the frame-level

based on frames from the held-out set, it

learns that:– sel(F1) = 0.9 and sel(F2) = 0.3

• Which filter should it pick?– More selective filter (F2)

143

Page 144: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

P P : C O S T - B A S E D O P T I M I Z E R

• Decompose a complex predicate to

expressions over simple predicates– Old: <vehType=SUV AND vehColor=red>

– New: <vehType=SUV> ∧ <vehColor=red>

• Rewrite rules (logical equivalences):– p ∧ (Prest) ⇒ Filterp

– Filterp∧q ⇒ Filterp ∧ Filterq

– Filterp∨q ⇒ Filterp ∨ Filterq

144

Page 145: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

P P : C O S T - B A S E D O P T I M I Z E R

• Sort the list of available filters based on:– Filter evaluation cost (C)

– Data reduction ratio (R[Accuracy])

• Efficacy of filter = C / R[1]– A smaller ratio of cost to data reduction indicates

better performance

145

Page 146: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

P P : C O S T - B A S E D O P T I M I Z E R

• Example: (p ∨ q) ∧ ¬r ∧ Prest

• ⇒p ∨ q ⇒ Fp∨q ⇒Fp∨ Fq

• ⇒¬r ⇒ F¬r

• ⇒ F(p∨q)∧¬r ⇒ (Fp∨ Fq) ∧ F¬r

• ⇒ F(p∧¬r)∨(q∧¬r) ⇒ Fp∧¬r ∨ Fq∧¬r

⇒ (Fp ∧ F¬r) ∨ (Fq ∧ F¬r)

146

Page 147: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

P P : C O S T - B A S E D O P T I M I Z E R

• Pruning search space– Limit the number of different filters to be a small

constant (k)

• Example:– Available filters: {Fp∨q, Fp, Fp∧¬r, Fq∧¬r, Fq, F¬r}

– Query requirements: {Fp∨q, Fp∨ Fq, F¬r, (Fp∨ Fq) ∧ F¬r, Fp∧¬r ∨ Fq∧¬r }

– k = 2

– Candidate plans: {Fp∨q, F¬r, Fp∧¬r ∨ Fq∧¬r}

147

Page 148: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

P P : C O S T - B A S E D O P T I M I Z E R

• Plan Enumeration– First, explore different allocations of the query’s

accuracy budget to individual filters.

– Next, explore different orderings of filters within a

conjunction or disjunction.

• Cost Estimation– Finally, after fixing both the accuracy thresholds

and the order of filters, compute the cost and

reduction rate of the resulting plan.

148

Page 149: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

P A R T I N G T H O U G H T S

• Filter as early as possible.

• Filter selectivity estimations– Uniformity, Independence, Histograms

• Dynamic programming for join orderings

• Again, query optimization is super important.

149

Page 150: DATA ANALYTICS USING DEEP LEARNING•First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. •A

GT 8803 // Fall 2019

N E X T L E C T U R E

• Convolutional neural networks– Popular neural network architecture

150