se2017 query-optimizer

Upload: mary-prokhorova

Post on 22-Jan-2018

About Me

• Denis Reznik

• Kyiv, Ukraine

• Data Architect at Intapp, Inc.

• Microsoft Data Platform MVP

• Co-Founder of Ukrainian Data Community Kyiv (PASS Chapter)

• PASS Regional Mentor, Central and Eastern Europe

• Co-author of “SQL Server MVP Deep Dives vol. 2”

Why Do We Need a Query Optimizer?

[Figure: tables T1 and T2, both unsorted]

T1: 23, 4, 9, 6, 112, 8, 1

T2: 1, 9, 4, 4, 112, 112, 112

Complexity?

O(N²)
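The quadratic cost on this slide can be illustrated with a minimal Python sketch that compares every value of T1 with every value of T2. The function name `nested_loop_match` and the comparison counter are illustrative assumptions, not something shown in the talk.

```python
# Values copied from the slide; both lists are unsorted.
t1 = [23, 4, 9, 6, 112, 8, 1]
t2 = [1, 9, 4, 4, 112, 112, 112]


def nested_loop_match(left, right):
    """Compare every left value with every right value.

    Returns the matching pairs and the number of comparisons made,
    which is always len(left) * len(right).
    """
    matches = []
    comparisons = 0
    for a in left:
        for b in right:
            comparisons += 1
            if a == b:
                matches.append((a, b))
    return matches, comparisons


matches, comparisons = nested_loop_match(t1, t2)
print(len(matches), "matching pairs after", comparisons, "comparisons")
```

With seven values on each side the loop always performs 7 × 7 comparisons, regardless of how many pairs actually match.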

Why Do We Need a Query Optimizer?

[Figure: the same unsorted tables T1 and T2]

T1: 23, 4, 9, 6, 112, 8, 1

T2: 1, 9, 4, 4, 112, 112, 112

Complexity?

O(N)

Hashtable

[Figure: a Hash Function maps the keys John Dow, John Smith, and John Snow to buckets 0 .. 4 of the hash table]
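To make the hash table picture concrete, here is a small sketch of a table with five buckets and chaining. The hash function (sum of character codes modulo the bucket count) and the class name `ToyHashTable` are assumptions chosen for illustration; the slide does not say which hash function or collision strategy it depicts.

```python
NUM_BUCKETS = 5  # the slide shows buckets numbered 0 .. 4


def toy_hash(key: str) -> int:
    """Deterministic toy hash: sum of character codes modulo the bucket count."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS


class ToyHashTable:
    """Hash table with chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, value):
        bucket = self.buckets[toy_hash(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def lookup(self, key):
        # Only one bucket is inspected, so the cost does not grow with table size
        # as long as the buckets stay short.
        for existing_key, value in self.buckets[toy_hash(key)]:
            if existing_key == key:
                return value
        return None


table = ToyHashTable()
for row_id, name in enumerate(["John Dow", "John Smith", "John Snow"], start=1):
    table.insert(name, row_id)

print(table.lookup("John Snow"))
```

Building the table is one pass over the input and each lookup touches a single bucket, which is where the O(N) total on the preceding slide comes from.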

Why Do We Need a Query Optimizer?

[Figure: T1 unsorted, T2 sorted]

T1: 1, 4, 3, 7, 5, 6, 2

T2: 1, 1, 2, 2, 2, 2, 4

Complexity?

O(N ⋅ log N)
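The slide does not say how the O(N ⋅ log N) bound is reached. One possibility, assumed here, is a binary search in the sorted T2 for each value of the unsorted T1. The function name `binary_search_match` is hypothetical.

```python
from bisect import bisect_left, bisect_right

# Values copied from the slide: T1 is unsorted, T2 is sorted.
t1 = [1, 4, 3, 7, 5, 6, 2]
t2 = [1, 1, 2, 2, 2, 2, 4]


def binary_search_match(unsorted_side, sorted_side):
    """For each value on the unsorted side, binary-search the sorted side.

    Each search costs O(log N), so N searches cost O(N log N),
    plus the work of emitting the matching pairs.
    """
    matches = []
    for a in unsorted_side:
        first = bisect_left(sorted_side, a)   # first position holding a
        last = bisect_right(sorted_side, a)   # one past the last position holding a
        matches.extend((a, sorted_side[i]) for i in range(first, last))
    return matches


matches = binary_search_match(t1, t2)
print(matches)
```

Sorting T1 first and then merging the two sorted inputs would give the same bound.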

Why Do We Need a Query Optimizer?

[Figure: T1 and T2, both sorted]

T1: 1, 2, 3, 4, 5, 6, 7

T2: 1, 1, 2, 2, 2, 2, 4

Complexity?

O(N)
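When both inputs are already sorted, a single synchronized pass over the two lists is enough. The sketch below is one way to write such a merge; the function name `merge_match` is an assumption, and groups of equal values are handled explicitly so duplicates on either side produce all pairs.

```python
# Values copied from the slide; both lists are sorted.
t1 = [1, 2, 3, 4, 5, 6, 7]
t2 = [1, 1, 2, 2, 2, 2, 4]


def merge_match(left, right):
    """Match two sorted lists by advancing two cursors in step."""
    matches = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            value = left[i]
            # Find the end of the run of equal values on each side.
            i_end = i
            while i_end < len(left) and left[i_end] == value:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end] == value:
                j_end += 1
            for a in left[i:i_end]:
                for b in right[j:j_end]:
                    matches.append((a, b))
            i, j = i_end, j_end
    return matches


matches = merge_match(t1, t2)
print(matches)
```

Each cursor only moves forward, so apart from emitting the matching pairs the pass reads every value once.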

What If?

• We have a non-fixed number of tables

• Data filtering is required

• The data has changed

• We have more complex logic to implement

• The hardware has changed

• Etc…

Query Processing

Parsing → Optimizing → Executing

Query Processing

Parsing → Optimizer → Executor

Simple Select

Id  Name
1   Superman
2   Wonder Woman
3   Deadpool
4   Batman
5   Wolverine
6   Spider-Man
7   Darth Vader

SELECT * FROM Users u

Heap

[Figure: heap pages holding Id ranges in no particular order: 1 .. 100, 100 .. 1K, 5K .. 6K, 1K .. 5K, 6K .. 7K, 15K .. 21K, 12K .. 15K, 10K .. 11K, 21K .. 22K, 22K .. 41K, 9K .. 10K, 41K .. 51K, 7K .. 8K, 8K .. 9K, 71K .. 1M, 51K .. 71K, 1M .. 2M, 2M .. 3M]

Clustered Index

Root: 1 .. 1M

Intermediate level: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M

Leaf level: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K

More Complex Select

Id  Name
1   Superman
2   Wonder Woman
3   Deadpool
4   Batman
5   Wolverine
6   Spider-Man
7   Darth Vader

SELECT * FROM Users u
WHERE Name = 'Batman'

Index Seek

Root: 1 .. 1M

Intermediate level: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M

Leaf level: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K

SELECT * FROM Users
WHERE Id = 523

Index Scan

Root: 1 .. 1M

Intermediate level: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M

Leaf level: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K

SELECT * FROM Users
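The difference between a seek and a scan can be sketched by counting how many keys each one touches. This is only an analogy: a sorted Python list with binary search stands in for the B-tree, and the function names `index_seek` and `index_scan` are assumptions, not SQL Server internals.

```python
def index_scan(sorted_keys, wanted):
    """Read every key, the way a scan reads every row of the index."""
    rows_read = 0
    found = False
    for key in sorted_keys:
        rows_read += 1
        if key == wanted:
            found = True
    return found, rows_read


def index_seek(sorted_keys, wanted):
    """Binary search as a stand-in for walking the B-tree from root to leaf."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < wanted:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == wanted
    return found, probes


ids = list(range(1, 1_000_001))  # Ids 1 .. 1M, as in the slides
seek_found, seek_probes = index_seek(ids, 523)
scan_found, scan_rows = index_scan(ids, 523)
print("seek probes:", seek_probes, "scan rows read:", scan_rows)
```

Both calls find Id = 523, but the seek needs a number of probes that grows with the logarithm of the table size, while the scan reads all one million keys.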

Non-Clustered Index

SELECT * FROM Users
WHERE Name = 'John Dow'

Non-Clustered Index (Name): A .. Z, then A .. C | C .. K | … | X .. Z

Clustered Index (Id): 1 .. 1M, then 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M

Heap: 1 .. 2K | 2K .. 4K | … | 1M-2K .. 1M
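A lookup through a non-clustered index takes two steps: find the key in the index on Name, then fetch the full row through the clustered key. The sketch below models this under simplifying assumptions: a sorted list of (Name, Id) pairs plays the non-clustered index, a dict keyed by Id plays the clustered index, and the rows are the sample Users table from the earlier slides. The function name `find_by_name` is hypothetical.

```python
from bisect import bisect_left

# Stand-in for the clustered index: full rows addressed by Id.
clustered_index = {
    1: "Superman",
    2: "Wonder Woman",
    3: "Deadpool",
    4: "Batman",
    5: "Wolverine",
    6: "Spider-Man",
    7: "Darth Vader",
}

# Stand-in for the non-clustered index: (Name, Id) pairs sorted by Name.
name_index = sorted((name, user_id) for user_id, name in clustered_index.items())


def find_by_name(name):
    """Seek in the name index, then look each hit up by its clustered key."""
    rows = []
    # (name,) sorts before every (name, id) pair, so this lands on the first match.
    pos = bisect_left(name_index, (name,))
    while pos < len(name_index) and name_index[pos][0] == name:
        user_id = name_index[pos][1]
        rows.append((user_id, clustered_index[user_id]))  # the key lookup step
        pos += 1
    return rows


print(find_by_name("Batman"))
```

When the table is a heap instead, the index entry would carry a row locator rather than the clustered key, but the two-step shape of the lookup stays the same.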

DEMO: Indexes

Selecting Several Tables

SELECT * FROM Posts p
INNER JOIN Users u
    ON p.OwnerUserId = u.Id

Query Plan Alternatives

[Figure: two alternative plans for the join: Users joined to Posts, or Posts joined to Users]

Statistics

[Figure: histogram of Users.Id]

Id range        Rows
1 .. 800        500
800 .. 2000     1000
2000 .. 2800    10
2800 .. 4500    1200
4500 .. 5400    800

SELECT * FROM Users
WHERE Id BETWEEN 2100 AND 2500

SELECT * FROM Users
WHERE Id BETWEEN 200 AND 5000
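How a histogram turns into a row estimate can be sketched as follows. The sketch assumes that the five counts on the slide (500, 1000, 10, 1200, 800) belong, in order, to the five ranges between the boundaries 1, 800, 2000, 2800, 4500, and 5400, and that rows are spread evenly inside each range. Real SQL Server histograms carry more detail per step, so this is a simplification, and `estimate_rows` is a hypothetical name.

```python
# (range_low, range_high, rows) per histogram step; the pairing of counts
# with ranges is an assumption based on the order of numbers on the slide.
histogram = [
    (1, 800, 500),
    (800, 2000, 1000),
    (2000, 2800, 10),
    (2800, 4500, 1200),
    (4500, 5400, 800),
]


def estimate_rows(steps, low, high):
    """Estimate rows with low <= Id <= high, assuming uniform spread per step.

    Ranges are treated as continuous, so boundary inclusivity is ignored.
    """
    total = 0.0
    for step_low, step_high, rows in steps:
        overlap = min(high, step_high) - max(low, step_low)
        if overlap > 0:
            total += rows * overlap / (step_high - step_low)
    return total


narrow = estimate_rows(histogram, 2100, 2500)
wide = estimate_rows(histogram, 200, 5000)
print(f"BETWEEN 2100 AND 2500: about {narrow:.0f} rows")
print(f"BETWEEN 200 AND 5000: about {wide:.0f} rows")
```

The first query falls entirely inside the sparse 2000 .. 2800 step, so the estimate is tiny; the second covers most of the table. Estimates this different can justify different access paths for two queries that look almost identical.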

Joins – Nested Loops

Users: Id = 1, Id = 2, Id = 3, Id = 4

Badges: UserId = 1, UserId = 4, UserId = 5, UserId = 1, UserId = 3, UserId = 4

Joins – Hash Join

Clients: Id = 1, Id = 2, Id = 3, Id = 4

Work: UserId = 1, UserId = 4, UserId = 5, UserId = 1, UserId = 3, UserId = 4

[Figure: hash table with buckets 0 .. 3]
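The two phases of a hash join, build and probe, can be sketched with the key values from the slide. The row layout (dicts with `Id` and `UserId` fields) and the function name `hash_join` are assumptions for illustration; Python's built-in dict stands in for the hash table.

```python
from collections import defaultdict

# Key values copied from the slide.
build_rows = [{"Id": 1}, {"Id": 2}, {"Id": 3}, {"Id": 4}]
probe_rows = [{"UserId": u} for u in (1, 4, 5, 1, 3, 4)]


def hash_join(build_side, probe_side, build_key, probe_key):
    """Inner join: hash the build side once, then probe it row by row."""
    # Build phase: one pass over the (ideally smaller) input.
    hash_table = defaultdict(list)
    for row in build_side:
        hash_table[row[build_key]].append(row)

    # Probe phase: one pass over the other input, one lookup per row.
    joined = []
    for row in probe_side:
        for match in hash_table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


result = hash_join(build_rows, probe_rows, "Id", "UserId")
for row in result:
    print(row)
```

The probe row with UserId = 5 finds no bucket entry and is dropped, and Id = 2 never appears in the output because nothing probes for it.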

Joins – Merge Join

Users: Id = 1, Id = 2, Id = 3, Id = 4

Badges: UserId = 1, UserId = 4, UserId = 5, UserId = 1, UserId = 3, UserId = 4

DEMO: Statistics

Cost Model

• CPU

• IO

• Memory

• Operators Overhead

Parallel Query Execution

• Amdahl’s Law

[Figure: Thread 1 .. Thread 4 on a timeline marked 1s and 2s]
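Amdahl's Law bounds the speedup of a parallel plan by the share of the work that must stay serial. A minimal sketch of the formula follows; the 50% parallel fraction used in the example is an arbitrary assumption, not a figure from the talk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup = 1 / ((1 - p) + p / n) for parallel fraction p and n workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed example: half of the query can run in parallel, on 4 threads.
speedup_4_threads = amdahl_speedup(0.5, 4)
print(f"4 threads: {speedup_4_threads:.2f}x")
print(f"1000 threads: {amdahl_speedup(0.5, 1000):.2f}x")
```

With half of the work serial, the speedup can never reach 2x no matter how many threads are added, which is one reason a parallel plan is not automatically the cheaper one.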

Query Plan Alternatives

• 1 Table – 1! = 1 option

• 2 Tables – 2! = 2 options

• 3 Tables – 3! = 6 options

• 4 Tables – 4! = 24 options

• …

• 10 Tables – 10! = 3,628,800 options
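The counts on this slide are the number of orderings of N tables, which is N factorial. A short check, with `join_order_count` as a hypothetical helper name:

```python
from math import factorial


def join_order_count(table_count: int) -> int:
    """Number of ways to order table_count tables in a join sequence."""
    return factorial(table_count)


counts = {n: join_order_count(n) for n in (1, 2, 3, 4, 10)}
for n, options in counts.items():
    print(f"{n} tables: {options:,} options")
```

This counts join orders only; choosing a join algorithm and an access path for each table multiplies the number of candidate plans further.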

Exploring Search Space

• JOIN(A,B,C,D)

• JOIN(A,B,D,C)

• JOIN(A,C,D,B)

• JOIN(A,D,B,C)

• JOIN(A,D,C,B)

• JOIN(B,A,C,D)

• JOIN(B,A,D,C)

• …

• O(N!)

• JOIN(A,B,C,D)

• A – Optimal Access Path

• B – Optimal Access Path

• … pruning

• (A,B) – Optimal Access Path

• … pruning

• (A,B),(C)

• … pruning

• O(N ⋅ 2^(N−1))
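The pruning idea can be sketched as dynamic programming over subsets of tables: for every subset, keep only the cheapest way found to join it, and extend it by one table at a time. The cost model below (sum of intermediate result sizes), the cardinalities, and the selectivities are all invented for illustration, and the sketch only considers left-deep orders. It is not how SQL Server's optimizer is implemented.

```python
from itertools import combinations

# Hypothetical statistics: rows per table and selectivity per joined pair.
card = {"A": 1000, "B": 10, "C": 500, "D": 50}
sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.02}


def best_left_deep_order(cardinality, selectivity):
    """Return (cost, rows, order) of the cheapest left-deep join order.

    best[subset] keeps only the cheapest plan for that subset (the pruning
    step). Each subset is extended by each possible last table, which gives
    about N * 2^(N-1) combinations instead of N! complete orders.
    """
    tables = sorted(cardinality)
    best = {frozenset([t]): (0.0, float(cardinality[t]), (t,)) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            key = frozenset(subset)
            for last in subset:
                rest = key - {last}
                cost_rest, rows_rest, order_rest = best[rest]
                # Pairs without a listed selectivity behave like a cross join.
                factor = 1.0
                for t in rest:
                    factor *= selectivity.get(frozenset((t, last)), 1.0)
                rows = rows_rest * cardinality[last] * factor
                cost = cost_rest + rows
                if key not in best or cost < best[key][0]:
                    best[key] = (cost, rows, order_rest + (last,))
    return best[frozenset(tables)]


best_cost, best_rows, best_order = best_left_deep_order(card, sel)
print("cheapest order:", " -> ".join(best_order), "cost:", best_cost)
```

The result size of a subset does not depend on the order in which its tables were joined, which is what makes it safe to discard every plan for a subset except the cheapest one.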

Quite Complex Query

SELECT u.Id, u.Name, COUNT(*)
FROM Posts p
INNER JOIN Users u
    ON p.OwnerUserId = u.Id
WHERE PostedUserName LIKE 'B%'
GROUP BY u.Id, u.Name
HAVING COUNT(*) > 1
ORDER BY u.Name

DEMO: How We Can Help the Query Optimizer

Summary

• Query Optimization

• Cost Model

• Search Space

• Knowledge of Internals Is Important

Thank You!

Denis Reznik

Twitter: @denisreznik

Email: [email protected]

Blog: http://reznik.uneta.com.ua

Facebook: https://www.facebook.com/denis.reznik.5

LinkedIn: http://ua.linkedin.com/pub/denis-reznik/3/502/234