A Top-Down Approach to Achieving Performance Predictability in Database Systems

Jiamin Huang, Barzan Mozafari, Grant Schoenebeck and Thomas F. Wenisch

University of Michigan

Page 1

A Top-Down Approach to Achieving Performance Predictability in Database Systems

Jiamin Huang, Barzan Mozafari, Grant Schoenebeck and Thomas F. Wenisch

University of Michigan

Page 2

Performance Predictability in Today’s DBMS

[Figure: MySQL running the TPC-C benchmark at a fixed rate. Bars show mean latency, standard deviation, and 99th percentile; the y-axis is time in nanoseconds (0E+00 to 1E+09); the bars are annotated with 10x and 2x.]

By focusing too much on raw performance, we have neglected predictability

Page 3

Why Does Predictability Matter?

•  Latency-sensitive applications

•  Provisioning

•  SLA guarantees

•  Tuning

•  Interactive applications

•  User-experience

Page 4

Example: Provisioning & SLAs

[Figure: mean latency and 99th percentile latency (ms, 0 to 0.16) versus number of cores (10 to 210). Callout: latency 0.016 ms, TPS 62.5k.]

Page 5

Example: Provisioning & SLAs

[Figure: mean latency and 99th percentile latency (ms, 0 to 0.16) versus number of cores (10 to 210). Callouts: latency 0.016 ms, TPS 62.5k; latency 0.075 ms, TPS 13k.]
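
The two callouts appear consistent with a simple reciprocal relation between latency and throughput (TPS = 1 / latency). That relation is an inference from the numbers on the slide, not something the slide states, and the helper name below is hypothetical. A minimal sketch of the arithmetic:

```python
def tps_from_latency_ms(latency_ms):
    """Transactions per second if each transaction takes latency_ms.

    Assumption: transactions are served one after another, so
    throughput is simply the reciprocal of the latency.
    """
    return 1000.0 / latency_ms


# The two latency values shown on the slide.
for latency_ms in (0.016, 0.075):
    print(f"{latency_ms} ms -> {tps_from_latency_ms(latency_ms):,.0f} TPS")
```

Under this assumption, planning capacity from the lower latency figure would overestimate the throughput that can be sustained at the higher one by roughly a factor of five.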

Page 6

What is Performance Predictability?

•  Performance Variance:

1.  Inherent (External): varying amounts of work, network problems, …

2.  Avoidable (Internal): due to internal artifacts of the DBMS (algorithms, data structures, …)

Page 7

Two Approaches to Achieve Predictability

•  Bottom-up: build a new DBMS from scratch

•  Once an academic prototype, always an academic prototype

•  Sacrifice performance for predictability

•  Top-down: identify root causes of unpredictability and mitigate them

•  Goal: do not compromise performance

•  Benefit: adoption is a “no-brainer”

•  Challenge: today’s DBMSs are extremely complex

Page 8

Key Questions

1.  How to identify sources of variance?

2.  What makes today’s DBMSs unpredictable?

3.  How to achieve perf. predictability?

4.  How effective are our techniques?

Page 10

Identifying Root Causes of Performance Variance

•  Profiling tools: critical for diagnosing perf. problems in modern software

•  Existing profilers focus on average performance

•  DTrace, gprof, perf, etc.

•  Breakdown of avg. performance of DBs done before

•  “OLTP through the looking glass, and what we found there” [SIGMOD’08]

•  Need a new profiler capable of breaking down perf. variance → TProfiler

Page 11

TProfiler

•  Goal: Pinpoint root causes of performance variance in the large and complex codebases of today’s DBMSs

[Figure: three codebases, labeled 770K, 1.5M, and 1.9M lines of code]

Q: How to find the root causes of performance variance efficiently and accurately?

Page 12

Our Solution: Variance Trees

Call Graph: process_query calls parse_query, execute_query, and send_result

Latency Break Down: T(process_query) = T(parse_query) + T(execute_query) + T(send_result) ≡ Overall Latency

T(f): Execution time of function f

Page 13

Our Solution: Variance Trees

•  If $T(f) = \sum_{i=1}^{n} T(f_i)$, then:

$$\mathrm{Var}(T(f)) = \sum_{i=1}^{n} \mathrm{Var}(T(f_i)) + 2 \sum_{i<j} \mathrm{Cov}(T(f_i), T(f_j))$$

Page 14

Our Solution: Variance Trees

Latency Break Down: T(process_query) is broken down into T(parse_query), T(execute_query), and T(send_result)

Variance Break Down (Variance Tree): the root, Var(process_query) ≡ Overall Latency Variance, has these children:

Var(parse_query)

Var(execute_query)

Var(send_result)

Cov(parse_query, execute_query)

Cov(parse_query, send_result)

Cov(execute_query, send_result)
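
One level of such a breakdown can be sketched in a few lines of Python. This is only an illustration of the variance identity, not TProfiler's implementation: the timings are synthetic, the helper names are hypothetical, and each covariance term is reported with its factor of 2 so that the terms add up to the parent's variance.

```python
import random
from itertools import combinations


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance of two equally long sample lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def variance_breakdown(child_times):
    """Return every Var and 2*Cov term for the children of one function.

    child_times maps a function name to its per-call execution times.
    """
    terms = {f"Var({name})": cov(t, t) for name, t in child_times.items()}
    for (a, ta), (b, tb) in combinations(child_times.items(), 2):
        terms[f"2*Cov({a},{b})"] = 2 * cov(ta, tb)
    return terms


# Synthetic per-query timings (hypothetical, not measurements from the talk).
rng = random.Random(0)
n = 1000
parse = [rng.uniform(1.0, 2.0) for _ in range(n)]
execute = [rng.expovariate(0.2) for _ in range(n)]
send = [0.1 * e + rng.uniform(0.0, 1.0) for e in execute]  # correlated with execute

children = {"parse_query": parse, "execute_query": execute, "send_result": send}
total = [p + e + s for p, e, s in zip(parse, execute, send)]  # T(process_query)

terms = variance_breakdown(children)
overall = cov(total, total)  # Var(process_query)
for name, value in sorted(terms.items(), key=lambda kv: -kv[1]):
    print(f"{name:36s} {value / overall:6.1%}")
```

The printed shares sum to 100% of the parent's variance, which is what lets a single dominant child or covariance term be singled out for further expansion.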

Page 15

Efficiency

•  Observation: most nodes are actually insignificant

•  Do not build a complete variance tree!

•  Build variance tree iteratively and selectively

1.  Tree expansion: break down variance of selected functions (process_query at the beginning)

2.  Node selection: select significant* nodes from the tree

3.  User inspection: users inspect selected functions, and decide whether to further investigate

* See paper for details

Page 16

Key Questions

1.  How to identify sources of variance?

2.  What makes today’s DBMSs unpredictable?

3.  How to achieve perf. predictability?

4.  How effective are our techniques?

Page 17

Case Studies

•  Used TProfiler to analyze 3 popular (both traditional and modern) DBMSs

Page 18

Setup

•  Application: MySQL 5.6.23

•  Hardware: Intel Xeon E5 2.1GHz

•  Workload: TPC-C

•  128 Warehouses, 30GB Buffer Pool

Page 19

Root Causes of Performance Unpredictability in MySQL

•  Found with 37 iterations (6 mins of manual inspection time each), out of 30K functions

Function Name           Contribution to Overall Latency Variance
os_event_wait[A]        37.5%
os_event_wait[B]        21.7%
buf_pool_mutex_enter    32.92%

os_event_wait[A] and os_event_wait[B]: transactions waiting for locks on data objects (same function, different call sites)

buf_pool_mutex_enter: waiting for the lock on the buffer pool before updating the list of buffer pages

Page 20

Key Questions

1.  How to identify sources of variance?

2.  What makes today’s DBMSs unpredictable?

3.  How to achieve perf. predictability?

4.  How effective are our techniques?

Page 21

Mitigating Performance Variance

1.  Changing the implementation

•  Parallel Logging

2.  Changing the algorithm

•  VATS, LLU

3.  Changing the tuning parameters

•  Buffer pool size, redo log flush policy, etc.

Page 23

Latency Variance Caused by Queuing

[Figure: transactions T1 to T5 queued on lock L, with markers for the min, average, and max queueing time]

Page 24

Our Insight: Look at the Big Picture

[Figure: wait queues for locks L1, L2, and L3, holding transactions T1, T2, and T4]

Page 25

VATS: Variance Aware Transaction Scheduling Algorithm

[Figure: wait queues for locks L1, L2, and L3, holding transactions T1, T2, and T4]

VATS grants locks according to transactions’ arrival time in the system, not in the queue (earliest first)
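
The difference between the two ordering rules can be sketched with a small Python example. This is a simplified illustration rather than the MySQL implementation: it models a single exclusive lock, ignores lock modes and deadlock handling, and the class and field names are hypothetical.

```python
import heapq
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    system_arrival: float  # time the transaction entered the system


class FifoLockQueue:
    """Baseline: grant the lock in order of arrival at this lock's queue."""

    def __init__(self):
        self._waiters = deque()

    def enqueue(self, txn):
        self._waiters.append(txn)

    def grant_next(self):
        return self._waiters.popleft()


class VatsLockQueue:
    """Grant the lock to the waiter that has been in the system the longest."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so Txn objects are never compared

    def enqueue(self, txn):
        heapq.heappush(self._heap, (txn.system_arrival, self._seq, txn))
        self._seq += 1

    def grant_next(self):
        return heapq.heappop(self._heap)[2]


# T2 entered the system later than T1 but reached this lock's queue first.
t1 = Txn("T1", system_arrival=1.0)
t2 = Txn("T2", system_arrival=5.0)

fifo, vats = FifoLockQueue(), VatsLockQueue()
for queue in (fifo, vats):
    queue.enqueue(t2)
    queue.enqueue(t1)

print("FIFO grants:", fifo.grant_next().name)
print("VATS grants:", vats.grant_next().name)
```

Ordering by system arrival time favors the transaction that has already accumulated the most waiting elsewhere, which is the "big picture" view the preceding slide refers to.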

Page 26

LRU Ordering of Buffer Pages

List of buffer pages: P1 P2 P3 P4 P5

Page 27

LRU Ordering of Buffer Pages

P1 P2 P3 P4 P5

P4 is accessed

Page 28

LRU Ordering of Buffer Pages

P1 P2 P3 P4 P5

The whole list is locked

Place where variance occurs

Page 29

LRU Ordering of Buffer Pages

P4 P1 P2 P3 P5

P4 is moved to the head

Page 30

LRU Ordering of Buffer Pages

P4 P1 P2 P3 P5

Lock is released

Solution: Use a lazy page update algorithm (LLU)
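
The slides name the lazy update algorithm without spelling out its mechanism. One plausible reading, sketched below as an assumption, is that a thread which cannot take the list lock immediately records the access in a backlog instead of waiting, and the next thread that does get the lock applies the backlog first. All class and method names here are hypothetical.

```python
import threading
from collections import OrderedDict, deque


class LazyLruList:
    """LRU list whose reordering is deferred when the list lock is busy."""

    def __init__(self, pages):
        # The first key is the head of the list (most recently used).
        self._order = OrderedDict((page, None) for page in pages)
        self.lock = threading.Lock()
        self._pending = deque()  # accesses recorded while the lock was busy

    def touch(self, page):
        """Record an access; return True if the list was updated right away."""
        if self.lock.acquire(blocking=False):  # never wait for the lock
            try:
                while self._pending:  # catch up on deferred accesses first
                    self._order.move_to_end(self._pending.popleft(), last=False)
                self._order.move_to_end(page, last=False)
            finally:
                self.lock.release()
            return True
        self._pending.append(page)  # defer instead of blocking
        return False

    def order(self):
        return list(self._order)


lru = LazyLruList(["P1", "P2", "P3", "P4", "P5"])
lru.touch("P4")
print(lru.order())  # P4 moves to the head, as on the slides

lru.lock.acquire()  # simulate another thread holding the list lock
deferred = not lru.touch("P5")  # the access is queued, not applied
lru.lock.release()
lru.touch("P2")  # applies the deferred P5 access, then P2
print(deferred, lru.order())
```

The trade-off in this sketch is that the list order is briefly stale, in exchange for page accesses never waiting on the list lock.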

Page 31

Variance-aware Tuning

•  buf_pool_mutex_enter – buffer pool size

•  33%

•  66%

•  100%

Page 32

Key Questions

1.  How to identify sources of variance?

2.  What makes today’s DBMSs unpredictable?

3.  How to achieve perf. predictability?

4.  How effective are our techniques?

Page 33

VATS Improvement

[Figure: improvement (x), on a scale of 0 to 7, in variance and mean latency]

•  189 lines of code changed in MySQL

Page 34

LLU Improvement

[Figure: improvement (x), on a scale of 0 to 1.8, in variance and mean latency]

•  46 lines of code changed in MySQL

Page 35

Buffer Pool Size Tuning

[Figure: improvement (33% / Buffer Pool Size), on a scale of 0 to 10, in variance and mean latency, for buffer pool sizes of 66% and 100%]

Page 36

Real-world Adoption

•  TProfiler open-sourced

•  VATS has been merged into MySQL distributions (default in MariaDB and staged in Oracle MySQL)

•  2M+ installations in the world

•  Our buffer pool problem independently discovered and fixed in MySQL 5.8.0

Page 37

Conclusion

•  Predictability is an increasingly critical dimension of modern software, yet it is overlooked in today’s DBMSs

•  TProfiler identifies root causes of perf. variance in a principled fashion

•  Enables local and surgical changes to complex DBMS codebases

•  Lock waiting is a major source of perf. variance in today’s DBMSs

•  Variance-aware scheduling, lazy optimizations, and tuning strategies dramatically improve predictability w/o sacrificing raw performance

Page 38

Backup Slides

Page 39

Definition of Predictability

•  Many ways to capture perf. predictability

•  Minimize latency variance or tail latencies

•  Bound latency variance or tail latencies

•  Minimize the (stdev / mean) ratio

•  Our focus: identifying sources of latency variance

•  Reducing variance without sacrificing mean latency
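
The measures listed on this slide can be computed from a set of latency samples as in the following sketch. The sample values are made up for illustration, the helper names are hypothetical, and the percentile uses the nearest-rank definition, which is one of several common conventions.

```python
from statistics import fmean, pstdev


def percentile(samples, percent):
    """Nearest-rank percentile for an integer percent between 1 and 100."""
    ordered = sorted(samples)
    rank = (percent * len(ordered) + 99) // 100  # ceil(percent * n / 100)
    return ordered[rank - 1]


def predictability_summary(latencies):
    mean = fmean(latencies)
    stdev = pstdev(latencies)
    return {
        "mean": mean,
        "variance": stdev ** 2,
        "p99": percentile(latencies, 99),
        "stdev_over_mean": stdev / mean,
    }


# Hypothetical latencies (ms): mostly fast, with two slow outliers.
latencies = [1.0] * 98 + [10.0, 100.0]
for name, value in predictability_summary(latencies).items():
    print(f"{name:16s} {value:.3f}")
```

In this made-up sample the two outliers barely move the mean but dominate the variance, which is why the mean alone says little about predictability.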

Page 40

Node Selection Example

[Figure: variance tree fragment with nodes Var(write_logs), Var(write_to_buffer), Var(lock_buffer), and Var(unlock_buffer), carrying contribution labels of 80%, 10%, 2%, and 60% of Var(Latency). Annotation: larger contribution.]

Page 41

Node Selection Example

[Figure: variance tree fragment with nodes Var(write_logs), Var(write_to_buffer), Var(lock_buffer), and Var(unlock_buffer), carrying contribution labels of 80%, 10%, 2%, and 60% of Var(Latency). Annotation: more specific.]

The lower in the variance tree, the more specific

Page 42

Manual Efforts

Application   Semantic Interval Annotation   # of TProfiler Runs   Avg. Manual Inspection Time per Run   Modified Lines of Code
MySQL         9 lines of code                37                    6 minutes                             235
Postgres      7 lines of code                16                    10 minutes                            355
Httpd         4 lines of code                17                    12 minutes                            45

Page 43

Related Work: DARC

•  Uses multiple runs to produce latency histograms

•  Can find main contributors of latency in each execution time range

≠ Main contributors of latency variance in a semantic interval

[Figure: call trees of write_logs (children: write, lock, unlock), repeated under the execution time ranges 0-1sec, 1-10sec, and 10-20sec]

Page 44

1. Tree Expansion

Root Creation: Var(process_query)

Set the root to the variance of the top-level function for query processing

Page 45

1. Tree Expansion

Var(process_query)

Var(parse_query), Var(execute_query), Cov(parse_query, execute_query)

Break down the root and expand the variance tree

Page 46

2. Node Selection

Var(process_query)

Var(parse_query), Var(execute_query), Cov(parse_query, execute_query)

Select the most “informative” nodes from the tree

informative = large-enough value (variance contribution) + deep-enough in the tree (specificity)
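
The selection rule can be illustrated with a small Python sketch. The slides defer the exact criterion to the paper, so the two thresholds below, the tree contents, and all names are hypothetical stand-ins for "large enough" and "deep enough", not the authors' scoring function.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    contribution: float  # share of the overall latency variance, 0 to 1
    children: list = field(default_factory=list)


def select_informative(root, min_contribution=0.3, min_depth=1):
    """Return (label, depth, contribution) for nodes passing both thresholds.

    Results are ordered deepest first, then by contribution, so the most
    specific large contributors are inspected first.
    """
    selected = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.contribution >= min_contribution and depth >= min_depth:
            selected.append((node.label, depth, node.contribution))
        stack.extend((child, depth + 1) for child in node.children)
    selected.sort(key=lambda item: (-item[1], -item[2]))
    return selected


# Hypothetical variance tree; the numbers are made up for illustration.
tree = Node("Var(process_query)", 1.0, [
    Node("Var(parse_query)", 0.1),
    Node("Var(execute_query)", 0.7, [
        Node("Var(lock_wait)", 0.6),
        Node("Var(row_read)", 0.05),
    ]),
    Node("Cov(parse_query,execute_query)", 0.2),
])

for label, depth, share in select_informative(tree):
    print(f"depth {depth}: {label} ({share:.0%})")
```

With these made-up numbers the root is skipped for being too general, the small nodes are skipped for contributing too little, and the deeper of the two remaining nodes is listed first.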

Page 47

3. User Inspection

Var(process_query)

Var(parse_query), Var(execute_query), Cov(parse_query, execute_query)

•  Ask for user inspection when:

1.  Cov terms are large

•  Study how to de-correlate the two functions

2.  Var terms are both large and deep

•  If cause is still unclear, repeat the expand-select-inspect process