benchmarking (devnexus 2015)

Benchmarking: You’re Doing It Wrong

Aysylu Greenberg

@aysylu22

To Write Good Benchmarks…

Need to be Full Stack

your process vs goal
your process vs best practices

Benchmark = How Fast?

Today

• How Not to Write Benchmarks
• Benchmark Setup & Results:
  – You’re wrong about machines
  – You’re wrong about stats
  – You’re wrong about what matters
• Becoming Less Wrong

HOW NOT TO WRITE BENCHMARKS

Website Serving Images

• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment

[Diagram: Web Request, Server, S3 Cache]

WHAT’S WRONG WITH THIS BENCHMARK?

YOU’RE WRONG ABOUT THE MACHINE

Wrong About the Machine

• Cache, cache, cache, cache!

It’s Caches All The Way Down


Prefetching: Program

Prefetching: Disabled

Prefetching: Enabled

Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)



• Cache, cache, cache, cache!
• Warmup & timing
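One common way to handle warmup is to run the workload for a while before recording any samples. The harness below is a minimal sketch of that idea; the `warmup` and `iterations` defaults and the `work` lambda are arbitrary assumptions chosen for illustration, and a real benchmark would pick them based on how long the system takes to reach a steady state.

```python
import statistics
import time

def benchmark(fn, warmup=100, iterations=1000):
    """Time fn() per call, discarding the first `warmup` calls."""
    for _ in range(warmup):
        fn()  # results ignored: gives caches and connections time to settle
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """Median, approximate 99th percentile, and maximum of the samples."""
    ordered = sorted(samples)
    return {
        "median": statistics.median(ordered),
        "p99": ordered[int(0.99 * (len(ordered) - 1))],
        "max": ordered[-1],
    }

if __name__ == "__main__":
    work = lambda: sum(range(1000))  # hypothetical workload
    print(summarize(benchmark(work)))
```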



• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference

Periodic Interference



• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod



• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod
• Power mode changes

Power Modes

$ cat /sys/devices/system/cpu/*/cpufreq/scaling_governor

“ondemand” OR “performance”

Current CPU frequencies:
$ grep "MHz" /proc/cpuinfo
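The same check can be scripted so that a benchmark run records the governor setting it ran under. The sketch below reads the sysfs files named above; the function names are hypothetical, and it assumes a Linux machine that exposes cpufreq (on other systems it simply returns an empty result).

```python
import glob

GOVERNOR_GLOB = "/sys/devices/system/cpu/*/cpufreq/scaling_governor"

def read_governors(pattern=GOVERNOR_GLOB):
    """Map each governor file to its contents; empty if none are exposed."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                governors[path] = f.read().strip()
        except OSError:
            continue  # file vanished or is unreadable
    return governors

def all_performance(governors):
    """True only if at least one CPU was found and every one is set to 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())

if __name__ == "__main__":
    found = read_governors()
    print(found or "no cpufreq governors exposed on this machine")
    print("all CPUs in performance mode:", all_performance(found))
```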

YOU’RE WRONG ABOUT THE STATS

Wrong About Stats

• Too few samples

Wrong About Stats

[Figure: Convergence of Median on Samples. Axes: Latency (0 to 120) and Time (0 to 60). Series: Stable Samples, Stable Median, Decaying Samples, Decaying Median]
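One way to judge whether enough samples have been collected is to watch the running median and see whether it settles. The sketch below does this for two made-up series, one stable and one still decaying; the data, the `window` of 10, and the 1% `tolerance` are assumptions for illustration, not values from the talk.

```python
import statistics

def running_median(samples):
    """Median of samples[:k] for k = 1..len(samples)."""
    return [statistics.median(samples[:k]) for k in range(1, len(samples) + 1)]

def has_converged(medians, window=10, tolerance=0.01):
    """True if the last `window` running medians differ by at most `tolerance` (relative)."""
    if len(medians) < window:
        return False
    tail = medians[-window:]
    lo, hi = min(tail), max(tail)
    return hi - lo <= tolerance * hi

# Made-up latency series: one oscillating around 51, one still decaying.
stable = [50.0 + (i % 3) for i in range(60)]
decaying = [100.0 * 0.95 ** i for i in range(60)]

if __name__ == "__main__":
    print("stable converged:", has_converged(running_median(stable)))
    print("decaying converged:", has_converged(running_median(decaying)))
```

If the running median is still drifting, as in the decaying series, more samples (or a longer warmup) are needed before the number means much.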


• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine

Wrong About Stats

• Too few samples
• Gaussian (not)



Wrong About Stats

• Too few samples
• Gaussian (not)
• Multimodal distribution

Multimodal Distribution

[Figure: # occurrences plotted against Latency; labels: 50%, 99%, 5 ms, 10 ms]
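A small worked example shows why the mean is a poor summary of a multimodal distribution. The sample below is made up (90 responses at 5 ms, 10 at 10 ms), and `percentile` is a hypothetical helper that uses the nearest-rank definition; other percentile definitions interpolate and can give slightly different values.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * p / 100))
    return ordered[rank - 1]

# Made-up sample: 90 fast responses at 5 ms and 10 slow ones at 10 ms.
latencies_ms = [5.0] * 90 + [10.0] * 10

mean = statistics.mean(latencies_ms)  # 5.5 ms: a latency no request actually had
p50 = percentile(latencies_ms, 50)    # 5.0 ms
p99 = percentile(latencies_ms, 99)    # 10.0 ms

if __name__ == "__main__":
    print(f"mean={mean} ms, p50={p50} ms, p99={p99} ms")
```

Reporting the median together with a high percentile describes both modes; the mean describes neither.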

Wrong About Stats

• Too few samples
• Gaussian (not)
• Multimodal distribution
• Outliers

Coordinated Omission

[Figure: timeline of alternating request and response events; time axis from 0 to 80]
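Coordinated omission can be reproduced with a small simulation. The sketch below assumes a load generator that plans one request every 10 ms but waits for each response before sending the next; the service times, including the single 45 ms stall, are invented for illustration. It compares latency measured from the actual send time with latency measured from the planned send time.

```python
def simulate(service_times_ms, interval_ms=10):
    """Closed-loop load generator: one request in flight, one planned every interval_ms.

    Returns (naive, corrected) latency lists in ms.
    naive:     response time minus the moment the request was actually sent
    corrected: response time minus the moment it was scheduled to be sent
    """
    naive, corrected = [], []
    clock = 0
    for i, service in enumerate(service_times_ms):
        intended = i * interval_ms
        sent = max(clock, intended)  # cannot send before the previous response arrived
        done = sent + service
        naive.append(done - sent)
        corrected.append(done - intended)
        clock = done
    return naive, corrected

# Hypothetical server: 1 ms per request, except one 45 ms stall.
service = [1, 1, 45, 1, 1, 1, 1, 1]
naive, corrected = simulate(service)

if __name__ == "__main__":
    print("naive:    ", naive)
    print("corrected:", corrected)
```

The naive measurement records a single slow sample, while the corrected one shows that the requests queued behind the stall were delayed as well.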


YOU’RE WRONG ABOUT WHAT MATTERS

Wrong About What Matters

• Premature optimization

“Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

-- Donald Knuth


• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Hidden components
• Reproducibility of measurements

BECOMING LESS WRONG

User Actions Matter

X > Y for workload Z with trade-offs A, B, and C

- http://www.toomuchcode.org/

Profiling

perf

gprof & OProfile

YourKit & JProfiler, jVisualVM

cProfile
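Of the tools listed, cProfile ships with Python's standard library, so it is easy to try. The sketch below profiles a deliberately wasteful function; `slow_sum` and `profile_call` are hypothetical names used only for this demonstration.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately wasteful workload used only for this demonstration."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total

def profile_call(fn, *args):
    """Run fn(*args) under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()

if __name__ == "__main__":
    result, report = profile_call(slow_sum, 20000)
    print(report)
```

The report lists call counts and cumulative time per function, which is usually enough to see where a program spends its time before deciding what to benchmark.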

perf

# Various basic CPU statistics, system wide, for 10 seconds
perf stat -e cycles,instructions,cache-misses -a sleep 10

# Count system calls for the entire system, for 5 seconds
perf stat -e 'syscalls:sys_enter_*' -a sleep 5

# Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 seconds
perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5

http://www.brendangregg.com/perf.html


gprof: Where Does It Spend Its Time?

• Compile with profiling
• Execute the code
• Run gprof

http://www.thegeekstuff.com/2012/08/gprof-tutorial/


http://www.brendangregg.com/linuxperf.html


Profiling

Code instrumentation
Aggregate over logs
Traces
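Code instrumentation can be as simple as wrapping functions so that each call records its duration, then aggregating the records afterwards. The decorator below is a minimal sketch of that approach; `instrumented`, `timings`, and `resize_image` are hypothetical names, and a production system would more likely emit these measurements to logs or a tracing backend than keep them in memory.

```python
import functools
import time
from collections import defaultdict

timings = defaultdict(list)  # function name -> list of elapsed seconds

def instrumented(fn):
    """Record the wall-clock duration of every call to fn in `timings`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            timings[fn.__name__].append(time.perf_counter() - start)
    return wrapper

@instrumented
def resize_image(width, height):
    """Hypothetical unit of work standing in for real application code."""
    return [[0] * width for _ in range(height)]

def aggregate():
    """Per function: (number of calls, total seconds)."""
    return {name: (len(values), sum(values)) for name, values in timings.items()}

if __name__ == "__main__":
    for _ in range(3):
        resize_image(640, 480)
    print(aggregate())
```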

Microbenchmarking: Blessing & Curse

+ Quick & cheap
+ Answers narrow questions well
- Often misleading results
- Not representative of the program


• Choose your N wisely

Choose Your N Wisely (Prof. Saman Amarasinghe, MIT 2009)

• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
• Constant work per iteration

Non-Constant Work Per Iteration
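Two of these points, choosing N and clock resolution, can be combined: pick N large enough that the total measured time is far above what the clock can resolve. The sketch below is one possible approach under stated assumptions; the 50 ms threshold and the tenfold growth are arbitrary choices, and `choose_n` is a hypothetical helper (Python's `timeit.Timer.autorange` offers a similar built-in heuristic).

```python
import time
import timeit

def clock_resolution():
    """Resolution, in seconds, that the platform reports for time.perf_counter."""
    return time.get_clock_info("perf_counter").resolution

def choose_n(fn, min_total_s=0.05):
    """Grow N tenfold until N calls of fn take at least min_total_s in total."""
    n = 1
    while True:
        total = timeit.timeit(fn, number=n)
        if total >= min_total_s:
            return n, total
        n *= 10

if __name__ == "__main__":
    work = lambda: sum(range(100))  # same work on every iteration
    n, total = choose_n(work)
    print(f"clock resolution: {clock_resolution():.2e} s")
    print(f"N={n}, total={total:.4f} s, per call={total / n:.2e} s")
```

Keeping the work identical on every iteration matters here: if the cost per call grows with N, the per-call figure depends on the N that happened to be chosen.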

What Should a Benchmark Do?

Measures behavior of system

Represents realistic workload

Runs for sufficiently long time

Compares in the same context

Predictable and reproducible results

Follow-up Material

• How NOT to Measure Latency by Gil Tene
  – http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  – http://www.brendangregg.com/methodology.html
• Silverman’s Mode Detection Method by Matt Adereth
  – http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/
• How Not To Measure System Performance by James Bornholt
  – https://homes.cs.washington.edu/~bornholt/post/performance-evaluation.html
• Trust No One, Not Even Performance Counters by Paul Khuong
  – http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/#trust-no-one

Takeaway #1: Cache

Takeaway #2: Outliers

Takeaway #3: Workload
