benchmarking (devnexus 2015)
TRANSCRIPT
Today
• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - You’re wrong about machines
  - You’re wrong about stats
  - You’re wrong about what matters
• Becoming Less Wrong
Website Serving Images
• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev environment
[Diagram labels: Web Request, Server, S3 Cache]
Wrong About the Machine
• Cache, cache, cache, cache!
• Warmup & timing
• Periodic interference
• Test != Prod
• Power mode changes
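The "Warmup & timing" point can be sketched as a small harness that discards the first iterations before recording anything, then reports the median and tail next to the mean. This is a minimal sketch, not the speaker's code: the names `benchmark` and `summarize`, the default counts, and the stand-in workload are all assumptions made for illustration.

```python
import statistics
import time


def benchmark(operation, warmup=100, samples=1000):
    """Time `operation` per call, after discarding `warmup` untimed calls."""
    # Warmup: give caches and lazy initialisation a chance to settle
    # before any latency is recorded.
    for _ in range(warmup):
        operation()

    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Report median, 99th percentile, and max alongside the mean."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical stand-in for "access 1 image"; replace with the real request.
    print(summarize(benchmark(lambda: sum(range(1000)))))
```

A warmup phase does not address periodic interference or the gap between test and production; those still call for longer runs on representative hardware.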
Power Modes
$ cat /sys/devices/system/cpu/*/cpufreq/scaling_governor
“ondemand” OR “performance”

Current CPU frequencies:
$ grep "MHz" /proc/cpuinfo
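The same governor check can be scripted so a benchmark refuses to run, or at least logs a warning, when a CPU is not in "performance" mode. A minimal sketch, assuming a Linux host that exposes the sysfs path shown above; the function names and the `cpu_root` parameter are hypothetical and exist only to make the check testable.

```python
import glob
import os


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <cpu_root>/<cpu_name>/cpufreq/scaling_governor
        cpu_name = path.split(os.sep)[-3]
        with open(path) as handle:
            governors[cpu_name] = handle.read().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    found = read_governors()
    print(found)
    if not all_performance(found):
        print("warning: CPU frequency scaling may distort timings")
```

On machines without cpufreq support the dictionary is empty, and the sketch treats that as "unknown" rather than "safe".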
Wrong About Stats
[Figure: Convergence of Median on Samples. Latency plotted against Time. Series: Stable Samples, Stable Median, Decaying Samples, Decaying Median.]
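The idea behind the convergence plot can be sketched numerically: compute the running median as samples arrive and check whether it has stopped moving. This is an illustrative sketch only; the sample data, the window of 10, and the 1% tolerance are assumptions, not values from the talk.

```python
import statistics


def running_median(samples):
    """Median of samples[0..i] for each i."""
    return [statistics.median(samples[: i + 1]) for i in range(len(samples))]


def has_converged(medians, window=10, tolerance=0.01):
    """True if the last `window` running medians vary by at most
    `tolerance`, relative to the most recent median."""
    if len(medians) < window:
        return False
    tail = medians[-window:]
    spread = max(tail) - min(tail)
    return spread <= tolerance * abs(tail[-1])


if __name__ == "__main__":
    # Made-up data: one series hovers around 50, the other keeps drifting down.
    stable = [50, 52, 48, 51, 49] * 12
    decaying = [100 - i for i in range(60)]
    print("stable converged:  ", has_converged(running_median(stable)))
    print("decaying converged:", has_converged(running_median(decaying)))
```

When the samples are still decaying, the median keeps moving too, so any single summary number taken from that run describes the warmup rather than the steady state.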
Coordinated Omission
[Figure: timeline of request/response pairs along a time axis running from 0 to 80.]
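Coordinated omission can be illustrated with a small simulation: a client intends to send a request at a fixed interval but waits for each response before sending the next, so one slow response silently delays the requests behind it. The sketch below is an assumption-laden toy (the function name, the service times, and the interval are invented); it compares latency measured from the actual send time with latency measured from the intended send time.

```python
def simulate(service_times, interval):
    """Closed-loop client that intends to send one request every `interval`.

    Returns (naive, corrected):
      naive     = response time - actual send time
      corrected = response time - intended send time
    """
    naive, corrected = [], []
    clock = 0
    for i, service in enumerate(service_times):
        intended_start = i * interval
        # The client cannot send until the previous response has arrived.
        actual_start = max(clock, intended_start)
        finish = actual_start + service
        naive.append(finish - actual_start)
        corrected.append(finish - intended_start)
        clock = finish
    return naive, corrected


if __name__ == "__main__":
    # One 50-unit stall among otherwise 1-unit responses, one request per 10 units.
    naive, corrected = simulate([1, 1, 50, 1, 1, 1], interval=10)
    print("naive:    ", naive)
    print("corrected:", corrected)
```

With these inputs the naive view records a single 50-unit outlier, while the corrected view also charges 41, 32, and 23 units to the three requests that queued behind the stall.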
“Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
-- Donald Knuth
Wrong About What Matters
• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Hidden components
• Reproducibility of measurements
perf
# Various basic CPU statistics, system wide, for 10 seconds
perf stat -e cycles,instructions,cache-misses -a sleep 10
# Count system calls for the entire system, for 5 seconds
perf stat -e 'syscalls:sys_enter_*' -a sleep 5
# Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 seconds
perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5
http://www.brendangregg.com/perf.html
gprof: Where Does It Spend Its Time?
• Compile with profiling
• Execute the code
• Run gprof
http://www.thegeekstuff.com/2012/08/gprof-tutorial/
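gprof itself targets compiled programs, so it is not shown here. As a swapped-in analogue, the same "instrument, execute, report" workflow can be sketched with Python's standard `cProfile` and `pstats` modules. The workload functions below are hypothetical and exist only to give the profiler something to measure.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Deliberately heavy function so it dominates the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_constant():
    """Cheap function for contrast."""
    return 42


def workload():
    slow_square_sum(200_000)
    fast_constant()


def profile(func):
    """Run `func` under cProfile and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile(workload))
```

The report answers the slide's question directly: it lists where the time went per function, which is the evidence to gather before optimizing anything.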
Microbenchmarking: Blessing & Curse
+ Quick & cheap
+ Answers narrow questions well
- Often misleading results
- Not representative of the program
Microbenchmarking: Blessing & Curse
• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
• Constant work per iteration
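Several of these points can be sketched with Python's standard `timeit` module: grow N until one timed batch is long compared with the clock resolution, keep the work per iteration constant, and consume every result so the measured work has an observable effect. This is a minimal sketch under stated assumptions: `Sink`, `work`, and `measure` are hypothetical names, and CPython performs little dead-code elimination, so the sink mainly illustrates a habit that matters more with optimizing compilers and JITs.

```python
import time
import timeit


def clock_resolution():
    """Resolution in seconds that the platform reports for perf_counter."""
    return time.get_clock_info("perf_counter").resolution


class Sink:
    """Accumulates results so the measured work is never unused."""

    def __init__(self):
        self.total = 0

    def consume(self, value):
        self.total += value


def work(size):
    """Constant amount of work for a given `size`."""
    return sum(i * i for i in range(size))


def measure(func, min_duration=0.2, repeats=5):
    """Grow N by powers of ten until one batch lasts at least `min_duration`
    seconds, then time `repeats` batches. Returns seconds per call per batch."""
    timer = timeit.Timer(func)
    n = 1
    while timer.timeit(number=n) < min_duration:
        n *= 10
    return [elapsed / n for elapsed in timer.repeat(repeat=repeats, number=n)]


if __name__ == "__main__":
    sink = Sink()
    per_call = measure(lambda: sink.consume(work(100)))
    print("clock resolution:", clock_resolution())
    print("best seconds per call:", min(per_call))
    print("checksum:", sink.total)
```

Reporting the minimum across repeats is a common convention for microbenchmarks, since slower batches usually reflect interference from other processes; it says nothing about tail latency in a real system.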
What Should a Benchmark Do?
Measures behavior of system
Represents realistic workload
Runs for sufficiently long time
Compares in the same context
Predictable and reproducible results
Follow-up Material
• How NOT to Measure Latency by Gil Tene
  – http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  – http://www.brendangregg.com/methodology.html
• Silverman’s Mode Detection Method by Matt Adereth
  – http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/
• How Not To Measure System Performance by James Bornholt
  – https://homes.cs.washington.edu/~bornholt/post/performance-evaluation.html
• Trust No One, Not Even Performance Counters by Paul Khuong
  – http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/#trust-no-one