USENIX ATC 2017: Visualizing Performance with Flame Graphs
TRANSCRIPT
![Page 1: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/1.jpg)
Visualizing Performance with Flame Graphs
Brendan Gregg Senior Performance Architect
Jul 2017
2017 USENIX Annual Technical Conference
![Page 2: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/2.jpg)
Visualize CPU time consumed by all software
[Figure: flame graph with regions labeled Kernel, Java, and User-level]
![Page 3: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/3.jpg)
Agenda
1. CPU Flame graphs
2. Fixing Stacks & Symbols
3. Advanced flame graphs
![Page 4: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/4.jpg)
Takeaways
1. Interpret CPU flame graphs
2. Understand pitfalls with stack traces and symbols
3. Discover opportunities for future development
![Page 5: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/5.jpg)
![Page 6: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/6.jpg)
Case Study
Exception handling consuming CPU
![Page 7: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/7.jpg)
CPU PROFILING
Summary
![Page 8: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/8.jpg)
CPU Profiling
[Figure: timeline of functions A and B over time, alternating between on-CPU and off-CPU; labels: block, interrupt, syscall, stack samples]
• Record stacks at a timed interval: simple and effective
– Pros: Low (deterministic) overhead
– Cons: Coarse accuracy, but usually sufficient
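To make timed sampling concrete, here is a minimal sketch that simulates it over a made-up timeline. The timeline, the 10 ms interval, and the name `sample_timeline` are all assumptions for illustration; a real profiler such as perf samples every CPU from a timer interrupt rather than walking a list.

```python
def sample_timeline(timeline, interval_ms, duration_ms):
    """Count which function is on-CPU at each sampling tick.

    timeline is a list of (start_ms, end_ms, func) spans; func is None
    while the thread is off-CPU, so those ticks record nothing.
    """
    counts = {}
    for t in range(0, duration_ms, interval_ms):
        for start, end, func in timeline:
            if start <= t < end and func is not None:
                counts[func] = counts.get(func, 0) + 1
    return counts

# Hypothetical timeline: A runs, then B, then the thread blocks, then A again.
timeline = [(0, 30, "A"), (30, 50, "B"), (50, 80, None), (80, 100, "A")]
counts = sample_timeline(timeline, interval_ms=10, duration_ms=100)
print(counts)
```

The sample counts are proportional to on-CPU time (A ran for 50 ms, B for 20 ms), which is why coarse sampling is usually sufficient. Time spent off-CPU leaves no samples at all.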
![Page 9: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/9.jpg)
Stack Traces
• A code path snapshot, e.g., from jstack(1):
$ jstack 1819
[…]
"main" prio=10 tid=0x00007ff304009000 nid=0x7361 runnable [0x00007ff30d4f9000]
   java.lang.Thread.State: RUNNABLE
    at Func_abc.func_c(Func_abc.java:6)     <- running
    at Func_abc.func_b(Func_abc.java:16)    <- parent
    at Func_abc.func_a(Func_abc.java:23)    <- g.parent
    at Func_abc.main(Func_abc.java:27)      <- g.g.parent
![Page 10: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/10.jpg)
System Profilers
• Linux
– perf_events (aka "perf")
• Oracle Solaris
– DTrace
• OS X
– Instruments
• Windows
– XPerf, WPA (which now has flame graphs!)
• And many others…
![Page 11: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/11.jpg)
Linux perf_events
• Standard Linux profiler
– Provides the perf command (multi-tool)
– Usually pkg added by linux-tools-common, etc.
• Many event sources:
– Timer-based sampling
– Hardware events
– Tracepoints
– Dynamic tracing
• Can sample stacks of (almost) everything on CPU
– Can miss hard interrupt ISRs, but these should be near-zero. They can be measured if needed (I wrote my own tools).
![Page 12: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/12.jpg)
perf Profiling
# perf record -F 99 -ag -- sleep 30
[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 2.745 MB perf.data (~119930 samples) ]
# perf report -n --stdio
[…]
# Overhead       Samples  Command      Shared Object                         Symbol
# ........  ............  .......  .................  .............................
#
    20.42%           605     bash  [kernel.kallsyms]  [k] xen_hypercall_xen_version
                     |
                     --- xen_hypercall_xen_version
                         check_events
                         |
                         |--44.13%-- syscall_trace_enter
                         |          tracesys
                         |          |
                         |          |--35.58%-- __GI___libc_fcntl
                         |          |          |
                         |          |          |--65.26%-- do_redirection_internal
                         |          |          |          do_redirections
                         |          |          |          execute_builtin_or_function
                         |          |          |          execute_simple_command
[… ~13,000 lines truncated …]
call tree summary
![Page 13: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/13.jpg)
Full perf report Output
![Page 14: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/14.jpg)
… as a Flame Graph
![Page 15: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/15.jpg)
Flame Graph Summary
• Visualizes a collection of stack traces
– x-axis: alphabetical stack sort, to maximize merging
– y-axis: stack depth
– color: random (default), or a dimension
• Currently made from Perl + SVG + JavaScript
– https://github.com/brendangregg/FlameGraph
– Takes input from many different profilers
– Multiple d3 versions are being developed
• References:
– http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
– http://queue.acm.org/detail.cfm?id=2927301
– "The Flame Graph" CACM, June 2016
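As a rough illustration of the merging described on this slide (not the FlameGraph Perl implementation), the sketch below folds a few stack samples into `frame;frame;... count` lines and sorts them alphabetically so that stacks sharing a prefix end up adjacent. The sample data and the function name `fold_stacks` are assumptions made for this example.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse stack samples (root-first lists of frames) into folded lines.

    Each output line is "root;child;...;leaf count". Lines are sorted
    alphabetically so that stacks sharing a prefix are next to each other.
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]

# Hypothetical samples, root frame first (not taken from the talk).
samples = [
    ["main", "func_a", "func_b"],
    ["main", "func_d"],
    ["main", "func_a", "func_c"],
    ["main", "func_a", "func_b"],
]

folded = fold_stacks(samples)
for line in folded:
    print(line)
```

Because the sort is alphabetical rather than chronological, the x-axis of the resulting graph carries no time ordering. Identical stacks collapse into a single line with a count, and neighbouring lines share as many leading frames as possible.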
![Page 16: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/16.jpg)
Flame Graph Interpretation
[Figure: example flame graph with frames a(), b(), c(), d(), e(), f(), g(), h(), and i()]
![Page 17: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/17.jpg)
Flame Graph Interpretation (1/3)
Top edge shows who is running on-CPU, and how much (width)
[Figure: example flame graph with frames a(), b(), c(), d(), e(), f(), g(), h(), and i()]
![Page 18: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/18.jpg)
Flame Graph Interpretation (2/3)
Top-down shows ancestry, e.g., from g():
[Figure: the same example flame graph, highlighting the ancestry of g()]
![Page 19: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/19.jpg)
Flame Graph Interpretation (3/3)
Widths are proportional to presence in samples, e.g., comparing b() to h() (incl. children)
[Figure: example flame graph with frames a(), b(), c(), d(), e(), f(), g(), h(), and i()]
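The width rule can be sketched in a few lines: the width of a box is the number of samples whose stack starts with that box's path from the root. The sample counts below are invented so that b() is four times as wide as h(); they are not the values behind the slide's figure, and `box_widths` is a hypothetical helper name.

```python
def box_widths(folded):
    """Map each root-to-frame path to the number of samples that include it."""
    widths = {}
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            prefix = ";".join(frames[:depth])
            widths[prefix] = widths.get(prefix, 0) + count
    return widths

# Hypothetical folded profile: stack (root first) -> sample count.
folded_counts = {
    "a;b;c;d": 1,
    "a;b;c;d;e": 2,
    "a;b;c;d;f;g": 5,
    "a;h;i": 2,
}

widths = box_widths(folded_counts)
total = sum(folded_counts.values())
print(widths["a;b"], widths["a;h"])   # inclusive widths of b() and h()
print(widths["a;b"] / total)          # fraction of all samples under b()
```

In this made-up profile b() spans 8 of 10 samples and h() spans 2, so the b() code path was on-CPU about four times as much as h(), children included. The count attached to a full stack (for example 5 for `a;b;c;d;f;g`) is the exposed top edge of its leaf frame.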
![Page 20: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/20.jpg)
Mixed-Mode Flame Graphs
• Hues:
– green == JIT (e.g., Java)
– aqua == inlined (if included)
– red == user-level*
– orange == kernel
– yellow == C++
• Intensity:
– Randomized to differentiate frames
– Or hashed on function name
[Figure: mixed-mode flame graph with regions labeled Java, JVM (C++), Kernel, and C]
* new palette uses red for kernel modules too
![Page 21: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/21.jpg)
Differential Flame Graphs
• Hues:
– red == more samples
– blue == less samples
• Intensity:
– Degree of difference
• Compares two profiles
• Can show other metrics: e.g., CPI
• Other types exist
– flamegraphdiff
[Figure: differential flame graph with a legend running from more to less]
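A minimal sketch of the comparison behind a differential flame graph, assuming two folded profiles held as dictionaries. The profiles and helper names are hypothetical, and a real tool would likely also normalize for differing total sample counts, which is omitted here.

```python
def diff_profiles(before, after):
    """Return {stack: after_count - before_count} for stacks in either profile."""
    stacks = sorted(set(before) | set(after))
    return {s: after.get(s, 0) - before.get(s, 0) for s in stacks}

def hue(delta):
    """Pick the hue from the sign of the difference, as on the slide."""
    if delta > 0:
        return "red"    # more samples in the second profile
    if delta < 0:
        return "blue"   # less samples in the second profile
    return "none"

# Hypothetical folded profiles: stack -> sample count.
before = {"main;parse": 40, "main;render": 60}
after = {"main;parse": 70, "main;render": 50, "main;log": 5}

deltas = diff_profiles(before, after)
for stack, delta in deltas.items():
    print(stack, delta, hue(delta))
```

The magnitude of each delta would drive color intensity, so a frame that doubled stands out more than one that grew by a single sample.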
![Page 22: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/22.jpg)
Icicle Graph
[Figure: icicle graph; annotation: top (leaf) merge]
![Page 23: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/23.jpg)
Flame Graph Search
• Color: magenta to show matched frames
[Figure: flame graph with the search button labeled]
![Page 24: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/24.jpg)
Flame Charts
• Flame charts: x-axis is time
• Flame graphs: x-axis is population (maximize merging)
• Final note: these are useful, but are not flame graphs
[Figure: flame chart from Chrome devtools]
![Page 25: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/25.jpg)
STACK TRACING
Pitfalls and fixes
![Page 26: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/26.jpg)
Broken Stack Traces are Common
Because:
A. Profilers use frame pointer walking by default
B. Compilers reuse the frame pointer register as a general purpose register: a (usually very small) performance optimization.
# perf record -F 99 -a -g -- sleep 30
# perf script
[…]
java 4579 cpu-clock:
    7f417908c10b [unknown] (/tmp/perf-4458.map)

java 4579 cpu-clock:
    7f41792fc65f [unknown] (/tmp/perf-4458.map)
    a2d53351ff7da603 [unknown] ([unknown])
[…]
![Page 27: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/27.jpg)
… as a Flame Graph
Broken Java stacks (missing frame pointer)
![Page 28: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/28.jpg)
Fixing Stack Walking
A. Frame pointer-based
– Fix by disabling that compiler optimization: gcc's -fno-omit-frame-pointer
– Pros: simple, supported by many tools
– Cons: might cost a little extra CPU
B. Debug info (DWARF) walking
– Cons: costs disk space, and not supported by all profilers. Even possible with JIT?
C. JIT runtime walkers
– Pros: include more internals, such as inlined frames
– Cons: limited to application internals - no kernel
D. Last branch record
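To show why option A depends on the compiler keeping the frame pointer, here is a toy model of frame pointer walking. It assumes the conventional x86-64 frame layout (saved caller frame pointer at `fp`, return address at `fp + 8`); the memory contents, addresses, and the name `walk_frame_pointers` are invented for illustration and are not how perf implements the walk.

```python
WORD = 8  # bytes per stack slot on a 64-bit machine

def walk_frame_pointers(memory, fp, max_depth=127):
    """Follow the chain of saved frame pointers and collect return addresses.

    memory maps an address to the 64-bit word stored there. In each frame,
    memory[fp] holds the caller's saved frame pointer and memory[fp + WORD]
    holds the return address into the caller.
    """
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        if fp not in memory or fp + WORD not in memory:
            break  # fp does not point at a valid frame: the walk stops early
        return_addresses.append(memory[fp + WORD])
        fp = memory[fp]
    return return_addresses

# Hypothetical stack memory holding three linked frames.
memory = {
    0x1000: 0x1040, 0x1008: 0xAAA1,
    0x1040: 0x1080, 0x1048: 0xAAA2,
    0x1080: 0x0,    0x1088: 0xAAA3,
}

good = walk_frame_pointers(memory, 0x1000)
# If the compiler reused the register for data, it holds an arbitrary value.
broken = walk_frame_pointers(memory, 0xA2D53351FF7DA603)
print([hex(a) for a in good], broken)
```

With a valid frame pointer the walk recovers every caller. With a reused register it finds nothing to follow, which is consistent with the one- or two-frame stacks shown in the broken `perf script` output.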
![Page 29: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/29.jpg)
Fixing Java Stack Traces
# perf script
[…]
java 8131 cpu-clock:
    7fff76f2dce1 [unknown] ([vdso])
    7fd3173f7a93 os::javaTimeMillis() (/usr/lib/jvm…
    7fd301861e46 [unknown] (/tmp/perf-8131.map)
    7fd30184def8 [unknown] (/tmp/perf-8131.map)
    7fd30174f544 [unknown] (/tmp/perf-8131.map)
    7fd30175d3a8 [unknown] (/tmp/perf-8131.map)
    7fd30166d51c [unknown] (/tmp/perf-8131.map)
    7fd301750f34 [unknown] (/tmp/perf-8131.map)
    7fd3016c2280 [unknown] (/tmp/perf-8131.map)
    7fd301b02ec0 [unknown] (/tmp/perf-8131.map)
    7fd3016f9888 [unknown] (/tmp/perf-8131.map)
    7fd3016ece04 [unknown] (/tmp/perf-8131.map)
    7fd30177783c [unknown] (/tmp/perf-8131.map)
    7fd301600aa8 [unknown] (/tmp/perf-8131.map)
    7fd301a4484c [unknown] (/tmp/perf-8131.map)
    7fd3010072e0 [unknown] (/tmp/perf-8131.map)
    7fd301007325 [unknown] (/tmp/perf-8131.map)
    7fd301007325 [unknown] (/tmp/perf-8131.map)
    7fd3010004e7 [unknown] (/tmp/perf-8131.map)
    7fd3171df76a JavaCalls::call_helper(JavaValue*,…
    7fd3171dce44 JavaCalls::call_virtual(JavaValue*…
    7fd3171dd43a JavaCalls::call_virtual(JavaValue*…
    7fd31721b6ce thread_entry(JavaThread*, Thread*)…
    7fd3175389e0 JavaThread::thread_main_inner() (/…
    7fd317538cb2 JavaThread::run() (/usr/lib/jvm/nf…
    7fd3173f6f52 java_start(Thread*) (/usr/lib/jvm/…
    7fd317a7e182 start_thread (/lib/x86_64-linux-gn…
# perf script
[…]
java 4579 cpu-clock:
    7f417908c10b [unknown] (/tmp/…

java 4579 cpu-clock:
    7f41792fc65f [unknown] (/tmp/…
    a2d53351ff7da603 [unknown] ([unkn…
[…]
I prototyped JVM frame pointers. Oracle rewrote it and added it to Java as -XX:+PreserveFramePointer (JDK 8 u60 b19)
![Page 30: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/30.jpg)
Fixed Stacks Flame Graph
Java stacks (but no symbols, yet)
![Page 31: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/31.jpg)
Inlining
• Many frames may be missing (inlined)
– Flame graph may still make enough sense
• Inlining can often be tuned
– e.g. Java's -XX:-Inline to disable, but can be 80% slower
– Java's -XX:MaxInlineSize and -XX:InlineSmallCode can be tuned a little to reveal more frames: can even improve performance!
• Runtimes can un-inline on demand
– So that exception stack traces make sense
– e.g. Java's perf-map-agent can un-inline (unfoldall option)
![Page 32: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/32.jpg)
Stack Depth
• perf had a 127 frame limit
• Now tunable in Linux 4.8
– sysctl -w kernel.perf_event_max_stack=512
– Thanks Arnaldo Carvalho de Melo!
[Figure: flame graphs of a Java microservice with a stack depth of > 900; labels: broken stacks, perf_event_max_stack=1024]
![Page 33: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/33.jpg)
SYMBOLS
Fixing
![Page 34: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/34.jpg)
Fixing Native Symbols
A. Add a -dbgsym package, if available
B. Recompile from source
![Page 35: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/35.jpg)
Fixing JIT Symbols (Java, Node.js, …)
• Just-in-time runtimes don't have a pre-compiled symbol table
• So Linux perf looks for an externally provided JIT symbol file: /tmp/perf-PID.map
• This can be created by runtimes; e.g., Java's perf-map-agent
# perf script
Failed to open /tmp/perf-8131.map, continuing without symbols
[…]
java 8131 cpu-clock:
    7fff76f2dce1 [unknown] ([vdso])
    7fd3173f7a93 os::javaTimeMillis() (/usr/lib/jvm…
    7fd301861e46 [unknown] (/tmp/perf-8131.map)
[…]
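For readers who have not seen one, a /tmp/perf-PID.map file is plain text with one `START SIZE symbolname` entry per line, where START and SIZE are hexadecimal. The sketch below parses such text and resolves addresses against it. The map contents and Java method names are invented (only the two addresses are borrowed from the output above), and `parse_perf_map` and `resolve` are hypothetical helper names, not part of perf.

```python
import bisect

def parse_perf_map(text):
    """Parse 'START SIZE symbolname' lines (hex START and SIZE) into sorted tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return "[unknown]"

# Hypothetical map contents; the symbol names are made up for this example.
map_text = """\
7fd301861e00 80 Lcom/example/Handler;::process
7fd30184de00 200 Lcom/example/Parser;::parse
"""

entries = parse_perf_map(map_text)
print(resolve(entries, 0x7FD301861E46))
print(resolve(entries, 0x7FD30184DEF8))
print(resolve(entries, 0x7FFF76F2DCE1))
```

An address outside every range stays `[unknown]`, which is what every JIT frame looks like when the map file is missing.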
![Page 36: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/36.jpg)
Fixed Stacks & Symbols
[Figure: Java Mixed-Mode Flame Graph with regions labeled Java, JVM, Kernel, and GC]
![Page 37: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/37.jpg)
Stacks & Symbols (zoom)
![Page 38: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/38.jpg)
Symbol Churn
• For JIT runtimes, symbols can change during a profile
• Symbols may be mistranslated by perf's map snapshot
• Solutions:
A. Take a before & after snapshot, and compare
B. perf's new support for timestamped symbol logs
![Page 39: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/39.jpg)
Containers
• perf can't find any symbol sources
– Unless you copy them into the host
• I'm testing Krister Johansen's fix, hopefully for Linux 4.13
– lkml: "[PATCH tip/perf/core 0/7] namespace tracing improvements"
![Page 40: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/40.jpg)
INSTRUCTIONS
For Linux
![Page 41: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/41.jpg)
Linux CPU Flame Graphs
Linux 2.6+, via perf.data and perf script:
git clone --depth 1 https://github.com/brendangregg/FlameGraph
cd FlameGraph
perf record -F 99 -a -g -- sleep 30
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
Linux 4.5+ can use folded output
– Skips the CPU-costly stackcollapse-perf.pl step; see: http://www.brendangregg.com/blog/2016-04-30/linux-perf-folded.html
Linux 4.9+, via BPF:
git clone --depth 1 https://github.com/brendangregg/FlameGraph
git clone --depth 1 https://github.com/iovisor/bcc
./bcc/tools/profile.py -dF 99 30 | ./FlameGraph/flamegraph.pl > perf.svg
– Most efficient: no perf.data file, summarizes in-kernel
![Page 42: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/42.jpg)
Linux Profiling Optimizations
Linux 2.6: perf record (capture stacks) → write samples → perf.data → reads samples → perf script (write text) → stackcollapse-perf.pl (folded output) → flamegraph.pl
Linux 4.5: perf record (capture stacks) → write samples → perf.data → reads samples → perf report -g folded (folded report) → awk (folded output) → flamegraph.pl
Linux 4.9: profile.py (count stacks, BPF) → folded output → flamegraph.pl
![Page 43: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/43.jpg)
Language/Runtime Instructions
• Each may have special stack/symbol instructions
– Java, Node.js, Python, Ruby, C++, Go, …
• I'm documenting some on:
– http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
– Also try an Internet search
![Page 44: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/44.jpg)
GUI Automation
E.g., Netflix Vector (self-service UI):
[Figure: Netflix Vector UI with the Flame Graphs option labeled]
Should be open sourced; you may also build/buy your own
![Page 45: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/45.jpg)
ADVANCED FLAME GRAPHS
Future Work
![Page 46: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/46.jpg)
Flame graphs can be generated for stack traces from any Linux event source
![Page 47: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/47.jpg)
Page Faults
• Show what triggered main memory (resident) to grow:
# perf record -e page-faults -p PID -g -- sleep 120
• "fault" as (physical) main memory is allocated on-demand, when a virtual page is first populated
• Low overhead tool to solve some types of memory leak
[Figure: page fault flame graph; annotation: RES column in top(1) grows because]
![Page 48: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/48.jpg)
Other Memory Sources
http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
![Page 49: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/49.jpg)
Context Switches
• Show why Java blocked and stopped running on-CPU:
# perf record -e context-switches -p PID -g -- sleep 5
• Identifies locks, I/O, sleeps
– If code path shouldn't block and looks random, it's an involuntary context switch. I could filter these, but you should have solved them beforehand (CPU load).
• e.g., was used to understand framework differences:
[Figure: context switch flame graphs, rxNetty vs Tomcat; frame labels: futex, sys_poll, epoll, futex]
![Page 50: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/50.jpg)
Disk I/O Requests
• Shows who issued disk I/O (sync reads & writes):
# perf record -e block:block_rq_insert -a -g -- sleep 60
• e.g.: page faults in GC? This JVM has swapped out!:
[Figure: disk I/O flame graph with GC labeled]
![Page 51: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/51.jpg)
TCP Events
• TCP transmit, using dynamic tracing:
# perf probe tcp_sendmsg
# perf record -e probe:tcp_sendmsg -a -g -- sleep 1; jmaps
# perf script -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace > out.stacks
# perf probe --del tcp_sendmsg
• Note: can be high overhead for high packet rates
– For the current perf trace, dump, post-process cycle
• Can also trace TCP connect & accept
– Lower frequency, therefore lower overhead
• TCP receive is async
– Could trace via socket read
[Figure: flame graph of TCP sends]
![Page 52: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/52.jpg)
CPU Cache Misses
• In this example, sampling via Last Level Cache loads:
# perf record -e LLC-loads -c 10000 -a -g -- sleep 5; jmaps
# perf script -f comm,pid,tid,cpu,time,event,ip,sym,dso > out.stacks
• -c is the count (samples once per count)
• Use other CPU counters to sample hits, misses, stalls
![Page 53: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/53.jpg)
CPI Flame Graph
• Cycles Per Instruction
– red == instruction heavy
– blue == cycle heavy (likely memory stall cycles)
[Figure: CPI flame graph, zoomed]
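One plausible way to derive a per-stack CPI figure, sketched under the assumption that two folded profiles are available: one sampled on CPU cycles and one on retired instructions. The numbers and the helper name `cpi_by_stack` are invented, and this does not reproduce how the slide's graph was colored.

```python
def cpi_by_stack(cycles, instructions):
    """Divide cycle counts by instruction counts for each stack seen in both profiles."""
    result = {}
    for stack, cycle_count in cycles.items():
        instruction_count = instructions.get(stack, 0)
        if instruction_count > 0:  # skip stacks with no instruction samples
            result[stack] = cycle_count / instruction_count
    return result

# Hypothetical folded counts for the same workload.
cycles = {"main;parse": 400, "main;copy": 900}
instructions = {"main;parse": 800, "main;copy": 300}

cpi = cpi_by_stack(cycles, instructions)
for stack, value in sorted(cpi.items()):
    print(stack, value)
```

A low ratio suggests an instruction-heavy path, and a high ratio suggests cycles lost to stalls, which matches the red and blue reading on the slide.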
![Page 54: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/54.jpg)
Off-CPU Analysis
Off-CPU analysis is the study of blocking states, or the code-path (stack trace) that led to these states
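The bookkeeping behind off-CPU analysis can be sketched as follows: remember when and where each thread left the CPU, and on its return add the elapsed time to that stack. The event tuples, stacks, and the name `off_cpu_time` are assumptions for illustration. Real tools collect these events from scheduler tracing and typically aggregate in the kernel.

```python
from collections import defaultdict

def off_cpu_time(events):
    """Sum blocked time per stack.

    events are (time_us, tid, kind, stack) tuples, where kind is "off" when
    the thread leaves the CPU (stack is its blocking code path) and "on"
    when it resumes (stack is unused).
    """
    blocked_since = {}
    totals = defaultdict(int)
    for time_us, tid, kind, stack in sorted(events, key=lambda e: e[0]):
        if kind == "off":
            blocked_since[tid] = (time_us, stack)
        elif kind == "on" and tid in blocked_since:
            start_us, off_stack = blocked_since.pop(tid)
            totals[off_stack] += time_us - start_us
    return dict(totals)

# Hypothetical scheduler events for two threads.
events = [
    (0, 1, "off", "main;read;vfs_read;io_schedule"),
    (100, 2, "off", "main;read;vfs_read;io_schedule"),
    (600, 2, "on", ""),
    (1500, 1, "on", ""),
    (2000, 1, "off", "main;lock;futex_wait"),
    (2300, 1, "on", ""),
]

blocked = off_cpu_time(events)
for stack, total_us in sorted(blocked.items()):
    print(stack, total_us)
```

The resulting stack-to-microseconds mapping has the same shape as a folded CPU profile, so it can be drawn the same way, with width meaning time blocked instead of samples on-CPU.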
![Page 55: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/55.jpg)
Off-CPU Time Flame Graph
More info: http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
[Figure: off-CPU time flame graph; axes labeled Stack depth and Off-CPU time]
![Page 56: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/56.jpg)
Off-CPU Time (zoomed): tar(1)
Only showing kernel stacks in this example
[Figure: off-CPU time flame graph for tar(1); annotations: file read from disk, directory read from disk, pipe write, path read from disk, fstat from disk]
![Page 57: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/57.jpg)
CPU + Off-CPU Flame Graphs: See Everything
http://www.brendangregg.com/flamegraphs.html
[Figure: a CPU flame graph and an Off-CPU flame graph]
![Page 58: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/58.jpg)
Off-CPU Time (zoomed): gzip(1)
The off-CPU stack trace often doesn't show the root cause of latency. What is gzip blocked on?
![Page 59: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/59.jpg)
Off-Wake Time Flame Graph
Uses Linux enhanced BPF to merge off-CPU and waker stack in kernel context
![Page 60: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/60.jpg)
Off-Wake Time Flame Graph (zoomed)
[Figure: off-wake time flame graph; labels: Waker task, Waker stack, Woke up, Blocked stack, Blocked task, Stack Direction]
![Page 61: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/61.jpg)
Chain Graphs
Walking the chain of wakeup stacks to reach root cause
![Page 62: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/62.jpg)
Hot Cold Flame Graphs
Includes both CPU & Off-CPU (or chain) stacks in one flame graph
• However, Off-CPU time often dominates: threads waiting or polling
http://www.brendangregg.com/FlameGraphs/hotcoldflamegraphs.html
![Page 63: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/63.jpg)
Flame Graph Diff
https://github.com/corpaul/flamegraphdiff
![Page 64: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/64.jpg)
Takeaways
1. Interpret CPU flame graphs
2. Understand pitfalls with stack traces and symbols
3. Discover opportunities for future development
![Page 65: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/65.jpg)
Links & References
• Flame Graphs
– "The Flame Graph" Communications of the ACM, Vol. 59, No. 6 (June 2016)
– http://queue.acm.org/detail.cfm?id=2927301
– http://www.brendangregg.com/flamegraphs.html -> http://www.brendangregg.com/flamegraphs.html#Updates
– http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
– http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
– http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html
– http://techblog.netflix.com/2015/07/java-in-flames.html
– http://techblog.netflix.com/2016/04/saving-13-million-computational-minutes.html
– http://techblog.netflix.com/2014/11/nodejs-in-flames.html
– http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html
– http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
– http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
– http://www.brendangregg.com/blog/2016-02-05/ebpf-chaingraph-prototype.html
– http://corpaul.github.io/flamegraphdiff/
• Linux perf_events
– https://perf.wiki.kernel.org/index.php/Main_Page
– http://www.brendangregg.com/perf.html
• Netflix Vector
– https://github.com/netflix/vector
– http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
![Page 66: USENIX ATC 2017: Visualizing Performance with Flame Graphs](https://reader034.vdocuments.mx/reader034/viewer/2022050613/5a672c387f8b9a453d8b4c71/html5/thumbnails/66.jpg)
Thank You
– Questions?
– http://www.brendangregg.com
– http://slideshare.net/brendangregg
– [email protected]
– @brendangregg
Next topic: Performance Superpowers with Enhanced BPF