debugging and profiling .net core apps on linux
TRANSCRIPT
ThePlan
• Thisisatalkondebuggingandprofiling.NETCoreappsonLinux—yes,it’sprettycrazythatwegotthatfar!• You’lllearn:qToprofileCPUactivityin.NETCoreappsqTovisualizestacktraces(e.g.ofCPUsamples)usingflamegraphsqTouseLinuxtracingtoolswith.NETCoreprocessesqTocapture.NETCoreruntimeeventsusingLTTngqTogenerateandanalyzecoredumpsof.NETCoreapps
🎓 Disclaimer
• Alotofthisstuffischangingmonthly,ifnotweekly• Thetoolsdescribedheresortof workfor.NETCore1.1and.NETCore2.0,butyourmileagemayvary• SomeofthisreliesonscriptsIhackedtogether,andwillhopefullybeofficiallysupportedinthefuture
ToolsAndOperatingSystemsSupported
Linux Windows macOSCPUsampling perf,BCC ETW Instruments,dtrace
Dynamictracing perf,SystemTap,BCC ⛔ dtraceStatic tracing LTTng ETW ⛔
Dumpgeneration core_pattern,gcore Procdump,WER kern.corefile,gcoreDumpanalysis lldb VisualStudio,WinDbg lldb
Thistalk
⚠MindTheOverhead
• Anyobservationcanchangethestateofthesystem,butsomeobservationsareworsethanothers• Diagnostictoolshaveoverhead• Checkthedocs• Tryonatestsystemfirst• Measuredegradationintroducedbythetool
OVERHEADThis traces variouskernelpagecachefunctionsandmaintainsin-kernelcounts,whichareasynchronouslycopiedtouser-space.Whiletherateofoperationscanbeveryhigh(>1G/sec)wecanhaveupto34%overhead,thisisstillarelativelyefficientwaytotracetheseevents,andsotheoverheadisexpectedtobesmallfornormalworkloads. Measureinatestenvironment.
—man cachestat (fromBCC)
Samplingvs.Tracing
• Sampling worksbygettingasnapshotoracallstackeveryNoccurrencesofaninterestingevent• Formostevents,implementedinthePMUusingoverflowcountersandinterrupts
• Tracing worksbygettingamessageoracallstackateveryoccurrenceofaninterestingevent
CPUtimepid 121 pid 121 pid 408 pid 188
systemtimepid 121 pid 408
CPUsample
diskwrite
.NETCoreonLinuxTracingArchitecture
kernel
user.NETCoreapp
CoreCLR
OSlibraries
CPU
EventSource events
CLRevents(LTTng)
Probes,USDT
Tracepoints(scheduler,blockI/O,networking)
“Softwareevents”(coremigrations,pagefaults)
PMUevents(clockcycles,branches,LLCmisses)
TheOfficialStory:perfcollect andPerfView
1. Downloadperfcollect2. Installprerequisites:./perfcollect install3. Runcollection:./perfcollect collect mytrace4. Copythemytrace.zip filetoaWindowsmachine😳5. DownloadPerfView😁6. OpenthetraceinPerfView😱
perf
• perf isaLinuxmulti-toolforperformanceinvestigations• Capableofbothtracingandsampling• Developedinthekerneltree,mustmatchrunningkernel’sversion
• Debian-based: apt install linux-tools-common• RedHat-based: yum install perf
perf_events Architecture
UserKernel
tcp_msgsend
Pagefault
LLCmiss
sched_switch
perf_events
libc:malloc
mmap buffer
mmap buffer
Real-timeanalysis
perf script…perf.data file
GettingDebugInformation
Type DebuginformationsourceSyS_write Kernel /proc/kallsyms__write Native Debuginfo packageSystem.IO.SyncText… Managed(AOT) Crossgen*System.Console.WriteLine Managed(AOT) Crossgen*MyApp.Program.Foo Managed(JIT) /tmp/perf-$PID.mapMyApp.Program.Main Managed(JIT) /tmp/perf-$PID.mapExecuteAssembly Native(CLR) Debuginfo packageorsourcebuildCorExeMain Native(CLR) Debuginfo packageorsourcebuild__libc_start_main Native Debuginfo package
FlameGraphs
• Avisualizationmethod(adjacencygraph),veryusefulforstacktraces,inventedbyBrendanGregg• http://www.brendangregg.com/flamegraphs.html
• Turnsthousandsofstacktracepagesintoasingleinteractivegraph• Examplescenarios:• IdentifyCPUhotspotsonthesystem/application• Showstacksthatperformheavydiskaccesses• Findthreadsthatblockforalongtimeandthestackwheretheydoit
ReadingaFlameGraph
• Eachrectangleisafunction• Y-axis:caller-callee• X-axis:sortedstacks(nottime)
• Widerframesaremorecommon• Supportszoom,find• Filterwithgrep😎
TheOldWayAndTheNewWay:BPFkernel
u{,ret}probe:malloc
perf_events perf.data
user
perf | awk | …
reportbytes # distribution0 - 1 … |@@@@ |1 – 2 … |@ |2 - 4 … |@@@@@@@@ |
kernelu{,ret}probe:
malloc
BPFprogram BPFmap
user
controlprogram
reportbytes # distribution0 - 1 … |@@@@ |1 – 2 … |@ |2 - 4 … |@@@@@@@@ |
user
libc:malloc
user
libc:malloc
TheBCCBPFFront-End
• https://github.com/iovisor/bcc• BPFCompilerCollection(BCC)isaBPFfrontendlibraryandamassivecollectionofperformancetools• ContributorsfromFacebook,PLUMgrid,Netflix,Sela
• HelpsbuildBPF-basedtoolsinhigh-levellanguages• Python,Lua,C++
kernel
user
BCCtool BCCtool …
BCCcompilerfrontend
Clang+LLVM
BCCloaderlibrary
BPFruntime eventsources
Managedruntimes
Syscall interface
BlockI/O EthernetScheduler Mem
Devicedrivers
Filesystem TCP/IP
CPUApplications
Systemlibraries
profilellcstat
hardirqssoftirqsttysnoop
runqlatcpudist
offcputimeoffwaketimecpuunclaimed
memleakoomkill
slabratetoptcptoptcplife
tcpconnecttcpaccept
biotopbiolatencybiosnoopbitesize
filetopfilelifefileslowervfscountvfsstatcachestatcachetopmountsnoop*fsslower*fsdistdcstatdcsnoopmdflush
execsnoopopensnoopkillsnoopstatsnoopsyncsnoopsetuidsnoop
mysqld_qslowerbashreadlinedbslowerdbstat
mysqlsniff
memleaksslsniff
gethostlatencydeadlock_detector
ustatugc
uthreadsucallsuflow
argdisttrace
funccountfunclatencystackcount
Demo:Tracing.NETCoreApps1$ make build && make run2$ make getstatsbench3$ make getstatsrecord 3$ make alloccount 3$ make allocstacks2$ make catsbench3$ make catsrecord2
LTTng Architecture
UserKernel
SyS_write
SyS_read LLTngkernelmodule
myapp:myprobe
buffer lttng --live
babeltraceCTFfile
buffer
Demo:CapturingRuntimeEvents1$ make build && make run2$ make getstatsbench3$ make gcrecord 3$ make gcview | grep… 3$ make gcallocstats
GeneratingCoreDumps
• /proc/sys/kernel/core_pattern configuresthecorefilenameorapplicationtoprocessthecrash• ulimit -c controlsmaximumcorefilesize(often0bydefault)• gcore (partofgdb)cancreateacoredumpondemand
libsosplugin.so
(lldb) plugin load …/libsosplugin.so(lldb) setclrpath …
DumpObjDumpArrayDumpHeapGCRootPrintException
ThreadsClrStackEEHeapDumpDomainDumpAssembly
Demo:DumpGenerationAndAnalysis1$ make dockersvc && make dockerrun2$ make update3$ make updatelogs 3$ make updateanalyze
Checklist:PreparingYourEnvironment
qexport COMPlus_PerfMapEnabled=1qAOTperfmapwithcrossgenqDebuginfo packageforlibcoreclr,libcqInstallperf/BCCtoolsqexport COMPlus_EnableEventLog=1qulimit -c unlimited (ormanagedbysystem)qInstallgdb (forgcore),lldb-3.x
Summary
• Wehavelearned:üToprofileCPUactivityin.NETCoreappsüTovisualizestacktraces(e.g.ofCPUsamples)usingflamegraphsüTouseLinuxtracingtoolswith.NETCoreprocessesüTocapture.NETCoreruntimeeventsusingLTTngüTogenerateandanalyzecoredumpsof.NETCoreapps
References
• perf andflamegraphs• https://perf.wiki.kernel.org/index.php/Main_Page• http://www.brendangregg.com/perf.html• https://github.com/brendangregg/perf-tools
• .NETCorediagnosticsdocs• https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md
• https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md
• Myblogposts• http://blogs.microsoft.co.il/sasha/2017/02/26/analyzing-a-net-core-core-dump-on-linux/
• http://blogs.microsoft.co.il/sasha/2017/02/27/profiling-a-net-core-application-on-linux/
• http://blogs.microsoft.co.il/sasha/2017/03/30/tracing-runtime-events-in-net-core-on-linux/
• BCCtutorials• https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
• https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
• https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md