deep into your applications, performance & profiling

Post on 14-Apr-2017

DEEP INTO YOUR APPLICATION ... PERFORMANCE & PROFILING

Fabien Arcellier / @farcellier

ABOUT @FARCELLIER

Technical Architect, Developer, Life-long learner at Octo Technology

Favourite subjects: DevOps, Performance & Software craftsmanship

WHAT'S ON THE MENU

- What does profiling an application mean?
- How does it work?
- Applying it to a real-world application: memcached

PROFILING IN A FEW WORDS ...

Software profiling is a form of dynamic program analysis that measures, for example:

- the space or time complexity of a program
- the usage of particular instructions
- the frequency and duration of function calls, ...

@copyright wikipedia

TO GET THIS SORT OF REPORT ...

TO HAVE A BETTER VIEW OF WHAT HAPPENS ON YOUR HARDWARE, ...

@copyright highscalability

TO IMPROVE YOUR APPLICATION PERFORMANCE, ...

@copyright macifcourseaularge

You need measurements to continuously improve your application's performance.

TO UNDERSTAND YOUR APPLICATION, ...

You want to understand what is consuming your CPU.

TO MONITOR YOUR SERVER, ...

[Flame graph; visible frames: app, __libc_start_main, main, dot, mat_mul]

You want to understand what your CPUs are doing.

AT THE BEGINNING THERE IS A PROGRAM ...

    int main(void) { return 0; }

    int func1(void) { return 0; }

Use gcc to compile and link it:

    gcc app.c -o app

WITH A SIMPLE SYMBOL TABLE ...

readelf - Displays information about ELF files

    readelf -s app

    45: 0000000000400580     2 FUNC    GLOBAL DEFAULT   13 __libc_csu_fini
    46: 00000000004004f8    11 FUNC    GLOBAL DEFAULT   13 func1
    ...
    57: 0000000000601040     0 NOTYPE  GLOBAL DEFAULT   25 _end
    58: 0000000000400400     0 FUNC    GLOBAL DEFAULT   13 _start
    59: 0000000000601038     0 NOTYPE  GLOBAL DEFAULT   25 __bss_start
    60: 00000000004004ed    11 FUNC    GLOBAL DEFAULT   13 main
    ...

- 00000000004004ed: virtual address of the symbol
- FUNC: type of the symbol
- main: name of the symbol

HOW DOES IT WORK?

60: 00000000004004ed 11 FUNC GLOBAL DEFAULT 13 main

CAPTURE EVENTS AND ASSOCIATE THEM TO SYMBOLS
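A minimal sketch of that association step, assuming the profiler has captured a raw instruction address: look the address up in the symbol table printed by readelf -s. The function name resolve_symbol and the sample address 4004f2 are made up for illustration, and the sketch ignores shared libraries and symbols whose size readelf prints in hexadecimal.

```shell
# resolve_symbol ADDR
# Reads `readelf -s` style lines on stdin and prints the name of the FUNC
# symbol whose range [value, value + size) contains ADDR.
# ADDR is a hex address WITHOUT the 0x prefix (hypothetical helper).
resolve_symbol() {
    rs_addr=$((0x$1))
    while read -r rs_num rs_value rs_size rs_type rs_bind rs_vis rs_ndx rs_name; do
        # Keep only functions that are defined in this binary.
        [ "$rs_type" = "FUNC" ] || continue
        [ "$rs_ndx" != "UND" ] || continue
        rs_start=$((0x$rs_value))
        rs_end=$((rs_start + rs_size))
        if [ "$rs_addr" -ge "$rs_start" ] && [ "$rs_addr" -lt "$rs_end" ]; then
            printf '%s\n' "$rs_name"
            return 0
        fi
    done
    return 1
}

# Two rows copied from the symbol table above; 4004f2 is an invented sample.
resolve_symbol 4004f2 <<'EOF'
    46: 00000000004004f8    11 FUNC    GLOBAL DEFAULT   13 func1
    60: 00000000004004ed    11 FUNC    GLOBAL DEFAULT   13 main
EOF
```

Here main starts at 0x4004ed and is 11 bytes long, so the sample at 0x4004f2 is attributed to main. On a real binary the input would be piped in: readelf -s app | resolve_symbol 4004f2.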

Generally we can list 3 types of profilers:

- Instrumented profiling
- Sampling profiling
- Event-based profiling (Java, .Net, ...)

INSTRUMENTED PROFILING

Gprof, Callgrind, ...

Pros
- Captures all events
- Fine granularity

Cons
- Slower than raw execution (20 times slower for callgrind)
- Intrusive (modifies the assembly code or emulates a virtual processor)
- What they capture and what they show can differ

TOOLING - CALLGRIND

- Callgrind is a call-graph analyzer that comes with Valgrind.
- Valgrind is a virtual machine using just-in-time (JIT) compilation techniques.

EXAMPLE WITH A MATRIX CALCULUS

You can instrument your execution with callgrind and explore the result in kcachegrind.

SAMPLING PROFILING

Perf, Oprofile, Intel VTune, ...

Pros
- Only ~5 to 10% slower than raw execution
- Runs on any code

Cons
- Some events are invisible

SANDBOX - WRITE MY OWN SAMPLING PROFILER

To understand how simple a sampling profiler is, write your own thread dump using gdb.

    gstack() {
        tmp=$(mktemp)
        echo "thread apply all bt" > "$tmp"
        gdb -batch -nx -q -x "$tmp" -p "$1"
        rm -f "$tmp"
    }

Run it at a regular interval to see where your program is spending its time:

    while sleep 1; do gstack @pid@; done
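To turn those raw dumps into numbers, one option is to count which function sits at the top of the stack (frame #0) in each sample. This is a rough sketch: top_frames is a hypothetical helper, the backtrace lines below are invented sample data in gdb's usual output style, and every thread is counted the same way.

```shell
# Count how often each function is the innermost frame (#0) across all
# backtraces read on stdin, most frequent first.
top_frames() {
    awk '
    $1 == "#0" {
        # gdb prints either "#0  func (...)" or "#0  0xADDR in func (...)".
        name = ($3 == "in") ? $4 : $2
        hits[name]++
    }
    END { for (name in hits) print hits[name], name }
    ' | LC_ALL=C sort -k1,1nr -k2
}

# Invented sample: three dumps of a single-threaded program.
top_frames <<'EOF'
#0  dot (a=0x602010, b=0x602030, n=3) at app.c:10
#1  0x0000000000400731 in mat_mul (a=0x602010, b=0x602030) at app.c:25
#0  0x00000000004006d4 in dot (a=0x602010, b=0x602030, n=3) at app.c:11
#0  mat_mul (a=0x602010, b=0x602030) at app.c:22
EOF
```

With the sample above, dot is on top in two dumps out of three and mat_mul in one. The more samples you collect, the closer these ratios get to the real share of time spent in each function.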

TOOLING - PERF & FLAMEGRAPH

- Perf instrumentation is available on Linux 2.6+ (Ubuntu 11.10 & Red Hat 6)
- Common interface for hardware counters
- Flamegraph is actively developed by Brendan Gregg

EXAMPLE WITH A MATRIX CALCULUS

[Flame graph; visible frames: app, __libc_start_main, main, dot, mat_mul]

We don't have any time recorded for mat_new, even though it is called 3 times.

FLAMEGRAPH INSTALLATION

    git clone https://github.com/brendangregg/FlameGraph.git
    cd FlameGraph
    sudo ln -s $PWD/flamegraph.pl /usr/bin/flamegraph.pl
    sudo ln -s $PWD/stackcollapse-perf.pl /usr/bin/stackcollapse-perf.pl
    sudo ln -s $PWD/stackcollapse-jstack.pl /usr/bin/stackcollapse-jstack.pl
    sudo ln -s $PWD/stackcollapse-gdb.pl /usr/bin/stackcollapse-gdb.pl

WHAT HAPPENS INSIDE MEMCACHED?

COMPILE MEMCACHED

    git clone https://github.com/memcached/memcached.git
    cd memcached
    ./configure && make

WHAT'S HIDDEN INSIDE THE MEMCACHED BINARY?

    readelf -s ./memcached

    ...
    434: 000000000040edf0    10 FUNC    GLOBAL DEFAULT   13 slabs_rebalancer_resume
    435: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND setuid@@GLIBC_2
    436: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND event_base_loop
    437: 0000000000412fd0   315 FUNC    GLOBAL DEFAULT   13 pause_threads
    438: 00000000004135e0    10 FUNC    GLOBAL DEFAULT   13 STATS_LOCK
    439: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getaddrinfo@@GLIBC_2
    440: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strerror@@GLIBC_2
    441: 000000000040f550   201 FUNC    GLOBAL DEFAULT   13 do_item_unlink
    442: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND event_init
    443: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND sleep@@GLIBC_2
    444: 0000000000412b40   247 FUNC    GLOBAL DEFAULT   13 assoc_delete
    ...

WHAT HAPPENS WHEN I WRITE 100 RECORDS TO MEMCACHED

- Run a test with valgrind (not production friendly)
- Capture CPU usage with gdb
- Capture CPU usage with perf_event
- Capture cache misses with perf_event

MEMCACHED - PROFILING WITH CALLGRIND

Understand what happens internally by following the execution trace.

    valgrind --tool=callgrind --instr-atstart=no ./memcached

In another terminal:

    callgrind_control -i on
    php memcache-set.php
    callgrind_control -i off

MEMCACHED - PROFILING WITH CALLGRIND

    kcachegrind callgrind.out.@pid@

MEMCACHED - PROFILING WITH GDB

    ./memcached &

    while sleep 0.1; do gstack 8748; done > stack.txt
    cat stack.txt | stackcollapse-gdb.pl | flamegraph.pl > gdb_graph.svg

Here 8748 is the PID of the memcached process.

In another terminal:

    php memcache-set.php
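The stackcollapse step turns multi-line backtraces into the "folded" format that flamegraph.pl reads: one line per distinct stack, frames joined by ";" from the outermost caller to the innermost function, followed by the number of samples. The sketch below is a simplified stand-in for that step, not the actual stackcollapse-gdb.pl logic; fold_stacks and the sample backtraces are invented for illustration.

```shell
# Fold gdb backtraces read on stdin into "root;...;leaf count" lines.
fold_stacks() {
    awk '
    # Record the stack collected so far, outermost frame first.
    function emit(   i, s) {
        if (n == 0) return
        s = frame[n]
        for (i = n - 1; i >= 1; i--) s = s ";" frame[i]
        count[s]++
        n = 0
    }
    /^#[0-9]+/ {
        # Frame #0 marks the start of a new backtrace.
        if ($1 == "#0") emit()
        n++
        frame[n] = ($3 == "in") ? $4 : $2
    }
    END { emit(); for (s in count) print s, count[s] }
    ' | LC_ALL=C sort
}

# Invented sample: three dumps, two of them with the same stack.
fold_stacks <<'EOF'
#0  dot (a=0x602010, b=0x602030) at app.c:10
#1  0x0000000000400731 in mat_mul (a=0x602010, b=0x602030) at app.c:25
#2  0x00000000004007f1 in main () at app.c:40
#0  mat_mul (a=0x602010, b=0x602030) at app.c:22
#1  0x00000000004007f1 in main () at app.c:40
#0  0x00000000004006d4 in dot (a=0x602010, b=0x602030) at app.c:11
#1  0x0000000000400731 in mat_mul (a=0x602010, b=0x602030) at app.c:25
#2  0x00000000004007f1 in main () at app.c:40
EOF
```

The sample folds into two lines, "main;mat_mul 1" and "main;mat_mul;dot 2". Lines of this shape are what gets piped into flamegraph.pl.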

MEMCACHED - PROFILING WITH PERF

We capture events to build the call graph.

    perf record -g ./memcached

In another terminal:

    php memcache-set.php

To show an interactive report:

    perf report

Or a plain-text report:

    perf report --stdio

MEMCACHED - PROFILING CPU CYCLES WITH PERF

    perf script | stackcollapse-perf.pl | flamegraph.pl > graph_stack_missing.svg

[Flame graph]

Some information from the kernel is missing.

MEMCACHED - PROFILING CPU CYCLES WITH PERF - WITH KERNEL STACKTRACE

    ./memcached &
    sudo perf record -a -g -p @pid@

In another terminal:

    php memcache-set.php

Generate the flamegraph:

    perf script | stackcollapse-perf.pl | flamegraph.pl > graph.svg

[Flame graph]
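When reading the graph, the width of a box is the number of samples in which that function appears somewhere in the stack. The same numbers can be computed straight from the folded stacks, as in this sketch. frame_widths is a hypothetical helper and the three folded lines are invented sample data.

```shell
# Read folded stacks ("a;b;c count") on stdin and print, for each function,
# the total number of samples in which it appears, widest first.
frame_widths() {
    awk '
    {
        n = $NF
        stack = $0
        sub(/ [0-9]+$/, "", stack)      # drop the trailing sample count
        k = split(stack, f, ";")
        split("", seen)                 # reset the per-line set
        for (i = 1; i <= k; i++) {
            # Count a function once per stack, even if it recurses.
            if (!(f[i] in seen)) {
                seen[f[i]] = 1
                total[f[i]] += n
            }
        }
    }
    END { for (name in total) print total[name], name }
    ' | LC_ALL=C sort -k1,1nr -k2
}

# Invented sample data in folded format.
frame_widths <<'EOF'
main;mat_mul;dot 6
main;mat_mul 3
main 1
EOF
```

For the sample, main appears in all 10 samples, mat_mul in 9 and dot in 6, so main spans the full width of the graph and dot a little over half of it. On a real capture the input would come from perf script | stackcollapse-perf.pl.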

MEMCACHED - PROFILING CACHE MISSES WITH PERF

    ./memcached &
    sudo perf record -e cache-misses -a -g -p @pid@

SYSTEM - WHAT IS YOUR SYSTEM DOING?

    sudo perf record -a -g

USE FLAMEGRAPH WITH JAVA

You can export a flamegraph from jstack output.

Logstash contention flamegraph

TO SUMMARIZE

Prefer:

- perf when you are looking for a bottleneck or want to watch what happens on a machine
- callgrind when you want to understand what happens in the code and performance is not a requirement
