deep into your applications, performance & profiling

DEEP INTO YOUR APPLICATION ... PERFORMANCE & PROFILING / Fabien Arcellier @farcellier


Post on 14-Apr-2017


TRANSCRIPT

Page 1: Deep into your applications, performance & profiling

DEEP INTO YOUR APPLICATION ... PERFORMANCE & PROFILING

/ Fabien Arcellier @farcellier

ABOUT @FARCELLIER
Technical Architect, Developer, Life-long learner at Octo Technology

Favourite subjects: DevOps, Performance & Software craftsmanship

WHAT'S ON THE MENU
What does profiling an application mean?
How does it work?
Applying it to a real-world application: memcached

PROFILING IN A FEW WORDS ...

Software profiling is a form of dynamic program analysis that measures, for example:

the space or time complexity of a program
the usage of particular instructions
the frequency and duration of function calls, ...

@copyright wikipedia

TO GET THIS SORT OF REPORT ...

TO GET A BETTER VIEW OF WHAT HAPPENS ON YOUR HARDWARE, ...

@copyright highscalability

TO IMPROVE YOUR APPLICATION PERFORMANCE, ...

@copyright macifcourseaularge

You need measurements to continuously improve your application's performance.

TO UNDERSTAND YOUR APPLICATION, ...

You want to understand what is consuming your CPU.

TO MONITOR YOUR SERVER, ...

[Figure: flame graph with frames app, __libc_start_main, main, dot, mat_mul]

You want to understand what your CPUs are doing.

AT THE BEGINNING THERE IS A PROGRAM ...

int main(void) { return 0; }

int func1(void) { return 0; }

Use gcc to compile it:
gcc app.c -o app

WITH A SIMPLE SYMBOL TABLE ...
readelf - displays information about ELF files

readelf -s app

    45: 0000000000400580     2 FUNC    GLOBAL DEFAULT   13 __libc_csu_fini
    46: 00000000004004f8    11 FUNC    GLOBAL DEFAULT   13 func1
    ...
    57: 0000000000601040     0 NOTYPE  GLOBAL DEFAULT   25 _end
    58: 0000000000400400     0 FUNC    GLOBAL DEFAULT   13 _start
    59: 0000000000601038     0 NOTYPE  GLOBAL DEFAULT   25 __bss_start
    60: 00000000004004ed    11 FUNC    GLOBAL DEFAULT   13 main
    ...

00000000004004ed : virtual address of the symbol
FUNC : type of the symbol
main : name of the symbol

HOW DOES IT WORK?

60: 00000000004004ed 11 FUNC GLOBAL DEFAULT 13 main

CAPTURE EVENTS AND ASSOCIATE THEM WITH SYMBOLS

Generally, we can list 3 types of profilers:

Instrumented profiling
Sampling profiling
Event-based profiling (Java, .NET, ...)
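
Associating a captured event with a symbol comes down to a range lookup in the symbol table: find the function whose address range contains the captured address. A minimal sketch, assuming input in `readelf -s` format; `resolve_addr` is a hypothetical helper written for illustration, not part of any profiler:

```sh
# resolve_addr ADDR: read `readelf -s` lines on stdin and print the name of
# the FUNC symbol whose range [value, value + size) contains ADDR.
# ADDR is hexadecimal without the 0x prefix. Hypothetical helper.
resolve_addr() {
  ra_addr=$((0x$1))
  while read -r ra_num ra_value ra_size ra_type ra_bind ra_vis ra_ndx ra_name; do
    # Keep only functions that are defined in this binary.
    [ "$ra_type" = "FUNC" ] || continue
    [ "$ra_ndx" = "UND" ] && continue
    ra_start=$((0x$ra_value))
    ra_end=$(($ra_start + $ra_size))
    if [ "$ra_addr" -ge "$ra_start" ] && [ "$ra_addr" -lt "$ra_end" ]; then
      echo "$ra_name"
      return 0
    fi
  done
  return 1   # no function contains this address
}
```

It can be fed directly, for example `readelf -s app | resolve_addr 4004f0`. With the symbol table shown earlier, that address falls inside main, which starts at 4004ed and is 11 bytes long.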

INSTRUMENTED PROFILING
Gprof, Callgrind, ...

Pros
Capture all events
Granularity

Cons
Slower than raw execution (20 times slower for callgrind)
Intrusive (modifies the assembly code or emulates a virtual processor)
What they capture and what they show may differ

TOOLING - CALLGRIND
Callgrind is a call-graph analyzer that comes with Valgrind.
Valgrind is a virtual machine using just-in-time (JIT) compilation techniques.

EXAMPLE WITH A MATRIX CALCULATION

You can instrument your execution with callgrind and explore the result in kcachegrind.

SAMPLING PROFILING
Perf, OProfile, Intel VTune, ...

Pros
Only ~5 to 10% slower than raw execution
Runs on any code

Cons
Some events are invisible

SANDBOX - WRITE MY OWN SAMPLING PROFILER

To understand how simple a sampling profiler is, write your own thread dump using gdb.

gstack() {
  # Write the gdb command to a temporary script file.
  tmp=$(mktemp)
  echo "thread apply all bt" > "$tmp"
  # Attach to the process given as $1, dump every thread's backtrace, detach.
  gdb -batch -nx -q -x "$tmp" -p "$1"
  rm -f "$tmp"
}

Run it at a regular interval to see where your program is spending its time:

while sleep 1; do gstack @pid@; done
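
Once the loop has produced a few dumps, counting how often each function shows up as the innermost frame (#0) already gives a rough profile. A sketch, assuming gdb's usual backtrace layout (`#0  0x... in func (...)` or `#0  func (...) at file:line`); `hot_frames` is a hypothetical helper:

```sh
# hot_frames: read gdb "thread apply all bt" dumps on stdin and print
# "<samples> <function>" for the innermost frame of every thread dump,
# most frequent first. Hypothetical helper, simplified parsing.
hot_frames() {
  # Frame lines look like "#0  0x... in name (...)" or "#0  name (...)".
  awk '$1 == "#0" { print ($3 == "in" ? $4 : $2) }' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'
}
```

For example, `while sleep 1; do gstack @pid@; done | hot_frames` prints the counts once the loop is interrupted. The more samples are collected, the closer the counts get to the real time distribution.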

TOOLING - PERF & FLAMEGRAPH
Perf instrumentation appeared in Linux 2.6+ (Ubuntu 11.10 & Red Hat 6)
Common interface for hardware counters
Flamegraph is actively developed by Brendan Gregg

EXAMPLE WITH A MATRIX CALCULATION

[Figure: flame graph with frames app, __libc_start_main, main, dot, mat_mul]

No time is recorded for mat_new, even though it is called 3 times.

FLAMEGRAPH INSTALLATION

git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph
sudo ln -s "$PWD/flamegraph.pl" /usr/bin/flamegraph.pl
sudo ln -s "$PWD/stackcollapse-perf.pl" /usr/bin/stackcollapse-perf.pl
sudo ln -s "$PWD/stackcollapse-jstack.pl" /usr/bin/stackcollapse-jstack.pl
sudo ln -s "$PWD/stackcollapse-gdb.pl" /usr/bin/stackcollapse-gdb.pl

WHAT HAPPENS INSIDE MEMCACHED?

COMPILE MEMCACHED

git clone https://github.com/memcached/memcached.git
cd memcached
./autogen.sh && ./configure && make

WHAT'S HIDDEN INSIDE THE MEMCACHED BINARY?

readelf -s ./memcached

    ...
    434: 000000000040edf0    10 FUNC    GLOBAL DEFAULT   13 slabs_rebalancer_resume
    435: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND setuid@@GLIBC_2
    436: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND event_base_loop
    437: 0000000000412fd0   315 FUNC    GLOBAL DEFAULT   13 pause_threads
    438: 00000000004135e0    10 FUNC    GLOBAL DEFAULT   13 STATS_LOCK
    439: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getaddrinfo@@GLIBC_2
    440: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strerror@@GLIBC_2
    441: 000000000040f550   201 FUNC    GLOBAL DEFAULT   13 do_item_unlink
    442: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND event_init
    443: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND sleep@@GLIBC_2
    444: 0000000000412b40   247 FUNC    GLOBAL DEFAULT   13 assoc_delete
    ...

WHAT HAPPENS WHEN I WRITE 100 RECORDS TO MEMCACHED

Doing a test with valgrind (not production friendly)
Capture CPU usage with gdb
Capture CPU usage with perf_event
Capture cache misses with perf_event
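
The memcache-set.php script used to generate the load is not shown in the slides. As a stand-in, the sketch below writes records through memcached's text protocol (`set <key> <flags> <exptime> <bytes>`, followed by the data). The key and value names, the helper name `memcache_set_commands`, and the use of `nc` against the default port 11211 are all assumptions:

```sh
# memcache_set_commands [N]: print N "set" commands (default 100) in the
# memcached text protocol, followed by "quit". Hypothetical helper.
memcache_set_commands() {
  ms_n=${1:-100}
  ms_i=1
  while [ "$ms_i" -le "$ms_n" ]; do
    ms_value="value-$ms_i"
    # set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    printf 'set key-%d 0 0 %d\r\n%s\r\n' "$ms_i" "${#ms_value}" "$ms_value"
    ms_i=$((ms_i + 1))
  done
  printf 'quit\r\n'
}

# Usage, assuming memcached listens on 127.0.0.1:11211 and nc is installed:
#   memcache_set_commands 100 | nc 127.0.0.1 11211
```

Sending plain protocol text keeps the load generator free of any client library, so whatever the profiler records comes from memcached itself.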

MEMCACHED - PROFILING WITH CALLGRIND

Understand what happens internally by following the execution trace.

valgrind --tool=callgrind --instr-atstart=no ./memcached

In another terminal:
callgrind_control -i on
php memcache-set.php
callgrind_control -i off

MEMCACHED - PROFILING WITH CALLGRIND

kcachegrind callgrind.out.@pid@

MEMCACHED - PROFILING WITH GDB

./memcached &

while sleep 0.1; do gstack @pid@; done > stack.txt
cat stack.txt | stackcollapse-gdb.pl | flamegraph.pl > gdb_graph.svg

In another terminal:
php memcache-set.php

MEMCACHED - PROFILING WITH PERF
We capture events to build the call graph.

perf record -g ./memcached

In another terminal:
php memcache-set.php

To show an interactive report, or a plain-text one with --stdio:
perf report
perf report --stdio

MEMCACHED - PROFILING CPU CYCLES WITH PERF

perf script | stackcollapse-perf.pl | flamegraph.pl > graph_stack_missing.svg

Flamegraph

Some information from the kernel is missing.
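
stackcollapse-perf.pl turns each sampled stack into one line of the form `frame1;frame2;...;frameN count`, which flamegraph.pl then draws. The same folded lines can be summarized without a graph. A sketch, with `leaf_samples` as a hypothetical helper:

```sh
# leaf_samples: read folded stacks ("a;b;c <count>") on stdin and print
# "<samples> <innermost frame>", largest first. Hypothetical helper.
leaf_samples() {
  awk '{
    count = $NF                     # the last field is the sample count
    stack = $0
    sub(/ [0-9]+$/, "", stack)      # drop the trailing count
    n = split(stack, frames, ";")   # frames[n] is the innermost frame
    samples[frames[n]] += count
  }
  END { for (f in samples) print samples[f], f }' | sort -rn
}
```

A possible use is `perf script | stackcollapse-perf.pl | leaf_samples | head`, which lists the functions that were on the CPU most often when a sample was taken.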

MEMCACHED - PROFILING CPU CYCLES WITH PERF - WITH KERNEL STACKTRACE

./memcached &
sudo perf record -a -g -p @pid@

In another terminal:
php memcache-set.php

Generate the flamegraph:
perf script | stackcollapse-perf.pl | flamegraph.pl > graph.svg

Flamegraph

MEMCACHED - PROFILING CACHE MISSES WITH PERF

./memcached &
sudo perf record -e cache-misses -a -g -p @pid@

SYSTEM - WHAT IS YOUR SYSTEM DOING?

sudo perf record -a -g

USE FLAMEGRAPH WITH JAVA
You can export a flamegraph from jstack output

Logstash contention flamegraph

TO SUMMARIZE
Prefer:

perf when you are looking for a bottleneck or want to watch what is happening on a machine
callgrind when you want to understand what happens in the code and performance is not a requirement
