mae mostafa

Upload: pradeep-c

Post on 06-Apr-2018

232 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/3/2019 MAE Mostafa

    1/40

    MAE Presentation - Nagy Mostafa 1

    Software Performance Profiling

    Major Area Presentation

    Nagy Mostafa6-5-2008

    Committee Members

    Chandra Krintz (chair)Tim SherwoodTevfik Bultan

  • 8/3/2019 MAE Mostafa

    2/40

    MAE Presentation - Nagy Mostafa 2

    Motivation

    Application

    Hardware

    Application Server

    Virtual Machine

    O.S.

    Thousands of lines of code Several different-language modules Numerous features and usage scenarios

    Heterogeneous Prefetching, data/control-flow speculation,pipelining, caching, etc.

  • 8/3/2019 MAE Mostafa

    3/40

    MAE Presentation - Nagy Mostafa 3

    Motivation Static analysis fails to capture runtime-

    behavior because S/W and H/W are complex S/W H/W interaction is hard to understand

    Application usage scenarios are hard topredict

    Solution:Dynamic analysis based on run-time information

  • 8/3/2019 MAE Mostafa

    4/40

    MAE Presentation - Nagy Mostafa 4

    Profiling Profiling: investigation of program behavior using run-

    time information

    Profiler: conceptual module that collects/analysis run-time data Profile: a set of frequencies associated with run-time

    events

    Profile usage examples: Feedback-directed optimization [Calder 99] [Burrows 00] Debugging and bug-isolation [Liblit 03]

    Coverage testing [Tikir 02] Understanding program/architecture interaction

    [Hauswirth 04]

  • 8/3/2019 MAE Mostafa

    5/40

    MAE Presentation - Nagy Mostafa 5

    Profiling Desired qualities of a profiler:

    Accurate Low-overhead

    Fast convergence Flexible Portable

    Transparent Low storage overhead

  • 8/3/2019 MAE Mostafa

    6/40

    MAE Presentation - Nagy Mostafa 6

    Outline Types of profiles

    Profiling techniques and implementations

    Hybrid profiling systems

    Software profiling systems

    Path profiling

  • 8/3/2019 MAE Mostafa

    7/40

    MAE Presentation - Nagy Mostafa 7

    Types of profiles Point profile

    Events are simple and independente.g.

    basic block, edge, method, cache-misses,

    Context profileEvents are composed of simpler ordered eventse.g.

    call-context is a sequence of method invocationsexecution paths is a sequence of edges in a CFG

  • 8/3/2019 MAE Mostafa

    8/40

    MAE Presentation - Nagy Mostafa 8

    Profiling techniquesExhaustive instrumentation

    Sampling

    Instrumentation sampling,

    temporary instrumentation,

    A c c ur a t e

    L i gh t -W ei gh t

  • 8/3/2019 MAE Mostafa

    9/40

    MAE Presentation - Nagy Mostafa 9

    Profiler implementations Hybrid (HW-assisted)

    1. Hardware Performance Monitors (HPMs)2. Dedicated HW collectors that deliver data to

    SW module

    Fixed, low-overhead

    Software Pure software implementations

    Portable, flexible, high-overhead

  • 8/3/2019 MAE Mostafa

    10/40

    MAE Presentation - Nagy Mostafa 10

    Outline Types of profiles

    Profiling techniques and implementations

    Hybrid profiling systems

    Software profiling systems

    Path profiling

  • 8/3/2019 MAE Mostafa

    11/40

    MAE Presentation - Nagy Mostafa 11

    Hybrid profiling systems HPM-based system-wide profilers

    Event-based sampling Support multiple events Profile unmodified binaries Low overhead May be used in production environment

    E.g. DCPI [Anderson 97] , OProfile, VTune

  • 8/3/2019 MAE Mostafa

    12/40

    MAE Presentation - Nagy Mostafa 12

    Hybrid profiling systems HPM-sampling for dynamic compilation

    [Buytaert 07] HPM-sampling for JikesRVM

    More accurate, converges faster, less overhead

    Speed-up by 5 - 18%

    Online optimizations using HPMs[Schneider 07]

    Cache-misses profiling Co-allocation of objects O.H. < 1%, speed-up 14%

    Class A Class B

    parent child

  • 8/3/2019 MAE Mostafa

    13/40

    MAE Presentation - Nagy Mostafa 13

    Hybrid profiling systems Rapid profiling via stratified sampling [Sastry 01]

    Data stream is divided into disjoint strata

    Reduces size of output stream and improvesaccuracy

    O.H. 4.5%, accuracy 97%

    P

    PP

    P

    .

    .

    .

    HEvents

    stream

    Samples

    Periodic Sampler

    Stratified sampling

    Hasher Samples

  • 8/3/2019 MAE Mostafa

    14/40

    MAE Presentation - Nagy Mostafa 14

    Hybrid profiling systems

    Phase-aware profiling [Nagpurkar 05]

    Programs exhibit repeating patterns of execution(phases) Profiling a representative of each phase approximates

    full profile

    Phase-change detector is in HW O.H. reduction by 58% over periodic sampling,

    accuracy 95%

    Source: Nagpurkar 05

  • 8/3/2019 MAE Mostafa

    15/40

    MAE Presentation - Nagy Mostafa 15

    Outline Types of profiles

    Profiling techniques and implementations

    Hybrid profiling systems

    Software profiling systems

    Path profiling

  • 8/3/2019 MAE Mostafa

    16/40

    MAE Presentation - Nagy Mostafa 16

    Software profiling systems Dynamic call tree

    Fully describes method invocation duringexecution

    Calling context tree (CCT)

    Aggregates calls with same context

    Source: [Whaley 00]

    Dynamic call treeCalling context tree

  • 8/3/2019 MAE Mostafa

    17/40

    MAE Presentation - Nagy Mostafa 17

    Software profiling systems Calling-context profiling

    Goal: build an approximate calling contexttree (CCT)

    Expensive to collect via instrumentation

    Approximate CCTDynamic call tree

    Source: Whaley 00

  • 8/3/2019 MAE Mostafa

    18/40

    MAE Presentation - Nagy Mostafa 18

    Software profiling systems Sampling-based approach [Whaley 00]

    Builds a partial call-context tree (PCCT)

    Walks the stack on each interrupt up to k frames Overhead 2-4%, Precision more than 90% Disadv:

    Precision is the correlation of PCCTs for differentruns of benchmark (not accuracy)

    Relies on O.S. timer interrupt (Low frequency,slower on faster architectures)

  • 8/3/2019 MAE Mostafa

    19/40

    MAE Presentation - Nagy Mostafa 19

    Software profiling systems Approximating CCT [Arnold 00]

    Uses event-based sampling (method counter)

    No bound on stack depth sampled High accuracy, O.H. not analyzed (expected to be high)

    Timer/event-based sampling [Arnold 05]

    Stride-enabled (mix of timer-based and event-basedsampling), O.H. < 0.3%, Accuracy 60%

    Probabilistic calling context (PCC) [Bond 07] Usage: Residual testing, debugging, anomaly detection Instrumentation-based (Sampling is not suitable) No CCT is maintained O.H. 3%

  • 8/3/2019 MAE Mostafa

    20/40

    MAE Presentation - Nagy Mostafa 20

    Outline Types of profiles

    Profiling techniques and implementations

    Hybrid profiling systems

    Software profiling systems

    Path profiling

  • 8/3/2019 MAE Mostafa

    21/40

    MAE Presentation - Nagy Mostafa 21

    Path profiling Frequency of execution paths in CFG Exponential in the number of CFG nodes

    Hard to collect by sampling Path profile is NOT edge profile

    Source: Ball 96

  • 8/3/2019 MAE Mostafa

    22/40

    MAE Presentation - Nagy Mostafa 22

    Path profiling

    Source: Ball 96

    Efficient path profiling [Ball 96] Compact enumeration of path (Path IDs) O.H. average 31%, maximum 97% Disadv: High overhead, suitable only for offline setting

  • 8/3/2019 MAE Mostafa

    23/40

    MAE Presentation - Nagy Mostafa 23

    Path profiling Online hot path prediction scheme

    [Duesterwald 00 ] Only path head is instrumented instead of full path Next Executing Trail (NET) predicted as hot Hit rate average: 97% when 10% of execution profiled Disadv: Cannot distinguish hot from warm paths

    (false positives)

  • 8/3/2019 MAE Mostafa

    24/40

    MAE Presentation - Nagy Mostafa 24

    Path profiling Selective path profiling (SPP) [Apiwattanapong 02] Instrument only paths of interest

    Usage: Residual testing, profiling different subsetsover multiple copies of deployed software

    Disadv: the set of path is not totally arbitrary

    Preferential path profiling [Vasawni 07] Similar to SPP but paths can be specified arbitrarily O.H. average 15%

  • 8/3/2019 MAE Mostafa

    25/40

    MAE Presentation - Nagy Mostafa 25

    Path profiling Variational path profiling [Perelman 05]

    Offline profiling scheme Collects execution time variability for paths Paths with high variability are good candidates for

    optimization O.H. average: 5%, speedup via simple

    optimizations: 8.5%

    Disadv: Doesnt report constantly slow paths.

  • 8/3/2019 MAE Mostafa

    26/40

    MAE Presentation - Nagy Mostafa 26

    Path profiling From edge profile we can find total flow (frequency)

    upper bound on the flow of each paths lower bound on the flow of each path (definite flow)

    A

    B C

    D

    E F

    G

    50 30

    6020

    Consider ABDEG:Max flow = 50

    Let ACDFG = 0, ACDEG = 30, ABDFG = 20Min flow = 30 (Definite Flow)

  • 8/3/2019 MAE Mostafa

    27/40

    MAE Presentation - Nagy Mostafa 27

    Path profilingA

    I

    B

    C D

    F

    E

    H

    G

    J

    Defining edge

    Defining edgeABDFGHJFG

    ABDFHJFH

    ABDEHJDE

    ABCEHJCEAIJAI

    Obvious pathDefining Edge

    Obvious path:Path whose flow can be driven from edge profile

  • 8/3/2019 MAE Mostafa

    28/40

    MAE Presentation - Nagy Mostafa 28

    Path profiling Targeted path profiling (TPP) [Joshi 04]

    Suitable for staged dynamic optimizers Relies on edge profile to:

    Remove obvious paths Remove cold edges

    O.H. 14%, Acc > 97% Disadv: Compilation O.H. 74%

  • 8/3/2019 MAE Mostafa

    29/40

    MAE Presentation - Nagy Mostafa 29

    Path profiling Practical path profiling (PPP) [Bond 05] Reduces amount of instrumentation than TPP Reduce instrumentation overhead

    Smart path numbering (avoid instrumenting hotedges)

    O.H. average 5%, accuracy average 96% Disadv: Compilation overhead not reported

    (expected to be high)

  • 8/3/2019 MAE Mostafa

    30/40

    MAE Presentation - Nagy Mostafa 30

    Path profilingTwo types of instrumentation:

    Expensive profile updates

    Cheap register updates

  • 8/3/2019 MAE Mostafa

    31/40

    MAE Presentation - Nagy Mostafa 31

    Path profiling Path and edge profiling (PEP) [Bond05]

    Orthogonal approach that samples profileupdates

    Piggybacks on JikesRVM sampling profiler Edge profile guides instrumentation placing O.H. average 1.2%, 94% accuracy

    Disadv: VM-specific

  • 8/3/2019 MAE Mostafa

    32/40

    MAE Presentation - Nagy Mostafa 32

    Conclusion Programs behavior are hard to understand statically

    Dynamic analysis based on runtime data is needed

    Performance profiling investigates runtime behavior

    Profiling techniques range from exhaustive instrum-entation to sampling

    Implementation can be in hardware, software or hybrid

    Context and path profiling techniques

  • 8/3/2019 MAE Mostafa

    33/40

    MAE Presentation - Nagy Mostafa 33

  • 8/3/2019 MAE Mostafa

    34/40

    MAE Presentation - Nagy Mostafa 34

    Reducing profiling cost Ephemeral instrumentation [Traub 00]

    Temporary instrumentation Shadow profiling [Moseley 07]

    O.H. < 1%, Accuracy 94% (value-profiling)

    Profile over adaptive ranges [Mysore 06] Adaptively reducing profile storage overhead

  • 8/3/2019 MAE Mostafa

    35/40

    MAE Presentation - Nagy Mostafa 35

    Reducing profiling cost

    Source: Arnold 01 Source: Hirzel 01

  • 8/3/2019 MAE Mostafa

    36/40

    MAE Presentation - Nagy Mostafa 36

    Usage examples Value profiling and optimization

    [Calder 99] [Burrows 00] Code coverage testing [Tikir 02] Bug isolation [Liblit 03] Understand OO applications [Hauswirth 04] Offline/Online feed-back directed

    optimization

  • 8/3/2019 MAE Mostafa

    37/40

    MAE Presentation - Nagy Mostafa 37

    Path profiling usage example

    Source: Gupta et al. 1998

  • 8/3/2019 MAE Mostafa

    38/40

    MAE Presentation - Nagy Mostafa 38

    Sampling: Triggering mechanism

    Timer-based Easy to implement, Relies on O.S. timer

    interrupt ( e.g. 4 ms on Linux 2.6 = 250 samples/sec) Disadv.: Low sampling frequency, independent of

    processor speed, cannot always correlate withprogram events

    Event-based Relies on event counting in SW or HW (e.g.

    HPMs) Correlates with program events Adapts to processor speed

  • 8/3/2019 MAE Mostafa

    39/40

    MAE Presentation - Nagy Mostafa 39

    Sampling: Collection mechanism Polling-based

    Samples are taken only at polling points (e.g.method entries, back-edges, etc. )

    Disadv.: Not timely Overhead: flag checking Limited accuracy: biased sampling (e.g. polling

    points after long I/O operations)

    Immediate Sample is taken right after interrupt is raised

  • 8/3/2019 MAE Mostafa

    40/40

    MAE Presentation - Nagy Mostafa 40

    Sampling: Collection mechanism(2)

    I/O

    Over-sampled polling pointtime

    Timer-interrupt

    Timer-interrupt

    time

    Arnold-Grove sampling [Arnold 05]

    Sample burst = 3, Stride = 3

    Problem with polling sampling

    .. .. ..