Open Trace Format (OTF) Tutorial

© 2006 Wolfgang E. Nagel, TU Dresden, ZIH. Open Trace Format (OTF) Tutorial. Wolfgang E. Nagel, Holger Brunst, T.U. Dresden, Germany. Sameer Shende, Allen D. Malony, ParaTools, Inc. http://www.vampir-ng.de

Page 1: Open Trace Format (OTF) Tutorial

© 2006 Wolfgang E. Nagel, TU Dresden, ZIH

Open Trace Format (OTF) Tutorial

Wolfgang E. Nagel, Holger Brunst, T.U. Dresden, Germany

Sameer Shende, Allen D. Malony, ParaTools, Inc.

http://www.vampir-ng.de


Page 2: Open Trace Format (OTF) Tutorial

2

Outline

• An overview of OTF, TAU and Vampir/VNG
• OTF
  – Tools
  – API
  – Building trace conversion tools
• TAU
  – Instrumentation
  – Measurement
  – Analysis
• Scalable Tracing
  – Vampir
  – VNG
  – OTF

Page 3: Open Trace Format (OTF) Tutorial

3

Tutorial Goals

• This tutorial is intended as an introduction to OTF tools.
• Today you should leave here with a better understanding of…
  – OTF API and tools
  – Steps involved in building a trace conversion tool to target OTF
  – How to instrument your programs with TAU to generate OTF
    – Automatic instrumentation at the routine level and outer loop level
    – Manual instrumentation at the loop/statement level
  – Measurement options provided by TAU
  – Environment variables used for choosing metrics, generating performance data
  – How to use the Vampir and VNG tools
  – Nature and types of visualization that VNG provides for visualizing OTF traces

Page 4: Open Trace Format (OTF) Tutorial

4

Vampir: Technical Components

[Figure: Vampir server diagram. Labels: Server, Master, Worker 1, Worker 2, … Worker m, Trace 1, Trace 2, Trace 3, … Trace N]

Tools

1. Trace generator

2. Classical Vampir viewer and analyzer

3. Vampir client viewer

4. Parallel server engine

5. Conversion and analysis tools

Page 5: Open Trace Format (OTF) Tutorial

5

Many Trace Formats to choose from …

Page 6: Open Trace Format (OTF) Tutorial

6

OTF Features

• Fast and efficient sequential and parallel access
• Platform independent
• Selective access to
  – Processes
  – Time intervals
• API / Interfaces
  – High level interface for analysis tools
    – Read/write complete traces with multiple files
    – Supports filtering and parallel I/O
  – Low level interface for trace libraries

Page 7: Open Trace Format (OTF) Tutorial

7

Relative File Size

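[Bar chart: relative trace file size (y-axis "Relative Size", 0 to 3) for the STF, VTF, OTF and OTFZ formats, measured on three traces: SMG 98 (18 MB), IRS (1.8 GB) and SMG2000 (2.3 GB). An arrow labeled "Better" marks the preferred direction.]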

Page 8: Open Trace Format (OTF) Tutorial

8

Read Performance

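[Bar chart: read performance in Mevents/s (y-axis 0 to 3.5) for the STF, VTF, OTF and OTFZ formats, measured on three traces: SMG 98 (18 MB), IRS (1.8 GB) and SMG2000 (2.3 GB). An arrow labeled "Better" marks the preferred direction.]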

Page 9: Open Trace Format (OTF) Tutorial

9

Performance Scalability

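[Line chart, logarithmic y-axis: performance in Mevents/s (1 to 1000) for VTF, STF, OTF and OTFZ at x-axis values 1, 4, 16, 64 and 256. An arrow labeled "Better" marks the preferred direction.]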

Page 10: Open Trace Format (OTF) Tutorial

10

Vampir Server Workflow

[Figure: workflow diagram. Labels: Parallel Program; Monitor System (TAU/Kojak); Event Streams; File System with Trace 1, Trace 2, Trace 3, … Trace N; Merged Traces; Classic Analysis: monolithic, sequential; Analysis Server with Master and Worker 1, Worker 2, … Worker m; Process; Parallel I/O; Message Passing; Internet; Visualization Client showing a Segment Indicator, a 768 Processes Thumbnail and a Timeline with 16 visible Traces]

Page 11: Open Trace Format (OTF) Tutorial

11

Organization of Parallel Analysis

[Figure: component diagram. Worker (M workers, N session threads each): Session Thread, Analysis Module, Event Databases, Message Passing, Trace Format Driver. Master (N session threads): Session Thread, Analysis Merger, Endian Conversion, Message Passing, Socket Communication. Other labels: Traces, Visualization Client, Worker 1, Worker 2, … Worker m]

Page 12: Open Trace Format (OTF) Tutorial

12

Scalability – sPPM Analyzed on Origin 2000

• sPPM ASCI Benchmark
  – 3D Gas Dynamic
  – Data to be analyzed
• 16 Processes
  – 200 MByte Volume

[Line chart: Speedup (0 to 18) versus Number of Workers (0 to 40) for Com. Matrix, Timeline, Summary Profile, Process Profile, Stack Tree and Load Time]

Number of Workers    1      2      4      8      16     32
Load Time            47,33  22,48  10,80  5,43   3,01   3,16
Timeline             0,10   0,09   0,06   0,08   0,09   0,09
Summary Profile      1,59   0,87   0,47   0,30   0,28   0,25
Process Profile      1,32   0,70   0,38   0,26   0,17   0,17
Com. Matrix          0,06   0,07   0,08   0,09   0,09   0,09
Stack Tree           2,57   1,39   0,70   0,44   0,25   0,25
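The speedup curve for this slide can be recomputed from the load times listed for 1 to 32 workers. The following is a minimal sketch, assuming all values are times in the same unit and that speedup means the 1-worker time divided by the n-worker time; the function name `speedup` is only illustrative.

```python
# Load times copied from the slide (decimal commas converted to points).
# Assumption: all values are times in the same unit.
workers = [1, 2, 4, 8, 16, 32]
load_time = [47.33, 22.48, 10.80, 5.43, 3.01, 3.16]


def speedup(times):
    """Return T(1)/T(n) for each entry, taking the first entry as the baseline."""
    base = times[0]
    return [base / t for t in times]


load_speedup = speedup(load_time)
for n, s in zip(workers, load_speedup):
    # Efficiency is the speedup divided by the number of workers.
    print(f"{n:2d} workers: speedup {s:5.2f}, efficiency {s / n:4.2f}")
```

With these numbers the load-time speedup grows almost linearly up to 16 workers and drops slightly at 32.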

Page 13: Open Trace Format (OTF) Tutorial

13

A Fairly Large Test Case

• IRS ASCI Benchmark
  – Implicit Radiation Solver
• Data to be analyzed:
  – 64 Processes in 8 Streams
  – Approx. 800.000.000 Events
  – 40 GByte Data Volume
• Analysis Platform:
  – Jump.fz-juelich.de
  – 41 IBM p690 nodes (32 processors per node)
  – 128 GByte per node
• Visualization Platform:
  – Remote Laptop

[Bar chart: Processing Times in Seconds (0 to 10) for Timeline, Summary Prof., Process Prof., Com. Matrix and Stack Tree, with one bar each for 16 Worker and 32 Worker. Data labels in extraction order: 0,02 0,02 4,65 3,62 9,11 4,67 0,16 0,09 5,59 3,84]

Page 14: Open Trace Format (OTF) Tutorial

14

Outline

• An overview of OTF, TAU and Vampir/VNG
• OTF
  – Tools
  – API
  – Building trace conversion tools
• TAU
  – Instrumentation
  – Measurement
  – Analysis
• Scalable Tracing
  – Vampir
  – VNG
  – OTF

Page 15: Open Trace Format (OTF) Tutorial

15

OTF Trace Generation and Analysis Tools

Page 16: Open Trace Format (OTF) Tutorial

16

OTF Contents

• Definition records
  – Map event ids to interval (begin/end) event names
  – Symbols for atomic events
  – Process groups
• Performance events
  – Timestamped events for entering or leaving a state
  – Timestamped counter events (monotonically increasing or not)
• Global master file
  – Mapping processes to streams
• Statistical Summaries
  – Overview over a whole interval of time
• Snapshots
  – Callstack, list of pending messages, etc. at a point in time

Page 17: Open Trace Format (OTF) Tutorial

17

OTF File Hierarchy

Page 18: Open Trace Format (OTF) Tutorial

18

OTF Streams

Page 19: Open Trace Format (OTF) Tutorial

19

otfmerge

• Changes the number of streams of an existing OTF trace
• Optionally includes snapshots and statistics in the merged trace file
• otfmerge - converter program of OTF library.
    otfmerge [Options] <input file name>
    Options:
      -h, --help   show this help message
      -n <n>       set number of streams for output
      -f <n>       set max number of filehandles available
      -o <name>    namestub of the output file (default 'out')
      -rb <size>   set buffersize of the reader
      -wb <size>   set buffersize of the writer
      -stats       cover statistics too
      -snaps       cover snapshots too
      -V           show OTF version

Page 20: Open Trace Format (OTF) Tutorial

20

OTF Tools: otfaux

• otfaux
  – Adds auxiliary snapshot and/or statistics information to the trace file
  – Snapshots include callstack, pending messages, current counter values
  – Statistics include number of calls, exclusive/inclusive time
  – Statistics are monotonically increasing, unlike profiles
  – Original event trace is unmodified
  – Auxiliary data is generated at breakpoints, either periodically or at given ticks
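To make the statistics named above concrete (number of calls, inclusive time, exclusive time), here is a minimal sketch that derives them from a stream of enter/leave events for a single process. The event tuples and the function `function_statistics` are hypothetical and are not OTF records or otfaux code.

```python
from collections import defaultdict


def function_statistics(events):
    """Compute [calls, inclusive, exclusive] per function.

    events: list of (time, kind, function) tuples with kind "enter" or "leave",
    sorted by time, for one process. Hypothetical format, not an OTF record.
    """
    stats = defaultdict(lambda: [0, 0, 0])
    stack = []  # each entry: [function, enter_time, time_spent_in_children]
    for time, kind, func in events:
        if kind == "enter":
            stats[func][0] += 1
            stack.append([func, time, 0])
        else:
            name, start, child_time = stack.pop()
            assert name == func, "unbalanced enter/leave"
            inclusive = time - start
            stats[name][1] += inclusive
            # Exclusive time is inclusive time minus time spent in callees.
            stats[name][2] += inclusive - child_time
            if stack:
                stack[-1][2] += inclusive
    return dict(stats)


trace = [
    (0, "enter", "main"),
    (10, "enter", "solve"),
    (40, "leave", "solve"),
    (50, "leave", "main"),
]
stats = function_statistics(trace)
print(stats)
```

Because the counts and times only ever grow as more events are processed, values taken at successive breakpoints are monotonically increasing, which is the property the slide contrasts with profiles.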

Page 21: Open Trace Format (OTF) Tutorial

21

otfaux

• otfaux - append snapshots and statistics to existing otf traces at given 'break' time stamps
    otfaux [Options] <file name>
    Options:
      -h, --help   show this help message
      -b <size>    buffer size for read and write operations
      -n <n>       number of breaks (distributed regularly); if -p and -t are not set, the default for -n is 200 breaks
      -p <p>       create break every 'p' ticks (if both -n and -p are specified, the one producing more breaks wins)
      -t <t>       define (additional) break at given time stamp
      -F           force overwrite old snapshots and statistics
      -R           delete existing snapshots and statistics only
      -f <n>       max number of filehandles output ...

Page 22: Open Trace Format (OTF) Tutorial

22

otfaux (contd.)

      -g            create functiongroup summaries instead of function summaries
      -v            verbose mode, print break time stamps
      -V            show OTF version
      -a            show advancing progress during operation
      --snapshots   write ONLY snapshots but NO statistics
      --statistics  write ONLY statistics but NO snapshots
      -s a[,b]*     regard given streams only when computing statistics. Expects a single token or comma separated list. This implies the '--statistics' option!
      -l            list existing stream tokens

Page 23: Open Trace Format (OTF) Tutorial

23

tau2otf

• Converts TAU traces to OTF

• tau2otf <TAU trace> <edf file> <out file> [-n streams] [-nomessage] [-z] [-v]

-n <streams> : Specifies the number of output streams (default 1)

-nomessage : Suppress printing of message information in the trace

-z : Enable compression of trace files. By default it is uncompressed.

-v : Verbose

Trace format of <out file> is OTF

% tau2otf merged.trc tau.edf app.otf

Page 24: Open Trace Format (OTF) Tutorial

24

vtf2otf

• Convert VTF traces to OTF format

• vtf2otf [Options] <input file name>
    Options:
      -o <file>   output file
      -f <n>      max count of filehandles
      -n <n>      output stream count
      -b <n>      size of the writer buffer
      -V          show OTF version

Page 25: Open Trace Format (OTF) Tutorial

25

otf2vtf

• Convert OTF trace files to VTF format

• otf2vtf [Options] <input file name>
    Options:
      -o <file>   output file
      -b <n>      size of the reader buffer
      -A          write VTF3 ASCII sub-format (default)
      -B          write VTF3 binary sub-format
      -V          show OTF version

Page 26: Open Trace Format (OTF) Tutorial

26

Building Trace Analysis Tools

• Writing OTF traces in trace conversion tools
  – High level API writes multiple streams
  – Low level API writes a single stream
  – Each OTF file has a prefix (e.g., app.otf)
• Parallel reading and searching in OTF analysis tools
  – Each process in tool reads local and global event definitions
  – Each process reads a subset of events
  – Read summary information to select interesting spots in trace
  – Tool might read a selected time interval for analysis
  – OTF supports efficient binary search
• Tools may support compressed or uncompressed OTF traces
• Tools may support single or multi-stream OTF traces
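The idea of reading only a selected time interval by binary search can be illustrated on a sorted list of timestamps. This is a minimal sketch using Python's standard `bisect` module; it models the technique only and does not use the OTF reader, and `select_interval` is an illustrative name.

```python
from bisect import bisect_left, bisect_right


def select_interval(timestamps, begin, end):
    """Return the index range [lo, hi) of events with begin <= t <= end.

    timestamps must be sorted in ascending order, as events in a stream are.
    """
    lo = bisect_left(timestamps, begin)   # first event at or after begin
    hi = bisect_right(timestamps, end)    # one past the last event at or before end
    return lo, hi


timestamps = [5, 12, 12, 30, 47, 47, 80, 95]
lo, hi = select_interval(timestamps, 12, 47)
selected = timestamps[lo:hi]
print(lo, hi, selected)
```

Each lookup costs O(log n) comparisons, so an analysis tool can jump to an interesting interval without scanning the events that precede it.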

Page 27: Open Trace Format (OTF) Tutorial

27

OTF Trace Writer API - OTF_FileManager_open

• Generates a new file manager with a maximum number of files that are allowed to be open simultaneously

• OTF_FileManager* OTF_FileManager_open( uint32_t number );

#include <otf.h>

OTF_FileManager *manager;
manager = OTF_FileManager_open(256);

Page 28: Open Trace Format (OTF) Tutorial

28

OTF_FileManager_close

• Closes the file manager

• void OTF_FileManager_close( OTF_FileManager* m );

#include <otf.h>

OTF_FileManager_close(manager);

Page 29: Open Trace Format (OTF) Tutorial

29

OTF_Writer_open

• Define file control block for output trace file

• OTF_Writer* OTF_Writer_open( char* fileNamePrefix,uint32_t numberOfStreams,OTF_FileManager* fileManager );

#include <otf.h>

void *fcb = (void *) OTF_Writer_open(out_file, num_streams, manager);

Page 30: Open Trace Format (OTF) Tutorial

30

OTF_Writer_setCompression

• Enable compression if specified by the user

• int OTF_Writer_setCompression( OTF_Writer* writer, OTF_FileCompression);

#include <otf.h>

OTF_Writer_setCompression((OTF_Writer *)fcb, OTF_FILECOMPRESSION_COMPRESSED);

Page 31: Open Trace Format (OTF) Tutorial

31

OTF_Writer_writeDefCreator

• Specify a comment about the creator (trace conversion tool)

• int OTF_Writer_writeDefCreator( OTF_Writer* writer,
      uint32_t stream, /* stream = 0 means global definition */
      const char* creator );

#include <otf.h>

OTF_Writer_writeDefCreator((OTF_Writer *)fcb, 0, "MyTool2otf ver 2.42");

Page 32: Open Trace Format (OTF) Tutorial

32

OTF_Writer_writeDefProcess

• Write a process definition record

• int OTF_Writer_writeDefProcess( OTF_Writer* writer, uint32_t stream,

uint32_t process, const char* name, uint32_t parent );

#include <otf.h>

OTF_Writer_writeDefProcess((OTF_Writer *)fcb, 0, cpuid, name, 0);

Page 33: Open Trace Format (OTF) Tutorial

33

OTF_Writer_writeDefTimerResolution

• Provides the timer resolution. All timestamps are interpreted based on this resolution. The default resolution is 1 microsecond.

• int OTF_Writer_writeDefTimerResolution( OTF_Writer* writer, uint32_t stream, uint64_t ticksPerSecond );

#include <otf.h>

OTF_Writer_writeDefTimerResolution((OTF_Writer*)userData, 0, getTicksPerSecond());

Page 34: Open Trace Format (OTF) Tutorial

34

OTF_Writer_writeDefFunction

• Provide a function definition and specify an event id to name mapping

• int OTF_Writer_writeDefFunction( OTF_Writer* writer, uint32_t stream, uint32_t func, const char* name, uint32_t funcGroup, uint32_t source ); /* specify source code location */

#include <otf.h>

OTF_Writer_writeDefFunction((OTF_Writer*)userData, 0, eventID, (const char *) name, groupID, 0);

Page 35: Open Trace Format (OTF) Tutorial

35

OTF_Writer_writeDefFunctionGroup

• Provides a function group definition

• int OTF_Writer_writeDefFunctionGroup( OTF_Writer* writer, uint32_t stream, uint32_t funcGroup, const char* name );

#include <otf.h>

OTF_Writer_writeDefFunctionGroup((OTF_Writer*)userData, 0, groupId, GroupName);

Page 36: Open Trace Format (OTF) Tutorial

36

OTF_Writer_writeEnter

• Write a function entry record

• int OTF_Writer_writeEnter( OTF_Writer* writer,
      uint64_t time,
      uint32_t function,
      uint32_t process,
      uint32_t source );

#include <otf.h>

OTF_Writer_writeEnter((OTF_Writer*)userData, GetClockTicksInGHz(time), stateid, cpuid, 0);

Page 37: Open Trace Format (OTF) Tutorial

37

OTF_Writer_writeSendMsg

• Write a send message record

• int OTF_Writer_writeSendMsg( OTF_Writer* writer,
      uint64_t time,
      uint32_t sender,
      uint32_t receiver,
      uint32_t procGroup,
      uint32_t tag,
      uint32_t length,
      uint32_t source );

Page 38: Open Trace Format (OTF) Tutorial

38

OTF_Writer_writeRecvMsg

• Write a receive message record

• int OTF_Writer_writeRecvMsg( OTF_Writer* writer,
      uint64_t time,
      uint32_t receiver,
      uint32_t sender,
      uint32_t procGroup,
      uint32_t tag,
      uint32_t length,
      uint32_t source );

Page 39: Open Trace Format (OTF) Tutorial

39

OTF Trace Reader API

• Similar to trace writer API

• Instead of calling write functions, the tool provides handler functions that are invoked as callbacks for each record type, e.g.,

• int OTF_Handler_DefFunction( void* userData,
      uint32_t stream,
      uint32_t func,
      const char* name,
      uint32_t funcGroup,
      uint32_t source );
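The callback style of the reader API can be sketched independently of OTF: a read loop dispatches each record to the handler registered for its type and passes along a user data object. The sketch below is a model under stated assumptions; the record tuples, the dictionary of handlers and `read_records` are hypothetical and do not reproduce the OTF reader interface. Only the handler argument order mirrors the signatures shown on the slides.

```python
def handle_def_function(user_data, stream, func, name, func_group, source):
    """Callback for a function definition: remember the id-to-name mapping."""
    user_data["functions"][func] = name
    return 0


def handle_enter(user_data, time, function, process, source):
    """Callback for an enter event: count calls per function name."""
    name = user_data["functions"][function]
    user_data["calls"][name] = user_data["calls"].get(name, 0) + 1
    return 0


def read_records(records, handlers, user_data):
    """Dispatch each (record_type, fields) pair to its registered handler, if any."""
    handled = 0
    for record_type, fields in records:
        handler = handlers.get(record_type)
        if handler is not None:
            handler(user_data, *fields)
            handled += 1
    return handled


handlers = {"DefFunction": handle_def_function, "Enter": handle_enter}
records = [
    ("DefFunction", (0, 1, "main", 0, 0)),
    ("DefFunction", (0, 2, "solve", 0, 0)),
    ("Enter", (100, 1, 1, 0)),
    ("Enter", (120, 2, 1, 0)),
    ("Enter", (180, 2, 1, 0)),
    ("Leave", (200, 2, 1, 0)),  # no handler registered, so this record is skipped
]
state = {"functions": {}, "calls": {}}
handled = read_records(records, handlers, state)
print(handled, state["calls"])
```

A tool registers handlers only for the record types it cares about; everything else is skipped by the read loop.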

Page 40: Open Trace Format (OTF) Tutorial

40

Building OTF Analysis Tools

• Header files are in <otf-version>/include directory

• Libraries are in <otf-version>/<arch>/lib directory– Support for Zlib (v1.2.3) is included in libotf.a

% g++ -c tool.cpp -I<otf-version>/include
% g++ tool.o -o tool -L<otf-version>/<arch>/lib -lotf

Page 41: Open Trace Format (OTF) Tutorial

41

Outline

• An overview of OTF, TAU and Vampir/VNG
• OTF
  – Tools
  – API
  – Building trace conversion tools
• TAU
  – Instrumentation
  – Measurement
  – Analysis
• Scalable Tracing
  – Vampir
  – VNG
  – OTF

Page 42: Open Trace Format (OTF) Tutorial

42

TAU Parallel Performance System

• http://www.cs.uoregon.edu/research/tau/

• Multi-level performance instrumentation
  – Multi-language automatic source instrumentation
• Flexible and configurable performance measurement
• Widely-ported parallel performance profiling system
  – Computer system architectures and operating systems
  – Different programming languages and compilers
• Support for multiple parallel programming paradigms
  – Multi-threading, message passing, mixed-mode, hybrid
• Integration in complex software, systems, applications

Page 43: Open Trace Format (OTF) Tutorial

43

Using TAU: A brief Introduction

• To instrument source code, choose measurement module:
    % setenv TAU_MAKEFILE /usr/tau-2.16/x86_64/lib/Makefile.tau-mpi-pdt-trace-pgi
  And use tau_f90.sh, tau_cxx.sh or tau_cc.sh as Fortran, C++ or C compilers:
    % mpif90 foo.f90
  changes to
    % tau_f90.sh foo.f90
• Execute application and then run:
    % tau_treemerge.pl
    % tau2otf tau.trc tau.edf app.otf
    % vampir app.otf

Page 44: Open Trace Format (OTF) Tutorial

44

TAU Performance System Architecture

[Figure: TAU performance system architecture diagram; the only surviving label is "event selection"]

Page 45: Open Trace Format (OTF) Tutorial

45

TAU Performance System Architecture

Page 46: Open Trace Format (OTF) Tutorial

46

Program Database Toolkit (PDT)

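[Figure: PDT workflow. Application / library source is processed by the C / C++ parser and the Fortran parser (F77/90/95), each producing an intermediate language (IL) representation. The C / C++ IL analyzer and the Fortran IL analyzer turn the IL into Program Database Files, which are read through DUCTAPE by the following tools:]
  – PDBhtml: program documentation
  – SILOON: application component glue
  – CHASM: C++ / F90/95 interoperability
  – TAU_instr: automatic source instrumentation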

Page 47: Open Trace Format (OTF) Tutorial

47

TAU Instrumentation Approach

• Support for standard program events
  – Routines
  – Classes and templates
  – Statement-level blocks
• Support for user-defined events
  – Begin/End events (“user-defined timers”)
  – Atomic events (e.g., size of memory allocated/freed)
  – Selection of event statistics
• Support definition of “semantic” entities for mapping
• Support for event groups
• Instrumentation optimization (eliminate instrumentation in lightweight routines)

Page 48: Open Trace Format (OTF) Tutorial

48

TAU Instrumentation

• Flexible instrumentation mechanisms at multiple levels
  – Source code
    – manual (TAU API, TAU Component API)
    – automatic
      – C, C++, F77/90/95 (Program Database Toolkit (PDT))
      – OpenMP (directive rewriting (Opari), POMP spec)
  – Object code
    – pre-instrumented libraries (e.g., MPI using PMPI)
    – statically-linked and dynamically-linked
  – Executable code
    – dynamic instrumentation (pre-execution) (DynInstAPI)
    – virtual machine instrumentation (e.g., Java using JVMPI)
    – Python interpreter based instrumentation at runtime
  – Proxy Components

Page 49: Open Trace Format (OTF) Tutorial

49

TAU Measurement Approach

• Portable and scalable parallel profiling solution
  – Multiple profiling types and options
  – Event selection and control (enabling/disabling, throttling)
  – Online profile access and sampling
  – Online performance profile overhead compensation
• Portable and scalable parallel tracing solution
  – Trace translation to Open Trace Format (OTF)
  – Trace streams and hierarchical trace merging
• Robust timing and hardware performance support
• Multiple counters (hardware, user-defined, system)
• Performance measurement for CCA component software

Page 50: Open Trace Format (OTF) Tutorial

50

Using TAU

• Configuration

• Instrumentation
  – Manual
  – MPI: wrapper interposition library
  – PDT: source rewriting for C, C++, F77/90/95
  – OpenMP: directive rewriting
  – Component based instrumentation: proxy components
  – Binary instrumentation
    – DyninstAPI: runtime instrumentation / rewriting binary
    – Java: runtime instrumentation
    – Python: runtime instrumentation

• Measurement

• Performance Analysis

Page 51: Open Trace Format (OTF) Tutorial

51

TAU Measurement System Configuration

• configure [OPTIONS]
    {-c++=<CC>, -cc=<cc>}    Specify C++ and C compilers
    {-pthread, -sproc}       Use pthread or SGI sproc threads
    -openmp                  Use OpenMP threads
    -jdk=<dir>               Specify Java instrumentation (JDK)
    -opari=<dir>             Specify location of Opari OpenMP tool
    -papi=<dir>              Specify location of PAPI
    -pdt=<dir>               Specify location of PDT
    -dyninst=<dir>           Specify location of DynInst Package
    -mpi[inc/lib]=<dir>      Specify MPI library instrumentation
    -shmem[inc/lib]=<dir>    Specify PSHMEM library instrumentation
    -python[inc/lib]=<dir>   Specify Python instrumentation
    -tag=<name>              Specify a unique configuration name
    -epilog=<dir>            Specify location of EPILOG
    -slog2                   Build SLOG2/Jumpshot tracing package
    -otf=<dir>               Specify location of OTF trace package
    -arch=<architecture>     Specify architecture explicitly (bgl, xt3, ibm64, ibm64linux…)

Page 52: Open Trace Format (OTF) Tutorial

52

TAU Measurement System Configuration

• configure [OPTIONS]
    -TRACE               Generate binary TAU traces
    -PROFILE (default)   Generate profiles (summary)
    -PROFILECALLPATH     Generate call path profiles
    -PROFILEPHASE        Generate phase based profiles
    -PROFILEMEMORY       Track heap memory for each routine
    -PROFILEHEADROOM     Track memory headroom to grow
    -MULTIPLECOUNTERS    Use hardware counters + time
    -COMPENSATE          Compensate timer overhead
    -CPUTIME             Use usertime+system time
    -PAPIWALLCLOCK       Use PAPI’s wallclock time
    -PAPIVIRTUAL         Use PAPI’s process virtual time
    -SGITIMERS           Use fast IRIX timers
    -LINUXTIMERS         Use fast x86 Linux timers

Page 53: Open Trace Format (OTF) Tutorial

53

TAU Measurement Configuration – Examples

• ./configure -pdt=/opt/ALTIX/pkgs/pdtoolkit-3.9 -mpi
  – Configure using PDT and MPI with GNU compilers
• ./configure -papi=/usr/local/packages/papi -pdt=/usr/local/pdtoolkit-3.9 -mpiinc=/usr/local/include -mpilib=/usr/local/lib -MULTIPLECOUNTERS -c++=icpc -cc=icc -fortran=intel -tag=intel91039; make clean install
  – Use PAPI counters (one or more) with C/C++/F90 automatic instrumentation. Also instrument the MPI library. Use Intel compilers.
• Typically configure multiple measurement libraries
• Each configuration creates a unique <arch>/lib/Makefile.tau<options> stub makefile. It corresponds to the configuration options used, e.g.,
  – /opt/tau-2.15.5/x86_64/lib/Makefile.tau-icpc-mpi-pdt
  – /opt/tau-2.15.5/x86_64/lib/Makefile.tau-icpc-mpi-pdt-trace

Page 54: Open Trace Format (OTF) Tutorial

54

TAU Measurement Configuration – Examples

% cd /usr/tau-2.16/x86_64/lib; ls Makefile.*pgi

Makefile.tau-pdt-pgi

Makefile.tau-mpi-pdt-pgi

Makefile.tau-callpath-mpi-pdt-pgi

Makefile.tau-mpi-pdt-trace-pgi

Makefile.tau-mpi-compensate-pdt-pgi

Makefile.tau-pthread-pdt-pgi

Makefile.tau-papiwallclock-multiplecounters-papivirtual-mpi-papi-pdt-pgi

Makefile.tau-multiplecounters-mpi-papi-pdt-trace-pgi

Makefile.tau-mpi-pdt-epilog-trace-pgi

Makefile.tau-papiwallclock-multiplecounters-papivirtual-papi-pdt-openmp-opari-pgi

• For an MPI+F90 application, you may want to start with:
    Makefile.tau-mpi-pdt-trace-pgi
  – Supports MPI instrumentation & PDT for automatic source instrumentation for PGI with tracing

Page 55: Open Trace Format (OTF) Tutorial

55

Configuration Parameters in Stub Makefiles

• Each TAU stub Makefile resides in <tau>/<arch>/lib directory

• Variables:
  – TAU_CXX               Specify the C++ compiler used by TAU
  – TAU_CC, TAU_F90       Specify the C, F90 compilers
  – TAU_DEFS              Defines used by TAU. Add to CFLAGS
  – TAU_LDFLAGS           Linker options. Add to LDFLAGS
  – TAU_INCLUDE           Header files include path. Add to CFLAGS
  – TAU_LIBS              Statically linked TAU library. Add to LIBS
  – TAU_SHLIBS            Dynamically linked TAU library
  – TAU_MPI_LIBS          TAU’s MPI wrapper library for C/C++
  – TAU_MPI_FLIBS         TAU’s MPI wrapper library for F90
  – TAU_FORTRANLIBS       Must be linked in with C++ linker for F90
  – TAU_CXXLIBS           Must be linked in with F90 linker
  – TAU_INCLUDE_MEMORY    Use TAU’s malloc/free wrapper lib
  – TAU_DISABLE           TAU’s dummy F90 stub library
  – TAU_COMPILER          Instrument using tau_compiler.sh script

• Each stub makefile encapsulates the parameters that TAU was configured with

• It represents a specific instance of the TAU libraries. TAU scripts use stub makefiles to identify what performance measurements are to be performed.

Page 56: Open Trace Format (OTF) Tutorial

56

Using TAU

• Install TAU
    % configure [options]; make clean install
• Instrument application manually/automatically
  – TAU Profiling API
• Typically modify application makefile
  – Select TAU’s stub makefile, change name of compiler in Makefile
• Set environment variables
  – TAU_MAKEFILE stub makefile
  – directory where profiles/traces are to be stored
• Execute application
    % mpirun -np <procs> a.out
• Analyze performance data
  – paraprof, vampir, pprof, paraver …

Page 57: Open Trace Format (OTF) Tutorial

57

TAU’s MPI Wrapper Interposition Library

• Uses standard MPI Profiling Interface
  – Provides name shifted interface
    – MPI_Send = PMPI_Send
  – Weak bindings
• Interpose TAU’s MPI wrapper library between MPI and TAU
  – -lmpi replaced by -lTauMpi -lpmpi -lmpi
• No change to the source code!
  – Just re-link the application to generate performance data
  – setenv TAU_MAKEFILE <dir>/<arch>/lib/Makefile.tau-mpi-[options]
  – Use tau_cxx.sh, tau_f90.sh and tau_cc.sh as compilers
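The name-shifting idea behind the MPI profiling interface can be sketched without MPI: the public name is taken over by a measuring wrapper, which forwards to the shifted name that does the real work. The sketch below is only a model; `pmpi_send`, `mpi_send` and the `timings` table are hypothetical stand-ins, not TAU or MPI code.

```python
import time


def pmpi_send(buf, dest):
    """Stand-in for the name-shifted entry point (PMPI_Send) that does the real work."""
    return len(buf)


timings = {}  # routine name -> [calls, total seconds]


def mpi_send(buf, dest):
    """Wrapper under the public name (MPI_Send): measure, then forward to pmpi_send."""
    start = time.perf_counter()
    try:
        return pmpi_send(buf, dest)
    finally:
        entry = timings.setdefault("MPI_Send", [0, 0.0])
        entry[0] += 1
        entry[1] += time.perf_counter() - start


result = mpi_send(b"hello", dest=1)
print(result, timings["MPI_Send"][0])
```

The caller's code is unchanged; only the binding of the public name differs, which is why re-linking against the wrapper library is enough.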

Page 58: Open Trace Format (OTF) Tutorial

58

Instrumenting MPI Applications

• Under Linux you may use tau_load.sh to launch un-instrumented programs under TAU
  – Without TAU:
      % mpirun -np 4 ./a.out
  – With TAU:
      % ls /usr/tau/x86_64/lib/libTAU*pgi*
      % mpirun -np 4 tau_load.sh ./a.out
      % mpirun -np 4 tau_load.sh -XrunTAUsh-mpi-pdt-trace-pgi.so a.out
      loads <taudir>/<arch>/lib/libTAUsh-mpi-pdt-trace-pgi.so shared object
• Under AIX, use tau_poe instead of poe
  – Without TAU:
      % poe a.out -procs 8
  – With TAU:
      % tau_poe a.out -procs 8
      % tau_poe -XrunTAUsh-mpi-pdt-trace.so a.out -procs 8
      chooses <taudir>/<arch>/lib/libTAUsh-mpi-pdt-trace.so
• No change to source code or executables! No need to re-link!
• Only instruments MPI routines. To instrument user routines, you may need to parse the application source code!

Page 59: Open Trace Format (OTF) Tutorial

59

Integration with Application Build Environment

• Try to minimize impact on user’s application build procedures

• Handle process of parsing, instrumentation, compilation, linking

• Dealing with Makefiles– Minimal change to application Makefile– Avoid changing compilation rules in application Makefile– No explicit inclusion of rules for process stages

• Some applications do not use Makefiles– Facilitate integration in whatever procedures used

• Two techniques:
  – TAU shell scripts (tau_<compiler>.sh)
    – Invoke the PDT parser, the TAU instrumentor, and the compiler
  – TAU_COMPILER

Page 60: Open Trace Format (OTF) Tutorial

60

Using Program Database Toolkit (PDT)

1. Parse the program to create foo.pdb:
   % cxxparse foo.cpp -I/usr/local/mydir -DMYFLAGS …
   or
   % cparse foo.c -I/usr/local/mydir -DMYFLAGS …
   or
   % f95parse foo.f90 -I/usr/local/mydir …
   % f95parse *.f -omerged.pdb -I/usr/local/mydir -R free

2. Instrument the program:
   % tau_instrumentor foo.pdb foo.f90 -o foo.inst.f90 -f select.tau

3. Compile the instrumented program:
   % ifort foo.inst.f90 -c -I/usr/local/mpi/include -o foo.o

Page 61: Open Trace Format (OTF) Tutorial

61

Tau_[cxx,cc,f90].sh – Improves Integration in Makefiles

# set TAU_MAKEFILE and TAU_OPTIONS env vars

CC = tau_cc.sh

F90 = tau_f90.sh

CFLAGS =

LIBS = -lm

OBJS = f1.o f2.o f3.o … fn.o

app: $(OBJS)

$(F90) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

.c.o:

$(CC) $(CFLAGS) -c $<

.f90.o:

$(F90) $(FFLAGS) -c $<

Page 62: Open Trace Format (OTF) Tutorial

62

AutoInstrumentation using TAU_COMPILER

• $(TAU_COMPILER) stub Makefile variable

• Invokes PDT parser, TAU instrumentor, compiler through tau_compiler.sh shell script

• Requires minimal changes to application Makefile
  – Compilation rules are not changed
  – User adds $(TAU_COMPILER) before compiler name
    – F90 = mpxlf90
      changes to
      F90 = $(TAU_COMPILER) mpxlf90

• Passes options from TAU stub Makefile to the four compilation stages

• Use tau_cxx.sh, tau_cc.sh, tau_f90.sh scripts OR $(TAU_COMPILER)

• Uses original compilation command if an error occurs

Page 63: Open Trace Format (OTF) Tutorial

63

Automatic Instrumentation

• We now provide compiler wrapper scripts
  – Simply replace mpxlf90 with tau_f90.sh
  – Automatically instruments Fortran source code, links with TAU MPI Wrapper libraries.
• Use tau_cc.sh and tau_cxx.sh for C/C++

Before:

CXX = mpCC
F90 = mpxlf90_r
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o

app: $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

.cpp.o:
	$(CXX) $(CFLAGS) -c $<

After:

CXX = tau_cxx.sh
F90 = tau_f90.sh
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o

app: $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

.cpp.o:
	$(CXX) $(CFLAGS) -c $<

Page 64: Open Trace Format (OTF) Tutorial

64

TAU_COMPILER – Improving Integration in Makefiles

include /usr/tau-2.15.5/x86_64/lib/Makefile.tau-icpc-mpi-pdt

CXX = $(TAU_COMPILER) mpicxx

F90 = $(TAU_COMPILER) mpif90

CFLAGS =

LIBS = -lm

OBJS = f1.o f2.o f3.o … fn.o

app: $(OBJS)

$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

.cpp.o:

$(CXX) $(CFLAGS) -c $<

Page 65: Open Trace Format (OTF) Tutorial

65

TAU_COMPILER Commandline Options

• See <taudir>/<arch>/bin/tau_compiler.sh -help

• Compilation:
    % mpxlf90 -c foo.f90
  changes to
    % f95parse foo.f90 $(OPT1)
    % tau_instrumentor foo.pdb foo.f90 -o foo.inst.f90 $(OPT2)
    % mpxlf90 -c foo.inst.f90 -o foo.o $(OPT3)

• Linking:
    % mpxlf90 foo.o bar.o -o app
  changes to
    % mpxlf90 foo.o bar.o -o app $(OPT4)

• The default values of options OPT[1-4] may be overridden by the user:
    F90 = $(TAU_COMPILER) $(MYOPTIONS) mpxlf90
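The way one compile command expands into parse, instrument and compile steps can be sketched as a small function. This is a model of the expansion only, not the logic of tau_compiler.sh: `expand_compile` is a hypothetical helper, and the compile step is assumed to take the instrumented file and name the object after the original source, as in the manual PDT workflow.

```python
import os


def expand_compile(compiler, source, opts=("", "", "")):
    """Return the parse, instrument and compile commands for one Fortran source file.

    opts holds extra options for the three stages (OPT1..OPT3 on the slide).
    """
    stem, ext = os.path.splitext(source)          # "foo", ".f90"
    pdb = stem + ".pdb"
    inst = stem + ".inst" + ext
    steps = [
        ["f95parse", source],
        ["tau_instrumentor", pdb, source, "-o", inst],
        # Assumption: compile the instrumented file into the original object name.
        [compiler, "-c", inst, "-o", stem + ".o"],
    ]
    # Append the per-stage options, skipping empty ones.
    return [" ".join(cmd + ([opt] if opt else [])) for cmd, opt in zip(steps, opts)]


commands = expand_compile("mpxlf90", "foo.f90")
for line in commands:
    print("%", line)
```

Passing, for example, `opts=("-R free", "-f select.tau", "")` shows where user-supplied options would land in each stage.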

Page 66: Open Trace Format (OTF) Tutorial

66

TAU_COMPILER Options

• Optional parameters for $(TAU_COMPILER): [tau_compiler.sh -help]
    -optVerbose               Turn on verbose debugging messages
    -optDetectMemoryLeaks     Turn on debugging memory allocations/de-allocations to track leaks
    -optPdtGnuFortranParser   Use gfparse (GNU) instead of f95parse (Cleanscape) for parsing Fortran source code
    -optKeepFiles             Does not remove intermediate .pdb and .inst.* files
    -optPreProcess            Preprocess Fortran sources before instrumentation
    -optTauSelectFile=""      Specify selective instrumentation file for tau_instrumentor
    -optLinking=""            Options passed to the linker. Typically $(TAU_MPI_FLIBS) $(TAU_LIBS) $(TAU_CXXLIBS)
    -optCompile=""            Options passed to the compiler. Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
    -optPdtF95Opts=""         Add options for Fortran parser in PDT (f95parse/gfparse)
    -optPdtF95Reset=""        Reset options for Fortran parser in PDT (f95parse/gfparse)
    -optPdtCOpts=""           Options for C parser in PDT (cparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
    -optPdtCxxOpts=""         Options for C++ parser in PDT (cxxparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
    ...

Page 67: Open Trace Format (OTF) Tutorial

67

Overriding Default Options:TAU_COMPILER

include /usr/tau/x86_64/lib/Makefile.tau-icpc-mpi-pdt-trace

# Fortran .f files in free format need the -R free option for parsing

# Are there any preprocessor directives in the Fortran source?

MYOPTIONS= -optVerbose -optPreProcess -optPdtF95Opts="-R free"

F90 = $(TAU_COMPILER) $(MYOPTIONS) ifort

OBJS = f1.o f2.o f3.o …

LIBS = -Lappdir -lapplib1 -lapplib2 …

app: $(OBJS)
	$(F90) $(OBJS) -o app $(LIBS)

.f.o:
	$(F90) -c $<

Page 68: Open Trace Format (OTF) Tutorial

68

Overriding Default Options:TAU_COMPILER

% cat Makefile

F90 = tau_f90.sh

OBJS = f1.o f2.o f3.o …

LIBS = -Lappdir -lapplib1 -lapplib2 …

app: $(OBJS)
	$(F90) $(OBJS) -o app $(LIBS)

.f90.o:
	$(F90) -c $<

% setenv TAU_OPTIONS '-optVerbose -optTauSelectFile=select.tau -optKeepFiles'

% setenv TAU_MAKEFILE <taudir>/x86_64/lib/Makefile.tau-icpc-mpi-pdt

Page 69: Open Trace Format (OTF) Tutorial

69

Optimization of Program Instrumentation

• Need to eliminate instrumentation in frequently executing lightweight routines

• Throttling of events at runtime:
    % setenv TAU_THROTTLE 1
  Turns off instrumentation in routines that execute over 10000 times (TAU_THROTTLE_NUMCALLS) and take less than 10 microseconds of inclusive time per call (TAU_THROTTLE_PERCALL)

• Selective instrumentation file to filter events
    % tau_instrumentor [options] -f <file>
  OR
    % setenv TAU_OPTIONS '-optTauSelectFile=tau.txt'

• Compensation of local instrumentation overhead
    % configure -COMPENSATE
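The throttling rule on this slide combines two thresholds: a routine is switched off only if it is called more than 10000 times and its inclusive time per call is below 10 microseconds. The following sketch expresses that rule as a predicate; `is_throttled` and its parameter names are illustrative, with defaults taken from the values quoted for TAU_THROTTLE_NUMCALLS and TAU_THROTTLE_PERCALL.

```python
def is_throttled(num_calls, inclusive_usec, max_calls=10000, min_usec_per_call=10.0):
    """True if a routine exceeds the call threshold and is cheaper per call than the time threshold."""
    if num_calls <= max_calls:
        return False
    return inclusive_usec / num_calls < min_usec_per_call


examples = {
    "tiny_getter": is_throttled(500000, 1_000_000.0),  # 2 usec per call, many calls
    "solver_step": is_throttled(20000, 5_000_000.0),   # 250 usec per call
    "init": is_throttled(1, 3.0),                      # cheap, but called once
}
print(examples)
```

Only the frequently called, lightweight routine is throttled; expensive or rarely called routines keep their instrumentation.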

Page 70: Open Trace Format (OTF) Tutorial

70

Selective Instrumentation File

• Specify a list of routines to exclude or include (case sensitive)

• # is a wildcard in a routine name. It cannot appear in the first column.
    BEGIN_EXCLUDE_LIST
    Foo
    Bar
    D#EMM
    END_EXCLUDE_LIST

• Specify a list of routines to include for instrumentation
    BEGIN_INCLUDE_LIST
    int main(int, char **)
    F1
    F3
    END_INCLUDE_LIST

• Specify either an include list or an exclude list!
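How a routine list with the # wildcard might be evaluated can be sketched as follows. This is an assumption-laden model, not TAU's implementation: it assumes # stands for any run of characters (including none), that matching is case sensitive and covers the whole routine name, and that an include list, when present, takes the place of the exclude list. The helper names are hypothetical.

```python
import re


def routine_matches(pattern, routine):
    """Case-sensitive match where '#' stands for any run of characters (assumed semantics)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("#"))
    return re.fullmatch(regex, routine) is not None


def is_instrumented(routine, exclude=(), include=()):
    """Apply either an include list or an exclude list, as the slide prescribes."""
    if include:
        return any(routine_matches(p, routine) for p in include)
    return not any(routine_matches(p, routine) for p in exclude)


exclude = ["Foo", "Bar", "D#EMM"]
for name in ["DGEMM", "DSYMM", "dgemm", "Foo"]:
    print(name, is_instrumented(name, exclude=exclude))
```

Under these assumptions DGEMM and Foo are excluded, while DSYMM and the lowercase dgemm remain instrumented.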

Page 71: Open Trace Format (OTF) Tutorial

71

Selective Instrumentation File

• Optionally specify a list of files to exclude or include (case sensitive)

• * and ? may be used as wildcard characters in a file name
    BEGIN_FILE_EXCLUDE_LIST
    f*.f90
    Foo?.cpp
    END_FILE_EXCLUDE_LIST

• Specify a list of files to include for instrumentation
    BEGIN_FILE_INCLUDE_LIST
    main.cpp
    foo.f90
    END_FILE_INCLUDE_LIST
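File patterns with * and ? behave like shell globs, so the file lists can be modeled with Python's standard `fnmatch.fnmatchcase`. This is a sketch under assumptions: `file_is_instrumented` is a hypothetical helper, and `fnmatchcase` also understands `[seq]` character classes, which the slide does not mention.

```python
from fnmatch import fnmatchcase


def file_is_instrumented(filename, exclude=(), include=()):
    """Case-sensitive glob matching: '*' matches any run of characters, '?' exactly one."""
    if include:
        return any(fnmatchcase(filename, p) for p in include)
    return not any(fnmatchcase(filename, p) for p in exclude)


file_exclude = ["f*.f90", "Foo?.cpp"]
for name in ["foo.f90", "Foo1.cpp", "Foo12.cpp", "main.cpp", "Foo.f90"]:
    print(name, file_is_instrumented(name, exclude=file_exclude))
```

Here foo.f90 and Foo1.cpp are excluded, while Foo12.cpp (two characters where ? allows one), main.cpp and Foo.f90 (capital F) stay instrumented.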

Page 72: Open Trace Format (OTF) Tutorial

72

Selective Instrumentation File

• User instrumentation commands are placed in INSTRUMENT section

• ? and * used as wildcard characters for file name, # for routine name

• \ as escape character for quotes

• Routine entry/exit, arbitrary code insertion

• Outer-loop level instrumentation

BEGIN_INSTRUMENT_SECTION
loops file="foo.f90" routine="matrix#"
file="foo.f90" line = 123 code = " print *, \" Inside foo\""
exit routine = "int foo()" code = "cout <<\"exiting foo\"<<endl;"
END_INSTRUMENT_SECTION

Page 73: Open Trace Format (OTF) Tutorial

Instrumentation Specification

% tau_instrumentor
Usage : tau_instrumentor <pdbfile> <sourcefile> [-o <outputfile>] [-noinline] [-g groupname] [-i headerfile] [-c|-c++|-fortran] [-f <instr_req_file> ]

For selective instrumentation, use the -f option:

% tau_instrumentor foo.pdb foo.cpp -o foo.inst.cpp -f selective.dat

% cat selective.dat
# Selective instrumentation: Specify an exclude/include list of routines/files.
BEGIN_EXCLUDE_LIST
void quicksort(int *, int, int)
void sort_5elements(int *)
void interchange(int *, int *)
END_EXCLUDE_LIST
BEGIN_FILE_INCLUDE_LIST
Main.cpp
Foo?.c
*.C
END_FILE_INCLUDE_LIST
# Instruments routines in Main.cpp, Foo?.c and *.C files only
# Use BEGIN_[FILE]_INCLUDE_LIST with END_[FILE]_INCLUDE_LIST
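
Because each BEGIN_ keyword must be closed by the matching END_ keyword, a small checker can catch mismatched terminators before the file is handed to tau_instrumentor. The Python sketch below is a hypothetical helper, not part of TAU; it assumes that lines starting with # in the first column are comments and that blocks do not nest.

```python
def parse_selective_file(text):
    """Collect the entries of each BEGIN_x ... END_x block into a dict keyed by x."""
    sections = {}
    current = None
    for raw in text.splitlines():
        if raw.startswith("#"):      # comment: '#' in the first column
            continue
        line = raw.strip()
        if not line:
            continue
        if line.startswith("BEGIN_"):
            current = line[len("BEGIN_"):]
            sections[current] = []
        elif line.startswith("END_"):
            if line[len("END_"):] != current:
                raise ValueError(f"terminator {line!r} does not close BEGIN_{current}")
            current = None
        elif current is not None:
            sections[current].append(line)
    if current is not None:
        raise ValueError(f"BEGIN_{current} is never closed")
    return sections


sample = """\
# Selective instrumentation: Specify an exclude/include list of routines/files.
BEGIN_EXCLUDE_LIST
void quicksort(int *, int, int)
void sort_5elements(int *)
void interchange(int *, int *)
END_EXCLUDE_LIST
BEGIN_FILE_INCLUDE_LIST
Main.cpp
Foo?.c
*.C
END_FILE_INCLUDE_LIST
"""
sections = parse_selective_file(sample)
print(sections["EXCLUDE_LIST"])
print(sections["FILE_INCLUDE_LIST"])
```

A terminator that does not mirror its BEGIN_ keyword raises a `ValueError` instead of being silently accepted.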

Page 74: Open Trace Format (OTF) Tutorial

Automatic Outer Loop Level Instrumentation

BEGIN_INSTRUMENT_SECTION
loops file="loop_test.cpp" routine="multiply"
# it also understands # as the wildcard in routine name
# and * and ? wildcards in file name.
# You can also specify the full
# name of the routine as is found in profile files.
#loops file="loop_test.cpp" routine="double multiply#"
END_INSTRUMENT_SECTION

% pprof
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0         0.12       25,162           1           1   25162827 int main(int, char **)
100.0        0.175       25,162           1           4   25162707 double multiply()
 90.5       22,778       22,778           1           0   22778959 Loop: double multiply()[ file = <loop_test.cpp> line,col = <23,3> to <30,3> ]
  9.3        2,345        2,345           1           0    2345823 Loop: double multiply()[ file = <loop_test.cpp> line,col = <38,3> to <46,7> ]
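
The %Time column in this listing can be reproduced from the inclusive times. The short Python check below assumes that %Time is each entry's inclusive time divided by the inclusive time of main; that reading is an assumption, although the numbers in the listing agree with it. The values are copied from the usec/call column (every entry is called once).

```python
def percent_time(inclusive_usec, total_usec):
    """Share of the top-level inclusive time, in percent."""
    return 100.0 * inclusive_usec / total_usec


total = 25162827  # inclusive usec of int main(int, char **)
rows = {
    "int main(int, char **)": 25162827,
    "double multiply()": 25162707,
    "Loop at <23,3> to <30,3>": 22778959,
    "Loop at <38,3> to <46,7>": 2345823,
}
for name, usec in rows.items():
    print(f"{percent_time(usec, total):5.1f}  {name}")
```

The first loop accounts for roughly nine tenths of the run, which is why outer-loop instrumentation points to it as the place to optimize.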

Page 75: Open Trace Format (OTF) Tutorial

TAU_REDUCE

• Reads profile files and rules

• Creates selective instrumentation file
– Specifies which routines should be excluded from instrumentation

[Diagram: rules and profile are the inputs to tau_reduce, which outputs the selective instrumentation file]

Page 76: Open Trace Format (OTF) Tutorial

Optimizing Instrumentation Overhead: Rules

• #Exclude all events that are members of TAU_USER
#and use less than 1000 microseconds
TAU_USER:usec < 1000

• #Exclude all events that have less than 1000
#microseconds and are called only once
usec < 1000 & numcalls = 1

• #Exclude all events that have less than 1000 usecs per
#call OR have a (total inclusive) percent less than 5
usecs/call < 1000
percent < 5

• Scientific notation can be used
– usec>1000 & numcalls>400000 & usecs/call<30 & percent>25

• Usage:
% pprof -d > pprof.dat
% tau_reduce -f pprof.dat -r rules.txt -o select.tau
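
The rule syntax can be read as follows: conditions joined by & on one line must all hold, separate lines are alternatives, and an optional GROUP: prefix restricts a rule to one event group. The Python sketch below evaluates rules under that reading. It is an illustration, not tau_reduce itself; the event dictionaries and function names are hypothetical.

```python
import operator
import re

OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}
CONDITION = re.compile(r"^\s*([a-z/]+)\s*([<>=])\s*([0-9.eE+-]+)\s*$")


def parse_rule(line):
    """Split 'GROUP:field op value & ...' into (group or None, list of conditions)."""
    group = None
    if ":" in line:
        group, line = line.split(":", 1)
        group = group.strip()
    conditions = []
    for part in line.split("&"):
        match = CONDITION.match(part)
        if match is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        field, op, value = match.groups()
        conditions.append((field, OPS[op], float(value)))
    return group, conditions


def is_excluded(event, rules):
    """An event is excluded if any single rule matches it completely."""
    for group, conditions in rules:
        if group is not None and group not in event["groups"]:
            continue
        if all(op(event[field], value) for field, op, value in conditions):
            return True
    return False


rules = [parse_rule("TAU_USER:usec < 1000"), parse_rule("usec < 1000 & numcalls = 1")]
helper = {"groups": {"TAU_USER"}, "usec": 500.0, "numcalls": 10}
init = {"groups": {"TAU_DEFAULT"}, "usec": 800.0, "numcalls": 1}
solver = {"groups": {"TAU_DEFAULT"}, "usec": 5.0e6, "numcalls": 100}
for event in (helper, init, solver):
    print(is_excluded(event, rules))
```

Values pass through `float`, so a threshold written in scientific notation such as 4E5 parses the same way as 400000.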

Page 77: Open Trace Format (OTF) Tutorial

TAU Tracing Enhancements

• Configure TAU with -TRACE -otf=<dir> option
% configure -TRACE -otf=<dir> …
Generates tau_merge, tau2vtf, tau2otf tools in <tau>/<arch>/bin directory

• Instrument and execute application
% tau_f90.sh app.f90 -o app
% mpirun -np 4 app

• Merge and convert trace files to OTF format
% tau2otf tau.trc tau.edf app.otf [-z] [-n <nstreams>]
% vampir app.otf
OR use VNG to analyze OTF/VTF trace files
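
When the conversion step is scripted, the tau2otf command line from the slide can be assembled programmatically. The Python sketch below only builds and prints the argument list; it does not run anything. The function name and its parameters are hypothetical, and the meaning attached to -z (compressed output) is an assumption, since the slide lists the flag without explaining it.

```python
import shlex


def tau2otf_command(trace="tau.trc", edf="tau.edf", out="app.otf",
                    compress=False, nstreams=None):
    """Build the argument list for: tau2otf <trace> <edf> <out> [-z] [-n <nstreams>]."""
    cmd = ["tau2otf", trace, edf, out]
    if compress:
        cmd.append("-z")                 # assumed to request compressed output
    if nstreams is not None:
        cmd += ["-n", str(nstreams)]     # number of output streams
    return cmd


cmd = tau2otf_command(compress=True, nstreams=4)
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems if it is later passed to `subprocess.run`.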

Page 78: Open Trace Format (OTF) Tutorial

Environment Variables

• Configure TAU with -TRACE -otf=<dir> option
% configure -TRACE -otf=<dir> -MULTIPLECOUNTERS -papi=<dir> -mpi -pdt=<dir> …

• Set environment variables
% setenv TRACEDIR /p/gm1/<login>/traces
% setenv COUNTER1 GET_TIME_OF_DAY (required)
% setenv COUNTER2 PAPI_FP_INS
% setenv COUNTER3 PAPI_TOT_CYC
…

• Execute application
% mpirun -np 32 ./a.out [args]
% tau_treemerge.pl; tau2otf/tau2vtf ...
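
The same settings can be prepared from a launcher script instead of setenv. The Python sketch below builds the environment for a traced run and enforces the slide's note that COUNTER1 must be GET_TIME_OF_DAY. The function name and the example trace directory are hypothetical; the variable names TRACEDIR and COUNTERn come from the slide.

```python
import os


def tracing_environment(tracedir, counters, base=None):
    """Return a copy of the environment with TRACEDIR and COUNTER1..n set."""
    if not counters or counters[0] != "GET_TIME_OF_DAY":
        raise ValueError("COUNTER1 must be GET_TIME_OF_DAY")
    env = dict(os.environ if base is None else base)
    env["TRACEDIR"] = tracedir
    for index, name in enumerate(counters, start=1):
        env[f"COUNTER{index}"] = name
    return env


# '/tmp/traces' is a placeholder directory chosen for this example.
env = tracing_environment(
    "/tmp/traces",
    ["GET_TIME_OF_DAY", "PAPI_FP_INS", "PAPI_TOT_CYC"],
    base={},
)
for key in sorted(env):
    print(f"{key}={env[key]}")
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the mpirun command.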

Page 79: Open Trace Format (OTF) Tutorial

Outline

• An overview of OTF, TAU and Vampir/VNG

• OTF
– Tools
– API
– Building trace conversion tools

• TAU
– Instrumentation
– Measurement
– Analysis

• Scalable Tracing
– Vampir
– VNG
– OTF

Page 80: Open Trace Format (OTF) Tutorial

Using Vampir Next Generation (VNG)

Page 81: Open Trace Format (OTF) Tutorial

VNG Timeline Display

Page 82: Open Trace Format (OTF) Tutorial

VNG Calltree Display

Page 83: Open Trace Format (OTF) Tutorial

VNG Timeline Zoomed In

Page 84: Open Trace Format (OTF) Tutorial

VNG Grouping of Interprocess Communications

Page 85: Open Trace Format (OTF) Tutorial

VNG Process Timeline with PAPI Counters

Page 86: Open Trace Format (OTF) Tutorial

OTF/VNG Support for Counters

Page 87: Open Trace Format (OTF) Tutorial

VNG Communication Matrix Display

Page 88: Open Trace Format (OTF) Tutorial

VNG Message Profile

Page 89: Open Trace Format (OTF) Tutorial

VNG Process Activity Chart

Page 90: Open Trace Format (OTF) Tutorial

VNG Preferences

Page 91: Open Trace Format (OTF) Tutorial

Support Acknowledgements

• Lawrence Livermore National Laboratory (LLNL)

• Department of Energy (DOE)

– Office of Science contracts

– LLNL ParaTools/GWT contract

• University of Oregon

• T.U. Dresden, GWT

• Research Centre Juelich