MPE/Jumpshot Evaluation Report Adam Leko Hans Sherburne, UPC Group HCS Research Laboratory University of Florida Color encoding key: Blue: Information Red: Negative note Green: Positive note

Post on 27-Dec-2015

MPE/Jumpshot Evaluation Report

Adam Leko

Hans Sherburne

UPC Group

HCS Research Laboratory

University of Florida

Color encoding key:

Blue: Information

Red: Negative note

Green: Positive note

Basic Information

- Name: MPE/Jumpshot
- Developer: Argonne National Laboratory
- Current versions: MPE 1.26, Jumpshot-4
- Website: http://www-unix.mcs.anl.gov/perfvis/
- Contacts:
  - Anthony Chan ([email protected])
  - David Ashton ([email protected])
  - Rusty Lusk ([email protected])
  - William Gropp ([email protected])

What Is MPE/Jumpshot?

- The “quintessential” MPI logging and post-mortem visualization toolset
- MPE: Multi-Processing Environment
  - A software package for MPI programmers
  - Has three main parts:
    - A tracing library that outputs all MPI calls to stdout
    - A shared-display parallel X graphics and animation library
    - A logging library for logging events
  - Note: MPE/Jumpshot “logging” is what we call tracing
- Jumpshot
  - A visualization tool for logfiles created by the MPE package
  - Written in Java (cross-platform)
  - Provides a “time line” (Gantt) view of MPI and program events
  - Also has basic search and summary (histogram) functionality

Logfiles: What’s In A Format?

- Much thought has been put into logfile formats
  - “Traditional” tracing results in large trace files
  - Trace file format can play a large part in a visualization tool’s response time
- ALOG: original format (Argonne LOGging format)
  - Text-based format
  - Visualization tool: Upshot
    - An X-windows application using the Athena widget toolset
    - Later rewritten using Tcl/Tk for easy coding
    - Turned out to be too slow
    - Parts rewritten in C (“Nupshot”) but Tcl->C interface kept changing
- BLOG: intermediary format
- CLOG
  - Binary file format created to improve upon ALOG and BLOG
  - Visualization tools
    - Jumpshot-1
      - Complete rewrite of Upshot/Nupshot
      - Coded in Java/AWT for cross-platform support
      - Bad performance, not widely used
    - Jumpshot-2
      - Improved version using Java/Swing
      - Slightly better performance
  - By default, MPE still outputs logfiles in CLOG
    - Low overhead
    - Can be easily converted to other formats as needed
- SLOG: “scalable” format
  - State-based logging format
  - Visualization tool: Jumpshot-3
    - Rewrite of Jumpshot-2 to use SLOG
    - Can scale to ~GB logfiles
- SLOG-2: current logfile format
  - Next-generation SLOG file format
  - “Graphical” logfile format to speed logfile parsing
  - Visualization tool: Jumpshot-4

MPE Overview

- Tracing capability
  - Automatic instrumentation: mpicc -mpitrace
  - Writes to stdout at every MPI call, e.g.
      [1] Starting MPI_Send with count = 28, dest = 0, tag = 0...
      [1] Ending MPI_Send
  - Equivalent “manual” method: printf
  - Very simple & intuitive
- Parallel graphics ability
  - Automatic instrumentation: mpicc -mpianim -L/usr/X11R6/lib -lX11 -lm
  - Displays graphics on one machine
  - Circle for each process, arrows indicate sends/receives
  - Slows down execution considerably
  - Graphics are also available via library calls
    - Calls seem relatively easy to use: MPE_Draw_string, MPE_Draw_circle, MPE_Update, etc.
    - Probably not all that useful
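Because the tracing library writes plain text to stdout, its output can be post-processed with a few lines of script. The sketch below parses lines shaped like the MPI_Send sample on this slide. The pattern is inferred from that single sample, so the field layout for other MPI calls is an assumption, and `parse_trace_line` is a hypothetical helper rather than part of MPE.

```python
import re

# Matches lines shaped like the sample on the slide:
#   [1] Starting MPI_Send with count = 28, dest = 0, tag = 0...
#   [1] Ending MPI_Send
# Assumption: other MPI calls follow the same "[rank] Starting|Ending NAME" shape.
TRACE_LINE = re.compile(
    r"^\[(?P<rank>\d+)\]\s+(?P<phase>Starting|Ending)\s+(?P<call>MPI_\w+)"
    r"(?:\s+with\s+(?P<args>.*?))?(?:\.\.\.)?\s*$"
)


def parse_trace_line(line):
    """Return a dict describing one trace line, or None if it does not match."""
    match = TRACE_LINE.match(line.strip())
    if match is None:
        return None
    args = {}
    if match.group("args"):
        for pair in match.group("args").split(","):
            key, _, value = pair.partition("=")
            value = value.strip()
            # Keep integers as integers; leave anything else as text.
            args[key.strip()] = int(value) if value.lstrip("-").isdigit() else value
    return {
        "rank": int(match.group("rank")),
        "phase": match.group("phase"),
        "call": match.group("call"),
        "args": args,
    }


sample = "[1] Starting MPI_Send with count = 28, dest = 0, tag = 0..."
print(parse_trace_line(sample))
```

Lines that do not look like trace output (ordinary program prints, for instance) come back as `None`, so the parser can be run over a mixed stdout stream.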

MPE Overview (2)

- Logging ability
  - Automatic instrumentation: mpicc -mpilog
  - Logs start and stop of events
  - Can overlap starting and stopping of events
  - Can add “custom” events
    - Easy to do using library calls
    - MPE_Log_get_event_number: creates a new event
    - MPE_Describe_state: gives name and color to event
    - MPE_Log_event: records event in logfile, uses MPI_Wtime to get global time
    - Custom events show up in Jumpshot-4 just like events from automatic instrumentation
- Conventions
  - Automatic instrumentation uses all caps (SEND, RECV)
  - Manual instrumentation uses mixed case
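The start/stop event model described above can be illustrated with a small in-memory stand-in. This is a sketch of the idea only: `StateLog` and its methods are hypothetical names that loosely mirror the three MPE calls listed on the slide, and nothing here calls MPE or MPI. A fake clock replaces MPI_Wtime so the example is deterministic.

```python
import itertools
import time


class StateLog:
    """Toy model of start/stop event logging (an illustration, not the MPE API)."""

    def __init__(self, clock=time.perf_counter):
        self._ids = itertools.count(1)
        self._clock = clock
        self.states = {}   # start event id -> (end event id, name, color)
        self.records = []  # (timestamp, event id), in logging order

    def new_event(self):
        """Hand out a fresh event number."""
        return next(self._ids)

    def describe_state(self, start, end, name, color):
        """Tie a start/end event pair to a named, colored state."""
        self.states[start] = (end, name, color)

    def log_event(self, event):
        """Record one timestamped event."""
        self.records.append((self._clock(), event))

    def durations(self):
        """Total time per state name, pairing each end with the latest open start."""
        end_to_start = {end: start for start, (end, _, _) in self.states.items()}
        open_starts = {}
        totals = {}
        for timestamp, event in self.records:
            if event in self.states:
                open_starts.setdefault(event, []).append(timestamp)
            elif event in end_to_start:
                start = end_to_start[event]
                if open_starts.get(start):
                    begin = open_starts[start].pop()
                    name = self.states[start][1]
                    totals[name] = totals.get(name, 0.0) + (timestamp - begin)
        return totals


# Deterministic fake clock standing in for a global timer.
ticks = iter([0.0, 1.0, 3.0, 4.0])
log = StateLog(clock=lambda: next(ticks))

compute_start, compute_end = log.new_event(), log.new_event()
io_start, io_end = log.new_event(), log.new_event()
log.describe_state(compute_start, compute_end, "Compute", "red")  # mixed case: manual event
log.describe_state(io_start, io_end, "Io", "blue")

log.log_event(compute_start)
log.log_event(io_start)      # states may overlap
log.log_event(compute_end)
log.log_event(io_end)
print(log.durations())
```

The two states overlap in time, which the pairing logic handles because each state keeps its own stack of open start timestamps.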

MPE Overhead

- All programs executed correctly when instrumented
- Expect about 5% overhead for “real-world” applications
- Barrier recording mechanism has a lot of overhead
  - Most applications don’t use a bunch of barriers, though

MPE logging overhead (bar chart; overhead of instrumented vs. uninstrumented runs, by benchmark)

Benchmark                  Overhead
CAMEL                      0%
NAS LU (8p, W)             4%
NAS LU (32p, B)            5%
PP: Big message            0%
PP: Diffuse procedure      48%
PP: Hot procedure          0%
PP: Intensive server       1%
PP: Ping pong              1%
PP: Random barrier         56%
PP: Small messages         1%
PP: System time            2%
PP: Wrong way              0%
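The chart does not spell out how the percentages were computed. A reasonable reading of the axis label, and an assumption here, is the relative increase in wall-clock time of the instrumented run over the uninstrumented run:

```python
def overhead_percent(instrumented_s, uninstrumented_s):
    """Relative slowdown of the instrumented run, in percent.

    Assumed definition: (T_instrumented / T_uninstrumented - 1) * 100.
    """
    if uninstrumented_s <= 0:
        raise ValueError("uninstrumented time must be positive")
    return (instrumented_s / uninstrumented_s - 1.0) * 100.0


# Hypothetical timings in seconds, not measurements from this report.
print(f"{overhead_percent(10.5, 10.0):.1f}%")  # prints 5.0%
```

Under this definition a value of 0% means the two runs took the same time, and 56% means the instrumented run took a little over one and a half times as long.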

MPE Overhead: Barriers

- Programs that have large measurement overhead shown below
- Tons of barriers! (yellow)

[Figures: Jumpshot timelines for PPerfMark: diffuse procedure and PPerfMark: random barrier]

Jumpshot Overview

- Jumpshot-4 supports two types of visualizations for metrics
  - Timeline (right, top)
  - Histogram (right, bottom)
- Visualization is dependent on the SLOG-2 format and data model
  - Real drawables
    - State: single timeline ID, start/end timestamp
    - Arrow: pair of timeline IDs, start/end timestamp
    - Event: single timeline ID, single timestamp
  - Preview drawables
    - Amalgamation of real drawables
    - One corresponding type for each of the real drawables
    - Serve to optimize performance of visualization
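The drawable model above maps naturally onto a few record types. The sketch below is an illustration under stated assumptions, not the SLOG-2 on-disk layout or the slog2sdk API: the class names, the `category` field, and the way `preview_states` summarizes a time window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    timeline: int     # single timeline ID
    start: float      # start timestamp
    end: float        # end timestamp
    category: str     # hypothetical label, e.g. "SEND"


@dataclass(frozen=True)
class Arrow:
    src_timeline: int  # pair of timeline IDs
    dst_timeline: int
    start: float
    end: float


@dataclass(frozen=True)
class Event:
    timeline: int     # single timeline ID
    time: float       # single timestamp


@dataclass(frozen=True)
class StatePreview:
    """Stand-in for many real states inside one time window."""
    timeline: int
    start: float
    end: float
    count: int        # how many real states overlap the window
    busy_time: float  # total overlap of those states with the window


def preview_states(states, timeline, window_start, window_end):
    """Collapse the states of one timeline that overlap a window into a summary."""
    count = 0
    busy = 0.0
    for state in states:
        if state.timeline != timeline:
            continue
        overlap = min(state.end, window_end) - max(state.start, window_start)
        if overlap > 0:
            count += 1
            busy += overlap
    return StatePreview(timeline, window_start, window_end, count, busy)


states = [
    State(0, 0.0, 1.0, "SEND"),
    State(0, 2.0, 5.0, "RECV"),
    State(1, 0.0, 4.0, "SEND"),
]
print(preview_states(states, timeline=0, window_start=0.0, window_end=4.0))
```

A viewer that is zoomed far out can draw one summary per window instead of every state inside it, which is the performance motivation the slide gives for preview drawables.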

[Figures: timeline view (top), histogram view (bottom)]

Jumpshot Overview (2)

- Emphasis on providing useful profile analysis from
  - High-level (entire program execution) view
  - Low-level (individual events) view
- Nice features
  - Intuitive interface
  - Automatically converts from CLOG to SLOG-2
  - Very good support for zooming and scrolling
  - User manual very thorough
- Things that could use improvement
  - Java application -> uses a lot of memory (~70-100MB during typical runs)
    - Memory use seems to scale nicely with logfile size though
  - No direct support for non-event-based data (running averages, time-varying histograms for cache miss numbers, etc.)
  - Documentation a little unclear/excessively technical in some places

[Figures: timeline view (top), histogram view (bottom)]

Bottleneck Identification Test Suite

- Testing metric: what did trace visualization tell us (automatic instrumentation)?
- CAMEL: PASSED
  - Identified large number of small messages at beginning of program execution
  - Also identified sequential parts of algorithm (sort on node 0, etc.)
  - No other problems visible from trace
- NAS LU (“W” workload): PASSED
  - Showed communication bottlenecks very clearly
    - Large(!) number of small messages
    - Illustrated time taken for repartitioning data
  - Shows sensitivity to latency for processors waiting on data from other processors

Bottleneck Identification Test Suite (2)

- Big message: PASSED
  - Traces illustrated large amount of time spent in send and receive
- Diffuse procedure: PASSED
  - Traces illustrated a lot of synchronization with one process doing more work
  - Since there is no source code correlation, hard to tell why problem existed
- Hot procedure: FAILED
  - CLOG trace file conversion failed (no communication events)
  - Even if the trace had loaded, the program has no communication problems to show
- Intensive server: PASSED
  - Traces showed that other nodes were waiting on node 0
- Ping pong: PASSED
  - Traces illustrated that the application was very latency-sensitive
  - Much time being spent on waiting for messages to arrive
- Random barrier: PASSED
  - Traces showed that one node was doing more work than the others
- Small messages: PASSED
  - Traces illustrated a large number of messages being sent to node 0
- System time: FAILED
  - CLOG trace file conversion failed (no communication events)
  - Even if the trace had loaded, the program has no communication problems to show
- Wrong way: PASSED
  - First receive took a long time for message to arrive in trace

NAS LU (Class W) Visualization

[Figure: Jumpshot timeline of NAS LU, with two annotations]

- Much time taken for data redistribution
- Large number of small messages

General Comments

- Good things
  - Jumpshot-4 represents a well-written, scalable event-based tracefile viewer
  - Formats used by Jumpshot are well-defined
  - Low measurement overhead in MPICH
  - Mature GUI, few bugs, has been around for a long time in one form or another
  - To leverage, just need to write logfile in a specific format
- Things that could use improvement
  - Adding support for metrics other than events would require hacking SLOG-2 format
    - E.g., how to support showing L2 miss rates as time increases?
- Seems like it would be best used as part of our toolkit
  - Automatic instrumentation really necessary to make tool useful
  - Jumpshot-4 can fit in our toolkit as an event-based tracefile viewer if we can easily write to a format it understands

Adding UPC/SHMEM Support

- At a minimum, need mechanism to output CLOG trace files
  - CLOG library currently uses many MPI calls
    - E.g., MPI_Wtime for timing information
    - Therefore, cannot just insert MPE logging calls and use the MPE library unmodified
  - However, CLOG format is defined
    - Could (relatively) easily create a C implementation that used UPC calls instead of MPI calls
    - Would need to come up with our own buffering scheme though
      - Can’t write files as data comes in, too slow
      - Should be able to steal a lot of code from MPE source
      - Not necessarily a problem, since we will most likely have to come up with a method if we go the tracing route anyways
- Could also use slog2sdk for writing to SLOG-2 files directly, but API in Java only
  - SLOG-2 may have larger creation overhead than simple event-based formats such as CLOG
  - Several examples (and example C code) given for converting logfiles of arbitrary format to SLOG-2 format using slog2sdk
    - Can use our own log file format if needed!
    - Recommend going with CLOG though, so we can steal existing code
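The buffering requirement above (collect records in memory, write them out in blocks rather than one at a time) can be sketched as follows. This is a minimal illustration in Python rather than the C/UPC implementation the slide has in mind. The record layout is made up for the example and is not the CLOG format, and `BufferedEventLog` is a hypothetical name.

```python
import io
import struct


class BufferedEventLog:
    """Accumulate fixed-size event records and write them out in blocks."""

    # Hypothetical record: float64 timestamp, int32 process id, int32 event id.
    # This is NOT the CLOG record layout.
    RECORD = struct.Struct("<dii")

    def __init__(self, sink, capacity=1024):
        self._sink = sink          # any object with a write(bytes) method
        self._capacity = capacity  # records held in memory before a flush
        self._buffer = bytearray()
        self._count = 0
        self.flushes = 0

    def log(self, timestamp, process, event):
        """Append one record; touch the sink only when the buffer is full."""
        self._buffer += self.RECORD.pack(timestamp, process, event)
        self._count += 1
        if self._count >= self._capacity:
            self.flush()

    def flush(self):
        """Write any buffered records to the sink in a single call."""
        if self._count:
            self._sink.write(bytes(self._buffer))
            self._buffer.clear()
            self._count = 0
            self.flushes += 1


sink = io.BytesIO()
log = BufferedEventLog(sink, capacity=4)
for i in range(10):
    log.log(float(i), 0, i)
log.flush()  # write the partial final block at program end
print(log.flushes, len(sink.getvalue()))
```

The capacity trades memory for fewer writes. A real tracing library would also need a policy for when the final flush happens (for example at program exit), which this sketch leaves to the caller.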

Evaluation (1)

- Available metrics: 1/5
  - Only communication-based metrics (timeline + histograms) available
  - Restricted to recording event-based metrics
- Cost: 5/5 (free)
- Documentation quality: 3.5/5
  - Jumpshot-4 has a very good but lengthy user’s manual
  - slog2sdk (SDK for reading/writing SLOG-2 files) documentation is not very clear, although SLOG-2 is also described in a lengthy paper
- Extensibility: 3.5/5
  - Jumpshot-4 written in Java (easy to find Java coders at UF)
  - Can easily add new events using MPE library calls
  - Adding time-varying metrics (histograms, etc.) would require writing code from scratch
- Filtering and aggregation: 3/5
  - Can restrict event types being displayed from trace
  - Preview drawables and histograms provide aggregation abilities
  - Does not filter or aggregate data directly when recording data
- Hardware support: 4/5
  - 64-bit Linux (Opteron, Itanium), Tru64 (AlphaServer), IRIX, IBM SP (AIX), Cray MPI, and many more
  - Can be used with any MPICH or LAM installation
- Heterogeneity support: 0/5 (not supported)

Evaluation (2)

- Installation: 5/5
  - About as easy as you could expect
  - Zero effort if using MPICH already, compiling from source also easy
- Interoperability: 0.5/5
  - No way provided to export SLOG-2 files to other viewers
  - Example code provided in slog2sdk on how to convert existing formats into SLOG-2 format
- Learning curve: 4.5/5
  - Easy to learn, well-written documentation
  - MPE really easy to use (mpicc -mpilog)
- Manual overhead: 1/5
  - All MPI calls automatically instrumented for you when linking against MPE
  - Adding other events requires manual work (not much though)
  - No way to turn on/off tracing in places without recompilation
- Measurement accuracy: 5/5
  - CAMEL overhead < 1%
  - Correctness of programs not affected
  - Measurements seem accurate to the millisecond (relies on MPI_Wtime resolution though)
  - Only large numbers of messages (10^6 or more back-to-back) or frequent barriers seem to introduce any appreciable overhead

Evaluation (3)

- Multiple executions: 0/5 (not supported)
- Multiple analyses & views: 2/5
  - Only shows timeline and histograms (but does both very well)
  - Excellent zooming and scrolling features (scalable to GB logfiles)
- Performance bottleneck identification: 4.5/5
  - No automatic methods supported
  - Traces do a very good job of showing communication and synchronization bottlenecks
  - Can also use custom events to indirectly determine some types of bottlenecks (e.g., load imbalance)
- Profiling/tracing support: 3/5
  - Only supports tracing
  - Trace format compact & scalable so viewer can comfortably show GB logfiles
  - Automatic tracing is either entirely on or entirely off
  - Turning on/off manual tracing requires code modification and recompilation

Evaluation (4)

- Response time: 2/5
  - No results until after run
  - For an 850MB CLOG tracefile:
    - Converting to SLOG-2 took 5 minutes
    - Opening up 350MB SLOG-2 file took about 10 seconds
  - However, large trace files will be slower than a method that incorporates more filtering and aggregation
    - Limitation of tracing method, not tool implementation
- Software support: 3/5
  - Supports C & Fortran
  - Tied closely to MPI applications
  - Supports linking with any library supported by GCC/platform C compiler, but linked libraries will not be profiled unless they contain MPI calls
- Source code correlation: 1/5
  - Not directly supported
  - Can correlate indirectly by using custom events at function entry/exit points
- Searching: 1.5/5
  - Only a simple search function available
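The indirect source code correlation mentioned under "Source code correlation" amounts to bracketing each function of interest with a pair of custom events. The sketch below shows that pattern in Python as an analogy only. `traced`, `TRACE`, and `Partition` are hypothetical names, and a real MPI program would log the entry and exit through the MPE logging calls instead.

```python
import functools
import time

TRACE = []  # (timestamp, kind, function name), in logging order


def traced(func):
    """Record an entry event before the call and an exit event after it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append((time.perf_counter(), "enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # The exit event is logged even if the function raises.
            TRACE.append((time.perf_counter(), "exit", func.__name__))
    return wrapper


@traced
def Partition(n):  # mixed-case name, following the manual-event convention
    return n // 2


Partition(10)
for timestamp, kind, name in TRACE:
    print(kind, name)
```

Because the event carries the function name, a timeline viewer can show which routine a process was in, even though the tool itself has no notion of source lines.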

Evaluation (5)

- System stability: 4.5/5
  - MPE very stable (no problems observed)
  - Jumpshot-4 has very few bugs (small ones exist but do not get in the way)
  - Extremely good for a freely downloadable research project
- Technical support: 4/5
  - Jumpshot-4 does give very good error messages
  - Developers responded within 24 hours
  - Developers willing to help point us in the right direction for writing SLOG-2 files using their APIs