Supplementary Slides S.1 Empirical Study of Parallel Programs Measuring execution time Visualizing execution trace Debugging Optimization strategies

Post on 22-Dec-2015



Supplementary Slides S.1

Empirical Study of Parallel Programs

• Measuring execution time

• Visualizing execution trace

• Debugging

• Optimization strategies


Supplementary Slides S.2

Empirical Study of Parallel Programs (cont’d)

• Objective
– An initiation into empirical analysis of parallel programs
– By example: number summation

• Basis for coursework

• Outcome: ability to
– Follow the same steps to measure simple parallel programs
– Explore the detailed functionality of the tools
– Gain better insight into, and explain, the behavior of parallel programs
– Optimize parallel programs
– Use similar tools for program measurements


Supplementary Slides S.3

Homework Contract

• Requirements
– A number generator program
– Assemble and compile the homework program
– Instrument the homework program with MPI timing functions
– A file management script

• Deliverables
– Speedup (and linear speedup) graph plots (on the same page) showing # processors against problem size
– A file of raw execution times of the form:

  Data size    # processors    Execution time

– Jumpshot visualization graphs
– A report explaining your work, especially the instrumentation, the speedup graphs, and the Jumpshot graphs


Supplementary Slides S.4

Execution Time: Number Generator Program

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc, char **argv)
{
    int i, count;
    FILE *fp;

    if (argc != 4) {
        printf("randFile filename #ofValues powerOfTwo\n");
        return -1;
    }
    srand(clock());
    if ((fp = fopen(argv[1], "w")) == NULL) return -1;
    count = atoi(argv[2]);
    fprintf(fp, "%d\n", count);
    for (i = 0; i < count; i++)
        fprintf(fp, "%d\n", rand() % (int)pow(2, atoi(argv[3])));
    fclose(fp);
    return 0;
}


Supplementary Slides S.5

Number Generator: Compiling & Running

• Compiling
• Running
• Should generate more than 4 groups of numbers of different sizes: 1000, 5000, 10000, 15000, 20000, etc.


Supplementary Slides S.6

Number Generator: A Helper Script

for var in 1000 5000 10000 15000 20000
do
    ./genRandom data$var.txt $var 16
done


Supplementary Slides S.7

Sample MPI program

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define MAXSIZE 1000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {  /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++)
            fscanf(fp, "%d", &data[i]);
    }

Summation Program


Supplementary Slides S.8

Sample MPI program

    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);  /* broadcast data */

    x = MAXSIZE / numprocs;  /* add my portion of the data */
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);

    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);

    MPI_Finalize();
}

Summation Program (cont’d)


Supplementary Slides S.9

Summation Program: Instrumentation

• Place your instrumentation code carefully
• You need to justify the placement of such code
• MPI_Wtime()
– Returns the elapsed (wall-clock) time on the calling processor
• MPI_Wtick()
– Returns, as a double-precision value, the number of seconds between successive clock ticks
– For example, if the clock is implemented by the hardware as a counter that is incremented every millisecond, the value returned by MPI_Wtick should be 10^-3


Supplementary Slides S.10

Summation Program: Compiling & Running

• Recompile for each different data size, or
• Take the data size & input file dynamically
• Sample script:

for var1 in forData1000 forData5000 forData10000 forData15000 forData20000
do
    for var2 in 1 2 4 5 8 10 12
    do
        mpirun -np $var2 $var1;
    done;
done;


Supplementary Slides S.11

Jumpshot: Visualizing execution trace

• Jumpshot is a graphical tool for investigating the behavior of parallel programs
– Implemented in Java (Jumpshot can run as an applet)
• It is a ``post-mortem'' analyzer
– Inputs a logfile of time-stamped events
– The file is written by the companion package CLOG
• Jumpshot can present multiple views of logfile data:
– Per-process timelines (the primary view)
• Shows, with colored bars, the state of each process at each time
– State-duration histograms view
– ``Mountain range'' view
• Shows the aggregate number of processes in each state at each time


Supplementary Slides S.12

Visualizing Program Execution

• Other logfile-based tools with similar features:
– Commercial tools include
• TimeScan
• Vampir
– Academic tools include
• ParaGraph
• TraceView
• XPVM
• XMPI
• Pablo


Supplementary Slides S.13

Linking with Logging Libraries

• Generating log files:
– Compile your MPI code and link using the -mpilog flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation numbersSummation.o -mpilog

– Check the file names associated with the compiled program:

bash-2.04$ ls numbersSummation*
numbersSummation    numbersSummation.o    numbersSummation.xls
numbersSummation.c  numbersSummation.txt


Supplementary Slides S.14

Linking with Logging Libraries (cont’d)

• Generating log files:
– Run the MPI program:

bash-2.04$ mpirun -np 8 numbersSummation
I got 82638836 from 0
The sum is 657273685.
Writing logfile....
Finished writing logfile.
I got 81256047 from 3
I got 80498627 from 6
I got 82306891 from 2
I got 83437153 from 7
I got 82228251 from 4
I got 82302109 from 1
I got 82605771 from 5

– Check to verify that a .clog file is created:

bash-2.04$ ls numbersSummation*
numbersSummation    numbersSummation.clog  numbersSummation.txt
numbersSummation.c  numbersSummation.o     numbersSummation.xls


Supplementary Slides S.15

Linking with Logging Libraries (cont’d)

• Use Jumpshot to visualize the .clog file
– Run vncserver to get a Linux remote desktop
– Launch Jumpshot on the .clog file
• May require conversion to .slog-2


Supplementary Slides S.16

Jumpshot: Sample Display


Supplementary Slides S.17

Linking with Tracing Libraries

• Compile your MPI code and link using the -mpitrace flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation numbersSummation.o -mpitrace

– Running:

bash-2.04$ mpirun -np 4 numbersSummation

Starting MPI_Init...
Starting MPI_Init...
Starting MPI_Init...
Starting MPI_Init...
[1] Ending MPI_Init
[1] Starting MPI_Comm_size...
[1] Ending MPI_Comm_size
[1] Starting MPI_Comm_rank...
[1] Ending MPI_Comm_rank
[1] Starting MPI_Bcast...
[2] Ending MPI_Init
[3] Ending MPI_Init
...


Supplementary Slides S.18

Linking with Animation Libraries

• Compile your MPI code and link using the -mpianim flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation -mpianim numbersSummation.o \
    -L/export/tools/mpich/lib -lmpe -L/usr/X11R6/lib -lX11 -lm

– Running:

bash-2.04$ mpirun -np 4 numbersSummation


Supplementary Slides S.19

Starting mpirun with a Debugger

bash-2.04$ mpirun -dbg=gdb -np 4 summation
GNU gdb 5.0rh-5 Red Hat Linux 7.1

Copyright 2001 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB. Type "show warranty" for details.

This GDB was configured as "i386-redhat-linux"...

Breakpoint 1 at 0x804cbee

Breakpoint 1, 0x0804cbee in MPI_Init ()

(gdb)


Supplementary Slides S.20

Optimization Strategies

• Structural changes may need to be made to a parallel program after measuring its performance
– Hot spots exposed, etc.
• A number of measures can be taken to optimize a parallel program:
1. Change the number of processes to alter process granularity
2. Increase message sizes to lessen the effect of startup times
3. Recompute values locally rather than send them in additional messages
4. Latency hiding: overlap communication with computation
5. Perform critical-path analysis: determine the longest path that dominates overall execution time
6. Address the effect of the memory hierarchy: reduce cache misses by, for example, reordering the memory requests in the program