
UPC-Check Tutorial *

High Performance Computing Group

Glenn Luecke (director), James Coyle, James Hoekstra, Marina Kraeva and Indranil Roy

Iowa State University

Aug 30, 2011

* This work was supported by the United States Department of Defense and used resources of the Extreme Scale Systems Center at Oak Ridge National Laboratory.


UPC-CHECK Tutorial Outline

• UPC-CHECK Design
• Current Functionality of UPC-CHECK
• UPC-CHECK Syntax
• How to use UPC-CHECK to find and correct program errors (6 examples)
• Efficiency of UPC-CHECK
• Scalability of UPC-CHECK
• Memory overhead of UPC-CHECK


UPC-CHECK Design

Original UPC Program
→ UPC-to-UPC Translator
→ UPC program with error checking
→ UPC Compiler (together with the UPC-CHECK Support Routines)
→ Executable with error checking


Current Functionality of UPC-CHECK

• Argument checking for UPC functions
• Deadlock detection


UPC-CHECK Syntax

• Use upc-check in the same way as your UPC compiler. For example, instead of

upcc -O -T 3 a.upc r.o

issue:

upc-check -O -T 3 a.upc r.o

• In a Makefile, change UPC=upcc to UPC=upc-check

• Note: the -T compiler option must be used with the upc-check command, since ROSE currently requires that the number of threads be known at compile time. (UPC-CHECK uses the ROSE Toolkit from Lawrence Livermore National Laboratory to instrument UPC source code.)


Run-Time Errors Detected by UPC-CHECK

• UPC-CHECK detects Argument Errors in UPC Functions and Deadlocks in UPC programs.

• UPC-CHECK will not test the single-valued requirement of upc_forall statements.

• Since UPC-CHECK works on UPC source programs, it cannot detect deadlocks within library functions.

• Currently, UPC-CHECK requires that programs do not define the 'main' function in a header file.


Quantifying the quality of a tool which detects UPC run-time errors.

• Iowa State University has a Test Suite that scores the ability of UPC compilers/tools to detect run-time errors: see http://rted.public.iastate.edu/UPC/

• This Test Suite uses the following scoring system:
– A score of 5 is given for a detailed error message that will assist a programmer to quickly correct the error.
– A score of 4 is given for error messages with more information than a score of 3 and less than a score of 5.
– A score of 3 is given for error messages with the correct error name, line number and the name of the file where the error occurred.
– A score of 2 is given for error messages with the correct error name and line number where the error occurred but not the file name where the error occurred.
– A score of 1 is given for error messages with the correct error name.
– A score of 0 is given when the error was not detected.


How UPC-CHECK compares

• Results from ISU’s test suite: http://rted.public.iastate.edu/UPC/RESULTS/result_table.html

• UPC-CHECK gets the highest score for Deadlocks and the highest score for all but 3 tests in the Argument Errors section.

Compiler        Argument Errors    Deadlocks
UPC-CHECK       4.89               5.00
Berkeley UPC    0.04               0.58
Cray            0.38               0.00
HP              0.00               0.36
GNU             0.00               0.27


Additional Checks

• While collecting the information necessary to instrument the Argument Errors and Deadlock checks, UPC-CHECK sometimes detects a different error. Whichever error would occur first is reported with a meaningful error message. (E.g. a collective routine within a upc_forall.)

• As a result, more categories in ISU’s RTED_UPC test suite show improvement when using UPC-CHECK; see:

http://rted.public.iastate.edu/UPC/RESULTS/result_table.html

• In addition, some errors are detected and reported at translation/compile time.


Examples illustrating how to use UPC-CHECK to find and correct program errors *

* All examples use the Berkeley UPC compiler.


Example 1: http://hpcgroup.public.iastate.edu/UPC-CHECK/Ex/ex1.upc

This program contains the function call:

upc_all_broadcast(arrA, arrB, sizeof(int)*sh_val, UPC_IN_NOSYNC | UPC_OUT_NOSYNC);

where sh_val is declared as

static shared int sh_val;

However, the program never assigns a value to sh_val. Because of its declaration, sh_val has an initial value of zero, so the third argument (nbytes) of the broadcast call above is zero. This is not allowed by the UPC specification.


When issuing: upcc -T 4 -o ex1 ex1.upc; upcrun -n 4 ./ex1;

the program executes without any error messages being issued.

When issuing: upc-check -T 4 -o ex1 ex1.upc; upcrun -n 4 ./ex1;

the following message is issued:

Thread 0 encountered invalid arguments in function upc_broadcast at line 26 in file /home/jjc/ex1.upc.
Error: Parameter (((sizeof(int )) *(sh_val))) passes non-positive value of 0 to nbytes argument
Variable sh_val was declared at line 10 in file /home/jjc/ex1.upc.


Correcting Example 1

Seeing that sizeof(int)*sh_val was zero, the programmer can conclude that sh_val still has the default value of zero from its declaration at line 10. (Static shared variables are initialized to zero according to the UPC specification.) Thus, an assignment of a value to sh_val before line 26 is missing.

Inserting the statement

sh_val = BLOCK_SIZE;

at line 16 fixes this error.


Example 2:

http://hpcgroup.public.iastate.edu/UPC-CHECK/Ex/ex2.upc

This program contains:

numhints = 1;
fd = upc_all_fopen("upcio1.txt", UPC_INDIVIDUAL_FP|UPC_WRONLY|UPC_CREATE, numhints, hints);

but the program does not allocate space for the structure hints.


When issuing: upcc -T 4 -o ex2 ex2.upc; upcrun -n 4 ./ex2;

the following is printed from the printf in the program: File not open.

When issuing: upc-check -T 4 -o ex2 ex2.upc; upcrun -n 4 ./ex2;

the following message is issued:

Thread 0 encountered invalid arguments in function upc_all_fopen at line 13 in file /home/jjc/ex2.upc.
Error: Parameter numhints passes non-zero value of 1 to 'numhints' argument while target of parameter (hints) passed to 'hints' argument is unallocated.
Variable numhints was declared at line 7 in file /home/jjc/ex2.upc.
Variable hints was declared at line 9 in file /home/jjc/ex2.upc


Correcting Example 2

The argument hints is not used unless numhints is positive. hints may be used to convey information about a file in hopes of more efficient I/O. Therefore, example 2 can be corrected by either

1) setting numhints to 0, or

2) allocating hints and assigning correct values to it.


Example 3:

http://hpcgroup.public.iastate.edu/UPC-CHECK/Ex/ex3.upc

http://hpcgroup.public.iastate.edu/UPC-CHECK/Ex/ex3_s.upc

In this program, upc_barrier is not executed by all threads, which causes a deadlock.

This error is difficult to find because the barrier is inside a function that is called from within an if block.


When issuing: upcc -T 4 -o ex3 ex3.upc ex3_s.upc; upcrun -n 4 ./ex3;

a deadlock occurs and the upcrun command never returns.

When issuing: upc-check -T 4 -o ex3 ex3.upc ex3_s.upc; upcrun -n 4 ./ex3;

the following message is issued:

Runtime error: Deadlock condition detected: One or more threads have finished executing while other threads are waiting at a collective routine

Status of threads
=================
Thread id:Status:Presently waiting at line number:of file
--------------------------------------------------------
0:waiting at upc_barrier: 7: /home/jjc/ex3_s.upc
1:reached end of execution through: 39: /home/jjc/ex3.upc
2:waiting at upc_barrier: 7: /home/jjc/ex3_s.upc
3:waiting at upc_barrier: 7: /home/jjc/ex3_s.upc


Correcting Example 3

The upc_barrier is called from funcA. Two of the three possible paths through the two nested if statements are present and contain a upc_barrier, but the third possible (else) path is missing.

This error can be corrected by creating the missing else block and placing in it either a call to funcA or a upc_barrier statement.


Example 4:

http://hpcgroup.public.iastate.edu/UPC-CHECK/Ex/ex4.upc

http://hpcgroup.public.iastate.edu/UPC-CHECK/Ex/ex4_s.upc

In this program, not all threads call the UPC collective function upc_all_fsync.


When issuing: upcc -T 4 -o ex4 ex4.upc ex4_s.upc; upcrun -n 4 ./ex4;

the upcrun command never completes.

When issuing: upc-check -T 4 -o ex4 ex4.upc ex4_s.upc; upcrun -n 4 ./ex4;

the following message is issued:

Runtime error: Deadlock condition detected: Different threads waiting at different collective routines

Status of threads
=================
Thread id:Status:Presently waiting at line number:of file
---------------------------------------------------------
0:waiting at upc_all_fsync on file pointer fd: 9: /home/jjc/ex4_s.upc
1:waiting at upc_all_fclose on file pointer fd: 52: /home/jjc/ex4.upc
2:waiting at upc_all_fsync on file pointer fd: 9: /home/jjc/ex4_s.upc
3:waiting at upc_all_fsync on file pointer fd: 9: /home/jjc/ex4_s.upc


Correcting Example 4

This is another case where a UPC collective (in this case upc_all_fsync) is not called by all threads, as required. This is detected when one set of threads executes upc_all_fsync, while another set executes upc_all_fclose.

Inserting an else clause with the statement upc_all_fsync(fd) corrects the problem.


Example 5:

http://hpcgroup.public.iastate.edu/UPC-CHECK/Ex/ex5.upc

In this program, all of the threads call the UPC collective function upc_all_reduceI, but they call it with different source arrays, which is not allowed by the UPC specification.

Without UPC-CHECK, when issuing:

upcc -T 4 -o ex5 ex5.upc; upcrun -n 4 ./ex5;

the following is printed:

sumA=120


When issuing: upc-check -T 4 -o ex5 ex5.upc; upcrun -n 4 ./ex5;

the following message is issued:

Runtime error: Unspecified behavior condition detected, may lead to deadlock : One or more threads have different values for single_valued parameters.

Status of threads
=================
Thread id:Status:Presently waiting at line number:of file
---------------------------------------------------------
0:waiting at upc_all_reduceI: 21: /home/jjc/ex5.upc
1:waiting at upc_all_reduceI: 21: /home/jjc/ex5.upc
2:waiting at upc_all_reduceI: 21: /home/jjc/ex5.upc
3:waiting at upc_all_reduceI: 21: /home/jjc/ex5.upc

Mismatch in parameter: src.
Thread no.
===================================================================
0:ptrA points to memory location 0x2b7dd810dff0. Variable ptrA was declared at line 7 in file /home/jjc/ex5.upc.
1:ptrA points to memory location 0x2b7dd810dfe0. Variable ptrA was declared at line 7 in file /home/jjc/ex5.upc.
2:ptrA points to memory location 0x2b7dd810dfc0. Variable ptrA was declared at line 7 in file /home/jjc/ex5.upc.
3:ptrA points to memory location 0x2b7dd810dfd0. Variable ptrA was declared at line 7 in file /home/jjc/ex5.upc.


Correcting Example 5

The error message on the previous slide reports that threads have different values of the src parameter of function upc_all_reduceI.

ptrA, declared at line 7 of file ex5.upc, points to a different memory location on each thread. Looking at the ptrA declaration, we see that ptrA is a private pointer-to-shared.

Later in the code, ptrA is assigned the value returned by a call to upc_global_alloc. This function is not collective: if it is called by multiple threads, each calling thread gets a different allocation.

Changing upc_global_alloc to upc_all_alloc corrects the problem, since ptrA will then have the same value on every thread.

Note that with the current version of the Berkeley UPC compiler, the value of sumA is the same in either case, but this behavior is not guaranteed for the test above.
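The mismatch reported for the src parameter comes down to comparing the value that each thread passed for a single-valued argument. The following is a minimal C sketch of that comparison only. The function first_mismatch is a hypothetical helper written for this illustration; it is not part of UPC-CHECK and does not show how UPC-CHECK gathers the per-thread values.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of UPC-CHECK): values[t] is the value that
 * thread t passed for a single-valued argument, e.g. the address held in ptrA.
 * Returns the index of the first thread whose value differs from thread 0's,
 * or -1 if every thread passed the same value.
 */
int first_mismatch(const unsigned long long *values, int nthreads)
{
    assert(values != NULL && nthreads > 0);
    for (int t = 1; t < nthreads; t++) {
        if (values[t] != values[0]) {
            return t;
        }
    }
    return -1;
}
```

Fed the four addresses from the error message above, the helper flags thread 1. With identical addresses on all threads, which is the situation after switching to upc_all_alloc, it returns -1.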


Example 6

Example 6 is the Dining Philosophers problem, a classic deadlock problem.

http://hpcgroup.public.iastate.edu/UPC-CHECK/Ex/ex6.upc

Without UPC-CHECK, when issuing:

upcc -T 3 -o ex6 ex6.upc; upcrun -n 3 ./ex6;

the output produced varies from run to run. For one run the following output was produced:

philosopher # 0 got the left fork
philosopher # 0 got the right fork
philosopher # 0 got the left fork
philosopher # 1 got the left fork
philosopher # 2 got the left fork

The program then deadlocks and no additional output is issued.


When issuing:

upc-check -T 3 -o ex6 ex6.upc; upcrun -n 3 ./ex6;

the program exits after issuing the following message:

Runtime error: Deadlock condition detected: Found cycle of hold-and-wait dependencies for acquiring locks:

Thread 2 is waiting at upc_lock function at line 18 of file /home/jjc/ex6.upc to acquire lock forks[((MYTHREAD ) + 1) % 3] pointing to location 0x9f40. Lock forks[((MYTHREAD ) + 1) % 3] was already acquired as forks[MYTHREAD ] by thread 0 with 'upc_lock' at line 16 of file /home/jjc/ex6.upc.

Thread 0 is waiting at upc_lock function at line 18 of file /home/jjc/ex6.upc to acquire lock forks[((MYTHREAD ) + 1) % 3] pointing to location 0x9f20. Lock forks[((MYTHREAD ) + 1) % 3] was already acquired as forks[MYTHREAD ] by thread 1 with 'upc_lock' at line 16 of file /home/jjc/ex6.upc.

Thread 1 is waiting at upc_lock function at line 18 of file /home/jjc/ex6.upc to acquire lock forks[((MYTHREAD ) + 1) % 3] pointing to location 0x9f00. Lock forks[((MYTHREAD ) + 1) % 3] was already acquired as forks[MYTHREAD ] by thread 2 with 'upc_lock' at line 16 of file /home/jjc/ex6.upc.


Correcting Example 6

The error message on the previous slide shows
• where the deadlock is occurring (line 18 of the indicated file),
• which locks are involved,
• which thread holds which lock,
• and which lock each thread is waiting on.

The deadlock can be avoided by numbering the forks, picking up the even-numbered fork first and then the other fork, and putting them down in the reverse order.
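One common way to impose a fixed acquisition order on numbered forks is to always take the lower-numbered fork first. This is a closely related variant of the rule above, not the exact rule stated on the slide and not the corrected ex6.upc. The C sketch below only computes the order; fork_order is a hypothetical helper, and the left/right indexing (forks[MYTHREAD] and forks[(MYTHREAD + 1) % THREADS]) is taken from the error message for Example 6.

```c
#include <assert.h>

/*
 * Hypothetical helper: decide in which order philosopher `id` (0..n-1)
 * should acquire its two forks. The left fork has index id and the right
 * fork has index (id + 1) % n, as in the Example 6 error message.
 * The lower-numbered fork is always taken first, so no cycle of
 * hold-and-wait dependencies can form.
 */
void fork_order(int id, int n, int *first, int *second)
{
    assert(n >= 2 && id >= 0 && id < n);
    int left = id;
    int right = (id + 1) % n;
    if (left < right) {
        *first = left;
        *second = right;
    } else {
        *first = right;
        *second = left;
    }
}
```

With three philosophers, philosophers 0 and 1 take their left fork first as before, while philosopher 2 takes fork 0 before fork 2. Philosophers 0 and 2 then compete for fork 0, and whichever loses holds nothing that the others need, which breaks the cycle. In a UPC program each thread would presumably call upc_lock on forks[first] and then on forks[second], and release them in the reverse order.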


Efficiency of UPC-CHECK

UPC-CHECK has been carefully designed to minimize the overhead when executing the instrumented UPC program. Using the UPC implementation of the NAS Parallel CG benchmark, we timed both the instrumented and non-instrumented executables using 4 threads for the three smallest problem sizes (S, A, and B), again with the Berkeley UPC compiler. The overhead is essentially zero.

Wall time (sec.)

Size    No UPC-CHECK    With UPC-CHECK    Overhead
S       7.36            7.41              0.7%
A       9.06            9.12              0.7%
B       85.03           83.04             -2.3%
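The Overhead column is consistent with the relative difference between the two wall times. A small C sketch of that arithmetic follows; overhead_percent is a name chosen here for illustration and does not come from the slides.

```c
#include <assert.h>

/*
 * Relative overhead of the instrumented run, in percent of the
 * non-instrumented wall time. A negative result means the instrumented
 * run happened to be faster.
 */
double overhead_percent(double base_sec, double instrumented_sec)
{
    assert(base_sec > 0.0);
    return (instrumented_sec - base_sec) / base_sec * 100.0;
}
```

For size S this gives (7.41 - 7.36) / 7.36 × 100, which rounds to 0.7%. For size B the instrumented run was slightly faster, so the result is negative and rounds to -2.3%.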


Scalability of UPC-CHECK checks

Type of check                     Overhead (for T threads)
Argument checking                 O(1)
Deadlocks: collective routines    O(1)
Deadlocks: UPC locks              O(L), L <= T

where L is the length of the longest hold-and-wait chain.

For a program that does not use UPC locks, the overhead of using UPC-CHECK does not depend on the number of threads. This is because all checking can be done using values local to a thread and its neighboring threads. The O(1) deadlock checking for collective routines will be described in a paper that is being prepared.

A program that uses UPC locks may have overhead that depends on the number of threads, because a chain of lock dependencies (a deadlock) may span all threads.
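To see why the cost for locks grows with the chain length L, consider following a hold-and-wait chain one thread at a time. The C sketch below is an illustration under assumptions: the waits_for array and the function hold_and_wait_cycle are hypothetical, and this is not a description of UPC-CHECK's actual algorithm or data structures.

```c
#include <assert.h>

/*
 * Hypothetical illustration: waits_for[t] is the thread that currently
 * holds the lock thread t is blocked on, or -1 if t is not blocked.
 * Follows the chain starting at `start` and returns the length of the
 * cycle that leads back to `start`, or 0 if there is no such cycle.
 * The loop runs at most nthreads steps, so the cost is O(L) with L <= T.
 */
int hold_and_wait_cycle(const int *waits_for, int nthreads, int start)
{
    assert(start >= 0 && start < nthreads);
    int current = start;
    for (int steps = 1; steps <= nthreads; steps++) {
        current = waits_for[current];
        if (current < 0 || current >= nthreads) {
            return 0;           /* chain ends at a thread that is not blocked */
        }
        if (current == start) {
            return steps;       /* back at the start: deadlock cycle found */
        }
    }
    return 0;                   /* chain loops without passing through start */
}
```

For Example 6, thread 2 waits for thread 0, thread 0 waits for thread 1, and thread 1 waits for thread 2, so waits_for is {1, 2, 0} and the cycle has length 3, which equals the number of threads.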


Overhead on a Cray XT using the Cray compiler, 128 threads

Execution time for the original and the instrumented program:

NAS Benchmark    Original (sec.)    Instrumented (sec.)    Slowdown
CG-A             4.912              4.99                   1.02
CG-B             54.183             54.239                 1.00
CG-C             58.309             58.281                 1.00
EP-A             1.417              1.427                  1.01
EP-B             7.116              7.128                  1.00
EP-C             11.19              11.17                  1.00
IS-A             3.56               3.658                  1.03
IS-B             8.752              8.776                  1.00
IS-C             10.089             10.073                 1.00
Total            159.528            159.742                1.00


Memory overhead of UPC-CHECK

The memory overhead per thread consists of three components:

1) Extra context variables allocated to support checks: approximately 128 KB.

2) Extra information about the call stack, if call stack tracking is requested: 1/2 KB per call level per thread.

3) Executable size: the support routines add less than 1 MB, and each UPC routine adds about 3.5 KB.
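As a rough worked example, the figures above can be combined into back-of-the-envelope estimates. The C sketch below uses hypothetical function names and assumes 1 MB = 1024 KB; the executable figure is an upper bound, because the slides only say the support routines add less than 1 MB.

```c
#include <assert.h>

/*
 * Estimated data overhead per thread in KB: about 128 KB of context
 * variables, plus 0.5 KB per call level when call stack tracking is on.
 */
double per_thread_overhead_kb(int call_depth, int track_call_stack)
{
    assert(call_depth >= 0);
    double kb = 128.0;
    if (track_call_stack) {
        kb += 0.5 * call_depth;
    }
    return kb;
}

/*
 * Upper bound on the growth of the executable in KB: at most 1 MB
 * (taken here as 1024 KB) of support routines plus about 3.5 KB for
 * each UPC routine in the program.
 */
double executable_overhead_upper_kb(int upc_routines)
{
    assert(upc_routines >= 0);
    return 1024.0 + 3.5 * upc_routines;
}
```

For instance, a call depth of 20 with tracking enabled gives roughly 128 + 10 = 138 KB per thread, and a program with 100 UPC routines grows by at most about 1374 KB.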