class13_debugging.pdf


    Debugging in Serial & Parallel

    M. D. Jones, Ph.D.

    Center for Computational Research

    University at Buffalo

    State University of New York

    High Performance Computing I, 2012

    M. D. Jones, Ph.D. (CCR/UB) Debugging in Serial & Parallel HPC-I Fall 2012 1 / 89


    Part I

    Basic (Serial) Debugging


    Introduction Software for Debugging

    Software for Debugging

    The most common method for debugging (by far) is the instrumentation method:

    One instruments the code with print statements to check values and follow the execution of the program

    Not exactly sophisticated - one can certainly debug code in this way, but wise use of software debugging tools can be more effective
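The instrumentation method can be sketched in a few lines of C. This is only an illustration: the TRACE_INT macro and the sum_below function are hypothetical names made up here, not code from the lecture. Each probe prints the file, line, variable name and value to stderr so the normal output stays clean.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the slides): print a labelled integer to
 * stderr together with the source location of the probe. */
#define TRACE_INT(var) \
    fprintf(stderr, "%s:%d: %s = %d\n", __FILE__, __LINE__, #var, (var))

/* Sum the integers 0..n-1, instrumented with one probe per iteration. */
int sum_below(int n)
{
    int i, total = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        total += i;
        TRACE_INT(total);   /* follow the running total as the loop executes */
    }
    return total;
}
```

Calling sum_below(5) would print five trace lines and return the sum. The probes have to be edited out (or compiled out) again afterwards, which is part of why a debugger is often the better tool.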


    Debugging Tools

    Debugging tools are abundant, but we will focus merely on some of the most common attributes to give you a bag of tricks that can be used when dealing with common problems.


    Basic Capabilities

    Common attributes:

    Divided into command-line or graphical user interfaces

    Usually you have to recompile your code (-g is almost a standard option to enable debugging) to utilize most debugger features

    Invocation by name of debugger and executable (e.g. gdb ./a.out [core])


    Running Within

    Inside a debugger (be it using a command-line interface (CLI) or graphical front-end), you have some very handy abilities:

    Look at source code listing (very handy when isolating an IEEE exception)

    Line-by-line execution

    Insert stops or breakpoints at certain functional points (i.e., when critical values change)

    Ability to monitor variable values

    Look at stack trace (or backtrace) when code crashes


    Command-line debugging example

    Consider the following code example:

        #include <stdio.h>
        #include <stdlib.h>

        int indx;

        void initArray(int nelem_in_array, int *array);
        void printArray(int nelem_in_array, int *array);
        int *squareArray(int nelem_in_array, int *array);

        int main(void) {
          const int nelem = 10;
          int *array1, *array2, *del;

          /* Allocate memory for each array */
          array1 = (int *)malloc(nelem*sizeof(int));
          array2 = (int *)malloc(nelem*sizeof(int));
          del = (int *)malloc(nelem*sizeof(int));

          /* Initialize array1 */
          initArray(nelem, array1);

          for (indx = 0; indx < nelem; indx++) {
            array1[indx] = indx + 2;
          }

          /* Print the elements of array1 */
          printf("array1 = ");
          printArray(nelem, array1);

          /* Copy array1 to array2 */
          array2 = array1;

          /* Pass array2 to the function squareArray() */
          squareArray(nelem, array2);

          /* Compute difference between elements of array2 and array1 */
          for (indx = 0; indx < nelem; indx++) {
            del[indx] = array2[indx] - array1[indx];
          }

          /* Print the computed differences */
          printf("The difference in the elements of array2 and array1 are: ");
          printArray(nelem, del);

          free(array1);
          free(array2);
          free(del);
          return 0;
        }

        void initArray(const int nelem_in_array, int *array) {
          for (indx = 0; indx < nelem_in_array; indx++) {
            array[indx] = indx + 1;
          }
        }

        int *squareArray(const int nelem_in_array, int *array) {
          int indx;
          for (indx = 0; indx < nelem_in_array; indx++) {
            array[indx] *= array[indx];
          }
          return array;
        }

        void printArray(const int nelem_in_array, int *array) {
          printf("\n( ");
          for (indx = 0; indx < nelem_in_array; indx++) {
            printf("%d ", array[indx]);
          }
          printf(")\n");
        }


    Ok, now let's compile and run this code:

        [bono:~/d_debug]$ gcc -g -o arrayex arrayex.c
        [bono:~/d_debug]$ ./arrayex
        array1 =
        ( 2 3 4 5 6 7 8 9 10 11 )
        The difference in the elements of array2 and array1 are:
        ( 0 0 0 0 0 0 0 0 0 0 )
        *** glibc detected *** double free or corruption (fasttop): 0x0000000000501010 ***
        Aborted

    Not exactly what we expect, is it? Array2 should contain the squares of the values in array1, and therefore the difference should be i^2 - i for i = [2, 11].


    Now let us run the code from within gdb. Our goal is to set a breakpoint where the squared array elements are computed, then step through the code:

        [bono:~/d_debug]$ gdb arrayex
        (gdb) l 34
        31
        32      /* Copy array1 to array2 */
        33      array2 = array1;
        34
        35      /* Pass array2 to the function squareArray() */
        36      squareArray(nelem, array2);
        37
        38      /* Compute difference between elements of array2 and array1 */
        39      for (indx = 0; indx < nelem; indx++) {
        40        del[indx] = array2[indx] - array1[indx];
        (gdb) b 34
        Breakpoint 1 at 0x400604: file ex1.c, line 34.
        (gdb) run
        Starting program: /san/user/jonesm/u2/d_debug/arrayex
        array1 =
        ( 2 3 4 5 6 7 8 9 10 11 )

        Breakpoint 1, main () at arrayex.c:34
        34      squareArray(nelem, array2);
        (gdb) s
        squareArray (nelem_in_array=10, array=0x501010) at arrayex.c:59
        59      for (indx = 0; indx < nelem_in_array; indx++) {
        (gdb) s
        60        array[indx] *= array[indx];
        (gdb) s
        59      for (indx = 0; indx < nelem_in_array; indx++) {
        (gdb) p indx
        $1 = 0
        (gdb) p array[indx]
        $2 = 4
        (gdb) display indx
        1: indx = 0
        (gdb) display array[indx]
        2: array[indx] = 4
        (gdb) s
        60        array[indx] *= array[indx];
        2: array[indx] = 3
        1: indx = 1

    Ok, that is instructive, but no closer to finding the bug.


    So, what have we learned so far about the command-line debugger:

    Useful for peeking inside source code

    (break) Breakpoints

    (s) Stepping through execution

    (p) Print values at selected points (can also use handy printf syntax as in C)

    (display) Displaying values for monitoring while stepping through code

    (bt) Backtrace, or Stack Trace - haven't used this yet, but certainly will


    Digging Out the Bug

    What we have learned is enough - look more closely at the line where the differences between array1 and array2 are computed:

        (gdb) l 38
        33      /* Pass array2 to the function squareArray() */
        34      squareArray(nelem, array2);
        35
        36      /* Compute difference between elements of array2 and array1 */
        37      for (indx = 0; indx < nelem; indx++) {
        38        del[indx] = array2[indx] - array1[indx];
        39      }
        40
        41      /* Print the computed differences */
        42      printf("The difference in the elements of array2 and array1 are: ");
        (gdb) b 37
        Breakpoint 1 at 0x400611: file arrayex.c, line 37.
        (gdb) run
        Starting program: /san/user/jonesm/u2/d_debug/arrayex
        array1 =
        ( 2 3 4 5 6 7 8 9 10 11 )

        Breakpoint 1, main () at arrayex.c:37
        37      for (indx = 0; indx < nelem; indx++) {
        (gdb) disp indx
        1: indx = 10
        (gdb) disp array1[indx]
        2: array1[indx] = 49
        (gdb) disp array2[indx]
        3: array2[indx] = 49
        (gdb) s
        38        del[indx] = array2[indx] - array1[indx];
        3: array2[indx] = 4
        2: array1[indx] = 4
        1: indx = 0
        (gdb) s
        37      for (indx = 0; indx < nelem; indx++) {
        3: array2[indx] = 4
        2: array1[indx] = 4
        1: indx = 0
        (gdb) s
        38        del[indx] = array2[indx] - array1[indx];
        3: array2[indx] = 9
        2: array1[indx] = 9
        1: indx = 1


    Now that isn't right - array1 was not supposed to change. Let us go back and look more closely at the call to squareArray ...


        (gdb) l
        32
        33      /* Pass array2 to the function squareArray() */
        34      squareArray(nelem, array2);
        35
        36      /* Compute difference between elements of array2 and array1 */
        37      for (indx = 0; indx < nelem; indx++) {
        38        del[indx] = array2[indx] - array1[indx];
        39      }
        40
        41      /* Print the computed differences */
        (gdb) b 34
        Breakpoint 2 at 0x400605: file arrayex.c, line 34.
        (gdb) run
        The program being debugged has been started already.
        Start it from the beginning? (y or n) y

        Starting program: /san/user/jonesm/u2/d_debug/arrayex
        array1 =
        ( 2 3 4 5 6 7 8 9 10 11 )

        Breakpoint 2, main () at arrayex.c:34
        34      squareArray(nelem, array2);
        3: array2[indx] = 49
        2: array1[indx] = 49
        1: indx = 10
        (gdb) disp array2
        4: array2 = (int *) 0x501010
        (gdb) disp array1
        5: array1 = (int *) 0x501010


    Yikes, array1 and array2 point to the same memory location! See, pointer errors like this don't happen too often in Fortran ... Now, of course, the bug is obvious - but aren't they all obvious after you find them?


    The Fix Is In

    Just as an afterthought, what we ought to have done in the first place was copy array1 into array2:

        /* Copy array1 to array2 */
        /* array2 = array1; */
        for (indx = 0; indx [listing truncated in source]
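The difference between aliasing a pointer and copying the data can be sketched as follows. This is not the slide's own fix (that listing is cut off above); copy_int_array is a hypothetical helper that uses memcpy in place of an element-by-element loop, which should be equivalent for plain int arrays.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of src[0..nelem-1], or NULL if the
 * allocation fails. copy_int_array is a hypothetical helper name, not
 * part of the lecture code. */
int *copy_int_array(const int *src, size_t nelem)
{
    int *dst;

    assert(src != NULL);
    dst = malloc(nelem * sizeof *dst);
    if (dst != NULL)
        memcpy(dst, src, nelem * sizeof *dst);  /* copy the values, not the pointer */
    return dst;
}
```

With a real copy, squaring the elements of the second array leaves the first one untouched, and each block is freed exactly once, so the double free reported by glibc should go away as well.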


    Array Indexing Errors

    Array indexing errors are one of the most common errors in both sequential and parallel codes - and it is not entirely surprising:

    Different languages have different indexing defaults

    Multi-dimensional arrays are pretty easy to reference out-of-bounds

    Fortran in particular lets you use very complex indexing schemes (essentially arbitrary!)


    Example: Indexing Error

        #include <stdio.h>
        #define N 10
        int main(int argc, char *argv[]) {
          int arr[N];
          int i, odd_sum, even_sum;

          for (i=1; i [listing truncated in source]


    Now, try compiling with gcc and running the code:

        [bono:~/d_debug]$ gcc -O -g -o ex2 ex2.c
        [bono:~/d_debug]$ ./ex2
        odd_sum=5, even_sum=671173703

    Ok, that hardly seems reasonable (does it?) Now, let's run this example from within gdb and set a breakpoint to examine the accumulation of values to even_sum.


        (gdb) l 16
        11          arr[i] = (i*i)%5;
        12        }
        13      }
        14      odd_sum=0;
        15      even_sum=0;
        16      for (i=0; i [listing truncated in source]


    So we see that our original example code missed initializing the first element of the array, and the results were rather erratic (in fact they will likely be compiler and flag dependent).

    Initialization is just one aspect of things going wrong with array indexing - let us examine another common problem ...


    The (Infamous) Seg Fault

    This example I borrowed from Norman Matloff (UC Davis), who has a nice article (well worth the time to read): "Guide to Faster, Less Frustrating Debugging", which you can find easily enough on the web:

    http://heather.cs.ucdavis.edu/~matloff/unix.html


    Main code: findprimes.c

        /*
           prime-number finding program: will (after bugs are fixed) report a list of
           all primes which are less than or equal to the user-supplied upper bound
        */
        #include <stdio.h>

        #define MaxPrimes 50
        int Prime[MaxPrimes],  /* Prime[I] will be 1 if I is prime, 0 otherwise */
            UpperBound;        /* we will check up through UpperBound for primeness */

        void CheckPrime(int K); /* prototype for CheckPrime function */
        int main()
        {
          int N;

          printf("enter upper bound\n");
          scanf("%d",UpperBound);

          Prime[2] = 1;

          for (N = 3; N <= UpperBound; N += 2)
            CheckPrime(N);
            if (Prime[N]) printf("%d is a prime\n",N);
        }


    Function CheckPrime:

        void CheckPrime(int K) {
          int J;

          /* the plan: see if J divides K, for all values J which are
             (a) themselves prime (no need to try J if it is nonprime), and
             (b) less than or equal to sqrt(K) (if K has a divisor larger
                 than this square root, it must also have a smaller one,
                 so no need to check for larger ones) */

          J = 2;
          while (1) {
            if (Prime[J] == 1)
              if (K % J == 0) {
                Prime[K] = 0;
                return;
              }
            J++;
          }

          /* if we get here, then there were no divisors of K, so it is
             prime */
          Prime[K] = 1;
        }

    so now if we compile and run this code ...


        [bono:~/d_debug]$ gcc -g -o findprimes findprimes.c
        [bono:~/d_debug]$ ./findprimes
        enter upper bound
        20
        Segmentation fault

    Ok, let's fire up gdb and see where this code crashed:

        [bono:~/d_debug]$ gdb findprimes

        (gdb) run
        Starting program: /san/user/jonesm/u2/d_debug/findprimes
        enter upper bound
        20

        Program received signal SIGSEGV, Segmentation fault.
        0x000000392815062f in _IO_vfscanf_internal () from /lib64/tls/libc.so.6
        (gdb) bt
        #0  0x000000392815062f in _IO_vfscanf_internal () from /lib64/tls/libc.so.6
        #1  0x000000392815866a in scanf () from /lib64/tls/libc.so.6
        #2  0x0000000000400524 in main () at findprimes.c:16
        (gdb)
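The backtrace ends inside scanf, and the scanf family writes its results through pointer arguments. A minimal sketch of a safer pattern, assuming a hypothetical helper parse_upper_bound that uses sscanf on a string (so it can be exercised without typing input) and checks the conversion count:

```c
#include <assert.h>
#include <stdio.h>

/* Parse an integer upper bound from text. Returns 1 on success and 0 if
 * no integer could be converted. parse_upper_bound is a hypothetical
 * name, not part of findprimes.c. */
int parse_upper_bound(const char *text, int *upper_bound)
{
    assert(text != NULL && upper_bound != NULL);
    /* The destination is handed over as a pointer. For a plain int
     * variable read from stdin the matching call would be
     * scanf("%d", &UpperBound), with the address-of operator. */
    return sscanf(text, "%d", upper_bound) == 1;
}
```

Passing the int itself instead of its address makes scanf treat the (uninitialized or zero) value as a memory location to write to, which is a plausible route to a crash deep inside libc like the one above.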


    After changing the scanf call on line 16 to pass &UpperBound, recompile and run again:

        [bono:~/d_debug]$ gcc -g -o findprimes findprimes.c
        [bono:~/d_debug]$ gdb findprimes
        (gdb) run
        Starting program: /san/user/jonesm/u2/d_debug/findprimes
        enter upper bound
        20

        Program received signal SIGSEGV, Segmentation fault.
        0x0000000000400586 in CheckPrime (K=3) at findprimes.c:37
        37          if (Prime[J] == 1)
        (gdb) bt
        #0  0x0000000000400586 in CheckPrime (K=3) at findprimes.c:37
        #1  0x0000000000400547 in main () at findprimes.c:21
        (gdb) l 37
        32              than this square root, it must also have a smaller one,
        33              so no need to check for larger ones) */
        34
        35        J = 2;
        36        while (1) {
        37          if (Prime[J] == 1)
        38            if (K % J == 0) {
        39              Prime[K] = 0;
        40              return;
        41            }
        (gdb)


    Very often we get seg faults on trying to reference an array out-of-bounds, so have a look at the value of J:

        (gdb) l 37
        32              than this square root, it must also have a smaller one,
        33              so no need to check for larger ones) */
        34
        35        J = 2;
        36        while (1) {
        37          if (Prime[J] == 1)
        38            if (K % J == 0) {
        39              Prime[K] = 0;
        40              return;
        41            }
        (gdb) p J
        $1 = 376

    Oops! That is just a tad outside the bounds (50). Kind of forgot to put a cap on the value of J ...


    Fixing the last bug:

        (gdb) l 40
        35        J = 2;
        36        /* while (1) { */
        37        for (J=2; J*J [listing truncated in source]
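Since the corrected listing is cut off, here is a sketch of what a capped loop might look like, following comment (b) in CheckPrime (only try divisors up to the square root of K). The names check_prime, prime_flag and MAX_PRIMES are stand-ins chosen for this sketch, and the exact loop condition is an assumption rather than the lecture's code.

```c
#include <assert.h>

#define MAX_PRIMES 50

/* prime_flag[i] is 1 if i is known to be prime, 0 otherwise. */
static int prime_flag[MAX_PRIMES];

/* Mark k as prime or not. Assumes prime_flag[] is already correct for
 * every candidate divisor below k, which holds when callers work upward
 * from 2. */
void check_prime(int k)
{
    int j;

    assert(k >= 2 && k < MAX_PRIMES);
    /* j*j <= k caps j at sqrt(k), so j can never run off the table. */
    for (j = 2; j * j <= k; j++) {
        if (prime_flag[j] == 1 && k % j == 0) {
            prime_flag[k] = 0;   /* found a prime divisor */
            return;
        }
    }
    prime_flag[k] = 1;           /* no divisor found: k is prime */
}
```

Unlike the while (1) version, the final assignment is now reachable, and the assert on k guards the other end of the table against an upper bound larger than it can hold.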


    Ok, so now we will set a couple of breakpoints - one at the call to CheckPrime and the second where a successful prime is to be output:


        (gdb) l
        16        scanf("%d",&UpperBound);
        17
        18        Prime[2] = 1;
        19
        20        for (N = 3; N [listing truncated in source]


    Another gotcha - misplaced (or no) braces. Fix that:

        (gdb) l
        16        scanf("%d",&UpperBound);
        17
        18        Prime[2] = 1;
        19
        20        for (N = 3; N [listing truncated in source]


    Well, ok, not exactly debugging life itself; rather the game of life. Mathematician John Horton Conway's game of life [1], to be exact. This example will basically be similar to the prior examples, but now we will work in Fortran, and debug some integer arithmetic errors.

    And the context will be slightly more interesting.

    [1] See, for example, Martin Gardner's article in Scientific American, 223, pp. 120-123 (1970).


    Debugging Life Itself Game of Life

    Game of Life


    The Game of Life is one of the better known examples of cellular automatons (CA), namely a discrete model with a finite number of states, often used in theoretical biology, game theory, etc. The rules are actually pretty simple, and can lead to some rather surprising self-organizing behavior. The universe in the game of life:

    Universe is an infinite 2D grid of cells, each of which is alive or dead

    Cells interact only with nearest neighbors (including on the diagonals, which makes for eight neighbors)


    Rules of Life


    The rules in the game of life:

    Any live cell with fewer than two neighbours dies, as if by loneliness

    Any live cell with more than three neighbours dies, as if by overcrowding

    Any live cell with two or three neighbours lives, unchanged, to the next generation

    Any dead cell with exactly three neighbours comes to life

    An initial pattern is evolved by simultaneously applying the above rules to the entire grid, and subsequently at each tick of the clock.
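The four rules collapse into a function of two inputs, the current state of the cell and its count of live neighbours. A minimal C sketch, with next_state as a hypothetical name chosen here:

```c
#include <assert.h>

/* Next state of one cell (1 = alive, 0 = dead) given its current state
 * and the number of live cells among its eight neighbours. */
int next_state(int alive, int live_neighbours)
{
    assert(alive == 0 || alive == 1);
    assert(live_neighbours >= 0 && live_neighbours <= 8);

    if (live_neighbours == 3)
        return 1;        /* survival for a live cell, birth for a dead one */
    if (live_neighbours == 2)
        return alive;    /* unchanged: live stays live, dead stays dead */
    return 0;            /* loneliness (< 2) or overcrowding (> 3) */
}
```

Because the rules are applied simultaneously, a full update would read every neighbour count from the old grid and write results into a separate new grid before swapping the two.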


    Sample Code - Game of Life


        program life
        !
        ! Conway game of life (debugging example)
        !
          implicit none
          integer, parameter :: ni = 1000, nj = 1000, nsteps = 100
          integer :: i, j, n, im, ip, jm, jp, nsum, isum
          integer, dimension(0:ni,0:nj) :: old, new
          real :: arand, nim2, njm2
        !
        ! initialize elements of "old" to 0 or 1
        !
          do j = 1, nj
             do i = 1, ni
                CALL random_number(arand)
                old(i,j) = NINT(arand)
             enddo
          enddo
          nim2 = ni - 2
          njm2 = nj - 2
        !
        ! time iteration
        !
          time_iteration: do n = 1, nsteps
             do j = 1, nj
                do i = 1, ni
        !
        ! periodic boundaries,
        !
                   im = 1 + (i+nim2) - ((i+nim2)/ni)*ni ! if i=1, ni
                   ip = 1 + i - (i/ni)*ni               ! if i=ni, 1
                   jm = 1 + (j+njm2) - ((j+njm2)/nj)*nj ! if j=1, nj
                   jp = 1 + j - (j/nj)*nj               ! if j=nj, 1
        !
        ! for each point, add surrounding values
        !
                   nsum = old(im,jp) + old(i,jp) + old(ip,jp) &
                        + old(im,j )             + old(ip,j ) &
                        + old(im,jm) + old(i,jm) + old(ip,jm)
        !
        ! set new value based on number of "live" neighbors
        !
                   select case (nsum)
                   case (3)
                      new(i,j) = 1
                   case (2)
                      new(i,j) = old(i,j)
                   case default
                      new(i,j) = 0
                   end select
                enddo
             enddo
        !
        ! copy new state into old state
        !
             old = new
             print*, 'Tick ',n,' number of living: ',sum(new)
          enddo time_iteration
        !
        ! write number of live points
        !
          print*, 'number of live points = ', sum(new)
        end program life


    Initial Run ...


        [bono:~/d_debug]$ ifort -g -o life life.f90
        [bono:~/d_debug]$ ./life
        Tick 1 number of living: 342946
        Tick 2 number of living: 334381
        Tick 3 number of living: 291022
        Tick 4 number of living: 263356
        Tick 5 number of living: 290940
        Tick 6 number of living: 322733
        :
        :
        Tick 99 number of living: 0
        Tick 100 number of living: 0
        number of live points = 0

    Hmm, everybody dies! What kind of life is that? ... well, not a correct one, in this context, at least. Undoubtedly the problem lies within the neighbor calculation, so let us take a closer look at the execution ...


        (gdb) l 30
        25        do j = 1, nj
        26           do i = 1, ni
        27   !
        28   ! periodic boundaries
        29   !
        30              im = 1 + (i+nim2) - ((i+nim2)/ni)*ni ! if i=1, ni
        31              ip = 1 + i - (i/ni)*ni               ! if i=ni, 1
        32              jm = 1 + (j+njm2) - ((j+njm2)/nj)*nj ! if j=1, nj
        33              jp = 1 + j - (j/nj)*nj               ! if j=nj, 1
        (gdb) b 25
        Breakpoint 1 at 0x402e23: file life.f90, line 25.
        (gdb) run
        Starting program: /san/user/jonesm/u2/d_debug/life

        Breakpoint 1, life () at life.f90:25
        25        do j = 1, nj
        Current language: auto; currently fortran
        (gdb) s
        26           do i = 1, ni
        (gdb) s
        30              im = 1 + (i+nim2) - ((i+nim2)/ni)*ni ! if i=1, ni
        (gdb) s
        31              ip = 1 + i - (i/ni)*ni               ! if i=ni, 1
        (gdb) print im
        $1 = 1
        (gdb) print (i + nim2)/1000
        $2 = 0.999


    Ok, so therein lay the problem - nim2 and njm2 should be integers, not real values ... fix that:

        program life
        !
        ! Conway game of life (debugging example)
        !
          implicit none
          integer, parameter :: ni = 1000, nj = 1000, nsteps = 100
          integer :: i, j, n, im, ip, jm, jp, nsum, isum, nim2, njm2
          integer, dimension(0:ni,0:nj) :: old, new
          real :: arand

    and things become a bit more reasonable:

        [bono:~/d_debug]$ ifort -g -o life life.f90
        [bono:~/d_debug]$ ./life
        Tick 1 number of living: 272990
        Tick 2 number of living: 253690
        :
        :
        Tick 99 number of living: 95073
        Tick 100 number of living: 94664
        number of live points = 94664
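Why the type of nim2 matters can be seen by writing the same wrap-around expression twice in C, once with integer and once with real arithmetic. This is a sketch for illustration only: both function names are made up here, and the real version uses double where the Fortran code used a default real.

```c
#include <assert.h>

/* Left neighbour of i on a periodic 1-based grid of ni points, using
 * integer arithmetic throughout: i = 1 wraps around to ni. */
int left_neighbour_int(int i, int ni)
{
    int nim2 = ni - 2;

    assert(ni >= 2 && i >= 1 && i <= ni);
    /* (i + nim2) / ni truncates to 0 or 1, which is what does the wrap. */
    return 1 + (i + nim2) - ((i + nim2) / ni) * ni;
}

/* The same expression with nim2 held as a real, as in the buggy program.
 * The division no longer truncates, so dividing and multiplying by ni
 * roughly cancel and the result stays near 1 for every i. */
int left_neighbour_real(int i, int ni)
{
    double nim2 = ni - 2;

    assert(ni >= 2 && i >= 1 && i <= ni);
    return (int)(1 + (i + nim2) - ((i + nim2) / ni) * ni);
}
```

With the real-valued version every cell ends up looking at nearly the same neighbour index instead of its actual left neighbour, which fits the im value of 1 seen in the gdb session and the population collapse in the first run.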


    Diversion - Demo life


    http://www.radicaleye.com/lifepage

    http://en.wikipedia.org/wiki/Conway's_Game_of_Life

    Interesting repository of Conway's life and cellular automata references.


    Other Debugging Miscellany Core Files

    Core Files


    Core files can also be used to instantly analyze problems that caused a code failure bad enough to dump a core file. Often the computer system has been set up in such a way that the default is not to output core files, however:

        [bono:~/d_debug]$ ulimit -c
        0

    for bash syntax. In tcsh you would use the limit built-in command to set the coredumpsize value:

        [bono:~/d_debug]$ tcsh
        [jonesm@bono ~/d_debug]$ limit coredumpsize
        coredumpsize unlimited


    Systems administrators set the core file size limit to zero by default for a good reason - these files generally contain the entire memory image of an application process when it dies, and that can be very large. End-users are also notoriously bad about leaving these files lying around ...

    Having said that, we can up the limit, and produce a core file that can later be used for analysis.
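Besides the shell built-ins, a program can raise its own limit through the POSIX getrlimit/setrlimit calls with RLIMIT_CORE. The sketch below is an alternative to the shell approach, not something shown in the lecture; raise_core_limit is a hypothetical helper name, and it can only go as high as the hard limit the administrator allows.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit of the calling process to its hard
 * limit. Returns 0 on success and -1 on failure (errno is set by
 * getrlimit or setrlimit). */
int raise_core_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;   /* soft limit up to the hard ceiling */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Calling this early in main affects only that process and its children, so it avoids leaving the limit raised for the whole login session. If the hard limit is itself zero, no core file will be written either way.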


    Core File Example


    Ok, so now we can use one of our previous examples, and generate a core file:

        [bono:~/d_debug]$ gcc -g -o findprimes_orig findprimes_orig.c
        [bono:~/d_debug]$ ./findprimes_orig
        enter upper bound
        20
        Segmentation fault (core dumped)
        [bono:~/d_debug]$ ls -l core.7428
        -rw------- 1 jonesm ccrstaff 65536 Sep 27 12:15 core.7428


    This particular core file is not at all large (it is a very simple code, though, with very little stored data - generally the core file size will reflect the size of the application in terms of its memory use when it crashed). Analyzing it is pretty much like we did when running this example live in gdb:

        [bono:~/d_debug]$ ls -l core.7428
        -rw------- 1 jonesm ccrstaff 65536 Sep 27 12:15 core.7428
        [bono:~/d_debug]$ gdb findprimes_orig core.7428
        GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
        ...
        Core was generated by `./findprimes_orig'.
        Program terminated with signal 11, Segmentation fault.
        Reading symbols from /lib64/tls/libc.so.6...done.
        Loaded symbols for /lib64/tls/libc.so.6
        Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
        Loaded symbols for /lib64/ld-linux-x86-64.so.2
        #0  0x000000392815062f in _IO_vfscanf_internal () from /lib64/tls/libc.so.6
        (gdb) bt
        #0  0x000000392815062f in _IO_vfscanf_internal () from /lib64/tls/libc.so.6
        #1  0x000000392815866a in scanf () from /lib64/tls/libc.so.6
        #2  0x0000000000400524 in main () at findprimes_orig.c:16
        (gdb) l 16
        11      int main()
        12      {
        13        int N;
        14
        15        printf("enter upper bound\n");
        16        scanf("%d", UpperBound);
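
    The backtrace ends inside scanf, called from line 16 of the listing, where UpperBound is passed by value rather than by address. A minimal sketch of the corrected read follows, assuming UpperBound is an int (its declaration is not shown in the listing); the helper name read_upper_bound is hypothetical:

```c
#include <stdio.h>

/* Read one integer upper bound from `in` into *upper_bound.
 * The scanf family needs the ADDRESS of the destination; passing the int
 * itself makes the library write through a garbage pointer, which matches
 * the crash location reported in the backtrace.
 * Returns 0 on success, -1 if no integer could be parsed. */
int read_upper_bound(FILE *in, int *upper_bound)
{
    return (fscanf(in, "%d", upper_bound) == 1) ? 0 : -1;
}
```

    In the original program the equivalent one-line fix would presumably be to pass &UpperBound to scanf; checking the return value, as the sketch does, also catches non-numeric input.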


    Summary on Core Files


    So why would you want to use a core file rather than interactively debug?

    Your bug may take quite a while to manifest itself

    You have to debug inside a batch queuing system where interactive use is difficult or curtailed

    You want to capture a picture of the code state when it crashes


    More Command-line Debugging Tools


    We focused on gdb, but there are command-line debuggers that accompany just about every available compiler product:

    pgdbg   part of the PGI compiler suite, defaults to a GUI, but can be run as a command line interface (CLI) using the -text option

    idb     part of the Intel compiler suite, defaults to CLI (has a special option -gdb for using gdb command syntax)


    Run-time Compiler Checks


    Most compilers support run-time checks that can quickly catch common bugs. Here is a handy short-list (contributions welcome!):

    For Intel fortran, -check bounds -traceback -g will automate bounds checking, and enable extensive traceback analysis in case of a crash (leave out the bounds option to get a crash report on any IEEE exception, format mismatch, etc.)

    For PGI compilers, -Mbounds -g will do bounds checking

    For GNU compilers, -fbounds-check -g should also do bounds checking, but is only currently supported for Fortran and Java front-ends.


    Run-time Compiler Checks (cont'd)


    WARNING

    It should be noted that run-time error checking can very much slow down a code's execution, so it is not something that you will want to use all of the time.


    Serial Debugging GUIs


    There is, of course, a matching set of GUIs for the various debuggers. A short list:

    ddd        a graphical front-end for the venerable gdb

    pgdbg      GUI for the PGI debugger

    idb -gui   GUI for Intel compiler suite debugger

    It is very much a matter of preference whether or not to use the GUI. I find the GUI to be constraining, but it does make navigation easier.


    DDD Example


    Running one of our previous examples using ddd ...


    Source Code Checking Tools


    Now, in a completely different vein, there are tools designed to help identify errors pre-compilation, namely by analyzing the source code itself.

    splint is a tool for statically checking C programs:

    http://www.splint.org

    ftnchek is a tool that checks only (alas) FORTRAN 77 codes:

    http://www.dsm.fordham.edu/~ftnchek/

    I can't say that I have found these to be particularly helpful, though.


    Memory Allocation Tools


    Memory allocation problems are very common - there are some tools designed to help you catch such errors at run-time:

    efence, or Electric Fence, tries to trap any out-of-bounds references (see man efence)

    valgrind is a suite of tools for analyzing and profiling binaries (see man valgrind) - there is a user manual available at: file:///usr/share/doc/valgrind-3.6.0/html/manual.html

    I have seen valgrind used with good success, but not particularly in the HPC arena.
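
    To make the kind of error concrete, here is a small illustrative sketch (not taken from the course examples; the function name fill_squares is hypothetical). The code shown is the correct version, and the comment marks the off-by-one variant that these tools are meant to catch:

```c
#include <stdlib.h>

/* Allocate n ints and fill element i with i*i. The caller frees the result.
 * Returns NULL if n is 0 or the allocation fails. */
int *fill_squares(size_t n)
{
    if (n == 0)
        return NULL;

    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* Writing the bound as `i <= n` would store one element past the end
     * of the block: an out-of-bounds heap write that may go unnoticed in
     * a normal run but is the sort of access these tools report. */
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);

    return a;
}
```

    A program built with -g and run as valgrind ./a.out uses valgrind's default memcheck tool, which reports invalid heap reads and writes along with the source line when debugging symbols are present.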


    Strace


    strace is a powerful tool that will allow you to trace all system calls and signals made by a particular binary, whether or not you have source code. It can be attached to already running processes. It is a low-level tool: you can learn a lot from it, but it is often a tool of last resort for user applications in HPC due to the copious quantity of extraneous information it outputs.


    Strace Example


    As an example of using strace, let's peek in on a running MPI process (part of a 32 task job on U2):

        [c06n15:~]$ ps -u jonesm -Lf
        UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
        jonesm   23964 16284 23964 92    2 14:34 ?        00:04:11 /util/nwchem/nwchem5.0/bin/
        jonesm   23964 16284 23965 99    2 14:34 ?        00:04:30 /util/nwchem/nwchem5.0/bin/
        jonesm   23987 23986 23987  0    1 14:37 pts/0    00:00:00 bash
        jonesm   24128 23987 24128  0    1 14:39 pts/0    00:00:00 ps -u jonesm -Lf
        [c06n15:~]$ strace -p 23965
        Process 23965 attached - interrupt to quit
        :
        lseek(45, 691535872, SEEK_SET) = 691535872
        read(45, "\0\0\0\0\0\0\0\0\2\273\f[\250\207V\276\376K&]\331\230d"..., 524288) = 524288
        gettimeofday({1161107631, 126604}, {240, 1161107631}) = 0
        gettimeofday({1161107631, 128553}, {240, 1161107631}) = 0
        :
        :
        select(47, [3 4 6 7 8 9 42 43 44 46], [4], NULL, NULL) = 2 (in [4], out [4])
        write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2932) = 2932
        writev(4, [{"\0\0\0\0\0\0\0\17\0\0\0\37\0\0\0\0\0\0\0,\0\0\0\0\0\0\0"..., 32},
                   {"\1\0\0\0\0\0\0\0\37\0\0\0\17\0\0\0\37\0\0\0,\0\1\0000u"..., 44}], 2) = 76


    Part II

    Advanced (Parallel) Debugging


    Whither Goest the GUI?


    Using a GUI-based debugger gets considerably more difficult when dealing with debugging an MPI-based parallel code (not so much on the OpenMP side), due to the fact that you are now dealing with multiple processes scattered across different machines.

    The TotalView debugger is the premier product in this arena (it has both CLI and GUI support) - but it is very expensive, and not present in all environments. We will start out using our same toolbox as before, and see that we can accomplish much without spending a fortune. The methodologies will be equally applicable to the fancy commercial products.


    Process Checking


    First on the agenda - parallel processing involves multiple processes/threads (or both), and the first rule is to make sure that they are ending up where you think that they should be (needless to say, all too often they do not).

    Use MPI_Get_processor_name to report back on where processes are running

    Use ps to monitor processes as they run (useful flags: ps u -L), even on remote nodes (rsh/ssh into them)


    or you can script it (I called this script job_ps):

        #!/bin/sh
        #
        # Shell script to take a single argument (PBS job id) and launch a
        # ps command on each node
        #
        QST=`which qstat`
        if [ -z "$QST" ]; then
           echo "ERROR: no qstat in PATH: PATH="$PATH
           exit
        fi

        case $# in
        0) echo "single PBS_JOBID required."; exit ;;  # no args, exit
        1) jobid=$1 ;;
        *) echo "single PBS_JOBID required."; exit ;;  # too many args, exit
        esac
        # get node listing & trim out excess verbiage
        nlines=`$QST -an $jobid | wc -l`
        nlines=`echo "$nlines - 6" | bc`
        nodelist=`$QST -n $jobid | tail -$nlines | sed "s/\/[01]+/,/g" |
           sed "s/\/[01]//" | sed "s/ +//g" | sed "s/,/\ /g" |
           awk {for (i=1;i ...


        MYPS = ps -u jonesm -L -o pid,pcpu,time,comm
        NODE = c23n31, my CPU/thread Usage:
          PID %CPU     TIME COMMAND
        27130  0.0 00:00:00 bash
        27201  0.0 00:00:00 pbs_demux
        27235  0.0 00:00:00 bash
        27244  0.0 00:00:00 doqmtests.mpi
         1652  0.0 00:00:00 runtests.mpi.un
         1664  0.0 00:00:00 mpiexec
         1666  0.0 00:00:00 mpiexec
         1667 94.5 00:17:18 nwchemintel91
         1667 10.0 00:01:50 nwchemintel91
         1668 94.2 00:17:15 nwchemintel91
         3813  0.0 00:00:00 ps
        NODE = c23n30, my CPU/thread Usage:
          PID %CPU     TIME COMMAND
         1975 96.2 00:17:36 nwchemintel91
         1975  6.6 00:01:13 nwchemintel91
         1976 96.0 00:17:34 nwchemintel91
         4033  0.0 00:00:00 ps
        NODE = c23n29, my CPU/thread Usage:
          PID %CPU     TIME COMMAND
         2673 95.2 00:17:26 nwchemintel91
         2673  8.9 00:01:38 nwchemintel91
         2674 94.5 00:17:19 nwchemintel91
         4728  1.0 00:00:00 ps
        NODE = c23n28, my CPU/thread Usage:
          PID %CPU     TIME COMMAND
        19284 88.2 00:16:09 nwchemintel91
        19284 14.8 00:02:43 nwchemintel91
        19285 88.2 00:16:09 nwchemintel91
        21374  0.0 00:00:00 ps


    Using Serial Debuggers in Parallel?


    Yes, you can certainly run debuggers designed for use in sequential codes in parallel. They are even quite effective. You may just have to jump through a few extra hoops to do so ...


    Attaching GDB to Running Processes

    The simplest way to use a CLI-based debugger in parallel is to attach it to already running processes, namely:

    Find the parallel processes using the ps command (may have to rsh/ssh into remote nodes if that is where they are running)

    Invoke gdb on each process ID:

        [u2:~]$ ssh f10n32
        [u2:~]$ ps -u jonesm
          PID TTY          TIME CMD
          680 pts/0    00:00:00 bash
          682 pts/0    00:00:00 pbs_mom
          683 pts/0    00:00:00 pbs_demux
          784 ?        00:00:00 python
          789 pts/0    00:00:00 python
          790 pts/0    00:00:00 sh
          791 ?        00:00:00 python
          792 ?        00:00:14 pp.gdb
          797 ?        00:00:00 sshd
        [f10n32:~]$ gdb pp.gdb 792
        GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
        ....
        0x0000000000400dd5 in pp () at pp.f90:42
        42        do while (gdbWait /= 1)


    Of course, unless you put an explicit waiting point inside your code, the processes are probably happily running along when you attach to them, and you will likely want to exert some control over that.


    First, using our above example, I was running one MPI task on f10n32 and one on f10n24. After attaching gdb to each process, they paused:

        [f10n32:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 923
        GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
        ...
        (gdb) where
        #0  0x00000031ee0dc0e8 in poll () from /lib64/libc.so.6
        #1  0x00002b4015ef9a1b in MPID_nem_tcp_connpoll (in_blocking_poll=1) at ../../socksm.c:2526
        #2  0x00002b4015ef8e82 in MPID_nem_tcp_poll (in_blocking_poll=1) at ../../socksm.c:2324
        #3  0x00002b4015e51a56 in MPID_nem_network_poll (in_blocking_progress=1) at ...
        #4  0x00002b4015cff4ce in MPIDI_CH3I_Progress (progress_state=0x7fff3afe2b90, ...
        #5  0x00002b4015eab822 in PMPI_Recv (buf=0x601e60, count=1, datatype=1275070495, ...
            at ../../recv.c:156
        #6  0x00002b40161ab000 in pmpi_recv__ () from /util/intel/2011.0/impi/4.0.1.007/...
        #7  0x0000000000401288 in pp () at pp.f90:80
        #8  0x000000000040176a in main ()
        #9  0x00000031ee01ecdd in __libc_start_main () from /lib64/libc.so.6
        #10 0x0000000000400c49 in _start ()
        (gdb) c
        Continuing.


    and on f10n24:

        [u2:~]$ ssh f10n24
        [f10n24:~]$ ps -u jonesm
          PID TTY          TIME CMD
        22673 ?        00:00:00 python
        22684 ?        00:00:00 python
        22686 ?        00:02:54 pp.gdb
        22693 ?        00:00:00 sshd
        22694 pts/0    00:00:00 bash
        22733 pts/0    00:00:00 ps
        [f10n24:~]$ gdb pp.gdb 22686
        GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
        ...
        (gdb) where
        #0  0x000000309b2dc0e8 in poll () from /lib64/libc.so.6
        #1  0x00002b333112fa1b in MPID_nem_tcp_connpoll (in_blocking_poll=1) at ../../socksm.c:2526
        #2  0x00002b333112ee82 in MPID_nem_tcp_poll (in_blocking_poll=1) at ../../socksm.c:2324
        #3  0x00002b3331087a56 in MPID_nem_network_poll (in_blocking_progress=1) at ../../...
        #4  0x00002b3330f354ce in MPIDI_CH3I_Progress (progress_state=0x7fffd9dff6b0, ...
        #5  0x00002b33310e1822 in PMPI_Recv (buf=0x601e60, count=4, datatype=1275070495, ...
            at ../../recv.c:156
        #6  0x00002b33313e1000 in pmpi_recv__ () from /util/intel/2011.0/impi/4.0.1.007/...
        #7  0x00000000004013be in pp () at pp.f90:91
        #8  0x000000000040176a in main ()
        #9  0x000000309b21ecdd in __libc_start_main () from /lib64/libc.so.6
        #10 0x0000000000400c49 in _start ()
        (gdb) c
        Continuing.

    and we used the (c) continue command to let the execution pick up again where we (temporarily) interrupted it.


    Using a Waiting Point


    You can insert a waiting point into your code to ensure that execution waits until you get a chance to attach a debugger:

        integer :: gdbWait=0
        :
        :
        CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
        CALL MPI_COMM_SIZE(MPI_COMM_WORLD, Nprocs, ierr)
        ! dummy pause point for gdb insertion
        do while (gdbWait /= 1)
        end do
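
    The same idea carries over to C codes. What follows is a minimal sketch rather than the course's own code: the function name wait_for_debugger, the timeout parameter, and the one-second sleep (so the waiting task does not burn a full core) are all assumptions added for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Wait until a debugger sets *flag to 1, or until max_seconds have elapsed
 * (max_seconds == 0 means wait indefinitely, like the Fortran loop).
 * `volatile` keeps the compiler from caching the flag in a register, so a
 * value written by the debugger is seen on the next pass.
 * Returns 1 if released through the flag, 0 on timeout. */
int wait_for_debugger(volatile int *flag, unsigned max_seconds)
{
    unsigned waited = 0;

    /* Report the PID so it is easy to pick the right process to attach to. */
    fprintf(stderr, "PID %ld waiting for debugger\n", (long)getpid());

    while (*flag != 1) {
        if (max_seconds != 0 && waited >= max_seconds)
            return 0;
        sleep(1);
        waited++;
    }
    return 1;
}
```

    With a flag declared as, say, volatile int gdb_wait = 0; and a call to wait_for_debugger(&gdb_wait, 0) placed after MPI initialization, one would attach gdb to the reported PID, move up to the frame where the flag is visible, and issue set var gdb_wait = 1 followed by c.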


    and then you will find the code waiting at that point when you attach gdb, and you can release it at your leisure:

        [f10n32:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 1003
        GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
        ...
        0x0000000000400dd2 in pp () at pp.f90:42
        42        do while (gdbWait /= 1)
        (gdb) s gdbWait = 1
        44        if (Nprocs /= 2) then
        (gdb) c
        Continuing.

        [f10n24:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 22777
        GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
        0x0000000000400dd2 in pp () at pp.f90:42
        42        do while (gdbWait /= 1)
        (gdb) s gdbWait = 1
        44        if (Nprocs /= 2) then
        (gdb) c
        Continuing.


    Using GDB Within MPI Task Launcher

    Last, but not least, you can usually launch gdb through your MPI task launcher. For example, using the Intel MPI task launcher, mpiexec,

        [k07n14:~/d_hw/d_hw2/d_pp]$ mpiexec -gdb -np 2 ./pp.gdb
        0-1:  (gdb) list 30
        0-1:  25
        0-1:  26        integer :: gdbWait=0
        0-1:  27        integer myid, Nprocs, ierr, mpi_procname_length
        0-1:  28        integer :: status(MPI_STATUS_SIZE)
        0-1:  29        character(len=MPI_MAX_PROCESSOR_NAME) :: mpi_procname
        0-1:  30
        0-1:  31        !
        0-1:  32        ! Initialize communicator, check that we are using only 2p
        0-1:  33        !
        0-1:  34        CALL MPI_INIT(ierr)
        0-1:  (gdb) b 34
        0-1:  Breakpoint 2 at 0x400d1f: file pp.f90, line 34.
        0-1:  (gdb) run
        0-1:  Continuing.
        0-1:
        0-1:  Breakpoint 2, pp () at pp.f90:34
        0-1:  34        CALL MPI_INIT(ierr)
        0-1:  (gdb) 0-1:  (gdb) c
        0-1:  Continuing.
        0:  Hello from proc 0 of 2 k07n14.ccr.buffalo.edu
        1:  Hello from proc 1 of 2 k07n14.ccr.buffalo.edu
        0:  Number Averaged for Sigmas: 2


    or, in batch mode:

        [u2:~]$ qsub -q debug -lnodes=2:GM:ppn=2,walltime=00:15:00 -I
        qsub: waiting for job 993514.d15n41.ccr.buffalo.edu to start
        qsub: job 993514.d15n41.ccr.buffalo.edu ready

        Job 993514.d15n41.ccr.buffalo.edu has requested 2 cores/processors per node.
        PBSTMPDIR is /scratch/993514.d15n41.ccr.buffalo.edu
        [f10n32:~]$ module load intel-mpi
        [f10n32:~]$ NNODES=`cat $PBS_NODEFILE | uniq | wc -l`
        [f10n32:~]$ mpdboot -n $NNODES -f $PBS_NODEFILE
        [f10n32:~]$ mpicc -g -o cpi.impi cpi.c
        [f10n32:~]$ mpiexec -gdb -np 4 ./cpi.impi
        0-3:  (gdb) list 16,20
        0-3:  16        double startwtime = 0.0, endwtime;
        0-3:  17        int namelen;
        0-3:  18        char processor_name[MPI_MAX_PROCESSOR_NAME];
        0-3:  19
        0-3:  20        MPI_Init(&argc,&argv);
        0-3:  (gdb) b 20
        0-3:  Breakpoint 2 at 0x4009a0: file cpi.c, line 20.
        0-3:  (gdb) run
        0-3:  Continuing.
        0-3:
        0-3:  Breakpoint 2, main (argc=1, argv=0x7fffffffd9b8) at cpi.c:20
        0-3:  20        MPI_Init(&argc,&argv);
        0-3:  (gdb) 0-3:  (gdb) c
        0-3:  Continuing.
        0:  Process 0 on f10n32.ccr.buffalo.edu
        1:  Process 1 on f10n32.ccr.buffalo.edu
        2:  Process 2 on f10n27.ccr.buffalo.edu
        3:  Process 3 on f10n27.ccr.buffalo.edu


    Using Serial Debuggers in Parallel


    So you can certainly use serial debuggers in parallel - in fact it is a pretty handy thing to do. Just keep in mind:

    Don't forget to compile with debugging turned on

    You can always attach to a running code (and you can instrument the code with that purpose in mind)

    Beware that not all task launchers are equally friendly towards built-in support for serial debuggers


    The TotalView Debugger


    The premier parallel debugger, TotalView:

    Sophisticated commercial product (think many $$ ...)

    Designed especially for HPC, multi-process, multi-thread

    Has both GUI and CLI

    Supports C/C++, Fortran 77/90/95, mixtures thereof

    The official debugger of DOE's Advanced Simulation and Computing (ASC) program


    Using TotalView at CCR


    Pretty simple to start using TotalView on the CCR systems:

    1 Generally you want to load the latest version:

        [d16n03:~]$ module avail totalview

    2 Make sure that your X DISPLAY environment is working if you are going to use the GUI.

    3 The current CCR license supports 2 concurrent users up to 8 processors (precludes usage on nodes with more than 8 cores until/unless this license is upgraded).


    The DDT Debugger


    Allinea's commercial parallel debugger, DDT:

    Sophisticated commercial product (think many $$ ...)

    Designed especially for HPC, multi-process, multi-thread

    Has both GUI and CLI

    Supports C/C++, Fortran 77/90/95, mixtures thereof

    CCR has a 32-token license for DDT (including CUDA support)

    To find the latest installed version, module avail ddt


    Current Recommendations


    CCR has licenses for Allinea's DDT and TotalView (although the current TotalView license is very small and outdated and will be either upgraded or dropped in favor of DDT). Both are quite expensive, but stay tuned for further developments. Note that the open-source eclipse project also has a parallel tools platform that can be used in combination with C/C++ and Fortran:

    http://www.eclipse.org/ptp
