Advanced Memory Checking Frameworks for MPI Parallel Applications in Open MPI
Shiqing Fan
26 September 2011
5th Parallel Tools Workshop 2011
Outline

- Introduction: Open MPI, Valgrind, Intel PIN
- Design and Implementation: Check Phase Definitions, The Tools Implementation, Integration in Open MPI
- Memory Checking Examples: Detectable Bug Classes
- Performance
- Conclusions
Open MPI

Open MPI is:
- a new MPI implementation, written from scratch
- free of the cruft of previous implementations
- in development since early 2004
- released under an open-source, BSD-based license

Project goals:
- a full, fast, and extensible MPI-2 implementation
- thread safety
- preventing the forking problem
- combining the best ideas and techniques
- support for multiple platforms, including Windows
Open MPI

The project brings together:
- 14 members, 13 contributors, 2 partners
- 4 US labs, 8 universities
- 7 vendors
- 1 individual
Project Structure

Open MPI consists of three sub-packages, layered above the operating system:
- OPAL (Open Portable Access Layer)
- ORTE (Open Runtime Environment)
- Open MPI (the MPI layer itself, exposing the MPI API to the user application)

Modular Component Architecture (MCA):
- dynamically loads the available modules and checks for hardware
- selects the best plug-in and unloads the others (e.g. if the hardware is not available)
- makes fast indirect calls into each plug-in

[Figure: the MCA layering; a framework (e.g. the BTL framework) hosts components such as SM, WV, TCP, and IB beneath the MPI API and the user application.]
Valgrind

Valgrind:
- is an open-source debugging and profiling tool
- runs on x86/Linux, AMD64/Linux, PPC32/Linux, and PPC64/Linux
- works with any dynamically or statically linked application

Memcheck, a heavyweight memory checker:
- runs the program on a synthetic CPU, identical to a real one, but storing extra information about memory:
  - valid-value bits (V-bits) for each bit: does it hold a defined value?
  - addressability bits (A-bits) for each byte: may that location be read or written?
- checks all reads and writes of memory
- intercepts calls to malloc/new/free/delete
Valgrind - MPI Example

Open MPI readily supports running applications under Valgrind:

    > mpirun -np 2 valgrind ./app

    ...
    array = malloc(SIZE * sizeof(int));
    if (rank == 0)
        MPI_Recv(array, SIZE+1, MPI_INT, 1, 4711, MPI_COMM_WORLD, &status);
    else
        MPI_Send(array, SIZE+1, MPI_INT, 0, 4711, MPI_COMM_WORLD);
    printf("rank %d: array[0] = %d", rank, array[0]);
    ...
Valgrind - MPI Example

    ==10420== Invalid read of size 4
    ==10420==    at 0x4026D96: memcpy (mc_replace_strmem.c:635)
    ==10420==    by 0x41DF637: opal_convertor_pack (opal_convertor.c:251)
    ==10420==    by 0x51B2DD9: mca_btl_sm_sendi (btl_sm.c:854)
    ==10420==    by 0x5195986: mca_bml_base_sendi (bml.h:305)
    ...
    ==10420==    by 0x40C2AAB: PMPI_Send (psend.c:75)
    ==10420==    by 0x8048956: main (example.c:24)
    ==10420==  Address 0x53d37ba is 2 bytes after a block of size 40 alloc'd
    ==10420==    at 0x4025230: malloc (vg_replace_malloc.c:236)
    ==10420==    by 0x80488D4: main (example.c:18)
    ...
    ==10618== Conditional jump or move depends on uninitialised value(s)
    ==10618==    at 0x4346973: vfprintf (vfprintf.c:1613)
    ==10618==    by 0x435015F: printf (printf.c:35)
    ==10618==    by 0x80487F7: main (example.c:28)

But Valgrind cannot find:
- the one pending Recv when the example runs with 1 process (Marmot can)
- the unmatched Sends when it runs with more than 2 processes (Marmot can)
Intel PIN

The Intel Pin framework:
- is a dynamic binary instrumentation framework
- is used for creating dynamic program analysis tools ("Pintools")

A Pintool:
- is built on top of the Intel Pin framework
- can insert code into a program to collect run-time information
- performs program analysis or architectural studies on user applications on Linux and Windows
- covers use cases such as performance profiling, error detection, and processor and cache simulation
Pre-Communication Checks

- Detect non-blocking and one-sided communication buffer errors
- Make sure the buffers are accessed correctly during the communication:
  - send or receive buffer access violations
  - usage of undefined data for communication
- Set memory accessibility independently of MPI operations, i.e. set accessibility only for the fragment currently being sent or received
Pre-Communication Checks

- Use two bits of shadow memory for each byte of application memory
- The send buffer must not be written while the operation is in flight
- The receive buffer must not be read or written while the operation is in flight

[Figure: timelines of two processes. Between MPI_Isend and MPI_Wait the send buffer is readable but not writable; between MPI_Irecv and MPI_Wait the receive buffer is not accessible at all. The buffers are checked as fragments are sent and received inside the MPI library.]
Post-Communication Checks

Post-communication checks:
- detect memory usage problems after the communication
- make sure the buffer is used correctly:
  - the received buffer should not be written before it is read (WBR)
  - there may be received data that is never used at all
- every memory access is checked
- specific memory accesses trigger the registered callback
- the callback function processes and checks the memory access information
Post-Communication Checks

- Use two bits of shadow memory for each byte of application memory
- The receive buffer should not show a write-before-read (WBR) error
- The receive buffer should be read at least once

[Figure: timelines of two processes. After MPI_Wait completes the receive, memory operations on the buffer are watched until MPI_Finalize, and the buffer usage is checked.]
Valgrind Extension

Valgrind is already capable of:
- memory definedness checks
- changing memory status, e.g. marking memory as noaccess

Additional features implemented as an extension:
- an engine for attaching callback functions to memory operations of the user application
- making memory readable and/or writable
- registering/unregistering memory regions with callback functions
- invoking callback functions on particular memory reads/writes
MemPin

MemPin = memory debugging with Pin

Main features:
- targets user applications mainly on Windows
- an engine for attaching callback functions to memory operations of the user application
- making memory readable and/or writable
- registering/unregistering memory regions with callback functions
- invoking callback functions on particular memory reads/writes
- call-stack information for the user application
- log and debug output to stdout or to files
Integration in Open MPI

Memchecker: a new concept that uses debugging tools' APIs internally in Open MPI to reveal bugs
- in the application
- in Open MPI itself

A generic memchecker interface is implemented as an MCA framework:
- implemented in the OPAL layer
- enabled with the configure option --enable-memchecker

Then simply run, e.g.:

    mpirun -np 2 [valgrind | MemPin] ./my_mpi

[Figure: memchecker components (Valgrind, MemPin, ...) plugged into the OPAL layer, beneath ORTE and Open MPI, above the operating system.]
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
d
/Memcheck
Open MPI
Memchecker
MPI Routines
Callback Impl
del callback
r/w counts
buffer usage
WBR check
reg callback
MemPin / Memcheck
PIN / Valgrind
Callback
Search Engine
Reg
Mem Reg
Engine
Memory Management
Storage
Memory
Application CodeTranslation Engine
insert delete
18/29 :: Advanced Memory Checking Frameworks for MPI Parallel Applications in Open MPI :: 26. September 2011 ::
The Runtime
Detectable Bug Classes

- Non-blocking communication buffer modified or accessed before the operation has finished:

    MPI_Isend(buffer, SIZE, MPI_INT, ..., &request);
    buffer[1] = 4711;            /* write to the send buffer in flight */
    result[0] = buffer[0];       /* reads are restricted as well       */
    MPI_Wait(&request, &status);

Side note: MPI-1, p. 30, rationale for the restrictive access rules; they allow better performance on some systems.
Detectable Bug Classes

- Non-blocking receive buffer accessed before the operation has finished:

    MPI_Irecv(buffer, SIZE, MPI_CHAR, ..., &request);
    result[1] = buffer[1];       /* read before MPI_Wait */
    MPI_Wait(&request, &status);

Side note: CRC-based methods do not reliably catch these cases.

- Memory outside the receive buffer is overwritten:

    buffer = malloc(SIZE * sizeof(char));
    memset(buffer, 0, SIZE * sizeof(char));
    MPI_Recv(buffer, SIZE+1, MPI_CHAR, ..., &status);  /* one element too many */

Side note: MPI-1, p. 21, rationale on overflow situations: no memory outside the receive buffer will ever be overwritten.
Detectable Bug Classes

- Write before read on the received buffer:

    MPI_Recv(buffer, 2, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
    buffer[0] = sizeof(long);    /* overwritten before ever being read */
    result = buffer[1] * 1000;

- Write to the origin buffer before the accumulate has finished:

    MPI_Accumulate(A, NROWS*NCOLS, MPI_INT, 1, 0, 1, xpose, MPI_SUM, win);
    A[0][1] = 4711;              /* A is still in use by the operation */
    MPI_Win_fence(0, win);
Test Environment

Benchmark:
- Intel MPI Benchmark

Environment:
- Linux:
  - Dgrid cluster at HLRS
  - dual-processor Intel Woodcrest nodes
  - InfiniBand DDR network with the OpenFabrics stack
- Windows:
  - VISCLUSTER at HLRS
  - Windows HPC 2008 R2 with TCP connections
  - AMD Opteron 250 processors and 4.3 GB memory

Test cases:
- running with plain Open MPI
- running with the memchecker component enabled
Performance on Linux

- Two nodes of the Dgrid cluster over InfiniBand

[Figure: latency, time in usec (y-axis 3.5 to 7.5) versus message length (0 to 250 bytes), compiled with and without instrumentation.]
Performance on Windows

- Two nodes of VISCLUSTER over TCP

[Figure: latency, time in usec (y-axis 40 to 46) versus message length (0 to 250 bytes), compiled with and without instrumentation.]
Conclusions

- A Valgrind extension and a new MemPin tool have been implemented
- Memory checks cover:
  - send and receive buffer violations during MPI communication
  - WBR errors after MPI communication
  - data that is transferred but never used
- The performance impact is minimal
- More memory checks may be implemented for other scenarios, for example CUDA memory
Thank you for your attention!