MPI - The Message . . .
Overview
Process model
Point-to-point . . .
Collective communication
Putting it all together
Some hints . . .
Page 1 of 37
Praktikum Wissenschaftliches Rechnen (Performance-optimized Programming)
Programming with MPI
December 7th, 2005
Dipl.-Inf. Ralf-Peter [email protected]
1. MPI - The Message Passing Interface
These slides are part of an MPI course developed by the EPCC Training and Education Centre, Edinburgh Parallel Computing Centre, University of Edinburgh.
Further information:
• Peter S. Pacheco: Parallel Programming with MPI, Morgan Kaufmann Publishers, 1997.
• Marc Snir, William Gropp et al.: MPI: The Complete Reference (2-volume set), The MIT Press, 1998.
• http://www-unix.mcs.anl.gov/mpi/
• http://www.hlrs.de/mpi/
• man pages ;-)
2. Overview
2.1. Paradigm of MPI programming
• sequential programming:
  ⇒ one process, one memory
• message passing programming:
  ⇒ n processes, n memories, communication by exchanging messages
2.2. Identification of single processes
Each process has a unique identifier called rank ∈ [0, size − 1].
The total number of processes, called size, is specified at the program's start!
The master process always has rank 0.
All communication is based on the ranks of the participating processes.
2.3. SPMD concept
SINGLE PROGRAM MULTIPLE DATA means executing n processes of an MPI program such that all processes perform the same task, but with different data. Thus, all variables have the same identifiers, but they are stored in different places (distributed memory) and can have different values.

    int main(int argc, char **argv)
    {
        if (mainprocess) {
            master_process(/* arguments */);
        } else {
            slave_process(/* arguments */);
        }
    }

This is a constraint on the general message passing model!
An example: parallel installation of sockets
• MPI processes = work of one electrician for each floor
• data = installation of sockets
• MPI communication = real communication between electricians to ensure that the holes for the wires in each floor are drilled at the same places
2.4. MPMD concept
MULTIPLE PROGRAM MULTIPLE DATA means executing n processes of an MPI program such that the processes can perform different tasks, e.g.:

    int main(int argc, char **argv)
    {
        if (mainprocess) {
            do_this(/* arguments */);
        } else if (slaveprocess1) {
            do_that(/* arguments */);
        } else if ( · · · ) {
            ...
        }
    }

This enables parallel processing of different problems at the same time!
Previous example: installation of sockets and painting walls
• MPI processes = work of one electrician or one painter for each floor
• data = installation of sockets or painting of walls
• MPI communication = real communication between workers to ensure that in one room the sockets are not installed at the same time the walls are painted
2.5. Messages
Messages are packets sent between single processes.
Necessary information is:
• ranks of the sender and the receiver (source and destination)
• type of data
• size of data
2.6. Communication
MPI offers different possibilities of communication:
• point-to-point communication
  – synchronous send/receive (e.g. fax)
  – asynchronous send/receive (e.g. mail box)
  – blocking communication
  – non-blocking communication
• collective communication
  – barriers
  – broadcast
  – scatter and gather
  – reduce (e.g. global sum, global minimum/maximum)
3. Process model
3.1. Compiling and executing MPI programs
The compilation of an MPI program is done as follows:

> mpicc my_program.c

To execute an MPI program, type:

> mpirun -np <amount of processes> my_program

The LAM multicomputer can be started with the command lamboot; an optional list of single computers can also be given, e.g.:

> lamboot machines

with the file machines containing:

computer01
computer04
computer05
3.2. Headers, initialisation and finalisation
To use MPI functions, the MPI header has to be included in the source code:

#include <mpi.h>

The initialisation of MPI is done by the following call:

MPI_Init(&argc, &argv)

The finalisation of MPI is done by the following call:

MPI_Finalize(void)

Initialisation and finalisation are mandatory!
3.3. MPI communicator and handles
All processes of an MPI program are combined in the communicator called MPI_COMM_WORLD, a so-called handle.
Handles are either predefined constants or values returned by MPI functions, and they are stored in special MPI datatypes.
3.4. Determination of rank and size
To determine the process ID (rank) and the amount of all participating processes (size), respectively, the following calls should be used:

MPI_Comm_rank(MPI_Comm comm, int *rank)   and
MPI_Comm_size(MPI_Comm comm, int *size)
3.5. A simple example: Hello world!
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            printf("%d processes running\n", size);
        } else {
            printf("Slave %d: Hello World!\n", rank);
        }

        MPI_Finalize();

        return 0;
    }
3.6. MPI datatypes
Datatypes for the usage in MPI programs:

MPI datatype            C equivalent
MPI_CHAR                signed char
MPI_SHORT               signed short int
MPI_INT                 signed int
MPI_LONG                signed long int
MPI_UNSIGNED_CHAR       unsigned char
MPI_UNSIGNED_SHORT      unsigned short int
MPI_UNSIGNED            unsigned int
MPI_UNSIGNED_LONG       unsigned long int
MPI_FLOAT               float
MPI_DOUBLE              double
MPI_LONG_DOUBLE         long double
MPI_BYTE                (no C equivalent)
MPI_PACKED              (no C equivalent)
4. Point-to-point communication
4.1. Blocking communication
Neither the sender nor the receiver is able to continue the program during the message passing stage.

4.1.1. Sending and receiving of messages
The generic sending and receiving calls are as follows:

MPI_Send(void *buf, int count, MPI_Datatype datatype,
         int dest, int tag, MPI_Comm comm)   and
MPI_Recv(void *buf, int count, MPI_Datatype datatype,
         int source, int tag, MPI_Comm comm, MPI_Status *status)
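A minimal sketch of these two calls, assuming exactly two processes (the value 42 and the tag 0 are arbitrary choices for illustration): rank 0 sends one integer to rank 1.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* send one MPI_INT with tag 0 to process 1 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* block until the matching message has arrived */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Run with mpirun -np 2; note that the receive specifies source and tag, which must match the send.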
4.1.2. Different ways to send messages
MPI offers different ways to send a message:

• synchronous send MPI_SSEND
  completes only after the receiver has started to receive the message
• buffered (asynchronous) send MPI_BSEND
  completes after the message has been copied to the send buffer, independent of the receiver
• standard send MPI_SEND
  either a buffered or a synchronous send, depending on the message size
• ready send MPI_RSEND
  the matching receive has to be posted before the send; very dangerous!

Reception (MPI_RECV) completes after the arrival of a message; the call is the same for all ways of sending.
Comparison of the different ways of sending:

• standard send (MPI_SEND)
  – minimum time of transmission
  – risk of blocking due to synchronous send
• synchronous send (MPI_SSEND)
  – risk of a deadlock
  – risk of serialisation
  – risk of waiting → idle time
  – high latency, good bandwidth
• buffered (asynchronous) send (MPI_BSEND)
  – low latency, bad bandwidth
• ready send (MPI_RSEND)
  – better don't use
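A buffered send only completes locally, but the user must supply the buffer beforehand. The following sketch is illustrative (the helper function and its arguments are assumptions, not part of the course material); it attaches a sufficiently large buffer, sends, and detaches again:

```c
#include <stdlib.h>
#include <mpi.h>

/* hypothetical helper: send one int to dest via buffered send */
void buffered_send_example(int dest, int value)
{
    int size;
    char *buf;

    /* the attached buffer must hold the message plus MPI's overhead */
    MPI_Pack_size(1, MPI_INT, MPI_COMM_WORLD, &size);
    size += MPI_BSEND_OVERHEAD;
    buf = malloc(size);

    MPI_Buffer_attach(buf, size);
    MPI_Bsend(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);

    /* detach blocks until all buffered messages have left the buffer */
    MPI_Buffer_detach(&buf, &size);
    free(buf);
}
```

Without MPI_Buffer_attach, MPI_Bsend may fail for lack of buffer space; this is the price of its receiver-independent completion.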
4.1.3. Remarks
Wildcards can be used for receiving messages:

• reception from an arbitrary sender ⇒ MPI_ANY_SOURCE
• reception with an arbitrary tag ⇒ MPI_ANY_TAG

The actual sender and the actual tag, respectively, can be queried via status or simply ignored (MPI_STATUS_IGNORE).

Basic law: messages cannot overtake each other!
4.1.4. Example: each process sends a message to its neighbour
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        int buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            MPI_Ssend(&rank, 1, MPI_INT, rank+1, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&buf, 1, MPI_INT, rank-1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Slave %d: Got message from process %d\n", rank, buf);
            if (rank != size-1) {
                MPI_Ssend(&rank, 1, MPI_INT, rank+1, 0, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();

        return 0;
    }
4.2. Non-blocking communication
Sender and receiver are able to continue with the program during the message passing stage.

4.2.1. Process of non-blocking communication
Non-blocking communication can be divided into three parts:

• initialisation of the message passing stage
• continuation with different tasks (e.g. more initialisations)
• checking for termination of the message passing stage

To obtain status information about the termination of a message passing stage, so-called request handles (MPI_Request) are necessary.
4.2.2. Initialisation
The generic sending and receiving calls are as follows:

MPI_Isend(void *buf, int count, MPI_Datatype datatype,
          int dest, int tag, MPI_Comm comm, MPI_Request *request)
and
MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
          int source, int tag, MPI_Comm comm, MPI_Request *request)

The ways of sending already known from blocking communication also exist analogously for non-blocking communication:

• synchronous send MPI_ISSEND
  the message passing stage completes only after the receiver has started to receive the message
• buffered (asynchronous) send MPI_IBSEND
  completes after the message has been copied to the send buffer, independent of the receiver
• standard send MPI_ISEND
  either a buffered or a synchronous send, depending on the message size
• ready send MPI_IRSEND
  the matching receive has to be posted before the send; very dangerous!

Caution: both alternatives, blocking and non-blocking, can be combined.
4.2.3. Checking for termination
To obtain status information about the message passing stage, MPI offers two different enquiry methods with various combination techniques, MPI_Wait and MPI_Test:

MPI_Wait(MPI_Request *request, MPI_Status *status)   and
MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)

These can be combined as follows:

enquiry                                        Wait           Test
exactly once                                   MPI_Wait       MPI_Test
at least once, return of one process ID        MPI_Waitany    MPI_Testany
at least once, return of all finished IDs      MPI_Waitsome   MPI_Testsome
all                                            MPI_Waitall    MPI_Testall

An MPI_ISEND or MPI_IRECV followed by an immediate MPI_WAIT corresponds to blocking communication!
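As a sketch of how MPI_Test allows overlapping communication with computation (do_work is a hypothetical placeholder for useful local computation, not from the course material):

```c
#include <mpi.h>

void do_work(void);   /* hypothetical: some useful local computation */

/* post a non-blocking receive, then compute until the message arrives */
void overlap_example(int *buf, int source)
{
    MPI_Request request;
    int flag = 0;

    MPI_Irecv(buf, 1, MPI_INT, source, 0, MPI_COMM_WORLD, &request);

    while (!flag) {
        do_work();                             /* computation meanwhile */
        MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
    }
    /* here the message has arrived and buf may safely be read */
}
```

Until MPI_Test sets flag (or MPI_Wait returns), buf must not be touched; this is the central rule of non-blocking communication.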
4.2.4. Example: communication in a ring
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        int buf;
        MPI_Request send, recv;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Irecv(&buf, 1, MPI_INT, (rank-1+size)%size, 0,
                  MPI_COMM_WORLD, &recv);
        MPI_Issend(&rank, 1, MPI_INT, (rank+1)%size, 0,
                   MPI_COMM_WORLD, &send);

        MPI_Wait(&recv, MPI_STATUS_IGNORE);
        MPI_Wait(&send, MPI_STATUS_IGNORE);
        printf("Process %d: Got message from process %d\n", rank, buf);

        MPI_Finalize();

        return 0;
    }
5. Collective communication
5.1. Definition
Communication between a group of processes is called collective communication. Characteristics are:

• all processes communicate
• collective communication is always blocking communication
• no tags are allowed
• all reception buffers have to be of the same size
5.2. Barrier synchronisation and broadcast
Barrier synchronisation is normally used only for debugging. Synchronisation is achieved by forcing all processes to wait:

MPI_Barrier(MPI_Comm comm)

A broadcast sends a message from the root process to all other processes:

MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root,
          MPI_Comm comm)
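A minimal broadcast sketch (the value 37 is arbitrary): after the call, every process holds the root's value.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 37;            /* only the root knows the value so far */

    /* all processes call MPI_Bcast; afterwards all hold the root's value */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d: value = %d\n", rank, value);

    MPI_Finalize();
    return 0;
}
```

Note that every process calls MPI_Bcast with the same root argument; there is no separate receive call.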
5.3. Scatter and gather
The scatter command distributes information from one process to all processes:

MPI_Scatter(void *sbuf, int scount, MPI_Datatype sdatatype,
            void *rbuf, int rcount, MPI_Datatype rdatatype,
            int root, MPI_Comm comm)
The gather command collects information from all processes in one single process:

MPI_Gather(void *sbuf, int scount, MPI_Datatype sdatatype,
           void *rbuf, int rcount, MPI_Datatype rdatatype,
           int root, MPI_Comm comm)
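A sketch combining both calls (the squared values are arbitrary test data): the root scatters one integer to each process, each process modifies its piece locally, and the root gathers the results back.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, part, *data = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {              /* root prepares one value per process */
        data = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++)
            data[i] = i * i;
    }

    /* each process receives one element of data in part */
    MPI_Scatter(data, 1, MPI_INT, &part, 1, MPI_INT, 0, MPI_COMM_WORLD);

    part += rank;                 /* some local work on the piece */

    /* collect the modified pieces back on the root */
    MPI_Gather(&part, 1, MPI_INT, data, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("%d ", data[i]);
        printf("\n");
        free(data);
    }

    MPI_Finalize();
    return 0;
}
```

The send buffer argument (data) is only significant on the root; all other processes may pass NULL.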
5.4. Reduce
The reduce command combines information from all processes in one single process by applying an operator:

MPI_Reduce(void *sbuf, void *rbuf, int count, MPI_Datatype datatype,
           MPI_Op op, int root, MPI_Comm comm)
Example: calculating the sum
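A minimal sketch of this sum example: each process contributes its own rank, and the root receives the total.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* combine all local values with MPI_SUM; the result lands on rank 0 */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of all ranks: %d\n", sum);

    MPI_Finalize();
    return 0;
}
```

With size processes this prints 0 + 1 + ... + (size − 1); the receive buffer sum is only significant on the root.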
Predefined reduce operators:

operator      result
MPI_MAX       find maximum
MPI_MIN       find minimum
MPI_SUM       calculate sum
MPI_PROD      calculate product
MPI_LAND      logical AND
MPI_BAND      bitwise AND
MPI_LOR       logical OR
MPI_BOR       bitwise OR
MPI_LXOR      logical XOR
MPI_BXOR      bitwise XOR
MPI_MAXLOC    find maximum and its position
MPI_MINLOC    find minimum and its position
MPI also allows the definition of user-defined operators via

MPI_Op_create(MPI_User_function *func, int commute, MPI_Op *op)

formed upon the pattern vectorA ⊗ vectorB.
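A sketch of a user-defined operator; the element-wise absolute maximum shown here is a made-up example, not from the course material:

```c
#include <mpi.h>

/* hypothetical operator: element-wise maximum of absolute values */
static void absmax(void *in, void *inout, int *len, MPI_Datatype *type)
{
    int *a = (int *)in, *b = (int *)inout;
    (void)type;                       /* datatype not needed here */
    for (int i = 0; i < *len; i++) {
        int x = a[i] < 0 ? -a[i] : a[i];
        int y = b[i] < 0 ? -b[i] : b[i];
        b[i] = x > y ? x : y;         /* result replaces inout */
    }
}

/* reduce one int per process with the custom operator onto rank 0 */
void reduce_with_absmax(int local, int *result)
{
    MPI_Op op;

    MPI_Op_create(absmax, 1, &op);    /* 1: operator is commutative */
    MPI_Reduce(&local, result, 1, MPI_INT, op, 0, MPI_COMM_WORLD);
    MPI_Op_free(&op);
}
```

The function must follow the MPI_User_function signature and write its result into the inout vector, matching the vectorA ⊗ vectorB pattern above.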
Versions of the reduce operation:

• MPI_ALLREDUCE
  the result of the operation is provided to all processes
• MPI_REDUCE_SCATTER
  the resulting vector is distributed across all processes
• MPI_SCAN
  the result for process i is the result of the reduce operation over processes j ∈ [0, i]
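A minimal MPI_SCAN sketch: each process contributes its rank and receives the prefix sum over the ranks 0..i.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, prefix;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* process i receives the sum of the ranks 0..i */
    MPI_Scan(&rank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("Process %d: prefix sum = %d\n", rank, prefix);

    MPI_Finalize();
    return 0;
}
```

Unlike MPI_Reduce, there is no root argument: every process obtains its own partial result.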
6. Putting it all together
Finding prime numbers with Eratosthenes' sieve

initialise MPI;
get rank and size values;
max ⇐ amount of numbers for search;
divide max into size − 1 parts p_i;
if ( master process ) then
    receive local minimum min_i from all slave processes;
    determine global minimum min within all min_i;
    send global minimum min to all slave processes;
else /* slave process */
    find local minimum min_i in p_i;
    send local minimum min_i to master process;
    receive global minimum min from master process;
    delete all multiples of min from p_i;
end if
finalise MPI;
7. Some hints concerning secure shell
MPI communication is normally based on rsh or ssh. To avoid having to enter a password for every ssh connection at a program's start, a local key can be generated. The following steps are necessary:

1. > cd ~/.ssh

2. > ssh-keygen -t rsa
   Generates a private (id_rsa) and a public (id_rsa.pub) key.

3. > cat id_rsa.pub >> authorized_keys
   Appends the public key to the key file.