MPI Introduction


1. MPI
   Rohit Banga, Prakher Anand, K Swagat, Manoj Gupta
   Advanced Computer Architecture, Spring 2010

2. ORGANIZATION
   - Basics of MPI
   - Point-to-point communication
   - Collective communication
   - Demo

3. GOALS
   - Explain the basics of MPI
   - Start coding today!
   - Keep It Short and Simple

4. MESSAGE PASSING INTERFACE
   - A message-passing library specification
     - An extended message-passing model
     - Not a language or compiler specification
     - Not a specific implementation; several implementations exist (as with pthreads)
   - The standard for distributed-memory, message-passing, parallel computing
   - Distributed memory: a shared-nothing approach!
   - Runs over some interconnection technology: TCP, InfiniBand (on our cluster)

5. GOALS OF THE MPI SPECIFICATION
   - Provide source-code portability
   - Allow efficient implementations
   - Flexible enough to port different algorithms to different hardware environments
   - Support heterogeneous architectures (processors need not be identical)

6. REASONS FOR USING MPI
   - Standardization: supported on virtually all HPC platforms
   - Portability: the same code runs on another platform
   - Performance: vendor implementations should exploit native hardware features
   - Functionality: 115 routines
   - Availability: a variety of implementations are available

7. BASIC MODEL
   - Communicators and groups
   - Group
     - An ordered set of processes
     - Each process is associated with a unique integer rank
     - Ranks run from 0 to (N-1) for N processes
     - An object in system memory, accessed by a handle
     - Predefined groups: MPI_GROUP_EMPTY, MPI_GROUP_NULL

8. BASIC MODEL (CONTD.)
   - Communicator
     - A group of processes that may communicate with each other
     - Every MPI message must specify a communicator
     - An object in memory, accessed through a handle
   - There is a default communicator, defined automatically: MPI_COMM_WORLD, which identifies the group of all processes

9. COMMUNICATORS
   - Intra-communicator: all processes from the same group
   - Inter-communicator: processes drawn from different groups

10. COMMUNICATOR AND GROUPS
   - For a programmer, group and communicator are one
   - Allow you to organize tasks, based upon function, into task groups
   - Enable collective communications (later): operations across a subset of related tasks
   - Provide safe communication
   - Many communicators can exist at the same time
   - Dynamic: they can be created and destroyed at run time
   - A process may be in more than one group/communicator, with a unique rank in every group/communicator
   - Used for implementing user-defined virtual topologies

11. VIRTUAL TOPOLOGIES
   - coord (0,0): rank 0
   - coord (0,1): rank 1
   - coord (1,0): rank 2
   - coord (1,1): rank 3
   - Attach Cartesian or graph topology information to an existing communicator
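
The 2x2 grid on slide 11 can be built with MPI's Cartesian-topology routines. The following is a minimal sketch, not taken from the slides; run it with four processes and each rank reports the coordinates listed above.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int dims[2]    = {2, 2};   /* 2x2 grid, as on the slide */
        int periods[2] = {0, 0};   /* no wrap-around in either dimension */
        MPI_Comm grid;             /* new communicator with the topology attached */

        /* Attach a Cartesian topology to MPI_COMM_WORLD (reordering allowed). */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);

        if (grid != MPI_COMM_NULL) {   /* only four processes join the grid */
            int rank, coords[2];
            MPI_Comm_rank(grid, &rank);
            MPI_Cart_coords(grid, rank, 2, coords);
            printf("coord (%d,%d): rank %d\n", coords[0], coords[1], rank);
            MPI_Comm_free(&grid);
        }

        MPI_Finalize();
        return 0;
    }
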
12. SEMANTICS
   - Header file
     - #include <mpi.h> (C)
     - include 'mpif.h' (Fortran)
     - Bindings also exist for Java, Python, etc.
   - Format: rc = MPI_Xxxxx(parameter, ...)
   - Example: rc = MPI_Bsend(&buf, count, type, dest, tag, comm)
   - Error code: returned as rc; MPI_SUCCESS if successful

13. MPI PROGRAM STRUCTURE

14. MPI FUNCTIONS - MINIMAL SUBSET
   - MPI_Init: initialize MPI
   - MPI_Comm_size: size of the group associated with a communicator
   - MPI_Comm_rank: identify the calling process
   - MPI_Send
   - MPI_Recv
   - MPI_Finalize
   - We will discuss the simple ones first

15. CLASSIFICATION OF MPI ROUTINES
   - Environment management
     - MPI_Init, MPI_Finalize
   - Point-to-point communication
     - MPI_Send, MPI_Recv
   - Collective communication
     - MPI_Reduce, MPI_Bcast
   - Information on the processes
     - MPI_Comm_rank, MPI_Get_processor_name

16. MPI_INIT
   - All MPI programs call this before using other MPI functions
     - int MPI_Init(int *pargc, char ***pargv);
   - Must be called in every MPI program
   - Must be called only once, and before any other MPI function is called
   - Passes the command-line arguments to all processes

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
    }

17. MPI_COMM_SIZE
   - Number of processes in the group associated with a communicator
     - int MPI_Comm_size(MPI_Comm comm, int *psize);
   - Find out how many processes your application is using

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int p;
        MPI_Comm_size(MPI_COMM_WORLD, &p);
    }

18. MPI_COMM_RANK
   - Rank of the calling process within the communicator
   - A unique rank between 0 and (p-1); it can serve as a task ID
     - int MPI_Comm_rank(MPI_Comm comm, int *rank);
   - A process has a unique rank in each communicator it belongs to
   - Used to identify the work for the processor

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int p;
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    }

19. MPI_FINALIZE
   - Terminates the MPI execution environment
   - Must be the last MPI routine called in any MPI program
     - int MPI_Finalize(void);

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int p;
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("no. of processors: %d\nrank: %d\n", p, rank);
        MPI_Finalize();
    }
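
The fragments above can be combined into one compilable program. This is a sketch, not from the slides, that also follows slide 12's rc = MPI_Xxxxx(...) error-code convention and calls MPI_Get_processor_name, mentioned on slide 15.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rc = MPI_Init(&argc, &argv);
        if (rc != MPI_SUCCESS) {           /* error-code convention from slide 12 */
            fprintf(stderr, "MPI_Init failed\n");
            return 1;
        }

        int p, rank, namelen;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &namelen);   /* which node this rank runs on */

        printf("rank %d of %d running on %s\n", rank, p, name);

        MPI_Finalize();
        return 0;
    }
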
21. HOW TO COMPILE THIS
   - We use the Open MPI implementation on our cluster
   - mpicc -o test_1 test_1.c
   - Works just like gcc
   - mpicc is not a special compiler
     - $ mpicc  ->  gcc: no input files
     - MPI is implemented just like any other library
     - mpicc is just a wrapper around gcc that adds the required command-line parameters

22. HOW TO RUN THIS
   - mpirun -np X test_1
   - Runs X copies of the program in your current run-time environment
   - The -np option specifies the number of copies of the program

23. MPIRUN
   - Only the rank 0 process can receive standard input
     - mpirun redirects the standard input of all other ranks to /dev/null
     - Open MPI redirects the standard input of mpirun to the standard input of the rank 0 process
   - The node that invoked mpirun need not be the node hosting the MPI_COMM_WORLD rank 0 process
   - mpirun directs the standard output and error of remote nodes to the node that invoked mpirun
   - SIGTERM, SIGKILL: kill all processes in the communicator
   - SIGUSR1, SIGUSR2: propagated to all processes
   - All other signals are ignored

24. A NOTE ON IMPLEMENTATION
   - I want to implement my own version of MPI
   - Evidence
   (Diagram: MPI_Init / MPI thread, MPI_Init / MPI thread.)

25. SOME MORE FUNCTIONS
   - int MPI_Initialized(int *flag)
     - Sets flag to true if MPI_Init has already been called
     - Why? Because MPI_Init may be called only once
   - double MPI_Wtime(void)
     - Returns the elapsed wall-clock time in seconds (double precision) on the calling processor
   - double MPI_Wtick(void)
     - Returns the resolution, in seconds (double precision), of MPI_Wtime()
   - Message-passing functionality
     - That is what MPI is meant for!
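
The functions on slide 25 can be used together as follows. This is a minimal sketch, not from the slides; the loop is only a placeholder workload to time.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int initialized;
        MPI_Initialized(&initialized);      /* may be called before MPI_Init */
        if (!initialized)
            MPI_Init(&argc, &argv);

        double start = MPI_Wtime();         /* wall-clock time in seconds */

        double sum = 0.0;                   /* placeholder workload */
        for (long i = 0; i < 100000000L; i++)
            sum += 1.0 / (double)(i + 1);

        double elapsed = MPI_Wtime() - start;
        printf("elapsed: %f s (clock resolution %g s), sum = %f\n",
               elapsed, MPI_Wtick(), sum);

        MPI_Finalize();
        return 0;
    }
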
26. POINT-TO-POINT COMMUNICATION

27. POINT-TO-POINT COMMUNICATION
   - Communication between two, and only two, processes
   - One process sends and one receives
   - Types
     - Synchronous send
     - Blocking send / blocking receive
     - Non-blocking send / non-blocking receive
     - Buffered send
     - Combined send/receive
     - "Ready" send

28. POINT-TO-POINT COMMUNICATION
   - Processes can be collected into groups
   - Each message is sent in a context, and must be received in the same context
   - A group and a context together form a communicator
   - A process is identified by its rank in the group associated with a communicator
   - Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message
   - MPI_ANY_TAG matches any tag

29. POINT-TO-POINT COMMUNICATION
   - How is data described?
   - How are processes identified?
   - How does the receiver recognize messages?
   - What does it mean for these operations to complete?

30-36. BLOCKING SEND/RECEIVE
   (Slides 30-36 repeat the MPI_Send signature, highlighting one argument at a time.)
   - int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm communicator)
   - buf: pointer to the data to send
   - count: number of elements in the buffer
   - datatype: the kind of data in the buffer
   - dest: rank of the receiver
   - tag: the label of the message
   - communicator: the set of processes involved (e.g. MPI_COMM_WORLD)

37. BLOCKING SEND/RECEIVE (CONTD.)
   (Diagram: Processor 1 and Processor 2, each with an application buffer and a system buffer; the data moves from the sender's application buffer to the receiver's.)

38. A WORD ABOUT SPECIFICATION
   - The user does not know whether the MPI implementation:
     - copies the buffer into an internal buffer, starts the communication, and returns control before all the data have been transferred (buffering);
     - creates a link between the processors, sends the data, and returns control when all the data have been sent (but NOT received); or
     - uses a combination of the above methods.
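
A small sketch, not from the slides, showing the blocking calls in context: rank 0 sends an array of doubles to rank 1, which receives it with a matching tag. The MPI_Recv signature mirrors MPI_Send, with a source rank and an MPI_Status output argument.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int tag = 42;          /* user-defined label for the message */
        double data[4] = {1.0, 2.0, 3.0, 4.0};

        if (rank == 0) {
            /* Blocking send: returns once it is safe to reuse data[]. */
            MPI_Send(data, 4, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Status status;
            /* Blocking receive: returns only after the data has arrived. */
            MPI_Recv(data, 4, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
            printf("rank 1 received %f %f %f %f\n",
                   data[0], data[1], data[2], data[3]);
        }

        MPI_Finalize();
        return 0;
    }
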
</p> <ul><li>"return" after it issafeto modify the application buffer </li></ul> <ul><li>Safe </li></ul> <ul><li><ul><li>modifications will not affect the data intended for the receive task </li></ul></li></ul> <ul><li><ul><li>does not imply that the data was actually received </li></ul></li></ul> <ul><li>Blocking send can be synchronous which means there is handshaking occurring with the receive task to confirm a safe send </li></ul> <ul><li>A blocking send can be asynchronous if a system buffer is used to hold the data for eventual delivery to the receive </li></ul> <ul><li>A blocking receive only "returns" after the data has arrived and is ready for use by the program </li></ul> <p>40. NON-BLOCKING SEND/RECEIVE </p> <ul><li>return almost immediately </li></ul> <ul><li>simply "request" the MPI library to perform the operation when it is able </li></ul> <ul><li>Cannot predict when that will happen </li></ul> <ul><li>request a send/receive and start doing other work! </li></ul> <ul><li>unsafe to modify the application buffer (your variable space) until you know that the non-blocking operation has been completed </li></ul> <ul><li>MPI_Isend (&amp;buf,count,datatype,dest,tag,comm,&amp;request) </li></ul> <ul><li>MPI_Irecv (&amp;buf,count,datatype,source,tag,comm,&amp;request) </li></ul> <p>41. NON-BLOCKING SEND/RECEIVE (CONTD.) Process 1 Process 2 Data Processor 1 Application Send System Buffer Processor 2 Application Send System Buffer 42. </p> <ul><li>To check if the send/receive operations have completed </li></ul> <ul><li>int MPI_Irecv (void *buf, int count,MPI_Datatype type, int dest, int tag,MPI_Comm comm, MPI_Request *req); </li></ul> <ul><li>int MPI_Wait(MPI_Request *req, MPI_Status *status); </li></ul> <ul><li><ul><li>A call to this subroutine cause the code to wait until the communication pointed by req is complete </li></ul></li></ul> <ul><li><ul><li>input/output, identifier associated to a communications </li></ul></li></ul> <ul><li><ul><li>event (initiated by MPI_ISEND or MPI_IRECV). </li></ul></li></ul> <ul><li><ul><li>input/output, identifier associated to a communications event (initiated by MPI_ISEND or MPI_IRECV). </li></ul></li></ul> <p>NON-BLOCKING SEND/RECEIVE (CONTD.) 43. </p> <ul><li>int MPI_Test(MPI_Request *req, int *flag, MPI_Status *status); </li></ul> <ul><li><ul><li>A call to this subroutine sets flag to true if the communication pointed by req is complete, sets flag to false otherwise. </li></ul></li></ul> <p>NON-BLOCKING SEND/RECEIVE (CONTD.) 44. STANDARD MODE </p> <ul><li>Returns when Sender is free to access and overwrite the send buffer. </li></ul> <ul><li>Might be copied directly into the matching receive buffer, or might be copied into a temporary system buffer. </li></ul> <ul><li>Message buffering decouples the send and receive operations. </li></ul> <ul><li>Message buffering can be expensive. </li></ul> <ul><li>It is up to MPI to decide whether outgoing messages will be buffered </li></ul> <ul><li>The standard mode send isnon-local . </li></ul> <p>45. SYNCHRONOUS MODE </p> <ul><li>Send can be started whether or not a matching receive was posted. </li></ul> <ul><li>Send completes successfully only if a corresponding receive was already posted and has already started to receive the message sent. </li></ul> <ul><li>Blocking send &amp; Blocking receive in synchronous mode. </li></ul> <ul><li>Simulate a synchronous communication. </li></ul> <ul><li>Synchronous Send isnon-local .</li></ul> <p>46. 
44. STANDARD MODE
   - Returns when the sender is free to access and overwrite the send buffer
   - The message might be copied directly into the matching receive buffer, or into a temporary system buffer
   - Message buffering decouples the send and receive operations
   - Message buffering can be expensive
   - It is up to MPI to decide whether outgoing messages will be buffered
   - The standard-mode send is non-local

45. SYNCHRONOUS MODE
   - The send can be started whether or not a matching receive was posted
   - The send completes successfully only if a corresponding receive was already posted and has already started to receive the message
   - A blocking send and a blocking receive in synchronous mode simulate a synchronous communication
   - The synchronous send is non-local

46. BUFFERED MODE
   - The send operation can be started whether or not a matching receive has been posted
   - It may complete before a matching receive is posted
   - The operation is local
   - MPI must buffer the outgoing message
   - An error occurs if there is insufficient buffer space
   - The amount of available buffer space is controlled by the user

47. BUFFER MANAGEMENT
   - int MPI_Buffer_attach(void *buffer, int size)
     - Provides MPI with a buffer in the user's memory to be used for buffering outgoing messages
   - int MPI_Buffer_detach(void *buffer_addr, int *size)
     - Detaches the buffer currently associated with MPI

    MPI_Buffer_attach(malloc(BUFFSIZE), BUFFSIZE);
    /* a buffer of BUFFSIZE bytes can now be used by MPI_Bsend */
    MPI_Buffer_detach(&buff, &size);   /* buffer size reduced to zero */
    MPI_Buffer_attach(buff, size);     /* buffer of BUFFSIZE bytes available again */

48. READY MODE
   - A send may be started only if the matching receive has already been posted
   - The user must be sure of this
   - If the receive has not been posted, the operation is erroneous and its outcome is undefined
   - Completion of the send operation does not depend on the status of a matching receive
   - It merely indicates that the send buffer can be reused
   - A ready-send could be replaced by a standard-send with no effect on the behavior of the program other than performance

49. ORDER AND FAIRNESS
   - Order
     - MPI messages are non-overtaking
     - When a receive matches two messages, the message sent first is received first
     - When a sent message matches two receive statements, it matches the receive posted first
     - Message-passing code is deterministic, unless the processes are multi-threaded or the wildcard MPI_ANY_SOURCE is used in a receive statement
   - Fairness
     - MPI does not guarantee fairness
     - Example: task 0 sends a message to task 2; however, task 1 sends a competing message that also matches task 2's receive. Only one of the sends will complete.

50. EXAMPLE OF NON-OVERTAKING MESSAGES

    CALL MPI_COMM_RANK(comm, rank, ierr)
    IF (rank.EQ.0) THEN
        CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)
        CALL MPI_BSEND(buf2, count, MPI_REAL, 1, tag, comm, ierr)
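
The fairness scenario on slide 49 can be made concrete with a sketch, not from the slides: ranks 0 and 1 both send to rank 2, which receives with MPI_ANY_SOURCE, so either message may be matched first. Unlike the slide's scenario, a second receive is posted so that both sends complete and the program terminates cleanly; run it with at least three processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int msg = rank;
        if (rank == 0 || rank == 1) {
            /* Both senders target rank 2 with the same tag. */
            MPI_Send(&msg, 1, MPI_INT, 2, 0, MPI_COMM_WORLD);
        } else if (rank == 2) {
            MPI_Status status;
            /* MPI_ANY_SOURCE: either sender's message may be matched first. */
            MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            printf("rank 2 first received from rank %d\n", status.MPI_SOURCE);
            /* Receive the remaining message so both sends can complete. */
            MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        }

        MPI_Finalize();
        return 0;
    }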
