Parallel Programming in MPI, part 2


DESCRIPTION

Parallel Programming in MPI, part 2. Today's topics: non-blocking communication (execute other instructions while waiting for the completion of a communication) and the implementation of collective communications.

TRANSCRIPT


Today's Topic
- Non-Blocking Communication: execute other instructions while waiting for the completion of a communication
- Implementation of collective communications
- Measuring execution time of MPI programs
- Deadlock

Non-blocking communication functions
"Non-blocking" = do not wait for the completion of an instruction and proceed to the next instruction.
Example) MPI_Irecv & MPI_Wait

MPI_Recv vs. MPI_Irecv
MPI_Recv: wait for the arrival of the data (blocking).
MPI_Irecv: proceed to the next instructions without waiting for the data (non-blocking); completion is checked later with MPI_Wait.
[Figure: timelines of a blocking receive and of a non-blocking receive followed by MPI_Wait]

MPI_Irecv: Non-Blocking Receive
Parameters: start address for storing received data, number of elements, data type, rank of the source, tag (= 0, in most cases), communicator (= MPI_COMM_WORLD, in most cases), request

request: Communication Request, used for waiting for the completion of this communication.

Example)
  MPI_Request req;
  ...
  MPI_Irecv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
  ...
  MPI_Wait(&req, &status);

Usage: int MPI_Irecv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Request *r);

MPI_Isend: Non-Blocking Send
Parameters: start address of the data to send, number of elements, data type, rank of the destination, tag (= 0, in most cases), communicator (= MPI_COMM_WORLD, in most cases), request

Example)
  MPI_Request req;
  ...
  MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
  ...
  MPI_Wait(&req, &status);

Usage: int MPI_Isend(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm, MPI_Request *r);

Non-Blocking Send?
Blocking send (MPI_Send): waits until the data has been copied somewhere else, i.e. until it has been handed to the network or copied into a temporary buffer.
Non-blocking send (MPI_Isend): proceeds to the next instruction without waiting for that copy.

Notice: data is not guaranteed in non-blocking communications.
MPI_Irecv: the value of the variable specified for receiving the data is not fixed before MPI_Wait.
[Figure: reading A between MPI_Irecv and MPI_Wait may give either the old value (10) or the received value (50); after MPI_Wait, the value of A is 50.]

MPI_Isend: if the variable that holds the data to be sent is modified before MPI_Wait, the value actually sent is unpredictable.
[Figure: modifying A between MPI_Isend and MPI_Wait causes incorrect communication; after MPI_Wait, A can be modified without any problem.]

MPI_Wait
Waits for the completion of an MPI_Isend or MPI_Irecv.
Makes sure that the sent data can be modified, or that the received data can be referred to.

Parameters: request, status

status: for MPI_Irecv, the status of the received data is stored here when the communication completes.

Usage: int MPI_Wait(MPI_Request *req, MPI_Status *stat);

MPI_Waitall
Waits for the completion of a specified number of non-blocking communications.
Parameters: count, requests, statuses

count: The number of non-blocking communications

requests, statuses: arrays of MPI_Request and MPI_Status with at least 'count' elements.

Usage: int MPI_Waitall(int c, MPI_Request *requests, MPI_Status *statuses);
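The following is a small sketch (not from the original slides) that puts MPI_Irecv, MPI_Isend and MPI_Waitall together: two processes exchange arrays and do unrelated work while the transfers are in progress. The variable names and the dummy workload are illustrative assumptions.

  /* Sketch: overlapping computation with non-blocking communication.
   * Run with exactly 2 processes; each rank exchanges an array with its partner. */
  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[])
  {
      int myid, other, i;
      double mine[100], yours[100], local = 0.0;
      MPI_Request reqs[2];
      MPI_Status stats[2];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      other = 1 - myid;                 /* partner rank; assumes 2 processes */

      for (i = 0; i < 100; i++)
          mine[i] = myid * 100.0 + i;   /* data this rank will send */

      /* post the receive and the send, then keep working */
      MPI_Irecv(yours, 100, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Isend(mine,  100, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[1]);

      /* other instructions: work that does not touch 'mine' or 'yours' */
      for (i = 0; i < 1000000; i++)
          local += 1.0e-6;

      /* wait for both requests: after this, 'yours' is valid
         and 'mine' may safely be modified again */
      MPI_Waitall(2, reqs, stats);

      printf("rank %d: yours[0] = %f, local work = %f\n", myid, yours[0], local);

      MPI_Finalize();
      return 0;
  }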

Implementation of collective communications


Inside of the functions for collective communications
Usually, the functions for collective communications are implemented by using the message passing functions (MPI_Send, MPI_Recv, MPI_Isend, MPI_Irecv).

Inside of MPI_Bcast: one of the simplest implementations

  int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
  {
      int i, myid, procs;
      MPI_Status st;

      MPI_Comm_rank(comm, &myid);
      MPI_Comm_size(comm, &procs);
      if (myid == root){
          for (i = 0; i < procs; i++)
              if (i != root)
                  MPI_Send(a, c, d, i, 0, comm);
      } else {
          MPI_Recv(a, c, d, root, 0, comm, &st);
      }
      return 0;
  }

Another implementation: with MPI_Isend

  int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
  {
      int i, myid, procs, cntr;
      MPI_Status st, *stats;
      MPI_Request *reqs;

      MPI_Comm_rank(comm, &myid);
      MPI_Comm_size(comm, &procs);
      if (myid == root){
          stats = (MPI_Status *)malloc(sizeof(MPI_Status) * procs);
          reqs = (MPI_Request *)malloc(sizeof(MPI_Request) * procs);
          cntr = 0;
          for (i = 0; i < procs; i++)
              if (i != root)
                  MPI_Isend(a, c, d, i, 0, comm, &(reqs[cntr++]));
          MPI_Waitall(procs - 1, reqs, stats);
          free(stats);
          free(reqs);
      } else {
          MPI_Recv(a, c, d, root, 0, comm, &st);
      }
      return 0;
  }

Another implementation: Binomial Tree

  int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
  {
      int myid, procs;
      MPI_Status st;
      int mask, relative_rank, src, dst;

      MPI_Comm_rank(comm, &myid);
      MPI_Comm_size(comm, &procs);
      relative_rank = myid - root;
      if (relative_rank < 0) relative_rank += procs;

      /* receive from the parent (every rank except the root) */
      mask = 1;
      while (mask < procs){
          if (relative_rank & mask){
              src = myid - mask;
              if (src < 0) src += procs;
              MPI_Recv(a, c, d, src, 0, comm, &st);
              break;
          }
          mask <<= 1;
      }

      /* send to the children */
      mask >>= 1;
      while (mask > 0){
          if (relative_rank + mask < procs){
              dst = myid + mask;
              if (dst >= procs) dst -= procs;
              MPI_Send(a, c, d, dst, 0, comm);
          }
          mask >>= 1;
      }
      return 0;
  }

Flow of Binomial Tree
Use 'mask' to determine when and how to Send/Recv.
[Figure: with 8 ranks, rank 0 sends to ranks 4, 2 and 1 (mask = 4, 2, 1); rank 4 receives from 0 and then sends to 6 and 5; rank 2 receives from 0 and sends to 3; rank 6 receives from 4 and sends to 7; ranks 1, 3, 5 and 7 only receive.]
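As a hedged illustration of how such a hand-written broadcast could be exercised, the sketch below assumes the binomial-tree version has been renamed my_bcast (a hypothetical name, chosen to avoid redefining the real MPI_Bcast symbol) and lets every rank print the data it received from the root.

  /* Hypothetical test driver for the hand-written broadcast above. */
  #include <stdio.h>
  #include <string.h>
  #include "mpi.h"

  int my_bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm);

  int main(int argc, char *argv[])
  {
      char buf[64] = "";
      int myid;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);

      if (myid == 0)                        /* only the root prepares the data */
          strcpy(buf, "hello from rank 0");

      my_bcast(buf, 64, MPI_CHAR, 0, MPI_COMM_WORLD);   /* broadcast from rank 0 */

      printf("rank %d received: %s\n", myid, buf);      /* every rank should print the same text */

      MPI_Finalize();
      return 0;
  }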


Measuring execution time of MPI programs

Measure the time of MPI programs
MPI_Wtime returns the current time in seconds.

Example)
  double t1, t2;
  ...
  t1 = MPI_Wtime();
  /* the part to measure */
  t2 = MPI_Wtime();

printf("Elapsed time: %e sec.\n", t2 t1); Problem on measuring time in parallel programs? Each process measures different time. Which time is the time we want?

[Figure: without synchronization, ranks 0, 1 and 2 reach t1 = MPI_Wtime() at different moments (each rank reads, sends and receives at its own pace), so each process measures a different time.]

Use MPI_Barrier
MPI_Barrier synchronizes the processes before each measurement. Useful for measuring the total execution time.

[Figure: the same three ranks with an MPI_Barrier inserted before each call to MPI_Wtime, so all processes start and stop the measurement together.]
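A minimal sketch of this measurement pattern (the dummy loop stands in for the real computation and communication, and is an assumption, not part of the slides):

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[])
  {
      int myid, i;
      double t1, t2, x = 0.0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);

      MPI_Barrier(MPI_COMM_WORLD);     /* align all processes before starting the clock */
      t1 = MPI_Wtime();

      for (i = 0; i < 10000000; i++)   /* stands in for the measured part of the program */
          x += 1.0e-7;

      MPI_Barrier(MPI_COMM_WORLD);     /* wait until every process has finished */
      t2 = MPI_Wtime();

      if (myid == 0)
          printf("Elapsed time: %e sec. (x = %f)\n", t2 - t1, x);

      MPI_Finalize();
      return 0;
  }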

Detailed analysis
Average: MPI_Reduce can be used to obtain the average:

MAX and MIN: use MPI_Gather to gather all of the results to rank 0, and let rank 0 find the MAX and MIN.

  double t1, t2, t, total;

  t1 = MPI_Wtime();
  ...
  t2 = MPI_Wtime();
  t = t2 - t1;
  MPI_Reduce(&t, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (myrank == 0)
      printf("Ave. elapsed: %e sec.\n", total / procs);

Relationships among Max, Ave and Min can be used for checking the load balance.

                        Max - Ave is large    Max - Ave is small
  Ave - Min is large    NG                    Mostly OK
  Ave - Min is small    NG                    OK
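Following the slides' suggestion, the per-process times can be gathered to rank 0 with MPI_Gather so that rank 0 reports Max, Ave and Min together. A sketch, where the measured section is left as a placeholder and the variable names are assumptions:

  #include <stdio.h>
  #include <stdlib.h>
  #include "mpi.h"

  int main(int argc, char *argv[])
  {
      int myid, procs, i;
      double t, *times = NULL, tmax, tmin, tsum;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      MPI_Comm_size(MPI_COMM_WORLD, &procs);

      t = MPI_Wtime();
      /* ... the measured part of the program goes here ... */
      t = MPI_Wtime() - t;

      if (myid == 0)
          times = (double *)malloc(sizeof(double) * procs);

      /* one value from every process ends up in times[] on rank 0 */
      MPI_Gather(&t, 1, MPI_DOUBLE, times, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

      if (myid == 0){
          tmax = tmin = tsum = times[0];
          for (i = 1; i < procs; i++){
              if (times[i] > tmax) tmax = times[i];
              if (times[i] < tmin) tmin = times[i];
              tsum += times[i];
          }
          printf("Max %e  Ave %e  Min %e sec.\n", tmax, tsum / procs, tmin);
          free(times);
      }

      MPI_Finalize();
      return 0;
  }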

Measuring time for communications
The measured time includes both computation time and communication time. To measure the communication time separately:

  double t1, t2, t3, t4, comm = 0.0;

  t3 = MPI_Wtime();
  for (i = 0; i < N; i++){
      /* computation */
      t1 = MPI_Wtime();
      /* communication */
      t2 = MPI_Wtime();
      comm += t2 - t1;
      /* computation */
      t1 = MPI_Wtime();
      /* communication */
      t2 = MPI_Wtime();
      comm += t2 - t1;
  }
  t4 = MPI_Wtime();

Analyze computation time
Computation time = total time - communication time. Or, just measure the computation time directly.
Balance of the computation time shows the balance of the amount of computation.
Communication time is difficult to analyze since it includes waiting time caused by load imbalance. ==> Balance the computation first.



Deadlock
A state in which a program cannot proceed for some reason.
Places where you need to be careful about deadlocks:
1. MPI_Recv, MPI_Wait, MPI_Waitall

2. Collective communications: a program cannot proceed until all processes call the same collective communication function.

Wrong case (both ranks block in MPI_Recv and neither reaches MPI_Send):
  if (myid == 0){ MPI_Recv from rank 1; MPI_Send to rank 1; }
  if (myid == 1){ MPI_Recv from rank 0; MPI_Send to rank 0; }

One solution: use MPI_Irecv:
  if (myid == 0){ MPI_Irecv from rank 1; MPI_Send to rank 1; MPI_Wait; }
  if (myid == 1){ MPI_Irecv from rank 0; MPI_Send to rank 0; MPI_Wait; }

Summary
Parallel programs need distribution of computation, distribution of data, and communications.

Parallelization does not always speed up programs.

There are non-parallelizable programs

Be careful about deadlocks.

Report) Make a Reduce function by yourself
Fill in the inside of the 'my_reduce' function in the program shown below.
my_reduce: a simplified version of MPI_Reduce. It calculates the total sum of integer numbers. The root rank is always 0. The communicator is always MPI_COMM_WORLD.

Any algorithm is OK.

  #include <stdio.h>
  #include <stdlib.h>
  #include "mpi.h"
  #define N 20

  int my_reduce(int *a, int *b, int c)
  {
      /* complete here by yourself */

      return 0;
  }

  int main(int argc, char *argv[])
  {
      int i, myid, procs;
      int a[N], b[N];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      MPI_Comm_size(MPI_COMM_WORLD, &procs);
      for (i = 0; i < N; i++){
          a[i] = i;
          b[i] = 0;
      }
      my_reduce(a, b, N);
      if (myid == 0)
          for (i = 0; i < N; i++)
              printf("b[%d] = %d , correct answer = %d\n", i, b[i], i * procs);
      MPI_Finalize();
      return 0;
  }
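One way to check a finished my_reduce (an optional suggestion, not part of the assignment): compare its result against the real MPI_Reduce, which my_reduce is a simplified version of. The snippet below is meant to be placed inside main() after the call to my_reduce and reuses the skeleton's variables.

  /* optional check: my_reduce should produce the same sums as MPI_Reduce */
  int ref[N];
  MPI_Reduce(a, ref, N, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (myid == 0)
      for (i = 0; i < N; i++)
          if (b[i] != ref[i])
              printf("mismatch at %d: my_reduce = %d, MPI_Reduce = %d\n", i, b[i], ref[i]);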