Parallel Processing (CS 730)
Lecture 9: Advanced Point to Point Communication
Jeremy R. Johnson
*Parts of this lecture were derived from Chapter 13 in Pacheco
Introduction
• Objective: To further examine message passing communication patterns.
• Topics
  – Implementing Allgather
    • Ring
    • Hypercube
  – Non-blocking send/recv
    • MPI_Isend
    • MPI_Wait
    • MPI_Test
Broadcast/Reduce Ring
[Diagram: broadcast on a four-process ring P0, P1, P2, P3, shown step by step; the data travels around the ring one process per step (reduce follows the same pattern in reverse).]
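As a concrete companion to the picture, here is a minimal sketch (not from the slides) of a one-directional ring broadcast with blocking sends and receives; the function name Bcast_ring and its arguments are illustrative.

#include <mpi.h>

/* Broadcast count floats from root around the ring: every non-root process
   receives from its predecessor, and every process except the root's
   predecessor forwards the data to its successor. */
void Bcast_ring(float x[], int count, int root, MPI_Comm comm) {
    int p, my_rank, successor, predecessor;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

    if (my_rank != root)
        MPI_Recv(x, count, MPI_FLOAT, predecessor, 0, comm, &status);
    if (successor != root)          /* do not send back to the root */
        MPI_Send(x, count, MPI_FLOAT, successor, 0, comm);
}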
Bi-directional Broadcast Ring
[Diagram: broadcast on the same four-process ring, but with the root sending in both directions at once, so the broadcast completes in about half as many steps.]
Allgather Ring
[Diagram: allgather on a four-process ring P0-P3. Initially Pi holds xi. In each of the p - 1 = 3 steps every process forwards the block it most recently received to its successor and receives a new block from its predecessor; after step 1, for example, P1 holds x0,x1 and P3 holds x2,x3, and after step 3 every process holds x0,x1,x2,x3.]
AllGather
• int MPI_Allgather(
      void*         send_data      /* in  */,
      int           send_count     /* in  */,
      MPI_Datatype  send_type      /* in  */,
      void*         recv_data      /* out */,
      int           recv_count     /* in  */,
      MPI_Datatype  recv_type      /* in  */,
      MPI_Comm      communicator   /* in  */)
[Diagram: with four processes, process i contributes xi; after MPI_Allgather every process's receive buffer holds x0, x1, x2, x3.]
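For comparison with the hand-coded ring and hypercube versions on the following slides, here is a minimal sketch (not from the slides) of calling the library routine directly; the wrapper name Allgather_mpi is illustrative.

#include <mpi.h>

/* Each process contributes blocksize floats from x; the concatenation of
   all contributions, in rank order, is left in y on every process. */
void Allgather_mpi(float x[], int blocksize, float y[], MPI_Comm comm) {
    MPI_Allgather(x, blocksize, MPI_FLOAT,
                  y, blocksize, MPI_FLOAT, comm);
}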
Allgather_ring
void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, p, my_rank;
    int successor, predecessor;
    int send_offset, recv_offset;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    /* copy this process's block into its slot of y */
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

    /* in each step, pass the most recently received block to the successor
       and receive the next block from the predecessor */
    for (i = 0; i < p-1; i++) {
        send_offset = ((my_rank - i + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        MPI_Send(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm);
        MPI_Recv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0, comm, &status);
    }
}
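A minimal driver for this routine might look like the following; the block size, the test data, and the use of MPI_COMM_WORLD are illustrative, not part of the slides.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm);

int main(int argc, char* argv[]) {
    int   p, my_rank, i;
    int   blocksize = 4;                      /* illustrative block size */
    float *x, *y;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    x = (float*) malloc(blocksize*sizeof(float));
    y = (float*) malloc(p*blocksize*sizeof(float));
    for (i = 0; i < blocksize; i++)
        x[i] = (float) my_rank;               /* each process contributes its rank */

    Allgather_ring(x, blocksize, y, MPI_COMM_WORLD);

    if (my_rank == 0) {                       /* print the gathered blocks once */
        for (i = 0; i < p*blocksize; i++)
            printf("%4.1f ", y[i]);
        printf("\n");
    }
    free(x); free(y);
    MPI_Finalize();
    return 0;
}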
Hypercube
• Graph (recursively defined)
• An n-dimensional cube has 2^n nodes, with each node connected to n other nodes
• Binary labels of adjacent nodes differ in exactly one bit
[Diagram: hypercubes of dimension 1, 2, and 3 with nodes labeled 0-1, 00-11, and 000-111; each edge joins two labels that differ in a single bit.]
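Because adjacent labels differ in a single bit, a node's neighbors can be generated by flipping each bit in turn; a small sketch (the function name print_neighbors is illustrative):

#include <stdio.h>

/* Print the n neighbors of node rank in an n-dimensional hypercube:
   XOR with the i-th power of two gives the neighbor across dimension i. */
void print_neighbors(int rank, int n) {
    int i;
    for (i = 0; i < n; i++)
        printf("neighbor across dimension %d: %d\n", i, rank ^ (1 << i));
}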
Broadcast/Reduce
[Diagram: broadcast on a 3-dimensional hypercube with nodes 000-111; at each of the log p = 3 stages every node that already has the data forwards it across one dimension (reduce follows the same pattern in reverse).]
Allgather
[Diagram: a 3-dimensional hypercube with nodes 000-111, accompanied by the data held by each process at each stage.]

Initially:
000: x0   001: x1   010: x2   011: x3   100: x4   101: x5   110: x6   111: x7

After the first exchange (across the highest dimension):
000: x0 x4   001: x1 x5   010: x2 x6   011: x3 x7
100: x0 x4   101: x1 x5   110: x2 x6   111: x3 x7

After the second exchange:
000: x0 x2 x4 x6   001: x1 x3 x5 x7   010: x0 x2 x4 x6   011: x1 x3 x5 x7
100: x0 x2 x4 x6   101: x1 x3 x5 x7   110: x0 x2 x4 x6   111: x1 x3 x5 x7

After the third exchange every process holds x0 through x7.
Allgather
[Diagram: the eight processes 0-7 drawn at each of the successive stages of the hypercube allgather exchange.]
Allgather_cube
void Allgather_cube(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, d, p, my_rank;
    unsigned eor_bit, and_bits;
    int stage, partner;
    MPI_Datatype hole_type;
    int send_offset, recv_offset;
    MPI_Status status;
    int log_base2(int p);

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    /* copy this process's block into its slot of y */
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    d        = log_base2(p);
    eor_bit  = 1 << (d-1);
    and_bits = (1 << d) - 1;

    for (stage = 0; stage < d; stage++) {
        partner = my_rank ^ eor_bit;
        send_offset = (my_rank & and_bits)*blocksize;
        recv_offset = (partner & and_bits)*blocksize;
        /* 2^stage blocks of blocksize floats, spaced 2^(d-stage) blocks apart */
        MPI_Type_vector(1 << stage, blocksize, (1 << (d-stage))*blocksize,
                        MPI_FLOAT, &hole_type);
        MPI_Type_commit(&hole_type);
        MPI_Send(y + send_offset, 1, hole_type, partner, 0, comm);
        MPI_Recv(y + recv_offset, 1, hole_type, partner, 0, comm, &status);
        MPI_Type_free(&hole_type);
        eor_bit  = eor_bit >> 1;
        and_bits = and_bits >> 1;
    }
}
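The code declares a helper log_base2 that is not shown on the slides; a minimal sketch, assuming p is a power of two:

/* Return d such that 2^d = p (i.e. floor(log2(p)) for a power-of-two p). */
int log_base2(int p) {
    int d = 0;
    while (p > 1) {
        p = p >> 1;
        d++;
    }
    return d;
}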
Buffering Assumption
• The previous code is not safe: every process calls MPI_Send before MPI_Recv, so it deadlocks unless the system provides enough buffer space to complete the sends.
• MPI_Sendrecv can be used to guarantee that deadlock does not occur.
SendRecv
• int MPI_Sendrecv(
      void*         send_buf      /* in  */,
      int           send_count    /* in  */,
      MPI_Datatype  send_type     /* in  */,
      int           dest          /* in  */,
      int           send_tag      /* in  */,
      void*         recv_buf      /* out */,
      int           recv_count    /* in  */,
      MPI_Datatype  recv_type     /* in  */,
      int           source        /* in  */,
      int           recv_tag      /* in  */,
      MPI_Comm      communicator  /* in  */,
      MPI_Status*   status        /* out */)
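As a sketch (not on the slides), the exchange inside the ring allgather loop could be made safe by replacing the Send/Recv pair with one combined call; the variables are the ones from Allgather_ring above.

/* Send the current block to the successor while receiving the predecessor's
   block; correctness no longer depends on system buffering. */
MPI_Sendrecv(y + send_offset, blocksize, MPI_FLOAT, successor,   0,
             y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
             comm, &status);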
SendRecvReplace
• int MPI_Sendrecv_replace(
      void*         buffer        /* in/out */,
      int           count         /* in  */,
      MPI_Datatype  datatype      /* in  */,
      int           dest          /* in  */,
      int           send_tag      /* in  */,
      int           source        /* in  */,
      int           recv_tag      /* in  */,
      MPI_Comm      communicator  /* in  */,
      MPI_Status*   status        /* out */)
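A minimal sketch of using it for a circular shift around the ring; buf is an assumed buffer of blocksize floats, and the other variables follow the ring code above.

/* Shift one block around the ring in place: buf is sent to the successor
   and overwritten by the block arriving from the predecessor. */
MPI_Sendrecv_replace(buf, blocksize, MPI_FLOAT,
                     successor, 0,        /* destination and send tag */
                     predecessor, 0,      /* source and receive tag   */
                     comm, &status);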
Nonblocking Send/Recv
• Allows overlap of communication and computation: the call returns immediately, without waiting for the message buffer to be copied out or for a matching receive to occur.
• The communication is posted and can be tested later for completion.
• int MPI_Isend(              /* Immediate */
      void*         buffer    /* in  */,
      int           count     /* in  */,
      MPI_Datatype  datatype  /* in  */,
      int           dest      /* in  */,
      int           tag       /* in  */,
      MPI_Comm      comm      /* in  */,
      MPI_Request*  request   /* out */)
• int MPI_Irecv(
      void*         buffer    /* in  */,
      int           count     /* in  */,
      MPI_Datatype  datatype  /* in  */,
      int           source    /* in  */,
      int           tag       /* in  */,
      MPI_Comm      comm      /* in  */,
      MPI_Request*  request   /* out */)
• int MPI_Wait(
      MPI_Request*  request   /* in/out */,
      MPI_Status*   status    /* out */)
• int MPI_Test(
      MPI_Request*  request   /* in/out */,
      int*          flag      /* out */,
      MPI_Status*   status    /* out */)
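The usual pattern is to post the operation, do local work, and then test or wait; a minimal sketch (buffer, count, source, tag, comm, and do_local_work are assumed to be defined elsewhere):

MPI_Request request;
MPI_Status  status;
int flag;

MPI_Irecv(buffer, count, MPI_FLOAT, source, tag, comm, &request);
do_local_work();                      /* overlap computation with the transfer    */
MPI_Test(&request, &flag, &status);   /* flag is nonzero once the receive is done */
if (!flag)
    MPI_Wait(&request, &status);      /* otherwise block until it completes       */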
Allgather_ring (Overlapped)
    send_offset = my_rank*blocksize;
    recv_offset = ((my_rank - 1 + p) % p)*blocksize;
    for (i = 0; i < p-1; i++) {
        MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor, 0,
                  comm, &send_request);
        MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                  comm, &recv_request);
        /* compute the offsets for the next step while the messages are in flight */
        send_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 2 + p) % p)*blocksize;
        MPI_Wait(&send_request, &status);
        MPI_Wait(&recv_request, &status);
    }
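Equivalently, the two waits can be combined into one call; a sketch using MPI_Waitall with an illustrative requests array:

MPI_Request requests[2];
MPI_Status  statuses[2];

MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor,   0, comm, &requests[0]);
MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0, comm, &requests[1]);
/* ... update the offsets, do any local work ... */
MPI_Waitall(2, requests, statuses);   /* complete both transfers together */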
AlltoAll
• A sequence of p permutations (cyclic shifts) implemented with send_recv; a code sketch follows the table
Step 0: 0 1 2 3 4 5 6 7
Step 1: 7 0 1 2 3 4 5 6
Step 2: 6 7 0 1 2 3 4 5
Step 3: 5 6 7 0 1 2 3 4
Step 4: 4 5 6 7 0 1 2 3
Step 5: 3 4 5 6 7 0 1 2
Step 6: 2 3 4 5 6 7 0 1
Step 7: 1 2 3 4 5 6 7 0
(Row k is the process sequence cyclically shifted by k; at step k each process communicates with partners k positions away around the ring.)
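A minimal sketch of this schedule (not from the slides; the name Alltoall_ring and its arguments are illustrative). Each process is assumed to hold p blocks of blocksize floats in send, with block j destined for process j.

#include <mpi.h>

/* All-to-all by p cyclic shifts: at step k, send the block destined for
   process (my_rank + k) mod p and receive the block coming from process
   (my_rank - k) mod p; step 0 is just the local copy. */
void Alltoall_ring(float send[], int blocksize, float recv[], MPI_Comm comm) {
    int p, my_rank, i, k, dest, src;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)
        recv[my_rank*blocksize + i] = send[my_rank*blocksize + i];
    for (k = 1; k < p; k++) {
        dest = (my_rank + k) % p;
        src  = (my_rank - k + p) % p;
        MPI_Sendrecv(send + dest*blocksize, blocksize, MPI_FLOAT, dest, 0,
                     recv + src*blocksize,  blocksize, MPI_FLOAT, src,  0,
                     comm, &status);
    }
}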
AlltoAll (2 way)
• A sequence of permutations implemented with send_recv; here each pair of processes exchanges blocks directly (2-way), with the partner at step k chosen by exclusive-or (a code sketch follows the table)
Step 0: 0 1 2 3 4 5 6 7
Step 1: 1 0 3 2 5 4 7 6
Step 2: 2 3 0 1 6 7 4 5
Step 3: 3 2 1 0 7 6 5 4
Step 4: 4 5 6 7 0 1 2 3
Step 5: 5 4 7 6 1 0 3 2
Step 6: 6 7 4 5 2 3 0 1
Step 7: 7 6 5 4 3 2 1 0
(Row k lists each process's partner: at step k, process i exchanges with process i XOR k.)
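A corresponding sketch (again illustrative), assuming the number of processes is a power of two; it differs from the ring version only in how the partner is chosen.

#include <mpi.h>

/* All-to-all by pairwise exchange: at step k > 0, process my_rank swaps
   blocks with partner my_rank ^ k, so every exchange is symmetric. */
void Alltoall_xor(float send[], int blocksize, float recv[], MPI_Comm comm) {
    int p, my_rank, i, k, partner;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)                 /* step 0: local copy */
        recv[my_rank*blocksize + i] = send[my_rank*blocksize + i];
    for (k = 1; k < p; k++) {
        partner = my_rank ^ k;
        MPI_Sendrecv(send + partner*blocksize, blocksize, MPI_FLOAT, partner, 0,
                     recv + partner*blocksize, blocksize, MPI_FLOAT, partner, 0,
                     comm, &status);
    }
}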
Communication Modes
• Synchronous (MPI_Ssend: the send completes only after the matching receive has started)
• Ready (MPI_Rsend: the matching receive must already be posted when the send is called)
• Buffered (MPI_Bsend: the user provides the buffer space; a sketch follows)
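As an illustration of the buffered mode, a minimal sketch (the wrapper name send_buffered and its arguments are made up for this example):

#include <stdlib.h>
#include <mpi.h>

/* Attach user-provided buffer space, send with MPI_Bsend (which copies the
   message into that buffer and returns), then detach and free the buffer. */
void send_buffered(float data[], int count, int dest, MPI_Comm comm) {
    int   bufsize = count*sizeof(float) + MPI_BSEND_OVERHEAD;
    char* buf     = (char*) malloc(bufsize);

    MPI_Buffer_attach(buf, bufsize);
    MPI_Bsend(data, count, MPI_FLOAT, dest, 0, comm);
    MPI_Buffer_detach(&buf, &bufsize);
    free(buf);
}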