Parallel Processing (CS 730)
Lecture 9: Advanced Point to Point Communication
Jeremy R. Johnson
*Parts of this lecture were derived from Chapter 13 in Pacheco
Introduction
• Objective: To further examine message passing communication patterns.
• Topics
  – Implementing Allgather
    • Ring
    • Hypercube
  – Non-blocking send/recv
    • MPI_Isend
    • MPI_Wait
    • MPI_Test
Broadcast/Reduce Ring
[Diagram: broadcast on a four-process ring P0, P1, P2, P3, shown step by step; the data travels around the ring one process per step (reduce follows the same pattern in reverse).]
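As a concrete companion to the picture, here is a minimal sketch (not from the slides) of a one-directional ring broadcast with blocking sends and receives; the function name Bcast_ring and its arguments are illustrative.

#include <mpi.h>

/* Broadcast count floats from root around the ring: every non-root process
   receives from its predecessor, and every process except the root's
   predecessor forwards the data to its successor. */
void Bcast_ring(float x[], int count, int root, MPI_Comm comm) {
    int p, my_rank, successor, predecessor;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

    if (my_rank != root)
        MPI_Recv(x, count, MPI_FLOAT, predecessor, 0, comm, &status);
    if (successor != root)          /* do not send back to the root */
        MPI_Send(x, count, MPI_FLOAT, successor, 0, comm);
}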
Bi-directional Broadcast Ring
[Diagram: broadcast on the same four-process ring, but with the root sending in both directions at once, so the broadcast completes in about half as many steps.]
Allgather Ring
[Diagram: allgather on a four-process ring P0-P3. Initially Pi holds xi. In each of the p - 1 = 3 steps every process forwards the block it most recently received to its successor and receives a new block from its predecessor; after step 1, for example, P1 holds x0,x1 and P3 holds x2,x3, and after step 3 every process holds x0,x1,x2,x3.]
AllGather
• int MPI_Allgather(
      void*         send_data      /* in  */,
      int           send_count     /* in  */,
      MPI_Datatype  send_type      /* in  */,
      void*         recv_data      /* out */,
      int           recv_count     /* in  */,
      MPI_Datatype  recv_type      /* in  */,
      MPI_Comm      communicator   /* in  */)
[Diagram: with four processes, process i contributes xi; after MPI_Allgather every process's receive buffer holds x0, x1, x2, x3.]
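For comparison with the hand-coded ring and hypercube versions on the following slides, here is a minimal sketch (not from the slides) of calling the library routine directly; the wrapper name Allgather_mpi is illustrative.

#include <mpi.h>

/* Each process contributes blocksize floats from x; the concatenation of
   all contributions, in rank order, is left in y on every process. */
void Allgather_mpi(float x[], int blocksize, float y[], MPI_Comm comm) {
    MPI_Allgather(x, blocksize, MPI_FLOAT,
                  y, blocksize, MPI_FLOAT, comm);
}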
Allgather_ring
void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, p, my_rank;
    int successor, predecessor;
    int send_offset, recv_offset;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    /* copy this process's block into its slot of y */
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

    /* in each step, pass the most recently received block to the successor
       and receive the next block from the predecessor */
    for (i = 0; i < p-1; i++) {
        send_offset = ((my_rank - i + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        MPI_Send(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm);
        MPI_Recv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0, comm, &status);
    }
}
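A minimal driver for this routine might look like the following; the block size, the test data, and the use of MPI_COMM_WORLD are illustrative, not part of the slides.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm);

int main(int argc, char* argv[]) {
    int   p, my_rank, i;
    int   blocksize = 4;                      /* illustrative block size */
    float *x, *y;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    x = (float*) malloc(blocksize*sizeof(float));
    y = (float*) malloc(p*blocksize*sizeof(float));
    for (i = 0; i < blocksize; i++)
        x[i] = (float) my_rank;               /* each process contributes its rank */

    Allgather_ring(x, blocksize, y, MPI_COMM_WORLD);

    if (my_rank == 0) {                       /* print the gathered blocks once */
        for (i = 0; i < p*blocksize; i++)
            printf("%4.1f ", y[i]);
        printf("\n");
    }
    free(x); free(y);
    MPI_Finalize();
    return 0;
}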
Hypercube
• Graph (recursively defined)
• An n-dimensional cube has 2^n nodes, with each node connected to n other nodes
• Binary labels of adjacent nodes differ in exactly one bit
[Diagram: hypercubes of dimension 1, 2, and 3 with nodes labeled 0-1, 00-11, and 000-111; each edge joins two labels that differ in a single bit.]
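Because adjacent labels differ in a single bit, a node's neighbors can be generated by flipping each bit in turn; a small sketch (the function name print_neighbors is illustrative):

#include <stdio.h>

/* Print the n neighbors of node rank in an n-dimensional hypercube:
   XOR with the i-th power of two gives the neighbor across dimension i. */
void print_neighbors(int rank, int n) {
    int i;
    for (i = 0; i < n; i++)
        printf("neighbor across dimension %d: %d\n", i, rank ^ (1 << i));
}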
Broadcast/Reduce
[Diagram: broadcast on a 3-dimensional hypercube with nodes 000-111; at each of the log p = 3 stages every node that already has the data forwards it across one dimension (reduce follows the same pattern in reverse).]
Allgather
[Diagram: a 3-dimensional hypercube with nodes 000-111, accompanied by the data held by each process at each stage.]

Initially:
000: x0   001: x1   010: x2   011: x3   100: x4   101: x5   110: x6   111: x7

After the first exchange (across the highest dimension):
000: x0 x4   001: x1 x5   010: x2 x6   011: x3 x7
100: x0 x4   101: x1 x5   110: x2 x6   111: x3 x7

After the second exchange:
000: x0 x2 x4 x6   001: x1 x3 x5 x7   010: x0 x2 x4 x6   011: x1 x3 x5 x7
100: x0 x2 x4 x6   101: x1 x3 x5 x7   110: x0 x2 x4 x6   111: x1 x3 x5 x7

After the third exchange every process holds x0 through x7.
Allgather
[Diagram: the eight processes 0-7 drawn at each of the successive stages of the hypercube allgather exchange.]
Allgather_cube
void Allgather_cube(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, d, p, my_rank;
    unsigned eor_bit, and_bits;
    int stage, partner;
    MPI_Datatype hole_type;
    int send_offset, recv_offset;
    MPI_Status status;
    int log_base2(int p);

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    /* copy this process's block into its slot of y */
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    d        = log_base2(p);
    eor_bit  = 1 << (d-1);
    and_bits = (1 << d) - 1;

    for (stage = 0; stage < d; stage++) {
        partner = my_rank ^ eor_bit;
        send_offset = (my_rank & and_bits)*blocksize;
        recv_offset = (partner & and_bits)*blocksize;
        /* 2^stage blocks of blocksize floats, spaced 2^(d-stage) blocks apart */
        MPI_Type_vector(1 << stage, blocksize, (1 << (d-stage))*blocksize,
                        MPI_FLOAT, &hole_type);
        MPI_Type_commit(&hole_type);
        MPI_Send(y + send_offset, 1, hole_type, partner, 0, comm);
        MPI_Recv(y + recv_offset, 1, hole_type, partner, 0, comm, &status);
        MPI_Type_free(&hole_type);
        eor_bit  = eor_bit >> 1;
        and_bits = and_bits >> 1;
    }
}
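The code declares a helper log_base2 that is not shown on the slides; a minimal sketch, assuming p is a power of two:

/* Return d such that 2^d = p (i.e. floor(log2(p)) for a power-of-two p). */
int log_base2(int p) {
    int d = 0;
    while (p > 1) {
        p = p >> 1;
        d++;
    }
    return d;
}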
Buffering Assumption
• The previous code is not safe: every process calls MPI_Send before MPI_Recv, so it deadlocks unless the system provides enough buffer space to complete the sends.
• MPI_Sendrecv can be used to guarantee that deadlock does not occur.
SendRecv
• int MPI_Sendrecv(
      void*         send_buf      /* in  */,
      int           send_count    /* in  */,
      MPI_Datatype  send_type     /* in  */,
      int           dest          /* in  */,
      int           send_tag      /* in  */,
      void*         recv_buf      /* out */,
      int           recv_count    /* in  */,
      MPI_Datatype  recv_type     /* in  */,
      int           source        /* in  */,
      int           recv_tag      /* in  */,
      MPI_Comm      communicator  /* in  */,
      MPI_Status*   status        /* out */)
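As a sketch (not on the slides), the exchange inside the ring allgather loop could be made safe by replacing the Send/Recv pair with one combined call; the variables are the ones from Allgather_ring above.

/* Send the current block to the successor while receiving the predecessor's
   block; correctness no longer depends on system buffering. */
MPI_Sendrecv(y + send_offset, blocksize, MPI_FLOAT, successor,   0,
             y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
             comm, &status);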
SendRecvReplace
• int MPI_Sendrecv_replace(
      void*         buffer        /* in/out */,
      int           count         /* in  */,
      MPI_Datatype  datatype      /* in  */,
      int           dest          /* in  */,
      int           send_tag      /* in  */,
      int           source        /* in  */,
      int           recv_tag      /* in  */,
      MPI_Comm      communicator  /* in  */,
      MPI_Status*   status        /* out */)
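A minimal sketch of using it for a circular shift around the ring; buf is an assumed buffer of blocksize floats, and the other variables follow the ring code above.

/* Shift one block around the ring in place: buf is sent to the successor
   and overwritten by the block arriving from the predecessor. */
MPI_Sendrecv_replace(buf, blocksize, MPI_FLOAT,
                     successor, 0,        /* destination and send tag */
                     predecessor, 0,      /* source and receive tag   */
                     comm, &status);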
Nonblocking Send/Recv
• Allows overlap of communication and computation: the call returns immediately, without waiting for the message buffer to be copied out or for a matching receive to occur.
• The communication is posted and can be tested later for completion.
• int MPI_Isend(              /* Immediate */
      void*         buffer    /* in  */,
      int           count     /* in  */,
      MPI_Datatype  datatype  /* in  */,
      int           dest      /* in  */,
      int           tag       /* in  */,
      MPI_Comm      comm      /* in  */,
      MPI_Request*  request   /* out */)
• int MPI_Irecv(
      void*         buffer    /* in  */,
      int           count     /* in  */,
      MPI_Datatype  datatype  /* in  */,
      int           source    /* in  */,
      int           tag       /* in  */,
      MPI_Comm      comm      /* in  */,
      MPI_Request*  request   /* out */)
• int MPI_Wait(
      MPI_Request*  request   /* in/out */,
      MPI_Status*   status    /* out */)
• int MPI_Test(
      MPI_Request*  request   /* in/out */,
      int*          flag      /* out */,
      MPI_Status*   status    /* out */)
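The usual pattern is to post the operation, do local work, and then test or wait; a minimal sketch (buffer, count, source, tag, comm, and do_local_work are assumed to be defined elsewhere):

MPI_Request request;
MPI_Status  status;
int flag;

MPI_Irecv(buffer, count, MPI_FLOAT, source, tag, comm, &request);
do_local_work();                      /* overlap computation with the transfer    */
MPI_Test(&request, &flag, &status);   /* flag is nonzero once the receive is done */
if (!flag)
    MPI_Wait(&request, &status);      /* otherwise block until it completes       */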
Allgather_ring (Overlapped)
    send_offset = my_rank*blocksize;
    recv_offset = ((my_rank - 1 + p) % p)*blocksize;
    for (i = 0; i < p-1; i++) {
        MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor, 0,
                  comm, &send_request);
        MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                  comm, &recv_request);
        /* compute the offsets for the next step while the messages are in flight */
        send_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 2 + p) % p)*blocksize;
        MPI_Wait(&send_request, &status);
        MPI_Wait(&recv_request, &status);
    }
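Equivalently, the two waits can be combined into one call; a sketch using MPI_Waitall with an illustrative requests array:

MPI_Request requests[2];
MPI_Status  statuses[2];

MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor,   0, comm, &requests[0]);
MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0, comm, &requests[1]);
/* ... update the offsets, do any local work ... */
MPI_Waitall(2, requests, statuses);   /* complete both transfers together */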
AlltoAll
• A sequence of p permutations (cyclic shifts) implemented with send_recv; a code sketch follows the table
Step 0: 0 1 2 3 4 5 6 7
Step 1: 7 0 1 2 3 4 5 6
Step 2: 6 7 0 1 2 3 4 5
Step 3: 5 6 7 0 1 2 3 4
Step 4: 4 5 6 7 0 1 2 3
Step 5: 3 4 5 6 7 0 1 2
Step 6: 2 3 4 5 6 7 0 1
Step 7: 1 2 3 4 5 6 7 0
(Row k is the process sequence cyclically shifted by k; at step k each process communicates with partners k positions away around the ring.)
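A minimal sketch of this schedule (not from the slides; the name Alltoall_ring and its arguments are illustrative). Each process is assumed to hold p blocks of blocksize floats in send, with block j destined for process j.

#include <mpi.h>

/* All-to-all by p cyclic shifts: at step k, send the block destined for
   process (my_rank + k) mod p and receive the block coming from process
   (my_rank - k) mod p; step 0 is just the local copy. */
void Alltoall_ring(float send[], int blocksize, float recv[], MPI_Comm comm) {
    int p, my_rank, i, k, dest, src;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)
        recv[my_rank*blocksize + i] = send[my_rank*blocksize + i];
    for (k = 1; k < p; k++) {
        dest = (my_rank + k) % p;
        src  = (my_rank - k + p) % p;
        MPI_Sendrecv(send + dest*blocksize, blocksize, MPI_FLOAT, dest, 0,
                     recv + src*blocksize,  blocksize, MPI_FLOAT, src,  0,
                     comm, &status);
    }
}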
AlltoAll (2 way)
• A sequence of permutations implemented with send_recv; here each pair of processes exchanges blocks directly (2-way), with the partner at step k chosen by exclusive-or (a code sketch follows the table)
Step 0: 0 1 2 3 4 5 6 7
Step 1: 1 0 3 2 5 4 7 6
Step 2: 2 3 0 1 6 7 4 5
Step 3: 3 2 1 0 7 6 5 4
Step 4: 4 5 6 7 0 1 2 3
Step 5: 5 4 7 6 1 0 3 2
Step 6: 6 7 4 5 2 3 0 1
Step 7: 7 6 5 4 3 2 1 0
(Row k lists each process's partner: at step k, process i exchanges with process i XOR k.)
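A corresponding sketch (again illustrative), assuming the number of processes is a power of two; it differs from the ring version only in how the partner is chosen.

#include <mpi.h>

/* All-to-all by pairwise exchange: at step k > 0, process my_rank swaps
   blocks with partner my_rank ^ k, so every exchange is symmetric. */
void Alltoall_xor(float send[], int blocksize, float recv[], MPI_Comm comm) {
    int p, my_rank, i, k, partner;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)                 /* step 0: local copy */
        recv[my_rank*blocksize + i] = send[my_rank*blocksize + i];
    for (k = 1; k < p; k++) {
        partner = my_rank ^ k;
        MPI_Sendrecv(send + partner*blocksize, blocksize, MPI_FLOAT, partner, 0,
                     recv + partner*blocksize, blocksize, MPI_FLOAT, partner, 0,
                     comm, &status);
    }
}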
Communication Modes
• Synchronous (MPI_Ssend: the send completes only after the matching receive has started)
• Ready (MPI_Rsend: the matching receive must already be posted when the send is called)
• Buffered (MPI_Bsend: the user provides the buffer space; a sketch follows)
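As an illustration of the buffered mode, a minimal sketch (the wrapper name send_buffered and its arguments are made up for this example):

#include <stdlib.h>
#include <mpi.h>

/* Attach user-provided buffer space, send with MPI_Bsend (which copies the
   message into that buffer and returns), then detach and free the buffer. */
void send_buffered(float data[], int count, int dest, MPI_Comm comm) {
    int   bufsize = count*sizeof(float) + MPI_BSEND_OVERHEAD;
    char* buf     = (char*) malloc(bufsize);

    MPI_Buffer_attach(buf, bufsize);
    MPI_Bsend(data, count, MPI_FLOAT, dest, 0, comm);
    MPI_Buffer_detach(&buf, &bufsize);
    free(buf);
}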