Why Derived Data Types
Message data may contain different data types. Several separate messages can be used, but performance may not be good.
Message data may also occupy non-contiguous memory locations. The data can be copied to contiguous storage and then communicated, but that costs additional memory copies.
Examples:
[Figure: 3-D array A[100][80][50] with axes i, j, k; the surface A[i][j][0] is highlighted.]
    struct _tagStudent { int id; double grade; char note[100]; };
    struct _tagStudent Students[25];
Derived Data Type
MPI's solution is the derived data type:
No additional memory copy.
Data of various shapes and sizes is transferred directly.
Idea: specify the memory layout of the data and the corresponding basic data types.
Usage:
1. Construct the derived data type.
2. Commit the derived data type.
3. Use it in communication routines wherever an MPI_Datatype argument is required.
4. Free the derived data type when it is no longer needed.
Type Map & Type Signature
A general data type consists of:
a sequence of basic data types, and
a sequence of byte displacements.
Type map: the sequence of pairs (basic data type, displacement) for the general data type. E.g.
    double A[2]:  {(MPI_DOUBLE,0), (MPI_DOUBLE,8)}
    _tagStudent:  {(MPI_INT,0), (MPI_DOUBLE,8), (MPI_CHAR,16), …}
Type signature: the sequence of basic data types for the general data type. E.g.
    double A[2]:  {MPI_DOUBLE, MPI_DOUBLE}
    _tagStudent:  {MPI_INT, MPI_DOUBLE, MPI_CHAR, …}
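To make the type map concrete, here is a plain-C sketch (not MPI code) that builds the type map of struct _tagStudent and then drops the displacements to get its type signature. The enum basic_type, the struct tm_entry and both functions are hypothetical stand-ins; offsetof supplies the displacements the compiler actually chose.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical stand-ins for MPI_INT, MPI_DOUBLE, MPI_CHAR (not the MPI API). */
typedef enum { T_INT, T_DOUBLE, T_CHAR } basic_type;

/* One type-map pair: (basic data type, byte displacement). */
typedef struct { basic_type type; size_t disp; } tm_entry;

struct _tagStudent { int id; double grade; char note[100]; };

/* Writes the type map of struct _tagStudent into map (102 entries:
   1 int, 1 double, 100 chars) and returns the number of entries. */
size_t student_type_map(tm_entry *map)
{
    size_t n = 0, c;
    map[n++] = (tm_entry){ T_INT, offsetof(struct _tagStudent, id) };
    map[n++] = (tm_entry){ T_DOUBLE, offsetof(struct _tagStudent, grade) };
    for (c = 0; c < 100; c++)
        map[n++] = (tm_entry){ T_CHAR, offsetof(struct _tagStudent, note) + c };
    return n;
}

/* The type signature is the type map with the displacements dropped. */
void type_signature(const tm_entry *map, size_t n, basic_type *sig)
{
    size_t i;
    for (i = 0; i < n; i++)
        sig[i] = map[i].type;
}
```

On common 64-bit platforms the displacements come out as 0, 8 and 16 for id, grade and note[0], matching the type map above; the padding after id is the compiler's choice, which is why the struct example later computes displacements instead of hard-coding them.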
Communication Buffer
Given a type map {(type0,disp0), (type1,disp1)} and a base address buf, the communication buffer consists of 2 entries:
the 1st entry is at address buf+disp0, of type type0;
the 2nd entry is at address buf+disp1, of type type1.
E.g. for double A[2]: the 1st entry is at A, of type MPI_DOUBLE; the 2nd entry is 8 bytes after A (i.e. at &A[1]), of type MPI_DOUBLE.
If the type map contains n entries, the semantics are the same: entry i is at address buf+disp_i, of type type_i.
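The rule "entry i is at buf + disp_i" can be sketched in plain C as a gather over byte addresses. This illustrates the semantics only; it is not how any MPI library is implemented. tm_entry and gather_entries are hypothetical names, and each entry is described only by its size in bytes.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, ptrdiff_t */
#include <string.h>   /* memcpy */

/* Hypothetical type-map entry: a basic element of `size` bytes at byte offset `disp`. */
typedef struct { size_t size; ptrdiff_t disp; } tm_entry;

/* Gathers the communication buffer described by (buf, map): entry i is read
   from address buf + map[i].disp and appended to out. Returns bytes written. */
size_t gather_entries(const void *buf, const tm_entry *map, size_t n, void *out)
{
    const char *base = (const char *)buf;   /* byte-addressed base address */
    char *dst = (char *)out;
    size_t i, used = 0;
    for (i = 0; i < n; i++) {
        memcpy(dst + used, base + map[i].disp, map[i].size);
        used += map[i].size;
    }
    return used;
}
```

With the type map {(MPI_DOUBLE,0), (MPI_DOUBLE,8)} this reads A[0] and A[1]; changing the second displacement to 16 would skip a double, which is exactly the non-contiguous case derived data types are meant for.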
Type Constructor: MPI_Type_contiguous
    int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
    MPI_TYPE_CONTIGUOUS(COUNT, OLDTYPE, NEWTYPE, IERROR)
        INTEGER COUNT, OLDTYPE, NEWTYPE, IERROR
newtype is a concatenation of count copies of oldtype.
oldtype can be a basic data type or a derived data type.
[Figure: 3-D array A[100][80][50] with axes i, j, k; the surface A[0][:][:] is highlighted.]
    MPI_Datatype face_jk;
    MPI_Type_contiguous(80*50, MPI_DOUBLE, &face_jk);
    MPI_Type_commit(&face_jk);
    MPI_Send(&A[0][0][0], 1, face_jk, rank, tag, comm);
    // same data as: MPI_Send(&A[0][0][0], 80*50, MPI_DOUBLE, rank, tag, comm);
    MPI_Send(&A[99][0][0], 1, face_jk, rank, tag, comm);   // surface A[99][:][:]
    ...
    MPI_Type_free(&face_jk);
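Concatenation can also be written directly on type maps: copy k of oldtype is the old type map shifted by k times the extent of oldtype (for a basic type such as MPI_DOUBLE the extent is simply its size). A minimal plain-C sketch, with a hypothetical tm_entry struct and integer type codes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type-map entry: integer type code plus byte displacement. */
typedef struct { int type; long disp; } tm_entry;

/* Type map of MPI_Type_contiguous(count, oldtype): count copies of the old
   map, copy k shifted by k * old_extent bytes. Writes count * n_old entries
   to out and returns that number. */
size_t contiguous_map(int count, const tm_entry *old, size_t n_old,
                      long old_extent, tm_entry *out)
{
    size_t n = 0, i;
    int k;
    for (k = 0; k < count; k++)
        for (i = 0; i < n_old; i++) {
            out[n].type = old[i].type;
            out[n].disp = old[i].disp + (long)k * old_extent;
            n++;
        }
    return n;
}
```

With a single double at offset 0, an extent of 8 and count = 80*50, the result is doubles at offsets 0, 8, 16, and so on: the contiguous surface A[0][:][:].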
Type Constructor: MPI_Type_vector
    int MPI_Type_vector(int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype)
count: number of blocks
blocklength: number of elements in each block, in units of oldtype
stride: number of elements between the starts of consecutive blocks, in units of oldtype
oldtype: old data type; can be a basic or a derived data type
newtype: the newly created data type
The data consist of equally spaced blocks: same oldtype, same block length, and same spacing, measured in units of oldtype. Each block is a concatenation of blocklength copies of oldtype, and the starts of consecutive blocks are stride copies of oldtype apart.
[Figure: equally spaced blocks of blocklength elements, with block starts stride elements apart.]
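A plain-C sketch of which elements MPI_Type_vector selects, assuming oldtype is MPI_DOUBLE and the array is row-major. vector_pack is a hypothetical helper that copies the selected elements into a contiguous buffer, i.e. the copy that a derived data type lets MPI avoid.

```c
#include <assert.h>

/* Layout of MPI_Type_vector(count, blocklength, stride, MPI_DOUBLE):
   block b starts b*stride elements after src and holds blocklength
   consecutive doubles. Copies them in order into dst and returns the
   number of doubles copied. Illustrative only; not an MPI call. */
int vector_pack(const double *src, int count, int blocklength, int stride,
                double *dst)
{
    int b, e, n = 0;
    for (b = 0; b < count; b++)
        for (e = 0; e < blocklength; e++)
            dst[n++] = src[b * stride + e];
    return n;
}
```

On a row-major 4x4 array stored as double A[16], vector_pack(A + 1, 4, 1, 4, col) picks column 1 (elements 1, 5, 9, 13), the same data as MPI_Send(&A[0][1], 1, column, ...) in the following example.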
Example
[Figure: 4x4 matrix A[4][4] with a column highlighted.]
    double A[4][4];
    MPI_Datatype column;
    MPI_Type_vector(4, 1, 4, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);
    MPI_Send(&A[0][1], 1, column, rank, tag, comm);
    MPI_Send(&A[0][3], 1, column, rank, tag, comm);
    ...
[Figure: 3-D array A[100][80][50] with axes i, j, k; the surface A[:][0][:] is highlighted.]
    double A[100][80][50];
    MPI_Datatype face_ik;
    MPI_Type_vector(100, 50, 80*50, MPI_DOUBLE, &face_ik);
    MPI_Type_commit(&face_ik);
    MPI_Send(&A[0][0][0], 1, face_ik, rank, tag, comm);
    MPI_Send(&A[0][1][0], 1, face_ik, rank, tag, comm);
    MPI_Send(&A[0][79][0], 1, face_ik, rank, tag, comm);
    ...
Type Constructor: MPI_Type_hvector
    int MPI_Type_hvector(int count, int blocklength, MPI_Aint stride, MPI_Datatype oldtype, MPI_Datatype *newtype)
Same as MPI_Type_vector, except that stride is given in bytes, not in number of oldtype elements.
blocklength is still given in number of oldtype elements.
Same oldtype in every block; same block lengths; same spacing between neighboring blocks, but measured in bytes rather than in units of oldtype.
In MPI-2 and later the same constructor is called MPI_Type_create_hvector; the old name was removed in MPI-3.
[Figure: equally spaced blocks of blocklength elements, with block starts stride bytes apart.]
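The only change from MPI_Type_vector is the unit of the stride. This plain-C sketch (hypothetical hvector_pack; oldtype assumed to be a basic type of elem_size contiguous bytes) computes block starts in bytes through a char pointer:

```c
#include <assert.h>
#include <stddef.h>   /* size_t, ptrdiff_t */
#include <string.h>   /* memcpy */

/* Layout of MPI_Type_hvector(count, blocklength, stride_bytes, oldtype) for a
   basic oldtype of elem_size bytes: block b starts b * stride_bytes bytes after
   src and holds blocklength elements. Returns bytes copied. Not an MPI call. */
size_t hvector_pack(const void *src, int count, int blocklength,
                    ptrdiff_t stride_bytes, size_t elem_size, void *dst)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    size_t block_bytes = (size_t)blocklength * elem_size;
    int b;
    for (b = 0; b < count; b++)
        memcpy(d + (size_t)b * block_bytes, s + b * stride_bytes, block_bytes);
    return (size_t)count * block_bytes;
}
```

With stride_bytes = 4*sizeof(double) it selects the same column of a 4x4 double matrix as MPI_Type_vector with stride 4.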
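Example
[Figure: 4x4 matrix A[4][4] with a column highlighted.]
    double A[4][4];
    MPI_Datatype column;
    MPI_Type_hvector(4, 1, 4*sizeof(double), MPI_DOUBLE, &column);
    MPI_Type_commit(&column);
    MPI_Send(&A[0][1], 1, column, rank, tag, comm);
    ...
[Figure: 3-D array A[100][80][50] with axes i, j, k; the surface A[:][:][49] is highlighted.]
    double A[100][80][50];
    MPI_Datatype face_ij, line_j;
    MPI_Type_vector(80, 1, 50, MPI_DOUBLE, &line_j);                    // A[i][:][49] for one i
    MPI_Type_hvector(100, 1, 80*50*sizeof(double), line_j, &face_ij);   // 100 such lines
    MPI_Type_commit(&face_ij);
    MPI_Send(&A[0][0][49], 1, face_ij, rank, tag, comm);
    ...
MPI_Type_hvector is needed for the outer level: consecutive lines start 80*50 doubles apart, but the extent of line_j is only (79*50+1) doubles, so that spacing cannot be expressed as a whole number of line_j copies.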
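The extent argument can be checked on a small array. In this plain-C sketch the dimensions NI, NJ, NK = 3, 4, 5 are arbitrary stand-ins for 100, 80, 50, and pack_face_ij and line_j_extent are hypothetical helpers that walk the face_ij layout.

```c
#include <assert.h>

enum { NI = 3, NJ = 4, NK = 5 };   /* small stand-ins for 100, 80, 50 */

/* Extent of line_j = MPI_Type_vector(NJ, 1, NK, MPI_DOUBLE), in doubles:
   from its first element through its last, (NJ-1)*NK + 1. */
int line_j_extent(void) { return (NJ - 1) * NK + 1; }

/* Packs the face A[:][:][NK-1] the way face_ij describes it: NI copies of
   line_j whose starts are NJ*NK doubles apart (the hvector stride), anchored
   at &A[0][0][NK-1]. A is a row-major NI x NJ x NK array. */
int pack_face_ij(const double *A, double *dst)
{
    const double *base = A + (NK - 1);      /* &A[0][0][NK-1] */
    int i, j, n = 0;
    for (i = 0; i < NI; i++)                /* hvector level: stride NJ*NK doubles */
        for (j = 0; j < NJ; j++)            /* line_j level: stride NK doubles */
            dst[n++] = base[i * NJ * NK + j * NK];
    return n;
}
```

Here line_j spans 16 doubles while consecutive lines start 20 doubles apart; 20 is not a multiple of 16, just as 80*50 is not a multiple of 79*50+1.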
Type Constructor: MPI_Type_indexed
    int MPI_Type_indexed(int count, int *array_blocklen, int *array_disp, MPI_Datatype oldtype, MPI_Datatype *newtype)
count: number of blocks
array_blocklen: number of elements in each block, in units of oldtype; dimension count
array_disp: displacement of each block, in number of oldtype elements; dimension count
oldtype: old data type
newtype: new data type
The data consist of count blocks of oldtype: same oldtype, but the block lengths and the spacing between blocks may differ. Block i has length array_blocklen[i] and displacement array_disp[i], both in number of oldtype elements.
[Figure: blocks of blocklen[i] elements at displacements disp[i].]
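A plain-C sketch of the MPI_Type_indexed layout for oldtype MPI_DOUBLE (indexed_pack is a hypothetical helper, not an MPI call). Applied to the upper-triangle parameters of the next example, it returns the 10 upper-triangle entries in row order.

```c
#include <assert.h>

/* Layout of MPI_Type_indexed(count, blocklen, disp, MPI_DOUBLE): block i holds
   blocklen[i] consecutive doubles starting disp[i] elements after src.
   Copies the blocks in order into dst and returns the number copied. */
int indexed_pack(const double *src, int count, const int *blocklen,
                 const int *disp, double *dst)
{
    int i, e, n = 0;
    for (i = 0; i < count; i++)
        for (e = 0; e < blocklen[i]; e++)
            dst[n++] = src[disp[i] + e];
    return n;
}
```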
Example: upper triangle of matrix A[4][4]
    double A[4][4];
    MPI_Datatype upper_tri;
    int blocklen[4], disp[4];
    int i;
    for (i = 0; i < 4; i++) {
        blocklen[i] = 4 - i;       // row i: A[i][i] .. A[i][3]
        disp[i] = (4 + 1) * i;     // offset of A[i][i]
    }
    MPI_Type_indexed(4, blocklen, disp, MPI_DOUBLE, &upper_tri);
    MPI_Type_commit(&upper_tri);
    MPI_Send(&A[0][0], 1, upper_tri, rank, tag, comm);
    ...
    // Strict lower triangle
    MPI_Datatype lower_tri;
    for (i = 0; i < 3; i++) {
        blocklen[i] = i + 1;       // row i+1: A[i+1][0] .. A[i+1][i]
        disp[i] = (i + 1) * 4;     // offset of A[i+1][0]
    }
    MPI_Type_indexed(3, blocklen, disp, MPI_DOUBLE, &lower_tri);
    ...
With A holding
    1  2  3  4
    5  6  7  8
    9  10 11 12
    13 14 15 16
upper_tri selects 1 2 3 4 6 7 8 11 12 16, and lower_tri selects 5 9 10 13 14 15.
Type Constructor: MPI_Type_hindexed
    int MPI_Type_hindexed(int count, int *array_blocklen, MPI_Aint *array_disp, MPI_Datatype oldtype, MPI_Datatype *newtype)
Same as MPI_Type_indexed, except that array_disp is given in bytes instead of in number of oldtype elements.
Same oldtype; different block lengths; different spacing between blocks, with displacements in bytes instead of in number of oldtype elements.
In MPI-2 and later this constructor is called MPI_Type_create_hindexed; the old name was removed in MPI-3.
[Figure: blocks of blocklen[i] elements at byte displacements disp[i].]
Example: upper triangle of matrix A[4][4]
    double A[4][4];
    MPI_Datatype upper_tri;
    int blocklen[4];
    MPI_Aint disp[4];
    int i;
    for (i = 0; i < 4; i++) {
        blocklen[i] = 4 - i;
        disp[i] = (4 + 1) * i * sizeof(double);
    }
    MPI_Type_hindexed(4, blocklen, disp, MPI_DOUBLE, &upper_tri);
    MPI_Type_commit(&upper_tri);
    MPI_Send(A, 1, upper_tri, rank, tag, comm);
    ...
[Figure: A[4][4] holding 1 to 16 row by row, as in the MPI_Type_indexed example.]
Address Calculation
    int MPI_Address(void *location, MPI_Aint *address)
    MPI_ADDRESS(LOCATION, ADDRESS, IERROR)
        <type> LOCATION(*)
        INTEGER ADDRESS, IERROR
Returns the address of a memory location (or variable). The difference between two addresses gives the number of bytes between these two memory locations.
An address is different from a pointer in C/C++: subtracting two pointers does not give a byte count, because pointer arithmetic is scaled by the element size (pointer + or - an integer n moves n*sizeof(data-type) bytes).
In MPI-2 and later this routine is called MPI_Get_address; MPI_Address was removed in MPI-3.
    struct _tagStudent { int id; double grade; char note[100]; } A_Student;
    MPI_Aint addr1, addr2, disp;
    MPI_Address(&A_Student.id, &addr1);
    MPI_Address(&A_Student.grade, &addr2);
    disp = addr2 - addr1;   // bytes from id to grade
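In plain C the same byte difference can be obtained by converting both addresses to char *. This sketch (hypothetical helper names, no MPI) contrasts that with ordinary pointer subtraction, which counts elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>   /* ptrdiff_t, offsetof */

struct _tagStudent { int id; double grade; char note[100]; };

/* Byte distance from id to grade. &s->grade and &s->id have different types,
   so they cannot be subtracted directly; as char * they measure bytes. */
ptrdiff_t grade_minus_id_bytes(const struct _tagStudent *s)
{
    return (const char *)&s->grade - (const char *)&s->id;
}

/* Same two locations, measured in elements and in bytes. */
ptrdiff_t element_distance(const double *a, int k) { return (a + k) - a; }
ptrdiff_t byte_distance(const double *a, int k)
{
    return (const char *)(a + k) - (const char *)a;
}
```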
Type Constructor: MPI_Type_struct
    int MPI_Type_struct(int count, int *array_blocklen, MPI_Aint *array_disp, MPI_Datatype *array_types, MPI_Datatype *newtype)
count: number of blocks
array_blocklen: array, number of elements in each block, in units of that block's data type; dimension count
array_disp: array, displacement of each block, in bytes; dimension count
array_types: array, data type of each block; dimension count
newtype: new data type
Blocks may have different oldtypes, different block lengths, and different spacing, with displacements in bytes. Because each block may have a different data type, this is the most general constructor.
In MPI-2 and later this constructor is called MPI_Type_create_struct; the old name was removed in MPI-3.
[Figure: blocks of blocklen[i] elements of type type[i] at byte displacements disp[i].]
Example
    struct _tagStudent { int id; double grade; char note[100]; };
    struct _tagStudent Students[25];

    MPI_Datatype one_student, all_students;
    int block_len[3];
    MPI_Datatype types[3];
    MPI_Aint disp[3];
    block_len[0] = block_len[1] = 1;
    block_len[2] = 100;
    types[0] = MPI_INT;
    types[1] = MPI_DOUBLE;
    types[2] = MPI_CHAR;
    MPI_Address(&Students[0].id, &disp[0]);        // memory addresses
    MPI_Address(&Students[0].grade, &disp[1]);
    MPI_Address(&Students[0].note[0], &disp[2]);
    disp[1] = disp[1] - disp[0];                   // byte displacements relative to id
    disp[2] = disp[2] - disp[0];
    disp[0] = 0;
    MPI_Type_struct(3, block_len, disp, types, &one_student);
    // Relies on the extent of one_student equaling sizeof(struct _tagStudent),
    // which the alignment increment normally provides.
    MPI_Type_contiguous(25, one_student, &all_students);
    MPI_Type_commit(&all_students);
    MPI_Send(Students, 1, all_students, rank, tag, comm);
    // Alternatively:
    // MPI_Type_commit(&one_student);
    // MPI_Send(Students, 25, one_student, rank, tag, comm);
    ...
Type Extent
The extent is the "length" of a data type in bytes.
E.g. double (MPI_DOUBLE): the extent is sizeof(double), i.e. 8; int (MPI_INT): the extent is sizeof(int), typically 4.
The situation is more complex for derived data types. There are two cases.
Case 1: the derived data types encountered so far (no boundary markers MPI_UB or MPI_LB). The extent is the distance from the first byte to the last byte of the data type, plus an increment for memory alignment.
Memory alignment: a basic data type of length n is typically allocated only at an address that is a multiple of n.
Example: {(MPI_DOUBLE,0), (MPI_CHAR,8)}
    double: 8 bytes, bytes 0-7
    char: 1 byte, byte 8
    increment: 7 bytes, to round up to the next multiple of 8
    extent: 8 + 1 + 7 = 16
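The Case 1 rule can be written as a small function. In this plain-C sketch, elem and extent_case1 are hypothetical; each element carries its size and alignment, with alignment assumed equal to the size as stated above.

```c
#include <assert.h>

/* Hypothetical description of a basic element: byte displacement, size in
   bytes, and required alignment (assumed equal to its size). */
typedef struct { long disp; long size; long align; } elem;

/* Case-1 extent (no MPI_LB/MPI_UB): distance from the first byte to the
   last byte, rounded up to a multiple of the largest alignment present. */
long extent_case1(const elem *map, int n)
{
    long lb = map[0].disp, ub = map[0].disp + map[0].size, k = map[0].align;
    int i;
    for (i = 1; i < n; i++) {
        if (map[i].disp < lb) lb = map[i].disp;
        if (map[i].disp + map[i].size > ub) ub = map[i].disp + map[i].size;
        if (map[i].align > k) k = map[i].align;
    }
    return (ub - lb + k - 1) / k * k;   /* adds the alignment increment */
}
```

It gives 16 for {(MPI_DOUBLE,0), (MPI_CHAR,8)} and 104 for a column of a 4x4 double matrix (doubles at byte offsets 0, 32, 64, 96).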
Type Extent
Case 2: boundary marker(s) appear in the data type definition.
The predefined type MPI_LB marks the lower boundary of a data type; MPI_UB marks the upper boundary. MPI_LB and MPI_UB have length zero.
Both markers present: the extent is the distance between the boundary markers.
Only MPI_UB present: the extent is the distance between the first byte and MPI_UB.
Only MPI_LB present: the extent is the distance between MPI_LB and the last byte, plus an increment for memory alignment.
Examples:
    {(MPI_DOUBLE,0), (MPI_CHAR,8), (MPI_UB,8)}: extent is 8 instead of 16
    {(MPI_LB,-8), (MPI_DOUBLE,0), (MPI_CHAR,8)}: extent is 8 + 8 + 1 + 7 = 24
    {(MPI_LB,-8), (MPI_DOUBLE,0), (MPI_CHAR,8), (MPI_UB,9)}: extent is 9 + 8 = 17
MPI_LB and MPI_UB can be used to adjust the extent to suit one's needs. (MPI-2 introduced MPI_Type_create_resized for this purpose; MPI_LB and MPI_UB were removed in MPI-3.)
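Boundary markers can be added to the Case 1 rule following the MPI-1 rules summarized above: a lower-bound marker replaces the lowest data byte, and an upper-bound marker replaces the last data byte and suppresses the alignment increment. elem and extent_with_markers are hypothetical names; the sketch supports at most one marker of each kind.

```c
#include <assert.h>

/* Hypothetical basic element: byte displacement, size, alignment. */
typedef struct { long disp; long size; long align; } elem;

/* Extent with optional markers. has_lb/has_ub say whether an MPI_LB/MPI_UB
   marker is present; lb_mark/ub_mark give its displacement. The alignment
   increment is applied only when the upper bound comes from the data. */
long extent_with_markers(const elem *map, int n, int has_lb, long lb_mark,
                         int has_ub, long ub_mark)
{
    long lb = map[0].disp, ub = map[0].disp + map[0].size, k = map[0].align;
    int i;
    for (i = 1; i < n; i++) {
        if (map[i].disp < lb) lb = map[i].disp;
        if (map[i].disp + map[i].size > ub) ub = map[i].disp + map[i].size;
        if (map[i].align > k) k = map[i].align;
    }
    if (has_lb) lb = lb_mark;               /* MPI_LB overrides the first byte */
    if (has_ub) return ub_mark - lb;        /* MPI_UB: no alignment increment */
    return (ub - lb + k - 1) / k * k;       /* round up to a multiple of k */
}
```

For the three examples above it returns 8, 24 and 17.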
Example
    double A[4][4];
    MPI_Datatype column;
    MPI_Type_vector(4, 1, 4, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);
    // Extent of column is 13*sizeof(double) = 104 bytes

    // Now change the extent of column to sizeof(double) = 8 using MPI_UB:
    // create a new type, same as column but with extent 8,
    // with type map {(column,0), (MPI_UB,8)}
    MPI_Datatype modified_column;
    MPI_Datatype types[2];
    MPI_Aint disp[2];
    int block_len[2];
    types[0] = column;
    types[1] = MPI_UB;
    block_len[0] = block_len[1] = 1;
    disp[0] = 0;
    disp[1] = sizeof(double);
    MPI_Type_struct(2, block_len, disp, types, &modified_column);
    MPI_Type_commit(&modified_column);
    // modified_column selects the same elements as column, but its extent is sizeof(double) = 8
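Type Extent is Important
Concatenation of derived data types is based on their type extent. In both
    MPI_Send(buf, 2, A_type, ...);
and
    MPI_Type_contiguous(2, A_type, &B_type);
the second copy of A_type starts one extent after the first.
[Figure: B_type as two copies of A_type placed one extent apart; after the extent of A_type is modified with MPI_UB/MPI_LB, the two copies are placed according to the new extent.]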
Example
[Figure: buffer buf holding 1.0 2.0 3.0 4.0 5.0 6.0 …; A_type consists of the 1st and 3rd doubles, so its extent is 3 doubles.]
    MPI_Send(buf, 2, A_type, ...)
Actual data sent out: 4 numbers: 1.0, 3.0, 4.0, 6.0
[Figure: the same A_type with its extent reduced to 1 double.]
    MPI_Send(buf, 2, A_type, ...)
Actual data sent out: 4 numbers: 1.0, 3.0, 2.0, 4.0
For comparison:
    MPI_Send(buf, 4, MPI_DOUBLE, ...)
Actual data sent out: 4 numbers: 1.0, 2.0, 3.0, 4.0
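The three results can be reproduced with a plain-C emulation of how copies of A_type are laid out. A_type is assumed, from the figure, to select the doubles at element offsets 0 and 2; its natural extent is then 3 doubles, and the modified extent is 1 double. send_a_type is a hypothetical helper, not MPI.

```c
#include <assert.h>

/* Assumed A_type: doubles at element offsets 0 and 2. */
static const int a_type_disp[2] = {0, 2};

/* Emulates the data order of MPI_Send(buf, count, A_type, ...): copy k of
   A_type starts k * extent doubles after buf. Returns doubles gathered. */
int send_a_type(const double *buf, int count, int extent, double *out)
{
    int k, e, n = 0;
    for (k = 0; k < count; k++)
        for (e = 0; e < 2; e++)
            out[n++] = buf[k * extent + a_type_disp[e]];
    return n;
}
```

With extent 3 the output is 1.0, 3.0, 4.0, 6.0; with extent 1 it is 1.0, 3.0, 2.0, 4.0, matching the slide.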
Example
Data arrived: 4 numbers: 1.0, 2.0, 3.0, 4.0 (same A_type as in the send example).
    MPI_Recv(buf, 2, A_type, ...)       // extent 3 doubles
buf: 1.0, _, 2.0, 3.0, _, 4.0   (_ = not written)
    MPI_Recv(buf, 2, A_type, ...)       // extent reduced to 1 double
buf: 1.0, 3.0, 2.0, 4.0
    MPI_Recv(buf, 4, MPI_DOUBLE, ...)
buf: 1.0, 2.0, 3.0, 4.0
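The receive side is the mirror image: incoming doubles are scattered to the positions A_type describes, with copy k starting k extents after buf. A plain-C sketch using the same assumed A_type (element offsets 0 and 2); recv_a_type is hypothetical.

```c
#include <assert.h>

/* Assumed A_type: doubles at element offsets 0 and 2. */
static const int a_type_disp[2] = {0, 2};

/* Emulates where MPI_Recv(buf, count, A_type, ...) stores the incoming
   doubles: in order, at buf[k * extent + a_type_disp[e]]. Returns count stored. */
int recv_a_type(double *buf, int count, int extent, const double *in)
{
    int k, e, n = 0;
    for (k = 0; k < count; k++)
        for (e = 0; e < 2; e++)
            buf[k * extent + a_type_disp[e]] = in[n++];
    return n;
}
```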
Type Commit & Free
A derived data type must be committed before it is used in communication.
Once committed, it can be used in communication routines just like a predefined data type.
When it is no longer used, the derived data type should be freed.
    int MPI_Type_commit(MPI_Datatype *datatype)
    int MPI_Type_free(MPI_Datatype *datatype)
Type Matching
Type matching rules need to be generalized for derived data types.
New rule: the type signature of the data sent must match the type signature specified in the receive routine, i.e. the sequences of basic data types must match. The message sent may contain fewer basic elements than the receive specifies, but the elements it does contain must match the corresponding leading part of the receive's type signature.
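The matching rule can be sketched in plain C by treating each type signature as an array of basic-type codes. btype and signatures_match are hypothetical; in MPI itself, a message longer than the receive specification is reported as an error rather than silently rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical codes for basic types (stand-ins, not MPI handles). */
typedef enum { B_INT, B_DOUBLE, B_CHAR } btype;

/* A message with signature sent[0..n_sent) can be received with signature
   recv[0..n_recv) if it is no longer and every sent basic type equals the
   corresponding receive type (a prefix match). */
int signatures_match(const btype *sent, size_t n_sent,
                     const btype *recv, size_t n_recv)
{
    size_t i;
    if (n_sent > n_recv)
        return 0;               /* more data than the receive describes */
    for (i = 0; i < n_sent; i++)
        if (sent[i] != recv[i])
            return 0;           /* basic types differ */
    return 1;
}
```

For recv_type = MPI_Type_vector(4, 1, 2, MPI_DOUBLE) the signature is four doubles, so both the 4-double and the 2-double sends in the next example match.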
Example
Rank 0 sends A = 1.0 2.0 3.0 4.0 and C = 1.0 2.0. Rank 1 receives A into every other element of B (B[0], B[2], B[4], B[6]) and C into D[0] and D[2].
    double A[4], B[8];
    double C[2], D[8];
    int my_rank;
    MPI_Status stat;
    ...
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Datatype recv_type;
    if (my_rank == 1) {
        MPI_Type_vector(4, 1, 2, MPI_DOUBLE, &recv_type);
        MPI_Type_commit(&recv_type);
        MPI_Recv(B, 1, recv_type, 0, tag, MPI_COMM_WORLD, &stat);
        MPI_Recv(D, 1, recv_type, 0, tag, MPI_COMM_WORLD, &stat);   // only 2 doubles arrive
        MPI_Type_free(&recv_type);
    } else if (my_rank == 0) {
        MPI_Send(A, 4, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
        MPI_Send(C, 2, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
    }
The second receive is valid: the 2 doubles sent match the first 2 entries of the 4-double signature of recv_type.
Example
Rank 0 sends the diagonal of A (1.0, 2.0, 3.0, …) twice; rank 1 receives it once into the diagonal of B and once into the contiguous array C.
    double A[N][N], B[N][N], C[N];
    MPI_Datatype diag;
    ...
    MPI_Type_vector(N, 1, N+1, MPI_DOUBLE, &diag);   // A[0][0], A[1][1], ...
    MPI_Type_commit(&diag);
    if (my_rank == 0) {
        MPI_Send(&A[0][0], 1, diag, 1, tag, MPI_COMM_WORLD);
        MPI_Send(&A[0][0], 1, diag, 1, tag, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        MPI_Recv(&B[0][0], 1, diag, 0, tag, MPI_COMM_WORLD, &stat);
        MPI_Recv(&C[0], N, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &stat);
    }
    MPI_Type_free(&diag);
Both receives match the sends because each type signature is N doubles.
Example
    A (rank 0)         B (rank 1)
    1.0 2.0 3.0        1.0 4.0 7.0
    4.0 5.0 6.0        2.0 5.0 8.0
    7.0 8.0 9.0        3.0 6.0 9.0
Rank 0 sends A in row-major order; rank 1 receives it as B = A^T.
    double A[N][N], B[N][N];
    MPI_Datatype column, mat_transpose;
    ...
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    // N copies of column, each starting one double after the previous one
    MPI_Type_hvector(N, 1, sizeof(double), column, &mat_transpose);
    // Equivalent construction with MPI_UB:
    // MPI_Datatype column_modified, types[2];
    // int block_len[2];
    // MPI_Aint disp[2];
    // types[0] = column; types[1] = MPI_UB;
    // block_len[0] = block_len[1] = 1;
    // disp[0] = 0; disp[1] = sizeof(double);
    // MPI_Type_struct(2, block_len, disp, types, &column_modified);
    // MPI_Type_contiguous(N, column_modified, &mat_transpose);
    MPI_Type_commit(&mat_transpose);

    if (my_rank == 0) {
        MPI_Send(&A[0][0], N*N, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        MPI_Recv(&B[0][0], 1, mat_transpose, 0, tag, MPI_COMM_WORLD, &stat);
    }
    MPI_Type_free(&mat_transpose);
    MPI_Type_free(&column);
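To see why mat_transpose produces A^T, this plain-C sketch (hypothetical recv_transpose, no MPI) stores an incoming row-major stream the way the type map does: copy c of column starts c doubles after &B[0][0], and its r-th element is r*N doubles further on.

```c
#include <assert.h>

/* Emulates MPI_Recv(B, 1, mat_transpose, ...) with
   mat_transpose = MPI_Type_hvector(n, 1, sizeof(double), column) and
   column = MPI_Type_vector(n, 1, n, MPI_DOUBLE). B is row-major n x n. */
void recv_transpose(const double *stream, int n, double *B)
{
    int c, r, k = 0;
    for (c = 0; c < n; c++)          /* hvector: n columns, 1 double apart */
        for (r = 0; r < n; r++)      /* vector: n elements, stride n */
            B[c + r * n] = stream[k++];
}
```

Feeding it the 3x3 matrix A above, row by row, fills B with 1 4 7 / 2 5 8 / 3 6 9.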
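Matrix Transpose Revisited
A: N x N matrix, distributed on P CPUs with a row-wise decomposition.
B = A^T: also distributed on the P CPUs with a row-wise decomposition.
Split A into (N/P) x (N/P) blocks A_ij; then B_ij = (A_ji)^T.
Input: A[i][j] = 2*i + j.
For P = 3:
    A                  after local transpose       B (after all-to-all)
    A11 A12 A13        A11^T A12^T A13^T           A11^T A21^T A31^T
    A21 A22 A23        A21^T A22^T A23^T           A12^T A22^T A32^T
    A31 A32 A33        A31^T A32^T A33^T           A13^T A23^T A33^T
Each CPU first transposes its blocks locally; the all-to-all exchange then sends block A_ij^T from CPU i to position i on CPU j.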
Example: Matrix Transpose
N = 4, P = 2; each entry below is the element's position in its CPU's local memory.
    A (rows 0-1 on CPU 0, rows 2-3 on CPU 1)    B = A^T
    0 1 2 3                                      0 4 0 4
    4 5 6 7                                      1 5 1 5
    0 1 2 3                                      2 6 2 6
    4 5 6 7                                      3 7 3 7
On each CPU, A is an (N/P) x N matrix. It must first be rewritten as P blocks of (N/P) x (N/P) matrices before the local transpose can be done:
    A (2x4), as stored:    0 1 2 3 4 5 6 7
    two 2x2 blocks:        0 1 4 5 | 2 3 6 7
After the all-to-all communication, each CPU has P blocks of (N/P) x (N/P) matrices, which must be merged into an (N/P) x N matrix.
Four steps:
1. Divide A into blocks.
2. Transpose each block locally.
3. All-to-all communication.
4. Merge the blocks locally.
Transpose
[Figure: A is transformed into B by a single all-to-all.]
Send side: read the data of A column by column; be careful about the extent.
Receive side: receive the data into B block by block; be careful about the extent.
Create derived data types for send and receive; no additional local manipulations are needed.
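The send and receive types can be checked without MPI. This serial plain-C sketch emulates the MPI_Alltoall call of the program that follows for P simulated ranks: it gathers Nx columns per destination (type_send, extent 1 double) and scatters them as an Nx x Nx block (type_recv, extent Nx doubles). emulate_alltoall_transpose is a hypothetical helper, and the fixed 64-double message buffer is an assumption of the sketch.

```c
#include <assert.h>

/* Serial emulation (no MPI) of
     MPI_Alltoall(&A[0][0], Nx, type_send, &B[0][0], 1, type_recv, comm)
   for P simulated ranks. A[p] and B[p] are rank p's local Nx x Ny arrays,
   row-major, with Ny = P*Nx.
   type_send: one column (Nx doubles, stride Ny) with extent 1 double, so the
     Nx copies sent to rank r are local columns r*Nx .. r*Nx+Nx-1.
   type_recv: an Nx x Nx block (Nx rows of Nx doubles, stride Ny) with extent
     Nx doubles, so the block from rank s lands in columns s*Nx .. s*Nx+Nx-1. */
void emulate_alltoall_transpose(int P, int Nx, double **A, double **B)
{
    int Ny = P * Nx, s, r, c, i, j, k;
    double msg[64];                 /* one message; sketch assumes Nx*Nx <= 64 */
    assert(Nx * Nx <= 64);
    for (s = 0; s < P; s++)         /* sending rank */
        for (r = 0; r < P; r++) {   /* receiving rank */
            k = 0;
            for (c = 0; c < Nx; c++)        /* Nx copies of type_send ... */
                for (i = 0; i < Nx; i++)    /* ... each a column, stride Ny */
                    msg[k++] = A[s][i * Ny + r * Nx + c];
            k = 0;
            for (i = 0; i < Nx; i++)        /* type_recv: Nx rows ... */
                for (j = 0; j < Nx; j++)    /* ... of Nx contiguous doubles */
                    B[r][i * Ny + s * Nx + j] = msg[k++];
        }
}
```

With P = 2, Nx = 2 and A[i][j] = 2*i + j, every simulated rank ends up holding its rows of B = A^T, with no separate local transpose or merge step.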
Matrix Transposition
    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>
    #include "dmath.h"

    #define DIM 1000   // global A[DIM][DIM], B[DIM][DIM]

    int main(int argc, char **argv)
    {
        int ncpus, my_rank, i, j;
        int Nx, Ny;    // Nx = DIM/ncpus, Ny = DIM; local arrays A[Nx][Ny], B[Nx][Ny]
        double **A, **B;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &ncpus);

        if (DIM % ncpus != 0) {   // make sure DIM can be divided by ncpus
            if (my_rank == 0) printf("ERROR: DIM cannot be divided by ncpus!\n");
            MPI_Finalize();
            return -1;
        }
        Nx = DIM / ncpus;
        Ny = DIM;

        // DMath::newD comes from the course header dmath.h; this code assumes it
        // allocates the Nx*Ny doubles contiguously (needed for &A[0][0] and memset)
        A = DMath::newD(Nx, Ny);
        B = DMath::newD(Nx, Ny);
        for (i = 0; i < Nx; i++)
            for (j = 0; j < Ny; j++)
                A[i][j] = 2*(my_rank*Nx + i) + j;   // global row index is my_rank*Nx + i

        memset(&B[0][0], '\0', sizeof(double)*Nx*Ny);   // zero out B

        // Create derived data types
        MPI_Datatype type_send, type_recv;
        MPI_Datatype type_line1, type_block;
        MPI_Aint displ[2];
        MPI_Datatype types[2];
        int block_len[2];

        MPI_Type_vector(Nx, 1, Ny, MPI_DOUBLE, &type_line1);   // a column in A
        types[0] = type_line1;
        types[1] = MPI_UB;                 // modify the extent of the column to 1 double
        block_len[0] = block_len[1] = 1;
        displ[0] = 0;
        displ[1] = sizeof(double);
        MPI_Type_struct(2, block_len, displ, types, &type_send);   // modified column
        MPI_Type_commit(&type_send);       // now A is a concatenation of type_send

        MPI_Type_vector(Nx, Nx, Ny, MPI_DOUBLE, &type_block);  // Nx x Nx submatrix block
        types[0] = type_block;
        types[1] = MPI_UB;                 // modify the extent of type_block to Nx doubles
        block_len[0] = block_len[1] = 1;
        displ[0] = 0;
        displ[1] = Nx*sizeof(double);
        MPI_Type_struct(2, block_len, displ, types, &type_recv);   // modified block
        MPI_Type_commit(&type_recv);       // now B is a concatenation of type_recv

        // send/recv data: Nx columns to each rank, one block from each rank
        MPI_Alltoall(&A[0][0], Nx, type_send, &B[0][0], 1, type_recv, MPI_COMM_WORLD);

        // clean up
        MPI_Type_free(&type_line1);
        MPI_Type_free(&type_block);
        MPI_Type_free(&type_send);
        MPI_Type_free(&type_recv);
        DMath::del(A);
        DMath::del(B);
        MPI_Finalize();
        return 0;
    }