matrix-vector multiplication - puc-rionoemi/pcp-16/aula7/quinncap8.pdf · copyright © the mcgraw...

40
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Matrix-vector Multiplication Michael Quinn Parallel Programming with MPI and OpenMP Material do autor

Upload: others

Post on 23-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Matrix-vector Multiplication

Michael Quinn

Parallel Programming with MPI and OpenMP

Material do autor

Page 2: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

objetivos

• explorar operações MPI• mostrar formas diferentes de dividir um mesmo

programa sequencial

Page 3: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

2 1 0 4

3 2 1 1

4 3 1 2

3 0 2 0

1

3

4

1

Sequential Algorithm

22 1 50

4

1

3

594

1

92 1 0 4 1

3

4

1

33

1

92 3 131

4

141

1

14

1

3

4

1

3 2 1 1

44

1

133

3

171 4 192

1

19

1

3

4

1

4 3 1 2

3 3

1

3

30 112

4

110 1 11

1

3

4

13 0 2 0

Page 4: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

matrix decomposition

2 1 0 4

3 13 2 1 1

44 3 1 2

3 0 23 0 2 0

2 1 0 4

3 2 1 1

4 3 1 2

3 0 2 0

2 1 0 4

3 2 1 1

4 3 1 2

3 0 2 0

Page 5: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Storing Vectors

• Divide vector elements among processes

• Replicate vector elements

• Vector replication acceptable because vectors have only n elements, versus n2 elements in matrices

Page 6: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Rowwise Block Striped Matrix

• Partitioning through domain decomposition

• Primitive task associated with• Row of matrix• Entire vector

Page 7: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Phases of Parallel Algorithm - rowwise

Row i of Ab

Row i of Ab

ci

Inner product computation

Row i of Ab c

All-gather communication

nesse caso queremos que todostenham o resultado ao final!

Page 8: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Agglomeration and Mapping

• Static number of tasks

• Regular communication pattern (all-gather)

• Computation time per task is constant

• Strategy:• Agglomerate groups of rows• Create one task per MPI process

com matrizes esparsas poderíamos ter desbalanceamento!

Page 9: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Read matrix/distribute rows

Page 10: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Function read_replicated_vector

• Process p-1• Opens file• Reads vector length

• Broadcast vector length (root process = p-1)

• Allocate space for vector

• Process p-1 reads vector, closes file

• Broadcast vector

Page 11: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Block-to-replicated Transformation -allgather

Process 0

Process 1

Process 2

Process 0

Process 1

Process 2

all-gather

Page 12: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

MPI_Allgatherv

Allgatherv

After Before

0

1

2

3

Proc

esse

s

Page 13: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

MPI_Allgatherv

int MPI_Allgatherv (void *send_buffer,int send_cnt,MPI_Datatype send_type,void *receive_buffer,int *receive_cnt,int *receive_disp,MPI_Datatype receive_type,MPI_Comm communicator)

#itens q cada processocontribui (um inteiro)

#itens q espera de cada processo (um array)

índice onde colocar dados de cada processo

Page 14: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

MPI_Allgatherv in Action

c o n

c o n c a t e n a t e

send_buffer

receive_buffer send_ cnt = 3

3 4 4

receive_ cnt

0 3 7

receive_ disp

c a t e

c o n c a t e n a t e

send_buffer

receive_buffer

3 4 4

receive_ cnt

send_ cnt = 4

0 3 7

receive_ disp

n a t e

c o n c a t e n a t e

send_buffer

receive_buffer

3 4 4

receive_ cnt

send_ cnt = 4

0 3 7

receive_ disp

Process 2

Process 1

Process 0 Process 0

Process 1

Process 2

Page 15: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Columnwise Block Striped Matrix

• Partitioning through domain decomposition

• Task associated with• Column of matrix• Vector element

Page 16: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Proc 4

Processor 0’s initial computation

Processor 1’s initial computation

Proc 2

Proc 3

Matrix-Vector Multiplication

c0 = a0,0 b0 + a0,1 b1 + a0,2 b2 + a0,3 b3 + a4,4 b4c1 = a1,0 b0 + a1,1 b1 + a1,2 b2 + a1,3 b3 + a1,4 b4c2 = a2,0 b0 + a2,1 b1 + a2,2 b2 + a2,3 b3 + a2,4 b4c3 = a3,0 b0 + a3,1 b1 + a3,2 b2 + a3,3 b3 + b3,4 b4c4 = a4,0 b0 + a4,1 b1 + a4,2 b2 + a4,3 b3 + a4,4 b4

evita replicação do vetor B

Page 17: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

All-to-all Exchange (before)

P0 P1 P2 P3 P4

Page 18: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

All-to-all Exchange (after)

P0 P1 P2 P3 P4

Page 19: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Phases of Parallel Algorithm

Col

umn

iof A b

Col

umn

iof A b ~c

Multiplications

Col

umn

iof A b ~c

All-to-all exchange

Col

umn

iof A b c

Reduction

Page 20: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Agglomeration and Mapping

• Static number of tasks

• Regular communication pattern (all-to-all)

• Computation time per task is constant

• Strategy:• Agglomerate groups of columns• Create one task per MPI process

Page 21: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Reading a Block-Column Matrix

File

Page 22: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

MPI_Scatterv

Scatterv

0

1

2

3

Proc

esse

s

Before After

Page 23: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Header for MPI_Scatterv

int MPI_Scatterv (void *send_buffer,int *send_cnt,int *send_disp,MPI_Datatype send_type,void *receive_buffer,int receive_cnt,MPI_Datatype receive_type,int root,MPI_Comm communicator)

Page 24: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Checkerboard Block Decomposition

• Associate primitive task with each element of the matrix A

• Each primitive task performs one multiply

• Agglomerate primitive tasks into rectangular blocks

• Processes form a 2-D grid

• Vector b distributed by blocks among processes in first column of grid

Page 25: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Tasks after Agglomeration

b A

Page 26: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Algorithm’s Phases

b A

Redistribute b

Matrix-vector multiply

Reduce vectors across rows

c

Page 27: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Redistributing Vector b

• Step 1: Move b from processes in first row to processes in first column• If p square

» First column/first row processes send/receive portions of b• If p not square

» Gather b on process 0, 0» Process 0, 0 broadcasts to first row procs

• Step 2: First row processes scatter b within columns

Page 28: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Redistributing Vector b

(a)

Gather b Scatter b Broadcast blocks of b

Send/ Recv blocks of b

Broadcast blocks of b

(b)

when p is a square number

Page 29: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Creating Communicators

• Want processes in a virtual 2-D grid

• Create a custom communicator to do this

• Collective communications involve all processes in a communicator

• We need to do broadcasts, reductions among subsets of processes

• We will create communicators for processes in same row or same column

Page 30: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

What’s in a Communicator?

• Process group

• Context

• Attributes• Topology (lets us address processes another way)• Others we won’t consider

Page 31: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Creating 2-D Virtual Grid of Processes

• MPI_Dims_create• Input parameters

» Total number of processes in desired grid» Number of grid dimensions

• Returns number of processes in each dim

• MPI_Cart_create• Creates communicator with cartesian topology

Page 32: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

MPI_Dims_create

int MPI_Dims_create (int nodes,

/* Input - Procs in grid */

int dims,/* Input - Number of dims */

int *size)/* Input/Output - Size of

each grid dimension */

• cria um array de dimensões para uso posterior• size: na entrada pode indicar uma dimensão desejada

int p = …; int size(2) = {0.0};MPI_Dims_Create (p, 2, size);

Page 33: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

MPI_Cart_createint MPI_Cart_create (

MPI_Comm old_comm, /* Input - old communicator */

int dims, /* Input - grid dimensions */

int *size, /* Input - # procs in each dim */

int *periodic,/* Input - periodic[j] is 1 if dimension j

wraps around; 0 otherwise */

int reorder,/* 1 if process ranks can be reordered */

MPI_Comm *cart_comm)/* Output - new communicator */

Page 34: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Using MPI_Dims_create and MPI_Cart_create

MPI_Comm cart_comm;int p;int periodic[2];int size[2];...size[0] = size[1] = 0;MPI_Dims_create (p, 2, size);periodic[0] = periodic[1] = 0;MPI_Cart_create (MPI_COMM_WORLD, 2, size,

1, &cart_comm);

Page 35: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Useful Grid-related Functions

• MPI_Cart_rank• Given coordinates of process in Cartesian communicator,

returns process rank

• MPI_Cart_coords• Given rank of process in Cartesian communicator, returns

process’ coordinates

Page 36: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Header for MPI_Cart_rank

int MPI_Cart_rank (MPI_Comm comm,

/* In - Communicator */int *coords,

/* In - Array containing process’grid location */

int *rank)/* Out - Rank of process at

specified coords */

Page 37: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Header for MPI_Cart_coords

int MPI_Cart_coords (MPI_Comm comm,

/* In - Communicator */int rank,

/* In - Rank of process */int dims,

/* In - Dimensions in virtual grid */int *coords)

/* Out - Coordinates of specifiedprocess in virtual grid */

Page 38: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

MPI_Comm_split

• Partitions the processes of a communicator into one or more subgroups

• Constructs a communicator for each subgroup

• Allows processes in each subgroup to perform their own collective communications

• Needed for columnwise scatter and rowwise reduce

Page 39: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Header for MPI_Comm_split

int MPI_Comm_split (MPI_Comm old_comm,

/* In - Existing communicator */

int partition, /* In - Partition number */

int new_rank,/* In - Ranking order of processes

in new communicator */

MPI_Comm *new_comm)/* Out - New communicator shared by

processes in same partition */

Page 40: Matrix-vector Multiplication - PUC-Rionoemi/pcp-16/aula7/quinncap8.pdf · Copyright © The McGraw -Hill Companies, Inc. Permission required for reproduction or display. Title: quinncap8.ppt

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Example: Create Communicators for Process Rows

MPI_Comm grid_comm; /* 2-D process grid */

MPI_Comm grid_coords[2];/* Location of process in grid */

MPI_Comm row_comm;/* Processes in same row */

MPI_Comm_split (grid_comm, grid_coords[0],grid_coords[1], &row_comm);