
Page 1: Distributed Shared Memory

Distributed Shared Memory (DSM) systems build a shared memory abstraction on top of distributed memory machines

Users see a virtual global address space; the message passing underneath is handled by the DSM system transparently

Shared memory programming techniques can then be used

Software for implementing DSM: http://www.ics.uci.edu/~javid/dsm/page.html

Page 2: Three types of DSM implementations

Page-based technique

The virtual global address space is divided into equal-sized chunks (pages) which are spread over the machines

A page is the minimal sharing unit

A request by a process to access a non-local piece of memory results in a page fault (a sketch of this mechanism follows the list)

a trap occurs and the DSM software fetches the required page of memory and restarts the instruction

a decision has to be made whether to replicate pages or maintain only one copy of any page and move it around the network

The granularity of the pages has to be decided before implementation
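
This mechanism can be pictured with a minimal user-level sketch in C (illustrative only; it assumes a POSIX system, ignores write invalidation and ownership protocols, and fetch_page_from_owner() is a hypothetical helper, not a real library call):

/* Minimal sketch of a page-based DSM trap handler (illustrative only).
 * Remote pages are initially mapped with PROT_NONE; touching one raises SIGSEGV,
 * the handler fetches the page over the network, and returning from the handler
 * restarts the faulting instruction. */
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

void fetch_page_from_owner(void *page_addr, size_t len);   /* hypothetical RPC */

static size_t page_size;

static void dsm_fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    /* Round the faulting address down to its page boundary. */
    void *page = (void *)((uintptr_t)info->si_addr & ~(uintptr_t)(page_size - 1));

    mprotect(page, page_size, PROT_READ | PROT_WRITE);   /* make the page accessible */
    fetch_page_from_owner(page, page_size);              /* pull its current contents */
    /* Returning from the handler restarts the instruction that faulted. */
}

void dsm_init(void)
{
    struct sigaction sa;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = dsm_fault_handler;
    sigaction(SIGSEGV, &sa, NULL);
}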

Page 3: Three types of DSM implementations

Shared-variable based technique

Only the variables and data structures required by more than one process are shared

A variable is the minimal sharing unit

Trade-off between consistency and network traffic

Page 4: Three types of DSM implementations

Object-based technique

Memory can be conceptualized as an abstract space filled with objects (including data and methods)

An object is the minimal sharing unit

Trade-off between consistency and network traffic

Page 5: OpenMP

OpenMP stands for Open specification for Multi-Processing

It is used to assist compilers in understanding and parallelising serial code

Can be used to specify shared memory parallelism in Fortran, C and C++ programs

OpenMP is a specification for

a set of compiler directives,

run-time library routines, and

environment variables

Started in the mid-to-late 1980s with the emergence of shared memory parallel computers and proprietary directive-driven programming environments

OpenMP is an industry standard

Page 6: OpenMP

OpenMP specifications include:

OpenMP 1.0 for Fortran, 1997

OpenMP 1.0 for C/C++, 1998

OpenMP 2.0 for Fortran, 2000

OpenMP 2.0 for C/C++, 2002

OpenMP 2.5 for C/C++ and Fortran, 2005

OpenMP Architecture Review Board: Compaq, HP, IBM, Intel, SGI, SUN

Page 7: OpenMP programming model

Shared Memory, thread-based parallelism

Explicit parallelism

Fork-join model

Page 8: OpenMP code structure in C

#include <omp.h>

int main() {

    int var1, var2, var3;

    /* Serial code */

    /* Beginning of parallel section. Fork a team of threads. Specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads */

        /* All threads join master thread and disband */
    }

    /* Resume serial code */
}
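
Filling in this skeleton, a minimal compilable sketch (illustrative, not from the slides; compile with an OpenMP-capable compiler, e.g. gcc -fopenmp hello_omp.c) might look like:

/* Illustrative sketch of the structure above. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int nthreads = 0;                        /* shared by default */

    #pragma omp parallel shared(nthreads)
    {
        int id = omp_get_thread_num();       /* private to each thread */
        #pragma omp single
        nthreads = omp_get_num_threads();    /* one thread records the team size */
        printf("Hello from thread %d\n", id);
    }   /* all threads join the master thread here */

    printf("Ran with %d threads\n", nthreads);
    return 0;
}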

Page 9: OpenMP code structure in Fortran

PROGRAM HELLO

INTEGER VAR1, VAR2, VAR3

! Serial code
. . .

! Beginning of parallel section. Fork a team of threads. Specify variable scoping
!$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)

! Parallel section executed by all threads
. . .

! All threads join master thread and disband
!$OMP END PARALLEL

! Resume serial code
. . .

END

Page 10: OpenMP Directives Format

C/C++

Fortran
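
In outline, the directive formats for the two language families above are as follows (standard OpenMP syntax; the concrete examples are illustrative additions):

C/C++:

#pragma omp directive-name [clause [[,] clause] ...]
e.g. #pragma omp parallel private(var1, var2) shared(var3)

Fortran:

sentinel directive-name [clause ...]
where the sentinel is !$OMP in free-form source (!$OMP, C$OMP or *$OMP in fixed form); block directives are closed with a matching END directive
e.g. !$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3) followed by !$OMP END PARALLEL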

Page 11: OpenMP features

OpenMP directives are ignored by compilers that don’t support OpenMP, so codes can also be run on sequential machines

Compiler directives used to specify

sections of code that can be executed in parallel

critical sections

Scope of variables (private or shared)

Mainly used to parallelize loops, e.g. separate threads to handle separate iterations of the loop

There is also a run-time library that has several useful routines for checking the number of threads and number of processors, changing the number of threads, etc

Page 12: Fork-Join Model

Multiple threads are created using the parallel construct

For C and C++:

#pragma omp parallel
{
    ... do stuff
}

For Fortran:

!$OMP PARALLEL
... do stuff
!$OMP END PARALLEL

Page 13: How many threads are generated

The number of threads in a parallel region is determined by the following factors, in order of precedence:

Use of the omp_set_num_threads() library function

Setting of the OMP_NUM_THREADS environment variable

Implementation default - the number of CPUs on a node

Threads are numbered from 0 (master thread) to N-1
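
For instance (an illustrative fragment), a call to the library routine takes precedence over the OMP_NUM_THREADS setting:

/* Illustrative: request 3 threads from inside the program. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_num_threads(3);              /* overrides OMP_NUM_THREADS for later regions */
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)   /* thread 0 is the master thread */
            printf("team size: %d\n", omp_get_num_threads());
    }
    return 0;
}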

Page 14: Parallelizing loops in OpenMP – Work Sharing construct

Compiler directive specifies that a loop can be done in parallel

For C and C++:

#pragma omp parallel for
for (i = 0; i < N; i++) {
    value[i] = compute(i);
}

For Fortran:

!$OMP PARALLEL DO
DO i = 1, N
  value(i) = compute(i)
END DO
!$OMP END PARALLEL DO

Can use thread scheduling to specify the partitioning and allocation of iterations to threads

#pragma omp parallel for schedule(static, 4)

schedule(static [,chunk])

Deal out blocks of iterations of size chunk to each thread

schedule(dynamic [,chunk])

Each thread grabs a chunk of iterations off a queue until all are done

schedule(runtime)

The schedule is read from the OMP_SCHEDULE environment variable
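
Putting these pieces together, a small self-contained sketch (illustrative; compute() and the chunk size of 4 are arbitrary choices, not from the slides) could be:

/* Illustrative sketch: parallel loop with a static schedule, chunk size 4. */
#include <omp.h>
#include <stdio.h>

#define N 16

static double compute(int i) { return i * 0.5; }   /* stand-in for per-iteration work */

int main(void) {
    double value[N];
    int i;

    #pragma omp parallel for schedule(static, 4)
    for (i = 0; i < N; i++) {
        value[i] = compute(i);
        printf("iteration %d done by thread %d\n", i, omp_get_thread_num());
    }

    printf("value[%d] = %f\n", N - 1, value[N - 1]);
    return 0;
}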

Page 15: Synchronisation in OpenMP

Critical construct

Barrier construct

Page 16: Example of Critical Section in OpenMP

#include <omp.h>

int main() {

    int x;
    x = 0;

    #pragma omp parallel shared(x)
    {
        #pragma omp critical
        x = x + 1;      /* only one thread at a time executes the update */

    } /* end of parallel section */

}

Page 17: Example of Barrier in OpenMP

#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int th_id, nthreads;

    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", th_id);

        #pragma omp barrier

        if ( th_id == 0 )
        {
            nthreads = omp_get_num_threads();
            printf("There are %d threads\n", nthreads);
        }
    }
    return 0;
}

Page 18: Data Scope Attributes in OpenMP

OpenMP Data Scope Attribute Clauses are used to explicitly define how variables should be scoped

These clauses are used in conjunction with several directives (e.g. PARALLEL, DO/for) to control the scoping of enclosed variables

Three often encountered clauses:

Shared

Private

Reduction

Page 19: Shared and private data in OpenMP

private(var) creates a local copy of var for each thread

shared(var) states that var is a global variable to be shared among threads

Default data storage attribute is shared

!$OMP PARALLEL DO
!$OMP& PRIVATE(xx,yy) SHARED(u,f)
      DO j = 1,m
        DO i = 1,n
          xx = -1.0 + dx * (i-1)
          yy = -1.0 + dy * (j-1)
          u(i,j) = 0.0
          f(i,j) = -alpha * (1.0-xx*xx) * (1.0-yy*yy)
        END DO
      END DO
!$OMP END PARALLEL DO

Page 20: Reduction Clause

Reduction clause:

reduction (op : var)

e.g. op can be + (add) or logical OR. A local copy of the variable is made for each thread. The reduction operation is done by each thread, then the local values are combined to create the global value

double ZZ, res = 0.0;

#pragma omp parallel for reduction(+:res) private(ZZ)
for (i = 1; i <= N; i++) {
    ZZ = i;
    res = res + ZZ;
}

Page 21: Run-Time Library Routines

Can perform a variety of functions, including

Query the number of threads/thread no.

Set number of threads

Page 22: Run-Time Library Routines

Query routines allow you to get the number of threads and the ID of the calling thread

id = omp_get_thread_num(); //thread no.

Nthreads = omp_get_num_threads(); //number of threads

Can specify number of threads at runtime

omp_set_num_threads(Nthreads);

Page 23: Environment Variables

Controlling the execution of parallel code

Four environment variables

OMP_SCHEDULE: how iterations of a loop are scheduled

OMP_NUM_THREADS: maximum number of threads

OMP_DYNAMIC: enable or disable dynamic adjustment of the number of threads

OMP_NESTED: enable or disable nested parallelism
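
For example (an illustrative shell session before launching the program; the values shown are arbitrary):

export OMP_NUM_THREADS=4
export OMP_SCHEDULE="dynamic,2"     # consulted by loops declared with schedule(runtime)
export OMP_DYNAMIC=FALSE
export OMP_NESTED=FALSE
./myprog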

Page 24: OpenMP compilers

Since parallelism is mostly achieved by parallelising loops using shared memory, OpenMP compilers work well for multiprocessor SMPs and vector machines

OpenMP could work for distributed memory machines, but would need to use a good distributed shared memory (DSM) implementation

For more information on OpenMP, see

www.openmp.org

Page 25: High Performance Computing Course Notes 2007-2008

Message Passing Programming I

Page 26: Message Passing Programming

Message Passing is the most widely used parallel programming model

Message passing works by creating a number of uniquely named tasks that interact by sending and receiving messages to and from one another (hence the name)

Generally, processes communicate by sending data from the address space of one process to that of another

Communication between processes (via files, pipes, sockets)

Communication between threads within a process (via a global data area)

Programs based on message passing can be based on standard sequential language programs (C/C++, Fortran), augmented with calls to library functions for sending and receiving messages

Page 27: Message Passing Interface (MPI)

MPI is a specification, not a particular implementation

Does not specify process startup, error codes, amount of system buffer, etc

MPI is a library, not a language

The goals of MPI: functionality, portability and efficiency

Message passing model > MPI specification > MPI implementation

Page 28: OpenMP vs MPI

In a nutshell

MPI is used on distributed-memory systems

OpenMP is used for code parallelisation on shared-memory systems

Both provide explicit parallelism

High-level control (OpenMP), lower-level control (MPI)

Page 29: A little history

Message-passing libraries developed for a number of early distributed memory computers

By 1993 there were many vendor-specific implementations

By 1994 MPI-1 came into being

By 1996 MPI-2 was finalized

Page 30: The MPI programming model

MPI standards -

MPI-1 (1.1, 1.2), MPI-2 (2.0)

Forwards compatibility preserved between versions

Standard bindings - for C, C++ and Fortran. There are also MPI bindings for Python, Java etc. (all non-standard)

We will stick to the C binding for the lectures and coursework. More info on MPI: www.mpi-forum.org

Implementations - for your laptop, pick up MPICH, a free portable implementation of MPI (http://www-unix.mcs.anl.gov/mpi/mpich/index.htm)

Coursework will use MPICH

Page 31: MPI

MPI is a complex system comprising 129 functions with numerous parameters and variants

Six of them are indispensable; with these you can already write a large number of useful programs

Other functions add flexibility (datatypes), robustness (non-blocking send/receive), efficiency (ready-mode communication), modularity (communicators, groups) or convenience (collective operations, topology)

In the lectures, we are going to cover the most commonly encountered functions

Page 32: The MPI programming model

A computation comprises one or more processes that communicate by calling library routines to send and receive messages to and from other processes

(Generally) a fixed set of processes created at outset, one process per processor

Different from PVM

Page 33: Intuitive Interfaces for sending and receiving messages

Send(data, destination), Receive(data, source)

minimal interface

This is not enough in some situations; we also need

Message matching – add a message_id at both the send and receive interfaces

they become Send(data, destination, msg_id), Receive(data, source, msg_id)

Message_id is expressed using an integer, termed the message tag

It allows the programmer to deal with the arrival of messages in an orderly fashion (e.g. queue messages and then deal with them in the desired order)

Page 34: How to express the data in the send/receive interfaces

Early stages: (address, length) for the send interface

(address, max_length) for the receive interface

These are not always adequate:

The data to be sent may not be in contiguous memory locations

The storage format of the data may not be the same, or known in advance, on a heterogeneous platform

Eventually, a triple (address, count, datatype) came to be used to express the data to be sent, and (address, max_count, datatype) for the data to be received

This reflects the fact that a message has much more structure than just a string of bits, for example (vector_A, 300, MPI_REAL)

Programmers can construct their own datatypes (an illustrative sketch follows)

Now, the interfaces become send(address, count, datatype, destination, msg_id) and receive(address, max_count, datatype, source, msg_id)
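
As an illustration of constructing a datatype (a sketch using standard MPI calls; send_triples() and its arguments are made up for the example):

/* Illustrative sketch: build a programmer-defined datatype describing a triple of
   ints and use it in the (address, count, datatype) part of the send interface. */
#include "mpi.h"

void send_triples(int *buf, int ntriples, int dest, int tag, MPI_Comm comm)
{
    MPI_Datatype triple;                        /* programmer-defined datatype   */
    MPI_Type_contiguous(3, MPI_INT, &triple);   /* 3 consecutive MPI_INT values  */
    MPI_Type_commit(&triple);

    /* Send ntriples elements of type "triple", i.e. 3*ntriples ints in total. */
    MPI_Send(buf, ntriples, triple, dest, tag, comm);

    MPI_Type_free(&triple);
}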

Page 35: How to distinguish messages

Message tag is necessary, but not sufficient

So, communicator is introduced …

Page 36: Communicators

Messages are put into contexts

Contexts are allocated at run time by the system in response to programmer requests

The system can guarantee that each generated context is unique

The processes belong to groups

The notions of context and group are combined in a single object, which is called a communicator

A communicator identifies a group of processes and a communication context

The MPI library defines an initial communicator, MPI_COMM_WORLD, which contains all the processes running in the system

The messages from different process groups can have the same tag

So the send interface becomes send(address, count, datatype, destination, tag, comm)

Page 37: Status of the received messages

The structure of the message status is added to the receive interface

Status holds the information about source, tag and actual message size

In the C language, source can be retrieved by accessing status.MPI_SOURCE,

tag can be retrieved by status.MPI_TAG and

actual message size can be retrieved by calling the function MPI_Get_count(&status, datatype, &count)

The receive interface becomes receive(address, maxcount, datatype, source, tag, communicator, status)
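
For example (an illustrative fragment using the standard C interface; recv_and_report() is made up for the example):

/* Illustrative sketch: receive up to 100 ints from any source and any tag,
   then inspect the status for the actual source, tag and message size. */
#include "mpi.h"
#include <stdio.h>

void recv_and_report(MPI_Comm comm)
{
    int buf[100], count;
    MPI_Status status;

    MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &status);

    MPI_Get_count(&status, MPI_INT, &count);    /* actual message size */
    printf("source = %d, tag = %d, received %d ints\n",
           status.MPI_SOURCE, status.MPI_TAG, count);
}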

Page 38: How to express source and destination

The processes in a communicator (group) are identified by ranks

If a communicator contains n processes, process ranks are integers from 0 to n-1

Source and destination processes in the send/receive interface are the ranks

Page 39: Some other issues

In the receive interface, the tag can be a wildcard, which means a message with any tag will be received

In the receive interface, the source can also be a wildcard, which matches any source

Page 40: MPI basics

First six functions (C bindings)

MPI_Send (buf, count, datatype, dest, tag, comm)

Send a message

buf: address of send buffer
count: no. of elements to send (>= 0)
datatype: datatype of elements
dest: process id of destination
tag: message tag
comm: communicator (handle)


Page 43: MPI basics

First six functions (C bindings)

MPI_Send (buf, count, datatype, dest, tag, comm)

Calculating the size of the data to be sent …

buf address of send buffer

count * sizeof (datatype) bytes of data


Page 46: MPI basics

First six functions (C bindings)

MPI_Recv (buf, count, datatype, source, tag, comm, status)

Receive a message

buf address of receive buffer (var param)

count max no. of elements in receive buffer (>=0)

datatype of receive buffer elements

source process id of source process, or MPI_ANY_SOURCE

tag message tag, or MPI_ANY_TAG

comm communicator

status status object

Page 47: MPI basics

First six functions (C bindings)

MPI_Init (int *argc, char ***argv)

Initiate a computation

argc (number of arguments) and argv (argument vector) are main program’s arguments

Must be called first, and once per process

MPI_Finalize ( )

Shut down a computation

The last thing that happens

Page 48: MPI basics

First six functions (C bindings)

MPI_Comm_size (MPI_Comm comm, int *size)

Determine number of processes in comm

comm is communicator handle, MPI_COMM_WORLD is the default (including all MPI processes)

size holds number of processes in group

MPI_Comm_rank (MPI_Comm comm, int *pid)

Determine id of current (or calling) process

pid holds id of current process

Page 49: MPI basics – a basic example

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello, world.  I am %d of %d\n", rank, nprocs);
    MPI_Finalize();
}

mpirun -np 4 myprog

Hello, world. I am 1 of 4

Hello, world. I am 3 of 4

Hello, world. I am 0 of 4

Hello, world. I am 2 of 4

Page 50: MPI basics – send and recv example (1)

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int buffer[10];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size < 2)
    {
        printf("Please run with two processes.\n");
        MPI_Finalize();
        return 0;
    }

    if (rank == 0)
    {
        for (i = 0; i < 10; i++)
            buffer[i] = i;
        MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
    }

Page 51: MPI basics – send and recv example (2)

    if (rank == 1)
    {
        for (i = 0; i < 10; i++)
            buffer[i] = -1;
        MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
        for (i = 0; i < 10; i++)
        {
            if (buffer[i] != i)
                printf("Error: buffer[%d] = %d but is expected to be %d\n", i, buffer[i], i);
        }
    }

    MPI_Finalize();
}

Page 52: MPI language bindings

Standard (accepted) bindings for Fortran, C and C++

Java bindings are work in progress

JavaMPI: Java wrapper to native calls

mpiJava: JNI wrappers

jmpi: pure Java implementation of the MPI library

MPIJ: same idea

Java Grande Forum trying to sort it all out

We will use the C bindings