MPI and OpenMP By: Jesus Caban and Matt McKnight

Post on 11-Jan-2016






  • MPI and OpenMP By: Jesus Caban and Matt McKnight

  • What is MPI?
    MPI: Message Passing Interface.
    It is not a new programming language; it is a library of functions that can be called from C, Fortran, or Python.
    Successor to PVM (Parallel Virtual Machine).
    Developed by an open, international forum with representation from industry, academia, and government laboratories.

  • What is it for?
    Allows data to be passed between processes in a distributed-memory environment.
    Provides source-code portability.
    Allows efficient implementation.
    Offers a great deal of functionality.
    Supports heterogeneous parallel architectures.

  • MPI Communicator
    Idea: a group of processes that are allowed to communicate with each other.
    Most often used communicator: MPI_COMM_WORLD
    Note the MPI naming format:
    MPI_XXX (constants)
    var = MPI_Xxx(parameters);
    MPI_Xxx(parameters);

  • Getting Started
    Include the MPI header file
    Initialize the MPI environment
    Work: make message-passing calls (Send, Receive)
    Terminate the MPI environment

  • Include File
    Include the MPI header file:

    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char** argv){ }

  • Initialize MPI
    Initialize the MPI environment:

    int main(int argc, char** argv){
        int numtasks, rank;

        MPI_Init(&argc, &argv);                     /* pass the addresses of argc and argv */
        MPI_Comm_size(MPI_COMM_WORLD, &numtasks);   /* number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* rank of this process */
        ...
    }

  • Initialize MPI (cont.)
    MPI_Init(&argc, &argv)
    No MPI functions should be called before this call (MPI_Initialized is one of the few exceptions).

    MPI_Comm_size(MPI_COMM_WORLD, &nump)
    A communicator is a collection of processes that can send messages to each other. MPI_COMM_WORLD is a predefined communicator that consists of all the processes running when program execution begins. This call stores the number of processes in nump.

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank)
    Lets a process find out its own rank within the communicator.

  • Terminate MPI environment

    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char** argv){
        ...
        MPI_Finalize();
    }

    No MPI functions may be called after this call.

  • Let's work with MPI
    Work: make message-passing calls (Send, Receive)

    if(my_rank != 0){
        MPI_Send(data, strlen(data)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }else{
        MPI_Recv(data, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
    }

  • Work (cont.)

    int MPI_Send(void* message, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)

    int MPI_Recv(void* message, int count, MPI_Datatype datatype,
                 int source, int tag, MPI_Comm comm,
                 MPI_Status* status)

  • Hello World!!

    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        int my_rank, p, source, dest, tag = 0;
        char message[100];
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (my_rank != 0) {
            /* Create message and send it to process 0 */
            sprintf(message, "Hello from process %d!", my_rank);
            dest = 0;
            MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        } else {
            /* Process 0 receives and prints one message from every other process */
            for (source = 1; source < p; source++) {
                MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
                printf("%s\n", message);
            }
        }
        MPI_Finalize();
        return 0;
    }

  • Compile and Run MPI
    Compile:

    gcc -o hello.exe mpi_hello.c -lmpi
    mpicc -o hello.exe mpi_hello.c

    Run:

    mpirun -np 5 hello.exe

    $ mpirun -np 5 hello.exe
    Hello from process 1!
    Hello from process 2!
    Hello from process 3!
    Hello from process 4!

  • More MPI Functions

    MPI_Bcast(void* m, int s, MPI_Datatype dt, int root, MPI_Comm comm)
    Sends a copy of the data in m on the process with rank root to each process in the communicator.

    MPI_Reduce(void* operand, void* result, int count, MPI_Datatype datatype, MPI_Op operator, int root, MPI_Comm comm)
    Combines the operands stored in the memory referenced by operand using operation operator and stores the result in result on process root.

    double MPI_Wtime(void)
    Returns a double-precision value that represents the number of seconds that have elapsed since some point in the past.

    MPI_Barrier(MPI_Comm comm)
    Each process in comm blocks until every process in comm has called it.

  • More Examples
    Trapezoidal Rule: integral from a to b of a nonnegative function f(x).
    Approach: estimate the area by partitioning the region into regular geometric shapes and then adding the areas of the shapes.
    Compute Pi

  • Compute PI

    #include <stdio.h>
    #include "mpi.h"
    #define PI 3.141592653589793238462643
    #define PI_STR "3.141592653589793238462643"
    #define MAXLEN 40
    #define f(x) (4./(1.+ (x)*(x)))

    int main(int argc, char *argv[]){
        int N=0,rank,nprocrs,i,answer=1;
        double mypi,pi,h,sum,x,starttime,endtime,runtime,runtime_max;
        char buff[MAXLEN];

        MPI_Init(&argc,&argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        printf("CPU %d saying hello",rank);
        MPI_Comm_size(MPI_COMM_WORLD,&nprocrs);

        if(rank==0)printf("Using a total of %d CPUs",nprocrs);

  • Compute PI

        while(answer){
            if(rank==0){
                printf("This program computes pi as 4.*Integral{0->1}[1/(1+x^2)]");
                printf("(Using PI = %s)",PI_STR);
                printf("Input the Number of intervals: N =");
                fgets(buff,MAXLEN,stdin);
                sscanf(buff,"%d",&N);
                printf("pi will be computed with %d intervals on %d processors.", N, nprocrs);
            }

            /*Procr 0 = P(0) gives N to all other processors*/
            MPI_Bcast(&N,1,MPI_INT,0,MPI_COMM_WORLD);
            if(N

  • Compute PI

            starttime=MPI_Wtime();
            sum=0.0;
            h=1./N;
            for(i=1+rank;i

  • Compute PI

                printf("computed in = %f secs",runtime_max);
                fflush(stdout);
                printf("Do you wish to try another run? (y=1;n=0)");
                fgets(buff,MAXLEN,stdin);
                sscanf(buff,"%d",&answer);
            }

            /*processors wait while P(0) gets new input from user*/
            MPI_Barrier(MPI_COMM_WORLD);
            MPI_Bcast(&answer,1,MPI_INT,0,MPI_COMM_WORLD);
            if(!answer)break;
        }
        end_program:
        printf("\nProcr %d: Saying good-bye!\n",rank);
        if(rank==0)printf("\nEND PROGRAM\n");
        MPI_Finalize();
    }

  • Compile and Run Example 2
    Compile:

    gcc -o pi.exe pi.c -lmpi

    $ mpirun -np 2 pi.exe
    Procr 1 saying hello.
    Procr 0 saying hello
    Using a total of 2 CPUs

    This program computes pi as 4.*Integral{0->1}[1/(1+x^2)]
    (Using PI = 3.141592653589793238462643)

    Input the Number of intervals: N = 10
    pi will be computed with 10 intervals on 2 processors

    Procr 0: runtime = 0.000003
    Procr 1: runtime = 0.000003

    For 10 intervals, pi = 3.14242598500110, error = 0.000833331
    computed in = 0.000003 secs

  • What is OpenMP?
    Similar to MPI, but used for shared-memory parallelism.
    Simple set of directives.
    Incremental parallelism.
    Originally supported mainly by proprietary compilers; GCC has supported OpenMP since version 4.2.


  • Compilers and Platforms
    Fujitsu/Lahey Fortran, C and C++: Intel Linux Systems; Sun Solaris Systems
    HP: HP-UX PA-RISC/Itanium (Fortran, C, aC++); HP Tru64 Unix (Fortran, C, C++)
    IBM XL Fortran and C from IBM: IBM AIX Systems
    Intel C++ and Fortran Compilers from Intel: Intel IA32 Linux Systems; Intel IA32 Windows Systems; Intel Itanium-based Linux Systems; Intel Itanium-based Windows Systems
    Guide Fortran and C/C++ from Intel's KAI Software Lab: Intel Linux Systems; Intel Windows Systems
    PGF77 and PGF90 Compilers from The Portland Group, Inc. (PGI): Intel Linux Systems; Intel Solaris Systems; Intel Windows/NT Systems
    SGI MIPSpro 7.4 Compilers: SGI IRIX Systems
    Sun Microsystems Sun ONE Studio 8, Compiler Collection, Fortran 95, C, and C++: Sun Solaris Platforms; Compiler Collection Portal
    VAST from Veridian Pacific-Sierra Research: IBM AIX Systems; Intel IA32 Linux Systems; Intel Windows/NT Systems; SGI IRIX Systems; Sun Solaris Systems

    taken from

  • How do you use OpenMP?
    C/C++ API
    Parallel Construct: when a region of the program can be executed by multiple parallel threads, this fundamental construct starts the execution.

    #pragma omp parallel [clause[ [, ]clause] ...] new-line
        structured-block

    The clause is one of the following:
    if (scalar-expression)
    private (variable-list)
    firstprivate (variable-list)
    default (shared | none)
    shared (variable-list)
    copyin (variable-list)
    reduction (operator : variable-list)
    num_threads (integer-expression)

  • Fundamental Constructs
    for Construct
    Defines an iterative work-sharing construct in which the iterations of the associated loop are executed in parallel.
    sections Construct
    Identifies a noniterative work-sharing construct that specifies a set of constructs to be divided among the threads in a team, each section being executed once by one of the threads.

  • single Construct
    Associates a structured block's execution with only one thread.
    parallel for Construct
    Shortcut for a parallel region containing only one for directive.
    parallel sections Construct
    Shortcut for a parallel region containing only a single sections directive.

  • Master and Synchronization Directives
    master Construct
    Specifies a structured block that is executed by the master thread of the team.
    critical Construct
    Restricts execution of the associated structured block to a single thread at a time.
    barrier Directive
    Synchronizes all threads in a team. When this construct is encountered, all threads wait until the others have reached this point.

  • atomic Construct
    Ensures that a specific memory location is updated atomically (meaning only one thread is allowed write access at a time).
    flush Directive
    Specifies a cross-thread sequence point at which all threads in a team are ensured a consistent view of certain objects in memory.
    ordered Construct
    The structured block following this directive is executed in the same order as the iterations would be executed in a sequential loop.

  • Data
    How do we control the data in this SMP environment?
    threadprivate Directive
    Makes file-scope and namespace-scope variables private to a thread.
    Data-Sharing Attributes
    private - private to each thread
    firstprivate
    lastprivate
    shared - shared among all threads
    default - lets the user set the default attribute
    reduction - perform reduction on scalars
    copyin - assign the same value to threadprivate variables
    copyprivate - broadcast the value of a private variable from one member of a team to the others

  • Scalability test on SGI Origin 2000

    Timing results of the dot product test in milliseconds for n = 16 * 1024.

  • Timing results of matrix times matrix test in milliseconds for n = 128

  • Architecture comparison
    From

  • References
    Book: Parallel Programming with MPI, Peter