![Page 1: Introduction to Parallel Programming at MCSR Presentation at Delta State University January 17, 2007 Jason Hale](https://reader036.vdocuments.mx/reader036/viewer/2022062408/56649e7c5503460f94b7e9ea/html5/thumbnails/1.jpg)
Introduction to Parallel Programming at MCSR
Presentation at
Delta State University
January 17, 2007
Jason Hale
What is MCSR’s Mission?
Mississippi Center for Supercomputer Research
Established in 1987 by the Mississippi Legislature
Mission: Enhance Computational Research Climate at Mississippi’s 8 Public Universities
also: Support High Performance Computing (HPC) Education in Mississippi
How Does MCSR Support Research?
Research Accounts on MCSR Supercomputers
- Available to all researchers at MS universities
- No cost to the researcher or the institution
- 800+ Research Accounts Active in 2006
Services
- Consulting
- Seminars
- Software Installation, Compiling, and Troubleshooting on MCSR systems
What Research at MCSR?
MCSR research users reported a total of over $38,000,000 in Active Research Funds (FY 2006)
Currently Active Research Areas:
- Computational Chemistry
- Civil Engineering
- Operations Research
- Fluid Dynamics
- ….
http://www.mcsr.olemiss.edu/research.php
Education at MCSR
Over 64 Courses Supported since 2000
- Alcorn State University
- Delta State University
- The University of Southern Mississippi
- Mississippi Valley State University
- The University of Mississippi
C/C++, Fortran, MPI, OpenMP, MySQL, HTML, Javascript, Matlab, PHP, Perl, ….
http://www.mcsr.olemiss.edu/education.php
Software at MCSR

• Gaussian 03, GAMESS, Amber, MPQC, NWChem chemistry packages
• PBS Professional 7.0 (for batch scheduling)
• Fortran, C, C++ (Intel, PGI, GNU)
• Abaqus, Patran (Engineering)
• What software do you need?
What Research at MCSR?
[Bar chart: Supported Research Funds, axis $0–$50,000,000]
What is a Supercomputer?
Loosely speaking, it is a “large” computer with an architecture that has been optimized for solving bigger problems faster than a conventional desktop, mainframe, or server computer can.
- Pipelining
- Parallelism (lots of CPUs or Computers)
Supercomputers at MCSR: redwood
- 224 CPU SGI Altix 3700 Supercomputer
- 224 GB of shared memory
Supercomputers at MCSR: mimosa
- 253 CPU Intel Linux Cluster – Pentium 4
- Distributed memory – 500MB – 1GB per node
- Gigabit Ethernet
Supercomputers at MCSR: sweetgum
- SGI Origin 2800 128-CPU Supercomputer
- 64 GB of shared memory
What is Parallel Computing?
Using more than one computer (or processor) to complete a computational problem
Examples of Parallelism in Every Day Life?
How May a Problem be Parallelized?
Data Decomposition
- Examples?

Task Decomposition
- Examples?
Introduction to Parallel Programming at MCSR
• Message Passing Computing
– Processes coordinate and communicate results via calls to message passing library routines
– Programmers “parallelize” the algorithm and add message calls
– At MCSR, this is via MPI programming with C or Fortran on:
   • Sweetgum – Origin 2800 Supercomputer (128 CPUs)
   • Mimosa – Beowulf Cluster with 253 Nodes
   • Redwood – Altix 3700 Supercomputer (224 CPUs)

• Shared Memory Computing
– Processes or threads coordinate and communicate results via shared memory variables
– Care must be taken not to modify the wrong memory areas
– At MCSR, this is via OpenMP programming with C or Fortran on sweetgum
Message Passing Computing at MCSR
• Process Creation
• Slave and Master Processes
• Static vs. Dynamic Work Allocation
• Compilation
• Models
• Basics
• Synchronous Message Passing
• Collective Message Passing
• Deadlocks
• Examples
Message Passing Process Creation
• Dynamic
– One process spawns other processes & gives them work
– PVM
– More flexible
– More overhead: process creation and cleanup

• Static
– Total number of processes determined before execution begins
– MPI
Message Passing Processes
• Often, one process will be the master, and the remaining processes will be the slaves
• Each process has a unique rank/identifier
• Each process runs in a separate memory space and has its own copy of variables
Message Passing Work Allocation
• Master Process
– Does initial sequential processing
– Initially distributes work among the slaves (statically or dynamically)
– Collects the intermediate results from slaves
– Combines into the final solution

• Slave Process
– Receives work from, and returns results to, the master
– May distribute work amongst themselves (decentralized load balancing)
Message Passing Compilation
• Compile/link programs w/ message passing libraries using regular (sequential) compilers
• Fortran MPI example:
include 'mpif.h'

• C MPI example:
#include "mpi.h"
• See http://www.mcsr.olemiss.edu/appssubpage.php?pagename=mpi.inc
for exact MCSR MPI directory locations
Message Passing Models
• SPMD – Single Program/Multiple Data
– Single version of the source code used for each process
– Master executes one portion of the program; slaves execute another; some portions executed by both
– Requires one compilation per architecture type
– MPI

• MPMD – Multiple Program/Multiple Data
– One source code for the master; another for the slaves
– Each must be compiled separately
– PVM
Message Passing Basics
• Each process must first establish the message passing environment
• Fortran MPI example:
integer ierror
call MPI_INIT(ierror)

• C MPI example:
int ierror;
ierror = MPI_Init(&argc, &argv);
Message Passing Basics
• Each process has a rank, or id number– 0, 1, 2, … n-1, where there are n processes
• With SPMD, each process must determine its own rank by calling a library routine
• Fortran MPI Example:
integer comm, rank, ierror
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)

• C MPI Example:
ierror = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
Message Passing Basics
• Each process has a rank, or id number– 0, 1, 2, … n-1, where there are n processes
• Each process may use a library call to determine how many total processes it has to play with
• Fortran MPI Example:
integer comm, size, ierror
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)

• C MPI Example:
ierror = MPI_Comm_size(MPI_COMM_WORLD, &size);
Message Passing Basics
• Each process has a rank, or id number– 0, 1, 2, … n-1, where there are n processes
• Once a process knows the size, it also knows the ranks (id #’s) of those other processes, and can send or receive a message to/from any other process.
• Fortran MPI Example:
call MPI_SEND(buf, count, datatype, dest, tag, comm, ierror)
call MPI_RECV(buf, count, datatype, source, tag, comm, status, ierror)
(buf, count, datatype = the DATA; dest/source, tag, comm = the ENVELOPE; status = the result)
MPI Send and Receive Arguments
• Buf – starting location of data
• Count – number of elements
• Datatype – MPI_INTEGER, MPI_REAL, MPI_CHARACTER, …
• Destination – rank of the process to whom the msg is being sent
• Source – rank of the sender from whom the msg is being received, or MPI_ANY_SOURCE
• Tag – integer chosen by the program to indicate the type of message, or MPI_ANY_TAG
• Communicator – id’s the process team, e.g., MPI_COMM_WORLD
• Status – the result of the call (such as the # of data items received)
Synchronous Message Passing
• Message calls may be blocking or nonblocking
• Blocking Send
– Waits to return until the message has been received by the destination process
– This synchronizes the sender with the receiver

• Nonblocking Send
– Return is immediate, without regard for whether the message has been transferred to the receiver
– DANGER: Sender must not change the variable containing the message before the transfer is done.
– MPI_Isend() is nonblocking
Synchronous Message Passing
• Locally Blocking Send
– The message is copied from the send parameter variable to an intermediate buffer in the calling process
– Returns as soon as the local copy is complete
– Does not wait for the receiver to transfer the message from the buffer
– Does not synchronize
– The sender’s message variable may safely be reused immediately
– MPI_Send() is locally blocking
Synchronous Message Passing
• Blocking Receive
– The call waits until a message matching the given tag has been received from the specified source process.
– MPI_Recv() is blocking.

• Nonblocking Receive
– If this process has a qualifying message waiting, retrieves that message and returns
– If no messages have been received yet, returns anyway
– Used if the receiver has other work it can be doing while it waits
– Status tells the receiver whether the message was received
– MPI_Irecv() is nonblocking
– MPI_Wait() and MPI_Test() can be used to periodically check to see if the message is ready, and finally wait for it, if desired
Collective Message Passing
• Broadcast
– Sends a message from one to all processes in the group

• Scatter
– Distributes each element of a data array to a different process for computation

• Gather
– The reverse of scatter… retrieves data elements into an array from multiple processes
Collective Message Passing w/MPI
MPI_Bcast() Broadcast from root to all other processes
MPI_Gather() Gather values from a group of processes
MPI_Scatter() Scatters buffer in parts to a group of processes
MPI_Alltoall() Sends data from all processes to all processes
MPI_Reduce() Combine values on all processes to a single value
MPI_Reduce_scatter() Combine values and scatter the results to the group
Message Passing Deadlock
• Deadlock can occur when all critical processes are waiting for messages that never come, or waiting for buffers to clear out so that their own messages can be sent
• Possible Causes
– Program/algorithm errors
– Message and buffer sizes

• Solutions
– Order operations more carefully
– Use nonblocking operations
– Add debugging output statements to your code to find the problem
Portable Batch System in SGI
• Sweetgum: – PBS Professional 7.0 is installed on sweetgum.
| Queue | Max # Processors per User Job | Max # Running Jobs per Queue | Memory Limit per User Job | CPU Time Limit per User Job | Special Validation Required |
|---|---|---|---|---|---|
| SM-defR | 4 | 40 | 500mb | 288 hrs | No |
| MM-defR | 4 | 20 | 1gb | 288 hrs | No |
| LM-defR | 4 | 2 | 4gb | 288 hrs | Yes |
| LM-XR | 4 | 1 | 4gb | 672 hrs | Yes |
| LM-8p | 8 | 1 | 4gb | 672 hrs | Yes |
| LM-16p | 16 | 1 | 4gb | 672 hrs | Yes |
Portable Batch System in Linux
• Mimosa PBS Configuration: – PBS Professional 7.1 is installed on mimosa
| Queue | Max # Nodes per User Job | Default Memory (MB) | Default Shared Memory (MB) | Max # Running Jobs per Queue | Special Validation Required |
|---|---|---|---|---|---|
| MCSR-2N | 2 | 400 | 256 | 32 | No |
| MCSR-4N | 4 | 600 | 256 | 12 | Yes |
| MCSR-8N | 8 | 800 | 256 | 8 | Yes |
| MCSR-16N | 16 | 1000 | 256 | 4 | Yes |
| MCSR-32N | 32 | 1200 | 256 | 4 | Yes |
| MCSR-64N | 64 | 1200 | 256 | 2 | Yes |
| MCSR-CA | 0 | 400 | 256 | 13 | Yes |
Sample Portable Batch System Script
mimosa% vi example.pbs

#!/bin/bash
#PBS -l nodes=4     (on mimosa)
#PBS -l ncpus=4     (on sweetgum)
#PBS -q MCSR-4N
#PBS -N example
export PGI=/usr/local/apps/pgi-6.1
export PATH=$PGI/linux86/6.1/bin:$PATH
cd $PWD
rm *.pbs.[eo]*
pgcc -o add_mpi.exe add_mpi.c -lmpich
mpirun -np 4 add_mpi.exe

mimosa% qsub example.pbs
37537.mimosa.mcsr.olemiss.edu
Sample Portable Batch System Output
Mimosa% qstat
Job id          Name       User     Time Use  S  Queue
--------------  ---------  -------  --------  -  ---------
37521.mimosa    4_3.pbs    r0829    01:05:17  R  MCSR-2N
37524.mimosa    2_4.pbs    r0829    01:00:58  R  MCSR-2N
37525.mimosa    GC8w.pbs   lgorb    01:03:25  R  MCSR-2N
37526.mimosa    3_6.pbs    r0829    01:01:54  R  MCSR-2N
37528.mimosa    GCr8w.pbs  lgorb    00:59:19  R  MCSR-2N
37530.mimosa    ATr7w.pbs  lgorb    00:55:29  R  MCSR-2N
37537.mimosa    example    tpirim   0         Q  MCSR-16N
37539.mimosa    try1       cs49011  00:00:00  R  MCSR-CA
– Further information about using PBS at MCSR: http://www.mcsr.olemiss.edu/appssubpage.php?pagename=pbs_1.inc&menu=vMBPBS.inc
For More Information
Hello World MPI Examples on Sweetgum (/usr/local/appl/mpihello) and Mimosa (/usr/local/apps/ppro/mpiworkshop):
http://www.mcsr.olemiss.edu/appssubpage.php?pagename=MPI_Ex1.inc
http://www.mcsr.olemiss.edu/appssubpage.php?pagename=MPI_Ex2.inc
http://www.mcsr.olemiss.edu/appssubpage.php?pagename=MPI_Ex3.inc
Websites
MPI at MCSR: http://www.mcsr.olemiss.edu/appssubpage.php?pagename=mpi.inc
PBS at MCSR: http://www.mcsr.olemiss.edu/appssubpage.php?pagename=pbs_1.inc&menu=vMBPBS.inc
Mimosa Cluster: http://www.mcsr.olemiss.edu/supercomputerssubpage.php?pagename=mimosa2.inc
MCSR Accounts: http://www.mcsr.olemiss.edu/supercomputerssubpage.php?pagename=accounts.inc
MPI Programming Exercises
Hello World
- sequential
- parallel (w/MPI and PBS)

Add an Array of Numbers
- sequential
- parallel (w/MPI and PBS)