Parallel Programming and MPI


Post on 18-Nov-2014






<p>Parallel Programming and MPI: A course for IIT-M. September 2008. R Badrinath, STSD Bangalore (</p>
<p>© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.</p>
<p>Context and Background</p>
<p>IIT-Madras has recently added a good deal of compute power. Why?</p>
<p>Further R&amp;D in sciences and engineering</p>
<p>Provide computing services to the region</p>
<p>Create new opportunities in education and skills</p>
<p>Why this course: update skills to program modern cluster computers.</p>
<p>Length: 2 theory and 2 practice sessions, 4 hrs each.</p>
<p>September 2008</p> <p>IIT-Madras</p>
<p>Audience Check</p>
<p>Contents</p>
<p>Rather than a call-by-call list (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Send, MPI_Recv, MPI_Bcast, MPI_Comm_create, MPI_Sendrecv, MPI_Scatter, MPI_Gather), instead we:</p>
<p>Understand issues</p>
<p>Understand concepts</p>
<p>Learn enough to pick up from the manual</p>
<p>Go by motivating examples</p>
<p>Try out some of the examples</p>
<p>Outline</p>
<p>Sequential vs Parallel programming</p>
<p>Shared vs Distributed Memory</p>
<p>Parallel work breakdown models</p>
<p>Communication vs Computation</p>
<p>MPI Examples</p>
<p>MPI Concepts</p>
<p>The role of IO</p>
<p>Sequential vs Parallel</p>
<p>We are used to sequential programming: C, Java, C++, etc. E.g., Bubble Sort, Binary Search, Strassen Multiplication, FFT, BLAST, ...</p>
<p>Main idea: specify the steps in perfect order.</p>
<p>Reality: we are used to parallelism a lot more than we think, as a concept, but not for programming.</p>
<p>Methodology: launch a set of tasks; communicate to make progress. E.g., sorting 500 answer papers by making 5 equal piles, having them sorted by 5 people, and merging them together.</p>
<p>Shared vs Distributed Memory Programming</p>
<p>Shared Memory: all tasks access the same memory, hence the same data. Example: pthreads.</p>
<p>Distributed Memory: all memory is local. Data sharing is by explicitly transporting data from one task to another (e.g., send-receive pairs in MPI).</p>
<p>[Figure labels: Program, Memory, Communications channel]</p>
<p>HW and programming model relationship: tasks vs CPUs; SMPs vs clusters.</p>
<p>Designing Parallel Programs</p>
<p>Simple Parallel Program: sorting numbers in a large array A</p>
<p>Notionally divide A into 5 pieces [0..99; 100..199; 200..299; 300..399; 400..499].</p>
<p>Each part is sorted by an independent sequential algorithm and left within its region.</p>
<p>The resultant parts are merged by simply reordering among adjacent parts.</p>
<p>What is different: think about</p>
<p>How many people are doing the work. (Degree of Parallelism)</p>
<p>What is needed to begin the work. (Initialization)</p>
<p>Who does what. (Work distribution)</p>
<p>Access to work part. (Data/IO access)</p>
<p>Whether they need info from each other to finish their own job. (Communication)</p>
<p>When are they all done. (Synchronization)</p>
<p>What needs to be done to collate the result.</p>
<p>Work Break-down</p>
<p>Parallel algorithm</p>
<p>Prefer simple intuitive breakdowns.</p>
<p>Usually, highly optimized sequential algorithms are not easily parallelizable.</p>
<p>Breaking work often involves some pre- or post-processing (much like divide and conquer).</p>
<p>Fine vs large grain parallelism and its relationship to communication.</p>
<p>Digression: let's get a simple MPI program to work</p>
<p>#include &lt;stdio.h&gt;
#include &lt;mpi.h&gt;
int main() {
int total_size, my_rank;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &amp;total_size);
MPI_Comm_rank(MPI_COMM_WORLD, &amp;my_rank);
printf("\n Total number of programs = %d, out of which rank of this process is %d\n", total_size, my_rank);
MPI_Finalize();
return 0;
}</p>
<p>Getting it to work</p>
<p>Compile it: mpicc -o simple simple.c # If you want HP-MPI, add /opt/hpmpi/bin to your path</p>
<p>Run it. This depends a bit on the system:</p>
<p>mpirun -np 2 simple</p>
<p>qsub -l ncpus=2 -o simple.out /opt/hpmpi/bin/mpirun /simple</p>
<p>[Fun: qsub -l ncpus=2 -I hostname]</p>
<p>Results are in the output file. What is mpirun? What does qsub have to do with MPI? More about qsub in a separate talk.</p>
<p>What goes on</p>
<p>The same program is run at the same time on 2 different CPUs.</p>
<p>Each instance is slightly different in that each returns different values for some simple calls like MPI_Comm_rank.</p>
<p>This gives each instance its identity.</p>
<p>We can make different instances run different pieces of code based on this identity difference.</p>
<p>Typically it is an SPMD model of computation.</p>
<p>Continuing work breakdown</p>
<p>Simple Example: Find shortest distances</p>
<p>PROBLEM: Find shortest path distances</p>
<p>[Figure: example graph with weighted edges]</p>
<p>Let nodes be numbered 0, 1, ..., n-1. Let us put all of this in a matrix: A[i][j] is the distance from i to j.</p>
<p>0 7 1 .. ..</p> <p>2 0 5 .. ..</p> <p>1 .. 0 2 ..</p> <p>.. .. 2 0 ..</p> <p>6 .. 3 2 0</p>
<p>Floyd's (sequential) algorithm</p>
<p>For (k=0; k</p>
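<p>The loop on the last slide is cut off in this copy. As a hedged sketch only, assuming the slide shows the standard Floyd triple loop over the distance matrix A described earlier, the sequential algorithm might look like this in C. The names N, INF and floyd are illustrative and do not come from the slides, and the fixed size of 4 nodes is an assumption made to keep the example small.</p>

```c
#define N 4          /* hypothetical number of nodes, not from the slides */
#define INF 1000000  /* stands in for "no edge"; INF + INF still fits in an int */

/*
 * Floyd's all-pairs shortest-path algorithm (sequential sketch).
 * On entry, A[i][j] is the direct distance from i to j (INF if no edge,
 * 0 on the diagonal). On exit, A[i][j] is the shortest distance from i to j.
 */
void floyd(int A[N][N]) {
    for (int k = 0; k < N; k++) {          /* allow node k as an intermediate stop */
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                /* Is going i -> k -> j shorter than the best known i -> j? */
                if (A[i][k] + A[k][j] < A[i][j]) {
                    A[i][j] = A[i][k] + A[k][j];
                }
            }
        }
    }
}
```

<p>For a given k, the updates for different rows i do not depend on each other, which is presumably why this example is used to continue the work breakdown discussion: rows of A are a natural unit of work to hand to different tasks.</p>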