parallel programming in c with the message passing...
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Chapter 5
The Sieve of Eratosthenes
Chapter Objectives
Analysis of block allocation schemes
Function MPI_Bcast
Performance enhancements
Outline
Sequential algorithm
Sources of parallelism
Data decomposition options
Parallel algorithm development, analysis
MPI program
Benchmarking
Optimizations
Sequential Algorithm
Example (figure): the integers 2 through 61, with composites marked in four passes
Pass k = 2: marks the even numbers 4, 6, 8, …, 60
Pass k = 3: marks the remaining multiples 9, 15, 21, 27, 33, 39, 45, 51, 57
Pass k = 5: marks the remaining multiples 25, 35, 55
Pass k = 7: marks the remaining multiple 49
Complexity: $\Theta(n \ln \ln n)$
Pseudocode
1. Create list of unmarked natural numbers 2, 3, …, $n$
2. $k \leftarrow 2$
3. Repeat
   (a) Mark all multiples of $k$ between $k^2$ and $n$
   (b) $k \leftarrow$ smallest unmarked number > $k$
   until $k^2 > n$
4. The unmarked numbers are primes
Sources of Parallelism
Domain decomposition
  Divide data into pieces
  Associate computational steps with data
One primitive task per array element
Making 3(a) Parallel
Mark all multiples of $k$ between $k^2$ and $n$
for all $j$ where $k^2 \le j \le n$ do
   if $j$ mod $k$ = 0 then
      mark $j$ (it is not a prime)
   endif
endfor
Making 3(b) Parallel
Find smallest unmarked number > k
Min-reduction (to find smallest unmarked number > k)
Broadcast (to get result to all tasks)
Agglomeration Goals
Consolidate tasks
Reduce communication cost
Balance computations among processes
Data Decomposition Options
Interleaved (cyclic)
  Easy to determine “owner” of each index
  Leads to load imbalance for this problem
Block
  Balances loads
  More complicated to determine owner if $n$ is not a multiple of $p$
Block Decomposition Options
Want to balance workload when $n$ is not a multiple of $p$
Each process gets either $\lceil n/p \rceil$ or $\lfloor n/p \rfloor$ elements
Seek simple expressions
  Find low, high indices given an owner
  Find owner given an index
Method #1
Let $r = n \bmod p$
If $r = 0$, all blocks have same size
Else
  First $r$ blocks have size $\lceil n/p \rceil$
  Remaining $p - r$ blocks have size $\lfloor n/p \rfloor$
Examples
17 elements divided among 7 processes: block sizes 3, 3, 3, 2, 2, 2, 2
17 elements divided among 5 processes: block sizes 4, 4, 3, 3, 3
17 elements divided among 3 processes: block sizes 6, 6, 5
Method #1 Calculations
With $r = n \bmod p$:
First element controlled by process $i$: $i \lfloor n/p \rfloor + \min(i, r)$
Last element controlled by process $i$: $(i+1) \lfloor n/p \rfloor + \min(i+1, r) - 1$
Process controlling element $j$: $\max(\lfloor j / (\lfloor n/p \rfloor + 1) \rfloor, \lfloor (j - r) / \lfloor n/p \rfloor \rfloor)$
Method #2
Scatters larger blocks among processes
First element controlled by process $i$: $\lfloor i n / p \rfloor$
Last element controlled by process $i$: $\lfloor (i+1) n / p \rfloor - 1$
Process controlling element $j$: $\lfloor (p(j+1) - 1) / n \rfloor$
Examples
17 elements divided among 7 processes: block sizes 2, 2, 3, 2, 3, 2, 3
17 elements divided among 5 processes: block sizes 3, 3, 4, 3, 4
17 elements divided among 3 processes: block sizes 5, 6, 6
Comparing Methods
| Operations | Method 1 | Method 2 |
|---|---|---|
| Low index | 4 | 2 |
| High index | 6 | 4 |
| Owner | 7 | 4 |

Assuming no operations for “floor” function
Our choice: Method 2
Pop Quiz
Illustrate how block decomposition method #2 would divide 13 elements among 5 processes.
First element of each block:
$\lfloor 13(0)/5 \rfloor = 0$
$\lfloor 13(1)/5 \rfloor = 2$
$\lfloor 13(2)/5 \rfloor = 5$
$\lfloor 13(3)/5 \rfloor = 7$
$\lfloor 13(4)/5 \rfloor = 10$
Block Decomposition Macros
#define BLOCK_LOW(id,p,n) ((id)*(n)/(p))
#define BLOCK_HIGH(id,p,n) \
        (BLOCK_LOW((id)+1,p,n)-1)
#define BLOCK_SIZE(id,p,n) \
        (BLOCK_LOW((id)+1,p,n)-BLOCK_LOW(id,p,n))
#define BLOCK_OWNER(index,p,n) \
        (((p)*((index)+1)-1)/(n))
Local vs. Global Indices
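13 elements divided among 5 processes (L = local index, G = global index)
Process 0: L 0 1      G 0 1
Process 1: L 0 1 2    G 2 3 4
Process 2: L 0 1      G 5 6
Process 3: L 0 1 2    G 7 8 9
Process 4: L 0 1 2    G 10 11 12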
Looping over Elements
Sequential program
for (i = 0; i < n; i++) {
   …
}
Parallel program
size = BLOCK_SIZE (id,p,n);
for (i = 0; i < size; i++) {
   gi = i + BLOCK_LOW(id,p,n);
}
Index i on this process takes the place of the sequential program’s index gi
Decomposition Affects Implementation
Largest prime used to sieve is $\sqrt{n}$
First process has $\lfloor n/p \rfloor$ elements
It has all sieving primes if $p < \sqrt{n}$
First process always broadcasts next sieving prime
No reduction step needed
Fast Marking
Block decomposition allows same marking as sequential algorithm:
$j$, $j + k$, $j + 2k$, $j + 3k$, …
instead of
for all $j$ in block
   if $j$ mod $k$ = 0 then mark $j$ (it is not a prime)
Parallel Algorithm Development
1. Create list of unmarked natural numbers 2, 3, …, $n$ (each process creates its share of list)
2. $k \leftarrow 2$ (each process does this)
3. Repeat
   (a) Mark all multiples of $k$ between $k^2$ and $n$ (each process marks its share of list)
   (b) $k \leftarrow$ smallest unmarked number > $k$ (process 0 only)
   (c) Process 0 broadcasts $k$ to rest of processes
   until $k^2 > n$
4. The unmarked numbers are primes
5. Reduction to determine number of primes
Function MPI_Bcast
int MPI_Bcast (
void *buffer, /* Addr of 1st element */
int count, /* # elements to broadcast */
MPI_Datatype datatype, /* Type of elements */
int root, /* ID of root process */
MPI_Comm comm) /* Communicator */
MPI_Bcast (&k, 1, MPI_INT, 0, MPI_COMM_WORLD);
Task/Channel Graph
Analysis
$\chi$ is time needed to mark a cell
Sequential execution time: $\chi n \ln \ln n$
Number of broadcasts: $\sqrt{n} / \ln \sqrt{n}$
Broadcast time: $\lambda \lceil \log p \rceil$
Expected execution time:
$\chi (n \ln \ln n) / p + (\sqrt{n} / \ln \sqrt{n}) \, \lambda \lceil \log p \rceil$
Code (1/4)
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "MyMPI.h"
#define MIN(a,b) ((a)<(b)?(a):(b))

int main (int argc, char *argv[])
{
   ...
   MPI_Init (&argc, &argv);
   /* Start the timer once every process has reached this point */
   MPI_Barrier(MPI_COMM_WORLD);
   elapsed_time = -MPI_Wtime();
   MPI_Comm_rank (MPI_COMM_WORLD, &id);
   MPI_Comm_size (MPI_COMM_WORLD, &p);
   if (argc != 2) {
      if (!id) printf ("Command line: %s <m>\n", argv[0]);
      MPI_Finalize();
      exit (1);
   }
Code (2/4)
   n = atoi(argv[1]);
   /* The n-1 candidates 2..n are divided into p blocks */
   low_value = 2 + BLOCK_LOW(id,p,n-1);
   high_value = 2 + BLOCK_HIGH(id,p,n-1);
   size = BLOCK_SIZE(id,p,n-1);
   /* Process 0 must hold every sieving prime up to sqrt(n) */
   proc0_size = (n-1)/p;
   if ((2 + proc0_size) < (int) sqrt((double) n)) {
      if (!id) printf ("Too many processes\n");
      MPI_Finalize();
      exit (1);
   }

   marked = (char *) malloc (size);
   if (marked == NULL) {
      printf ("Cannot allocate enough memory\n");
      MPI_Finalize();
      exit (1);
   }
Code (3/4)
   for (i = 0; i < size; i++) marked[i] = 0;
   if (!id) index = 0;
   prime = 2;
   do {
      /* Local index of the first multiple of prime to mark */
      if (prime * prime > low_value)
         first = prime * prime - low_value;
      else {
         if (!(low_value % prime)) first = 0;
         else first = prime - (low_value % prime);
      }
      for (i = first; i < size; i += prime) marked[i] = 1;
      /* Process 0 finds the next sieving prime, then broadcasts it */
      if (!id) {
         while (marked[++index]);
         prime = index + 2;
      }
      MPI_Bcast (&prime, 1, MPI_INT, 0, MPI_COMM_WORLD);
   } while (prime * prime <= n);
Code (4/4)
   count = 0;
   for (i = 0; i < size; i++)
      if (!marked[i]) count++;
   /* Sum the per-process counts onto process 0 */
   MPI_Reduce (&count, &global_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
   elapsed_time += MPI_Wtime();
   if (!id) {
      printf ("%d primes are less than or equal to %d\n", global_count, n);
      printf ("Total elapsed time: %10.6f\n", elapsed_time);
   }
   MPI_Finalize ();
   return 0;
}
Benchmarking
Execute sequential algorithm
  Determine $\chi$ = 85.47 nanosec
Execute series of broadcasts
  Determine $\lambda$ = 250 $\mu$sec
Execution Times (sec)
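| Processors | Predicted (sec) | Actual (sec) |
|---|---|---|
| 1 | 24.900 | 24.900 |
| 2 | 12.721 | 13.011 |
| 3 | 8.843 | 9.039 |
| 4 | 6.768 | 7.055 |
| 5 | 5.794 | 5.993 |
| 6 | 4.964 | 5.159 |
| 7 | 4.371 | 4.687 |
| 8 | 3.927 | 4.222 |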
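As a check on the predicted times, the 8-processor entry can be worked out from the execution-time model, taking $\chi$ = 85.47 nanosec as the time to mark a cell and $\lambda$ = 250 $\mu$sec as the latency of one broadcast step. The problem size is not stated on these slides; $n = 10^8$ is an assumption, chosen because it reproduces the one-processor time of 24.900 sec. The logarithm in the broadcast term is assumed to be base 2.

```latex
\begin{align*}
\chi\, n \ln\ln n &= (85.47\times10^{-9})(10^{8})\,\ln(\ln 10^{8})
                   \approx 8.547 \times 2.9135 \approx 24.90 \text{ sec} \\
\frac{\sqrt{n}}{\ln\sqrt{n}}\,\lambda &= \frac{10^{4}}{\ln 10^{4}}\,(250\times10^{-6})
                   \approx 1085.7 \times 0.00025 \approx 0.2714 \text{ sec} \\
T(8) &\approx \frac{24.90}{8} + 0.2714\,\lceil \log_2 8 \rceil
      \approx 3.113 + 0.814 = 3.927 \text{ sec}
\end{align*}
```

The broadcast term grows with $\lceil \log_2 p \rceil$ while the computation term shrinks as $1/p$, so communication makes up a growing share of the predicted time as processors are added.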
Improvements
Delete even integers
  Cuts number of computations in half
  Frees storage for larger values of $n$
Each process finds own sieving primes
  Replicating computation of primes to $\sqrt{n}$
  Eliminates broadcast step
Reorganize loops
  Increases cache hit rate
Reorganize Loops
Cache hit rate (figure): lower with the original loop order, higher after the loops are reorganized
Comparing 4 Versions
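Execution times (sec)

| Procs | Sieve 1 | Sieve 2 | Sieve 3 | Sieve 4 |
|---|---|---|---|---|
| 1 | 24.900 | 12.237 | 12.466 | 2.543 |
| 2 | 12.721 | 6.609 | 6.378 | 1.330 |
| 3 | 8.843 | 5.019 | 4.272 | 0.901 |
| 4 | 6.768 | 4.072 | 3.201 | 0.679 |
| 5 | 5.794 | 3.652 | 2.559 | 0.543 |
| 6 | 4.964 | 3.270 | 2.127 | 0.456 |
| 7 | 4.371 | 3.059 | 1.820 | 0.391 |
| 8 | 3.927 | 2.856 | 1.585 | 0.342 |

10-fold improvement: Sieve 1 to Sieve 4 on 1 processor (24.900 to 2.543 sec)
7-fold improvement: Sieve 4 from 1 to 8 processors (2.543 to 0.342 sec)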
Summary
Sieve of Eratosthenes: parallel design uses domain decomposition
Compared two block distributions
  Chose one with simpler formulas
Introduced MPI_Bcast
Optimizations reveal importance of maximizing single-processor performance