An Introduction to Parallel Computing
users.jyu.fi/~tro/tiea342/material_for_tuesdays/intro_2.pdf
TRANSCRIPT
Designing Parallel Programs
General
Partitioning
Communications and synchronization
Data Dependencies
Load balancing and granularity
WARNING: "Taking a couple of programming courses or programming a home computer does not qualify anyone to produce safety-critical [parallel] software. Any engineer [physicist, mathematician, ...] is not automatically qualified to be a software engineer – an extensive program of study and experience is required. Safety-critical software engineering requires training and experience in addition to that required for noncritical software."
(An Investigation of the Therac-25 Accidents, IEEE Computer, Vol. 26, No. 7, July 1993, pp. 18-41)
R. Makinen An Introduction to Parallel Computing
Automatic vs. Manual Parallelization
Traditionally, developing parallel programs has been a very manual process: the programmer is responsible for both identifying and implementing parallelism.

Manually developing parallel codes is a time-consuming and error-prone process.

Automatic tools exist to assist the programmer with converting serial programs into parallel programs. The most common type of tool is a parallelizing compiler.
A parallelizing compiler generally works in two different ways:

Fully Automatic: The compiler analyzes the source code and identifies opportunities for parallelism (loops etc.).
Programmer Directed: Using e.g. compiler directives, the programmer explicitly tells the compiler how to parallelize the code.

Automatic parallelization may not give an increase in performance ⇒ limited use.
Before parallelization attempts
1. Choose a numerically stable and efficient algorithm. It makes no sense to parallelize inefficient or unreliable code!
2. Identify the program's hotspots.
3. Identify inhibitors to parallelism in the hotspots.
4. Determine whether or not the hotspots can be parallelized.
Examples
A parallelizable problem: Numerical integration of a function $f : \mathbb{R} \to \mathbb{R}$:

$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i),$$

where $(w_i, x_i)$, $i = 1, \dots, n$, are given constants.
A non-parallelizable problem: Calculation of the Fibonacci numbers by $F_{k+2} = F_{k+1} + F_k$. The calculation of the $(k+2)$-th value uses those of both $k+1$ and $k$. These three terms cannot be calculated independently, and therefore not in parallel.
Concentrate on hotspots
Know where most of the real work is being done.

The main work in most scientific computing programs is done in a few places (linear system solvers, force calculation in MD, ...).

Profilers and performance analysis tools will help here.

Focus on parallelizing the hotspots and ignore sections of the program that account for little CPU usage.
I/O usually inhibits parallelism
```fortran
! Computation and I/O interleaved: the write statement inside the
! inner loop inhibits parallelization of the loop nest.
do i = 1, n
   do j = 1, n
      tmp = 1.0/(1+(i-j)**2)
      write (6, fmt='(g9.2)', advance='no') tmp
      a(i,j) = tmp
   end do
   write (6,*) ' '
end do
```

```fortran
! Computation separated from I/O: the first loop nest is now free
! of I/O and can be parallelized.
do i = 1, n
   do j = 1, n
      a(i,j) = 1.0/(1+(i-j)**2)
   end do
end do
do i = 1, n
   do j = 1, n
      write (6, fmt='(g9.2)', advance='no') a(i,j)
   end do
   write (6,*) ' '
end do
```
Partitioning by Domain Decomposition
The data associated with a problem is decomposed. Each parallel task then works on a portion of the data.

There are different ways to partition data. A matrix can, e.g., be distributed onto four processors by rows, by columns, or in blocks.
Partitioning by Functional Decomposition
The problem is decomposed according to the work that must be done. Each task then performs a portion of the overall work.

Functional decomposition lends itself well to problems that can be split into different tasks. For example, in CFD: task 1 computes the x-velocity, task 2 the y-velocity, task 3 the pressure, and task 4 the temperature.

Combining these two types of problem decomposition is common and natural.
When is communication needed?
Some problems can be decomposed and executed in parallel with virtually no need for tasks to share data.

These types of problems are called embarrassingly parallel: very little inter-task communication is required.

Let matrices A, B, and C be decomposed onto different processors in similar ways. Then matrix addition

$$c_{i,j} = a_{i,j} + b_{i,j}, \quad i, j = 1, \dots, n$$

is embarrassingly parallel.
Usually, you do need communications.

Most parallel applications are not trivial, and do require tasks to share data with each other.
Let (square) matrices A, B, and C be decomposed onto different processors in similar ways. The matrix multiplication

$$c_{i,j} = \sum_{k=1}^{n} a_{i,k} b_{k,j}, \quad i, j = 1, \dots, n$$

requires a substantial amount of communication between tasks.
Cost of communications
Inter-task communication always implies overhead.

Machine cycles and resources that could be used for computation are instead used to package and transmit data.

Communications frequently require some type of synchronization between tasks, which can result in tasks waiting instead of doing work.

Competing communication traffic can saturate the available network bandwidth, causing performance problems.
Latency vs. Bandwidth
Latency is the time it takes to send a minimal (0-byte) message from point A to point B. Commonly expressed in microseconds.

Bandwidth is the amount of data that can be communicated per unit of time. Commonly expressed in megabytes/sec.
The time needed to communicate a message of L megabytes from A to B is thus

t = t_lat + L/B

where t_lat is the latency and B the bandwidth.
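The model can be sketched numerically. The latency and bandwidth figures below are illustrative assumptions, not measurements of any particular network:

```python
def transfer_time(size_mb, latency_s, bandwidth_mb_s):
    """Simple linear cost model: t = t_lat + L / B."""
    return latency_s + size_mb / bandwidth_mb_s

LAT = 2e-6    # assumed latency: 2 microseconds
BW = 1000.0   # assumed bandwidth: 1000 MB/s

tiny = transfer_time(1e-6, LAT, BW)   # a 1-byte message
big = transfer_time(100.0, LAT, BW)   # a 100 MB message

# For the tiny message the latency term dominates;
# for the big one the bandwidth term does.
print(tiny, big)
```

For the tiny message virtually all the time is latency; for the large one, latency is negligible next to L/B.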
Sending many small messages can cause latency to dominate communication overheads:

call send( nrows,  1, dest, message_type )
call send( ncols,  1, dest, message_type )
call send( offset, 1, dest, message_type )

It is more efficient to pack the small messages into one larger message:

mdata(1:3) = (/ nrows, ncols, offset /)
call send( mdata, 3, dest, message_type )
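The gain can be estimated with the linear cost model from above (again with assumed, illustrative figures): packing pays the latency once instead of three times.

```python
def transfer_time(size_mb, latency_s, bandwidth_mb_s):
    """Linear cost model: t = t_lat + L / B."""
    return latency_s + size_mb / bandwidth_mb_s

LAT = 2e-6      # assumed latency: 2 microseconds
BW = 1000.0     # assumed bandwidth: 1000 MB/s
INT_MB = 4e-6   # one 4-byte integer, expressed in megabytes

# Three separate sends of one integer vs. one send of three integers.
separate = 3 * transfer_time(INT_MB, LAT, BW)
packed = transfer_time(3 * INT_MB, LAT, BW)

# Same payload, but the latency term is paid once instead of three times.
print(separate, packed)
```

With these numbers the packed send is roughly three times cheaper, because the payload cost is tiny compared to the per-message latency.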
Synchronous vs. asynchronous communications
Synchronous communications require some type of "handshaking" between tasks that are sharing data. This can be explicitly structured in code by the programmer, or it may happen at a lower level unknown to the programmer.

Synchronous communications are often referred to as blocking communications, since other work must wait until the communications have completed.
Asynchronous communications allow tasks to transfer data independently of one another. For example, task 1 can prepare and send a message to task 2, and then immediately begin doing other work; when task 2 actually receives the data does not matter.

Asynchronous communications are often referred to as non-blocking communications, since other work can be done while the communications are taking place.

The ability to interleave computation with communication is the main benefit of asynchronous communications.
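A minimal sketch of the idea using Python threads. The "send" here is simulated with a sleep; a real code would use a non-blocking library call (e.g. MPI's Isend) instead:

```python
import threading
import time

def fake_send(data, done):
    """Stand-in for a transfer running in the background."""
    time.sleep(0.05)   # pretend the network transfer takes 50 ms
    done.set()         # signal completion

done = threading.Event()
t = threading.Thread(target=fake_send, args=([1, 2, 3], done))
t.start()   # "task 1 sends" and immediately continues

# Useful computation proceeds while the transfer is in flight.
partial = sum(i * i for i in range(10_000))

done.wait()   # only when the result of the send is needed do we wait
t.join()
print(partial)
```

The computation of `partial` overlaps the simulated transfer, which is exactly the benefit described above.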
Example: blocking communication
[Timeline figure: two tasks exchange values a and b via send/receive while computing c = F1(a), b = F2(a), d = F3(c, b), and e = F4(a, b). With blocking calls, each task sits idle until its matching send or receive completes.]
Example: non-blocking communication
[Timeline figure: the same exchange of a and b, now with non-blocking send/receive; the computations F1–F4 proceed while the transfers are in flight, shortening the idle periods.]
Scope of communications
Knowing which tasks must communicate with each other is critical during the design stage of a parallel code.

Point-to-point communication involves two tasks, with one task acting as the sender/producer of data and the other as the receiver.

Collective communication involves data sharing among all tasks (or among the tasks belonging to a specified group).

Point-to-point and collective communication may each be implemented synchronously or asynchronously.
Typical collective communications
broadcast: one task holds a value a; afterwards every task has a copy of a.

scatter: one task holds a | b | c | d; afterwards each task holds one piece (a, b, c, or d).

gather: tasks hold a, b, c, d individually; afterwards one task holds a | b | c | d.

reduction: tasks hold a, b, c, d; afterwards one task holds the combined result, e.g. a + b + c + d.
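The semantics of the four collectives can be emulated in plain Python, with a list standing in for the per-task data (a sketch of the data movement only, not of any real communication library):

```python
import operator

def broadcast(root_value, n):
    """One task's value is copied to all n tasks."""
    return [root_value] * n

def scatter(data, n):
    """The root's list is split; task i receives data[i]."""
    assert len(data) == n
    return [data[i] for i in range(n)]

def gather(per_task):
    """Each task's piece is collected into one list at the root."""
    return list(per_task)

def reduction(per_task, op):
    """Per-task values are combined into a single result at the root."""
    result = per_task[0]
    for v in per_task[1:]:
        result = op(result, v)
    return result

print(broadcast('a', 4))                      # ['a', 'a', 'a', 'a']
print(scatter(['a', 'b', 'c', 'd'], 4))       # ['a', 'b', 'c', 'd']
print(gather(['a', 'b', 'c', 'd']))           # ['a', 'b', 'c', 'd']
print(reduction([1, 2, 3, 4], operator.add))  # 10
```

Note that scatter and gather are inverses of each other, and reduction is a gather followed by combining the gathered values.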
Synchronization by a barrier
Usually all tasks are involved.
Each task performs its work until it reaches the barrier; it then stops, or "blocks".

When the last task reaches the barrier, all tasks are synchronized.

What happens next varies. Often, a serial section of work must be done. In other cases, the tasks are automatically released to continue their work.
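This behavior maps directly onto Python's `threading.Barrier`, which can serve as a small illustration:

```python
import threading

barrier = threading.Barrier(4)   # releases only when 4 tasks have arrived
order = []
lock = threading.Lock()

def worker(i):
    with lock:
        order.append(('before', i))   # work done before the barrier
    barrier.wait()                    # block until all 4 tasks arrive
    with lock:
        order.append(('after', i))    # all tasks released together

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every 'before' record precedes every 'after' record,
# regardless of thread scheduling.
print(order)
```

Because each task appends its 'before' record strictly before calling `wait()`, and the barrier opens only after the last call, all four 'before' entries are guaranteed to come first.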
Synchronization by a lock/semaphore
Can involve any number of tasks.
Typically used to serialize (protect) access to global data or a section of code. Only one task at a time may use (own) the lock/semaphore.

The first task to acquire the lock "sets" it. This task can then safely (serially) access the protected data or code.

Other tasks can attempt to acquire the lock but must wait until the task that owns the lock releases it.
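A classic illustration with Python's `threading.Lock`: several tasks increment a shared counter, and the lock serializes the read-modify-write so that no update is lost.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:        # only one task owns the lock at a time
            counter += 1  # protected (serialized) update

threads = [threading.Thread(target=add_many, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: every increment was applied exactly once
```

Without the lock, two tasks could read the same old value of `counter` and one increment would overwrite the other.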
Synchronization by communication operations
Involves only those tasks executing a communication operation.

When a task performs a communication operation, some form of coordination is required with the other task(s) participating in the communication.

For example, with a synchronous send, before a task can transmit the data it must first receive an acknowledgment from the receiving task that it is OK to send.
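A minimal sketch of such a handshake, with two queues standing in for the communication channel; the 'ready' token plays the role of the acknowledgment:

```python
import queue
import threading

channel = queue.Queue()   # carries the actual data
acks = queue.Queue()      # carries the "OK to send" acknowledgment

def receiver(result):
    acks.put('ready')             # tell the sender it is OK to send
    result.append(channel.get())  # then block until the data arrives

def sender(data):
    acks.get()          # wait for the acknowledgment first
    channel.put(data)   # only now perform the send

result = []
r = threading.Thread(target=receiver, args=(result,))
s = threading.Thread(target=sender, args=(42,))
r.start(); s.start()
r.join(); s.join()
print(result)  # [42]
```

The send and receive themselves synchronize the two tasks: neither can get past the exchange until the other has participated.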
Data dependency – definition
A dependence exists between program statements when the order of statement execution affects the results of the program.

A data dependence results from multiple uses of the same location(s) in storage by different tasks.

Dependencies are important to parallel programming because they are one of the primary inhibitors to parallelism.
Example: Loop carried data dependence
DO j = my_start, my_end
   a(j) = a(j-1) * 2.0
END DO
The value of a(j-1) must be computed before the value of a(j); therefore a(j) exhibits a data dependency on a(j-1), and parallelism is inhibited. If task 2 has a(j) and task 1 has a(j-1), computing the correct value of a(j) necessitates:

Distributed memory architecture: task 2 must obtain the value of a(j-1) from task 1 after task 1 finishes its computation.
Shared memory architecture: task 2 must read a(j-1) after task 1 updates it.
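The effect of the dependence can be simulated in a short Python sketch (illustrative only). The "stale" version shows what goes wrong if every iteration reads the original a(j-1), as if all iterations ran at once without the required communication or synchronization:

```python
def serial(a, start, end):
    """Correct: each a[j] uses the freshly updated a[j-1]."""
    a = list(a)
    for j in range(start, end):
        a[j] = a[j - 1] * 2.0
    return a

def from_stale_values(a, start, end):
    """Wrong: every a[j] reads the ORIGINAL a[j-1], ignoring
    the loop-carried dependence."""
    old = list(a)
    a = list(a)
    for j in range(start, end):
        a[j] = old[j - 1] * 2.0
    return a

data = [1.0, 3.0, 5.0, 7.0]
print(serial(data, 1, 4))             # [1.0, 2.0, 4.0, 8.0]
print(from_stale_values(data, 1, 4))  # [1.0, 2.0, 6.0, 10.0]
```

The two results differ from a(2) onward, which is exactly why the iterations cannot simply be distributed among tasks without enforcing the dependence.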
How to Handle Data Dependencies?
Distributed memory architectures: communicate required data at synchronization points.

Shared memory architectures: synchronize memory access (read/write) operations between tasks.
Load balancing
Load balancing refers to the practice of distributing work among tasks so that all tasks are kept busy all of the time.

Load balancing can be considered a minimization of task idle time.

Load balancing is important to parallel programs for performance reasons.

Example: if all tasks are subject to a barrier synchronization point, the slowest task will determine the overall performance.
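The barrier example can be quantified with a toy model (the workload numbers are illustrative): with a barrier at the end, total time is the maximum of the per-task times, so spreading the same total work more evenly reduces it.

```python
# Per-task workloads in arbitrary time units (illustrative numbers).
unbalanced = [10, 10, 10, 40]
balanced = [17, 18, 17, 18]   # same total work, spread more evenly

def parallel_time(task_times):
    """With a barrier at the end, the slowest task sets the pace."""
    return max(task_times)

# Identical total work, but very different finishing times.
print(parallel_time(unbalanced), parallel_time(balanced))  # 40 18
```

Three tasks in the unbalanced case idle for 30 units each at the barrier; balancing the load cuts the overall time by more than half.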
Equally partition the work each task receives
For array operations where each task performs similar work, evenly distribute the data set among the tasks.

For loop iterations where the work done in each iteration is similar, evenly distribute the iterations across the tasks.

If a heterogeneous mix of machines with varying performance characteristics is being used, use performance analysis tools to detect any load imbalances and adjust the work accordingly.
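The even splitting of iterations described above can be sketched as a small helper (an illustration, not code from the slides; the name `block_range` is made up). Each of `p` tasks gets a contiguous block of the `n` iterations, with block sizes differing by at most one.

```python
def block_range(n, p, rank):
    """Half-open [lo, hi) range of iterations assigned to task `rank`
    when n iterations are split as evenly as possible among p tasks
    (the first n % p tasks each get one extra iteration)."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

# Example: 10 iterations over 4 tasks -> blocks of sizes 3, 3, 2, 2.
for rank in range(4):
    print(rank, block_range(10, 4, rank))
```

The blocks tile the iteration space exactly, so no iteration is computed twice or skipped.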
Use dynamic work assignment
Certain classes of problems result in load imbalances even if data is evenly distributed among tasks. Example: adaptive mesh methods – some tasks may need to refine their mesh while others don’t.

Use a scheduler/task pool approach: as each task finishes its work, it queues to get a new piece of work. It may become necessary to design an algorithm which detects and handles load imbalances as they occur dynamically within the code.
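A minimal sketch of the scheduler/task-pool idea, here using Python threads and a shared queue (the function names are assumptions for illustration): each worker pulls the next piece of work as soon as it finishes the previous one, so faster workers naturally process more pieces and the load balances itself.

```python
import queue
import threading

def run_task_pool(work_items, worker_fn, num_workers=4):
    """Distribute `work_items` dynamically among `num_workers` threads."""
    q = queue.Queue()
    for item in work_items:
        q.put(item)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = q.get_nowait()   # grab the next piece of work
            except queue.Empty:
                return                  # pool is drained; this worker exits
            r = worker_fn(item)
            with lock:                  # results list is shared, so guard it
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_task_pool(range(10), lambda x: x * x)` returns the ten squares in whatever order the workers finish them.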
Granularity
Computation / Communication Ratio:
In parallel computing, granularity is a qualitative measure of the ratio of computation to communication.

Periods of computation are typically separated from periods of communication by synchronization events.
Fine-grain Parallelism
Relatively small amounts of computation are done between communication events.

Low computation to communication ratio.

Makes load balancing easier.

Implies high communication overhead and less opportunity for performance enhancement.

Too fine granularity ⟹ the overhead required for communication and synchronization between tasks takes longer than the computation :(
Coarse-grain Parallelism
Relatively large amounts of computational work are done between communication/synchronization events.
High computation to communication ratio.
Implies more opportunity for performance increase.
Harder to load balance efficiently.
Fine vs. coarse-grain parallelism – which is best?
Depends on the algorithm and the hardware environment!

Usually the overhead associated with communications and synchronization is high ⟹ advantageous to have coarse granularity.

Fine-grain parallelism can help to reduce overheads due to load imbalance.
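One way to see the trade-off is a toy cost model (an assumption for illustration, not taken from the slides): charge each chunk of work one fixed communication/synchronization overhead, so for the same total work per task, finer granularity means more chunks and therefore more accumulated overhead.

```python
def step_time(total_work, num_chunks, sync_overhead):
    """Estimated time for one task: the work itself plus one
    communication/synchronization overhead paid per chunk."""
    return total_work + num_chunks * sync_overhead

# Same total work (1.0 time unit), same per-event overhead (0.01),
# but fine granularity splits the work into 100x more chunks.
fine   = step_time(total_work=1.0, num_chunks=1000, sync_overhead=0.01)
coarse = step_time(total_work=1.0, num_chunks=10,   sync_overhead=0.01)
print(fine, coarse)  # the fine-grained version pays far more overhead
```

The model is crude, but it captures why high-overhead environments favor coarse granularity, while cheap synchronization makes fine-grained (and better-balanced) decompositions viable.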