Exploring Parallel Computing
DESCRIPTION
Presentation in Numerical Methods in Quantum Physics, covering essential questions and comparing two major branches of parallelization, OpenMP and MPI, with two working examples: a simple matrix-matrix multiplication and the approximation of pi.
TRANSCRIPT
Exploring Parallel Computing
Fabian Frie
Numerical Methods in Quantum Physics
February 6th, 2014
F. Frie, Hauptseminar NMQP | Exploring Parallel Computing | February 2014
Syllabus
1 Introduction: What is Parallel Computing?, Scalability
2 Parallel Programming Models: Memory Models, Exploring MPI, Exploring OpenMP, Comparison
3 Examples with OpenMP: Matrix Matrix Multiplication, Approximation of π
4 Conclusion
What is Parallel Computing? (Introduction)
- Parallelization is another optimization technique to reduce execution time.
- Thread: a series of instructions for a processing unit.
- Coarse-grain parallelism: parallelization achieved by distributing domains over different processors.
- Fine-grain parallelism: parallelization achieved by distributing iterations equally over different processors.
Scalability I: Amdahl's Law
- Define the speed-up with respect to the number of threads n by S(n) = Δt(1) / Δt(n).
- Unless the application is embarrassingly parallel, S(n) will deviate from the ideal curve.
- Assume the program has a parallel fraction f; then with n processors the execution time changes according to Δt(n) = (f/n)·Δt(1) + (1 − f)·Δt(1).
Scalability II: Amdahl's Law
- Amdahl's Law states: if the fraction f of a program can be made parallel, then the maximum speedup that can be achieved by using n threads is S(n) = 1 / ((1 − f) + f/n).
Scalability III: Amdahl's Law
Memory Architectures: Shared ↔ Distributed
- Shared memory architectures
  - Symmetric Multi-Processor (SMP): a shared address space with equal access cost for each processor.
  - Non-Uniform Memory Access (NUMA): different memory regions have different access costs.
- Distributed memory architectures
  - Clusters: each processor acts on its own private memory space; for remote data, communication is required.
Shared Memory Architecture: Intel Core i7 980X Extreme Edition
Exploring MPI I: What is MPI?
- MPI ≡ »Message Passing Interface«
- MPI is an extensive parallel programming API for distributed memory (clusters, grids)
- First introduced in 1994
- MPI supports C, C++, and Fortran
- All data is private to each processing unit
- Data communication must be programmed explicitly
Exploring MPI II: What is MPI?
Pros
- Flexibility: can use any cluster of any size
- Widely available
- Widely used: popular in high-performance computing
Cons
- Requires a redesign of the application
- More resources required: typically more memory
- Error-prone and hard to debug, due to many layers
Exploring OpenMP: Parallel Programming Models
- OpenMP ≡ »Open Multi Processing« (API)
- OpenMP is built for shared memory architectures such as Symmetric Multi-Processing (SMP) machines
- Supports both coarse-grained and fine-grained parallelism
- Data can be shared or private
- All threads have access to the same, shared memory
- Mostly implicit synchronization
Comparison: Parallel Programming Models

MPI
- Popular, widely used
- Ready for grids
- Steep learning curve
- No data scoping (shared, private, ...)
- Sequential code is not preserved
- Requires only one library
- Easier model
- Requires a runtime environment

OpenMP
- Popular, widely used
- Limited to one system (SMP), not grid ready
- Easy to learn
- Data scoping required
- Preserves sequential code
- Requires compiler support
- Performance issues are implicit
- No runtime environment required
Simple Tasks with OpenMP: Examples

Matrix Matrix Multiplication
C = AB                                (1)
C_ij = Σ_k A_ik · B_kj                (2)

Approximation of π
∫₀¹ 4/(1 + x²) dx = [4 arctan(x)]₀¹ = π   (3, 4)
Σ_{i=0}^{N} 4/(1 + x_i²) · Δx ≈ π          (5)

⇒ How efficiently can these problems be parallelized?
Matrix Matrix Multiplication: Examples
Approximation of π: Examples
Approximation of π: Source Code

program integ_pi
  use omp_lib
  implicit none

  integer(kind=8) :: ii, num_steps, jj
  integer :: tid, nthreads
  real(kind=8) :: step, xx, pi, summ, start_time, run_time

  num_steps = 100000000
  step = 1d0/dble(num_steps)

  do jj = 1,8 ! Number of requested threads
    pi = 0d0
    call omp_set_num_threads(jj)
    start_time = omp_get_wtime()
    nthreads = omp_get_num_threads() ! outside a parallel region this returns 1

    !$omp single
    write(*,*) "Number of threads: ", nthreads
    !$omp end single

    !$omp parallel do reduction(+:pi) private(ii,xx)
    do ii = 0, num_steps-1 ! midpoints of the num_steps subintervals
      xx = (dble(ii)+0.5d0) * step
      pi = pi + 4d0 / (1d0 + xx*xx)
    enddo
    !$omp end parallel do

    run_time = omp_get_wtime() - start_time
    pi = pi * step
    write(*,*) "pi approx ", pi
    write(*,*) "wtime: ", run_time
  enddo
end program integ_pi
Wrap Up: Prospects
- Hybrid parallelism: combine MPI and OpenMP
- Nested parallelism: divide-and-conquer principle
- Pitfalls: data races and deadlocks
Thank you for your attention!
Enjoy your meal!
References
- Ruud van der Pas, Barbara Chapman, Gabriele Jost. Using OpenMP: Portable Shared Memory Parallel Programming. MIT Press, Cambridge.
- Miguel Hermanns. "Parallel Programming in Fortran 95 using OpenMP". School of Aeronautical Engineering, 2002.
- Timothy G. Mattson. "A Hands-on Introduction to OpenMP". OpenMP Architecture Review Board, 2008.
- Ruud van der Pas. "Basic Concepts in Parallelization". IWOMP 2010, CCS, University of Tsukuba, 2010.
- W. H. Press et al. Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge University Press, 2007.
,Page 24 | F. Frie, Haupseminar NMQP | Exploring Parallel Computing | February 2014
Matrix Matrix Multiplication ISource Code
1 program matmult2 use omp_lib3 implicit none4
5 integer nra, nca, ncb, tid, nthreads, ii, jj, kk, chunk,nn6 parameter (nra=900)7 parameter (nca=900)8 parameter (ncb=100)9 real*8 a(nra,nca), b(nca,ncb), c(nra,ncb), time
10
11 chunk = 1012 do nn = 1,813 call omp_set_num_threads(nn)14 !$omp parallel shared(a,b,c,nthreads,chunk) private(tid,ii,jj,kk)15 tid = omp_get_thread_num()16
17 ! !$omp single18 ! write(*,*) "threads: ", omp_get_num_threads()
,Page 25 | F. Frie, Haupseminar NMQP | Exploring Parallel Computing | February 2014
Matrix Matrix Multiplication IISource Code
19 ! !$omp end single20
21 !$omp do schedule(static,chunk)22 do ii = 1, nra23 do jj = 1, nca24 a(ii,jj) = (ii-1)+(jj-1)25 enddo26 enddo27 !$omp end do28
29 !$omp do schedule(static,chunk)30 do ii = 1, nca31 do jj = 1, ncb32 b(ii,jj) = (ii-1)*(jj-1)33 enddo34 enddo35 !$omp end do36
,Page 26 | F. Frie, Haupseminar NMQP | Exploring Parallel Computing | February 2014
Matrix Matrix Multiplication IIISource Code
37 !$omp do schedule(static,chunk)38 do ii = 1, nra39 do jj = 1, ncb40 c(ii,jj) = 0d041 enddo42 enddo43 !$omp end do44
45 time = omp_get_wtime()46 !$omp do schedule(static,chunk)47 do ii = 1,nra48 do jj = 1,ncb49 do kk =1,nca50 c(ii,jj) = c(ii,jj) + a(ii,kk) * b(kk,jj)51 enddo52 enddo53 enddo54 !$omp end do
,Page 27 | F. Frie, Haupseminar NMQP | Exploring Parallel Computing | February 2014
Matrix Matrix Multiplication IVSource Code
55
56 !$omp end parallel57 write(*,*) omp_get_wtime() - time58 enddo59 endprogram