Exploring Parallel Computing

Fabian Frie | Numerical Methods in Quantum Physics | February 6th 2014


DESCRIPTION

Presentation for the Numerical Methods in Quantum Physics seminar, covering essential questions of parallel computing and comparing two major branches of parallelization, OpenMP and MPI, with two working examples: a simple matrix-matrix multiplication and the approximation of π.

TRANSCRIPT

Page 1: Exploring Parallel Computing

Exploring Parallel Computing

Fabian Frie

Numerical Methods in Quantum Physics

February 6th 2014

Page 2: Exploring Parallel Computing

Syllabus

1 Introduction
    What is Parallel Computing?
    Scalability

2 Parallel Programming Models
    Memory Models
    Exploring MPI
    Exploring OpenMP
    Comparison

3 Examples with OpenMP
    Matrix-Matrix Multiplication
    Approximation of π

4 Conclusion

Page 4: Exploring Parallel Computing

What is Parallel Computing? (Introduction)

• Parallelization is another optimization technique to reduce execution time
• Thread: a series of instructions for a processing unit
• Coarse-grain parallelism: parallelization achieved by distributing whole domains over different processors
• Fine-grain parallelism: parallelization achieved by distributing loop iterations equally over different processors (a sketch contrasting the two follows below)
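A minimal sketch (not part of the original slides) contrasting the two kinds of parallelism with OpenMP; the program name grain_demo and the array size n are arbitrary choices for illustration:

    program grain_demo
      use omp_lib
      implicit none
      integer, parameter :: n = 1000000
      integer :: i, tid, nth, lo, hi
      real(kind=8), allocatable :: x(:), y(:)

      allocate(x(n), y(n))
      x = 1d0
      y = 2d0

      ! Fine-grain: the iterations of a single loop are distributed over the threads
      !$omp parallel do private(i)
      do i = 1, n
         y(i) = 2d0*x(i) + y(i)
      enddo
      !$omp end parallel do

      ! Coarse-grain: each thread works on one contiguous subdomain of its own
      !$omp parallel private(tid, nth, lo, hi, i)
      tid = omp_get_thread_num()
      nth = omp_get_num_threads()
      lo  = tid*n/nth + 1
      hi  = (tid+1)*n/nth
      do i = lo, hi
         y(i) = 2d0*x(i) + y(i)
      enddo
      !$omp end parallel

      write(*,*) "y(1) =", y(1)
    end program grain_demo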

Page 8: Exploring Parallel Computing

Scalability I (Amdahl's Law)

• Define the speed-up with respect to the number of threads n by

      S(n) = Δt(1) / Δt(n)

• Unless the application is embarrassingly parallel, S(n) will deviate from the ideal curve
• Assume the program has a parallel fraction f; then with n processors the execution time changes according to

      Δt(n) = (f/n) Δt(1) + (1 − f) Δt(1)

Page 9: Exploring Parallel Computing

Scalability II (Amdahl's Law)

• Amdahl's Law states: if a fraction f of a program can be made parallel, then the maximum speed-up that can be achieved by using n threads is

      S(n) = 1 / ((1 − f) + f/n)
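A short derivation and numerical example (added here, not on the slide): inserting the expression for Δt(n) from the previous slide into the definition of the speed-up gives Amdahl's bound,

      S(n) = Δt(1) / Δt(n) = Δt(1) / [ (f/n) Δt(1) + (1 − f) Δt(1) ] = 1 / ((1 − f) + f/n).

For a parallel fraction f = 0.9 and n = 8 threads this gives S(8) = 1 / (0.1 + 0.9/8) ≈ 4.7, and even for n → ∞ the speed-up saturates at 1/(1 − f) = 10.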

Page 10: Exploring Parallel Computing

Scalability III (Amdahl's Law)

Page 12: Exploring Parallel Computing

Memory Architectures (Shared ↔ Distributed)

• Shared Memory Architectures
    • Symmetric Multi-Processor (SMP): a shared address space with equal access cost for each processor
    • Non-Uniform Memory Access (NUMA): different memory regions have different access costs
• Distributed Memory Architectures
    • Clusters: each processor acts on its own private memory space; for remote data, communication is required

Page 17: Exploring Parallel Computing

Shared Memory Architecture (Intel Core i7 980X Extreme Edition)

Page 18: Exploring Parallel Computing

Exploring MPI I (What is MPI?)

• MPI ≡ »Message Passing Interface«
• MPI is an extensive parallel programming API for distributed memory (clusters, grids)
• First introduced in 1994
• MPI supports C, C++, and Fortran
• All data is private to each processing unit
• Data communication must be programmed explicitly
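For comparison with the OpenMP example later in the talk, here is a minimal sketch (not from the original slides) of how the same π integration might look with MPI; the program name and the use of MPI_Reduce are illustrative choices, but all MPI calls are standard:

    program mpi_pi
      use mpi
      implicit none
      integer :: ierr, rank, nprocs
      integer(kind=8) :: ii, num_steps
      real(kind=8) :: step, xx, local_sum, pi

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      num_steps = 100000000_8
      step = 1d0 / dble(num_steps)
      local_sum = 0d0

      ! Each rank handles a strided subset of the iterations; all data is private
      do ii = rank, num_steps - 1, nprocs
         xx = (dble(ii) + 0.5d0) * step
         local_sum = local_sum + 4d0 / (1d0 + xx*xx)
      enddo

      ! Communication is explicit: sum the partial results onto rank 0
      call MPI_Reduce(local_sum, pi, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                      MPI_COMM_WORLD, ierr)

      if (rank == 0) write(*,*) "pi approx ", pi*step
      call MPI_Finalize(ierr)
    end program mpi_pi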

Page 19: Exploring Parallel Computing

Exploring MPI II (What is MPI?)

Pros
• Flexibility: can be used on a cluster of any size
• Widely available
• Widely used: popular in high-performance computing

Cons
• Requires a redesign of the application
• More resources required: typically more memory
• Error-prone and hard to debug, due to the many layers

Page 20: Exploring Parallel Computing

Exploring OpenMP (Parallel Programming Models)

• OpenMP ≡ »Open Multi Processing« (an API)
• OpenMP is built for shared memory architectures such as Symmetric Multi-Processing (SMP) machines
• Supports both coarse-grained and fine-grained parallelism
• Data can be shared or private
• All threads have access to the same, shared memory
• Synchronization is mostly implicit (see the sketch below)
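A minimal sketch (not from the slides) illustrating shared versus private data and the implicit synchronization at the end of a parallel region; the program name and variables are chosen purely for illustration:

    program omp_scope
      use omp_lib
      implicit none
      integer :: tid, counter

      counter = 0                      ! shared: visible to all threads

      !$omp parallel private(tid) shared(counter)
      tid = omp_get_thread_num()       ! private: each thread has its own copy
      !$omp atomic
      counter = counter + 1            ! atomic update of the shared variable
      !$omp end parallel               ! implicit barrier: all threads synchronize here

      write(*,*) "threads counted:", counter
    end program omp_scope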

Page 26: Exploring Parallel Computing

Comparison (Parallel Programming Models)

MPI
• popular, widely used
• ready for grids
• steep learning curve
• no data scoping (shared, private, ...)
• sequential code is not preserved
• requires only one library
• easier model
• requires a runtime environment

OpenMP
• popular, widely used
• limited to one system (SMP), not grid-ready
• easy to learn
• data scoping required
• preserves the sequential code
• requires compiler support
• performance issues are implicit (hidden)
• no runtime environment required

Page 36: Exploring Parallel Computing

Simple Tasks with OpenMP (Examples)

Matrix-Matrix Multiplication

    C = AB                                      (1)
    C_ij = Σ_k A_ik · B_kj                      (2)

Approximation of π

    ∫₀¹ 4 / (1 + x²) dx                         (3)
    = 4 [arctan(x)]₀¹ = π                       (4)
    Σ_{i=0}^{N} 4 / (1 + x_i²) · Δx ≈ π         (5)

⇒ How efficiently can these problems be parallelized?
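A bridging remark (not on the slide): the sum in (5) is a midpoint Riemann sum, which is what the Fortran code at the end of the talk evaluates, with

    Δx = 1/N,   x_i = (i + 1/2) Δx,

so the approximation error of the quadrature decreases like 1/N² as the number of steps grows.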

Page 40: Exploring Parallel Computing

Matrix-Matrix Multiplication (Examples)

Page 41: Exploring Parallel Computing

Approximation of π (Examples)

Page 42: Exploring Parallel Computing

Approximation of π I (Source Code)

program integ_pi
  use omp_lib
  implicit none

  integer(kind=8) :: ii, num_steps, jj
  integer :: tid, nthreads
  real(kind=8) :: step, xx, pi, summ, start_time, run_time

  num_steps = 100000000
  step = 1d0/dble(num_steps)

  do jj = 1,8                          ! number of requested threads
     pi = 0d0
     call omp_set_num_threads(jj)
     start_time = omp_get_wtime()
     nthreads = omp_get_num_threads()  ! outside a parallel region this returns 1

     !$omp single

Page 43: Exploring Parallel Computing

Approximation of π II (Source Code)

     write(*,*) "Number of threads: ", nthreads
     !$omp end single

     !$omp parallel do reduction(+:pi) private(ii,xx)
     do ii = 0,num_steps
        xx = (dble(ii)+0.5d0) * step
        pi = pi + 4d0 / (1d0 + xx*xx)
     enddo
     !$omp end parallel do

     run_time = omp_get_wtime() - start_time
     pi = pi * step
     write(*,*) "pi approx ", pi
     write(*,*) "wtime: ", run_time
  enddo
end program integ_pi
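One possible way to build and run this example (assuming gfortran and the hypothetical file name integ_pi.f90; any OpenMP-capable compiler with its corresponding flag works as well):

    gfortran -fopenmp integ_pi.f90 -o integ_pi
    ./integ_pi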

Page 44: Exploring Parallel Computing

Wrap Up (Prospects)

• Hybrid parallelism: combine MPI (between nodes) and OpenMP (within a node)
• Nested parallelism: divide-and-conquer principle
• Watch out for data races and deadlocks

Page 47: Exploring Parallel Computing


Thank you for your attention!

Enjoy your meal!

Page 50: Exploring Parallel Computing

Matrix-Matrix Multiplication I (Source Code)

program matmult
  use omp_lib
  implicit none

  integer nra, nca, ncb, tid, nthreads, ii, jj, kk, chunk, nn
  parameter (nra=900)
  parameter (nca=900)
  parameter (ncb=100)
  real*8 a(nra,nca), b(nca,ncb), c(nra,ncb), time

  chunk = 10                      ! block size for static loop scheduling
  do nn = 1,8                     ! number of requested threads
     call omp_set_num_threads(nn)
     !$omp parallel shared(a,b,c,nthreads,chunk) private(tid,ii,jj,kk)
     tid = omp_get_thread_num()

     ! !$omp single
     ! write(*,*) "threads: ", omp_get_num_threads()

Page 51: Exploring Parallel Computing

Matrix-Matrix Multiplication II (Source Code)

     ! !$omp end single

     ! initialize A and B in parallel
     !$omp do schedule(static,chunk)
     do ii = 1, nra
        do jj = 1, nca
           a(ii,jj) = (ii-1)+(jj-1)
        enddo
     enddo
     !$omp end do

     !$omp do schedule(static,chunk)
     do ii = 1, nca
        do jj = 1, ncb
           b(ii,jj) = (ii-1)*(jj-1)
        enddo
     enddo
     !$omp end do

Page 52: Exploring Parallel Computing

Matrix-Matrix Multiplication III (Source Code)

     ! zero the result matrix in parallel
     !$omp do schedule(static,chunk)
     do ii = 1, nra
        do jj = 1, ncb
           c(ii,jj) = 0d0
        enddo
     enddo
     !$omp end do

     ! timed matrix-matrix multiplication, rows distributed over the threads
     time = omp_get_wtime()
     !$omp do schedule(static,chunk)
     do ii = 1,nra
        do jj = 1,ncb
           do kk = 1,nca
              c(ii,jj) = c(ii,jj) + a(ii,kk) * b(kk,jj)
           enddo
        enddo
     enddo
     !$omp end do

Page 53: Exploring Parallel Computing

Matrix-Matrix Multiplication IV (Source Code)

     !$omp end parallel
     write(*,*) omp_get_wtime() - time
  enddo
end program matmult
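A closing remark on the code (not from the slides): schedule(static,chunk) hands the loop iterations to the threads round-robin in blocks of chunk (here 10) iterations. Note also that time is a shared variable written by every thread just before the multiplication loop; for a rough measurement this is usually harmless, but guarding the call to omp_get_wtime() with a single construct would be cleaner.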