The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

D.J. Lee and T.J. Downar

School of Nuclear Engineering, Purdue University

July, 2001


Page 1: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

D.J. Lee and T.J. Downar

School of Nuclear Engineering, Purdue University

July, 2001

Page 2: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Contents

• Introduction
• Parallelism in PARCS
• Parallel Performance of PARCS
• Cache Analysis
• Conclusions

Page 3: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Introduction

Page 4: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

PARCS

• “Purdue Advanced Reactor Core Simulator”
• U.S. NRC (Nuclear Regulatory Commission) Code for Nuclear Reactor Safety Analysis
• Developed at School of Nuclear Engineering of Purdue University
• A Multi-Dimensional Multi-Group Reactor Kinetics Code Based on Nonlinear Nodal Method

Page 5: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Nuclear Power Plant

[Figure: nuclear power plant, with the label "Nuclear Reactor Core"]

Page 6: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Equations Solved in PARCS

• Time-Dependent Boltzmann Transport Equation

$$\frac{1}{v}\frac{\partial}{\partial t}\psi(\mathbf{r},\mathbf{\Omega},E,t) + \mathbf{\Omega}\cdot\nabla\psi(\mathbf{r},\mathbf{\Omega},E,t) + \Sigma_t(\mathbf{r},E)\,\psi(\mathbf{r},\mathbf{\Omega},E,t) = \int d\Omega' \int dE'\,\Sigma_s(\mathbf{r},\mathbf{\Omega}'\to\mathbf{\Omega},E'\to E)\,\psi(\mathbf{r},\mathbf{\Omega}',E',t) + S(\mathbf{r},\mathbf{\Omega},E,t)$$

• T/H Field Equations
  – Heat Conduction Equation
  – Heat Convection Equation

Page 7: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Spatial Coupling

Thermal-Hydraulics:

• Computes new coolant/fuel properties
• Sends moderator temp., vapor and liquid densities, void fraction, boron conc., and average, centerline, and surface fuel temp.
• Uses neutronic power as heat source for conduction

Neutronics:

• Uses coolant and fuel properties for local node conditions
• Updates macroscopic cross sections based on local node conditions
• Computes 3-D flux
• Sends node-wise power distribution

Page 8: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

High Necessity of HPC for PARCS

• Acceleration Techniques in PARCS
  – Nonlinear CMFD Method: Global (Low Order) + Local (High Order)
  – BILU3D Preconditioned BICGSTAB
  – Wielandt Shift Method
• Still, Computational Burden of PARCS is Very Large
  – Typically, The Calculation Speed is More Than an Order of Magnitude Slower Than Real Time
  – Example
    • NEACRP Benchmark: Several Tens of Seconds for 0.5 sec. Simulation
    • PARCS/TRAC Coupled Run: 4 Hours for 100 sec. Simulation
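The "slower than real time" claim can be checked with one division. The sketch below uses only the PARCS/TRAC figures quoted above (4 hours of wall-clock time for a 100 sec. transient); the function name is illustrative and is not taken from PARCS.

```python
def slowdown_factor(wall_clock_seconds: float, simulated_seconds: float) -> float:
    """Return how many times slower than real time a run is."""
    if simulated_seconds <= 0:
        raise ValueError("simulated time must be positive")
    return wall_clock_seconds / simulated_seconds


# PARCS/TRAC coupled run from the slide: 4 hours for a 100 s transient.
coupled_factor = slowdown_factor(4 * 3600, 100)
print(f"PARCS/TRAC coupled run: {coupled_factor:.0f}x slower than real time")
```

A factor of 144 is consistent with the slide's "more than an order of magnitude" statement.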

Page 9: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Parallelism In PARCS

Page 10: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

PARCS Computational Modules

• CMFD: Solves the “Global” Coarse Mesh Finite Difference Equation
• NODAL: Solves “Local” Higher Order Differenced Equations
• XSEC: Provides Temperature/Fluid Feedback through Cross Sections (Coefficients of Boltzmann Equation)
• T/H: Solution of Temperature/Fluid Field Equations

Page 11: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Parallelism in PARCS

• NODAL and Xsec Modules:
  – Node by Node Calculation
  – Naturally Parallelizable
• T/H Module:
  – Channel by Channel Calculation
  – Naturally Parallelizable
• CMFD Module:
  – Domain Decomposition Preconditioning
  – Example: Split the Reactor into Two Halves
  – The Number of Iterations Depends on the Number of Domains
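Why the iteration count depends on the number of domains can be seen in a toy problem. The sketch below is not the PARCS solver (the slides name BILU3D-preconditioned BICGSTAB); it swaps in a plain block-Jacobi iteration on a small 1-D tridiagonal system, chosen only because it fits in a few lines. The matrix coefficients, problem size, and all function names are assumptions made for illustration.

```python
def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a tridiagonal system with constant diagonal
    `diag` and constant off-diagonal `off`."""
    n = len(rhs)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def residual_norm(diag, off, x, b):
    """Max-norm of b - A x for the full (undecomposed) system."""
    n = len(x)
    worst = 0.0
    for i in range(n):
        ax = diag * x[i]
        if i > 0:
            ax += off * x[i - 1]
        if i < n - 1:
            ax += off * x[i + 1]
        worst = max(worst, abs(b[i] - ax))
    return worst


def block_jacobi_iterations(n, n_domains, diag=2.2, off=-1.0,
                            tol=1e-10, max_iter=10_000):
    """Count block-Jacobi sweeps needed when n unknowns are split into
    n_domains contiguous domains, each solved exactly per sweep."""
    if n % n_domains:
        raise ValueError("n must be divisible by n_domains")
    size = n // n_domains
    b = [1.0] * n
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        x_new = [0.0] * n
        for dom in range(n_domains):
            lo, hi = dom * size, (dom + 1) * size
            rhs = b[lo:hi]
            # Interface coupling uses the neighbours' previous iterate.
            if lo > 0:
                rhs[0] -= off * x[lo - 1]
            if hi < n:
                rhs[-1] -= off * x[hi]
            x_new[lo:hi] = solve_tridiagonal(diag, off, rhs)
        x = x_new
        if residual_norm(diag, off, x, b) < tol:
            return iteration
    raise RuntimeError("block Jacobi did not converge")


iterations = {d: block_jacobi_iterations(16, d) for d in (1, 2, 4)}
print(iterations)
```

With one domain the "preconditioner" is an exact solve and a single sweep suffices; each extra interface delays the exchange of information between domains, so more sweeps are needed. The CMFD update counts reported on the performance slides grow with thread count in the same direction.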

Page 12: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Why Multi-Threaded Programming?

• Coupling of Domains
  – The Information of One Plane at the Interface of Two Domains Must Be Exchanged Between Them
  – The Size of Information to be Exchanged is NOT SMALL Compared with the Amount of Calculations for Each Domain
• Message Passing
  – Large Communication Overhead
• Multi-Threading
  – Shared Address Space
  – Negligible Communication Overhead

Page 13: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Multi-threaded Programming

• OpenMP
  – FORTRAN, C, C++
  – Simple Implementation based on Directives
• POSIX Threads
  – No Interface to FORTRAN
  – Developed FORTRAN-to-C Wrapper
  – Much Caution Required to Avoid Race Conditions

Page 14: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

POSIX THREADS WITH FORTRAN: nuc_threads

• Mixed language interface accessible to both Fortran and C sections of the code
• Minimal set of threads functions:
  – nuc_init(*ncpu): initializes mutex and condition variables.
  – nuc_frk(*func_name,*nuc_arg,*arg): creates the POSIX threads.
  – nuc_bar(*iam): used for synchronization.
  – nuc_gsum(*iam,*A,*globsum): used to get a global sum of an array updated by each thread.
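The four calls listed above map onto a familiar init / fork / barrier / global-sum pattern. The sketch below is a rough Python analogue built on the standard `threading` module, not the actual nuc_threads wrapper, whose source is not shown in these slides. The class `NucThreads`, its method names, and the `worker` function are hypothetical.

```python
import threading


class NucThreads:
    """Hypothetical Python stand-in for the four nuc_threads calls."""

    def __init__(self, ncpu):
        # cf. nuc_init: set up the synchronization objects once.
        self.ncpu = ncpu
        self._barrier = threading.Barrier(ncpu)
        self._partial = [0.0] * ncpu

    def fork(self, func, *args):
        # cf. nuc_frk: start one thread per CPU, each given its index iam.
        threads = [
            threading.Thread(target=func, args=(self, iam) + args)
            for iam in range(self.ncpu)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    def barrier(self):
        # cf. nuc_bar: block until every thread has arrived.
        self._barrier.wait()

    def global_sum(self, iam, local_values):
        # cf. nuc_gsum: each thread writes only its own slot, so no two
        # threads ever update the same memory location.
        self._partial[iam] = sum(local_values)
        self._barrier.wait()   # all partial sums are now written
        total = sum(self._partial)
        self._barrier.wait()   # all threads have read before any reuse
        return total


def worker(nuc, iam, data, results):
    chunk = data[iam::nuc.ncpu]          # this thread's share of the array
    results[iam] = nuc.global_sum(iam, chunk)


data = list(range(1, 101))
results = [None] * 4
nuc = NucThreads(4)
nuc.fork(worker, data, results)
print(results)
```

The second barrier in `global_sum` is the kind of detail the "much caution" remark on the previous slide points at: without it, a fast thread could start the next reduction and overwrite its slot while a slow thread is still summing.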

Page 15: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Implementation of OpenMP and Pthreads

[Figure: execution diagrams for OpenMP and Pthreads, each showing Thread 1 and Thread 2 between Begin and End, with Fork, Join, and Synchronization points marked; one thread segment is labeled "idle".]

Page 16: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Parallel Performance of PARCS

Page 17: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Applications

• Matrix Vector Multiplication
  – Subroutine “MatVec” of PARCS
  – Size of Matrix Is Same As NEACRP Benchmark
• NEACRP Reactor Transient Benchmark
  – Control Rod Ejection From Hot Zero Power Condition
  – Full 3-Dimensional Transient

Page 18: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Specification of Machine

| Platform | SUN ULTRA-80 | SGI ORIGIN 2000 |
|---|---|---|
| Number of CPUs | 2 | 32 |
| CPU Type | ULTRA SPARC II, 450 MHz | MIPS R10000, 250 MHz, 4-way superscalar |
| L1 Cache | 16 KB D-cache, 16 KB I-cache, Cache Line Size: 32 bytes | 32 KB D-cache, 32 KB I-cache, Cache Line Size: 32 bytes |
| L2 Cache | 4 MB | 4 MB per CPU, Cache Line Size: 128 bytes |
| Main Memory | 1 GB | 16 GB |
| Compiler | SUN Workshop 6 - FORTRAN 90 6.1 | MIPSpro Compiler 7.2.1 - FORTRAN 90 |

Page 19: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Specification of Machine

| Platform | LINUX Machine |
|---|---|
| Number of CPUs | 4 |
| CPU Type | Intel Pentium-III, 550 MHz |
| L1 Cache | 16 KB D-cache, 16 KB I-cache, Cache Line Size: ? bytes |
| L2 Cache | 512 KB |
| Main Memory | 1 GB |
| Compiler | NAGWare FORTRAN 90 Version 4.2 |

ftp://download.intel.com/design/PentiumIII/xeon/datashts/24509402.pdf

Slot 2 technology, 100MHz bus, non-blocking cache

Page 20: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Matrix-Vector Multiplication (MatVec Subroutine of PARCS)

Each cell gives time in seconds, with speedup relative to the serial run in parentheses. Column headings under OpenMP and Pthreads are the number of threads.

| Machine | Serial | OpenMP 1 | OpenMP 2 | OpenMP 4 | OpenMP 8 | Pthreads 1 | Pthreads 2 | Pthreads 4 | Pthreads 8 |
|---|---|---|---|---|---|---|---|---|---|
| SUN | 3.76 | 23.43 (0.16) | 13.26 (0.28) | - | - | 3.71 (1.02) | 1.93 (1.95) | - | - |
| SGI | 1.73 | 1.73 (1.00) | 0.92 (1.89) | 0.52 (3.30) *) | 0.37 (4.72) | 1.72 (1.01) | 1.80 (0.96) | 1.91 (0.91) | 1.96 (0.88) |

*) Core is Divided into 18 Planes
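The footnote about 18 planes hints at a load-balance limit: 18 planes cannot be split evenly over 4 or 8 threads. The sketch below estimates the best speedup possible if planes are the unit of work and every plane costs the same; both are assumptions, since the slides state only that the core is divided into 18 planes. Function names are illustrative.

```python
def plane_partition(n_planes, n_threads):
    """Split n_planes as evenly as possible; return planes per thread."""
    base, extra = divmod(n_planes, n_threads)
    return [base + 1 if t < extra else base for t in range(n_threads)]


def load_balance_bound(n_planes, n_threads):
    """Upper bound on speedup: total work over the busiest thread's work."""
    return n_planes / max(plane_partition(n_planes, n_threads))


for n in (2, 4, 8):
    print(n, plane_partition(18, n), round(load_balance_bound(18, n), 2))
```

Under these assumptions the bounds are 2.0, 3.6, and 6.0 for 2, 4, and 8 threads. The SGI OpenMP speedups in the table (1.89, 3.30, 4.72) stay below them, so part of the gap to ideal scaling at 4 and 8 threads may simply be uneven plane counts.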

Page 21: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Matrix-Vector Multiplication (Subroutine of PARCS)

[Figure: bar charts comparing OpenMP and Pthreads. SGI panel: 1, 2, 4, and 8 threads, Serial Run Time: 1.73 s. SUN panel: 1 and 2 threads, Serial Run Time: 3.76 s. Vertical axis from 0 to 5.]

Page 22: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

NEACRP Benchmark (Simulation with Multiple Threads)

[Figure: Transient Power. Power (%) from 0 to 500 versus TIME (sec) from 0 to 0.5, with curves for serial, 2 threads, 4 threads, and 8 threads.]

Page 23: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Parallel Performance (SUN)

Time (sec); Pthreads column headings are the number of threads.

| Module | Serial | Pthreads 1 | Pthreads 2 | Speedup |
|---|---|---|---|---|
| CMFD | 36.7 | 32.1 | 20.8 | 1.77 |
| Nodal | 11.5 | 11.3 | 6.4 | 1.78 |
| T/H | 29.6 | 27.9 | 14.5 | 2.04 |
| Xsec | 7.6 | 7.1 | 3.7 | 2.04 |
| Total | 85.4 | 78.5 | 45.5 | 1.88 |

# of Updates

| Module | Serial | Pthreads 1 | Pthreads 2 |
|---|---|---|---|
| CMFD | 445 | 445 | 456 |
| Nodal | 31 | 31 | 33 |
| T/H | 216 | 216 | 216 |
| Xsec | 225 | 225 | 226 |

Page 24: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Parallel Performance (SGI)

Time (sec), with speedup relative to the serial run in parentheses; OpenMP column headings are the number of threads.

| Module | Serial | OpenMP 1 | OpenMP 2 | OpenMP 4 | OpenMP 8 |
|---|---|---|---|---|---|
| CMFD | 19.8 | 19.3 | 12.1 (1.63) | 8.93 (2.21) | 8.85 (2.23) |
| Nodal | 9.0 | 9.2 | 5.8 (1.55) | 3.56 (2.53) | 2.87 (3.14) |
| T/H | 26.6 | 25.3 | 12.3 (2.17) | 8.92 (2.99) | 7.14 (3.73) |
| Xsec | 4.8 | 4.4 | 2.4 (2.01) | 1.37 (3.53) | 1.11 (4.35) |
| Total | 60.2 | 58.1 | 32.6 (1.85) | 22.8 (2.64) *) | 20.0 (3.02) *) |

# of Updates

| Module | Serial | OpenMP 1 | OpenMP 2 | OpenMP 4 | OpenMP 8 |
|---|---|---|---|---|---|
| CMFD | 445 | 445 | 456 | 497 | 565 |
| Nodal | 31 | 31 | 33 | 38 | 39 |
| T/H | 216 | 216 | 216 | 216 | 217 |
| Xsec | 225 | 225 | 226 | 228 | 227 |

*) Core is divided into 18 planes
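The total speedups on the SGI can be recomputed from the total times, and dividing by the thread count gives the parallel efficiency. The sketch below uses only the Total figures from this slide (60.2 s serial; 32.6, 22.8, and 20.0 s for 2, 4, and 8 threads); the function names are illustrative. Because the slide's times are rounded, the recomputed speedups can differ from the printed ones in the last digit.

```python
def speedup(serial_seconds, parallel_seconds):
    """Speedup relative to the serial run."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(speedup_value, n_threads):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup_value / n_threads


SGI_TOTAL_SERIAL = 60.2                       # seconds, from the slide
SGI_TOTAL_OPENMP = {2: 32.6, 4: 22.8, 8: 20.0}  # threads -> seconds

for n, seconds in SGI_TOTAL_OPENMP.items():
    s = speedup(SGI_TOTAL_SERIAL, seconds)
    print(f"{n} threads: speedup {s:.2f}, efficiency {parallel_efficiency(s, n):.2f}")
```

Efficiency falls well below one at 8 threads, which matches the slide's picture: CMFD barely improves from 4 to 8 threads while its update count keeps rising.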

Page 25: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Cache Analysis

Page 26: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Memory Access Time

[Figure: memory hierarchy, from CPU to L1 Cache to L2 Cache to Memory]

Typical Memory Access Cycles (SGI)

| Memory Access Type | Cycles |
|---|---|
| L1 cache hit | 2 |
| L1 cache miss satisfied by L2 cache hit | 8 |
| L2 cache miss satisfied from memory | 75 |

Page 27: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Cache Miss Measurements (SGI)

OpenMP column headings are the number of threads.

| Module | Cache | Serial | OpenMP 1 | OpenMP 2 | OpenMP 4 | OpenMP 8 |
|---|---|---|---|---|---|---|
| CMFD (BICG) | L1 | 477,691 | 479,474 | 258,027 | 156,461 | 105,733 |
| CMFD (BICG) | L2 | 28,242 | 29,650 | 17,007 | 11,751 | 9,309 |
| Nodal | L1 | 857,744 | 853,866 | 444,849 | 249,507 | 160,699 |
| Nodal | L2 | 54,163 | 55,534 | 33,846 | 19,016 | 12,848 |
| T/H (TRTH) | L1 | 165,133 | 60,587 | 39,419 | 25,850 | 19,816 |
| T/H (TRTH) | L2 | 9,551 | 9,512 | 9,673 | 6,451 | 4,620 |
| XSEC | L1 | 62,324 | 57,462 | 29,845 | 17,715 | 11,344 |
| XSEC | L2 | 9,456 | 9,518 | 5,517 | 3,737 | 2,578 |

Page 28: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Cache Miss & Speedup of XSEC Module (SGI)

[Figure: L2 MISSES (left axis, 0 to 10000) and SPEEDUP (right axis, 0 to 5) versus Number of CPUs (0 to 10).]

Page 29: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Cache Miss Ratio (SGI)

$$\text{Cache Miss Ratio} = \frac{\text{Cache Misses of Serial Execution}}{\text{Cache Misses of Parallel Execution}}$$

OpenMP column headings are the number of threads.

| Module | Cache | Serial | OpenMP 1 | OpenMP 2 | OpenMP 4 | OpenMP 8 |
|---|---|---|---|---|---|---|
| CMFD (BICG) | L1 | 1.00 | 1.00 | 1.85 | 3.05 | 4.52 |
| CMFD (BICG) | L2 | 1.00 | 0.95 | 1.66 | 2.40 | 3.03 |
| Nodal | L1 | 1.00 | 1.00 | 1.93 | 3.44 | 5.34 |
| Nodal | L2 | 1.00 | 0.98 | 1.60 | 2.85 | 4.22 |
| T/H (TRTH) | L1 | 1.00 | 2.73 | 4.19 | 6.39 | 8.33 |
| T/H (TRTH) | L2 | 1.00 | 1.00 | 0.99 | 1.48 | 2.07 |
| XSEC | L1 | 1.00 | 1.08 | 2.09 | 3.52 | 5.49 |
| XSEC | L2 | 1.00 | 0.99 | 1.71 | 2.53 | 3.67 |

Page 30: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Speedup Estimation Using Cache Misses

• Data Access Time

$$T_{total} = T_{L2} + T_{mem} = n_{L2}\,t_{L2} + n_{Mem}\,t_{Mem}$$

where
$T_{L2}$ = Total L2 cache access time
$T_{mem}$ = Total memory access time
$n_{L2}$ = Number of L1 data cache misses satisfied by L2 cache hit
$n_{Mem}$ = Number of L2 data cache misses satisfied from main memory
$t_{L2}$ = L2 cache access time for 1 word
$t_{Mem}$ = Main memory access time for 1 word.

• Speedup

$$S = \frac{T_{total}^{serial}}{T_{total}^{2th}}$$

where
$T_{total}^{serial}$ = Total data access time for serial execution
$T_{total}^{2th}$ = Total data access time for 2 threads execution.

Page 31: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Estimated 2-thread Speedup Based on Data Cache Misses for OpenMP on SGI

| Module | Measured Speedup | Predicted Speedup |
|---|---|---|
| CMFD (BICG) | 1.63 | 1.78 |
| Nodal | 1.55 | 1.80 |
| T/H (TRTH) | 2.17 | 2.04 |
| XSEC | 2.01 | 1.86 |
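The predicted column can be reproduced from numbers given earlier in the deck: the SGI access costs (8 cycles for an L1 miss satisfied by L2, 75 cycles for an L2 miss satisfied from memory) and the serial and 2-thread miss counts from the Cache Miss Measurements slide. The sketch below assumes that the number of L1 misses satisfied by L2 equals L1 misses minus L2 misses; the slides do not state this explicitly, but it is the reading that matches the printed predictions. Variable and function names are illustrative.

```python
T_L2_CYCLES = 8    # L1 cache miss satisfied by L2 cache hit (SGI)
T_MEM_CYCLES = 75  # L2 cache miss satisfied from memory (SGI)

# (L1 misses, L2 misses) for the serial run and the 2-thread OpenMP run.
CACHE_MISSES = {
    "CMFD (BICG)": {"serial": (477_691, 28_242), "2 threads": (258_027, 17_007)},
    "Nodal":       {"serial": (857_744, 54_163), "2 threads": (444_849, 33_846)},
    "T/H (TRTH)":  {"serial": (165_133, 9_551),  "2 threads": (39_419, 9_673)},
    "XSEC":        {"serial": (62_324, 9_456),   "2 threads": (29_845, 5_517)},
}


def data_access_time(l1_misses, l2_misses):
    """T_total = n_L2 * t_L2 + n_Mem * t_Mem, in cycles."""
    n_l2 = l1_misses - l2_misses  # assumption: L1 misses that hit in L2
    n_mem = l2_misses             # L2 misses go to main memory
    return n_l2 * T_L2_CYCLES + n_mem * T_MEM_CYCLES


def predicted_speedup(module):
    """S = T_total(serial) / T_total(2 threads) for one module."""
    counts = CACHE_MISSES[module]
    return data_access_time(*counts["serial"]) / data_access_time(*counts["2 threads"])


for name in CACHE_MISSES:
    print(f"{name}: predicted 2-thread speedup {predicted_speedup(name):.2f}")
```

This prints 1.78, 1.80, 2.04, and 1.86, the same values as the Predicted column. Note that the model counts data access time only, so it says nothing about compute time or synchronization cost.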

Page 32: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Conclusions

Page 33: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Conclusions

• Comparison of OpenMP and POSIX Threads
  – OpenMP is comparable to POSIX Threads in terms of Parallel Performance
  – OpenMP is much easier to Implement than POSIX Threads due to the Directive based Nature
• Cache Analysis
  – The Prediction of Speedup based on Data Cache Misses Agrees well with the Measured Speedup

Page 34: The Application of POSIX Threads And OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

Continuing Work

• Algorithmic
  - 3-D Domain Decomposition
• Software
  - SUN Compiler
  - Pthreads Scheduling on SGI
• Alternate Platforms