
Page 1

Exploiting Concurrency at the Petascale in Neutron Transport

Young Investigators Symposium, Oak Ridge National Laboratory, October 13-15, 2008

Dinesh Kaushik, Michael Smith, Won Sik Yang, Barry Smith, Andrew Siegel

Argonne National Laboratory, Argonne, IL 60439 USA, [email protected]

Page 2

Organization of the Presentation

- Overview of the SHARP project at Argonne
- Neutron transport background
- Computational issues for neutronics
- Resource requirements for UNIC
- Parallel performance of UNIC
- Summary

Page 3

Simulation-based High-efficiency Advanced Reactor Prototyping (SHARP)

A tight integration of multiphysics and multiscale modeling of physics phenomena based on a first-principles approach:
- an integrated system of software tools that will describe the overall nuclear plant behavior in a high-fidelity way, and the coupling among the different phenomena taking place during reactor operation, ranging from neutronics to fuel behavior and from thermal-hydraulics to structural mechanics
- the ability to derive basic data and static and dynamic (operating conditions) properties from first-principles-based methodologies and fundamental experiments
- the ability to define and plug in new and different combinations of physics-module implementations to study different phenomena, define and combine different numerical techniques, configure the code easily to run on new platforms, and develop new physics components without expert knowledge of the entire system

Page 4

SHARP Collaborators

Advanced Thermal Modeling: ANL, LANL
Advanced Neutronics Modeling: ANL, ORNL, LANL, INL
Advanced Structural Mechanics: LLNL
Code Framework Design: ANL, LLNL

Page 5

UNIC: Neutronics Module in SHARP

- A 3D unstructured deterministic neutron transport code
- Solves:
  - the second-order form of the transport equation using FEM (PN2ND and SN2ND)
  - the first-order form by the method of characteristics (MOC)
- Parallel implementation using PETSc solvers
- Scales to over 80,000 processors of BlueGene/P at Argonne; larger computations are in progress

Page 6

Homogenization at various levels

Homogenized assembly

Homogenized assembly internals

Homogenized pin cells

Fully explicit assembly

Page 7

Fine Detail: Wire-Wrapped Pins in a Subassembly

Starting point for TH simulation development and deployment:

- The most important contributor to subassembly energy redistribution is interchannel cross flow
- A better understanding of flow distribution (interior, edge, corner) can lead to improved subchannel models and potentially improved designs
- Resolving the wire wrap (diameter = 0.11 cm) leads to 10-100 billion element meshes and about 10^15 degrees of freedom (DOF) for the advanced burner test reactor (ABTR) core (2.3 m in diameter and 3.5 m long)

[Figure: subassembly cross section showing fuel pins with wire wrap, the duct wall, and interior, edge, and corner subchannels; dimensions H, D, and P labeled.]

Page 8

The Steady State Transport Equation (p46 in Lewis)

\[
\hat\Omega\cdot\nabla\psi(\vec r,\hat\Omega,E) + \Sigma_t(\vec r,E)\,\psi(\vec r,\hat\Omega,E)
= \int\!\!\int \Sigma_s(\vec r,\hat\Omega'\!\to\!\hat\Omega,\,E'\!\to\!E)\,\psi(\vec r,\hat\Omega',E')\,d\Omega'\,dE'
\]
\[
\qquad + \frac{1}{k}\,\chi(\vec r,E)\int\!\!\int \nu(\vec r,E')\,\Sigma_f(\vec r,E')\,\psi(\vec r,\hat\Omega',E')\,d\Omega'\,dE' + S(\vec r,\hat\Omega,E)
\]

where

- \(\psi(\vec r,\hat\Omega,E)\): the neutron flux (neutron density multiplied by speed)
- \(\Sigma_t(\vec r,E)\): the total probability of interaction in the domain
- \(\Sigma_s(\vec r,\hat\Omega'\to\hat\Omega,E'\to E)\,d\Omega'\,dE'\): the scattering transfer kernel
- \(\chi(\vec r,E)\,\nu(\vec r,E)\,\Sigma_f(\vec r,E)\): the steady-state multiplicative fission source
- \(S(\vec r,\hat\Omega,E)\): a fixed source; if a fixed source is present, then \(k = 1\)
- \(k\): the multiplication eigenvalue

Page 9

Solving the Eigenvalue Problem

Cast the transport equation as a pseudo matrix-vector operation, \(A x = \lambda x\), with T = streaming/collision/scattering and F = fission:

\[
\psi = \left[\,\psi_1(\vec r,\hat\Omega)\;\;\psi_2(\vec r,\hat\Omega)\;\cdots\;\psi_G(\vec r,\hat\Omega)\,\right]^T
\]
\[
T_g\,\psi = \hat\Omega\cdot\nabla\psi_g(\vec r,\hat\Omega) + \Sigma_{t,g}(\vec r)\,\psi_g(\vec r,\hat\Omega)
- \sum_{g'=1}^{G}\int \Sigma_{s,g'\to g}(\vec r,\hat\Omega'\!\to\!\hat\Omega)\,\psi_{g'}(\vec r,\hat\Omega')\,d\Omega'
\]
\[
F_g\,\psi = \chi_g(\vec r)\sum_{g'=1}^{G}\int \nu(\vec r)\,\Sigma_{f,g'}(\vec r)\,\psi_{g'}(\vec r,\hat\Omega')\,d\Omega'
\]

The multigroup problem \(T\psi = \frac{1}{k}F\psi\) then takes the standard eigenvalue form with

\[
A = T^{-1}F, \qquad x = \psi, \qquad \lambda = k.
\]

Page 10

Features of Second Order Form Solutions in UNIC

PN2ND and SN2ND solvers have been developed to solve the steady-state, second-order, even-parity neutron transport equation:

- PN2ND: spherical harmonics method in 1D, 2D, and 3D geometries with FE mixed-mesh capabilities
- SN2ND: discrete ordinates in 2D and 3D geometries with FE mixed-mesh capabilities

These second-order methods have been implemented on large-scale parallel machines:

- Linear tetrahedral and quadratic hexahedral elements
- Fixed source and eigenvalue problems
- Arbitrarily oriented reflective and vacuum boundary conditions
- PETSc solvers are utilized to solve the within-group equations (see the sketch below)
  - Conjugate gradient method with SSOR and ICC preconditioners
  - Other solution methods and preconditioners will be investigated
- Synthetic diffusion acceleration for the within-group scattering iteration
- Power iteration method for the eigenvalue problem
  - Various acceleration schemes are being investigated
- MeTiS is employed for mesh partitioning
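The within-group solves map directly onto PETSc's KSP interface. The sketch below (current PETSc API; a minimal illustration, not UNIC source) configures the CG + ICC combination named above, assuming the within-group matrix A and right-hand side b are already assembled:

#include <petscksp.h>

/* Minimal sketch: solve one within-group system A x = b with CG + ICC. */
PetscErrorCode solve_within_group(Mat A, Vec b, Vec x)
{
  KSP ksp;
  PC  pc;

  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPCG));   /* conjugate gradient */
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCICC));     /* serial ICC; in parallel, e.g.
                                          -pc_type bjacobi -sub_pc_type icc */
  PetscCall(KSPSetFromOptions(ksp));   /* allow -ksp_type/-pc_type overrides */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&ksp));
  return PETSC_SUCCESS;
}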

Page 11

Power Distribution for ABTR Full Core Benchmark

[Figure: group 1 flux and power distribution.]

Sample UNIC solution for the ABTR: (left) MeTiS decomposition for 512 processors; (center) homogenized fuel regions extracted to display the power distribution, with the lower remnant showing the MeTiS decomposition; (right) vertical slice through the model showing the power distribution.

Page 12

k-Eigenvalue Power Iteration

Steady-state multigroup eigenvalue and/or fixed-source iterations for one-, two-, and three-dimensional unstructured finite element mesh geometries.

Main loop (power iteration):

Begin Outer Iteration
  Begin Loop over energy groups
    Obtain group scattering + fission + fixed sources
    Solve a symmetric positive definite linear system for the flux
      (preconditioned conjugate gradient)
  End Loop over energy groups
  Compute total fission source
  Check for convergence in eigenvalue, angular flux, and sources
End Outer Iteration
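To make the loop concrete, here is a self-contained toy version in C (pedagogical numbers only; a direct 2x2 solve stands in for the preconditioned-CG within-group solves, and only eigenvalue convergence is checked):

#include <math.h>
#include <stdio.h>

/* y = A*x for 2x2 matrices */
static void mv2(double A[2][2], double x[2], double y[2])
{
  y[0] = A[0][0]*x[0] + A[0][1]*x[1];
  y[1] = A[1][0]*x[0] + A[1][1]*x[1];
}

int main(void)
{
  double T[2][2] = {{ 2.0, -0.3}, {-0.5, 1.5}}; /* streaming/collision/scattering */
  double F[2][2] = {{ 1.2,  0.8}, { 0.4, 0.1}}; /* fission */
  double psi[2]  = {1.0, 1.0};                  /* initial flux guess */
  double k = 1.0;

  for (int outer = 0; outer < 200; ++outer) {   /* outer (power) iteration */
    double S[2], rhs[2], psin[2], Sn[2], det, knew;
    mv2(F, psi, S);                             /* fission source F*psi    */
    rhs[0] = S[0]/k; rhs[1] = S[1]/k;           /* scale by 1/k            */
    det = T[0][0]*T[1][1] - T[0][1]*T[1][0];    /* solve T*psin = rhs      */
    psin[0] = ( T[1][1]*rhs[0] - T[0][1]*rhs[1]) / det;
    psin[1] = (-T[1][0]*rhs[0] + T[0][0]*rhs[1]) / det;
    mv2(F, psin, Sn);                           /* updated fission source  */
    knew = k * (Sn[0] + Sn[1]) / (S[0] + S[1]); /* eigenvalue update       */
    psi[0] = psin[0]; psi[1] = psin[1];
    if (fabs(knew - k) < 1e-12) { k = knew; break; }
    k = knew;
  }
  printf("multiplication eigenvalue k = %.6f\n", k);
  return 0;
}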

Page 13

Resource Requirements for UNIC: Performance – A Crude Analysis

The conjugate gradient (CG) method is O(n·i), where n = number of nonzeroes in the matrix (n = p·m·q). For a well-conditioned system, the number of iterations (i) needed to converge is small.

The matrix-vector product dominates the execution time:
- Two matrix-vector products per CG iteration
- 2n floating point operations (flops) are needed for each matrix-vector product
- Vector operations (dot products and scaling) are neglected in the serial case, since n >> m·q

Page 14

Resource Requirements for UNIC: Storage (PN2ND Solver)

Massive linear systems: with N energy groups, m mesh points, q angular terms, and p nonzeroes per row, m·q·p × 8 × 1.5 bytes are required to store the matrix in compressed row format. Since the matrix is symmetric, only half of this is needed.

As an estimate, with 4000 nonzeroes in each row (p) and 10^8 rows (m = 10^6 and q = 100):
- 2.4 terabytes (TB) of memory per group for the matrix alone (800 MB for one vector), and 24 petabytes (PB) for 10,000 groups
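For concreteness, the arithmetic behind these figures (a worked check of the slide's own numbers; 8 × 1.5 = 12 bytes per stored nonzero in compressed row format):

\[
\tfrac{1}{2}\,(m\,q)\,p \times 12\ \text{bytes} = \tfrac{1}{2}\,(10^{8})(4000)(12)\ \text{bytes} = 2.4\times10^{12}\ \text{bytes} \approx 2.4\ \text{TB per group},
\]
\[
(m\,q)\times 8\ \text{bytes} = 8\times10^{8}\ \text{bytes} = 800\ \text{MB per vector},\qquad 10^{4}\ \text{groups}\times 2.4\ \text{TB} = 24\ \text{PB}.
\]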

Page 15

Resource Requirements for UNIC: Performance (contd.)

For i iterations of CG, O outer iterations, and N groups (assuming the same number of inner iterations for all groups), the total work is 4·n·i·N·O flops.

CG performance is memory-bandwidth limited:
- 10-20% of machine peak is practical

For i = 100, O = 50, N = 10,000, and n = 4×10^11, the total work is 8×10^19 flops (80,000 petaflops):
- At 100 TFlop/s, 1111 hrs
- At 1 PFlop/s, 111 hrs
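A worked check of these figures; note that the quoted wall-clock times correspond to sustaining roughly 20% of peak, the upper end of the range above:

\[
4\,n\,i\,N\,O = 4\times(4\times10^{11})\times 100\times 10^{4}\times 50 = 8\times10^{19}\ \text{flops},
\]
\[
\frac{8\times10^{19}}{0.2\times10^{14}\ \text{flop/s}} = 4\times10^{6}\ \text{s} \approx 1111\ \text{hrs},\qquad
\frac{8\times10^{19}}{0.2\times10^{15}\ \text{flop/s}} = 4\times10^{5}\ \text{s} \approx 111\ \text{hrs}.
\]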

Page 16

Parallelism in Space - Key Features of the Implementation Strategy

- Follow the "owner computes" rule under the dual constraints of minimizing the number of messages and overlapping communication with computation
- Each processor "ghosts" its stencil dependences in its neighbors
- Ghost nodes are ordered after the contiguous owned nodes
- The domain is mapped from the (user) global ordering into local orderings
- Scatter/gather operations are created between local sequential vectors and global distributed vectors, based on runtime connectivity patterns
- Krylov-Schwarz operations are translated into local tasks and communication tasks
- Profiling is used to help eliminate performance bottlenecks in communication and the memory hierarchy
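PETSc supports exactly this owned-then-ghost layout. A minimal sketch (assuming PETSc's ghosted-vector API; the ghost index list would come from the runtime mesh connectivity):

#include <petscvec.h>

/* Sketch: owned entries first, ghost copies of neighbor-owned entries
   appended after them; scatters are generated from the connectivity. */
PetscErrorCode make_ghosted_field(PetscInt nlocal, PetscInt nghost,
                                  const PetscInt ghost_global_ids[], Vec *x)
{
  Vec xloc;

  PetscCall(VecCreateGhost(PETSC_COMM_WORLD, nlocal, PETSC_DETERMINE,
                           nghost, ghost_global_ids, x));

  /* ... fill owned entries, then refresh ghost copies from their owners */
  PetscCall(VecGhostUpdateBegin(*x, INSERT_VALUES, SCATTER_FORWARD));
  /* interior computation can overlap the communication here */
  PetscCall(VecGhostUpdateEnd(*x, INSERT_VALUES, SCATTER_FORWARD));

  /* local (sequential) form: owned entries followed by the ghost entries */
  PetscCall(VecGhostGetLocalForm(*x, &xloc));
  /* ... apply the locally owned part of the operator using xloc ... */
  PetscCall(VecGhostRestoreLocalForm(*x, &xloc));
  return PETSC_SUCCESS;
}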

Page 17

Parallelism in Angle and Energy

- Recently added capability to parallelize by space, angle, and energy in UNIC
  - The previous implementation only considered spatial parallelization
  - Memory limitations prevented large problems from being executed
- Implemented matrix-free CG solvers for PN2ND and SN2ND (see the sketch below)
  - The new version uses a single stenciled preconditioner matrix for each angle-group partition (far less memory)
  - Partitions can be solved simultaneously
  - The current fission source iteration scheme still uses Gauss-Seidel in energy
- Added fission source acceleration
  - Tchebychev (Chebyshev) acceleration
  - Self-adjusting within-group flux error tolerance to minimize preconditioner effort
  - Time to solution reduced by a factor of 10+
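In PETSc terms, such a matrix-free CG pairs a shell matrix, whose action is computed on the fly, with the explicitly stored stenciled preconditioner matrix. A minimal sketch; apply_transport_action is a hypothetical name for the user routine that computes y = A*x without storing A:

#include <petscksp.h>

extern PetscErrorCode apply_transport_action(Mat A, Vec x, Vec y);

PetscErrorCode solve_matrix_free(void *ctx, PetscInt nloc, PetscInt nglob,
                                 Mat Pmat, Vec b, Vec x)
{
  Mat A;
  KSP ksp;

  /* Shell matrix: only its action is defined, so the full within-group
     operator is never assembled (the memory saving described above).  */
  PetscCall(MatCreateShell(PETSC_COMM_WORLD, nloc, nloc, nglob, nglob, ctx, &A));
  PetscCall(MatShellSetOperation(A, MATOP_MULT,
                                 (void (*)(void)) apply_transport_action));

  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, Pmat)); /* CG on A; preconditioner built
                                               from the stenciled Pmat only */
  PetscCall(KSPSetType(ksp, KSPCG));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&ksp));
  PetscCall(MatDestroy(&A));
  return PETSC_SUCCESS;
}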

Page 18

Energy Group Parallelism Issues

[Figure: scattering cross-section stenciling for a sodium fast reactor fuel assembly (230 groups) and a PWR fuel assembly (172 groups).]

Page 19

Angular Convergence of Eigenvalue of ABTR Core Using UNIC on BlueGene/P

SN2ND Solver
- 9 energy groups
- Mesh (assembly homogenized):
  - 187,560 hexahedral quadratic elements and 785,801 vertices
  - Spread over 512 processor cores

Processor Cores | Angular Resolution | Angles in 2π | Space-Angle DOF | Eigenvalue
 2,048 |  1 |   4 |   3,143,204 | 1.00633
 4,608 |  2 |   9 |   7,072,209 | 1.00823
 8,192 |  3 |  16 |  12,572,816 | 1.00754
12,800 |  4 |  25 |  19,645,025 | 1.00776
18,432 |  5 |  36 |  28,288,836 | 1.00773
25,088 |  6 |  49 |  38,504,249 | 1.00781
32,768 |  7 |  64 |  50,291,264 | 1.00780
41,472 |  8 |  81 |  63,649,881 | 1.00782
51,200 |  9 | 100 |  78,580,100 | 1.00781
61,952 | 10 | 121 |  95,081,921 | 1.00782
73,728 | 11 | 144 | 113,155,344 | 1.00782

Page 20

Weak Scaling of UNIC on BlueGene/P

SN2ND Solver, 2,048 to 73,728 cores (virtual node mode)

Page 21

Weak Scaling of UNIC on BlueGene/P (contd.)

Load imbalance in reduce operations; need to balance boundary vertices

Page 22

Path forward - Areas for Collaboration

- More efficient custom preconditioners that take advantage of the matrix sparsity pattern
- Parallelize across groups to increase the amount of concurrency
- Load balancing to be handled through better partitioning
- Explore the hybrid (mixed MPI/OpenMP) programming model
  - Better algorithmic convergence rate
- Memory-reducing algorithms for matrix-vector products
  - Tensor matrix-vector product implementation (see the sketch below)
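One standard memory-reducing scheme of this kind (the subject of the tensor-product reference on the publications slide) exploits Kronecker structure: if an operator factors as A ⊗ B, then y = (A ⊗ B) x can be computed from the small factors via the identity (A ⊗ B) vec(X) = vec(B X Aᵀ), without ever forming the large product. A self-contained sketch, illustrative only and not taken from UNIC:

#include <stdlib.h>

/* y = (A kron B) x without forming the (na*nb) x (na*nb) Kronecker product.
   x is read as X (nb x na, column-major: X(i,j) = x[j*nb + i]).  Storage
   for the operator drops from (na*nb)^2 to na^2 + nb^2 entries, and the
   cost from O((na*nb)^2) to O(na*nb*(na + nb)).                          */
void kron_matvec(int na, int nb,
                 const double *A,  /* na x na, row-major */
                 const double *B,  /* nb x nb, row-major */
                 const double *x,  /* length na*nb       */
                 double *y)        /* length na*nb       */
{
  double *Z = malloc((size_t)na * nb * sizeof *Z);

  /* Stage 1: Z = B * X */
  for (int j = 0; j < na; ++j)
    for (int i = 0; i < nb; ++i) {
      double s = 0.0;
      for (int l = 0; l < nb; ++l)
        s += B[i*nb + l] * x[j*nb + l];
      Z[j*nb + i] = s;
    }

  /* Stage 2: Y = Z * A^T, then y = vec(Y) */
  for (int j = 0; j < na; ++j)
    for (int i = 0; i < nb; ++i) {
      double s = 0.0;
      for (int k = 0; k < na; ++k)
        s += Z[k*nb + i] * A[j*na + k];
      y[j*nb + i] = s;
    }

  free(Z);
}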

Page 23

Summary

- SHARP is striving to be a high-fidelity reactor analysis framework for the community
  - Employ tight coupling among the multiphysics components to improve accuracy
  - Competing with legacy codes (in memory and execution time) is important
- Our neutronics code UNIC shows reasonable scalability on current large platforms
- Computational challenges need to be tackled at the modeling, algorithmic, and architectural levels
  - Progressively reduce reliance on homogenization
    - Mesh generation is crucial
  - Scalable algorithms to exploit petascale computing power
  - Available memory and memory bandwidth will stay the limiting factors in the near future

Page 24

"I would rather have today's algorithms on yesterday's computers than vice versa " --- Phillipe Tointcomputers than vice versa. Phillipe Toint

Page 25

Relevant Publications

A. Siegel, A. Caceres, P. Fischer, D. Kaushik, G. Palmiotti, C. Rabiti, M. A. Smith, T. Tautges, W. S. Yang, "Advanced Simulation for Fast Reactor Analysis," SciDAC Review, Issue 9, Fall 2008, pp. 12-21, http://www.scidacreview.org/0803/pdf/nuclear.pdf

M. Smith, C. Rabiti, D. Kaushik, W. S. Yang, G. Palmiotti, "Report on Advanced Neutronics Code Development in FY 2007," ANL Report ANL-AFCI-209, May 2008.

M. A. Smith, G. Palmiotti, C. Rabiti, D. Kaushik, A. Siegel, B. Smith, E. E. Lewis, "PNFE Component of the UNIC Code," Joint International Topical Meeting on Mathematics & Computation and Supercomputing in Nuclear Applications (M&C + SNA 2007), Monterey, California, April 15-19, 2007.

D. Kaushik, W. Gropp, M. Minkoff, B. Smith, "Improving the Performance of Tensor Matrix Vector Multiplication in Cumulative Reaction Probability Based Quantum Chemistry Codes," accepted for the Proceedings of the 15th International Conference on High Performance Computing (HiPC 2008), Bangalore, India, December 2008.

Page 26

Acknowledgements

This work has been supported by the DOE Office of Nuclear Energy, Science and Technology through the Advanced Fuel Cycle Initiative (AFCI).
Thanks to Cristian Rabiti and Giuseppe Palmiotti of INL for their contributions to UNIC, and to David Keyes of Columbia University and Jean Ragusa of Texas A&M for many interesting discussions.
Computer time was provided by ORNL on the Cray XT3/4, ANL for BlueGene/P, and SDSC for TeraGrid.