How We Use MPI: A Naïve Curmudgeon's View

Bronson Messer
Scientific Computing Group, Leadership Computing Facility, National Center for Computational Sciences, Oak Ridge National Laboratory
Theoretical Astrophysics Group, Oak Ridge National Laboratory
Department of Physics & Astronomy, University of Tennessee, Knoxville
Why do we (I and other idiot astrophysicists) use MPI?
• It is ubiquitous!
• … and everywhere it exists, it performs
  – OK, 'performs' connotes good performance, but 'poor' performance on a given platform is always met with alarm
  – AND, we have now figured out how to ameliorate some shortcomings in performance through avoidance
• … and it's pretty darn easy to use
  – Even 'modern' (i.e., grew up with their notion of 'computer' meaning 'information appliance') grad students can figure out how to program poorly using MPI in a matter of days
That's it!
Importantly, right now our need for expressiveness is close to being met.
Selected Petascale Science Drivers: We Have Worked with Science Teams to Understand and Define Specific Science Objectives

Combustion (S3D)
  Science driver: Predictive engineering: a simulation tool for new engine design
  Science objective: Understanding flame stabilization in lifted, autoigniting diesel fuel jets relevant to low-temperature combustion for engine design at realistic operating conditions
  Impact: Potential for a 50% increase in efficiency and 20% savings in petroleum consumption with lower-emission, leaner-burning engines

Fusion (GTC)
  Science driver: Understand and quantify physics and properties of ITER scaling and H-mode confinement
  Science objective: Strongly coupled and consistent wall-to-edge-to-core modeling of ITER plasmas; attain a realistic assessment of ignition margins
  Impact: ITER design and operation

Chemistry (MADNESS)
  Science driver: Computational catalysis
  Science objective: Describe large systems accurately with modern hybrid and meta density functional theory functionals
  Impact: Generate quantitative catalytic reaction rates and guide small-system calibration

Nanoscale Science (DCA++)
  Science driver: Material-specific understanding of high-temperature superconductivity theory
  Science objective: Understand the quantitative differences in the transition temperatures of high-temperature superconductors
  Impact: Macroscopic quantum effects at elevated temperatures (>150 K); new materials for power transmission and oxide electronics

Climate (POP)
  Science driver: Accurate representation of ocean circulation
  Science objective: Fully coupled eddy-resolving ocean and sea ice model to reduce the coupled-model biases where ice and deep-water parameters are governed by the accurate representation of current systems
  Impact: Reduce current uncertainties in the coupled ocean-sea ice system model

Geoscience (PFLOTRAN)
  Science driver: Perform multiscale, multiphase, multi-component modeling of a 3-D field CO2 injection scenario
  Science objective: Include an oil phase and a four-phase liquid-gas-aqueous-oil system to describe dissipation of the supercritical CO2 phase and escape of CO2 to the surface
  Impact: Demonstrate the viability of and potential for sequestration of anthropogenic CO2 in deep geologic formations

Astrophysics (CHIMERA)
  Science driver: Understand the core-collapse supernova mechanism for a range of progenitor star masses
  Science objective: Perform core-collapse simulations with sophisticated spectral neutrino transport, detailed nuclear burning, and general relativistic gravity
  Impact: Understand the origin of many elements in the Periodic Table and the creation of neutron stars and black holes
Science Workload: Job Sizes and Resource Usage of Key Applications

Code | 2007 Resource Utilization (M core-hours) | Projected 2008 Resource Utilization (M core-hours) | Typical Job Size in 2006-2007 (K cores) | Anticipated Job Size in 2008 (K cores)
CHIMERA | 2 (under development) | 16 | 0.25 (under development) | >10
GTC | 8 | 7 | 8 | 12
S3D | 6.5 | 18 | 8-12 | >15
POP | 4.8 | 4.7 | 4 | 8
MADNESS | 1 (under development) | 4 | 0.25 (under development) | >8
DCA++ | N/A (under development) | 3-8 | N/A (under development) | 4-16 (w/o disorder), >40 (with disorder)
PFLOTRAN | 0.37 (under development) | >2 | 1-2 (under development) | >10
AORSA | 0.61 | 1 | 15-20 | >20
Total aggregate allocation for CHIMERA production & GenASiS development this FY: 38 million CPU-hours (16M INCITE, 18M NSF, 4M NERSC)
Current Planned Pioneering Application Runs: Simulation Specs on the 250 TF Jaguar System*

Code | Quad-Core Nodes | Global Memory Reqm (TB) | Wall-Clock Time Reqm (hours) | Number of Runs | Local Storage Reqms (TB) | Archival Storage Reqms (TB) | Resolution and Fidelity
MADNESS 7824 48122 1012 5 50 | 600B coefficients
CHIMERA 78244045 168 100100 11 13 50 | 256x128x256 or 256x90x180; 20 energy groups, 14 alpha nuclei
GTC-S | 3900 | 40 | 36 | 2 | 3 | 50 | 600M grid points, 60B particles
GTC-C | 3900 | 60 | 36 | 2 | 5 | 50 | 400M grid points, 250B particles
DCA++ | 2000-6000 | 16-48 | 12 to 24 | 20 | 1 | 1 | Lattices of 16 to 32 sites; 80 to 120 time slices; O(10^2-10^3) disorder realizations
S3D | 7824 | 10 | 140 | 1 | 50 | 100 | 1B grid points, 15 μm grid spacing, 4 ns time step, 23 transport vars
POP | 2500 | 1 | 400 | 1 | 1 | 2 | 3600x2400x42 tripole grid (0.1°); 20-yr run; partial bottom cells; first with biogeochemistry at this scale
Multi-physics applications are very good present-day laboratories for multi-core ideas.
Current workhorse: mCHIMERA
  – Ray-by-ray MGFLD transport (E)
  – 3D (magneto)hydrodynamics
  – 150-species nuclear network

Possible future workhorse: bCHIMERA
  – Ray-by-ray Boltzmann transport (E)
  – 3D (magneto)hydrodynamics
  – 150-300-species nuclear network

The "Ultimate Goal"
  – Full 3D Boltzmann transport (E, φ)
  – 3D (magneto)hydrodynamics
  – 150-300-species nuclear network

Bruenn et al. (2006); Messer et al. (2007)
Pioneering Application: CHIMERA*
Physical Models and Algorithms

Physical models
• A "chimera" of three separate yet mature codes
  – Coupled into a single executable
• Three primary modules ("heads")
  – MVH3: stellar gasdynamics
  – MGFLD-TRANS: "ray-by-ray-plus" neutrino transport
  – XNET: thermonuclear kinetics
• The heads are augmented by
  – A sophisticated equation of state for nuclear matter
  – A self-gravity solver capable of an approximation to general-relativistic gravity

Numerical algorithms
• Directionally split hydrodynamics with a standard Riemann solver for shock capturing
• Solutions for ray-by-ray neutrino transport and thermonuclear kinetics are obtained during the radial hydro sweep
  – All necessary data for those modules is local to a processor during the radial sweep
  – Computed along each radial ray using only data that is local to that ray
• Physics modules are coupled with standard operator splitting (a toy sketch follows below)
  – Valid because the characteristic time scales for each module are widely disparate
• Neutrino transport solution: sparse linear solve, local to a ray
• Nuclear burning solution: dense linear solve, local to a zone

[Figure: early-time distribution of entropy in a 2D exploding core-collapse simulation]

* Conservative Hydrodynamics Including Multi-Energy Ray-by-ray Transport
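To make the coupling concrete, here is a toy, self-contained sketch of first-order (Lie) operator splitting, one standard form of the operator splitting mentioned above. The two stand-in operators (think "hydro" and "burn"), their rates, and the step size are invented for the illustration; this is not CHIMERA code.

```fortran
! Toy sketch of first-order (Lie) operator splitting: du/dt = -(a+b)*u is
! advanced by applying the "a" operator and then the "b" operator in
! sequence over each step. The rates a and b are deliberately disparate.
program splitting_sketch
  implicit none
  real(kind=8), parameter :: a = 1.0d0, b = 50.0d0   ! stand-in, widely separated rates
  real(kind=8) :: u, dt
  integer :: n
  u  = 1.0d0
  dt = 1.0d-3
  do n = 1, 1000
     u = u*exp(-a*dt)   ! sub-step 1: exact update for operator A alone
     u = u*exp(-b*dt)   ! sub-step 2: exact update for operator B alone
  end do
  ! For commuting linear operators the split result is exact; in general the
  ! splitting error is first order in dt, which is acceptable when the
  ! operators act on widely separated time scales.
  print '(a,es12.4,a,es12.4)', 'split u(t=1) = ', u, '  exact = ', exp(-(a+b)*1.0d0)
end program splitting_sketch
```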
CHIMERA is: a "chimera" of 3 separate, mature codes

VH1 (MVH3)
• Multidimensional hydrodynamics
• http://wonka.physics.ncsu.edu/pub/VH-1/
• Non-polytropic EOS
• 3D domain decomposition
  – Uses directional sweeps to define subcommunicators for the data transpose (MPI_Alltoall)
  – Results in all processes performing a 'several_to_several'
MVH3: Dicing instead of slicing

Using M*N processors; X data starts local to each proc:
  jcol = mod(mype, N)
  krow = mype / N
  mpi_comm_split(mpi_comm_world, krow, mype, mpi_comm_row)
  mpi_comm_split(mpi_comm_world, jcol, mype, mpi_comm_col)

[Diagram: the M*N ranks laid out as a logical grid in Y and Z; mype+1 runs 1 .. M*N, jcol = 0 .. N-1 across each row, krow = 0 .. M-1 down each column.]

Y hydro is done after transposing data only with processors having the same value of krow, via MPI_ALLTOALL( MPI_COMM_ROW ): I and J are transposed while K is kept constant. The Z sweep uses MPI_ALLTOALL( MPI_COMM_COL ) analogously.

zro(imax,js,ks): local data includes all of the X domain, but only portions of Y and Z. (A minimal code sketch of this pattern follows below.)
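Below is a minimal, self-contained sketch, assuming an M*N rank grid, of the communicator split and MPI_ALLTOALL transpose just described. It illustrates the pattern only and is not the MVH3 source; the grid dimensions (nrow_M, ncol_N) and block length (nloc) are made-up placeholders.

```fortran
! Sketch of the "dicing" communicator setup: row/column subcommunicators
! from MPI_COMM_SPLIT, then an all-to-all transpose within one row.
program dicing_sketch
  use mpi
  implicit none
  integer, parameter :: nrow_M = 4, ncol_N = 4     ! hypothetical M and N
  integer, parameter :: nloc = 8                   ! stand-in block length per partner
  integer :: ierr, mype, npes, jcol, krow
  integer :: mpi_comm_row, mpi_comm_col
  real(kind=8) :: sendbuf(nloc*ncol_N), recvbuf(nloc*ncol_N)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, mype, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, npes, ierr)
  if (npes /= nrow_M*ncol_N) then
     if (mype == 0) print *, 'run with ', nrow_M*ncol_N, ' ranks'
     call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
  end if

  ! Rank layout as described above: jcol indexes within a row of the
  ! processor grid, krow selects the row.
  jcol = mod(mype, ncol_N)
  krow = mype / ncol_N

  ! All ranks sharing krow form a "row" communicator; all ranks sharing
  ! jcol form a "column" communicator.
  call MPI_COMM_SPLIT(MPI_COMM_WORLD, krow, mype, mpi_comm_row, ierr)
  call MPI_COMM_SPLIT(MPI_COMM_WORLD, jcol, mype, mpi_comm_col, ierr)

  sendbuf = real(mype, kind=8)

  ! Transpose among the N ranks of one row before the Y sweep: every rank
  ! exchanges an equal-sized block with every other rank in MPI_COMM_ROW
  ! (the "several_to_several" pattern).
  call MPI_ALLTOALL(sendbuf, nloc, MPI_DOUBLE_PRECISION, &
                    recvbuf, nloc, MPI_DOUBLE_PRECISION, mpi_comm_row, ierr)

  ! The Z sweep would make the analogous call on MPI_COMM_COL.
  call MPI_FINALIZE(ierr)
end program dicing_sketch
```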
MGFLD-TRANS
• Multi-group (energy) neutrino radiation hydro solver
• GR corrections
• 4 neutrino flavors with many modern interactions included
• Flux limiter is "tuned" from Boltzmann transport simulations
XNET
• Nuclear kinetics solver
• Currently have implemented only an α network
• 150 species to be included in future simulations
• Custom interface routine written for CHIMERA
• All else is 'stock'
CHIMERA
How does CHIMERA work?
[Diagram: the r, ϑ, φ, and ν dimensions of the computational domain and the three modules, VH1/MVH3, MGFLD-TRANS, and XNET, that operate on them]
Example: XNET performance and implementation
• XNET runs at ~50% of peak on a single XT4 processor
  – Roughly 50% Jacobian build / 50% dense solve
• 1 XNET solve is required per SPATIAL ZONE (i.e. hundreds per ray)
• Best load balancing on a node, with OpenMP or a subcommunicator, is interleaved (a toy sketch follows below)

[Diagram: along a ray from r = 0 (hot, lots of burning) to r = rmax (cool, little burning), zones are assigned to workers in the interleaved pattern 1 2 3 4 1 2 3 4 …]
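The effect of interleaving can be seen with a toy serial sketch (not CHIMERA/XNET code): per-zone burn cost falls off with radius, so round-robin assignment of zones to workers evens out the work that blocked assignment piles onto whoever owns the hot inner zones. The zone count, worker count, and cost model below are invented for the illustration.

```fortran
! Toy comparison of blocked vs. interleaved assignment of radial zones.
program interleave_sketch
  implicit none
  integer, parameter :: nzones = 256, nworkers = 4   ! made-up sizes
  real :: cost(nzones), load(nworkers)
  integer :: iz, iw, blk

  ! Hypothetical per-zone burn cost: hot inner zones are expensive.
  do iz = 1, nzones
     cost(iz) = exp(-4.0*real(iz-1)/real(nzones-1))
  end do

  ! Blocked assignment: worker iw gets a contiguous slab of zones,
  ! so worker 1 ends up with nearly all of the burning work.
  blk = nzones/nworkers
  load = 0.0
  do iz = 1, nzones
     iw = (iz-1)/blk + 1
     load(iw) = load(iw) + cost(iz)
  end do
  print '(a,f8.3)', 'blocked     max/mean load: ', maxval(load)/(sum(load)/nworkers)

  ! Interleaved assignment: zone iz goes to worker mod(iz-1,nworkers)+1,
  ! i.e. the 1 2 3 4 1 2 3 4 ... pattern shown above.
  load = 0.0
  do iz = 1, nzones
     iw = mod(iz-1, nworkers) + 1
     load(iw) = load(iw) + cost(iz)
  end do
  print '(a,f8.3)', 'interleaved max/mean load: ', maxval(load)/(sum(load)/nworkers)
end program interleave_sketch
```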
Communication Patterns

Application | Collectives | Point-to-point | Asynchronous | Other
POP | 45% | 10% | 45% (MPI_Waitall)
GTC | 66% | 34%
PFLOTRAN | 95% | 5% (MPI_Barrier)
CHIMERA | 96% | 4%
AORSA | 65% | 35% (MPI_Wait)
S3D | 15% | 85%
LSMS (with one-sided comm.) | 5% | 15% | 85% (MPI-2 one-sided comm.: MPI_Win, MPI_Put, etc.)
LSMS (w/o one-sided comm.) | 45% | 55%

NOTE: absolute time for collectives is identical for both LSMS versions.
A lot of “big” codes don’t really stress the XT network
[Pie chart, "2007 INCITE": allocation by domain. Solar Physics 3.3%, Accelerator Physics 3.1%, Astrophysics 14.1%, Biology 4.8%, Chemistry 7.4%, Climate 13.6%, Computer Science 2.8%, Engineering 0.56%, Combustion 14.4%, Nuclear Physics 5.2%, Atomic Physics 1.4%, QCD 4.9%, Geosciences 1.2%, Fusion 7.2%, Materials Science 16.0%. Highlighted codes: MADNESS, DCA++, S3D, CHIMERA, POP, PFLOTRAN.]

[Scatter plot: GTC, S3D, POP, CHIMERA, DCA++, MADNESS, and PFLOTRAN placed on axes of Communication (0-100%) vs. Computation (0-100%).]

Distribution in this space depends upon the applications and the problem being simulated for a given application.
Relative Per Core Performance
Code | XT4 | 2 Socket F / SeaStar2 | 2 Socket F / Gemini
POP | 1.0 | 1.01 | 1.47
CHIMERA* | 1.0 | 0.89 | 1.03
GTC | 1.0 | 1.03 | 1.04
S3D | 1.0 | 0.89 | 1.01
PFLOTRAN | 1.0 | 1.23 | 1.39
MADNESS | 1.0 | 1.04 | 1.04

* Only hydrodynamics module used in benchmark
GenASiS development
GenASiS is not completely "wed" to a programming model yet
  – Lots of abstraction
  – Function overloading is used everywhere in the code
  – Many implementations are possible 'under the hood' (a small sketch of this pattern follows below)
Full, 3D rad-hydro simulations will require an exascale computer in any event, so we have time…
(Why, they couldn’t hit an elephant at this dis… [Gen. John Sedgwick, 1864])
Opinions and questions
• Ubiquity and performance are go/no-go metrics for any future methods/languages/ideas.
  – Does this present a chicken/egg conundrum: must things be built and tested on architectures not yet ready to exhibit the expected performance?
• Are the users of an exascale machine the present users of petascale-ish platforms? Is the mapping one-to-one?
• Writing code from scratch is not anathema, but you're lucky if you can afford to do it.
  – Even then, design decisions are often made during this process based not on wise reflection, but on attempts to snag the proverbial (but elusive) low-hanging fruit.