One-day Meeting, INI, September 26th, 2008
Role of spectral turbulence simulations in developing HPC systems
YOKOKAWA, Mitsuo
Next-Generation Supercomputer R&D Center, RIKEN
Background
Experience of developing the Earth Simulator, a 40 Tflops vector-type distributed-memory supercomputer system.
A simulation code for box turbulence flow was used in the final adjustment of the system, and a large simulation of box turbulence flow was carried out.
A Peta-flops supercomputer project
Contents
Simulations on the Earth Simulator
A Japanese peta-scale supercomputer project
Trends of HPC system
Summary
Simulations on the Earth Simulator
The Earth Simulator
It was completed in 2002. It achieved 35.86 Tflops sustained on the LINPACK benchmark and was chosen as one of the 2002 best inventions by "TIME."
[Figure: The Earth Simulator facility, 65 m x 50 m (71 yd x 55 yd): 320 PN cabinets and 65 IN cabinets, a double floor for cables, a power supply system, a cartridge tape library system, a magnetic disk system, an air conditioning system, and a seismic isolation system.]
Why I did it
Performance evaluation of the Earth Simulator was important in the final adjustment phase.
Suitable codes had to be chosen: to evaluate the performance of the vector processors, to measure all-to-all communication performance among compute nodes through the crossbar switch, and to make operation of the Earth Simulator stable.
Candidates: the LINPACK benchmark? An atmospheric general circulation model (AGCM)? Any other code?
Why I did it (cont’d)
A spectral turbulence simulation code: an intensive computational kernel plus a lot of data communication; a simple code; significant for computational science.
One of the grand challenges in computational science and high performance computing
A new spectral code for the Earth Simulator: a Fourier spectral method for spatial discretization; mode-truncation and phase-shift techniques to control aliasing errors when calculating the nonlinear terms; and a fourth-order Runge-Kutta method for time integration (a minimal sketch follows below).
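To make these ingredients concrete, here is a minimal pseudo-spectral sketch in Python, written for the 1D viscous Burgers equation rather than the actual 3D Navier-Stokes code: a Fourier spectral discretization, 2/3-rule mode truncation for dealiasing (the phase-shift variant is omitted), and classical fourth-order Runge-Kutta time stepping. All parameter values are illustrative.

```python
# Minimal pseudo-spectral sketch (1D viscous Burgers equation, not the
# actual 3D Navier-Stokes code): Fourier spectral discretization,
# 2/3-rule mode truncation for dealiasing, classical RK4 in time.
import numpy as np

N  = 256                                  # grid points / Fourier modes
nu = 0.05                                 # viscosity (keeps the shock resolved)
k  = 1j * np.fft.fftfreq(N, d=1.0 / N)    # i * integer wavenumber
dealias = np.abs(k.imag) < N // 3         # 2/3-rule truncation mask

def rhs(u_hat):
    """du_hat/dt = -FFT(u*u_x) + nu*(ik)^2*u_hat, with mode truncation."""
    u   = np.fft.ifft(u_hat).real
    u_x = np.fft.ifft(k * u_hat).real
    nonlin = np.fft.fft(u * u_x)
    nonlin[~dealias] = 0.0                # drop aliased high modes
    return -nonlin + nu * k**2 * u_hat    # k**2 = -|k|^2, so this is diffusion

def rk4_step(u_hat, dt):
    k1 = rhs(u_hat)
    k2 = rhs(u_hat + 0.5 * dt * k1)
    k3 = rhs(u_hat + 0.5 * dt * k2)
    k4 = rhs(u_hat + dt * k3)
    return u_hat + dt / 6.0 * (k1 + 2*k2 + 2*k3 + k4)

x = 2 * np.pi * np.arange(N) / N
u_hat = np.fft.fft(np.sin(x))             # smooth initial condition
for _ in range(1000):
    u_hat = rk4_step(u_hat, dt=1e-3)
```

The production code described in this talk uses the same ingredients in 3D, with the FFTs and the dealiasing distributed across the machine's memory.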
Points of coding
Optimization for the Earth Simulator: coordinated assignment of calculations to three levels of parallelism (vector processing, micro-tasking, and MPI parallelization), and a higher-radix FFT.
Attention to B/F (the ratio of data transfer rate between CPU and memory to arithmetic performance); a rough cost model follows after this list.
Removal of redundant processes and variables
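A rough sense of why a higher radix helps with B/F: a radix-r FFT sweeps the array about log_r N times instead of log_2 N, so memory traffic shrinks while the flop count stays near 5 N log2 N. The back-of-envelope model below (one read and one write of the whole array per pass, 16 bytes per complex element) is an illustrative assumption, not a measurement of the Earth Simulator.

```python
# Back-of-envelope: required memory bytes per flop (B/F) for a length-N
# complex FFT, assuming each radix-r pass streams the array once in and
# once out. Purely a model; all constants are illustrative.
import math

def fft_required_bf(N, radix, bytes_per_elem=16):
    flops   = 5.0 * N * math.log2(N)               # standard FFT flop count
    passes  = math.log(N, radix)                   # log_r N butterfly passes
    traffic = 2.0 * passes * N * bytes_per_elem    # read + write per pass
    return traffic / flops

for r in (2, 4, 8):
    print(f"radix-{r}: required B/F ~ {fft_required_bf(2**20, r):.1f}")
# -> about 6.4 (radix-2), 3.2 (radix-4), 2.1 (radix-8) for N = 2**20:
#    radix-4 halves the demanded traffic; radix-8 cuts it to a third.
```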
[Figure: Wall time for the calculation of one time step vs. number of nodes (64, 128, 256, 512), log scale from 0.01 s to 100 s; labeled points at 30.7 s and 3.21 s. A complete simulation took 3 days on 512 PNs.]
[Figure: Sustained performance (Tflops) vs. number of PNs (64, 128, 256, 512), log scale; 16.4 Tflops achieved, about 50% of peak (single precision, analytical FLOP count).]
Achievement of box turbulence flow simulations
[Figure: Number of grid points in box turbulence simulations vs. year (1960-2010), log scale. Labeled points: Orszag (1969), IBM 360-95; Siggia (1981), Cray-1, NCAR; Kerr (1985), Cray-1S, NCAR; Jimenez et al. (1993), Caltech Delta machine, 512^3; Yamamoto (1994), Numerical Wind Tunnel; Gotoh & Fukayama (2001), VPP5000/56, NUCC; K & I & Y (2002), Earth Simulator, 2048^3 and 4096^3. Resolutions shown range over 32^3, 64^3, 128^3, 240^3, 512^3, 1024^3, 2048^3, and 4096^3.]
A Japanese Peta-Scale Supercomputer Project
Next-Generation Supercomputer Project
Objectives, as one of Japan's Key Technologies of National Importance, are:
to develop the world's most advanced and highest-performance supercomputer, and to develop and deploy its usage technologies as well as application software.
Period & Budget: FY2006-FY2012, ~1 billion US$ (expected)
RIKEN (The Institute of Physical and Chemical Research) plays the central role in the project, developing the supercomputer under the law.
Goals of the project
Development and installation of the most advanced high-performance supercomputer system, with a LINPACK performance of 10 petaflops.
Development and deployment of application software in various science and engineering fields, designed to draw out the system's maximum capability.
Establishment of an “Advanced Computational Science and Technology Center (tentative)” as one of the Centers of Excellence for research, personnel development, and training, built around the supercomputer.
Major applications for the system
Grand Challenges
Configuration of the system
The Next-Generation Supercomputer will be a hybrid general-purpose supercomputer that provides the optimum computing environment for a wide range of simulations.
Calculations will be performed in processing units that are suitable for the particular simulation.
Parallel processing in a hybrid configuration of scalar and vector units will make larger and more complex simulations possible.
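As a toy illustration of this idea (not the actual system software, whose scheduling details are not given in this talk), one can imagine routing each kernel to the partition that suits it:

```python
# Toy illustration (hypothetical, not the real system design): route each
# simulation kernel to the scalar or vector partition by how well it
# vectorizes. Names, kernels, and the threshold are all illustrative.
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    vector_fraction: float   # fraction of work that vectorizes well

def assign_partition(k: Kernel, threshold: float = 0.7) -> str:
    """Send highly vectorizable kernels to vector units, the rest to scalar."""
    return "vector" if k.vector_fraction >= threshold else "scalar"

for k in [Kernel("3D FFT", 0.95), Kernel("particle tracking", 0.30)]:
    print(k.name, "->", assign_partition(k))
```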
Roadmap of the project
[Chart: project schedule, FY2006 to FY2012; a "We are here" marker points at 2008.
System (processing unit; front-end unit / total system software; shared file system): conceptual design, basic design, detailed design, prototype and evaluation / production and evaluation, then production, installation, and adjustment, tuning and improvement, and verification.
Buildings (computer building; research building): design, then construction.
Applications (Next-Generation Integrated Nanoscience Simulation; Next-Generation Integrated Life Simulation): development, production, and evaluation, then verification.
Operation: decisions on policies and systems, preparation, then operation; completion marked in 2012.]
Location of the supercomputer site, Kobe-City
[Map and photo (June 2006): Kobe is about 450 km (280 miles) west of Tokyo. The Next-Generation Supercomputer site is on Port Island, 5 km from Sannomiya Station (about 12 min by the Portliner Monorail), near Kobe Airport and the Kobe Sky Bridge, with Shin-Kobe Station on the Shinkansen line, Mt. Rokko, and Ashiya City nearby.]
Artist’s image of the building
Photos of the site under construction, taken from the south side: June 10, 2008; July 17, 2008; Aug. 20, 2008.
Trends of HPC system
Trends of HPC system
Systems will have a very large number of processors, around one million or more.
Each chip will be a multi-core (8, 16, or 32 cores) or many-core (more than 64 cores) processor:
low performance per core, small main-memory capacity per core,
fine-grain parallelism.
Each processor will consume little energy (low-power processors).
Bandwidth between CPU and main memory will be narrow, bottlenecked by the number of signal pins.
Bisection bandwidth among compute nodes will be narrow:
a full one-to-one interconnect is very expensive and power-consuming.
Impact on spectral simulations
High performance on the LINPACK benchmark: the more processors there are, the higher the LINPACK figure. But LINPACK performance does not necessarily indicate real-world application performance, especially for spectral simulations.
Small memory capacity per processor: fine-grain decomposition of space, increasing communication cost among parallel compute nodes.
Narrow memory bandwidth and narrow inter-node bisection bandwidth:
the memory-wall problem and low all-to-all communication performance, hence the need for a low-B/F algorithm in place of the FFT (a back-of-envelope estimate of the all-to-all cost follows below).
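To see why bisection bandwidth dominates, consider the global transpose (an all-to-all exchange) inside a distributed 3D FFT: essentially the whole field crosses the network on each transpose, and roughly half of that traffic must cross the bisection. A hedged back-of-envelope in Python; the grid sizes and bandwidth figures are illustrative assumptions, not measurements of any machine.

```python
# Illustrative estimate of one all-to-all transpose in a distributed 3D FFT.
# Model: all N^3 complex values (16 bytes each) are exchanged, and about
# half of the volume must cross the network bisection.
def transpose_seconds(N, bisection_GB_per_s, bytes_per_value=16):
    volume = (N ** 3) * bytes_per_value          # bytes moved per transpose
    return (volume / 2) / (bisection_GB_per_s * 1e9)

for N, bw in [(2048, 1000.0), (8192, 1000.0), (8192, 100.0)]:
    print(f"N={N:5d}, bisection={bw:6.0f} GB/s -> "
          f"{transpose_seconds(N, bw):7.2f} s per transpose")
# -> ~0.07 s, ~4.4 s, ~44 s: with several transposes per time step, a
#    tenfold drop in bisection bandwidth makes each step tenfold slower.
```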
Impact on spectral simulations (cont’d)
The trend does not fit 3D FFTs well; that is, box turbulence simulations are becoming difficult to perform.
We will be able to use more and more computational resources in the near future, …
but a finer-resolution simulation by spectral methods needs a long calculation time, because communication among parallel compute nodes is extremely slow, and we might not be able to obtain the final results in a reasonable time.
Estimates for simulations larger than 4096^3
If a sustained simulation performance of 500 Tflops can be used (a back-of-envelope check of these numbers follows below):
an 8192^3 simulation needs 7 seconds for one time step and 100 TB of total memory: 8 days for 100,000 steps and 1 PB of data for a complete simulation;
a 16384^3 simulation needs 1 minute for one time step and 800 TB of total memory: 3 months for 125,000 steps and 10 PB in total for a complete simulation.
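These figures are consistent with a simple pseudo-spectral cost model. The sketch below reproduces them under stated assumptions: roughly 36 real 3D FFTs per RK4 time step, each costing about 5 M log2 M flops for M = N^3 grid points. The FFT count per step depends on the formulation, so treat it as an illustrative guess rather than the actual accounting behind the slide.

```python
# Back-of-envelope check of the slide's estimates at 500 Tflops sustained.
# Assumption (illustrative): one pseudo-spectral RK4 step ~ 36 real 3D FFTs,
# each ~ 5*M*log2(M) flops for M = N^3 grid points.
import math

def step_seconds(N, sustained_flops=500e12, ffts_per_step=36):
    M = N ** 3
    return ffts_per_step * 5.0 * M * math.log2(M) / sustained_flops

for N, steps, mem_TB in [(8192, 100_000, 100), (16384, 125_000, 800)]:
    t = step_seconds(N)
    print(f"N={N}: {t:5.1f} s/step, {t * steps / 86400:5.1f} days for "
          f"{steps:,} steps, {mem_TB * 1e12 / N**3:4.0f} bytes per point")
# -> ~7.7 s/step and ~9 days for 8192^3; ~66 s/step and ~96 days for
#    16384^3, close to the slide's 7 s / 8 days and 1 min / 3 months.
#    Both memory figures imply the same ~182 bytes per grid point.
```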
Summary
Spectral methods are a very useful algorithm for evaluating HPC systems.
In this sense, the trend of HPC system architecture is getting worse.
Even if the peak performance of the system is very high, we cannot expect high sustained performance, and it may take a long time to finish a simulation due to very slow data transfer between nodes.
Can we discard spectral methods and change the algorithm? Or do we have to
put strong pressure on the computer architecture community, and think about international collaboration to develop a supercomputer system that fits turbulence studies?
I would think of such an HPC system as being like a particle accelerator at CERN.