Highlights of the 51st TOP500 List
ISC 2018, Frankfurt,
June 25, 2018
Erich Strohmaier
ISC18 TOP500 TOPICS
• New #1
• New TOP5
• Slow-down, aging, and concentration
• 25 years of systems and sites
• China, a new twist
• Industry and GPUs
• HPCG petaflops
51ST LIST: THE TOP10
# | Site | Manufacturer | Computer | Country | Cores | Rmax [Pflops] | Power [MW]
1 | Oak Ridge National Laboratory | IBM | Summit, IBM Power System, P9 22C 3.07GHz, Mellanox EDR, NVIDIA GV100 | USA | 2,282,544 | 122.3 | 8.8
2 | National Supercomputing Center in Wuxi | NRCPC | Sunway TaihuLight, NRCPC Sunway SW26010, 260C 1.45GHz | China | 10,649,600 | 93.0 | 15.4
3 | Lawrence Livermore National Laboratory | IBM | Sierra, IBM Power System, P9 22C 3.1GHz, Mellanox EDR, NVIDIA GV100 | USA | 1,572,480 | 71.6 | -
4 | National University of Defense Technology | NUDT | Tianhe-2A, NUDT TH-IVB-FEP, Xeon 12C 2.2GHz, Matrix-2000 | China | 4,981,760 | 61.4 | 18.5
5 | National Institute of Advanced Industrial Science and Technology | Fujitsu | AI Bridging Cloud Infrastructure (ABCI), PRIMERGY CX2550 M4, Xeon Gold 20C 2.4GHz, IB-EDR, NVIDIA V100 | Japan | 391,680 | 19.9 | 1.65
6 | Swiss National Supercomputing Centre (CSCS) | Cray | Piz Daint, Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100 | Switzerland | 361,760 | 19.6 | 2.27
7 | Oak Ridge National Laboratory | Cray | Titan, Cray XK7, Opteron 16C 2.2GHz, Gemini, NVIDIA K20x | USA | 560,640 | 17.6 | 8.21
8 | Lawrence Livermore National Laboratory | IBM | Sequoia, BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 1,572,864 | 17.2 | 7.89
9 | Los Alamos NL / Sandia NL | Cray | Trinity, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries | USA | 979,968 | 14.1 | 3.84
10 | Lawrence Berkeley National Laboratory | Cray | Cori, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries | USA | 622,336 | 14.0 | 3.94
System Overview
System Performance
• Peak performance of 200 petaflops for modeling & simulation
• Peak of 3.3 ExaOps for data analytics and artificial intelligence
Each node has
• 2 IBM POWER9 processors
• 6 NVIDIA Tesla V100 GPUs
• 608 GB of fast memory
• 1.6 TB of NVMe memory
The system includes
• 4608 nodes
• Dual-rail Mellanox EDR InfiniBand network
• 250 PB IBM Spectrum Scale file system transferring data at 2.5 TB/s
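The per-node figures above scale up to system totals by simple multiplication with the node count. The following is a minimal sketch of that arithmetic; the function and constant names are illustrative and not part of the original slides, and decimal units (1 PB = 1,000 TB = 1,000,000 GB) are assumed.

```python
# Per-node and system figures as quoted on the slide.
NODES = 4608
CPUS_PER_NODE = 2              # IBM POWER9 processors
GPUS_PER_NODE = 6              # NVIDIA Tesla V100 GPUs
FAST_MEMORY_GB_PER_NODE = 608
NVME_TB_PER_NODE = 1.6


def system_totals():
    """Multiply each per-node figure by the node count (hypothetical helper)."""
    return {
        "cpus": NODES * CPUS_PER_NODE,
        "gpus": NODES * GPUS_PER_NODE,
        # Assumption: decimal units, 1 PB = 1e6 GB = 1e3 TB.
        "fast_memory_pb": NODES * FAST_MEMORY_GB_PER_NODE / 1e6,
        "nvme_pb": NODES * NVME_TB_PER_NODE / 1e3,
    }


for name, value in system_totals().items():
    print(f"{name}: {value}")
```

Under these assumptions the system holds 9,216 processors and 27,648 GPUs, with roughly 2.8 PB of fast memory and 7.4 PB of NVMe storage across all nodes.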
AVERAGE SYSTEM AGE
[Figure: average system age in months (y-axis 0 to 25) by year, 1994 to 2018; annotation: 7.6 months]
RANK AT WHICH HALF OF TOTAL PERFORMANCE IS ACCUMULATED
[Figure: rank (y-axis 0 to 100) by year, 1994 to 2018]
PERFORMANCE DEVELOPMENT
[Figure: performance by year, 1994 to 2018, log scale from 100 Mflop/s to 10 Eflop/s; June 2008 and June 2013 are marked]
Series | Start of series | Latest list
SUM | 1.17 TFlop/s | 1.21 EFlop/s
N=1 | 59.7 GFlop/s | 122 PFlop/s
N=500 | 422 MFlop/s | 716 TFlop/s
ANNUAL PERFORMANCE INCREASE OF THE TOP500
[Figure: annual performance increase factor (y-axis 1 to 2.6) by year, 1994 to 2018. Series: Moore’s Law, TOP500, TOP500: Averages. Reference levels: 1000x in 11 years, 15 years, and 20 years]
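The "1000x in 11, 15, or 20 years" labels can be related to the annual-increase axis by taking the corresponding root. The sketch below assumes constant compound growth; the function name `annual_factor` is illustrative and not from the slides.

```python
def annual_factor(total_factor, years):
    """Constant yearly growth factor that compounds to total_factor after `years`."""
    return total_factor ** (1.0 / years)


# Time spans for a 1000x increase as labelled on the chart.
for years in (11, 15, 20):
    factor = annual_factor(1000.0, years)
    print(f"1000x in {years} years -> x{factor:.2f} per year")
```

Under this assumption, 1000x in 11 years corresponds to an annual factor of about 1.87, while 15 and 20 years correspond to about 1.58 and 1.41.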
25 Years – 51 Editions
How do we integrate/aggregate over time/editions?
• System counts are focused on the low end
• Moore’s Law overpowers everything performance based
• Normalize each list by average performance
– HPL not Peak
– Average performance not max (#1) or min
• Add up contributions from various lists over time
– Each full list contributes a total of 500
– 51 lists together have a total weight of 25,500
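The normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TOP500 authors' actual procedure: the function names and the toy data are hypothetical, and each edition is assumed to be a list of (site, HPL Rmax) pairs. Dividing each Rmax by the edition's average makes the values of one edition sum to its number of entries, which is why a full 500-entry list contributes a total of 500.

```python
from collections import defaultdict


def norm_hpl(edition):
    """Normalize one edition: divide each HPL Rmax by the edition's average Rmax."""
    average = sum(rmax for _, rmax in edition) / len(edition)
    return [(site, rmax / average) for site, rmax in edition]


def accumulate_norm_hpl(editions):
    """Add up the normalized contributions of each site over all editions."""
    totals = defaultdict(float)
    for edition in editions:
        for site, value in norm_hpl(edition):
            totals[site] += value
    return dict(totals)


# Toy data (hypothetical): two editions with four systems each.
toy_editions = [
    [("A", 4.0), ("B", 2.0), ("C", 1.0), ("D", 1.0)],
    [("A", 40.0), ("B", 20.0), ("E", 10.0), ("D", 10.0)],
]
totals = accumulate_norm_hpl(toy_editions)
print(totals)
print(sum(totals.values()))  # equals the total number of list entries
```

In the toy data the second edition is ten times faster overall, yet both editions carry the same weight after normalization. That is the point of the scheme: performance growth over time no longer swamps the contributions of earlier lists.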
No. 1’s – HPL R_max
[Figure: HPL R_max of each No. 1 system, log scale from 10 Gflop/s to 1 Eflop/s, 1995 to 2015. Systems: Summit, Sunway TaihuLight, Tianhe-2, Titan, Sequoia, K computer, Tianhe-1A, Jaguar, Roadrunner, BlueGene/L, Earth-Simulator, ASCI White, ASCI Red, CP-PACS/2048, SR2201/1024, XP/S140, Numerical Wind Tunnel, CM-5/1024]
# 1’s – Accumulated Norm-HPL
[Figure: accumulated Norm-HPL (y-axis 0 to 500) of the same No. 1 systems, 1995 to 2015]
Dominant Sites
# | Site | Country | Norm-HPL
1 | LLNL | USA | 1,504
2 | LANL | USA | 816
3 | ORNL | USA | 795
4 | SNL | USA | 658
5 | NSCC Guangzhou | China | 474
6 | RIKEN AICS | Japan | 361
7 | NASA/Ames | USA | 356
8 | FZ Jülich | Germany | 339
9 | JAMSTEC | Japan | 325
10 | NERSC/LBNL | USA | 310
11 | ANL | USA | 305
COUNTRIES
[Figure: number of systems per country (y-axis 0 to 500) by year, 1993 to 2017. Series: China, South Korea, Italy, Canada, France, United Kingdom, Germany, Japan, United States]
COUNTRIES / SYSTEM SHARE
• United States: 25%
• China: 41%
• Japan: 7%
• United Kingdom: 5%
• Germany: 4%
• France: 4%
• Netherlands: 2%
• Ireland: 1%
• South Korea: 1%
• Others: 10%
PRODUCERS
[Figure: number of systems by producer region (y-axis 0 to 500) by year, 1993 to 2017. Series: Russia, China, Europe, Japan, USA]
VENDORS / SYSTEM SHARE
Vendor | # of systems | % of 500
Lenovo | 117 | 23%
HPE | 79 | 16%
Inspur | 68 | 13%
Sugon | 55 | 11%
Cray Inc. | 53 | 10%
Bull | 21 | 4%
IBM | 18 | 4%
Huawei | 14 | 3%
Dell EMC | 13 | 3%
Fujitsu | 13 | 3%
Penguin Computing | 11 | 2%
Others | 38 | 8%
VENDORS / PERFORMANCE SHARE
Vendor | Sum of Pflop/s | % of whole list
Lenovo | 143 | 12%
HPE | 120 | 10%
Inspur | 73 | 6%
Sugon | 53 | 4%
Cray Inc. | 188 | 16%
Bull | 52 | 4%
IBM | 239 | 20%
Huawei | 12 | 1%
Dell EMC | 26 | 2%
Fujitsu | 64 | 5%
Penguin Computing | 15 | 1%
NRCPC | 94 | 8%
Others | 132 | 11%
ACCELERATORS
[Figure: number of systems using accelerators (y-axis 0 to 130) by year, 2006 to 2018. Series: Matrix-2000, PEZY-SC, Kepler/Phi, Xeon Phi Main, Intel Xeon Phi, Clearspeed, IBM Cell, ATI Radeon, Nvidia Volta, Nvidia Pascal, Nvidia Kepler, Nvidia Fermi]
PERFORMANCE SHARE OF ACCELERATORS
[Figure: fraction of total TOP500 performance (y-axis 0% to 60%) by year, 2006 to 2018. Series: Xeon Phi Main, Accelerators]
MOST ENERGY EFFICIENT ARCHITECTURES
Computer | Rmax/Power [Gflops/Watt]
Shoubu system B, ZettaScaler-2.2, Xeon 16C 1.3GHz, Infiniband EDR, PEZY-SC2 | 18.4
Suiren2, ZettaScaler-2.2, Xeon 16C 1.3GHz, Infiniband EDR, PEZY-SC2 | 16.8
Sakura, ZettaScaler-2.2, Xeon 8C 2.3GHz, Infiniband EDR, PEZY-SC2 | 16.7
DGX Saturn V, NVIDIA DGX-1 Volta36, Xeon 20C 2.2GHz, Infiniband EDR, Tesla V100 | 15.1*
Summit, IBM Power System, Power9 22C 3.07GHz, Infiniband EDR, Volta GV100 | 13.9
Tsubame 3.0, SGI ICE XA, Xeon 14C 2.4GHz, Intel Omni-Path, Tesla P100 SXM2 | 13.7*
AIST AI Cloud, NEC 4U-8GPU, Xeon 10C 1.8GHz, Infiniband EDR, Tesla P100 SXM2 | 12.7
AI Bridging Cloud Infrastructure (ABCI), Fujitsu PRIMERGY, NVIDIA Tesla V100, Xeon Gold 20C 2.4GHz, Infiniband EDR, Tesla V100 SXM2 | 12.1
MareNostrum P9 CTE, IBM Power System, Power9 22C 3.1GHz, Infiniband EDR, Tesla V100 | 11.9
Wilkes-2, Dell C4130, Xeon 12C 2.2GHz, Infiniband EDR, Tesla P100 | 10.4
* Efficiency based on power-optimized HPL runs of equal size to the TOP500 run.
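The Rmax/Power column can be cross-checked against the Rmax and Power columns of the TOP10 table earlier in the deck. The following is a minimal sketch of that unit conversion; the function name is illustrative, and the input values are copied from the TOP10 table.

```python
def gflops_per_watt(rmax_pflops, power_mw):
    """Energy efficiency in Gflops/W from Rmax in Pflop/s and power in MW."""
    gflops = rmax_pflops * 1e6   # 1 Pflop/s = 1e6 Gflop/s
    watts = power_mw * 1e6       # 1 MW = 1e6 W
    return gflops / watts


# (Rmax [Pflops], Power [MW]) as listed in the TOP10 table.
top10_entries = {
    "Summit": (122.3, 8.8),
    "ABCI": (19.9, 1.65),
}

for name, (rmax, power) in top10_entries.items():
    print(f"{name}: {gflops_per_watt(rmax, power):.1f} Gflops/W")
```

Both results round to the values in the efficiency table (13.9 for Summit and 12.1 for ABCI). Entries marked with * come from separate power-optimized runs, so they cannot be reproduced this way.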
POWER EFFICIENCY
[Figure: Linpack/Power in Gflops/W (y-axis 0 to 7) by year, 2008 to 2018. Series: TOP10, TOP50, TOP500]
ENERGY EFFICIENCY
[Figure: Linpack/Power in Gflops/W (y-axis 0 to 20) by year, 2008 to 2018. Series: TOP500 Average, Max-Efficiency. Annotated systems: BlueGene/Q, Cell, Mic, AMD FirePro, Tsubame KFC, NVIDIA K20x – K80, ZettaScaler-1.6c, DGX Saturn V, Tsubame 3.0, ZettaScaler-2.2]
51ST LIST: THE TOP10
# | T | Site | Manufacturer | Computer | Country | HPCG [Pflop/s] | Rmax [Pflop/s] | HPCG/Peak | HPCG/HPL
1 | 1 | Oak Ridge National Laboratory | IBM | Summit, IBM Power System, P9 22C 3.07GHz, Volta GV100, EDR | USA | 2.9258 | 122.3 | 1.6% | 2.4%
2 | 3 | Lawrence Livermore National Laboratory | IBM | Sierra, IBM Power System, P9 22C 3.1GHz, Volta GV100, EDR | USA | 1.7957 | 71.6 | 1.5% | 2.5%
3 | 16 | RIKEN Advanced Institute for Computational Science | Fujitsu | K Computer, SPARC64 VIIIfx 2.0GHz, Tofu Interconnect | Japan | 0.6027 | 10.5 | 5.3% | 5.7%
4 | 9 | Los Alamos NL / Sandia NL | Cray | Trinity, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries | USA | 0.5461 | 14.1 | 1.2% | 3.9%
5 | 6 | Swiss National Supercomputing Centre (CSCS) | Cray | Piz Daint, Cray XC50, Xeon E5 12C 2.6GHz, Aries, NVIDIA Tesla P100 | Switzerland | 0.4864 | 19.6 | 1.9% | 2.5%
6 | 2 | National Supercomputing Center in Wuxi | NRCPC | Sunway TaihuLight, NRCPC Sunway SW26010, 260C 1.45GHz | China | 0.4808 | 93.0 | 0.4% | 0.5%
7 | 12 | JCAHPC Joint Center for Advanced HPC | Fujitsu | Oakforest-PACS, PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, OmniPath | Japan | 0.3855 | 13.6 | 1.5% | 2.8%
8 | 10 | Lawrence Berkeley National Laboratory | Cray | Cori, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries | USA | 0.3554 | 14.0 | 1.3% | 2.5%
9 | 14 | Commissariat à l'Energie Atomique (CEA) | Bull | Tera-1000-2, Bull Sequana X1000, Intel Xeon Phi 7250 68C 1.4GHz, Bull BXI 1.2 | France | 0.3338 | 12.0 | 1.4% | 2.8%
10 | 8 | Lawrence Livermore National Laboratory | IBM | Sequoia, BlueGene/Q, Power BQC 16C 1.6GHz, Custom | USA | 0.3304 | 17.2 | 1.6% | 1.9%
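The HPCG/HPL column is the ratio of the two performance columns, which makes it easy to cross-check. The sketch below recomputes it for the first three HPCG entries; the function name is illustrative, and the values are copied from the table. HPCG/Peak is not recomputed because the table does not list peak performance.

```python
def hpcg_over_hpl_percent(hpcg_pflops, rmax_pflops):
    """HPCG result as a percentage of the HPL result (Rmax)."""
    return 100.0 * hpcg_pflops / rmax_pflops


# (system, HPCG [Pflop/s], Rmax [Pflop/s]) as listed in the HPCG table.
hpcg_entries = [
    ("Summit", 2.9258, 122.3),
    ("Sierra", 1.7957, 71.6),
    ("K Computer", 0.6027, 10.5),
]

for name, hpcg, rmax in hpcg_entries:
    print(f"{name}: {hpcg_over_hpl_percent(hpcg, rmax):.1f}% of HPL")
```

The three results round to 2.4%, 2.5%, and 5.7%, matching the table. The K Computer reaches more than twice the HPCG/HPL fraction of the two accelerated systems at the top.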
ISC18 TOP500 HIGHLIGHTS
• ORNL’s Summit is the new #1 (IBM, NVIDIA, Mellanox).
• Four ‘new’ systems in the TOP5! (Summit, Sierra, Tianhe-2A, ABCI)
• Slow-down in performance growth since 2013 goes hand in hand with
– Longer system usage (~2x) and
– Concentration of capabilities at the top (relatively larger top systems)
• Lenovo is the first Chinese manufacturer to sell systems in numbers outside of China (everywhere) (China: 20, USA: 21, ROW: 23).
• Accelerated systems are finally being adopted by industrial users (25% of new systems in November + June).
• Summit and Sierra are the first systems to achieve over 1 Pflop/s on HPCG.