GA 676598 EUROPEAN CENTER OF EXCELLENCE - A H2020 E-INFRASTRUCTURE GTC2018
Porting Quantum ESPRESSO to GPU Accelerated Systems
Pietro Bonfà, Fabio Affinito, Carlo Cavazzoni
CINECA, Casalecchio di Reno, Italy
What is QuantumESPRESSO
Porting strategy
Benchmarks
Conclusions
Outlook
What is QuantumESPRESSO
● QUANTUM ESPRESSO is an initiative coordinated by the QUANTUM ESPRESSO Foundation, with the participation of SISSA, CINECA, ICTP, and EPFL, and with many partners in Europe and worldwide.
● QUANTUM ESPRESSO is not a single application for quantum simulations; it is rather a distribution of packages performing different tasks and meant to be interoperable.
● Free as in freedom (GPLv2) and open development.
● Runs on anything from a standalone workstation to massively parallel systems.
● Large scientific user base, vehicle for new methods and new algorithms.
  ○ v6.2.1 → 70400 downloads
  ○ >50 contributors
  ○ 1600+ registered users
  ○ ~500k lines, Fortran (& C)
● Simplify transition of new science to HPC systems.
$ ./configure && make all
[Figure: posts per month on the mailing list]
QE Libraries
Some of the time-consuming workloads of many packages are already encapsulated in a number of libraries, namely:
LAXlib, FFTXlib, KS_Solvers
FFTW, MKL, ESSL, ...
Clues from profiling
PWscf (CPU version) running on a single KNL node with 64 MPI processes (best time to solution).
Porting strategy
Past and present QE GPU ports
Timeline (2012 to 2018):
● Porting effort carried out by MaX and supported by NVIDIA.
● CUDA C based plugin for QE 5.x (pw.x) developed by F. Spiga and I. Girotto.
● Independent CUDA Fortran based port of QE 6.1 (pw.x) developed by F. Spiga and NVIDIA. Provides best performance, most used features implemented.
QE v5.4: CUDA C Plugin
✓✓ Self contained
  ● BLAS → PHIGEMM
  ● LAPACK → MAGMA
  ● 3 CUDA C kernels + cuFFT
✓ Good performance
  F. Spiga: http://www.tcm.phy.cam.ac.uk/~mdt26/esdg_slides/spiga_may13.pdf
✗ Boilerplate code (interface and kernel)
QE v6.1: CUDA Fortran
✓ Single programming language: Fortran + CUDA Fortran
  ● BLAS → cuBLAS
  ● LAPACK → Custom GPU Eigensolver (outperforms MAGMA)
  ● CUF kernel directives and CUDA Fortran kernels
✓✓ Very good performance
  For a detailed description of the code and the benchmarks see: http://www.dcs.warwick.ac.uk/pmbs/pmbs17/PMBS17/
✗ Diverged from master branch
✗ Only selected features implemented
New Porting Strategy
Language: CUDA Fortran, leveraging the existing v6.1 code.
Programming model: explicit and directive based.
Plan:
1. Preserve modularity.
2. Maintain alignment with master branch. Maintain “hackability”.
3. Leave user experience intact.
4. General GPU architecture solutions.
5. Performance, of course.
Application: pw.x
Legend: A = Accelerated, W = Working, U = Unavailable, B = Broken

| GPU version | Total Energy (K points) | Forces | Stress | Collinear magnetism | Non-collinear magnetism | Gamma trick | US PP | PAW PP | DFT+U | All other functionalities |
|---|---|---|---|---|---|---|---|---|---|---|
| v5.4 | A | W | W | B (?) | U | A | A | ? | W (?) | W (?) |
| v6.1 | A | A | A | A | U | W (*) | A | A (*) | U | U (*) |
| v6.3 | A | W | W | A | A | A | A | A (*) | W | W |
● Libraries
● Global Variables
● Memory Allocation
Libraries
● Full API support
● Unit testing
● Target best performance: CUDA Fortran, explicit CUDA API (concurrency, hardware specific options).
Libraries - FFTXlib
● Many small 3D FFTs (10^1 → 10^3)
● Overlap of communication and computation
● Batched work
[Figure: FFT pipeline repeated "# bands" times; batched example with 8 bands handled as two batches of 4 bands, stage labels: 4 bands 1D FFT, Scatter, Alltoall, 4 bands 2D FFT]
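The batching idea on this slide can be sketched as a small scheduling program. This is only an illustration under stated assumptions, not the FFTXlib implementation: it performs no FFTs and no MPI calls, the three-stage ordering (1D FFT, Alltoall, 2D FFT) is read off the figure labels, and the program name, the `report` helper, and the counter are hypothetical. The values 8 bands and 4 bands per batch are the ones shown in the figure.

```fortran
program batched_overlap_sketch
  implicit none
  integer, parameter :: nbnd  = 8   ! number of bands (value shown in the figure)
  integer, parameter :: batch = 4   ! bands per batch (value shown in the figure)
  integer :: nbatch, step, b, nstage_runs

  nbatch = (nbnd + batch - 1) / batch   ! number of batches, rounded up
  nstage_runs = 0

  ! Three-stage software pipeline. At a given step the stages work on
  ! different batches, so the Alltoall of one batch can be in flight
  ! while FFTs of the neighbouring batches are being computed.
  do step = 1, nbatch + 2
     b = step                      ! stage 1: 1D FFT
     if (b >= 1 .and. b <= nbatch) call report(step, '1D FFT  ', b)
     b = step - 1                  ! stage 2: Scatter + Alltoall
     if (b >= 1 .and. b <= nbatch) call report(step, 'Alltoall', b)
     b = step - 2                  ! stage 3: 2D FFT
     if (b >= 1 .and. b <= nbatch) call report(step, '2D FFT  ', b)
  end do

  ! Every batch must have gone through all three stages exactly once.
  if (nstage_runs /= 3 * nbatch) error stop 'pipeline schedule is incomplete'

contains

  ! Hypothetical helper: prints which bands a stage handles at a step.
  subroutine report(step, stage, ibatch)
    integer, intent(in) :: step, ibatch
    character(len=*), intent(in) :: stage
    integer :: first, last
    first = (ibatch - 1) * batch + 1
    last  = min(ibatch * batch, nbnd)
    print '(a,i0,3a,i0,a,i0)', 'step ', step, ': ', stage, ' on bands ', first, '-', last
    nstage_runs = nstage_runs + 1
  end subroutine report

end program batched_overlap_sketch
```

In a real port the stages of one step would be issued on separate CUDA streams or overlapped with non-blocking MPI; here they are only listed, to show which batches are active together.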
Global Variables
Home-brewed managed memory:
1. Prioritize data encapsulation efforts.
2. Enforce a simple and effective update scheme for global variables.
3. Can provide asynchronous updates (not implemented yet).
4. General data duplication scheme.
5. Saves performance on old hardware.
    USE us, ONLY : nqx, dq, spline_ps
    USE us_gpum, ONLY : tab_d, tab_d2y_d
    !
    implicit none
    !
    if (lmaxkb.lt.0) return
    call start_clock ('init_us_2')
    call using_tab_d(READ) ! <- sync. here
    if (spline_ps) call using_tab_d2y_d(READWRITE) ! <- sync. here too
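One way such an update scheme could work is sketched here, assuming one "out of date" flag per copy of a duplicated variable. The names `tab_d`, `using_tab_d`, `READ`, and `READWRITE` come from the slide; everything else (`tab`, `using_tab`, the two flags, the numeric intent codes, the copy counter) is hypothetical. The device copy is simulated with an ordinary host array so that the example runs with any Fortran compiler; in CUDA Fortran it would carry the `device` attribute.

```fortran
module tab_sync_sketch
  implicit none
  integer, parameter :: READ = 0, READWRITE = 1   ! assumed intent codes
  real    :: tab(4)   = 0.0        ! host copy
  real    :: tab_d(4) = 0.0        ! stand-in for the device copy
  logical :: tab_ood   = .false.   ! host copy is out of date
  logical :: tab_d_ood = .false.   ! device copy is out of date
  integer :: ncopies = 0           ! counts simulated host/device transfers
contains

  ! Declare that the host copy is about to be used.
  subroutine using_tab(intento)
    integer, intent(in) :: intento
    if (tab_ood) then
       tab = tab_d                 ! would be a device-to-host copy
       tab_ood = .false.
       ncopies = ncopies + 1
    end if
    if (intento == READWRITE) tab_d_ood = .true.   ! device copy becomes stale
  end subroutine using_tab

  ! Declare that the device copy is about to be used.
  subroutine using_tab_d(intento)
    integer, intent(in) :: intento
    if (tab_d_ood) then
       tab_d = tab                 ! would be a host-to-device copy
       tab_d_ood = .false.
       ncopies = ncopies + 1
    end if
    if (intento == READWRITE) tab_ood = .true.     ! host copy becomes stale
  end subroutine using_tab_d

end module tab_sync_sketch

program demo_sync
  use tab_sync_sketch
  implicit none

  call using_tab(READWRITE)        ! host writes: device copy marked stale
  tab = 1.0
  call using_tab_d(READ)           ! first device use: one transfer
  call using_tab_d(READ)           ! already in sync: no transfer
  if (ncopies /= 1) error stop 'unexpected number of transfers'

  call using_tab_d(READWRITE)      ! device writes: host copy marked stale
  tab_d = 2.0 * tab_d
  call using_tab(READ)             ! host use triggers the copy back
  if (ncopies /= 2 .or. any(tab /= 2.0)) error stop 'host copy not refreshed'

  print '(a,i0)', 'transfers performed: ', ncopies
end program demo_sync
```

The point of the scheme is that transfers happen only when a stale copy is actually requested, so repeated reads on the same side cost nothing.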
Memory allocation
● pw.x allocates many scratch variables. This substantially impacts the performance of the accelerated version of the subroutines.
● At the same time, GPU memory is limited.
QE GPU v6.1:
    USE some_module, ONLY : work
    !
    implicit none
    !
    IF( ALLOCATED( work ) .and. SIZE( work ) < lwork ) DEALLOCATE( work )
    IF( .not. ALLOCATED( work ) ) ALLOCATE( work( max_lwork ) )
    [...]
QE GPU v6.3:
    USE buffer_module, ONLY : gpu_buffer
    !
    implicit none
    !
    REAL, POINTER :: work(:)
    CALL gpu_buffer%lock_buffer(work, 10, ierr)
    [...]
    CALL gpu_buffer%release_buffer(work, ierr)
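A lock/release buffer pool of this kind might look roughly like the following. It is a minimal sketch under assumptions, not the QE implementation: only the names `buffer_module`, `gpu_buffer`, `lock_buffer`, and `release_buffer` and their call signatures come from the slide, while the slot layout, the fixed number of slots, the error codes, and the allocation counter are hypothetical. The pool holds host memory here; a GPU version would hold device arrays.

```fortran
module buffer_module
  implicit none
  integer, parameter :: nslots = 4            ! hypothetical pool capacity

  type :: slot_t
     real, pointer :: buf(:) => null()        ! storage owned by the pool
     logical       :: locked = .false.        ! .true. while handed out
  end type slot_t

  type :: pool_t
     type(slot_t) :: slots(nslots)
     integer      :: nalloc = 0               ! number of real allocations
   contains
     procedure :: lock_buffer
     procedure :: release_buffer
  end type pool_t

  type(pool_t), save :: gpu_buffer
contains

  ! Hand out a scratch array of n elements, reusing storage if possible.
  subroutine lock_buffer(this, p, n, ierr)
    class(pool_t), intent(inout) :: this
    real, pointer, intent(out)   :: p(:)
    integer, intent(in)          :: n
    integer, intent(out)         :: ierr
    integer :: i
    ierr = 0
    nullify(p)
    do i = 1, nslots        ! first pass: a free slot that is large enough
       if (this%slots(i)%locked) cycle
       if (.not. associated(this%slots(i)%buf)) cycle
       if (size(this%slots(i)%buf) >= n) then
          this%slots(i)%locked = .true.
          p => this%slots(i)%buf(1:n)
          return
       end if
    end do
    do i = 1, nslots        ! second pass: (re)allocate any free slot
       if (this%slots(i)%locked) cycle
       if (associated(this%slots(i)%buf)) deallocate(this%slots(i)%buf)
       allocate(this%slots(i)%buf(n), stat=ierr)
       if (ierr /= 0) return
       this%nalloc = this%nalloc + 1
       this%slots(i)%locked = .true.
       p => this%slots(i)%buf
       return
    end do
    ierr = -1               ! every slot is currently locked
  end subroutine lock_buffer

  ! Give a buffer back to the pool; the storage is kept for reuse.
  subroutine release_buffer(this, p, ierr)
    class(pool_t), intent(inout) :: this
    real, pointer, intent(inout) :: p(:)
    integer, intent(out)         :: ierr
    integer :: i
    ierr = -1
    if (.not. associated(p)) return
    do i = 1, nslots
       if (.not. this%slots(i)%locked) cycle
       if (size(p) > size(this%slots(i)%buf)) cycle
       if (associated(p, this%slots(i)%buf(1:size(p)))) then
          this%slots(i)%locked = .false.
          nullify(p)
          ierr = 0
          return
       end if
    end do
  end subroutine release_buffer

end module buffer_module

program demo_buffer
  use buffer_module, only : gpu_buffer
  implicit none
  real, pointer :: work(:)
  integer :: ierr, iter

  do iter = 1, 3            ! e.g. three calls of the same subroutine
     call gpu_buffer%lock_buffer(work, 10, ierr)
     if (ierr /= 0) error stop 'lock failed'
     work = real(iter)
     call gpu_buffer%release_buffer(work, ierr)
     if (ierr /= 0) error stop 'release failed'
  end do

  ! The scratch space was allocated once and reused afterwards.
  if (gpu_buffer%nalloc /= 1) error stop 'expected a single allocation'
  print '(a,i0)', 'allocations performed: ', gpu_buffer%nalloc
end program demo_buffer
```

Compared with the allocate-on-demand pattern of v6.1, repeated calls pay for the allocation only once, and the pool gives a single place to bound the total scratch memory.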
Recap
Libraries, Global Variables, Memory Allocation:
✓ Self contained
✓ Single programming language: Fortran + CUDA Fortran
✓ Aligned with official develop branch
❓ Performance...
Benchmarks
Benchmark systems
Compute units

| System | CPU model | Cores | Accelerators | RAM | Date | TFLOPs (CPU + GPU) |
|---|---|---|---|---|---|---|
| Piz Daint XC50 @ CSCS | Xeon E5-2690 v3 (HSW) @ 2.60 GHz | 1x12 = 12 | 1 x P100 | 64 GB/node | Q4 2016 | 0.5 + 4.7 |
| Galileo @ CINECA | Xeon E5-2630 v3 (HSW) @ 2.40 GHz | 2x8 = 16 | 2 x K80 | 128 GB/node | Q1 2015 | 0.6 + 2x2.9 |
| Marconi @ CINECA | Xeon E5-2697 v4 (BDW) @ 2.30 GHz | 2x18 = 36 | - | 128 GB/node | Q3 2016 | 1.3 |
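The CPU-side TFLOPs figures quoted for the three systems (0.5, 0.6, and 1.3) appear consistent with a theoretical double-precision peak computed as cores × clock × flops per cycle. The short check below assumes 16 double-precision flops per cycle per core (two AVX2 FMA units, as on Haswell and Broadwell) and the nominal clock; neither assumption is stated on the slides, and the program and variable names are illustrative.

```fortran
program cpu_peak_check
  implicit none
  ! Assumption, not from the slides: 16 DP flops per cycle per core.
  real, parameter :: flops_per_cycle = 16.0
  character(len=9) :: name(3) = [character(len=9) :: 'Piz Daint', 'Galileo', 'Marconi']
  integer :: cores(3)  = [12, 16, 36]          ! cores per node (from the slide)
  real    :: ghz(3)    = [2.60, 2.40, 2.30]    ! nominal clock (from the slide)
  real    :: quoted(3) = [0.5, 0.6, 1.3]       ! CPU TFLOPs quoted on the slide
  real    :: peak
  integer :: i

  do i = 1, 3
     peak = cores(i) * ghz(i) * flops_per_cycle / 1000.0   ! GFLOPs to TFLOPs
     print '(a,a,f5.2,a,f4.1,a)', name(i), ': ', peak, ' TFLOPs estimated, ', &
          quoted(i), ' quoted'
     ! The estimate should match the quoted value after rounding.
     if (abs(peak - quoted(i)) > 0.05) error stop 'estimate differs from quoted value'
  end do
end program cpu_peak_check
```

Under these assumptions the estimates are about 0.50, 0.61, and 1.32 TFLOPs, which round to the quoted values.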
Benchmark systems: network
● Piz Daint XC50 @ CSCS: Aries routing and communications ASIC, and Dragonfly network topology.
● Galileo @ CINECA: Infiniband network, with OFED v1.5.3, capable of a maximum bandwidth of 40 Gbit/s between each pair of nodes.
● Marconi @ CINECA: Intel Omni-Path, 100 Gb/s. Fat-tree OPA (2:1 oversubscription tapering at the level of the core switches only).
[Figure: node layout diagrams showing CPU, GPU, and NIC for each system]
Benchmark details
● Total time for the iterative solution of the Kohn-Sham (KS) equation is compared for the CPU and the GPU versions of pw.x.
● Best time to solution per compute unit(s) is reported.
● Optimal execution parameters for v6.1 and v6.3 may differ.
[Figure: pw.x workflow with stages Initialization, Iterations for electronic ground state, Forces and Stress, Structural optimization]
C70
Very small test case, gamma trick.
    number of atoms/cell = 280
    number of atomic types = 1
    number of electrons = 1120
    number of Kohn-Sham states = 672
    kinetic-energy cutoff = 45 Ry
    charge density cutoff = 450 Ry
    convergence threshold = 1.0E-08
    Dense grid: 1685364 G-vectors, FFT dimensions: (225, 128, 240)
    Smooth grid: 426442 G-vectors, FFT dimensions: (144, 81, 150)
Iterations to reach convergence: 16
1. Speedup GPU vs CPU ~ 1.5x
2. v6.1 is missing gamma trick ( vs )
3. CPU version scales better
4. At saturation GPU still faster
AuSurf
Small test case, 2 k-points.
    number of atoms/cell = 112
    number of atomic types = 1
    number of electrons = 1232
    number of Kohn-Sham states = 800
    kinetic-energy cutoff = 25 Ry
    charge density cutoff = 200 Ry
    convergence threshold = 1.0E-06
    Dense grid: 2158381 G-vectors, FFT dimensions: (180, 90, 288)
    Smooth grid: 763307 G-vectors, FFT dimensions: (125, 64, 200)
Iterations to reach convergence: 21±1
1. Speedup GPU vs CPU > 2x
2. v6.1 allocates more memory (but vs in this case)
3. CPU and GPU versions both scaling well.
4. v6.3 on GPUs is significantly slower than v6.1.
Ta2O5
Large test case, 26 k-points.
    number of atoms/cell = 96
    number of atomic types = 2
    number of electrons = 544
    number of Kohn-Sham states = 326
    kinetic-energy cutoff = 130 Ry
    charge density cutoff = 520 Ry
    convergence threshold = 1.0E-08
    Dense grid: 3645397 G-vectors, FFT dimensions: (200, 180, 216)
Iterations to reach convergence: [45, 49, 50, 51, 52]
1. Speedup GPU vs CPU > 2x
2. v6.1 allocates more memory (but vs in this case)
3. CPU and GPU versions both scaling well.
4. v6.3 on GPUs is significantly slower than v6.1.
Porting status
QE 6.3 GPU:
✓ is aligned with the community develop branch,
✓ passes all 186 tests of the feature testing suite,
✓ is undergoing integration with the main project,
✓ provides good performance, generally better than 2x (far from saturation),
✓ is ready for alpha release.
Collaboration and support from: J. Romero, M. Marić, M. Fatica, E. Phillips (NVIDIA), F. Spiga (ARM), A. Chandran (FZJ), I. Girotto (ICTP), P. Giannozzi (Univ. Udine), P. Delugas, S. De Gironcoli (SISSA).
Conclusions
● Preserved modularity
  ○ For code maintainability
  ○ For simpler development and debugging
● Preserved all functionalities
  ○ Same user experience
  ○ Various levels of acceleration for the various functionalities
● Preserved (promoted?) data encapsulation
Outlook and perspectives
● Investigate performance degradation from v6.1 to v6.3
  ○ How much is coming from missing components?
  ○ Impact of directive based programming model?
● More benchmarking on different HW combinations.
● More code validation, initialization and forces ported to CUDA Fortran.
● Prepare first alpha release.
THANK YOU FOR YOUR ATTENTION!
Credits: icons made by freepik from flaticon