April 4-7, 2016 | Silicon Valley
Roberto Gomperts (NVIDIA), Michael Frisch (Gaussian, Inc.), Giovanni Scalmani (Gaussian, Inc.), Brent Leback (NVIDIA/PGI)
ENABLING THE ELECTRONIC STRUCTURE PROGRAM GAUSSIAN ON GPGPUS USING OPENACC
PREVIOUSLY Earlier Presentations
GRC Poster 2012
ACS Spring 2014
GTC Spring 2014 ( recording at http://on-demand.gputechconf.com/gtc/2014/video/S4613-enabling-gaussian-09-gpgpus.mp4 )
WATOC Fall 2014
TOPICS
Gaussian: Design Guidelines, Parallelism and Memory Model
Implementation: Top-Down/Bottom-Up
OpenACC: Extensions, Hints & Tricks
Early Performance
Closing Remarks
GAUSSIAN
A Computational Chemistry Package that provides state-of-the-art capabilities for electronic structure modeling
Gaussian 09 is licensed for a wide variety of computer systems
All versions of Gaussian 09 contain virtually every scientific/modeling feature, and none imposes any artificial limitations on calculations other than computational resources and time constraints
Researchers use Gaussian to, among other things, study molecules and reactions; predict and interpret spectra; explore thermochemistry, photochemistry and other excited states; and include solvent effects
4/1/2016
DESIGN GUIDELINES
General
Establish a Framework for the GPU-enabling of Gaussian
Code Maintainability (Code Unification)
Leverage Existing code/algorithms, including Parallelism and Memory Model
Simplifies Resolving Problems
Simplifies Improvement on existing code
Simplifies Adding New Code
DESIGN GUIDELINES
Accelerate Gaussian for Relevant and Appropriate Theories and Methods
Relevant: many users of Gaussian
Appropriate: time consuming and good mapping to GPUs
Resource Utilization
Ensure efficient use of all available Computational Resources
CPU cores and memory
Available GPUs and memory
CURRENT STATUS Single Node
Implemented
Energies for Closed and Open Shell HF and DFT (less than a handful of XC-functionals missing)
First derivatives for the same as above
Second derivatives for the same as above
Using only
OpenACC
CUDA library calls (BLAS)
IMPLEMENTATION MODEL
Application Code = GPU + CPU
GPU: Compute-Intensive Functions (Small Fraction of the Code, Large Fraction of Execution time)
CPU: Rest of Sequential CPU Code
GAUSSIAN PARALLELISM MODEL
CPU Cluster
OpenMP
CPU Node
GPU
OpenACC
GAUSSIAN: MEMORY MODEL
CPU Cluster
OpenMP
CPU Node
GPU
OpenACC
TREES IN THE FOREST OpenMP Parallel Region
Static Call Tree
Integrals Generation
Integrals “Digestion”
PRISMC ACLEAR
ARRMAX AUNITM
ACLEAR BRADRV
C2PGEN CHEKS4
CLMLPT COPRIM
DIGJE DOIRT1
DOIRT2 DOIRT3
DOIRT4 DOIRTR
DOIRTS GGEMV
DODYP DOPRIN
DOTSTM FNDHSC
FNDHSU GENCIT
GENSCL GETCWU
GLINCO IARMAX
INDSE INTMEM
INTPWP ISALG
KETDRV LBIT
LCOUNT LDCNVR
MAKZON MAXINT
MDVLEN PCKC4A
CALERF GLINCO
LOADC4 LOADDC
KPBCDC MAXINT
MDVLEN PC4TSJ
CULL2 PC4TSX
CALERF CNCALC
CULL1 PCKBXT
CULL2 PCKCEN
CULL1 PCKDEN
PCKFRG PCKRAN
IGFIX VPCKRN
CULL1 PETIIJ
PICKC4 GLINCO
IGFIX LOADC4
LOADDC PCKFRG
PICKS4 PRMDIG
PRMROW AMOVE
PRMZON PROFFZ
PRSMAR PRSMSE
SEHAM STEPTT
SUMRF PRLIN1
AMOVE GLINB
GLININ GLINLO
GLINLP GLININ
INTPWP GLINOP
GLINRE IMOVE
LBIT LCLEAR
PRLIN2 GLIN2D
GLININ GLINLO
GLINRE PRMRAF
ACLEAR AMOVE
AUNITM BRADRV
C2PGEN C4PRAF
CHEKS4 COPRIM
DIGRAF BOGGRI
DGSA03 DGSA06
DGSA13 DGSA16
DGST01 DGST02
DG0201 DG0202
DG0203 DG0206
DG0210 DG0220
DG0230 DG0233
DG0260 DG0266
DGST03 DGST06
DGST0X DGST11
DGST12 DG1211
DG1222 DG1233
DG1266 DGST16
DGST1X DGST24
DGSTAS LODRGO
MAKER3 NFUNSH
RSSHL STOR2Y
STORR1 RSSHL
STR2R1 STORR2
RSSHL STR2R2
STORR3 RSSHL
STR2R3 DODYP
DOPRIN DOTSTM
GENCIT GETCWU
GLINCO IARMAX
ICLEAR INDC
ISALG KETDRV
LDCNVR MAKZON
PETIIJ PICKR4
CHKSYM GLINCO
PRLOAD PRMPTH
CHOOSE PTHINF
PRMRAL IFRPOS
INTOWP ITRPOS
LNK1E MDCACH
MDVLEN PRSMDI
PRMROW PRMZON
PRRDRV RFBRKT
RFCONT STEPTT
APPROACH Data and Compute Regions Management
Data Management Top Down:
Create and initialize as appropriate large data region on device
Peruse the device memory as Compute Regions are enabled
Compute Bottom Up:
Create as many Accelerator Routines as possible
Incrementally add Compute Regions driving the Accelerator Routines
Incrementally add Routines with own Compute Regions
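The two directions can be sketched in a few lines. This is a minimal free-form illustration, not Gaussian code: the names leaves, scale_leaf and w are hypothetical, and only standard OpenACC directives are used. The data region is opened once at the top level, the leaf is turned into an accelerator routine, and a compute region is then added to drive it.

```fortran
! Hypothetical sketch: top-down data management, bottom-up compute.
module leaves
  implicit none
contains
  ! Bottom-up, step 1: a leaf routine becomes an accelerator routine.
  subroutine scale_leaf(n, a, s)
    !$acc routine vector
    integer, intent(in) :: n
    real(8), intent(inout) :: a(n)
    real(8), intent(in) :: s
    integer :: i
    !$acc loop vector
    do i = 1, n
      a(i) = s * a(i)
    end do
  end subroutine scale_leaf
end module leaves

program top_down_bottom_up
  use leaves
  implicit none
  integer, parameter :: n = 64, m = 8
  real(8) :: w(n, m)
  integer :: j

  w = 1.0d0
  ! Top-down: one large data region, created at the highest level.
  !$acc data copy(w)
  ! Bottom-up, step 2: a compute region that drives the accelerator routine.
  !$acc parallel loop gang present(w)
  do j = 1, m
    call scale_leaf(n, w(1, j), real(j, 8))
  end do
  !$acc end data

  ! Column j now holds the value j in all n entries.
  if (abs(sum(w) - n * m * (m + 1) / 2.0d0) > 1.0d-9) error stop 'unexpected sum'
  print '(a,f8.1)', 'sum(w) = ', sum(w)
end program top_down_bottom_up
```

Without OpenACC enabled, the directives are plain comments and the program runs on the host with the same result, which is the code-unification property the design guidelines ask for.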
OPENACC OpenMP Parallel Region
OpenACC directives at the “leaves”
Device Memory Management at highest possible level
Move Directives up the calling tree
[Same static call tree as on the TREES IN THE FOREST slide]
OPENACC PGI Extensions
Data Directives
NoCreate (Also in Compute Directives)
Compute Directives
Gang levels: (Dim:1,2,3)
Collapse (Force:N)
OPENACC EXTENSIONS NoCreate
      Subroutine DoSomething(NTT,FA,FB,…)
      Real*8 FA(*), FB(*)
C$ACC Parallel If(OnGPU)
C$ACC+ Present(FA) NoCreate(FB)
C$ACC Loop Gang Vector
      Do I = 1,NTT
        FA(i) = expression
        If(OpenShell) then
          FB(i) = expression
        EndIf
      EndDo
C$ACC End Parallel
FA is always allocated On Device
FB only allocated if OpenShell=.T.
Note: The If(OnGPU) clause requires compilation with -ta=host,nvidia
OPENACC EXTENSIONS Collapse Force
      Subroutine DoSomething(N,NV,NU,IVIND,IUIND,W…)
      Integer IVIND(*),IUIND(*)
      Real*8 W(N,*,*)
C$ACC Routine(DoWork) Vector
C$ACC Parallel If(OnGPU) Present(IVIND,IUIND,W)
C$ACC Loop Gang Collapse(Force:2)
      Do IV = 1,NV
        IVD = IVIND(IV)
        Do IU = 1,NU
          IUD = IUIND(IU)
          Call DoWork(N,W(1,IUD,IVD))
        EndDo
      EndDo
C$ACC End Parallel
IVD statement Prevents Regular Collapse
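Where the Collapse(Force:N) extension is not available, a comparable effect can be obtained by flattening the two loops by hand. The sketch below is an assumption-laden illustration, not the approach taken in Gaussian: the index maps are made-up permutations, and a simple inner vector loop stands in for DoWork. A single gang loop runs over all NV*NU pairs and recovers IV and IU from the flat index, so the IVD lookup no longer sits between two loop headers.

```fortran
! Hypothetical sketch: manual flattening as a stand-in for Collapse(Force:2).
program manual_collapse
  implicit none
  integer, parameter :: n = 4, nv = 3, nu = 5
  integer :: ivind(nv), iuind(nu)
  real(8) :: w(n, nu, nv)
  integer :: k, iv, iu, ivd, iud, i

  ! Made-up index maps (reversed order); any permutation would do.
  ivind = [(nv + 1 - iv, iv = 1, nv)]
  iuind = [(nu + 1 - iu, iu = 1, nu)]
  w = 0.0d0

  !$acc parallel loop gang copy(w) copyin(ivind, iuind) private(iv, iu, ivd, iud)
  do k = 0, nv * nu - 1
    ! Recover the two original loop indices from the flat index.
    iv = k / nu + 1
    iu = mod(k, nu) + 1
    ivd = ivind(iv)
    iud = iuind(iu)
    ! Stand-in for Call DoWork(N, W(1,IUD,IVD)).
    !$acc loop vector
    do i = 1, n
      w(i, iud, ivd) = w(i, iud, ivd) + 1.0d0
    end do
  end do

  if (any(w /= 1.0d0)) error stop 'some block was missed or visited twice'
  print *, 'all', nv * nu, 'blocks visited exactly once'
end program manual_collapse
```

The cost of the manual form is an integer division and a remainder per iteration, and the loop nest no longer reads like the CPU version, which is presumably why a directive-level solution is preferred in the slides.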
OPENACC EXTENSIONS Multiple Dimension Gangs
      Subroutine DoSomething(N,NV,NU,W…)
      Real*8 W(N,*,*)
C$ACC Routine(DoWork) Gang
C$ACC Parallel If(OnGPU) Present(IVIND,IUIND,W)
C$ACC+ Num_Gangs(nGng1,nGng2,nGng3) Vector_Length(lvGPU)
C$ACC Loop Gang(Dim:3)
      Do IV = 1,NV
C$ACC Loop Gang(Dim:2)
        Do IU = 1,NU
          Call DoWork(N,W(1,IU,IV))
        EndDo
      EndDo
C$ACC End Parallel
DoWork contains a Gang Vector Loop
HINTS & TRICKS Out of OpenACC Scope & Bind
Bind clause
A routine that can be called in different contexts:
Called with its own OpenACC compute region
Called outside an OpenACC scope
Called within an OpenACC region but in different modes: Gang-Vector, Vector, Seq
HINTS & TRICKS Bottom-Up approach
      Subroutine DoSomething(N,A,…)
      Real*8 A(*)
C$ACC Routine(aClear) Gang
      …
      Call aClear(N,A)
      …
C$ACC Parallel If(OnGPU) Present(A)
C$ACC+ Num_Gangs(nGng1) Vector_Length(lvGPU)
      Call aClear(N,A)
C$ACC End Parallel
      …
aClear clears Host copy
aClear clears Device copy If OnGPU =.T.
HINTS & TRICKS Out of Scope or in Gang Compute Region
      Subroutine aClear(N,A)
      Real*8 A(*)
      Parameter(Zero=0.0d0)
C$ACC Routine Gang
C$ACC Loop Gang Vector
      Do I = 1, N
        A(I) = Zero
      EndDo
aClear can be called outside an OpenACC region
HINTS & TRICKS Bottom-Up approach
      Subroutine DoSomething(N,A,…)
      Real*8 A(*)
C$ACC Routine(aClear) Gang
      …
      Call aClrGP(N,A)
      Call aClear(N,A)
      …
C$ACC Parallel If(OnGPU) Present(A)
C$ACC+ Num_Gangs(nGng1) Vector_Length(lvGPU)
      Call aClear(N,A)
C$ACC End Parallel
      …
aClrGP clears Device copy If OnGPU =.T.
aClear clears Host copy
aClear clears Device copy If OnGPU =.T.
HINTS & TRICKS Inside or outside OpenACC scope
      Subroutine aClrGP(N,A)
      Real*8 A(*)
      Parameter(Zero=0.0d0)
C$ACC Kernels Loop If(OnGPU) Present(A)
C$ACC Loop Gang Vector
      Do I = 1, N
        A(I) = Zero
      EndDo
Clears Device copy of A if OnGPU = .T.
Otherwise Host copy
HINTS & TRICKS Multiple ARs in Compute Region
      Subroutine DoSomething(N,A,B,…)
      Real*8 A(3),B(*)
C$ACC Routine(aClear) Gang
C$ACC Routine(Use_A) Gang
      …
C$ACC Parallel If(OnGPU) Present(A,B)
C$ACC+ Num_Gangs(nGng1) Vector_Length(lvGPU)
      Call aClear(3,A)
      Call Use_A(N,A,B,…)
C$ACC End Parallel
      …
Data Dependency Hazard
A has to be initialized before being used
N>3
HINTS & TRICKS Bind
      Subroutine DoSomething(N,A,B,…)
      Real*8 A(3),B(*)
C$ACC Routine(aClear) Seq Bind(aClear_s)
C$ACC Routine(Use_A) Gang
      …
C$ACC Parallel If(OnGPU) Present(A,B)
C$ACC+ Num_Gangs(nGng1) Vector_Length(lvGPU)
      Call aClear(3,A)
      Call Use_A(N,A,B,…)
C$ACC End Parallel
      …
Allows the same name, aClear, to be used at the call site
HINTS & TRICKS Bound Seq
      Subroutine aClear_s(N,A)
      Real*8 A(*)
      Parameter(Zero=0.0d0)
C$ACC Routine Seq
C$ACC Loop Seq
      Do I = 1, N
        A(I) = Zero
      EndDo
Seq instead of Gang
HINTS & TRICKS Conditional on GPU or Host
Beyond the “if” clause in Compute Regions
Marking a section of code in a Compute Region that should only be executed by the host or the device
ACC_On_Device(ACC_Device_HOST)
ACC_On_Device(ACC_Device_Not_HOST)
PGI’s Implementation
Not exactly like an “#ifdef” macro but close
ACC_ON_DEVICE(…)
      Subroutine DoSomething(…)
      Implicit Real*8(A-H,O-Z)
C$ACC Routine Seq
C$ACC Routine(DeviceSub) Seq
      …
      If(ACC_On_Device(ACC_Device_HOST)) then
        Call HostSub(…)
      EndIf
      If(ACC_On_Device(ACC_Device_Not_HOST)) then
        Call DeviceSub(…)
      EndIf
      …
HostSub runs only on Host; no need for Routine Directive
DeviceSub runs only on Device
ADVANCED OPTIMIZATIONS
• It is imperative to use a Dynamic Load Distribution Mechanism
• Already in place for CPU parallelism
• On the fly move work from GPU to Core
• Improve performance in certain places by controlled replication of matrices
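One common way to obtain dynamic load distribution is a shared work counter from which every worker claims its next batch, so that faster workers simply end up taking more batches. The sketch below illustrates that general idea with an OpenMP atomic capture. It is not a description of Gaussian's actual mechanism, and the names nbatch, next and done are hypothetical.

```fortran
! Hypothetical sketch: dynamic work distribution through a shared counter.
program dynamic_distribution
  implicit none
  integer, parameter :: nbatch = 1000
  integer :: next, mine, done(nbatch)

  next = 0
  done = 0
  !$omp parallel private(mine) shared(next, done)
  do
    ! Atomically claim the next unprocessed batch.
    !$omp atomic capture
    next = next + 1
    mine = next
    !$omp end atomic
    if (mine > nbatch) exit
    ! A thread that drives a GPU would launch device work for this batch here;
    ! a plain CPU thread would compute the batch itself.
    done(mine) = done(mine) + 1
  end do
  !$omp end parallel

  if (any(done /= 1)) error stop 'a batch was skipped or done twice'
  print *, 'batches processed:', sum(done)
end program dynamic_distribution
```

Because no batch is assigned in advance, work migrates between GPU-driving threads and plain cores without any explicit rebalancing step. Compiled without OpenMP, the loop runs serially and still processes every batch once.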
PERFORMANCE ASSESSMENT Hardware
Processor
Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz (32 cores/2 sockets)
Memory: 256GB
GPU
Tesla K80 (8 GPUs/4 Boards), Boost Clocks: MEM 2505, SM 875
Topology
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity
GPU0  X     PIX   SOC   SOC   SOC   SOC   SOC   SOC   0-15
GPU1  PIX   X     SOC   SOC   SOC   SOC   SOC   SOC   0-15
GPU2  SOC   SOC   X     PIX   PHB   PHB   PHB   PHB   16-31
GPU3  SOC   SOC   PIX   X     PHB   PHB   PHB   PHB   16-31
GPU4  SOC   SOC   PHB   PHB   X     PIX   PXB   PXB   16-31
GPU5  SOC   SOC   PHB   PHB   PIX   X     PXB   PXB   16-31
GPU6  SOC   SOC   PHB   PHB   PXB   PXB   X     PIX   16-31
GPU7  SOC   SOC   PHB   PHB   PXB   PXB   PIX   X     16-31
PERFORMANCE ASSESSMENT
What do we compare?
Runs with 32 cores and 0 GPUs versus runs with 32 cores and 8 GPUs
Energies, 1st and 2nd derivatives, Closed and Open Shell
Various basis sets and XC-functionals
All calculations with
High Integral Accuracy (10^-12)
Ultra Fine Grid
Molecular Systems
PERFORMANCE ASSESSMENT Molecular Systems
Valinomycin Open Shell
168 atoms
Force Calculation
Basis set: 6-311+G(2d,p)
Basis Functions: 2646
XC-functional: HSEH1PBE
Charge: +1
Multiplicity: 2
Convergence: 29 cycles
Alanine 25
259 atoms
Energy Calculation
Basis set: cc-pVTZ
Basis Functions: 5690
XC-functional: wB97X-D
Frequency calculation
Basis set: 6-31G*
Basis Functions: 2195
XC-functional: APFD
RESULTS
[Bar chart: Total Execution Time (m), y-axis 0 to 800, split into ERI, XC and Rest, for the runs Ala25 E[32/8], Ala25 E[32/0], Val UF[32/8], Val UF[32/0], Ala25 V[32/8] and Ala25 V[32/0]. Values labeled on the chart: 1.39, 1.16, 1.59, 1.48, 1.19, 1.70, 1.22, 1.15, 1.36.]
“DILUTION” WITH MANY CORES
[Chart: speed-up of a run with c Cores and g GPUs over a run with c Cores and 0 GPUs (y-axis, 0.0 to 11.0) against the Ratio Number of Cores/Number of GPUs (x-axis: c/g = 1, 2, 3, 4, 6, 8), with one series per g/c speed up of 2.0, 3.0, 5.0 and 10.0. Values labeled on the chart: 1.25, 1.13, 1.50, 1.25, 2.00, 1.50, 3.25, 2.13.]
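The eight values labeled on the chart (1.25, 1.13, 1.50, 1.25, 2.00, 1.50, 3.25, 2.13) are consistent with a simple model. The model is an inference from those numbers, not something the slide states: if each GPU takes over one core and that core-plus-GPU pair is worth s cores, then c cores with g GPUs behave like c - g + g*s cores, giving a speed-up of 1 + (s - 1)/(c/g) over the CPU-only run. At c/g = 4 and c/g = 8 this yields 1.25 and 1.125 for s = 2, 1.5 and 1.25 for s = 3, 2.0 and 1.5 for s = 5, and 3.25 and 2.125 for s = 10. The short program below tabulates the model for the ratios on the chart axis.

```fortran
! Sketch of an inferred "dilution" model: speed-up = 1 + (s - 1) / (c/g).
program dilution
  implicit none
  ! s: assumed speed-up of one GPU-driving core over one plain core.
  real(8), parameter :: s(4) = [2.0d0, 3.0d0, 5.0d0, 10.0d0]
  ! r: ratio of cores to GPUs, as on the chart's x-axis.
  real(8), parameter :: r(6) = [1.0d0, 2.0d0, 3.0d0, 4.0d0, 6.0d0, 8.0d0]
  integer :: i, j

  do i = 1, size(s)
    print '(a,f5.1,a,6f7.3)', 'g/c speed up', s(i), ':', &
          (speedup(s(i), r(j)), j = 1, size(r))
  end do

  ! Two of the labeled chart values, reproduced by the model.
  if (abs(speedup(10.0d0, 4.0d0) - 3.25d0) > 1.0d-12) error stop
  if (abs(speedup(5.0d0, 8.0d0) - 1.5d0) > 1.0d-12) error stop
contains
  pure function speedup(s1, ratio) result(x)
    real(8), intent(in) :: s1, ratio
    real(8) :: x
    x = 1.0d0 + (s1 - 1.0d0) / ratio
  end function speedup
end program dilution
```

Under this model the benefit of each GPU is diluted as more cores share it: a GPU worth ten cores still gives only about a 2.1x overall gain at eight cores per GPU.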
CLOSING REMARKS
Significant Progress has been made in enabling Gaussian on GPUs with OpenACC
OpenACC is becoming increasingly versatile
Significant work lies ahead to improve performance
Expand feature set:
PBC, Solvation, MP2, ONIOM, triples-Corrections
ACKNOWLEDGEMENTS
Development is taking place with:
Hewlett-Packard (HP) Series SL2500 Servers (Intel® Xeon® E5-2680 v2 (2.8GHz/10-core/25MB/8.0GT-s QPI/115W, DDR3-1866))
NVIDIA® Tesla® GPUs (K40 and later)
PGI Accelerator Compilers (16.x) with OpenACC (2.5 standard)
THANK YOU
JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join