vasp: running on hpc resourcesto the presently considered subspace: • rayleigh-ritz optimization...

38
University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria VASP: running on HPC resources

Upload: others

Post on 26-Feb-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

UniversityofVienna,FacultyofPhysicsandCenterforComputationalMaterialsScience,

Vienna,Austria

VASP:runningonHPCresources

Page 2: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

TheMany-BodySchrödingerequation

H (r1, ..., rN ) = E (r1, ..., rN )0

@�1

2

X

i

�i +X

i

V (ri) +X

i 6=j

1

|ri � rj |

1

A (r1, ..., rN ) = E (r1, ..., rN )

(r1, ..., rN ) ! { 1(r), 2(r), ..., N (r)}

Forinstance,many-bodyWFstoragedemandsareprohibitive:

Asolution:maponto“one-electron”theory:

5electronsona10×10×10grid~10PetaBytes !

(r1, ..., rN ) (#grid points)N

Page 3: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Hohenberg-Kohn-ShamDFT

E[⇢] = Ts[{ i[⇢]}] + EH [⇢] + Exc[⇢] + EZ [⇢] + U [Z]

(r1, ..., rN ) =NY

i

i(ri)

⇢(r) =NX

i

| i(r)|2

⇣�1

2�+ VZ(r) + VH [⇢](r) + Vxc[⇢](r)

⌘ i(r) = ✏i i(r)

(r1, ..., rN ) ! { 1(r), 2(r), ..., N (r)}

Maponto“one-electron”theory:

Totalenergyisafunctionalofthedensity:

Thedensityiscomputedusingtheone-electronorbitals:

Theone-electronorbitalsarethesolutionsoftheKohn-Shamequation:

BUT:Exc[⇢] =??? Vxc[⇢](r) =???

Page 4: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Exchange-Correlation

Exc[⇢] =??? Vxc[⇢](r) =???

• Exchange-Correlationfunctionals aremodeledontheuniform-electron-gas(UEG):Thecorrelationenergy(andpotential)hasbeencalculatedbymeansofMonte-Carlomethodsforawiderangeofdensities,andhasbeenparametrized toyieldadensityfunctional.

• LDA:wesimplypretendthataninhomogeneouselectronicdensitylocallybehaveslikeahomogeneouselectrongas.

• Many,many,manydifferentfunctionals available:LDA,GGA,meta-GGA,van-der-Waalsfunctionals,etc etc

Page 5: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

AnN-electronsystem:N=O(1023)

N ⇥ (#grid points)(#grid points)N

(r1, ..., rN ) ! { 1(r), 2(r), ..., N (r)}

Hohenberg-Kohn-ShamDFTtakesusalongway:

Niceforatomsandmolecules,butinarealisticpieceofsolidstatematerialN=O(1023)!

Page 6: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Translationalinvariance:PeriodicBoundaryConditions

nk(r+R) = nk(r)eikR

nk(r) = unk(r)eikr

unk(r+R) = unk(r)

Translationalinvarianceimplies:

and

AllstatesarelabeledbyBlochvector k andthebandindex n:

• TheBlochvectork isusuallyconstrainedtoliewithinthefirstBrillouin zoneofthereciprocalspacelattice.

• Thebandindexnisoftheorderifthenumberofelectronsperunitcell.

Page 7: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Reciprocalspace&thefirstBrillouin zone

b2b1

b3

z

yx

C

b2 b1

b3

z

yx

B

4pi/L

a2a1

a3 y

Lz

x

A

b1 =2⇡

⌦a2 ⇥ a3 b2 =

2⇡

⌦a3 ⇥ a1 b3 =

2⇡

⌦a1 ⇥ a2

⌦ = a1 · a2 ⇥ a3 ai · bj = 2⇡�ij

Page 8: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Samplingthe1st BZ

⇢(r) =1

⌦BZ

X

n

Z

BZfnk| nk(r)|2dk

⇢(r) =X

nk

wkfnk| nk(r)|2dk,

Theevaluationofmanykeyquantitiesinvolvesanintegraloverthe1st BZ.Forinstancethechargedensity:

WeexploitthefactthattheorbitalsatBlochvectorsk thatareclosetogetherarealmostidenticalandapproximatetheintegraloverthe1st BZbyaweightedsumoveradiscretesetofk-points:

Fazit:theintractabletaskofdeterminingwithN=1023,hasbeenreducedtocalculatingatadiscretesetofk-pointsinthe1st BZ,foranumberofbandsthatisoftheorderifthenumberofelectronsintheunitcell.

(r1, ..., rN ) nk(r)

Page 9: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

E[⇢, {R, Z}] = Ts[{ nk[⇢]}] + EH [⇢, {R, Z}] + Exc[⇢] + U({R, Z})

Ts[{ nk[⇢]}] =X

nk

wkfnkh nk|�1

2�| nki

EH[⇢, {R, Z}] = 1

2

ZZ⇢eZ(r)⇢eZ(r0)

|r� r0| dr0dr

⇢eZ(r) = ⇢(r) +X

i

Zi�(r�Ri) ⇢(r) =X

nk

wkfnk| nk(r)|2dk

⇣�1

2�+ VH [⇢eZ ](r) + Vxc[⇢](r)

⌘ nk(r) = ✏nk nk(r)

VH [⇢eZ ](r) =

Z⇢eZ(r0)

|r� r0|dr0

Thetotalenergy

Thekineticenergy

TheHartree energy

where

TheKohn-Shamequations

TheHartree potential

Page 10: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Aplanewavebasisset

unk(r+R) = unk(r) nk(r) = unk(r)eikr

unk(r) =1

⌦1/2

X

G

CGnkeiGr nk(r) =

1

⌦1/2

X

G

CGnkei(G+k)r

⇢(r) =X

G

⇢GeiGr

1

2|G+ k|2 < Ecuto↵

V (r) =X

G

VGeiGr

Allcell-periodicfunctionsareexpandedinplanewaves(Fourieranalysis):

Thebasissetincludesallplanewavesforwhich

Crnk =X

G

CGnkeiGr FFT �! CGnk =

1

NFFT

X

r

Crnke�iGr

TransformationbymeansofFFTbetween“real”spaceand“reciprocal”space:

Page 11: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

τ1

τ1 π / τ 1

τ2

b1

b2

real space reciprocal spaceFFT

0 1 2 3 0

1 1 x = n / N 1 1 g = n 2

N−1 −1−2−3−4 0 1 2 3 4 55

N/2−N/2+1

cutG

Crnk =X

G

CGnkeiGr FFT �! CGnk =

1

NFFT

X

r

Crnke�iGr

Page 12: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Thechargedensity

ψrψG ψr

ρ rρG

2 Gcut

Gcut

FFT

FFT

Page 13: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

TheSelf-Consistency-Cycle(cont.)

Twosub-problems:

• OptimizationofIterativeDiagonalizatione.g. RMM-DIISorBlockedDavidson

• ConstructionofDensityMixinge.g. Broyden mixer

{ n}

⇢in

Page 14: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Theself-Consistency-Cycle

H = hG|H[⇢]|G0i ! diagonalize H ! { i, ✏i} i = 1, .., NFFT

⇢0 ! H0 ! ⇢0 ! ⇢1 = f(⇢0, ⇢0) ! H1 ! ...

⇢ = ⇢0

Anaïvealgorithm:expresstheHamiltonmatrixinaplanewavebasisanddiagonalize it:

Self-consistency-cycle:

Iterateuntil:

BUT: wedonotneedNFFT eigenvectorsoftheHamiltonian(atacostofO(NFFT3)).

ActuallyweonlytheNb lowesteigenstates ofH,whereNb isoftheorderofthenumberofelectronsperunitcell(Nb <<NFFT).

Solution:useiterativematrixdiagonalization techniquestofindtheNb lowestEigenvectoroftheHamiltonian:RMM-DIIS,blocked-Davidson,etc.

Page 15: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Keyingredients:SubspacediagonalizationandtheResidual

X

m

HnmBmk =X

m

✏appk SnmBmk

Hnm = h n|H| mi Snm = h n|S| mi

| ki =X

m

Bmk| mi

|R( n)i = (H � ✏appS)| ni

• Rayleigh-Ritz:diagonalization oftheNb xNb subspace

with

yieldsNb eigenvectorswitheigenvaluesεapp.

Theseeigenstates arethebestapproximationtotheexactNb lowesteigenstates ofH withinthesubspacespannedbythecurrentorbitals.

• TheResidual:

✏app =h n|H| nih n|S| ni

(itsnormismeasurefortheerrorintheeigenvector)

Page 16: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Blocked-Davidson• Takeasubsetofallbands: { n|n = 1, .., N} ) { 1

k|k = 1, .., n1}

{ 1k/g

1k = K(H� ✏appS)

1k|k = 1, .., n1}

{ 2k|k = 1, .., n1}

• Extendthissubsetbyaddingthe(preconditioned)residualvectorstothepresentlyconsideredsubspace:

• Rayleigh-Ritzoptimization(“sub-spacerotation”)inthe2n1 dimensionalsubspacetodeterminethen1 lowesteigenvectors:

diag{ 1k/g

1k}

• Extendsubspacewiththeresidualsof { 2k}

{ 1k/g

1k/g

2k = K(H� ✏appS)

2k|k = 1, .., n1}

• Rayleigh-Ritzoptimization ) { 3k|k = 1, .., n1}

• Etc …

{ mk |k = 1, .., n1} { n|n = 1, .., n1}

• Theoptimizedsetreplacestheoriginalsubset:

• Continuewithnextsubset:,etc,…{ 1k|k = n1 + 1, .., n2}

Aftertreatingallbands:Rayleigh-Ritzoptimizationof { n|n = 1, .., N}

Page 17: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

TheactionoftheHamiltonianH| nki ✓

�1

2�+ V (r)

◆ nk(r)

hr|G+ ki = 1

⌦1/2ei(G+k)r �! hG+ k| nki = CGnk

NNPLWhG+ k|� 1

2�| nki =

1

2|G+ k|2CGnk

hG+ k|V | nki =1

NFFT

X

r

VrCrnke�iGr NFFT logNFFT

Theaction

Usingtheconvention

• Kineticenergy:

• Localpotential:• Exchange-correlation:easilyobtainedinrealspace• FFTtoreciprocalspace• Hartree potential:solvePoissoneq.inreciprocalspace• Addallcontributions• FFTtorealspaceTheaction

V = VH[⇢] + Vxc[⇢] + Vext

Vxc,r = Vxc[⇢r]

VH,G =4⇡

|G|2 ⇢GVG = VH,G + Vxc,G + Vext,G

{VG} �! {Vr}

{Vxc,r} �! {Vxc,G}

Page 18: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

SolvingtheKSequations

• FFTsextensivelyusedtoevaluate

• Weactuallyuseamixedbasisset(Projector-Augmented-Waves):

whichinvolvesprojectionofthepseudo-wavefunctionsonlocalprojectionoperators(DGEMM).

• Oneneedstokeepthesolutions(bands)ateachk-pointorthonormal:essentiallydonebyaCholeski decomposition(LU)oftheoverlapmatrix,followedbyaninversionofU(LAPACK/scaLAPACK)andatransformationbetweenthewavefunction(ZGEMM).

• DiagonalizationoftheHamiltonianinthesubspaceofthecurrentwavefunctions(LAPACK/scaLAPACK orELPA),followedbyaunitarytransformationbetweenthewavefunction(ZGEMM).

⇣�1

2�+ Vext(r) + VH(r) + Vxc(r)

⌘ nk(r) = ✏nk nk(r)

H| ni

| ni = | e ni+X

i

(|�ii � |e�ii)hepi| e ni

Page 19: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

AtypicalworkloadH| ni• Action:

FFT( n)

V (r) n(r)

N lnN

N

8 n

8 n N2

N2 lnN fftw

hpi| ni N 8 i, nN3

const. N2 (realsp.) BLAS3(DGEMM)

• Subspacerotation:

Hnm = h n|H| mi 8 n,m N3 BLAS3(ZGEMM)

diag(H) N3 (sca)LAPACK

| 0

ni =X

m

Unm| mi 8 n N3 BLAS3(ZGEMM)

LREAL=A

Page 20: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Scalingwithsystemsize(N)

Self-ConsistencyCycle(SCC):RMM-DIISrunningon8IntelX5550quadcore procs.(total:32x2.66GHzcores)

Page 21: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Distributionofworkanddata

Distributeworkanddata“over-orbitals”• Default• NCORE=1

(orequivalently:NPAR=#-of-MPI-ranks)• KPAR=1

⇣�1

2�+ Vext(r) + VH(r) + Vxc(r)

⌘ nk(r) = ✏nk nk(r)

TheKohn-Shamequation:

• Orbitalindexn

Page 22: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Distributionofworkanddata

Distributeworkanddata“over-orbitals”• Default• NCORE=1

(orequivalently:NPAR=#-of-MPI-ranks)• KPAR=1

Distributionworkanddata“over-plane-waves”• NCORE=#-of-MPI-ranks

(orequivalently:NPAR=1)• KPAR=1

Page 23: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Distributionofworkanddata

Combinationsof”over-orbitals”and”over-plane-wave”distributionsareallowedaswell

Page 24: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

DistributionofworkanddataAdditionallyworkmaybedistributed”over-k-points”• KPAR=n (n>1)• m =(#-of-MPI-ranks/n)mustbeaninteger• Workisdistributedinaround-robinfashionovergroupsofmMPI-ranks• Dataisduplicated!

⇣�1

2�+ Vext(r) + VH(r) + Vxc(r)

⌘ nk(r) = ✏nk nk(r)

• Orbitalindexn,k-pointindexk

Page 25: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

All-2-Allcommunication

Page 26: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

ParallelFFT:Ball-2-Cube

(A) (B) (C) (D)

G2G

4G

4G

4G

G

4G

FFT

FFT

FFT

z

x

y

In-houseparallelBall-2-CubeFFT:• Less1D-FFTs(reduction:≈ 1.76 ×)• BUT:communicationfrom(B)→(C)and(C)→(D)• ForsmalltomediumsizedFFTgridsahighlyoptimized3D-FFT

(e.g.fromIntel’smkl)isequallyfast

Page 27: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

HardwareconsiderationsTypicalconfiguration:

• N interconnectednodes• 2packages/node• M cores/package

Distribution“over-plane-waves”:MPI-ranksthatshareanorbitalshouldresideonthesamenode(betterevenonthesamepackage).

• NCORE=n ≤ 2M• (2M/ n)shouldbeaninteger• Typically:n =M orn =M/2

Distribution“over-k-points”:• KPAR=n ≤ #-of-k-points(NKPTS)• (#-of-MPI-ranks / n)shouldbeaninteger• Ifmemoryallows:KPAR=NKPTS

DefaultplacementofMPI-ranksonthenodes/packages/coresdependsontheparticularsoftheMPIimplementationanditsconfiguration!

Page 28: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Forinstance:• 2 interconnectednodes• 2packages/node• 4cores/package

DefaultplacementofMPI-ranksonthenodes/packages/coresdependsontheparticularsoftheMPIimplementationanditsconfiguration!

Page 29: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Forinstance:• 2 interconnectednodes• 2packages/node• 4cores/package

Good:PlacesubsequentMPI-ranksclosetogether,i.e.,firstonsubsequentcoresofthesamepackage,thenmovingonthesecondpackageofthesamenode,beforestartingtofillthenextnode.

Page 30: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Forinstance:• 2 interconnectednodes• 2packages/node• 4cores/package

Bad:DistributesubsequentMPI-ranksinaround-robinfashionoverthepackages.

Page 31: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Forinstance:• 2 interconnectednodes• 2packages/node• 4cores/package

Worse:DistributesubsequentMPI-ranksinaround-robinfashionoverthenodes.

Page 32: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

• N interconnectednodes• 2packages/node• M cores/package• 2NMMPI-ranks

Distribution“over-plane-waves”:MPI-ranksthatshareanorbitalshouldresideonthesamenode(betterevenonthesamepackage).

• NCORE=n ≤ 2M• (2M/ n)shouldbeaninteger• Typically:n =M orn =M/2

Distribution“over-k-points”:• KPAR=n ≤ #-of-k-points(NKPTS)• (#-of-MPI-ranks / n)shouldbeaninteger• Ifmemoryallows:KPAR=NKPTS

Distribution“over-orbitals”:• Thenumberoforbitals(NBANDS)issuch

thatNBANDS/(2NM /NCORE/KPAR)isanintegernumber

⇒ Increasing#-of-MPI-ranksmayleadtoanunnecessarilylargeNBANDS(i.e.,adding“empty”orbitals)

• SomealgorithmsconvergefasterwheneachMPIrankowns(partof)afeworbitals(e.g. blocked-Davidson)

• Generallyspeaking:havinglotsofMPIranksandverylittlework/dataperrankisneveragoodideasince(all-2-all)communicationbecomesunreasonablyexpensive.

NCORE,KPAR,andthe#-of-MPI-ranks

Page 33: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Strong/Weakscaling(SiN)

Page 34: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

ScalingunderMPI(onaCrayXC-40)

CourtesyofP.Saxe,MaterialsDesignInc.(andCray).

• 5713e-• 2k-points• RMM-DIIS• PBE• KPAR=2

Page 35: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

ScalingunderMPI(onaCrayXC-40)

• 864e-• 8k-points• HSE06• Davidson• KPAR=8

CourtesyofP.Saxe,MaterialsDesignInc.(andCray).

Page 36: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Largesystems

hpi| ni N 8 i, nN3

const. N2 (realsp.) BLAS3(DGEMM)LREAL=A

• Usereal-spacePAWprojectors(insteadofreciprocalspaceprojectors)

• Ifyoucanlimitk-pointsamplingtotheGamma(𝚪 ≡ 𝐤 = 𝟎)point:usethe“gamma-only”versionofVASP

Inthe”gamma-only”versionofVASPtheorbitalsarestoredasrealquantitiesinreal-space:• real-2-complexFFTs• DGEMMsinsteadofZGEMMs

Page 37: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

Compilation

• FastFFTs:theFFTsfromIntel’smkl-libraryseemtobeunbeatablyfast…• scaLAPACK forlargesystems• AcompilerthateffectivelygeneratesAVX2instructionsandlibraries(e.g. BLAS)

thatareoptimizedforAVX2(upto20%performancegain)

Page 38: VASP: running on HPC resourcesto the presently considered subspace: • Rayleigh-Ritz optimization (“sub-space rotation”) in the 2n 1 dimensional subspace to determine the n 1

TheEnd

Thankyou!