

Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients

M. C. Payne

Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, United Kingdom

M. P. Teter and D. C. Allan

Applied Process Research, Corning Incorporated, Corning, New York 14831

T. A. Arias and J. D. Joannopoulos

Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

This article describes recent technical developments that have made the total-energy pseudopotential the most powerful ab initio quantum-mechanical modeling method presently available. In addition to presenting technical details of the pseudopotential method, the article aims to heighten awareness of the capabilities of the method in order to stimulate its application to as wide a range of problems in as many scientific disciplines as possible.

CONTENTS

I. Introduction
II. Total-Energy Pseudopotential Calculations
   A. Overview of approximations
   B. Electron-electron interactions
      1. Exchange and correlation
      2. Density-functional theory
         a. The Kohn-Sham energy functional
         b. Kohn-Sham equations
         c. Local-density approximation
   C. Periodic supercells
      1. Bloch's theorem
      2. k-point sampling
      3. Plane-wave basis sets
      4. Plane-wave representation of Kohn-Sham equations
      5. Nonperiodic systems
   D. Electron-ion interactions
      1. Pseudopotential approximation
         a. Norm conservation
         b. Generation procedure
         c. Convergence properties
         d. Plane-wave basis sets
      2. Structure factor
      3. Total ionic potential
   E. Ion-ion interactions
   F. Computational procedure with conventional diagonalization
   G. Drawbacks of conventional procedure
   H. Alternative methods
III. The Molecular-Dynamics Method
   A. Eigenvalue solution by successive "improvement" of a trial wave function
   B. Molecular-dynamics procedure
      1. Molecular-dynamics Lagrangian
      2. Constraints
   C. Molecular-dynamics equations of motion
      1. Unconstrained equations of motion
      2. Constrained equations of motion
      3. Partially constrained equations of motion
   D. Integration of equations of motion
      1. The Verlet algorithm
      2. Stability of the Verlet algorithm
   E. Orthogonalization of electronic wave functions
   F. Damping
   G. Self-consistency
   H. Kohn-Sham eigenstates
   I. Computational procedure with molecular dynamics
   J. Instabilities and fluctuations in the Kohn-Sham energy
   K. Computational cost of molecular dynamics
      1. Calculation of charge density
      2. Construction of Hamiltonian matrix
      3. Accelerations of the coefficients
      4. Integration of equations of motion
      5. Orthogonalization
      6. Comparison to conventional matrix diagonalizations
      7. Local pseudopotentials
IV. Improvements in Algorithms
   A. Improved integration
      1. Analytic integration of second-order equations of motion
      2. Analytic integration of first-order equations of motion
   B. Orthogonalization of wave functions
      1. The Gram-Schmidt scheme
      2. Convergence to Kohn-Sham eigenstates
   C. Comparison between algorithms
   D. Remaining difficulties
V. Direct Minimization of the Kohn-Sham Energy Functional
   A. Minimization of a function
      1. The method of steepest descents
      2. The conjugate-gradients technique
   B. Application of the conjugate-gradients method
      1. The update of a single band
      2. Constraints
      3. Preconditioning
      4. Conjugate directions
      5. Search for the energy minimum
      6. Calculational procedure
      7. Computational cost
   C. Speed of convergence
      1. Large energy cutoff
      2. Long supercells
      3. A real system

Reviews of Modern Physics, Vol. 64, No. 4, October 1992. Copyright 1992 The American Physical Society.



VI. Choice of Initial Wave Functions for the Electronic States
   A. Convergence to the ground state
      1. Spanning the ground state
      2. Symmetry conservation
   B. Rate of convergence
   C. Specific initial configurations
      1. Reduced basis sets
      2. Symmetrized combinations of plane waves
      3. Random initial configurations
      4. Random initial velocities
VII. Relaxation of the Ionic System
   A. The Car-Parrinello Lagrangian
   B. Equations of motion
   C. The Hellmann-Feynman theorem
      1. Errors in Hellmann-Feynman forces
      2. Consequences of the Hellmann-Feynman theorem
   D. Pulay forces
   E. Pulay stresses
   F. Local energy minimization
      1. Local energy minimization with the molecular-dynamics method
      2. Local energy minimization with the conjugate-gradients method
   G. Global energy minimization
      1. Simulated annealing
      2. Monte Carlo methods
      3. Location of low-energy configurations
VIII. Dynamical Simulations
   A. Limitations to dynamical simulations
   B. Accuracy of dynamical trajectories
   C. Error cancellation in dynamical simulations
   D. Car-Parrinello dynamics; constraints
   E. Conjugate-gradients dynamics
   F. Comparison of Car-Parrinello and conjugate-gradients dynamics
   G. Phonon frequencies
IX. Nonlocal Pseudopotentials
   A. Kleinman-Bylander potentials
      1. Enhanced projections
      2. Computational cost
   B. Vanderbilt potentials
   C. Real-space projection; nonuniqueness of spherical harmonic projection
   D. Potentials for real-space projection
X. Parallel Computers
XI. Concluding Remarks
Acknowledgments
References

I. INTRODUCTION

There is little doubt that most of low-energy physics, chemistry, and biology can be explained by the quantum mechanics of electrons and ions. The limits of applicability and even the interpretation of the predictions of modern quantum theory are lively areas of debate amongst philosophers. Questions such as "How do we interpret the probabilistic nature of wave functions?," "What constitutes a measurement?," "How much can we ever know about the state of a system?," and "Can quantum mechanics describe consciousness?" are of fundamental importance. Despite the fact that these questions

are still debated, it is clear that whether or not a more complete description of the world is possible, those things that modern quantum theory does predict are predicted with incredible accuracy. One outstanding example of this accuracy is the calculation of the gyromagnetic ratio of the electron, which agrees with the experimental result to the limit of the measurement, some 10 significant figures. Quantum theory has also proven correct and provided fundamental understanding for a wide variety of phenomena, including the energy levels of atoms, the covalent bond, and the distinction between metals and insulators. Further, in every instance of its application to date, the equations of quantum mechanics have yet to be shown to fail. There is, therefore, every reason to believe that an understanding of a great number of phenomena can be achieved by continuing to solve these equations.

As we shall soon see, the ability of quantum mechanics to predict the total energy of a system of electrons and nuclei enables one to reap a tremendous benefit from a quantum-mechanical calculation. In fact this entire article is dedicated to just this one type of quantum-mechanical calculation, the foundation for which is quite strong. The quantum-mechanical rules, or Hamiltonians, for calculating the total energy of simple one-atom systems have provided some of the most precise tests of the theory, and the rules for calculating the energies of more complicated systems are simple, straightforward extensions of these atomic Hamiltonians. It is therefore eminently reasonable to expect quantum mechanics to predict accurately the total energies of aggregates of atoms as well. So far, this expectation has been confirmed time and time again by experiment.

A few moments' thought shows that nearly all physical properties are related to total energies or to differences between total energies. For instance, the equilibrium lattice constant of a crystal is the lattice constant that minimizes the total energy; surfaces and defects of solids adopt the structures that minimize their corresponding total energies. If total energies can be calculated, any physical property that can be related to a total energy or to a difference between total energies can be determined computationally. For example, to predict the equilibrium lattice constant of a crystal, a series of total-energy calculations is performed to determine the total energy as a function of the lattice constant. As shown in Fig. 1, the results are then plotted on a graph of energy versus lattice constant, and a smooth curve is constructed through the points. The theoretical value for the equilibrium lattice constant is the value of the lattice constant at the minimum of this curve. Total-energy techniques have also been successfully used to predict with accuracy equilibrium lattice constants, bulk moduli, phonons, piezoelectric constants, and phase-transition pressures and temperatures (for reviews see Cohen, 1984; Joannopoulos, 1985; Pickett, 1989).
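As an illustration of this procedure, the sketch below fits a smooth curve through a set of energy-versus-lattice-constant points and reads off the minimum. The data are invented for illustration, and a simple polynomial stands in for the fitted curve; it is a schematic of the fitting step, not of the total-energy calculations themselves.

```python
import numpy as np

def equilibrium_lattice_constant(a_values, energies, degree=2):
    """Fit a smooth polynomial through (a, E) points and return the
    lattice constant at the minimum of the fitted curve."""
    coeffs = np.polyfit(a_values, energies, degree)
    poly = np.poly1d(coeffs)
    # Locate the minimum of the fitted curve on a fine grid spanning the data.
    a_fine = np.linspace(min(a_values), max(a_values), 10001)
    return a_fine[np.argmin(poly(a_fine))]

# Hypothetical total energies (eV) at trial lattice constants (angstroms),
# generated here from a parabola whose minimum sits at a0 = 5.43.
a0 = 5.43
a_trial = np.array([5.2, 5.3, 5.4, 5.5, 5.6])
e_trial = 0.8 * (a_trial - a0) ** 2 - 107.0

a_eq = equilibrium_lattice_constant(a_trial, e_trial)
print(round(a_eq, 2))  # 5.43
```

In practice one would use more trial points and a physically motivated equation of state rather than a bare polynomial, but the fit-and-minimize structure is the same.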

Part of our aim in this article is to introduce the usefulness of quantum total-energy techniques to a broad


FIG. 1. Theoretical determination of an equilibrium lattice constant. Calculations (open circles) at various possible lattice constants are performed, and a smooth function is fitted through the points. The predicted lattice constant is determined by the minimum in the curve.

range of scientists, including, for example, chemists, biologists, and geophysicists, who can at last benefit from these techniques. It is often suggested that quantum mechanics was primarily developed to describe events on the atomic scale, raising the question, "Of what use are quantum-mechanical calculations in a science not concerned directly with events on the atomic scale?" Since our world is composed of and defined by the interactions of atoms and molecules, a detailed and fundamental understanding of the world must ultimately rest on comprehending these interactions. The good news that this article brings is that the search for this kind of understanding is no longer a mere idle philosophical musing but rather a practical methodology.

There are many examples of connections between atomic and macroscopic levels being made every day. The "lock and key" mechanism in biological systems has led to the development of "designer drugs" whose shapes correspond to the "key" in the relevant biological reaction. In turn the shapes of the drugs are known and understood only because of the quantum-mechanical description of covalent bonds. Although no drug has yet been designed by determining the shape of the molecule by solving the Schrodinger equation, this particular application of quantum mechanics is not far away. There are also examples of the application of quantum mechanics beyond the atomic scale in materials science, for instance in the onset of failure in a material. The failure of a material starts on the atomic scale when one bond is stressed beyond its yield stress and breaks. Thus it is obvious that where and when a material actually starts to fail is determined by quantum mechanics. Though there are many examples in materials science of significant progress having been made without any need for quantum-mechanical modeling, this progress is often limited as one pushes forward and encounters the atomic world. For instance, the understanding of the properties and behavior of dislocations comes from classical elasticity theory, but even in these cases very little is known about the core of a dislocation, precisely because this part of the dislocation requires detailed quantum-mechanical modeling. Moreover, even classical elasticity theory, which can only be applied on a macroscopic scale, is directly related to the atomic world through the elastic constants, the parameters of elasticity theory determined by the quantum-mechanical behavior of the material. Though materials constants such as the elastic constants may often be simply measured in the laboratory, the geophysicist modeling continental drift does not have the luxury of performing experiments to bypass quantum-mechanical calculations. It is not possible, at present, to generate geophysical pressures in the laboratory. Therefore the relevant high-pressure parts of the phase diagrams of the materials that constitute the earth are unknown. Geophysicists will greatly benefit from quantum-mechanical modeling, which can provide them with the parameters needed to pursue their research.

When should a scientist consider quantum-mechanical modeling? The examples given above suggest that quantum-mechanical modeling be considered in situations where experiment is impossible. This principle is not limited, as in the geophysics example, to situations that are completely inaccessible to experiment, but also includes the performance of "computational experiments," which afford far greater "experimental" control than their physical counterparts. For instance, one can "reach into" a theoretical chemical calculation and, at will, bend bonds at experimentally unstable and inaccessible angles to gain insight into the processes controlling chemical reactions. Or, one can study the properties of isolated defects in a material in which segregation of impurities towards those defects tends to cloud the experimental results.

Another relevant consideration is what can be calculated quantum mechanically and at what cost. The boundary of feasible quantum-mechanical calculations has shifted significantly, to the extent that it may now be more cost effective to employ quantum-mechanical modeling even when experiments do offer an alternative. Moreover, many fields of science, not just physics and basic materials science, may benefit. It is true that the original modeling of the covalent bond was not quantitatively accurate, and it did not give the correct value for the energy associated with the bond. However, chemists solving the Schrodinger equation nowadays accurately calculate the energy of covalent bonds as well as their equilibrium lengths, force constants, and polarizabilities. Physicists have developed many methods that can be used to calculate a wide range of physical properties of materials, such as lattice constants and elastic constants. These methods, which require only a specification of the ions present (by their atomic number), are usually referred to as ab initio methods.

Many of the ab initio methods that physicists and chemists use have existed for more than a decade, and it is not the purpose of this article to describe or compare the wide range of methods that exist. All of the ab initio methods have been continuously refined over recent years


and all have benefited from the availability of increasingly powerful computers. A decade ago most ab initio methods were capable only of modeling systems of a few atoms, and hence applicability to real-world systems at that point was extremely limited. Most methods can now model systems that contain some tens of atoms and are used to study a small but significant range of interesting problems. Of all the methods, one, the total-energy pseudopotential method, stands alone. A decade ago this method was also capable of modeling only few-atom systems. Now, however, this method can model thousand-atom systems, and it is already clear that this number will increase by at least a factor of 10 within the next five

years. Pushing back the limits of quantum-mechanical modeling to this extent, the total-energy pseudopotential method opens up a wide range of interesting problems to quantum-mechanical calculation, and the future should bring the application of quantum mechanics to many new fields of science. The increase in the number of atoms that can be handled is directly due to an increase in computational efficiency of the ab initio pseudopotential method, which also means that there is an increasing class of problems for which it is more cost effective to use quantum-mechanical modeling than experiment to determine the physical parameters of systems. One purpose of this article is to heighten awareness amongst scientists in a range of scientific disciplines that the world is quantum mechanical and that there now exists an ab initio method that allows the quantum mechanics to be solved and incorporated into everyday science.

There is an economy of scale to ab initio total-energy calculations because so many physical properties are related to total energies. While just one piece of theoretical "apparatus" is needed to calculate all the physical properties that are related to total energies, completely different sets of experimental apparatus are required to measure each class of physical property of a material. This represents an enormous advantage of quantum-mechanical modeling over experimental measurements. Comparing the decreasing cost of computers with the cost of a large number of different pieces of experimental apparatus needed to carry out the same functions, one sees that the cost effectiveness of quantum-mechanical modeling methods over physical experimentation will continue to increase with time. This, then, is the time for researchers in a wide range of scientific disciplines to consider very seriously whether quantum-mechanical modeling may be applied in a cost-effective way to their own field of research, even if the field of research is far removed from what is usually assumed to be the quantum-mechanical world.

Total-energy pseudopotential calculations do require significant amounts of computer time, even for systems containing a few atoms in the unit cell, and the computational time required to perform the calculations always increases with the number of atoms in the unit cell. Thus, for large systems containing hundreds of atoms in the unit cell, it is essential to use the most efficient numerical algorithms. In the following pages we shall discuss in detail the latest methods for doing this. Among the methods we shall discuss, two have revolutionized the field of ab initio total-energy calculation, each increasing the number of atoms in a calculable system by more than an order of magnitude over previously existing techniques. As we shall discuss in detail, the molecular-dynamics method developed by Car and Parrinello (1985) transformed the way in which we viewed quantum-mechanical calculations and hence total-energy pseudopotential calculations; instead of finding a coupled self-consistent solution to a discretized partial differential equation through matrix techniques, Car and Parrinello minimized a single function through simulated annealing. The Car-Parrinello method can be used to perform calculations for systems containing on the order of one hundred atoms in the unit cell. However, severe difficulties are encountered in certain cases when one attempts to use this method to perform calculations on much larger systems. Recently, conjugate-gradients methods have been developed (Teter et al., 1989; Gillan, 1989) that overcome the difficulties encountered with the molecular-dynamics technique. Conjugate-gradients methods have again transformed total-energy pseudopotential calculations by replacing simulated-annealing minimization with a direct, completely self-consistent second-order search for the minimum. Using these methods, one can perform calculations for systems containing many hundreds, and soon thousands, of atoms.
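The conjugate-gradients idea invoked here can be illustrated on a toy problem. The sketch below applies the standard conjugate-gradients method to a simple quadratic form E(x) = (1/2) x^T A x - b^T x; it is a generic illustration of the technique, not the band-by-band Kohn-Sham minimization described later in the article, and the test matrix is invented.

```python
import numpy as np

def conjugate_gradient_minimize(A, b, x0, tol=1e-10, max_iter=200):
    """Minimize E(x) = 0.5 x^T A x - b^T x for symmetric positive-definite A
    by the conjugate-gradients method: each new search direction is made
    conjugate to the previous ones, so in exact arithmetic the minimum is
    reached in at most dim(A) steps."""
    x = x0.copy()
    r = b - A @ x          # negative gradient (steepest-descent direction)
    d = r.copy()           # first search direction
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)        # exact line minimization along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)  # makes the next direction conjugate
        d = r_new + beta * d
        r = r_new
    return x

# Small test problem with a known minimum at x = A^{-1} b.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)   # symmetric positive definite
b = rng.standard_normal(5)
x_min = conjugate_gradient_minimize(A, b, np.zeros(5))
print(np.allclose(A @ x_min, b, atol=1e-8))  # True
```

The contrast with steepest descents, which reuses the gradient direction and can zigzag slowly in narrow valleys, is what makes the method attractive for the very ill-conditioned energy functionals of large plane-wave calculations.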

The molecular-dynamics and conjugate-gradients methods allow pseudopotential calculations to be performed for much larger systems than was possible using conventional matrix diagonalization methods. They also allow, for the first time, tractable ab initio quantum-mechanical simulations to be performed for systems at nonzero temperatures. While these capabilities offer the obvious advantage of permitting more complex systems to be studied, there is yet another benefit to be gained by using these new computational methods. By increasing the efficiency of the total-energy pseudopotential technique, they have greatly extended the range of application of this technique by allowing, for the first time, the inclusion of noble and transition-metal atoms and first-row elements such as oxygen in large pseudopotential calculations. Until recently it was widely believed that computations including such elements would be completely intractable with pseudopotentials in a plane-wave representation. Recent work (Allan and Teter, 1987; Bar-Yam et al., 1989; Rappe et al., 1990; Vanderbilt, 1990; Troullier and Martins, 1991) has shown that pseudopotential calculations can be performed for systems containing these atoms by employing a substantial but manageable number of plane waves in the basis set. With their increased efficiency, molecular-dynamics and conjugate-gradients methods can handle very large plane-wave basis sets and therefore permit large total-energy pseudopotential calculations to be performed with these new pseudopotentials, thus opening the way for


study of a larger variety of chemical systems than waspreviously possible.

This article provides a detailed description of the total-energy pseudopotential method, the molecular-dynamics method, and conjugate-gradient minimization. The article also discusses related techniques and developments in a number of areas that have played a role in increasing the computational efficiency of these methods. It is hoped that the information presented here is sufficiently detailed and at the leading edge of the work being done to be useful to scientists working both in and outside the field of ab initio quantum-mechanical calculations.

II. TOTAL-ENERGY PSEUDOPOTENTIAL CALCULATIONS

This section describes the total-energy pseudopotential method. An extremely useful review of the pseudopotential method can be found in articles by Ihm et al. (1979) and Denteneer and van Haeringen (1985). These articles are essential reading for anyone intending to implement codes for total-energy pseudopotential calculations. Total-energy calculations can only be performed if a large number of simplifications and approximations are used. These simplifications and approximations are described in the following sections.

A. Overview of approximations

Prediction of the electronic and geometric structure of a solid requires calculation of the quantum-mechanical total energy of the system and subsequent minimization of that energy with respect to the electronic and nuclear coordinates. Because of the large difference in mass between the electrons and nuclei and the fact that the forces on the particles are the same, the electrons respond essentially instantaneously to the motion of the nuclei. Thus the nuclei can be treated adiabatically, leading to a separation of electronic and nuclear coordinates in the many-body wave function, the so-called Born-Oppenheimer approximation. This "adiabatic principle" reduces the many-body problem to the solution of the dynamics of the electrons in some frozen-in configuration of the nuclei.

Even with this simplification, the many-body problem remains formidable. Further simplifications, however, can be introduced that allow total-energy calculations to be performed accurately and efficiently. These include density-functional theory to model the electron-electron interactions, pseudopotential theory to model the electron-ion interactions, supercells to model systems with aperiodic geometries, and iterative minimization techniques to relax the electronic coordinates.

Very briefly, the essential concepts are the following:

(i) Density-functional theory (Hohenberg and Kohn, 1964; Kohn and Sham, 1965) allows one, in principle, to map exactly the problem of a strongly interacting electron gas (in the presence of nuclei) onto that of a single particle moving in an effective nonlocal potential. Although this potential is not known precisely, local approximations to it work remarkably well. At present, we have no a priori arguments to explain why these approximations work. Density-functional theory was revitalized in recent years only because theorists performed total-energy calculations using these potentials and showed that they reproduced a variety of ground-state properties within a few percent of experiment. Thus the acceptance of local approximations to density-functional theory has only emerged, a posteriori, after many successful investigations of many types of materials and systems. Generally, total-energy differences between related structures can be believed to within a few percent and structural parameters to at least within a tenth of an Å. Cohesive energies, however, can be in error by more than 10%.

(ii) Pseudopotential theory (Phillips, 1958; Heine and Cohen, 1970) allows one to replace the strong electron-ion potential with a much weaker potential, a pseudopotential, that describes all the salient features of a valence electron moving through the solid, including relativistic effects. Thus the original solid is now replaced by pseudo valence electrons and pseudo-ion cores. These pseudoelectrons experience exactly the same potential outside the core region as the original electrons but have a much weaker potential inside the core region. The fact that the potential is weaker is crucial, however, because it makes the solution of the Schrodinger equation much simpler by allowing expansion of the wave functions in a relatively small set of plane waves. Use of plane waves as basis functions makes the accurate and systematic study of complex, low-symmetry configurations of atoms much more tractable.
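The payoff of the weaker potential, namely that smooth pseudo wave functions need only a small set of plane waves, can be illustrated with a one-dimensional toy. The sketch below is not from the article; it simply expands a smooth periodic function in plane waves via the discrete Fourier transform and shows the truncation error falling rapidly as the cutoff is raised.

```python
import numpy as np

# A smooth, periodic model "pseudo wave function" on [0, L). Smoother
# functions need fewer plane waves, which is the payoff of replacing the
# strong ionic potential by a weak pseudopotential. (Illustrative only.)
L = 1.0
N = 256
x = np.linspace(0.0, L, N, endpoint=False)
psi = np.exp(np.cos(2.0 * np.pi * x / L))   # smooth and periodic

# Expand in plane waves exp(iGx): the coefficients come from the discrete
# Fourier transform; G runs over integer multiples of 2*pi/L.
c = np.fft.fft(psi) / N
G_index = np.fft.fftfreq(N, d=L / N) * L    # integer index n, with G = 2*pi*n/L

def reconstruct(n_max):
    """Rebuild psi keeping only plane waves with |G| <= n_max * 2*pi/L."""
    c_cut = np.where(np.abs(G_index) <= n_max, c, 0.0)
    return np.real(np.fft.ifft(c_cut) * N)

# The truncation error falls rapidly with the cutoff for a smooth function.
err8 = np.max(np.abs(reconstruct(8) - psi))
err16 = np.max(np.abs(reconstruct(16) - psi))
print(err8 > err16, err16 < 1e-10)  # True True
```

A strongly peaked all-electron wave function near the core would instead require a very large cutoff, which is precisely what the pseudopotential construction avoids.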

(iii) The supercell approximation allows one to deal with aperiodic configurations of atoms within the framework of Bloch's theorem (see Ashcroft and Mermin, 1976). One simply constructs a large unit cell containing the configuration in question and repeats it periodically throughout space. By studying the properties of the system for larger and larger unit cells, one can gauge the importance of the induced periodicity and systematically filter it out. This approach has been successfully tested against "exact" Koster-Slater Green's-function methods (see Baraff and Schluter, 1979), which are only tractable for very-high-symmetry configurations.

(iv) Finally, new iterative diagonalization approaches (Car and Parrinello, 1985; Payne et al., 1986; Williams and Soler, 1987; Gillan, 1989; Stich et al., 1989; Teter et al., 1989) can be used to minimize the total-energy functional. These are much more efficient than the traditional diagonalization methods. These new methods allow expedient calculation of ionic forces and total energies and significantly raise the level of modern total-energy calculations. These methods are the subject of Secs. III, IV, and V.

Rev. Mod. Phys., Vol. 64, No. 4, October 1992


1050 M. C. Payne et al.: Ab initio iterative minimization techniques

B. Electron-electron interactions

The most difficult problem in any electronic structure calculation is posed by the need to take account of the effects of the electron-electron interaction. Electrons repel each other due to the Coulomb interaction between their charges. The Coulomb energy of a system of electrons can be reduced by keeping the electrons spatially separated, but this has to be balanced against the kinetic-energy cost of deforming the electronic wave functions in order to separate the electrons. The effects of the electron-electron interaction are briefly described below.

1. Exchange and correlation

The wave function of a many-electron system must be antisymmetric under exchange of any two electrons because the electrons are fermions. The antisymmetry of the wave function produces a spatial separation between electrons that have the same spin and thus reduces the Coulomb energy of the electronic system. The reduction in the energy of the electronic system due to the antisymmetry of the wave function is called the exchange energy. It is straightforward to include exchange in a total-energy calculation, and this is generally referred to as the Hartree-Fock approximation.

The Coulomb energy of the electronic system can be reduced below its Hartree-Fock value if electrons that have opposite spins are also spatially separated. In this case the Coulomb energy of the electronic system is reduced at the cost of increasing the kinetic energy of the electrons. The difference between the many-body energy of an electronic system and the energy of the system calculated in the Hartree-Fock approximation is called the correlation energy (see Fetter and Walecka, 1971). It is

extremely difficult to calculate the correlation energy of a complex system, although some promising steps are being taken in this direction using quantum Monte Carlo simulations of the electron-gas dynamics (Fahy et al., 1988; Li et al., 1991). At present these methods are not tractable in total-energy calculations of systems with any degree of complexity, and alternative methods are required to describe the effects of the electron-electron interaction.

2. Density-functional theory

Density-functional theory, developed by Hohenberg and Kohn (1964) and Kohn and Sham (1965), provided some hope of a simple method for describing the effects of exchange and correlation in an electron gas. Hohenberg and Kohn proved that the total energy, including exchange and correlation, of an electron gas (even in the presence of a static external potential) is a unique functional of the electron density. The minimum value of the total-energy functional is the ground-state energy of the system, and the density that yields this minimum value is the exact single-particle ground-state density. Kohn and Sham then showed how it is possible, formally, to replace the many-electron problem by an exactly equivalent set of self-consistent one-electron equations. For more details about density-functional theory see von Barth (1984), Dreizler and da Providencia (1985), Jones and Gunnarsson (1989), and Kryachko and Ludena (1990).

a. The Kohn-Sham energy functional

The Kohn-Sham total-energy functional for a set of doubly occupied electronic states ψ_i can be written

E[{ψ_i}] = 2 Σ_i ∫ ψ_i [−(ℏ²/2m)] ∇² ψ_i d³r + ∫ V_ion(r) n(r) d³r + (e²/2) ∫∫ [n(r) n(r′) / |r − r′|] d³r d³r′ + E_XC[n(r)] + E_ion({R_I}) ,   (2.1)

where E_ion is the Coulomb energy associated with interactions among the nuclei (or ions) at positions {R_I}, V_ion is the static total electron-ion potential, and n(r) is the electronic density given by

n(r) = 2 Σ_i |ψ_i(r)|² ,   (2.2)

and E_XC[n(r)] is the exchange-correlation functional. Only the minimum value of the Kohn-Sham energy functional has physical meaning. At the minimum, the Kohn-Sham energy functional is equal to the ground-state energy of the system of electrons with the ions in positions {R_I}.
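As a concrete illustration of Eq. (2.2), the charge density of a set of doubly occupied orbitals can be accumulated on a real-space grid and checked against the electron count. The sketch below is purely illustrative (the one-dimensional grid, the Gaussian orbital shapes, and all numerical values are assumptions, not taken from the text):

```python
import math

# Hypothetical 1D grid (illustrative only).
N = 2000
L = 20.0
dx = L / N
xs = [-L / 2 + i * dx for i in range(N)]

def gaussian(x, x0, sigma):
    # Normalized so that the integral of |psi|^2 over x equals 1.
    norm = (1.0 / (math.pi * sigma ** 2)) ** 0.25
    return norm * math.exp(-((x - x0) ** 2) / (2 * sigma ** 2))

# Two toy "Kohn-Sham orbitals" on the grid.
orbitals = [
    [gaussian(x, -1.0, 0.8) for x in xs],
    [gaussian(x, +1.0, 1.1) for x in xs],
]

# Eq. (2.2): n(r) = 2 * sum_i |psi_i(r)|^2 (factor 2 for double occupancy).
density = [2.0 * sum(psi[j] ** 2 for psi in orbitals) for j in range(N)]

n_electrons = sum(density) * dx
print(n_electrons)  # two doubly occupied orbitals integrate to 4 electrons
```

The factor of 2 in front of the sum is the double occupancy of each state, so the integrated density counts two electrons per orbital.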

b. Kohn-Sham equations

It is necessary to determine the set of wave functions ψ_i that minimize the Kohn-Sham energy functional. These are given by the self-consistent solutions to the Kohn-Sham equations (Kohn and Sham, 1965):

[ −(ℏ²/2m) ∇² + V_ion(r) + V_H(r) + V_XC(r) ] ψ_i(r) = ε_i ψ_i(r) ,   (2.3)

where ψ_i is the wave function of electronic state i, ε_i is the Kohn-Sham eigenvalue, and V_H is the Hartree potential of the electrons given by


V_H(r) = e² ∫ [n(r′) / |r − r′|] d³r′ .   (2.4)

The exchange-correlation potential, V_XC, is given formally by the functional derivative

V_XC(r) = δE_XC[n(r)] / δn(r) .   (2.5)

The Kohn-Sham equations represent a mapping of the interacting many-electron system onto a system of noninteracting electrons moving in an effective potential due to all the other electrons. If the exchange-correlation energy functional were known exactly, then taking the functional derivative with respect to the density would produce an exchange-correlation potential that included the effects of exchange and correlation exactly.

The Kohn-Sham equations must be solved self-consistently so that the occupied electronic states generate a charge density that produces the electronic potential that was used to construct the equations. The sum of the single-particle Kohn-Sham eigenvalues does not give the total electronic energy because this overcounts the effects of the electron-electron interaction in the Hartree energy and in the exchange-correlation energy. The Kohn-Sham eigenvalues are not, strictly speaking, the energies of the single-particle electron states, but rather the derivatives of the total energy with respect to the occupation numbers of these states (Janak, 1978). Nevertheless, the highest occupied eigenvalue in an atomic or molecular calculation is nearly the unrelaxed ionization energy for that system (Perdew et al., 1982).

The Kohn-Sham equations are a set of eigenequations, and the terms within the brackets in Eq. (2.3) can be regarded as a Hamiltonian. The bulk of the work involved in a total-energy pseudopotential calculation is the solution of this eigenvalue problem once an approximate expression for the exchange-correlation energy is given.

c. Local-density approximation

The Hohenberg-Kohn theorem provides some motivation for using approximate methods to describe the exchange-correlation energy as a function of the electron density. The simplest method of describing the exchange-correlation energy of an electronic system is to use the local-density approximation (LDA; Kohn and Sham, 1965), and this approximation is almost universally used in total-energy pseudopotential calculations. In the local-density approximation the exchange-correlation energy of an electronic system is constructed by assuming that the exchange-correlation energy per electron at a point r in the electron gas, ε_XC(r), is equal to the exchange-correlation energy per electron in a homogeneous electron gas that has the same density as the electron gas at point r. Thus

E_XC[n(r)] = ∫ ε_XC(r) n(r) d³r   (2.6a)

and

δE_XC[n(r)] / δn(r) = ∂[n(r) ε_XC(r)] / ∂n(r) ,   (2.6b)

with

ε_XC(r) = ε_XC^hom[n(r)] .   (2.6c)

The local-density approximation assumes that the exchange-correlation energy functional is purely local. Several parametrizations exist for the exchange-correlation energy of a homogeneous electron gas (Wigner, 1938; Kohn and Sham, 1965; Hedin and Lundqvist, 1971; Vosko et al., 1980; Perdew and Zunger, 1981), all of which lead to total-energy results that are very similar. These parametrizations use interpolation formulas to link exact results for the exchange-correlation energy of high-density electron gases and calculations of the exchange-correlation energy of intermediate and low-density electron gases.

The local-density approximation, in principle, ignores corrections to the exchange-correlation energy at a point r due to nearby inhomogeneities in the electron density. Considering the inexact nature of the approximation, it is remarkable that calculations performed using the LDA have been so successful. Recent work has shown that this success can be partially attributed to the fact that the local-density approximation gives the correct sum rule for the exchange-correlation hole (Harris and Jones, 1974; Gunnarsson and Lundqvist, 1976; Langreth and Perdew, 1977). A number of attempts to improve the LDA, for instance by using gradient expansions, have not shown any improvement over results obtained using the simple LDA. One of the reasons why these "improvements" to the LDA do so poorly is that they do not obey the sum rule for the exchange-correlation hole. Methods that do enforce the sum rule appear to offer a consistent improvement over the LDA (Langreth and Mehl, 1981, 1983).
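The local recipe of Eqs. (2.6a)-(2.6c) is easy to state concretely for the exchange part, whose homogeneous-electron-gas value is known in closed form: ε_x[n] = −(3/4)(3n/π)^(1/3) in Hartree atomic units. The sketch below evaluates E_x pointwise from the local density alone; the grid spacing and density profile are illustrative assumptions, and a real parametrization would add the correlation term (e.g., that of Perdew and Zunger, 1981):

```python
import math

def eps_x(n):
    # Exchange energy per electron of the homogeneous electron gas
    # at density n, in Hartree atomic units (correlation omitted).
    return -0.75 * (3.0 * n / math.pi) ** (1.0 / 3.0)

# Hypothetical density samples on a uniform grid of volume elements dv.
dv = 0.1
densities = [0.5, 1.0, 2.0, 1.0, 0.5]

# Eq. (2.6a) in the LDA: E_x = sum over grid points of eps_x(n) * n * dv,
# i.e., each point contributes as if it were a homogeneous gas of density n.
E_x = sum(eps_x(n) * n * dv for n in densities)
print(E_x)
```

The essential point is that ε_x at each grid point depends only on the density at that same point, which is precisely the locality assumption of Eq. (2.6c).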

The LDA appears to give a single well-defined global minimum for the energy of a non-spin-polarized system of electrons in a fixed ionic potential. Therefore any energy minimization scheme will locate the global energy minimum of the electronic system. For magnetic materials, however, one would expect to have more than one local minimum in the electronic energy. If the energy functional for the electronic system had many local minima, it would be extremely costly to perform total-energy calculations because the global energy minimum could only be located by sampling the energy functional over a large region of phase space.
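The self-consistency requirement discussed above (the occupied states must generate the same potential that was used to compute them) can be seen in miniature with a schematic two-site model. Everything here is a hypothetical stand-in, not the real Kohn-Sham problem: a 2×2 Hamiltonian whose on-site potential depends on the density through a mean-field term U·n. One doubly occupied orbital is found, the density is rebuilt from it, and the cycle repeats with linear mixing until input and output densities agree:

```python
import math

def lowest_state(h00, h01, h11):
    # Lowest eigenpair of the symmetric 2x2 matrix [[h00, h01], [h01, h11]].
    mean = 0.5 * (h00 + h11)
    rad = math.hypot(0.5 * (h00 - h11), h01)
    e = mean - rad
    # Eigenvector (h01, e - h00), normalized (h01 is nonzero here).
    v0, v1 = h01, e - h00
    norm = math.hypot(v0, v1)
    return e, (v0 / norm, v1 / norm)

v_ext = (-0.2, 0.2)   # hypothetical external potentials on the two sites
t, U = 1.0, 0.5       # hopping and mean-field interaction strength
n = [1.0, 1.0]        # initial guess: one electron pair shared equally

for _ in range(200):
    # The effective potential depends on the density it must reproduce.
    e, (c0, c1) = lowest_state(v_ext[0] + U * n[0], -t, v_ext[1] + U * n[1])
    n_out = [2.0 * c0 * c0, 2.0 * c1 * c1]   # one doubly occupied orbital
    if abs(n_out[0] - n[0]) < 1e-10:
        break
    mix = 0.5  # linear mixing stabilizes the fixed-point iteration
    n = [mix * n_out[i] + (1.0 - mix) * n[i] for i in range(2)]

print(n)  # more charge accumulates on the site with the lower potential
```

At convergence the density is a fixed point of the map, which is the two-site analog of a self-consistent Kohn-Sham solution; the mixing step is the simplest of the density-mixing schemes used in practice.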

C. Periodic supercells

In the preceding section it was demonstrated that certain observables of the many-body problem can be mapped into equivalent observables in an effective single-particle problem. However, there still remains the formidable task of handling an infinite number of noninteracting electrons moving in the static potential of an infinite number of nuclei or ions. Two difficulties must be overcome: a wave function must be calculated for each of the infinite number of electrons in the system, and, since each electronic wave function extends over the entire solid, the basis set required to expand each wave function is infinite. Both problems can be surmounted by performing calculations on periodic systems and applying Bloch's theorem to the electronic wave functions.

1. Bloch's theorem

Bloch's theorem states that in a periodic solid each electronic wave function can be written as the product of a cell-periodic part and a wavelike part (see Ashcroft and Mermin, 1976),

ψ_i(r) = exp[i k·r] f_i(r) .   (2.7)

The cell-periodic part of the wave function can be expanded using a basis set consisting of a discrete set of plane waves whose wave vectors are reciprocal lattice vectors of the crystal,

f_i(r) = Σ_G c_{i,G} exp[i G·r] ,   (2.8)

where the reciprocal lattice vectors G are defined by G·l = 2πm for all l, where l is a lattice vector of the crystal and m is an integer. Therefore each electronic wave function can be written as a sum of plane waves,

ψ_i(r) = Σ_G c_{i,k+G} exp[i (k+G)·r] .   (2.9)

2. k-point sampling

Electronic states are allowed only at a set of k points determined by the boundary conditions that apply to the bulk solid. The density of allowed k points is proportional to the volume of the solid. The infinite number of electrons in the solid are accounted for by an infinite number of k points, and only a finite number of electronic states are occupied at each k point. The Bloch theorem changes the problem of calculating an infinite number of electronic wave functions to one of calculating a finite number of electronic wave functions at an infinite number of k points. The occupied states at each k point contribute to the electronic potential in the bulk solid so that, in principle, an infinite number of calculations are needed to compute this potential. However, the electronic wave functions at k points that are very close together will be almost identical. Hence it is possible to represent the electronic wave functions over a region of k space by the wave functions at a single k point. In this case the electronic states at only a finite number of k points are required to calculate the electronic potential and hence determine the total energy of the solid.

Methods have been devised for obtaining very accurate approximations to the electronic potential and the contribution to the total energy from a filled electronic band by calculating the electronic states at special sets of k points in the Brillouin zone (Chadi and Cohen, 1973; Joannopoulos and Cohen, 1973; Monkhorst and Pack, 1976; Evarestov and Smirnov, 1983). Using these methods, one can obtain an accurate approximation for the electronic potential and the total energy of an insulator or a semiconductor by calculating the electronic states at a very small number of k points. The electronic potential and total energy are more difficult to calculate if the system is metallic because a dense set of k points is required to define the Fermi surface precisely.

The magnitude of any error in the total energy due to inadequacy of the k-point sampling can always be reduced by using a denser set of k points. The computed total energy will converge as the density of k points increases, and the error due to the k-point sampling then approaches zero. In principle, a converged electronic potential and total energy can always be obtained provided that the computational time is available to calculate the electronic wave functions at a sufficiently dense set of k points. The computational cost of performing a very dense sampling of k space can be significantly reduced by using the k·p total-energy method (Robertson and Payne, 1990, 1991). In this technique solutions on the dense set of k points are generated from the solutions on a much coarser grid of k points using k·p perturbation theory.

3. Plane-wave basis sets

Bloch's theorem states that the electronic wave functions at each k point can be expanded in terms of a discrete plane-wave basis set. In principle, an infinite plane-wave basis set is required to expand the electronic wave functions. However, the coefficients c_{i,k+G} for the plane waves with small kinetic energy (ℏ²/2m)|k+G|² are typically more important than those with large kinetic energy. Thus the plane-wave basis set can be truncated to include only plane waves that have kinetic energies less than some particular cutoff energy. If a continuum of plane-wave basis states were required to expand each electronic wave function, the basis set would be infinitely large no matter how small the cutoff energy. Application of the Bloch theorem allows the electronic wave functions to be expanded in terms of a discrete set of plane waves. Introduction of an energy cutoff to the discrete plane-wave basis set produces a finite basis set.

The truncation of the plane-wave basis set at a finite cutoff energy will lead to an error in the computed total energy. However, it is possible to reduce the magnitude of the error by increasing the value of the cutoff energy. In principle, the cutoff energy should be increased until the calculated total energy has converged, but it will be shown later that it is possible to perform calculations at lower cutoff energies.

One of the difficulties associated with the use of plane-wave basis sets is that the number of basis states changes discontinuously with cutoff energy. In general these discontinuities will occur at different cutoffs for different k points in the k-point set. (In addition, at a fixed cutoff energy, a change in the size or shape of the unit cell will cause discontinuities in the plane-wave basis set.) This problem can be reduced by using denser k-point sets, so that the weight attached to any particular plane-wave basis state is reduced. However, the problem is still present even with quite dense k-point samplings. It can be handled by applying a correction factor which accounts approximately for the difference between the number of states in a basis set with an infinitely large number of k points and the number of basis states actually used in the calculation (Francis and Payne, 1990).
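Both the growth of the basis with cutoff energy and its discontinuous jumps can be demonstrated by direct counting. The sketch below is an illustrative assumption throughout: a cubic cell, the Γ point, and units with ℏ²/2m = 1, so a plane wave is kept when |k+G|² falls below the cutoff:

```python
import math
from itertools import product

def basis_size(a, e_cut, k=(0.0, 0.0, 0.0)):
    # Count plane waves with (hbar^2/2m)|k+G|^2 < e_cut, taking hbar^2/2m = 1.
    b = 2.0 * math.pi / a          # reciprocal-lattice spacing of a cubic cell
    n_max = int(math.sqrt(e_cut) / b) + 1
    count = 0
    for n1, n2, n3 in product(range(-n_max, n_max + 1), repeat=3):
        g = (k[0] + n1 * b, k[1] + n2 * b, k[2] + n3 * b)
        if g[0] ** 2 + g[1] ** 2 + g[2] ** 2 < e_cut:
            count += 1
    return count

sizes = [basis_size(a=5.0, e_cut=ec) for ec in (2.0, 4.0, 6.0, 8.0)]
print(sizes)  # grows roughly like e_cut^(3/2), but only in discrete jumps
```

Because whole stars of reciprocal-lattice vectors enter the basis at once, the count jumps by 6, 8, 12, 24, ... states at a time rather than smoothly, which is exactly the discontinuity discussed above.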

4. Plane-wave representation of Kohn-Sham equations


FIG. 2. Schematic illustration of a supercell geometry for a point defect (i.e., vacancy) in a bulk solid. The supercell is the area enclosed by the dashed lines.

When plane waves are used as a basis set for the electronic wave functions, the Kohn-Sham equations assume a particularly simple form. Substitution of Eq. (2.9) into (2.3) and integration over r gives the secular equation

Σ_{G′} [ (ℏ²/2m)|k+G|² δ_{GG′} + V_ion(G−G′) + V_H(G−G′) + V_XC(G−G′) ] c_{i,k+G′} = ε_i c_{i,k+G} .   (2.10)

In this form, the kinetic energy is diagonal, and the various potentials are described in terms of their Fourier transforms. Solution of Eq. (2.10) proceeds by diagonalization of a Hamiltonian matrix whose matrix elements H_{k+G,k+G′} are given by the terms in the brackets above. The size of the matrix is determined by the choice of cutoff energy (ℏ²/2m)|k+G_c|², and will be intractably large for systems that contain both valence and core electrons. This is a severe problem, but it can be overcome by use of the pseudopotential approximation, as discussed in Sec. II.D.
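The structure of Eq. (2.10) can be made concrete by assembling the Hamiltonian matrix in one dimension: the kinetic term sits on the diagonal, and each potential enters only through its Fourier components V(G − G′). Everything in the sketch (the cell length, the single Fourier component chosen for the potential, and the units ℏ²/2m = 1) is an illustrative assumption:

```python
import math

a = 4.0                  # hypothetical 1D cell length
b = 2.0 * math.pi / a    # primitive reciprocal-lattice vector
n_max = 3                # include G = n*b for n in -n_max..n_max
k = 0.1 * b              # an arbitrary k point in the first zone

def v_total(delta_g):
    # Fourier component V(G - G') of the total effective potential
    # (ionic + Hartree + exchange-correlation); one illustrative
    # component at |G - G'| = b stands in for the real transforms.
    return -0.5 if abs(abs(delta_g) - b) < 1e-12 else 0.0

ns = list(range(-n_max, n_max + 1))
H = []
for n in ns:
    row = []
    for m in ns:
        kinetic = (k + n * b) ** 2 if n == m else 0.0  # (hbar^2/2m) = 1
        row.append(kinetic + v_total((n - m) * b))
    H.append(row)

# Diagonalizing H with any dense eigensolver would yield the eigenvalues
# eps_i and plane-wave coefficients c_{i,k+G} appearing in Eq. (2.10).
print(len(H), H[n_max][n_max])  # 7 basis states; diagonal entry at G=0 is k^2
```

Note how the matrix dimension is fixed entirely by the cutoff (here, by n_max), which is why a weak, smooth potential that converges with few plane waves is so valuable.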

5. Nonperiodic systems

The Bloch theorem can be applied neither to a system that contains a single defect nor in the direction perpendicular to a crystal surface. A continuous plane-wave basis set would be required for the defect calculation, and, although the plane-wave basis set for the surface calculation would be discrete in the plane of the surface, it would be continuous in the direction perpendicular to the surface. Hence an infinite number of plane-wave basis states would be required for both of these calculations, no matter how small the cutoff energy chosen for the basis set. Calculations using plane-wave basis sets can only be performed on these systems if a periodic supercell is used. The supercell for a point-defect calculation is illustrated schematically in Fig. 2. The supercell contains the defect surrounded by a region of bulk crystal. Periodic boundary conditions are applied to the supercell so that the supercell is reproduced throughout space, as implied in the figure. Therefore the energy per unit cell of a crystal containing an array of defects is calculated, rather than the energy of a crystal containing a single defect. It is essential to include enough bulk solid in the supercell to prevent the defects in neighboring cells from interacting appreciably with each other. The independence of defects in neighboring cells can be checked by increasing the volume of the supercell until the computed defect energy has converged. It can then be assumed that defects in neighboring unit cells no longer interact.

A surface may have periodicity in the plane of the surface, but it cannot have periodicity perpendicular to the surface. The supercell for a surface calculation is illustrated schematically in Fig. 3. The supercell contains a crystal slab and a vacuum region. The supercell is repeated over all space, so the total energy of an array of crystal slabs is calculated. To ensure that the results of the calculation accurately represent an isolated surface, the vacuum regions must be wide enough so that faces of adjacent crystal slabs do not interact across the vacuum region, and the crystal slab must be thick enough so that the two surfaces of each crystal slab do not interact through the bulk crystal.

FIG. 3. Schematic illustration of a supercell geometry for a surface of a bulk solid. Same convention as in Fig. 2.

Finally, even molecules can be studied in this fashion (Joannopoulos et al., 1991), as illustrated in Fig. 4. Again, the supercell needs to be large enough so that the interactions between the molecules are negligible.

FIG. 4. Schematic illustration of a supercell geometry for a molecule. Same convention as in Fig. 2.

D. Electron-ion interactions

1. Pseudopotential approximation

Although Bloch's theorem states that the electronic wave functions can be expanded using a discrete set of plane waves, a plane-wave basis set is usually very poorly suited to expanding electronic wave functions because a very large number of plane waves are needed to expand the tightly bound core orbitals and to follow the rapid oscillations of the wave functions of the valence electrons in the core region. An extremely large plane-wave basis set would be required to perform an all-electron calculation, and a vast amount of computational time would be required to calculate the electronic wave functions. The pseudopotential approximation (Phillips, 1958; Heine and Cohen, 1970; Yin and Cohen, 1982a) allows the electronic wave functions to be expanded using a much smaller number of plane-wave basis states.

It is well known that most physical properties of solids are dependent on the valence electrons to a much greater extent than on the core electrons. The pseudopotential approximation exploits this by removing the core electrons and by replacing them and the strong ionic potential by a weaker pseudopotential that acts on a set of pseudo wave functions rather than the true valence wave functions. An ionic potential, valence wave function, and the corresponding pseudopotential and pseudo wave function are illustrated schematically in Fig. 5. The valence wave functions oscillate rapidly in the region occupied by the core electrons due to the strong ionic potential in this region. These oscillations maintain the orthogonality between the core wave functions and the valence wave functions, which is required by the exclusion principle. The pseudopotential is constructed, ideally, so that its scattering properties or phase shifts for the pseudo wave functions are identical to the scattering properties of the ion and the core electrons for the valence wave functions, but in such a way that the pseudo wave functions have no radial nodes in the core region. In the core region, the total phase shift produced by the ion and the core electrons will be greater, by π for each node that the valence functions had in the core region, than the phase shift produced by the ion and the valence electrons. Outside the core region the two potentials are identical, and the scattering from the two potentials is indistinguishable. The phase shift produced by the ion core is different for each angular momentum component of the valence wave function, and so the scattering from the pseudopotential must be angular momentum dependent. The most general form for a pseudopotential is

V_NL = Σ_{lm} |lm⟩ V_l ⟨lm| ,   (2.11)

where |lm⟩ are the spherical harmonics and V_l is the pseudopotential for angular momentum l. Acting on the electronic wave function with this operator decomposes the wave function into spherical harmonics, each of which is then multiplied by the relevant pseudopotential V_l.

FIG. 5. Schematic illustration of all-electron (solid lines) and pseudoelectron (dashed lines) potentials and their corresponding wave functions. The radius at which all-electron and pseudoelectron values match is designated r_c.


A pseudopotential that uses the same potential for all the angular momentum components of the wave function is called a local pseudopotential. A local pseudopotential is a function only of the distance from the nucleus. It is possible to produce arbitrary, predetermined phase shifts for each angular momentum state with a local potential, but there are limits to the amount that the phase shifts can be adjusted for the different angular momentum states while maintaining the crucial smoothness and weakness of the pseudopotential. Without a smooth, weak pseudopotential it becomes difficult to expand the wave functions using a reasonable number of plane-wave basis states.

a. Norm conservation

In total-energy calculations, the exchange-correlation energy of the electronic system is a function of the electron density. If the exchange-correlation energy is to be described accurately, it is necessary that outside the core regions the pseudo wave functions and real wave functions be identical, not just in their spatial dependences but also in their absolute magnitudes, so that the two wave functions generate identical charge densities. Adjustment of the pseudopotential to ensure that the integrals of the squared amplitudes of the real and the pseudo wave functions inside the core regions are identical guarantees the equality of the wave function and pseudo wave function outside the core region. One of the first attempts to construct pseudopotentials of this type was by Starkloff and Joannopoulos (Joannopoulos et al., 1977; Starkloff and Joannopoulos, 1977). They introduced a class of local pseudopotentials that described the valence energies and wave functions of many heavy atoms accurately.
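The norm-conservation condition just described can be illustrated numerically: given a toy "all-electron" radial orbital with one node and a smooth nodeless pseudo orbital of adjustable width, the width is tuned until the charge contained inside r_c matches, while an overall scale keeps the two functions equal at r_c. All of the functional forms, the value of r_c, and the parameters here are illustrative assumptions, not a real generation scheme:

```python
import math

r_c = 2.0               # illustrative cutoff radius
dr = 1e-3
rs = [(i + 0.5) * dr for i in range(int(r_c / dr))]  # midpoint radial grid

def u_ae(r):
    # Toy "all-electron" radial orbital u(r) = r*R(r) with one node at r = 0.7.
    return (r - 0.7) * r * math.exp(-r)

def charge_inside(u):
    # Charge contained inside r_c: integral of |u(r)|^2 dr from 0 to r_c.
    return sum(u(r) ** 2 for r in rs) * dr

def make_pseudo(alpha):
    # Nodeless pseudo orbital ~ r^2 exp(-alpha*r), scaled to equal u_ae at
    # r_c so that the two functions agree at the edge of the core region.
    scale = u_ae(r_c) / (r_c ** 2 * math.exp(-alpha * r_c))
    return lambda r: scale * r ** 2 * math.exp(-alpha * r)

target = charge_inside(u_ae)

# Bisect on the decay parameter: with the value at r_c pinned, the charge
# inside r_c grows monotonically with alpha, so a simple bracket suffices.
lo, hi = 0.1, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if charge_inside(make_pseudo(mid)) < target:
        lo = mid
    else:
        hi = mid

alpha = 0.5 * (lo + hi)
u_ps = make_pseudo(alpha)
print(alpha, charge_inside(u_ps), target)  # norms inside r_c now agree
```

Matching both the value at r_c and the enclosed charge is the toy analog of the adjustment described above: the pseudo orbital then carries the same charge as the real one, so the density outside the core is reproduced.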

Of course, in general, the scattering from the ion core is best described by a nonlocal pseudopotential that uses a different potential for each angular momentum component of the wave function. Various groups (Redondo et al., 1977; Hamann et al., 1979; Zunger and Cohen, 1979; Kerker, 1980; Bachelet et al., 1982; Shirley et al., 1989) have now introduced nonlocal pseudopotentials of this type that work extremely well. Moreover, as pointed out by Hamann, Schlüter, and Chiang (1979), a match of the pseudo and real wave functions outside the core region also assures that the first-order energy dependence of the scattering from the ion core is correct, so that the scattering is accurately described over a wide range of energy. A method for the construction of pseudopotentials that corrects even the higher-order energy dependence of the scattering has recently been introduced by Shirley et al. (1989). Local and nonlocal pseudopotentials of these types are currently termed ab initio or norm conserving and are capable of describing the scattering due to the ion in a variety of atomic environments, a property referred to as transferability.

b. Generation procedure

FIG. 6. Flow chart describing the construction of an ionic pseudopotential for an atom.

The typical method for generating an ionic pseudopotential for an atom of species α is illustrated in Fig. 6 and proceeds as follows. All-electron calculations are performed for an isolated atom in its ground state and some excited states, using a given form for the exchange-correlation density functional. This provides valence electron eigenvalues and valence electron wave functions for the atom. A parametrized form for the ionic pseudopotential is chosen. The parameters are then adjusted, so that a pseudoatom calculation using the same form for exchange-correlation as in the all-electron atom gives both pseudo wave functions that match the valence wave functions outside some cutoff radius r_c and pseudoeigenvalues that are equal to the valence eigenvalues. The ionic pseudopotential obtained in this fashion is then used, without further modification, for any environment of the atom. The electronic density in any new environment of the atom is then determined using both the ionic pseudopotential obtained in this way and the same form of exchange-correlation functional as employed in the construction of the ionic pseudopotential. A generalization of this pseudopotential construction procedure for solutions of the atom that are not normalizable has recently been introduced by Hamann (1989).

Finally, it should be noted that ionic pseudopotentials are constructed with r_c ranging typically from one to two times the value of the core radius. It should also be noted that, in general, the smaller the value of r_c, the more "transferable" the potential. (The entire procedure for solving the problem of a solid, given an ionic pseudopotential, is outlined in Sec. II.F.)


c. Convergence properties

The replacement of the true ionic potential by a weaker pseudopotential allows the electronic wave functions to be expanded using far fewer plane-wave basis states than would be needed to expand the wave functions in a full ionic potential. The rapid oscillations of the valence wave functions in the cores of the atoms have been removed, and the small core electron states are no longer present. The pseudopotential approximation has a number of other advantages in addition to reducing the number of plane-wave basis states needed to expand the electronic wave functions. The removal of the core electrons means that fewer electronic wave functions have to be calculated. More importantly, the total energy of the valence electron system is typically a thousand times smaller than the energy of the all-electron system. The difference between the electronic energies of different ionic configurations appears almost totally in the energy of the valence electrons, so that the accuracy required to determine energy differences between ionic configurations in a pseudopotential calculation is much smaller than the accuracy required in an all-electron calculation. The energy differences are just as large when only the valence electrons are included in the calculation, but the total energies are typically a thousand times smaller in the pseudopotential calculation than in the all-electron calculation. But, of course, the total energy is no longer meaningful. Only differences have meaning.

d. Plane-wave basis sets

One property of a pseudopotential that is not incorporated into the standard generation procedure is the value of the cutoff energy required for the plane-wave basis sets. Obviously, the smaller this value, the smaller the basis set required for any particular calculation, and the faster the calculation. A number of approaches to this problem have been adopted. Some authors add additional constraints in the process of generating the pseudopotential which are intended to produce a more rapidly convergent potential (Troullier and Martins, 1991). Alternatively, a separate optimization procedure can be applied after generating a pseudopotential using one of the standard techniques (Rappe et al., 1990; Rappe and Joannopoulos, 1991). A rather more radical approach has been suggested by Vanderbilt (Vanderbilt, 1990; Laasonen et al., 1991), which involves relaxing the norm conservation of the pseudopotential. This approach will be described more fully in Sec. IX.B.

2. Structure factor

The total ionic potential in a solid is obtained by placing an ionic pseudopotential at the position of every ion in the solid. The information about the positions of the ions is contained in the structure factor. The value of the structure factor at wave vector G for ions of species α is given by

$$S_\alpha(\mathbf{G}) = \sum_I \exp(i\mathbf{G}\cdot\mathbf{R}_I) , \tag{2.12}$$

where the sum is over the positions R_I of all the ions of species α in a single unit cell.

The periodicity of the system restricts the nonzero components of the ionic potential to reciprocal-lattice vectors. Hence it is only necessary to calculate the structure factor at the set of reciprocal-lattice vectors.

3. Total ionic potential

The total ionic potential V_ion is obtained by summing the product of the structure factor and the pseudopotential over all the species of ions. For example, for a local potential V_ion is given by

$$V_{\rm ion}(\mathbf{G}) = \sum_\alpha S_\alpha(\mathbf{G}) \, v_\alpha(\mathbf{G}) . \tag{2.13}$$

At large distances the pseudopotential is a pure Coulomb potential of the form Z/r, where Z is the valence of the atom. On taking the Fourier transform, one finds that the pseudopotential diverges as Z/G² at small wave vectors. Therefore the total ionic potential at G=0 is infinite, and so the electron-ion energy is infinite. However, there are similar divergences in the Coulomb energies due to the electron-electron interactions and the ion-ion interactions. The Coulomb G=0 contributions to the total energy from the three interactions cancel exactly. This is not surprising, because there is no Coulomb potential at G=0 in a charge-neutral system, and so there cannot be a contribution to the total energy from the G=0 component of the Coulomb potential.

The pseudopotential is, however, not a pure Coulomb potential and therefore not exactly proportional to Z/G² for small G. There is a constant contribution to the pseudopotential at small G, equal to the integral of the difference between the pure Coulomb Z/r potential and the pseudopotential. This constant for species α is

$$v_{\alpha,{\rm core}} = \int \left[ \frac{Z_\alpha}{r} + v_\alpha(r) \right] 4\pi r^2 \, dr , \tag{2.14}$$

where v_α is the pseudopotential for the l=0 angular momentum state. This integral is nonzero only within the core region because the potentials are identical outside the core region.

There is no contribution to the total energy from the Z/G² component of the pseudopotential at G=0, because of the cancellation of the infinities in the electron-ion, electron-electron, and ion-ion energies. However, the non-Coulomb part of the pseudopotential at G=0 does contribute to the total energy. The contribution is equal to

$$\frac{1}{\Omega} N_{\rm el} \sum_\alpha N_\alpha \, v_{\alpha,{\rm core}} , \tag{2.15}$$

where N_el is the total number of electrons in the system, N_α is the total number of ions of species α, and Ω is the volume of the unit cell.
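As a concrete check of the core correction, the sketch below evaluates Eq. (2.14), read here as v_core = ∫[Z/r + v(r)]4πr² dr with v(r) → −Z/r outside the core, for a model local pseudopotential v(r) = −(Z/r)erf(r/r_c). This analytic form and the values of Z and r_c are illustrative assumptions, not taken from the text; for this particular model the integral has the closed form v_core = πZr_c², so the quadrature can be verified.

```python
import math

# Core correction of Eq. (2.14) for a model pseudopotential
# v(r) = -(Z/r) erf(r/r_c) (an illustrative assumption). The integrand
# Z/r + v(r) = (Z/r) erfc(r/r_c) vanishes outside the core, so a simple
# trapezoid rule on a finite interval converges quickly. For this model,
# v_core = pi * Z * r_c**2 exactly.
Z, rc = 4.0, 0.8
n, rmax = 200000, 12.0 * rc
h = rmax / n

v_core = 0.0
for k in range(1, n):                    # endpoints contribute ~0
    r = k * h
    v = -(Z / r) * math.erf(r / rc)      # model pseudopotential
    v_core += (Z / r + v) * 4.0 * math.pi * r * r * h

print(v_core, math.pi * Z * rc**2)       # numerical vs analytic value
```

The same quadrature applies to any local pseudopotential given on a radial grid; only the model form of v(r) is specific to this example.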

Rev. Mod. Phys., Vol. 64, No. 4, October 1992


E. Ion-ion interactions

It is extremely difficult to compute the Coulomb energy of the ionic system using a direct real-space summation because the Coulomb interaction is long ranged. The Coulomb interaction is also long ranged in reciprocal space, so the problem is not solved by performing the summation in reciprocal space. Ewald (1917a, 1917b, 1921) developed a rapidly convergent method for performing Coulomb summations over periodic lattices.

Ewald's method is based on the following identity:

$$\sum_{\mathbf{l}} \frac{1}{|\mathbf{R}_1 + \mathbf{l} - \mathbf{R}_2|} = \frac{2}{\sqrt{\pi}} \sum_{\mathbf{l}} \int_{\eta}^{\infty} \exp\!\left(-|\mathbf{R}_1 + \mathbf{l} - \mathbf{R}_2|^2 \rho^2\right) d\rho + \frac{2\pi}{\Omega} \sum_{\mathbf{G}} \int_{0}^{\eta} \exp\!\left(-\frac{G^2}{4\rho^2}\right) \exp[i(\mathbf{R}_1 - \mathbf{R}_2)\cdot\mathbf{G}] \, \frac{1}{\rho^3} \, d\rho , \tag{2.16}$$

where l are lattice vectors, G are reciprocal-lattice vectors, and Ω is the volume of the unit cell. This identity


provides a method for rewriting the lattice summation for the Coulomb energy due to the interaction between an ion positioned at R₂ and an array of atoms positioned at the points R₁+l. The identity holds for all positive values of η.

At first sight, the infinite Coulomb summation on the left-hand side of Eq. (2.16) has been replaced by two infinite summations, one over lattice vectors and the other over reciprocal-lattice vectors. However, if one chooses an appropriate value of η, the two summations become rapidly convergent in their respective spaces. Then the real- and reciprocal-space summations can be computed with only a few lattice vectors and a few reciprocal-lattice vectors.

As mentioned in the preceding section, the contributions to the total energy from the electron-ion, ion-ion, and electron-electron interactions at G=0 cancel exactly, and so the G=0 contribution to the Coulomb energy of the ionic system must be removed in order to compute the correct total energy. In the Ewald summations the G=0 contribution to the Coulomb energy has been divided between the real-space and the reciprocal-space summations, so that it is not sufficient simply to omit the G=0 term in the reciprocal-space Ewald summation. The G=0 term in the reciprocal-space summation should be omitted and two terms added to the Ewald energy to give the correct total energy. The correct form for the total energy is (Yin and Cohen, 1982b)

$$E_{\rm ion} = \frac{1}{2} \sum_{I,J} Z_I Z_J \left\{ \sum_{\mathbf{l}}{}^{\prime} \frac{\operatorname{erfc}(\eta |\mathbf{R}_I + \mathbf{l} - \mathbf{R}_J|)}{|\mathbf{R}_I + \mathbf{l} - \mathbf{R}_J|} - \frac{2\eta}{\sqrt{\pi}} \delta_{IJ} + \frac{4\pi}{\Omega} \sum_{\mathbf{G}\neq 0} \frac{1}{G^2} \exp\!\left(-\frac{G^2}{4\eta^2}\right) \cos[(\mathbf{R}_I - \mathbf{R}_J)\cdot\mathbf{G}] - \frac{\pi}{\eta^2 \Omega} \right\} , \tag{2.17}$$

where Z_I and Z_J are the valences of ions I and J, respectively, and erfc is the complementary error function.

An ion does not interact with its own Coulomb charge, so the l=0 term must be omitted from the real-space summation when I=J. This is indicated by the prime on the first summation in Eq. (2.17).
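A minimal illustration of the Ewald summation described above: the script below performs the real-space, reciprocal-space, and self-energy sums of Eq. (2.17) for a rocksalt arrangement of point charges ±1 (units with e² = 1) and recovers the NaCl Madelung constant. The cell geometry, splitting parameter η, and summation cutoffs are illustrative choices, not values from the text.

```python
import math

# Conventional rocksalt cell: cubic side a = 2, nearest-neighbor distance 1,
# four +1 and four -1 point charges. The net charge is zero, so the
# -pi/(eta^2 * Omega) charged-system term of Eq. (2.17) vanishes.
a = 2.0
omega = a ** 3
ions = [((0, 0, 0), +1), ((1, 1, 0), +1), ((1, 0, 1), +1), ((0, 1, 1), +1),
        ((1, 0, 0), -1), ((0, 1, 0), -1), ((0, 0, 1), -1), ((1, 1, 1), -1)]
eta = 1.0                                # Ewald splitting parameter

# Real-space sum: erfc-screened interactions over nearby cells, omitting
# the self-interaction (l = 0 with I = J).
e_real = 0.0
for nx in range(-3, 4):
    for ny in range(-3, 4):
        for nz in range(-3, 4):
            for i, (ri, qi) in enumerate(ions):
                for j, (rj, qj) in enumerate(ions):
                    if (nx, ny, nz) == (0, 0, 0) and i == j:
                        continue
                    dx = ri[0] - rj[0] + nx * a
                    dy = ri[1] - rj[1] + ny * a
                    dz = ri[2] - rj[2] + nz * a
                    r = math.sqrt(dx * dx + dy * dy + dz * dz)
                    e_real += 0.5 * qi * qj * math.erfc(eta * r) / r

# Reciprocal-space sum over G = (2*pi/a)(m1, m2, m3), G != 0.
e_recip = 0.0
g0 = 2.0 * math.pi / a
for mx in range(-4, 5):
    for my in range(-4, 5):
        for mz in range(-4, 5):
            if (mx, my, mz) == (0, 0, 0):
                continue
            gx, gy, gz = g0 * mx, g0 * my, g0 * mz
            g2 = gx * gx + gy * gy + gz * gz
            s_re = sum(q * math.cos(gx * r[0] + gy * r[1] + gz * r[2]) for r, q in ions)
            s_im = sum(q * math.sin(gx * r[0] + gy * r[1] + gz * r[2]) for r, q in ions)
            e_recip += (2.0 * math.pi / omega) * math.exp(-g2 / (4.0 * eta ** 2)) / g2 \
                       * (s_re ** 2 + s_im ** 2)

# Self-energy: the -(2*eta/sqrt(pi)) * delta_IJ term of Eq. (2.17).
e_self = -(eta / math.sqrt(math.pi)) * sum(q * q for _, q in ions)

e_total = e_real + e_recip + e_self      # energy per unit cell
madelung = -2.0 * e_total / len(ions)    # E per ion = -M q^2/(2 d), d = 1
print(madelung)
```

The result is independent of η over a wide range, which is a useful consistency check on any Ewald implementation.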

F. Computational procedure with conventional matrix diagonalization

The sequence of steps required to carry out a total-energy pseudopotential calculation with conventional matrix diagonalization techniques is shown in the flow diagram in Fig. 7. The procedure requires an initial guess for the electronic charge density, from which the Hartree potential and the exchange-correlation potential can be calculated. The Hamiltonian matrices for each of the k points included in the calculation must be constructed, as in Eq. (2.10), and diagonalized to obtain the Kohn-Sham eigenstates. These eigenstates will normally generate a different charge density from the one originally used to construct the electronic potentials, and hence a new set of Hamiltonian matrices must be constructed using the new electronic potentials. The eigenstates of the new Hamiltonians are obtained, and the process is repeated until the solutions are self-consistent. In practice the new electronic potential is taken to be a combination of the electronic potentials generated by the old and the new eigenstates, since this speeds the convergence to self-consistency. To complete the total-energy calculation, tests should be performed to ensure that the total energy is converged both as a function of the number of k points and as a function of the cutoff energy for the plane-wave basis set. Very few total-energy calculations are taken to absolute convergence. For most calculations, the difference in energy between different ionic configurations is more important than the absolute energies of the individual configurations, and the calculations are performed using an energy cutoff and number of k points at which the energy differences have converged rather than ones at which the absolute energies have converged.
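The diagonalize, update, and mix cycle just described can be sketched with a toy model. Below, a one-dimensional tight-binding ring with a density-dependent on-site potential, H(n) = H₀ + U diag(n), stands in for the Kohn-Sham problem; the model, its parameters, and the linear mixing factor are all illustrative assumptions, not from the text.

```python
import numpy as np

# Toy self-consistent field loop in the spirit of the procedure above.
# The "Kohn-Sham" problem is a disordered tight-binding ring whose on-site
# potential depends on the electron density. L, n_occ, U, beta and the
# disorder strength are illustrative choices.
L, n_occ, U, beta = 12, 3, 0.5, 0.5
rng = np.random.default_rng(0)

H0 = np.zeros((L, L))
for i in range(L):                           # nearest-neighbor hopping on a ring
    H0[i, (i + 1) % L] = H0[(i + 1) % L, i] = -1.0
H0 += np.diag(0.5 * rng.standard_normal(L))  # weak site disorder

n = np.full(L, n_occ / L)                    # initial guess for the density
for step in range(500):
    H = H0 + U * np.diag(n)                  # potential built from current density
    eps, psi = np.linalg.eigh(H)             # conventional matrix diagonalization
    n_new = (psi[:, :n_occ] ** 2).sum(axis=1)  # density of the occupied states
    if np.max(np.abs(n_new - n)) < 1e-10:
        break                                # self-consistent
    n = (1 - beta) * n + beta * n_new        # mix old and new densities
print(step, float(eps[:n_occ].sum()))
```

Mixing the old and new densities, rather than taking the new density outright, plays the same convergence-accelerating role here as the potential mixing described in the text.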



[Fig. 7 flow chart: construct V_ion given the atomic numbers and positions of the ions → pick a cutoff for the plane-wave basis set → pick a trial density n(r) → calculate V_H(n) and V_XC(n) → solve (−ℏ²/2m ∇² + V_ion + V_H + V_XC)ψ = εψ by diagonalization of H_{k+G,k+G′} → calculate the new n(r) → if the solution is not self-consistent, generate a new density and repeat; otherwise compute the total energy.]

FIG. 7. Flow chart describing the computational procedure for the calculation of the total energy of a solid, using conventional matrix diagonalization.

G. Drawbacks of conventional procedure

Pseudopotential calculations with a plane-wave basis are not very well suited to conventional matrix diagonalization techniques. In a total-energy pseudopotential calculation there are typically 100 plane-wave basis states for each atom in the system. The cost of matrix diagonalization increases as the third power of the number of plane-wave basis states, and the memory required to store the Hamiltonian matrix increases as the square of the number of basis states. As a result, conventional matrix diagonalization techniques are restricted to the order of 1000 plane-wave basis states, and this in turn restricts the number of atoms in the unit cell to the order of 10. Using conventional matrix diagonalization methods, the Kohn-Sham eigenvalues of all of the electronic states are calculated, although only those of the lowest occupied states are required to compute the total energy. Furthermore, considerable effort is expended to compute the eigenvalues to machine accuracy, even when the electronic potential is far from self-consistency.

H. Alternative methods

It has been demonstrated that the total-energy pseudopotential technique gives accurate and reliable values for total energies of solids. However, as described above, the power of the pseudopotential method is severely restricted when using conventional matrix diagonalization techniques to solve for the Kohn-Sham eigenstates. In Secs. III, IV, and V, descriptions are given of alternative methods for performing total-energy pseudopotential calculations. These methods are alternative techniques for minimizing the Kohn-Sham energy functional and lead to the same self-consistent Kohn-Sham eigenstates and eigenvalues as conventional matrix diagonalization. However, they are much better suited to performing total-energy pseudopotential calculations because the computational time and memory requirements scale more slowly with the size of the system, allowing calculations on larger and more complex systems than can be studied using conventional matrix diagonalization techniques.

III. THE MOLECULAR-DYNAMICS METHOD

Eigenvalue problems may be solved by successively "improving" a trial wave function. A simple illustration of this process is given in Sec. III.A. Although the Car-Parrinello method (Car and Parrinello, 1985) should be regarded primarily as a scheme for performing ab initio dynamical simulations, the molecular-dynamics treatment of the electronic degrees of freedom introduced in the Car-Parrinello method can be used to calculate directly the self-consistent Kohn-Sham eigenstates of a system. In this case the method operates by carrying out a series of iterations that "improve" a set of trial wave functions until they eventually converge to the Kohn-Sham eigenstates. The total energy can be easily computed once the self-consistent Kohn-Sham eigenstates have been determined. In this section we describe the molecular-dynamics treatment of the electronic degrees of freedom and show how it provides a very efficient technique for finding the electronic ground state for a fixed ionic configuration. In Sec. III.A we begin this discussion with a description of a simple scheme for the iterative solution of an eigenvalue problem based on the variational principle. The molecular-dynamics-based method is not as transparent as the example presented here, but it has the common feature of varying trial wave functions until they become eigenstates.

A. Eigenvalue solution by successive "improvement" of a trial wave function

The variational theorem gives an upper bound for the expectation value of a Hamiltonian H for any arbitrary normalized trial wave function ψ. The expectation value is greater than or equal to the energy of the lowest-energy eigenstate of the Hamiltonian. Hence

$$\langle \psi | H | \psi \rangle \geq \lambda_0 , \tag{3.1}$$

where λ₀ is the energy of the lowest-energy eigenstate of the Hamiltonian H.

If ψ is expanded using a set of arbitrary orthonormal basis functions {φ_n},

$$\psi = \sum_n c_n \phi_n . \tag{3.2}$$

Substitution of Eq. (3.2) into (3.1) gives



$$\sum_{n,m} c_n^* c_m \langle \phi_n | H | \phi_m \rangle \geq \lambda_0 , \tag{3.3}$$

and the constraint of normalization requires that

$$\sum_n |c_n|^2 = 1 . \tag{3.4}$$

The values of the coefficients c_n can be varied subject to the constraint of normalization until the minimum value for the expectation value of the Hamiltonian is reached. This minimum value gives an upper bound for the ground-state energy of the Hamiltonian.

The variational theorem gives an upper bound to the ground-state energy of the Hamiltonian. However, the difference between the minimum value of the expectation value and the true ground-state energy of any given Hamiltonian is due to the lack of completeness in the basis set {φ_n}. The eigenstate and the eigenenergy obtained by using the variational theorem are exact in the space of the basis set. Diagonalization of the Hamiltonian matrix in the Hilbert space of the same basis states would yield an identical solution for the lowest-energy eigenstate.

The variational principle can be applied to obtain an estimate for the energy of the next-lowest-energy eigenstate of the Hamiltonian by using a trial wave function that is orthogonal to the ground-state wave function. The eigenstate and eigenenergy obtained for the second eigenstate will again be identical to those calculated by diagonalizing the Hamiltonian matrix in the Hilbert space of the same basis functions. A third eigenstate can be obtained by using a trial wave function that is orthogonal to the ground state and to the first excited state. This process can be repeated until all of the eigenstates have been obtained. The essential point is that the variational principle can be used to obtain eigenstates that are exact in the Hilbert space of the basis set used in the calculation. The molecular-dynamics method is essentially a dynamical method for applying the variational principle, in which the eigenstates of all the lowest-energy electronic states are determined simultaneously.
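The successive-improvement idea above can be sketched numerically: minimize the Rayleigh quotient ⟨ψ|H|ψ⟩ by projected steepest descent, then repeat with trial vectors kept orthogonal to the states already found. The random Hamiltonian, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

# Variational "improvement" of a trial vector: step downhill along the
# gradient of <psi|H|psi>, renormalizing and projecting out previously
# found eigenstates each iteration. H, dt and the step count are
# illustrative choices.
rng = np.random.default_rng(1)
N = 20
A = rng.standard_normal((N, N))
H = (A + A.T) / 2                        # random symmetric "Hamiltonian"

def improve(H, orthogonal_to=(), steps=5000, dt=0.05):
    psi = rng.standard_normal(H.shape[0])
    for _ in range(steps):
        for phi in orthogonal_to:        # impose orthogonality constraints
            psi -= (phi @ psi) * phi
        psi /= np.linalg.norm(psi)
        lam = psi @ H @ psi              # Rayleigh quotient, an upper bound
        psi -= dt * (H @ psi - lam * psi)  # steepest-descent step
    psi /= np.linalg.norm(psi)
    return psi, float(psi @ H @ psi)

psi0, e0 = improve(H)                            # lowest eigenstate
psi1, e1 = improve(H, orthogonal_to=(psi0,))     # next-lowest eigenstate
print(e0, e1)
```

The step size must be small compared with the inverse of the eigenvalue spread for the descent to be stable, which anticipates the time-step bound discussed for the Verlet algorithm in Sec. III.D.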

B. Molecular-dynamics procedure


FIG. 8. Schematic illustration of the annealing procedure in molecular dynamics. The system is started at a high temperature with total energy E₁. The trajectory at this energy allows the system to sample a large amount of phase space. As the system is gradually cooled to E₂, E₃, etc., it settles down to a minimum-energy configuration.

The molecular-dynamics method will be introduced by a description of its application to a system in which the positions of the ions and the size of the unit cell remain fixed. The calculation can then be directly compared to a total-energy calculation performed using conventional matrix diagonalization techniques. In the traditional molecular-dynamics approach a system of classical particles with coordinates {X_i} interact through an interaction potential V({X_i}). If the configuration of minimum energy is required, the system is started at a high temperature, and the temperature is gradually reduced until the particles reach a configuration {X_i}₀ that minimizes V. This procedure is illustrated schematically in Fig. 8.

In the Car-Parrinello scheme the Kohn-Sham energy functional E[{c_i}] is a function of the set of coefficients of the plane-wave basis set {c_i}. Each coefficient c_i can be regarded as the coordinate of a classical "particle." To minimize the Kohn-Sham energy functional, these "particles" are given a kinetic energy, and the system is gradually cooled until the set of coordinates reaches the values {c_i}₀ that minimize the functional. Thus the problem of solving for the Kohn-Sham eigenstates is reduced to one of solving a set of classical equations of motion. It should be emphasized, however, that the Kohn-Sham energy functional is physically meaningful quantum mechanically only when the coefficients take the values {c_i}₀.

1. Molecular-dynamics Lagrangian

Car and Parrinello formulated their method in the language of molecular dynamics. Their essential step was to treat the electronic wave functions as dynamical variables. A Lagrangian is defined for the electronic system as follows:

$$L = \mu \sum_i \int |\dot{\psi}_i(\mathbf{r})|^2 \, d^3r - E[\{\psi_i\},\{\mathbf{R}_I\},\{\alpha_\nu\}] , \tag{3.5}$$

where μ is a fictitious mass associated with the electronic wave functions, E is the Kohn-Sham energy functional, R_I is the position of ion I, and the α_ν define the size and shape of the unit cell. The kinetic-energy term in the Lagrangian is due to the fictitious dynamics of the electronic degrees of freedom. The Kohn-Sham energy functional takes the place of the potential energy in a conventional Lagrangian formulation.

2. Constraints

The electronic wave functions are subject to the constraints of orthonormality,



$$\int \psi_i^*(\mathbf{r}) \, \psi_j(\mathbf{r}) \, d^3r = \delta_{ij} . \tag{3.6}$$

These constraints are incorporated in the molecular-dynamics Lagrangian by using the method of Lagrange multipliers. The molecular-dynamics Lagrangian becomes

$$L = \mu \sum_i \int |\dot{\psi}_i(\mathbf{r})|^2 \, d^3r - E[\{\psi_i\},\{\mathbf{R}_I\},\{\alpha_\nu\}] + \sum_{i,j} \Lambda_{ij} \left[ \int \psi_i^*(\mathbf{r}) \, \psi_j(\mathbf{r}) \, d^3r - \delta_{ij} \right] . \tag{3.7}$$

The Lagrange multipliers Λ_ii ensure that the wave functions remain normalized, while the Lagrange multipliers Λ_ij (i≠j) ensure that the wave functions remain orthogonal. As described below, the Lagrange multipliers may be thought of as providing additional forces acting on the wave functions, which ensure that the wave functions remain orthonormal.

C. Molecular-dynamics equations of motion

The equations of motion for the electronic states are derived from the Lagrange equations of motion,

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\psi}_i^*}\right) = \frac{\partial L}{\partial \psi_i^*} , \tag{3.8}$$

which give

$$\mu \ddot{\psi}_i = -H\psi_i + \sum_j \Lambda_{ij} \psi_j , \tag{3.9}$$

where H is the Kohn-Sham Hamiltonian. The force −Hψ_i is the gradient of the Kohn-Sham energy functional at the point in the Hilbert space that corresponds to the wave function ψ_i. The Lagrange multipliers add forces Λ_ij ψ_j to the force −Hψ_i. These forces ensure that the electronic wave functions remain orthonormal as they propagate along their molecular-dynamics trajectories. A general discussion of the consequences of various orthonormalization schemes is given by Broughton and Khan (1989).

1. Unconstrained equations of motion

The constraints of orthonormality play a crucial role in the evolution of the electronic states in the molecular-dynamics method. To illustrate the importance of these constraints, we consider the evolution of the electronic states in the absence of any constraints:

$$\mu \ddot{\psi}_i = -[H - \sigma]\psi_i , \tag{3.10}$$

where H is the Kohn-Sham Hamiltonian, and σ is an energy shift that defines the zero of energy. If ψ_i is expanded in the basis set of the eigenstates of the Hamiltonian H,

$$\psi_i = \sum_n c_{i,n} \phi_n , \tag{3.11}$$

and if Eq. (3.11) is substituted into (3.10), the following equation of motion for the coefficient of the eigenstate φ_n is obtained:

$$\mu \ddot{c}_{i,n} = -[\varepsilon_n - \sigma] \, c_{i,n} , \tag{3.12}$$

where ε_n is the eigenvalue corresponding to the eigenstate φ_n. Integration of these equations of motion, under the assumption that the velocities of the coefficients are initially zero, gives the coefficients at time t as

$$c_{i,n}(t) = c_{i,n}(0) \cos\!\left[ \left( \frac{\varepsilon_n - \sigma}{\mu} \right)^{1/2} t \right] , \quad \varepsilon_n > \sigma , \tag{3.13a}$$

$$c_{i,n}(t) = c_{i,n}(0) \cosh\!\left[ \left( \frac{|\varepsilon_n - \sigma|}{\mu} \right)^{1/2} t \right] , \quad \varepsilon_n < \sigma . \tag{3.13b}$$

Here c_{i,n}(0) are the initial values of the coefficients. It can be seen that the amplitudes of the coefficients of the eigenstates with energies greater than σ oscillate with time, while the amplitudes of the coefficients of the eigenstates with energies less than σ increase with time. If σ is chosen to be larger than the lowest-energy eigenvalue, then all the electronic states that have c_{i,0}(0) ≠ 0 will converge to the lowest-energy eigenstate φ₀, since the coefficient c_{i,0} will increase faster than any other coefficient. Therefore, under the unconstrained equations of motion, the electronic wave functions remain neither orthogonal nor normalized. The initial wave functions will only converge to different eigenstates when the constraints of orthogonality are imposed.
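The behavior of Eqs. (3.13a) and (3.13b) can be illustrated directly: for a diagonal model Hamiltonian, the coefficients of eigenstates below σ grow as cosh while those above oscillate, so after normalization the state is dominated by the lowest-eigenvalue component even though it started smallest. The model eigenvalues, σ, μ, initial coefficients, and elapsed time below are illustrative assumptions.

```python
import numpy as np

# Unconstrained evolution for a fixed, diagonal model Hamiltonian:
# coefficients with eps_n > sigma oscillate (Eq. 3.13a), those with
# eps_n < sigma grow (Eq. 3.13b), and the lowest eigenvalue grows fastest.
eps = np.array([-2.0, -1.0, 0.5, 3.0])     # model eigenvalues
sigma, mu, t = 0.0, 1.0, 8.0               # energy shift, fictitious mass, time
c0 = np.array([0.1, 0.7, 0.5, 0.5])        # initial coefficients, zero velocity

rate = np.sqrt(np.abs(eps - sigma) / mu)
c = np.where(eps > sigma,
             c0 * np.cos(rate * t),        # Eq. (3.13a): oscillation
             c0 * np.cosh(rate * t))       # Eq. (3.13b): exponential growth
c /= np.linalg.norm(c)                     # normalize by hand afterwards
print(c)
```

The eps = −2 component dominates the normalized state despite its small initial amplitude, which is exactly the collapse toward the ground state described in the text.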

2. Constrained equations of motion

The constrained molecular-dynamics equations of motion for the electronic states,

$$\mu \ddot{\psi}_i = -H\psi_i + \sum_j \Lambda_{ij} \psi_j , \tag{3.14}$$

ensure that the electronic wave functions remain orthonormal at every instant in time. The molecular-dynamics evolution of the electronic wave functions under these equations of motion would also conserve the total energy in the electronic degrees of freedom for the system of fixed ions we assume for this section. However, to ensure these properties, the values of the Lagrange multipliers must vary continuously with time, and so implementation of this form of the molecular-dynamics equations requires that the Lagrange multipliers be evaluated at infinitely small time separations. To make the calculations tractable, variation of the Lagrange multipliers during a time step is neglected and the Lagrange multipliers are approximated by a constant value during the time step. In this case the wave functions will not be exactly orthonormal at the end of the time step, and a separate orthonormalization step is needed in the calculation.

3. Partially constrained equations of motion

Since a separate orthonormalization step is required at the end of each time step, it is possible to remove the



constraints of orthogonality from the equation of motion and use a partially constrained equation of motion. The constraints of orthogonality are then imposed after the equations of motion have been integrated, and the Lagrange multipliers for the constraints of normalization Λ_ii are approximated by the expectation values of the energies of the states, λ_i, where

$$\lambda_i = \langle \psi_i | H | \psi_i \rangle . \tag{3.15}$$

This leads to an equation of motion that has the form

$$\mu \ddot{\psi}_i = -[H - \lambda_i]\psi_i . \tag{3.16}$$

With this equation of motion, the acceleration of an electronic state is always orthogonal to that state, a necessary requirement to maintain normalization, and the acceleration becomes zero when the wave function is an exact eigenstate.

D. Integration of equations of motion

1. The Verlet algorithm

Once the accelerations of the coefficients have been calculated, the equations of motion for the coefficients of the plane-wave basis states have to be integrated. Car and Parrinello used the Verlet algorithm (Verlet, 1967) to integrate the equations of motion. The Verlet algorithm is derived from the simplest second-order difference equation for the second derivative. It gives the value of the ith electronic state at the next time step, ψ_i(Δt), as

$$\psi_i(\Delta t) = 2\psi_i(0) - \psi_i(-\Delta t) + (\Delta t)^2 \, \ddot{\psi}_i(0) , \tag{3.17a}$$

where Δt is the length of the time step, ψ_i(0) is the value of the state at the present time step, and ψ_i(−Δt) is the value of the state at the last time step. Substitution of Eq. (3.16) into (3.17a) then gives

$$\psi_i(\Delta t) = 2\psi_i(0) - \psi_i(-\Delta t) - \frac{(\Delta t)^2}{\mu} [H - \lambda_i] \, \psi_i(0) . \tag{3.17b}$$

2. Stability of the Verlet algorithm

A general performance measure of algorithms of the Verlet type is the rate at which they converge to the minimum-energy state. A given problem normally requires a certain amount of real time to converge, and the computational effort is then determined by the size of the time step Δt. In what follows it is demonstrated that the largest Δt allowed for stability is related to the difference between the largest and smallest eigenvalues of the system.

Given the assumption that ψ_i is near the lowest-energy eigenstate φ₀, the state ψ_i is expanded as in Eq. (3.11),

$$\psi_i = \phi_0 + \sum_{\alpha \neq 0} \delta_\alpha(t) \, \phi_\alpha , \tag{3.18}$$

where the δ_α represent infinitesimal coefficients. Substitution of Eq. (3.18) into (3.17b) gives, to first order in δ,

$$\delta_\alpha((n+1)\Delta t) = 2\delta_\alpha(n\Delta t) - \delta_\alpha((n-1)\Delta t) - \frac{(\Delta t)^2}{\mu} (\varepsilon_\alpha - \varepsilon_0) \, \delta_\alpha(n\Delta t) . \tag{3.19}$$

In the standard stability analysis (see Mathews and Walker, 1970), a constant growth factor g is introduced at each time step, so that

$$\delta_\alpha(n\Delta t) = g \, \delta_\alpha((n-1)\Delta t) . \tag{3.20}$$

Substitution of Eq. (3.20) into (3.19) then gives

$$g^2 - 2g + 1 + \frac{(\Delta t)^2}{\mu} (\varepsilon_\alpha - \varepsilon_0) \, g = 0 , \tag{3.21}$$

and the magnitude of g can become greater than 1 if

$$\Delta t > 2 \left( \frac{\mu}{\varepsilon_\alpha - \varepsilon_0} \right)^{1/2} . \tag{3.22a}$$

Therefore the largest possible Δt that is allowed for stability must be

$$\Delta t_{\max} = 2 \left( \frac{\mu}{\varepsilon_{\max} - \varepsilon_0} \right)^{1/2} , \tag{3.22b}$$

where ε_max is the largest eigenvalue of the problem. For a Hamiltonian representation in a plane-wave basis, the largest eigenvalue is primarily determined by the cutoff kinetic energy of the basis set. Thus the Verlet algorithm will require the time step to be reduced as the cutoff energy is increased. This problem is addressed again in Sec. IV.A.

The Verlet algorithm introduces an error of order (Δt)⁴ into the integration of the equations of motion. A more sophisticated finite-difference algorithm could be used to integrate the equations of motion and hence reduce the error in the integration to a higher order of Δt. In principle, for a given level of accuracy this would allow a longer time step to be used in the integration of the equations of motion and hence reduce the total computational time by reducing the number of time steps required to perform the calculation. The maximum stable time step, however, is not significantly increased with a higher-order difference scheme. A more sophisticated finite-difference equation would also require the values of the coefficients or the corresponding accelerations from a larger number of time steps in order to integrate the equations of motion. It requires a large amount of memory to store the wave-function coefficients and accelerations for each time step in a total-energy pseudopotential calculation. If the extra coefficients and accelerations did not fit into core memory, the computation could become I/O bound and the total time required for the calculation might actually increase.
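The method described in Secs. III.C and III.D can be sketched for a fixed model Hamiltonian: integrate μψ̈_i = −[H − λ_i]ψ_i with a damped Verlet step chosen below the bound of Eq. (3.22b), re-orthonormalizing after every step. The Hamiltonian, fictitious mass, damping factor, and iteration count are illustrative assumptions, and the orthonormalization here uses a QR factorization rather than the iterative scheme of Sec. III.E.

```python
import numpy as np

# Damped Verlet integration of mu * psi_i'' = -[H - lambda_i] psi_i for the
# lowest M states of a fixed model Hamiltonian (a tridiagonal matrix chosen
# purely for illustration), with orthonormalization after each step.
N, M, mu, gamma = 16, 3, 1.0, 0.9
H = np.diag(np.arange(N, dtype=float))
H += np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)

w = np.linalg.eigvalsh(H)                        # reference spectrum
dt = 0.5 * 2.0 * np.sqrt(mu / (w[-1] - w[0]))    # half the Eq. (3.22b) limit

def orthonormalize(p):
    q, r = np.linalg.qr(p)
    return q * np.sign(np.diag(r))               # fix signs for a smooth update

rng = np.random.default_rng(2)
psi = orthonormalize(rng.standard_normal((N, M)))
psi_old = psi.copy()                             # zero initial velocities
for _ in range(3000):
    lam = np.einsum('ni,nm,mi->i', psi, H, psi)  # lambda_i = <psi_i|H|psi_i>
    accel = -(H @ psi - psi * lam) / mu
    psi_new = psi + gamma * (psi - psi_old) + dt ** 2 * accel  # damped Verlet
    psi_old, psi = psi, orthonormalize(psi_new)
eigs = np.sort(np.einsum('ni,nm,mi->i', psi, H, psi))
print(eigs, w[:M])
```

The factor γ < 1 on the velocity term is one of the damping strategies mentioned in Sec. III.F; with γ = 1 the scheme reduces to plain Verlet and the trajectories oscillate instead of settling to the minimum.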



In the discussion above, it has been tacitly assumed that the Hamiltonian remains fixed during the time evolution of the system. New instabilities can arise, however, when the Hamiltonian is not allowed to vary when it must vary, as in the case of self-consistency. These difficulties are discussed in Sec. III.J.

E. Orthogonalization of electronic wave functions

After one integrates the partially constrained equations of motion for the coefficients of all the basis states for each electronic state, the wave functions will no longer be orthogonal. Car and Parrinello use an iterative technique to orthogonalize the wave functions, repeating the application of the following algorithm to generate a new set of wave functions ψ_i′ from a normalized set of wave functions ψ_i:

$$\psi_i' = \psi_i - \tfrac{1}{2} \sum_{j \neq i} \langle \psi_j | \psi_i \rangle \, \psi_j . \tag{3.23}$$

The electronic wave functions can be made orthogonal to any desired accuracy by repeated application of this algorithm. If the algorithm is applied to two wave functions, these wave functions will be exactly orthogonal after a single application of the algorithm. However, applying the algorithm to orthogonalize each of the new wave functions to a third wave function will make the two wave functions nonorthogonal. The number of iterations of this algorithm required to orthogonalize a set of wave functions to a particular accuracy increases with the number of wave functions and with the initial degree of nonorthogonality. The algorithm does not maintain the normalization of the wave functions, so they must be normalized after each application of the algorithm. Various methods for imposing orthogonality have been compared by Broughton and Khan (1989).

F. Damping

When damping is applied to the motions of the coefficients c_{i,n}, the coefficients will evolve to the values that minimize the Kohn-Sham energy functional. This is illustrated schematically in Fig. 9. The damping of the coefficients can be applied in a number of ways: a damping term of the form −γψ̇_i can be added to the equation of motion for the wave function ψ_i, or the velocities of the coefficients can be reduced at the end of a time step by replacing the value of each coefficient at the previous time step by a value lying between the values of the coefficient at the previous and present time steps.

FIG. 9. Schematic representation of the damping of wave-function coefficients {c} and the evolution of the Kohn-Sham energy functional E[{c}] to its ground-state value E₀.

G. Self-consistency

The accelerations of the wave functions in the molecular-dynamics equations of motion are governed by the Kohn-Sham Hamiltonian. As described in Sec. II, the Hartree potential and the exchange-correlation potential contribute to the Kohn-Sham Hamiltonian and depend on the charge density generated by the electronic wave functions. As the wave functions evolve under the molecular-dynamics equations of motion, these potentials vary. The potentials are recalculated at the end of each time step (when a new set of wave functions has been generated) and lead to a new Kohn-Sham Hamiltonian. Thus the evolution of the coefficients to their stationary values is accompanied by an evolution of the Kohn-Sham Hamiltonian to self-consistency. This is illustrated in Fig. 10. Each of the solid lines shown passing through the open circles corresponds to a static Kohn-Sham Hamiltonian with a fixed charge density. During a molecular-dynamics time step, the coefficients {c} move along a trajectory that lies between two open circles. At the end of each time step, the Kohn-Sham Hamiltonian is updated with the new charge density, and the trajectory shifts to a new solid line. This shift is indicated by the dotted-line trajectory. The final time step leads to a self-consistent solution of the Kohn-Sham Hamiltonian and the determination of the minimum in the total energy.

FIG. 10. Schematic representation of the evolution of the coefficients {c}, the Kohn-Sham Hamiltonian H, and the total energy, in the final two time steps of the molecular-dynamics method. Solid-line trajectories between open circles correspond to H with a fixed charge density. The final time step leads to a self-consistent solution of H and a simultaneous determination of the minimum total energy. Note that, if a trajectory along a solid line were followed all the way to the {c} axis, this would correspond to conventional matrix diagonalization of H with a fixed charge density.
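The iterative orthogonalization of Sec. III.E can be sketched as follows, using the first-order symmetric update ψ_i ← ψ_i − ½ Σ_{j≠i} ⟨ψ_j|ψ_i⟩ ψ_j followed by renormalization; this update rule is our reading of Eq. (3.23), and the test vectors and iteration count are illustrative. Repeated passes drive all pairwise overlaps to zero.

```python
import numpy as np

# Iterative orthogonalization: subtract half of each pairwise overlap, then
# renormalize, and repeat until the overlap matrix is the identity. The
# starting set (orthonormal vectors plus a small perturbation, mimicking
# the drift after one molecular-dynamics time step) is illustrative.
rng = np.random.default_rng(3)
N, M = 10, 4
psi = np.linalg.qr(rng.standard_normal((N, M)))[0]
psi = psi + 0.05 * rng.standard_normal((N, M))   # slightly non-orthonormal
psi /= np.linalg.norm(psi, axis=0)

for _ in range(30):
    S = psi.T @ psi                              # overlap matrix <psi_i|psi_j>
    off = S - np.diag(np.diag(S))                # off-diagonal overlaps
    psi = psi - 0.5 * psi @ off                  # remove half of each overlap
    psi /= np.linalg.norm(psi, axis=0)           # restore normalization
print(np.max(np.abs(psi.T @ psi - np.eye(M))))
```

Because each pass only removes half of each overlap to first order, convergence is fast for nearly orthonormal sets but degrades with the number of states and the initial nonorthogonality, as the text notes.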



H. Kohn-Sham eigenstates

The Kohn-Sham energy functional is minimized byany set of wave functions that are a linear combination ofthe lowest-energy Kohn-Sham eigenstates. These wavefunctions will be stationary under the molecular-dynamics equations of motion and subsequent orthogo-nalization. Therefore, in the molecular-dynamicsmethod, each electronic wave function will, in general,converge to a linear combination of the lowest-energyKohn-Sham eigenstates. This is not a problem for sys-tems with a gap, but it can be a severe problem for metal-lic systems in which the occupancy of a state depends onits eigenvalue. Some ideas for handling metals in Car-Parrinello dynamical simulations have been proposed byFernando et al. (1989), Woodward et al. (1989), Benedeket al. (1991), and Pederson and Jackson (1991). The ac-tual Kohn-Sham eigenvalues can be found by diagonali-zation of the matrix whose matrix elements are given by

$$\langle \psi_i | H | \psi_j \rangle \;. \qquad (3.24)$$

A simpler approach, which guarantees convergence toKohn-Sham eigenstates, is presented later in Sec. IV.B.
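The subspace diagonalization of Eq. (3.24) can be sketched numerically. The following toy example is not from the article; all sizes and names are illustrative. It builds the matrix elements ⟨ψ_i|H|ψ_j⟩ for converged wave functions that span the occupied subspace (here, random mixtures of the lowest eigenstates) and recovers the Kohn-Sham eigenvalues by diagonalizing the small N_B x N_B matrix.

```python
import numpy as np

# Toy dimensions: N_PW plane waves, N_B occupied bands (illustrative values).
N_PW, N_B = 50, 4
rng = np.random.default_rng(0)

# A random Hermitian stand-in for the Kohn-Sham Hamiltonian in the plane-wave basis.
A = rng.standard_normal((N_PW, N_PW)) + 1j * rng.standard_normal((N_PW, N_PW))
H = (A + A.conj().T) / 2

# "Converged" wave functions spanning the occupied subspace (orthonormal columns),
# built here from the exact lowest eigenvectors mixed by a random unitary.
eps, V = np.linalg.eigh(H)
Q, _ = np.linalg.qr(rng.standard_normal((N_B, N_B)))
C = V[:, :N_B] @ Q                    # linear combinations of the lowest eigenstates

# Matrix elements <psi_i|H|psi_j> of Eq. (3.24); diagonalize the small matrix.
H_sub = C.conj().T @ H @ C
lam, U = np.linalg.eigh(H_sub)

# The subspace eigenvalues recover the lowest Kohn-Sham eigenvalues.
assert np.allclose(lam, eps[:N_B])
```

The rotation U also maps the mixed wave functions back onto the individual eigenstates, which is the point of the diagonalization.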

I. Computational procedure with molecular dynamics

The procedure for performing a total-energy pseudopotential calculation using the molecular-dynamics technique is shown in the flow diagram of Fig. 11:

1. Construct the ionic potential V_ion given the atomic numbers and positions of the ions.
2. Pick a cutoff for the plane-wave basis set.
3. Choose initial wave functions ψ_i.
4. Calculate the charge density n(r).
5. Calculate V_H(n) and V_XC(n).
6. Construct the Kohn-Sham Hamiltonian H.
7. Calculate the accelerations μψ̈_i = −[H − Λ]ψ_i.
8. Integrate the equations of motion.
9. Orthogonalize and normalize the wave functions.
10. Are the wave functions stationary? If not, return to step 4; if yes, compute the total energy.

FIG. 11. Flow chart describing the computational procedure for the calculation of the total energy of a solid with molecular dynamics.

The procedure requires an initial set of trial wave functions from which the Hartree potential and the exchange-correlation potential can be calculated. The Hamiltonian matrices for each of the k points included in the calculation are constructed, and from these the accelerations of the wave functions are calculated. The equations of motion for the electronic states are integrated, and the wave functions are orthogonalized and normalized. The charge density generated by the new set of wave functions is then calculated. This charge density is used to construct a new set of Hamiltonian matrices, and a further set of wave functions is obtained by integration of the equations of motion and orthonormalization of the resultant wave functions. These iterations are repeated until the wave functions are stationary. The wave functions are then linear combinations of the Kohn-Sham eigenstates. The Kohn-Sham energy functional is minimized, and its value gives the total energy of the system. The solution is identical to the solution that would be obtained by using matrix-diagonalization techniques with the same basis states. The convergence tests described in Sec. II.F must be performed to ensure that the calculated total energy has converged both as a function of the number of k points included in the calculation and as a function of the cutoff energy for the plane-wave basis set.
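The dynamical part of this loop can be sketched for a fixed (non-self-consistent) Hamiltonian; everything below is an illustrative assumption rather than the article's implementation: toy sizes, a dimensionless per-step damping, and a symmetric (Loewdin) orthonormalization in place of the iterative scheme of Sec. III.E. The self-consistent rebuild of H from n(r) is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
N_PW, N_B = 30, 3                       # plane waves and occupied bands (toy sizes)
A = rng.standard_normal((N_PW, N_PW))
H = (A + A.T) / 2                       # fixed stand-in Kohn-Sham Hamiltonian

def orthonormalize(C):
    # Symmetric (Loewdin) orthonormalization: C (C^T C)^(-1/2).
    w, V = np.linalg.eigh(C.T @ C)
    return C @ (V / np.sqrt(w)) @ V.T

dt, gamma = 0.3, 0.3                    # time step and per-step damping (illustrative)
C = orthonormalize(rng.standard_normal((N_PW, N_B)))   # initial trial wave functions
C_prev = C.copy()

for step in range(100000):
    lam = np.sum(C * (H @ C), axis=0)   # Lagrange multipliers <psi_i|H|psi_i>
    accel = -(H @ C - C * lam)          # mu * psi_dd = -[H - Lambda] psi, with mu = 1
    C_new = (2 * C - (1 - gamma) * C_prev + dt**2 * accel) / (1 + gamma)
    C_prev, C = C, orthonormalize(C_new)
    if np.max(np.abs(accel)) < 1e-10:   # are the wave functions stationary?
        break

# At convergence the occupied-subspace energy equals the sum of the lowest
# eigenvalues, although each individual state is generally a linear combination
# of Kohn-Sham eigenstates, as noted in Sec. III.H.
lam = np.sum(C * (H @ C), axis=0)
assert np.isclose(np.sum(lam), np.sum(np.linalg.eigvalsh(H)[:N_B]), atol=1e-6)
```

The damped-Verlet update has the property that its fixed point is exactly the stationarity condition accel = 0, mirroring the "are the wave functions stationary?" test in the flow chart.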

J. Instabilities and fluctuations in the Kohn-Sham energy

In Sec. III.D.2 above a criterion was derived that provides an upper bound to the largest stable time step in the Verlet algorithm. There are, however, additional limitations to this time step that are much more subtle. These limitations are related to instabilities in the Kohn-Sham energy and can arise when the Hamiltonian is allowed to evolve under the equations of motion, as required by self-consistency. These instabilities are caused by charge fluctuations and are commonly referred to as charge sloshing.

As discussed in Sec. III.G above, the Hartree and exchange-correlation potentials of the Kohn-Sham Hamiltonian depend on the electronic density and must change after each time step. If these changes are too large, the problem becomes unstable and the time step must be reduced. This instability is indicated schematically in Fig. 12. The trajectories in this figure should be contrasted with the trajectories shown in Fig. 10.

The major difficulty lies with the Hartree potential, V_H(G), in Eq. (2.10). V_H(G) is proportional to n(G)/|G|^2, where n(G) is the Fourier transform of the charge density. Therefore, at small reciprocal-lattice vectors, a small change in n(G) will produce large changes in the potential. Since these changes are not taken into account during an elemental time step, they can lead to large increases in energy if the time step is too large. This is particularly true for systems with sufficiently small reciprocal-lattice vectors. The smallest (nonzero) reciprocal-lattice vector is inversely proportional to the


FIG. 12. Illustration of instability in the molecular-dynamics method and Kohn-Sham Hamiltonian as a consequence of a large time step. The convention is the same as in Fig. 10. Note that each iteration drives the coefficients further from their equilibrium value.

longest length of the supercell. Therefore, as supercells become physically larger, the stability of the problem eventually becomes dominated by "charge sloshing" considerations, and the time step must be reduced to keep the fluctuations small.

A related difficulty, which leads to a similar phenomenon, arises if many states with nearly the same eigenvalue exist in the neighborhood of the Fermi energy. Under these circumstances, macroscopic oscillations in electron density can occur with very little change in total energy. The problem is particularly serious for metals. Since small energy changes may yield large differences in electron density, and hence in forces, close convergence of the electrons to their ground state is necessary in metallic systems.

Thus the largest stable time step in the Car-Parrinello algorithm is dominated by either the maximum kinetic energy in the problem (as discussed in Sec. III.D.2) or the need to limit charge sloshing. One of these two considerations will place a practical limit on the size and type of problem that can be attacked with the Car-Parrinello algorithm. Despite these limitations, however, there are many problems for which the method has outstanding utility.

Finally, it should be noted that, even for an infinitesimal time step, the Kohn-Sham energy will not always decrease monotonically during the evolution of the electronic system to its ground state. Insufficient damping of the wave-function coefficients in the equations of motion, for example, will result in fluctuations of the Kohn-Sham energy during this evolution. This is simply the expected dynamical behavior of an underdamped system and will not prevent the electrons from eventually reaching their ground state.

K. Computational cost of molecular dynamics

The molecular-dynamics method provides a general technique for solving eigenvalue problems. However, this method was developed specifically in the context of total-energy pseudopotential calculations, and in this section the computational cost of using the molecular-dynamics method to perform total-energy pseudopotential calculations will be calculated. An important feature of any computational method is the rate at which the computational time increases with the size of the system, and so particular attention will be paid to the operations that increase fastest as the size of the system increases.

In a total-energy pseudopotential calculation the wave functions ψ_i are expanded in a plane-wave basis set, so that

$$\psi_i(\mathbf r) = \sum_{\mathbf G} c_{i,\mathbf k+\mathbf G}\, \exp[i(\mathbf k+\mathbf G)\cdot\mathbf r] \;. \qquad (3.25)$$

A feature of total-energy pseudopotential calculations is that the number of plane-wave basis states used in the calculations is always much larger than the number of occupied bands. The ratio is typically 100:1. This ratio is independent of the size of the system: increasing the size of the unit cell will increase the number of atoms in the unit cell, but the lengths of the reciprocal-lattice vectors will be reduced, so that there will be more plane waves with energies below the cutoff energy for the plane-wave basis set. The ratio of the number of plane waves to the number of occupied bands is considerably larger when the system contains any transition-metal atoms or first-row atoms. These atoms have strong pseudopotentials, and much larger basis sets are needed to expand the electronic wave functions. Only the parts of the total-energy calculation that scale faster than linearly with the size of the system will be considered in the following discussion. The parts of the calculation that scale linearly with the size of the system do not have large "prefactors" associated with their computational cost, so the computational time required for these parts of the calculation is negligible. A calculation for a system that has N_B occupied bands, in which the electronic wave functions are expanded in a basis set containing N_PW plane waves, will be considered. The operations described below scale linearly with the number of k points, so a calculation for a single k point will be considered.

1. Calculation of charge density

The charge density is most cheaply computed in real space because it is simply the square of the magnitude of the wave function. This requires that the wave functions be Fourier transformed from reciprocal space to real space. However, the charge density has components with wave vectors up to twice the cutoff wave vector for the electronic wave functions. Hence, to maintain a faithful representation of the charge density, one must compute it on a Fourier grid twice as dense in each spatial direction as the grid required to provide a faithful representation of the wave function. Furthermore, the plane-wave components of the wave function, which lie within a sphere in reciprocal space, must be placed in an orthorhombic grid of plane waves to perform the fast Fourier transformation (FFT). Fourier transformation of a single wave function from reciprocal space to real space can be performed in N_FFT = N_RS ln(N_RS) operations using a fast Fourier transform algorithm, where N_RS is the number of points in the real-space grid. For the reasons given above, N_RS is of the order of 16 times the number of plane waves in the wave function. Therefore the fast Fourier transform for a single band requires N_FFT = 16 N_PW ln(N_PW) operations; the factor 16 is irrelevant in the logarithm. The total charge density can then be calculated in N_B N_FFT operations.
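The double-density-grid evaluation of the charge density can be sketched in one dimension; the sizes and cutoff below are assumptions for a toy example, not values from the article. The coefficients inside the cutoff "sphere" are zero-padded onto a grid twice as dense before transforming, so that n(r) = |ψ(r)|² is represented exactly.

```python
import numpy as np

N = 32                                   # wave-function grid points (illustrative)
G = np.fft.fftfreq(N, d=1.0 / N)         # integer reciprocal-lattice indices
inside = np.abs(G) <= N // 4             # plane waves inside the cutoff "sphere"
rng = np.random.default_rng(2)
c = np.zeros(N, dtype=complex)
c[inside] = rng.standard_normal(inside.sum()) + 1j * rng.standard_normal(inside.sum())
c /= np.linalg.norm(c)                   # normalized wave function

# Zero-pad the coefficients onto a grid twice as dense: n(r) = |psi(r)|^2 has
# components up to twice the wave-function cutoff.  Negative numpy indices wrap,
# matching the fftfreq frequency layout.
M = 2 * N
cpad = np.zeros(M, dtype=complex)
cpad[G[inside].astype(int)] = c[inside]

psi = np.fft.ifft(cpad) * M              # psi(r) on the dense real-space grid
n = np.abs(psi) ** 2                     # charge density, faithful on this grid

# Sanity check: the cell-averaged density integrates to |c|^2 = 1 (Parseval).
assert np.isclose(n.mean(), 1.0)
```

Without the padding, the |ψ|² products at wave vectors above the wave-function cutoff would alias back onto the coarse grid and corrupt the density.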

2. Construction of Hamiltonian matrix

The Kohn-Sham Hamiltonian is a matrix of dimension N_PW. The total number of elements in the matrix is N_PW^2. The number of operations needed to construct the matrix and the storage required by the matrix both increase as N_PW^2.

3. Accelerations of the coefficients

The molecular-dynamics equation of motion for the coefficient of the plane-wave basis state with wave vector k+G is obtained by substitution of Eq. (3.25) into (3.16) and use of (2.3) for H. Integration over r then gives

$$\mu \ddot c_{i,\mathbf k+\mathbf G} = -\Big\{\Big[\frac{\hbar^2}{2m}|\mathbf k+\mathbf G|^2 - \lambda_i\Big] c_{i,\mathbf k+\mathbf G} + \sum_{\mathbf G'} V_H(\mathbf G-\mathbf G')\, c_{i,\mathbf k+\mathbf G'} + \sum_{\mathbf G'} V_{XC}(\mathbf G-\mathbf G')\, c_{i,\mathbf k+\mathbf G'} + \sum_{\mathbf G'} V_{ion}(\mathbf k+\mathbf G,\mathbf k+\mathbf G')\, c_{i,\mathbf k+\mathbf G'}\Big\} \;, \qquad (3.26)$$

with λ_i defined as in Eq. (3.15).

If Eq. (3.26) is used to calculate the accelerations of the

coefficients of the plane-wave basis states, N_PW^2 operations are required to calculate the accelerations for a single wave function. This is the number of operations required to multiply the wave function ψ_i by the Kohn-Sham Hamiltonian H. The accelerations of each of the N_B occupied bands must be calculated, so that a total of N_B N_PW^2 operations are required to compute the accelerations of the wave functions.

4. Integration of equations of motion

Integration of the equations of motion for the electronic states requires N_B N_PW operations.

5. Orthogonalization

The number of operations required to orthogonalize the wave functions using Car and Parrinello's algorithm is proportional to N_B^2 N_PW.

6. Comparison to conventional matrix diagonalizations

The total computational time for the processes described above is dominated by the N_B N_PW^2 operations required to calculate the accelerations of the wave functions. This should be compared to the computational cost of conventional matrix diagonalization techniques for total-energy pseudopotential calculations, which is dominated by the N_PW^3 operations required to diagonalize the Kohn-Sham Hamiltonian. As the number of occupied bands in a total-energy pseudopotential calculation is so much smaller than the number of plane waves included in the basis set, it appears that the molecular-dynamics method offers a considerable increase in computational speed over matrix diagonalization techniques. However, the number of time steps required to integrate the equations of motion and converge the electron states in the molecular-dynamics method is usually larger than the number of iterations required to achieve self-consistency in matrix diagonalization calculations. Hence the equation of motion given in (3.26) does not provide a significantly faster method for obtaining the self-consistent eigenstates than conventional matrix diagonalization techniques. It should also be noted that the full Kohn-Sham Hamiltonian has to be stored, which requires N_PW^2 words of memory. Therefore, even if the molecular-dynamics method were faster than conventional matrix diagonalization methods, the memory requirement would make it difficult to perform calculations that required extremely large plane-wave basis sets.
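The scaling comparison can be made concrete with assumed typical numbers (the 100:1 ratio quoted earlier in this section; the absolute counts are illustrative only):

```python
# Illustrative arithmetic for the operation counts quoted above, using the
# typical ratio N_PW ~ 100 * N_B from the start of Sec. III.K.
N_B = 50
N_PW = 100 * N_B                      # 5000 plane waves

md_ops = N_B * N_PW**2                # accelerations: N_B * N_PW^2 per time step
diag_ops = N_PW**3                    # conventional diagonalization: N_PW^3

# One MD update is ~N_PW / N_B = 100x cheaper than one diagonalization,
# which is the per-iteration advantage the text describes.
assert diag_ops // md_ops == 100
```

As the text notes, this per-iteration advantage is eroded by the larger number of iterations the molecular-dynamics method needs.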

7. Local pseudopotentials

The problems described above can be overcome by replacing the nonlocal ionic pseudopotential by a local pseudopotential. A local pseudopotential is a function only of the distance from the nucleus or, equivalently, is a function only of the difference between the wave vectors of the plane-wave basis states. Therefore use of the same procedure as before to derive Eq. (3.26) results in the equation of motion for the coefficient of the plane-wave basis state at wave vector k+G with a local pseudopotential,

$$\mu \ddot c_{i,\mathbf k+\mathbf G} = -\Big\{\Big[\frac{\hbar^2}{2m}|\mathbf k+\mathbf G|^2 - \lambda_i\Big] c_{i,\mathbf k+\mathbf G} + \sum_{\mathbf G'} V_T(\mathbf G-\mathbf G')\, c_{i,\mathbf k+\mathbf G'}\Big\} \;, \qquad (3.27)$$

where V_T(G) is the total potential given by

$$V_T(\mathbf G) = V_{ion}(\mathbf G) + V_H(\mathbf G) + V_{XC}(\mathbf G) \;. \qquad (3.28)$$


The equation of motion can more usefully be written as

$$\mu \ddot c_{i,\mathbf k+\mathbf G} = -\Big\{\Big[\frac{\hbar^2}{2m}|\mathbf k+\mathbf G|^2 - \lambda_i\Big] c_{i,\mathbf k+\mathbf G} + \int [V_T(\mathbf r)\psi_i(\mathbf r)]\, \exp(-i[\mathbf k+\mathbf G]\cdot\mathbf r)\, d^3r\Big\} \;. \qquad (3.29)$$

This form of the equation of motion shows that the multiplication of the wave function by the Kohn-Sham Hamiltonian can be divided into a part that is diagonal in reciprocal space and a part that is diagonal in real space. However, in order to maintain a faithful representation of the potentials and the product V_T(r)ψ_i(r), one must perform the real-space multiplication on a double-density Fourier-transform grid (see Sec. III.K.1). This separation procedure leads to a multiplication that can be performed in just 17 N_PW operations: 16 N_PW for the multiplication of the wave function by the potential in real space, and N_PW for the multiplication of the wave function by the kinetic-energy operator in reciprocal space. The computational cost of calculating the accelerations of the wave functions is then dominated by the N_FFT operations required to Fourier-transform the wave function from reciprocal space to real space and the N_FFT operations required to transform the contribution to the acceleration that is calculated in real space back to reciprocal space. The accelerations of all the plane-wave coefficients for the N_B occupied electron states can be calculated in 2 N_B N_FFT operations. Reduction of the number of operations required to evaluate the accelerations of the coefficients from N_B N_PW^2 to 2 N_B N_FFT makes the molecular-dynamics method very much faster than conventional matrix diagonalization techniques for any system, although the saving in time is obviously much greater for large systems. The same technique can be used with any iterative matrix diagonalization technique, since all iterative matrix diagonalization techniques involve the multiplication of trial wave functions by the Hamiltonian matrix.
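This real-space/reciprocal-space split can be sketched in one dimension. The setup below is an assumption for illustration (ħ = m = 1, a cosine potential, and no double-density grid, which a strictly faithful product would require): the kinetic term is applied as a diagonal in reciprocal space, the local potential as a diagonal in real space, and the result is verified against the dense Hamiltonian matrix.

```python
import numpy as np

N = 64
G = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)   # reciprocal-lattice vectors, cell length 1
x = np.arange(N) / N
V = np.cos(2 * np.pi * x)                      # a local total potential V_T(r) (illustrative)
rng = np.random.default_rng(3)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # plane-wave coefficients

def apply_H(c):
    kinetic = 0.5 * G**2 * c                   # diagonal in reciprocal space (hbar = m = 1)
    psi = np.fft.ifft(c)                       # transform to real space
    return kinetic + np.fft.fft(V * psi)       # multiply by V_T(r), transform back

# Dense reference: kinetic on the diagonal plus the circulant convolution V_T(G - G').
Vhat = np.fft.fft(V) / N
idx = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
H_dense = np.diag(0.5 * G**2) + Vhat[idx]

# Same result, but O(N log N) per application instead of O(N^2).
assert np.allclose(apply_H(c), H_dense @ c)
```

Because the FFT-based application never forms the matrix, it also realizes the storage saving discussed next: only the potential on the real-space grid and the kinetic diagonal need be kept.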

Another advantage of using local pseudopotentials in total-energy pseudopotential calculations is that the Hamiltonian "matrix" can be stored in 17 N_PW words of memory, by storing the potential-energy operator in real space in 16 N_PW words and the kinetic-energy operator in reciprocal space in N_PW words. This allows extremely large "matrices" to be stored with ease.

Unfortunately, it is not possible to describe all atoms with local pseudopotentials. It is important to use an efficient scheme for applying nonlocal pseudopotentials if the computational speed of the molecular-dynamics method is to be retained. Efficient schemes have been developed for applying nonlocal pseudopotentials in the molecular-dynamics method and other iterative methods. These schemes will be described in Sec. IX, but it should be noted that the most efficient of these schemes requires 16 N_PW N_B operations to apply the operators for each component of the nonlocal pseudopotential. This is less than the cost of Fourier-transforming the wave functions and is less than the cost of orthogonalizing the wave functions in any but the smallest of systems, and so these operations dominate the computational cost irrespective of whether local or nonlocal pseudopotentials are used.

IV. IMPROVEMENTS IN ALGORITHMS

In this section a number of relatively simple modifications to the molecular-dynamics-based method are introduced which offer significant improvements over the original approach when calculating the electronic ground state for a fixed ionic configuration. These improvements include methods that increase the computational speed of the calculations and methods that permit the electrons to converge to the exact Kohn-Sham eigenstates. As discussed in Sec. III, the possibility of obtaining the latter is important, for it paves the way for studies of metallic systems. However, it should be noted that these modifications are not generally useful when performing dynamical simulations of the ionic system, for reasons that will be discussed in Sec. VII. The remaining problems in the calculation of the electronic ground state for a static ionic configuration, problems that cannot be overcome with the improvements described below, are addressed in Sec. IV.D.

A. Improved integration

It has already been pointed out in Sec. III.D that an attempt to improve on the Verlet algorithm by using a more sophisticated finite-difference technique for integrating the equations of motion may not produce any increase in computational speed. This is because a more sophisticated finite-difference equation typically requires information from a large number of time steps in order to integrate the equations of motion. A large amount of memory is generally needed to store this information. Therefore, if sufficient core memory is not available, the computation can involve an excessive number of input and output operations.

An alternative technique for improving the integration of the equations of motion, which does not require any additional storage, is to calculate the change in the magnitude of the acceleration of the coefficient during the time step, as the values of the coefficients change. [Recall that simple use of the Verlet algorithm (3.17) requires the coefficients to remain fixed during a time step.] At first thought, it might appear that the determination of the time dependence of the coefficients during a time step actually necessitates a solution of the equations of motion in the first place! This, however, turns out not to be the case. An examination and manipulation of the form of the equations of motion (3.26) and (3.27) reveals that it is indeed possible to obtain a good approximation to the time dependence of the coefficients during a time step. The argument is as follows (Payne et al., 1986).

With the assumption of a local pseudopotential for simplicity, the equation of motion (3.27) can be rewritten as

$$\mu \ddot c_{i,\mathbf k+\mathbf G} = -\Big[\frac{\hbar^2}{2m}|\mathbf k+\mathbf G|^2 + V_T(\mathbf G=0) - \lambda_i\Big] c_{i,\mathbf k+\mathbf G} - \sum_{\mathbf G'\neq\mathbf G} V_T(\mathbf G-\mathbf G')\, c_{i,\mathbf k+\mathbf G'} \;. \qquad (4.1)$$

If one now introduces the definitions

$$\omega_{i,\mathbf k+\mathbf G}^2 = \Big[\frac{\hbar^2}{2m}|\mathbf k+\mathbf G|^2 + V_T(\mathbf G=0) - \lambda_i\Big]\Big/\mu \qquad (4.2a)$$

and

$$B_{i,\mathbf k+\mathbf G} = -\sum_{\mathbf G'\neq\mathbf G} V_T(\mathbf G-\mathbf G')\, c_{i,\mathbf k+\mathbf G'}\Big/\mu \;, \qquad (4.2b)$$

equation (4.1) becomes

$$\ddot c_{i,\mathbf k+\mathbf G} = -\omega_{i,\mathbf k+\mathbf G}^2\, c_{i,\mathbf k+\mathbf G} + B_{i,\mathbf k+\mathbf G} \;. \qquad (4.3)$$

This shows that the equation of motion for the coefficient of each plane-wave basis state is essentially an oscillator equation. This permits an immediate determination of the largest acceptable time step for use with the Verlet algorithm. For plane-wave basis states at large reciprocal-lattice vectors, the oscillation frequency of the coefficient increases roughly linearly with the magnitude of the wave vector of the plane-wave basis state, or as the square root of its kinetic energy. To integrate the oscillator equations stably using the Verlet algorithm, one must restrict the length of the time step so that ω_{i,k+G}Δt < 1 for all of the plane-wave basis states. This means that the length of the time step is restricted by the plane-wave basis states that have the highest kinetic energies, which is consistent with the discussion in Sec. III.D. The highest kinetic-energy basis states, however, provide the least important contributions to the wave functions, since the coefficients of these basis states are small if the cutoff energy for the basis set is large enough to converge the total energy. It is unsatisfactory that the least physically significant basis states restrict the length of the time step used to integrate the equations of motion. One method of overcoming this restriction is to perform an analytic integration of the equation of motion. A direct comparison between the Verlet algorithm and the analytic integration scheme is presented in Sec. IV.C below.

1. Analytic integration of second-order equations of motion

It is extremely difficult to integrate the equation of motion for the coefficient c_{i,k+G} [Eq. (4.3)] exactly, because the term B_{i,k+G} depends on the values of all the other coefficients c_{i,k+G'}. However, if the variation of B_{i,k+G} is ignored, then the equation of motion (4.3) can be easily integrated analytically over a time step. The particular integral is

$$c_{i,\mathbf k+\mathbf G} = B_{i,\mathbf k+\mathbf G}\big/\omega_{i,\mathbf k+\mathbf G}^2 \;, \qquad (4.4)$$

and the complementary function is

$$c_{i,\mathbf k+\mathbf G}(t) = A_1 \exp(i\omega_{i,\mathbf k+\mathbf G}\,t) + A_2 \exp(-i\omega_{i,\mathbf k+\mathbf G}\,t) \;. \qquad (4.5)$$

The coefficients A_1 and A_2 are determined by the values of the coefficient at the present and previous time steps to give

$$c_{i,\mathbf k+\mathbf G}(\Delta t) = 2\cos(\omega_{i,\mathbf k+\mathbf G}\Delta t)\, c_{i,\mathbf k+\mathbf G}(0) - c_{i,\mathbf k+\mathbf G}(-\Delta t) + 2\big[1-\cos(\omega_{i,\mathbf k+\mathbf G}\Delta t)\big]\, B_{i,\mathbf k+\mathbf G}\big/\omega_{i,\mathbf k+\mathbf G}^2 \;, \qquad (4.6)$$

where c_{i,k+G}(0) is the value of the coefficient at the present time step and c_{i,k+G}(−Δt) is the value of the coefficient at the previous time step.
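The exactness of this analytic update for constant B can be checked directly; the numbers below are illustrative. One step of the update for the oscillator equation c̈ = −ω²c + B is compared against the closed-form solution, and against plain Verlet, at a time step far beyond the Verlet stability limit.

```python
import math

w, B = 5.0, 2.0                      # omega and driving term (illustrative values)
dt = 1.3                             # w*dt = 6.5, far beyond the Verlet stable regime

def exact(t, c0, v0):
    # Exact solution: particular integral B/w^2 plus the free oscillation.
    cp = B / w**2
    return cp + (c0 - cp) * math.cos(w * t) + (v0 / w) * math.sin(w * t)

c0, v0 = 1.0, 0.5
c_prev = exact(-dt, c0, v0)          # coefficient at the previous time step

# Analytic update of Eq. (4.6): exact for any time step when B is held fixed.
c_next = (2 * math.cos(w * dt) * c0 - c_prev
          + 2 * (1 - math.cos(w * dt)) * B / w**2)
assert math.isclose(c_next, exact(dt, c0, v0), rel_tol=1e-12)

# Plain Verlet with the same step is badly wrong (and unstable for w*dt > 2).
c_verlet = 2 * c0 - c_prev + dt**2 * (-w**2 * c0 + B)
assert abs(c_verlet - exact(dt, c0, v0)) > 1e-3
```

In a real calculation B is not constant, so the update is approximate, but the error is now controlled by the slow variation of B rather than by the highest frequency in the basis set.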

In Sec. III.F the importance of damping the electronic equations of motion was discussed. One of various possible choices is to apply the damping directly to the equation of motion. If a damping term proportional to $\dot c_{i,\mathbf k+\mathbf G}$ is added to Eq. (4.3), the coefficient at the next time step is given by

$$c_{i,\mathbf k+\mathbf G}(\Delta t) = 2\exp(-\gamma_{i,\mathbf k+\mathbf G}\Delta t)\cos(\omega_{i,\mathbf k+\mathbf G}\Delta t)\, c_{i,\mathbf k+\mathbf G}(0) - \exp(-2\gamma_{i,\mathbf k+\mathbf G}\Delta t)\, c_{i,\mathbf k+\mathbf G}(-\Delta t) + \big[1+\exp(-2\gamma_{i,\mathbf k+\mathbf G}\Delta t) - 2\exp(-\gamma_{i,\mathbf k+\mathbf G}\Delta t)\cos(\omega_{i,\mathbf k+\mathbf G}\Delta t)\big]\, B_{i,\mathbf k+\mathbf G}\big/\omega_{i,\mathbf k+\mathbf G}^2 \;, \qquad (4.7)$$

where

$$\gamma_{i,\mathbf k+\mathbf G} = D\,|\omega_{i,\mathbf k+\mathbf G}| \qquad (4.8)$$

and D is the damping factor applied to the motion of the coefficients.

If the expectation value of the energy of the state, λ_i, is larger than [ħ²|k+G|²/2m + V_T(G=0)], the value of ω_{i,k+G} in Eq. (4.6) becomes imaginary. This can occur for the basis states at small reciprocal-lattice vectors in the higher-energy electronic states. Nevertheless, the analytic expression for the value of the coefficient of the plane-wave basis state at time Δt is still valid, provided the argument of the cosine function is taken as imaginary.

It might appear to be unduly expensive to use Eq. (4.6) to compute the value of the coefficient of the plane-wave basis state, because several function calls are required to evaluate the expression. A square root is needed to obtain ω_{i,k+G}, and then a cosine or hyperbolic cosine must be evaluated to obtain cos(ω_{i,k+G}Δt). These function calls are required for every coefficient of the basis states for each electronic state. However, the heavy computational cost of the function calls can be avoided by interpolating the value of cos(ω_{i,k+G}Δt) from a data array that contains cos(ω_{i,k+G}Δt) as a function of ω_{i,k+G}. If the damping is applied directly in the equation of motion, the function exp(−γ_{i,k+G}Δt) can be interpolated in a similar manner.

2. Analytic integration of first-order equations of motion

Some authors (Williams and Soler, 1987) have suggested a change to a first-order equation of motion for the electronic degrees of freedom. Thus Eq. (4.3) becomes

$$\dot c_{i,\mathbf k+\mathbf G} = -\omega_{i,\mathbf k+\mathbf G}^2\, c_{i,\mathbf k+\mathbf G} + B_{i,\mathbf k+\mathbf G} \;, \qquad (4.9)$$

with ω_{i,k+G} and B_{i,k+G} defined as in Eq. (4.2). This would have adverse consequences for following ion motion, as described later in Sec. VIII.C, but is completely appropriate for fixed ion positions. Again assuming that B_{i,k+G} does not vary during the time step, Eq. (4.9) can be analytically integrated to give

$$c_{i,\mathbf k+\mathbf G}(\Delta t) = \frac{B_{i,\mathbf k+\mathbf G}}{\omega_{i,\mathbf k+\mathbf G}^2} + \Big[c_{i,\mathbf k+\mathbf G}(0) - \frac{B_{i,\mathbf k+\mathbf G}}{\omega_{i,\mathbf k+\mathbf G}^2}\Big] \exp(-\omega_{i,\mathbf k+\mathbf G}^2\,\Delta t) \;. \qquad (4.10)$$

This first-order equation gives roughly the same asymptotic convergence rate as the corresponding second-order equation of (4.6). But even given the same convergence rate, the first-order method is more efficient, since it requires only half the storage, or half the input/output operations, for the large number of wave-function degrees of freedom.
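The stability advantage of the analytic first-order update can be seen on a single mode; the values below are illustrative. The update of Eq. (4.10) relaxes to the fixed point B/ω² for any time step, whereas a naive forward-Euler step on Eq. (4.9) diverges once ω²Δt > 2.

```python
import math

w2, B = 40.0, 3.0                    # omega^2 and driving term (illustrative)
dt = 0.1                             # w2*dt = 4: Euler-unstable, analytic-stable
c_an = c_eu = 0.0

for _ in range(50):
    c_an = B / w2 + (c_an - B / w2) * math.exp(-w2 * dt)   # Eq. (4.10)
    c_eu = c_eu + dt * (-w2 * c_eu + B)                    # naive forward Euler

assert math.isclose(c_an, B / w2, rel_tol=1e-9)   # relaxed to the fixed point
assert abs(c_eu) > 1e3                            # Euler has blown up
```

As with the second-order scheme, the approximation in practice lies entirely in holding B fixed over the step, not in the integration itself.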

B. Orthogonalization of wave functions

When analytic expressions for the integration of the equations of motion (4.6) and (4.10) are used to calculate the coefficients, the length of the time step Δt is no longer restricted by the requirement ω_{i,k+G}Δt ≪ 1 for all of the plane-wave basis states, and a longer time step can be used. Car and Parrinello's iterative scheme for orthogonalization of the wave functions, described in Sec. III.E, becomes increasingly expensive to apply as the time steps increase in length. This is because the wave functions become more nonorthogonal during the time step, and more iterations of the algorithm are required to orthogonalize the wave functions. Since each iteration of the orthogonalization scheme requires N_B^2 N_PW operations, it is sensible to adopt an alternative orthogonalization scheme that makes full use of the increase in computational speed obtained by integrating the equations of motion using a longer time step.

1. The Gram-Schmidt scheme

The simplest and most computationally efficient orthogonalization technique for long time steps is the Gram-Schmidt scheme (see Lin and Segel, 1974). In this approach, a set of orthonormal wave functions {ψ'_i} is easily obtained from a set of linearly independent wave functions {ψ_i} by use of the following algorithm:

$$\psi_i' = \psi_i - \sum_{j<i} \lambda_{ji}\, \psi_j' \;, \qquad (4.11a)$$

with

$$\lambda_{ji} = \langle \psi_j' | \psi_i \rangle \;. \qquad (4.11b)$$
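The ordered projection of Eqs. (4.11), followed by normalization, can be sketched as follows. This toy implementation uses the modified Gram-Schmidt variant (projecting against already-orthonormalized states one at a time) for numerical stability, which is an assumption beyond the equations themselves; the matrix sizes are illustrative.

```python
import numpy as np

def gram_schmidt(Psi):
    # Columns of Psi are the wave functions; they are processed in a definite order.
    Psi = Psi.astype(complex).copy()
    for i in range(Psi.shape[1]):
        for j in range(i):                       # subtract lambda_ji * psi'_j, Eq. (4.11a)
            Psi[:, i] -= (Psi[:, j].conj() @ Psi[:, i]) * Psi[:, j]
        Psi[:, i] /= np.linalg.norm(Psi[:, i])   # normalize
    return Psi

rng = np.random.default_rng(4)
Psi = rng.standard_normal((20, 5)) + 1j * rng.standard_normal((20, 5))
Q = gram_schmidt(Psi)

# Columns are orthonormal; state i is a combination of inputs 1..i only, so the
# ordering of the states is preserved (unlike a symmetric orthogonalization).
assert np.allclose(Q.conj().T @ Q, np.eye(5))
```

The ordering property in the final comment is precisely what lets this scheme single out the Kohn-Sham eigenstates, as discussed below.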

The Gram-Schmidt scheme also has the advantage of breaking spurious symmetries that may occur in the choice of initial conditions for the electrons. These symmetries can propagate through the equations of motion and keep the electrons from reaching their true ground state. This is a rather subtle and technical point, addressed later in Sec. VI.A.

The most significant difference, however, between the Gram-Schmidt scheme and the Car and Parrinello orthogonalization method (3.23) lies in the convergence of electrons to Kohn-Sham eigenstates.

2. Convergence to Kohn-Sham eigenstates

As discussed earlier in Sec. III.H, the Kohn-Sham energy functional is minimized by any set of wave functions that are linear combinations of the lowest-energy Kohn-Sham eigenstates. The orthogonalization method of Car and Parrinello (3.23) generates orthogonal wave functions by a procedure that tends to intermix the wave functions, thereby preventing the Kohn-Sham eigenstates from being singled out. As a consequence, the final wave functions are, in general, linear combinations of the Kohn-Sham eigenstates.

In contrast, the Gram-Schmidt procedure orthogonalizes wave functions in a definite order. All of the higher-energy wave functions are forced to be orthogonal to the lowest-energy wave function, and so on. This in turn forces each state to converge to its lowest possible energy under the constraint that it be orthogonal to all states below it. The set of lowest possible single-particle levels under these constraints comprises the Kohn-Sham eigenstates.

The ability to converge to Kohn-Sham eigenstates is very important for metallic systems. After each time step it is necessary to know the correct ordering of the energy levels in order to fill states properly up to the Fermi level. Without Kohn-Sham eigenstates, the Fermi level of a metallic system cannot be defined.

C. Comparison between algorithms

The simple modifications to the molecular-dynamics-based method described in Secs. IV.A and IV.B above can produce significant increases in computational speed. This is illustrated in Fig. 13, where the evolution of the total energy of an eight-atom cubic supercell of germanium in the diamond structure is shown. Here the ions are held fixed and the electrons are being iterated to convergence. A local pseudopotential of the Starkloff-Joannopoulos type is used with a basis set of 4096 plane waves. The open circles show the results obtained by using the Verlet algorithm to integrate the equations of motion, while the filled circles show the results obtained by using analytic integration of the equation of motion. The use of the analytic expression to integrate the equation of motion allows the time step to be increased to six times the value at which the Verlet algorithm becomes unstable and gives convergence of the total energy in one-tenth of the number of time steps.

FIG. 13. Evolution of the total energy for an eight-atom cell of germanium in the diamond structure. Open circles represent the original scheme. The filled circles correspond to an analytic integration of the equations of motion as described in the text.

D. Remaining difficulties

The molecular-dynamics method and the algorithm improvements described above all become ineffective as the size of the system increases. For example, in silicon, only systems with supercell length scales less than about 50 Å benefit from these modifications to the molecular-dynamics method. For larger length scales the maximum stable time step is completely dominated by the need to suppress fluctuations in the charge density, or charge sloshing. As discussed earlier in Sec. III.J, this is a consequence of instabilities in the Kohn-Sham Hamiltonian which arise for very small reciprocal-lattice vectors and which require intractably small time steps to overcome. These instabilities present severe obstacles to future studies of large and more complex systems.

Clearly, some different and novel methods are required to surmount the problems encountered in the regime of large systems. One method that overcomes these difficulties involves a direct minimization of the Kohn-Sham total energy with a conjugate-gradients approach. This is the subject of the next section.

V. DIRECT MINIMIZATION OF THE KOHN-SHAM ENERGY FUNCTIONAL

At the end of Sec. IV, it was stated that all the algorithms described so far in this article encounter difficulties when performing calculations on large systems. The difficulties encountered can be attributed to the discontinuous changes in the Kohn-Sham Hamiltonian from iteration to iteration. There are an infinite number of Kohn-Sham Hamiltonians, each of which has a different set of eigenstates. One of these sets of eigenstates, the set generated by the self-consistent Kohn-Sham Hamiltonian, minimizes the Kohn-Sham energy functional. All the methods for performing total-energy pseudopotential calculations described so far involve an indirect search for the self-consistent Kohn-Sham Hamiltonian. The search procedure used in the molecular-dynamics method was illustrated in Fig. 10. The discontinuous evolution of the Hamiltonian can be clearly seen in this figure. In Sec. III.J it was shown that use of too long a time step in the molecular-dynamics method can lead to an unstable evolution of the Kohn-Sham Hamiltonian. This results in wave functions that move further from the self-consistent Kohn-Sham eigenstates at each time step, as illustrated in Fig. 12. Unfortunately, the value of the critical time step at which the instability occurs decreases as the size of the system increases. It is always possible to ensure stable evolution of the electronic configuration using the molecular-dynamics method, but this is at the cost of using smaller time steps, and hence more computational time, as the size of the system increases. This problem is also present in all of the algorithms described in Sec. IV, since all of these employ an indirect search for the self-consistent Kohn-Sham Hamiltonian.

To perform a total-energy pseudopotential calculation it is necessary to find the electronic states that minimize the Kohn-Sham energy functional. As discussed above, performing this process indirectly, by searching for the self-consistent Kohn-Sham Hamiltonian, can lead to instabilities. These instabilities would not be encountered if the Kohn-Sham energy functional were minimized directly, because the Kohn-Sham energy functional normally has a single well-defined energy minimum. A search for this energy minimum cannot lead to instabilities in the evolution of the electronic configuration. In this section, a computational method is introduced that allows direct minimization of the Kohn-Sham energy functional in a tractable and efficient manner. In Sec. V.A, an introductory discussion is provided of two general methods that can be used for the minimization of any function. Of these general methods, the conjugate-gradients method is shown to be particularly promising. The modifications and extensions to the conjugate-gradients method that are needed in order to perform total-energy pseudopotential calculations tractably and stably for large supercells and large plane-wave kinetic-energy cutoffs are described in Sec. V.B.

Rev. Mod. Phys., Vol. 64, No. 4, October 1992

1070 M. C. Payne et al.: Ab initio iterative minimization techniques

A. Minimization of a function

In this section two general methods are described that can be used to locate the minimum of a function F(x), where x is a vector in a multidimensional space [a detailed description of the techniques outlined in this section can be found in the book by Gill, Murray, and Wright (1981, p. 144)]. It will be assumed that the function F(x) has a single minimum. If the function had several minima, the methods described here would locate the position of the minimum "closest" to the initial sampling point (strictly speaking, they would locate the minimum in whose basin of attraction the initial sampling point lies).

FIG. 14. Schematic illustration of two methods of convergence to the center of an anisotropic harmonic potential. Top: the steepest-descents method requires many steps to converge. Bottom: the conjugate-gradients method allows convergence in two steps.

1. The method of steepest descents

In the absence of any information about the function F(x), the optimum direction in which to move from the point x^m to minimize the function is just the steepest-descent direction g^m, given by

g^m = − ∂F/∂x |_{x = x^m} . (5.1)

It will be assumed that the direction of steepest descent at the point x^m can be obtained from the negative of a gradient operator G acting on the vector x^m, so that

g^m = −G · x^m . (5.2)

To reduce the value of the function F(x), one should move from the point x^m in the steepest-descent direction g^m to the point x^m + b^m g^m, where the function is a minimum. This can be done by sampling the function F(x) at a number of points along the line x^m + b g^m in order to determine the value of b at which F(x^m + b g^m) is a minimum. Alternatively, if the gradient operator G is accessible, the minimum value of the function along the line x^m + b g^m can be found by locating the point where the gradient of the function is orthogonal to the search direction, so that g^m · G · (x^m + b g^m) = 0. It should be noted that this process minimizes only the value of the function along a particular line in the multidimensional space. To find the absolute minimum of the function F(x), one must perform a series of such line minimizations. Thus the vector x^m + b^m g^m is used as the starting vector for the next iteration of the process. This next point is conventionally labeled x^(m+1). (The superscripts label the iterations of the minimization process.) The steps described above

can be repeated to generate a series of vectors x^m such that the value of the function F(x) decreases at each iteration, so that F(x^l) < F(x^k) for l > k. Each iteration reduces the value of the function F(x) and moves the trial vector towards the vector that minimizes the function. This process is illustrated schematically in the top panel of Fig. 14.

Although each iteration of the steepest-descents algorithm moves the trial vector towards the minimum of the function, there is no guarantee that the minimum will be reached in a finite number of iterations. In many cases a very large number of steepest-descents iterations is needed to get close to the minimum of the function. The method of steepest descents performs particularly poorly when the minimum of the function F(x) lies in a long narrow valley such as the one illustrated in Fig. 14. The reason for the poor performance in this case is that each steepest-descent vector is orthogonal to the steepest-descent vector of the previous iteration. If the initial steepest-descent vector does not lie at right angles to the axis of the valley, successive vectors will point across rather than along the valley, so that a large number of iterations will be needed to move along the valley to the minimum of the function. This problem is overcome by using the conjugate-gradients technique.
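The zigzag behavior just described is easy to reproduce numerically. The following sketch (illustrative code, not from the article) applies steepest descents with exact line minimizations to the quadratic F(x) = (1/2) x·G·x of Eq. (5.3), with an arbitrarily chosen anisotropic G playing the role of the "long narrow valley" of Fig. 14:

```python
import numpy as np

# Steepest descents on F(x) = 1/2 x.G.x [Eq. (5.3)] with an anisotropic G,
# i.e. the "long narrow valley" of Fig. 14 (top). Each iteration performs an
# exact line minimization along g = -G.x [Eq. (5.2)].

G = np.diag([1.0, 50.0])             # arbitrary anisotropic quadratic form
x = np.array([50.0, 1.0])            # initial gradient points across the valley

for m in range(100):
    g = -G @ x                       # steepest-descent direction
    b = (g @ g) / (g @ G @ g)        # exact minimizer of F(x + b*g) along the line
    x = x + b * g                    # the next g is orthogonal to this one

# even after 100 exact line minimizations the iterate has only crept along
# the valley; the minimum at the origin has not been reached
print(np.linalg.norm(x))
```

For this starting point each pair of steps shrinks the error by only a factor of about [(κ−1)/(κ+1)]² with condition number κ = 50, which is why the residual is still of order one after 100 iterations.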

2. The conjugate-gradients technique

It might seem surprising that there can be a better method of minimizing a function than to move in the direction in which the function decreases most rapidly. The rate of convergence of the steepest-descents method is limited by the fact that, after a minimization is performed along a given gradient direction, a subsequent


minimization along the new gradient reintroduces errors proportional to the previous gradient. If the only information one has about the function F(x) is its value and gradient at a set of points, the optimal method would allow one to combine this information, so that each minimization step is independent of the previous ones. To accomplish this, one must first derive the condition that makes one minimization step independent of another.

For simplicity, consider a symmetric and positive-definite function of the form

F(x) = (1/2) x · G · x , (5.3)

where G is the gradient operator defined in Eq. (5.2). Consider now the minimization of F(x) along some direction d¹ from some point x¹. The minimum will occur at x² = x¹ + b¹ d¹, where b¹ satisfies

(x¹ + b¹ d¹) · G · d¹ = 0 . (5.4a)

This is obtained by differentiation of Eq. (5.3) with respect to b¹ at x². A subsequent minimization along some direction d² will then yield x³ = x² + b² d², where b² satisfies

(x¹ + b¹ d¹ + b² d²) · G · d² = 0 . (5.4b)

However, the best choice of b¹ and b², for the minimization of F(x) along d¹ and d², is obtained from the differentiation of Eq. (5.3) with respect to both b¹ and b² at x³. This gives

(x¹ + b¹ d¹ + b² d²) · G · d¹ = 0 (5.5a)

and

(x¹ + b¹ d¹ + b² d²) · G · d² = 0 . (5.5b)

It is clear that in order for Eqs. (5.4) and (5.5) to be consistent, and consequently for the minimizations along d¹ and d² to be independent, one must require that

d¹ · G · d² = d² · G · d¹ = 0 . (5.6)

This is the condition that the directions d¹ and d² be conjugate to each other (see Gill et al., 1981), and it can be immediately generalized to

d^n · G · d^m = 0 for n ≠ m . (5.7)

The conjugate-gradients technique provides a simple and effective procedure for the implementation of such a minimization approach. The initial direction is taken to be the negative of the gradient at the starting point. A subsequent conjugate direction is then constructed from a linear combination of the new gradient and the previous direction that minimized F(x). In a two-dimensional problem, it is clear that one would need only two conjugate directions, and this would be sufficient to span the space and arrive at the minimum in just two steps, as shown at the bottom of Fig. 14. It is less clear, however, that the current gradient and the previous direction vector would maintain all of the information necessary to include minimization over all previous directions in a multidimensional space. The proof that directions generated in this manner are indeed conjugate is the important result of the conjugate-gradients derivation. The precise search directions d^m generated by the conjugate-gradients method are obtained from the following algorithm:

d^m = g^m + γ^m d^(m−1) , (5.8)

where

γ^m = (g^m · g^m) / (g^(m−1) · g^(m−1)) (5.9)

and γ¹ = 0.

Since minimizations along the conjugate directions are independent, the dimensionality of the vector space explored in the conjugate-gradients technique is reduced by 1 at each iteration. When the dimensionality of the function space has been reduced to zero, there are no directions left in which to minimize the function, so the trial vector must be at the position of the minimum. Therefore the exact location of the minimum of a quadratic function will be found, conservatively speaking, in a number of iterations that is equal to the dimensionality of the vector space. In practice, however, it is usually possible to perform the calculations so that far fewer iterations are required to locate the minimum.
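The algorithm of Eqs. (5.8) and (5.9) can be sketched as follows (illustrative code, not from the article); on the same anisotropic two-dimensional quadratic that defeats steepest descents, it reaches the minimum in two line minimizations, as in the bottom panel of Fig. 14:

```python
import numpy as np

# Conjugate gradients on F(x) = 1/2 x.G.x [Eq. (5.3)]: d^m = g^m + gamma^m d^(m-1)
# with gamma^m = (g^m.g^m)/(g^(m-1).g^(m-1)) [Eqs. (5.8) and (5.9)], gamma^1 = 0.

G = np.diag([1.0, 50.0])
x = np.array([50.0, 1.0])
d = g_old = None

for m in range(2):                   # dimensionality of the space
    g = -G @ x                       # steepest-descent vector
    if d is None:
        d = g.copy()                 # gamma^1 = 0: first direction is the gradient
    else:
        d = g + (g @ g) / (g_old @ g_old) * d    # Eqs. (5.8) and (5.9)
    b = (g @ d) / (d @ G @ d)        # exact line minimization along d
    x = x + b * d
    g_old = g

print(np.linalg.norm(x))             # ~0: minimum located in two steps
```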

Another way of thinking about the difference between the conjugate-gradients technique and the method of steepest descents is that, in the method of steepest descents, each direction is chosen only from information about the function at the present sampling point. In contrast, in the conjugate-gradients technique the search direction is generated using information about the function obtained from all the sampling points along the conjugate-gradients path.

The conjugate-gradients technique provides an efficient method for locating the minimum of a general function. This suggests that it should be a good technique for locating the minimum of the Kohn-Sham energy functional. It is important, however, to implement the conjugate-gradients technique in such a way as to maximize computational speed, so that each iteration of the method is not significantly more expensive than alternative techniques, and to minimize the memory requirement, so that calculations are not limited by the available memory. A conjugate-gradients method that fulfills these criteria has been developed by Teter et al. (1989). This method is described in Sec. V.B below. Stich et al. (1989) have used a conjugate-gradients method with the indirect approach in order to search for the eigenstates of the Kohn-Sham Hamiltonian. While this is an efficient method for obtaining the eigenstates of the Hamiltonian, it is expected, for reasons discussed earlier, that this method will encounter difficulties when calculations are performed on very large systems. The authors point out that in these cases it will be necessary to use a direct minimization method. Gillan (1989) has developed a conjugate-gradients technique for directly minimizing the


Kohn-Sham energy functional. His method has many similarities to that described in the following section, but it does have a significantly larger memory requirement.

B. Application of the conjugate-gradients method

In this section, a computational technique is introduced that overcomes the instabilities described earlier, which are associated with large supercell sizes and large plane-wave kinetic-energy cutoffs. This technique uses the conjugate-gradients approach, with the proper preconditioning, to minimize directly the Kohn-Sham energy functional.

The abstract description of the conjugate-gradients technique presented above considered a function F of the vector x whose gradient could be calculated using a gradient operator G. In the case of total-energy calculations, the Kohn-Sham energy functional E takes the place of the function F, the wave functions {ψ_i} take the place of the vector x, and the Kohn-Sham Hamiltonian H is the relevant gradient operator G.

1. The update of a single band

A conjugate-gradients iteration can be used to update all of the electronic wave functions simultaneously. The only drawback to this approach is that a large amount of data has to be stored between iterations to ensure the conjugacy of the search directions. To determine the conjugate direction requires, among other things, the previous conjugate direction and the present steepest-descent vector. If all of the wave functions are updated simultaneously, both of these constitute an array of the same size as the wave-function array, so that, in total, three arrays of the size of the wave-function array are needed to perform the conjugate-gradients calculation. The memory requirement in molecular-dynamics calculations is dominated by the storage of the wave-function arrays. Increasing the number of arrays that must be stored could limit the size of system for which calculations can be performed. Ideally all of the advantages of the conjugate-gradients technique should be retained without increasing the memory requirement. This can be achieved by updating a single band at a time.

The steepest-descent direction for a single band is given by

g_i^m = −(H − λ_i^m) ψ_i^m , (5.10)

where

λ_i^m = ⟨ψ_i^m | H | ψ_i^m⟩ . (5.11)

It should be noted that the superscript m labels the iteration number and the subscript i labels the band.

2. Constraints

A total-energy calculation differs from the conjugate-gradients minimization described previously in that the electronic wave functions are constrained to be orthogonal. The orthogonality constraints can be maintained by ensuring that the steepest-descent vector is orthogonal to all the other bands. The steepest-descent direction calculated as

g_i′^m = g_i^m − Σ_{j≠i} ⟨ψ_j | g_i^m⟩ ψ_j (5.12)

is the direction of steepest descent allowed by the orthogonality constraints. There is no iteration index on the wave functions ψ_j in Eq. (5.12) because these wave functions do not vary during the iterations for band i.

If the search direction for band i were not orthogonal to the wave functions of all the other bands, all of the wave functions would have to change during each iteration in order to maintain the constraints of orthogonality. Since all the wave functions would change, the charge density from all of the bands would have to be computed at each iteration. By ensuring that the search direction is orthogonal to all the other bands, this technique requires computation only of the change in the charge density from the single band at each iteration.

3. Preconditioning

Successive steps along the conjugate-gradient directions will reduce the magnitude of the error in the wave function. From Eq. (5.10) it is clear that a measure of the error in the wave function ψ_i^m is contained in the steepest-descent vector g_i^m. Ideally, if the steepest-descent vector were a simple multiple of the error in the wave function, then moving the correct distance along the steepest-descent direction would entirely eliminate the error in the wave function. The actual relation between the error in the wave function, δψ_i^m, and the steepest-descent direction g_i^m is most easily demonstrated by expansion of δψ_i^m in terms of the eigenstates of the Kohn-Sham Hamiltonian,

δψ_i^m = Σ_j c_j φ_j . (5.13)

The steepest-descent vector is obtained by substitution of Eq. (5.13) into (5.10), which gives

g_i^m = −[H − λ_i^m] Σ_j c_j φ_j (5.14)
      = −Σ_j (ε_j − λ_i^m) c_j φ_j , (5.15)

where ε_j is the eigenvalue associated with the eigenstate φ_j.

It can be seen that the steepest-descent vector g_i^m is only a multiple of the error vector δψ_i^m if all the unoccupied eigenstates of the Kohn-Sham Hamiltonian are degenerate. However, the Kohn-Sham Hamiltonian has a broad spectrum of eigenvalues, which extends up to the cutoff energy for the plane-wave basis set and leads to poor convergence in a conjugate-gradient calculation. Each step tends to remove components of the error vector that correspond to eigenstates in a particular energy range. The rate of convergence will be improved if some method is used to compensate for the weighting factors ε_j − λ_i^m that distinguish the error vector from the steepest-descent vector. The technique of preconditioning can be used to achieve this approximately (see Gill et al., 1981).
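In matrix form, the constrained steepest-descent construction of Eqs. (5.10)–(5.12) amounts to the following (a toy dense-matrix sketch; a real plane-wave code never builds H explicitly):

```python
import numpy as np

def constrained_steepest_descent(H, bands, i):
    """Eqs. (5.10)-(5.12): steepest-descent vector for band i, projected
    orthogonal to all the other bands."""
    psi = bands[i]
    lam = np.vdot(psi, H @ psi).real        # lambda_i^m = <psi|H|psi>, Eq. (5.11)
    g = -(H @ psi - lam * psi)              # g_i^m = -(H - lambda_i^m) psi_i^m, Eq. (5.10)
    for j, other in enumerate(bands):       # projection of Eq. (5.12)
        if j != i:
            g -= np.vdot(other, g) * other
    return g

# toy check: for an exact eigenstate the steepest-descent vector vanishes
H = np.diag([1.0, 2.0, 3.0]).astype(complex)
bands = [np.eye(3, dtype=complex)[k] for k in range(2)]
print(np.linalg.norm(constrained_steepest_descent(H, bands, 0)))   # -> 0.0
```

Note that g_i^m is automatically orthogonal to ψ_i^m itself, so the projection need only run over the other bands.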

The technique of preconditioning involves multiplying the steepest-descent vector by a preconditioning matrix K to produce a preconditioned steepest-descent vector g̃ that more accurately represents the error vector, as illustrated in Fig. 15. In principle, a preconditioning matrix exists that perfectly preconditions the steepest-descent vector, so that the preconditioned vector is parallel to the error vector. However, this preconditioning matrix will be a full N_PW × N_PW matrix, which would then require N_PW² operations to precondition the steepest-descent vector for each band, thus making the computational scheme prohibitively expensive. In practice, it is extremely expensive to construct an exact preconditioning matrix, and the cost of this operation would make the cost of the calculation even more unfavorable. It is always more efficient to use an approximate preconditioning matrix and a succession of conjugate-gradient minimizations rather than to attempt to compute and apply exact preconditioning.

The broad eigenvalue spectrum of the Kohn-Sham Hamiltonian in pseudopotential calculations that use plane-wave basis sets is associated with the wide range of energies of the basis states. The higher-energy eigenstates of the Hamiltonian are dominated by plane-wave basis states whose high kinetic energies lie close to the eigenvalue of the state. To make those states whose eigenvalues are dominated by their kinetic energy nearly degenerate, one must remove the effect of the kinetic-energy operator in the Hamiltonian. This is easily


achieved by multiplication with a diagonal preconditioning matrix, which is essentially the inverse of the kinetic-energy operator. This argument breaks down for the lower-energy eigenstates, because there the potential and kinetic energies are similar, and so the potential is strong enough to mix a range of plane-wave basis states of different energies into the eigenstates. Therefore the elements of the preconditioning matrix should become a constant for the plane-wave basis states of low energy, rather than varying as the inverse of the kinetic energy.

It has been found that preconditioned steepest-descent vectors that accurately represent the errors in the wave functions can be obtained by multiplication of the steepest-descent vectors by a preconditioning matrix K whose matrix elements are given by the following expression:

K_{G,G′} = δ_{G,G′} (27 + 18x + 12x² + 8x³) / (27 + 18x + 12x² + 8x³ + 16x⁴) , (5.16)

where

x = (ℏ² |k + G|² / 2m) / T_i^m

and T_i^m = ⟨ψ_i^m | (−ℏ²/2m)∇² | ψ_i^m⟩ is the kinetic energy of the state ψ_i^m. The matrix elements K_{G,G′} have the following attractive properties. As x approaches zero, the K_{G,G} approach unity, with zero first, second, and third derivatives. This guarantees that the small wave-vector components of the steepest-descent vector remain unchanged. Above x = 1, the K_{G,G} asymptotically approach 1/[2(x − 1)], with an asymptotic expansion correct to fourth order in 1/x.

This factor thus causes all of the large wave-vector components to converge at nearly the same rate. This preconditioning procedure should be compared to the analytic integration technique outlined in Sec. IV.A.1. In the molecular-dynamics method the diagonal dominance of the Hamiltonian due to the kinetic energy causes a rapid oscillation of the wave-function coefficients, and the analytic integration technique provides a method for allowing a time step that is not restricted by these oscillations. In the conjugate-gradients method, the same diagonal dominance of the Hamiltonian produces a steepest-descent vector that is biased towards plane-wave components with large wave vectors, and the preconditioning directly reduces these components. Although both problems have the same basic cause, their solutions are rather different in the two methods.
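The diagonal matrix element of Eq. (5.16) is a simple rational function of x, and both limiting behaviors quoted above are easy to check numerically (illustrative sketch only):

```python
# Diagonal preconditioner of Eq. (5.16); x is the plane-wave kinetic energy
# hbar^2 |k+G|^2 / 2m divided by the kinetic energy T_i^m of the state.

def K(x):
    num = 27.0 + 18.0 * x + 12.0 * x**2 + 8.0 * x**3
    return num / (num + 16.0 * x**4)

print(K(0.0))                           # -> 1.0  (low-energy components unchanged)
print(K(100.0) * 2.0 * (100.0 - 1.0))   # close to 1: K ~ 1/[2(x-1)] at large x
```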

The preconditioned steepest-descent vector g̃_i^m is

g̃_i^m = K g_i′^m . (5.17)

FIG. 15. Spectral representation of the error in the wave function δψ, the gradient of the wave function g, and the preconditioned gradient g̃. Note that the error in the wave function can be eliminated completely, in a single step, only if the preconditioned gradient is added to the wave function ψ.

The preconditioned steepest-descent vector is not orthogonal to all the bands. For the computational reasons given above, any change to a band must leave all the other bands unaffected. The preconditioned steepest-descent vector that is orthogonal to all the bands is calculated as


g̃_i′^m = g̃_i^m − Σ_{j≠i} ⟨ψ_j | g̃_i^m⟩ ψ_j . (5.18)

4. Conjugate directions

The conjugate-gradient direction is constructed out of steepest-descent vectors, as indicated in Eq. (5.8). With the inclusion of preconditioning as described above, the preconditioned conjugate directions φ_i^m are given by

φ_i^m = g̃_i′^m + γ_i^m φ_i^(m−1) , (5.19)

where

γ_i^m = ⟨g̃_i′^m | g_i′^m⟩ / ⟨g̃_i′^(m−1) | g_i′^(m−1)⟩ (5.20)

and γ_i^1 = 0.

The conjugate direction generated by Eq. (5.19) will be orthogonal to all the other bands because it is constructed from preconditioned steepest-descent vectors that are orthogonal to all the other bands. However, the conjugate direction will not be orthogonal to the wave function of the present band. A further orthogonalization to the present band should be performed, and a normalized conjugate direction φ_i′^m calculated as

φ̄_i^m = φ_i^m − ⟨ψ_i^m | φ_i^m⟩ ψ_i^m , (5.21)

φ_i′^m = φ̄_i^m / ⟨φ̄_i^m | φ̄_i^m⟩^{1/2} . (5.22)

5. Search for the energy minimum

The steps outlined above yield a preconditioned conjugate direction described by the normalized vector φ_i′^m, which is orthogonal to all the bands. The following combination of the present wave function ψ_i^m and the preconditioned conjugate vector φ_i′^m,

ψ_i^m cos θ + φ_i′^m sin θ (θ real), (5.23)

is a normalized vector that is orthogonal to all the other bands ψ_j (j ≠ i). Therefore any vector described by Eq. (5.23) obeys the constraints of orthonormality required for the electronic wave functions.

The conjugate-gradients technique requires that the value of θ that minimizes the Kohn-Sham energy functional be found. The search for the position of minimum energy could be performed by the calculation of the Kohn-Sham energy for various values of θ until the minimum is located. If the approximate location of the minimum is not known, this would be a relatively expensive search technique. As an alternative method for locating the minimum, the Kohn-Sham energy could be written as a general function of θ,

E(θ) = E_avg + Σ_{n=1}^∞ [ A_n cos(2nθ) + B_n sin(2nθ) ] . (5.24)

One piece of information, such as a total energy or a gradient of the total energy, is required to evaluate each term in this expression. As the summation in Eq. (5.24) is infinite, any attempt to locate the minimum of the Kohn-Sham energy functional using this expression would be even more costly than searching for the minimum by calculation of the Kohn-Sham energy for various values of θ. However, it is found that the variation of the Kohn-Sham energy with θ is very accurately reproduced by just the n = 1 term in the summation in Eq. (5.24). The accuracy of the fit can be seen in Fig. 16. Therefore the following expression for the Kohn-Sham energy is sufficient to locate the minimum of the Kohn-Sham energy functional:

E(θ) = E_avg + A₁ cos(2θ) + B₁ sin(2θ) . (5.25)

Three pieces of information are required to evaluate the three unknowns in this expression. The value of the total energy at θ = 0, E(0), is already known. The gradient of the energy with respect to θ at θ = 0 is given by

∂E/∂θ |_{θ=0} = 2 Re ⟨φ_i′^m | H | ψ_i^m⟩ . (5.26)

Since H|ψ_i^m⟩ has been computed to determine the steepest-descent vector g_i^m, the value of (∂E/∂θ)|_{θ=0} can be computed cheaply. Therefore just one further piece of information is required to determine all the parameters in Eq. (5.25). This could be the second derivative of the Kohn-Sham energy functional at θ = 0, or the Kohn-Sham energy, or its derivative at any other value of θ. Calculation of the second derivative of the Kohn-Sham energy functional would require additional programming effort in most pseudopotential codes, whereas no additional programming effort is required to calculate the Kohn-Sham energy or its derivative at a second value of θ. Therefore we shall first describe the method that uses the Kohn-Sham energy at a second value of θ.

The second value of θ should be chosen so that the sampling point is far enough from θ = 0 to avoid rounding errors but not so far from the origin that the estimate of the curvature of the Kohn-Sham energy functional at θ = 0 becomes inaccurate. In the later stages of the calculation, shorter and shorter steps will be taken along the conjugate directions, and so it is important that the curvature of the Kohn-Sham energy functional at θ = 0 be accurately determined in order to locate the position of the minimum to high precision. It has been found that computing the Kohn-Sham energy at the point θ = π/300 gives reliable results. If the value of the Kohn-Sham energy at the point θ = π/300, E(π/300), is computed, the three unknowns in Eq. (5.25) are calculated as

E_avg = [ E(π/300) − E(0) cos(2π/300) − (1/2)(∂E/∂θ)|_{θ=0} sin(2π/300) ] / [ 1 − cos(2π/300) ] , (5.27)

A₁ = E(0) − E_avg , (5.28)

B₁ = (1/2)(∂E/∂θ)|_{θ=0} . (5.29)

FIG. 16. The total energy of an 8-atom silicon supercell plotted as a function of the parameter θ, which determines the proportions of wave function and its gradient as described in the text. The dots represent the exact calculation and the line is the lowest harmonic contribution.

Once the parameters E_avg, A₁, and B₁ have been determined, the value of θ that minimizes the Kohn-Sham energy function can be calculated. The stationary points of the function (5.25) occur at the points

θ = (1/2) tan⁻¹( B₁ / A₁ ) + nπ/2 . (5.30)

The value of θ that lies in the range 0 < θ < π/2 is the required value, θ_min.

The calculation of an analytic second derivative of the Kohn-Sham energy at θ = 0 provides an elegant way of determining the optimum step length, even though it does require some additional programming. The required expression for the second derivative is

∂²E/∂θ² |_{θ=0} = 2 [ ⟨φ_i′^m | H | φ_i′^m⟩ − λ_i^m ]
                + f [ Hartree term + exchange-correlation term ] , (5.31)

where f is the product of the occupation number and the k-point weight, the Hartree term is

(1/Ω) ∫_unit cell v(r) 2 Re[φ_i′^m(r)* ψ_i^m(r)] d³r , (5.32)

where

v(r) = ∫ 2 Re[φ_i′^m(r′)* ψ_i^m(r′)] / (4πε₀ |r − r′|) d³r′ . (5.33)

Alternatively, the Hartree term can be written

Ω Σ_G |h(G)|² / (2ε₀ G²) , (5.34)

where

h(G) = (1/Ω) ∫_unit cell exp(iG·r) 2 Re[φ_i′^m(r)* ψ_i^m(r)] d³r . (5.35)

The exchange-correlation term is

(1/Ω) ∫_unit cell [∂V_xc/∂n(r)] (2 Re[φ_i′^m(r)* ψ_i^m(r)])² d³r . (5.36)

The cost of computing the analytic second derivative is the same as the cost of calculating the Kohn-Sham energy at a trial value of θ. The required value of θ, θ_min, is determined using

θ_min = (1/2) tan⁻¹[ (∂E/∂θ)|_{θ=0} / ( −(1/2)(∂²E/∂θ²)|_{θ=0} ) ] . (5.37)

The wave function used to start the next iteration of the conjugate-gradients procedure, ψ_i^(m+1), is

ψ_i^(m+1) = ψ_i^m cos(θ_min) + φ_i′^m sin(θ_min) . (5.38)

The new wave function generates a different charge density from that generated by the previous wave function, and so the electronic potentials in the Kohn-Sham Hamiltonian must be updated before commencing the next iteration.
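The fitting procedure of Eqs. (5.25)–(5.30) is easy to verify numerically. In the sketch below (illustrative, not the article's code) the Kohn-Sham energy is replaced by a toy function that is exactly of the single-harmonic form (5.25), with arbitrarily chosen parameters E_avg = −200, A₁ = 5, B₁ = −12 to be recovered:

```python
import numpy as np

# Toy E(theta) exactly of the form of Eq. (5.25); the fit recovers its
# parameters from E(0), dE/dtheta|_0, and E(pi/300).

def E(theta):
    return -200.0 + 5.0 * np.cos(2 * theta) - 12.0 * np.sin(2 * theta)

t1 = np.pi / 300                         # second sampling point, as in the text
dE0 = 2 * (-12.0)                        # dE/dtheta|_{theta=0} = 2*B1 for Eq. (5.25)

E_avg = (E(t1) - E(0.0) * np.cos(2 * t1)
         - 0.5 * dE0 * np.sin(2 * t1)) / (1.0 - np.cos(2 * t1))    # Eq. (5.27)
A1 = E(0.0) - E_avg                      # Eq. (5.28)
B1 = 0.5 * dE0                           # Eq. (5.29)

# stationary points lie at theta = arctan(B1/A1)/2 + n*pi/2 [Eq. (5.30)];
# of the two distinct points per period, keep the one that minimizes E
cands = 0.5 * np.arctan2(B1, A1) + np.arange(2) * np.pi / 2
theta_min = min(cands, key=E)

print(round(E_avg, 6), round(A1, 6), round(B1, 6))   # -> -200.0 5.0 -12.0
```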

6. Calculational procedure

The flow diagram in Fig. 17 illustrates the steps involved in an update of the wave function of a single band using the conjugate-gradients method. Eventually the wave functions of all of the bands must be updated. There is no point in converging a single band exactly if large errors remain in the bands that have not yet been updated. Thus no more than three or four conjugate-gradients iterations should be performed on one band before moving to the next band. Once all of the bands have been updated, conjugate-gradients iterations are started again on the lowest band. Rather than perform a fixed number of conjugate-gradients iterations on each band, one can perform conjugate-gradients iterations on one band until the total energy changes by less than a particular value, or by less than a given fraction of the change of energy in the first conjugate-gradients iteration. Then iterations are started on the next band. The convergence criterion should be changed as the system moves towards the minimum of the Kohn-Sham energy functional.


[Fig. 17 diagram: starting from the trial wave function for band i, repeat N times or until converged — calculate steepest-descent vector; orthogonalize to all bands; precondition vector; orthogonalize to all bands; determine conjugate direction; orthogonalize to present band and normalize; calculate Kohn-Sham energy at trial value of θ; calculate value of θ that minimizes the Kohn-Sham energy functional; construct new trial wave function.]

FIG. 17. Flow diagram for the update of a single band in the (direct-minimization) conjugate-gradients method.
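The loop of Fig. 17 can be sketched end-to-end on a toy problem. The fragment below is illustrative only: it updates a single band of a small *fixed* Hermitian matrix, omitting preconditioning and the self-consistent rebuilding of H, and replaces the trial-θ fit with the exact single-harmonic line minimization on the subspace spanned by ψ and φ:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                                    # toy fixed "Hamiltonian"

psi = rng.standard_normal(n)
psi /= np.linalg.norm(psi)                           # trial wave function

def E(t, psi, phi):                                  # energy along Eq. (5.23)
    v = psi * np.cos(t) + phi * np.sin(t)
    return v @ H @ v

g_old = d_old = None
for m in range(100):
    lam = psi @ H @ psi
    g = -(H @ psi - lam * psi)                       # steepest descent, Eq. (5.10)
    d = g.copy() if d_old is None else g + (g @ g) / (g_old @ g_old) * d_old
    d -= (psi @ d) * psi                             # orthogonalize to present band
    if np.linalg.norm(d) < 1e-14:
        break
    phi = d / np.linalg.norm(d)                      # normalized conjugate direction
    # on span{psi, phi}, E(theta) = E_avg + A1*cos(2t) + B1*sin(2t) exactly,
    # with A1 = (lam - <phi|H|phi>)/2 and B1 = <phi|H|psi>
    cands = (0.5 * np.arctan2(2 * (phi @ H @ psi), lam - phi @ H @ phi)
             + np.arange(2) * np.pi / 2)
    t_min = min(cands, key=lambda t: E(t, psi, phi))
    psi = psi * np.cos(t_min) + phi * np.sin(t_min)  # cf. Eq. (5.38)
    g_old, d_old = g, d

print(psi @ H @ psi - np.linalg.eigvalsh(H)[0])      # small residual error
```

Because each θ rotation mixes ψ only with a direction orthogonal to it, the band stays normalized throughout, and the energy decreases monotonically toward the lowest eigenvalue.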

After each sweep through the bands, the change in the total energy at which conjugate-gradients iterations on a band are stopped should be reduced. The reasons for applying the convergence criteria outlined in this section, and a discussion of the optimum choice of the parameters, are given by Arias et al. (1991).

7. Computational cost

The computational cost of performing a conjugate-gradients iteration on a single band is higher than the cost of performing a molecular-dynamics time step on a single band. In a molecular-dynamics calculation the computational cost per iteration is dominated by the three Fourier transforms for each band and the orthogonalization. The number of operations required for each band at each time step is 3N_FFT for the Fourier transforms and N_B N_PW for the orthogonalization, where N_PW is the number of plane-wave basis states, N_FFT = 16 N_PW ln N_PW, and N_B is the number of occupied bands. The computational cost of a conjugate-gradients iteration is also dominated by the cost of performing Fourier transforms and the cost of orthogonalizing the wave functions. Only the steps in the conjugate-gradients iteration that involve Fourier transforms or orthogonalizations will be considered in this section. All of the other steps in the conjugate-gradients update of a single band require only N_PW operations and constitute a negligible part of the computational effort.

The calculation of the steepest-descent vector in the conjugate-gradients method is identical to the calculation of the accelerations of the wave functions in the molecular-dynamics method. Hence the computational cost is dominated by the cost of performing two Fourier transformations, which require 2N_FFT operations. If preconditioning is applied, one orthogonalization of the steepest-descent vector to all the bands and one orthogonalization of the preconditioned steepest-descent vector to all the bands must be performed. These two orthogonalizations require 2N_B N_PW operations. Only a single orthogonalization would be needed if preconditioning were not applied. To calculate the Kohn-Sham energy at the trial value of θ, one must transform the trial wave function to real space, so that the charge density can be computed, and then transform the charge density to reciprocal space, so that the Hartree energy can be calculated. These two Fourier transforms require 2N_FFT operations. After the wave function is updated, the new Kohn-Sham Hamiltonian must be calculated. The new charge density can be computed directly from the previous wave function and the trial wave function, both of which have already been transformed to real space, so that no extra Fourier transforms are required for this operation. However, the new charge density must be transformed to reciprocal space so that the new Hartree potential can be computed, and the new Hartree potential must be transformed back to real space. These two Fourier transforms require 2N_FFT operations. Therefore the total number of operations required to perform a conjugate-gradients update on a single band is 6N_FFT operations for the Fourier transforms and 2N_B N_PW operations for the orthogonalizations. Hence each conjugate-gradients iteration requires twice the number of operations required by a molecular-dynamics time step for a single band.

In this section the speed of convergence of methodsthat directIy minimize the Kohn-Sham Hamiltonian iscompared with the speed of convergence of themolecular-dynamics method. The systems used for thesecalculations were specifically chosen to highlight theproblems associated with the use of conventionalmethods to perform total-energy pseudopotential calcula-tions. However, it is important to appreciate that thesesystems are representative of systems for whichmolecular-dynamics calculations are currently being per-formed.
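These operation counts can be tallied in a short script (a sketch; the values of N_PW and N_B are illustrative, not from the text):

```python
import math

def md_ops_per_band(n_pw, n_b):
    """Molecular-dynamics time step for one band: three FFTs plus one
    orthogonalization against all N_B bands."""
    n_fft = 16 * n_pw * math.log(n_pw)   # N_FFT = 16 N_PW ln N_PW
    return 3 * n_fft + n_b * n_pw

def cg_ops_per_band(n_pw, n_b):
    """Preconditioned conjugate-gradients update for one band: six FFTs
    plus two orthogonalizations against all N_B bands."""
    n_fft = 16 * n_pw * math.log(n_pw)
    return 6 * n_fft + 2 * n_b * n_pw

# Illustrative sizes: 10 000 plane waves, 16 occupied bands.
ratio = cg_ops_per_band(10_000, 16) / md_ops_per_band(10_000, 16)
print(ratio)  # → 2.0: each CG iteration costs twice an MD time step per band
```

The factor of two is independent of N_PW and N_B, since each term in the conjugate-gradients count is exactly twice the corresponding molecular-dynamics term.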

1. Large energy cutoff

Figure 18 shows the error in the total energy against iteration number for a calculation on an 8-atom unit cell of silicon with a cutoff energy for the plane-wave basis set of 32 Ry. Although this cutoff energy is much higher than the value actually needed for calculations on silicon, it is typical of the cutoff energies required for the

first-row elements and for transition metals. The dashed curve represents results obtained with the molecular-dynamics method and a first-order equation of motion to evolve the electronic wave functions. This also includes the algorithmic improvements discussed in Sec. IV. All of the other curves show results obtained with methods that directly minimize the Kohn-Sham energy functional, methods for which the iteration number in Fig. 18 labels sweeps through all the bands. Each sweep through the bands involves a number of conjugate-gradients steps for each band. In contrast, each iteration has been taken to represent five time steps in the molecular-dynamics method. This number has been chosen so that the computational time required for each "iteration" is similar for all the schemes. The same scaling is used in the other examples presented in this section.

Figure 18 shows that all the schemes that directly locate the minimum of the Kohn-Sham energy functional converge in a smaller amount of computational time than the molecular-dynamics method. The improved performance of the conjugate-gradients method over the method of steepest descents is clearly demonstrated. It can be seen that the preconditioning of the conjugate gradients significantly increases the speed of convergence for this system. This is expected because the cutoff energy for the plane-wave basis set is very high, so the spectrum of eigenvalues of the Kohn-Sham Hamiltonian is particularly broad, and preconditioning is particularly beneficial.

FIG. 18. Error in the total energy of an 8-atom silicon supercell with a 32-Ry kinetic-energy cutoff vs iteration number for indirect-minimization (dashed line) and direct-minimization (solid line) methods. Note that the curve labeled "molecular dynamics" involves a first-order equation of motion, and the number of iterations associated with this curve has been divided by five to allow comparison at the same level of computational effort, as discussed in the text.

2. Long supercells

The difficulties associated with instability due to the discontinuous evolution of the Kohn-Sham Hamiltonian are most apparent for large systems, when one or more of the unit-cell vectors becomes very long. The performance of different computational methods for such a system has been tested by performing calculations on a long unit cell containing 24 silicon atoms. The results are shown in Fig. 19. The time step used in the molecular-dynamics method had to be drastically reduced to maintain stability in the calculation. The consequences of using a very small time step are clearly revealed by the slow rate of convergence shown in the figure. In contrast, all of the methods that directly minimize the Kohn-Sham energy functional perform extremely well. These methods do not suffer from the instabilities associated with an indirect minimization of the Kohn-Sham energy functional. Comparing the relative speeds of convergence of the direct minimization methods shows that the speed of convergence improves on changing from steepest-descent to conjugate-gradients methods and then to the preconditioned conjugate-gradients method, as expected.

FIG. 19. Error in the total energy of a row of 12 silicon unit cells with an 8-Ry kinetic-energy cutoff vs iteration number for indirect-minimization (dashed line) and direct-minimization (solid line) methods. Same scaling as in Fig. 18.

Rev. Mod. Phys., Vol. 64, No. 4, October 1992

3. A real system

To demonstrate that the results shown in Figs. 18 and 19 are representative of calculations on real systems, Fig. 20 shows the results of calculations on a twelve-atom unit cell of silicon dioxide in the α-cristobalite structure. For test purposes a cutoff energy of 32 Ry for the plane-wave basis set was used for this calculation. The figure compares results obtained by use of the molecular-dynamics method and the preconditioned conjugate-gradients method, with the latter giving an improved speed of convergence, as it did in the previous examples.

FIG. 20. Error in the total energy in eV of a 12-atom cell of the α-cristobalite form of SiO2 with a 32-Ry kinetic-energy cutoff vs iteration number for indirect-minimization (dashed line) and direct-minimization (solid line) methods. Same scaling as in Fig. 18.

VI. CHOICE OF INITIAL WAVE FUNCTIONS FOR THE ELECTRONIC STATES

A. Convergence to the ground state

The most important consideration when choosing the initial electronic states for any of the iterative schemes described in the previous sections is to choose a set of wave functions that allows the electronic configuration to converge to its ground state. Two factors that can prevent a set of initial states from converging to the ground state are described in this section.

1. Spanning the ground state

The most obvious reason why an electronic configuration might not converge to the ground state is that the initial states do not span the ground state. In this case the electronic wave functions relax to a self-consistent set of Kohn-Sham eigenstates but not to the set that forms the ground state.

The simplest choice of initial wave functions for the electronic states is a single plane wave for each state, and the most obvious choice of initial states is the set of plane waves with the lowest kinetic energies. However, there is considerable danger that these initial states may not span the ground state of the electronic configuration. For example, when the lowest-energy plane waves are used as initial states for a calculation of the electronic structure of an 8-atom cubic cell of any of the tetrahedral semiconductors, the electronic configuration does not converge to the ground state. This is shown schematically in Fig. 21. Germanium, silicon, and carbon each have four valence electrons. An 8-atom unit cell of any of these materials contains 32 electrons, so 16 doubly occupied electronic states are required to accommodate the electrons.

Consider a calculation for the k point (0,0,0). The 16 lowest-energy plane-wave basis states at the (0,0,0) k point are the plane wave with wave vector (0,0,0), the six plane waves with wave vectors in the {1,0,0} star of wave vectors, and any nine of the twelve plane waves with wave vectors in the {1,1,0} star. At least six of the nine plane waves chosen from the {1,1,0} wave-vector star can be paired, so that the wave vectors of the plane waves in each pair are separated by reciprocal-lattice vectors in the {2,2,0} set. For instance, the plane waves with wave vectors (1,1,0) and (-1,-1,0) are connected by the (2,2,0) and (-2,-2,0) reciprocal-lattice vectors. The band gap in the tetrahedral semiconductors is formed by the potentials at the {2,2,0} reciprocal-lattice vectors. The electronic states on either side of the band gap are bonding and antibonding combinations of the plane waves connected by the {2,2,0} reciprocal-lattice vectors. If the plane waves with wave vectors (1,1,0) and (-1,-1,0) are among the initial states used in the calculation for the 8-atom unit cell, one of these initial filled states will relax to a valence-band state based on the combination [(1,1,0)+(-1,-1,0)], and the other will relax to a conduction-band state based on the combination [(1,1,0)-(-1,-1,0)], which is clearly not occupied in the physical ground state. Hence the choice of the lowest-energy plane waves as the initial states for a calculation for the tetrahedral semiconductors will not yield the electronic ground state. In order to span the ground state, the initial electronic configuration must not contain any pairs of plane waves from the {1,1,0} star of wave vectors connected by {2,2,0} reciprocal-lattice vectors. Only when three plane waves from the {1,1,1} star are included among the initial states will the electronic states relax to the required 16 valence-band states. If the choice of lowest-energy plane waves as the initial states were retained, 22 electronic states would have to be included in the calculation before the electronic configuration could relax to the ground state, and of these, 6 states would end up being unoccupied.

FIG. 21. Schematic energy-level diagram for the lowest states of an 8-atom cubic supercell of the diamond structure at the (0,0,0) k point. The figure shows that three plane waves from the {1,1,1} set must be included in the initial conditions in order for the electronic configuration to relax to the 16 valence-band states that constitute the ground state. Note that, for the system to relax to the ground state using the lowest-energy plane waves as initial conditions, 22 bands must be included, of which 6 will end up in the conduction band.
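The pairing of the {1,1,0} plane waves by {2,2,0} reciprocal-lattice vectors can be verified by direct enumeration (a sketch; wave vectors are written in units of 2π/a):

```python
from itertools import permutations

def star(base):
    """All distinct permutations and sign changes of a wave vector."""
    vecs = set()
    for perm in permutations(base):
        for sx in (1, -1):
            for sy in (1, -1):
                for sz in (1, -1):
                    vecs.add((sx * perm[0], sy * perm[1], sz * perm[2]))
    return vecs

star_110 = star((1, 1, 0))
star_220 = star((2, 2, 0))
print(len(star_110))  # → 12 members in the {1,1,0} star

# Pairs of {1,1,0} plane waves separated by a {2,2,0} reciprocal-lattice vector.
pairs = {frozenset((g1, g2)) for g1 in star_110 for g2 in star_110
         if tuple(a - b for a, b in zip(g1, g2)) in star_220}
print(len(pairs))  # → 6: each member pairs with exactly one partner, its negative
```

Since the twelve members form six such pairs, any nine plane waves chosen from the star must contain at least six that are paired, as stated in the text.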

The computational cost of all of the iterative schemes described in the previous sections increases at least linearly with the number of electronic states included in the calculation, and the cost of orthogonalizing the electronic wave functions increases as the square of the number of bands. Including unoccupied states in the electronic relaxation reduces the advantage in computational speed of the molecular-dynamics and conjugate-gradients methods over conventional matrix diagonalization techniques. It is sensible, therefore, to attempt to use the smallest possible number of electronic states in the calculation. However, it is essential that the initial electronic states be able to converge to the ground state.

The problem with the calculation for an 8-atom cubic cell of the diamond-structure materials is that there are many reciprocal-lattice vectors at which the structure factor is zero. Hence there is no lattice potential associated with these reciprocal-lattice vectors. If the ionic potential is nonzero at every reciprocal-lattice vector, any choice of plane waves for the initial electronic states will span the ground state. The most common reason for the structure factor to be zero at some reciprocal-lattice vectors is symmetry. If the ionic configuration has no symmetry, any choice of initial electronic states should converge to the ground state. Only if the system has some symmetry must precautions be taken to ensure that the initial electronic states span the ground state.
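The vanishing of the diamond-structure structure factor at many reciprocal-lattice vectors can be checked directly (a sketch assuming the conventional cubic cell with the standard eight-atom basis; S(G) here is the bare geometric sum, without the atomic form factor):

```python
import cmath

# Diamond structure: fcc lattice plus a second atom displaced by (1/4,1/4,1/4),
# all positions in units of the conventional cubic lattice constant.
fcc = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0)]
basis = fcc + [(x + 0.25, y + 0.25, z + 0.25) for x, y, z in fcc]

def structure_factor(hkl):
    """Geometric structure factor S(G) = sum over atoms of exp(-2*pi*i G.tau)."""
    return sum(cmath.exp(-2j * cmath.pi * sum(g * t for g, t in zip(hkl, tau)))
               for tau in basis)

for hkl in [(2, 0, 0), (2, 2, 2), (2, 2, 0), (1, 1, 1)]:
    print(hkl, round(abs(structure_factor(hkl)), 6))
# S vanishes at (2,0,0) and (2,2,2) but not at (2,2,0) or (1,1,1):
# the gap-forming {2,2,0} potentials survive while many others do not.
```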

2. Symmetry conservation

There is another reason why a particular set of initial states may not relax to the ground state when molecular-dynamics equations of motion are used to evolve the electronic configuration. This is a constraint on the evolution of the electronic states that arises from symmetry conservation. The effect is described in detail by Payne et al. (1988), and only a brief outline of the problem will be presented here.

The molecular-dynamics equations of motion and the related first-order equations of motion for the electronic wave functions will conserve any symmetry that is shared by the Hamiltonian and the initial electronic configuration. This symmetry can be broken when the

electronic wave functions are orthogonalized, but this depends on the orthogonalization scheme used. In the Gram-Schmidt orthogonalization scheme, the symmetry is broken, whereas in many others, including the iterative scheme by Car and Parrinello, it is not. Since the electronic ground-state configuration must have the same symmetry as the Hamiltonian, one might not expect conservation of symmetry in the electronic configuration to cause any problems. However, the initial electronic configuration may not be able to propagate to the ground-state configuration without breaking this symmetry. If this is the case, the electronic configuration will not reach the ground state unless a symmetry-breaking orthogonalization scheme is used. Symmetry breaking is not necessary if random initial conditions are applied to the electronic wave functions, as described in Secs. VI.C.3 and VI.C.4.

B. Rate of convergence

Once it has been ensured that the initial electronic wave functions can relax to the ground state, the next most important consideration in choosing the wave functions is to maximize the rate of convergence. In all iterative matrix diagonalization techniques, new wave functions are generated by successive improvements to the previous wave functions. The closer the initial wave functions are to the self-consistent Kohn-Sham eigenstates, the fewer the number of iterations required to reach the ground state of the electronic configuration. However, it is extremely difficult to estimate the form of all the Kohn-Sham eigenstates of a complex system. Hence it will be necessary to use fairly poor approximations to the eigenstates as the initial states for most calculations.

C. Specific initial configurations

Selection of a set of initial wave functions is straightforward when the system to be studied is similar to a system that has been studied previously. In this case the converged wave functions for that system should be used as the initial states. This is particularly useful when testing for k-point convergence or for convergence as the cutoff energy for the plane-wave basis set is increased. The wave functions that were calculated with the previous set of k points or with the previous energy cutoff can be used as the initial electronic states.

1. Reduced basis sets

It is possible to obtain approximations to the Kohn-Sham eigenstates by using conventional matrix diagonalization techniques to find the eigenstates of the Kohn-Sham Hamiltonian with a small number of basis states. For most of the calculations that have been performed using the molecular-dynamics and conjugate-gradients


methods, this technique would yield a relatively large matrix to diagonalize, even if a basis set consisting of only a few plane waves per atom were used. The Hamiltonian matrix would have to be diagonalized several times to achieve approximate self-consistency. Using this method to obtain a set of initial states for a molecular-dynamics or conjugate-gradients calculation might not save any computational effort over a method that used much poorer approximations to the self-consistent Kohn-Sham eigenstates but applied iterative matrix diagonalization methods throughout. To avoid the cost of diagonalizing the Hamiltonian matrix using conventional matrix diagonalization techniques, one could use iterative techniques to find the initial states with the smaller basis set. However, the computational cost of the iterative techniques increases only linearly with the number of basis states, so that there may not be much to be gained by using a reduced number of basis states initially. If most of the computation has to be performed using the complete basis set, then there will be an insignificant saving in computational time if the calculation is started with a reduced basis set.

2. Symmetrized combinations of plane waves

The choice of single plane waves for the initial electronic states in a molecular-dynamics or conjugate-gradients calculation does not exploit the symmetry of the system. The computational cost of such a calculation could be reduced by using symmetrized combinations of plane waves as the initial states for the electronic relaxation. Factoring the Hamiltonian matrix into submatrices of different symmetry, which can be solved independently, drastically reduces the computational time from that required using conventional matrix diagonalization techniques. However, there is an insignificant time saving to be gained by exploiting the symmetry of the system in the molecular-dynamics or conjugate-gradients methods. A significant saving in computational time could only be achieved if the fast Fourier-transform routines were rewritten for each symmetrized combination of basis states, and this requires a considerable investment of programming effort. Computational time can be saved in the orthogonalization routine by using symmetrized combinations of plane waves as the initial states. States of different symmetry are automatically orthogonal, and only states with the same symmetry have to be orthogonalized. If there are N_1 states with one symmetry, N_2 of another, and so on, orthogonalization requires (N_1² + N_2² + ⋯) N_PW operations rather than the N_B² N_PW operations required if symmetrized combinations of plane waves are not used. However, rounding errors will tend to destroy the symmetry of the wave functions, and so additional computational effort will be required to resymmetrize them periodically.
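The size of this saving can be estimated in a few lines (a sketch; the block sizes are hypothetical):

```python
def orthogonalization_ops(block_sizes, n_pw):
    """Operations to orthogonalize the occupied states when only states
    within the same symmetry block must be orthogonalized:
    (N_1^2 + N_2^2 + ...) * N_PW."""
    return sum(n * n for n in block_sizes) * n_pw

n_pw = 10_000
blocks = [6, 5, 3, 2]       # hypothetical symmetry blocks, N_B = 16 in total
full = orthogonalization_ops([sum(blocks)], n_pw)   # no symmetry: N_B^2 N_PW
split = orthogonalization_ops(blocks, n_pw)         # block by block
print(full / split)  # → about 3.5: a modest saving in one routine only
```

The saving is confined to the orthogonalization step, which is consistent with the text's conclusion that exploiting symmetry gains little overall.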

The relatively small reduction in computational speed for systems with low symmetry is one of the strengths of the molecular-dynamics and conjugate-gradients

methods. The use of symmetry is contrary to the spirit of these methods. In all of the calculations performed using these techniques, the positions of the ions have been allowed to vary. If the positions of the ions vary during the calculation, the ionic configuration will spend most of the time in regions of the phase space that have very low symmetry. The use of symmetry was essential in the days of primitive computers, when conventional matrix diagonalization techniques were used to solve for the Kohn-Sham eigenstates. Imposing symmetry onto a system adds fictitious constraints to the motions of the ions and restricts the relaxation of the ionic configuration. There is no point in imposing symmetry in molecular-dynamics or conjugate-gradients calculations because this does not significantly increase the computational speed of these techniques.

3. Random initial configurations

The initial electronic states for a molecular-dynamics or conjugate-gradients calculation can be generated by choosing random values for the coefficients of the plane-wave basis states. This method ensures that the ground state is spanned by the initial states and that there is no conserved symmetry in the initial electronic configuration that might prevent the wave functions from relaxing to the Kohn-Sham eigenstates. It is sensible to give nonzero values only to the coefficients of plane-wave basis states that have small energies, so that the initial states do not have very high energies. With this precaution the electronic states are unlikely to be significantly further from eigenstates than simple plane waves, and very few extra iterations will be required to converge to the ground state.
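A minimal sketch of this initialization, assuming a mock list of basis-state kinetic energies and Gram-Schmidt orthonormalization of the random bands:

```python
import random

random.seed(0)

def random_initial_states(g_energies, n_bands, e_max):
    """Random plane-wave coefficients, nonzero only on basis states whose
    kinetic energy lies below e_max, then Gram-Schmidt orthonormalized."""
    states = []
    for _ in range(n_bands):
        c = [complex(random.gauss(0, 1), random.gauss(0, 1)) if e < e_max else 0j
             for e in g_energies]
        # Orthogonalize against the bands already accepted.
        for s in states:
            overlap = sum(a.conjugate() * b for a, b in zip(s, c))
            c = [b - overlap * a for a, b in zip(s, c)]
        norm = sum(abs(x) ** 2 for x in c) ** 0.5
        states.append([x / norm for x in c])
    return states

g_energies = [0.1 * n for n in range(400)]   # mock kinetic energies (Ry)
states = random_initial_states(g_energies, 16, 8.0)
overlap = sum(a.conjugate() * b for a, b in zip(states[0], states[1]))
print(abs(overlap) < 1e-9)  # → True: the bands come out orthonormal
```

Restricting the nonzero coefficients to the low-energy basis states keeps the initial kinetic energy small, as the text recommends.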

4. Random initial velocities

In the molecular-dynamics method there are two degrees of freedom available for choosing the initial electronic configuration: the initial wave functions and their velocities can be chosen arbitrarily. Adding random velocities to the coefficients of the initial electronic states avoids the problems of the initial configuration's not spanning the ground state and not relaxing to the ground state due to a conserved symmetry. It is sensible to limit the kinetic energy of the initial wave functions so that there is not too much excess energy in the electronic system.

VII. RELAXATION OF THE IONIC SYSTEM

Up to this point, the relaxation of the electronic configuration to its ground state has been considered, while the ionic positions and the size and shape of the unit cell have been held fixed. Once these additional degrees of freedom are allowed to relax to equilibrium, new features emerge. This procedure is much simpler than a


full dynamical simulation of the ionic system because only the final state of the system (ions and electrons in their minimum energy configurations) is of interest, and the path towards this state is irrelevant. Hence errors can be tolerated along the relaxation path. This is not the case with a full dynamical simulation, where errors must be carefully controlled at all points along the ionic trajectories. The problems associated with full dynamical simulations of the ionic system will be discussed in Sec. VIII. Here we describe how ionic relaxation is easily incorporated into a molecular-dynamics-based method.

A. The Car-Parrinello Lagrangian

The positions of the ions and the coordinates that define the size and shape of the unit cell can be included as dynamical variables in the molecular-dynamics Lagrangian. The resulting Lagrangian is usually referred to as the "Car-Parrinello Lagrangian" and takes the form

L = Σ_i μ ⟨dψ_i/dt | dψ_i/dt⟩ + Σ_I (1/2) M_I (dR_I/dt)² + Σ_ν (1/2) β (dα_ν/dt)² − E[{ψ_i}, {R_I}, {α_ν}] ,   (7.1)

where M_I is the mass of ion I and β is a fictitious mass associated with the dynamics of the coordinates that define the unit cell, {α_ν}.

B. Equations of motion

Two further sets of equations of motion can be obtained from the Lagrangian (7.1), the first for the positions of the ions,

M_I d²R_I/dt² = −∂E/∂R_I ,   (7.2)

which simply relates the acceleration of the ions to the forces acting on them. The second set of equations is for the coordinates of the unit cell,

β d²α_ν/dt² = −∂E/∂α_ν .   (7.3)

These equations relate the accelerations of the lengths of the lattice vectors to the diagonal components of the stress tensor integrated over the unit cell and relate the accelerations of the angles between the lattice vectors to the off-diagonal components of the stress tensor integrated over the unit cell.

The equations of motion for the degrees of freedom associated with the dynamics of the ions and of the unit cell can be integrated at the same time as the equations of motion for the electronic states and, as will be shown below, provide a method for performing ab initio dynamical simulations of the ionic system. However, a relaxation of the ionic system can be performed using these equations of motion simply by removing kinetic energy from the electronic system, the ionic system, and the motion of the unit cell. In this case the system will evolve until the total energy of the system is minimized with respect to all of these degrees of freedom, and the ionic configuration will have reached a local energy minimum. However, integration of the equations of motion for the ions and for the unit cell is not as straightforward as it first appears. This is because physical ground-state forces on the ions and integrated stresses on the unit cell cannot be calculated for arbitrary electronic configurations, as shown in the following section.

C. The Hellmann-Feynman theorem

The force on ion I, f_I, is minus the derivative of the total energy of the system with respect to the position of the ion,

f_I = −dE/dR_I .   (7.4)

As an ion moves from one position to another, the wave functions must change to the self-consistent Kohn-Sham eigenstates corresponding to the new position of the ion if the value of the Kohn-Sham energy functional is to remain physically meaningful. The changes in the electronic wave functions contribute to the force on the ion, as can be clearly seen by expanding the total derivative in (7.4),

dE/dR_I = ∂E/∂R_I + Σ_i (∂E/∂ψ_i)(dψ_i/dR_I) + Σ_i (∂E/∂ψ_i*)(dψ_i*/dR_I) .   (7.5)

Equation (7.5) should be compared with the Lagrange equation of motion for the ion (7.2). It can be seen that the "force" in Eq. (7.2) is only the partial derivative of the Kohn-Sham energy functional with respect to the position of the ion. In the Lagrange equations of motion for the ion, the force on the ion is not a physical force. It is the force that the ion would experience from a particular electronic configuration. However, it is easy to show that when each electronic wave function is an eigenstate of the Hamiltonian the final two terms in Eq. (7.5) sum to zero. Since ∂E/∂ψ_i* is just Hψ_i, these two terms can be rewritten

Σ_i (dψ_i*/dR_I)(Hψ_i) + Σ_i (ψ_i* H)(dψ_i/dR_I) .   (7.6)

However, if each ψ_i is an eigenstate of the Hamiltonian,

Hψ_i = λ_i ψ_i ,   (7.7)

so Eq. (7.6) is equal to

Σ_i λ_i (d/dR_I) ⟨ψ_i|ψ_i⟩ = 0 ,   (7.8)

since ⟨ψ_i|ψ_i⟩ is a constant by normalization.


This shows that when each ψ_i is an eigenstate of the Hamiltonian the partial derivative of the Kohn-Sham energy with respect to the position of an ion gives the real physical force on the ion. This result is usually referred to as the Hellmann-Feynman theorem (Hellmann, 1937; Feynman, 1939). The Hellmann-Feynman theorem holds for any derivative of the total energy. Hence, when each ψ_i is an eigenstate of the Hamiltonian, only the explicit dependence of the energy on the size and the shape of the unit cell has to be calculated to determine the integrated stresses.
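As a numerical check of the theorem, one can compare the Hellmann-Feynman force with a finite-difference derivative of the exact eigenvalue for a toy two-level Hamiltonian (a sketch; H(R) and the value R = 0.7 are invented for illustration, not from the text):

```python
import math

def ground_state(r):
    """Exact ground state of the toy Hamiltonian H(R) = [[R, 1], [1, -R]]."""
    lam = -math.hypot(r, 1.0)            # lower eigenvalue: E(R) = -sqrt(R^2 + 1)
    v = (1.0, lam - r)                   # from (R - lam) v1 + v2 = 0
    n = math.hypot(*v)
    return lam, (v[0] / n, v[1] / n)

def hellmann_feynman_force(r):
    """Force -<psi| dH/dR |psi>, with dH/dR = [[1, 0], [0, -1]]."""
    _, (v1, v2) = ground_state(r)
    return -(v1 * v1 - v2 * v2)

def finite_difference_force(r, h=1e-6):
    """Force -dE/dR from a centered difference of the exact eigenvalue."""
    return -(ground_state(r + h)[0] - ground_state(r - h)[0]) / (2 * h)

r = 0.7
print(abs(hellmann_feynman_force(r) - finite_difference_force(r)) < 1e-8)  # → True
```

Because the wave function here is an exact eigenstate, the expectation value of dH/dR reproduces the total derivative of the energy, as Eqs. (7.5)-(7.8) require.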

1. Errors in Hellmann-Feynman forces

Forces calculated using the Hellmann-Feynman theorem are very sensitive to errors in the wave functions ψ_i. The error in the force is first order with respect to errors in the wave functions. Therefore accurate forces can only be calculated when the wave functions are very close to exact eigenstates. The error in the Kohn-Sham energy is second order with respect to errors in the wave function, so that it is significantly easier to calculate an accurate total energy than to calculate an accurate force.
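These two orders of error can be demonstrated with the same kind of two-level toy model: mixing a small fraction ε of the excited state into the exact ground state changes the energy at order ε² but the Hellmann-Feynman force at order ε (all values illustrative):

```python
import math

# Toy model H = [[R, 1], [1, -R]] with R = 0.7 (illustrative).
R = 0.7
s = math.hypot(R, 1.0)
v = (1.0, -s - R)
n = math.hypot(*v)
exact = (v[0] / n, v[1] / n)             # exact ground state
excited = (-exact[1], exact[0])          # orthogonal excited state

def energy(u):
    """<u|H|u> for a normalized real vector u."""
    hu = (R * u[0] + u[1], u[0] - R * u[1])
    return u[0] * hu[0] + u[1] * hu[1]

def force(u):
    """-<u|dH/dR|u> with dH/dR = [[1, 0], [0, -1]]."""
    return -(u[0] * u[0] - u[1] * u[1])

def perturb(eps):
    """Mix a fraction eps of the excited state into the ground state."""
    w = (exact[0] + eps * excited[0], exact[1] + eps * excited[1])
    m = math.hypot(*w)
    return (w[0] / m, w[1] / m)

e0, f0 = energy(exact), force(exact)
for eps in (1e-2, 1e-3):
    print(eps, abs(energy(perturb(eps)) - e0), abs(force(perturb(eps)) - f0))
# Reducing eps tenfold cuts the energy error ~100-fold (second order)
# but the force error only ~10-fold (first order).
```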

2. Consequences of the Hellmann-Feynman theorem

The Hellmann-Feynman theorem simplifies the calculation of the physical forces on the ions and the integrated stresses on the unit cell. However, the electronic wave functions must be eigenstates of the Kohn-Sham Hamiltonian for the Hellmann-Feynman theorem to be applicable. Therefore the forces on the ions and the integrated stresses on the unit cell should not be calculated until the electronic configuration is near its ground state. Once the forces and stresses have been calculated, the positions of the ions and the size and shape of the unit cell may be changed. Each time that the positions of the ions or the size and shape of the unit cell are changed, the electrons must be brought close to the ground state of the new ionic configuration in order to calculate forces and stresses for the new ionic configuration.

When the ionic configuration is relaxed to a local energy minimum, the relaxation of the ionic configuration can be partially overlapped with the initial relaxation of the electronic configuration. Provided that the magnitudes of the Hellmann-Feynman forces are larger than the errors in the forces, moving each ion in the direction of the calculated force will lower the total energy of the system and move the ionic configuration towards the local energy minimum. However, if the Hellmann-Feynman forces are smaller than the errors in the forces, displacement of the ions in the directions of the forces may not decrease the total energy and could take the ionic configuration away from the global energy minimum. In this case, overlapping the ionic relaxation with the electronic relaxation will increase the total number of iterations needed to relax the system to the global energy minimum.

It might be argued that, as long as kinetic energy is continuously removed from all the degrees of freedom in the system, the total energy in the system must continuously decrease, so that the ionic configuration must relax to a local energy minimum. However, this is only true if the time steps are made sufficiently short. Moving the ions a finite distance can add energy to the electronic system. If the energy added to the electronic system each time step becomes too large, the electronic system will never relax to its ground state, and the ionic system will never reach a local energy minimum. Therefore some caution has to be exercised when one overlaps ionic relaxation with the electronic relaxation, to ensure that the ionic system reaches the local energy minimum in the shortest possible time.

D. Pulay forces

In principle, there should be an additional term in Eq.(7.5) to represent the derivative of the basis set withrespect to the position of the ion. This contribution tothe force on the ion is called the Pulay force (Pulay,1969). If the value of the Pulay force is not calculated,there is a further error in the value of the Hellmann-Feynman force. It can be shown that the Pulay forcevanishes if the derivatives of all the basis states 5P/M,are spanned by the basis set [PI (SchetBer et al. , 1985).For a plane-wave basis set, the derivatives of each basisstate with respect to the position of an ion are zero andthe Pulay force is zero. The calculated Hellmann-Feynman force then will be exactly equal to the deriva-tive of the total energy with respect to the position of theion, provided that the electronic wave functions areKohn-Sham eigenstates. This is one of the great advan-tages of using a plane-wave basis. If the Pulay force doesnot vanish and if it is not calculated, the computedHellmann-Feynman force will not be equal to the deriva-tive of the total energy with respect to the position of theion. This error is independent of how close the electronicconfiguration is to its ground state. In this case, movingan ion in the direction of the calculated force may in-crease the total energy. When the Pulay force is nonzero,a local energy minimum of the ionic system cannot be lo-cated by calculating the Hellmann-Feynman forces onthe ions. The only ways of finding a local energyminimum are by trial and error or by calculating the ac-tual force on each ion by calculating the change in the to-tal energy on displacing each ion in turn. This calcula-tion on XI ions requires 3' total-energy calculations.Therefore the number of total-energy calculations thatare required to take the system to the local energyminimum using this method is 3%I times as many aswould be required if the forces on all the ions could becalculated from a single total-energy calculation. 
This represents a hidden increase in computational time with

Rev. Mod. Phys. , Vol. 64, No. 4, October 1992


M. C. Payne et al.: Ab initio iterative minimization techniques 1083

the size of the system, which could completely negate the apparent efficiency of a computational method.
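The 3N_I-calculation overhead described above can be illustrated with a finite-difference force evaluation. This is a schematic sketch: `total_energy` stands in for a complete self-consistent calculation, and the harmonic toy energy is an assumption of the example, not anything from the text.

```python
import numpy as np

def finite_difference_forces(total_energy, positions, h=1e-3):
    """Estimate forces by central differences: two energy evaluations
    per ionic coordinate, i.e. 6*N_I total-energy calculations for N_I
    ions, versus a single calculation when the Hellmann-Feynman forces
    (with vanishing Pulay terms) can be trusted."""
    positions = np.asarray(positions, dtype=float)
    forces = np.zeros_like(positions)
    for i in range(positions.shape[0]):        # loop over ions
        for a in range(3):                     # x, y, z
            dp = positions.copy(); dp[i, a] += h
            dm = positions.copy(); dm[i, a] -= h
            forces[i, a] = -(total_energy(dp) - total_energy(dm)) / (2 * h)  # F = -dE/dR
    return forces

# toy stand-in for E_tot: harmonic wells centered at the origin
E = lambda R: 0.5 * np.sum(R**2)
F = finite_difference_forces(E, [[0.1, 0.0, 0.0], [0.0, -0.2, 0.0]])
```

For the toy energy the forces are simply -R, recovered here at the cost of six energy evaluations per ion.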

E. Pulay stresses

If a plane-wave basis set is used in a total-energy calculation, the Pulay forces on the ions will be zero. However, the Pulay stresses on the unit cell may be nonzero with a plane-wave basis set. If the number of plane-wave basis states remains constant, changing the size of the unit cell changes the cutoff energy for the basis set. Increasing the number of plane-wave basis states by increasing the cutoff energy for the basis set will usually reduce the total energy of the system. Only if the cutoff energy is large enough to achieve absolute convergence will the change in the total energy be zero. Most total-energy pseudopotential calculations are performed with a cutoff energy at which energy differences have converged but at which the total energies have not converged. In this case the diagonal components of the Pulay stresses on the unit cell will be nonzero.

It may be surprising that increasing the number of basis states can change the total energy of a system but not change the differences in energy between a number of systems. However, the energy differences between systems arise mainly from the differences in bonding in each system, so the energy differences are dominated by the regions outside the ion cores. Provided that the additional basis states introduced by increasing the cutoff energy for the basis set do not change the charge density in the bonding regions, energy differences between systems will not change. The additional basis states introduced by increasing the cutoff energy for the basis set merely provide a more accurate description of the wave functions inside the core regions. The additional basis states will change the total energy per atom of each system by a constant amount and so leave energy differences unaltered.

The magnitude of the Pulay stress in a pseudopotential calculation can be determined by calculating the change in the total energy per atom as the cutoff energy for the basis set varies (Froyen and Cohen, 1986; Gomes Dacosta et al., 1986). The crucial significance of the Pulay stress correction for surface stress calculations has been emphasized by Vanderbilt (1987). The change in the total energy per atom will be independent of the details of the ionic configuration provided that the cutoff energy for the basis set is large enough for energy differences to have converged. Hence the Pulay stress due to the plane-wave basis set can be calculated once and for all from the change in the total energy of a small unit cell as the cutoff energy for the plane-wave basis set varies.
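As a rough illustration of this once-and-for-all procedure, one can fit the total energy per atom against ln E_cut and form the commonly quoted correction (2/3Ω) dE/d ln E_cut. The function name, the synthetic E(E_cut) data, and the sign convention are assumptions of this sketch, not taken from the text.

```python
import numpy as np

def pulay_pressure_correction(cutoffs, energies_per_atom, volume_per_atom):
    """Fit E(E_cut) per atom against ln(E_cut) and form the commonly
    quoted Pulay pressure correction (2 / 3 Omega) dE/d ln E_cut.
    Sign conventions differ between implementations; treat this as
    schematic."""
    slope = np.polyfit(np.log(np.asarray(cutoffs, dtype=float)),
                       np.asarray(energies_per_atom, dtype=float), 1)[0]
    return 2.0 * slope / (3.0 * volume_per_atom)

# synthetic data standing in for small-test-cell calculations at several cutoffs
cutoffs = [100.0, 150.0, 200.0, 300.0]
energies = [1.0 - 0.03 * np.log(c) for c in cutoffs]
corr = pulay_pressure_correction(cutoffs, energies, volume_per_atom=20.0)
```

Because the slope dE/d ln E_cut is insensitive to the ionic configuration once energy differences have converged, this fit need only be done once for a small cell.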

F. Local energy minimization

The simplest use of Hellmann-Feynman forces is to locate the position of a local energy minimum of the ionic system. The ions are moved along the directions of the Hellmann-Feynman forces until the residual forces on all the atoms are smaller than a given value. In such a calculation the errors in the Hellmann-Feynman forces due to the deviation of the electronic configuration from the ground state can be regarded as a source of thermal noise. These forces will cause the ions to fluctuate around their equilibrium positions, and the magnitudes of the residual forces on the ions will never reach zero. The magnitudes of the errors in the Hellmann-Feynman forces must be reduced as the system approaches the local energy minimum if the system is to continue approaching that minimum. Therefore the electronic configuration must be relaxed closer and closer to the instantaneous ground state as the ionic configuration approaches the local energy minimum.

1. Local energy minimization with the molecular-dynamics method

In molecular-dynamics methods it is sensible to treat the electronic and ionic systems independently when relaxing the ions to their equilibrium positions and to use different time steps for the two systems. The time step for the ionic system should be progressively reduced as the ionic configuration approaches the local energy minimum. This allows the electronic configuration to relax closer to its instantaneous ground-state configuration as the ions approach their equilibrium positions, to ensure that the errors in the Hellmann-Feynman forces are always smaller than the actual forces on the ions.

2. Local energy minimization with the conjugate-gradients method

The conjugate-gradients method converges the electronic configuration to its ground state in far fewer iterations than molecular-dynamics methods. In this case, moving the ions small distances along the directions of the Hellmann-Feynman forces at each iteration will be an inefficient method for performing a local energy minimization. Many more iterations will be required to reach the energy minimum than would be required to converge the electronic configuration to its ground state. In this case it is sensible to use a more sophisticated scheme for relaxing the ionic configuration, one which can locate the equilibrium positions of the ions in the minimum number of iterations. Ideally the number of iterations required to locate a local energy minimum of the ionic system should be of the same order as the number of iterations required to relax the electronic configuration to its ground state.
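A minimal caricature of the force-based relaxation loop described in this section is steepest descent along the Hellmann-Feynman forces with a step that shrinks as the minimum is approached. The `forces` callable stands in for a fully converged electronic-structure calculation; the toy problem and all parameters are assumptions of the sketch.

```python
import numpy as np

def relax_ions(forces, positions, step0=0.1, shrink=0.5, ftol=1e-4, max_iter=200):
    """Move the ions along the forces, shrinking the step whenever a
    move raises the energy, until the residual force is small.
    `forces(R)` must return (energy, forces) from a converged
    electronic-structure calculation (here, a toy stand-in)."""
    R = np.asarray(positions, dtype=float)
    step = step0
    E, F = forces(R)
    for _ in range(max_iter):
        if np.max(np.abs(F)) < ftol:
            break
        R_new = R + step * F
        E_new, F_new = forces(R_new)
        if E_new < E:                 # accept the downhill move
            R, E, F = R_new, E_new, F_new
        else:                         # uphill: reduce the effective time step
            step *= shrink
    return R, E

# toy problem: a single ion in a harmonic well centered at (1, 0, 0)
def toy(R):
    d = R - np.array([1.0, 0.0, 0.0])
    return 0.5 * float(np.sum(d**2)), -d

R_min, E_min = relax_ions(toy, np.zeros(3))
```

In a real calculation the force tolerance `ftol` would be tightened together with the electronic convergence, so that force errors stay smaller than the forces themselves.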

G. Global energy minimization

Accurate forces on the ions can be calculated relatively quickly when conjugate-gradients or molecular-dynamics methods are used to perform a total-energy pseudopotential calculation. It has been shown how these forces can




be used to relax a system of ions to a local energy minimum. The technique of moving the ions in the directions of the Hellmann-Feynman forces until the forces on the ions become zero is basically a zero-temperature quench, because the ions do not acquire any kinetic energy during the relaxation. At the end of this process, the system will be in a local energy minimum. By performing zero-temperature quenches from a variety of initial configurations of the ionic system, one can obtain information about the local energy minima of the system. However, there is no guarantee that this method will locate the global energy minimum of the system. In theory, a very-low-energy minimum can only be found if a simulated annealing process is carried out (Kirkpatrick et al., 1983). But even with a simulated annealing procedure, there is no guarantee that the global energy minimum will be located.

1. Simulated annealing

The success of any simulated annealing technique is very sensitive to the structure of the phase space being explored. The purpose of performing a simulated anneal is to determine the lowest-energy configuration of the ionic system. For a system that contains many ions there will be a large number of ionic configurations that are local energy minima. The simulated annealing procedure has to explore the phase space of the system to locate the lowest-energy local minimum. The phase space for a particularly simple system is shown schematically in Fig. 22. The diagram shows two local energy minima, separated by energy ΔE. If the position of the lowest-energy minimum is to be located using the technique of simulated annealing, the thermal energy kT must be smaller than ΔE. If the thermal energy is larger than this, the energies of the two minima cannot be distinguished within the thermal smearing. The diagram shows an energy barrier of height E_B separating the local energy minima. The ionic configuration can only move between the two local energy minima by gaining at least E_B in energy through a thermal fluctuation. The time spent waiting for a thermal fluctuation of this magnitude is (1/ν)exp(E_B/kT), where ν is the attempt frequency.

FIG. 22. Representation of two energy minima differing by ΔE, separated by a barrier E_B. [Figure: energy vs configuration coordinate.]

In a typical simulation with the molecular-dynamics method, the total time of the simulation is of the order of 10-100 times 1/ν. Therefore the probability that the ionic configuration will traverse an energy barrier during the simulation is dominated by the exponential factor. When the temperature is low enough to distinguish between the energies of the local energy minima (kT ~ ΔE), the time taken for the system to move between the energy minima is proportional to exp(E_B/ΔE). The system must move between the minima at least once to locate the lower-energy minimum. If E_B/ΔE is large, it is extremely unlikely that the simulation will locate the global energy minimum, but if E_B/ΔE is small, the global energy minimum can be located easily.

Simulated annealing is ideal for escaping from local energy minima separated from the global energy minimum by small energy barriers. However, the method is very inefficient when the energy barriers separating the energy minima are large. Unless the structure of the phase space of the system is very favorable, any simulated annealing process will leave the system in a local energy minimum rather than at the global energy minimum.
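The waiting-time argument above can be turned into a two-line feasibility check. This is an Arrhenius caricature; the simulation length of roughly 100 attempt periods follows the estimate in the text, and everything else is illustrative.

```python
import math

def barrier_crossing_is_likely(E_B, kT, attempt_freq=1.0, sim_periods=100.0):
    """Arrhenius estimate from the text: the wait for a thermal
    fluctuation over a barrier E_B is (1/nu) exp(E_B/kT), while a
    molecular-dynamics anneal lasts only ~10-100 attempt periods
    (1/nu).  The numbers are illustrative."""
    wait = math.exp(E_B / kT) / attempt_freq
    return wait < sim_periods / attempt_freq

# with kT ~ dE, a barrier of a few dE is within reach, ten dE is not:
easy = barrier_crossing_is_likely(E_B=2.0, kT=1.0)    # exp(2) is ~7 attempt periods
hard = barrier_crossing_is_likely(E_B=10.0, kT=1.0)   # exp(10) is ~22000 attempt periods
```

The exponential makes the conclusion insensitive to the exact simulation length: once E_B is several times kT, the barrier is effectively never crossed.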

The success of simulated annealing techniques also depends on the number of local energy minima that have energies close to the energy of the global energy minimum. In principle, all of these local energy minima have to be visited during the simulated annealing process in order to determine which is the true global energy minimum. As the number of local energy minima increases exponentially with the number of atoms in the system, simulated annealing is only likely to be successful for small systems.

2. Monte Carlo methods

The difficulty of locating the position of the global energy minimum using the technique of simulated annealing is due to the difficulty of traversing the energy barriers that separate the local energy minima. If the energy barriers are large, the system only occasionally traverses an energy barrier. Between traversals of the energy barriers, the system explores the region of phase space around a single local energy minimum. However, only the value of the energy at the local minimum has any relevance to the process of locating the global energy minimum. The time spent exploring the phase space around each local energy minimum serves no useful purpose in the simulated annealing process, although it does provide information about the free energy of the system. It is easy to locate the local energy minimum in each region of phase space, but it is difficult to move between regions of phase space that have low-energy local minima.

The process of locating the global energy minimum is sometimes attempted by using the method of steepest descents or simple molecular dynamics to sample the region of phase space around each local energy minimum, followed by a discontinuous jump through phase space into the region around a different local energy minimum.




However, there is no point in exploring regions of phase space that have very high energies, and so a sampling criterion should be applied to determine whether to explore the region of phase space reached by the discontinuous jump. The sampling criterion generally adopted compares the energies of the new and the old ionic configurations. The new configuration is accepted or rejected according to a Monte Carlo algorithm (Metropolis et al., 1953): if the new configuration is of lower energy than the old, it is accepted; if the new configuration is ΔE higher in energy than the old configuration, it is accepted with a probability exp(−ΔE/kT). This method allows the system to cross energy barriers without waiting for a thermal fluctuation large enough to traverse the barrier.
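The acceptance rule just described is easily stated in code. This is a generic Metropolis step; the injectable `rng` argument is a convenience of the sketch so that it is reproducible.

```python
import math
import random

def metropolis_accept(E_new, E_old, kT, rng=random):
    """The acceptance rule described above (Metropolis et al., 1953):
    always accept a downhill move; accept an uphill move of size dE
    with probability exp(-dE / kT)."""
    dE = E_new - E_old
    if dE <= 0.0:
        return True
    return rng.random() < math.exp(-dE / kT)

rng = random.Random(0)
accept_down = metropolis_accept(-1.0, 0.0, kT=0.1, rng=rng)  # downhill: always accepted
```

For an uphill move of exactly kT, acceptance occurs with probability 1/e, which is what lets the walk climb over barriers without waiting for a rare fluctuation.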

The Monte Carlo technique described above is computationally expensive to implement with molecular-dynamics schemes for relaxing the electronic configuration to its ground state. When the ionic configuration makes a discontinuous jump through phase space, the electrons will not be close to the ground state of the new ionic configuration. Each change in the ionic configuration must be followed by a complete relaxation of the electronic configuration to the new ground state before the energies of the initial and final configurations can be compared. In contrast, the Monte Carlo technique could be efficiently implemented with the conjugate-gradients method because the energy of the new ionic configuration can be calculated rapidly.

3. Location of low-energy configurations

Location of global energy minima is a complex problem. No scheme can be guaranteed to find the global energy minimum in a single calculation. The only way of being reasonably confident that the global energy minimum has been located is to find the same lowest-energy configuration in a number of different calculations. In practice a number of low-energy configurations will be located by successive calculations. When subsequent calculations do not locate any new low-energy configurations and the ionic configuration always reaches one of the low-energy configurations found previously or a configuration of significantly higher energy, then there is a very high probability that all the low-energy configurations of the system have been located and, hence, that the global energy minimum has been located.
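The stopping heuristic above can be sketched as simple bookkeeping over repeated quenches. The `quench` callable stands in for a full zero-temperature relaxation, and the toy landscape is an assumption of this sketch.

```python
def survey_minima(quench, starts, tol=1e-6):
    """Run a zero-temperature quench from many starting configurations
    and record the distinct minimum energies found.  Confidence that the
    global minimum has been seen grows when new starts stop producing
    new minima.  `quench` stands in for a full ionic relaxation that
    returns the energy of the minimum it reaches."""
    found = []
    for s in starts:
        e = quench(s)
        if all(abs(e - f) > tol for f in found):
            found.append(e)
    return sorted(found)

# toy landscape with two basins: x < 0 relaxes to -1.0, x >= 0 to -0.3
toy_quench = lambda x: -1.0 if x < 0 else -0.3
minima = survey_minima(toy_quench, [-2.0, 0.5, -0.1, 3.0])
```

In practice one would compare configurations as well as energies, since distinct minima can be nearly degenerate.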

VIII. DYNAMICAL SIMULATIONS

There is an enormous literature associated with studies of the dynamical behavior of systems. The book by Allen and Tildesley (1987) provides an excellent introduction to the subject. Obvious areas of interest include diffusion, melting, and the calculation of free energies. These studies are generally carried out using empirical potentials (i.e., some model of the interaction between the atoms in the system parametrized according to experimental data). Empirical potentials have the drawback that it is impossible to know their region of validity. The potentials are often parametrized using data for the perfect crystal or data describing only small perturbations from the perfect crystal. Even if these potentials do work perfectly for the crystal, there is no reason why they should be capable of describing diffusion in the solid, which can involve configurations very different from those found in the crystal, let alone a liquid, whose structure may bear no relation whatsoever to the parent crystal. The problem of determining an accurate interatomic potential is particularly acute in the case of silicon, for which many years of effort have yet to produce a general-purpose potential. In contrast, the total-energy pseudopotential method has been shown to be applicable in a much larger region of phase space than any empirical potential. Hence a dynamical simulation performed using these forces should accurately describe a real system, irrespective of the region of phase space that is explored under the dynamical evolution of the system.

A. Limitations to dynamical simulations

If simulations are performed using a finite supercell, as they must be when plane-wave basis sets are used, the systems cannot undergo true phase transitions, and the range of correlations in the system will be limited by the size of the supercell. It should also be appreciated that the electron temperature will, in general, be zero in such a simulation. Thermal excitation of the electronic system can be described in density-functional theory (Mermin, 1965); however, there are fundamental problems with density-functional theory which make it difficult to describe a system at finite temperature without performing an extremely time-consuming calculation for the excited states of the system (Hybertsen and Louie, 1985; Godby et al., 1986). Provided that the thermal energy is much smaller than the smallest excitation energy of the electronic system, the effects of a finite electron temperature should be small. If this is the case, the error introduced by using zero-temperature density-functional theory in a dynamical simulation should not be significant. The effect of setting the electronic temperature to zero has recently been shown to be negligible in a study of the structural phase transition of GeTe (Rabe and Joannopoulos, 1987).

B. Accuracy of dynamical trajectories

In Sec. VII it was pointed out that the calculated forces on the ions are only the true physical forces when the electronic system is in its exact ground-state configuration. Therefore, to generate correct dynamical trajectories for the ions, the electrons must be relaxed to the ground state at each ionic time step. Although any of the methods described in Secs. III, IV, and V can be used to relax the electronic configuration to its ground state,




most of these prove to be extremely expensive computationally for performing dynamical simulations of the ionic system. The most efficient of these, the conjugate-gradients method, is fast enough to allow dynamical simulations, but even in the case of this technique it is important to generate good sets of initial wave functions according to the technique outlined in Sec. VIII.E below. However, there is an alternative approach to performing dynamical simulations, which forms the basis of the Car-Parrinello method. Rather than insisting that the electronic configuration be in the exact ground-state configuration at each ionic time step, one may be able to perform dynamical simulations even if the electronic configuration is only close to the exact ground state. Although this implies that there are errors in the Hellmann-Feynman forces at each time step, dynamical simulations will be successful provided that the errors in the forces remain small and that the effect of these errors remains bounded in time. The Car-Parrinello method can fulfill both of these criteria (Remler and Madden, 1990; Pastore et al., 1991). It is this latter point about the boundedness of the errors which provides the distinction between the Car-Parrinello method and the "improved" methods outlined in Secs. IV and V. While these improved methods will for a fixed computational effort move the electronic system closer to the ground-state configuration than the simple molecular-dynamics method, the errors introduced by these improved methods, although smaller than the error in the simple molecular-dynamics method, do not remain bounded in time. The boundedness of the error in the Car-Parrinello method results from an "error cancellation" that occurs when the Car-Parrinello Lagrangian is used to generate the dynamics of the electronic and ionic system. This effect is most easily demonstrated by the simple example in the following section, which clarifies this point by comparing second-order and first-order equations of motion.

C. Error cancellation in dynamical simulations

The origin of the cancellation of the errors in the Hellmann-Feynman forces under the equations of motion generated by the Car-Parrinello Lagrangian can be illustrated by considering a system that contains a single atom, which has a single occupied electronic orbital, as shown in Fig. 23. The molecular-dynamics equation of motion for the evolution of the electronic wave function is

μ d²ψ/dt² = −[H − λ]ψ .   (8.1)

If the atom is at rest and the electronic wave function is the ground-state wave function, then [H − λ]ψ = 0, and the wave function will be stationary. If the orbital is displaced away from the ion, the magnitude of the acceleration of the wave function will increase roughly linearly with the magnitude of the displacement. If the ion is moving at constant velocity and the orbital begins to lag behind the ion, the acceleration of the orbital will increase. The velocity of the orbital will increase until the orbital overtakes the ion. As the orbital overtakes the ion, the acceleration of the wave function will change sign and the orbital will begin to slow down. The orbital continues to slow down until the ion overtakes it, at which point the whole process starts again. Hence, if the ion were to move at constant velocity, the electronic orbital would oscillate around the instantaneous position of the ion. The value of the Hellmann-Feynman force exerted on the ion by the orbital will oscillate around the correct value, so that the error in the Hellmann-Feynman force will tend to cancel when averaged over a number of wave-function oscillations. The oscillation of the error in the Hellmann-Feynman force will prevent a continuous transfer of energy between the ionic and the electronic degrees of freedom, as long as the fictitious oscillations occur over time scales much shorter than the physical ionic time scales. This is a reflection of the fact that, given a sufficiently large mass ratio, there is an adiabatic isolation of the (much) "lighter" electronic coordinates from the "heavier" ionic degrees of freedom.

FIG. 23. Schematic illustration of how an orbital will oscillate around a moving ion during a simulation with μ d²ψ/dt² = −[H − λ]ψ, as discussed in the text. Velocities and accelerations are designated as open and filled arrows, respectively. [Panels: orbital is stationary; ion begins to move; orbital accelerated; orbital in instantaneous ground state; orbital decelerated.]

A first-order equation of motion, in contrast, gives the following expression for the evolution of the electronic orbital:

μ dψ/dt = −[H − λ]ψ .   (8.2)

With this equation of motion the velocity of the orbital increases roughly linearly with the displacement of the orbital from the ion. Once the ion has begun to move,

Rev, . Mod. Phys. , Vol. 64, No. 4, October 1992



the orbital falls further behind the ion until its velocity is equal to the velocity of the ion, and the orbital then remains a fixed distance behind the instantaneous position of the ion. This process is illustrated in Fig. 24. The orbital then exerts a constant damping force on the ion due to the systematic error in the value of the Hellmann-Feynman force, which is proportional to the velocity of the ion. Hence using the first-order equation of motion to evolve the electronic wave function for a dynamical simulation results in a viscous drag on the ions. This simple model suggests that a second-order equation of motion for the electronic degrees of freedom should give a more accurate account of the dynamics of the ionic system than a first-order equation of motion, a conclusion supported by detailed analysis of the evolution of the electronic wave function (Payne, 1989; Car et al., 1991) and by simulations using first- and second-order equations of motion (Remler and Madden, 1990). The success of the Car-Parrinello method comes from this error cancellation, which turns the first-order error in the Hellmann-Feynman forces into a second-order error when integrated along the ionic trajectories.

FIG. 24. Schematic illustration of how an orbital will eventually lag behind a moving ion during a simulation with μ dψ/dt = −[H − λ]ψ, as discussed in the text. Convention the same as in Fig. 23. [Panels: orbital is stationary; ion begins to move; orbital velocity less than ion velocity; orbital velocity equal to ion velocity.]
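The oscillation-versus-lag picture can be reproduced with a one-dimensional caricature: an "orbital" coordinate attracted to an ion moving at constant velocity. All masses, spring constants, and step sizes below are illustrative choices, not values from any real calculation.

```python
import numpy as np

# an "orbital" coordinate x pulled toward an ion at X(t) = v*t by a
# restoring "force" -k*(x - X); lag = x - X is the force-error proxy
mu, k, v, dt, nsteps = 0.1, 1.0, 1.0, 1.0e-3, 40000

# second-order dynamics, mu x'' = -k (x - X): oscillation about the ion
x, vel = 0.0, 0.0
lag2 = np.empty(nsteps)
for n in range(nsteps):
    X = v * n * dt
    vel += dt * (-k * (x - X) / mu)   # symplectic Euler step
    x += dt * vel
    lag2[n] = x - X

# first-order dynamics, mu x' = -k (x - X): settles a fixed distance behind
x = 0.0
lag1 = np.empty(nsteps)
for n in range(nsteps):
    X = v * n * dt
    x += dt * (-k * (x - X) / mu)
    lag1[n] = x - X

mean_lag2, mean_lag1 = lag2.mean(), lag1.mean()
# mean_lag2 ~ 0 (the errors cancel on average);
# mean_lag1 ~ -mu*v/k (a systematic, velocity-proportional drag)
```

The time-averaged lag vanishes for the second-order equation but settles at −μv/k for the first-order one, which is exactly the oscillating-error versus viscous-drag contrast drawn in the text.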

D. Car-Parrinello dynamics; constraints

The successful implementation of Car-Parrinello dynamics relies on a number of features. The error cancellation occurs only if the time scales in the electronic system are all shorter than the shortest time period in the ionic system. From considerations similar to those in Sec. III.D.2, it can be seen that the longest time period in the electronic system is related to the difference in energy between the highest occupied state and the lowest unoccupied state. The actual magnitude of this time period can be adjusted by changing the value of the fictitious mass, so that even in systems where this energy gap is arbitrarily small the fictitious mass can be chosen sufficiently small to ensure the nonoverlap of time periods in the electronic and ionic systems. However, as the value of the fictitious mass is decreased, the highest frequency in the electronic system increases, requiring a shorter time step to integrate the equations of motion stably for the electrons, thus increasing the computational effort required for a given simulation. Therefore the energy gap can become so small that it becomes impractical to carry out a simulation.
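This trade-off can be made semi-quantitative with the harmonic frequency estimate ω ≈ (2ΔE/μ)^(1/2) for an electronic mode an energy ΔE above the occupied subspace. The safety factor, function names, and numbers below are assumptions of this back-of-envelope sketch.

```python
import math

def fictitious_mass_bound(E_gap, omega_ion_max, safety=5.0):
    """Require the slowest electronic mode (frequency set by the gap,
    omega ~ sqrt(2*E_gap/mu)) to be `safety` times faster than the
    fastest ionic mode; this bounds the fictitious mass mu from above.
    All quantities are in consistent (illustrative) units."""
    return 2.0 * E_gap / (safety * omega_ion_max) ** 2

def electronic_time_step(E_max, mu, steps_per_period=20.0):
    """The fastest electronic mode (energy spread ~ E_max, roughly set
    by the cutoff) fixes the integration step: as mu shrinks, the
    stable time step shrinks like sqrt(mu)."""
    omega_max = math.sqrt(2.0 * E_max / mu)
    return 2.0 * math.pi / (steps_per_period * omega_max)

mu1 = fictitious_mass_bound(E_gap=1.0, omega_ion_max=0.1)    # larger gap: larger mu
mu2 = fictitious_mass_bound(E_gap=0.25, omega_ion_max=0.1)   # gap / 4 -> mu / 4
dt1 = electronic_time_step(10.0, mu1)
dt2 = electronic_time_step(10.0, mu2)                        # dt scales as sqrt(mu)
```

Shrinking the gap by a factor of four forces μ down by four and the time step down by two, which is the cost escalation the paragraph describes.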

The Car-Parrinello Lagrangian is invariant in time, and hence the total energy in the electronic and ionic systems will be constant provided that no damping is applied to any of the kinetic degrees of freedom and that the "forces" required to impose the constraints of orthonormality of the wave functions are conservative. The energy constant of motion in a classical molecular-dynamics simulation is made up only of the kinetic energy of the ions and the potential energy, which is the Kohn-Sham energy in the ab initio simulation. However, in a Car-Parrinello simulation the energy constant of motion includes the fictitious kinetic energy of the electrons. Even in a "perfect" Car-Parrinello calculation, in which the electrons moved at exactly the correct velocity to remain in the instantaneous ground state of the ionic configuration, the electronic wave functions would speed up and slow down as the ions moved along their trajectories, and the kinetic energy of the electrons would vary in time in direct proportion to the ionic kinetic energy, but smaller by a factor of the ratio of the fictitious electronic and physical ionic masses, typically less than 0.01. Because of this fictitious electronic kinetic energy, the sum of the kinetic energy of the ions and the potential energy is not a constant, even in a "perfect" Car-Parrinello simulation. In this situation the fictitious temperature of the electronic degrees of freedom is at least two orders of magnitude smaller than the temperature of the ionic system, so that this situation is thermodynamically unstable.

In addition to the kinetic energy required for the electrons to follow the ions exactly, there are fluctuations of energy between the ionic and electronic systems. The deviation of the electronic configuration from its ground state is related to the magnitude of these energy fluctuations. If the longest time period in the electronic system is not significantly shorter than the shortest time period in the ionic system, energy will be continuously transferred between the electronic and ionic systems until the fictitious temperature of the electronic degrees of freedom and the temperature of the ionic degrees of freedom are equal. In such a situation there are large amounts of energy in the electronic degrees of freedom, and the electronic configuration is far from its ground state. When a significant transfer of energy from the ions to the electrons occurs in a Car-Parrinello calculation, the simulation must be stopped periodically and the electrons returned to their ground-state configuration before restarting the simulation (Zhang et al., 1990). In the process of returning the electrons to their ground state,




the fictitious kinetic and potential energies in the electronic degrees of freedom are removed from the system, so there is no longer a conserved energy associated with the Car-Parrinello dynamics. If no further action were taken, the temperature of the ionic system would decrease continuously during the simulation. To compensate for this irreversible heat flow from the ionic system, it is usual to attach a Nosé thermostat to the ionic system (Nosé, 1984). This Nosé thermostat has the primary role of supplying energy to the ionic degrees of freedom, to compensate for the loss of energy from the ions to the electronic degrees of freedom. A systematic study of the range and validity of the Nosé thermostat is given by Cho and Joannopoulos (1992). It has been demonstrated, however, that in simulations where the total energy tends to drift, the Nosé thermostat breaks down and fails to produce a correct canonical thermal distribution (Toxvaerd, 1991). As yet no one has attempted to analyze the errors in the ionic trajectories that arise when the time periods in the electronic and ionic degrees of freedom begin to overlap. An alternative method has recently been proposed to control the buildup of energy in the electronic degrees of freedom by attaching a separate Nosé thermostat to the electronic degrees of freedom, set at a much lower temperature than the ionic thermostat (Bloechl and Parrinello, 1991). Once again no attempt has yet been made to quantify the errors introduced by this method.
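The thermostat's role of feeding energy back to cold ions can be sketched with a Hoover-style friction update. The unit masses, first-order velocity step, and parameter values are simplifications of this sketch, not taken from the references above.

```python
def nose_hoover_step(v, xi, kT_target, dof, Q, dt):
    """Schematic Nose-Hoover update: a friction coefficient xi is
    driven by the gap between the instantaneous kinetic energy and its
    target dof*kT/2; negative xi (a cold system) pumps energy into the
    ions, which is the compensating role described in the text.  Unit
    masses and a first-order velocity update are simplifications."""
    kinetic = 0.5 * sum(vi * vi for vi in v)          # unit masses
    xi = xi + dt * (2.0 * kinetic - dof * kT_target) / Q
    v = [vi * (1.0 - dt * xi) for vi in v]            # friction step
    return v, xi

# a "cold" system: kinetic energy far below the target, so xi goes
# negative and the velocities are gently amplified
v, xi = nose_hoover_step([0.1, -0.1], 0.0, kT_target=1.0, dof=2, Q=1.0, dt=0.01)
```

The thermostat mass Q sets how aggressively the friction responds; production codes use a higher-order integrator for the coupled equations.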

The correct application of the constraints of orthogonality and normalization is essential for performing a successful Car-Parrinello dynamical simulation. This is relatively easily understood from the following considerations. Consider two wave functions that are not orthogonal. There are an infinite number of pairs of orthogonal wave functions that can be formed from these two wave functions, and each of these possible choices will have a different "velocity" associated with each of the wave functions. However, only one of these choices is consistent with a "conservative" constraint force acting on the wave functions that does not change the kinetic energy of the electronic system. The simplest way to understand which form of application of constraints is correct is to appreciate that the constraint forces must not change if the labeling of the wave functions is changed; the constraint forces should be invariant under rotations within the subspace of the occupied electronic states. It is clear, then, that the Gram-Schmidt orthogonalization technique cannot be applied in a dynamical simulation, because the forces change according to the labeling of the states: for instance, whichever wave function is labeled 1 is not changed by the orthogonalization procedure. Car and Parrinello apply the constraints in two steps (Car and Parrinello, 1989). The first is an application of constraints directly in the equations of motion, using the Lagrange multipliers. These constraints ensure that if the accelerations of the wave functions were all zero the wave functions would remain orthonormal. This constraint is required because, despite their orthonormality at the last and present time steps, the wave functions would become nonorthonormal if they continued to move with constant velocity. The Lagrange multipliers that ensure this orthonormality (to order dt²) are

(8.3)

Although application of these Lagrange multipliers alone would be sufficient to ensure orthonormality of the wave functions, to the same accuracy as the error in the Verlet algorithm in the absence of any accelerations, this is no longer true if accelerations are present. To ensure the orthonormality of the wave functions at the end of the time step, one can either modify the above Lagrange multipliers to take account of the accelerations of the wave functions, or retain the Lagrange multipliers given by Eq. (8.3) and apply Car and Parrinello's iterative method (3.23), or employ a similar rotationally invariant method, such as determining the similarity transform required to diagonalize the overlap matrix.
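The last of these options, building the orthonormalizing transform from the overlap matrix, is symmetric (Löwdin) orthogonalization: form S_ij = ⟨ψ_i|ψ_j⟩, diagonalize it, and apply S^(-1/2). Unlike Gram-Schmidt, relabeling or rotating the states merely rotates the result, so the constraint forces derived from it are rotationally invariant. A minimal sketch:

```python
import numpy as np

def symmetric_orthonormalize(psi):
    """Rotationally invariant orthonormalization via the overlap
    matrix: S = psi psi^H (rows are states), S = U diag(w) U^H, and the
    orthonormal set is S^(-1/2) psi.  Assumes linearly independent
    input states (all w > 0)."""
    S = psi.conj() @ psi.T                  # overlap matrix S_ij = <psi_i|psi_j>
    w, U = np.linalg.eigh(S)
    S_inv_sqrt = U @ np.diag(w ** -0.5) @ U.conj().T
    return S_inv_sqrt @ psi

rng = np.random.default_rng(0)
psi = rng.standard_normal((3, 8))           # three non-orthogonal "states"
phi = symmetric_orthonormalize(psi)         # phi @ phi.T is the identity
```

Because S^(-1/2) is unique, transforming the input states by any rotation Q simply produces Q times the old result, which is exactly the labeling-independence the text demands.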

E. Conjugate-gradients dynamics

It has been pointed out that the Car-Parrinello algorithm permits accurate dynamical simulations of ionic systems to be performed, provided that the time scales of the ionic and electronic systems are decoupled. Although there are many systems in which this is the case, this decoupling of the time scales is generally difficult to obtain in the case of metallic systems, where the gap vanishes (unless the "simulation" is so artificial that the system used in the simulation is actually an insulator as a result of limited k-point sampling). In such cases, where the long-term stability of the Car-Parrinello dynamics is in doubt, there is considerable motivation for seeking an alternative technique for performing dynamical simulations. It has already been pointed out that using one of the alternative techniques to relax the electrons to the ground state requires much more computational effort to achieve the same accuracy in the evolution of the ionic system, and so, at first sight, it is simply too expensive computationally to perform dynamical simulations on systems for which the Car-Parrinello algorithm fails. However, it is obvious that, if the initial electronic configuration can be moved closer to the correct instantaneous ground-state configuration, less computational effort is required to converge it to its exact ground state, and hence a faster simulation is possible.

A simple method has been developed that allows an accurate prediction to be made for the initial electronic configuration at each ionic time step by extrapolating forward from the electronic configurations at previous time steps. Typically, this method of extrapolation is found to bring the initial wave functions two orders of magnitude closer to the minimum of the energy functional than simply using the wave functions from the previous time step. This typically reduces by a factor of two the computational effort required to bring the electronic system to

Rev. Mod. Phys., Vol. 64, No. 4, October 1992


M. C. Payne et al.: Ab initio iterative minimization techniques 1089

within a few micro-eV per ion of its ground state. When this extrapolation technique is combined with the conjugate-gradients method, the resulting computational scheme makes the entire dynamical simulation comparable in speed to a Car-Parrinello simulation. However, the technique has the advantage that it can be applied to a broader class of systems. The details of the scheme can be found in Arias et al. (1991); it will only be described briefly below.

The basis for the trial wave-function scheme is the first-order extrapolation

Ψ'(t_{i+1}) = Ψ(t_i) + α[Ψ(t_i) − Ψ(t_{i−1})] ,   (8.4)

where {r(t_i)} are the ionic coordinates at time t_i, with i the ionic iteration number, α is a fitted parameter, and the prime indicates a trial wave function, as opposed to the fully converged Ψ({r(t_i)}). This scheme produces trial wave functions correct to first order in dt (and, by the 2n+1 theorem, energies correct to third order in the time step) when the ionic coordinates are

{r'(t_{i+1})} = {r(t_i) + α[r(t_i) − r(t_{i−1})]} .   (8.5)

To ensure that the resulting wave functions are in as close correspondence as possible with the actual ionic locations {r(t_{i+1})}, α is taken to minimize the discrepancy

|r(t_{i+1}) − (1+α)r(t_i) + αr(t_{i−1})| .   (8.6)

Minimizing this discrepancy with respect to α gives

α = [r(t_{i+1}) − r(t_i)] · [r(t_i) − r(t_{i−1})] / |r(t_i) − r(t_{i−1})|² .   (8.7)

One may imagine generalizing this scheme to higher orders, employing more of the preceding wave functions and producing ever smaller errors in the extrapolated wave functions. However, higher-order schemes suffer from an instability that pushes the wave-function errors into regions of phase space where convergence is so difficult that the net effect is to slow the simulation.
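The extrapolation step can be sketched with toy data. The snippet below (variable names and data are illustrative, not from the article) computes α as the least-squares match of the extrapolated ionic step to the actual one, as in Eq. (8.6), and then applies the first-order extrapolation of Eq. (8.4) to a set of wave-function coefficients:

```python
import numpy as np

def extrapolation_alpha(r_next, r_now, r_prev):
    """Least-squares alpha minimizing |r_next - (1+a) r_now + a r_prev|,
    i.e. matching the extrapolated ionic positions to the actual ones."""
    d = (r_next - r_now).ravel()      # actual ionic step
    e = (r_now - r_prev).ravel()      # previous ionic step
    return float(d @ e / (e @ e))

def extrapolate(psi_now, psi_prev, alpha):
    # First-order extrapolation of the wave functions, Eq. (8.4)
    return psi_now + alpha * (psi_now - psi_prev)

# toy data: four ions in 3D drifting uniformly, so the best alpha is exactly 1
base = np.arange(12.0).reshape(4, 3)
vel = 0.3 * np.ones((4, 3))
r = lambda t: base + vel * t
alpha = extrapolation_alpha(r(0.2), r(0.1), r(0.0))
print(round(alpha, 6))                      # 1.0 for uniform motion

rng = np.random.default_rng(0)
psi_prev = rng.standard_normal((16, 4))     # toy wave-function coefficients
psi_now = psi_prev + 0.01 * rng.standard_normal((16, 4))
psi_trial = extrapolate(psi_now, psi_prev, alpha)
```

For uniformly moving ions α = 1 and the scheme reduces to linear extrapolation; for curved trajectories α adjusts to track the actual ionic step.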

As in the Car-Parrinello scheme, orthonormality of the wave functions must be maintained; however, Eq. (8.4) yields wave functions that are not properly orthonormal. In the present case, one can simply Gram-Schmidt orthonormalize the resulting wave functions, because there is no longer any concern for maintaining a proper electron dynamics and because this procedure will not disturb the correctness to first order of the wave functions, a consequence of the fact that the extrapolated wave functions are already orthonormal to first order in the time step, so that the Gram-Schmidt corrections enter only at second order.

F. Comparison of Car-Parrinello and conjugate-gradient dynamics

The Car-Parrinello and conjugate-gradients schemes for performing dynamical simulations are very different, and it is important to understand these differences in order to apply either technique successfully. The most important point is the difference between the time steps used in the two methods. In this respect conjugate-gradients dynamics is closer to conventional dynamical simulations, in which the time step is chosen to ensure an accurate integration of the ionic equations of motion. In simulations employing empirical potentials and those using the conjugate-gradients scheme, the forces on the ions are, to high precision, true derivatives of the total potential energy of the ions. In the case of empirical potentials, the only differences between the computed forces and the derivatives of the total ionic energy are rounding errors due to finite machine accuracy, but in the case of the conjugate-gradients simulation, the differences also include contributions due to the failure of the Hellmann-Feynman theorem because the electronic system is not exactly converged to its ground state. In the Car-Parrinello simulation, at each time step there are significantly larger errors in the Hellmann-Feynman forces, because the electronic configuration is not maintained so close to its exact ground-state configuration. The time step used in a Car-Parrinello simulation has to be much shorter than the one used for an equivalent conjugate-gradients simulation to integrate the electronic equations of motion stably. Additionally, the longest time period in the electronic system must be less than the shortest ionic time period, to ensure that the errors in the Hellmann-Feynman forces average to zero along the ionic trajectories.

At first sight the Car-Parrinello method and wave-function extrapolation combined with conjugate-gradient relaxation are rather similar, in that each essentially performs an integration of the wave functions forward in time. However, the spirit of each technique and the behavior of the wave-function coefficients in the two cases are very different. In the case of conjugate-gradients dynamics, the wave functions are propagated as close as possible to the instantaneous ground state, in order to reduce the effort required to fully relax them to the ground state. In the Car-Parrinello method, the motion of the electronic degrees of freedom preserves a delicate adiabatic separation between the electronic and ionic degrees of freedom. The electronic coefficients oscillate artificially about their ground-state values, which leads to a cancellation of the errors in the ionic forces.

Once the wave functions extrapolated according to Eq. (8.4) have been Gram-Schmidt orthonormalized, they are relaxed to within a set tolerance of the Born-Oppenheimer surface by the conjugate-gradient procedure; this completes one cycle of iteration of the ionic motion.

G. Phonon frequencies

The phonon frequencies of a system can be obtained by performing a dynamical simulation and then Fourier-transforming either the velocity or the position autocorrelation functions. However, for this procedure to


FIG. 25. Superposition of maximum-entropy-method spectral fits for each class of allowed phonon k state. For clarity, only the longitudinal spectra are displayed. The frequencies have been scaled so that the optic phonon frequency ω_o, as calculated from a frozen-phonon calculation at the same cutoff in the same supercell, is normalized to unity.

give phonon frequencies to high accuracy, the original ionic trajectories must be extremely accurate, since any noise in the trajectories will broaden the phonon frequencies. The conjugate-gradients dynamics scheme generates extremely accurate ionic trajectories, in which the noise can be reduced to an arbitrarily low level, and thus provides an excellent set of input data with which to determine phonon frequencies. Figure 25 shows the transform of the longitudinally polarized autocorrelation, as determined by the maximum-entropy method (Press et al., 1989), of 40 silicon atoms in a periodic system over 50 Å long in the [100] direction. Each peak represents a natural frequency of the system. Neither the heights of the peaks nor their integrated intensities are meaningful, in that the system has not yet reached thermal equilibrium. Note that the primary caveat when working with the maximum-entropy method is that it produces spurious peaks when working with noisy data. No such peaks are obtained here, indicating very clean data. The frequencies of the peak values of these spectra, as well as their transverse counterparts, are then compared with the experimentally measured phonon frequencies (Dolling, 1963; Nilsson and Nelin, 1972) in Fig. 26. As can be seen, there is good agreement between the results of the calculation and experiment, particularly in resolving the delicate splitting of the optic bands along Δ, which beat against each other with periods on the order of one picosecond. This technique for obtaining phonon frequencies requires no information about the displacements associated with each phonon mode and is particularly attractive for complex systems in which the phonon displacements are not known and for which it would be extremely expensive to compute the full matrix of second


FIG. 26. Phonon spectrum as determined from maximum peak values of maximum-entropy-method fits. These values are completely ab initio, with no free parameters. Empty circles represent experimental data (Dolling, 1963; Nilsson and Nelin, 1972), and filled circles represent results of an extrapolated conjugate-gradient dynamics simulation.

derivatives of the ionic potential energy, a calculation normally required to calculate phonon frequencies and eigenvectors.
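The procedure just described can be sketched on synthetic data. By the Wiener-Khinchin theorem, the power spectrum of the velocity trace equals the Fourier transform of its autocorrelation, so the sketch below uses an FFT periodogram directly rather than the maximum-entropy fit used in the article; the trace and its two "phonon" frequencies are invented for illustration.

```python
import numpy as np

def phonon_spectrum(velocity, dt):
    """Power spectrum of a velocity trace.  By the Wiener-Khinchin
    theorem this equals the Fourier transform of the velocity
    autocorrelation function, so its peaks sit at the vibrational
    (phonon) frequencies present in the trajectory."""
    v = velocity - velocity.mean()
    return np.fft.rfftfreq(len(v), d=dt), np.abs(np.fft.rfft(v)) ** 2

# synthetic ionic velocity containing two "phonon" modes (2 and 5 THz)
dt = 1e-3                             # time step in ps
t = np.arange(0, 40, dt)              # 40 ps of trajectory
v = np.cos(2 * np.pi * 2.0 * t) + 0.5 * np.cos(2 * np.pi * 5.0 * t)

freqs, spec = phonon_spectrum(v, dt)
peaks = np.sort(freqs[np.argsort(spec)[-2:]])
print(peaks)                          # recovers the two mode frequencies, in THz
```

In a real calculation the noise level of the trajectory, not the transform, limits how sharply the peaks can be resolved, which is why the accuracy of the conjugate-gradients trajectories matters.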

IX. NONLOCAL PSEUDOPOTENTIALS

The computational speeds of the molecular-dynamics and conjugate-gradients techniques are significantly enhanced by using local pseudopotentials rather than nonlocal pseudopotentials. This allows the number of operations required to multiply each of the wave functions by the Hamiltonian to be reduced from N_PW² to 16N_PW ln(N_PW), where N_PW is the number of plane-wave basis states. However, it is not possible to produce accurate local pseudopotentials for all atoms. To apply molecular-dynamics and conjugate-gradients methods to systems containing atoms that can only be represented by nonlocal pseudopotentials, it is necessary to use an efficient scheme for dealing with the nonlocality of the pseudopotential. Nonlocal pseudopotentials generally require fewer plane-wave basis states than do local pseudopotentials to expand the electronic wave functions. Therefore, although it will require additional computational effort to use nonlocal pseudopotentials in molecular-dynamics and conjugate-gradients calculations, some of the loss in computational speed will be recouped because fewer plane-wave basis states are required. However, it is essential to find an efficient method for using the nonlocal pseudopotentials. The methods that have been used employ only a partial projection of the nonlocal components of the wave functions. Examples of such methods are described in the following two sections. It has long been appreciated that all of these partial-projection methods could be applied in either real space or reciprocal space. The computational cost scales as the cube of the system size using a


reciprocal-space method but only as the square of the system size with the real-space method. As will be seen in Sec. IX.C.1, there are difficulties associated with the real-space projection method that have delayed its implementation. These problems have now been overcome, and Sec. IX.D describes a successful real-space projection method for nonlocal pseudopotentials.
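The speedup a local potential permits comes from applying it pointwise in real space via FFTs rather than as a dense convolution in reciprocal space. A one-dimensional toy sketch (grid size and data are illustrative assumptions) showing that the FFT route reproduces the dense-matrix result:

```python
import numpy as np

n = 512
rng = np.random.default_rng(0)
c_g = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # wave function in G space
v_r = rng.standard_normal(n)                                # local potential on the grid

# Dense route: the full N_PW x N_PW convolution matrix, ~N_PW^2 operations.
v_g = np.fft.fft(v_r) / n
V = np.array([[v_g[(i - j) % n] for j in range(n)] for i in range(n)])
dense = V @ c_g

# FFT route: transform to real space, multiply pointwise, transform back,
# ~N_PW ln(N_PW) operations.
fast = np.fft.fft(v_r * np.fft.ifft(c_g))
print(np.allclose(dense, fast))   # True
```

A nonlocal potential couples different angular momentum components and cannot be reduced to such a pointwise product, which is why an efficient projection scheme is needed.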

A. Kleinman-Bylander potentials

The most general form for a nonlocal pseudopotential is

V_NL = Σ_lm |lm⟩ V_l ⟨lm| ,   (9.1)

where the |lm⟩ are the spherical harmonics Y_lm and V_l is the pseudopotential acting on the component of the wave function that has angular momentum l. Outside the core radius the potentials V_l are identical for all the angular momentum components of the wave function. To implement this form for the nonlocal pseudopotential, one needs a complete projection of the angular momentum components of the wave functions. In contrast, the Kleinman-Bylander pseudopotential (Kleinman and Bylander, 1982; Allan and Teter, 1987) is a norm-conserving pseudopotential that uses a single basis state for each angular momentum component of the wave function. The Kleinman-Bylander pseudopotential has the form

V_KB = V_LOC + Σ_lm |δV_l φ_lm⟩ ⟨φ_lm δV_l| / ⟨φ_lm| δV_l |φ_lm⟩ ,   (9.2)

where V_LOC is a local potential, the φ_lm are the wave functions of the pseudoatom, and δV_l is

δV_l = V_{l,NL} − V_LOC .   (9.3)

Here V_{l,NL} is the l angular momentum component of any nonlocal pseudopotential. Kleinman and Bylander suggested using the arbitrariness of V_LOC to produce an accurate and transferable pseudopotential.

The Kleinman-Bylander pseudopotential projects each spherical harmonic component of the wave function onto a single basis state. When applied to the pseudoatom, the potential gives identical results to the nonlocal pseudopotential it was derived from, independent of the choice of the local potential V_LOC. However, the potential does not produce identical results when applied in another environment, because the wave function is not projected onto a radially complete set of spherical harmonics. Some of the difficulties that can be encountered with this approach have been discussed recently by Gonze et al. (1990). All of the known problems can be overcome by the proper choice of local potential, a simple reduction in the core radius, or the application of the ideas of extended norm conservation (Shirley et al., 1989). It may be necessary, however, to include pseudo core states to achieve a high degree of transferability for certain transition-metal atoms. These improvements all typically require a larger number of plane waves in the basis set.
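The separable structure of the Kleinman-Bylander form is what makes it cheap to apply: each projector contributes one inner product and one vector update. A minimal sketch, with the δV_l weights and normalization assumed to be folded into the projector coefficients (an assumption of this toy, not a statement about any particular code):

```python
import numpy as np

def apply_kb(psi, chi):
    """Apply a separable nonlocal potential V = sum_p |chi_p><chi_p|
    to a wave function expanded in N_PW plane waves.

    psi : (N_PW,) complex coefficients
    chi : (N_proj, N_PW) projector coefficients, one row per (ion, l, m).
    Cost: one dot product plus one update per projector."""
    amps = chi.conj() @ psi     # <chi_p | psi> for every projector
    return chi.T @ amps         # sum_p chi_p <chi_p | psi>

rng = np.random.default_rng(3)
n_pw, n_proj = 200, 8
chi = rng.standard_normal((n_proj, n_pw)) + 1j * rng.standard_normal((n_proj, n_pw))
psi = rng.standard_normal(n_pw) + 1j * rng.standard_normal(n_pw)

v_psi = apply_kb(psi, chi)
# Compare against the dense N_PW x N_PW matrix the separable form factorizes
V = chi.T @ chi.conj()
print(np.allclose(v_psi, V @ psi))   # True
```

The separable application costs of order N_proj × N_PW operations per band, whereas the dense matrix-vector product costs N_PW² regardless of how few projectors there are.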

1. Enhanced projections

The Kleinman-Bylander form of the pseudopotential can be systematically improved by adding more basis functions for the projection of the spherical harmonics (Bloechl, 1990). This allows the accuracy of the nonlocal potential to be checked by plotting the total energy as a function of the number of basis states used for the projection of the spherical harmonics of the wave functions.

2. Computational cost

The contribution to the product of the Hamiltonian and the wave function ψ_i at wave vector k+G for the Kleinman-Bylander pseudopotential is given by

Σ_lm χ_{lm,k+G} Σ_{G'} χ*_{lm,k+G'} c_{i,k+G'} ,   (9.4)

where

χ_{lm,k+G} ∝ Y_lm(Ω_{k+G}) ∫ r² dr j_l(|k+G| r) δV_l(r) φ_l(r) ,   (9.5)

and j_l is the spherical Bessel function of order l. The spherical Bessel function j_l(|k+G|r) gives the amplitude of the l angular momentum component of the plane wave exp[i(k+G)·r] at distance r from the origin.

The contributions to the product of the Hamiltonian with each and every electronic wave function from the nonlocal pseudopotential can be calculated in 3N_B N_PW N_I N_L operations per k point using the Kleinman-Bylander pseudopotential, where N_L is the number of nonlocal spherical harmonic components in the pseudopotential, N_B the number of bands, and N_I the number of ions. 3N_B N_PW N_I N_L operations are required to calculate the contributions to the forces on all the ions from the nonlocal pseudopotential using the Kleinman-Bylander scheme in reciprocal space, and 6N_B N_PW N_I N_L operations are required to compute the diagonal and off-diagonal stresses on the unit cell. Hence the computational cost of all these operations scales as the third power of the number of ions in the unit cell, since N_B and N_PW are proportional to N_I. This computational cost is usually significantly larger than the cost of orthogonalizing the wave functions (N_B² N_PW). Therefore the application of the nonlocal potential in reciprocal space will dominate the computational cost for large systems.

B. Vanderbilt potentials

A rather more radical approach to modifying pseudopotentials for use in plane-wave calculations has been suggested by Vanderbilt (1990). The basic aim with these potentials, in common with the other schemes described in Sec. II.D.1.d, is to allow calculations to be performed


with as low a cutoff energy for the plane-wave basis set as possible. The rationale behind the Vanderbilt potential is that in most cases a high cutoff energy is required for the plane-wave basis set only when there are tightly bound orbitals that have a substantial fraction of their weight inside the core region of the atom. In this case the cutoff

energy for the plane-wave basis set cannot be substantially reduced, because there must be plane-wave components up to a large enough wave vector to allow the majority of the weight of the wave function to be kept within the core. However, if the norm-conservation rule is relaxed, then the resulting wave function can be expanded using a much smaller plane-wave basis set, as shown in Fig. 27. All that is required is that the wave function be projected back to the correct pseudovalence wave function before normalization. Unfortunately, the procedure is rather more complex, because relaxing the norm-conservation condition on the pseudopotential also causes the correct first-order change of the phase shift with energy to be lost. Therefore this scheme also requires an energy-dependent potential to ensure that the correct phase shift is generated over the range of energies of the electrons in the system. Fortunately, this modification can be included at a relatively modest computational cost in any iterative method, although it would be disastrous in a conventional matrix-diagonalization method, since each matrix diagonalization would yield only a single band. Details of the implementation of Vanderbilt potentials can be found in Vanderbilt (1990) and Laasonen et al. (1991).

Although Vanderbilt potentials require lower cutoff energies for the plane-wave basis set than even optimized pseudopotentials, they require more operations to compute the nonlocal components of the pseudopotential at each iteration. It is also possible that iterating to energy self-consistency in the potential may increase the total time required for an energy-minimization calculation or

FIG. 27. Illustration of a pseudo wave function that is strongly peaked inside the core and the modified wave function in Vanderbilt's scheme.

require an even shorter time step in a Car-Parrinello dynamical calculation. If the operations for the nonlocal pseudopotential are carried out in reciprocal space in a large system, where these operations dominate the computational cost, it is not clear that a calculation using Vanderbilt potentials will be any cheaper than a calculation using Kleinman-Bylander pseudopotentials. If the real-space projection technique described in Secs. IX.C and IX.D below is used, so that the operations required to implement the nonlocal pseudopotential no longer dominate the computational cost, then a Vanderbilt potential will be more efficient.

C. Real-space projection; nonuniqueness of spherical harmonic projection

The nonlocality of the pseudopotential extends only over the region occupied by the core of the atom. As the core region is relatively small, it should be possible to deal efficiently with the nonlocality of the pseudopotential by working in real space, since only a small number of operations should be required to project the angular momentum components of each wave function in the core of each atom. Furthermore, the number of operations needed to project the angular momentum components of a single wave function around a single atom in real space will be independent of the size of the system, thus leading to a more efficient scaling than the reciprocal-space projection. If N_P is the number of points in the core of each atom used to project each angular momentum component of a single wave function, then the number of operations required to incorporate a nonlocal pseudopotential using a real-space method is

N_B N_I N_P N_L per k point. The electronic wave functions are routinely transformed to real space in the molecular-dynamics and conjugate-gradients methods. No further operations besides those described above are required to implement a real-space projection of the angular momentum components, provided that the product of the wave function and the nonlocal potential is computed at the point in the calculation where the product of the wave function and the local potential is computed. The number of operations required to calculate the forces on the ions is 3N_B N_I N_P N_L per k point using the real-space projection method, and 6N_B N_I N_P N_L operations are required to compute the diagonal and off-diagonal stresses on the unit cell. However, it may also be necessary to perform an additional FFT to generate the wave functions in real space before performing these operations.

The cost of computing the product of the Hamiltonian and the wave functions, the forces on the atoms, and the stresses on the unit cell using the real-space projection method scales only as the second power of the number of ions in the system. This is in contrast to the cubic scaling of the reciprocal-space projection methods. The reason is that, in a reciprocal-space formulation, computation of the force on an ion requires a sum over all reciprocal-lattice vectors. In contrast, calculation of the force on an


ion using the real-space method requires only operations involving the wave function in the immediate vicinity of the atom.
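The quadratic-versus-cubic comparison can be made concrete with a leading-order operation-count model; the per-ion constants below are illustrative assumptions, not values from the article, and constant prefactors are dropped.

```python
def nonlocal_op_counts(n_ions, pw_per_ion=300, core_pts=250, n_l=4, bands_per_ion=4):
    """Leading-order operation counts (per k point) for applying the
    nonlocal potential to every band.
    Reciprocal space: N_B * N_PW * N_I * N_L, cubic in N_I because N_B
    and N_PW both grow linearly with N_I.
    Real space: N_B * N_I * N_P * N_L, quadratic in N_I because N_P
    (grid points inside one core) is independent of system size."""
    n_b, n_pw = bands_per_ion * n_ions, pw_per_ion * n_ions
    return n_b * n_pw * n_ions * n_l, n_b * n_ions * core_pts * n_l

for n in (10, 100, 1000):
    recip, real = nonlocal_op_counts(n)
    print(f"N_I = {n:4d}   reciprocal ~ {recip:.1e}   real space ~ {real:.1e}")
```

Multiplying the ion count by ten multiplies the reciprocal-space count by a thousand but the real-space count only by a hundred, which is the crossover argument made in the text.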

As we shall see shortly, the real-space projection method for nonlocal pseudopotentials is not simple to implement using a representation of the wave function on the existing Fourier-transform grid, because of its coarseness. Calculation of the wave functions at a specialized dense set of points near each atom (ruling out the FFT) results in the same scaling as the reciprocal-space methods. Alternatively, a fast Fourier transform of the wave functions onto a denser real-space grid preserves the favorable scaling of the real-space methods. However, a scheme that allowed computation on the original grid would be most efficient. Such a scheme exists and will be described below.

If fast-Fourier-transform techniques are used to transform the electronic wave functions between real and reciprocal space, the values of the electronic wave functions in real space are known only on a grid of points. The incomplete knowledge of the wave function in real space introduces an ambiguity between the angular and radial dependences of the wave function. This can be illustrated with a simple example, as shown in Fig. 28. Consider a real-space Fourier-transform grid that has equal distances between grid points in the x and y directions but a distance between grid points in the z direction that is 50% greater. Suppose an atom is positioned at

one of the grid points and the value of the wave function is 0 at the position of the atom, 1.0 at the adjacent grid points in the ±x and ±y directions, and 2.25 at the adjacent grid points in the ±z directions. The wave function at these points could be an "s" wave function equal to 1.0(r/a)², where r is the radial coordinate and a is the distance between grid points in the x direction. However, the values of the wave function at these points can also be fit by the sum of an "s" wave function equal to (7/6)(r/a) and a "d_z²" wave function equal to (1/6)(r/a)[3cos²θ − 1], where θ is the angle between the radius vector and the z axis. The distinction between these two wave functions can only be made by considering the value of the wave function at points lying between the Fourier-transform grid points. However, it is computationally expensive to calculate the value of the wave function at arbitrary points of a Fourier-transform grid. If the real-space projections cannot be performed from the Fourier-transform grid, then the real-space method will be too costly to implement directly.
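The ambiguity can be checked numerically: two candidate fits agree at every grid point yet differ between them. The radial forms and coefficients below were solved directly from the grid values stated above (0 at the atom, 1.0 at ±x and ±y, 2.25 at ±z with the 50% larger z spacing), so they are a reconstruction for illustration.

```python
import numpy as np

# The seven grid points: origin, +/-x and +/-y at distance a, +/-z at 1.5a
a = 1.0
pts = np.array([[0, 0, 0], [a, 0, 0], [-a, 0, 0], [0, a, 0], [0, -a, 0],
                [0, 0, 1.5 * a], [0, 0, -1.5 * a]], dtype=float)

def s_quadratic(p):
    # pure "s" fit: (r/a)^2
    return (np.linalg.norm(p) / a) ** 2

def s_plus_d(p):
    # mixed fit: (7/6)(r/a) "s" plus (1/6)(r/a)(3 cos^2(theta) - 1) "d_z2";
    # the coefficients were solved from the same seven grid values
    r = np.linalg.norm(p)
    if r == 0.0:
        return 0.0
    cos2 = (p[2] / r) ** 2
    return (7 / 6) * (r / a) + (1 / 6) * (r / a) * (3 * cos2 - 1)

on_grid = [(s_quadratic(p), s_plus_d(p)) for p in pts]
print(all(np.isclose(u, v) for u, v in on_grid))    # True: indistinguishable
off = np.array([0.5 * a, 0.0, 0.0])                 # halfway between grid points
print(np.isclose(s_quadratic(off), s_plus_d(off)))  # False: they differ there
```

The two functions carry entirely different angular momentum content, yet no measurement restricted to the grid points can tell them apart.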

The difficulty highlighted above arises because the discrete Fourier-transform grid is poorly suited to projecting the angular momentum components of the wave functions. The problem of ambiguity between the radial and angular dependences of the wave functions becomes even more difficult to resolve if the atom is not positioned at a grid point or halfway between grid points, if the distances between the grid points are different along all three of the cell axes, or if the unit-cell axes are not perpendicular to each other.

D. Potentials for real-space projection

FIG. 28. Illustration of the nonuniqueness of the spherical harmonic projection, as discussed in the text. Top: an s state equal to 1.0(r/a)². Bottom: a mixed state equal to (7/6)(r/a) + (1/6)(r/a)(3cos²θ − 1). The grey discs represent the ion positions.

Some of the problems outlined above become somewhat less severe if, instead of insisting on a complete projection of the spherical harmonic components of the wave functions around each ion, one performs only a partial projection, as in the Kleinman-Bylander scheme. However, the scheme outlined below can be applied to any number of projections and hence allows the spherical harmonic components of the wave functions to be projected to any degree of accuracy (as in the enhanced-projection schemes mentioned above). Let us consider Kleinman-Bylander projectors for the present.

An alternative way to understand the difficulties associated with the real-space projection technique, and thus identify a possible solution, is to consider the properties of the Fourier-transform grid. If a double-density Fourier-transform grid (see Sec. III.K.1) is used in the calculation, the reciprocal-space Fourier-transform grid is of length 4G_max along each reciprocal-lattice vector, where G_max is the cutoff wave vector for the electronic wave functions. On the corresponding real-space Fourier-transform grid, the phase of a plane wave of wave vector 4G_max changes by 2π between adjacent grid points, and so the values of this plane wave on the grid points are indistinguishable from those generated by a plane

Rev. Mod. Phys. , Vol. 64, No. 4, October 1992

Page 50: Iterative minimization techniques ab calculations: conjugate · 2019-03-02 · 1046 M. C. Payne et a/.: Abinitio iterative minimization techniques Choice of Initial Wave Functions

1094 M. C. Payne et al. : Abinitio iterative minimization techniques

wave with wave vector G = 0 or any other integer multiple of 4G_max. This is a general result; any function defined only on the real-space grid points is unchanged if any of its Fourier components are changed by any multiple of the wave vector 4G_max. This is the origin of the so-called "wraparound error" of discrete representations of Fourier transforms. The use of a double-density Fourier-transform grid eliminates the wraparound error in all the parts of a pseudopotential calculation considered so far. Unfortunately, this is not true in the case of a real-space projection of the nonlocal Kleinman-Bylander projectors, because these projectors must have Fourier components at all wave vectors if they are to be strictly localized in the core of the atom. If the Fourier-transform grid moves with respect to a fixed ionic and electronic configuration, there will be interference between the components of the Kleinman-Bylander projectors at wave vectors G + 4nG_max (n an integer), and the value of the Kleinman-Bylander matrix elements will vary. For a fixed ionic and electronic configuration, these matrix elements must be independent of the origin of the Fourier-transform grid for their values to be physically meaningful, and without a solution of this problem the real-space projection technique cannot be applied.

It should be remembered that the local part of the pseudopotential will also have components at large wave vectors. However, the real-space representation of this potential includes only wave vectors up to 2G_max, since this representation is generated by Fourier-transforming the potential from reciprocal space, where only components of the potential up to wave vector 2G_max are represented on the double-density Fourier-transform grid. This analogy provides an immediate solution to the problem associated with the real-space Kleinman-Bylander projectors. Rather than using the actual real-space Kleinman-Bylander projectors, we should use projectors that have been Fourier filtered so that they do not contain any components at wave vectors larger than a wave vector G_γ, which must be less than 4G_max. If this is the case, these modified projectors will not be subject to any wraparound error.

The cost of forcing the high Fourier components of the modified Kleinman-Bylander projectors to be strictly zero at wave vectors above G_γ is that in real space the projectors are no longer localized in the core, but are nonzero over the whole of the real-space grid. Obviously, if the projectors extend over the whole of the real-space grid, there is nothing to be gained from the real-space projection technique. The trick is to use the arbitrariness of the modified Kleinman-Bylander projectors over the range of wave vectors G_max < G < G_γ to ensure that the magnitude of the modified projectors is negligible at distances greater than roughly 2r_c from the ion core, where r_c is the core radius of the pseudopotential. Of course, the Fourier components of the projectors at wave vectors less than G_max must not be changed, or the real- and reciprocal-space Kleinman-Bylander projections will not be identical. The Kleinman-Bylander projections are carried out only over the grid points where the projectors are non-negligible, thus restoring the favorable scaling of the real-space projection technique. The procedure for generating the modified projectors is detailed in King-Smith et al. (1991). The test of the generation technique is simple: the modified real-space projectors must yield identical results to the reciprocal-space Kleinman-Bylander projectors, irrespective of the relative positions of the ion and the Fourier-transform grid, a condition that the old, unmodified real-space Kleinman-Bylander projectors failed to meet.
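The effect of Fourier filtering a projector can be sketched in one dimension. The toy below applies a hard cutoff at an illustrative G_γ, rather than the optimized construction of King-Smith et al.; it shows the trade the text describes, namely a strictly band-limited (and hence wraparound-safe) projector at the price of a small real-space tail outside the core.

```python
import numpy as np

n = 256                 # real-space grid points (1D illustration)
L = 20.0                # box length; the "core" sits at the center
x = np.linspace(0, L, n, endpoint=False)
chi = np.exp(-((x - L / 2) ** 2))         # localized "projector" in the core

g = np.fft.fftfreq(n, d=L / n) * 2 * np.pi  # wave vectors of the grid
g_gamma = 6.0                               # filter cutoff (illustrative value)

chi_g = np.fft.fft(chi)
chi_g[np.abs(g) > g_gamma] = 0.0            # discard components above G_gamma
chi_filtered = np.fft.ifft(chi_g).real

# Band limiting makes the projector wraparound-safe on the grid; the cost
# is a small ripple that leaks outside the core region.
tail = np.abs(chi_filtered[np.abs(x - L / 2) > 4.0]).max()
print(tail < 1e-2)      # True: the leaked tail is small, though not zero
```

The King-Smith et al. construction goes further: it uses the free components between G_max and G_γ to push this residual tail down to a negligible level beyond about 2r_c, so the projection can be truncated to the grid points near the ion.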

X. PARALLEL COMPUTERS

The series of algorithmic improvements described in this article yields a method for performing total-energy pseudopotential calculations whose computational time requirements scale essentially as a constant times the theoretical minimum number of operations required to update all the wave functions in a single iteration. (This can never be reduced below the cost of orthogonalizing the wave functions without introducing an error.) The value of this constant depends on the system but always lies between several tens and several hundreds. Although there may be some possibility of reducing this constant, it is clear that, since this constant can never be less than one and is more likely to be of the order of ten, there are ultimate limits to the gains to be achieved by improvements in the numerical methods. There is certainly no longer the possibility of increases in computational speed by many orders of magnitude, of the sort that have been gained in the last few years, without fundamentally changing the essential features of the total-energy pseudopotential method. This does not, of course, exclude the possibility of a completely different method proving to be more efficient.

The developments in the total-energy pseudopotential method allow calculations to be performed on any reasonably powerful computer (anything from a modern workstation to a conventional supercomputer) for quite large systems containing up to 150 atoms in the unit cell, provided that the pseudopotentials are moderately weak. However, studies of significantly larger systems demand much more powerful computers, which combine faster processing speeds with extremely large amounts of memory. Without a fundamental change in computer technology (the speed of each processor of a conventional supercomputer has changed by significantly less than one order of magnitude in the last decade), these requirements can only be fulfilled by combining a number of processors into a "parallel" computer. In principle, by combining enough compute nodes, parallel computers can be constructed that achieve arbitrarily large numbers of operations per second. No parallel computer will achieve 100% efficiency on a real computation, since there are overheads associated with communication between the processors, and achievement

Rev. Mod. Phys., Vol. 64, No. 4, October 1992


M. C. Payne et al.: Ab initio iterative minimization techniques 1095

of a significant fraction of full efficiency is quite difficult on any but the most trivial of tasks. A relatively low efficiency is not too great a cause for concern, as long as it is maintained as the number of compute nodes is increased. In this case, any required computational speed is achievable with a large enough number of processors, even if this computational speed is significantly less than the theoretical speed of that number of processors. The real problem, however, is that in all applications the efficiency falls steadily above a critical number of processors. This effect is associated with the parts of the code that cannot be run in parallel. At the critical number of processors this part of the calculation starts to dominate the computational time, and no further reduction in computational time is possible by increasing the number of processors. In this regime the efficiency varies as the inverse of the number of compute nodes! It is clear that, for any calculation to be run efficiently and "scaleably" (i.e., so that computational time decreases with the number of processors) on a parallel machine containing n processors, not more than 1/n of the entire calculation may run sequentially.
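The falloff described here is the content of Amdahl's law: if a fraction s of the work is inherently sequential, the speedup on n processors is bounded by 1/(s + (1-s)/n), so beyond n ~ 1/s the serial part dominates and efficiency decays as 1/n. A minimal sketch:

```python
def speedup(n_proc, serial_frac):
    """Amdahl's law: a fraction `serial_frac` of the work runs
    sequentially; the rest is spread evenly over n_proc processors."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_proc)

def efficiency(n_proc, serial_frac):
    """Fraction of the theoretical n_proc-fold speedup actually achieved."""
    return speedup(n_proc, serial_frac) / n_proc

# With a 1% serial fraction the speedup saturates near 1/0.01 = 100:
# once n_proc >> 100, doubling the processor count roughly halves the
# efficiency, the inverse-of-node-count regime described in the text.
```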

There are many possible architectures for parallel machines. Each tends to have its own strengths and weaknesses and, more importantly, its own suitability for any particular computation. Interestingly, total-energy pseudopotential calculations have been successfully implemented on two very different classes of parallel machine. One, the Connection Machine, consists of an extremely large number of relatively modest-performance compute nodes. The other class of machine consists of a smaller number of extremely powerful compute nodes; examples of this latter class are the machines manufactured by Intel, Meiko, and N-Cube. Although the strategy for implementing the codes is the same on both classes of machine, the detailed methods required to implement the codes are rather different. The Connection Machine is programmed using a standard high-level computer language, Fortran 90. When the total-energy pseudopotential calculation is expressed using vector-oriented Fortran 90 statements, parallel execution is implemented by the compiler. The programmer is not required explicitly to implement communications among the thousands of processors constituting the fine-grained, massively parallel architecture. To further accelerate processing, the vendor supplies a library of hand-microcoded FFT subroutines, which are directly callable from Fortran 90 programs. The combination of programming in Fortran 90 and the use of the distributed machine FFT makes it possible for most of the total-energy calculation to be performed in parallel.

In the case of the other class of machine, there is normally no fully distributed three-dimensional FFT, and a major task in implementing total-energy pseudopotential codes on these machines is the implementation of the communications required to perform a distributed FFT. A full description of the implementation of a set of pseudopotential codes on this class of machine can be found in Clarke et al. (1992).

The potential for pseudopotential calculations on parallel computers has been demonstrated by the successful calculations of the surface energy and relaxed structure of the 7 x 7 Takayanagi reconstruction (Takayanagi et al., 1985) of the (111) surface of silicon on a 64-node Meiko machine (Stich et al., 1992) and on a Connection Machine (Brommer et al., 1992). The supercells used for these calculations contained 400 atoms and had a volume of 784 times the atomic volume. Basis sets of up to 35 000 plane waves were used to expand the electronic wave functions, and the electronic minimization involved up to 2.8 x 10^7 degrees of freedom.
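The communications required for such a distributed FFT typically take the form of a transpose-based ("slab") decomposition: each node transforms the two axes it holds locally, a global transpose (an all-to-all exchange) makes the remaining axis local, and the final one-dimensional transforms then need no communication. A serial NumPy sketch of the data movement (the node count and slab layout are illustrative, not the scheme of any particular code):

```python
import numpy as np

def slab_fft3(data, n_nodes):
    """Transpose-based 3D FFT over a slab decomposition, simulated
    serially.  `data` has shape (n, n, n); each of the n_nodes
    hypothetical nodes owns n // n_nodes planes along axis 0.
    """
    n = data.shape[0]
    assert n % n_nodes == 0
    slabs = np.split(data, n_nodes, axis=0)
    # Step 1: each node transforms the two axes it holds completely.
    slabs = [np.fft.fftn(s, axes=(1, 2)) for s in slabs]
    # Step 2: global transpose (the all-to-all communication step)
    # makes axis 0 local, so the remaining 1D transforms need no
    # further communication.
    full = np.concatenate(slabs, axis=0).transpose(1, 0, 2)
    slabs = np.split(full, n_nodes, axis=0)
    # Step 3: transform the remaining axis, now axis 1 of each slab.
    slabs = [np.fft.fft(s, axis=1) for s in slabs]
    # Undo the transpose so the result is indexed [kx, ky, kz].
    return np.concatenate(slabs, axis=0).transpose(1, 0, 2)
```

The result matches a direct three-dimensional transform; on a real machine the concatenate/transpose/split step is the expensive all-to-all exchange whose implementation dominates the porting effort.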

XI. CONCLUDING REMARKS

Car and Parrinello's molecular-dynamics method stands as a landmark in the field of total-energy pseudopotential calculations. Iterative matrix diagonalization techniques were in use before the molecular-dynamics method was developed, but these schemes only partially exploited the benefits to be gained by the use of iterative techniques. Car and Parrinello's technique exploited the advantages of overlapping the processes of calculating eigenstates and iterating to self-consistency, and the use of fast-Fourier-transform techniques. Efficient schemes for using nonlocal pseudopotentials were only implemented as a result of the molecular-dynamics method. The combination of all of these features, rather than just the replacement of a conventional matrix diagonalization scheme by an iterative scheme, is responsible for the significant increase in the power of the total-energy pseudopotential technique brought about by the molecular-dynamics method. Any iterative matrix diagonalization technique must exploit all of these features to be able to challenge the molecular-dynamics method. Car and Parrinello's method did not change the basic total-energy pseudopotential technique but offered an enormous increase in computational efficiency, so that much larger and more complex systems became accessible to the technique. It also allowed the first ab initio dynamical simulations to be performed. Only with the reduction in the scaling of computational time with system size that comes from Car and Parrinello's molecular-dynamics method and from conjugate-gradients techniques is it worthwhile performing calculations for large systems on parallel computers.

As mentioned previously, there is now only limited scope for improvements in the algorithms used to perform total-energy pseudopotential calculations. However, there are a number of areas where progress can still be made. At present a definitive scheme for generating pseudopotentials that are fully transferable and computationally efficient is still lacking. There is considerable scope for improved density functionals. Perhaps the most ambitious objective is to couple quantum-mechanical modeling of small, critical regions of a system (such as a dislocation core) with a less rigorous modeling of the noncritical regions (which could be modeled using classical elasticity theory).

Even if all else is forgotten, the authors would like the reader to retain just one idea. This is that ab initio quantum-mechanical modeling using the total-energy pseudopotential technique is now capable of addressing an extremely large range of problems in a wide range of scientific disciplines.

ACKNOWLEDGMENTS

The authors wish to thank Y. Bar-Yam, E. Kaxiras, M. Needels, K. M. Rabe, E. Tarnow, and D. H. Vanderbilt for help and advice in developing, applying, and understanding the molecular-dynamics and conjugate-gradients methods. MCP wishes to thank Volker Heine for many insightful discussions on the pseudopotential approach and R. J. Needs, R. W. Godby, and C. M. M. Nex for helpful discussions on iterative matrix diagonalization techniques.

REFERENCES

Allan, D. C., and M. P. Teter, 1987, Phys. Rev. Lett. 59, 1136.
Allen, M. P., and D. J. Tildesley, 1987, Computer Simulation of Liquids (Clarendon, Oxford).
Arias, T., J. D. Joannopoulos, and M. C. Payne, 1991, Phys. Rev. B, in press.
Ashcroft, N. W., and N. D. Mermin, 1976, Solid State Physics (Holt Saunders, Philadelphia), p. 113.
Bachelet, G. B., D. R. Hamann, and M. Schlüter, 1982, Phys. Rev. B 26, 4199.
Baraff, G. A., and M. A. Schlüter, 1979, Phys. Rev. B 19, 4965.
Bar-Yam, Y., S. T. Pantelides, and J. D. Joannopoulos, 1989, Phys. Rev. B 39, 3396.
Benedek, R., L. H. Yang, C. Woodward, and B. I. Min, 1991, unpublished.
Bloechl, P. E., 1990, Phys. Rev. B 41, 5414.
Bloechl, P. E., and M. Parrinello, 1991, unpublished.
Brommer, K., M. Needels, B. Larson, and J. D. Joannopoulos, 1992, Phys. Rev. Lett. 68, 1355.
Broughton, J., and F. Khan, 1989, Phys. Rev. B 40, 12098.
Car, R., and M. Parrinello, 1985, Phys. Rev. Lett. 55, 2471.
Car, R., and M. Parrinello, 1989, in Simple Molecular Systems at Very High Density, edited by A. Polian, P. Lebouyre, and N. Boccara (Plenum, New York), p. 455.
Car, R., M. Parrinello, and M. C. Payne, 1991, J. Phys.: Condens. Matter 3, 9539.
Chadi, D. J., and M. L. Cohen, 1973, Phys. Rev. B 8, 5747.
Cho, K., and J. D. Joannopoulos, 1992, Phys. Rev. A 45, 7089.
Clarke, L. J., I. Stich, and M. C. Payne, 1992, unpublished.
Cohen, M. L., 1984, Phys. Rep. 110, 293.
Cohen, M. L., and V. Heine, 1970, Solid State Physics Vol. 24, p. 37.
Denteneer, P., and W. van Haeringen, 1985, J. Phys. C 18, 4127.
Dolling, G., 1963, in Inelastic Scattering of Neutrons in Solids and Liquids Vol. II (International Atomic Energy Agency, Vienna), p. 37.
Drabold, D. A., P. A. Fedders, Otto F. Sankey, and John D. Dow, 1990, Phys. Rev. B 42, 5135.
Dreizler, R. M., and J. da Providencia, 1985, Density Functional Methods in Physics (Plenum, New York).
Evarestov, R. A., and V. P. Smirnov, 1983, Phys. Status Solidi 119, 9.
Ewald, P. P., 1917a, Ann. Phys. (Leipzig) 54, 519.
Ewald, P. P., 1917b, Ann. Phys. (Leipzig) 54, 557.
Ewald, P. P., 1921, Ann. Phys. (Leipzig) 64, 253.
Fahy, S., X. W. Wang, and S. G. Louie, 1988, Phys. Rev. Lett. 61, 1631.
Fernando, G. W., Guo-Xin Qian, M. Weinert, and J. W. Davenport, 1989, Phys. Rev. B 40, 7985.
Fetter, A. L., and J. D. Walecka, 1971, Quantum Theory of Many-Particle Systems (McGraw-Hill, New York), p. 29.
Feynman, R. P., 1939, Phys. Rev. 56, 340.
Francis, G. P., and M. C. Payne, 1990, J. Phys.: Condens. Matter 17, 1643.
Froyen, S., and M. L. Cohen, 1986, J. Phys. C 19, 2623.
Gill, P. E., W. Murray, and M. H. Wright, 1981, Practical Optimization (Academic, London).
Gillan, M. J., 1989, J. Phys.: Condens. Matter 1, 689.
Godby, R. W., M. Schlüter, and L. J. Sham, 1986, Phys. Rev. Lett. 56, 2415.
Gomes Dacosta, P., O. H. Nielsen, and K. Kunc, 1986, J. Phys. C 19, 3163.
Gonze, X., P. Käckell, and M. Scheffler, 1990, Phys. Rev. B 42, 12264.
Gunnarsson, O., and B. I. Lundqvist, 1976, Phys. Rev. B 13, 4174.
Hamann, D. R., 1989, Phys. Rev. B 40, 2980.
Hamann, D. R., M. Schlüter, and C. Chiang, 1979, Phys. Rev. Lett. 43, 1494.
Harris, J., and R. O. Jones, 1974, J. Phys. F 4, 1170.
Hedin, L., and B. Lundqvist, 1971, J. Phys. C 4, 2064.
Hellmann, H., 1937, Einführung in die Quantenchemie (Deuticke, Leipzig).
Hohenberg, P., and W. Kohn, 1964, Phys. Rev. 136, B864.
Hybertsen, M. S., and S. G. Louie, 1985, Phys. Rev. Lett. 55, 1418.
Ihm, J., A. Zunger, and M. L. Cohen, 1979, J. Phys. C 12, 4409.
Janak, J. F., 1978, Phys. Rev. B 18, 7165.
Joannopoulos, J. D., 1985, in Physics of Disordered Materials, edited by D. Adler, H. Fritzsche, and S. R. Ovshinsky (Plenum, New York), p. 19.
Joannopoulos, J. D., P. Bash, and A. Rappe, 1991, Chemical Design Automation News 6, No. 8.
Joannopoulos, J. D., and M. L. Cohen, 1973, J. Phys. C 6, 1572.
Joannopoulos, J. D., T. Starkloff, and M. A. Kastner, 1977, Phys. Rev. Lett. 38, 660.
Jones, 1991, Phys. Rev. Lett. 67, 224.
Jones, R. O., and O. Gunnarsson, 1989, Rev. Mod. Phys. 61, 689.
Kerker, G., 1980, J. Phys. C 13, L189.
King-Smith, R. D., M. C. Payne, and J-S. Lin, 1991, Phys. Rev. B 44, 13063.
Kirkpatrick, S., C. Gelatt, Jr., and M. Vecchi, 1983, Science 220, 671.
Kleinman, L., and D. M. Bylander, 1982, Phys. Rev. Lett. 48, 1425.
Kohn, W., and L. J. Sham, 1965, Phys. Rev. 140, A1133.
Kryachko, E. S., and E. V. Ludena, 1990, Energy Density Functional Theory of Many Electron Systems (Kluwer Academic, Boston).
Laasonen, K., R. Car, C. Lee, and D. Vanderbilt, 1991, Phys. Rev. B 43, 6796.
Langreth, D. C., and M. J. Mehl, 1981, Phys. Rev. Lett. 47, 446.
Langreth, D. C., and M. J. Mehl, 1983, Phys. Rev. B 28, 1809.
Langreth, D. C., and J. P. Perdew, 1977, Phys. Rev. B 15, 2884.
Li, X. P., D. M. Ceperley, and Richard M. Martin, 1991, Phys. Rev. B 44, 10929.
Lin, C. C., and L. A. Segel, 1974, Mathematics Applied to Deterministic Problems in the Natural Sciences (Macmillan, New York), p. 161.
Mathews, J., and R. L. Walker, 1970, Mathematical Methods of Physics (Benjamin, New York), p. 355.
Mermin, N. D., 1965, Phys. Rev. 137, A1441.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, 1953, J. Chem. Phys. 21, 1087.
Monkhorst, H. J., and J. D. Pack, 1976, Phys. Rev. B 13, 5188.
Nilsson, G., and G. Nelin, 1972, Phys. Rev. B 6, 3777.
Nosé, S., 1984, J. Chem. Phys. 81, 511.
Pastore, G., E. Smargiassi, and F. Buda, 1991, Phys. Rev. A 44, 6334.
Payne, M. C., 1989, J. Phys.: Condens. Matter 1, 2199.
Payne, M. C., J. D. Joannopoulos, D. C. Allan, M. P. Teter, and D. H. Vanderbilt, 1986, Phys. Rev. Lett. 56, 2656.
Payne, M. C., M. Needels, and J. D. Joannopoulos, 1988, Phys. Rev. B 37, 8138.
Pederson, M. R., and K. A. Jackson, 1991, Phys. Rev. B 43, 7312.
Perdew, J. P., and A. Zunger, 1981, Phys. Rev. B 23, 5048.
Perdew, J. P., Robert G. Parr, Mel Levy, and Jose L. Balduz, Jr., 1982, Phys. Rev. Lett. 49, 1691.
Phillips, J. C., 1958, Phys. Rev. 112, 685.
Pickett, W., 1989, Comput. Phys. Rep. 9, 115.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, 1989, Numerical Recipes: The Art of Scientific Computing (Cambridge University, Cambridge, England), p. 431.
Pulay, P., 1969, Mol. Phys. 17, 197.
Qian, Guo-Xin, M. Weinert, G. W. Fernando, and J. W. Davenport, 1990, Phys. Rev. Lett. 64, 1146.
Rabe, K., and J. D. Joannopoulos, 1987, Phys. Rev. B 36, 6631.
Rappe, A., and J. D. Joannopoulos, 1991, in Computer Simulation in Materials Science, edited by M. Meyer and V. Pontikis, NATO ASI Vol. 205, p. 409.
Rappe, A., K. Rabe, E. Kaxiras, and J. D. Joannopoulos, 1990, Phys. Rev. B 41, 1227.
Redondo, A., W. A. Goddard III, and T. C. McGill, 1977, Phys. Rev. B 15, 5038.
Remler, D. K., and P. A. Madden, 1990, Mol. Phys. 70, 921.
Robertson, I. J., and M. C. Payne, 1990, J. Phys.: Condens. Matter 2, 9837.
Robertson, I. J., and M. C. Payne, 1991, J. Phys.: Condens. Matter 3, 8841.
Scheffler, M., J. P. Vigneron, and G. B. Bachelet, 1985, Phys. Rev. B 31, 6541.
Shirley, E. L., D. C. Allan, R. M. Martin, and J. D. Joannopoulos, 1989, Phys. Rev. B 40, 3652.
Starkloff, T., and J. D. Joannopoulos, 1977, Phys. Rev. B 16, 5212.
Stich, I., R. Car, M. Parrinello, and S. Baroni, 1989, Phys. Rev. B 39, 4997.
Stich, I., M. C. Payne, R. D. King-Smith, J-S. Lin, and L. J. Clarke, 1992, Phys. Rev. Lett. 68, 1351.
Takayanagi, K., Y. Tanishiro, S. Takahashi, and M. Takahashi, 1985, Surf. Sci. 164, 367.
Teter, M. P., M. C. Payne, and D. C. Allan, 1989, Phys. Rev. B 40, 12255.
Toxvaerd, Søren, 1991, Mol. Phys. 72, 159.
Troullier, N., and J. L. Martins, 1991, Phys. Rev. B 43, 1993.
Vanderbilt, D., 1987, Phys. Rev. Lett. 59, 1456.
Vanderbilt, D., 1990, Phys. Rev. B 41, 7892.
Verlet, L., 1967, Phys. Rev. 159, 98.
von Barth, U., 1984, in Many Body Phenomena at Surfaces, edited by D. Langreth and H. Suhl (Academic, New York), p. 3.
Vosko, S. H., L. Wilk, and M. Nusair, 1980, Can. J. Phys. 58, 1200.
Wigner, E. P., 1938, Trans. Faraday Soc. 34, 678.
Williams, A., and J. Soler, 1987, Bull. Am. Phys. Soc. 32, 562.
Woodward, C., B. I. Min, R. Benedek, and J. Garner, 1989, Phys. Rev. B 39, 4853.
Yin, M. T., and M. L. Cohen, 1982a, Phys. Rev. B 25, 7403.
Yin, M. T., and M. L. Cohen, 1982b, Phys. Rev. B 26, 5668.
Zhang, Q.-M., G. Chiarotti, A. Selloni, R. Car, and M. Parrinello, 1990, Phys. Rev. B 42, 5071.
Zunger, A., and M. L. Cohen, 1979, Phys. Rev. B 20, 4082.
