A tutorial on: Numerical methods for electronic structure calculations
Yousef Saad, University of Minnesota, Computer Science and Engineering
IMA - Minneapolis, July 30th, 2007
Outline
Part 1
• Introduction, the electronic structure problem
• The Kohn-Sham equations: a nonlinear eigenvalue problem
• Planewaves and real-space discretization
• Algorithms for the eigenvalue problem: tools and general projection techniques
Part 2
• Algorithms for the eigenvalue problem: Subspace iteration and Krylov subspaces
• Preconditioning, Davidson and Jacobi-Davidson algorithms
• The nonlinear viewpoint: Nonlinear subspace iteration
• Avoiding eigenvalue calculations and Order n methods
General preliminary comments
II Ingredients of an effective numerical simulation:
Physical Model + Approximations + Efficient Algorithms + High-performance Computers = Effective Scientific Simulation
Most of the gains in speed combine advances from all
3 areas: simplifications from physics, effective numerical
algorithms, and powerful hardware+software tools.
II More than ever a successful physical simulation must be
cross-disciplinary.
II In particular, computational codes have become too
complex to be handled by ’one-dimensional’ teams.
II This tutorial: Algorithms – mostly diagonalization.
II Will illustrate the above with experience of cross-disciplinary collaboration
The electronic structure problem
Schrödinger's equation
HΨ = EΨ
II The Hamiltonian in its original form is very complex:
H = −(ℏ²/2m) Σ_i ∇²_{r_i} + Σ_{i,j} e²/|r_i − r_j| − Σ_i Σ_k Z_k e²/|r_i − R_k| − (ℏ²/2M) Σ_k ∇²_{R_k} + Σ_{k,l} e²/|R_k − R_l|
II The wavefunction Ψ is a function of the coordinates of all particles (+ electron spin). The problem is way too hard to solve.
Approximations/theories used
II Born-Oppenheimer approximation: Neglects motion of
nuclei [much heavier than electrons]
II Density Functional Theory: observable quantities are
uniquely determined by ground state charge density.
Kohn-Sham equation:
[ −∇²/2 + Vion + ∫ ρ(r′)/|r − r′| dr′ + δExc/δρ ] Ψ = EΨ
Effective Hamiltonian is of the form
−∇²/2 + Vion + VH + Vxc
The three potential terms
II Hartree Potential VH = solution of Poisson equation:
∇2VH = −4πρ(r)
where
ρ(r) = Σ_{i=1}^{occup} |ψi(r)|²
II Solve by CG or FFT once a distribution ρ is known.
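The FFT route, sketched in NumPy (a minimal sketch assuming a periodic cubic grid; the function name and conventions are illustrative, not PARSEC routines):

```python
import numpy as np

def hartree_potential_fft(rho, box_length):
    # Solve  del^2 V_H = -4*pi*rho  on a periodic cubic grid.
    # In Fourier space:  -|G|^2 V_hat = -4*pi*rho_hat.
    n = rho.shape[0]
    g = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    gx, gy, gz = np.meshgrid(g, g, g, indexing="ij")
    g2 = gx**2 + gy**2 + gz**2
    rho_hat = np.fft.fftn(rho)
    v_hat = np.zeros_like(rho_hat)
    nonzero = g2 > 0.0
    v_hat[nonzero] = 4.0 * np.pi * rho_hat[nonzero] / g2[nonzero]
    # G = 0 component set to zero (neutralizing-background convention)
    return np.fft.ifftn(v_hat).real
```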
II Vxc (exchange & correlation) is approximated by a potential induced by a local density [LDA]. Valid for slowly varying ρ(r).
The ionic potential: pseudopotentials
Potential Vion is handled by pseudopotentials: replace
effect of core (inner shell) electrons of the system by a
smooth effective potential (Pseudopotential).
II Must be smooth and replicate the same physical effects
as those of all-electron potential.
II Vion is the sum of a local term and of low-rank projectors
for each atom.
Vion = Vloc + Σ_a Pa
Roughly speaking, in matrix form: Pa = Σ_{lm} U_{lm} U_{lm}^T
II A small-rank matrix localized around each atom.
II In the end:
[ −∇²/2 + Vion + VH + Vxc ] Ψ(r) = EΨ(r)
With
• Hartree potential (local)
∇2VH = −4πρ(r)
• Vxc depends on functional. For LDA:
Vxc = f(ρ(r))
• Vion = nonlocal – does not explicitly depend on ρ:
Vion = Vloc + Σ_a Pa
• VH and Vxc depend nonlinearly on the eigenvectors:
ρ(r) = Σ_{i=1}^{occup} |ψi(r)|²
Self-Consistency
II The potentials and/or charge densities must be self-consistent: this can be viewed as a nonlinear eigenvalue problem, and can be solved from different viewpoints:
• Nonlinear eigenvalue problem: linearize + iterate to self-consistency
• Nonlinear optimization: minimize energy [again linearize + achieve self-consistency]
The two viewpoints are more or less equivalent.
II Preferred approach: Broyden-type quasi-Newton technique
II Typically, a small number of iterations is required
Self-Consistent Iteration
1. Initial guess for V: V = Vat
2. Solve (−½∇² + V) ψi = εi ψi
3. Calculate new ρ(r) = Σ_i^occ |ψi|²
4. Find new VH: −∇² VH = 4πρ(r)
5. Find new Vxc = f[ρ(r)]
6. Vnew = Vion + VH + Vxc + 'Mixing'
7. If |Vnew − V| < tol, stop; else set V = Vnew and go to step 2
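A minimal, self-contained 1D sketch of this loop (a toy model with made-up potentials, shown only to illustrate the control flow, not the PARSEC discretization):

```python
import numpy as np

n, L = 200, 10.0                                   # grid points, box size
h = L / n
x = np.linspace(0, L, n, endpoint=False)
lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / h**2       # order-2 FD Laplacian
v_ion = -5.0 * np.exp(-((x - L / 2) ** 2))         # smooth model potential
n_occ, v, tol = 2, v_ion.copy(), 1e-6              # initial guess V = V_at-like

for it in range(100):
    eps, psi = np.linalg.eigh(-0.5 * lap + np.diag(v))   # solve H psi = eps psi
    rho = np.sum(psi[:, :n_occ] ** 2, axis=1) / h        # new charge density
    v_h = np.linalg.solve(-lap, 4.0 * np.pi * rho)       # -del^2 V_H = 4 pi rho
    v_xc = -rho ** (1.0 / 3.0)                           # toy local-density term
    v_new = 0.5 * v + 0.5 * (v_ion + v_h + v_xc)         # simple 'mixing'
    if np.max(np.abs(v_new - v)) < tol:                  # self-consistency test
        break
    v = v_new
```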
II Most time-consuming part = computing eigenvalues /
eigenvectors.
Characteristic: a large number of eigenvalues/eigenvectors must be computed [occupied states]. For example, the Hamiltonian matrix size can be N = 1,000,000 with about 1,000 eigenvectors to compute.
II Self-consistent loop takes a few iterations (say 10 or 20
in easy cases).
Two broad types of methods used
Planewave techniques. Basis used is of the form:
w_G(r) = exp[ i(k + G)·r ]
where G is the vector "index" of the planewave and r is the coordinate in space. Basis for a Galerkin-type method.
II This is similar to spectral methods used to solve certain
PDEs
II In H = −½∇² + V, the ∇² term is diagonal in Fourier space
II Calculations done in Fourier space - go back-and-forth
between real and Fourier space.
Real-space methods. Calculations done in real space –
using finite differences or finite element methods.
Planewaves:
II Hamiltonian matrix is dense but it is not computed.
Action on vectors is performed with the help of FFTs
II Work is done in reciprocal space
II This is a Galerkin-type method. The Hamiltonian is projected onto a good small-dimensional subspace.
Advantages:
II Method is ‘variational’: Thus the smallest eigenvalue
will decrease if the size of the basis increases.
II Easy to ‘precondition’ since a smaller basis can be used
to provide a nice approximation for the larger problem.
II Planewaves provide a good initial approximation - so we
need not use large bases
Disadvantages:
II FFTs do not parallelize too well [however there are other
ways to extract parallelism - not an issue for very large
problems.]
II FFTs somewhat expensive
II Method is geared toward periodic systems (crystals).
Must use super-cells for confined systems.
Real-space Finite Difference Methods
II Use High-Order Finite Difference Methods [Fornberg &
Sloan ’94]
II Typical Geometry = Cube – regular structure.
II Laplacean matrix need not even be stored.
[Figure: order-4 finite difference approximation – the 3D stencil along the x, y, z axes.]
Fornberg-Sloan formulas (for ∂²/∂x²):

Order | Coefficients
2     | 1  −2  1
4     | −1/12  4/3  −5/2  4/3  −1/12
6     | 1/90  −3/20  3/2  −49/18  3/2  −3/20  1/90
8     | −1/560  8/315  −1/5  8/5  −205/72  8/5  −1/5  8/315  −1/560
II PARSEC code can use up to order 12 – but often order
is 6 or 8.
II Nonlocal term → sum of localized low-rank matrices.
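A sketch of how such a high-order Laplacian can be assembled as a sparse matrix (1D and Dirichlet boundaries assumed for simplicity; the order-8 coefficients come from the table above):

```python
import numpy as np
import scipy.sparse as sp

# Order-8 central-difference coefficients for d^2/dx^2 (from the table)
coeffs = [-1/560, 8/315, -1/5, 8/5, -205/72, 8/5, -1/5, 8/315, -1/560]

def laplacian_1d(n, h):
    offsets = list(range(-4, 5))                   # 9-point stencil
    diags = [c / h**2 * np.ones(n - abs(k))
             for k, c in zip(offsets, coeffs)]
    return sp.diags(diags, offsets, shape=(n, n), format="csr")
```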
Pattern of resulting matrix
Advantages:
• Easy to implement (parallelism easy to exploit)
• Less costly per step. No FFT’s required.
• Leads to sparse matrices.
Disadvantages:
• Not as easy to 'precondition' as planewave techniques
• May require more points to reach the same accuracy as planewaves (partial remedy: use high order finite differences)
• Not variational (important?)
II Bottom line: FD codes can (1) do very large systems
... (2) fast ... (3) ... accurately .. (4) on massively parallel
computers. Main appeal = ease of implementation.
PARSEC
PARSEC
II PARSEC = Pseudopotential Algorithm for Real Space
Electronic Calculations
II Represents ≈ 15 years of gradual improvements
II Runs on sequential and parallel platforms – using MPI.
II Efficient diagonalization
II Takes advantage of symmetry
PARSEC - A few milestones
• Sequential real-space code on Cray YMP [up to ’93]
• Cluster of SGI workstations [up to ’96]
• CM5 [’94-’96]: massive parallelism begins
• IBM SP2 [using PVM]
• Cray T3D [PVM + MPI] ∼ ’96; Cray T3E [MPI] – ’97
• IBM SP with 256+ nodes – ’98+
• SGI Origin 3900 [128 processors] – ’99+
• IBM SP + F90 – PARSEC name given, ’02
• Current: runs on many machines (SGI Altix, IBM SP4, Cray XT, ...)
• ’05: PARSEC released
Main contributors to code:
Early days:
N. Troullier
K. Wu (*)
X. Jing
H. Kim
A. Stathopoulos (*)
I. Vassiliev
More recent:
M. Jain
L. Kronik
R. Burdick (*)
M. Alemany
M. Tiago
Y. Zhou (*)
(*) = Computer-Scientist
The physical domain
Pattern of resulting matrix for Ge99H100:
Domain Mapping: domain i is mapped to processor i.
A domain decomposition approach is used.
[Figure: problem setup – a master process holds V and ρ (the nonlinear step); the wavefunctions v1, v2, v3, ... and the potential are distributed across workers 1 through p.]
Sample calculations done: Quantum dots
II Small silicon clusters (≈ 20–100 Å; involve up to a few hundred atoms)
• Si525H276 leads to a matrix size of N ≈ 290,000 and nstates = 1,194 eigenpairs.
• In 1997 this took ∼ 20 hours of CPU time on the
Cray T3E, using 48 processors.
• TODAY: 2 hours on one SGI Madison proc. (1.3GHz)
• Could be done on a good workstation!
II Gains: hardware and algorithms
II Algorithms: Better diagonalization + New code exploits
symmetry
Si525H276
Matlab version: RSDFT (work in progress)
II Goal is to provide (1) prototyping tools (2) simple codes
for teaching Real-space DFT with pseudopotentials
II Can do small systems on this laptop – [Demo later..]
II Idea: provide similar input file as PARSEC –
II Many summer interns helped with the project. This year
and last year’s interns:
Virginie Audin, Long Bui, Nate Born, Amy Coddington,
Nick Voshell, Adam Jundt, ...
+ ... others who worked with a related visualization tool.
SOLVING LARGE EIGENVALUE PROBLEMS
Outline
• Projection methods
• The subspace iteration
• Krylov subspace methods
• Restarting, implicit restarts
• Preconditioning and Davidson's method
• The Jacobi-Davidson idea
• Generalized eigenvalue problems
• AMLS
General Tools for Solving Large Eigen-Problems
II Projection techniques – Arnoldi, Lanczos, Subspace Iteration;
II Preconditionings: shift-and-invert, polynomials, ...
II Deflation and restarting techniques
Effective algorithms combine these three ingredients
A few popular solution Methods
• Subspace Iteration [Now less popular – sometimes used
for validation]
• Arnoldi's method (or Lanczos) with polynomial acceleration [Stiefel ’58, Rutishauser ’62, YS ’84, ’85, Sorensen ’89, ...]
• Shift-and-invert and other preconditioners. [Use Arnoldi or Lanczos for (A − σI)⁻¹.]
• Davidson's method and variants, Generalized Davidson's method [Morgan and Scott, ’89], Jacobi-Davidson
• Emerging method (structures): Automatic Multilevel Substructuring (AMLS).
Projection Methods for Eigenvalue Problems
Principle: One of the most important ideas in computational methods is that of projection, which consists of extracting an approximation to the given problem (linear system, eigenvalue problem) from a subspace.
Formulation for the general case:
II Given: Two subspaces K and L of same dimension m
II Find: λ, u such that
λ ∈ C, u ∈ K; (λI −A)u ⊥ L
Note: m degrees of freedom (K), m constraints (L).
Two types of methods:
II Orthogonal projection methods: situation when L = K
(generally used in the Hermitian case)
II Oblique projection methods: when L ≠ K; often used in the non-Hermitian case.
Note: in electronic structure, it is important to think in
terms of subspaces rather than individual eigenvectors.
II Implementation of projection methods: Rayleigh-Ritz
approach with orthonormal bases.
Rayleigh-Ritz projection
Given: a subspace X known to contain good approximations to eigenvectors of A.
Question: How to extract good approximations to eigenvalues/eigenvectors from this subspace?
Answer: the Rayleigh-Ritz process.
Let Q = [q1, . . . , qm] be an orthonormal basis of X. Then
write an approximation in the form u = Qy and obtain y
by writing
Q^H (A − λI) u = 0
II Q^H A Q y = λy
Procedure:
1. Obtain an orthonormal basis Q of X
2. Compute C = Q^H A Q (an m × m matrix)
3. Diagonalize C: C = Y Λ Y^H
4. Compute U = QY
Terminology: A subspace X is invariant if AX ⊆ X, i.e.,
if
Au ∈ X for all u ∈ X
Property: if X is (exactly) invariant, then procedure will
yield exact eigenvalues and eigenvectors.
II Can use above procedure in conjunction with the sub-
space obtained from subspace iteration algorithm
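A dense NumPy sketch of this procedure (A assumed real symmetric; names are illustrative):

```python
import numpy as np

def rayleigh_ritz(A, X):
    Q, _ = np.linalg.qr(X)        # 1. orthonormal basis Q of span(X)
    C = Q.T @ A @ Q               # 2. projected m-by-m matrix
    lam, Y = np.linalg.eigh(C)    # 3. diagonalize C = Y Lam Y^T
    return lam, Q @ Y             # 4. Ritz values and Ritz vectors U = QY
```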
Subspace Iteration
Original idea: projection technique onto a subspace of the form Y = A^k X = span{A^k x1, A^k x2, · · · , A^k xm}
II Computes the dominant eigenvalues (Largest in |.|)
Advantages:
• Easy to implement;
• Easy to analyze;
Disadvantage:
Slow.
II In practice: used with polynomial acceleration, i.e.,
AkX replaced by Ck(A)X. Typically Ck = Chebyshev
polynomial.
Algorithm: Subspace Iteration with Projection
1. Start: Choose an initial basis X = [x1, . . . , xm]
2. Iterate: Until convergence do:
   (a) Select a polynomial Ck
   (b) Compute Z = Ck(A) Xold
   (c) Compute Xnew = RayleighRitz(A, Z)
II RayleighRitz(A, Z) steps: 1) Orthonormalize Z into Z̄; 2) Compute B = Z̄^H A Z̄; 3) Diagonalize B = Y Λ Y^H; 4) Set Xnew := Z̄ Y.
II Cost per iteration: O(m × nnz(A)) + O(m²n) + O(m³)
II Typically: m is of the order of # (wanted eigenvalues)
A quick overview of Chebyshev polynomials
II Chebyshev polynomials of the 1st kind. For |t| ≤ 1:
Ck(t) = cos(k cos⁻¹(t))
II |Ck(t)| ≤ 1 for t ∈ [−1, 1]. Grows very fast for |t| > 1.
II In the plot Ck is scaled: Ck(t)/Ck(−1.1) is shown.
[Figure: degree-8 Chebyshev polynomial, scaled, on the interval [−1, 1].]
II Obeys 3-term recurrence:
Ck+1(t) = 2tCk(t) − Ck−1(t)
starting with C0(t) = 1;C1(t) = t.
II Can compute the sequence vk = Ck(A)v0 from:
vk+1 = 2 A vk − vk−1
II If the undesired eigenvalues are in the interval [a, b], then we use the polynomial
Pk(t) = Ck(η(t)) / Ck(η(c))
where η(t) is the linear transformation η(t) = (2t − b − a)/(b − a), and c is an estimate of the eigenvalue that is farthest away from (a + b)/2.
II More on Chebyshev ‘filtering’ later
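A sketch of this filtering step (matvec is any function computing Av; degree k ≥ 1 assumed; names follow the slide):

```python
import numpy as np

def cheb_eval(t, k):
    # C_k(t) by the 3-term recurrence (valid inside and outside [-1, 1])
    c0, c1 = 1.0, t
    for _ in range(k - 1):
        c0, c1 = c1, 2.0 * t * c1 - c0
    return c1

def cheb_filter(matvec, v, k, a, b, c):
    # Apply p_k(A)v = C_k(eta(A))v / C_k(eta(c)); [a, b] is damped
    eta = lambda t: (2.0 * t - b - a) / (b - a)
    Bv = lambda x: (2.0 * matvec(x) - (b + a) * x) / (b - a)  # eta(A) x
    w0, w1 = v, Bv(v)
    for _ in range(k - 1):
        w0, w1 = w1, 2.0 * Bv(w1) - w0
    return w1 / cheb_eval(eta(c), k)
```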
KRYLOV SUBSPACE METHODS
KRYLOV SUBSPACE METHODS
Principle: Projection methods on Krylov subspaces:
Km(A, v1) = span{v1, Av1, · · · , Am−1v1}
II Probably the most important class of projection meth-
ods [for linear systems and for eigenvalue problems]
II Many variants exist
A few important observations:
• Km = {p(A)v | p = polynomial of degree ≤ m − 1}
• If we change A to αA + σI, then Km does not change.
ARNOLDI’S ALGORITHM
II Goal: to compute an orthogonal basis of Km.
II Input: Initial vector v1, with ‖v1‖2 = 1 and m.
II Setting: A is general non-symmetric (non-Hermitian)
ALGORITHM : 1 Arnoldi's procedure
For j = 1, . . . , m do:
  Compute w := Avj
  For i = 1, . . . , j, do:
    hi,j := (w, vi)
    w := w − hi,j vi
  EndFor
  hj+1,j := ‖w‖2; if hj+1,j == 0 then STOP
  vj+1 := w/hj+1,j
End
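A runnable Python sketch of the procedure (modified Gram-Schmidt variant):

```python
import numpy as np

def arnoldi(A, v1, m):
    n = len(v1)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                 # orthogonalize against v_1..v_j
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                 # invariant subspace found: STOP
            return V[:, :j + 1], H[:j + 1, :j]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H                                # A V_m = V_{m+1} H_bar_m
```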
II Notice:
Avj = Σ_{i=1}^{j+1} h_{ij} vi
II What do we obtain if we look at A[v1, ..., vm]?
II Notation Vm = [v1, v2, ..., vm]
Result of Arnoldi’s algorithm
Let Hm be the m × m upper Hessenberg matrix

Hm =
[ x x x x x ]
[ x x x x x ]
[   x x x x ]
[     x x x ]
[       x x ]
1. Vm = [v1, v2, ..., vm] = orthonormal basis of Km.
2. AVm = VmHm + hm+1,mvm+1eTm
3. V TmAVm = Hm
Application to eigenvalue problems
II Write the approximate eigenvector as u = Vm y and impose the Galerkin condition:
(A − λI) Vm y ⊥ Km  →  Vm^H (A − λI) Vm y = 0
II Approximate eigenvalues are eigenvalues of Hm
Hmyj = λjyj
Associated approximate eigenvectors are
uj = Vmyj
Typically a few of the outermost eigenvalues will converge
first.
Hermitian case: The Lanczos Algorithm
A = A^H and Vm^H A Vm = Hm → Hm = Hm^H
II The Hessenberg matrix becomes tridiagonal, real, and symmetric.
II We can write

Hm =
[ α1 β2             ]
[ β2 α2 β3          ]
[    β3 α3 β4       ]
[        ...  ...   ]
[          βm  αm   ]
II Consequence: three term recurrence
βj+1vj+1 = Avj − αjvj − βjvj−1
ALGORITHM : 2 Lanczos
1. Choose an initial vector v1 of unit norm. Set β1 ≡ 0, v0 ≡ 0
2. For j = 1, 2, . . . , m Do:
3.   wj := Avj − βj vj−1
4.   αj := (wj, vj)
5.   wj := wj − αj vj
6.   βj+1 := ‖wj‖2. If βj+1 = 0 then Stop
7.   vj+1 := wj/βj+1
8. EndDo
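A NumPy sketch of the algorithm (no reorthogonalization, so orthogonality will eventually be lost, as discussed later):

```python
import numpy as np

def lanczos(A, v1, m):
    n = len(v1)
    V = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m + 1)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j] - beta[j] * V[:, j - 1]   # beta[0] = 0: v_0 term vanishes
        alpha[j] = w @ V[:, j]
        w = w - alpha[j] * V[:, j]
        beta[j + 1] = np.linalg.norm(w)
        if beta[j + 1] == 0.0:
            break
        V[:, j + 1] = w / beta[j + 1]
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    return V[:, :m], T        # Ritz values = eigenvalues of T
```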
Hermitian matrix + Arnoldi → Hermitian Lanczos
II Convergence ?
CONVERGENCE THEORY
The Lanczos Algorithm in the Hermitian Case
Assume eigenvalues sorted increasingly
λ1 ≤ λ2 ≤ · · · ≤ λn
II Orthogonal projection method onto Km;
II To derive error bounds, use the Courant characterization
λ1 = min_{u∈K, u≠0} (Au, u)/(u, u) = (Au1, u1)/(u1, u1)

λj = min_{u∈K, u≠0, u⊥u1,...,uj−1} (Au, u)/(u, u) = (Auj, uj)/(uj, uj)
II Bounds for λ1 easy to find
II Ritz values approximate eigenvalues of A inside out:
[Diagram: the Ritz values approach the exact eigenvalues λ1, λ2, . . . , λn−1, λn from inside the spectrum.]
A-priori error bounds
Theorem [Kaniel, 1966]:
0 ≤ λ1^{(m)} − λ1 ≤ (λN − λ1) [ tan∠(v1, u1) / T_{m−1}(1 + 2γ1) ]²
where γ1 = (λ2 − λ1)/(λN − λ2), and ∠(v1, u1) = acute angle between v1 and u1.
+ Results for other eigenvalues. [Kaniel, Paige, YS]
Theorem [YS, 1980]:
0 ≤ λi^{(m)} − λi ≤ (λN − λ1) [ κi^{(m)} tan∠(vi, ui) / T_{m−i}(1 + 2γi) ]²
where γi = (λi+1 − λi)/(λN − λi+1),  κi^{(m)} = Π_{j<i} (λj^{(m)} − λN)/(λj^{(m)} − λi)
Convergence
[Figure: number of Lanczos steps (50–300) vs. converged eigenpairs (0–8) – convergence of eigenpairs for Lapl(15,20).]
Loss of orthogonality
II In theory the vi's defined by the 3-term recurrence are orthogonal.
II However: in practice there is severe loss of orthogonality;
Observation [Paige, 1981]: Loss of orthogonality starts suddenly, when the first eigenpair has converged. It is a sign of loss of linear independence of the underlying eigenvectors. When orthogonality is lost, several copies of the same eigenvalue start appearing.
II Remedies??
Reorthogonalization
• Full reorthogonalization – reorthogonalize vj+1 against
all previous vi’s every time.
• Partial reorthogonalization – reorthogonalize vj+1 against
all previous vi’s only when needed [Parlett & Simon]
• Selective reorthogonalization – reorthogonalize vj+1 against
computed eigenvectors [Parlett & Scott]
• No reorthogonalization – do not reorthogonalize, but take measures to deal with 'spurious' eigenvalues [Cullum & Willoughby]
Example: Partial Reorth. Lanczos - PLAN
Goal: To illustrate how to adapt a well-known algorithm
to the specific situation of interest.
II Compute eigenspace instead of individual eigenvectors
II No full reorthogonalization: reorthogonalize when needed
II No restarts – so, much larger basis needed in general
Ingredients: (1) test of loss of orthogonality (recurrence
relation) and (2) stopping criterion based on charge
density instead of eigenvectors.
Partial Reorth. Lanczos - (Background)
II Recall:
βj+1vj+1 = Avj − αjvj − βjvj−1
II Scalars βj+1, αj selected so that vj+1 ⊥ vj, vj+1 ⊥ vj−1, and ‖vj+1‖2 = 1.
II In theory this is enough to guarantee that {vj} is orthonormal. + we have:
Vm^T A Vm = Tm =
[ α1 β2                  ]
[ β2 α2 β3               ]
[    ...  ...  ...       ]
[     βm−1 αm−1 βm       ]
[           βm   αm      ]
II In practice: orthogonality lost quickly
Remedy: Partial reorthogonalization: reorthogonalize only when deemed necessary, and only to maintain orthogonality to within √(machine epsilon).
II Uses an inexpensive recurrence relation
II Work done in the 80’s [Parlett, Simon, and co-workers]
+ more recent work [Larsen, ’98]
II Package: PROPACK [Larsen] V 1: 2001, most recent:
V 2.1 (Apr. 05)
II In tests with real-space Hamiltonians from PARSEC,
need for reorthogonalization not too strong.
Reorthogonalization: a small example
[Figure: level of orthogonality (from 10^0 down to 10^−14) vs. Lanczos steps (0–200), two panels.]
Levels of orthogonality of the Lanczos basis for the Hamiltonian corresponding to Si10H16 (n = 17077). Left: no reorthogonalization. Right: partial reorthogonalization (34 in all).
Second ingredient: avoid computing and updating eigenvalues / eigenvectors. Instead:
Test how good the underlying eigenspace is without knowledge of individual eigenvectors. When it has converged, then compute the basis (eigenvectors). Test: the sum of occupied energies has converged = a sum of eigenvalues of a small tridiagonal matrix. Inexpensive.
II See:
“Computing Charge Densities with Partially Reorthogonal-
ized Lanczos”, C. Bekas, Y. Saad, M. L. Tiago, and J. R.
Chelikowsky; to appear, CPC.
Partial Reorth. Lanczos vs. ARPACK for Ge99H100

         Partial Lanczos              |  ARPACK
neig | A∗x  | orth | mem. | secs     |  A∗x  | rest. | mem. | secs
248  | 3150 | 109  | 2268 | 2746     |  3342 | 20    | 357  | 16454
350  | 4570 | 184  | 3289 | 5982     |  5283 | 24    | 504  | 37371
496  | 6550 | 302  | 4715 | 13714    |  6836 | 22    | 714  | 67020
II Matrix-size = 94,341; # nonzero entries = 6,332,795
II Number of occupied states : neig=248.
II Requires more memory but ...
II ... Still manageable for most systems studied [can also
use secondary storage]
II ... and gain a factor of 4 to 6 in CPU time
DEFLATION
Deflation
II Deflation is any technique which aims at ‘removing’
already computed eigenvectors to enable the computation
of the others
II Very useful in practice
II Different forms: locking (subspace iteration), selective
orthogonalization (Lanczos), Schur deflation, ...
Standard explicit deflation for Hermitian case
Wielandt Deflation: Assume we have computed a right eigenpair λ1, u1. Wielandt deflation considers the eigenvalues of
A1 = A − σ u1 u1^H
Note:
Λ(A1) = {λ1 − σ, λ2, . . . , λn}
Eigenvectors are preserved.
II It is possible to apply this procedure successively:
ALGORITHM : 3 Explicit Deflation
1. A0 = A
2. For j = 0 . . . µ − 1 Do:
3.   Compute a dominant eigenvector uj of Aj
4.   Define Aj+1 = Aj − σj uj uj^H
5. End
II The computed u1, u2, . . . form a set of eigenvectors for A.
II Alternative: implicit deflation (within a procedure such
as Lanczos).
Deflated Lanczos: When first eigenvector converges,
we freeze it as the first vector of Vm =
[v1, v2, . . . , vm]. Lanczos procedure starts working at
column v2. Re-Orthogonalization is still done against
v1, ..., vj at step j. Each new converged eigenvector
will be added to the ‘locked’ set of eigenvectors.
Illustration
Thus, for k = 2:  Vm = [ v1, v2 | v3, . . . , vm ], with v1, v2 locked and v3, . . . , vm active.

Tm =
[ ∗              ]
[   ∗            ]
[     ∗ ∗        ]
[     ∗ ∗ ∗      ]
[       ∗ ∗ ... ]

II The top 2 × 2 part of Tm is frozen
PRECONDITIONING - DAVIDSON’S METHOD
Preconditioning eigenvalue problems
Goal: To extract good approximations to add to a subspace in a projection process. Result: faster convergence.
II Best known technique: shift-and-invert; work with
B = (A − σI)⁻¹
II Some success with polynomial preconditioning [Chebyshev iteration / least-squares polynomials]. Work with B = p(A)
II The above preconditioners preserve eigenvectors. Other methods (Davidson) use a more general preconditioner M.
Shift-and-invert preconditioning
Main idea: to use Arnoldi, or Lanczos, or subspace
iteration for the matrix B = (A−σI)−1. The matrix B
need not be computed explicitly. Each time we need to apply B to a vector we solve a system with A − σI.
II Factor A − σI = LU. Then each application of B to a vector y requires solving Lz = y and Ux = z.
Advantages:
• Fast convergence
• Cannot miss eigenvalues
• No problems with interior eigenvalues
Disadvantages:
• Factorization cost can be very high. Not practical for 3-D problems in real-space approaches.
Preconditioning by polynomials
Main idea:
Iterate with p(A) instead of A in Arnoldi or Lanczos,..
II Used very early on in subspace iteration [Rutishauser,
1959.]
II Usually not as reliable as Shift-and-invert techniques but
less demanding in terms of storage.
Question: How to find a good polynomial (dynamically)?
Approaches:
1 Use of Chebyshev polynomials
2 Use polynomials based on ‘Leja’ points
3 Least-squares polynomials
4 Polynomials from a previous Lanczos run
Polynomial filters and implicit restart
II Goal: to apply polynomial filter of the form
p(t) = (t− θ1)(t− θ2) . . . (t− θq)
by exploiting the Arnoldi or Lanczos procedure.
Assume A Vm = Vm Hm + βm vm+1 em^T
and consider the first factor: (t − θ1)
(A − θ1 I) Vm = Vm (Hm − θ1 I) + βm vm+1 em^T
Let Hm − θ1 I = Q1 R1. Then,
(A − θ1 I) Vm = Vm Q1 R1 + βm vm+1 em^T →
(A − θ1 I)(Vm Q1) = (Vm Q1) R1 Q1 + βm vm+1 em^T Q1 →
A (Vm Q1) = (Vm Q1)(R1 Q1 + θ1 I) + βm vm+1 em^T Q1
Notation: R1 Q1 + θ1 I ≡ Hm^(1);  (b_{m+1}^(1))^T ≡ em^T Q1;  Vm Q1 = Vm^(1)
II A Vm^(1) = Vm^(1) Hm^(1) + vm+1 (b_{m+1}^(1))^T
II Note that Hm^(1) is upper Hessenberg (resp. tridiagonal)
II Similar to an Arnoldi decomposition (resp. Lanczos)
Observe:
II R1 Q1 + θ1 I ≡ the matrix resulting from one step of the QR algorithm with shift θ1 applied to Hm.
II The first column of Vm^(1) is a multiple of (A − θ1 I) v1.
II The columns of Vm^(1) are orthonormal.
Can now apply the second shift in the same way:
(A − θ2 I) Vm^(1) = Vm^(1) (Hm^(1) − θ2 I) + vm+1 (b_{m+1}^(1))^T
Similar process: factor (Hm^(1) − θ2 I) = Q2 R2, then multiply by Q2 to the right:
(A − θ2 I) Vm^(1) Q2 = (Vm^(1) Q2)(R2 Q2) + vm+1 (b_{m+1}^(1))^T Q2
A Vm^(2) = Vm^(2) Hm^(2) + vm+1 (b_{m+1}^(2))^T
Now:
First column of Vm^(2) = scalar × (A − θ2 I) v1^(1) = scalar × (A − θ2 I)(A − θ1 I) v1
II Note that
(b_{m+1}^(2))^T = em^T Q1 Q2 = [0, 0, · · · , 0, q1, q2, q3]
II Let Vm−2 = [v1, . . . , vm−2] consist of the first m − 2 columns of Vm^(2) and Hm−2 = the leading principal submatrix of Hm^(2). Then
A Vm−2 = Vm−2 Hm−2 + βm−1 vm−1 e_{m−2}^T   with
βm−1 vm−1 ≡ q1 vm+1 + h_{m−1,m−2}^(2) v_{m−1}^(2),   ‖vm−1‖2 = 1
II Result: an Arnoldi/Lanczos process of m − 2 steps with the initial vector p(A) v1.
II In other words: we know how to apply polynomial filtering via a form of the Arnoldi/Lanczos process, combined with the QR algorithm.
The Davidson approach
Goal: to use a more general preconditioner to introduce
good new components to the subspace.
• Ideal new vector would be the eigenvector itself!
• Next best thing: an approximation to (A − µI)⁻¹ r where r = (A − µI)z, the current residual.
• Approximation written in the form M⁻¹ r. Note that M can vary at every step if needed.
ALGORITHM : 4 Davidson's method
1. Choose an initial unit vector v1. Set V1 = [v1].
2. Until convergence Do:
3.   For j = 1, . . . , m Do:
4.     w := Avj
5.     Update Hj ≡ Vj^T A Vj
6.     Compute the smallest eigenpair µ, y of Hj
7.     z := Vj y;  r := Az − µz
8.     Test for convergence. If satisfied, Return
9.     If j < m, compute t := Mj⁻¹ r
10.    Compute Vj+1 := ORTHN([Vj, t])
11.    EndIf
12.  Set v1 := z and go to 3
13.  EndDo
14. EndDo
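A dense sketch of the method with the traditional diagonal preconditioner Mj = D − µI mentioned on the next slide (illustrative only; assumes no zero entries in D − µI):

```python
import numpy as np

def davidson(A, v1, m=20, tol=1e-8, max_restarts=50):
    n = A.shape[0]
    d = np.diag(A)
    V = (v1 / np.linalg.norm(v1)).reshape(n, 1)
    for _ in range(max_restarts):
        for _ in range(m):
            Hj = V.T @ A @ V
            vals, Y = np.linalg.eigh(Hj)
            mu, y = vals[0], Y[:, 0]           # smallest eigenpair of H_j
            z = V @ y
            r = A @ z - mu * z
            if np.linalg.norm(r) < tol:
                return mu, z
            t = r / (d - mu)                   # t := M_j^{-1} r, diagonal M_j
            t = t - V @ (V.T @ t)              # ORTHN: orthogonalize against V
            V = np.hstack([V, (t / np.linalg.norm(t)).reshape(n, 1)])
        V = z.reshape(n, 1)                    # restart with current Ritz vector
    raise RuntimeError("Davidson did not converge")
```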
II Note: Traditional Davidson uses diagonal preconditioning: Mj = D − σj I.
II Will work only for some matrices
Other options:
II Shift-and-invert using ILU [negatives: expensive + hard
to parallelize.]
II Filtering (by averaging)
II Filtering by using smoothers (multigrid style)
II Iterative solves [See Jacobi-Davidson]
JACOBI – DAVIDSON
Introduction via Newton's method
Assumptions: M = A + E and Az ≈ µz
Goal: to find an improved eigenpair (µ + η, z + v).
II Write A(z + v) = (µ + η)(z + v), neglect second-order terms, and rearrange:
(M − µI)v − ηz = −r   with   r ≡ (A − µI)z
II Unknowns: η and v.
II Underdetermined system. Need one constraint.
II Add the condition: wHv = 0 for some vector w.
In matrix form:
[ M − µI  −z ] [ v ]   [ −r ]
[  w^H     0 ] [ η ] = [  0 ]

II Eliminate v from the second equation:
[ M − µI        −z          ] [ v ]   [ −r ]
[   0    w^H(M − µI)⁻¹ z    ] [ η ] = [  s ]
with s = w^H (M − µI)⁻¹ r

II Solution [Olsen's method]:
η = ( w^H (M − µI)⁻¹ r ) / ( w^H (M − µI)⁻¹ z )
v = −(M − µI)⁻¹ (r − ηz)
II When M = A, this corresponds to Newton's method for solving
(A − λI)u = 0,   w^T u = Constant
Note: another way to characterize the solution is:
v = −(M − µI)⁻¹ r + η (M − µI)⁻¹ z,   v ⊥ w
II Involves inverse of (M−λI). Jacobi-Davidson rewrites
solution using projectors.
II Let Pz be a projector in the direction of z which leaves r invariant. It is of the form Pz = I − (z s^H)/(s^H z) where s ⊥ r.
II Similarly, let Pw be any projector which leaves v unchanged. Then Olsen's solution can be rewritten as:
[Pz (M − µI) Pw] v = −r,   w^H v = 0
II The two solutions are mathematically equivalent!
The Jacobi-Davidson approach
II In orthogonal projection methods (e.g. Arnoldi) we have
r ⊥ z
II Also it is natural to take w ≡ z. Assume ‖z‖2 = 1
With the above assumptions, Olsen’s correction equation
is mathematically equivalent to finding v such that :
(I − zzH)(M − µI)(I − zzH)v = −r v ⊥ z
II Main attraction: can use an iterative method for the solution of the correction equation (M-solves not explicitly required).
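A SciPy sketch of such an inner solve (assumptions: M = A, A real symmetric, r ⊥ z; the inner solve is deliberately loose, which is typical):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def jd_correction(A, z, mu, r):
    # Solve (I - zz^T)(A - mu I)(I - zz^T) v = -r  with  v ⊥ z
    n = A.shape[0]
    def apply(x):
        x = x - z * (z @ x)          # project onto z-orthogonal complement
        y = A @ x - mu * x           # apply (A - mu I)
        return y - z * (z @ y)       # project again
    op = LinearOperator((n, n), matvec=apply, dtype=A.dtype)
    v, _ = minres(op, -r)            # a few iterations usually suffice
    return v - z * (z @ v)           # enforce v ⊥ z
```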
Diagonalization methods in PARSEC
1. At the beginning there was simplicity: DIAGLA
II In-house parallel code developed circa 1995 (1st version)
II Uses a form of Davidson's method with various enhancements
II Feature: can use part of a subspace from previous scf
iteration
II One issue: Preconditioner required in Davidson – but
not easy in real-space methods [in contrast with planewaves]
2. Later ARPACK was added as an option
II State of the art public domain diagonalization package
II Could be a few times faster than DIAGLA - Also: quite
robust. But...
II .. cannot reuse previous eigenvectors for next SCF
iteration.
II .. Not easy to modify to adjust to our needs
II .. Really a method not designed for large eigenspaces
3. Recent work: focus on eigenspaces – using a *nonlinear* subspace iteration approach. More on this shortly
4. Also explored - AMLS
II Domain-decomposition type approach
II Very complex (to implement) ...
II and ... accuracy not sufficient for DFT approaches
[Although this may be fixed]
Current work on diagonalization
Focus:
II Compute the eigenspace – not individual eigenvectors.
II Take into account outer (SCF) loop
II Future: eigenvector-free or basis-free methods
Motivation:
Standard packages (ARPACK) do not easily
take advantage of specificity of problem:
self-consistent loop, large number of eigenvalues, ...
CHEBYSHEV FILTERING
Chebyshev Subspace iteration
II Main ingredient: Chebyshev filtering
Given a basis [v1, . . . , vm], 'filter' each vector as
v̂i = pk(A) vi
II pk = polynomial of low degree: enhances desired eigencomponents
II The filtering step is not used to compute eigenvectors accurately
II SCF & diagonalization loops merged
II Important: convergence still good and robust
[Figure: degree-6 Chebyshev polynomial, damped interval = [0.2, 2].]
Main step:
Previous basis V = [v1, v2, · · · , vm]
  ↓ Filter:  V̂ = [p(A)v1, p(A)v2, · · · , p(A)vm]
  ↓ Orthogonalize:  [V, R] = qr(V̂, 0)
II The basis V is used to do a Ritz step (basis rotation):
C = V^T A V → [U, D] = eig(C) → V := V U
II Update charge density using this basis.
II Update Hamiltonian — repeat
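Putting the pieces together, a dense NumPy sketch of one such filtered step (reusing the cheb_filter sketch from earlier; purely illustrative):

```python
import numpy as np

def filtered_step(A, V, k, a, b, c):
    # Filter each basis vector, orthogonalize, then one Ritz step
    Vf = np.column_stack([cheb_filter(lambda x: A @ x, V[:, i], k, a, b, c)
                          for i in range(V.shape[1])])
    Q, _ = np.linalg.qr(Vf)              # [V, R] = qr(V_filtered, 0)
    C = Q.T @ A @ Q                      # Ritz step (basis rotation)
    _, U = np.linalg.eigh(C)
    return Q @ U                         # rotated basis; next: update rho, H
```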
II In effect: Nonlinear subspace iteration
II Main advantages: (1) very inexpensive, (2) uses minimal storage (m is a little ≥ # states).
II Filter polynomials: if [a, b] is the interval to dampen, then
pk(t) = Ck(l(t)) / Ck(l(c)),   with   l(t) = (2t − b − a)/(b − a)
• c ≈ eigenvalue farthest from (a + b)/2 – used for scaling
II The 3-term recurrence of Chebyshev polynomials is exploited to compute pk(A)v. If B = l(A), then Ck+1(t) = 2tCk(t) − Ck−1(t) →
wk+1 = 2 B wk − wk−1
1. Select initial V = Vat
2. Get initial basis {ψi} (diag)
3. Calculate new ρ(r) = Σ_i^occ |ψi|²
4. Find new VH: −∇² VH = 4πρ(r)
5. Find new Vxc = f[ρ(r)]
6. Vnew = Vion + VH + Vxc + 'Mixing'
7. If |Vnew − V| < tol, stop
8. Filter basis {ψi} (with Hnew) + orthogonalize; set V = Vnew and go to 3
Reference:
Yunkai Zhou, Y.S., Murilo L. Tiago, and James R. Chelikowsky, Parallel Self-Consistent-Field Calculations with Chebyshev Filtered Subspace Iteration, Phys. Rev. E, vol. 74, p. 066704 (2006).
[See http://www.cs.umn.edu/~saad]
Chebyshev Subspace iteration - experiments

model        | size of H | nstate  | nsymm | n_H-reduced
Si525H276    | 292,584   | 1194    | 4     | 73,146
Si65Ge65H98  | 185,368   | 313     | 2     | 92,684
Ga41As41H72  | 268,096   | 210     | 1     | 268,096
Fe27         | 697,504   | 520 × 2 | 8     | 87,188
Fe51         | 874,976   | 520 × 2 | 8     | 109,372

Test problems
II Tests performed on an SGI Altix 3700 cluster (Minnesota Supercomputing Institute). [CPU = a 1.3 GHz Intel Madison processor. Compiler: Intel FORTRAN ifort, with optimization flag -O3]
method # A ∗ x SCF its. CPU(secs)
ChebSI 124761 11 5946.69
ARPACK 142047 10 62026.37
TRLan 145909 10 26852.84
Si525H276, Polynomial degree used is 8. Total energies
agreed to within 8 digits.
method # A ∗ x SCF its. CPU (secs)
ChebSI 42919 13 2344.06
ARPACK 51752 9 12770.81
TRLan 53892 9 6056.11
Si65Ge65H98, Polynomial degree used is 8. Total energies
same to within 9 digits.
method # A ∗ x SCF its. CPU (secs)
ChebSI 138672 37 12923.27
ARPACK 58506 10 44305.97
TRLan 58794 10 16733.68
Ga41As41H72. Polynomial degree used is 16. The spectrum of each reduced Hamiltonian spans a large interval, making the Chebyshev filtering not as effective as in the other examples. Total energies same to within 9 digits.
method # A ∗ x SCF its. CPU (secs)
ChebSI 363728 30 15408.16
ARPACK 750883 21 118693.64
TRLan 807652 21 83726.20
Fe27, Polynomial degree used is 9. Total energies same to
within ∼ 5 digits.
method # A ∗ x SCF its. CPU (secs)
ChebSI 474773 37 37701.54
ARPACK 1272441 34 235662.96
TRLan 1241744 32 184580.33
Fe51, Polynomial degree used is 9. Total energies same to
within ∼ 5 digits.
Larger tests
II Large tests with silicon and iron clusters.
Legend:
• nstate: number of states
• nH: size of Hamiltonian matrix
• # A ∗ x: number of total matrix-vector products
• # SCF: number of iteration steps to reach self-consistency
• total eV/atom: total energy per atom in electron-volts
• 1st CPU: CPU time for the first-step diagonalization
• total CPU: total CPU spent on diagonalizations to reach self-consistency
Silicon clusters [symmetry of 4 in all cases]
Si2713H828
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
5843 1400187 14 -86.16790 7.83 h. 19.56 h.
# PEs = 16. H Size = 1,074,080. m = 17 for Chebyshev-
Davidson; m = 10 for CheFSI.
Note: First diagonalization by TRLan costs 8.65 hours.
Fourteen steps would cost ≈ 121h.
Si4001H1012
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
8511 1652243 12 -89.12338 18.63 h. 38.17 h.
# PEs = 16. nH =1,472,440. m = 17 for Chebyshev-
Davidson; m = 8 for CheFSI.
Note: 1st step diagonalization by TRLan costs 34.99 hours
Si6047H1308
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
12751 2682749 14 -91.34809 45.11 h. 101.02 h.
# PEs = 32. nH =2,144,432. m = 17 for Chebyshev-
Davidson; m = 8 for CheFSI.
Note: comparisons with TRLan no longer available
Si9041H1860
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
19015 4804488 18 -92.00412 102.12 h. 294.36 h.
# PEs = 48; nH =2,992,832. m = 17 for Chebyshev-
Davidson; m = 8 for CheFSI.
Iron clusters [symmetry of 12 in all cases]
Fe150
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
900 × 2 2670945 66 -794.95991 2.34 h. 19.67 h.
# PEs = 16. nH =1,790,960. m = 20 for Chebyshev-
Davidson; m = 18 for CheFSI.
Fe286
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
1716 × 2 7405332 100 -795.165 10.19 71.66 h.
# PEs = 16. nH =2,726,968 m = 20 for Chebyshev-
Davidson; m = 17 for CheFSI.
Fe388
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
2328 × 2 18232215 187 -795.247 16.22 247.05 h.
# PEs = 24. nH = 3,332,856. m = 20 for Chebyshev-Davidson; m = 18 for CheFSI.
Reference:
M. L. Tiago, Y. Zhou, M. M. G. Alemany, YS, and J.R.
Chelikowsky, The evolution of magnetism in iron from the
atom to the bulk, Physical Review Letters, vol. 97, pp.
147201-4, (2006).
The generalized eigenvalue problem
II In many applications the problem to solve is of the form
Ax = λBx
II Often B is referred to as the “overlap matrix” in physics
II Arises for example in PAW and ultrasoft pseudopotential techniques
II It is the “Mass matrix” in finite element techniques
II B is often positive definite:
〈Bx, x〉 > 0 for all x 6= 0
II Eigenvalues are roots of p(λ) = det(A− λB).
II Many situations can arise
Illustrative examples
Example 1:  A = [−1 0; 0 1],  B = [0 1; 1 0]
A and B are both symmetric, yet Λ = {+i, −i}.

Example 2:  A = [−1 0; 0 1],  B = [0 0; 1 0]
II det(A) ≠ 0, but the pair (A, B) has no eigenvalues: Λ = ∅.

Example 3:  A = [0 −1; 1 0],  B = [0 1; 1 0]
II det(B) ≠ 0 → Λ = {±1}.

Example 4:  A = [−1 0; 1 0],  B = [0 0; 1 0]
II Singular pair – any value is an eigenvalue: Λ = C.
The bottom line:
II If B is nonsingular, then spectrum of the pair (A,B) is
equal to Λ(B−1A).
II .. If not, then the spectrum can be a discrete set, or the empty set, or all of C.
II When B is non-singular can solve problem as
B−1Ax = λx .
II Not a good solution in practice (loss of symmetry, ... )
Solving the generalized eigenvalue problem
II Consider now the case where A,B are both symmetric
real, and B is positive definite. [similar situation when A,
B are Hermitian complex]
II Can transform the problem into a standard one. Let B = LL^T be the Cholesky factorization of B. Then:
Ax = λBx
if and only if
(L−1AL−T )y = λy with x = L−Ty.
II Can use standard algorithms on the matrix C = L⁻¹ A L⁻ᵀ.
II Need to 1) factor B once, and 2) solve a system with
L and one with LT at each step
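A dense sketch of this reduction (solve_triangular avoids forming L⁻¹ explicitly; names are illustrative):

```python
import numpy as np
from scipy.linalg import solve_triangular

def gen_eig_cholesky(A, B):
    L = np.linalg.cholesky(B)                    # factor B once: B = L L^T
    C = solve_triangular(L, A, lower=True)       # L^{-1} A
    C = solve_triangular(L, C.T, lower=True).T   # (L^{-1} A) L^{-T}
    lam, Y = np.linalg.eigh(C)
    X = solve_triangular(L.T, Y, lower=False)    # recover x = L^{-T} y
    return lam, X
```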
Lanczos with B inner product
Since B is SPD we can define the scalar product:
(x, y)B = (Bx, y)
II Along with its B-norm:
‖x‖_B = (Bx, x)^{1/2}
II Let C ≡ B−1A. Observe:
Even though C is nonsymmetric, it is self-adjoint w.r.t.
B-scalar product:
∀ x, y, (Cx, y)B = (x,Cy)B.
II Can use Lanczos with the matrix C (nonsymmetric)
using the B-dot product
ALGORITHM : 5 Lanczos for Ax = λBx
1. Init: Select v1 s.t. ‖v1‖_B = 1.
2. Set β1 = 0, z0 = v0 = 0, z1 = Bv1.
3. For j = 1, 2, . . . , m, Do:
4.   v := Avj − βj zj−1   // Matvec
5.   αj := (v, vj)
6.   v := v − αj zj
7.   w := B⁻¹ v   // B-solve
8.   βj+1 := √(w, v)
9.   vj+1 := w/βj+1 and zj+1 := v/βj+1
10. EndDo
II Line 8 computes ‖w‖_B = (Bw, w)^{1/2}.
EIGENVALUE-FREE METHODS
Eigenvalue-free methods
II Diagonalization-based methods seem like overkill. There are techniques which avoid diagonalization ["Order n" methods, ...]
II Main ingredient used: obtain charge densities by other
means than using eigenvectors, e.g. exploit density matrix
ρ(r, r′)
II Filtering ideas – approximation theory ideas, ..
Avoiding the eigenvalue problem
Recall:
ρ(r) = Σ_{j=1}^{occ} |Ψj(r)|²
The main observation: the eigenvectors are not really needed. When ψj(r) is discretized w.r.t. r, then ρ(ri) ≡ ρii is the i-th diagonal entry of the 'density matrix'
P = V V^T   with   V = [ψ1, . . . , ψocc]
II Any orthogonal basis V of the same space can be used.
II 'Order n methods' are based on finding an approximation to P. Bandedness/sparseness of P (in certain bases) is exploited.
Avoiding eigenvalues / eigenvectors: Order n methods
II Methods based on approximating the projector P = V V^H directly. First observe:
tr[P] = Σ_i p_ii = particle number
tr[PH] = Σ_{i,j} p_ij H_ji = system energy.
II Property exploited: P decays away from the main diagonal [true only for certain bases and certain systems]
Idea: Find P whose trace is fixed and which minimizes
tr(PH).
II The trace constraint can be avoided by shifting H
tr[P (H − µI)] = tr[PH] − µNe
II Need another constraint : P must be a projector, so:
0 ≤ λi(P ) ≤ 1
A common strategy: Seek P in the form
P = 3S2 − 2S3
where S is a banded approximation which minimizes:
tr[(3S2 − 2S3)(H − µI)]
II Descent-type algorithms are used. [The gradient of the above function is computable.]
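A dense, purely illustrative sketch of this descent idea (the gradient formula below is derived from tr[(3S² − 2S³)M] with M = H − µI and symmetric S; the step size and iteration count are arbitrary assumptions):

```python
import numpy as np

def density_matrix_descent(H, mu, n_steps=500, step=1e-3):
    n = H.shape[0]
    M = H - mu * np.eye(n)
    S = 0.5 * np.eye(n)                       # start with eigenvalues in (0, 1)
    for _ in range(n_steps):
        SM, MS = S @ M, M @ S
        # gradient of tr[(3 S^2 - 2 S^3) M] with respect to S
        G = 3.0 * (SM + MS) - 2.0 * (S @ SM + S @ MS + MS @ S)
        S = S - step * 0.5 * (G + G.T)        # keep the iterate symmetric
    return 3.0 * S @ S - 2.0 * S @ S @ S      # P = 3 S^2 - 2 S^3
```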
A few references: (review papers)
• S. Goedecker, “Linear Scaling electronic structure meth-
ods”, Rev. Modern Phy. vol 71, pp. 1085-1123 (1999).
• X.-P. Li, R. W. Nunes, and D. Vanderbilt, Density-matrix
electronic-structure method with linear system-size scaling,
Phys. Rev. B 47, 10891 (1993).
Other Approaches
P = f(H)
where f is a step function. Approximate f by, e.g., a
polynomial
II Result: can obtain columns of P inexpensively via:
Pej ≈ pk(H)ej
II Exploit sparsity of P. Ideas of "probing" allow one to compute several columns of P at once.
II Statistical approach: work of Hutchinson for estimating
trace of a matrix [used in image processing] adapted to
estimating diagonals.
II Many variants currently being investigated
Example 1: Statistical approach
II The diagonal of a matrix B can be approximated by
D_s = [ Σ_{k=1}^{s} v_k ⊙ B v_k ] ⊘ [ Σ_{k=1}^{s} v_k ⊙ v_k ]
where v ⊙ w = componentwise product of v and w, v ⊘ w = componentwise division of v by w, and v1, . . . , vs = a sequence of random vectors with normal distribution.
II Deterministic approach: for a banded matrix (bandwidth p), there exist p vectors such that the above formula yields the exact diagonal.
II Methods require computing pk(H)v for several v's. Generally → expensive unless the bandwidth is small.
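A sketch of the stochastic estimator (B is any square matrix, or anything supporting a matvec):

```python
import numpy as np

def diag_estimate(B, s, rng=None):
    rng = rng or np.random.default_rng()
    n = B.shape[0]
    num, den = np.zeros(n), np.zeros(n)
    for _ in range(s):
        v = rng.standard_normal(n)
        num += v * (B @ v)        # v ⊙ Bv
        den += v * v              # v ⊙ v
    return num / den              # componentwise division ⊘
```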
Example 2: density matrix in PW basis
II Recall P = f(H) = {ρ(r, r′)}
II Consider the expression in the G-basis
II Most entries are small –
II Idea: use technique of “probing” or “CPR” or “Sparse
Jacobian” estimators ....
[Figure: probing pattern of a sparse matrix with two column groups highlighted.]
Probing in action: blue columns can be computed at once by one matrix-vector product. Then the red columns can be computed the same way.
Density matrix for Si64
[Figure: sparsity patterns of the density matrix for Si64 (dimension ≈ 10,000). Drop tolerance 10⁻¹: nz = 799. Drop tolerance 10⁻²: nz = 8661.]
[Figure, continued: drop tolerance 10⁻³: nz = 48213; drop tolerance 10⁻⁴: nz = 196221.]
Using Chebyshev polynomials
Main idea: Call H the original Hamiltonian matrix. Then
P = f(H)
where f(t) is the Heaviside step function: f = 1 up to λocc and f = 0 from there to λmax.
II Replace f by a polynomial using Chebyshev expansions.
II This is now done in the planewave space not real space.
[Figure: Chebyshev and Chebyshev-Jackson approximations of the step function, for degrees M = 64, 128, 256.]
Chebyshev & Jackson polynomials of deg. 64, 128, & 256. [Jackson expansions are modifications of the least-squares Chebyshev expansions that avoid Gibbs oscillations.]
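A sketch of how such damped expansion coefficients can be computed (step function with cutoff t_cut on [−1, 1]; the Jackson damping factors below follow the standard kernel-polynomial formula, quoted here as an assumption, not from the slides):

```python
import numpy as np

def cheb_step_coeffs(t_cut, M):
    # Chebyshev coefficients of f(t) = 1 for t < t_cut, 0 otherwise,
    # via Chebyshev-Gauss quadrature on M + 1 nodes
    theta = (np.arange(M + 1) + 0.5) * np.pi / (M + 1)
    f = (np.cos(theta) < t_cut).astype(float)
    c = np.array([2.0 / (M + 1) * np.sum(f * np.cos(k * theta))
                  for k in range(M + 1)])
    c[0] *= 0.5
    # Jackson damping factors g_k (suppress Gibbs oscillations)
    a = np.pi / (M + 2)
    k = np.arange(M + 1)
    g = ((M + 2 - k) * np.sin(a) * np.cos(k * a)
         + np.cos(a) * np.sin(k * a)) / ((M + 2) * np.sin(a))
    return c * g   # use with the 3-term recurrence: sum_k (cg)_k C_k(H) v
```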
II P is approximated by pm(H)X where X is a matrix of size n × p [e.g., the first p columns of the identity]
II Exploit recurrence relations of Chebyshev polynomials
and structure of P in planewave basis
II Test: with crystalline silicon [see L. Jay et al. 1998]
II Times scale like n² log n
II Cost is still high, but: (1) can avoid eigenvectors completely, and (2) can still iterate to self-consistency
[Figure: CPU time (sec) vs. number of atoms N, for M = 32 and M = 64.]
Order n methods: general comments
II What is the cost of diagonalization-based methods?
II Let N = basis size (or dimension of mesh), and m = number of states. Then:
Cost per SCF step:  O(N·m²) + O(m³)
• The O(N·m²) term arises because you need an orthonormal basis of at least m vectors.
• The O(m³) term arises because you must solve an eigenvalue problem of size at least m.
II No matter which method is used, ultimately there is an O(m³) calculation, where m is the number of occupied states. Most methods focus on the first term because N ≫ m.
II Many linear scaling techniques available – some compute the density of states only; others do not iterate to self-consistency, ...
Needed: truly linear-scaling methods which do everything done by pseudopotential DFT [including SCF, forces, geometry relaxation, ...] and are competitive with diagonalization-based methods.
II Some efforts in this direction. See, e.g., the ONETEP work [Haynes et al., Phys. Stat. Sol. vol 243, p. 2489 (2006)]
Resources
Software:
II PARSEC: http://www.ices.utexas.edu/parsec/
II ARPACK: http://www.caam.rice.edu/software/ARPACK
II PRIMME: http://www.cs.wm.edu/~andreas/software/
II TRLAN: http://crd.lbl.gov/~kewu/ps/trlan.html
⊕ Watch for the matlab code RSDFT.
Books on diagonalization:
II “Templates for the Solution of Algebraic Eigenvalue
Problems: A Practical Guide”, Zhaojun Bai, et al., SIAM,
2000.
II Matrix Algorithms, Vol 2, G. W. Stewart, SIAM, 2001
II Numerical Methods for Large Eigenvalue Problems, YS, Halsted Press (out of print). PostScript available from http://www.cs.umn.edu/~saad/books.html