A tutorial on: Numerical methods for electronic structure calculations
Yousef Saad, University of Minnesota, Computer Science and Engineering
IMA - Minneapolis, July 30th, 2007
Outline
Part 1
• Introduction, the electronic structure problem
• The Kohn-Sham equations: a nonlinear eigenvalue problem
• Planewaves and real-space discretization
• Algorithms for the eigenvalue problem: tools and general projection techniques
Part 2
• Algorithms for the eigenvalue problem: Subspace iteration and Krylov subspaces
• Preconditioning, Davidson and Jacobi-Davidson algorithms
• The nonlinear viewpoint: Nonlinear subspace iteration
• Avoiding eigenvalue calculations and Order n methods
General preliminary comments
II Ingredients of an effective numerical simulation:
Physical Model + Approximations + Efficient Algorithms + High-performance Computers = Effective Scientific Simulation
Most of the gains in speed combine advances from all
3 areas: simplifications from physics, effective numerical
algorithms, and powerful hardware+software tools.
II More than ever a successful physical simulation must be
cross-disciplinary.
II In particular, computational codes have become too
complex to be handled by ’one-dimensional’ teams.
II This tutorial: Algorithms – mostly diagonalization.
II Will illustrate the above with experience of cross-disciplinary collaboration
The electronic structure problem
Schrödinger's equation
HΨ = EΨ
II The Hamiltonian in its original form is very complex:
H = −(ℏ²/2m) Σ_i ∇²_{r_i} + Σ_{i,j} e²/|r_i − r_j| − Σ_i Σ_k Z_k e²/|r_i − R_k| − (ℏ²/2M) Σ_k ∇²_{R_k} + Σ_{k,l} e²/|R_k − R_l|
II The wavefunction Ψ is a function of the coordinates of all particles (+ electron spin). The problem is way too hard to solve.
Approximations/theories used
II Born-Oppenheimer approximation: Neglects motion of
nuclei [much heavier than electrons]
II Density Functional Theory: observable quantities are
uniquely determined by ground state charge density.
Kohn-Sham equation:
[ −∇²/2 + Vion + ∫ ρ(r′)/|r − r′| dr′ + δExc/δρ ] Ψ = EΨ
Effective Hamiltonian is of the form
−∇²/2 + Vion + VH + Vxc
The three potential terms
II Hartree Potential VH = solution of Poisson equation:
∇2VH = −4πρ(r)
where
ρ(r) = Σ_{i=1}^{occup} |ψi(r)|²
II Solve by CG or FFT once a distribution ρ is known.
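The FFT route, sketched in NumPy (a minimal sketch assuming a periodic cubic grid; the function name and conventions are illustrative, not PARSEC routines):

```python
import numpy as np

def hartree_potential_fft(rho, box_length):
    # Solve  del^2 V_H = -4*pi*rho  on a periodic cubic grid.
    # In Fourier space:  -|G|^2 V_hat = -4*pi*rho_hat.
    n = rho.shape[0]
    g = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    gx, gy, gz = np.meshgrid(g, g, g, indexing="ij")
    g2 = gx**2 + gy**2 + gz**2
    rho_hat = np.fft.fftn(rho)
    v_hat = np.zeros_like(rho_hat)
    nonzero = g2 > 0.0
    v_hat[nonzero] = 4.0 * np.pi * rho_hat[nonzero] / g2[nonzero]
    # G = 0 component set to zero (neutralizing-background convention)
    return np.fft.ifftn(v_hat).real
```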
II Vxc (exchange & correlation) is approximated by a potential induced by a local density [LDA]. Valid for slowly varying ρ(r).
The ionic potential: pseudopotentials
Potential Vion is handled by pseudopotentials: replace
effect of core (inner shell) electrons of the system by a
smooth effective potential (Pseudopotential).
II Must be smooth and replicate the same physical effects
as those of all-electron potential.
II Vion is the sum of a local term and of low-rank projectors
for each atom.
Vion = Vloc + Σ_a Pa
Roughly speaking, in matrix form: Pa = Σ_{lm} U_{lm} U_{lm}^T
II A small-rank matrix localized around each atom.
II In the end:
[ −∇²/2 + Vion + VH + Vxc ] Ψ(r) = EΨ(r)
With
• Hartree potential (local)
∇2VH = −4πρ(r)
• Vxc depends on functional. For LDA:
Vxc = f(ρ(r))
• Vion = nonlocal – does not explicitly depend on ρ:
Vion = Vloc + Σ_a Pa
• VH and Vxc depend nonlinearly on the eigenvectors:
ρ(r) = Σ_{i=1}^{occup} |ψi(r)|²
Self-Consistency
II The potentials and/or charge densities must be self-consistent: this can be viewed as a nonlinear eigenvalue problem, and can be solved from different viewpoints:
• Nonlinear eigenvalue problem: linearize + iterate to self-consistency
• Nonlinear optimization: minimize energy [again linearize + achieve self-consistency]
The two viewpoints are more or less equivalent.
II Preferred approach: Broyden-type quasi-Newton technique
II Typically, a small number of iterations is required
Self-Consistent Iteration
1. Initial guess for V: V = Vat
2. Solve (−½∇² + V) ψi = εi ψi
3. Calculate new ρ(r) = Σ_i^occ |ψi|²
4. Find new VH: −∇² VH = 4πρ(r)
5. Find new Vxc = f[ρ(r)]
6. Vnew = Vion + VH + Vxc + 'Mixing'
7. If |Vnew − V| < tol, stop; else set V = Vnew and go to step 2
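A minimal, self-contained 1D sketch of this loop (a toy model with made-up potentials, shown only to illustrate the control flow, not the PARSEC discretization):

```python
import numpy as np

n, L = 200, 10.0                                   # grid points, box size
h = L / n
x = np.linspace(0, L, n, endpoint=False)
lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / h**2       # order-2 FD Laplacian
v_ion = -5.0 * np.exp(-((x - L / 2) ** 2))         # smooth model potential
n_occ, v, tol = 2, v_ion.copy(), 1e-6              # initial guess V = V_at-like

for it in range(100):
    eps, psi = np.linalg.eigh(-0.5 * lap + np.diag(v))   # solve H psi = eps psi
    rho = np.sum(psi[:, :n_occ] ** 2, axis=1) / h        # new charge density
    v_h = np.linalg.solve(-lap, 4.0 * np.pi * rho)       # -del^2 V_H = 4 pi rho
    v_xc = -rho ** (1.0 / 3.0)                           # toy local-density term
    v_new = 0.5 * v + 0.5 * (v_ion + v_h + v_xc)         # simple 'mixing'
    if np.max(np.abs(v_new - v)) < tol:                  # self-consistency test
        break
    v = v_new
```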
II Most time-consuming part = computing eigenvalues /
eigenvectors.
Characteristic: a large number of eigenvalues/eigenvectors must be computed [occupied states]. For example, the Hamiltonian matrix size can be N = 1,000,000 with about 1,000 eigenvectors to compute.
II Self-consistent loop takes a few iterations (say 10 or 20
in easy cases).
Two broad types of methods used
Planewave techniques. Basis used is of the form:
w_G(r) = exp[ i(k + G)·r ]
where G is the vector "index" of the planewave and r is the coordinate in space. Basis for a Galerkin-type method.
II This is similar to spectral methods used to solve certain
PDEs
II In H = −½∇² + V, the ∇² term is diagonal in Fourier space
II Calculations done in Fourier space - go back-and-forth
between real and Fourier space.
Real-space methods. Calculations done in real space –
using finite differences or finite element methods.
Planewaves:
II Hamiltonian matrix is dense but it is not computed.
Action on vectors is performed with the help of FFTs
II Work is done in reciprocal space
II This is a Galerkin-type method. The Hamiltonian is projected onto a good small-dimensional subspace.
Advantages:
II Method is ‘variational’: Thus the smallest eigenvalue
will decrease if the size of the basis increases.
II Easy to ‘precondition’ since a smaller basis can be used
to provide a nice approximation for the larger problem.
II Planewaves provide a good initial approximation - so we
need not use large bases
Disadvantages:
II FFTs do not parallelize too well [however there are other
ways to extract parallelism - not an issue for very large
problems.]
II FFTs somewhat expensive
II Method is geared toward periodic systems (crystals).
Must use super-cells for confined systems.
Real-space Finite Difference Methods
II Use High-Order Finite Difference Methods [Fornberg &
Sloan ’94]
II Typical Geometry = Cube – regular structure.
II Laplacean matrix need not even be stored.
[Figure: order-4 finite difference approximation – the 3D stencil along the x, y, z axes.]
Fornberg-Sloan formulas (for ∂²/∂x²):

Order | Coefficients
2     | 1  −2  1
4     | −1/12  4/3  −5/2  4/3  −1/12
6     | 1/90  −3/20  3/2  −49/18  3/2  −3/20  1/90
8     | −1/560  8/315  −1/5  8/5  −205/72  8/5  −1/5  8/315  −1/560
II PARSEC code can use up to order 12 – but often order
is 6 or 8.
II Nonlocal term → sum of localized low-rank matrices.
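A sketch of how such a high-order Laplacian can be assembled as a sparse matrix (1D and Dirichlet boundaries assumed for simplicity; the order-8 coefficients come from the table above):

```python
import numpy as np
import scipy.sparse as sp

# Order-8 central-difference coefficients for d^2/dx^2 (from the table)
coeffs = [-1/560, 8/315, -1/5, 8/5, -205/72, 8/5, -1/5, 8/315, -1/560]

def laplacian_1d(n, h):
    offsets = list(range(-4, 5))                   # 9-point stencil
    diags = [c / h**2 * np.ones(n - abs(k))
             for k, c in zip(offsets, coeffs)]
    return sp.diags(diags, offsets, shape=(n, n), format="csr")
```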
Pattern of resulting matrix
Advantages:
• Easy to implement (parallelism easy to exploit)
• Less costly per step. No FFT’s required.
• Leads to sparse matrices.
Disadvantages:
• Not as easy to 'precondition' as planewave techniques
• May require more points to reach the same accuracy as planewaves (partial remedy: use high order finite differences)
• Not variational (important?)
II Bottom line: FD codes can (1) do very large systems
... (2) fast ... (3) ... accurately .. (4) on massively parallel
computers. Main appeal = ease of implementation.
PARSEC
PARSEC
II PARSEC = Pseudopotential Algorithm for Real Space
Electronic Calculations
II Represents ≈ 15 years of gradual improvements
II Runs on sequential and parallel platforms – using MPI.
II Efficient diagonalization
II Takes advantage of symmetry
PARSEC - A few milestones
• Sequential real-space code on Cray YMP [up to ’93]
• Cluster of SGI workstations [up to ’96]
• CM5 [’94-’96]: massive parallelism begins
• IBM SP2 [using PVM]
• Cray T3D [PVM + MPI] ∼ ’96; Cray T3E [MPI] – ’97
• IBM SP with 256+ nodes – ’98+
• SGI Origin 3900 [128 processors] – ’99+
• IBM SP + F90 – PARSEC name given, ’02
• Current: runs on many machines (SGI Altix, IBM SP4, Cray XT, ...)
• ’05: PARSEC released
Main contributors to code:
Early days:
N. Troullier
K. Wu (*)
X. Jing
H. Kim
A. Stathopoulos (*)
I. Vassiliev
More recent:
M. Jain
L. Kronik
R. Burdick (*)
M. Alemany
M. Tiago
Y. Zhou (*)
(*) = Computer-Scientist
The physical domain
Pattern of resulting matrix for Ge99H100:
Domain Mapping: domain i is mapped to processor i.
A domain decomposition approach is used.
[Figure: problem setup – a master process holds V and ρ (the nonlinear step); the wavefunctions v1, v2, v3, ... and the potential are distributed across workers 1 through p.]
Sample calculations done: Quantum dots
II Small silicon clusters (≈ 20–100 Å; involve up to a few hundred atoms)
• Si525H276 leads to a matrix size of N ≈ 290,000 and nstates = 1,194 eigenpairs.
• In 1997 this took ∼ 20 hours of CPU time on the
Cray T3E, using 48 processors.
• TODAY: 2 hours on one SGI Madison proc. (1.3GHz)
• Could be done on a good workstation!
II Gains: hardware and algorithms
II Algorithms: Better diagonalization + New code exploits
symmetry
Si525H276
Matlab version: RSDFT (work in progress)
II Goal is to provide (1) prototyping tools (2) simple codes
for teaching Real-space DFT with pseudopotentials
II Can do small systems on this laptop – [Demo later..]
II Idea: provide similar input file as PARSEC –
II Many summer interns helped with the project. This year
and last year’s interns:
Virginie Audin, Long Bui, Nate Born, Amy Coddington,
Nick Voshell, Adam Jundt, ...
+ ... others who worked with a related visualization tool.
SOLVING LARGE EIGENVALUE PROBLEMS
Outline
• Projection methods
• The subspace iteration
• Krylov subspace methods
• Restarting, implicit restarts
• Preconditioning and Davidson's method
• The Jacobi-Davidson idea
• Generalized eigenvalue problems
• AMLS
General Tools for Solving Large Eigen-Problems
II Projection techniques – Arnoldi, Lanczos, Subspace Iteration;
II Preconditionings: shift-and-invert, polynomials, ...
II Deflation and restarting techniques
Effective algorithms combine these three ingredients
A few popular solution Methods
• Subspace Iteration [Now less popular – sometimes used
for validation]
• Arnoldi's method (or Lanczos) with polynomial acceleration [Stiefel ’58, Rutishauser ’62, YS ’84, ’85, Sorensen ’89, ...]
• Shift-and-invert and other preconditioners. [Use Arnoldi or Lanczos for (A − σI)⁻¹.]
• Davidson's method and variants, Generalized Davidson's method [Morgan and Scott, ’89], Jacobi-Davidson
• Emerging method (structures): Automatic Multilevel Substructuring (AMLS).
Projection Methods for Eigenvalue Problems
Principle: One of the most important ideas in computational methods is that of projection, which consists of extracting an approximation to the given problem (linear system, eigenvalue problem) from a subspace.
Formulation for the general case:
II Given: Two subspaces K and L of same dimension m
II Find: λ, u such that
λ ∈ C, u ∈ K; (λI −A)u ⊥ L
Note: m degrees of freedom (K), m constraints (L).
Two types of methods:
II Orthogonal projection methods: situation when L = K
(generally used in the Hermitian case)
II Oblique projection methods: when L ≠ K; often used in the non-Hermitian case.
Note: in electronic structure, it is important to think in
terms of subspaces rather than individual eigenvectors.
II Implementation of projection methods: Rayleigh-Ritz
approach with orthonormal bases.
Rayleigh-Ritz projection
Given: a subspace X known to contain good approximations to eigenvectors of A.
Question: How to extract good approximations to eigenvalues/eigenvectors from this subspace?
Answer: the Rayleigh-Ritz process.
Let Q = [q1, . . . , qm] be an orthonormal basis of X. Then
write an approximation in the form u = Qy and obtain y
by writing
Q^H (A − λI) u = 0
II Q^H A Q y = λy
Procedure:
1. Obtain an orthonormal basis Q of X
2. Compute C = Q^H A Q (an m × m matrix)
3. Diagonalize C: C = Y Λ Y^H
4. Compute U = QY
Terminology: A subspace X is invariant if AX ⊆ X, i.e.,
if
Au ∈ X for all u ∈ X
Property: if X is (exactly) invariant, then procedure will
yield exact eigenvalues and eigenvectors.
II Can use above procedure in conjunction with the sub-
space obtained from subspace iteration algorithm
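A dense NumPy sketch of this procedure (A assumed real symmetric; names are illustrative):

```python
import numpy as np

def rayleigh_ritz(A, X):
    Q, _ = np.linalg.qr(X)        # 1. orthonormal basis Q of span(X)
    C = Q.T @ A @ Q               # 2. projected m-by-m matrix
    lam, Y = np.linalg.eigh(C)    # 3. diagonalize C = Y Lam Y^T
    return lam, Q @ Y             # 4. Ritz values and Ritz vectors U = QY
```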
Subspace Iteration
Original idea: projection technique onto a subspace of the form Y = A^k X = span{A^k x1, A^k x2, · · · , A^k xm}
II Computes the dominant eigenvalues (Largest in |.|)
Advantages:
• Easy to implement;
• Easy to analyze;
Disadvantage:
Slow.
II In practice: used with polynomial acceleration, i.e.,
AkX replaced by Ck(A)X. Typically Ck = Chebyshev
polynomial.
Algorithm: Subspace Iteration with Projection
1. Start: Choose an initial basis X = [x1, . . . , xm]
2. Iterate: Until convergence do:
   (a) Select a polynomial Ck
   (b) Compute Z = Ck(A) Xold
   (c) Compute Xnew = RayleighRitz(A, Z)
II RayleighRitz(A, Z) steps: 1) Orthonormalize Z into Z̄; 2) Compute B = Z̄^H A Z̄; 3) Diagonalize B = Y Λ Y^H; 4) Set Xnew := Z̄ Y.
II Cost per iteration: O(m × nnz(A)) + O(m²n) + O(m³)
II Typically: m is of the order of # (wanted eigenvalues)
A quick overview of Chebyshev polynomials
II Chebyshev polynomials of the 1st kind. For |t| ≤ 1:
Ck(t) = cos(k cos⁻¹(t))
II |Ck(t)| ≤ 1 for t ∈ [−1, 1]. Grows very fast for |t| > 1.
II In the plot Ck is scaled: Ck(t)/Ck(−1.1) is shown.
[Figure: degree-8 Chebyshev polynomial, scaled, on the interval [−1, 1].]
II Obeys 3-term recurrence:
Ck+1(t) = 2tCk(t) − Ck−1(t)
starting with C0(t) = 1;C1(t) = t.
II Can compute the sequence vk = Ck(A)v0 from:
vk+1 = 2 A vk − vk−1
II If the undesired eigenvalues are in the interval [a, b], then we use the polynomial
Pk(t) = Ck(η(t)) / Ck(η(c))
where η(t) is the linear transformation η(t) = (2t − b − a)/(b − a), and c is an estimate of the eigenvalue that is farthest away from (a + b)/2.
II More on Chebyshev ‘filtering’ later
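A sketch of this filtering step (matvec is any function computing Av; degree k ≥ 1 assumed; names follow the slide):

```python
import numpy as np

def cheb_eval(t, k):
    # C_k(t) by the 3-term recurrence (valid inside and outside [-1, 1])
    c0, c1 = 1.0, t
    for _ in range(k - 1):
        c0, c1 = c1, 2.0 * t * c1 - c0
    return c1

def cheb_filter(matvec, v, k, a, b, c):
    # Apply p_k(A)v = C_k(eta(A))v / C_k(eta(c)); [a, b] is damped
    eta = lambda t: (2.0 * t - b - a) / (b - a)
    Bv = lambda x: (2.0 * matvec(x) - (b + a) * x) / (b - a)  # eta(A) x
    w0, w1 = v, Bv(v)
    for _ in range(k - 1):
        w0, w1 = w1, 2.0 * Bv(w1) - w0
    return w1 / cheb_eval(eta(c), k)
```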
KRYLOV SUBSPACE METHODS
KRYLOV SUBSPACE METHODS
Principle: Projection methods on Krylov subspaces:
Km(A, v1) = span{v1, Av1, · · · , Am−1v1}
II Probably the most important class of projection meth-
ods [for linear systems and for eigenvalue problems]
II Many variants exist
A few important observations:
• Km = {p(A)v | p = polynomial of degree ≤ m − 1}
• If we change A to αA + σI, then Km does not change.
ARNOLDI’S ALGORITHM
II Goal: to compute an orthogonal basis of Km.
II Input: Initial vector v1, with ‖v1‖2 = 1 and m.
II Setting: A is general non-symmetric (non-Hermitian)
ALGORITHM : 1 Arnoldi's procedure
For j = 1, . . . , m do:
  Compute w := Avj
  For i = 1, . . . , j, do:
    hi,j := (w, vi)
    w := w − hi,j vi
  EndFor
  hj+1,j := ‖w‖2; if hj+1,j == 0 then STOP
  vj+1 := w/hj+1,j
End
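A runnable Python sketch of the procedure (modified Gram-Schmidt variant):

```python
import numpy as np

def arnoldi(A, v1, m):
    n = len(v1)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                 # orthogonalize against v_1..v_j
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                 # invariant subspace found: STOP
            return V[:, :j + 1], H[:j + 1, :j]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H                                # A V_m = V_{m+1} H_bar_m
```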
II Notice:
Avj = Σ_{i=1}^{j+1} h_{ij} vi
II What do we obtain if we look at A[v1, ..., vm]?
II Notation Vm = [v1, v2, ..., vm]
Result of Arnoldi’s algorithm
Let Hm be the m × m upper Hessenberg matrix

Hm =
[ x x x x x ]
[ x x x x x ]
[   x x x x ]
[     x x x ]
[       x x ]
1. Vm = [v1, v2, ..., vm] = orthonormal basis of Km.
2. AVm = VmHm + hm+1,mvm+1eTm
3. V TmAVm = Hm
Application to eigenvalue problems
II Write the approximate eigenvector as u = Vm y and impose the Galerkin condition:
(A − λI) Vm y ⊥ Km  →  Vm^H (A − λI) Vm y = 0
II Approximate eigenvalues are eigenvalues of Hm
Hmyj = λjyj
Associated approximate eigenvectors are
uj = Vmyj
Typically a few of the outermost eigenvalues will converge
first.
Hermitian case: The Lanczos Algorithm
A = A^H and Vm^H A Vm = Hm → Hm = Hm^H
II The Hessenberg matrix becomes tridiagonal, real, and symmetric.
II We can write

Hm =
[ α1 β2             ]
[ β2 α2 β3          ]
[    β3 α3 β4       ]
[        ...  ...   ]
[          βm  αm   ]
II Consequence: three term recurrence
βj+1vj+1 = Avj − αjvj − βjvj−1
ALGORITHM : 2 Lanczos
1. Choose an initial vector v1 of unit norm. Set β1 ≡ 0, v0 ≡ 0
2. For j = 1, 2, . . . , m Do:
3.   wj := Avj − βj vj−1
4.   αj := (wj, vj)
5.   wj := wj − αj vj
6.   βj+1 := ‖wj‖2. If βj+1 = 0 then Stop
7.   vj+1 := wj/βj+1
8. EndDo
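A NumPy sketch of the algorithm (no reorthogonalization, so orthogonality will eventually be lost, as discussed later):

```python
import numpy as np

def lanczos(A, v1, m):
    n = len(v1)
    V = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m + 1)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j] - beta[j] * V[:, j - 1]   # beta[0] = 0: v_0 term vanishes
        alpha[j] = w @ V[:, j]
        w = w - alpha[j] * V[:, j]
        beta[j + 1] = np.linalg.norm(w)
        if beta[j + 1] == 0.0:
            break
        V[:, j + 1] = w / beta[j + 1]
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    return V[:, :m], T        # Ritz values = eigenvalues of T
```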
Hermitian matrix + Arnoldi → Hermitian Lanczos
II Convergence ?
CONVERGENCE THEORY
The Lanczos Algorithm in the Hermitian Case
Assume eigenvalues sorted increasingly
λ1 ≤ λ2 ≤ · · · ≤ λn
II Orthogonal projection method onto Km;
II To derive error bounds, use the Courant characterization
λ1 = min_{u∈K, u≠0} (Au, u)/(u, u) = (Au1, u1)/(u1, u1)

λj = min_{u∈K, u≠0, u⊥u1,...,uj−1} (Au, u)/(u, u) = (Auj, uj)/(uj, uj)
II Bounds for λ1 easy to find
II Ritz values approximate eigenvalues of A inside out:
[Diagram: the Ritz values approach the exact eigenvalues λ1, λ2, . . . , λn−1, λn from inside the spectrum.]
A-priori error bounds
Theorem [Kaniel, 1966]:
0 ≤ λ1^{(m)} − λ1 ≤ (λN − λ1) [ tan∠(v1, u1) / T_{m−1}(1 + 2γ1) ]²
where γ1 = (λ2 − λ1)/(λN − λ2), and ∠(v1, u1) = acute angle between v1 and u1.
+ Results for other eigenvalues. [Kaniel, Paige, YS]
Theorem [YS, 1980]:
0 ≤ λi^{(m)} − λi ≤ (λN − λ1) [ κi^{(m)} tan∠(vi, ui) / T_{m−i}(1 + 2γi) ]²
where γi = (λi+1 − λi)/(λN − λi+1),  κi^{(m)} = Π_{j<i} (λj^{(m)} − λN)/(λj^{(m)} − λi)
Convergence
[Figure: number of Lanczos steps (50–300) vs. converged eigenpairs (0–8) – convergence of eigenpairs for Lapl(15,20).]
Loss of orthogonality
II In theory the vi's defined by the 3-term recurrence are orthogonal.
II However: in practice there is severe loss of orthogonality;
Observation [Paige, 1981]: Loss of orthogonality starts suddenly, when the first eigenpair has converged. It is a sign of loss of linear independence of the underlying eigenvectors. When orthogonality is lost, several copies of the same eigenvalue start appearing.
II Remedies??
Reorthogonalization
• Full reorthogonalization – reorthogonalize vj+1 against
all previous vi’s every time.
• Partial reorthogonalization – reorthogonalize vj+1 against
all previous vi’s only when needed [Parlett & Simon]
• Selective reorthogonalization – reorthogonalize vj+1 against
computed eigenvectors [Parlett & Scott]
• No reorthogonalization – do not reorthogonalize, but take measures to deal with 'spurious' eigenvalues [Cullum & Willoughby]
Example: Partial Reorth. Lanczos - PLAN
Goal: To illustrate how to adapt a well-known algorithm
to the specific situation of interest.
II Compute eigenspace instead of individual eigenvectors
II No full reorthogonalization: reorthogonalize when needed
II No restarts – so, much larger basis needed in general
Ingredients: (1) test of loss of orthogonality (recurrence
relation) and (2) stopping criterion based on charge
density instead of eigenvectors.
Partial Reorth. Lanczos - (Background)
II Recall:
βj+1vj+1 = Avj − αjvj − βjvj−1
II Scalars βj+1, αj selected so that vj+1 ⊥ vj, vj+1 ⊥ vj−1, and ‖vj+1‖2 = 1.
II In theory this is enough to guarantee that {vj} is orthonormal. + we have:
Vm^T A Vm = Tm =
[ α1 β2                  ]
[ β2 α2 β3               ]
[    ...  ...  ...       ]
[     βm−1 αm−1 βm       ]
[           βm   αm      ]
II In practice: orthogonality lost quickly
Remedy: Partial reorthogonalization: reorthogonalize only when deemed necessary, and only to maintain orthogonality to within √(machine epsilon).
II Uses an inexpensive recurrence relation
II Work done in the 80’s [Parlett, Simon, and co-workers]
+ more recent work [Larsen, ’98]
II Package: PROPACK [Larsen] V 1: 2001, most recent:
V 2.1 (Apr. 05)
II In tests with real-space Hamiltonians from PARSEC,
need for reorthogonalization not too strong.
Reorthogonalization: a small example
[Figure: level of orthogonality (from 10^0 down to 10^−14) vs. Lanczos steps (0–200), two panels.]
Levels of orthogonality of the Lanczos basis for the Hamiltonian corresponding to Si10H16 (n = 17077). Left: no reorthogonalization. Right: partial reorthogonalization (34 in all).
Second ingredient: avoid computing and updating eigenvalues / eigenvectors. Instead:
Test how good the underlying eigenspace is without knowledge of individual eigenvectors. When it has converged, then compute the basis (eigenvectors). Test: the sum of occupied energies has converged = a sum of eigenvalues of a small tridiagonal matrix. Inexpensive.
II See:
“Computing Charge Densities with Partially Reorthogonal-
ized Lanczos”, C. Bekas, Y. Saad, M. L. Tiago, and J. R.
Chelikowsky; to appear, CPC.
Partial Reorth. Lanczos vs. ARPACK for Ge99H100

         Partial Lanczos              |  ARPACK
neig | A∗x  | orth | mem. | secs     |  A∗x  | rest. | mem. | secs
248  | 3150 | 109  | 2268 | 2746     |  3342 | 20    | 357  | 16454
350  | 4570 | 184  | 3289 | 5982     |  5283 | 24    | 504  | 37371
496  | 6550 | 302  | 4715 | 13714    |  6836 | 22    | 714  | 67020
II Matrix-size = 94,341; # nonzero entries = 6,332,795
II Number of occupied states : neig=248.
II Requires more memory but ...
II ... Still manageable for most systems studied [can also
use secondary storage]
II ... and gain a factor of 4 to 6 in CPU time
DEFLATION
Deflation
II Deflation is any technique which aims at ‘removing’
already computed eigenvectors to enable the computation
of the others
II Very useful in practice
II Different forms: locking (subspace iteration), selective
orthogonalization (Lanczos), Schur deflation, ...
Standard explicit deflation for Hermitian case
Wielandt Deflation: Assume we have computed a right eigenpair λ1, u1. Wielandt deflation considers the eigenvalues of
A1 = A − σ u1 u1^H
Note:
Λ(A1) = {λ1 − σ, λ2, . . . , λn}
Eigenvectors are preserved.
II It is possible to apply this procedure successively:
ALGORITHM : 3 Explicit Deflation
1. A0 = A
2. For j = 0 . . . µ − 1 Do:
3.   Compute a dominant eigenvector uj of Aj
4.   Define Aj+1 = Aj − σj uj uj^H
5. End
II The computed u1, u2, . . . form a set of eigenvectors for A.
II Alternative: implicit deflation (within a procedure such
as Lanczos).
Deflated Lanczos: When first eigenvector converges,
we freeze it as the first vector of Vm =
[v1, v2, . . . , vm]. Lanczos procedure starts working at
column v2. Re-Orthogonalization is still done against
v1, ..., vj at step j. Each new converged eigenvector
will be added to the ‘locked’ set of eigenvectors.
Illustration
Thus, for k = 2:  Vm = [ v1, v2 | v3, . . . , vm ], with v1, v2 locked and v3, . . . , vm active.

Tm =
[ ∗              ]
[   ∗            ]
[     ∗ ∗        ]
[     ∗ ∗ ∗      ]
[       ∗ ∗ ... ]

II The top 2 × 2 part of Tm is frozen
PRECONDITIONING - DAVIDSON’S METHOD
Preconditioning eigenvalue problems
Goal: To extract good approximations to add to a subspace in a projection process. Result: faster convergence.
II Best known technique: shift-and-invert; work with
B = (A − σI)⁻¹
II Some success with polynomial preconditioning [Chebyshev iteration / least-squares polynomials]. Work with B = p(A)
II The above preconditioners preserve eigenvectors. Other methods (Davidson) use a more general preconditioner M.
Shift-and-invert preconditioning
Main idea: to use Arnoldi, or Lanczos, or subspace
iteration for the matrix B = (A−σI)−1. The matrix B
need not be computed explicitly. Each time we need to apply B to a vector we solve a system with A − σI.
II Factor A − σI = LU. Then each application of B to a vector y requires solving Lz = y and Ux = z.
Advantages:
• Fast convergence
• Cannot miss eigenvalues
• No problems with interior eigenvalues
Disadvantages:
• Factorization cost can be very high. Not practical for 3-D problems in real-space approaches.
Preconditioning by polynomials
Main idea:
Iterate with p(A) instead of A in Arnoldi or Lanczos,..
II Used very early on in subspace iteration [Rutishauser,
1959.]
II Usually not as reliable as Shift-and-invert techniques but
less demanding in terms of storage.
Question: How to find a good polynomial (dynamically)?
Approaches:
1 Use of Chebyshev polynomials
2 Use polynomials based on ‘Leja’ points
3 Least-squares polynomials
4 Polynomials from a previous Lanczos run
Polynomial filters and implicit restart
II Goal: to apply polynomial filter of the form
p(t) = (t− θ1)(t− θ2) . . . (t− θq)
by exploiting the Arnoldi or Lanczos procedure.
Assume A Vm = Vm Hm + βm vm+1 em^T
and consider the first factor: (t − θ1)
(A − θ1 I) Vm = Vm (Hm − θ1 I) + βm vm+1 em^T
Let Hm − θ1 I = Q1 R1. Then,
(A − θ1 I) Vm = Vm Q1 R1 + βm vm+1 em^T →
(A − θ1 I)(Vm Q1) = (Vm Q1) R1 Q1 + βm vm+1 em^T Q1 →
A (Vm Q1) = (Vm Q1)(R1 Q1 + θ1 I) + βm vm+1 em^T Q1
Notation: R1 Q1 + θ1 I ≡ Hm^(1);  (b_{m+1}^(1))^T ≡ em^T Q1;  Vm Q1 = Vm^(1)
II A Vm^(1) = Vm^(1) Hm^(1) + vm+1 (b_{m+1}^(1))^T
II Note that Hm^(1) is upper Hessenberg (resp. tridiagonal)
II Similar to an Arnoldi decomposition (resp. Lanczos)
Observe:
II R1 Q1 + θ1 I ≡ the matrix resulting from one step of the QR algorithm with shift θ1 applied to Hm.
II The first column of Vm^(1) is a multiple of (A − θ1 I) v1.
II The columns of Vm^(1) are orthonormal.
Can now apply the second shift in the same way:
(A − θ2 I) Vm^(1) = Vm^(1) (Hm^(1) − θ2 I) + vm+1 (b_{m+1}^(1))^T
Similar process: factor (Hm^(1) − θ2 I) = Q2 R2, then multiply by Q2 to the right:
(A − θ2 I) Vm^(1) Q2 = (Vm^(1) Q2)(R2 Q2) + vm+1 (b_{m+1}^(1))^T Q2
A Vm^(2) = Vm^(2) Hm^(2) + vm+1 (b_{m+1}^(2))^T
Now:
First column of Vm^(2) = scalar × (A − θ2 I) v1^(1) = scalar × (A − θ2 I)(A − θ1 I) v1
II Note that
(b_{m+1}^(2))^T = em^T Q1 Q2 = [0, 0, · · · , 0, q1, q2, q3]
II Let Vm−2 = [v1, . . . , vm−2] consist of the first m − 2 columns of Vm^(2) and Hm−2 = the leading principal submatrix of Hm^(2). Then
A Vm−2 = Vm−2 Hm−2 + βm−1 vm−1 e_{m−2}^T   with
βm−1 vm−1 ≡ q1 vm+1 + h_{m−1,m−2}^(2) v_{m−1}^(2),   ‖vm−1‖2 = 1
II Result: an Arnoldi/Lanczos process of m − 2 steps with the initial vector p(A) v1.
II In other words: we know how to apply polynomial filtering via a form of the Arnoldi/Lanczos process, combined with the QR algorithm.
The Davidson approach
Goal: to use a more general preconditioner to introduce
good new components to the subspace.
• Ideal new vector would be the eigenvector itself!
• Next best thing: an approximation to (A − µI)⁻¹ r where r = (A − µI)z, the current residual.
• Approximation written in the form M⁻¹ r. Note that M can vary at every step if needed.
ALGORITHM : 4 Davidson's method
1. Choose an initial unit vector v1. Set V1 = [v1].
2. Until convergence Do:
3.   For j = 1, . . . , m Do:
4.     w := Avj
5.     Update Hj ≡ Vj^T A Vj
6.     Compute the smallest eigenpair µ, y of Hj
7.     z := Vj y;  r := Az − µz
8.     Test for convergence. If satisfied, Return
9.     If j < m, compute t := Mj⁻¹ r
10.    Compute Vj+1 := ORTHN([Vj, t])
11.    EndIf
12.  Set v1 := z and go to 3
13.  EndDo
14. EndDo
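A dense sketch of the method with the traditional diagonal preconditioner Mj = D − µI mentioned on the next slide (illustrative only; assumes no zero entries in D − µI):

```python
import numpy as np

def davidson(A, v1, m=20, tol=1e-8, max_restarts=50):
    n = A.shape[0]
    d = np.diag(A)
    V = (v1 / np.linalg.norm(v1)).reshape(n, 1)
    for _ in range(max_restarts):
        for _ in range(m):
            Hj = V.T @ A @ V
            vals, Y = np.linalg.eigh(Hj)
            mu, y = vals[0], Y[:, 0]           # smallest eigenpair of H_j
            z = V @ y
            r = A @ z - mu * z
            if np.linalg.norm(r) < tol:
                return mu, z
            t = r / (d - mu)                   # t := M_j^{-1} r, diagonal M_j
            t = t - V @ (V.T @ t)              # ORTHN: orthogonalize against V
            V = np.hstack([V, (t / np.linalg.norm(t)).reshape(n, 1)])
        V = z.reshape(n, 1)                    # restart with current Ritz vector
    raise RuntimeError("Davidson did not converge")
```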
II Note: Traditional Davidson uses diagonal preconditioning: Mj = D − σj I.
II Will work only for some matrices
Other options:
II Shift-and-invert using ILU [negatives: expensive + hard
to parallelize.]
II Filtering (by averaging)
II Filtering by using smoothers (multigrid style)
II Iterative solves [See Jacobi-Davidson]
JACOBI – DAVIDSON
Introduction via Newton's method
Assumptions: M = A + E and Az ≈ µz
Goal: to find an improved eigenpair (µ + η, z + v).
II Write A(z + v) = (µ + η)(z + v), neglect second-order terms, and rearrange:
(M − µI)v − ηz = −r   with   r ≡ (A − µI)z
II Unknowns: η and v.
II Underdetermined system. Need one constraint.
II Add the condition: wHv = 0 for some vector w.
In matrix form:
[ M − µI  −z ] [ v ]   [ −r ]
[  w^H     0 ] [ η ] = [  0 ]

II Eliminate v from the second equation:
[ M − µI        −z          ] [ v ]   [ −r ]
[   0    w^H(M − µI)⁻¹ z    ] [ η ] = [  s ]
with s = w^H (M − µI)⁻¹ r

II Solution [Olsen's method]:
η = ( w^H (M − µI)⁻¹ r ) / ( w^H (M − µI)⁻¹ z )
v = −(M − µI)⁻¹ (r − ηz)
II When M = A, this corresponds to Newton's method for solving
(A − λI)u = 0,   w^T u = Constant
Note: another way to characterize the solution is:
v = −(M − µI)⁻¹ r + η (M − µI)⁻¹ z,   v ⊥ w
II Involves inverse of (M−λI). Jacobi-Davidson rewrites
solution using projectors.
II Let Pz be a projector in the direction of z which leaves r invariant. It is of the form Pz = I − (z s^H)/(s^H z) where s ⊥ r.
II Similarly, let Pw be any projector which leaves v unchanged. Then Olsen's solution can be rewritten as:
[Pz (M − µI) Pw] v = −r,   w^H v = 0
II The two solutions are mathematically equivalent!
The Jacobi-Davidson approach
II In orthogonal projection methods (e.g. Arnoldi) we have
r ⊥ z
II Also it is natural to take w ≡ z. Assume ‖z‖2 = 1
With the above assumptions, Olsen’s correction equation
is mathematically equivalent to finding v such that :
(I − zzH)(M − µI)(I − zzH)v = −r v ⊥ z
II Main attraction: can use an iterative method for the solution of the correction equation (M-solves not explicitly required).
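A SciPy sketch of such an inner solve (assumptions: M = A, A real symmetric, r ⊥ z; the inner solve is deliberately loose, which is typical):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def jd_correction(A, z, mu, r):
    # Solve (I - zz^T)(A - mu I)(I - zz^T) v = -r  with  v ⊥ z
    n = A.shape[0]
    def apply(x):
        x = x - z * (z @ x)          # project onto z-orthogonal complement
        y = A @ x - mu * x           # apply (A - mu I)
        return y - z * (z @ y)       # project again
    op = LinearOperator((n, n), matvec=apply, dtype=A.dtype)
    v, _ = minres(op, -r)            # a few iterations usually suffice
    return v - z * (z @ v)           # enforce v ⊥ z
```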
Diagonalization methods in PARSEC
1. At the beginning there was simplicity: DIAGLA
II In-house parallel code developed circa 1995 (1st version)
II Uses a form of Davidson's method with various enhancements
II Feature: can use part of a subspace from previous scf
iteration
II One issue: Preconditioner required in Davidson – but
not easy in real-space methods [in contrast with planewaves]
2. Later ARPACK was added as an option
II State of the art public domain diagonalization package
II Could be a few times faster than DIAGLA - Also: quite
robust. But...
II .. cannot reuse previous eigenvectors for next SCF
iteration.
II .. Not easy to modify to adjust to our needs
II .. Really a method not designed for large eigenspaces
3. Recent work: focus on eigenspaces – using a *nonlinear* subspace iteration approach. More on this shortly
4. Also explored - AMLS
II Domain-decomposition type approach
II Very complex (to implement) ...
II and ... accuracy not sufficient for DFT approaches
[Although this may be fixed]
Current work on diagonalization
Focus:
II Compute the eigenspace – not individual eigenvectors.
II Take into account outer (SCF) loop
II Future: eigenvector-free or basis-free methods
Motivation:
Standard packages (ARPACK) do not easily
take advantage of specificity of problem:
self-consistent loop, large number of eigenvalues, ...
CHEBYSHEV FILTERING
Chebyshev Subspace iteration
II Main ingredient: Chebyshev filtering
Given a basis [v1, . . . , vm], 'filter' each vector as
v̂i = pk(A) vi
II pk = polynomial of low degree: enhances desired eigencomponents
II The filtering step is not used to compute eigenvectors accurately
II SCF & diagonalization loops merged
II Important: convergence still good and robust
[Figure: degree-6 Chebyshev polynomial, damped interval = [0.2, 2].]
Main step:
Previous basis V = [v1, v2, · · · , vm]
  ↓ Filter:  V̂ = [p(A)v1, p(A)v2, · · · , p(A)vm]
  ↓ Orthogonalize:  [V, R] = qr(V̂, 0)
II The basis V is used to do a Ritz step (basis rotation):
C = V^T A V → [U, D] = eig(C) → V := V U
II Update charge density using this basis.
II Update Hamiltonian — repeat
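Putting the pieces together, a dense NumPy sketch of one such filtered step (reusing the cheb_filter sketch from earlier; purely illustrative):

```python
import numpy as np

def filtered_step(A, V, k, a, b, c):
    # Filter each basis vector, orthogonalize, then one Ritz step
    Vf = np.column_stack([cheb_filter(lambda x: A @ x, V[:, i], k, a, b, c)
                          for i in range(V.shape[1])])
    Q, _ = np.linalg.qr(Vf)              # [V, R] = qr(V_filtered, 0)
    C = Q.T @ A @ Q                      # Ritz step (basis rotation)
    _, U = np.linalg.eigh(C)
    return Q @ U                         # rotated basis; next: update rho, H
```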
II In effect: Nonlinear subspace iteration
II Main advantages: (1) very inexpensive, (2) uses minimal storage (m is a little ≥ # states).
II Filter polynomials: if [a, b] is the interval to dampen, then
pk(t) = Ck(l(t)) / Ck(l(c)),   with   l(t) = (2t − b − a)/(b − a)
• c ≈ eigenvalue farthest from (a + b)/2 – used for scaling
II The 3-term recurrence of Chebyshev polynomials is exploited to compute pk(A)v. If B = l(A), then Ck+1(t) = 2tCk(t) − Ck−1(t) →
wk+1 = 2 B wk − wk−1
1. Select initial V = Vat
2. Get initial basis {ψi} (diag)
3. Calculate new ρ(r) = Σ_i^occ |ψi|²
4. Find new VH: −∇² VH = 4πρ(r)
5. Find new Vxc = f[ρ(r)]
6. Vnew = Vion + VH + Vxc + 'Mixing'
7. If |Vnew − V| < tol, stop
8. Filter basis {ψi} (with Hnew) + orthogonalize; set V = Vnew and go to 3
Reference:
Yunkai Zhou, Y.S., Murilo L. Tiago, and James R. Chelikowsky, Parallel Self-Consistent-Field Calculations with Chebyshev Filtered Subspace Iteration, Phys. Rev. E, vol. 74, p. 066704 (2006).
[See http://www.cs.umn.edu/~saad]
Chebyshev Subspace iteration - experiments

model        | size of H | nstate  | nsymm | n_H-reduced
Si525H276    | 292,584   | 1194    | 4     | 73,146
Si65Ge65H98  | 185,368   | 313     | 2     | 92,684
Ga41As41H72  | 268,096   | 210     | 1     | 268,096
Fe27         | 697,504   | 520 × 2 | 8     | 87,188
Fe51         | 874,976   | 520 × 2 | 8     | 109,372

Test problems
II Tests performed on an SGI Altix 3700 cluster (Minnesota Supercomputing Institute). [CPU = a 1.3 GHz Intel Madison processor. Compiler: Intel FORTRAN ifort, with optimization flag -O3]
method # A ∗ x SCF its. CPU(secs)
ChebSI 124761 11 5946.69
ARPACK 142047 10 62026.37
TRLan 145909 10 26852.84
Si525H276, Polynomial degree used is 8. Total energies
agreed to within 8 digits.
method # A ∗ x SCF its. CPU (secs)
ChebSI 42919 13 2344.06
ARPACK 51752 9 12770.81
TRLan 53892 9 6056.11
Si65Ge65H98, Polynomial degree used is 8. Total energies
same to within 9 digits.
method # A ∗ x SCF its. CPU (secs)
ChebSI 138672 37 12923.27
ARPACK 58506 10 44305.97
TRLan 58794 10 16733.68
Ga41As41H72. Polynomial degree used is 16. The spectrum of each reduced Hamiltonian spans a large interval, making the Chebyshev filtering not as effective as in the other examples. Total energies same to within 9 digits.
method # A ∗ x SCF its. CPU (secs)
ChebSI 363728 30 15408.16
ARPACK 750883 21 118693.64
TRLan 807652 21 83726.20
Fe27, Polynomial degree used is 9. Total energies same to
within ∼ 5 digits.
method # A ∗ x SCF its. CPU (secs)
ChebSI 474773 37 37701.54
ARPACK 1272441 34 235662.96
TRLan 1241744 32 184580.33
Fe51, Polynomial degree used is 9. Total energies same to
within ∼ 5 digits.
Larger tests
II Large tests with silicon and iron clusters.
Legend:
• nstate: number of states
• nH: size of Hamiltonian matrix
• # A ∗ x: number of total matrix-vector products
• # SCF: number of iteration steps to reach self-consistency
• total eV/atom: total energy per atom in electron-volts
• 1st CPU: CPU time for the first-step diagonalization
• total CPU: total CPU spent on diagonalizations to reach self-consistency
Silicon clusters [symmetry of 4 in all cases]
Si2713H828
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
5843 1400187 14 -86.16790 7.83 h. 19.56 h.
# PEs = 16. H Size = 1,074,080. m = 17 for Chebyshev-
Davidson; m = 10 for CheFSI.
Note: First diagonalization by TRLan costs 8.65 hours.
Fourteen steps would cost ≈ 121h.
Si4001H1012
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
8511 1652243 12 -89.12338 18.63 h. 38.17 h.
# PEs = 16. nH =1,472,440. m = 17 for Chebyshev-
Davidson; m = 8 for CheFSI.
Note: 1st step diagonalization by TRLan costs 34.99 hours
Si6047H1308
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
12751 2682749 14 -91.34809 45.11 h. 101.02 h.
# PEs = 32. nH =2,144,432. m = 17 for Chebyshev-
Davidson; m = 8 for CheFSI.
Note: comparisons with TRLan no longer available
Si9041H1860
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
19015 4804488 18 -92.00412 102.12 h. 294.36 h.
# PEs = 48; nH =2,992,832. m = 17 for Chebyshev-
Davidson; m = 8 for CheFSI.
Iron clusters [symmetry of 12 in all cases]
Fe150
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
900 × 2 2670945 66 -794.95991 2.34 h. 19.67 h.
# PEs = 16. nH =1,790,960. m = 20 for Chebyshev-
Davidson; m = 18 for CheFSI.
Fe286
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
1716 × 2 7405332 100 -795.165 10.19 71.66 h.
# PEs = 16. nH =2,726,968 m = 20 for Chebyshev-
Davidson; m = 17 for CheFSI.
Fe388
nstate # A ∗ x # SCF total eV/atom 1st CPU total CPU
2328 × 2 18232215 187 -795.247 16.22 247.05 h.
# PEs = 24. nH = 3,332,856. m = 20 for Chebyshev-Davidson; m = 18 for CheFSI.
Reference:
M. L. Tiago, Y. Zhou, M. M. G. Alemany, YS, and J.R.
Chelikowsky, The evolution of magnetism in iron from the
atom to the bulk, Physical Review Letters, vol. 97, pp.
147201-4, (2006).
The generalized eigenvalue problem
II In many applications the problem to solve is of the form
Ax = λBx
II Often B is referred to as the “overlap matrix” in physics
II Arises for example in PAW and ultrasoft pseudopotential techniques
II It is the “Mass matrix” in finite element techniques
II B is often positive definite:
〈Bx, x〉 > 0 for all x 6= 0
II Eigenvalues are roots of p(λ) = det(A− λB).
II Many situations can arise
Illustrative examples
Example 1:  A = [−1 0; 0 1],  B = [0 1; 1 0]
A and B are both symmetric, yet Λ = {+i, −i}.

Example 2:  A = [−1 0; 0 1],  B = [0 0; 1 0]
II det(A) ≠ 0, but the pair (A, B) has no eigenvalues: Λ = ∅.

Example 3:  A = [0 −1; 1 0],  B = [0 1; 1 0]
II det(B) ≠ 0 → Λ = {±1}.

Example 4:  A = [−1 0; 1 0],  B = [0 0; 1 0]
II Singular pair – any value is an eigenvalue: Λ = C.
The bottom line:
II If B is nonsingular, then spectrum of the pair (A,B) is
equal to Λ(B−1A).
II .. If not, then the spectrum can be a discrete set, or the empty set, or all of C.
II When B is non-singular can solve problem as
B−1Ax = λx .
II Not a good solution in practice (loss of symmetry, ... )
Solving the generalized eigenvalue problem
II Consider now the case where A,B are both symmetric
real, and B is positive definite. [similar situation when A,
B are Hermitian complex]
II Can transform the problem into a standard one. Let B = LL^T be the Cholesky factorization of B. Then:
Ax = λBx
if and only if
(L−1AL−T )y = λy with x = L−Ty.
II Can use standard algorithms on the matrix C = L⁻¹ A L⁻ᵀ.
II Need to 1) factor B once, and 2) solve a system with
L and one with LT at each step
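A dense sketch of this reduction (solve_triangular avoids forming L⁻¹ explicitly; names are illustrative):

```python
import numpy as np
from scipy.linalg import solve_triangular

def gen_eig_cholesky(A, B):
    L = np.linalg.cholesky(B)                    # factor B once: B = L L^T
    C = solve_triangular(L, A, lower=True)       # L^{-1} A
    C = solve_triangular(L, C.T, lower=True).T   # (L^{-1} A) L^{-T}
    lam, Y = np.linalg.eigh(C)
    X = solve_triangular(L.T, Y, lower=False)    # recover x = L^{-T} y
    return lam, X
```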
Lanczos with B inner product
Since B is SPD we can define the scalar product:
(x, y)B = (Bx, y)
II Along with its B-norm:
‖x‖_B = (Bx, x)^{1/2}
II Let C ≡ B−1A. Observe:
Even though C is nonsymmetric, it is self-adjoint w.r.t.
B-scalar product:
∀ x, y, (Cx, y)B = (x,Cy)B.
II Can use Lanczos with the matrix C (nonsymmetric)
using the B-dot product
ALGORITHM : 5 Lanczos for Ax = λBx
1. Init: Select v1 s.t. ‖v1‖_B = 1.
2. Set β1 = 0, z0 = v0 = 0, z1 = Bv1.
3. For j = 1, 2, . . . , m, Do:
4.   v := Avj − βj zj−1   // Matvec
5.   αj := (v, vj)
6.   v := v − αj zj
7.   w := B⁻¹ v   // B-solve
8.   βj+1 := √(w, v)
9.   vj+1 := w/βj+1 and zj+1 := v/βj+1
10. EndDo
II Line 8 computes ‖w‖_B = (Bw, w)^{1/2}.
EIGENVALUE-FREE METHODS
Eigenvalue-free methods
II Diagonalization-based methods seem like overkill. There are techniques which avoid diagonalization ["Order n" methods, ...]
II Main ingredient used: obtain charge densities by other
means than using eigenvectors, e.g. exploit density matrix
ρ(r, r′)
II Filtering ideas – approximation theory ideas, ..
Avoiding the eigenvalue problem
Recall:
ρ(r) = Σ_{j=1}^{occ} |Ψj(r)|²
The main observation: the eigenvectors are not really needed. When ψj(r) is discretized w.r.t. r, then ρ(ri) ≡ ρii is the i-th diagonal entry of the 'density matrix'
P = V V^T   with   V = [ψ1, . . . , ψocc]
II Any orthogonal basis V of the same space can be used.
II 'Order n methods' are based on finding an approximation to P. Bandedness/sparseness of P (in certain bases) is exploited.
Avoiding eigenvalues / eigenvectors: Order n methods
II Methods based on approximating the projector P = V V^H directly. First observe:
tr[P] = Σ_i p_ii = particle number
tr[PH] = Σ_{i,j} p_ij H_ji = system energy.
II Property exploited: P decays away from the main diagonal [true only for certain bases and certain systems]
Idea: Find P whose trace is fixed and which minimizes
tr(PH).
II The trace constraint can be avoided by shifting H
tr[P (H − µI)] = tr[PH] − µNe
II Need another constraint : P must be a projector, so:
0 ≤ λi(P ) ≤ 1
A common strategy: Seek P in the form
P = 3S2 − 2S3
where S is a banded approximation which minimizes:
tr[(3S2 − 2S3)(H − µI)]
II Descent-type algorithms are used. [The gradient of the above function is computable.]
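A dense, purely illustrative sketch of this descent idea (the gradient formula below is derived from tr[(3S² − 2S³)M] with M = H − µI and symmetric S; the step size and iteration count are arbitrary assumptions):

```python
import numpy as np

def density_matrix_descent(H, mu, n_steps=500, step=1e-3):
    n = H.shape[0]
    M = H - mu * np.eye(n)
    S = 0.5 * np.eye(n)                       # start with eigenvalues in (0, 1)
    for _ in range(n_steps):
        SM, MS = S @ M, M @ S
        # gradient of tr[(3 S^2 - 2 S^3) M] with respect to S
        G = 3.0 * (SM + MS) - 2.0 * (S @ SM + S @ MS + MS @ S)
        S = S - step * 0.5 * (G + G.T)        # keep the iterate symmetric
    return 3.0 * S @ S - 2.0 * S @ S @ S      # P = 3 S^2 - 2 S^3
```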
A few references: (review papers)
• S. Goedecker, “Linear Scaling electronic structure meth-
ods”, Rev. Modern Phy. vol 71, pp. 1085-1123 (1999).
• X.-P. Li, R. W. Nunes, and D. Vanderbilt, Density-matrix
electronic-structure method with linear system-size scaling,
Phys. Rev. B 47, 10891 (1993).
Other Approaches
P = f(H)
where f is a step function. Approximate f by, e.g., a
polynomial
II Result: can obtain columns of P inexpensively via:
Pej ≈ pk(H)ej
II Exploit sparsity of P. Ideas of "probing" allow one to compute several columns of P at once.
II Statistical approach: work of Hutchinson for estimating
trace of a matrix [used in image processing] adapted to
estimating diagonals.
II Many variants currently being investigated
Example 1: Statistical approach
II The diagonal of a matrix B can be approximated by
D_s = [ Σ_{k=1}^{s} v_k ⊙ B v_k ] ⊘ [ Σ_{k=1}^{s} v_k ⊙ v_k ]
where v ⊙ w = componentwise product of v and w, v ⊘ w = componentwise division of v by w, and v1, . . . , vs = a sequence of random vectors with normal distribution.
II Deterministic approach: for a banded matrix (bandwidth p), there exist p vectors such that the above formula yields the exact diagonal.
II Methods require computing pk(H)v for several v's. Generally → expensive unless the bandwidth is small.
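A sketch of the stochastic estimator (B is any square matrix, or anything supporting a matvec):

```python
import numpy as np

def diag_estimate(B, s, rng=None):
    rng = rng or np.random.default_rng()
    n = B.shape[0]
    num, den = np.zeros(n), np.zeros(n)
    for _ in range(s):
        v = rng.standard_normal(n)
        num += v * (B @ v)        # v ⊙ Bv
        den += v * v              # v ⊙ v
    return num / den              # componentwise division ⊘
```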
Example 2: density matrix in PW basis
II Recall P = f(H) = {ρ(r, r′)}
II Consider the expression in the G-basis
II Most entries are small –
II Idea: use technique of “probing” or “CPR” or “Sparse
Jacobian” estimators ....
[Figure: probing pattern of a sparse matrix with two column groups highlighted.]
Probing in action: blue columns can be computed at once by one matrix-vector product. Then the red columns can be computed the same way.
Density matrix for Si64
[Figure: sparsity patterns of the density matrix for Si64 (dimension ≈ 10,000). Drop tolerance 10⁻¹: nz = 799. Drop tolerance 10⁻²: nz = 8661.]
[Figure, continued: drop tolerance 10⁻³: nz = 48213; drop tolerance 10⁻⁴: nz = 196221.]
Using Chebyshev polynomials
Main idea: Call H the original Hamiltonian matrix. Then
P = f(H)
where f(t) is the Heaviside step function: f = 1 up to λocc and f = 0 from there to λmax.
II Replace f by a polynomial using Chebyshev expansions.
II This is now done in the planewave space not real space.
[Figure: Chebyshev and Chebyshev-Jackson approximations of the step function, for degrees M = 64, 128, 256.]
Chebyshev & Jackson polynomials of deg. 64, 128, & 256. [Jackson expansions are modifications of the least-squares Chebyshev expansions that avoid Gibbs oscillations.]
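A sketch of how such damped expansion coefficients can be computed (step function with cutoff t_cut on [−1, 1]; the Jackson damping factors below follow the standard kernel-polynomial formula, quoted here as an assumption, not from the slides):

```python
import numpy as np

def cheb_step_coeffs(t_cut, M):
    # Chebyshev coefficients of f(t) = 1 for t < t_cut, 0 otherwise,
    # via Chebyshev-Gauss quadrature on M + 1 nodes
    theta = (np.arange(M + 1) + 0.5) * np.pi / (M + 1)
    f = (np.cos(theta) < t_cut).astype(float)
    c = np.array([2.0 / (M + 1) * np.sum(f * np.cos(k * theta))
                  for k in range(M + 1)])
    c[0] *= 0.5
    # Jackson damping factors g_k (suppress Gibbs oscillations)
    a = np.pi / (M + 2)
    k = np.arange(M + 1)
    g = ((M + 2 - k) * np.sin(a) * np.cos(k * a)
         + np.cos(a) * np.sin(k * a)) / ((M + 2) * np.sin(a))
    return c * g   # use with the 3-term recurrence: sum_k (cg)_k C_k(H) v
```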
II P is approximated by pm(H)X where X is a matrix of size n × p [e.g., the first p columns of the identity]
II Exploit recurrence relations of Chebyshev polynomials
and structure of P in planewave basis
II Test: with crystalline silicon [see L. Jay et al. 1998]
II Times scale like n² log n
II Cost is still high, but: (1) can avoid eigenvectors completely, and (2) can still iterate to self-consistency
[Figure: CPU time (sec) vs. number of atoms N, for M = 32 and M = 64.]
Order n methods: general comments
II What is the cost of diagonalization-based methods?
II Let N = basis size (or dimension of mesh), and m = number of states. Then:
Cost per SCF step:  O(N·m²) + O(m³)
• The O(N·m²) term arises because you need an orthonormal basis of at least m vectors.
• The O(m³) term arises because you must solve an eigenvalue problem of size at least m.
II No matter which method is used, ultimately there is an O(m³) calculation, where m is the number of occupied states. Most methods focus on the first term because N ≫ m.
II Many linear scaling techniques available – some compute the density of states only; others do not iterate to self-consistency, ...
Needed: truly linear-scaling methods which do everything done by pseudopotential DFT [including SCF, forces, geometry relaxation, ...] and are competitive with diagonalization-based methods.
II Some efforts in this direction. See, e.g., the ONETEP work [Haynes et al., Phys. Stat. Sol. vol 243, p. 2489 (2006)]
Resources
Software:
II PARSEC: http://www.ices.utexas.edu/parsec/
II ARPACK: http://www.caam.rice.edu/software/ARPACK
II PRIMME: http://www.cs.wm.edu/~andreas/software/
II TRLAN: http://crd.lbl.gov/~kewu/ps/trlan.html
⊕ Watch for the matlab code RSDFT.
Books on diagonalization:
II “Templates for the Solution of Algebraic Eigenvalue
Problems: A Practical Guide”, Zhaojun Bai, et al., SIAM,
2000.
II Matrix Algorithms, Vol 2, G. W. Stewart, SIAM, 2001
II Numerical Methods for Large Eigenvalue Problems, YS, Halsted Press (out of print). PostScript available from http://www.cs.umn.edu/~saad/books.html