TRANSCRIPT
Improved Scaling of Quantum Monte Carlo Simulations for Insulators
Eric de Sturler, [email protected], http://www.math.vt.edu/people/sturler
QMC Summer School 2012 University of Illinois at Urbana-Champaign, July 23 – 27, 2012
Collaborators and Support
Kapil Ahuja, Bryan Clark, David Ceperley, Jeongnim Kim, Li Ming, Arielle Grim McNally, Burkhard Militzer, Ron Cohen
Funded by NSF 1025327
Overview
Scaling of QMC
Importance Sampling
Intro to Krylov Subspace Methods
Approximate Determinant Ratio – Sparse Case
A Sequence of Preconditioners
Results
Conclusions
(Some New Ideas if Time)
Background: QMC for Deep Earth Materials
It is important for geophysicists to understand the properties of materials very deep in the Earth.
Under high pressure and temperature, DFT and other methods cannot predict certain properties accurately (and DFT has no error bars).
QMC needs few assumptions and is accurate, but it is currently too expensive – we need better algorithms for the linear algebra.
Collaboration VT, UIUC, CIW, UC Berkeley
Ahuja, Ceperley, Clark, de Sturler, and Kim: Make QMC much faster for many particles
NSF Collaborations Math & Geosciences – 1025327
Materials Computation Center/NSF (ITR) DMR
Quantum Monte Carlo Method
Quantum Monte Carlo for electronic structure calculations: variational optimization of functions of the density of electrons. The multiparticle wavefunction:
$\Psi_\alpha(R) = \Psi(r_1, \ldots, r_N; \alpha_1, \ldots, \alpha_s)$
where $r_i$ has position and spin (we mostly ignore spin in this talk).
The wavefunction gives the probability (density) of finding a particle (and spin) at a region in space (not always normalized). We want to optimize some function over parameter space (vector $\alpha$), for example
$E[\alpha] = \frac{\int \Psi_\alpha(R)^* \, \mathcal{H} \, \Psi_\alpha(R) \, dR}{\int \Psi_\alpha(R)^* \, \Psi_\alpha(R) \, dR}$
(this talk: real and normalized)
High dimensional (more than a few particles): we need a Monte Carlo method to evaluate the integrals.
Quantum Monte Carlo Method
Sample the local energy $E_L$ to evaluate the expected energy:
$E[\Psi] = \int \Psi(R) \, \mathcal{H}\Psi(R) \, dR = \int \Psi(R)^2 \, \frac{\mathcal{H}\Psi(R)}{\Psi(R)} \, dR = \int \Psi(R)^2 \, E_L(R) \, dR$
More generally, for an observable $O(R; \Psi)$: $\int \Psi(R)^2 \, O(R; \Psi) \, dR$
Approximate the integral by Monte Carlo sampling: compute the average of $O(R; \Psi)$.
However, $\Psi$ is very small except in localized regions, so we need importance sampling: sample preferentially from regions with high probability, and sample with the right probability.
Quantum Monte Carlo Method
Approximate
$E[\Psi] = \int \Psi(R)^2 \, \frac{\mathcal{H}\Psi(R)}{\Psi(R)} \, dR = \int \Psi(R)^2 \, E_L(R) \, dR$
Sample $E_L(R)$ from the density $\Psi(R)^2$.
Generate multiple configurations $R$ and for each do (repeatedly):
1. generate a random step $d$ of size $\delta$
2. accept the move with probability $\min\!\left(\frac{\Psi(R+d)^2}{\Psi(R)^2},\, 1\right)$
3. if accepted, evaluate the observable at the new configuration
This needs an initialization period to reach equilibrium (so the sequence of configurations has the desired density). In practice we move a single particle in each step and evaluate the observable after sufficiently many steps for decorrelation (a sweep is N steps). A minimal sketch follows.
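The sketch below (Python/NumPy, not from the talk) illustrates the sweep just described; psi_sq and E_local are placeholders for the actual wavefunction-squared and local-energy kernels.

```python
import numpy as np

def metropolis_sweep(R, psi_sq, delta, rng):
    """One sweep = N single-particle moves; returns the updated configuration."""
    for k in range(R.shape[0]):
        R_trial = R.copy()
        R_trial[k] += delta * rng.uniform(-1.0, 1.0, size=R.shape[1])
        # accept with probability min(Psi(R')^2 / Psi(R)^2, 1)
        if rng.random() < min(psi_sq(R_trial) / psi_sq(R), 1.0):
            R = R_trial
    return R

def run(R0, psi_sq, E_local, delta, n_equil, n_sweeps, rng=None):
    rng = rng or np.random.default_rng(0)
    R = R0
    for _ in range(n_equil):            # initialization period (equilibration)
        R = metropolis_sweep(R, psi_sq, delta, rng)
    energies = []
    for _ in range(n_sweeps):           # sample once per sweep (decorrelation)
        R = metropolis_sweep(R, psi_sq, delta, rng)
        energies.append(E_local(R))
    return np.mean(energies)
```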
What is $\Psi_\alpha(R)$?
Variational form: the space determines the quality of the approximation. Having basis functions that approximate solutions reduces the number of basis functions needed, which reduces the (linear algebra) cost.
Simple and effective: choose wave functions that are products (independence) of single-particle wave functions (a simple problem):
$\Psi(R) = \psi_1(r_1)\, \psi_2(r_2) \cdots \psi_N(r_N)$
For electrons we need wave functions that are anti-symmetric:
$\Psi(R) = \sum_P \operatorname{sgn}(P)\, \psi_1(r_{P(1)}) \cdots \psi_N(r_{P(N)}) = \det\left(\psi_j(r_i)\right)$
Slater determinants.
We use $\tilde\Psi(R) = e^{-U(R)}\, \Psi(R)$, but the extra factor is cheap.
Slater Determinants / Matrices
$A = \begin{pmatrix} \psi_1(r_1) & \psi_2(r_1) & \cdots & \psi_N(r_1) \\ \psi_1(r_2) & \psi_2(r_2) & \cdots & \psi_N(r_2) \\ \vdots & & & \vdots \\ \psi_1(r_N) & \psi_2(r_N) & \cdots & \psi_N(r_N) \end{pmatrix}$ (Slater matrix)
$\Psi(R) = \det(A)$
Move particle $k$: $R' = (r_1, \ldots, r_{k-1}, r_k', r_{k+1}, \ldots, r_N)$
$u_k^T = \left(\psi_1(r_k') - \psi_1(r_k),\; \psi_2(r_k') - \psi_2(r_k),\; \ldots,\; \psi_N(r_k') - \psi_N(r_k)\right)$
$\frac{\Psi(R')}{\Psi(R)} = 1 + u_k^T A^{-1} e_k$
Slater Determinants / Matrices
The standard algorithm maintains an explicit copy of the inverse matrix.
For an accepted move the inverse is updated using the Sherman-Morrison formula (or Woodbury, if multiple particles are moved at once):
$\left(A + e_k u_k^T\right)^{-1} = A^{-1} - \frac{A^{-1} e_k\, u_k^T A^{-1}}{1 + u_k^T A^{-1} e_k}$
at $O(N^2)$ cost per move, hence $O(N^3)$ per sweep.
The explicit inverse is useful, as the local energy (Hamiltonian) requires
$\frac{\nabla_i \Psi_\alpha(R)}{\Psi_\alpha(R)}$ and $\frac{\nabla_i^2 \Psi_\alpha(R)}{\Psi_\alpha(R)}$
once per sweep (for all $i$), so each column of the inverse will be used. A sketch of the update follows.
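A minimal NumPy sketch of this standard algorithm, assuming the inverse Ainv is maintained explicitly; u is the row change u_k defined on the previous slide.

```python
import numpy as np

def det_ratio(Ainv, u, k):
    # Psi(R')/Psi(R) = 1 + u^T A^{-1} e_k; A^{-1} e_k is column k of Ainv
    return 1.0 + u @ Ainv[:, k]

def accept_move(Ainv, u, k, ratio):
    # Sherman-Morrison: (A + e_k u^T)^{-1}
    #   = A^{-1} - (A^{-1} e_k)(u^T A^{-1}) / (1 + u^T A^{-1} e_k)
    col = Ainv[:, k].copy()          # A^{-1} e_k
    row = u @ Ainv                   # u^T A^{-1}
    return Ainv - np.outer(col, row) / ratio   # O(N^2) per accepted move
```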
Approximate Determinant Ratio
The major cost is computing $\frac{\det(A')}{\det(A)}$, where $a_{ij} = \phi_j(r_i)$ and, moving particle $k$, $a'_{kj} = \phi_j(r_k')$, so one row of the matrix is changed.
There are many ways to compute the ratio, but it needs to be done many, many times. An order-N method requires moving all particles once at $O(N)$ cost. Current methods: $O(N^3)$.
Ratio: $1 + u_k^T A^{-1} e_k$
Approximate it by solving a linear system, a generalized eigenvalue problem, estimating a bilinear form, ... (many more possible ways). The difficulty is solving a simple problem very fast, many times. The cost depends on the sparsity of the matrix – the decay/locality of the orbitals – and on the type of problem: insulator, semiconductor, metallic, ... A sketch of the linear-system approach follows.
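A minimal sketch of the linear-system approach, assuming a sparse SciPy matrix A: solve $A x = e_k$ iteratively, then the ratio is $1 + u^T x$. (Unpreconditioned here; preconditioning is the subject of later slides.)

```python
import numpy as np
import scipy.sparse.linalg as spla

def approx_det_ratio(A, u, k):
    """Estimate det(A')/det(A) = 1 + u^T A^{-1} e_k by an iterative solve."""
    e_k = np.zeros(A.shape[0]); e_k[k] = 1.0
    x, info = spla.gmres(A, e_k)        # info != 0 signals non-convergence
    return 1.0 + u @ x
```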
Optimally Sparse Slater Matrices
For sufficiently localized basis functions the matrix is (quite) sparse.
This can be achieved by optimizing the basis of (standard) single-particle wave functions to obtain maximally localized (Wannier) basis functions.
A change of basis does not change the solution to the variational problem.
This reduces the computation of the Slater matrices to O(N), but the determinant ratios still cost O(N³) per sweep (N steps).
Our methods work for any system where the matrix can be made sparse or a fast matvec is possible.
For metallic systems optimizing locality may not work, but other approaches might be possible. A sketch of assembling such a sparse matrix follows.
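A hypothetical assembly sketch for such a sparse Slater matrix, using Gaussian orbitals with a drop tolerance; the orbital width and tolerance are illustrative choices, not the talk's model. For large N one would restrict each row to nearby centers (e.g., with a k-d tree) instead of the O(N²) distance loop shown here.

```python
import numpy as np
import scipy.sparse as sp

def sparse_slater(particles, centers, width=1.0, drop_tol=1e-12):
    """Build A with a_ij = phi_j(r_i), dropping negligibly small entries."""
    rows, cols, vals = [], [], []
    centers = np.asarray(centers)
    for i, r in enumerate(particles):
        d2 = np.sum((centers - r) ** 2, axis=1)   # |r_i - c_j|^2 for all orbitals j
        v = np.exp(-d2 / width**2)                # Gaussian orbital values phi_j(r_i)
        (keep,) = np.nonzero(v > drop_tol)        # locality => O(1) entries per row
        rows.extend([i] * len(keep)); cols.extend(keep); vals.extend(v[keep])
    n = len(particles)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
```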
Generic Sparsity Pattern (reordered)
[Figure: generic sparsity pattern of a reordered 1024-particle system; nz = 40699]
Krylov Methods Crash Course
Consider $Ax = b$ and no (or explicit) preconditioning. Given $x_0$ and $r_0 = b - Ax_0$, compute the optimal update $z$ from $K^m(A, r_0) = \operatorname{span}\{r_0, Ar_0, \ldots, A^{m-1} r_0\}$:
$\min_{z \in K^m(A, r_0)} \|b - A(x_0 + z)\|_2 = \min_{z \in K^m(A, r_0)} \|r_0 - Az\|_2$
Let $K_m = \left[\, r_0 \;\; Ar_0 \;\; A^2 r_0 \;\; \cdots \;\; A^{m-1} r_0 \,\right]$; then $z = K_m \zeta$, and we must solve the least squares problem
$A K_m \zeta = \left[\, Ar_0 \;\; A^2 r_0 \;\; \cdots \;\; A^m r_0 \,\right] \zeta \approx r_0$
Do this accurately and efficiently every iteration for increasing $m$. Arnoldi recurrence: $A V_m = V_{m+1} \underline{H}_m$, where $v_1 = r_0/\|r_0\|_2$, $V_{m+1}^H V_{m+1} = I$, and $\operatorname{range}(V_{m+1}) = K^{m+1}$.
$\|r_0 - A V_m y_m\|_2 = \left\|V_{m+1}\left(e_1 \|r_0\|_2 - \underline{H}_m y_m\right)\right\|_2 = \left\|e_1 \|r_0\|_2 - \underline{H}_m y_m\right\|_2$
GMRES – Saad and Schultz '86
Krylov Methods Crash Course
Consider $Ax = b$ (or the preconditioned system $PAx = Pb$). Given $x_0$ and $r_0 = b - Ax_0$, compute the optimal update $z_m$ from $K^m(A, r_0) = \operatorname{span}\{r_0, Ar_0, \ldots, A^{m-1} r_0\}$:
$\min_{z \in K^m(A, r_0)} \|b - A(x_0 + z)\|_2 = \min_{z \in K^m(A, r_0)} \|r_0 - Az\|_2$
Let $K_m = \left[\, r_0 \;\; Ar_0 \;\; A^2 r_0 \;\; \cdots \;\; A^{m-1} r_0 \,\right]$; then $z = K_m \zeta$, and we must solve the least squares problem
$A K_m \zeta = \left[\, Ar_0 \;\; A^2 r_0 \;\; \cdots \;\; A^m r_0 \,\right] \zeta \approx r_0$
GMRES – Saad and Schultz '86; GCR – Eisenstat, Elman, and Schultz '83
$z_m = \zeta_0 r_0 + \zeta_1 A r_0 + \cdots + \zeta_{m-1} A^{m-1} r_0 = p_{m-1}(A)\, r_0$, with $p_{m-1}$ an arbitrary polynomial of degree $m-1$
$r_m = r_0 - A\, p_{m-1}(A)\, r_0 = \left(I - A\, p_{m-1}(A)\right) r_0 = q_m(A)\, r_0$, with $q_m(0) = 1$
Minimum Residual Solutions: GMRES
Solve $Ax = b$: Choose $x_0$; set $r_0 = b - Ax_0$; $v_1 = r_0/\|r_0\|_2$; $k = 0$.
while $\|r_k\|_2 \geq \varepsilon$ do
  $k = k + 1$
  $v_{k+1} = A v_k$
  for $j = 1 : k$
    $h_{j,k} = v_j^* v_{k+1}$
    $v_{k+1} = v_{k+1} - h_{j,k} v_j$
  end
  $h_{k+1,k} = \|v_{k+1}\|_2$
  $v_{k+1} = v_{k+1} / h_{k+1,k}$
  Solve the LS problem $\min_z \left\| \|r_0\|_2\, e_1 - \underline{H}_k z \right\|_2$ ( $= \|r_k\|_2$ by construction )
  (in practice we update the solution each step)
end
$x_k = x_0 + V_k z$
$r_k = r_0 - V_{k+1} \underline{H}_k z = V_{k+1}\left(\|r_0\|_2\, e_1 - \underline{H}_k z\right)$, or simply $r_k = b - A x_k$
GMRES – Saad and Schultz '86. A runnable sketch follows.
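A compact runnable version of the loop above (Python/NumPy, not from the talk); for clarity it re-solves the small least squares problem with lstsq each step rather than updating it with Givens rotations as production GMRES does.

```python
import numpy as np

def gmres(A, b, x0=None, tol=1e-8, maxiter=200):
    """Minimal GMRES: Arnoldi (modified Gram-Schmidt) + small least squares."""
    n = b.size
    x0 = np.zeros(n) if x0 is None else x0
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = [r0 / beta]                               # Krylov basis v_1, v_2, ...
    H = np.zeros((maxiter + 1, maxiter))          # Hessenberg matrix
    for k in range(maxiter):
        w = A @ V[k]
        for j in range(k + 1):                    # orthogonalize against v_1..v_{k+1}
            H[j, k] = V[j] @ w
            w = w - H[j, k] * V[j]
        H[k + 1, k] = np.linalg.norm(w)
        # solve min_z || beta*e1 - H_k z ||_2  (= current residual norm)
        e1 = np.zeros(k + 2); e1[0] = beta
        z, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        res = np.linalg.norm(e1 - H[:k + 2, :k + 1] @ z)
        if res < tol * beta or H[k + 1, k] < 1e-14:   # converged or (lucky) breakdown
            break
        V.append(w / H[k + 1, k])
    return x0 + np.column_stack(V[:k + 1]) @ z
```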
Krylov Spaces
A Krylov space is a space of polynomials in a matrix times a vector, and it inherits the approximation properties of polynomials on the real line or in the complex plane. Let $A$ be diagonalizable, $A = V \Lambda V^{-1}$ (to simplify the explanation). Then $A^2 = V \Lambda V^{-1} V \Lambda V^{-1} = V \Lambda^2 V^{-1}$, and generally $A^k = V \Lambda^k V^{-1}$. So the polynomial $p_m(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_m t^m$ applied to $A$ gives
$p_m(A) = V\left(\alpha_0 I + \alpha_1 \Lambda + \alpha_2 \Lambda^2 + \cdots + \alpha_m \Lambda^m\right) V^{-1}$, and hence
$p_m(A) = V p_m(\Lambda) V^{-1} = V \operatorname{diag}\left(p_m(\lambda_1), \ldots, p_m(\lambda_n)\right) V^{-1}$
The polynomial is applied to the eigenvalues individually. Approximate solutions to linear systems, eigenvalue problems, and more general problems using polynomial approximation can be analyzed/understood this way; the snippet below checks the identity numerically.
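A quick numerical check of $p_m(A) = V p_m(\Lambda) V^{-1}$ on a random (generically diagonalizable) matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
lam, V = np.linalg.eig(A)
p = lambda t: 2.0 + 3.0 * t + 0.5 * t**2           # p(t) = 2 + 3t + t^2/2
pA = 2.0 * np.eye(5) + 3.0 * A + 0.5 * (A @ A)     # p applied to the matrix
pA_eig = (V * p(lam)) @ np.linalg.inv(V)           # V diag(p(lambda_i)) V^{-1}
print(np.max(np.abs(pA - pA_eig)))                 # ~1e-13: identical up to roundoff
```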
Approximation by Matrix Polynomials
Let $A = V \Lambda V^{-1}$ and let $\Lambda(A) \subset \Omega \subset \mathbb{C}$.
If $p_{m-1}(t) \approx \frac{1}{t}$ for all $t \in \Omega$, then $p_{m-1}(A) \approx A^{-1}$.
Let $r_0 = V \rho$. Then
$p_{m-1}(A)\, r_0 = \sum_i v_i\, p_{m-1}(\lambda_i)\, \rho_i \approx \sum_i v_i\, \lambda_i^{-1} \rho_i = A^{-1} r_0$
$r_m = q_m(A)\, r_0 = \left(I - A\, p_{m-1}(A)\right) r_0 = \sum_i v_i \left(1 - \lambda_i\, p_{m-1}(\lambda_i)\right) \rho_i \approx 0$
If we can construct such polynomials for modest $m$, we have an efficient linear solver. This is possible if the region $\Omega$ is nice – a small region away from the origin: clustered eigenvalues. If this is not the case, we improve matters by preconditioning: $PAx = Pb$ such that $PA$ has clustered eigenvalues and the product with $P$ is cheap.
Approximation by Matrix Polynomials
Let $B = V \Lambda V^{-1}$ and let $\Lambda(B) \subset \Omega \subset \mathbb{C}$.
If $p_m(t) \approx \frac{1}{t}$ for all $t \in \Omega$, then $p_m(B) \approx B^{-1}$.
Let $y = V z$. Then
$p_m(B)\, y = \sum_i v_i\, p_m(\lambda_i)\, z_i \approx \sum_i v_i\, \lambda_i^{-1} z_i = B^{-1} y$
Furthermore, let $\varepsilon \approx 0$ and $|\lambda_i - \lambda_j| > \delta$ (for some eigenvalue $\lambda_i$).
If $p_m(t) = \begin{cases} \varepsilon & t \in \Omega,\; |t - \lambda_i| > \delta \\ 1 & t = \lambda_i \end{cases}$, then $p_m(B)\, y \approx v_i z_i$.
If we can construct such polynomials for modest $m$, we have an efficient linear solver or eigensolver.
Convergence Bounds
Residual at iteration $m$: $r_m = p_m(A)\, r_0$, optimal (2-norm), with $p_m(0) = 1$.
Eigenvalue bound:
$\frac{\|r_m\|_2}{\|r_0\|_2} \leq \kappa_2(V) \min_{p \in \Pi_m,\, p(0)=1}\; \max_{\lambda \in \Lambda(A)} |p(\lambda)|$
Field-of-values (FOV) bound:
$\frac{\|r_m\|_2}{\|r_0\|_2} \leq 2 \min_{p \in \Pi_m,\, p(0)=1}\; \max_{\gamma \in W(A)} |p(\gamma)|$
Alternative FOV bound:
$\frac{\|r_m\|_2}{\|r_0\|_2} \leq \min_{p \in \Pi_m,\, p(0)=1} \left[\, P_1 \max_{\gamma_1 \in W(Q^* A Q)} |p(\gamma_1)| + P_2 \max_{\gamma_2 \in W(Y^* A Y)} |p(\gamma_2)| \,\right]$
Pseudospectrum bound:
$\frac{\|r_m\|_2}{\|r_0\|_2} \leq \frac{L_\varepsilon}{2\pi\varepsilon} \min_{p \in \Pi_m,\, p(0)=1}\; \max_{\gamma \in \Lambda_\varepsilon(A)} |p(\gamma)|$
where $L_\varepsilon$ is the arc length of the boundary of the $\varepsilon$-pseudospectrum $\Lambda_\varepsilon(A)$.
Preconditioned Iterative Solver
Compute determinant ratios by a preconditioned iterative solve:
– efficient for sparse matrices if convergence is fast
– more generally, the preconditioned matvec needs to be cheap
An iterative solver with ILU (ILUTP) converges well, but the matrix degrades due to continual updating as particles move:
– loss of diagonal dominance leads to an unstable ILU decomposition
– need periodic reordering of the matrix and recomputation of the preconditioner
– reordering aims to pair orbitals with nearby particles
– recomputing the ILU is expensive, so compute cheap updates to the preconditioner (until they become more expensive than a new ILU)
Three improvements (a solver sketch follows this list):
– cheap updates to the preconditioner
– a cheap way to monitor instability
– effective reordering of the matrix for diagonal dominance
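A sketch of this approach with SciPy: spilu (an ILUTP-style incomplete LU) as the preconditioner for GMRES on $A x = e_k$; the drop tolerance and fill factor here are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ratio_with_ilu(A, u, k):
    """Preconditioned iterative solve for the ratio 1 + u^T A^{-1} e_k."""
    n = A.shape[0]
    ilu = spla.spilu(sp.csc_matrix(A), drop_tol=1e-4, fill_factor=10)
    M = spla.LinearOperator((n, n), matvec=ilu.solve)   # preconditioner ~ A^{-1}
    e_k = np.zeros(n); e_k[k] = 1.0
    x, info = spla.gmres(A, e_k, M=M)                   # info != 0: no convergence
    return 1.0 + u @ x
```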
Cheap Updates to the Preconditioner
Assume we have a good preconditioner: $AP$ is nice (e.g., its spectrum).
Update $\tilde{A} = A + e_k u_k^T = A\left(I + A^{-1} e_k u_k^T\right)$
Update the preconditioner: $\tilde{P} = \left(I + A^{-1} e_k u_k^T\right)^{-1} P = \left(I - w u_k^T\right) P$ with
$w = \left(1 + u_k^T A^{-1} e_k\right)^{-1} A^{-1} e_k$ (already available from the determinant ratio)
Note that breakdown occurs with probability zero. By construction $\tilde{A}\tilde{P} = AP$, so it has the same favorable properties.
There is increasing cost in applying a sequence of products of the type $\left(I - w u_k^T\right)$; at some point it is cheaper to recompute the ILU.
This links with updating the exact inverse in Broyden-type methods (see the book by Kelley) and with updating preconditioners in Bergamaschi et al. A sketch follows.
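A sketch of maintaining the sequence of rank-1 factors on top of a fixed base preconditioner (e.g., the ILU solve); the class name and interface are illustrative, not the talk's code.

```python
import numpy as np

class UpdatedPreconditioner:
    """Apply P-tilde = (I - w_m u_m^T) ... (I - w_1 u_1^T) P to a vector."""
    def __init__(self, apply_P):
        self.apply_P = apply_P          # base preconditioner, e.g. ilu.solve
        self.updates = []               # accumulated (w, u) pairs, oldest first

    def add_update(self, w, u):
        # w = A^{-1} e_k / (1 + u^T A^{-1} e_k), a byproduct of the ratio solve;
        # recompute the ILU once this list makes application too expensive.
        self.updates.append((w, u))

    def matvec(self, v):
        y = self.apply_P(v)
        for w, u in self.updates:       # apply the (I - w u^T) factors in order
            y = y - w * (u @ y)
        return y
```

Wrapping matvec in a scipy.sparse.linalg.LinearOperator lets it serve directly as the M argument of GMRES, as in the previous sketch.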
Effective Stability of the Preconditioner
Compute an incomplete decomposition $A = LU + R$ (ignore all or some fill-in).
Accuracy of the preconditioner: $\|A - LU\|_F$
Stability of the preconditioner: $\left\|I - A(LU)^{-1}\right\|_F$
(papers by Benzi, Saad, Chow)
In general stability is the most important, but the metric is expensive to compute. Replace it by effective or local stability:
$\max_i \left\| v_i - A (LU)^{-1} v_i \right\|_2$
The $v_i$ are the (ortho)normal vectors in the Arnoldi recurrence. If the local stability is small, then the preconditioner is stable over the Krylov space, even if the preconditioner is not stable over the whole space. A monitoring sketch follows.
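A one-line version of this monitor, given the Arnoldi basis vectors generated so far; the names (ilu_solve for the $(LU)^{-1}$ application) are illustrative.

```python
import numpy as np

def local_stability(A, ilu_solve, arnoldi_vectors):
    """max_i || v_i - A (LU)^{-1} v_i ||_2 over the Krylov basis built so far."""
    return max(np.linalg.norm(v - A @ ilu_solve(v)) for v in arnoldi_vectors)
```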
Reordering Algorithm
Matrix Reordering
Greedy algorithm based on geometry; a variant of Edmunds '98 (control).
At each step:
– Pick a new particle (from those remaining)
– Find the nearest orbital (swap columns to give it the same index)
– If it is the same as the current one, find the nearest particle to that orbital (swap rows to give it the same index)
Greedy algorithms can be far from optimal, but are usually quite effective (a simplified sketch follows this list).
We are considering other algorithms for improving near diagonal dominance (Duff and Koster, HSL library/RAL).
Really we want a local, incremental update of the ordering.
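A simplified sketch of the greedy pairing; it omits the particle/orbital re-swapping step described above and simply matches each particle to its nearest still-unassigned orbital center.

```python
import numpy as np

def greedy_pairing(particles, centers):
    """Column permutation that puts each particle's nearest orbital on the diagonal."""
    n = len(particles)
    remaining = list(range(n))                     # unassigned orbital indices
    col_perm = np.empty(n, dtype=int)
    for i, r in enumerate(particles):
        d2 = np.sum((np.asarray(centers)[remaining] - r) ** 2, axis=1)
        j = remaining.pop(int(np.argmin(d2)))      # nearest remaining orbital
        col_perm[i] = j
    return col_perm                                # apply as A[:, col_perm]
```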
Effect of Reordering and ILUTP
Results for Model Problem
BCC lattice with Gaussian orbitals (width parameter = 1); size of the unit cube ≈ 2, with a fixed number of particles per unit volume (two particles per unit cell).
Check accuracy, statistics, and scaling (in time).
Results for Model Problem
Probability of a wrong decision in the Metropolis algorithm:
$f = \left| \min(q_a, 1) - \min(q, 1) \right|$
where $q$ is the exact acceptance ratio and $q_a$ the approximate one.
Averaging $f$ over the random walk (MCMC sequence) gives the expected number of errors in the acceptance/rejection test:
extremely good: f < 0.0001
very good: f < 0.001
good: f < 0.01
Cost Analysis of Algorithm
Timing Comparison
Scaling (fit to a power law): standard algorithm O(n^2.67); new algorithm O(n^2.19).
For large n the standard algorithm will be cubic (based on the structure of the algorithm).
There are several ways to improve the scaling of the new algorithm.
Timing/Scaling Comparison
Further Improvements for Current Approach
Better data structures – reduce overhead by ~20%
Change the number of steps between cheap preconditioner updates – better scaling (better time)
Improve the global reordering algorithm – faster and/or better diagonal dominance
Better preconditioners (reduce iterations)
Change the reordering to incremental, local reordering – faster and better scaling
Compute the bilinear form by solving two systems with BiCG: a product of residual norms governs convergence (in exact arithmetic). In practice this property is lost relatively quickly; a clever algorithm by Strakos and Tichy preserves the property in floating point arithmetic (sketched below).
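A sketch of the BiCG-based estimate, following the Strakos-Tichy idea of accumulating $\sum_j \alpha_j\, (\tilde r_j^T r_j)$ as an approximation to $c^T A^{-1} b$ (with $r_0 = b$ and shadow residual $\tilde r_0 = c$). This is a simplified textbook BiCG without breakdown handling, not the authors' implementation.

```python
import numpy as np

def bilinear_form_bicg(A, b, c, maxiter=100, tol=1e-10):
    """Estimate c^T A^{-1} b from the BiCG coefficients (Strakos-Tichy style)."""
    r, rt = b.copy(), c.copy()          # residual and shadow residual
    p, pt = r.copy(), rt.copy()
    xi = 0.0                            # running estimate of c^T A^{-1} b
    rho = rt @ r
    for _ in range(maxiter):
        Ap, Atpt = A @ p, A.T @ pt
        alpha = rho / (pt @ Ap)
        xi += alpha * rho               # accumulate alpha_j * (rt_j^T r_j)
        r, rt = r - alpha * Ap, rt - alpha * Atpt
        rho_new = rt @ r
        if abs(rho_new) < tol:          # crude stop; a sketch, not production code
            break
        beta = rho_new / rho
        p, pt = r + beta * p, rt + beta * pt
        rho = rho_new
    return xi
```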
Conclusions and Future Work
Significant improvement in scaling for QMC.
Several further improvements are obvious (though implementation is not necessarily easy):
– Improve reordering – local and incremental
– Better, multi-level preconditioners: easier to update; use underlying structure – physics, interpolation
Test on several realistic materials.
For metallic systems we need something for (nearly) dense matrices: are fast matrix-vector products possible? What representation?
Good Reading
Ahuja, Clark, de Sturler, Ceperley, and Kim, Improved Scaling for Quantum Monte Carlo on Insulators, SIAM Journal on Scientific Computing 33(4), 1837-1859, 2011
www.math.vt.edu/people/sturler/class_homepages/5485_00000.html
Iterative Krylov Methods for Large Linear Systems, Henk van der Vorst, Cambridge University Press
Iterative Methods for Sparse Linear Systems, Yousef Saad, SIAM