TRANSCRIPT
Improved Scaling of Quantum Monte Carlo Simulations for Insulators
Eric de Sturler, [email protected], http://www.math.vt.edu/people/sturler
QMC Summer School 2012 University of Illinois at Urbana-Champaign, July 23 – 27, 2012
Collaborators and Support
Kapil Ahuja, Bryan Clark, David Ceperley, Jeongnim Kim, Li Ming, Arielle Grim McNally, Burkhard Militzer, Ron Cohen
Funded by NSF 1025327
Overview
Scaling of QMC
Importance Sampling
Intro to Krylov Subspace Methods
Approximate Determinant Ratio – Sparse Case
A Sequence of Preconditioners
Results
Conclusions
(Some New Ideas if Time)
Background: QMC for Deep Earth Materials
It is important for geophysicists to understand the properties of materials very deep in the Earth.
Under high pressure and temperature, DFT and other methods cannot predict certain properties accurately (and DFT has no error bars).
QMC needs few assumptions and is accurate, but it is currently too expensive – we need better algorithms for the linear algebra.
Collaboration VT, UIUC, CIW, UC Berkeley
Ahuja, Ceperley, Clark, de Sturler, and Kim: Make QMC much faster for many particles
NSF Collaborations Math & Geosciences – 1025327
Materials Computation Center/NSF (ITR) DMR
Quantum Monte Carlo Method
Quantum Monte Carlo for electronic structure calculations: variational optimization of functions of the density of electrons. The multiparticle wavefunction:
$\Psi_\alpha(R) = \Psi(r_1, \ldots, r_N; \alpha_1, \ldots, \alpha_s)$
where $r_i$ has position and spin (we mostly ignore spin in this talk).
The wavefunction gives the probability (density) of finding a particle (and spin) at a region in space (not always normalized). We want to optimize some function over parameter space (vector $\alpha$), for example
$E[\alpha] = \frac{\int \Psi_\alpha(R)^* \, \mathcal{H} \, \Psi_\alpha(R) \, dR}{\int \Psi_\alpha(R)^* \, \Psi_\alpha(R) \, dR}$
(this talk: real and normalized)
High dimensional (more than a few particles): we need a Monte Carlo method to evaluate the integrals.
Quantum Monte Carlo Method
Sample the local energy $E_L$ to evaluate the expected energy:
$E[\Psi] = \int \Psi(R) \, \mathcal{H}\Psi(R) \, dR = \int \Psi(R)^2 \, \frac{\mathcal{H}\Psi(R)}{\Psi(R)} \, dR = \int \Psi(R)^2 \, E_L(R) \, dR$
More generally, for an observable $O(R; \Psi)$: $\int \Psi(R)^2 \, O(R; \Psi) \, dR$
Approximate the integral by Monte Carlo sampling: compute the average of $O(R; \Psi)$.
However, $\Psi$ is very small except in localized regions, so we need importance sampling: sample preferentially from regions with high probability, and sample with the right probability.
Quantum Monte Carlo Method
Approximate
$E[\Psi] = \int \Psi(R)^2 \, \frac{\mathcal{H}\Psi(R)}{\Psi(R)} \, dR = \int \Psi(R)^2 \, E_L(R) \, dR$
Sample $E_L(R)$ from the density $\Psi(R)^2$.
Generate multiple configurations $R$ and for each do (repeatedly):
1. generate a random step $d$ of size $\delta$
2. accept the move with probability $\min\!\left(\frac{\Psi(R+d)^2}{\Psi(R)^2},\, 1\right)$
3. if accepted, evaluate the observable at the new configuration
This needs an initialization period to reach equilibrium (so the sequence of configurations has the desired density). In practice we move a single particle in each step and evaluate the observable after sufficiently many steps for decorrelation (a sweep is N steps). A minimal sketch follows.
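The sketch below (Python/NumPy, not from the talk) illustrates the sweep just described; psi_sq and E_local are placeholders for the actual wavefunction-squared and local-energy kernels.

```python
import numpy as np

def metropolis_sweep(R, psi_sq, delta, rng):
    """One sweep = N single-particle moves; returns the updated configuration."""
    for k in range(R.shape[0]):
        R_trial = R.copy()
        R_trial[k] += delta * rng.uniform(-1.0, 1.0, size=R.shape[1])
        # accept with probability min(Psi(R')^2 / Psi(R)^2, 1)
        if rng.random() < min(psi_sq(R_trial) / psi_sq(R), 1.0):
            R = R_trial
    return R

def run(R0, psi_sq, E_local, delta, n_equil, n_sweeps, rng=None):
    rng = rng or np.random.default_rng(0)
    R = R0
    for _ in range(n_equil):            # initialization period (equilibration)
        R = metropolis_sweep(R, psi_sq, delta, rng)
    energies = []
    for _ in range(n_sweeps):           # sample once per sweep (decorrelation)
        R = metropolis_sweep(R, psi_sq, delta, rng)
        energies.append(E_local(R))
    return np.mean(energies)
```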
What is $\Psi_\alpha(R)$?
Variational form: the space determines the quality of the approximation. Having basis functions that approximate solutions reduces the number of basis functions needed, which reduces the (linear algebra) cost.
Simple and effective: choose wave functions that are products (independence) of single-particle wave functions (a simple problem):
$\Psi(R) = \psi_1(r_1)\, \psi_2(r_2) \cdots \psi_N(r_N)$
For electrons we need wave functions that are anti-symmetric:
$\Psi(R) = \sum_P \operatorname{sgn}(P)\, \psi_1(r_{P(1)}) \cdots \psi_N(r_{P(N)}) = \det\left(\psi_j(r_i)\right)$
Slater determinants.
We use $\tilde\Psi(R) = e^{-U(R)}\, \Psi(R)$, but the extra factor is cheap.
Slater Determinants / Matrices
$A = \begin{pmatrix} \psi_1(r_1) & \psi_2(r_1) & \cdots & \psi_N(r_1) \\ \psi_1(r_2) & \psi_2(r_2) & \cdots & \psi_N(r_2) \\ \vdots & & & \vdots \\ \psi_1(r_N) & \psi_2(r_N) & \cdots & \psi_N(r_N) \end{pmatrix}$ (Slater matrix)
$\Psi(R) = \det(A)$
Move particle $k$: $R' = (r_1, \ldots, r_{k-1}, r_k', r_{k+1}, \ldots, r_N)$
$u_k^T = \left(\psi_1(r_k') - \psi_1(r_k),\; \psi_2(r_k') - \psi_2(r_k),\; \ldots,\; \psi_N(r_k') - \psi_N(r_k)\right)$
$\frac{\Psi(R')}{\Psi(R)} = 1 + u_k^T A^{-1} e_k$
Slater Determinants / Matrices
The standard algorithm maintains an explicit copy of the inverse matrix.
For an accepted move the inverse is updated using the Sherman-Morrison formula (or Woodbury, if multiple particles are moved at once):
$\left(A + e_k u_k^T\right)^{-1} = A^{-1} - \frac{A^{-1} e_k\, u_k^T A^{-1}}{1 + u_k^T A^{-1} e_k}$
at $O(N^2)$ cost per move, hence $O(N^3)$ per sweep.
The explicit inverse is useful, as the local energy (Hamiltonian) requires
$\frac{\nabla_i \Psi_\alpha(R)}{\Psi_\alpha(R)}$ and $\frac{\nabla_i^2 \Psi_\alpha(R)}{\Psi_\alpha(R)}$
once per sweep (for all $i$), so each column of the inverse will be used. A sketch of the update follows.
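A minimal NumPy sketch of this standard algorithm, assuming the inverse Ainv is maintained explicitly; u is the row change u_k defined on the previous slide.

```python
import numpy as np

def det_ratio(Ainv, u, k):
    # Psi(R')/Psi(R) = 1 + u^T A^{-1} e_k; A^{-1} e_k is column k of Ainv
    return 1.0 + u @ Ainv[:, k]

def accept_move(Ainv, u, k, ratio):
    # Sherman-Morrison: (A + e_k u^T)^{-1}
    #   = A^{-1} - (A^{-1} e_k)(u^T A^{-1}) / (1 + u^T A^{-1} e_k)
    col = Ainv[:, k].copy()          # A^{-1} e_k
    row = u @ Ainv                   # u^T A^{-1}
    return Ainv - np.outer(col, row) / ratio   # O(N^2) per accepted move
```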
Approximate Determinant Ratio
The major cost is computing $\frac{\det(A')}{\det(A)}$, where $a_{ij} = \phi_j(r_i)$ and, moving particle $k$, $a'_{kj} = \phi_j(r_k')$, so one row of the matrix is changed.
There are many ways to compute the ratio, but it needs to be done many, many times. An order-N method requires moving all particles once at $O(N)$ cost. Current methods: $O(N^3)$.
Ratio: $1 + u_k^T A^{-1} e_k$
Approximate it by solving a linear system, a generalized eigenvalue problem, estimating a bilinear form, ... (many more possible ways). The difficulty is solving a simple problem very fast, many times. The cost depends on the sparsity of the matrix – the decay/locality of the orbitals – and on the type of problem: insulator, semiconductor, metallic, ... A sketch of the linear-system approach follows.
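A minimal sketch of the linear-system approach, assuming a sparse SciPy matrix A: solve $A x = e_k$ iteratively, then the ratio is $1 + u^T x$. (Unpreconditioned here; preconditioning is the subject of later slides.)

```python
import numpy as np
import scipy.sparse.linalg as spla

def approx_det_ratio(A, u, k):
    """Estimate det(A')/det(A) = 1 + u^T A^{-1} e_k by an iterative solve."""
    e_k = np.zeros(A.shape[0]); e_k[k] = 1.0
    x, info = spla.gmres(A, e_k)        # info != 0 signals non-convergence
    return 1.0 + u @ x
```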
Optimally Sparse Slater Matrices
For sufficiently localized basis functions the matrix is (quite) sparse.
This can be achieved by optimizing the basis of (standard) single-particle wave functions to obtain maximally localized (Wannier) basis functions.
A change of basis does not change the solution to the variational problem.
This reduces the computation of the Slater matrices to O(N), but the determinant ratios still cost O(N³) per sweep (N steps).
Our methods work for any system where the matrix can be made sparse or a fast matvec is possible.
For metallic systems optimizing locality may not work, but other approaches might be possible. A sketch of assembling such a sparse matrix follows.
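A hypothetical assembly sketch for such a sparse Slater matrix, using Gaussian orbitals with a drop tolerance; the orbital width and tolerance are illustrative choices, not the talk's model. For large N one would restrict each row to nearby centers (e.g., with a k-d tree) instead of the O(N²) distance loop shown here.

```python
import numpy as np
import scipy.sparse as sp

def sparse_slater(particles, centers, width=1.0, drop_tol=1e-12):
    """Build A with a_ij = phi_j(r_i), dropping negligibly small entries."""
    rows, cols, vals = [], [], []
    centers = np.asarray(centers)
    for i, r in enumerate(particles):
        d2 = np.sum((centers - r) ** 2, axis=1)   # |r_i - c_j|^2 for all orbitals j
        v = np.exp(-d2 / width**2)                # Gaussian orbital values phi_j(r_i)
        (keep,) = np.nonzero(v > drop_tol)        # locality => O(1) entries per row
        rows.extend([i] * len(keep)); cols.extend(keep); vals.extend(v[keep])
    n = len(particles)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
```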
Generic Sparsity Pattern (reordered)
[Figure: generic sparsity pattern of a reordered 1024-particle system; nz = 40699]
Krylov Methods Crash Course
Consider $Ax = b$ and no (or explicit) preconditioning. Given $x_0$ and $r_0 = b - Ax_0$, compute the optimal update $z$ from $K^m(A, r_0) = \operatorname{span}\{r_0, Ar_0, \ldots, A^{m-1} r_0\}$:
$\min_{z \in K^m(A, r_0)} \|b - A(x_0 + z)\|_2 = \min_{z \in K^m(A, r_0)} \|r_0 - Az\|_2$
Let $K_m = \left[\, r_0 \;\; Ar_0 \;\; A^2 r_0 \;\; \cdots \;\; A^{m-1} r_0 \,\right]$; then $z = K_m \zeta$, and we must solve the least squares problem
$A K_m \zeta = \left[\, Ar_0 \;\; A^2 r_0 \;\; \cdots \;\; A^m r_0 \,\right] \zeta \approx r_0$
Do this accurately and efficiently every iteration for increasing $m$. Arnoldi recurrence: $A V_m = V_{m+1} \underline{H}_m$, where $v_1 = r_0/\|r_0\|_2$, $V_{m+1}^H V_{m+1} = I$, and $\operatorname{range}(V_{m+1}) = K^{m+1}$.
$\|r_0 - A V_m y_m\|_2 = \left\|V_{m+1}\left(e_1 \|r_0\|_2 - \underline{H}_m y_m\right)\right\|_2 = \left\|e_1 \|r_0\|_2 - \underline{H}_m y_m\right\|_2$
GMRES – Saad and Schultz '86
Krylov Methods Crash Course
Consider $Ax = b$ (or the preconditioned system $PAx = Pb$). Given $x_0$ and $r_0 = b - Ax_0$, compute the optimal update $z_m$ from $K^m(A, r_0) = \operatorname{span}\{r_0, Ar_0, \ldots, A^{m-1} r_0\}$:
$\min_{z \in K^m(A, r_0)} \|b - A(x_0 + z)\|_2 = \min_{z \in K^m(A, r_0)} \|r_0 - Az\|_2$
Let $K_m = \left[\, r_0 \;\; Ar_0 \;\; A^2 r_0 \;\; \cdots \;\; A^{m-1} r_0 \,\right]$; then $z = K_m \zeta$, and we must solve the least squares problem
$A K_m \zeta = \left[\, Ar_0 \;\; A^2 r_0 \;\; \cdots \;\; A^m r_0 \,\right] \zeta \approx r_0$
GMRES – Saad and Schultz '86; GCR – Eisenstat, Elman, and Schultz '83
$z_m = \zeta_0 r_0 + \zeta_1 A r_0 + \cdots + \zeta_{m-1} A^{m-1} r_0 = p_{m-1}(A)\, r_0$, with $p_{m-1}$ an arbitrary polynomial of degree $m-1$
$r_m = r_0 - A\, p_{m-1}(A)\, r_0 = \left(I - A\, p_{m-1}(A)\right) r_0 = q_m(A)\, r_0$, with $q_m(0) = 1$
Minimum Residual Solutions: GMRES
Solve $Ax = b$: Choose $x_0$; set $r_0 = b - Ax_0$; $v_1 = r_0/\|r_0\|_2$; $k = 0$.
while $\|r_k\|_2 \geq \varepsilon$ do
  $k = k + 1$
  $v_{k+1} = A v_k$
  for $j = 1 : k$
    $h_{j,k} = v_j^* v_{k+1}$
    $v_{k+1} = v_{k+1} - h_{j,k} v_j$
  end
  $h_{k+1,k} = \|v_{k+1}\|_2$
  $v_{k+1} = v_{k+1} / h_{k+1,k}$
  Solve the LS problem $\min_z \left\| \|r_0\|_2\, e_1 - \underline{H}_k z \right\|_2$ ( $= \|r_k\|_2$ by construction )
  (in practice we update the solution each step)
end
$x_k = x_0 + V_k z$
$r_k = r_0 - V_{k+1} \underline{H}_k z = V_{k+1}\left(\|r_0\|_2\, e_1 - \underline{H}_k z\right)$, or simply $r_k = b - A x_k$
GMRES – Saad and Schultz '86. A runnable sketch follows.
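A compact runnable version of the loop above (Python/NumPy, not from the talk); for clarity it re-solves the small least squares problem with lstsq each step rather than updating it with Givens rotations as production GMRES does.

```python
import numpy as np

def gmres(A, b, x0=None, tol=1e-8, maxiter=200):
    """Minimal GMRES: Arnoldi (modified Gram-Schmidt) + small least squares."""
    n = b.size
    x0 = np.zeros(n) if x0 is None else x0
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = [r0 / beta]                               # Krylov basis v_1, v_2, ...
    H = np.zeros((maxiter + 1, maxiter))          # Hessenberg matrix
    for k in range(maxiter):
        w = A @ V[k]
        for j in range(k + 1):                    # orthogonalize against v_1..v_{k+1}
            H[j, k] = V[j] @ w
            w = w - H[j, k] * V[j]
        H[k + 1, k] = np.linalg.norm(w)
        # solve min_z || beta*e1 - H_k z ||_2  (= current residual norm)
        e1 = np.zeros(k + 2); e1[0] = beta
        z, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        res = np.linalg.norm(e1 - H[:k + 2, :k + 1] @ z)
        if res < tol * beta or H[k + 1, k] < 1e-14:   # converged or (lucky) breakdown
            break
        V.append(w / H[k + 1, k])
    return x0 + np.column_stack(V[:k + 1]) @ z
```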
Krylov Spaces
A Krylov space is a space of polynomials in a matrix times a vector, and it inherits the approximation properties of polynomials on the real line or in the complex plane. Let $A$ be diagonalizable, $A = V \Lambda V^{-1}$ (to simplify the explanation). Then $A^2 = V \Lambda V^{-1} V \Lambda V^{-1} = V \Lambda^2 V^{-1}$, and generally $A^k = V \Lambda^k V^{-1}$. So the polynomial $p_m(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_m t^m$ applied to $A$ gives
$p_m(A) = V\left(\alpha_0 I + \alpha_1 \Lambda + \alpha_2 \Lambda^2 + \cdots + \alpha_m \Lambda^m\right) V^{-1}$, and hence
$p_m(A) = V p_m(\Lambda) V^{-1} = V \operatorname{diag}\left(p_m(\lambda_1), \ldots, p_m(\lambda_n)\right) V^{-1}$
The polynomial is applied to the eigenvalues individually. Approximate solutions to linear systems, eigenvalue problems, and more general problems using polynomial approximation can be analyzed/understood this way; the snippet below checks the identity numerically.
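A quick numerical check of $p_m(A) = V p_m(\Lambda) V^{-1}$ on a random (generically diagonalizable) matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
lam, V = np.linalg.eig(A)
p = lambda t: 2.0 + 3.0 * t + 0.5 * t**2           # p(t) = 2 + 3t + t^2/2
pA = 2.0 * np.eye(5) + 3.0 * A + 0.5 * (A @ A)     # p applied to the matrix
pA_eig = (V * p(lam)) @ np.linalg.inv(V)           # V diag(p(lambda_i)) V^{-1}
print(np.max(np.abs(pA - pA_eig)))                 # ~1e-13: identical up to roundoff
```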
Approximation by Matrix Polynomials
Let $A = V \Lambda V^{-1}$ and let $\Lambda(A) \subset \Omega \subset \mathbb{C}$.
If $p_{m-1}(t) \approx \frac{1}{t}$ for all $t \in \Omega$, then $p_{m-1}(A) \approx A^{-1}$.
Let $r_0 = V \rho$. Then
$p_{m-1}(A)\, r_0 = \sum_i v_i\, p_{m-1}(\lambda_i)\, \rho_i \approx \sum_i v_i\, \lambda_i^{-1} \rho_i = A^{-1} r_0$
$r_m = q_m(A)\, r_0 = \left(I - A\, p_{m-1}(A)\right) r_0 = \sum_i v_i \left(1 - \lambda_i\, p_{m-1}(\lambda_i)\right) \rho_i \approx 0$
If we can construct such polynomials for modest $m$, we have an efficient linear solver. This is possible if the region $\Omega$ is nice – a small region away from the origin: clustered eigenvalues. If this is not the case, we improve matters by preconditioning: $PAx = Pb$ such that $PA$ has clustered eigenvalues and the product with $P$ is cheap.
Approximation by Matrix Polynomials
Let $B = V \Lambda V^{-1}$ and let $\Lambda(B) \subset \Omega \subset \mathbb{C}$.
If $p_m(t) \approx \frac{1}{t}$ for all $t \in \Omega$, then $p_m(B) \approx B^{-1}$.
Let $y = V z$. Then
$p_m(B)\, y = \sum_i v_i\, p_m(\lambda_i)\, z_i \approx \sum_i v_i\, \lambda_i^{-1} z_i = B^{-1} y$
Furthermore, let $\varepsilon \approx 0$ and $|\lambda_i - \lambda_j| > \delta$ (for some eigenvalue $\lambda_i$).
If $p_m(t) = \begin{cases} \varepsilon & t \in \Omega,\; |t - \lambda_i| > \delta \\ 1 & t = \lambda_i \end{cases}$, then $p_m(B)\, y \approx v_i z_i$.
If we can construct such polynomials for modest $m$, we have an efficient linear solver or eigensolver.
Convergence Bounds
Residual at iteration $m$: $r_m = p_m(A)\, r_0$, optimal (2-norm), with $p_m(0) = 1$.
Eigenvalue bound:
$\frac{\|r_m\|_2}{\|r_0\|_2} \leq \kappa_2(V) \min_{p \in \Pi_m,\, p(0)=1}\; \max_{\lambda \in \Lambda(A)} |p(\lambda)|$
Field-of-values (FOV) bound:
$\frac{\|r_m\|_2}{\|r_0\|_2} \leq 2 \min_{p \in \Pi_m,\, p(0)=1}\; \max_{\gamma \in W(A)} |p(\gamma)|$
Alternative FOV bound:
$\frac{\|r_m\|_2}{\|r_0\|_2} \leq \min_{p \in \Pi_m,\, p(0)=1} \left[\, P_1 \max_{\gamma_1 \in W(Q^* A Q)} |p(\gamma_1)| + P_2 \max_{\gamma_2 \in W(Y^* A Y)} |p(\gamma_2)| \,\right]$
Pseudospectrum bound:
$\frac{\|r_m\|_2}{\|r_0\|_2} \leq \frac{L_\varepsilon}{2\pi\varepsilon} \min_{p \in \Pi_m,\, p(0)=1}\; \max_{\gamma \in \Lambda_\varepsilon(A)} |p(\gamma)|$
where $L_\varepsilon$ is the arc length of the boundary of the $\varepsilon$-pseudospectrum $\Lambda_\varepsilon(A)$.
Preconditioned Iterative Solver
Compute determinant ratios by a preconditioned iterative solve:
– efficient for sparse matrices if convergence is fast
– more generally, the preconditioned matvec needs to be cheap
An iterative solver with ILU (ILUTP) converges well, but the matrix degrades due to continual updating as particles move:
– loss of diagonal dominance leads to an unstable ILU decomposition
– need periodic reordering of the matrix and recomputation of the preconditioner
– reordering aims to pair orbitals with nearby particles
– recomputing the ILU is expensive, so compute cheap updates to the preconditioner (until they become more expensive than a new ILU)
Three improvements (a solver sketch follows this list):
– cheap updates to the preconditioner
– a cheap way to monitor instability
– effective reordering of the matrix for diagonal dominance
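A sketch of this approach with SciPy: spilu (an ILUTP-style incomplete LU) as the preconditioner for GMRES on $A x = e_k$; the drop tolerance and fill factor here are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ratio_with_ilu(A, u, k):
    """Preconditioned iterative solve for the ratio 1 + u^T A^{-1} e_k."""
    n = A.shape[0]
    ilu = spla.spilu(sp.csc_matrix(A), drop_tol=1e-4, fill_factor=10)
    M = spla.LinearOperator((n, n), matvec=ilu.solve)   # preconditioner ~ A^{-1}
    e_k = np.zeros(n); e_k[k] = 1.0
    x, info = spla.gmres(A, e_k, M=M)                   # info != 0: no convergence
    return 1.0 + u @ x
```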
Cheap Updates to the Preconditioner
Assume we have a good preconditioner: $AP$ is nice (e.g., its spectrum).
Update $\tilde{A} = A + e_k u_k^T = A\left(I + A^{-1} e_k u_k^T\right)$
Update the preconditioner: $\tilde{P} = \left(I + A^{-1} e_k u_k^T\right)^{-1} P = \left(I - w u_k^T\right) P$ with
$w = \left(1 + u_k^T A^{-1} e_k\right)^{-1} A^{-1} e_k$ (already available from the determinant ratio)
Note that breakdown occurs with probability zero. By construction $\tilde{A}\tilde{P} = AP$, so it has the same favorable properties.
There is increasing cost in applying a sequence of products of the type $\left(I - w u_k^T\right)$; at some point it is cheaper to recompute the ILU.
This links with updating the exact inverse in Broyden-type methods (see the book by Kelley) and with updating preconditioners in Bergamaschi et al. A sketch follows.
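A sketch of maintaining the sequence of rank-1 factors on top of a fixed base preconditioner (e.g., the ILU solve); the class name and interface are illustrative, not the talk's code.

```python
import numpy as np

class UpdatedPreconditioner:
    """Apply P-tilde = (I - w_m u_m^T) ... (I - w_1 u_1^T) P to a vector."""
    def __init__(self, apply_P):
        self.apply_P = apply_P          # base preconditioner, e.g. ilu.solve
        self.updates = []               # accumulated (w, u) pairs, oldest first

    def add_update(self, w, u):
        # w = A^{-1} e_k / (1 + u^T A^{-1} e_k), a byproduct of the ratio solve;
        # recompute the ILU once this list makes application too expensive.
        self.updates.append((w, u))

    def matvec(self, v):
        y = self.apply_P(v)
        for w, u in self.updates:       # apply the (I - w u^T) factors in order
            y = y - w * (u @ y)
        return y
```

Wrapping matvec in a scipy.sparse.linalg.LinearOperator lets it serve directly as the M argument of GMRES, as in the previous sketch.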
Effective Stability of the Preconditioner
Compute an incomplete decomposition $A = LU + R$ (ignore all or some fill-in).
Accuracy of the preconditioner: $\|A - LU\|_F$
Stability of the preconditioner: $\left\|I - A(LU)^{-1}\right\|_F$
(papers by Benzi, Saad, Chow)
In general stability is the most important, but the metric is expensive to compute. Replace it by effective or local stability:
$\max_i \left\| v_i - A (LU)^{-1} v_i \right\|_2$
The $v_i$ are the (ortho)normal vectors in the Arnoldi recurrence. If the local stability is small, then the preconditioner is stable over the Krylov space, even if the preconditioner is not stable over the whole space. A monitoring sketch follows.
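A one-line version of this monitor, given the Arnoldi basis vectors generated so far; the names (ilu_solve for the $(LU)^{-1}$ application) are illustrative.

```python
import numpy as np

def local_stability(A, ilu_solve, arnoldi_vectors):
    """max_i || v_i - A (LU)^{-1} v_i ||_2 over the Krylov basis built so far."""
    return max(np.linalg.norm(v - A @ ilu_solve(v)) for v in arnoldi_vectors)
```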
Reordering Algorithm
Matrix Reordering
Greedy algorithm based on geometry; a variant of Edmunds '98 (control).
At each step:
– Pick a new particle (from those remaining)
– Find the nearest orbital (swap columns to give it the same index)
– If it is the same as the current one, find the nearest particle to that orbital (swap rows to give it the same index)
Greedy algorithms can be far from optimal, but are usually quite effective (a simplified sketch follows this list).
We are considering other algorithms for improving near diagonal dominance (Duff and Koster, HSL library/RAL).
Really we want a local, incremental update of the ordering.
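A simplified sketch of the greedy pairing; it omits the particle/orbital re-swapping step described above and simply matches each particle to its nearest still-unassigned orbital center.

```python
import numpy as np

def greedy_pairing(particles, centers):
    """Column permutation that puts each particle's nearest orbital on the diagonal."""
    n = len(particles)
    remaining = list(range(n))                     # unassigned orbital indices
    col_perm = np.empty(n, dtype=int)
    for i, r in enumerate(particles):
        d2 = np.sum((np.asarray(centers)[remaining] - r) ** 2, axis=1)
        j = remaining.pop(int(np.argmin(d2)))      # nearest remaining orbital
        col_perm[i] = j
    return col_perm                                # apply as A[:, col_perm]
```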
Effect of Reordering and ILUTP
Results for Model Problem
BCC lattice with Gaussian orbitals (width parameter = 1); size of the unit cube ≈ 2, with a fixed number of particles per unit volume (two particles per unit cell).
Check accuracy, statistics, and scaling (in time).
Results for Model Problem
Probability of a wrong decision in the Metropolis algorithm:
$f = \left| \min(q_a, 1) - \min(q, 1) \right|$
where $q$ is the exact acceptance ratio and $q_a$ the approximate one.
Averaging $f$ over the random walk (MCMC sequence) gives the expected number of errors in the acceptance/rejection test:
extremely good: f < 0.0001
very good: f < 0.001
good: f < 0.01
Cost Analysis of Algorithm
Timing Comparison
Scaling (fit to a power law): standard algorithm O(n^2.67); new algorithm O(n^2.19).
For large n the standard algorithm will be cubic (based on the structure of the algorithm).
There are several ways to improve the scaling of the new algorithm.
Timing/Scaling Comparison
Further Improvements for Current Approach
Better data structures – reduce overhead by ~20%
Change the number of steps between cheap preconditioner updates – better scaling (better time)
Improve the global reordering algorithm – faster and/or better diagonal dominance
Better preconditioners (reduce iterations)
Change the reordering to incremental, local reordering – faster and better scaling
Compute the bilinear form by solving two systems with BiCG: a product of residual norms governs convergence (in exact arithmetic). In practice this property is lost relatively quickly; a clever algorithm by Strakos and Tichy preserves the property in floating point arithmetic (sketched below).
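A sketch of the BiCG-based estimate, following the Strakos-Tichy idea of accumulating $\sum_j \alpha_j\, (\tilde r_j^T r_j)$ as an approximation to $c^T A^{-1} b$ (with $r_0 = b$ and shadow residual $\tilde r_0 = c$). This is a simplified textbook BiCG without breakdown handling, not the authors' implementation.

```python
import numpy as np

def bilinear_form_bicg(A, b, c, maxiter=100, tol=1e-10):
    """Estimate c^T A^{-1} b from the BiCG coefficients (Strakos-Tichy style)."""
    r, rt = b.copy(), c.copy()          # residual and shadow residual
    p, pt = r.copy(), rt.copy()
    xi = 0.0                            # running estimate of c^T A^{-1} b
    rho = rt @ r
    for _ in range(maxiter):
        Ap, Atpt = A @ p, A.T @ pt
        alpha = rho / (pt @ Ap)
        xi += alpha * rho               # accumulate alpha_j * (rt_j^T r_j)
        r, rt = r - alpha * Ap, rt - alpha * Atpt
        rho_new = rt @ r
        if abs(rho_new) < tol:          # crude stop; a sketch, not production code
            break
        beta = rho_new / rho
        p, pt = r + beta * p, rt + beta * pt
        rho = rho_new
    return xi
```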
Conclusions and Future Work
Significant improvement in scaling for QMC.
Several further improvements are obvious (though implementation is not necessarily easy):
– Improve reordering – local and incremental
– Better, multi-level preconditioners: easier to update; use underlying structure – physics, interpolation
Test on several realistic materials.
For metallic systems we need something for (nearly) dense matrices: are fast matrix-vector products possible? What representation?
Good Reading
Ahuja, Clark, de Sturler, Ceperley, and Kim, Improved Scaling for Quantum Monte Carlo on Insulators, SIAM Journal on Scientific Computing 33(4), 1837-1859, 2011
www.math.vt.edu/people/sturler/class_homepages/5485_00000.html
Iterative Krylov Methods for Large Linear Systems, Henk van der Vorst, Cambridge University Press
Iterative Methods for Sparse Linear Systems, Yousef Saad, SIAM