
Chapter 8
Variational Monte Carlo and Markov Chains for Computational Physics

Sandro Sorella

Abstract In this chapter the basic concepts of Markov processes and Monte Carlo methods, as well as a detailed description of the variational Monte Carlo technique, will be presented. Particular emphasis will be devoted to the apparent mystery of Monte Carlo techniques: they allow us to sample a correlated many-electron wave function, defined in a Hilbert space that is exponentially large in the number N of electrons, in an affordable computational time, namely one scaling with a modest power of N. This is a direct consequence of two key properties that are not common to all Monte Carlo techniques: (i) the possibility to define a Markov process and appropriate stochastic variables with a finite correlation time and variance, respectively; (ii) both these quantities should increase at most polynomially with N. In principle, the above properties should be proven a priori, because their numerical validation can be very difficult in practice. It will be shown that this is the case for the simplest variational Monte Carlo technique, for quite generic wave functions and model Hamiltonians.

8.1 Introduction

In recent years we have witnessed a very rapid growth of computer power, especially in the realization of massively parallel computers that allow distributing several independent tasks of an algorithm over a basically unlimited number of processors. This opportunity will preserve for several decades the so-called Moore's law of a constant exponential increase of computer power, despite the fact that the performance of a single chip has almost reached saturation within silicon-based technology. This new scenario will allow us in the near future to perform computations that were essentially impossible before, but only by means of algorithms that involve scalable communications, as opposed to intrinsically limited sequential algorithms, which will probably survive this change only when the hope of quantum computation will

S. Sorella, SISSA, Via Bonomea n. 245, Trieste, Italy, e-mail: [email protected]

A. Avella and F. Mancini (eds.), Strongly Correlated Systems, Springer Series in Solid-State Sciences 176, DOI: 10.1007/978-3-642-35106-8_8, © Springer-Verlag Berlin Heidelberg 2013


become a bit more realistic than an academic exercise, proposed for fun by R. Feynman in 1982, and still entertaining several people, without any meaningful application in computational physics so far. In electronic structure calculations the Monte Carlo method fulfills perfectly this requirement of scalability, because it requires a minimum amount of memory per processor and, more importantly, the communications remain minimal, especially for high-accuracy calculations. Without entering into the details of the various Monte Carlo techniques, the basic reason that, in my opinion, nowadays limits the spread of quantum Monte Carlo in the various scientific disciplines, such as chemistry, biology and physics, is the difficulty of obtaining very accurate results, to the point where the statistical errors of the various target quantities become negligible. In quantum Monte Carlo statistical errors can be reduced simply by replicating the same calculation on several processors, the amount of reduction being inversely proportional to the square root of the number of processors. In this way we are not far from the date when we will obtain Monte Carlo calculations as accurate as non-stochastic algorithms, and no statistical elaboration of the data will be necessary.

Let me emphasize this concept further, by taking as an example the present reluctance of chemists to accept Monte Carlo among the most promising and accurate techniques to deal with electronic correlation. It is clear, by reading their papers, that they really like to have methods able to fill a table with several digits, even when the last digits do not mean much, being overshadowed by their approximations, such as basis sets and the type of method used (e.g. Hartree-Fock or coupled clusters, to mention some of them). On the other hand, I understand their point of view: with their techniques, anybody from different labs and using different computers can reproduce the same amount of irrelevant digits; though irrelevant, they are important for the sake of reliability and reproducibility of the results. In the Monte Carlo methods, instead, nowadays we often do not have enough statistical accuracy to make a clear statement about the "relevant digits" of some calculation; for instance, it is at present prohibitive to obtain chemical accuracy in the total energy of a fairly big molecule.

I believe that the present limitation represents only a transient period, because the progress in parallel computation goes just in the right direction for the Monte Carlo methods. While the other, sequential methods suffer, a reduction of the statistical error by a factor inversely proportional to the square root of the number of processors is guaranteed within the Monte Carlo methods. Thus the future in computer science represents a fantastic opportunity for statistical methods like quantum Monte Carlo.

8.2 Quantum Monte Carlo: The Variational Approach

8.2.1 Introduction: Importance of Correlated Wave Functions

The simplest and most famous Hamiltonian where the electron-electron correlations play an important role is the Hubbard model:


Fig. 8.1 Electron band for the U = 0 one-dimensional Hubbard model at half-filling. All states below the Fermi energy are occupied. A metallic behavior is implied by standard band theory, but the ground state of the model is a Mott insulator for U > 0 [1]

$$H = -t \sum_{\langle i,j\rangle,\sigma} \left(c^\dagger_{i\sigma} c_{j\sigma} + \mathrm{H.c.}\right) + U \sum_i n_{i\uparrow} n_{i\downarrow}, \qquad (8.1)$$

where c†_{iσ} (c_{iσ}) creates (destroys) an electron with spin σ on the site i, n_{iσ} = c†_{iσ} c_{iσ} is the electron number operator (for spin σ) at site i, and the symbol ⟨i, j⟩ indicates nearest-neighbor sites. Finally, the system is assumed to be finite, with L sites, and with periodic boundary conditions.

A particularly important case is when the number N of electrons is equal to the number of sites L, a condition that is usually called half filling (see Fig. 8.1). In this case, the non-interacting system has a metallic ground state: for U = 0, the electronic band is half filled and, therefore, it is possible to have low-energy charge excitations near the Fermi surface. In the opposite limit, for t = 0, the ground state consists of one electron (with spin up or down) on each site, the total energy being zero. Of course, since the total energy does not depend on the spin direction of each spin, the ground state is highly degenerate (its degeneracy is exactly equal to $2^N$). The charge excitations are gapped (the lowest one corresponds to creating an empty and a doubly occupied site, with an energy cost of U) and, therefore, the ground state is insulating. This insulating state, obtained in the limit of large values of U/t, is called a Mott insulator. In this case, the insulating behavior is due to the strong electron correlation, since, according to band theory, one should obtain a metal due to an odd number of electrons per unit cell. Because of the different behavior of the ground state in the two limiting cases, U = 0 and t = 0, a metal-insulator transition is expected to appear for intermediate values of U/t. Actually, in one dimension, the Hubbard model is exactly solvable by using the so-called Bethe Ansatz [1], and the ground state is found to be an insulator for all U/t > 0, but in the 2D honeycomb lattice [2, 3] or frustrated lattices [4, 5] one expects that the insulating state appears only for U/t above some positive critical value (U/t)_c.

Hereafter, we define an electron configuration |x⟩ as a state where all the electron positions and spins along the z-axis are defined. For instance, in the one-dimensional Hubbard model, an electron configuration is determined by the positions r_i of the N electrons and their spins σ_i = ±1/2:

$$|x\rangle = |\uparrow, \uparrow\downarrow, 0, 0, \downarrow, \cdots\rangle = c^\dagger_{r_1,\sigma_1} c^\dagger_{r_2,\sigma_2} c^\dagger_{r_3,\sigma_3} c^\dagger_{r_4,\sigma_4} \cdots c^\dagger_{r_N,\sigma_N} |0\rangle, \qquad (8.2)$$


where on each site we can have no particle (0), a singly occupied site (↑ or ↓) or a doubly occupied site (↑↓). Notice also that, due to the canonical anticommutation rules for fermions, the order is important in the above definition of a configuration x, because an arbitrary permutation of the operators can change the overall sign of |x⟩. We use the convention that the first fermion appears in the leftmost place and all the others in increasing order from left to right. The state |x⟩ we have written is nothing but a Slater determinant in position-spin space, where the number of doubly occupied sites $D = \sum_i n_{i\uparrow} n_{i\downarrow}$ is a well defined number.

The U = 0 exact ground state solution of the Hubbard model, |Ψ₀⟩, can be expanded in terms of the complete set of configurations |x⟩:

$$|\Psi_0\rangle = \prod_{k \le k_F,\, \sigma} c^\dagger_{k,\sigma} |0\rangle = \sum_x |x\rangle \langle x|\Psi_0\rangle. \qquad (8.3)$$

In order to employ the above expansion, and to extend the wave function to generic mean-field states, it is more convenient to write the orbitals of the Slater determinant in a formal way:

$$|\Psi_0\rangle = \prod_{j=1}^{N} \Big( \sum_{r,\sigma} \psi_j(r,\sigma)\, c^\dagger_{r,\sigma} \Big) |0\rangle \qquad (8.4)$$

as in the simplest case of Eq. (8.3) the real-space orbitals with momentum $k_j$ and spin $\sigma_j$ are given by $\psi_j(r,\sigma) = \frac{1}{\sqrt{L}}\, e^{i k_j r}\, \delta_{\sigma_j,\sigma}$ (or using sine and cosine functions for simpler real orbitals).

In the case of U/t ≫ 1, this very simple wave function is not a good variational state, and the reason is that the configurations with doubly occupied sites have too much weight. Indeed it is simple to verify that the average density of doubly occupied sites is 1/4 for the state |Ψ₀⟩. Therefore, by increasing U/t, all the configurations with one or more doubly occupied sites will be "projected out" from the U = 0 ansatz |Ψ₀⟩, simply because they have a very large (of order U/t) average energy. A simple correlated wave function that is able to describe, at least qualitatively, the physics of the Mott insulator is the so-called Gutzwiller wave function. In this wave function the uncorrelated weights ⟨x|Ψ₀⟩ are modified according to the number of doubly occupied sites in the configuration |x⟩:

$$|\Psi_g\rangle = e^{-g D} |\Psi_0\rangle = \sum_x |x\rangle\, e^{-g \langle x|D|x\rangle}\, \langle x|\Psi_0\rangle \qquad (8.5)$$

All these weights ⟨x|Ψ₀⟩ can be computed numerically by evaluating the Slater determinant |A| of a square matrix A, therefore with at most L³ floating point operations, where the N × N matrix is given by:

$$A_{k,j} = \psi_k(r_j, \sigma_j). \qquad (8.6)$$
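As an illustration of how these amplitudes enter a program, here is a minimal Python sketch; the plane-wave orbitals and the tiny example at the bottom are illustrative choices, not taken from the text:

```python
import numpy as np

def plane_wave(k, sigma, L):
    """Orbital psi(r, s) = exp(i k r)/sqrt(L) * delta_{s,sigma}."""
    return lambda r, s: (np.exp(1j * k * r) / np.sqrt(L)) * (s == sigma)

def gutzwiller_amplitude(positions, spins, orbitals, g):
    """<x|Psi_g> = exp(-g D(x)) det(A), with A[k, j] = psi_k(r_j, sigma_j)
    as in Eq. (8.6); D(x) is the number of doubly occupied sites."""
    A = np.array([[orb(r, s) for r, s in zip(positions, spins)]
                  for orb in orbitals])                 # N x N matrix
    up = {r for r, s in zip(positions, spins) if s == +1}
    down = {r for r, s in zip(positions, spins) if s == -1}
    D = len(up & down)                                  # doubly occupied sites
    return np.exp(-g * D) * np.linalg.det(A)

# Tiny example: two electrons of opposite spin on the same site of a chain
L = 4
orbitals = [plane_wave(0.0, +1, L), plane_wave(0.0, -1, L)]
amp = gutzwiller_amplitude([1, 1], [+1, -1], orbitals, g=0.5)
```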


For g = ∞, only those configurations |x⟩ without doubly occupied sites remain in the wave function, and the state is correctly an insulator with zero energy expectation value.

The importance of electronic correlation in the Gutzwiller wave function is clear: in order to satisfy the strong local constraint of having no doubly occupied sites, one has to expand the wave function in terms of a huge number of Slater determinants (in position-spin space), each satisfying the constraint. This is the opposite of what happens in a weakly correlated system, where at most a few Slater determinants (appropriately chosen) can describe qualitatively and also quantitatively the ground state.

8.2.2 Expectation Value of the Energy

Once a correlated wave function is defined, as in Eq. (8.5), the problem of computing the expectation value of the Hamiltonian (the variational energy) is very involved, because each configuration |x⟩ in the expansion of the wave function will contribute in a different way, due to the Gutzwiller weight exp(−g⟨x|D|x⟩). In order to solve this problem numerically, we can use a Monte Carlo sampling of the huge Hilbert space containing 4^L different configurations. To this purpose, using the completeness of the basis, $\mathbb{1} = \sum_x |x\rangle\langle x|$, we can write the expectation value of the energy, for the most general complex wave function Ψ_g, in the following way:

$$E_g = \frac{\langle \Psi_g|H|\Psi_g\rangle}{\langle \Psi_g|\Psi_g\rangle} = \frac{\sum_x \langle \Psi_g|x\rangle \langle x|H|\Psi_g\rangle}{\sum_x \langle \Psi_g|x\rangle \langle x|\Psi_g\rangle} = \frac{\sum_x e_L(x)\, |\psi_g(x)|^2}{\sum_x |\psi_g(x)|^2}, \qquad (8.7)$$

where ψ_g(x) = ⟨x|Ψ_g⟩ and e_L(x) is the so-called local energy:

$$e_L(x) = \frac{\langle x|H|\Psi_g\rangle}{\langle x|\Psi_g\rangle}. \qquad (8.8)$$

Though the Hilbert space of the Hubbard model is huge, the computation of the local energy is always polynomial in L, because the application of the Hamiltonian to a given configuration generates only a finite number of Slater determinants, $H|x\rangle = -\sum_j t_j |x_j\rangle + U D |x\rangle$, where the index j runs over all the nearest-neighbor bonds where we can hop either a spin-up or a spin-down electron ($t_j = t$ if the hopping is allowed by the Pauli principle, $t_j = 0$ otherwise), thus amounting to at most 2N_b different Slater determinants x_j, where N_b = dL is the number of bonds of the Hamiltonian, and d indicates the dimensionality of the lattice. Since the computation of all the resulting overlaps ⟨x_j|Ψ_g⟩ in Eq. (8.8) is just given by the computation of at most 2N_b Slater determinants of the type ⟨x_j|Ψ₀⟩, times the corresponding Gutzwiller factors exp(−g⟨x_j|D|x_j⟩), everything requires at most 2dL⁴ operations, if we neglect the time for computing the Gutzwiller factor, which amounts to a much smaller number of operations, L².
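This counting translates almost literally into code. Below is a minimal Python sketch of the local energy for an open Hubbard chain; open boundaries are an assumption made here so that every allowed nearest-neighbor hop carries fermionic sign +1 in the canonical site ordering, and `psi` stands for any amplitude routine, such as the Gutzwiller-determinant one sketched earlier:

```python
import numpy as np

def local_energy(occ_up, occ_dn, psi, t, U):
    """e_L(x) = <x|H|Psi_g>/<x|Psi_g>, Eq. (8.8), for an open Hubbard chain.
    occ_up, occ_dn: boolean arrays giving the occupation of each site."""
    L = len(occ_up)
    amp = psi(occ_up, occ_dn)
    e = U * np.sum(occ_up & occ_dn)          # diagonal term U * D(x)
    for spin, occ in (("up", occ_up), ("dn", occ_dn)):
        for i in range(L - 1):               # loop over nearest-neighbor bonds
            if occ[i] != occ[i + 1]:         # hop allowed by the Pauli principle
                occ2 = occ.copy()
                occ2[i], occ2[i + 1] = occ[i + 1], occ[i]
                amp2 = psi(occ2, occ_dn) if spin == "up" else psi(occ_up, occ2)
                e += -t * amp2 / amp         # off-diagonal term -t <x_j|Psi>/<x|Psi>
    return e
```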


More efficient algorithms are based on the Sherman-Morrison algebra, allowing us to reduce the amount of computation involved to L³. Essentially, within these faster algorithms one computes each of the 2dL ratios:

$$r_j = \frac{\langle x_j|\Psi_g\rangle}{\langle x|\Psi_g\rangle} \qquad (8.9)$$

in N² operations by using the inverse of the matrix A corresponding to the original configuration x, computed once for all in N³ operations. In our experience we have also seen that a further improvement in efficiency can be obtained by updating the N × 2L matrix:

$$W_{i,r,\sigma} = \sum_k A^{-1}_{i,k}\, \psi_k(r,\sigma) \qquad (8.10)$$

In this way the ratio r_j corresponding to hopping an electron with spin σ from r to r′ is simply given by r_j = W(i, r′, σ), where i labels the spin-σ electron of the configuration x such that r_i = r. More details are given in the Appendix, where all the algebra of fermion determinants, together with the Sherman-Morrison algebra, is derived also for non-experts in the field.
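In the notation of Eq. (8.6), where electron j labels column j of A, moving electron j amounts to replacing that column; a minimal sketch of the corresponding O(N²) Sherman-Morrison update (assuming A⁻¹ is already available) is:

```python
import numpy as np

def sherman_morrison_column(Ainv, u, j):
    """Electron j hops: column j of A is replaced by the new orbital values u.
    Returns (new inverse, ratio r = det(A')/det(A)) in O(N^2) operations."""
    r = Ainv[j, :] @ u                       # determinant ratio, as in Eq. (8.9)
    w = Ainv @ u                             # A^{-1} times the new column
    w[j] -= 1.0                              # now w = A^{-1} (u - A[:, j])
    Ainv_new = Ainv - np.outer(w, Ainv[j, :]) / r
    return Ainv_new, r
```

The ratio r is exactly the quantity needed for the Metropolis acceptance, so the full O(N³) inversion is performed only once (or occasionally, to cure the accumulation of round-off errors).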

After all these technicalities, it is clear that we can generally recast the calculation of E_g as the average of a computable (in polynomial time) random variable e_L(x) over a probability distribution p_x,¹ given by:

$$p_x = \frac{|\psi_g(x)|^2}{\sum_x |\psi_g(x)|^2}. \qquad (8.11)$$

As we will show in the following, it is possible to define a stochastic algorithm (Markov chain) which generates a sequence of configurations {|x_n⟩} distributed according to the desired probability p_x. Then, since the local energy can be easily computed for any given configuration, with at most L³ operations, we can evaluate the expectation value of the energy as the mean of the random variable e_L(x) over the visited configurations:

$$E_g = \frac{1}{M} \sum_{n=1}^{M} e_L(x_n). \qquad (8.12)$$
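In practice the samples e_L(x_n) are correlated along the chain, so the error bar of the average (8.12) is usually estimated by binning; a minimal sketch (the bin count is an arbitrary choice, and real-valued samples are assumed):

```python
import numpy as np

def binned_average(samples, nbins=32):
    """Mean and error bar of a Monte Carlo series such as e_L(x_n) in Eq. (8.12).
    Once the bins are longer than the correlation time tau, the bin averages
    are nearly independent and the standard error-of-the-mean applies."""
    samples = np.asarray(samples, dtype=float)
    m = len(samples) // nbins                # bin length
    bins = samples[: m * nbins].reshape(nbins, m).mean(axis=1)
    return bins.mean(), bins.std(ddof=1) / np.sqrt(nbins)
```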

This approach is very general and can be extended (with essentially the same definitions) to continuous systems (replace summations with multidimensional integrals), and to general Hermitian operators O (for instance the number of doubly occupied sites D), the corresponding local estimator, replacing e_L(x) in Eq. (8.12), being analogously defined:

$$O_L(x) = \frac{\langle x|O|\Psi_g\rangle}{\langle x|\Psi_g\rangle}. \qquad (8.13)$$

¹ It is worth noticing here that p_x is generally easy to calculate up to a normalization constant, which is instead very complicated, virtually impossible, to calculate. In the present case, for instance, ψ_g²(x) is assumed to be simple, but $\sum_x \psi_g^2(x)$ involves a sum over the huge Hilbert space of the system, and is therefore numerically inaccessible in most cases.

However, not all operators can be computed in polynomial time.² With Monte Carlo we can estimate efficiently, and with arbitrarily small statistical accuracy ε, only the ones for which M is bounded by a (big) constant C ∝ ε⁻² times a small power p of L, namely

$$M \le C(\varepsilon)\, L^p. \qquad (8.14)$$

8.2.3 Finite Variance Property

As anticipated in the introduction, one of the most important properties of an efficient statistical method is that expectation values are obtained by averaging random variables with finite and small variance. We remind the reader that the variance of a stochastic variable defined over the probability p_x is just the square of its standard deviation, namely the square of its statistical error.

At this point we just notice that the average square of the local energy, ⟨|e_L(x)|²⟩,³ corresponds to the exact quantum average of the Hamiltonian squared. Indeed:

$$\frac{\langle \Psi_g|H^2|\Psi_g\rangle}{\langle \Psi_g|\Psi_g\rangle} = \frac{\sum_x \langle \Psi_g|H|x\rangle \langle x|H|\Psi_g\rangle}{\sum_x \langle \Psi_g|x\rangle \langle x|\Psi_g\rangle} = \frac{\sum_x |e_L(x)|^2\, |\psi_g(x)|^2}{\sum_x |\psi_g(x)|^2} = \langle |e_L(x)|^2 \rangle, \qquad (8.15)$$

where here and henceforth we indicate with ⟨O(x)⟩ the average of the random variable O(x) over the probability p_x, namely ⟨O(x)⟩ = Σ_x p_x O(x), so that we can also write E_g = ⟨e_L(x)⟩. Thus, the variance of the random variable e_L(x), namely Var[e_L(x)] = ⟨|e_L(x)|²⟩ − |⟨e_L(x)⟩|², is exactly equal to the quantum variance of the Hamiltonian on the variational state |Ψ_g⟩:

$$\mathrm{Var}[e_L(x)] = \sum_x p_x\, \big|e_L(x) - \langle e_L(x)\rangle\big|^2 = \frac{\langle \Psi_g|(H - E_g)^2|\Psi_g\rangle}{\langle \Psi_g|\Psi_g\rangle} \qquad (8.16)$$

From the above equation, since each bond of a lattice Hamiltonian is bounded by a constant Λ_H, e.g. for the Hubbard model Λ_H = 2|t| + 2|U|, we can safely bound the variance of the local energy by:

$$\mathrm{Var}[e_L(x)] \le \frac{\langle \Psi_g|H^2|\Psi_g\rangle}{\langle \Psi_g|\Psi_g\rangle} \le \Lambda_H^2\, N_b^2 \sim L^2 \qquad (8.17)$$

² For instance, e^K is an example of an operator whose expectation value cannot be computed in polynomial time (here K is, for instance, the kinetic energy operator). Indeed the statistical fluctuations of this operator are exponentially large for a generic variational wave function Ψ_g, requiring an exponentially large M, namely M ≥ C L^p for any finite p and C, and L large enough.
³ Whenever the wave function Ψ_g is complex, the local energy is also complex, and all the forthcoming analysis applies, by considering that the local energy is a complex random variable, whose variance is the sum of the variances of its real and imaginary parts. Notice also that the mean value of e_L(x) is real, as it corresponds to the expectation value of the energy.

A closer inspection, using the cluster property, namely that the bond-bond Hamiltonian correlations (e.g. of the local kinetic energy at a given bond) should decay with the distance between bonds, implies also a stricter (but less rigorous) inequality:

$$\mathrm{Var}[e_L(x)] \sim L \qquad (8.18)$$

The fact that the variance of the local energy is not only finite but increases slowly with the system size L immediately implies that the standard deviation σ of E_g, obtained after averaging the local energy e_L over a Markov chain of length M, is given by:

$$\sigma \simeq K \sqrt{\tau L / M} \qquad (8.19)$$

where K is a constant independent of L and τ is the number of steps required by the Markov chain to generate a statistically independent sample. Here we arrive at the simple and important conclusion that, if τ is also bounded, the statistical method requires a polynomial length M of the Markov chain to obtain a given statistical accuracy. It is important to remark, for instance, that in order to get a given accuracy in the energy per site, σ = εL, the length of the Markov chain is given by:

$$M \le C(\varepsilon)/L, \qquad C(\varepsilon) = \frac{K^2 \tau}{\varepsilon^2} \qquad (8.20)$$

which implies p = −1 in Eq. (8.14), namely that the length of the Markov chain can be taken shorter and shorter as the size increases, provided the correlation time τ does not increase, which is typically the case for fully gapped systems. In this case the Monte Carlo method is especially convenient, even for fermionic systems, because, though the cost of the local energy scales as L³, the overall amount of computation ML³ needed to obtain a given statistical accuracy in the energy per site scales only as L².

In general, not all quantities and not all Markov chains have nice polynomial complexity. For instance, one can use Monte Carlo to compute the normalization of the probability distribution, Z = Σ_x Ψ_g(x)², by generating random configurations x with equal probability. However, it is easy to realize that in this case the random variable Ψ_g(x)² will have exponentially large variance, as:

$$\frac{\langle \Psi_g(x)^4\rangle - \langle \Psi_g(x)^2\rangle^2}{\langle \Psi_g(x)^2\rangle^2} \sim \exp(\sim L)$$


This follows from the fact that, for a many-body wave function, only log|Ψ_g(x)| has fluctuations bounded by L, and therefore its exponential has exponential fluctuations.⁴ In this case, therefore, for computing Z the Monte Carlo method is just as inefficient as the obvious method of summing the wave function squared over the full Hilbert space.

Once again we remark here that it is not a general rule in Monte Carlo that we can evaluate quantities defined in an exponentially large Hilbert space in a polynomial time. In variational Monte Carlo this is granted by the particularly smart choice to include in the weight p_x all the exponentially large fluctuations of the many-body wave function, a scheme that was first proposed by McMillan [6]. This is why, I believe, variational Monte Carlo is a very simple, but nevertheless very robust and efficient technique.

We conclude this section by emphasizing another important feature of the variational approach, the so-called zero variance property.

Suppose that the variational state |Ψ_g⟩ coincides with an exact eigenstate of H (not necessarily the ground state), namely H|Ψ_g⟩ = E_g|Ψ_g⟩. Then it follows that the local energy e_L(x) is constant:

$$e_L(x) = \frac{\langle x|H|\Psi_g\rangle}{\langle x|\Psi_g\rangle} = E_g\, \frac{\langle x|\Psi_g\rangle}{\langle x|\Psi_g\rangle} = E_g. \qquad (8.21)$$

Therefore, the random variable e_L(x) is independent of |x⟩, which immediately implies that its variance is zero, and its mean value E_g coincides with the exact eigenvalue. Clearly, the closer the variational state |Ψ_g⟩ is to an exact eigenstate, the smaller the variance of e_L(x) will be, namely the constant K will be suppressed in Eq. (8.19), and this is very important to reduce the statistical fluctuations and improve the numerical efficiency.

From the variational principle, the lower the energy, the better the variational state; but, without an exact solution or other inputs, it is hard to judge how accurate the variational approximation is. On the contrary, the variance is very useful, because the smallest possible variance, equal to zero, is known a priori, and in this case the variational state represents an exact eigenstate of the Hamiltonian.

As shown in Ref. [7], it is possible to define, by means of a stochastic implementation of the Lanczos algorithm, a sequence of variational states that systematically converges to the exact ground state, as can be clearly displayed in a plot of the energy as a function of its variance. As a matter of fact, the extrapolation to zero variance with a few Lanczos steps allows us, in most cases, to reach very accurate estimates of the energy, as shown in Fig. (8.2).

⁴ In order to convince ourselves of this property with a more careful analysis, we can just assume that log|Ψ_g(x)| is a Gaussian random variable with given mean value and variance ∼ L. Then the statement immediately follows after simple integration of Gaussian functions.


Fig. 8.2 Energy as a function of variance in the t-J model, showing that, for good variational wave functions, only a few Lanczos steps are required to obtain a very good estimate of the exact ground state energy

8.3 Markov Chains: Stochastic Walks in Configuration Space

In the first section we have emphasized one of the two main points of this chapter. We now want to tackle the more important problem that the correlation time τ in the expression (8.19) for the standard deviation should also be finite. This is a much more difficult task, but it is general enough to be presented in an introduction to the variational Monte Carlo method. As we will see in the following, in all the forthcoming sections, it will be rigorously proven that the correlation time τ is finite, but no rigorous statement bounding its length will be possible.

A Markov chain is a non-deterministic dynamics, in which a random variable, denoted by x_n, evolves as a function of a discrete iteration time n, according to a stochastic dynamics given by:

$$x_{n+1} = F(x_n, \xi_n). \qquad (8.22)$$

Here F is a known given function, independent of n, and the stochastic nature of the dynamics is due to the ξ_n, which are random variables (quite generally, ξ_n can be a vector whose coordinates are independent random variables), distributed according to a given probability density χ(ξ_n), independent of n. The random variables ξ_n at different iteration times are independent (they are independent realizations of the same experiment), so that, e.g., χ(ξ_n, ξ_{n+1}) = χ(ξ_n)χ(ξ_{n+1}). In the following, for simplicity of notation, we will indicate x_n and ξ_n as simple random variables, even though they can generally represent multidimensional random vectors. Furthermore, we will also consider that the random variables x_n assume a discrete set of values,⁵ as opposed to continuous ones, as in the latter case multidimensional integrals should replace the much simpler summations. After these substitutions, the generalization to continuous systems is rather obvious.

It is simple to simulate a Markov chain on a computer, by using the so-called pseudo-random number generator for obtaining the random variables ξ_n, and this is the reason why Markov chains are particularly important for Monte Carlo

⁵ For instance, {x_n} defines the discrete Hilbert space of the variational wave function in a finite lattice Hamiltonian.


calculations. Indeed, we will see that, by using Markov chains, we can easily define random variables x_n that, after the so-called "equilibration time", namely for n large enough, will be distributed according to any given probability density ρ(x_n) (in particular, for instance, the one required for the variational calculation in Eq. (8.11)).
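As a toy illustration of Eq. (8.22), the following Python sketch iterates a Markov chain for a random walk on a ring of L sites; the chain and its parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
L = 10

def F(x, xi):
    """One step x_{n+1} = F(x_n, xi_n): hop left or right on a ring of L sites."""
    return (x + (1 if xi < 0.5 else L - 1)) % L

x = 0                                        # initial configuration x_0
chain = []
for n in range(10000):
    x = F(x, rng.random())                   # xi_n is uniform in [0, 1)
    chain.append(x)
# after the equilibration time, the x_n are uniformly distributed on the ring
```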

The most important property of a Markov chain is that the random variable x_{n+1} depends only on the previous one, x_n, and on ξ_n, but not on quantities at time n − 1 or before. Though ξ_{n+1} and ξ_n are independent random variables, the random variables x_n and x_{n+1} are not independent; therefore we have to consider the generic joint probability distribution f_n(x_{n+1}, x_n), and decompose it into the product of the marginal probability

$$\rho_n(x_n) = \sum_{x_{n+1}} f_n(x_{n+1}, x_n) \qquad (8.23)$$

of the random variable x_n, namely the probability to have x_n regardless of the value of x_{n+1}, times the conditional probability K(x_{n+1}|x_n), defined basically by the following equation:

$$f_n(x_{n+1}, x_n) = K(x_{n+1}|x_n)\, \rho_n(x_n), \qquad (8.24)$$

implying immediately that K is normalized, i.e. $\sum_{x_{n+1}} K(x_{n+1}|x_n) = 1$, as is easily verified by summing both sides of the above equation and using the definition of marginal probability. Therefore K represents the probability to find the configuration x_{n+1}, once at the iteration n of the Markov chain the configuration is constrained to be x_n.

For readers not familiar with probability notation, one need only recall that K is nothing but a function of two variables, x_n and x_{n+1}. Unlike common functions of two arguments, the left argument differs from the right one, since the function K is normalized only with respect to the left argument; that is why one uses the pipe instead of the comma, in order to distinguish the two arguments of a conditional probability. Apart from these technical notations, the most important property of the conditional probability K is that it does not depend on n, as a consequence of the Markovian nature of Eq. (8.22), namely that the function F and the probability density χ of the random variable ξ_n do not depend on n.

We are now in the position of deriving the so-called Master equation associated to a Markov chain. Indeed, the marginal probability of the variable x_{n+1} is given by $\rho_{n+1}(x_{n+1}) = \sum_{x_n} f_n(x_{n+1}, x_n)$, so that, using Eq. (8.24), we get:

$$\rho_{n+1}(x_{n+1}) = \sum_{x_n} K(x_{n+1}|x_n)\, \rho_n(x_n). \qquad (8.25)$$

Thus the Master equation allows us to calculate the evolution of the marginal probability ρ_n as a function of n, since the conditional probability K(x′|x) is uniquely determined by the stochastic dynamics in Eq. (8.22). More precisely, though the actual value of the random variable x_n at the iteration n is not known deterministically, the probability distribution of the random variable x_n is instead known in all details, in principle, at each iteration n, once an initial condition is given, for instance at iteration n = 0, through a ρ₀(x₀). The solution for ρ_n(x_n) is then obtained iteratively by solving the Master equation, starting from the given initial condition up to the desired value of n.
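For a small discrete space the Master equation can be iterated exactly as a matrix-vector multiplication; a toy Python sketch (the kernel entries are invented, only the column normalization matters):

```python
import numpy as np

# K[x_new, x_old]: each column sums to one, as required for a conditional probability
K = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
assert np.allclose(K.sum(axis=0), 1.0)

rho = np.array([1.0, 0.0, 0.0])              # initial condition rho_0
for n in range(200):
    rho = K @ rho                            # one step of Eq. (8.25)
# rho has converged to the stationary distribution of Eq. (8.26)
```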

8.3.1 Detailed Balance and Effective Hamiltonian

At this point a quite natural question arises, concerning the existence of a limiting distribution reached by ρ_n(x) upon iterating the Master equation for sufficiently large n: Does ρ_n(x) converge to some limiting distribution ρ(x) as n gets large enough? The question is actually twofold: (i) Does there exist a stationary distribution ρ(x), i.e., a distribution which satisfies the Master equation (8.25) when plugged into both the right-hand and the left-hand side? (ii) Starting from a given arbitrary initial condition ρ₀(x), under what conditions is it guaranteed that ρ_n(x) will converge to ρ(x) as n increases? The first question (i) requires:

$$\rho(x_{n+1}) = \sum_{x_n} K(x_{n+1}|x_n)\, \rho(x_n). \qquad (8.26)$$

In order to satisfy this stationarity requirement, it is sufficient (but not necessary) to satisfy the so-called detailed balance condition:

$$K(x'|x)\, \rho(x) = K(x|x')\, \rho(x'). \qquad (8.27)$$

This relationship indicates that the number of processes undergoing a transition x → x′ has to be exactly compensated, to maintain a stable stationary condition, by the same amount of reverse processes x′ → x; the similarity with Einstein's relation for the problem of radiation absorption/emission in atoms is worth remembering.

It is very simple to show that the detailed balance condition allows a stationary solution of the Master equation. Indeed, if for some n we have that ρ_n(x_n) = ρ(x_n), then:

$$\rho_{n+1}(x_{n+1}) = \sum_{x_n} K(x_{n+1}|x_n)\, \rho(x_n) = \rho(x_{n+1}) \sum_{x_n} K(x_n|x_{n+1}) = \rho(x_{n+1}), \qquad (8.28)$$

where we used the detailed balance condition (8.27) for the variables x′ = x_{n+1} and x = x_n, and the normalization condition for the conditional probability, $\sum_{x_n} K(x_n|x_{n+1}) = 1$.

The answer to question (ii) is in general much more complicated. In this context it is an important simplification to consider that the conditional probability function K(x′|x), satisfying the detailed balance condition (8.27), can be written in terms of a symmetric function H_{x′,x} = H_{x,x′}, apart from a similarity transformation:

$$K(x'|x) = -H_{x',x}\, \psi_0(x')/\psi_0(x) \qquad (8.29)$$

where H_{x′,x} < 0 and ψ₀(x) = √ρ(x) is a positive function which is non-zero for all configurations x, and is normalized, $\sum_x \psi_0^2(x) = 1$. Though the restriction to satisfy the detailed balance condition is not general, it basically holds in many applications of the Monte Carlo technique, as we will see in the following.

The function H_{x′,x}, being symmetric, can be thought of as the matrix elements of an effective Hamiltonian with non-positive off-diagonal matrix elements. The ground state of this fictitious Hamiltonian will be bosonic (i.e. non-negative for each element x) by well known properties of quantum mechanics that we will briefly recall here. This "bosonic" property of the ground state will be very useful to prove the convergence properties of a Markov chain described by (8.29). Indeed, due to the normalization condition $\sum_{x'} K(x'|x) = 1$, the positive function ψ₀(x) is just the bosonic ground state of H with eigenvalue λ₀ = −1. We also indicate with λ_M the maximum eigenvalue and prove below that λ_M < 1.

It is simple to show that no eigenvalue λ_i of H can be larger than 1 in modulus, namely |λ_i| ≤ 1. Indeed, suppose there exists an eigenvector ψ_i(x) of H with maximum-modulus eigenvalue |λ_i| > 1; then:

$$|\lambda_i| = \Big| \sum_{x,x'} \psi_i(x)\, (-H_{x,x'})\, \psi_i(x') \Big| \le \sum_{x,x'} |\psi_i(x)|\, (-H_{x,x'})\, |\psi_i(x')| \qquad (8.30)$$

Thus |ψ_i(x)| may be considered a trial state with expectation value of the energy larger than or equal to |λ_i| in modulus. Since the matrix H is symmetric, by the well known properties of the minimum/maximum expectation value, this is possible only if the state ψ_Max(x) = |ψ_i(x)|, with all non-negative elements, is also an eigenstate with maximum eigenvalue |λ_i|. By assumption we know that ψ₀(x) is also an eigenstate with all positive elements, and therefore the assumption |λ_i| > 1 cannot be fulfilled, as the overlap between eigenvectors corresponding to different eigenvalues has to be zero, whereas $\sum_x \psi_0(x)\, \psi_{Max}(x) > 0$. Thus we conclude that |λ_i| ≤ 1 for all eigenvalues and, if the equality holds for some i, namely |λ_i| = 1, this is possible only if λ_i = −1, otherwise ψ_Max(x) would correspond to the maximum eigenvalue λ_M = 1, which is not possible for the same orthogonality condition between eigenvectors corresponding to the different eigenvalues λ₀ = −1 and λ_M = 1. Therefore ψ₀(x) is a bosonic ground state of H, that can be at most degenerate, and λ_M < 1, as we have anticipated.

The possibility to have a degenerate ground state of H would not lead to a unique stationary distribution because, as is obvious, any arbitrary linear combination within the degenerate ground state manifold defines a stationary equilibrium distribution ρ₀ = ψ₀². Therefore a further assumption is required to show that a unique equilibrium density distribution ρ(x) can be reached for large n.

The second important condition that should be satisfied, in order to generate the desired equilibrium probability ρ₀, is that the Markov chain should be ergodic, i.e., any configuration x′ can be reached, in a sufficiently large number of Markov iterations, starting from any initial configuration x. This implies that ψ₀(x) is the unique ground state of H, a theorem known as the Perron-Frobenius theorem. To prove this theorem, we notice first that if a ground state ψ₀(x) ≥ 0 is non-negative for any x, then it has to be strictly positive, ψ₀(x) > 0. In fact, suppose ψ₀(x) = 0 for some x = x₀. By using that ψ₀ is an eigenstate of H, we have:

$$\sum_{x (\ne x_0)} H_{x_0,x}\, \psi_0(x) = \lambda_0\, \psi_0(x_0) = 0$$

so that, in order to fulfill the previous condition, ψ₀(x) = 0 for all configurations connected to x₀ by H, since ψ₀(x) is non-negative and −H_{x₀,x} is strictly positive. By applying iteratively the previous condition to the new configurations connected with x₀, we find, by using ergodicity, that ψ₀(x) = 0 for all configurations, and this is not possible for a normalized eigenstate.

Now suppose that there exists another ground state ψ′₀(x) of H different from ψ₀(x). Then, by linearity and for any constant λ, also ψ₀(x) + λψ′₀(x), if non-vanishing, is a ground state of H, so that by the previous discussion also the state ψ(x) = |ψ₀(x) + λψ′₀(x)| is a non-negative ground state of H. However, the constant λ can be chosen such that ψ(x) = 0 for a particular configuration x = x₀, as we have shown that ψ₀(x) ≠ 0 for all x. This is not possible unless ψ(x) = 0 for all x, because otherwise ψ(x) ≥ 0 would be a non-negative eigenstate that vanishes for x = x₀, which contradicts what we have proved at the beginning for these particular ground state wave functions. Therefore, ψ(x) = 0 for all x implies that ψ₀(x) and ψ′₀(x) differ at most by an overall constant −λ, and we finally conclude that ψ₀(x) is the unique ground state of H.

We have finally derived that, if ergodicity and detailed balance hold, the ground state of the fictitious Hamiltonian H (8.29) is unique and equal to ψ₀(x), with eigenvalue λ₀ = −1, and λ_M < 1. This implies, as readily shown later on, that any initial ρ₀(x) will converge in the end towards the limiting stationary distribution ρ(x) = ψ₀(x)². In fact:

$$\rho_n(x') = \sum_x \psi_0(x') \left[-H\right]^n_{x',x}\, \rho_0(x)/\psi_0(x) \qquad (8.31)$$

where the nth power of the matrix H can be expanded in terms of its eigenvectors:

$$\left[-H\right]^n_{x',x} = \sum_i (-\lambda_i)^n\, \psi_i(x')\, \psi_i(x) \qquad (8.32)$$

Since ψ₀(x) is the unique eigenvector with eigenvalue λ₀ = −1, by replacing the expansion (8.32) in (8.31) we obtain:


$$\rho_n(x) = \psi_0(x) \sum_i \psi_i(x)\, (-\lambda_i)^n \Big[ \sum_{x'} \psi_i(x')\, \rho_0(x')/\psi_0(x') \Big] \qquad (8.33)$$

Thus for large n only the i = 0 term survives in the above summation, all the other ones decaying exponentially, as |λ_i| < 1 for i ≠ 0. It is then simple to realize that for large n

$$\rho_n(x) = \psi_0^2(x) \qquad (8.34)$$

as the initial distribution is normalized and:

$$\Big[ \sum_{x'} \psi_0(x')\, \rho_0(x')/\psi_0(x') \Big] = \sum_{x'} \rho_0(x') = 1$$
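These convergence statements can be checked numerically on a toy example: build the Metropolis-like kernel for an invented four-state target ρ, symmetrize it to −H as in Eq. (8.29), and extract the correlation time from the first excitation (a sketch; all numbers are illustrative):

```python
import numpy as np

rho = np.array([0.4, 0.3, 0.2, 0.1])        # target distribution (illustrative)
n = len(rho)
T = np.full((n, n), 1.0 / n)                # symmetric proposal T(x'|x)
K = np.zeros((n, n))
for x in range(n):
    for xp in range(n):
        if xp != x:
            K[xp, x] = T[xp, x] * min(1.0, rho[xp] / rho[x])
    K[x, x] = 1.0 - K[:, x].sum()           # rejected moves stay at x

psi0 = np.sqrt(rho)                         # psi_0(x) = sqrt(rho(x))
minusH = K * psi0[np.newaxis, :] / psi0[:, np.newaxis]   # -H_{x',x}, Eq. (8.29)
lam = np.linalg.eigvalsh(minusH)            # symmetric matrix: real eigenvalues
lam = lam[np.argsort(-np.abs(lam))]         # sort by decreasing modulus
# lam[0] is 1 (eigenvector psi0); the first excitation fixes the correlation time
tau = -1.0 / np.log(abs(lam[1]))
```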

Summarizing, if a Markov chain satisfies detailed balance and is ergodic, then the equilibrium distribution ρ(x) will always be reached, for large enough n, independently of the initial condition at n = 0. The convergence is always exponential, and indeed the dynamics has a well defined finite correlation time $\tau = \max_{i \ne 0} \left(-1/\ln|\lambda_i|\right)$, corresponding to the first excitation of the Hamiltonian matrix H_{x′,x}.

It is not possible to quantify rigorously the length of the correlation time but, since the effective Hamiltonian H associated to the Markov chain is usually very similar to the physical one,⁶ namely containing only nearest-neighbor hoppings, we can use physical insight to guess how the gap Δ ≃ 1/τ scales as a function of the number of electrons and/or lattice sites L:

• Standard band insulators or non-conventional ground states with a finite gap to all excitations. In this case the correlation time is essentially independent of L, and the Monte Carlo method is most effective.

• Gapless phases, due for instance to a broken symmetry phase of matter (e.g. superconducting, magnetic, etc.), simple metals, or gapless spin liquids. In such a case we can estimate how the gap scales with the system size by assuming that the gapless excitations have a well defined dynamical critical exponent z. In a model with translation symmetry and physical spatial dimension d, this is equivalent to knowing how the excitation spectrum ε_k behaves at small momenta, namely $\varepsilon_k \simeq |k|^z \simeq L^{-z/d}$ (as the minimum non-zero value of |k| is 2π/L^{1/d}), so that for instance z = 1 (z = 2) for the quantum antiferromagnet (ferromagnet) and the superconductor in a short-range model (as the plasmon is gapless). The lowest gap Δ to the physical excitations has to contain an elementary one with non-zero momentum and therefore scales to zero with a law depending only on the dynamical critical exponent z, implying that:

$$\tau \simeq L^{z/d} \qquad (8.35)$$

⁶ Since we have shown that the effective Hamiltonian has eigenvalues bounded by 1, whereas the physical Hamiltonian has extensive eigenvalues ∼ L, in the following discussion we assume that the gap of the effective Hamiltonian is also scaled by 1/L; this may be simply compensated by the choice to employ of the order of L Markov steps before computing a new value of the local energy, which always requires L³ operations.

We arrive therefore quite generally at the conclusion that, within the variational Monte Carlo technique, the computer time (CPU) required for evaluating the energy per site, or any intensive thermodynamic quantity, with a given accuracy ε, scales with L as:

$$\mathrm{CPU} \simeq L^{2+z/d} \qquad (8.36)$$

where we have inserted Eq. (8.35) in Eq. (8.20), and used that the cost for computing the local energy is L³ for fermionic systems (see Appendix). In practice the method has always polynomial complexity, provided the dynamical critical exponent is finite.

• From the above argument, an exponentially hard case is possible only in exceptional situations, when the Hamiltonian H has an exponentially small gap Δ ∼ exp(−L), namely a divergent dynamical critical exponent z. To my knowledge, among translationally invariant models without disorder, only the spectrum of the Heisenberg model on the Kagomé lattice [8] seems to display an exponentially large number of singlet excitations below the triplet gap, implying a singlet gap decreasing at least exponentially with the number of spins L. However, this scenario has recently been ruled out by DMRG calculations on much larger cluster sizes [9]. On the other hand, in disordered systems, a very simple Hamiltonian exists that has a full gap in all phases except at the critical point:

$$H = \sum_i J_i\, S^z_i S^z_{i+1} + h_i\, S^x_i \qquad (8.37)$$

This model is defined with appropriate random couplings J_i and h_i [10–12], and close to the critical point the typical gap scales as:

$$\Delta \simeq \xi^{-z}, \qquad (8.38)$$

where ξ is the correlation length, and both z and ξ diverge at the critical point. Exactly at the critical point, quantum Monte Carlo therefore becomes an exponentially hard method. However, it is clear from the above discussion and the present knowledge of quantum many-body physics that an infinite dynamical critical exponent z cannot define a stable phase, apart from particular critical points, and this represents in most cases an irrelevant limitation. I believe in fact that the main purpose of the numerical simulation of quantum many-body systems is to characterize non-trivial phases of matter, that are genuinely driven by strong electron correlation and are stable against small perturbations (i.e. far from a critical point). The problem of phase transitions and critical behavior seems too difficult for numerical methods, as it is much better understood by conventional field theory and renormalization group approaches.


As a result of the above considerations and the rigorous statements proved so far, we can fairly introduce the Metropolis algorithm [13], as a general and powerful Monte Carlo tool, used in particular for sampling in a polynomial time an exponentially large Hilbert space within the variational quantum Monte Carlo method.

8.4 The Metropolis Algorithm

Suppose we want to generate a Markov chain such that, for large n, the configurations x_n are distributed according to a given probability distribution ρ(x). We want to construct, accordingly, a conditional probability K(x′|x) satisfying the detailed balance condition Eq. (8.27) with the desired ρ(x). How do we do that, in practice? To this end, Metropolis and collaborators introduced a very simple scheme [14]. They started by considering a transition probability T(x′|x), defining the probability of going to x′ given x, which can be chosen with great freedom, as long as ergodicity is ensured, without any requirement of detailed balance. In order to define a Markov chain satisfying the detailed balance condition, the new configuration x′ generated by the chosen transition probability T(x′|x) is then accepted only with a probability:

$$A(x'|x) = \mathrm{Min}\left\{ 1,\, \frac{\rho(x')\, T(x|x')}{\rho(x)\, T(x'|x)} \right\}, \qquad (8.39)$$

so that the resulting conditional probability K(x′|x) is given by:

$$K(x'|x) = A(x'|x)\, T(x'|x) \quad \text{for } x' \ne x. \qquad (8.40)$$

The value of K(x′|x) for x′ = x is determined by the normalization condition $\sum_{x'} K(x'|x) = 1$. The proof that detailed balance is satisfied by the K(x′|x) so constructed is quite elementary, and is left to the reader. It is also simple to show that the conditional probability K(x′|x) defined above can be cast in the form (8.29), for which, in the previous section, we have proved that the equilibrium distribution is always reached after many iterations. In particular:

$$\psi_0(x) = \sqrt{\rho(x)} \qquad (8.41)$$

$$H_{x',x} = -A(x'|x)\, T(x'|x)\, \psi_0(x)/\psi_0(x') \qquad (8.42)$$

In fact, from the definition of the acceptance probability (8.39), it is simple to verify that H in (8.42) is symmetric, and the results of the previous section obviously hold also in this case.

Summarizing, if x_n is the configuration at time n, the Markov chain iteration is defined in two steps (a minimal code sketch is given after the list of simplifications below):

1. A move is proposed by generating a configuration x′ according to the transition probability T(x′|x_n);
2. The move is accepted, and the new configuration x_{n+1} is taken to be equal to x′, if a random number ξ_n (uniformly distributed in the interval (0, 1]) is such that ξ_n ≤ A(x′|x_n); otherwise the move is rejected and one keeps x_{n+1} = x_n.

The important simplifications introduced by the Metropolis algorithm are:

1. It is enough to know the desired probability distribution ρ(x) up to a normalization constant, because only the ratio ρ(x′)/ρ(x) is needed in calculating the acceptance probability A(x′|x) in Eq. (8.39). This allows us to avoid a useless, and often computationally prohibitive, normalization (e.g., in the variational approach, the normalization factor $\sum_x \psi_g^2(x)$ appearing in Eq. (8.11) need not be calculated).

2. The transition probability T(x′|x) can be chosen to be very simple. For instance, in a one-dimensional example on the continuum, a new coordinate x′ can be taken with the rule x′ = x + aξ, where ξ is a random number uniformly distributed in (−1, 1), yielding T(x′|x) = 1/(2a) for x − a ≤ x′ ≤ x + a. In this case, we observe that T(x′|x) = T(x|x′), a condition which is often realized in practice. Whenever the transition probability is symmetric, i.e., T(x′|x) = T(x|x′), the factors in the definition of the acceptance probability A(x′|x), Eq. (8.39), further simplify, so that

$$A(x'|x) = \mathrm{Min}\left\{ 1,\, \frac{\rho(x')}{\rho(x)} \right\}.$$

In the particular example of the Hubbard model, and typically any lattice model, in order to define T(x′|x) = T(x|x′) one can randomly generate a bond and the spin σ of the electron to be moved. Since ratios of determinants can be computed very efficiently, as discussed in the Appendix, it is not very important that the acceptance rate is quite small in this case, as most of the computation is spent only when the move is accepted.

3. As in the example shown in the previous point, the transition probability T(x′|x) allows us to impose that the new configuration x′ is very close to x, at least for a small enough. In this limit, all the moves are always accepted, since ρ(x′)/ρ(x) ≈ 1, and the rejection mechanism is ineffective. A good rule of thumb to speed up the correlation time τ, i.e., the number of iterations needed to reach the equilibrium distribution, is to tune the transition probability T, for instance by increasing a in the above example, in order to have an average acceptance rate ⟨A⟩ = 0.5, which corresponds to accepting, on average, only half of the total proposed moves. This criterion is usually the optimal one for computational purposes, but it is not a general rule.
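Putting the two steps together, a generic Metropolis driver for sampling p_x ∝ |ψ_g(x)|², Eq. (8.11), with a symmetric proposal may look as follows (a minimal sketch; `psi` and `propose` are user-supplied and problem-dependent):

```python
import numpy as np

def metropolis_vmc(x0, psi, propose, nsteps, rng=np.random.default_rng()):
    """Sample p_x = |psi(x)|^2 / sum_x |psi(x)|^2 with a symmetric T(x'|x),
    so that A(x'|x) = Min{1, |psi(x')/psi(x)|^2}; the normalization of p_x
    is never needed, only wave-function ratios."""
    x, amp = x0, psi(x0)
    samples = []
    for n in range(nsteps):
        xp = propose(x, rng)                 # step 1: propose x' from T(x'|x_n)
        amp_p = psi(xp)
        if rng.random() <= abs(amp_p / amp) ** 2:   # step 2: accept/reject
            x, amp = xp, amp_p
        samples.append(x)                    # a rejected move repeats x_n
    return samples
```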


8.5 Stochastic Minimization of the Energy

In this section we introduce the stochastic reconfiguration method for the minimization of the total energy, a method that can be easily applied to lattice models [7] and has recently been extended also to continuous ones [15].

Let Ψ_g(α⁰) be the wave function depending on an initial set of p variational parameters {α⁰_k}_{k=1,…,p}. Consider now a small variation of the parameters, α_k = α⁰_k + δα_k. The corresponding wave function Ψ_g(α) is equal, within the validity of the linear expansion, to the following one:

$$\Psi_g'(\alpha) = \Psi_g(\alpha^0) + \sum_{k=1}^{p} \delta\alpha_k\, \frac{\partial}{\partial \alpha_k} \Psi_g(\alpha^0) \qquad (8.43)$$

Therefore, by introducing local operators defined on each configuration x as the logarithmic derivatives with respect to the variational parameters⁷:

$$O_k(x) = \frac{\partial}{\partial \alpha_k} \ln \Psi_g(x) \qquad (8.44)$$

and, for convenience, the identity operator O₀ = 1, we can write Ψ′_g in a more compact form:

$$|\Psi_g'(\alpha)\rangle = \sum_{k=0}^{p} \delta\alpha_k\, O_k\, |\Psi_g\rangle, \qquad (8.45)$$

where |Ψ_g⟩ = |Ψ_g(α⁰)⟩ and δα₀ = 1. However, as a result of the iterative minimization scheme we are going to present, δα₀ ≠ 1, and in that case the variation of the parameters will be obviously scaled,

$$\delta\alpha_k \to \frac{\delta\alpha_k}{\delta\alpha_0}, \qquad (8.46)$$

and Ψ′_g will be proportional to Ψ_g(α) for small δα_k/δα₀.

Our purpose is to set up an iterative scheme to reach the minimum possible energy for the parameters α, exploiting the linear approximation for Ψ_g(α), which will become more and more accurate close to convergence, when the variation of the parameters becomes smaller and smaller. At a given iteration, we project the wave function change produced by the operator ΛI − H onto the linear space defined above, by means of a formal projection operator P, such that P² = P, that satisfies

$$\langle \Psi_g| O_k\, P = \langle \Psi_g| O_k \qquad (8.47)$$

⁷ For the wave function defined in Eq. (8.5), to the Gutzwiller variational parameter g is associated the operator O_g(x) = −⟨x|D|x⟩, namely minus the number of doubly occupied sites in the configuration |x⟩.


for any k = 0, ⋯, p. In this way, the state

$$|\Psi_g'\rangle = P(\Lambda - H)|\Psi_g\rangle, \qquad (8.48)$$

where Λ is a suitably large shift, has an energy lower than Ψ_g [7], provided the constant shift Λ is large enough, as we assume. In a continuous system, even if its energy is unbounded from above, Λ can be finite thanks to the projection P, because the Hamiltonian diagonalized in the basis (8.47) is bounded from above, as in a lattice system. In order to determine the coefficients {δα_k}_{k=1,…,p} corresponding to Ψ′_g defined in Eq. (8.48), we can overlap both sides of Eq. (8.48) with the set of states ⟨Ψ_g|O_k for k = 0, 1, ⋯, p, so that, recalling Eq. (8.47), we are led to solve the following conditions:

$$\langle \Psi_g| O_k (\Lambda - H) |\Psi_g\rangle = \langle \Psi_g| O_k |\Psi_g'\rangle \quad \text{for } k = 0, \ldots, p \qquad (8.49)$$

Hence, by substituting in the above equation the expansion of Ψ′_g in the form given by Eq. (8.45), we arrive at a simple linear system:

$$\sum_l \delta\alpha_l\, s_{l,k} = f_k, \qquad (8.50)$$

where $s_{l,k} = \frac{\langle \Psi_g| O_l O_k |\Psi_g\rangle}{\langle \Psi_g|\Psi_g\rangle}$ is the covariance matrix and $f_k = \frac{\langle \Psi_g| O_k (\Lambda - H) |\Psi_g\rangle}{\langle \Psi_g|\Psi_g\rangle}$ is the known term; both s_{l,k} and f_k are computed stochastically by the same Monte Carlo sampling used for the calculation of the energy, namely by generating configurations according to the probability p_x defined in Eq. (8.11), and by averaging the estimators O_k(x)O_{k′}(x) and O_k(x)(Λ − e_L(x)) for the matrix elements s_{k,k′} and the force components f_k, respectively. Better samplings are also possible in this case, as the estimators of the forces f_k and of the matrix s_{k,k′} may acquire infinite variance and may be problematic in some cases.⁸ This problem can be solved by a simple reweighting technique, as discussed in Ref. [16]. With this formulation, with or without the mentioned reweighting technique, there is no difficulty in optimizing the Jastrow and the Slater parts of the wave function at the same time.

After the system (8.50) has been solved, we update the variational parameters

$$\alpha_k = \alpha_k^{(0)} + \frac{\delta\alpha_k}{\delta\alpha_0} \quad \text{for } k = 1, \ldots, p \qquad (8.51)$$

and we obtain a new trial wave function Ψg(α). By repeating this iteration schemeseveral times, one approaches the convergence when δαk

δα0→ 0 for k �= 0, and in

this limit the conditions (8.49) implies the Euler equations of the minimum energy.Obviously, the solution of the linear system (8.50) is affected by statistical errors,yielding statistical fluctuations of the final variational parameters αk even when

8 In Fig. 8.3, one can see a jump around the iteration 50, that is due to an infinite variance problemoccurring in the variational parameter of the mean-field Hamiltonian.

Page 21: [Springer Series in Solid-State Sciences] Strongly Correlated Systems Volume 176 || Variational Monte Carlo and Markov Chains for Computational Physics

8 Variational Monte Carlo 227

Fig. 8.3 Convergence of the BCS pairing function as a function of the iteration steps for the 2D Hubbard model at U/t = 8 for the 6×6 cluster at half-filling (Δt = 0.05 in Eq. 8.54). A finite value of the BCS pairing function is obtained only when the Gutzwiller correlation term g (upper panel) is optimized together with the variational parameter Δ^{x²−y²}_{BCS} appearing in the mean-field Hamiltonian given by: H_{BCS} = −∑_{k,σ} (cos k_x + cos k_y) c†_{k,σ} c_{k,σ} + Δ^{x²−y²}_{BCS} ∑_k (cos k_x − cos k_y) c†_{k,↑} c†_{−k,↓} + h.c.

convergence has been reached, namely when the {α_k}_{k=1,...,p} fluctuate without drift around an average value. As illustrated in Fig. 8.3, we perform several iterations in that regime; in this way, the variational parameters can be determined more accurately by averaging them over all these iterations and by evaluating also the corresponding statistical error bars. In this way one can perform an accurate finite size scaling of the variational parameters (see Fig. 8.4) to establish, for instance, whether the d-wave pairing remains at finite doping and in the thermodynamic limit, implying a d-wave superconducting ground state, at least within the simplest correlated wave function. In these pictures we also see the main advantage of the variational approach: when the correlated part of the wave function (the Gutzwiller term) is optimized together with the mean-field like one (the gap function in the BCS Hamiltonian), a qualitatively new effect can be obtained, namely a tendency to form d-wave electron pairs in a model with only repulsive interactions.9

It is worth noting that the solution of the linear system (8.50) depends on Λ only through the variable δα_0 = f_0 − ∑_{k>0} δα_k s_{0,k}. Therefore the constant Λ indirectly controls the rate of change of the parameters at each step, i.e. the speed of convergence of the algorithm and its stability at equilibrium: a too small value will produce uncontrolled fluctuations of the variational parameters, while a too large one will lead to convergence only after an exceedingly large number of iterations. The choice of Λ can be controlled by evaluating the change of the normalized wave function at each step as:

ΔΨ = | Ψ′g/|Ψ′g| − Ψg/|Ψg| |² = ∑_{k≠0, k′≠0} δα_k δα_{k′} s̄_{k,k′} (8.52)

9 For a technical description of how to compute the operator O_k(x) in Eq. (8.44), namely when the variational parameter of the determinantal part (Δ^{x²−y²}_{BCS}) is defined in terms of a mean-field Hamiltonian, see for instance Ref. [17].



Fig. 8.4 Finite size scaling of the d-wave BCS parameter for the 2D square lattice Hubbard model at U/t = 4 and U/t = 8 and 10 % doping. In this case the BCS Hamiltonian defined in the previous figure contains also the chemical potential, which is optimized together with Δ^{x²−y²}_{BCS} and the Gutzwiller parameter g. Notice that the BCS pairing appears to go to zero or to an extremely small value at weak coupling

where the reduced overlap matrix, for k, k′ ≠ 0, is given by:

s̄_{k,k′} = s_{k,k′} − s_{0,k} s_{0,k′} = 〈(O_k(x) − 〈O_k〉)(O_{k′}(x) − 〈O_{k′}〉)〉 (8.53)

Actually it is possible to formulate the minimization procedure as a constrained minimization: the minimum of the energy is searched with the constraint that ΔΨ is fixed to a small value. By introducing a simple Lagrange multiplier μ and minimizing E_g + μΔΨ, it is simple to obtain a solution independent of Λ:

δα_k = Δt ∑_{k′} s̄^{−1}_{k,k′} f_{k′} (8.54)

where f_k = −〈O_k(x)(e_L(x) − E_g)〉 and Δt = 1/μ is small enough. Notice also that if μ = δα_0, Eq. (8.50) is consistent with the above one, meaning that, as anticipated, Λ implicitly defines the amplitude of the step ΔΨ at each iteration.

For a large number of parameters the positive definite matrix s̄ can become very ill conditioned, namely the largest eigenvalue divided by the smallest one is a big number. Thus an important tip to improve the stability and the efficiency of the calculation is the regularization of the inversion in Eq. (8.54) [18]. This may be achieved by scaling the strictly positive diagonal elements of the matrix s̄ by a factor slightly larger than one:

s̄_{k,k} → (1 + ε) s̄_{k,k} (8.55)

with ε > 0 (e.g. ε = 10^{−3}).

The ill-conditioned nature of the matrix s̄ is intrinsic and essentially unavoidable for an accurate parametrization of a correlated wave function, where the variational parameters define a highly non-linear space. When this condition number is very



Fig. 8.5 Convergence of the nearest neighbor spin Jastrow factor in the 1D 150-site Heisenberg ring. During the optimization all 74 independent parameters of the spin Jastrow factor are optimized. Each iteration is obtained by averaging all the necessary quantities, such as the matrix s̄ and the vector f_k in Eq. (8.54), over M = 2500 ≫ 74 samples, each generated by applying 300 Metropolis single particle attempts. Both in the stochastic reconfiguration method (SR) and in the steepest descent the amplitude of the step Δt is optimized and chosen about a factor two smaller than the maximum possible for a stable and convergent optimization of the energy (as is well known, if Δt is too large the methods are not stable even without statistical errors)

large, the standard steepest descent method10 is very inefficient, as shown in Fig. 8.5, simply because some directions in this non-linear space can be changed only extremely slowly, with a speed inversely proportional to the condition number. As also shown in the same picture, the much better speed of convergence of the iterative method described by Eq. (8.54) is not much affected by the proposed regularization of the inverse (Eq. 8.55). On the other hand this regularization is highly recommended, because the matrix s̄ is known only statistically, and already a small value of ε prevents unnecessarily large fluctuations of the variational parameters. We remark here that ε does not introduce any bias in the calculation of the optimal wave function, because in the absence of statistical errors a stationary solution is reached only when δα_k = 0 in Eq. (8.54), implying that f_k = 0 (s̄ is positive definite), namely the Euler conditions of minimum energy. For a finite statistical error, a non-zero value of ε greatly reduces the error bars of the variational parameters, enhancing the efficiency of the optimization (see Fig. 8.6). Indeed, if we define the efficiency as the inverse of the computer time necessary to obtain the chosen variational parameter within a given error bar, this is enhanced by about a factor 1000 compared to the ε = 0 case. In other words, with ε = 10^{−3} we need one thousand times less computer time to obtain a wave function optimized with the same quality. Notice also that, in this case, the described method with ε = 0.001 is about 20 times more efficient than recent, more complicated minimization techniques (not shown), which also require the derivative of the local energy with respect to all the variational parameters [19, 20].
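A minimal sketch of one such optimization step, combining Eqs. (8.53)-(8.55), is given below, assuming the samples of O_k(x) (for k = 1, . . . , p) and e_L(x) are already available from the Markov chain; the function name and array layout are of course only illustrative:

    import numpy as np

    def sr_step(O_samples, eL_samples, dt, eps=1.0e-3):
        # One stochastic reconfiguration step, Eqs. (8.53)-(8.55).
        M, p = O_samples.shape
        dO = O_samples - O_samples.mean(axis=0)            # O_k(x) - <O_k>
        s_bar = dO.T @ dO / M                              # reduced matrix, Eq. (8.53)
        f = -dO.T @ (eL_samples - eL_samples.mean()) / M   # forces of Eq. (8.54)
        s_bar[np.diag_indices(p)] *= 1.0 + eps             # regularization, Eq. (8.55)
        return dt * np.linalg.solve(s_bar, f)              # delta alpha, Eq. (8.54)

Setting eps=0.0 in this sketch recovers the unregularized update, while dropping s_bar altogether (returning dt * f) gives the steepest descent iteration mentioned in the footnote below.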

We conclude this section by remarking that only after taking into account the non-linear dependency of the variational parameters by means of the matrix s̄ is it possible

10 This is obtained by applying the iteration given by Eq. (8.54) without using the matrix s̄, namely δα_k = Δt f_k.



Fig. 8.6 Same as in Fig. 8.5 for several iterations, showing the different performances of the two methods

to optimize a large number of parameters (before the ’90s correlated wave functions contained only a few parameters), provided the number of samples M used to evaluate this matrix is much larger than its dimension p, given by the total number of variational parameters. The reason why M ≥ p is easily understood: if a small number of samples M < p is used, the resulting matrix s̄ is rank-deficient, with a number p − M of exactly zero eigenvalues, despite the fact that the matrix s̄ should be strictly positive definite if all the parameters are independent. This shows that the rank-deficiency is spurious and simply produced by the too small statistics for M < p, as the sketch after this paragraph illustrates. With the restriction M ≳ p, we notice that the variational Monte Carlo method becomes substantially slower when the number of parameters is large and grows with some power of the system size. So far it is possible to optimize a number of parameters p ∝ L, yielding an algorithm scaling with the fourth power of the system size (CPU ∝ L^4), as opposed to the much cheaper computational cost (Eq. 8.36) of a parameter-free variational wave function (like the Laughlin one for the fractional quantum Hall effect).
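The spurious rank-deficiency for M < p can be checked directly; in this hypothetical numerical experiment the fluctuations O_k(x) − 〈O_k〉 are replaced by random numbers, which is enough to expose the purely algebraic origin of the zero eigenvalues:

    import numpy as np

    rng = np.random.default_rng(0)
    M, p = 50, 100                       # fewer samples than parameters
    dO = rng.normal(size=(M, p))         # stand-in for O_k(x) - <O_k>
    s_bar = dO.T @ dO / M                # estimated matrix of Eq. (8.53)
    print(np.linalg.matrix_rank(s_bar))  # at most M = 50: p - M eigenvalues vanish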

8.6 Conclusion

In this chapter, we have described the simplest quantum Monte Carlo technique in a rigorous way, emphasizing how it is possible to sample an exponentially large Hilbert space with an amount of computer time that remains feasible even for a large number of electrons. Several improvements along this line are possible and should be mentioned:

• Systematically convergent improvements of the variational guess by the Lanczos algorithm. This technique was introduced by Heeb and Rice [21] and was later generalized in Ref. [7]. It allows us to apply a small number q of Lanczos iterations to a generic variational wave function of the standard type, with a computational cost scaling as M × L^{q+2}, which is considerably larger than the standard cost M × L^3 only for q > 1. Obviously the method is not feasible for large q, but very accurate calculations can be obtained, especially when the starting q = 0 wave function is particularly good (see again Fig. 8.2). In particular, the variance extrapolated energy obtained with a few Lanczos iterations represents the simplest and most effective



method to estimate the exact ground state energy of a model Hamiltonian on a lattice. Generalization of this powerful technique to continuous models is not possible because, due to the Coulomb singularity, the standard Lanczos algorithm does not work in the present formulation.

• Fixed node diffusion Monte Carlo, and its recent extension to lattice models [22–24]. Basically, in order to avoid the exponential growth of the noise to signal ratio due to the so-called “fermion sign problem”, an approximate projection to the ground state is employed, by requiring that only the amplitudes of the wave function, and not its phases (or signs), change during the projection Ψg → e^{−τH}Ψg, which clearly for τ → ∞ filters out the exact ground state component of Ψg. This method is more accurate than the variational Monte Carlo described here, in the sense that the corresponding energy is below the variational estimate, while still representing a rigorous upper bound. It is a bit slower, from a factor two on lattice models to a factor 10 in continuous systems, with a clear improvement of the variational energy estimate, closer to the exact value (when known) by at least a factor two.

• Release nodes: by directly sampling the sign, one can deal with the exact projection Ψg → e^{−τH}Ψg with no approximation. In this way one can achieve, for small cluster sizes or a small number of electrons, exact ground state answers, at the price of an exponentially large computer time [25]. It is a pity that this method has not been applied to lattice models yet, considering that the uniform electron gas problem was essentially solved exactly with this technique, at least for not too large values of r_s.

For fermions, there are no Monte Carlo methods that allow us to obtain an exact solution of the many-body wave function in a polynomial computational time. Nevertheless, in lattice model calculations, the variance extrapolation based on a few Lanczos iterations seems the most practical and efficient technique to estimate the exact energy and various correlation functions.

We conclude this chapter by showing the extraordinarily good performances that can be obtained by fixed node diffusion Monte Carlo on modern petaflop supercomputers.

Fig. 8.7 Speed performance of the standard lattice diffusion Monte Carlo as a function of the number n_c of cores on the JuGene supercomputer in Jülich, Germany. The theoretical scaling neglecting communication time is ∝ n_c (continuous line). Including the communication time, a slightly slower scaling n_c/ln n_c is theoretically expected, which seems to be consistent with our test



In Monte Carlo the amount of communication is minimal when the paradigm of replicating independent calculations on different processors is adopted. With diffusion Monte Carlo some communication is necessary, but the scaling is extraordinarily good, as shown in Fig. 8.7. Since the major limitation of Monte Carlo is so far due, in my opinion, to the too large statistical errors obtained on a conventional computer, the present fantastic improvements in the number and performance of processors represent a great opportunity, especially for the young generations, to establish quantum Monte Carlo as the method for strongly correlated systems and for electronic structure calculations, with robust predictive power and a fully ab initio many-body wave function based approach.

Appendix: Some Efficient Algebra for Determinants

The basic algebra for implementing all calculations with fermions is given by the well known rule for computing the overlap of two arbitrary N-particle Slater determinants of the form given in Eq. (8.4):

〈Ψ|Φ〉 = Det S = |S| (8.56)

where S is the overlap matrix between the orbitals ψ_i(r, σ) and φ_i(r, σ) of the two Slater determinants |Ψ〉 and |Φ〉, respectively. Indeed the matrix elements of S are given by:

S_{i,j} = ∑_{r,σ} ψ*_i(r, σ) φ_j(r, σ) (8.57)

Everything can be obtained by using the above equation, which is simple to derive using the canonical anticommutation rules and the fact that c_{r,σ}|0〉 = 0 for all r, σ, namely that the vacuum is annihilated by all the destruction operators.11
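A sketch of Eqs. (8.56) and (8.57), assuming the orbitals are stored as rows of two arrays over the 2L spin-orbital basis (the function name is illustrative):

    import numpy as np

    def slater_overlap(psi, phi):
        # psi, phi: (N, 2L) complex arrays; row i holds psi_i(r, sigma)
        # (resp. phi_i(r, sigma)) on the 2L values of the pair (r, sigma).
        S = np.conj(psi) @ phi.T        # S_{i,j}, Eq. (8.57)
        return np.linalg.det(S)         # <Psi|Phi> = |S|, Eq. (8.56)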

A.1 Efficient Calculation of Determinant Ratios

In particular, let us describe how to obtain the ratio of the two determinants appearing in Eq. (8.9), corresponding to a single particle move r, σ → r′, σ′:

|x′〉 = c†_{r′,σ′} c_{r,σ}|x〉 (8.58)

11 All the forthcoming algebra is useful also for BCS wave functions, since after a simple particle-hole transformation on the spin down electrons, c†_{i,↓} → c_{i,↓} [26], the BCS wave function turns into a simple Slater determinant with L orbitals written in the form given by Eq. (8.4).



Since |x〉 is a Slater determinant in position space, x′ is also a non vanishing Slater determinant in position space, provided c_{r,σ} annihilates one of the creation operators c†_{r_j,σ_j} appearing in the definition of |x〉 = (∏_{j=1}^{N} c†_{r_j,σ_j})|0〉 (see Eq. 8.2), i.e. for j equal to some integer k such that r_k = r and σ_k = σ, and provided r′, σ′ does not coincide with any of such operators, as (c†_{r_j,σ_j})² = 0 for fermions. Under the above assumptions, x′ differs from x only by the replacement of the position and spin of the kth operator, c†_{r,σ} → c†_{r′,σ′}, as all the other operators commute with c†_{r′,σ′} c_{r,σ}. Then we can apply Eq. (8.56) to compute each of the two determinants:

r = |A′| / |A| (8.59)

where the matrix A′ differs from A only in the k-th column, namely:

A_{i,j} = ψ_i(r_j, σ_j)
A′_{i,j} = A_{i,j} + δ_{j,k} (ψ_i(r′, σ′) − ψ_i(r, σ)) (8.60)

We can now multiply the RHS of the above equation by the identity A A^{−1} and obtain A′ = A T, where T is a very simple matrix that can be written in terms of the matrix W defined in Eq. (8.10):

T_{i,j} = δ_{i,j} + (W_i(r′, σ′) − δ_{i,k}) δ_{j,k} (8.61)

The determinant of this matrix is very simple to calculate, as it contains only one non-trivial column; by applying the standard expansion of the determinant over this column, it is simple to realize that only the diagonal element of this column provides a non vanishing contribution. Then, using the fact that the determinant of a product of two matrices is the product of their determinants, we obtain a simple expression for the determinant ratio:

r = |A′| / |A| = |T| = W_k(r′, σ′) (8.62)

Analogously, the inverse of the matrix T can be easily guessed by simple inspection:

T^{−1}_{i,j} = δ_{i,j} + g (W_i(r′, σ′) − δ_{i,k}) δ_{j,k} (8.63)

Indeed, by computing the simple product T^{−1}T, we verify that the above expression is correct provided 1 + g + g(W_k(r′, σ′) − 1) = 0, namely g = −1/W_k(r′, σ′). Using this simple expression we obtain the new values of the matrix W′ that can be used for the next Markov step:



W′_i(r, σ) = W_i(r, σ) − (W_i(r′, σ′) − δ_{i,k}) W_k(r, σ) / W_k(r′, σ′) (8.64)

which takes N × 2L operations. The above matrix can be updated during the Markov process, and allows us to compute the local energy and the determinant ratio for the Metropolis algorithm in a very efficient way. Each time the new position is accepted, the cost for updating the matrix W is optimal and corresponds to standard linear algebra operations that can be further optimized using the technique described below.
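The whole accept/update cycle for a single particle move then amounts to one lookup in W for the ratio and one rank-1 update, as in the following sketch (the spin-orbital pair (r, σ) is flattened into a single column index j, and the function names are illustrative):

    import numpy as np

    def move_ratio(W, k, j_new):
        # determinant ratio for moving electron k to spin-orbital j_new, Eq. (8.62)
        return W[k, j_new]

    def update_W(W, k, j_new):
        # rank-1 update of the N x 2L matrix W after an accepted move, Eq. (8.64)
        col = W[:, j_new].copy()
        col[k] -= 1.0                   # W_i(r', s') - delta_{i,k}
        return W - np.outer(col, W[k, :]) / W[k, j_new]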

A.2 Delayed Updates

The basic operation in Eq. (8.64) is the so-called rank-1 update of a generic N × 2L matrix:

W′_{i,j} = W_{i,j} + a_i b_j (8.65)

This operation can be computationally inefficient when, for large sizes, the matrix W is not completely contained in the cache of the processor.12 A way to overcome this drawback is to delay the update of the matrix W, without losing its information. This can be obtained by storing a set of left and right vectors and the initial full matrix W^0, from which we begin to delay the updates:

W_{i,j} = W^0_{i,j} + ∑_{l=1}^{m} a^l_i b^l_j (8.66)

as, each time we accept a new configuration, a new pair of vectors a^{m+1}_i and b^{m+1}_j can be easily computed in a few operations in terms of W^0 and a^l_i, b^l_j for l = 1, . . . , m, by substituting Eq. (8.66) into the RHS of Eq. (8.64):

a^{m+1}_i = −(W_i(r′, σ′) − δ_{i,k}) / W_k(r′, σ′) (8.67)
b^{m+1}_{r,σ} = W_k(r, σ) (8.68)

where the index j of the vector b has been replaced by the pair r, σ running over 2L values. Notice that the number of operations required to evaluate the above expressions, in terms of W written in the form (8.66), is m(2L + N), negligible compared to the full update for m ≪ L.

In this way we can proceed up to an optimal m = k_rep, when we evaluate the full matrix W_{i,j} by a standard matrix multiplication:

W = W^0 + A B^T (8.69)

12 For a fancier implementation of the algorithm see e.g. Ref. [27].



Fig. 8.8 Speedup obtained by using delayed updates in quantum Monte Carlo for a variational wave function containing 3600 electrons in the Hubbard model at half filling. With the same number of Metropolis updates and calculations of the local energies, the algorithm described in this appendix is several times faster than the conventional one (k_rep = 1). Test calculations were done on the JuGene supercomputer in Jülich (maximum speedup 6.4 with k_rep = 56), on the Curie machine (maximum speedup 17.9 with k_rep = 56) hosted in Bruyères-le-Châtel, France, on the K-computer hosted in Kobe, Japan (maximum speedup 23.9 with k_rep = 56), and on the sp6-CINECA (maximum speedup 16.1 with k_rep = 64) in Bologna

where A and B are the N × k_rep and 2L × k_rep matrices made of the l = 1, 2, . . . , k_rep column vectors a^l_i and b^l_j, respectively. After this standard matrix-matrix product one can continue with a new delayed update, with a new W^0 = W, by initializing again to zero the integer m in Eq. (8.66). The clear advantage of this scheme is that, after a cycle of k_rep Markov steps, the bulk of the computation is given by the evaluation of the matrix-matrix product in Eq. (8.69), which is much more efficient and not cache limited, compared with the k_rep original rank-1 updates of W given in Eq. (8.64). With the delayed algorithm, once the optimal k_rep is found, one can improve the speed of the variational Monte Carlo code by about an order of magnitude for a large number of electrons (see Fig. 8.8).
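The bookkeeping described above can be condensed into a few lines; the following sketch (class and method names are hypothetical, and W0 denotes the matrix from which the updates are delayed) accumulates the pairs of vectors of Eqs. (8.67)-(8.68) and flushes them with the matrix-matrix product of Eq. (8.69) every k_rep accepted moves:

    import numpy as np

    class DelayedW:
        def __init__(self, W0, krep=56):
            self.W0 = W0.copy()          # N x 2L matrix W^0 of Eq. (8.66)
            self.krep = krep
            self.a, self.b = [], []      # delayed left/right update vectors

        def row(self, k):
            # row k of the implicit W = W0 + sum_l a^l b^l, Eq. (8.66)
            w = self.W0[k, :].copy()
            for a, b in zip(self.a, self.b):
                w += a[k] * b
            return w

        def accept(self, k, j_new):
            # store the new pair of vectors, Eqs. (8.67)-(8.68)
            col = self.W0[:, j_new].copy()
            for a, b in zip(self.a, self.b):
                col += b[j_new] * a      # column j_new of the implicit W
            col[k] -= 1.0
            row = self.row(k)
            self.a.append(-col / row[j_new])
            self.b.append(row)
            if len(self.a) == self.krep:
                # flush: one efficient matrix-matrix product, Eq. (8.69)
                self.W0 += np.column_stack(self.a) @ np.column_stack(self.b).T
                self.a, self.b = [], []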

References

1. E.H. Lieb, F.Y. Wu, Phys. Rev. Lett. 20, 1445 (1968)
2. Z.Y. Meng, T.C. Lang, S. Wessel, F.F. Assaad, A. Muramatsu, Nature 464, 2010 (2010)
3. S. Sorella, E. Tosatti, Europhys. Lett. 19, 699 (1992)
4. D. Duffy, A. Moreo, Phys. Rev. B 55, R676 (1997)
5. L.F. Tocchio, F. Becca, A. Parola, S. Sorella, Phys. Rev. B 78, R041101 (2008)
6. W.L. McMillan, Phys. Rev. 138, A442 (1965)
7. S. Sorella, Phys. Rev. B 64, 024512 (2001)
8. P. Lecheminant, B. Bernu, C. Lhuillier, L. Pierre, P. Sindzingre, Phys. Rev. B 56, 2521 (1997)
9. S. Yan, D.A. Huse, S.R. White, arXiv:1011.6114 (2011)
10. D.S. Fisher, Phys. Rev. Lett. 69, 534 (1992)
11. D.S. Fisher, Phys. Rev. B 50, 3799 (1994)
12. D.S. Fisher, Phys. Rev. B 51, 6411 (1995)
13. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, J. Chem. Phys. 21, 1087 (1953)



14. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087 (1953)
15. M. Casula, S. Sorella, J. Chem. Phys. 119, 6500 (2003)
16. C. Attaccalite, S. Sorella, Phys. Rev. Lett. 100, 114501 (2008)
17. S. Yunoki, S. Sorella, Phys. Rev. B 74, 014408 (2006)
18. M. Casula, S. Sorella, D. Rocca, J. Chem. Phys. 127, 014105 (2007)
19. S. Sorella, Phys. Rev. B 71, R241103 (2005)
20. C. Umrigar, J. Toulouse, C. Filippi, S. Sorella, R. Henning, Phys. Rev. Lett. 98, 110201 (2007)
21. E.S. Heeb, T.M. Rice, Europhys. Lett. 27, 673 (1994)
22. S. Sorella, L. Capriotti, Phys. Rev. B 61, 2599 (2000)
23. D.F.B. ten Haaf, H.J.M. van Bemmel, J.M.J. van Leeuwen, W. van Saarloos, D.M. Ceperley, Phys. Rev. B 51, 13039 (1995)
24. N. Trivedi, D.M. Ceperley, Phys. Rev. B 41, 4552 (1990)
25. D.M. Ceperley, B.J. Alder, J. Chem. Phys. 81, 5833 (1984)
26. M. Ogata, H. Shiba, J. Phys. Soc. Jpn. 58, 2836 (1989)
27. P.K.V.V. Nukala et al., Phys. Rev. B 80, 195111 (2009)