
Monte Carlo Methods in Statistical Mechanics:
Foundations and New Algorithms

Alan D. Sokal

Department of Physics
New York University
4 Washington Place
New York, NY 10003
USA
E-mail: SOKAL@NYU.EDU

Lectures at the Cargèse Summer School on "Functional Integration: Basics and Applications"

September 1996

These notes are an updated version of lectures given at the Cours de Troisième Cycle de la Physique en Suisse Romande (Lausanne, Switzerland) in June 1989. We thank the Troisième Cycle de la Physique en Suisse Romande and Professor Michel Droz for kindly giving permission to reprint these notes.


Note to the Reader

The following notes are based on my course "Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms" given at the Cours de Troisième Cycle de la Physique en Suisse Romande (Lausanne, Switzerland) in June 1989, and on my course "Multi-Grid Monte Carlo for Lattice Field Theories" given at the Winter College on Multilevel Techniques in Computational Physics (Trieste, Italy) in January-February 1991.

The reader is warned that some of this material is out of date (this is particularly true as regards reports of numerical work). For lack of time, I have made no attempt to update the text, but I have added footnotes marked "Note Added 1996" that correct a few errors and give additional bibliography.

My first two lectures at Cargèse 1996 were based on the material included here. My third lecture described the new finite-size-scaling extrapolation method of [refs].

1 Introduction

The goal of these lectures is to give an introduction to current research on Monte Carlo methods in statistical mechanics and quantum field theory, with an emphasis on:

- the conceptual foundations of the method, including the possible dangers and misuses, and the correct use of statistical error analysis; and

- new Monte Carlo algorithms for problems in critical phenomena and quantum field theory, aimed at reducing or eliminating the "critical slowing-down" found in conventional algorithms.

These lectures are aimed at a mixed audience of theoretical, computational and mathematical physicists: some of whom are currently doing (or want to do) Monte Carlo studies themselves, others of whom want to be able to evaluate the reliability of published Monte Carlo work.

Before embarking on several hours of lectures on Monte Carlo methods, let me offer a warning:

Monte Carlo is an extremely bad method; it should be used only when all alternative methods are worse.

Why is this so? Firstly, all numerical methods are potentially dangerous, compared to analytic methods: there are more ways to make mistakes. Secondly, as numerical methods go, Monte Carlo is one of the least efficient; it should be used only on those intractable problems for which all other numerical methods are even less efficient.


Let me be more precise about this latter point. Virtually all Monte Carlo methods have the property that the statistical error behaves as

    error ∼ 1 / √(computational budget)

(or worse); this is an essentially universal consequence of the central limit theorem. It may be possible to improve the proportionality constant in this relation by a very large factor (this is one of the principal subjects of these lectures), but the overall n^{-1/2} behavior is inescapable. This should be contrasted with traditional deterministic numerical methods, whose rate of convergence is typically something like 1/n^p, e^{-n} or e^{-√n}. Therefore, Monte Carlo methods should be used only on those extremely difficult problems in which all alternative numerical methods behave even worse than n^{-1/2}.

Consider, for example, the problem of numerical integration in d dimensions, and let us compare Monte Carlo integration with a traditional deterministic method such as Simpson's rule. As is well known, the error in Simpson's rule with n nodal points behaves asymptotically as n^{-4/d} (for smooth integrands). In low dimension (d < 8) this is much better than Monte Carlo integration, but in high dimension (d > 8) it is much worse. So it is not surprising that Monte Carlo is the method of choice for performing high-dimensional integrals. It is still a bad method: with an error proportional to n^{-1/2}, it is difficult to achieve more than a few digits of accuracy. But numerical integration in high dimension is very difficult: though Monte Carlo is bad, all other known methods are worse.
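To make the n^{-1/2} behavior concrete, here is a minimal Python sketch (assuming NumPy is available; the integrand, the dimension d = 10 and the sample sizes are arbitrary illustrative choices, not taken from the text) that estimates a simple d-dimensional integral by plain Monte Carlo and prints the error as the sample size grows.

import numpy as np

def mc_integrate(d, n, rng):
    """Plain Monte Carlo estimate of the integral of prod_i x_i over [0,1]^d (exact value 2^-d)."""
    x = rng.random((n, d))
    return np.prod(x, axis=1).mean()

rng = np.random.default_rng(0)
d = 10
exact = 0.5 ** d
for n in (10**3, 10**4, 10**5, 10**6):
    est = mc_integrate(d, n, rng)
    print(f"n = {n:>8d}   estimate = {est:.3e}   error = {abs(est - exact):.1e}")
# The error shrinks roughly like n^{-1/2}, independently of the dimension d.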

In summary, Monte Carlo methods should be used only when neither analytic methods nor deterministic numerical methods are workable (or efficient). One general domain of application of Monte Carlo methods will therefore be systems with many degrees of freedom, far from the perturbative regime. But such systems are precisely the ones of greatest interest in statistical mechanics and quantum field theory!

It is appropriate to close this introduction with a general methodological observation, ably articulated by Wood and Erpenbeck [ref]:

    ... these [Monte Carlo] investigations share some of the features of ordinary experimental work, in that they are susceptible to both statistical and systematic errors. With regard to these matters, we believe that papers should meet much the same standards as are normally required for experimental investigations. We have in mind the inclusion of estimates of statistical error, descriptions of experimental conditions (i.e. parameters of the calculation), relevant details of apparatus (program) design, comparisons with previous investigations, discussion of systematic errors, etc. Only if these are provided will the results be trustworthy guides to improved theoretical understanding.

[Footnote: This discussion of numerical integration is grossly oversimplified. Firstly, there are deterministic methods better than Simpson's rule, and there are also sophisticated Monte Carlo methods whose asymptotic error (on smooth integrands) behaves as n^{-p} with p strictly greater than 1/2 [refs]. Secondly, for all these algorithms (except standard Monte Carlo), the asymptotic behavior as n → ∞ may be irrelevant in practice, because it is achieved only at ridiculously large values of n. For example, to carry out Simpson's rule with even 10 nodes per axis (a very coarse mesh) requires n = 10^d, which is unachievable for d ≳ 10.]

2 Dynamic Monte Carlo Methods: General Theory

All Monte Carlo work has the same general structure: given some probability measure π on some configuration space S, we wish to generate many random samples from π. How is this to be done?

Monte Carlo methods can be classified as static or dynamic. Static methods are those that generate a sequence of statistically independent samples from the desired probability distribution π. These techniques are widely used in Monte Carlo numerical integration in spaces of not-too-high dimension [ref]. But they are unfeasible for most applications in statistical physics and quantum field theory, where π is the Gibbs measure of some rather complicated system (extremely many coupled degrees of freedom).

The idea of dynamic Monte Carlo methods is to invent a stochastic process with state space S having π as its unique equilibrium distribution. We then simulate this stochastic process on the computer, starting from an arbitrary initial configuration; once the system has reached equilibrium, we measure time averages, which converge (as the run time tends to infinity) to π-averages. In physical terms, we are inventing a stochastic time evolution for the given system. Let us emphasize, however, that this time evolution need not correspond to any real "physical" dynamics: rather, the dynamics is simply a numerical algorithm, and it is to be chosen, like all numerical algorithms, on the basis of its computational efficiency.

In practice, the stochastic process is always taken to be a Markov process. So our first order of business is to review some of the general theory of Markov chains. For simplicity, let us assume that the state space S is discrete (i.e. finite or countably infinite). Much of the theory for general state space can be guessed from the discrete theory by making the obvious replacements (sums by integrals, matrices by kernels), although the proofs tend to be considerably harder.

Loosely speaking, a Markov chain (= discrete-time Markov process) with state space S is a sequence of S-valued random variables X_0, X_1, X_2, ... such that successive transitions X_t → X_{t+1} are statistically independent ("the future depends on the past only through the present"). More precisely, a Markov chain is specified by two ingredients:

[Footnote: The books of Kemeny and Snell [ref] and Iosifescu [ref] are excellent references on the theory of Markov chains with finite state space. At a somewhat higher mathematical level, the books of Chung [ref] and Nummelin [ref] deal with the cases of countable and general state space, respectively.]


- The initial distribution α. Here α is a probability distribution on S, and the process will be defined so that IP[X_0 = x] = α_x.

- The transition probability matrix P = {p_xy}_{x,y ∈ S} = {p(x → y)}_{x,y ∈ S}. Here P is a matrix satisfying p_xy ≥ 0 for all x, y and Σ_y p_xy = 1 for all x. The process will be defined so that IP[X_{t+1} = y | X_t = x] = p_xy.

The Markov chain is then completely specified by the joint probabilities

    IP[X_0 = x_0, X_1 = x_1, X_2 = x_2, ..., X_n = x_n] = α_{x_0} p_{x_0 x_1} p_{x_1 x_2} ... p_{x_{n-1} x_n}.

This product structure expresses the fact that "successive transitions are independent". Next we define the n-step transition probabilities

    p^{(n)}_{xy} = IP[X_{t+n} = y | X_t = x].

Clearly p^{(0)}_{xy} = δ_{xy}, p^{(1)}_{xy} = p_xy, and in general {p^{(n)}_{xy}} are the matrix elements of P^n. A Markov chain is said to be irreducible if from each state it is possible to get to each other state: that is, for each pair x, y ∈ S there exists an n ≥ 0 for which p^{(n)}_{xy} > 0. We shall be concerned almost exclusively with irreducible Markov chains.

For each state x, we define the period of x (denoted d_x) to be the greatest common divisor of the numbers n > 0 for which p^{(n)}_{xx} > 0. If d_x = 1, the state x is called aperiodic. It can be shown that in an irreducible chain all states have the same period, so we can speak of the chain having period d. Moreover, the state space can then be partitioned into subsets S_1, S_2, ..., S_d around which the chain moves cyclically, i.e. p^{(n)}_{xy} = 0 whenever x ∈ S_i, y ∈ S_j with j − i ≢ n (mod d). Finally, it can be shown that a chain is irreducible and aperiodic if and only if for each pair x, y there exists N_xy such that p^{(n)}_{xy} > 0 for all n ≥ N_xy.

We now come to the fundamental topic in the theory of Markov chains, which is the problem of convergence to equilibrium. A probability measure π = {π_x}_{x ∈ S} is called a stationary distribution (or invariant distribution, or equilibrium distribution) for the Markov chain P in case

    Σ_x π_x p_xy = π_y   for all y.

A stationary probability distribution need not exist; but if it does, then a lot more follows:

Theorem 1. Let P be the transition probability matrix of an irreducible Markov chain of period d. If a stationary probability distribution π exists, then it is unique, and π_x > 0 for all x. Moreover,

    lim_{n→∞} p^{(nd+r)}_{xy} = d π_y   if x ∈ S_i, y ∈ S_j with j − i ≡ r (mod d)
                              = 0       if x ∈ S_i, y ∈ S_j with j − i ≢ r (mod d)

for all x, y. In particular, if P is aperiodic, then

    lim_{n→∞} p^{(n)}_{xy} = π_y.


This theorem shows that the Markov chain converges as t → ∞ to the equilibrium distribution π, irrespective of the initial distribution α. Moreover, under the conditions of this theorem much more can be proven: for example, a strong law of large numbers, a central limit theorem, and a law of the iterated logarithm. For statements and proofs of all these theorems, we refer the reader to Chung [ref].

We can now see how to set up a dynamic Monte Carlo method for generating samples from the probability distribution π. It suffices to invent a transition probability matrix P = {p_xy} = {p(x → y)} satisfying the following two conditions:

(A) Irreducibility. For each pair x, y ∈ S, there exists an n ≥ 0 for which p^{(n)}_{xy} > 0.

(B) Stationarity of π. For each y ∈ S,

    Σ_x π_x p_xy = π_y.

Then Theorem 1 (together with its more precise counterparts) shows that simulation of the Markov chain P constitutes a legitimate Monte Carlo method for estimating averages with respect to π. We can start the system in any state x, and the system is guaranteed to converge to equilibrium as t → ∞ (at least in the averaged sense). Long-time averages of any observable f will converge with probability 1 to π-averages (strong law of large numbers), and will do so with fluctuations of size ∼ n^{-1/2} (central limit theorem). In practice we will discard the data from the initial transient, i.e. before the system has come close to equilibrium, but in principle this is not necessary (the bias is of order 1/n, hence asymptotically much smaller than the statistical fluctuations).
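As a minimal numerical illustration of conditions (A) and (B), the following Python/NumPy sketch (the three-state chain, the target distribution and the observable are arbitrary examples of mine, not taken from the text) builds a transition matrix that leaves a chosen π stationary, checks the stationarity condition, and verifies that the long-time average of an observable approaches the π-average.

import numpy as np

rng = np.random.default_rng(1)
pi = np.array([1.0, 2.0, 3.0]); pi /= pi.sum()       # target distribution
prop = np.full((3, 3), 1/3)                          # uniform (symmetric) proposal

# Metropolis acceptance yields a transition matrix with pi as stationary distribution
P = np.zeros((3, 3))
for x in range(3):
    for y in range(3):
        if y != x:
            P[x, y] = prop[x, y] * min(1.0, pi[y] / pi[x])
    P[x, x] = 1.0 - P[x].sum()

print("stationarity check, pi P - pi =", pi @ P - pi)    # should be ~0

f = np.array([0.0, 1.0, 2.0])                        # an arbitrary observable
x, total, nsteps = 0, 0.0, 200_000
for _ in range(nsteps):
    x = rng.choice(3, p=P[x])                        # one step of the chain
    total += f[x]
print("time average =", total / nsteps, "   pi-average =", pi @ f)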

So far, so good. But while this is a correct Monte Carlo algorithm, it may or may not be an efficient one. The key difficulty is that the successive states X_0, X_1, ... of the Markov chain are correlated, perhaps very strongly, so the variance of estimates produced from the dynamic Monte Carlo simulation may be much higher than in static Monte Carlo (independent sampling). To make this precise, let f = {f(x)}_{x ∈ S} be a real-valued function defined on the state space S (i.e. a real-valued observable) that is square-integrable with respect to π. Now consider the stationary Markov chain (i.e. start the system in the stationary distribution π, or equivalently, "equilibrate" it for a very long time prior to observing the system). Then {f_t} ≡ {f(X_t)} is a stationary stochastic process with mean

    μ_f ≡ <f_t> = Σ_x π_x f(x)

and unnormalized autocorrelation function

    C_ff(t) ≡ <f_s f_{s+t}> − μ_f²
            = Σ_{x,y} f(x) [π_x p^{(|t|)}_{xy} − π_x π_y] f(y).

[Footnote: We avoid the term "ergodicity" because of its multiple and conflicting meanings. In the physics literature, and in the mathematics literature on Markov chains with finite state space, "ergodic" is typically used as a synonym for "irreducible" [refs]. However, in the mathematics literature on Markov chains with general state space, "ergodic" is used as a synonym for "irreducible, aperiodic and positive Harris recurrent" [refs].]

The normalized autocorrelation function is then

    ρ_ff(t) ≡ C_ff(t) / C_ff(0).

Typically ρ_ff(t) decays exponentially (∼ e^{−|t|/τ}) for large t; we define the exponential autocorrelation time

    τ_exp,f = limsup_{t→∞} t / (−log |ρ_ff(t)|)

and

    τ_exp = sup_f τ_exp,f.

Thus τ_exp is the relaxation time of the slowest mode in the system. (If the state space is infinite, τ_exp might be +∞!)

An equivalent definition, which is useful for rigorous analysis, involves considering the spectrum of the transition probability matrix P regarded as an operator on the Hilbert space l²(π). It is not hard to prove the following facts about P:

(a) The operator P is a contraction. (In particular, its spectrum lies in the closed unit disk.)

(b) 1 is a simple eigenvalue of P, as well as of its adjoint P*, with eigenvector equal to the constant function 1.

(c) If the Markov chain is aperiodic, then 1 is the only eigenvalue of P (and of P*) on the unit circle.

(d) Let R be the spectral radius of P acting on the orthogonal complement of the constant functions,

    R ≡ inf{ r : spec(P ↾ 1^⊥) ⊆ {λ : |λ| ≤ r} }.

Then R = e^{−1/τ_exp}.

Facts (a)-(c) are a generalized Perron-Frobenius theorem [ref]; fact (d) is a consequence of a generalized spectral radius formula [ref].

[Footnote: In the statistics literature, this is called the autocovariance function.]

[Footnote: l²(π) is the space of complex-valued functions on S that are square-integrable with respect to π: ‖f‖ ≡ (Σ_x π_x |f(x)|²)^{1/2} < ∞. The inner product is given by (f, g) = Σ_x π_x f(x)* g(x).]


The rate of convergence to equilibrium from an initial nonequilibrium distribution can be bounded above in terms of R (and hence τ_exp). More precisely, let ν be a probability measure on S, and let us define its deviation from equilibrium in the l² sense,

    d(ν, π) ≡ ‖ν − π‖_{l²(1/π)} = sup_{‖f‖_{l²(π)} ≤ 1} | ∫ f dν − ∫ f dπ |.

Then, clearly,

    d(νP^t, π) ≤ ‖P^t ↾ 1^⊥‖ d(ν, π).

And by the spectral radius formula,

    ‖P^t ↾ 1^⊥‖ ≈ R^t = exp(−t/τ_exp)

asymptotically as t → ∞, with equality for all t if P is self-adjoint (see below). On the other hand, for a given observable f, we define the integrated autocorrelation time

    τ_int,f ≡ (1/2) Σ_{t=−∞}^{∞} ρ_ff(t) = 1/2 + Σ_{t=1}^{∞} ρ_ff(t).

(The factor of 1/2 is purely a matter of convention; it is inserted so that τ_int,f ≈ τ_exp,f if ρ_ff(t) ∼ e^{−|t|/τ} with τ ≫ 1.) The integrated autocorrelation time controls the statistical error in Monte Carlo measurements of <f>. More precisely, the sample mean

    f̄ ≡ (1/n) Σ_{t=1}^{n} f_t

has variance

    var(f̄) = (1/n²) Σ_{r,s=1}^{n} C_ff(r − s)

           = (1/n) Σ_{t=−(n−1)}^{n−1} (1 − |t|/n) C_ff(t)

           ≈ (1/n) 2 τ_int,f C_ff(0)   for n ≫ τ.

Thus, the variance of f̄ is a factor 2τ_int,f larger than it would be if the {f_t} were statistically independent. Stated differently, the number of "effectively independent samples" in a run of length n is roughly n/(2τ_int,f).
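The factor 2τ_int,f can be checked numerically on a process whose autocorrelations are known in closed form. The sketch below (Python/NumPy; the Gaussian AR(1) process and its parameters are illustrative assumptions, not from the text) generates sequences with ρ(t) = φ^{|t|}, for which 2τ_int = (1 + φ)/(1 − φ) exactly, and compares the empirical variance of the sample mean over many runs with C(0)·2τ_int/n.

import numpy as np

rng = np.random.default_rng(2)
phi, n, nruns = 0.9, 10_000, 200
two_tau_int = (1 + phi) / (1 - phi)      # exact 2*tau_int when rho(t) = phi^|t|

means = []
for _ in range(nruns):
    f = np.empty(n)
    f[0] = rng.normal()                                   # start in the stationary distribution
    eps = rng.normal(0.0, np.sqrt(1 - phi**2), size=n)    # noise scaled so that C(0) = 1
    for t in range(1, n):
        f[t] = phi * f[t - 1] + eps[t]
    means.append(f.mean())

print("empirical var(f_bar)       =", np.var(means))
print("predicted C(0)*2tau_int/n  =", two_tau_int / n)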

It is sometimes convenient to measure the integrated autocorrelation time in terms of the equivalent pure exponential decay that would produce the same value of Σ_{t=−∞}^{∞} ρ_ff(t), namely

    τ̃_int,f ≡ 1 / log[(2τ_int,f + 1)/(2τ_int,f − 1)].


This quantity has the nice feature that a sequence of uncorrelated data has τ̃_int,f = 0 (but τ_int,f = 1/2). Of course, τ̃_int,f is ill-defined if τ_int,f < 1/2, as can occasionally happen in cases of anticorrelation.

In summary, the autocorrelation times τ_exp and τ_int,f play different roles in Monte Carlo simulations. τ_exp places an upper bound on the number of iterations n_disc which should be discarded at the beginning of the run, before the system has attained equilibrium; for example, n_disc ≈ 20 τ_exp is usually more than adequate. On the other hand, τ_int,f determines the statistical errors in Monte Carlo measurements of <f>, once equilibrium has been attained.

Most commonly it is assumed that τ_exp and τ_int,f are of the same order of magnitude, at least for "reasonable" observables f. But this is not true in general. In fact, in statistical-mechanical problems near a critical point, one usually expects the autocorrelation function ρ_ff(t) to obey a dynamic scaling law [ref] of the form

    ρ_ff(t; β) ≈ |t|^{−a} F(|β − β_c| |t|^b),

valid in the region

    |t| ≫ 1,   |β − β_c| ≪ 1,   |β − β_c| |t|^b bounded.

Here a, b > 0 are dynamic critical exponents, and F is a suitable scaling function; β is some "temperature-like" parameter, and β_c is the critical point. Now suppose that F is continuous and strictly positive, with F(x) decaying rapidly (e.g. exponentially) as |x| → ∞. Then it is not hard to see that

    τ_exp,f ∼ |β − β_c|^{−1/b}

    τ_int,f ∼ |β − β_c|^{−(1−a)/b}

    ρ_ff(t; β = β_c) ∼ |t|^{−a},

so that τ_exp,f and τ_int,f have different critical exponents unless a = 0. Actually, this should not be surprising: replacing "time" by "space", we see that τ_exp,f is the analogue of a correlation length, while τ_int,f is the analogue of a susceptibility; the relations above are then the analogue of the well-known scaling law γ = (2 − η)ν, and clearly η ≠ 0 in general! So it is crucial to distinguish between the two types of autocorrelation time.

Returning to the general theory, we note that one convenient way of satisfying condition (B) is to satisfy the following stronger condition:

(B') For each pair x, y ∈ S,   π_x p_xy = π_y p_yx.

[Footnote: Our discussion of this topic in [ref] is incorrect.]


(Summing (B') over x, we recover (B).) Condition (B') is called the detailed-balance condition; a Markov chain satisfying (B') is called reversible. (B') is equivalent to the self-adjointness of P as an operator on the space l²(π). In this case the spectrum of P is real and lies in the closed interval [−1, 1]; we define

    λ_min = inf spec(P ↾ 1^⊥)

    λ_max = sup spec(P ↾ 1^⊥).

From the relation R = e^{−1/τ_exp} we have

    τ_exp = −1 / log max(|λ_min|, λ_max).

For many purposes only the spectrum near +1 matters, so it is useful to define the modified exponential autocorrelation time

    τ'_exp = −1 / log λ_max   if λ_max > 0
           = 0                if λ_max ≤ 0.

Now let us apply the spectral theorem to the operator P: it follows that the autocorrelation function ρ_ff(t) has a spectral representation

    ρ_ff(t) = ∫_{λ_min,f}^{λ_max,f} λ^{|t|} dσ_ff(λ)

with a nonnegative spectral weight dσ_ff(λ) supported on an interval [λ_min,f, λ_max,f] ⊆ [λ_min, λ_max]. Clearly

    τ_exp,f = −1 / log max(|λ_min,f|, λ_max,f),

and we can define

    τ'_exp,f = −1 / log λ_max,f   if λ_max,f > 0
             = 0                  if λ_max,f ≤ 0.

On the other hand,

    τ_int,f = (1/2) ∫_{λ_min,f}^{λ_max,f} (1 + λ)/(1 − λ) dσ_ff(λ).

It follows that

    τ_int,f ≤ (1/2) (1 + e^{−1/τ'_exp,f}) / (1 − e^{−1/τ'_exp,f}) ≤ (1/2) (1 + e^{−1/τ'_exp}) / (1 − e^{−1/τ'_exp}) ≈ τ'_exp.

[Footnote: For the physical significance of this term, see Kemeny and Snell [ref] or Iosifescu [ref].]


Moreover, since λ → (1 + λ)/(1 − λ) is a convex function, Jensen's inequality implies that

    τ_int,f ≥ (1/2) (1 + ρ_ff(1)) / (1 − ρ_ff(1)).

If we define the initial autocorrelation time

    τ_init,f = −1 / log ρ_ff(1)   if ρ_ff(1) > 0   (undefined if ρ_ff(1) ≤ 0),

then these inequalities can be summarized conveniently as

    τ_init,f ≤ τ̃_int,f ≤ τ'_exp,f ≤ τ'_exp.

Conversely, it is not hard to see that

    sup_{f ∈ l²(π)} τ_init,f = sup_{f ∈ l²(π)} τ̃_int,f = sup_{f ∈ l²(π)} τ'_exp,f = τ'_exp;

it suffices to choose f so that its spectral weight dσ_ff is supported in a very small interval near λ_max.

Finally, let us make a remark about transition probabilities P that are "built up out of" other transition probabilities P_1, P_2, ..., P_n:

(a) If P_1, P_2, ..., P_n satisfy the stationarity condition (resp. the detailed-balance condition) for π, then so does any convex combination P = Σ_{i=1}^{n} λ_i P_i. Here λ_i ≥ 0 and Σ_{i=1}^{n} λ_i = 1.

(b) If P_1, P_2, ..., P_n satisfy the stationarity condition for π, then so does the product P = P_1 P_2 ... P_n. (Note, however, that P does not in general satisfy the detailed-balance condition, even if the individual P_i do.)

Algorithmically, the convex combination amounts to choosing randomly, with probabilities {λ_i}, from among the "elementary operations" P_i. (It is crucial here that the λ_i are constants, independent of the current configuration of the system; only in this case does P leave π stationary in general.) Similarly, the product corresponds to performing sequentially the operations P_1, P_2, ..., P_n.

[Footnote: Recall that if A and B are self-adjoint operators, then AB is self-adjoint if and only if A and B commute.]
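Remarks (a) and (b) are easy to check numerically. The following Python/NumPy sketch (the two "partial" update matrices on a three-state space are arbitrary examples of mine) builds two transition matrices satisfying detailed balance for the same π and verifies that both the convex combination and the product leave π stationary, while only the combination remains reversible.

import numpy as np

pi = np.array([1.0, 2.0, 4.0]); pi /= pi.sum()   # target distribution on 3 states

# P1 is a Metropolis update that only proposes the exchange 0 <-> 1;
# P2 only proposes 1 <-> 2.  Both satisfy detailed balance for pi.
P1 = np.array([[0.0, 1.0, 0.0],
               [0.5, 0.5, 0.0],
               [0.0, 0.0, 1.0]])
P2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.5, 0.5]])

Pmix  = 0.5 * P1 + 0.5 * P2      # random choice of elementary operation
Pprod = P1 @ P2                  # sequential application

def reversible(P):
    flux = pi[:, None] * P       # flux[x, y] = pi_x * p_xy
    return np.allclose(flux, flux.T)

for name, P in [("P1", P1), ("P2", P2), ("mixture", Pmix), ("product", Pprod)]:
    print(f"{name:8s} stationary: {np.allclose(pi @ P, pi)}   reversible: {reversible(P)}")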


3 Statistical Analysis of Dynamic Monte Carlo Data

Many published Monte Carlo studies contain statements like:

    We ran for a total of 100,000 iterations, discarding the first 10,000 iterations (for equilibration), and then taking measurements once every 100 iterations.

It is important to emphasize that unless further information is given (namely, the autocorrelation time of the algorithm) such statements have no value whatsoever!

Is a run of 100,000 iterations long enough? Are 10,000 iterations sufficient for equilibration? That depends on how big the autocorrelation time is. The purpose of this lecture is to give some practical advice for choosing the parameters of a dynamic Monte Carlo simulation, and to give an introduction to the statistical theory that puts this advice on a sound mathematical footing.

There are two fundamental, and quite distinct, issues in dynamic Monte Carlo simulation:

- Initialization bias. If the Markov chain is started in a distribution α that is not equal to the stationary distribution π, then there is an "initial transient" in which the data do not reflect the desired equilibrium distribution π. This results in a systematic error (bias).

- Autocorrelation in equilibrium. The Markov chain, once it reaches equilibrium, provides correlated samples from π. This correlation causes the statistical error (variance) to be a factor 2τ_int,f larger than in independent sampling.

Let us discuss these issues in turn.

Initialization bias. Often the Markov chain is started in some chosen configuration x; then α = δ_x. For example, in an Ising model, x might be the configuration with "all spins up"; this is sometimes called an ordered or cold start. Alternatively, the Markov chain might be started in a random configuration chosen according to some simple probability distribution α. For example, in an Ising model we might initialize the spins randomly and independently, with equal probabilities of up and down; this is sometimes called a random or hot start. In all these cases the initial distribution α is clearly not equal to the equilibrium distribution π. Therefore, the system is initially "out of equilibrium". Theorem 1 guarantees that the system approaches equilibrium as t → ∞, but we need to know something about the rate of convergence to equilibrium.

Using the exponential autocorrelation time τ_exp, we can set an upper bound on the amount of time we have to wait before equilibrium is "for all practical purposes" attained: for example, if we wait a time 20 τ_exp, then the deviation from equilibrium (in the l² sense) will be at most e^{−20} ≈ 2 × 10^{−9} times the initial deviation from equilibrium. There are two difficulties with this bound. Firstly, it is usually impossible to apply in practice, since we almost never know τ_exp (or a rigorous upper bound for it). Secondly, even if we can apply it, it may be overly conservative: indeed, there exist perfectly reasonable algorithms in which τ_exp = +∞ (see the later lectures).


Lacking rigorous knowledge of the autocorrelation time τ_exp, we should try to estimate it, both theoretically and empirically. To make a heuristic theoretical estimate of τ_exp, we attempt to understand the physical mechanism(s) causing slow convergence to equilibrium; but it is always possible that we have overlooked one or more such mechanisms and have therefore grossly underestimated τ_exp. To make a rough empirical estimate of τ_exp, we measure the autocorrelation function C_ff(t) for a suitably large set of observables f (see below); but there is always the danger that our chosen set of observables has failed to include one that has strong enough overlap with the slowest mode, again leading to a gross underestimate of τ_exp.

On the other hand, the actual rate of convergence to equilibrium from a given initial distribution α may be much faster than the worst-case estimate given by τ_exp. So it is usual to determine empirically when "equilibrium" has been achieved, by plotting selected observables as a function of time and noting when the initial transient appears to end. More sophisticated statistical tests for initialization bias can also be employed [ref].

In all empirical methods of determining when "equilibrium" has been achieved, a serious danger is the possibility of metastability. That is, it could appear that equilibrium has been achieved, when in fact the system has only settled down to a long-lived (metastable) region of configuration space that may be very far from equilibrium. The only sure-fire protection against metastability is a proof of an upper bound on τ_exp, or more generally on the deviation from equilibrium as a function of the elapsed time t. The next-best protection is a convincing heuristic argument that metastability is unlikely (i.e. that τ_exp is not too large); but, as mentioned before, even if one rules out several potential physical mechanisms for metastability, it is very difficult to be certain that one has not overlooked others. If one cannot rule out metastability on theoretical grounds, then it is helpful at least to have an idea of what the possible metastable regions look like; then one can perform several runs with different initial conditions, typical of each of the possible metastable regions, and test whether the answers are consistent. For example, near a first-order phase transition, most Monte Carlo methods suffer from metastability associated with transitions between configurations typical of the distinct pure phases. We can try initial conditions typical of each of these phases (e.g. for many models, a "hot" start and a "cold" start). Consistency between these runs does not guarantee that metastability is absent, but it does give increased confidence. Plots of observables as a function of time are also useful indicators of possible metastability.

But when all is said and done, no purely empirical estimate of τ from a run of length n can be guaranteed to be even approximately correct. What we can say is that if τ_estimated ≪ n, then either τ ≈ τ_estimated or else τ ≳ n.

Once we know (or guess) the time needed to attain "equilibrium", what do we do with it? The answer is clear: we discard the data from the initial transient, up to some time n_disc, and include only the subsequent data in our averages. In principle this is (asymptotically) unnecessary, because the systematic errors from this initial transient will be of order 1/n, while the statistical errors will be of order 1/n^{1/2}. But in practice the coefficient of 1/n in the systematic error may be fairly large if the initial distribution is very far from equilibrium. By throwing away the data from the initial transient, we lose nothing and avoid a potentially large systematic error.

Autocorrelation in equilibrium. As explained in the preceding lecture, the variance of the sample mean f̄ in a dynamic Monte Carlo method is a factor 2τ_int,f higher than it would be in independent sampling. Otherwise put, a run of length n contains only n/(2τ_int,f) "effectively independent data points".

This has several implications for Monte Carlo work. On the one hand, it means that the computational efficiency of the algorithm is determined principally by its autocorrelation time. More precisely, if one wishes to compare two alternative Monte Carlo algorithms for the same problem, then the better algorithm is the one that has the smaller autocorrelation time, when time is measured in units of computer (CPU) time. (In general there may arise tradeoffs between "physical" autocorrelation time, i.e. measured in iterations, and computational complexity per iteration.) So accurate measurements of the autocorrelation time are essential to evaluating the computational efficiency of competing algorithms.

On the other hand, even for a fixed algorithm, knowledge of τ_int,f is essential for determining run lengths (is a run of 100,000 sweeps long enough?) and for setting error bars on estimates of <f>. Roughly speaking, error bars will be of order (τ/n)^{1/2}; so if we want 1% accuracy, then we need a run of length of order 10^4 τ, and so on. Above all, there is a basic self-consistency requirement: the run length n must be ≫ the estimates of τ produced by that same run; otherwise none of the results from that run should be believed. Of course, while self-consistency is a necessary condition for the trustworthiness of Monte Carlo data, it is not a sufficient condition: there is always the danger of metastability.

Already we can draw a conclusion about the relative importance of initialization bias and autocorrelation as difficulties in dynamic Monte Carlo work. Let us assume that the time for initial convergence to equilibrium is comparable to (or at least not too much larger than) the equilibrium autocorrelation time τ_int,f for the observables f of interest; this is often, but not always, the case. Then initialization bias is a relatively trivial problem compared to autocorrelation in equilibrium. To eliminate initialization bias, it suffices to discard ≈ 20τ of the data at the beginning of the run; but to achieve a reasonably small statistical error, it is necessary to make a run of length ≈ 1000τ or more. So the data that must be discarded at the beginning (n_disc) is a negligible fraction of the total run length n. This estimate also shows that the exact value of n_disc is not particularly delicate: anything between ≈ 20τ and a modest fraction of n will eliminate essentially all initialization bias, while paying only a few-percent price in the final error bars.

In the remainder of this lecture, I would like to discuss in more detail the statistical analysis of dynamic Monte Carlo data (assumed to be already "in equilibrium"), with emphasis on how to estimate the autocorrelation time τ_int,f and how to compute valid error bars. What is involved here is a branch of mathematical statistics called time-series analysis. An excellent exposition can be found in the books of Priestley [ref] and Anderson [ref].


Let {f_t} be a real-valued stationary stochastic process with mean

    μ ≡ <f_t>,

unnormalized autocorrelation function

    C(t) ≡ <f_s f_{s+t}> − μ²,

normalized autocorrelation function

    ρ(t) ≡ C(t) / C(0),

and integrated autocorrelation time

    τ_int ≡ (1/2) Σ_{t=−∞}^{∞} ρ(t).

Our goal is to estimate C(t), ρ(t) and τ_int, based on a finite (but large) sample f_1, ..., f_n from this stochastic process.

The "natural" estimator of μ is the sample mean

    f̄ ≡ (1/n) Σ_{i=1}^{n} f_i.

This estimator is unbiased (i.e. <f̄> = μ) and has variance

    var(f̄) = (1/n) Σ_{t=−(n−1)}^{n−1} (1 − |t|/n) C(t)

           ≈ (1/n) 2 τ_int C(0)   for n ≫ τ.

Thus, even if we are interested only in the static quantity μ, it is necessary to estimate the dynamic quantity τ_int in order to determine valid error bars for μ.

The "natural" estimator of C(t) is

    Ĉ(t) ≡ (1/(n − |t|)) Σ_{i=1}^{n−|t|} (f_i − μ)(f_{i+|t|} − μ)

if the mean μ is known, and

    Ĉ̂(t) ≡ (1/(n − |t|)) Σ_{i=1}^{n−|t|} (f_i − f̄)(f_{i+|t|} − f̄)

if the mean μ is unknown. We emphasize the conceptual distinction between the autocorrelation function C(t), which for each t is a number, and the estimator Ĉ(t) or Ĉ̂(t), which for each t is a random variable. As will become clear, this distinction is also of practical importance. Ĉ(t) is an unbiased estimator of C(t), and Ĉ̂(t) is almost unbiased (the bias is of order 1/n [ref]). Their variances and covariances are [refs]:

    var(Ĉ(t)) = (1/n) Σ_{m=−∞}^{∞} [ C(m)² + C(m+t) C(m−t) + κ(t, m, m+t) ] + o(1/n)

    cov(Ĉ(t), Ĉ(u)) = (1/n) Σ_{m=−∞}^{∞} [ C(m) C(m+u−t) + C(m+u) C(m−t) + κ(t, m, m+u) ] + o(1/n),

where t, u ≥ 0 and κ is the connected 4-point autocorrelation function

    κ(r, s, t) ≡ <(f_i − μ)(f_{i+r} − μ)(f_{i+s} − μ)(f_{i+t} − μ)> − C(r) C(t−s) − C(s) C(t−r) − C(t) C(s−r).

To leading order in 1/n, the behavior of Ĉ̂ is identical to that of Ĉ.

The "natural" estimator of ρ(t) is

    ρ̂(t) ≡ Ĉ(t) / Ĉ(0)

if the mean μ is known, and

    ρ̂̂(t) ≡ Ĉ̂(t) / Ĉ̂(0)

if the mean μ is unknown. The variances and covariances of ρ̂(t) and ρ̂̂(t) can be computed (for large n) from the formulas above; we omit the detailed formulae.

The "natural" estimator of τ_int would seem to be

    τ̂_int = (1/2) Σ_{t=−(n−1)}^{n−1} ρ̂(t)

(or the analogous thing with ρ̂̂), but this is wrong! The estimator defined above has a variance that does not go to zero as the sample size n goes to infinity [ref], so it is clearly a very bad estimator of τ_int. Roughly speaking, this is because the sample autocorrelations ρ̂(t) for |t| ≫ τ contain much "noise" but little "signal"; and there are so many of them (order n) that the noise adds up to a total variance of order 1. (For a more detailed discussion, see [ref].) The solution is to cut off the sum using a "window" λ(t), which is ≈ 1 for |t| ≲ τ but ≈ 0 for |t| ≫ τ:

    τ̂_int = (1/2) Σ_{t=−(n−1)}^{n−1} λ(t) ρ̂(t).


This retains most of the "signal" but discards most of the "noise". A good choice is the rectangular window

    λ(t) = 1 if |t| ≤ M,   0 if |t| > M,

where M is a suitably chosen cutoff. This cutoff introduces a bias

    bias(τ̂_int) = −(1/2) Σ_{|t|>M} ρ(t) + o(1/n).

On the other hand, the variance of τ̂_int can be computed from the formulas above; after some algebra one obtains

    var(τ̂_int) ≈ (2(2M + 1)/n) τ_int²,

where we have made the approximation τ ≪ M ≪ n. The choice of M is thus a tradeoff between bias and variance: the bias can be made small by taking M large enough so that ρ(t) is negligible for |t| > M (e.g. M equal to a few times τ usually suffices), while the variance is kept small by taking M no larger than necessary, consistent with this constraint. We have found the following "automatic windowing" algorithm [ref] to be convenient: choose M to be the smallest integer such that M ≥ c τ̂_int(M). If ρ(t) were roughly a pure exponential, then it would suffice to take c ≈ 4 (since e^{−4} ≈ 2%). However, in many cases ρ(t) is expected to have an asymptotic (or pre-asymptotic) decay slower than exponential, so it is usually prudent to take c at least 6, and possibly as large as 10.

We have found this automatic windowing procedure to work well in practice, provided that a sufficient quantity of data is available (n ≳ 1000τ). However, at present we have very little understanding of the conditions under which this windowing algorithm may produce biased estimates of τ_int or of its own error bars. Further theoretical and experimental study of the windowing algorithm, e.g. experiments on various exactly-known stochastic processes with various run lengths, would be highly desirable.
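A minimal implementation of this windowing procedure is sketched below (Python/NumPy; this is my own illustration of the recipe described above, not the authors' code, and the AR(1) test series with τ_int = 9.5 is an arbitrary example). It returns the windowed estimate of τ_int, the chosen cutoff M, and the corresponding error bar on the sample mean.

import numpy as np

def autocorr_times(f, c=6.0, tmax=None):
    """Windowed estimate of tau_int: M is the smallest integer with M >= c * tau_int_hat(M)."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    if tmax is None:
        tmax = n // 10
    df = f - f.mean()
    # sample autocovariances C_hat(t) for t = 0 .. tmax
    C = np.array([np.dot(df[: n - t], df[t:]) / (n - t) for t in range(tmax + 1)])
    rho = C / C[0]
    tau, M = 0.5, 0
    for M in range(1, tmax + 1):
        tau = 0.5 + rho[1 : M + 1].sum()         # tau_int_hat with rectangular window M
        if M >= c * tau:
            break
    err_mean = np.sqrt(2.0 * tau * C[0] / n)     # error bar on the sample mean
    return tau, M, err_mean

# usage on an AR(1) series with rho(t) = 0.9^|t|, i.e. tau_int = 9.5 exactly
rng = np.random.default_rng(3)
phi, n = 0.9, 100_000
f = np.empty(n); f[0] = rng.normal()
eps = rng.normal(0.0, np.sqrt(1 - phi**2), size=n)
for t in range(1, n):
    f[t] = phi * f[t - 1] + eps[t]
print(autocorr_times(f, c=6.0, tmax=1000))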

4 Conventional Monte Carlo Algorithms for Spin Models

In this lecture we describe the construction of dynamic Monte Carlo algorithms for models in statistical mechanics and quantum field theory. Recall our goal: given a probability measure π on the state space S, we wish to construct a transition matrix P = {p_xy} satisfying:

[Footnote: We have also assumed that the only strong peaks in the Fourier transform of C(t) are at zero frequency. This assumption is valid if C(t) ≥ 0, but could fail if there are strong anticorrelations.]


(A) Irreducibility. For each pair x, y ∈ S, there exists an n ≥ 0 for which p^{(n)}_{xy} > 0.

(B) Stationarity of π. For each y ∈ S,

    Σ_x π_x p_xy = π_y.

A sufficient condition for (B), which is often more convenient to verify, is:

(B') Detailed balance for π. For each pair x, y ∈ S,   π_x p_xy = π_y p_yx.

A very general method for constructing transition matrices satisfying detailed balance for a given distribution π was introduced in 1953 by Metropolis et al. [ref], with a slight extension two decades later by Hastings [ref]. The idea is the following: Let P^(0) = {p^(0)_xy} be an arbitrary irreducible transition matrix on S. We call P^(0) the proposal matrix; we shall use it to generate proposed moves x → y that will then be accepted or rejected with probabilities a_xy and 1 − a_xy, respectively. If a proposed move is rejected, then we make a "null transition" x → x. Therefore, the transition matrix P = {p_xy} of the full algorithm is

    p_xy = p^(0)_xy a_xy                           for x ≠ y

    p_xx = p^(0)_xx + Σ_{y≠x} p^(0)_xy (1 − a_xy),

where of course we must have 0 ≤ a_xy ≤ 1 for all x, y. It is easy to see that P satisfies detailed balance for π if and only if

    a_xy / a_yx = (π_y p^(0)_yx) / (π_x p^(0)_xy)

for all pairs x ≠ y. But this is easily arranged: just set

    a_xy = F( (π_y p^(0)_yx) / (π_x p^(0)_xy) ),

where F: [0, +∞] → [0, 1] is any function satisfying

    F(z) / F(1/z) = z   for all z.

The choice suggested by Metropolis et al. is

    F(z) = min(z, 1);

this is the maximal function satisfying the functional equation above. Another choice sometimes used is

    F(z) = z / (1 + z).


Of course, it is still necessary to check that P is irreducible; this is usually done on a case-by-case basis.

Note that if the proposal matrix P^(0) happens already to satisfy detailed balance for π, then the ratio π_y p^(0)_yx / (π_x p^(0)_xy) equals 1, so that a_xy = 1 (if we use the Metropolis choice of F) and P = P^(0). On the other hand, no matter what P^(0) is, we obtain a matrix P that satisfies detailed balance for π. So the Metropolis-Hastings procedure can be thought of as a prescription for minimally modifying a given transition matrix P^(0) so that it satisfies detailed balance for π.

Many textbooks and articles describe the Metropolis-Hastings procedure only in the special case in which the proposal matrix P^(0) is symmetric, namely p^(0)_xy = p^(0)_yx. In this case the acceptance probability reduces to

    a_xy = F(π_y / π_x).

In statistical mechanics we have

    π_x = e^{−βE_x} / Σ_y e^{−βE_y} ≡ e^{−βE_x} / Z,

and hence

    π_y / π_x = e^{−β(E_y − E_x)}.

Note that the partition function Z has disappeared from this expression; this is crucial, as Z is almost never explicitly computable. Using the Metropolis acceptance probability F(z) = min(z, 1), we obtain the following rules for acceptance or rejection:

- If ΔE ≡ E_y − E_x ≤ 0, then we accept the proposal always (i.e. with probability 1).

- If ΔE > 0, then we accept the proposal with probability e^{−βΔE} < 1. That is, we choose a random number r uniformly distributed on [0, 1], and we accept the proposal if r < e^{−βΔE}.

But there is nothing special about P^(0) being symmetric: any proposal matrix P^(0) is perfectly legitimate, and the Metropolis-Hastings procedure is defined quite generally by the acceptance probability given above.

Let us emphasize once more that the Metropolis-Hastings procedure is a general technique: it produces an infinite family of different algorithms, depending on the choice of the proposal matrix P^(0). In the literature, the term "Metropolis algorithm" is often used to denote the algorithm resulting from some particular commonly used choice of P^(0), but it is important not to be misled!
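A minimal sketch of a single Metropolis-Hastings step in this general form (Python; the function and argument names are mine, and the user is assumed to supply the proposal sampler, the proposal probabilities and the unnormalized weights π):

import random

def metropolis_hastings_step(x, propose, prop_prob, weight, rng=random):
    """One step of the Metropolis-Hastings chain.

    propose(x)       -- draws a proposed state y from p0(x -> .)
    prop_prob(x, y)  -- returns p0(x -> y)
    weight(x)        -- returns the unnormalized weight pi_x (e.g. exp(-beta*E_x))
    """
    y = propose(x)
    # Hastings ratio  z = pi_y p0(y -> x) / (pi_x p0(x -> y)); Metropolis choice F(z) = min(z, 1)
    z = (weight(y) * prop_prob(y, x)) / (weight(x) * prop_prob(x, y))
    if rng.random() < min(z, 1.0):
        return y          # accept the proposal
    return x              # reject: "null transition"

For a symmetric proposal the two prop_prob factors cancel and z reduces to π_y/π_x, recovering the rules listed above.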

To see the Metropolis-Hastings procedure in action, let us consider a typical statistical-mechanical model, the Ising model. On each site i of some finite d-dimensional lattice we place a random variable σ_i taking the values ±1. The Hamiltonian is

    H(σ) = − Σ_{<ij>} σ_i σ_j,


where the sum runs over all nearest-neighbor pairs. The corresponding Gibbs measure is

    π(σ) = Z^{−1} exp[−βH(σ)],

where Z is a normalization constant (the partition function). Two different proposal matrices are in common use:

1. Single-spin-flip (Glauber) dynamics. Fix some site i. The proposal is to flip σ_i; hence

    p^(0)_i[{σ} → {σ'}] = 1 if σ'_i = −σ_i and σ'_j = σ_j for all j ≠ i,
                        = 0 otherwise.

Here P^(0)_i is symmetric, so the acceptance probability is

    a_i[{σ} → {σ'}] = min(e^{−βΔE}, 1),

where

    ΔE ≡ E({σ'}) − E({σ}) = 2 σ_i Σ_{j n.n. of i} σ_j.

So ΔE is easily computed by examining the status of σ_i and its neighbors.

This defines a transition matrix P_i in which only the spin at site i is touched. The full "single-spin-flip Metropolis algorithm" involves sweeping through the entire lattice in either a random or periodic fashion, i.e. either

    P = (1/V) Σ_i P_i   (random site updating)

or

    P = P_{i_1} P_{i_2} ... P_{i_V}   (sequential site updating)

(here V is the volume). In the former case, the transition matrix P satisfies detailed balance for π. In the latter case, P does not in general satisfy detailed balance for π, but it does satisfy stationarity for π, which is all that really matters. (A minimal implementation of this single-spin-flip algorithm is sketched below, after the description of the second proposal.)

It is easy to check that P is irreducible, except in the case of sequential site updating at β = +∞.

2. Pair-interchange (Kawasaki) dynamics. Fix a pair of sites i, j. The proposal is to interchange σ_i and σ_j; hence

    p^(0)_(ij)[{σ} → {σ'}] = 1 if σ'_i = σ_j, σ'_j = σ_i and σ'_k = σ_k for all k ≠ i, j,
                           = 0 otherwise.

[Footnote: Note Added 1996: This statement is wrong!! For the Metropolis acceptance probability F(z) = min(z, 1), one may construct examples of nonergodicity for sequential site updating at any β [refs]: the idea is to construct configurations for which every proposed update (working sequentially) has ΔE ≤ 0 and is thus accepted. Of course, the ergodicity can be restored by taking any F(z) < 1, such as the choice F(z) = z/(1 + z).]


The rest of the formulae are analogous to those for single-spin-flip dynamics. The overall algorithm is again constructed by a random or periodic sweep over a suitable set of pairs i, j, usually taken to be nearest-neighbor pairs. It should be noted that this algorithm is not irreducible, as it conserves the total magnetization M = Σ_i σ_i. But it is irreducible on subspaces of fixed M (except for sequential updating, with the same caveat as above).
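Returning to the single-spin-flip Metropolis algorithm of item 1, here is a compact sketch of one sequential sweep for the two-dimensional nearest-neighbor Ising model (Python/NumPy; the lattice size, the periodic boundary conditions and the helper names are illustrative choices of mine, not prescribed by the text).

import numpy as np

def metropolis_sweep(sigma, beta, rng):
    """One sequential single-spin-flip Metropolis sweep of an L x L Ising lattice
    with periodic boundary conditions.  sigma holds +/-1 spins and is modified in place."""
    L = sigma.shape[0]
    for i in range(L):
        for j in range(L):
            nn_sum = (sigma[(i + 1) % L, j] + sigma[(i - 1) % L, j] +
                      sigma[i, (j + 1) % L] + sigma[i, (j - 1) % L])
            dE = 2.0 * sigma[i, j] * nn_sum                    # energy change if the spin is flipped
            if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
                sigma[i, j] = -sigma[i, j]                     # accept the flip
    return sigma

rng = np.random.default_rng(4)
L, beta = 16, 0.4
sigma = rng.choice([-1, 1], size=(L, L))                       # "hot" start
for sweep in range(100):
    metropolis_sweep(sigma, beta, rng)
print("magnetization per spin:", sigma.mean())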

A very different approach to constructing transition matrices satisfying detailed balance for π is the heat-bath method. This is best illustrated in a specific example. Consider again the Ising model, and focus on a single site i. Then the conditional probability distribution of σ_i, given all the other spins {σ_j}_{j≠i}, is

    P(σ_i | {σ_j}_{j≠i}) = const({σ_j}_{j≠i}) × exp( β σ_i Σ_{j n.n. of i} σ_j ).

(Note that this conditional distribution is precisely that of a single Ising spin σ_i in an "effective magnetic field" produced by the fixed neighboring spins σ_j.) The heat-bath algorithm updates σ_i by choosing a new spin value σ'_i, independent of the old value σ_i, from the conditional distribution above; all the other spins {σ_j}_{j≠i} remain unchanged. As in the Metropolis algorithm, this operation is carried out over the whole lattice, either randomly or sequentially.
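For the Ising model, the conditional distribution above gives σ'_i = +1 with probability e^{βh_i}/(e^{βh_i} + e^{−βh_i}) = 1/(1 + e^{−2βh_i}), where h_i = Σ_{j n.n. of i} σ_j is the local field. A minimal heat-bath sweep, in the same illustrative conventions as the Metropolis sketch above (Python/NumPy, my own example code):

import numpy as np

def heatbath_sweep(sigma, beta, rng):
    """One sequential heat-bath sweep of an L x L Ising lattice (periodic boundary conditions)."""
    L = sigma.shape[0]
    for i in range(L):
        for j in range(L):
            h = (sigma[(i + 1) % L, j] + sigma[(i - 1) % L, j] +
                 sigma[i, (j + 1) % L] + sigma[i, (j - 1) % L])     # local field
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))            # P(sigma_ij = +1 | neighbors)
            sigma[i, j] = 1 if rng.random() < p_up else -1          # new value, independent of old
    return sigma

rng = np.random.default_rng(5)
sigma = rng.choice([-1, 1], size=(16, 16))
for sweep in range(100):
    heatbath_sweep(sigma, beta=0.4, rng=rng)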

Analogous algorithms can be developed for more complicated models, e.g. P(φ) models, σ-models and lattice gauge theories. In each case we focus on a single field variable (holding all the other variables fixed) and give it a new value, independent of the old value, chosen from the appropriate conditional distribution. Of course, the feasibility of this algorithm depends on our ability to construct an efficient subroutine for generating the required single-site (or single-link) random variables. But even if this algorithm is not always the most efficient one in practice, it serves as a clear standard of comparison, which is useful in the development of more sophisticated algorithms.

A more general version of the heat-bath idea is called partial resampling: here we focus on a set of variables, rather than only one, and the new values need not be independent of the old values. That is, we divide the variables of the system, call them {φ}, into two subsets, call them {ψ} and {χ}. For fixed values of the {χ} variables, π induces a conditional probability distribution of {ψ} given {χ}, call it P({ψ} | {χ}). Then any algorithm for updating {ψ} with {χ} fixed that leaves invariant all of the distributions P( · | {χ}) will also leave π invariant. One possibility is to use an independent resampling of {ψ}: we throw away the old values {ψ} and take {ψ'} to be a new random variable chosen from the probability distribution P( · | {χ}), independent of the old values. Independent resampling might also be called block heat-bath updating. On the other hand, if {ψ} is a large set of variables, independent resampling is probably unfeasible; but we are free to use any updating that leaves invariant the appropriate conditional distributions. Of course, in this generality "partial resampling" includes all dynamic Monte Carlo algorithms (we could just take {ψ} to be the entire system); but it is in many cases conceptually useful to focus on some subset of variables. The partial-resampling idea will be at the heart of the multi-grid Monte Carlo method and the embedding algorithms described in later sections.

[Footnote: The alert reader will note that, in the Ising case, the heat-bath algorithm is equivalent to the single-spin-flip Metropolis algorithm with the choice F(z) = z/(1 + z) of acceptance function. But this correspondence does not hold for more complicated models.]

We have now defined a rather large class of dynamic Monte Carlo algorithms: the single-spin-flip Metropolis algorithm, the single-site heat-bath algorithm, and so on. How well do these algorithms perform? Away from phase transitions they perform rather well. However, near a phase transition the autocorrelation time grows rapidly. In particular, near a critical point (second-order phase transition) the autocorrelation time typically diverges as

    τ ∼ min(L, ξ)^z,

where L is the linear size of the system, ξ is the correlation length of an infinite-volume system at the same temperature, and z is a dynamic critical exponent. This phenomenon is called critical slowing-down; it severely hampers the study of critical phenomena by Monte Carlo methods. Most of the remainder of these lectures will be devoted, therefore, to describing recent progress in inventing new Monte Carlo algorithms with radically reduced critical slowing-down.

We can now make a rough estimate of the computer time needed to study the Isingmodel near its critical point or quantum chromodynamics near the continuum limit�Each sweep of the lattice takes a time of order Ld where d is the spatial �or space �time�dimensionality of the model� And we need �� sweeps in order to get one �e�ectively

��Indeed� for the Gaussian model this random�walk picture can be made rigorous� see ��� combinedwith �� Section ���

��

Page 23: Monte Carlo Methods in Statistical Mechanics: Foundations and

independent� sample� So this means a computer time of order Ld�z �� �d�z �� For high precision statistics one might want ��� �independent� samples� The reader is invitedto plug in � � ��� d � � �or d � � if you�re an elementary particle physicist and getdepressed� It should be emphasized that the factor �d is inherent in all Monte Carloalgorithms for spin models and �eld theories �but not for self avoiding walks see Section�� The factor �z could however conceivably be reduced or eliminated by a more cleveralgorithm�
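As a back-of-the-envelope sketch of this estimate (Python; the values ξ = 100, z = 2 and the target of 10^6 samples are illustrative assumptions), the total work in elementary site updates is roughly ξ^{d+z} per independent sample times the number of samples wanted:

xi, z, n_samples = 100, 2, 10**6            # correlation length, dynamic exponent, samples wanted
for d in (2, 3, 4):
    cost = n_samples * xi ** (d + z)        # elementary site updates, up to constant factors
    print(f"d = {d}:  about {cost:.1e} site updates")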

What is to be done? Our knowledge of the physics of critical slowing-down tells us that the slow modes are the long-wavelength modes, if the updating is purely local. The natural solution is therefore to speed up those modes by some sort of collective-mode (nonlocal) updating. It is necessary, then, to identify physically the appropriate collective modes, and to devise an efficient computational algorithm for speeding up those modes. These two goals are unfortunately in conflict: it is very difficult to devise collective-mode algorithms that are not so nonlocal that their computational cost outweighs the reduction in critical slowing-down. Specific implementations of the collective-mode idea are thus highly model-dependent. At least three such algorithms have been invented so far:

- Fourier acceleration [ref]

- Multi-grid Monte Carlo (MGMC) [refs]

- The Swendsen-Wang algorithm [ref] and its generalizations

Fourier acceleration and MGMC are very similar in spirit (though quite different technically). Their performance is thus probably qualitatively similar, in the sense that they probably work well for the same models and work badly for the same models. In the next lecture we give an introduction to the MGMC method; in the following lecture we discuss algorithms of Swendsen-Wang type.

[Footnote: Clearly one must take L ≳ ξ in order to avoid severe finite-size effects. Typically one approaches the critical point with L a modest multiple of ξ, and then uses finite-size scaling [refs] to extrapolate to the infinite-volume limit. Note Added 1996: Recently, radical advances have been made in applying finite-size scaling to Monte Carlo simulations (see [refs]); the preceding two sentences can now be seen to be far too pessimistic. For reliable extrapolation to the infinite-volume limit, L and ξ must both be ≫ 1, but the ratio L/ξ can in some cases be quite small (depending on the model and on the quality of the data). However, reliable extrapolation requires careful attention to systematic errors arising from correction-to-scaling terms. In practice, reliable infinite-volume values (with both statistical and systematic errors of order a few percent) can be obtained from lattices very much smaller than the correlation length, at least in some models [refs].]


5 Multi-Grid Algorithms

The phenomenon of critical slowing-down is not confined to Monte Carlo simulations: very similar difficulties were encountered long ago by numerical analysts concerned with the numerical solution of partial differential equations. An ingenious solution, now called the multi-grid (MG) method, was proposed in 1964 by the Soviet numerical analyst Fedorenko [ref]: the idea is to consider, in addition to the original ("fine-grid") problem, a sequence of auxiliary "coarse-grid" problems that approximate the behavior of the original problem for excitations at successively longer length scales (a sort of "coarse-graining" procedure). The local updates of the traditional algorithms are then supplemented by coarse-grid updates. To a present-day physicist this philosophy is remarkably reminiscent of the renormalization group; so it is all the more remarkable that it was invented two years before the work of Kadanoff [ref] and seven years before the work of Wilson [ref]. After a decade of dormancy, multi-grid was revived in the mid-1970s [ref], and was shown to be an extremely efficient computational method. In the 1980s, multi-grid methods have become an active area of research in numerical analysis, and have been applied to a wide variety of problems in classical physics [refs]. Very recently [ref] it was shown how a stochastic generalization of the multi-grid method, multi-grid Monte Carlo (MGMC), can be applied to problems in statistical, and hence also Euclidean quantum, physics.

In this lecture we begin by giving a brief introduction to the deterministic multi-grid method; we then explain the stochastic analogue. But it is worth indicating now the basic idea behind this generalization. There is a strong analogy between solving lattice systems of equations (such as the discrete Laplace equation) and making Monte Carlo simulations of lattice random fields. Indeed, given a Hamiltonian H(φ), the deterministic problem is that of minimizing H(φ), while the stochastic problem is that of generating random samples from the Boltzmann-Gibbs probability distribution e^{−βH(φ)}. The statistical-mechanical problem reduces to the deterministic one in the zero-temperature limit β → +∞.

Many (but not all) of the deterministic iterative algorithms for minimizing H(φ) can be generalized to stochastic iterative algorithms, that is, dynamic Monte Carlo methods, for generating random samples from e^{−βH(φ)}. For example, the Gauss-Seidel algorithm for minimizing H and the heat-bath algorithm for random sampling from e^{−βH} are very closely related. Both algorithms sweep successively through the lattice, working on one site x at a time. The Gauss-Seidel algorithm updates φ_x so as to minimize the Hamiltonian H(φ_x, {φ_y}_{y≠x}) when all the other fields {φ_y}_{y≠x} are held fixed at their current values. The heat-bath algorithm gives φ_x a new random value chosen from the probability distribution exp[−βH(φ_x, {φ_y}_{y≠x})], with all the fields {φ_y}_{y≠x} again held fixed. As β → +∞, the heat-bath algorithm approaches Gauss-Seidel. A similar correspondence holds between MG and MGMC.

[Footnote: For an excellent introduction to the deterministic multi-grid method, see Briggs [ref]; more advanced topics are covered in the book of Hackbusch [ref]. Both MG and MGMC are discussed in detail in [ref].]
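For a Gaussian (free-field) Hamiltonian the correspondence between Gauss-Seidel and heat bath is explicit: with the neighbors held fixed, H is quadratic in φ_x, so Gauss-Seidel replaces φ_x by the conditional minimizer, while heat bath draws φ_x from a Gaussian centered at that same minimizer. The following one-dimensional sketch (Python/NumPy; the lattice, the boundary conditions and the parameters are illustrative assumptions of mine) implements both updates for H(φ) = (1/2) Σ (φ_x − φ_{x+1})².

import numpy as np

def sweep(phi, beta, rng, stochastic=True):
    """One sequential sweep of a 1-d free field with fixed ends phi[0] = phi[-1] = 0.
    stochastic=False: Gauss-Seidel (set phi_x to the conditional minimizer).
    stochastic=True : heat bath (draw phi_x from the conditional Gaussian)."""
    for x in range(1, len(phi) - 1):
        mean = 0.5 * (phi[x - 1] + phi[x + 1])              # conditional minimizer of H
        if stochastic:
            phi[x] = rng.normal(mean, np.sqrt(1.0 / (2.0 * beta)))
        else:
            phi[x] = mean
    return phi

rng = np.random.default_rng(6)
phi = rng.normal(size=65); phi[0] = phi[-1] = 0.0
for _ in range(1000):
    sweep(phi, beta=1.0, rng=rng, stochastic=True)   # as beta -> infinity this approaches Gauss-Seidel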


Before entering into details, let us emphasize that although the multi-grid method and the block-spin renormalization group (RG) are based on very similar philosophies, dealing with a single length scale at a time, they are in fact very different. In particular, the conditional coarse-grid Hamiltonian employed in the MGMC method is not the same as the renormalized Hamiltonian given by a block-spin RG transformation. The RG transformation computes the marginal, not the conditional, distribution of the block means: that is, it integrates over the complementary degrees of freedom, while the MGMC method fixes these degrees of freedom at their current (random) values. The conditional Hamiltonian employed in MGMC is given by an explicit finite expression, while the marginal (RG) Hamiltonian cannot be computed in closed form. The failure to appreciate these distinctions has unfortunately led to much confusion in the literature.
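To make the distinction concrete, here is a minimal two-site illustration (my own elementary Gaussian example, not taken from the text). Take a single block containing sites 1 and 2, with

  H(φ₁, φ₂) = ½(φ₁ − φ₂)² + ½ m² (φ₁² + φ₂²) ,

and let the coarse-grid move shift both sites by the same amount ψ (piecewise-constant interpolation). The conditional coarse-grid Hamiltonian used by MGMC is simply

  H_cond(ψ) = H(φ₁ + ψ, φ₂ + ψ) = m² ψ² + m² (φ₁ + φ₂) ψ + const ,

an explicit quadratic in ψ whose coefficients depend on the current (random) fine-field values φ₁, φ₂. By contrast, the marginal (RG) Hamiltonian of the block mean Φ = ½(φ₁ + φ₂) is obtained by integrating out the difference mode φ₁ − φ₂; it does not refer to the current configuration at all, and for non-Gaussian models it cannot be written in closed form.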

Deterministic Multi-Grid

In this section we give a pedagogical introduction to multi-grid methods in the simplest case, namely the solution of deterministic linear or nonlinear systems of equations.

Consider, for purposes of exposition, the lattice Poisson equation −Δφ = f in a region Ω ⊂ Z^d with zero Dirichlet data. Thus the equation is

  (−Δφ)_x ≡ 2d φ_x − Σ_{x': |x−x'|=1} φ_{x'} = f_x

for x ∈ Ω, with φ_x ≡ 0 for x ∉ Ω. This problem is equivalent to minimizing the quadratic Hamiltonian

  H(φ) = ½ Σ_{⟨xy⟩} (φ_x − φ_y)² − Σ_x f_x φ_x .

More generally, we may wish to solve a linear system

  Aφ = f ,

where for simplicity we shall assume A to be symmetric and positive-definite. This problem is equivalent to minimizing

  H(φ) = ½ ⟨φ, Aφ⟩ − ⟨f, φ⟩ .

Later we shall consider also non-quadratic Hamiltonians. Our goal is to devise a rapidly convergent iterative method for solving numerically the linear system Aφ = f. We shall restrict attention to first-order stationary linear iterations of the general form

  φ^(n+1) = M φ^(n) + N f ,

For further discussion of the distinction between conditional and marginal coarse-grid Hamiltonians, see the references.


where φ^(0) is an arbitrary initial guess for the solution. Obviously we must demand, at the very least, that the true solution φ = A⁻¹f be a fixed point of this iteration; imposing this condition for all f, we conclude that

  N = (I − M) A⁻¹ .

The iteration is thus completely specified by its iteration matrix M. Moreover, the two preceding relations imply that the error e^(n) ≡ φ^(n) − φ satisfies

  e^(n+1) = M e^(n) .

That is, the iteration matrix is the amplification matrix for the error. It follows easily that the iteration is convergent for all initial vectors φ^(0) if and only if the spectral radius ρ(M) ≡ lim_{n→∞} ||M^n||^{1/n} is < 1; and in this case the convergence is exponential with asymptotic rate at least ρ(M), i.e.

  ||φ^(n) − φ|| ≤ K n^p ρ(M)^n

for some K, p ≥ 0 (K depends on φ^(0)).

Now let us return to the lattice Poisson equation. One simple iterative algorithm arises by solving that equation repeatedly for φ_x:

  φ^(n+1)_x = (1/2d) [ Σ_{x': |x−x'|=1} φ^(n)_{x'} + f_x ] .

This is called the Jacobi iteration. It is convenient to consider also a slight generalization: let 0 < ω ≤ 1 and define

  φ^(n+1)_x = (1 − ω) φ^(n)_x + ω (1/2d) [ Σ_{x': |x−x'|=1} φ^(n)_{x'} + f_x ] .

This is called the damped Jacobi iteration with damping parameter ω; for ω = 1 it reduces to the ordinary Jacobi iteration.
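As a concrete illustration (mine, not part of the original lecture), here is a minimal sketch of the damped Jacobi iteration for the two-dimensional lattice Poisson problem with zero Dirichlet data; the grid size, damping parameter and source are whatever the user supplies.

  import numpy as np

  def damped_jacobi(f, n_iter, omega=0.5):
      """Damped Jacobi sweeps for -Laplace(phi) = f on an L x L grid,
      with phi = 0 outside the domain (zero Dirichlet data)."""
      L = f.shape[0]
      phi = np.zeros((L + 2, L + 2))          # padded array; the border stays 0
      for _ in range(n_iter):
          # sum of the four nearest neighbours of each interior site
          nbr = (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                 phi[1:-1, :-2] + phi[1:-1, 2:])
          phi[1:-1, 1:-1] = ((1 - omega) * phi[1:-1, 1:-1]
                             + omega * (nbr + f) / 4.0)   # 1/(2d) with d = 2
      return phi[1:-1, 1:-1]

  # usage sketch: phi = damped_jacobi(np.ones((64, 64)), n_iter=1000, omega=0.5)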

It can be shown that the spectral radius ρ(M_{DJ,ω}) of the damped Jacobi iteration matrix is less than 1, so that the iteration converges exponentially to the solution φ. This would appear to be a happy situation. Unfortunately, however, the convergence factor ρ(M_{DJ,ω}) is usually very close to 1, so that many iterations are required in order to reduce the error ||φ^(n) − φ|| to a small fraction of its initial value. Insight into this phenomenon can be gained by considering the simple model problem in which the domain Ω is a square {1, ..., L} × {1, ..., L}. In this case we can solve exactly for the eigenvectors and eigenvalues of M_{DJ,ω}: they are

  φ^(p)(x) = sin(p₁x₁) sin(p₂x₂)

  λ_p = 1 − ω + (ω/2)(cos p₁ + cos p₂) ,


where p₁, p₂ ∈ {π/(L+1), 2π/(L+1), ..., Lπ/(L+1)}. The spectral radius of M_{DJ,ω} is the eigenvalue of largest magnitude, namely

  ρ(M_{DJ,ω}) = λ_{(π/(L+1), π/(L+1))} = 1 − ω [1 − cos(π/(L+1))] = 1 − O(L⁻²) ,

since cos(π/(L+1)) = 1 − π²/[2(L+1)²] + ... . It follows that O(L²) iterations are needed for the damped Jacobi iteration to converge adequately. This represents an enormous computational labor when L is large.

It is easy to see what is going on here: the slow modes φ^(p) are the long-wavelength modes p₁, p₂ ≈ 0. (If ω ≈ 1, then some modes with wavenumber p ≡ (p₁, p₂) near (π, π) have eigenvalue λ_p ≈ −1 and so also are slow. This phenomenon can be avoided by taking ω significantly less than 1; for simplicity we shall henceforth take ω = 1/2, which makes λ_p ≥ 0 for all p.) It is also easy to see physically why the long-wavelength modes are slow. The key fact is that the (damped) Jacobi iteration is local: in a single step of the algorithm, "information" is transmitted only to nearest neighbors. One might guess that this "information" executes a random walk around the lattice; and for the true solution to be reached, "information" must propagate from the boundaries to the interior (and back and forth) until "equilibrium" is attained. This takes a time of order L², in agreement with the spectral-radius estimate above. In fact, this random-walk picture can be made rigorous.

This is an example of a critical phenomenon, in precisely the same sense that the term is used in statistical mechanics. The Laplace operator A = −Δ is critical, inasmuch as its Green function A⁻¹ has long-range correlations (power-law decay in dimension d > 2, or growth in d ≤ 2). This means that the solution of Poisson's equation in one region of the lattice depends strongly on the solution in distant regions of the lattice; "information" must propagate globally in order for "equilibrium" to be reached. Put another way, excitations at many length scales are significant, from one lattice spacing at the smallest to the entire lattice at the largest. The situation would be very different if we were to consider instead the Helmholtz-Yukawa equation (−Δ + m²)φ = f with m > 0: its Green function has exponential decay with characteristic length m⁻¹, so that regions of the lattice separated by distances ≫ m⁻¹ are essentially decoupled. In this case, "information" need only propagate a distance of order min(m⁻¹, L) in order for "equilibrium" to be reached. This takes a time of order min(m⁻², L²), an estimate which can be confirmed rigorously by computing the obvious generalization of the eigenvalue formula above. On the other hand, as m → 0 we recover the Laplace operator, with its attendant difficulties: m = 0 is a critical point! We have here an example of critical slowing-down in classical physics.

The general structure of a remedy should now be obvious to physicists reared on the renormalization group: don't try to deal with all length scales at once, but define instead a sequence of problems in which each length scale, beginning with the smallest and working towards the largest, can be dealt with separately. An algorithm of precisely this form was proposed in 1964 by the Soviet numerical analyst Fedorenko, and is now called the multi-grid method.


Note first that the only slow modes in the damped Jacobi iteration are the long-wavelength modes (provided that ω is not near 1): as long as, say, max(p₁, p₂) ≥ π/2, we have 0 ≤ λ_p ≤ 3/4 for ω = 1/2, independent of L. It follows that the short-wavelength components of the error e^(n) ≡ φ^(n) − φ can be effectively killed by a few (say, five or ten) damped Jacobi iterations. The remaining error has primarily long-wavelength components, and so is slowly varying in x-space. But a slowly varying function can be well represented on a coarser grid: if, for example, we were told e^(n)_x only at even values of x, we could nevertheless reconstruct with high accuracy the function e^(n)_x at all x by, say, linear interpolation. This suggests an improved algorithm for solving the Poisson problem: perform a few damped Jacobi iterations on the original grid, until the (unknown) error is smooth in x-space; then set up an auxiliary coarse-grid problem whose solution will be approximately this error (this problem will turn out to be a Poisson equation on the coarser grid); perform a few damped Jacobi iterations on the coarser grid; and then transfer (interpolate) the result back to the original (fine) grid and add it in to the current approximate solution.

There are two advantages to performing the damped Jacobi iterations on the coarse grid. Firstly, the iterations take less work, because there are fewer lattice points on the coarse grid (2⁻ᵈ times as many, for a factor-of-2 coarsening in d dimensions). Secondly, with respect to the coarse grid, the long-wavelength modes no longer have such long wavelength: their wavelength has been halved (i.e. their wavenumber has been doubled). This suggests that those modes with, say, max(p₁, p₂) ≥ π/4 can be effectively killed by a few damped Jacobi iterations on the coarse grid. And then we can transfer the remaining (smooth) error to a yet coarser grid, and so on recursively. These are the essential ideas of the multi-grid method.

Let us now give a precise definition of the multi-grid algorithm. For simplicity we shall restrict attention to problems defined in variational form (see the footnote at the end of this list); thus the goal is to minimize a real-valued function ("Hamiltonian") H(φ), where φ runs over some N-dimensional real vector space U. We shall treat quadratic and non-quadratic Hamiltonians on an equal footing. In order to specify the algorithm, we must specify the following ingredients:

• A sequence of coarse-grid spaces U_M ≡ U, U_{M−1}, U_{M−2}, ..., U_0. Here dim U_l ≡ N_l and N ≡ N_M > N_{M−1} > N_{M−2} > ... > N_0.

• Prolongation (or "interpolation") operators p_{l,l−1}: U_{l−1} → U_l for 1 ≤ l ≤ M.

• Basic (or "smoothing") iterations S_l: U_l × H_l → U_l for 0 ≤ l ≤ M. Here H_l is a space of "possible Hamiltonians" defined on U_l; we discuss this in more detail below. The role of S_l is to take an approximate minimizer φ_l of the Hamiltonian H_l and compute a new (hopefully better) approximate minimizer φ'_l = S_l(φ_l, H_l). (For the present, we can imagine that S_l consists of a few iterations of damped Jacobi for the Hamiltonian H_l.) Most generally we shall use two smoothing iterations, S_l^pre and S_l^post; they may be the same, but need not be.

• Cycle control parameters (integers γ_l ≥ 1 for 1 ≤ l ≤ M), which control the number of times that the coarse grids are visited.

In fact, the multi-grid method can be applied to the solution of linear or nonlinear systems of equations, whether or not these equations come from a variational principle; see, for example, the multi-grid references cited earlier.

We discuss these ingredients in more detail below. The multi-grid algorithm is then defined recursively as follows:

procedure mgm(l, φ, H_l)
  comment This algorithm takes an approximate minimizer φ of the Hamiltonian H_l and overwrites it with a better approximate minimizer.
  φ ← S_l^pre(φ, H_l)
  if l > 0 then
    compute H_{l−1}( · ) ≡ H_l(φ + p_{l,l−1} · )
    ψ ← 0
    for j = 1 until γ_l do mgm(l−1, ψ, H_{l−1})
    φ ← φ + p_{l,l−1} ψ
  endif
  φ ← S_l^post(φ, H_l)
end

Here is what is going on. We wish to minimize the Hamiltonian H_l, and are given as input an approximate minimizer φ. The algorithm consists of three steps:

• Pre-smoothing. We apply the basic iteration (e.g. a few sweeps of damped Jacobi) to the given approximate minimizer. This produces a better approximate minimizer, in which the high-frequency (short-wavelength) components of the error have been reduced significantly. Therefore the error, although still large, is smooth in x-space (whence the name "smoothing iteration").

• Coarse-grid correction. We want to move rapidly towards the minimizer φ* of H_l, using coarse-grid updates. Because of the pre-smoothing, the error φ − φ* is a smooth function in x-space, so it should be well approximated by fields in the range of the prolongation operator p_{l,l−1}. We will therefore carry out a coarse-grid update, in which φ is replaced by φ + p_{l,l−1}ψ, where ψ lies in the coarse-grid subspace U_{l−1}. A sensible goal is to attempt to choose ψ so as to minimize H_l; that is, we attempt to minimize

  H_{l−1}(ψ) ≡ H_l(φ + p_{l,l−1} ψ) .

To carry out this approximate minimization we use a few (γ_l) iterations of the best algorithm we know, namely multi-grid itself! And we start at the best approximate minimizer we know, namely ψ = 0. The goal of this coarse-grid correction step is to reduce significantly the low-frequency components of the error in φ (hopefully without creating large new high-frequency error components).

• Post-smoothing. We apply, for good measure, a few more sweeps of the basic smoother. (This would protect against any high-frequency error components which may inadvertently have been created by the coarse-grid correction step.)

The foregoing constitutes, of course, a single step of the multi-grid algorithm. In practice this step would be repeated several times, as in any other iteration, until the error has been reduced to an acceptably small value. The advantage of multi-grid over the traditional (e.g. damped Jacobi) iterative methods is that, with a suitable choice of the ingredients p_{l,l−1}, S_l and so on, only a few (maybe five or ten) iterations are needed to reduce the error to a small value, independent of the lattice size L. This contrasts favorably with the behavior of the damped Jacobi method, in which O(L²) iterations are needed.
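For readers who like to see the recursion spelled out, here is a compact Python sketch (mine, not the authors' code) of the algorithm for the quadratic case, with the operators stored as explicit matrices purely for clarity; in a real code one would of course exploit the sparsity of A and of the prolongations. The names mgm, gamma, m1 and m2 mirror the pseudocode above.

  import numpy as np

  def gauss_seidel(A, f, phi, sweeps=2):
      """A few Gauss-Seidel sweeps for H(phi) = (1/2)<phi,A phi> - <f,phi>,
      i.e. for the linear system A phi = f."""
      n = len(f)
      for _ in range(sweeps):
          for i in range(n):
              phi[i] = (f[i] - A[i, :i] @ phi[:i] - A[i, i+1:] @ phi[i+1:]) / A[i, i]
      return phi

  def mgm(A, f, phi, prolongations, level, gamma=2, m1=1, m2=1):
      """One recursive multi-grid cycle (V-cycle for gamma=1, W-cycle for gamma=2).
      prolongations[l] maps grid l-1 into grid l; level M is the finest grid.
      The coarse-grid Hamiltonian is the variational (Galerkin) one, A' = p^T A p."""
      phi = gauss_seidel(A, f, phi, sweeps=m1)       # pre-smoothing
      if level > 0:
          p   = prolongations[level]
          A_c = p.T @ A @ p                          # coarse-grid operator
          d   = p.T @ (f - A @ phi)                  # coarse-graining of the residual
          psi = np.zeros(A_c.shape[0])
          for _ in range(gamma):                     # coarse-grid correction
              psi = mgm(A_c, d, psi, prolongations, level - 1, gamma, m1, m2)
          phi = phi + p @ psi
      phi = gauss_seidel(A, f, phi, sweeps=m2)       # post-smoothing
      return phi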

The multi-grid algorithm is thus a general framework; the user has considerable freedom in choosing the specific ingredients, which must be adapted to the specific problem. We now discuss briefly each of these ingredients; more details can be found in the book of Hackbusch.

Coarse grids. Most commonly one uses a uniform factor-of-2 coarsening between each grid Ω_l and the next coarser grid Ω_{l−1}. The coarse-grid points could be either a subset of the fine-grid points (Fig. 1) or a subset of the dual lattice (Fig. 2). These schemes have obvious generalizations to higher-dimensional cubic lattices. In dimension d = 2, another possibility is a uniform factor-of-√2 coarsening (Fig. 3); note that the coarse grid is again a square lattice, rotated by 45°. Figs. 1-3 are often referred to as "standard coarsening", "staggered coarsening" and "red-black (or checkerboard) coarsening", respectively. Coarsenings by a larger factor could also be considered, but are generally disadvantageous. Note that each of the above schemes works also for periodic boundary conditions, provided that the linear size L_l of the grid Ω_l is even. For this reason it is most convenient to take the linear size L ≡ L_M of the original (finest) grid Ω ≡ Ω_M to be a power of 2, or at least a power of 2 times a small integer. Other definitions of coarse grids (e.g. anisotropic coarsening) are occasionally appropriate.

Prolongation operators. For a coarse grid as in Fig. 2, a natural choice of prolongation operator is piecewise-constant injection,

  (p_{l,l−1} ψ_{l−1})(x) = ψ_{l−1}(y)   for all fine-grid sites x in the block B_y, y ∈ Ω_{l−1}

(illustrated here for d = 2): each coarse-grid value is simply copied to all the fine-grid sites of the corresponding 2×2 block. It can be represented, in an obvious shorthand notation, by the stencil

  [ 1 1 ]
  [ 1 1 ]

For a coarse grid as in Fig. 1, a natural choice is piecewise-linear interpolation, one

Figure 1: Standard coarsening (factor of 2) in dimension d = 2. Dots are fine-grid sites and crosses are coarse-grid sites.

Figure 2: Staggered coarsening (factor of 2) in dimension d = 2.

Figure 3: Red-black coarsening (factor of √2) in dimension d = 2.

example of which is the nine-point prolongation with stencil

  [ 1/4  1/2  1/4 ]
  [ 1/2   1   1/2 ]
  [ 1/4  1/2  1/4 ]

Higher-order interpolations (e.g. quadratic or cubic) can also be considered. All these prolongation operators can easily be generalized to higher dimensions.

We have ignored here some important subtleties concerning the treatment of the boundaries in defining the prolongation operator. Fortunately we shall not have to worry much about this problem, since most applications in statistical mechanics and quantum field theory use periodic boundary conditions.

Coarse-grid Hamiltonians. What does the coarse-grid Hamiltonian H_{l−1} look like? If the fine-grid Hamiltonian H_l is quadratic,

  H_l(φ) = ½ ⟨φ, A_l φ⟩ − ⟨f_l, φ⟩ ,

then so is the coarse-grid Hamiltonian H_{l−1}:

  H_{l−1}(ψ) ≡ H_l(φ + p_{l,l−1} ψ) = ½ ⟨ψ, A_{l−1} ψ⟩ − ⟨d, ψ⟩ + const ,

where

  A_{l−1} = p*_{l,l−1} A_l p_{l,l−1}
  d = p*_{l,l−1} (f_l − A_l φ) .

The coarse-grid problem is thus also a linear equation, whose right-hand side is just the "coarse-graining" of the residual r ≡ f_l − A_l φ; this coarse-graining is performed using the adjoint of the interpolation operator p_{l,l−1}. The exact form of the coarse-grid operator A_{l−1} depends on the fine-grid operator A_l and on the choice of interpolation operator p_{l,l−1}. For example, if A_l is the nearest-neighbor Laplacian and p_{l,l−1} is piecewise-constant injection, then it is easily checked that A_{l−1} is also a nearest-neighbor Laplacian (multiplied by an extra factor 2^(d−1)). On the other hand, if p_{l,l−1} is piecewise-linear interpolation, then A_{l−1} will have nearest-neighbor and next-nearest-neighbor terms (but nothing worse than that).

Clearly the point is to choose classes of Hamiltonians H_l with the property that if H_l belongs to the class and φ ∈ U_l, then the coarse-grid Hamiltonian H_{l−1} defined by H_{l−1}(ψ) ≡ H_l(φ + p_{l,l−1}ψ) necessarily belongs to the class on the coarser grid. In particular, it is convenient (though not in principle necessary) to choose all the Hamiltonians to have the same "functional form"; this functional form must be one which is stable under the coarsening operation. For example, suppose that the Hamiltonian H_l is a φ⁴ theory with nearest-neighbor gradient term and possibly site-dependent coefficients:

  H_l(φ) = (α/2) Σ_{⟨xx'⟩} (φ_x − φ_{x'})² + Σ_x V_x(φ_x) ,

where the first sum runs over unordered nearest-neighbor pairs and

  V_x(φ_x) = λ φ_x⁴ + γ_x φ_x³ + A_x φ_x² + h_x φ_x .

Suppose further that the prolongation operator p_{l,l−1} is piecewise-constant injection. Then the coarse-grid Hamiltonian H_{l−1}(ψ) ≡ H_l(φ + p_{l,l−1}ψ) can easily be computed: it is

  H_{l−1}(ψ) = (α'/2) Σ_{⟨yy'⟩} (ψ_y − ψ_{y'})² + Σ_y V'_y(ψ_y) + const ,

where

  V'_y(ψ_y) = λ' ψ_y⁴ + γ'_y ψ_y³ + A'_y ψ_y² + h'_y ψ_y

and

  α' = 2^(d−1) α
  λ' = 2^d λ
  γ'_y = Σ_{x∈B_y} (4λ φ_x + γ_x)
  A'_y = Σ_{x∈B_y} (6λ φ_x² + 3γ_x φ_x + A_x)
  h'_y = Σ_{x∈B_y} (4λ φ_x³ + 3γ_x φ_x² + 2A_x φ_x + h_x) + α Σ_{x∈B_y} Σ_{x'∉B_y: |x−x'|=1} (φ_x − φ_{x'}) ,

the last sum in h'_y coming from the gradient cross-terms on bonds that cross the block boundary (it is needed so that H_{l−1}(ψ) equals H_l(φ + p_{l,l−1}ψ) exactly, up to a constant). Here B_y is the block consisting of those 2^d sites of grid Ω_l which are affected by interpolation from the coarse-grid site y ∈ Ω_{l−1} (see Figure 2). Note that the coarse-grid Hamiltonian H_{l−1} has the same functional form as the ("fine-grid") Hamiltonian H_l: it is specified by the coefficients α', λ', γ'_y, A'_y and h'_y. The step "compute H_{l−1}" therefore means to compute these coefficients. Note also the importance of allowing, in the fine-grid form, for φ³ and φ¹ terms and for site-dependent coefficients: even if these are not present in the original Hamiltonian H ≡ H_M, they will be generated on coarser grids. Finally, we emphasize that the coarse-grid Hamiltonian H_{l−1} depends implicitly on the current value of the fine-lattice field φ ∈ U_l; although our notation suppresses this dependence, it should be kept in mind.
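As an illustration of the step "compute H_{l−1}", here is a small Python sketch (mine, under the conventions adopted above) that computes the coarse-grid coefficients for a d = 2 φ⁴ theory with piecewise-constant injection and 2×2 blocks; lam, gam, A, h are the fine-grid coefficients and phi is the current fine-grid field.

  import numpy as np

  def coarse_phi4_coefficients(phi, lam, gam, A, h, alpha):
      """Coefficients of the conditional coarse-grid Hamiltonian
      H_{l-1}(psi) = H_l(phi + p psi) for piecewise-constant injection,
      d = 2, blocks of 2 x 2 sites, periodic boundary conditions."""
      d = 2

      def block_sum(a):                 # sum over each 2x2 block -> coarse array
          return a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]

      alpha_c = 2 ** (d - 1) * alpha
      lam_c   = 2 ** d * lam            # lam assumed site-independent here
      gam_c   = block_sum(4 * lam * phi + gam)
      A_c     = block_sum(6 * lam * phi**2 + 3 * gam * phi + A)

      # gradient cross-terms: for each fine site, sum of (phi_x - phi_x') over
      # the neighbours x' that lie outside the block of x
      grad = np.zeros_like(phi)
      for axis in (0, 1):
          up   = np.roll(phi, -1, axis=axis)    # neighbour in the + direction
          down = np.roll(phi, +1, axis=axis)    # neighbour in the - direction
          hi = np.zeros_like(phi, dtype=bool)
          lo = np.zeros_like(phi, dtype=bool)
          idx = [slice(None)] * 2
          idx[axis] = slice(1, None, 2); hi[tuple(idx)] = True   # high edge of block
          idx[axis] = slice(0, None, 2); lo[tuple(idx)] = True   # low edge of block
          grad += hi * (phi - up) + lo * (phi - down)

      h_c = block_sum(4 * lam * phi**3 + 3 * gam * phi**2 + 2 * A * phi + h
                      + alpha * grad)
      return alpha_c, lam_c, gam_c, A_c, h_c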

Basic ("smoothing") iterations. We have already discussed the damped Jacobi iteration as one possible smoother. Note that in this method only the "old" values φ^(n) are used on the right-hand side, even though for some of the terms the "new" value φ^(n+1) may already have been computed. An alternative algorithm is to use at each stage, on the right-hand side, the "newest" available value. This algorithm is called the Gauss-Seidel iteration. (It is amusing to note that "Gauss did not use a cyclic order of relaxation, and ... Seidel specifically recommended against using it.") Note that the Gauss-Seidel algorithm, unlike the Jacobi algorithm, depends on the ordering of the grid points. For example, if a 2-dimensional grid is swept in lexicographic order (1,1), (1,2), ..., (1,L), (2,1), (2,2), ..., (2,L), ..., (L,1), (L,2), ..., (L,L), then the Gauss-Seidel iteration becomes

  φ^(n+1)_{x₁,x₂} = ¼ [ φ^(n+1)_{x₁−1,x₂} + φ^(n)_{x₁+1,x₂} + φ^(n+1)_{x₁,x₂−1} + φ^(n)_{x₁,x₂+1} + f_{x₁,x₂} ] .

Another convenient ordering is the red-black (or checkerboard) ordering, in which the "red" sublattice Ω_r ≡ {x ∈ Ω: x₁ + ... + x_d is even} is swept first, followed by the "black" sublattice Ω_b ≡ {x ∈ Ω: x₁ + ... + x_d is odd}. Note that the ordering of the grid points within each sublattice is irrelevant (for the usual nearest-neighbor Laplacian), since the matrix A does not couple sites of the same color. This means that red-black Gauss-Seidel is particularly well suited to vector or parallel computation. Note that the red-black ordering makes sense with periodic boundary conditions only if the linear size L_l of the grid Ω_l is even.
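Because the two sublattices decouple, a red-black Gauss-Seidel sweep can be written with whole-array operations; here is a minimal vectorized sketch (mine, not the authors' code) for the 2-d Poisson problem with periodic boundary conditions.

  import numpy as np

  def red_black_gauss_seidel_sweep(phi, f):
      """One red-black Gauss-Seidel sweep for -Laplace(phi) = f on an L x L
      periodic lattice (L even). Sites of one color see only the other color,
      so each half-sweep is a single array update."""
      L = phi.shape[0]
      x, y = np.indices((L, L))
      for color in (0, 1):                      # 0 = "red", 1 = "black"
          mask = ((x + y) % 2 == color)
          nbr = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
                 np.roll(phi, 1, 1) + np.roll(phi, -1, 1))
          phi[mask] = ((nbr + f) / 4.0)[mask]
      return phi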

It turns out that Gauss-Seidel is a better smoother than damped Jacobi (even if the latter is given its optimal ω). Moreover, Gauss-Seidel is easier to program and requires only half the storage space (no need for separate storage of "old" and "new" values). The only reason we introduced damped Jacobi at all is that it is easier to understand and to analyze.

Many other smoothing iterations can be considered, and can be advantageous in anisotropic or otherwise singular problems; see the book of Hackbusch. But we shall stick to ordinary Gauss-Seidel, usually with red-black ordering.

Thus S_l^pre and S_l^post will consist, respectively, of m₁ and m₂ iterations of the Gauss-Seidel algorithm. The balance between pre-smoothing and post-smoothing is usually not very crucial; only the total m₁ + m₂ seems to matter much. Indeed, one (but not both!) of m₁ or m₂ could be zero, i.e. either the pre-smoothing or the post-smoothing could be omitted entirely.


Increasing m₁ and m₂ improves the convergence rate of the multi-grid iteration, but at the expense of increased computational labor per iteration. The optimal tradeoff seems to be achieved in most cases with a rather small total m₁ + m₂ (a few sweeps). The coarsest grid Ω₀ is a special case: it usually has so few grid points (perhaps only one!) that S₀ can be an exact solver.

The variational point of view gives special insight into the Gauss-Seidel algorithm, and shows how to generalize it to non-quadratic Hamiltonians. When updating site x, the new value φ'_x is chosen so as to minimize the Hamiltonian H(φ) ≡ H(φ_x, {φ_y}_{y≠x}) when all the other fields {φ_y}_{y≠x} are held fixed at their current values. The natural generalization of Gauss-Seidel to non-quadratic Hamiltonians is to adopt this variational definition: φ'_x should be the absolute minimizer of H(φ_x, {φ_y}_{y≠x}). (If the absolute minimizer is non-unique, then one such minimizer is chosen by some arbitrary rule.) This algorithm is called nonlinear Gauss-Seidel with exact minimization (NLGSEM). This definition of the algorithm presupposes, of course, that it is feasible to carry out the requisite exact one-dimensional minimizations. For example, for a φ⁴ theory it would be necessary to compute the absolute minimum of a quartic polynomial in one variable. In practice these one-dimensional minimizations might themselves be carried out iteratively, e.g. by some variant of Newton's method.

Cycle control parameters and computational labor. Usually the parameters γ_l are all taken to be equal, i.e. γ_l = γ ≥ 1 for 1 ≤ l ≤ M. Then one iteration of the multi-grid algorithm at level M comprises one visit to grid M, γ visits to grid M−1, γ² visits to grid M−2, and so on. Thus γ determines the degree of emphasis placed on the coarse-grid updates. (γ = 0 would correspond to the pure Gauss-Seidel iteration on the finest grid alone.)

We can now estimate the computational labor required for one iteration of the multi-grid algorithm. Each visit to a given grid involves m₁ + m₂ Gauss-Seidel sweeps on that grid, plus some computation of the coarse-grid Hamiltonian and the prolongation; the work involved is proportional to the number of lattice points on that grid. Let W_l be the work required for these operations on grid l. Then, for grids defined by a factor-of-2 coarsening in d dimensions, we have

  W_l ≈ 2^(−d(M−l)) W_M ,

so that the total work for one multi-grid iteration is

  work(MG) ≈ Σ_{l=0}^{M} γ^(M−l) W_l ≈ W_M Σ_{l=0}^{M} (γ 2^(−d))^(M−l) ≤ W_M (1 − γ 2^(−d))⁻¹   if γ < 2^d .

Thus, provided that γ < 2^d, the work required for one entire multi-grid iteration is no more than (1 − γ 2^(−d))⁻¹ times the work required for m₁ + m₂ Gauss-Seidel iterations (plus a little auxiliary computation) on the finest grid alone, irrespective of the total number of levels. The most common choices are γ = 1 (which is called the V-cycle) and γ = 2 (the W-cycle).
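For instance, in d = 2 with a W-cycle (γ = 2) the bound evaluates to (1 − 2/4)⁻¹ = 2, and with a V-cycle (γ = 1) to (1 − 1/4)⁻¹ = 4/3: the entire hierarchy of coarse grids costs at most a modest constant factor beyond the work on the finest grid alone. (These two numbers are simply evaluations of the bound above, not figures quoted from the text.)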

Convergence proofs. For certain classes of Hamiltonians H, primarily quadratic ones, and suitable choices of the coarse grids, prolongations, smoothing iterations and cycle control parameters, it can be proven rigorously that the multi-grid iteration matrices M_l satisfy a uniform bound

  ||M_l|| ≤ C < 1 ,

valid irrespective of the total number of levels. Thus a fixed number of multi-grid iterations (maybe five or ten) is sufficient to reduce the error to a small value, independent of the lattice size L. In other words, critical slowing-down has been completely eliminated!

The rigorous convergence proofs are somewhat arcane, so we cannot describe them here in any detail, but certain general features are worth noting. The convergence proofs are most straightforward when linear or higher-order interpolation and restriction are used and γ ≥ 2 (e.g. the W-cycle). When either low-order interpolation (e.g. piecewise-constant) or γ = 1 (the V-cycle) is used, the convergence proofs become much more delicate. Indeed, if both piecewise-constant interpolation and a V-cycle are used, then the uniform bound has not yet been proven, and it is most likely false. To some extent these features may be artifacts of the current methods of proof, but we suspect that they do also reflect real properties of the multi-grid method, and so the convergence proofs may serve as guidance for practice. For example, in our work we have used piecewise-constant interpolation (so as to preserve the simple nearest-neighbor coupling on the coarse grids), and thus for safety we stick to the W-cycle. There is in any case much room for further research, both theoretical and experimental.

To recapitulate, the extraordinary efficiency of the multi-grid method arises from the combination of two key features:

• The uniform convergence estimate ||M_l|| ≤ C < 1. This means that only O(1) iterations are needed, independent of the lattice size L.

• The work estimate derived above. This means that each iteration requires only a computational labor of order L^d (the fine-grid lattice volume).

It follows that the complete solution of the minimization problem, to any specified accuracy ε, requires a computational labor of order L^d.

Unigrid point of view. Let us look again at the multi-grid algorithm from the variational standpoint.

For a detailed exposition of multi-grid convergence proofs, see the numerical-analysis references cited in these notes; the additional work needed to handle piecewise-constant interpolation can be found there as well.


One natural class of iterative algorithms for minimizing H is the class of so-called directional methods: let p₁, p₂, ... be a sequence of "direction vectors" in U, and define φ^(n+1) to be that vector of the form φ^(n) + λ p_n which minimizes H. The algorithm thus travels "downhill" from φ^(n) along the line φ^(n) + λ p_n until reaching the minimum of H; then it switches to direction p_{n+1}, starting from this new point φ^(n+1); and so on. For a suitable choice of the direction vectors p₁, p₂, ..., this method can be proven to converge to the global minimum of H (under suitable hypotheses on H).

Now some iterative algorithms for minimizing H(φ) can be recognized as special cases of the directional method. For example, the Gauss-Seidel iteration is a directional method in which the direction vectors are chosen to be the unit vectors e₁, e₂, ..., e_N (i.e. vectors which take the value 1 at a single grid point and zero at all others), where N = dim U. (One step of the Gauss-Seidel iteration corresponds to N steps of the directional method.) Similarly, it is not hard to see that the multi-grid iteration, with the variational choices of restriction and coarse-grid operators and with Gauss-Seidel smoothing at each level, is itself a directional method: some of the direction vectors are the unit vectors e^(M)_1, e^(M)_2, ..., e^(M)_{N_M} of the fine-grid space, but other direction vectors are the images in the fine-grid space of the unit vectors of the coarse-grid spaces, i.e. they are p_{M←l} e^(l)_1, p_{M←l} e^(l)_2, ..., p_{M←l} e^(l)_{N_l}, where p_{M←l} denotes the composition p_{M,M−1} p_{M−1,M−2} ⋯ p_{l+1,l}. The exact order in which these direction vectors are interleaved depends on the parameters m₁, m₂ and γ which define the cycling structure of the multi-grid algorithm. For example, if m₁ = 1, m₂ = 0 and γ = 1, the order of the direction vectors is {M}, {M−1}, ..., {0}, where {l} denotes the sequence p_{M←l} e^(l)_1, p_{M←l} e^(l)_2, ..., p_{M←l} e^(l)_{N_l}. If m₁ = 0, m₂ = 1 and γ = 1, the order is {0}, {1}, ..., {M}. The reader is invited to work out other cases.

Thus the multi-grid algorithm (for problems defined in variational form) is a directional method in which the direction vectors include both "single-site modes" {M} and also "collective modes" {M−1}, {M−2}, ..., {0} on all length scales. For example, if p_{l,l−1} is piecewise-constant injection, then the direction vectors are characteristic functions χ_B (i.e. functions which are 1 on the block B ⊂ Ω and zero outside B), where the sets B are, successively, single sites, cubes of side 2, cubes of side 4, and so on. Similarly, if p_{l,l−1} is linear interpolation, then the direction vectors are triangular waves of various widths.

The multi-grid algorithm thus has an alternative interpretation as a collective-mode algorithm working solely in the fine-grid space U. We emphasize that this "unigrid" viewpoint is mathematically fully equivalent to the recursive definition given earlier. But it gives, we think, an important additional insight into what the multi-grid algorithm is really doing.

For example, for the simple model problem (Poisson equation in a square), we know that the "correct" collective modes are sine waves, in the sense that these modes diagonalize the Laplacian, so that in this basis the Jacobi or Gauss-Seidel algorithm would give the exact solution in a single iteration (M_Jacobi = M_GS = 0). On the other hand, the multi-grid method uses square-wave (or triangular-wave) updates, which are not exactly the "correct" collective modes. Nevertheless, the multi-grid convergence proofs assure us that they are "close enough": the norm of the multi-grid iteration matrix M_l is bounded away from 1, uniformly in the lattice size, so that an accurate solution is reached in a very few MG iterations; in particular, critical slowing-down is completely eliminated. This viewpoint also explains why MG convergence is more delicate for piecewise-constant interpolation than for piecewise-linear: the point is that a sine wave (or other slowly varying function) can be approximated to arbitrary accuracy (in energy norm) by piecewise-linear functions, but not by piecewise-constant functions.

We remark that McCormick and Ruge have advocated the "unigrid" idea not just as an alternate point of view on the multi-grid algorithm, but as an alternate computational procedure. To be sure, the unigrid method is somewhat simpler to program, and this could have pedagogical advantages. But one of the key properties of the multi-grid method, namely the O(L^d) computational labor per iteration, is sacrificed in the unigrid scheme. Instead of the work estimates given above, one has

  W_l ≈ W_M

and hence

  work(UG) ≈ W_M Σ_{l=0}^{M} γ^(M−l) ≈ { M W_M     if γ = 1
                                        { γ^M W_M   if γ ≥ 2 .

Since M ≈ log₂ L and W_M ∼ L^d, we obtain

  work(UG) ≈ { L^d log L        if γ = 1
             { L^(d + log₂ γ)   if γ ≥ 2 .

For a V-cycle the additional factor of log L is perhaps not terribly harmful, but for a W-cycle the additional factor of L is a severe drawback (though not as severe as the O(L²) critical slowing-down of the traditional algorithms). Thus, we do not advocate the use of unigrid as a computational method if there is a viable multi-grid alternative. Unigrid could, however, be of interest in cases where true multi-grid is unfeasible, as may occur for non-Abelian lattice gauge theories.

Multi-grid algorithms can also be devised for some models in which the state space is a nonlinear manifold, such as nonlinear σ-models and lattice gauge theories. The simplest case is the XY model: both the fine-grid and coarse-grid field variables are angles, and the interpolation operator is piecewise-constant (with angles added modulo 2π). Thus, a coarse-grid variable θ_y specifies the angle by which the 2^d spins in the block B_y are to be simultaneously rotated. A similar strategy can be employed for nonlinear σ-models taking values in a group G (the so-called "principal chiral models"): the coarse-grid variable g_y simultaneously left-multiplies the 2^d spins in the block B_y. For nonlinear σ-models taking values in a nonlinear manifold M on which a group G acts (e.g. the n-vector model with M = S^(n−1) and G = SO(n)), the coarse-grid correction moves are still simultaneous rotations; this means that while the fine-grid fields lie in M, the coarse-grid fields all lie in G. Similar ideas can be applied to lattice gauge theories; the key requirement is to respect the geometric (parallel-transport) properties of the theory. Unfortunately, the resulting algorithms appear to be practical only in the abelian case. (In the non-abelian case the coarse-grid Hamiltonian becomes too complicated.) Much more work needs to be done on devising good interpolation operators for non-abelian lattice gauge theories.

Multi-Grid Monte Carlo

Classical equilibrium statistical mechanics is a natural generalization of classical statics (for problems posed in variational form): in the latter we seek to minimize a Hamiltonian H(φ), while in the former we seek to generate random samples from the Boltzmann-Gibbs probability distribution e^(−βH(φ)). The statistical-mechanical problem reduces to the deterministic one in the zero-temperature limit β → +∞.

Likewise, many (but not all) of the deterministic iterative algorithms for minimizing H(φ) can be generalized to stochastic iterative algorithms, that is, dynamic Monte Carlo methods, for generating random samples from e^(−βH(φ)). For example, the stochastic generalization of the Gauss-Seidel algorithm (or more generally, of nonlinear Gauss-Seidel with exact minimization) is the single-site heat-bath algorithm; and the stochastic generalization of multi-grid is multi-grid Monte Carlo.

Let us explain these correspondences in more detail. In the Gauss-Seidel algorithm, the grid points are swept in some order, and at each stage the Hamiltonian is minimized as a function of a single variable φ_x, with all other variables {φ_y}_{y≠x} being held fixed. The single-site heat-bath algorithm has the same general structure, but the new value φ'_x is chosen randomly from the conditional distribution of e^(−βH(φ)) given {φ_y}_{y≠x}, i.e. from the one-dimensional probability distribution

  P(φ'_x) dφ'_x = const × exp[ −βH(φ'_x, {φ_y}_{y≠x}) ] dφ'_x

(where the normalizing constant depends on {φ_y}_{y≠x}). It is not difficult to see that this operation leaves invariant the Gibbs distribution e^(−βH(φ)). As β → +∞, it reduces to the Gauss-Seidel algorithm.

It is useful to visualize geometrically the action of the Gauss-Seidel and heat-bath algorithms within the space U of all possible field configurations. Starting at the current field configuration φ, the Gauss-Seidel and heat-bath algorithms propose to move the system along the line in U consisting of configurations of the form φ' = φ + tδ_x (−∞ < t < ∞), where δ_x denotes the configuration which is 1 at site x and zero elsewhere. In the Gauss-Seidel algorithm, t is chosen so as to minimize the Hamiltonian restricted to the given line; while in the heat-bath algorithm, t is chosen randomly from the conditional distribution of e^(−βH(φ)) restricted to the given line, namely the one-dimensional distribution with probability density P_cond(t) ∝ exp[−βH_cond(t)] ≡ exp[−βH(φ + tδ_x)].

The method of partial resampling generalizes the heat-bath algorithm in two ways:

• The "fibers" used by the algorithm need not be lines, but can be higher-dimensional linear or even nonlinear manifolds.

• The new configuration φ' need not be chosen independently of the old configuration φ (as in the heat-bath algorithm); rather, it can be selected by any updating procedure which leaves invariant the conditional probability distribution of e^(−βH(φ)) restricted to the fiber.

The multi-grid Monte Carlo (MGMC) algorithm is a partial resampling algorithm in which the "fibers" are the sets of field configurations that can be obtained one from another by a coarse-grid correction step, i.e. the sets of fields φ + p_{l,l−1}ψ with φ fixed and ψ varying over U_{l−1}. These fibers form a family of parallel affine subspaces in U_l, of dimension N_{l−1} ≡ dim U_{l−1}.

The ingredients of the MGMC algorithm are identical to those of the deterministic MG algorithm, with one exception: the deterministic smoothing iteration S_l is replaced by a stochastic smoothing iteration (for example, single-site heat bath). That is, S_l( · , H_l) is a stochastic updating procedure φ_l → φ'_l that leaves invariant the Gibbs distribution e^(−βH_l):

  ∫ dφ_l  e^(−βH_l(φ_l))  P_{S_l(·,H_l)}(φ_l → φ'_l) = e^(−βH_l(φ'_l)) dφ'_l .

The MGMC algorithm is then defined as follows:

procedure mgmc(l, φ, H_l)
  comment This algorithm updates the field φ in such a way as to leave invariant the probability distribution e^(−βH_l).
  φ ← S_l^pre(φ, H_l)
  if l > 0 then
    compute H_{l−1}( · ) ≡ H_l(φ + p_{l,l−1} · )
    ψ ← 0
    for j = 1 until γ_l do mgmc(l−1, ψ, H_{l−1})
    φ ← φ + p_{l,l−1} ψ
  endif
  φ ← S_l^post(φ, H_l)
end

The alert reader will note that this algorithm is identical to the deterministic MG algorithm presented earlier; only the meaning of S_l is different.

The validity of the MGMC algorithm is proven inductively, starting at level 0 and working upwards. That is, if mgmc(l−1, · , H_{l−1}) is a stochastic updating procedure that leaves invariant the probability distribution e^(−βH_{l−1}), then mgmc(l, · , H_l) leaves invariant e^(−βH_l). Note that the coarse-grid correction step of the MGMC algorithm differs from the heat-bath algorithm in that the new configuration φ' is not chosen independently of the old configuration φ; to do so would be impractical, since the fiber has such high dimension. Rather, φ' (or what is equivalent, ψ) is chosen by a valid updating procedure, namely MGMC itself!

The MGMC algorithm also has an alternate interpretation, the unigrid viewpoint, in which the fibers are one-dimensional and the resamplings are independent. More precisely, the fibers are lines of the form φ' = φ + tχ_B (−∞ < t < ∞), where χ_B denotes the function which is 1 for sites belonging to the block B and zero elsewhere. The sets B are taken successively to be single sites, cubes of side 2, cubes of side 4, and so on. (If linear interpolation were used, then the "direction vectors" χ_B would be replaced by triangular waves of various widths.) Just as the deterministic unigrid algorithm chooses t so as to minimize the "conditional Hamiltonian" H_cond(t) ≡ H(φ + tχ_B), so the stochastic unigrid algorithm chooses t randomly from the one-dimensional distribution with probability density P_cond(t) ∝ exp[−βH_cond(t)]. Conceptually, this algorithm is no more complicated than the single-site heat-bath algorithm. But physically it is of course very different, as the direction vectors χ_B represent collective modes on all length scales.
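For the Gaussian model the conditional density P_cond(t) is itself Gaussian and can be sampled exactly; the following small sketch (mine, with β set to 1 and A, f, block supplied by the user) shows one such stochastic-unigrid move.

  import numpy as np

  def unigrid_block_update(phi, A, f, block, rng):
      """One stochastic-unigrid move for the Gaussian model
      H(phi) = (1/2)<phi,A phi> - <f,phi> (beta = 1): resample the coefficient
      t of the direction vector chi_B from its exact 1-d conditional law."""
      chi = np.zeros_like(phi)
      chi[block] = 1.0                  # characteristic function of the block B
      a = chi @ (A @ chi)               # curvature of H_cond(t)
      b = chi @ (A @ phi - f)           # slope of H_cond(t) at t = 0
      t = rng.normal(loc=-b / a, scale=1.0 / np.sqrt(a))
      return phi + t * chi

  # usage sketch:  rng = np.random.default_rng()
  # phi = unigrid_block_update(phi, A, f, block=slice(0, 4), rng=rng)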

We emphasize that the stochastic unigrid algorithm is mathematically and physically equivalent to the multi-grid Monte Carlo algorithm described above. But it is useful, we believe, to be able to look at MGMC from either of the two points of view: independent resamplings in one-dimensional fibers, or non-independent resamplings (defined recursively) in higher-dimensional "coarse-grid" fibers. On the other hand, the two algorithms are not computationally equivalent: one MGMC sweep requires a CPU time of order volume (provided that γ < 2^d), while the time for a unigrid sweep grows faster than the volume (cf. the work estimates in the preceding section). Therefore we advocate unigrid only as a conceptual device, not as a computational algorithm.

How well does MGMC perform? The answer is highly model-dependent:

• For the Gaussian model it can be proven rigorously that τ is bounded as criticality is approached (empirically τ is of order one); therefore critical slowing-down is completely eliminated! The proof is a simple Fock-space argument combined with the convergence proof for deterministic MG; this will be discussed in the next section.

• For the φ⁴ model, numerical experiments show that τ diverges with the same dynamic critical exponent as in the heat-bath algorithm; the gain in efficiency thus approaches a constant factor near the critical point. This behavior can be understood as due to the double-well nature of the φ⁴ potential, which makes MGMC ineffective on large blocks. Thus the correct collective modes at long length scales are nonlinear excitations, not well modelled by φ → φ + tχ_B. (See the discussion of algorithms of Swendsen-Wang type below for an approach that appears to model these excitations well.)

• For the d = 2 XY model, our numerical data show a more complicated behavior. As the critical temperature is approached from above, τ diverges with a dynamic critical exponent that is noticeably smaller for the MGMC algorithm (in either V-cycle or W-cycle) than for the heat-bath algorithm; critical slowing-down is thus significantly reduced, but is still very far from being eliminated. On the other hand, below the critical temperature, τ is very small, uniformly in L and β (at least for the W-cycle): critical slowing-down appears to be completely eliminated. This very different behavior in the two phases can be understood physically: in the low-temperature phase the main excitations are spin waves, which are well handled by MGMC (as in the Gaussian model); but near the critical temperature the important excitations are widely separated vortex-antivortex pairs, which are apparently not easily created by the MGMC updates.

• For the two-dimensional O(N) nonlinear σ-model with N ≥ 3, which is asymptotically free, preliminary data show a very strong reduction, but not the total elimination, of critical slowing-down: for a W-cycle we find a nonzero but much-reduced dynamic critical exponent z. (I emphasize that these data are very preliminary.) Previously, Goodman and I had argued heuristically that for asymptotically free theories with a continuous symmetry group, MGMC (with a W-cycle) should completely eliminate critical slowing-down except for a possible logarithm. But our reasoning may now need to be re-examined; see the Note Added below.

Stochastic Linear Iterations for the Gaussian Model

In this section we analyze an important class of Markov chains: the stochastic linear iterations for Gaussian models. This class includes, among others, the single-site heat-bath algorithm (with deterministic sweep of the sites), the stochastic SOR algorithm, and the multi-grid Monte Carlo algorithm, all of course in the Gaussian case only. We show that the behavior of the stochastic algorithm is completely determined by the behavior of the corresponding deterministic algorithm for solving linear equations. In particular, we show that the exponential autocorrelation time τ_exp of the stochastic linear iteration is equal to the relaxation time of the corresponding deterministic linear iteration.

Consider any quadratic Hamiltonian

  H(φ) = ½ (φ, Aφ) − (f, φ) ,

where A is a symmetric positive-definite matrix. The corresponding Gaussian measure

  dμ(φ) = const × e^(−½(φ,Aφ) + (f,φ)) dφ

has mean A⁻¹f and covariance matrix A⁻¹. Next consider any first-order stationary linear stochastic iteration of the form

  φ^(n+1) = M φ^(n) + N f + Q ξ^(n) ,

Note Added 1996: For more recent work on multi-grid Monte Carlo, see the references added to the bibliography of these notes and the references cited therein.

Some of this material appears also in the references.


where M, N and Q are fixed matrices, and the ξ^(n) are independent Gaussian random vectors with mean zero and covariance matrix C. The iteration has a unique stationary distribution if and only if the spectral radius ρ(M) ≡ lim_{n→∞} ||M^n||^{1/n} is < 1; and in this case the stationary distribution is the desired Gaussian measure dμ, for all f, if and only if

  N = (I − M) A⁻¹                       (*)
  Q C Q^T = A⁻¹ − M A⁻¹ M^T             (**)

(here ^T denotes transpose).

The reader will note the close analogy with the deterministic linear problem treated earlier. Indeed, (*) is identical with the condition found there; and if we take the "zero-temperature limit" in which H is replaced by βH with β → +∞, then the Gaussian measure approaches a delta function concentrated at the unique minimum of H (namely the solution of the linear equation Aφ = f), and the "noise" term disappears (Q → 0), so that the stochastic iteration turns into the deterministic iteration. That is:

(a) The linear deterministic problem is the zero-temperature limit of the Gaussian stochastic problem; and the first-order stationary linear deterministic iteration is the zero-temperature limit of the first-order stationary linear stochastic iteration. Therefore, any stochastic linear iteration for generating samples from the Gaussian measure dμ gives rise to a deterministic linear iteration for solving the linear equation Aφ = f, simply by setting Q = 0.

(b) Conversely, the stochastic problem and iteration are the nonzero-temperature generalizations of the deterministic ones. In principle, this means that a deterministic linear iteration for solving Aφ = f can be generalized to a stochastic linear iteration for generating samples from dμ if and only if the matrix A⁻¹ − M A⁻¹ M^T is positive-semidefinite: just choose a matrix Q satisfying (**). In practice, however, such an algorithm is computationally tractable only if the matrix Q has additional nice properties, such as sparsity (or triangularity with a sparse inverse).

Examples. 1. Single-site heat bath with deterministic sweep of the sites (= stochastic Gauss-Seidel). On each visit to site i, φ_i is replaced by a new value φ'_i chosen independently from the conditional distribution of dμ with {φ_j}_{j≠i} fixed at their current values; that is, φ'_i is Gaussian with mean (f_i − Σ_{j≠i} a_{ij} φ_j)/a_{ii} and variance 1/a_{ii}. When updating φ_i at sweep n+1, the variables φ_j with j < i have already been visited on this sweep, hence have their "new" values φ^(n+1)_j, while the variables φ_j with j > i have not yet been visited on this sweep and so have their "old" values φ^(n)_j. It follows that

  φ^(n+1)_i = a_{ii}⁻¹ ( f_i − Σ_{j<i} a_{ij} φ^(n+1)_j − Σ_{j>i} a_{ij} φ^(n)_j ) + a_{ii}^(−1/2) ξ^(n)_i ,

where ξ has covariance matrix I. A little algebra brings this into the matrix form of the stochastic iteration above, with

  M = −(D + L)⁻¹ L^T
  N = (D + L)⁻¹
  Q = (D + L)⁻¹ D^(1/2) ,

where D and L are the diagonal and lower-triangular parts of the matrix A, respectively. It is straightforward to verify that (*) and (**) are satisfied. The single-site heat-bath algorithm is clearly the stochastic generalization of the Gauss-Seidel algorithm.
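A direct way to see this correspondence is to code the sweep both ways; the following sketch (mine, not from the notes) performs one heat-bath sweep site by site and, equivalently, as the matrix update with M, N, Q given above.

  import numpy as np

  def heatbath_sweep(phi, A, f, rng):
      """One deterministic-order sweep of the single-site heat bath for the
      Gaussian measure ~ exp[-(1/2)<phi,A phi> + <f,phi>]."""
      n = len(phi)
      for i in range(n):
          mean = (f[i] - A[i, :] @ phi + A[i, i] * phi[i]) / A[i, i]
          phi[i] = mean + rng.normal() / np.sqrt(A[i, i])
      return phi

  def heatbath_sweep_matrix_form(phi, A, f, rng):
      """The same sweep written as phi' = M phi + N f + Q xi, with
      M = -(D+L)^{-1} L^T,  N = (D+L)^{-1},  Q = (D+L)^{-1} D^{1/2}."""
      D = np.diag(np.diag(A))
      Lo = np.tril(A, k=-1)                    # strictly lower-triangular part
      DLinv = np.linalg.inv(D + Lo)
      M = -DLinv @ np.triu(A, k=1)             # A symmetric, so triu(A,1) = L^T
      N = DLinv
      Q = DLinv @ np.sqrt(D)
      xi = rng.normal(size=len(phi))
      return M @ phi + N @ f + Q @ xi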

2. Stochastic SOR. For models which are Gaussian (or, more generally, "multi-Gaussian"), Adler and Whitmer have shown that the successive over-relaxation (SOR) iteration admits a stochastic generalization, namely

  φ^(n+1)_i = (1 − ω) φ^(n)_i + ω a_{ii}⁻¹ ( f_i − Σ_{j<i} a_{ij} φ^(n+1)_j − Σ_{j>i} a_{ij} φ^(n)_j ) + [ω(2 − ω)/a_{ii}]^(1/2) ξ^(n)_i ,

where 0 < ω < 2. For ω = 1 this reduces to the single-site heat-bath algorithm. This is easily seen to be of the form of the stochastic iteration above, with

  M = −(D + ωL)⁻¹ [ (ω − 1) D + ω L^T ]
  N = ω (D + ωL)⁻¹
  Q = [ω(2 − ω)]^(1/2) (D + ωL)⁻¹ D^(1/2) ,

where D and L are as before. It is straightforward to verify that (*) and (**) are satisfied.

3. Multi-Grid Monte Carlo (MGMC). The multi-grid Monte Carlo algorithm mgmc defined above is identical to the corresponding deterministic multi-grid algorithm mgm, except that S_l is a stochastic rather than deterministic updating. Consider, for example, the case in which S_l is a stochastic linear updating (e.g. single-site heat bath). Then the MGMC update is also a stochastic linear updating of the general form above; in fact, M equals M_MG, the iteration matrix of the corresponding deterministic MG method, and N equals N_MG; the matrix Q is rather complicated, but since the MGMC algorithm is correct, Q must satisfy (**). (The easiest way to see that M = M_MG is to imagine what would happen if all the random numbers ξ^(n) were zero. Then the stochastic linear updating would reduce to the corresponding deterministic updating, and hence the same would be true for the MGMC updating as a whole.)

4. Langevin equation with small time step. As far as I know, there does not exist any useful stochastic generalization of the Jacobi iteration. However, let us discretize the Langevin equation

  dφ/dt = −½ C (Aφ − f) + η ,

where η is Gaussian white noise with covariance matrix C, using a small time step ε. The result is an iteration of the general form above with

  M = I − (ε/2) C A
  N = (ε/2) C
  Q = ε^(1/2) I .

This satisfies (*) exactly, but satisfies (**) only up to an error of order ε². If C = D⁻¹, these M, N correspond to a damped Jacobi iteration with ω = ε/2.

We remark, concerning the verifications in Examples 1 and 2, that the verification never uses the fact that D is diagonal or that L is lower-triangular. It is sufficient to have A = D + L + L^T with D symmetric positive-definite and D + L nonsingular. However, for the method to be practical, it is important that D^(1/2) and (D + L)⁻¹ be "easy" to compute when applied to a vector.

It is straightforward to analyze the dynamic behavior of the stochastic linear iteration. Expressing φ^(n) in terms of the independent random variables φ^(0), ξ^(0), ξ^(1), ..., ξ^(n−1), we find after a bit of manipulation that

  ⟨φ^(n)⟩ = M^n ⟨φ^(0)⟩ + (I − M^n) A⁻¹ f

and

  cov(φ^(s), φ^(t)) = M^s cov(φ^(0), φ^(0)) (M^T)^t + { [A⁻¹ − M^s A⁻¹ (M^T)^s] (M^T)^(t−s)   if s ≤ t
                                                       { M^(s−t) [A⁻¹ − M^t A⁻¹ (M^T)^t]       if s ≥ t .

Now let us either start the stochastic process in equilibrium,

  ⟨φ^(0)⟩ = A⁻¹ f ,   cov(φ^(0), φ^(0)) = A⁻¹ ,

or else let it relax to equilibrium by taking s, t → +∞ with s − t fixed. Either way, we conclude that in equilibrium the iteration defines a Gaussian stationary stochastic process with mean A⁻¹f and autocovariance matrix

  cov(φ^(s), φ^(t)) = { A⁻¹ (M^T)^(t−s)   if s ≤ t
                      { M^(s−t) A⁻¹        if s ≥ t .

Moreover, since the stochastic process is Gaussian, all higher-order time-dependent correlation functions are determined in terms of the mean and autocovariance. Thus the matrix M determines the autocorrelation functions of the Monte Carlo algorithm.

Another way to state these relationships is to recall that the Hilbert space L²(μ) is isomorphic to the bosonic Fock space F(U) built on the "energy Hilbert space" (U, A): the "n-particle states" are the homogeneous Wick polynomials of degree n in the shifted field φ̃ ≡ φ − A⁻¹f. (If U is one-dimensional, these are just the Hermite polynomials.) Then the transition probability P(φ^(n) → φ^(n+1)) induces on the Fock space the operator

  P = Γ(M^T) ≡ I ⊕ M^T ⊕ (M^T ⊗ M^T) ⊕ ⋯ ,

that is, the second quantization of the operator M^T on the energy Hilbert space. It follows that

  || Γ(M)^n ↾ 1^⊥ ||_{L²(μ)} = || M^n ||_{(U,A)}

and hence that

  ρ( Γ(M) ↾ 1^⊥ ) = ρ(M) .

Moreover, P is self-adjoint on L²(μ) (i.e. satisfies detailed balance) if and only if M is self-adjoint with respect to the energy inner product, i.e.

  M A = A M^T ,

and in this case

  ρ( Γ(M) ↾ 1^⊥ ) = || Γ(M) ↾ 1^⊥ ||_{L²(μ)} = ρ(M) = || M ||_{(U,A)} .

In summary, we have shown that the dynamic behavior of any stochastic linear iteration is completely determined by the behavior of the corresponding deterministic linear iteration. In particular, the exponential autocorrelation time τ_exp (slowest decay rate of any autocorrelation function) is given by

  exp(−1/τ_exp) = ρ(M) ,

and this decay rate is achieved by at least one observable which is linear in the field φ. In other words, the (worst-case) convergence rate of the Monte Carlo algorithm is precisely equal to the (worst-case) convergence rate of the corresponding deterministic iteration.
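This relation is easy to check numerically on a small example; the sketch below (mine) builds the stochastic Gauss-Seidel (heat-bath) iteration matrix for a small Gaussian model and compares ρ(M) with the decay of the autocovariance of a slow linear observable.

  import numpy as np

  # a small Gaussian model: 1-d lattice Laplacian plus a small mass term
  n, m2 = 16, 0.01
  A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) + m2 * np.eye(n)

  # heat-bath (stochastic Gauss-Seidel) iteration matrix  M = -(D+L)^{-1} L^T
  D  = np.diag(np.diag(A))
  Lo = np.tril(A, k=-1)
  M  = -np.linalg.solve(D + Lo, np.triu(A, k=1))

  rho = max(abs(np.linalg.eigvals(M)))
  tau_exp = -1.0 / np.log(rho)          # predicted by exp(-1/tau_exp) = rho(M)

  # In equilibrium cov(phi^(s), phi^(s+t)) = A^{-1} (M^T)^t, so for a linear
  # observable <v, phi> the autocovariance at lag t is v^T A^{-1} (M^T)^t v,
  # which decays asymptotically like rho(M)^t for a generic v.
  Ainv = np.linalg.inv(A)
  v = np.ones(n)
  acf = [v @ Ainv @ np.linalg.matrix_power(M.T, t) @ v for t in range(1, 60)]
  measured_rate = acf[-1] / acf[-2]     # ratio of successive lags -> rho(M)
  print(rho, measured_rate, tau_exp)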

In particular, for Gaussian MGMC, the convergence proofs for deterministic multi-grid, combined with the arguments of the present section, prove rigorously that critical slowing-down is completely eliminated (at least for a W-cycle). That is, the autocorrelation time τ of the MGMC method is bounded as criticality is approached (empirically τ is of order one).

Swendsen-Wang Algorithms

A very different type of collective-mode algorithm was proposed two years ago by Swendsen and Wang for Potts spin models. Since then, there has been an explosion of work trying to understand why this algorithm works so well (and why it does not work even better!), and trying to improve or generalize it.

Note Added 1996: Now nearly ten years ago! The great interest in algorithms of Swendsen-Wang type (frequently called "cluster algorithms") has not abated; see the references cited in the "Notes Added" below, plus many others. See also the more recent reviews cited there, which are slightly more up-to-date than the present notes.


even better and trying to improve or generalize it� The basic idea behind all algorithmsof Swendsen Wang type is to augment the given model by means of auxiliary variablesand then to simulate this augmented model� In this lecture we describe the Swendsen Wang �SW algorithm and review some of the proposed variants and generalizations�

Let us �rst recall that the q state Potts model ��� ��� is a generalization of theIsing model in which each spin �i can take q distinct values rather than just two ��i ��� �� � � � � q� here q is an integer � �� Neighboring spins prefer to be in the same stateand pay an energy price if they are not� The Hamiltonian is therefore

H�� �Xhiji

Jij ��� � i� j ����

with Jij � � for all i� j ��ferromagnetism� and the partition function is

Z �Xf g

exp��H���

�Xf g

exp

�Xhiji

Jij �� i� j � �

���

Xf g

Yhiji

���� pij ! pij� i� j � ����

where we have de�ned pij � �� exp��Jij� The Gibbs measure Potts�� is of course

Potts�� � Z�� exp

�Xhiji

Jij �� i� j � �

��� Z��

Yhiji

���� pij ! pij� i� j � ����

We now use the deep identity

  a + b = Σ_{n=0}^{1} [ a δ_{n,0} + b δ_{n,1} ]

on each bond ⟨ij⟩; that is, we introduce on each bond ⟨ij⟩ an auxiliary variable n_{ij} taking the values 0 and 1, and obtain

  Z = Σ_{{σ}} Σ_{{n}} Π_{⟨ij⟩} [ (1 − p_{ij}) δ_{n_{ij},0} + p_{ij} δ_{n_{ij},1} δ_{σ_i,σ_j} ] .

Let us now take seriously the {n} as dynamical variables: we can think of n_{ij} as an occupation variable for the bond ⟨ij⟩ (1 = occupied, 0 = empty). We therefore define the Fortuin-Kasteleyn-Swendsen-Wang (FKSW) model to be a joint model having q-state Potts spins σ_i at the sites and occupation variables n_{ij} on the bonds, with joint probability distribution

  μ_FKSW(σ, n) = Z⁻¹ Π_{⟨ij⟩} [ (1 − p_{ij}) δ_{n_{ij},0} + p_{ij} δ_{n_{ij},1} δ_{σ_i,σ_j} ] .


Finally, let us see what happens if we sum over the {σ} at fixed {n}. Each occupied bond ⟨ij⟩ imposes a constraint that the spins σ_i and σ_j must be in the same state, but otherwise the spins are unconstrained. We therefore group the sites into connected clusters (two sites are in the same cluster if they can be joined by a path of occupied bonds); then all the spins within a cluster must be in the same state (all q values are equally probable), and distinct clusters are independent. It follows that

  Z = Σ_{{n}} ( Π_{⟨ij⟩: n_{ij}=1} p_{ij} ) ( Π_{⟨ij⟩: n_{ij}=0} (1 − p_{ij}) ) q^{C(n)} ,

where C(n) is the number of connected clusters (including one-site clusters) in the graph whose edges are the bonds having n_{ij} = 1. The corresponding probability distribution,

  μ_RC(n) = Z⁻¹ ( Π_{⟨ij⟩: n_{ij}=1} p_{ij} ) ( Π_{⟨ij⟩: n_{ij}=0} (1 − p_{ij}) ) q^{C(n)} ,

is called the random-cluster model with parameter q. This is a generalized bond-percolation model, with non-local correlations coming from the factor q^{C(n)}; for q = 1 it reduces to ordinary bond percolation. Note, by the way, that in the random-cluster model (unlike the Potts and FKSW models) q is merely a parameter; it can take any positive real value, not just 2, 3, .... So the random-cluster model defines, in some sense, an analytic continuation of the Potts model to non-integer q; ordinary bond percolation corresponds to the "one-state Potts model".

We have already verified the following facts about the FKSW model:

(a) Z_{Potts} = Z_{FKSW} = Z_{RC}.

(b) The marginal distribution of \pi_{FKSW} on the Potts variables \{\sigma\} (integrating out the \{n\}) is precisely the Potts model \pi_{Potts}(\sigma).

(c) The marginal distribution of \pi_{FKSW} on the bond occupation variables \{n\} (integrating out the \{\sigma\}) is precisely the random-cluster model \pi_{RC}(n).

The conditional distributions of \pi_{FKSW} are also simple:

(d) The conditional distribution of the \{n\} given the \{\sigma\} is as follows: independently for each bond \langle ij \rangle, one sets n_{ij} = 0 in case \sigma_i \neq \sigma_j, and sets n_{ij} = 0, 1 with probability 1 - p_{ij}, p_{ij}, respectively, in case \sigma_i = \sigma_j.

(e) The conditional distribution of the \{\sigma\} given the \{n\} is as follows: independently for each connected cluster, one sets all the spins \sigma_i in the cluster to the same value, chosen equiprobably from \{1, 2, \ldots, q\}.
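Facts (d) and (e) translate directly into code. The following is a minimal illustrative sketch (in Python with numpy; the array layout, the helper names and the simple union-find cluster finder are choices made here, not part of any standard package) of one bond-update / spin-update cycle for the homogeneous ferromagnetic q-state Potts model on a periodic square lattice with uniform p = 1 - exp(-J).

    import numpy as np

    def sw_sweep(spins, q, p, rng):
        """One Swendsen-Wang sweep for the ferromagnetic q-state Potts model on an
        L x L periodic square lattice, with uniform bond probability p = 1 - exp(-J).
        Illustrative sketch, not a tuned production code."""
        L = spins.shape[0]
        # Step (d): throw bonds independently; a bond may be occupied only if the
        # two spins agree, and then with probability p.
        right = (spins == np.roll(spins, -1, axis=1)) & (rng.random((L, L)) < p)
        down  = (spins == np.roll(spins, -1, axis=0)) & (rng.random((L, L)) < p)
        # Find the connected clusters of the occupied bonds (simple union-find).
        parent = np.arange(L * L)
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]   # path halving
                i = parent[i]
            return i
        def union(i, j):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
        for x in range(L):
            for y in range(L):
                i = x * L + y
                if right[x, y]:
                    union(i, x * L + (y + 1) % L)
                if down[x, y]:
                    union(i, ((x + 1) % L) * L + y)
        # Step (e): assign each cluster an independent, uniformly random Potts value.
        roots = np.array([find(i) for i in range(L * L)])
        new_value = rng.integers(0, q, size=L * L)   # one value per possible root
        return new_value[roots].reshape(L, L)

    rng = np.random.default_rng(0)
    q, J = 2, np.log(1 + np.sqrt(2.0))   # 2D Ising critical point in the Potts normalization
    spins = rng.integers(0, q, size=(64, 64))
    for _ in range(100):
        spins = sw_sweep(spins, q, p=1 - np.exp(-J), rng=rng)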

These facts can be used for both analytic and numerical purposes. For example, by using facts (b), (c) and (e), we can prove an identity relating expectations in the Potts model to connection probabilities in the random-cluster model:

    \langle \delta_{\sigma_i, \sigma_j} \rangle_{Potts,q}
       = \langle \delta_{\sigma_i, \sigma_j} \rangle_{FKSW,q}                              [by (b)]
       = \langle E(\delta_{\sigma_i, \sigma_j} \,|\, \{n\}) \rangle_{FKSW,q}
       = \Big\langle \frac{(q-1)\gamma_{ij} + 1}{q} \Big\rangle_{FKSW,q}                   [by (e)]
       = \Big\langle \frac{(q-1)\gamma_{ij} + 1}{q} \Big\rangle_{RC,q}                     [by (c)]

Here

    \gamma_{ij} = \gamma_{ij}(n) = \begin{cases} 1 & \text{if } i \text{ is connected to } j \\ 0 & \text{if } i \text{ is not connected to } j \end{cases}

and E(\,\cdot\, | \{n\}) denotes conditional expectation given \{n\}. For the Ising model, with the usual convention \sigma = \pm 1, this identity can be written more simply as

    \langle \sigma_i \sigma_j \rangle_{Ising} = \langle \gamma_{ij} \rangle_{RC,\, q=2} .

Similar identities can be proven for higher-order correlation functions, and can be employed to prove Griffiths-type correlation inequalities for the Potts model.

On the other hand, Swendsen and Wang exploited facts (b)-(e) to devise a radically new type of Monte Carlo algorithm. The Swendsen-Wang algorithm (SW) simulates the joint FKSW model by alternately applying the conditional distributions (d) and (e) -- that is, by alternately generating new bond occupation variables (independent of the old ones) given the spins, and new spin variables (independent of the old ones) given the bonds. Each of these operations can be carried out in a computer time of order volume: for generating the bond variables this is trivial, and for generating the spin variables it relies on efficient (linear-time) algorithms for computing the connected clusters. It is trivial that the SW algorithm leaves invariant the Gibbs measure \pi_{FKSW}, since any product of conditional probability operators has this property. It is also easy to see that the algorithm is ergodic, in the sense that every configuration \{\sigma, n\} having

For an excellent introduction to conditional expectations, see the probability text of Krickeberg cited in the reference list.

Determining the connected components of an undirected graph is a classic problem of computer science. The depth-first-search and breadth-first-search algorithms have a running time of order V, while the Fischer-Galler-Hoshen-Kopelman algorithm (in one of its variants) has a worst-case running time of order V log V and an observed mean running time of order V in percolation-type problems. Both of these algorithms are non-vectorizable. Shiloach and Vishkin have invented a SIMD parallel algorithm, and we have very recently vectorized it for the Cyber 205, obtaining a substantial speedup over scalar mode. We are currently carrying out a comparative test of these three algorithms, as a function of lattice size and bond density. In view of the extraordinary performance of the SW algorithm (see below), and the fact that virtually all its CPU time is spent finding connected components, we feel that the desirability of finding improved algorithms for this problem is self-evident.
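As an illustration of the cluster-finding step discussed in this footnote, here is a minimal breadth-first-search component-labelling routine (Python; the function name and the edge-list graph representation are choices of this sketch only). Its running time is of order V + E, i.e. of order volume for a lattice bond configuration.

    from collections import deque

    def components_bfs(n_sites, edges):
        """Label the connected components of an undirected graph by breadth-first
        search.  n_sites: number of vertices; edges: iterable of (i, j) pairs (the
        occupied bonds).  Illustrative sketch only."""
        adj = [[] for _ in range(n_sites)]
        for i, j in edges:
            adj[i].append(j)
            adj[j].append(i)
        label = [-1] * n_sites
        n_clusters = 0
        for start in range(n_sites):
            if label[start] != -1:
                continue
            label[start] = n_clusters
            queue = deque([start])
            while queue:
                v = queue.popleft()
                for w in adj[v]:
                    if label[w] == -1:
                        label[w] = n_clusters
                        queue.append(w)
            n_clusters += 1
        return label, n_clusters

    # Example: 4 sites with bonds (0,1) and (2,3)  ->  two clusters.
    print(components_bfs(4, [(0, 1), (2, 3)]))   # ([0, 0, 1, 1], 2)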

Table 1: Susceptibility \chi and autocorrelation time \tau_{int,E} (E = energy, the slowest mode) for the two-dimensional Ising model at criticality, as a function of the lattice size L, using the Swendsen-Wang algorithm. Standard error is shown in parentheses.

nonzero FKSW measure is accessible from every other. So the SW algorithm is at least a correct algorithm for simulating the FKSW model. It is also an algorithm for simulating the Potts and random-cluster models, since expectations in these two models are equal to the corresponding expectations in the FKSW model.

Historical remark: The random-cluster model was introduced in 1969 by Fortuin and Kasteleyn; they derived the identity Z_{Potts} = Z_{RC}, along with the correlation-function identity given above and some generalizations. These relations were rediscovered several times during the subsequent two decades. Surprisingly, however, no one seems to have noticed the joint probability distribution \pi_{FKSW} that underlay all these identities; this was discovered implicitly by Swendsen and Wang, and was made explicit by Edwards and Sokal.

It is certainly plausible that the SW algorithm might have less critical slowing-down than the conventional (single-spin-update) algorithms: the reason is that a local move in one set of variables can have highly nonlocal effects in the other. For example, setting n_b = 0 on a single bond may disconnect a cluster, causing a big subset of the spins in that cluster to be flipped simultaneously. In some sense, therefore, the SW algorithm is a collective-mode algorithm in which the collective modes are chosen by the system rather than imposed from the outside as in multi-grid. (The miracle is that this is done in a way that preserves the correct Gibbs measure!)

How well does the SW algorithm perform? In at least some cases, the performance is nothing short of extraordinary. Table 1 shows some preliminary data on a two-dimensional Ising model at the bulk critical temperature. These data are consistent with the estimate \tau_{SW} \sim L^{\approx 0.35}. By contrast, the conventional single-spin-flip

But precisely because \tau rises so slowly with L, good estimates of the dynamic critical exponent will require the use of extremely large lattices. Even with the largest lattices we have studied, we are unable to distinguish convincingly between nearby values of z. [Note Added 1996: For more recent and

Table 2: Current best estimates of the dynamic critical exponent z_{SW} for the Swendsen-Wang algorithm applied to the q-state Potts model in d dimensions, for q = 1, 2, 3, 4 and d = 1, 2, 3, 4. Error bars are 95% confidence intervals.

algorithms for the two-dimensional Ising model have \tau_{conv} \sim L^{\approx 2.1}. So the advantage of Swendsen-Wang over conventional algorithms (for this model) grows asymptotically like L^{\approx 1.8}. To be sure, one iteration of the Swendsen-Wang algorithm may be several times more costly in CPU time than one iteration of a conventional algorithm (the exact factor depends on the efficiency of the cluster-finding subroutine). But the SW algorithm wins already for modest values of L.

For other Potts models, the performance of the SW algorithm is less spectacular than for the two-dimensional Ising model, but it is still very impressive. In Table 2 we give the current best estimates of the dynamic critical exponent z_{SW} for q-state Potts models in d dimensions, as a function of q and d.

All these exponents are much lower than the z \gtrsim 2 observed in the single-spin-flip algorithms.

Although the SW algorithm performs impressively well, we understand very little about why these exponents take the values they do. Some cases are easy: If q = 1, then all spins are in the same state (the only state!), and all bonds are thrown independently, so the autocorrelation time is zero. (Here the SW algorithm just reduces to the standard static algorithm for independent bond percolation.) If d = 1 (more generally, if the lattice is a tree), the SW dynamics is exactly soluble: the behavior of each bond is independent of each other bond, and \tau_{exp} \to -1/\log(1 - 1/q) as \beta \to +\infty. But the remainder of our understanding is very murky. Two principal insights have been obtained so far:

(a) A calculation yielding z_{SW} = 1 in a mean-field (Curie-Weiss) Ising model.

much more precise data, see the added bibliography. These data are consistent with \tau_{SW} \sim L^{z} for a small power z, but they are also consistent with \tau_{SW} \sim \log L. It is extremely difficult to distinguish a small power from a logarithm.]

Note Added 1996: More recent data for various dimensions d and numbers of states q can be found in the works cited in the added bibliography. For at least one case the situation is extremely unclear: exponent estimates by different workers (using different methods) disagree wildly.

This suggests (but of course does not prove) that z_{SW} = 1 for Ising models (q = 2) in dimension d \ge 4.

(b) A rigorous proof that z_{SW} \ge \alpha/\nu (the Li-Sokal bound). This bound, while valid for all d and q, is extremely far from sharp for the Ising models in dimensions 2 and higher. But it is reasonably good for the 3- and 4-state Potts models in two dimensions, and in the latter case it may even be sharp.

But much remains to be understood.

The Potts model with q large behaves very differently: instead of a critical point, the model undergoes a first-order phase transition. In two dimensions this occurs when q > 4, while in three or more dimensions it is believed to occur already when q \ge 3. At a first-order transition, both the conventional algorithms and the Swendsen-Wang algorithm have an extremely severe slowing-down (much more severe than the slowing-down at a critical point): right at the transition temperature we expect \tau \sim \exp(c L^{d-1}). This is because sets of configurations typical of the ordered and disordered phases are separated by free-energy barriers of order L^{d-1}, i.e. by sets of intermediate configurations that contain interfaces of surface area \sim L^{d-1} and therefore have an equilibrium probability \sim \exp(-c L^{d-1}).

Wolff has proposed an interesting modification of the SW algorithm, in which one builds only a single cluster (starting at a randomly chosen site) and flips it. Clearly, one step of the single-cluster SW algorithm makes less change in the system than one step of the standard SW algorithm, but it also takes much less work: if one enumerates the cluster using depth-first search or breadth-first search, then the CPU time is proportional to the size of the cluster; and by the Fortuin-Kasteleyn identity, the mean cluster size is equal to the susceptibility:

    \sum_j \langle \gamma_{ij} \rangle \;=\; \sum_j \frac{ q\,\langle \delta_{\sigma_i, \sigma_j} \rangle - 1 }{ q - 1 } \;\propto\; \chi .

So the relevant quantity in the single-cluster SW algorithm is the dynamic critical exponent measured in CPU units,

    z_{1\text{-}cluster,\,CPU} \;=\; z_{1\text{-}cluster} \;-\; (d - \gamma/\nu) .

Note Added 1996: See the added bibliography for more recent data on the possible sharpness of the Li-Sokal bound for two-dimensional models with q = 3, 4. These data appear to be consistent with sharpness modulo a logarithm, i.e. \tau_{SW}/C_H \sim \log L.

Note Added 1996: Great progress has been made in recent years in Monte Carlo methods for systems undergoing a first-order phase transition. The key idea is to simulate a suitably chosen non-Boltzmann probability distribution ("umbrella", "multicanonical", ...) and then apply reweighting methods. The exponential slowing-down \tau \sim \exp(c L^{d-1}) resulting from barrier penetration is replaced by a power-law slowing-down \tau \sim L^p resulting from diffusion, provided that the distribution to be simulated is suitably chosen; see the review literature on these methods.

The value of the single-cluster algorithm is that the probability of choosing a cluster is proportional to its size (since we pick a random site), so the work is concentrated preferentially on larger clusters -- and we think that it is these clusters which are most important near the critical point. So it would not be surprising if z_{1\text{-}cluster,\,CPU} were smaller than z_{SW}. Preliminary measurements indicate that z_{1\text{-}cluster,\,CPU} is about the same as z_{SW} for the two-dimensional Ising model, but is significantly smaller for the three- and four-dimensional Ising models. But a convincing theoretical understanding of this behavior is lacking.
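For concreteness, here is a minimal sketch of one single-cluster update for the q-state Potts model (Python; the lattice layout and names are illustrative assumptions, and the move resamples the cluster's common value uniformly among the q - 1 other states, which reduces to the usual Wolff flip for q = 2).

    import numpy as np
    from collections import deque

    def single_cluster_flip(spins, q, p, rng):
        """One single-cluster update for the q-state Potts model on an L x L periodic
        square lattice (bond probability p = 1 - exp(-J)).  The cluster is grown from
        a random seed among like spins; all its spins are set to a new random value.
        Returns the cluster size, whose mean is proportional to the susceptibility.
        Illustrative sketch only (q >= 2 assumed)."""
        L = spins.shape[0]
        seed = (rng.integers(L), rng.integers(L))
        old = spins[seed]
        new = (old + 1 + rng.integers(q - 1)) % q    # uniform over the other q-1 values
        spins[seed] = new
        queue = deque([seed])
        size = 1
        while queue:
            x, y = queue.popleft()
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = (x + dx) % L, (y + dy) % L
                if spins[nx, ny] == old and rng.random() < p:
                    spins[nx, ny] = new          # flipping as we go also marks the site
                    queue.append((nx, ny))
                    size += 1
        return size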

Several other generalizations of the SW algorithm for Potts models have been proposed.

One is a multi-grid extension of the SW algorithm: the idea is to carry out only a partial FKSW transformation, but then to apply this concept recursively. This algorithm may have a dynamic critical exponent that is smaller than that of standard SW (but the claims that z = 0 are, in my opinion, unconvincing). A second generalization, which works only in two dimensions, augments the SW algorithm by transformations to the dual lattice. This algorithm appears to achieve a modest improvement in critical slowing-down in the scaling region |\beta - \beta_c| \lesssim L^{-1/\nu}.

Finally, the SW algorithm can be generalized in a straightforward manner to Potts lattice gauge theories (more precisely, lattice gauge theories with a finite abelian gauge group G and Potts "delta-function" action). Preliminary results for the three-dimensional Z_2 gauge theory yield a dynamic critical exponent roughly equal to that of the ordinary SW algorithm for the three-dimensional Ising model (to which the gauge theory is dual).

In the past year there has been a flurry of papers trying to generalize the SW algorithm to non-Potts models. Interesting proposals for a direct generalization of the SW algorithm were made by Edwards and Sokal and by Niedermayer. But the most promising ideas at present seem to be the embedding algorithms proposed by Brower and Tamayo for one-component \varphi^4 models, and by Wolff for multi-component O(n)-invariant models.

The idea of the embedding algorithms is to find Ising-like variables underlying a general spin variable, and then to update the resulting Ising model using the ordinary SW algorithm (or the single-cluster variant). For one-component spins, this embedding is the obvious decomposition into magnitude and sign. Consider, therefore, a one-

Note Added 1996: In addition to the generalizations discussed below, there are now SW-type algorithms for the Ashkin-Teller model, a clever SW-type algorithm for the fully frustrated Ising model, and a very interesting SW-type algorithm for antiferromagnetic Potts models, based on the "embedding" idea to be described below.

Note Added 1996: For at least some versions of the multi-grid SW algorithm, it can be proven that the bound z_{MGSW} \ge \alpha/\nu holds. Thus, the critical slowing-down is not completely eliminated in any model in which the specific heat is divergent.

component model with Hamiltonian

    H(\varphi) \;=\; -\beta \sum_{\langle xy \rangle} \varphi_x \varphi_y \;+\; \sum_x V(\varphi_x) ,

where \beta \ge 0 and V(\varphi) = V(-\varphi). We write

    \varphi_x \;=\; \varepsilon_x\, |\varphi_x| ,

where the signs \{\varepsilon\} are Ising variables. For fixed values of the magnitudes \{|\varphi|\}, the conditional probability distribution of the \{\varepsilon\} is given by an Ising model with ferromagnetic (though space-dependent) couplings J_{xy} = \beta |\varphi_x| |\varphi_y|. Therefore, the \{\varepsilon\} model can be updated by the Swendsen-Wang algorithm. (Heat-bath or MGMC sweeps must also be performed, in order to update the magnitudes.) For the two-dimensional \varphi^4 model, Brower and Tamayo find a dynamic critical behavior identical to that of the SW algorithm for the two-dimensional Ising model -- just as one would expect, based on the idea that the "important" collective modes in the \varphi^4 model are spin flips.

Wolff's embedding algorithm for n-component models (n \ge 2) is equally simple. Consider an O(n)-invariant model with Hamiltonian

    H(\varphi) \;=\; -\beta \sum_{\langle xy \rangle} \varphi_x \cdot \varphi_y \;+\; \sum_x V(|\varphi_x|) ,

with \beta \ge 0. Now fix a unit vector r \in \mathbb{R}^n; then any spin vector \varphi_x \in \mathbb{R}^n can be written uniquely (except for a set of measure zero) in the form

    \varphi_x \;=\; \varphi_x^\perp \;+\; \varepsilon_x\, |\varphi_x \cdot r|\, r ,

where

    \varphi_x^\perp \;=\; \varphi_x - (\varphi_x \cdot r)\, r

is the component of \varphi_x perpendicular to r, and

    \varepsilon_x \;=\; \mathrm{sgn}(\varphi_x \cdot r)

takes the values \pm 1. Therefore, for fixed values of the \{\varphi^\perp\} and \{|\varphi \cdot r|\}, the probability distribution of the \{\varepsilon\} is given by an Ising model with ferromagnetic couplings J_{xy} = \beta |\varphi_x \cdot r|\, |\varphi_y \cdot r|. The algorithm is then: Choose at random a unit vector r; fix the \{\varphi^\perp\} and \{|\varphi \cdot r|\} at their current values, and update the \{\varepsilon\} by the standard Swendsen-Wang algorithm (or the single-cluster variant). Flipping \varepsilon_x corresponds to reflecting \varphi_x in the hyperplane perpendicular to r.
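A minimal sketch of one such update, for an O(n) sigma model with unit-length spins (i.e. ignoring the radial potential V), might look as follows; the Python representation, the names, and the single-cluster growth with bond probability 1 - exp[-2\beta(\varphi_x \cdot r)(\varphi_y \cdot r)] (used only when this product is positive) are the assumptions of this sketch.

    import numpy as np
    from collections import deque

    def wolff_reflection_update(phi, beta, rng):
        """One Wolff embedding (single-cluster) update for an O(n) sigma model with
        H = -beta * sum_<xy> phi_x . phi_y on an L x L periodic lattice; phi has
        shape (L, L, n) with unit-length spins.  A random unit vector r is chosen,
        a cluster of the embedded Ising variables eps_x = sgn(phi_x . r) is grown,
        and the cluster spins are reflected in the hyperplane perpendicular to r.
        Illustrative sketch only."""
        L, _, n = phi.shape
        r = rng.normal(size=n)
        r /= np.linalg.norm(r)
        proj = phi @ r                      # (L, L) array of phi_x . r
        seed = (rng.integers(L), rng.integers(L))
        in_cluster = np.zeros((L, L), dtype=bool)
        in_cluster[seed] = True
        queue = deque([seed])
        while queue:
            x, y = queue.popleft()
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = (x + dx) % L, (y + dy) % L
                if in_cluster[nx, ny]:
                    continue
                # Effective Ising coupling J_xy = beta*|phi_x.r||phi_y.r| enters via
                # the SW bond probability for parallel embedded spins.
                coupling = 2.0 * beta * proj[x, y] * proj[nx, ny]
                if coupling > 0 and rng.random() < 1.0 - np.exp(-coupling):
                    in_cluster[nx, ny] = True
                    queue.append((nx, ny))
        # Reflect the cluster spins:  phi -> phi - 2 (phi . r) r
        phi[in_cluster] -= 2.0 * proj[in_cluster][:, None] * r[None, :]
        return int(in_cluster.sum())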

At first thought, it may seem strange (and somehow "unphysical") to try to find Ising-like (i.e. discrete) variables in a model with a continuous symmetry group. However, upon reflection (pardon the pun), one sees what is going on: if the spin configuration is slowly varying (e.g. a long-wavelength spin wave), then the clusters tend to break along the surfaces where J_{xy} is small, hence where \varphi \cdot r \approx 0. Then flipping \varepsilon_x on some clusters

corresponds to a "soft" change near the cluster boundaries but a "hard" change in the cluster interiors, i.e. a long-wavelength collective mode (see the Figure). So it is conceivable that the algorithm could have very small (or even zero!) critical slowing-down in models where the important large-scale collective modes are spin waves.

Figure: Action of the Wolff embedding algorithm on a long-wavelength spin wave. For simplicity, both spin space (\varphi) and physical space (x) are depicted as one-dimensional.

Even more strikingly, consider a two-dimensional XY-model configuration consisting of a widely separated vortex-antivortex pair; in the continuum limit this is given by

    \theta(z) \;=\; \mathrm{Im}\, \log(z - a) \;-\; \mathrm{Im}\, \log(z + a) ,

where z = x_1 + i x_2 and \varphi = (\cos\theta, \sin\theta). Then the surface \varphi \cdot r = 0 is an ellipse passing directly through the vortex and antivortex. Reflecting the spins inside this ellipse produces a configuration \{\varphi^{new}\} that is a continuous map of the doubly punctured plane into the semicircle |\varphi| = 1, \varphi \cdot r \ge 0. Such a map necessarily has zero winding number around the points \pm a. So the Wolff update has destroyed the vorticity!

Therefore, the key collective modes in the two-dimensional XY and O(n) models -- spin waves and (for the XY case) vortices -- are well "encoded" by the Ising variables \{\varepsilon\}. So it is quite plausible that critical slowing-down could be eliminated, or almost eliminated. In fact, tests of this algorithm on the two-dimensional XY (n = 2), classical Heisenberg (n = 3) and O(4) models are consistent with the complete absence of critical slowing-down.

In view of the extraordinary success of the Wolff algorithm for spin models, it is tempting to try to extend it to lattice gauge theories with continuous gauge group (for example U(1), SU(N) or SO(N)). But I have nothing to report at present.

With r = (\cos\omega, \sin\omega), the equation of this ellipse is x_1^2 + (x_2 - a\tan\omega)^2 = a^2 \sec^2\omega.

This picture of the action of the Wolff algorithm on vortex-antivortex pairs was developed in discussions with Richard Brower and Robert Swendsen.

Note Added 1996: A general theory of Wolff-type embedding algorithms, as applied to nonlinear \sigma-models (spin models taking values in a Riemannian manifold M), is now available. Roughly speaking, we argue that a Wolff-type embedding algorithm can work well, in the sense of having a dynamic

Algorithms for the Self-Avoiding Walk

An N-step self-avoiding walk (SAW) \omega on a lattice L is a sequence of distinct points \omega_0, \omega_1, \ldots, \omega_N in L such that each point is a nearest neighbor of its predecessor. Let c_N (resp. c_N(x)) be the number of N-step SAWs starting at the origin and ending anywhere (resp. ending at x). Let \langle \omega_N^2 \rangle be the mean-square end-to-end distance of an N-step SAW. These quantities are believed to have the asymptotic behavior

    c_N \;\sim\; \mu^N N^{\gamma - 1}

    c_N(x) \;\sim\; \mu^N N^{\alpha_{sing} - 2}     (x \neq 0 fixed)

    \langle \omega_N^2 \rangle \;\sim\; N^{2\nu}

as N \to \infty; here \gamma, \alpha_{sing} and \nu are critical exponents, while \mu (the connective constant of the lattice) is the analogue of a critical temperature. The SAW has direct application in polymer physics, and is indirectly relevant to ferromagnetism and quantum field theory by virtue of its equivalence with the n \to 0 limit of the n-vector model.

The SAW has some advantages over spin systems for Monte Carlo work. Firstly, one can work directly with SAWs on an infinite lattice; there are no systematic errors due to finite-volume corrections. Secondly, there is no L^d (or \xi^d) factor in the computational work, so one can go closer to criticality. Thus, the SAW is an exceptionally advantageous "laboratory" for the numerical study of critical phenomena.

Different aspects of the SAW can be probed in three different ensembles:

- Free-endpoint grand canonical ensemble (variable N, variable x)

- Fixed-endpoint grand canonical ensemble (variable N, fixed x)

- Canonical ensemble (fixed N, variable x)

In the remainder of this section we survey some typical Monte Carlo algorithms for these ensembles.

Free-endpoint grand canonical ensemble. Here the configuration space S is the set of all SAWs, of arbitrary length, starting at the origin and ending anywhere. The grand

critical exponent z significantly less than 2 only if M is a discrete quotient of a product of spheres, for example a sphere S^{N-1} or a real projective space RP^{N-1} = S^{N-1}/Z_2. In particular, we do not expect Wolff-type embedding algorithms to work well for \sigma-models taking values in SU(N) for N \ge 3.

Note Added 1996: An extensive and up-to-date review of Monte Carlo algorithms for the self-avoiding walk is now available, as is a briefer version.

The proper terminology for these ensembles is unclear to me. Perhaps the grand canonical and canonical ensembles ought to be called "canonical" and "microcanonical", respectively, reserving the term "grand canonical" for ensembles of many SAWs. [Note Added 1996: I now prefer calling these the "variable-length" and "fixed-length" ensembles, respectively, and avoiding the "...canonical" terms entirely; this avoids ambiguity.]

partition function is

    \Xi(\beta) \;=\; \sum_{N=0}^{\infty} \beta^N c_N

and the Gibbs measure is

    \pi(\omega) \;=\; \Xi(\beta)^{-1}\, \beta^{|\omega|} .

The "monomer activity" \beta is a user-chosen parameter satisfying 0 < \beta < \beta_c \equiv 1/\mu. As \beta approaches the critical activity \beta_c, the average walk length \langle N \rangle tends to infinity. The connective constant \mu and the critical exponent \gamma can be estimated from the Monte Carlo data using the method of maximum likelihood.

A dynamic Monte Carlo algorithm for this ensemble was proposed by Berretti and Sokal. The algorithm's elementary moves are as follows: either one attempts to append a new step to the walk, with equal probability in each of the q possible directions (here q is the coordination number of the lattice), or else one deletes the last step from the walk. In the former case, one must check that the proposed new walk is self-avoiding; if it isn't, then the attempted move is rejected and the old configuration is counted again in the sample ("null transition"). If an attempt is made to delete a step from an already-empty walk, then a null transition is also made. The relative probabilities of \Delta N = +1 and \Delta N = -1 attempts are chosen to be

    P(\Delta N = +1 \text{ attempt}) \;=\; \frac{q\beta}{1 + q\beta}

    P(\Delta N = -1 \text{ attempt}) \;=\; \frac{1}{1 + q\beta}

Therefore, the transition probability matrix is

    p(\omega \to \omega') \;=\;
      \begin{cases}
        \dfrac{\beta}{1 + q\beta}\, \chi_{SAW}(\omega')   & \text{if } \omega' \supset \omega \\[1ex]
        \dfrac{1}{1 + q\beta}                             & \text{if } \omega \supset \omega', \text{ or } \omega = \omega' = \emptyset \\[1ex]
        \dfrac{\beta}{1 + q\beta}\, A(\omega)             & \text{if } \omega = \omega' \neq \emptyset
      \end{cases}

where

    \chi_{SAW}(\omega') \;=\; \begin{cases} 1 & \text{if } \omega' \text{ is a SAW} \\ 0 & \text{if } \omega' \text{ is not a SAW} \end{cases}

Here \omega' \supset \omega denotes that the walk \omega' is a one-step extension of \omega, and A(\omega) is the number of non-self-avoiding walks \omega' with \omega' \supset \omega. It is easily verified that this transition matrix satisfies detailed balance for the Gibbs distribution \pi, i.e.

    \pi(\omega)\, p(\omega \to \omega') \;=\; \pi(\omega')\, p(\omega' \to \omega)

for all \omega, \omega' \in S. It is also easily verified that the algorithm is ergodic (= irreducible): to get from a walk \omega to another walk \omega', it suffices to use \Delta N = -1 moves to transform \omega into the empty walk, and then use \Delta N = +1 moves to build up the walk \omega'.
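The elementary move of this algorithm is easy to code. The following Python sketch (the data structure -- a list of sites -- and the helper `neighbors` function are illustrative choices, and the O(N) self-avoidance test should be replaced by a hash table in any serious implementation) performs one attempted \Delta N = \pm 1 move with the probabilities given above.

    import random

    def berretti_sokal_step(walk, beta, q, neighbors):
        """One elementary move of the Berretti-Sokal algorithm for variable-length
        SAWs starting at the origin.  `walk` is a list of lattice sites beginning
        with the origin; `neighbors(site)` returns the q nearest neighbors of `site`.
        Illustrative sketch only."""
        if random.random() < q * beta / (1.0 + q * beta):
            # Delta N = +1 attempt: append a uniformly random step.
            new_site = random.choice(neighbors(walk[-1]))
            if new_site not in walk:          # self-avoidance check (naive O(N))
                walk.append(new_site)
            # else: null transition
        else:
            # Delta N = -1 attempt: delete the last step.
            if len(walk) > 1:                 # walk[0] is the origin; never remove it
                walk.pop()
            # else: null transition on the empty walk
        return walk

    # Square-lattice example (beta chosen below the critical activity 1/mu):
    def sq_neighbors(site):
        x, y = site
        return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

    walk = [(0, 0)]
    for _ in range(10**5):
        walk = berretti_sokal_step(walk, beta=0.25, q=4, neighbors=sq_neighbors)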

Let us now analyze the critical slowing-down of the Berretti-Sokal algorithm. We can argue heuristically that

    \tau \;\sim\; \langle N \rangle^2 .

To see this, consider the quantity N(t) \equiv |\omega(t)|, the number of steps in the walk at time t. This quantity executes, crudely speaking, a random walk (with drift) on the nonnegative integers; the average time to go from some point N to the point 0 (i.e. the empty walk) is of order N^2. Now, each time the empty walk is reached, all memory of the past is erased; future walks are then independent of past ones. Thus, the autocorrelation time ought to be of order \langle N^2 \rangle, or equivalently \langle N \rangle^2.

This heuristic argument can be turned into a rigorous proof of a lower bound \tau \ge const \times \langle N \rangle^2. However, as an argument for an upper bound of the same form it is not entirely convincing, as it assumes without proof that the slowest mode is the one represented by N(t). With considerably more work it is possible to prove an upper bound on \tau that is only slightly weaker than the heuristic prediction: \tau \le const \times \langle N \rangle^{1+\gamma}. (Note that the critical exponent \gamma is believed to equal 43/32 in dimension d = 2, \approx 1.16 in d = 3, and 1 in d \ge 4.) In fact, we suspect that the true behavior is \tau \sim \langle N \rangle^p for some exponent p strictly between 2 and 1 + \gamma. A deeper understanding of the dynamic critical behavior of the Berretti-Sokal algorithm would be of definite value.

It is worth comparing the computational work required for SAW versus Ising simulations: \langle N \rangle^2 \sim \xi^{2/\nu} \approx \xi^{3.4} for the d = 3 SAW, versus \xi^{d+z} \approx \xi^{5} (resp. \approx \xi^{3.5}) for the d = 3 Ising model using the Metropolis (resp. Swendsen-Wang) algorithm. This vindicates our assertion that the SAW is an advantageous model for Monte Carlo studies of critical phenomena.

Fixed-endpoint grand canonical ensemble. The configuration space S is the set of all SAWs, of arbitrary length, starting at the origin and ending at the fixed site x \neq 0. The ensemble is as in the free-endpoint case, with c_N replaced by c_N(x). The connective constant \mu and the critical exponent \alpha_{sing} can be estimated from the Monte Carlo data using the method of maximum likelihood.

A dynamic Monte Carlo algorithm for this ensemble was proposed by Berg and Foerster and by Aragao de Carvalho, Caracciolo and Froehlich (BFACF). The elementary moves are local deformations of the chain, with \Delta N = 0, \pm 2. The critical slowing-down in the BFACF algorithm is quite subtle: On the one hand, Sokal and Thomas have proven the surprising result that \tau_{exp} = +\infty for all \beta > 0 (see the final lecture below). On the other hand, numerical experiments show that \tau_{int,N} \sim \langle N \rangle^{\approx 3} (in d = 2). Clearly the BFACF dynamics is not well understood at present; further work, both theoretical and numerical, is needed.

In addition, Caracciolo, Pelissetto and Sokal are studying a "hybrid" algorithm that combines local (BFACF) moves with non-local ("cut-and-paste") moves. The critical slowing-down, measured in CPU units, appears to be reduced slightly compared to the pure BFACF algorithm.

Canonical ensemble. Algorithms for this ensemble based on local deformations of the chain have been used by polymer physicists for more than 25 years. So the recent proof that all such algorithms are nonergodic (= not irreducible) comes as a

All these proofs are discussed in the final lecture.

slight embarrassment! Fortunately, there does exist a non-local fixed-N algorithm which is ergodic: the "pivot" algorithm, invented by Lal and independently reinvented by MacDonald et al. and by Madras. The elementary move is as follows: choose at random a pivot point k along the walk (0 \le k \le N - 1); choose at random a non-identity element of the symmetry group of the lattice (rotation or reflection); then apply the symmetry-group element to \omega_{k+1}, \ldots, \omega_N, using \omega_k as a pivot. The resulting walk is accepted if it is self-avoiding; otherwise, it is rejected, and the walk \omega is counted once more in the sample. It can be proven that this algorithm is ergodic and preserves the equal-weight probability distribution.
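A minimal sketch of one attempted pivot move on the square lattice follows (Python; the explicit list of the seven non-identity lattice symmetries and the naive O(N) acceptance test are choices of this sketch -- a hash-table implementation is far more efficient in practice).

    import random

    # The seven non-identity lattice symmetries of Z^2 (rotations and reflections),
    # acting on displacement vectors.
    SYMMETRIES = [
        lambda x, y: (-y, x), lambda x, y: (-x, -y), lambda x, y: (y, -x),   # rotations
        lambda x, y: (-x, y), lambda x, y: (x, -y),                          # axis reflections
        lambda x, y: (y, x),  lambda x, y: (-y, -x),                         # diagonal reflections
    ]

    def pivot_step(walk):
        """One attempted pivot move for a fixed-length SAW, given as a list of sites
        starting at the origin.  Choose a pivot point and a random non-identity
        lattice symmetry, transform the part of the walk beyond the pivot, and accept
        only if the result is self-avoiding.  Illustrative sketch only."""
        n = len(walk) - 1                        # number of steps N
        k = random.randrange(n)                  # pivot point omega_k, 0 <= k <= N-1
        g = random.choice(SYMMETRIES)
        px, py = walk[k]
        new_tail = [(px + dx, py + dy)
                    for (dx, dy) in (g(x - px, y - py) for (x, y) in walk[k + 1:])]
        head = set(walk[:k + 1])
        # The transformed tail is automatically self-avoiding (g is an isometry), so
        # it suffices to check it against the untransformed initial piece of the walk.
        if all(site not in head for site in new_tail):
            return walk[:k + 1] + new_tail       # accepted
        return walk                              # rejected: old walk is counted again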

At first thought, the pivot algorithm sounds terrible (at least it did to me): for N large, nearly all the proposed moves will get rejected. This is in fact true: the acceptance fraction behaves \sim N^{-p} as N \to \infty, where p \approx 0.19 in d = 2. On the other hand, the pivot moves are very radical: after very few (say 5 or 10) accepted moves, the SAW will have reached an "essentially new" conformation. One conjectures, therefore, that \tau \sim N^p. Actually it is necessary to be a bit more careful: for global observables f (such as the end-to-end distance \omega_N^2) one expects \tau_{int,f} \sim N^p, but local observables (such as the angle between two given successive bonds of the walk) are expected to evolve a factor of N more slowly, \tau_{int,f} \sim N^{1+p}. Thus, the slowest mode is expected to behave as \tau_{exp} \sim N^{1+p}. For the pivot algorithm applied to the ordinary random walk, one can calculate the dynamical behavior exactly: for global observables f, the autocorrelation function behaves roughly like

    \rho_{ff}(t) \;\approx\; \frac{1}{N} \sum_{i=1}^{N} \Big( 1 - \frac{i}{N} \Big)^{|t|} ,

from which it follows that

    \tau_{int,f} \;\sim\; \log N

    \tau_{exp,f} \;\sim\; N

-- in agreement with our heuristic argument, modulo logarithms. For the SAW, it is found numerically that \tau_{int,f} \sim N^{\approx 0.20} in d = 2, also in close agreement with the heuristic argument.

A careful analysis of the computational complexity of the pivot algorithm shows that one "effectively independent" sample (at least as regards global observables) can be produced in a computer time of order N. This is a factor \sim N more efficient than the Berretti-Sokal algorithm, a fact which opens up exciting prospects for high-precision Monte Carlo studies of critical phenomena in the SAW. Thus, with a modest computational effort, Madras and I found \nu = 0.7496 \pm 0.0007 (95% confidence limits) for two-dimensional SAWs of lengths 200 \le N \le 10000. We hope to carry out soon a convincing numerical test of hyperscaling in the three-dimensional SAW.

Note Added 1996: This study has now been carried out, and yields \nu = 0.5877 \pm 0.0006 (subjective 68% confidence limits), based on SAWs of length up to 80000 steps. Proper treatment of corrections to scaling is crucial in obtaining this estimate. We also show that the interpenetration ratio \Psi approaches its limiting value \Psi^* \approx 0.247 from above, contrary to the prediction of the two-parameter renormalization-group theory. We have critically reexamined this theory and shown where the error lies.

Rigorous Analysis of Dynamic Monte Carlo Algorithms

In this final lecture, we discuss techniques for proving rigorous lower and upper bounds on the autocorrelation times of dynamic Monte Carlo algorithms. This topic is of primary interest, of course, to mathematical physicists: it constitutes the first steps toward a rigorous theory of dynamic critical phenomena, along lines parallel to the well-established rigorous theory of static critical phenomena. But these proofs are, I believe, also of some importance for practical Monte Carlo work, as they give insight into the physical basis of critical slowing-down, and may point towards improved algorithms with reduced critical slowing-down.

There is a big difference between the techniques used for proving lower and upper bounds, and it is easy to understand this physically. To prove a lower bound on the autocorrelation time, it suffices to find one physical reason why the dynamics should be slow, i.e. to find one "slow mode". This physical insight can often be converted directly into a rigorous proof, using the variational method described below. On the other hand, to prove an upper bound on the autocorrelation time, it is necessary to understand all conceivable physical reasons for slowness, and to prove that none of them cause too great a slowing-down. This is extremely difficult to do, and has been carried to completion in very few cases.

We shall be obliged to restrict attention to reversible Markov chains (i.e. those satisfying the detailed-balance condition), as these give rise to self-adjoint operators. Non-reversible Markov chains, corresponding to non-self-adjoint operators, are much more difficult to analyze rigorously.

The principal method for proving lower bounds on \tau_{exp} (and in fact on \tau_{int,f}) is the variational (or Rayleigh-Ritz) method. Let us recall some of the theory of reversible Markov chains discussed earlier in these notes. Let \pi be a probability measure, and let P = \{p_{xy}\} be a transition probability matrix that satisfies detailed balance with respect to \pi. Now P acts naturally on functions (observables) according to

    (Pf)(x) \;=\; \sum_y p_{xy}\, f(y) .

In particular, when acting on the space l^2(\pi) of \pi-square-integrable functions, the operator P is a self-adjoint contraction; its spectrum therefore lies in the interval [-1, 1]. Moreover, P has an eigenvalue 1, with eigenvector equal to the constant function 1. Let \Pi be the orthogonal projector in l^2(\pi) onto the constant functions. Then, for each real-valued observable f \in l^2(\pi), we have

    C_{ff}(t) \;=\; (f, (P^{|t|} - \Pi) f) \;=\; (f, (I - \Pi)\, P^{|t|}\, (I - \Pi) f) ,

where (g, h) \equiv \langle \overline{g}\, h \rangle_\pi denotes the inner product in l^2(\pi).

By the spectral theorem, this can be written as

    C_{ff}(t) \;=\; \int_{-1}^{1} \lambda^{|t|}\, d\rho_{ff}(\lambda) ,

where d\rho_{ff} is a positive measure. It follows that

    \tau_{int,f} \;=\; \frac{1}{2}\, \frac{ \int_{-1}^{1} \frac{1+\lambda}{1-\lambda}\, d\rho_{ff}(\lambda) }{ \int_{-1}^{1} d\rho_{ff}(\lambda) }
                 \;\ge\; \frac{1}{2}\, \frac{ 1 + \rho_{ff}(1) }{ 1 - \rho_{ff}(1) }

by Jensen's inequality (since the function \lambda \mapsto (1+\lambda)/(1-\lambda) is convex).

Our method will be to compute explicitly a lower bound on the normalized autocorrelation function at time lag 1,

    \rho_{ff}(1) \;=\; \frac{ C_{ff}(1) }{ C_{ff}(0) } ,

for a suitably chosen trial observable f. Equivalently, we compute an upper bound on the Rayleigh quotient

    \frac{ (f, (I - P) f) }{ (f, (I - \Pi) f) } \;=\; \frac{ C_{ff}(0) - C_{ff}(1) }{ C_{ff}(0) } \;=\; 1 - \rho_{ff}(1) .

The crux of the matter is to pick a trial observable f that has a strong enough overlap with the "slowest mode". A useful formula is

    (f, (I - P) f) \;=\; \frac{1}{2} \sum_{x,y} \pi_x\, p_{xy}\, |f(x) - f(y)|^2 .

That is, the numerator of the Rayleigh quotient is half the mean-square change in f in a single time step.
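As a numerical illustration of the bound \tau_{int,f} \ge (1/2)[1 + \rho_{ff}(1)]/[1 - \rho_{ff}(1)], here is a small Python sketch that estimates \rho_{ff}(1) from a time series and evaluates the bound; the AR(1) test process at the end (for which the bound happens to be an equality) is purely an illustrative assumption.

    import numpy as np

    def tau_int_lower_bound(series):
        """Given a time series f(x_0), f(x_1), ... from a reversible chain in
        equilibrium, estimate the lag-1 autocorrelation rho_ff(1) and return the
        Jensen-inequality lower bound (1/2)(1+rho)/(1-rho) on tau_int.
        Illustrative sketch; no error analysis is attempted."""
        f = np.asarray(series, dtype=float)
        f = f - f.mean()
        c0 = np.mean(f * f)
        c1 = np.mean(f[:-1] * f[1:])
        rho1 = c1 / c0
        return 0.5 * (1.0 + rho1) / (1.0 - rho1)

    # Sanity check on an AR(1) process x_{t+1} = a*x_t + noise, whose exact
    # autocorrelation function is a^|t| and whose tau_int is (1/2)(1+a)/(1-a):
    rng = np.random.default_rng(1)
    a, x, xs = 0.9, 0.0, []
    for _ in range(200000):
        x = a * x + rng.normal()
        xs.append(x)
    print(tau_int_lower_bound(xs))   # should be close to 9.5 = (1/2)(1.9/0.1)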

Example 1: Single-site heat-bath algorithm. Let us consider, for simplicity, a translation-invariant spin model in a periodic box of volume V. Let P_i = \{P_i(\sigma \to \sigma')\} be the transition probability matrix associated with applying the heat-bath operation at site i. Then the operator P_i has the following properties:

It also follows from the spectral representation that \rho_{ff}(t) \ge \rho_{ff}(1)^{|t|} for even values of t. Moreover, this holds for odd values of t if d\rho_{ff} is supported on \lambda \ge 0 (though not necessarily otherwise); this is the case for the heat-bath and Swendsen-Wang algorithms, in which P \ge 0. Therefore, the decay as t \to \infty of \rho_{ff}(t) is also bounded below in terms of \rho_{ff}(1).

Probabilistically, P_i is the conditional expectation E(\,\cdot\, | \{\sigma_j\}_{j \neq i}). Analytically, P_i is the orthogonal projection in l^2(\pi) onto the linear subspace consisting of functions depending only on \{\sigma_j\}_{j \neq i}.

(a) P_i is an orthogonal projection. (In particular, P_i^2 = P_i and 0 \le P_i \le I.)

(b) P_i 1 = 1. (In particular, \Pi P_i = P_i \Pi = \Pi.)

(c) P_i f = f if f depends only on \{\sigma_j\}_{j \neq i}. (In fact, a stronger property holds: P_i(fg) = f\, P_i g if f depends only on \{\sigma_j\}_{j \neq i}. But we shall not need this.)

For simplicity, we consider only random site updating. Then the transition matrix of the heat-bath algorithm is

    P \;=\; \frac{1}{V} \sum_i P_i .

We shall use the notation of the Ising model, but the same proof applies to much more general models.
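For definiteness, the heat-bath operator P_i for the Ising model can be realized as follows (a Python sketch; the lattice representation and the random-site sweep are illustrative choices). Recording the total magnetization along such a run and estimating \rho_{MM}(1) as in the previous sketch gives a direct numerical check of the bound derived below.

    import numpy as np

    def heat_bath_sweep(spins, beta, rng):
        """One random-site-updating sweep (V hits) of the single-site heat-bath
        algorithm for the 2D Ising model H = -beta * sum_<xy> s_x s_y, s = +-1, on
        an L x L periodic lattice.  Each hit resamples one spin from its conditional
        distribution given its neighbours -- this is the operator P_i of Example 1.
        Illustrative sketch only."""
        L = spins.shape[0]
        for _ in range(L * L):
            x, y = rng.integers(L), rng.integers(L)
            h = (spins[(x + 1) % L, y] + spins[(x - 1) % L, y]
                 + spins[x, (y + 1) % L] + spins[x, (y - 1) % L])
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))   # P(s_xy = +1 | neighbours)
            spins[x, y] = 1 if rng.random() < p_up else -1
        return spins

    # M = spins.sum() recorded once per sweep is the "slow mode" used below.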

As explained earlier, the critical slowing-down of the local algorithms (such as single-site heat-bath) arises from the fact that large regions in which the net magnetization is positive or negative tend to move slowly. In particular, the total magnetization of the lattice fluctuates slowly. So it is natural to expect that the total magnetization M = \sum_i \sigma_i will be a "slow mode". Indeed, let us compute an upper bound on the Rayleigh quotient with f = M. The denominator is

    (M, (I - \Pi) M) \;=\; \langle M^2 \rangle - \langle M \rangle^2 \;=\; V \chi

(since \langle M \rangle = 0 by symmetry). The numerator is

    (M, (I - P) M) \;=\; \frac{1}{V} \sum_{i,j,k} (\sigma_i, (I - P_j)\, \sigma_k)
                   \;=\; \frac{1}{V} \sum_i (\sigma_i, (I - P_i)\, \sigma_i)
                   \;\le\; \frac{1}{V} \sum_i (\sigma_i, \sigma_i)
                   \;=\; \frac{1}{V} \sum_i \langle \sigma_i^2 \rangle \;=\; 1 .

Here we first used properties (a) and (c) to deduce that (\sigma_i, (I - P_j)\, \sigma_k) = 0 unless i = j = k, and then used property (a) to bound (\sigma_i, (I - P_i)\, \sigma_i). It follows that

    1 - \rho_{MM}(1) \;\le\; \frac{1}{V\chi}

and hence that

    \tau_{int,M} \;\ge\; V\chi - \frac{1}{2} .

Note, however, that here time is measured in hits of a single site. If we measure time in hits per site, then we conclude that

    \tau^{HPS}_{int,M} \;\ge\; \chi - \frac{1}{2V} \;\approx\; \chi .

This proves a lower bound on the dynamic critical exponent z, namely z \ge \gamma/\nu. (Here \gamma and \nu are the static critical exponents for the susceptibility and correlation length, respectively.) Note that by the usual static scaling law, \gamma/\nu = 2 - \eta, and for most models \eta is very close to zero. So this argument "almost" makes rigorous (as a lower bound) the heuristic argument that z \approx 2.

Virtually the same argument applies, in fact, to any reversible single-site algorithm (e.g. Metropolis). The only difference is that the property 0 \le P_i \le I is replaced by -I \le P_i \le I, so the bound on the numerator is a factor of 2 larger. Hence

    \tau^{HPS}_{int,M} \;\ge\; \frac{\chi}{2} - \frac{1}{2V} \;\approx\; \frac{\chi}{2} .

Example 2: Swendsen-Wang algorithm. Recall that the SW algorithm simulates the joint FKSW model by alternately applying the conditional distributions of \{n\} given \{\sigma\} and of \{\sigma\} given \{n\} -- that is, by alternately generating new bond occupation variables (independent of the old ones) given the spins, and new spin variables (independent of the old ones) given the bonds. The transition matrix P_{SW} = P_{SW}(\{\sigma, n\} \to \{\sigma', n'\}) is therefore a product

    P_{SW} \;=\; P_{bond}\, P_{spin} ,

where P_{bond} is the conditional expectation operator E(\,\cdot\, | \{\sigma\}) and P_{spin} is E(\,\cdot\, | \{n\}). We shall use the variational method with f chosen to be the bond density

    f \;=\; \mathcal{N} \;\equiv\; \sum_{\langle ij \rangle} n_{ij} .

To lighten the notation, let us write \delta_b \equiv \delta_{\sigma_i, \sigma_j} for a bond b = \langle ij \rangle. We then have

    E(n_b \,|\, \{\sigma\}) \;=\; p_b\, \delta_b

    E(n_b n_{b'} \,|\, \{\sigma\}) \;=\; p_b\, p_{b'}\, \delta_b\, \delta_{b'}     \text{ for } b \neq b' .

It follows that

    \langle n_b(t=0)\, n_{b'}(t=1) \rangle
       \;=\; \langle n_b\, E( E(n_{b'} | \{\sigma\}) \,|\, \{n\} ) \rangle
       \;=\; \langle n_b\, E(n_{b'} | \{\sigma\}) \rangle
       \;=\; \langle E(n_b | \{\sigma\})\, E(n_{b'} | \{\sigma\}) \rangle
       \;=\; p_b\, p_{b'}\, \langle \delta_b\, \delta_{b'} \rangle .

The corresponding truncated (connected) correlation function is clearly

    \langle n_b(t=0) \,;\, n_{b'}(t=1) \rangle \;=\; p_b\, p_{b'}\, \langle \delta_b \,;\, \delta_{b'} \rangle ,

where \langle A \,;\, B \rangle \equiv \langle AB \rangle - \langle A \rangle \langle B \rangle. We have thus expressed a dynamic correlation function of the SW algorithm in terms of a static correlation function of the Potts model.

Now let us compute the same quantity at time lag 0: for b \neq b' we have

    \langle n_b(t=0)\, n_{b'}(t=0) \rangle \;=\; p_b\, p_{b'}\, \langle \delta_b\, \delta_{b'} \rangle

and hence

    \langle n_b(t=0) \,;\, n_{b'}(t=0) \rangle \;=\; p_b\, p_{b'}\, \langle \delta_b \,;\, \delta_{b'} \rangle .

On the other hand, for b = b' we clearly have

    \langle n_b(t=0) \,;\, n_b(t=0) \rangle \;=\; \langle n_b \rangle - \langle n_b \rangle^2 \;=\; p_b \langle \delta_b \rangle - p_b^2 \langle \delta_b \rangle^2 .

Consider now the usual case in which

    p_b \;=\; \begin{cases} p & \text{for } b \in B \\ 0 & \text{otherwise} \end{cases}

for some family of bonds B. Combining the formulae above and summing over b, b', we obtain

    \langle \mathcal{N}(t=0) \,;\, \mathcal{N}(t=1) \rangle \;=\; p^2\, \langle E \,;\, E \rangle

    \langle \mathcal{N}(t=0) \,;\, \mathcal{N}(t=0) \rangle \;=\; p^2\, \langle E \,;\, E \rangle \;+\; p(1-p)\, \langle E \rangle ,

where E \equiv \sum_{b \in B} \delta_b is the energy. Hence the normalized autocorrelation function at time lag 1 is exactly

    \rho_{\mathcal{N}\mathcal{N}}(1) \;=\; \frac{ \langle \mathcal{N}(0) \,;\, \mathcal{N}(1) \rangle }{ \langle \mathcal{N}(0) \,;\, \mathcal{N}(0) \rangle }
                                     \;=\; 1 - \frac{ (1-p)\, \mathcal{E} }{ p\, C_H + (1-p)\, \mathcal{E} } ,

where \mathcal{E} = V^{-1} \langle E \rangle is the mean energy and C_H = V^{-1} \langle E \,;\, E \rangle is the specific heat (V is the volume). At criticality, p \to p_{crit} > 0 and \mathcal{E} \to \mathcal{E}_{crit} > 0, so

    \rho_{\mathcal{N}\mathcal{N}}(1) \;\ge\; 1 - \frac{const}{C_H} .

We now remark that, although P_{SW} = P_{bond} P_{spin} is not self-adjoint, the modified transition matrix P'_{SW} = P_{spin} P_{bond} P_{spin} is self-adjoint (and positive semidefinite), and \mathcal{N} has the same autocorrelation function for both:

    (\mathcal{N}, (P_{bond} P_{spin})^t \mathcal{N}) \;=\; (\mathcal{N}, (P_{spin} P_{bond} P_{spin})^t \mathcal{N}) .

It follows that

    \rho_{\mathcal{N}\mathcal{N}}(t) \;\ge\; \rho_{\mathcal{N}\mathcal{N}}(1)^{|t|} \;\ge\; \Big( 1 - \frac{const}{C_H} \Big)^{|t|}

and hence

    \tau_{int,\mathcal{N}} \;\ge\; const \times C_H .

This proves a lower bound on the dynamic critical exponent z_{SW}, namely z_{SW} \ge \alpha/\nu. (Here \alpha and \nu are the static critical exponents for the specific heat and correlation length, respectively.) For the two-dimensional Potts models with q = 2, 3, 4, it is known exactly (but non-rigorously) that \alpha/\nu = 0, 2/5, 1, respectively, with multiplicative logarithmic corrections for q = 4. The bound on z_{SW} may conceivably be sharp (or sharp up to a logarithm) for q = 4.

Example 3: Berretti-Sokal algorithm for SAWs. Let us consider the observable

    f(\omega) \;=\; |\omega| \;\equiv\; N ,

the total number of bonds in the walk. We have argued heuristically that \tau \sim \langle N \rangle^2; we can now make this argument rigorous (as a lower bound). Indeed, it suffices to use the formula for the Rayleigh-quotient numerator given above: since the maximum change of |\omega| in a single time step is 1, it follows that

    (f, (I - P) f) \;\le\; \frac{1}{2} .

On the other hand, the denominator of the Rayleigh quotient is

    (f, (I - \Pi) f) \;=\; \langle N^2 \rangle - \langle N \rangle^2 .

Assuming the usual scaling behavior c_N \sim \mu^N N^{\gamma-1}, we have

    \langle N \rangle \;\approx\; \frac{\gamma}{1 - \beta\mu}

    \langle N^2 \rangle \;\approx\; \frac{\gamma(\gamma+1)}{(1 - \beta\mu)^2}

asymptotically as \beta \to \beta_c = \mu^{-1}, and hence

    \tau_{int,N} \;\ge\; const \times \langle N \rangle^2 .

A very different approach to proving lower bounds on the autocorrelation time \tau_{exp} is the minimum hitting-time argument. Consider a Markov chain with transition matrix P satisfying detailed balance for some probability measure \pi. If A and B are subsets of the state space S, let T_{AB} be the minimum time for getting from A to B with nonzero probability, i.e.

    T_{AB} \;=\; \min\{ n \colon\; p^{(n)}_{xy} > 0 \text{ for some } x \in A,\; y \in B \} .

Then the theorem asserts that if T_{AB} is large, and this is not "justified" by the rarity of A and/or B in the equilibrium distribution \pi, then the autocorrelation time \tau_{exp} must be large. More precisely:

Theorem (minimum hitting-time bound). Consider a Markov chain with transition matrix P satisfying detailed balance for the probability measure \pi. Let T_{AB} be defined as above. Then

    \tau_{exp} \;\ge\; \sup_{A,B \subset S}\; \frac{ 2\,(T_{AB} - 1) }{ \log \dfrac{1}{\pi(A)\,\pi(B)} } .

Proof: Let A, B \subset S, and let n < T_{AB}. Then, by definition of T_{AB},

    (\chi_A, P^n \chi_B)_{l^2(\pi)} \;=\; \sum_{x \in A,\; y \in B} \pi_x\, p^{(n)}_{xy} \;=\; 0 .

On the other hand, P1 = P^*1 = 1. It follows that

    \big( \chi_A - \pi(A)1,\; P^n [\chi_B - \pi(B)1] \big)_{l^2(\pi)} \;=\; -\pi(A)\,\pi(B) .

Now, since P is a self-adjoint operator, we have

    \| P^n \restriction 1^\perp \| \;=\; \| P \restriction 1^\perp \|^n \;=\; R^n ,

where R \equiv e^{-1/\tau_{exp}} is the spectral radius (= norm) of P \restriction 1^\perp. Hence, by the Schwarz inequality,

    \big| \big( \chi_A - \pi(A)1,\; P^n [\chi_B - \pi(B)1] \big)_{l^2(\pi)} \big|
        \;\le\; R^n\, \| \chi_A - \pi(A)1 \|_{l^2(\pi)}\, \| \chi_B - \pi(B)1 \|_{l^2(\pi)}
        \;=\; R^n\, \pi(A)^{1/2} [1 - \pi(A)]^{1/2}\, \pi(B)^{1/2} [1 - \pi(B)]^{1/2}
        \;\le\; R^n\, \pi(A)^{1/2}\, \pi(B)^{1/2} .

Combining these two relations, and taking n = T_{AB} - 1, we arrive, after a little algebra, at the claimed inequality.

Remark: Using Chebyshev polynomials, a stronger bound can be proven: roughly speaking, \tau_{exp} is bounded below by the square of the right-hand side above. For details, see the Sokal-Thomas paper cited in the references.

Let us apply this theorem to the BFACF algorithm for variable-length fixed-endpoint SAWs. Let \omega^{(0)} be a fixed short walk from 0 to x, and let \omega^{(n)} be a quasi-rectangular walk from 0 to x of linear size n. Then \pi(\omega^{(0)}) > 0 and \pi(\omega^{(n)}) \ge e^{-cn}, so that -\log[\pi(\omega^{(0)})\,\pi(\omega^{(n)})] \le c' n. On the other hand -- and this is the key point -- the minimum time required to get from \omega^{(n)} to \omega^{(0)} (or vice versa) in the BFACF algorithm is of order n^2, since the area enclosed between \omega^{(n)} and \omega^{(0)} can change by at most one unit in a local deformation. Applying the theorem with A = \{\omega^{(n)}\} and B = \{\omega^{(0)}\}, we obtain

    \tau_{exp} \;\ge\; \sup_n\; \frac{ const \times n^2 }{ c'\, n } \;=\; +\infty .

As noted at the beginning of this lecture, it is much more difficult to prove upper bounds on the autocorrelation time -- or even to prove that \tau_{exp} < \infty -- and few nontrivial results have been obtained so far.

The only general method (to my knowledge) for proving upper bounds on the autocorrelation time is Cheeger's inequality, the basic idea of which is to search for "bottlenecks" in the state space. Consider first the rate of probability flow, in the stationary Markov chain, from a set A to its complement A^c, normalized by the invariant probabilities of A and A^c:

    k(A) \;=\; \frac{ \sum_{x \in A,\, y \in A^c} \pi_x\, p_{xy} }{ \pi(A)\,\pi(A^c) }
         \;=\; \frac{ (\chi_A, P \chi_{A^c})_{l^2(\pi)} }{ \pi(A)\,\pi(A^c) } .

Now look for the worst such decomposition into A and A^c:

    k \;=\; \inf_{A \colon\, 0 < \pi(A) < 1} k(A) .

If, for some set A, the flow from A to A^c is very small compared to the invariant probabilities of A and A^c, then intuitively the Markov chain must have very slow convergence to equilibrium (the sets A and A^c are "metastable"). For reversible Markov chains, a trivial variational argument makes this intuition rigorous: just take f = \chi_A. What is much more exciting is the converse: if there does not exist a set A for which the flow from A to A^c is unduly small, then the Markov chain must have rapid convergence to equilibrium, in the sense that the modified autocorrelation time \tau'_{exp} is small. The precise statement is the following:

Theorem (Cheeger's inequality). Let P be a transition probability matrix satisfying detailed balance for \pi, and let k be defined as above. Then

    \Big[ \log \frac{1}{1 - k} \Big]^{-1} \;\le\; \tau'_{exp} \;\le\; \Big[ \log \frac{1}{1 - k^2/8} \Big]^{-1} .

The proof is not difficult, but enough is enough -- the interested reader can look it up in the original paper.

Since this is a two-sided bound, we see that \tau'_{exp} is finite if and only if k > 0. So, in principle, Cheeger's inequality can be used to prove exponential convergence to equilibrium whenever it holds. But in practice, it is almost impossible to control the infimum over all sets A. The most tractable case seems to be when the state
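To make the quantities in Cheeger's inequality concrete, here is a brute-force Python sketch that computes k = inf_A k(A) for a small reversible chain and compares it with the spectral gap at +1; the toy chain with a weak middle bond is an illustrative assumption, chosen to exhibit an obvious bottleneck.

    import numpy as np
    from itertools import combinations

    def cheeger_constant(P, pi):
        """Brute-force Cheeger constant k = inf_A k(A) for a small reversible
        transition matrix P with stationary distribution pi.  Exponential in the
        number of states, so only for toy examples.  Illustrative sketch."""
        n = len(pi)
        best = np.inf
        states = range(n)
        for size in range(1, n):
            for A in combinations(states, size):
                A = list(A)
                Ac = [s for s in states if s not in A]
                flow = sum(pi[x] * P[x, y] for x in A for y in Ac)
                best = min(best, flow / (pi[A].sum() * pi[Ac].sum()))
        return best

    # Toy example: nearest-neighbour walk on {0,...,5} with a weak link in the
    # middle, giving an obvious bottleneck between {0,1,2} and {3,4,5}.
    n, eps = 6, 0.01
    P = np.zeros((n, n))
    for i in range(n - 1):
        w = eps if i == 2 else 0.25
        P[i, i + 1] = P[i + 1, i] = w
    P += np.diag(1.0 - P.sum(axis=1))         # remaining probability stays put
    pi = np.full(n, 1.0 / n)                  # uniform is stationary (P symmetric)
    k = cheeger_constant(P, pi)
    gap = 1.0 - np.sort(np.linalg.eigvalsh(P))[-2]
    print(k, gap)   # gap form of the two-sided bound:  k**2/8 <= gap <= k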

Note Added 1996: There is now an extensive literature (largely by probabilists and theoretical computer scientists) on upper bounds on the autocorrelation time for Markov chains, based on inequalities of Cheeger, Poincare, Nash and log-Sobolev types; several reviews are available.

space is a tree: then A can always be chosen so that it is connected to A^c by a single bond. Using this fact, Lawler and Sokal used Cheeger's inequality to prove

    \tau'_{exp} \;\le\; const \times \langle N \rangle^{2\gamma}

for the Berretti-Sokal algorithm for SAWs.

A very different proof of an upper bound on \tau'_{exp} in the Berretti-Sokal algorithm was given by Sokal and Thomas. Their method is to study in detail the exponential moments of the hitting times from an arbitrary walk \omega to the empty walk. Using a sequence of identities, they are able to write an algebraic inequality for such a moment in terms of itself; this inequality says, roughly, that the moment is either small or huge (where "huge" includes the possibility of +\infty), but cannot lie in between (there is a "forbidden interval"). Then, by a continuity argument, they are able to rule out the possibility that the moment is huge. So it must be small. The final result is

    \tau'_{exp} \;\le\; const \times \langle N \rangle^{1+\gamma} ,

slightly better than the Lawler-Sokal bound.

For spin models and lattice field theories, almost nothing is known about upper bounds on the autocorrelation time, except at high temperature. For the single-site heat-bath dynamics, it is easy to show that \tau_{exp} \le const (uniformly in the volume) above the Dobrushin-uniqueness temperature; indeed, this is precisely what the standard proof of the Dobrushin uniqueness theorem does. One expects the same result to hold for all temperatures above critical, but this remains an open problem, despite recent progress by Aizenman and Holley.

Note Added 1996: The Sokal-Thomas upper bound for the Berretti-Sokal algorithm can also be recovered by a much simpler argument using the Poincare inequality.

Note Added 1996: There has been great progress on this problem in recent years; see the recent literature and the references cited therein.

Acknowledgments

Many of the ideas reported here have grown out of joint work with my colleagues Alberto Berretti, Frank Brown, Sergio Caracciolo, Robert Edwards, Jonathan Goodman, Tony Guttmann, Greg Lawler, Xiao-Jian Li, Neal Madras, Andrea Pelissetto, Larry Thomas and Dan Zwanziger. I thank them for many pleasant and fruitful collaborations. In addition, I would like to thank Mal Kalos for introducing me to Monte Carlo work, and for patiently answering my first naive questions.

The research of the author was supported in part by U.S. National Science Foundation grants, a U.S. Department of Energy contract, a grant from the John Von Neumann National Supercomputer Center (deceased), and a grant from the Pittsburgh Supercomputing Center. Acknowledgment is also made to the donors of the Petroleum Research Fund, administered by the American Chemical Society, for partial support of this research.

References

��� N� Bakhvalov M�ethodes Num�eriques �MIR Moscou ���� Chapter ��

��� J�M� Hammersley and D�C� HandscombMonte Carlo Methods �Methuen London���� Chapter ��

��� W�W� Wood and J�J� Erpenbeck Ann� Rev� Phys� Chem� �� ��� ������

��� J� G� Kemeny and J� L� Snell Finite Markov Chains �Springer New York �����

��� M� Iosifescu Finite Markov Processes and Their Applications �Wiley Chichester�����

��� K� L� Chung Markov Chains with Stationary Transition Probabilities �nd ed��Springer New York �����

��� E� Nummelin General Irreducible Markov Chains and Non�Negative Operators�Cambridge Univ� Press Cambridge �����

��� D� Revuz Markov Chains �North Holland Amsterdam �����

��� Z� -Sidak Czechoslovak Math� J� �� ��� ������

���� A� D� Sokal and L� E� Thomas J� Stat� Phys� � ��� ������

���� P�C� Hohenberg and B�I� Halperin Rev� Mod� Phys� �� ��� ������

���� S� Caracciolo and A�D� Sokal J� Phys� A�� L��� ������

���� D� Goldsman Ph�D� thesis School of Operations Research and Industrial Engi neering Cornell University ������ L� Schruben Oper� Res� � ��� ����� and�� ���� ������

���� M�B� Priestley Spectral Analysis and Time Series � vols� �Academic London���� Chapters � ��

���� T�W� Anderson The Statistical Analysis of Time Series �Wiley New York �����

���� N� Madras and A�D� Sokal J� Stat� Phys� ��� ������


���� N� Metropolis A�W� Rosenbluth M�N� Rosenbluth A�H� Teller and E� Teller J�Chem� Phys� �� ���� ������

���� W�K� Hastings Biometrika � �� ������

���� J� Goodman and N� Madras Lin� Alg� Appl� ��� �� ������

���� J� Goodman and A�D� Sokal Phys� Rev� D� ���� ������

���� G�F� Mazenko and O�T� Valls Phys� Rev� B�� ���� ������ C� Kalle J� Phys�A�� L��� ������ J�K� Williams J� Phys� A�� �� ������ R�B� Pearson J�L�Richardson and D� Toussaint Phys� Rev� B�� ���� ������ S� Wansleben andD�P� Landau J� Appl� Phys� �� ���� ������

���� M�N� Barber in Phase Transitions and Critical Phenomena vol� � ed� C� Domband J�L� Lebowitz �Academic Press London �����

���� K� Binder J� Comput� Phys� � � ������

���� G� Parisi in Progress in Gauge Field Theory ����� Carg�ese lectures ed� G� �tHooft et al� �Plenum New York ����� G�G� Batrouni G�R� Katz A�S� KronfeldG�P� Lepage B� Svetitsky and K�G� Wilson Phys� Rev� D�� ���� ������ C�Davies G� Batrouni G� Katz A� Kronfeld P� Lepage P� Rossi B� Svetitsky andK� Wilson J� Stat� Phys� �� ���� ������ J�B� Kogut Nucl� Phys� B�� �FS���� ������ S� Duane R� Kenway B�J� Pendleton and D� Roweth Phys� Lett� ���B��� ������ E� Dagotto and J�B� Kogut Phys� Rev� Lett� � ��� ����� and Nucl�Phys� B�� �FS��� ��� ������

���� J� Goodman and A�D� Sokal Phys� Rev� Lett� � ���� ������

���� R�G� Edwards J� Goodman and A�D� Sokal Nucl� Phys� B�� ��� ������

���� R�H� Swendsen and J� S� Wang Phys� Rev� Lett� � �� ������

���� R�P� Fedorenko Zh� Vychisl� i Mat� Fiz� � ��� ����� �USSR Comput� Math� andMath� Phys� � ��� �������

���� L�P� Kadano� Physics � ��� ������

���� K�G� Wilson Phys� Rev� B� ���� ���� ������

���� A� Brandt in Proceedings of the Third International Conference on NumericalMethods in Fluid Mechanics �Paris July ���� ed� H� Cabannes and R� Temam�Lecture Notes in Physics .�� �Springer Berlin ����� R�A� Nicolaides J� Com put� Phys� �� ��� ����� and Math� Comp� �� ��� ������ A� Brandt Math�Comp� �� ��� ������ W� Hackbusch in Numerical Treatment of Di�erentialEquations �Oberwolfach July ���� ed� R� Bulirsch R�D� Griegorie� and J�Schr+oder �Lecture Notes in Mathematics .��� �Springer Berlin �����


���� W� Hackbusch Multi�Grid Methods and Applications �Springer Berlin �����

���� W� Hackbusch and U� Trottenberg editors Multigrid Methods �Lecture Notesin Mathematics .��� �Springer Berlin ����� Proceedings of the First CopperMountain Conference on Multigrid Methods ed� S� McCormick and U� Trotten berg Appl� Math� Comput� �� no� � � ��� ��� ������ D�J� Paddon and H� Hol stein editors Multigrid Methods for Integral and Di�erential Equations �Claren don Press Oxford ����� Proceedings of the Second Copper Mountain Conferenceon Multigrid Methods ed� S� McCormick Appl� Math� Comput� �� no� � � � ��������� W� Hackbusch and U� Trottenberg editors Multigrid Methods II �LectureNotes in Mathematics .���� �Springer Berlin ����� S�F� McCormick editorMultigrid Methods �SIAM Philadelphia ����� Proceedings of the Third CopperMountain Conference on Multigrid Methods ed� S� McCormick �Dekker NewYork �����

���� W�L� Briggs A Multigrid Tutorial �SIAM Philadelphia �����

���� R�S� Varga Matrix Iterative Analysis �Prentice Hall Englewood Cli�s N�J������

���� H�R� Schwarz H� Rutishauer and E� Stiefel Numerical Analysis of SymmetricMatrices �Prentice Hall Englewood Cli�s N�J� �����

���� A�M� Ostrowski Rendiconti Mat� e Appl� �� ��� ����� at p� ���n �A� OstrowskiCollected Mathematical Papers vol� � �Birkha+user Basel ���� p� ���n��

���� J�M� Ortega and W�C� Rheinboldt Iterative Solution of Nonlinear Equations inSeveral Variables �Academic Press New York�London �����

���� S�F� McCormick and J� Ruge Math� Comp� �� �� ������

���� J� Mandel S� McCormick and R� Bank inMultigrid Methods ed� S�F� McCormick�SIAM Philadelphia ���� Chapter ��

���� J� Goodman and A�D� Sokal unpublished�

���� R�G� Edwards S�J� Ferreira J� Goodman and A�D� Sokal Nucl� Phys� B�� ���������

���� S�L� Adler Phys� Rev� Lett� � ���� ������

���� S�L� Adler Phys� Rev� D�� ���� ������

���� C� Whitmer Phys� Rev� D�� ��� ������

���� E� Nelson in Constructive Quantum Field Theory ed� G� Velo and A� Wightman�Lecture Notes in Physics .�� �Springer Berlin �����


���� B� Simon The P ��� Euclidean Quantum� Field Theory �Princeton Univ� PressPrinceton N�J� ���� Chapter I�

���� R�B� Potts Proc� Camb� Phil� Soc� �� ��� ������

���� F�Y� Wu Rev� Mod� Phys� � ��� ������ ��� �E ������

���� K� Krickeberg Probability Theory �Addison Wesley Reading Mass� ���� Sec tions ��� and ����

���� R�H� Schonmann J� Stat� Phys� � �� ������

���� A�D� Sokal unpublished�

���� E�M� Reingold J� Nievergelt and N� Deo Combinatorial Algorithms� Theory andPractice �Prentice Hall Englewood Cli�s N�J� ���� Chapter �� A� GibbonsAlgorithmic Graph Theory �Cambridge University Press ���� Chapter �� S�Even Graph Algorithms �Computer Science Press Potomac Maryland ����Chapter ��

���� B�A� Galler and M�J� Fischer Commun� ACM � ��� ��� ������ D�E� KnuthThe Art of Computer Programming vol� � �nd ed� �Addison Wesley ReadingMassachusetts ���� pp� ������� ��� ���� J� Hoshen and R� Kopelman Phys�Rev� B�� ���� ������

���� K� Binder and D� Stau�er in Applications of the Monte Carlo Method in StatisticalPhysics ed� K� Binder �Springer Verlag Berlin ���� section ���� D� Stau�erIntroduction to Percolation Theory �Taylor / Francis London �����

���� Y� Shiloach and U� Vishkin J� Algorithms � �� ������

���� R�G� Edwards X� J� Li and A�D� Sokal Sequential and vectorized algorithms forcomputing the connected components of an undirected graph unpublished �anduncompleted notes�

���� P�W� Kasteleyn and C�M� Fortuin J� Phys� Soc� Japan �� �Suppl� �� ������C�M� Fortuin and P�W� Kasteleyn Physica � ��� ������ C�M� Fortuin Physica� ��� ������ C�M� Fortuin Physica � ��� ������

���� C� K� Hu Phys� Rev�B�� ���� ������ T�A� Larsson J� Phys� A�� ���� �����and A� ���� ������

���� R�G� Edwards and A�D� Sokal Phys� Rev� D�� ���� ������

���� A�D� Sokal in Computer Simulation Studies in Condensed Matter Physics� RecentDevelopments �Athens Georgia ����� February ���� ed� D�P� Landau K�K�Mon and H� B� Sch+uttler �Springer Verlag Berlin�Heidelberg �����


[��] X.-J. Li and A.D. Sokal, Phys. Rev. Lett. ��, ��� (����).

[��] W. Klein, T. Ray and P. Tamayo, Phys. Rev. Lett. ��, ��� (����).

[��] T.S. Ray, P. Tamayo and W. Klein, Phys. Rev. A��, ���� (����).

[��] U. Wolff, Phys. Rev. Lett. ��, ��� (����).

[��] U. Wolff, Phys. Lett. B���, ��� (����); P. Tamayo, R.C. Brower and W. Klein, J. Stat. Phys. ��, ���� (����) and ��, ��� (����).

[��] D. Kandel, E. Domany, D. Ron, A. Brandt and E. Loh, Phys. Rev. Lett. ��, ���� (����); D. Kandel, E. Domany and A. Brandt, Phys. Rev. B��, ��� (����).

[��] R.G. Edwards and A.D. Sokal, Swendsen-Wang algorithm augmented by duality transformations for the two-dimensional Potts model, in preparation.

[��] R. Ben-Av, D. Kandel, E. Katznelson, P.G. Lauwers and S. Solomon, J. Stat. Phys. ��, ��� (����); R.C. Brower and S. Huang, Phys. Rev. B��, ��� (����).

[��] F. Niedermayer, Phys. Rev. Lett. ��, ���� (����).

[��] R.C. Brower and P. Tamayo, Phys. Rev. Lett. ��, ���� (����).

[��] U. Wolff, Nucl. Phys. B���, ��� (����); Phys. Lett. B���, ��� (����); Nucl. Phys. B���, ��� (����).

[��] R.G. Edwards and A.D. Sokal, Phys. Rev. D��, ���� (����).

[��] P.G. de Gennes, Scaling Concepts in Polymer Physics (Cornell Univ. Press, Ithaca, NY, ����).

[��] C. Aragão de Carvalho, S. Caracciolo and J. Fröhlich, Nucl. Phys. B��� [FS�], ��� (����).

[��] A. Berretti and A.D. Sokal, J. Stat. Phys. ��, ��� (����).

[��] G. Lawler and A.D. Sokal, Trans. Amer. Math. Soc. ���, ��� (����).

[��] G. Lawler and A.D. Sokal, in preparation.

[��] B. Berg and D. Foerster, Phys. Lett. ���B, ��� (����).

[��] C. Aragão de Carvalho and S. Caracciolo, J. Physique ��, ��� (����).

[��] A.D. Sokal and L.E. Thomas, J. Stat. Phys. ��, ��� (����).

[��] S. Caracciolo, A. Pelissetto and A.D. Sokal, J. Stat. Phys. ��, ��� (����).

[��] M. Delbrück, in Mathematical Problems in the Biological Sciences (Proc. Symp. Appl. Math., vol. ��), ed. R.E. Bellman (American Math. Soc., Providence, RI, ����).

[��] P.H. Verdier and W.H. Stockmayer, J. Chem. Phys. ��, ��� (����).

[��] N. Madras and A.D. Sokal, J. Stat. Phys. ��, ��� (����).

[��] M. Lal, Molec. Phys. ��, �� (����).

[��] B. MacDonald et al., J. Phys. A��, ���� (����).

[��] K. Kawasaki, Phys. Rev. ���, ��� (����).

[��] B.I. Halperin, Phys. Rev. B�, ���� (����).

[��] R. Alicki, M. Fannes and A. Verbeure, J. Stat. Phys. ��, ��� (����).

[��] M.P.M. den Nijs, J. Phys. A��, ���� (����); J.L. Black and V.J. Emery, Phys. Rev. B��, ��� (����); B. Nienhuis, J. Stat. Phys. ��, ��� (����).

[��] A. Sinclair and M. Jerrum, Information and Computation ��, �� (����).

[��] R. Brooks, P. Doyle and R. Kohn, in preparation.

[��] L.N. Vasershtein, Prob. Inform. Transm. � (����).

[��] O.E. Lanford III, in Statistical Mechanics and Mathematical Problems (Lecture Notes in Physics #��), ed. A. Lenard (Springer, Berlin, ����).

[��] M. Aizenman and R. Holley, in Percolation Theory and Ergodic Theory of Infinite Particle Systems, ed. H. Kesten (Springer, New York, ����).

ADDED BIBLIOGRAPHY (DECEMBER ����)

[��] M. Lüscher, P. Weisz and U. Wolff, Nucl. Phys. B���, ��� (����).

[��] J.-K. Kim, Phys. Rev. Lett. ��, ���� (����); Nucl. Phys. B (Proc. Suppl.) ��, ��� (����); Phys. Rev. D��, ���� (����); Europhys. Lett. ��, ��� (����); Phys. Lett. B���, ��� (����).

[��] S. Caracciolo, R.G. Edwards, S.J. Ferreira, A. Pelissetto and A.D. Sokal, Phys. Rev. Lett. ��, ���� (����).

[���] S.J. Ferreira and A.D. Sokal, Phys. Rev. B��, ���� (����).

[���] S. Caracciolo, R.G. Edwards, A. Pelissetto and A.D. Sokal, Phys. Rev. Lett. ��, ��� (����).

[���] G. Mana, A. Pelissetto and A.D. Sokal, Phys. Rev. D��, R���� (����).

[���] G. Mana, A. Pelissetto and A.D. Sokal, hep-lat/�������, submitted to Phys. Rev. D.

[���] C.-R. Hwang and S.-J. Sheu, in Stochastic Models, Statistical Methods, and Algorithms in Image Analysis (Lecture Notes in Statistics #��), ed. P. Barone, A. Frigessi and M. Piccioni (Springer, Berlin, ����).

[���] B. Gidas, private communication (����).

[���] T. Mendes and A.D. Sokal, Phys. Rev. D��, ���� (����).

[���] G. Mana, T. Mendes, A. Pelissetto and A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) ��, ��� (����).

[���] T. Mendes, A. Pelissetto and A.D. Sokal, Nucl. Phys. B���, ��� (����).

[���] A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) ��, �� (����).

[���] J.-S. Wang and R.H. Swendsen, Physica A���, ��� (����).

[���] C.F. Baillie and P.D. Coddington, Phys. Rev. Lett. ��, ��� (����), and private communication.

[���] J. Salas and A.D. Sokal, J. Stat. Phys. ��, ��� (����).

[���] J. Salas and A.D. Sokal, Dynamic critical behavior of the Swendsen-Wang algorithm: The two-dimensional �-state Potts model revisited, hep-lat/�������, J. Stat. Phys. ��, no. �-� (April ����).

[���] J. Salas and A.D. Sokal, Logarithmic corrections and finite-size scaling in the two-dimensional �-state Potts model, hep-lat/�������, submitted to J. Stat. Phys.

[���] G.M. Torrie and J.P. Valleau, Chem. Phys. Lett. ��, ��� (����); J. Comput. Phys. ��, ��� (����); J. Chem. Phys. ��, ���� (����).

[���] B.A. Berg and T. Neuhaus, Phys. Lett. B���, ��� (����); Phys. Rev. Lett. ��, � (����).

[���] B.A. Berg, Int. J. Mod. Phys. C�, ��� (����).

[���] A.D. Kennedy, Nucl. Phys. B (Proc. Suppl.) ��, �� (����).

[���] W. Janke, in Computer Simulations in Condensed Matter Physics VII, eds. D.P. Landau, K.K. Mon and H.-B. Schüttler (Springer-Verlag, Berlin, ����).

[���] S. Wiseman and E. Domany, Phys. Rev. E��, ���� (����).

[���] D. Kandel, R. Ben-Av and E. Domany, Phys. Rev. Lett. ��, ��� (����) and Phys. Rev. B��, ���� (����).

[���] J.-S. Wang, R.H. Swendsen and R. Kotecký, Phys. Rev. Lett. ��, ��� (����).

[���] J.-S. Wang, R.H. Swendsen and R. Kotecký, Phys. Rev. B��, ���� (����).

[���] M. Lubin and A.D. Sokal, Phys. Rev. Lett. ��, ���� (����).

[���] X.-J. Li and A.D. Sokal, Phys. Rev. Lett. ��, ���� (����).

[���] S. Caracciolo, R.G. Edwards, A. Pelissetto and A.D. Sokal, Nucl. Phys. B���, ��� (����).

[���] A.D. Sokal, in Monte Carlo and Molecular Dynamics Simulations in Polymer Science, ed. K. Binder (Oxford University Press, New York, ����).

[���] A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) ��, ��� (����).

[���] B. Li, N. Madras and A.D. Sokal, J. Stat. Phys. ��, ��� (����).

[���] A.D. Sokal, Europhys. Lett. ��, ��� (����).

[���] P. Diaconis and D. Stroock, Ann. Appl. Prob. �, �� (����).

[���] A.J. Sinclair, Randomised Algorithms for Counting and Generating Combinatorial Structures (Birkhäuser, Boston, ����).

[���] P. Diaconis and L. Saloff-Coste, Ann. Appl. Prob. �, ��� (����).

[���] P. Diaconis and L. Saloff-Coste, Ann. Appl. Prob. �, ��� (����).

[���] P. Diaconis and L. Saloff-Coste, J. Theoret. Prob. �, ��� (����).

[���] M. Jerrum and A. Sinclair, in Approximation Algorithms for NP-Hard Problems, ed. D.S. Hochbaum (PWS Publishing, Boston, ����).

[���] D. Randall and A.J. Sinclair, in Proceedings of the �th Annual ACM-SIAM Symposium on Discrete Algorithms (ACM Press, New York, ����).

[���] D.W. Stroock and B. Zegarlinski, J. Funct. Anal. ���, ��� (����).

[���] D.W. Stroock and B. Zegarlinski, Commun. Math. Phys. ���, ��� (����).

[���] D.W. Stroock and B. Zegarlinski, Commun. Math. Phys. ���, ��� (����).

[���] S.L. Lu and H.T. Yau, Commun. Math. Phys. ���, ��� (����).

[���] F. Martinelli and E. Olivieri, Commun. Math. Phys. ���, ��� and ��� (����).

[���] D.W. Stroock and B. Zegarlinski, J. Stat. Phys. ��, ���� (����).
