
Chapter 6

Markov Chains

6.1 Existence and notation

Along with the discussion of martingales, we have introduced the concept of a discrete-time stochastic process. In this chapter we will study a particular class of such stochastic processes called Markov chains. Informally, a Markov chain is a discrete-time stochastic process for which, given the present state, the future and the past are independent. The formal definition is as follows:

Definition 6.1 Consider a probability space $(\Omega,\mathcal{F},P)$ with filtration $(\mathcal{F}_n)_{n\ge0}$ and let $(\mathcal{S},\mathcal{B}(\mathcal{S}))$ be a standard Borel space. Let $(X_n)_{n\ge0}$ be an $\mathcal{S}$-valued stochastic process adapted to $(\mathcal{F}_n)$. We call this process a Markov chain with state space $\mathcal{S}$ if for every $B\in\mathcal{B}(\mathcal{S})$ and every $n\ge0$,

$$P(X_{n+1}\in B\mid\mathcal{F}_n) = P(X_{n+1}\in B\mid X_n), \qquad P\text{-a.s.} \tag{6.1}$$

Here and henceforth, $P(\cdot\mid X_n)$ abbreviates $P(\cdot\mid\sigma(X_n))$; the conditional distributions exist by Theorem 4.25.

The relation (6.1) expresses the fact that, in order to know the future, we only need to know the present state. In practical situations, in order to prove that (6.1) holds, we will typically calculate $P(X_{n+1}\in B\mid\mathcal{F}_n)$ and show that it depends only on $X_n$. Since $\sigma(X_n)\subset\mathcal{F}_n$, eq. (6.1) then follows by the "smaller always wins" principle for conditional expectations.

If we turn the above definition around, we can view (6.1) as a way to define the Markov chain. Indeed, suppose we know the distribution of $X_0$. Then (6.1) allows us to calculate the joint distribution of $(X_0,X_1)$. Similarly, if we know the distribution of $(X_0,\dots,X_n)$, the above lets us calculate that of $(X_0,\dots,X_{n+1})$. The object on the right-hand side of (6.1), when viewed as a measure-valued function of $X_n$, then falls into the following category:

Definition 6.2 A function $p\colon\mathcal{S}\times\mathcal{B}(\mathcal{S})\to[0,1]$ is called a transition probability if

(1) $B\mapsto p(x,B)$ is a probability measure on $\mathcal{B}(\mathcal{S})$ for all $x\in\mathcal{S}$.


(2) $x\mapsto p(x,B)$ is $\mathcal{B}(\mathcal{S})$-measurable for all $B\in\mathcal{B}(\mathcal{S})$.

Our first item of concern is the existence of a Markov chain with given transition probabilities and initial distribution:

Theorem 6.3 Let $(p_n)_{n\ge0}$ be a sequence of transition probabilities and let $\mu$ be a probability measure on $\mathcal{B}(\mathcal{S})$. Then there exists a unique probability measure $P_\mu$ on the measurable space $(\mathcal{S}^{\mathbb{N}_0},\mathcal{B}(\mathcal{S}^{\mathbb{N}_0}))$, where $\mathbb{N}_0=\mathbb{N}\cup\{0\}$, such that $\omega\mapsto X_n(\omega)=\omega_n$ is a Markov chain with respect to the filtration $\mathcal{F}_n=\sigma(X_0,\dots,X_n)$ and such that for all $B\in\mathcal{B}(\mathcal{S})$,

$$P_\mu(X_0\in B) = \mu(B) \tag{6.2}$$

and

$$P_\mu(X_{n+1}\in B\mid X_n) = p_n(X_n,B), \qquad P_\mu\text{-a.s.} \tag{6.3}$$

for all $n\ge0$. In other words, $(X_n)$ defined by the coordinate maps on $(\mathcal{S}^{\mathbb{N}_0},\mathcal{B}(\mathcal{S}^{\mathbb{N}_0}),P_\mu)$ is a Markov chain with transition probabilities $(p_n)$ and initial distribution $\mu$.

Proof. This result is a direct consequence of Kolmogorov's Extension Theorem. Indeed, recall that $\mathcal{B}(\mathcal{S}^{\mathbb{N}_0})$ is the smallest $\sigma$-algebra containing all sets $A_0\times\dots\times A_n\times\mathcal{S}\times\dots$ where $A_j\in\mathcal{B}(\mathcal{S})$. We define a consistent family of probability measures on the measurable spaces $(\mathcal{S}^n,\mathcal{B}(\mathcal{S}^n))$ by putting

$$P_\mu^{(n)}(A_0\times\dots\times A_n) = \int_{A_0}\mu(\mathrm{d}x_0)\int_{A_1}p_0(x_0,\mathrm{d}x_1)\cdots\int_{A_n}p_{n-1}(x_{n-1},\mathrm{d}x_n), \tag{6.4}$$

and extending this to $\mathcal{B}(\mathcal{S}^n)$. It is easy to see that these measures are consistent because if $A\in\mathcal{B}(\mathcal{S}^n)$ then

$$P_\mu^{(n+1)}(A\times\mathcal{S}) = P_\mu^{(n)}(A). \tag{6.5}$$

By Kolmogorov's Extension Theorem, there exists a unique probability measure $P_\mu$ on the infinite product space $(\mathcal{S}^{\mathbb{N}_0},\mathcal{B}(\mathcal{S}^{\mathbb{N}_0}))$ such that $P_\mu(A\times\mathcal{S}\times\dots)=P_\mu^{(n)}(A)$ for every $n\ge0$ and every $A\in\mathcal{B}(\mathcal{S}^n)$.

It remains to show that the coordinate maps $X_n(\omega)=\omega_n$ define a Markov chain such that (6.2–6.3) hold true. The proof of (6.2) is easy:

$$P_\mu(X_0\in B) = P_\mu(B\times\mathcal{S}\times\dots) = P_\mu^{(0)}(B) = \mu(B). \tag{6.6}$$

In order to prove (6.3), we claim that for all $B\in\mathcal{B}(\mathcal{S})$ and all $A\in\mathcal{F}_n$,

$$E\bigl(1_{\{X_{n+1}\in B\}}1_A\bigr) = E\bigl(p_n(X_n,B)\,1_A\bigr). \tag{6.7}$$

To this end we first note that, by interpreting both sides as measures on $\mathcal{B}(\mathcal{S}^{\mathbb{N}_0})$, it suffices to prove this just for $A\in\mathcal{F}_n$ of the form $A=A_0\times\dots\times A_n\times\mathcal{S}\times\dots$. But for such $A$ we have

$$E\bigl(1_{\{X_{n+1}\in B\}}1_A\bigr) = P_\mu^{(n+1)}(A_0\times\dots\times A_n\times B) \tag{6.8}$$


which by inspection of (6.4) (it suffices to integrate out the last coordinate) simply equals $E\bigl(1_A\,p_n(X_n,B)\bigr)$, as desired.

Once (6.7) is proved, it remains to note that $p_n(X_n,B)$ is $\mathcal{F}_n$-measurable. Hence $p_n(X_n,B)$ is a version of $P_\mu(X_{n+1}\in B\mid\mathcal{F}_n)$. But $\sigma(X_n)\subset\mathcal{F}_n$ and, since $p_n(X_n,B)$ is $\sigma(X_n)$-measurable, the "smaller always wins" principle implies that (6.1) holds. Thus $(X_n)$ is a Markov chain satisfying (6.2–6.3), as we desired to prove.

While the existence argument was carried out in a "continuum" setup, the remainder of this chapter will be specialized to the case when the $X_n$'s can take only a countable set of values with positive probability. Denoting the (countable) state space by $\mathcal{S}$, the transition probabilities $p_n(x,\mathrm{d}y)$ become functions $p_n\colon\mathcal{S}\times\mathcal{S}\to[0,1]$ with the property

$$\sum_{y\in\mathcal{S}} p_n(x,y) = 1 \tag{6.9}$$

for all $x\in\mathcal{S}$. (Clearly, $p_n(x,y)$ is an abbreviation of $p_n(x,\{y\})$.) We will call such a $p_n$ a stochastic matrix. Similarly, the initial distribution $\mu$ becomes a function $\mu\colon\mathcal{S}\to[0,1]$ with the property

$$\sum_{x\in\mathcal{S}}\mu(x) = 1. \tag{6.10}$$

(Again, $\mu(x)$ abbreviates $\mu(\{x\})$.)

The subindex $n$ on the transition matrix $p_n(x,y)$ reflects the possibility that a different transition matrix is used at each step. This corresponds to a time-inhomogeneous Markov chain. While this is sometimes a useful generalization, an overwhelming majority of the Markov chains that are ever considered are time-homogeneous. In light of Theorem 6.3, such Markov chains are determined by two objects: the initial distribution $\mu$ and the transition matrix $p(x,y)$, satisfying (6.10) and (6.9) respectively. For the rest of this chapter we will focus on time-homogeneous Markov chains.

We finish with a remark on general notation. As used before, the object $P_\mu$ denotes the law of the Markov chain with initial distribution $\mu$. If $\mu$ is the point mass at state $x\in\mathcal{S}$, then we denote the resulting measure by $P_x$.
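Theorem 6.3 also justifies the obvious simulation recipe for a chain on a finite label set: draw $X_0$ from $\mu$ and then repeatedly draw $X_{n+1}$ from the row $p(X_n,\cdot)$. A minimal sketch in Python (not part of the original notes; the function name and the numpy-based setup are our own choices), assuming the states are labeled $0,\dots,k-1$:

```python
import numpy as np

def sample_chain(p, mu, n_steps, rng=None):
    """Sample X_0, ..., X_{n_steps} of a time-homogeneous chain on {0, ..., k-1}.

    p  : (k, k) stochastic matrix whose rows are p(x, .), cf. (6.9)
    mu : length-k initial distribution, cf. (6.10)
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    k = len(mu)
    path = np.empty(n_steps + 1, dtype=int)
    path[0] = rng.choice(k, p=mu)                     # X_0 ~ mu, cf. (6.2)
    for n in range(n_steps):
        path[n + 1] = rng.choice(k, p=p[path[n]])     # X_{n+1} ~ p(X_n, .), cf. (6.3)
    return path
```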

6.2 Examples

We proceed by a list of examples of countable-state time-homogeneous Markov chains.

Example 6.4 (SRW starting at $x$) Let $\mathcal{S}=\mathbb{Z}^d$ and, using $x\sim y$ to denote that $x$ and $y$ are nearest neighbors on $\mathbb{Z}^d$, let

$$p(x,y) = \begin{cases} \frac{1}{2d}, & \text{if } x\sim y,\\ 0, & \text{otherwise.}\end{cases} \tag{6.11}$$


Consider the measure $P_x$ generated from the initial distribution $\mu(y)=\delta_x(y)$ using the above transition matrix. As is easy to check, the resulting Markov chain is simply the simple random walk started at $x$.

Example 6.5 (Galton-Watson branching process) Consider i.i.d. random variables $(\xi_n)_{n\ge1}$ taking values in $\mathbb{N}\cup\{0\}$ and define the stochastic matrix $p(n,m)$ by

$$p(n,m) = P(\xi_1+\dots+\xi_n = m), \qquad n,m\ge0. \tag{6.12}$$

(Here we use the interpretation $p(0,m)=0$ unless $m=0$.) Let $\mathcal{S}=\mathbb{N}\cup\{0\}$ and let $P_1$ be the corresponding Markov chain started at $x=1$. As is easy to verify, the result is exactly the Galton-Watson branching process introduced in Chapter 5.
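A hedged illustration of (6.12): one way to simulate this chain is to draw $X_{n+1}$ as a sum of $X_n$ i.i.d. offspring variables. The Poisson offspring law below is an arbitrary choice for the sketch, not something fixed by the text:

```python
import numpy as np

def galton_watson_path(offspring_sampler, n_generations, rng):
    """One trajectory of the chain with transition matrix (6.12), started at X_0 = 1."""
    x, path = 1, [1]
    for _ in range(n_generations):
        # X_{n+1} = xi_1 + ... + xi_{X_n}; an empty sum is zero, so 0 is absorbing
        x = int(offspring_sampler(rng, x).sum()) if x > 0 else 0
        path.append(x)
    return path

rng = np.random.default_rng(0)
path = galton_watson_path(lambda r, n: r.poisson(1.1, size=n), 20, rng)
```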

Example 6.6 (Ehrenfest chain) In his study of convergence to equilibrium in thermodynamics (and perhaps in relation to the so-called Maxwell's demon), Ehrenfest introduced the following simple model: Consider two boxes with altogether $m$ labeled balls. At each time step we pick one ball at random and move it to the other box.

To formulate the problem more precisely, we will only record the number of balls in one of the boxes. Thus, the set of possible values, i.e., the state space, is simply $\mathcal{S}=\{0,1,\dots,m\}$. To calculate the transition probabilities, we note that the number of balls always changes by exactly one. The resulting probabilities are thus

$$p(n,n-1) = \frac{n}{m} \quad\text{and}\quad p(n,n+1) = \frac{m-n}{m}. \tag{6.13}$$

Note that this automatically assigns zero probability to removing a ball from an empty box or adding a ball to a box that already holds all of them. The initial distribution can be whatever is of interest.
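For a concrete object to experiment with, the transition matrix (6.13) can be written out as follows (a small sketch of our own, not part of the original text):

```python
import numpy as np

def ehrenfest_matrix(m):
    """Transition matrix of the Ehrenfest chain on {0, ..., m}, cf. (6.13)."""
    p = np.zeros((m + 1, m + 1))
    for n in range(m + 1):
        if n > 0:
            p[n, n - 1] = n / m          # a ball leaves the tracked box
        if n < m:
            p[n, n + 1] = (m - n) / m    # a ball enters the tracked box
    return p
```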

Example 6.7 (Birth-death chain) A generalization of the Ehrenfest chain is the situation where we think of a population evolving in discrete time steps. At each time only one of three things can happen: either an individual is born, or one dies, or nothing happens. The state space of such a chain is $\mathcal{S}=\mathbb{N}\cup\{0\}$. The transition probabilities are then determined by three sequences $(\alpha_x)$, $(\beta_x)$ and $(\gamma_x)$ via

$$p(x,x+1) = \alpha_x, \qquad p(x,x-1) = \beta_x, \qquad p(x,x) = \gamma_x. \tag{6.14}$$

Clearly, we need $\alpha_x+\beta_x+\gamma_x=1$ for all $x\in\mathcal{S}$ and, since the number of individuals is always non-negative, we also require $\beta_0=0$.

Example 6.8 (Stochastic matrix) An abstract example of a Markov chain arises whenever we are given a square stochastic matrix. For instance,

$$p = \begin{pmatrix} 3/4 & 1/4 \\ 1/5 & 4/5 \end{pmatrix} \tag{6.15}$$

defines a Markov chain on $\mathcal{S}=\{0,1\}$ with the matrix elements corresponding to the values of $p(x,y)$.


Example 6.9 (Random walk on a graph) Consider a graph $G=(V,E)$, where $V$ is a countable set (the vertex set) and $E\subset V\times V$ is a binary relation (the edge set of $G$). We will assume that $G$ is unoriented, i.e., $E$ is symmetric, and that there are no self-loops, i.e., $E$ is antireflexive. Let $d(x)$ denote the degree of vertex $x$, which is simply the number of $y\in V$ with $(x,y)\in E$, i.e., the number of neighbors of $x$. Suppose that there are no isolated vertices, which means that $d(x)>0$ for all $x$. We define the random walk on $V$ as the Markov chain on $\mathcal{S}=V$ with transition probabilities

$$p(x,y) = \frac{a_{xy}}{d(x)}, \tag{6.16}$$

where $(a_{xy})_{x,y\in V}$ is the adjacency matrix,

$$a_{xy} = \begin{cases} 1, & \text{if } (x,y)\in E,\\ 0, & \text{otherwise.}\end{cases} \tag{6.17}$$

It is easy to verify that $\sum_{y\in V} a_{xy}=d(x)$ and so $p(x,y)$ is indeed a stochastic matrix. (Here we assumed without saying that $G$ is locally finite, i.e., $d(x)<\infty$ for all $x\in V$.)

This extends the "usual" simple random walk on $\mathbb{Z}^d$, which we defined in terms of sums of i.i.d. random variables, to any locally finite graph. The initial distribution is typically concentrated at one point, namely the starting point of the walk; see Example 6.4.
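The construction (6.16–6.17) translates directly into code for a finite graph. A sketch (function names are our own), assuming the graph is given by its adjacency matrix:

```python
import numpy as np

def rw_transition_matrix(adjacency):
    """Random-walk matrix p(x, y) = a_xy / d(x) of (6.16) for a finite graph.

    `adjacency` must be symmetric, 0/1, with zero diagonal and no isolated vertices.
    """
    a = np.asarray(adjacency, dtype=float)
    d = a.sum(axis=1)                    # d(x) = number of neighbors of x
    return a / d[:, None]

# example: the 4-cycle; every row of p has two entries equal to 1/2
p = rw_transition_matrix([[0, 1, 0, 1],
                          [1, 0, 1, 0],
                          [0, 1, 0, 1],
                          [1, 0, 1, 0]])
```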

Example 6.10 (Simple exclusion) Next we add a twist to the previous example. Consider a system of particles on the vertices of a finite graph $G$ and assume that there can be at most one particle at each site (an exclusion constraint). If it weren't for the exclusion constraint, the particles would perform independent random walks. With the constraint in place, at each unit of time a random particle attempts a jump to a randomly chosen neighbor. If the neighbor is empty, the move is accepted; if it is occupied, the move is discarded.

We will define a Markov chain mimicking this process. The state space of the chain is $\mathcal{S}=\{0,1\}^V$, i.e., the set of configurations of particles on $V$. At each unit of time, we pick an edge at random and interchange whatever there is at its endpoints. (If both endpoints are either occupied or empty, this results in no change; if one is occupied and the other empty, the "particle" at the occupied end is moved to the vacant end.) This algorithm is easy to implement on a computer; writing down the transition probability requires a little more effort.

Let $\eta$ and $\eta'$ be the states of the chain before and after the jump. We have selected an edge $e=(x,y)\in E$ with probability $1/|E|$, and the move takes $\eta$ to $\eta'$ if and only if $\eta_x=\eta'_y$, $\eta_y=\eta'_x$ and $\eta_z=\eta'_z$ for $z\ne x,y$. Let $E(\eta,\eta')$ denote the set of edges $(x,y)\in E$ with these properties. Then the transition probability is given by

$$p(\eta,\eta') = \frac{|E(\eta,\eta')|}{|E|}. \tag{6.18}$$


We leave it as an exercise to show that $p$ is really a transition matrix, which boils down to showing that $\sum_{\eta'\in\mathcal{S}}|E(\eta,\eta')|=|E|$.

Having amassed a bunch of examples, we begin investigating some general properties of countable-state time-homogeneous Markov chains. (To keep the statements of theorems and lemmas concise, we will not state these as our assumptions anymore.)

6.3 Stationary and reversible measures

The first question we will try to explore is that of stationarity. To motivate the forthcoming definitions, consider a Markov chain $(X_n)_{n\ge0}$ with transition probability $p$ and $X_0$ distributed according to $\mu_0$. As is easy to check, the law of $X_1$ is then described by the measure $\mu_1$ computed by

$$\mu_1(y) = \sum_{x\in\mathcal{S}}\mu_0(x)\,p(x,y). \tag{6.19}$$

We are interested in the situation when the distribution of $X_1$ is the same as that of $X_0$. This leads us to the following, somewhat more general, definition:

Definition 6.11 A (positive) measure $\nu$ on the state space $\mathcal{S}$ is called stationary if

$$\nu(y) = \sum_{x\in\mathcal{S}}\nu(x)\,p(x,y), \qquad y\in\mathcal{S}. \tag{6.20}$$

If $\nu$ has total mass one, we call it a stationary distribution.
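For a finite chain, (6.20) says that a stationary distribution is a left eigenvector of the transition matrix with eigenvalue one, normalized to total mass one. A small numerical sketch of our own, applied to the matrix of Example 6.8:

```python
import numpy as np

p = np.array([[3/4, 1/4],
              [1/5, 4/5]])                 # the stochastic matrix (6.15)

eigvals, eigvecs = np.linalg.eig(p.T)      # left eigenvectors of p
nu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
nu /= nu.sum()                             # normalize to a stationary distribution
print(nu)                                  # approximately [4/9, 5/9]
assert np.allclose(nu @ p, nu)             # (6.20): sum_x nu(x) p(x, y) = nu(y)
```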

Remark 6.12 While we allow ourselves to consider measures $\mu$ on $\mathcal{S}$ that are not normalized, we will always assume that the measure assigns finite mass to every element of $\mathcal{S}$.

Clearly, once the laws of $X_0$ and $X_1$ are the same, all the $X_n$'s have the same law (provided the chain is time-homogeneous). Let us find stationary measures for the Ehrenfest and birth-death chains:

Lemma 6.13 Consider the Ehrenfest chain with state space $\mathcal{S}=\{0,1,\dots,m\}$ and let the transition matrix be as in (6.13). Then

$$\nu(k) = \binom{m}{k}2^{-m}, \qquad k=0,\dots,m, \tag{6.21}$$

is a stationary distribution.

Proof. We have to show that $\nu$ satisfies (6.20). First we note that

$$\sum_{k\in\mathcal{S}}\nu(k)\,p(k,l) = \nu(l-1)\,p(l-1,l)+\nu(l+1)\,p(l+1,l). \tag{6.22}$$


Then we calculate

$$\text{rhs of (6.22)} = 2^{-m}\Bigl[\binom{m}{l-1}\frac{m-l+1}{m}+\binom{m}{l+1}\frac{l+1}{m}\Bigr] = 2^{-m}\binom{m}{l}\Bigl[\frac{l}{m-l+1}\,\frac{m-l+1}{m}+\frac{m-l}{l+1}\,\frac{l+1}{m}\Bigr]. \tag{6.23}$$

The proof is finished by noting that, after cancellation, the bracket is simply one.
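The computation (6.22–6.23) can be cross-checked numerically: the binomial weights (6.21) satisfy $\nu p=\nu$ for the matrix (6.13). A quick sketch of that check (our own, not part of the notes):

```python
import numpy as np
from math import comb

m = 10
p = np.zeros((m + 1, m + 1))
for n in range(m + 1):
    if n > 0:
        p[n, n - 1] = n / m
    if n < m:
        p[n, n + 1] = (m - n) / m

nu = np.array([comb(m, k) for k in range(m + 1)]) * 2.0 ** (-m)   # (6.21)
assert np.allclose(nu @ p, nu)   # stationarity (6.20) holds for every l
```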

Concerning the birth-death chain, we state the following:

Lemma 6.14 Consider the birth-death chain on $\mathbb{N}\cup\{0\}$ characterized by sequences $(\alpha_n)$, $(\beta_n)$ and $(\gamma_n)$, cf. (6.14). Suppose that $\beta_n>0$ for all $n\ge1$. Then $\nu(0)=1$ and

$$\nu(n) = \prod_{k=1}^{n}\frac{\alpha_{k-1}}{\beta_k}, \qquad n\ge1, \tag{6.24}$$

defines a stationary measure of the chain.

In order to prove this lemma, we will introduce the following interesting concept:

Definition 6.15 A measure $\nu$ of a Markov chain on $\mathcal{S}$ is called reversible if

$$\nu(x)\,p(x,y) = \nu(y)\,p(y,x) \tag{6.25}$$

holds for all $x,y\in\mathcal{S}$.

As we will show later, reversibility owes its name to the fact that, if we run the chain backwards (whatever this means for now) starting from $\nu$, we get the same Markov chain. A simple consequence of reversibility is stationarity:

Lemma 6.16 A reversible measure is automatically stationary.

Proof. We just have to sum both sides of (6.25) over $y$. Since $p(x,y)$ is stochastic, the left-hand side produces $\nu(x)$ while the right-hand side gives $\sum_{y\in\mathcal{S}}\nu(y)\,p(y,x)$.

Equipped with these observations, the proof of Lemma 6.14 is a piece of cake:

Proof of Lemma 6.14. We claim that $\nu$ in (6.24) is reversible. To prove this we observe that (6.24) implies, for all $k\ge0$,

$$\nu(k+1) = \nu(k)\,\frac{\alpha_k}{\beta_{k+1}} = \nu(k)\,\frac{p(k,k+1)}{p(k+1,k)}. \tag{6.26}$$

This shows that (6.25) holds for $x$ and $y$ differing by one. The case $x=y$ is trivially satisfied, and in the remaining cases $p(x,y)=0$, so (6.25) is proved in general. Hence $\nu$ is reversible and thus stationary.


Remark 6.17 Recall that stationary measures need not be finite, and thus the existence of a stationary measure does not imply the existence of a stationary distribution. (The distinction is really between finite and infinite, because a finite stationary measure can always be normalized.) For the Ehrenfest chain we immediately produced a stationary distribution. However, for the birth-death chain the question whether $\nu$ is finite or infinite depends sensitively on the asymptotic behavior of the ratios $\alpha_k/\beta_{k+1}$.

Next we will address the underlying meaning of the reversible measure. We will do this by showing that reversing a Markov chain yields another Markov chain, which in the reversible situation is the same as the original chain.

Theorem 6.18 Consider a Markov chain $(X_n)_{n\ge0}$ started from a stationary initial distribution $\mu$ and transition matrix $p(x,y)$. Fix $N$ large and for $n=0,\dots,N$, let

$$Y_n = X_{N-n}. \tag{6.27}$$

Then $(Y_n)_{n=0}^N$ is a (time-homogeneous) Markov chain, called the reversed chain, with initial distribution $\mu$ and transition matrix $q(x,y)$ defined by

$$q(x,y) = \frac{\mu(y)}{\mu(x)}\,p(y,x), \qquad \mu(x)>0. \tag{6.28}$$

(The values $q(x,y)$ for $x$ such that $\mu(x)=0$ are immaterial, since such $x$ will never be visited starting from the initial distribution $\mu$.)

Proof. Fix a collection of values $y_0,\dots,y_N\in\mathcal{S}$ and consider the probability $P(Y_n=y_n,\ n=0,\dots,N)$, where the $Y_n$ are defined from the $X_n$ as in (6.27). Then we have

$$P(Y_n=y_n,\ n=0,\dots,N) = \mu(y_N)\,p(y_N,y_{N-1})\cdots p(y_1,y_0). \tag{6.29}$$

A simple argument now shows that either this probability vanishes or $\mu(y_k)>0$ for all $k=0,\dots,N$. Since we do not care about sequences with zero probability, assuming the latter we can rewrite this as follows:

$$P(Y_n=y_n,\ n=0,\dots,N) = \frac{\mu(y_N)}{\mu(y_{N-1})}\,p(y_N,y_{N-1})\cdots\frac{\mu(y_1)}{\mu(y_0)}\,p(y_1,y_0)\,\mu(y_0) = \mu(y_0)\,q(y_0,y_1)\cdots q(y_{N-1},y_N). \tag{6.30}$$

Hence $(Y_n)$ is a Markov chain with initial distribution $\mu$ and transition matrix $q(x,y)$.

Clearly, if $\mu$ is reversible then $q(x,y)=p(x,y)$, i.e., the dual chain $(Y_n)$ is identical to $(X_n)$. This allows us to extend any stationary Markov chain to negative infinity, into a two-sided sequence $(X_n)_{n\in\mathbb{Z}}$. As an additional exercise we will apply these concepts to the random walk on a locally finite graph.


Lemma 6.19 Consider a locally finite unoriented graph $G=(V,E)$ and let $d(x)$ denote the degree of vertex $x$. Suppose that there are no isolated vertices, i.e., $d(x)>0$ for every $x$. Then

$$\nu(x) = d(x), \qquad x\in V, \tag{6.31}$$

is a reversible, and hence stationary, measure for the random walk on $G$.

Proof. We have $p(x,y)=a_{xy}/d(x)$, where $a_{xy}$ is the adjacency matrix. Since $G$ is unoriented, the adjacency matrix is symmetric. This allows us to calculate

$$\nu(x)\,p(x,y) = d(x)\,\frac{a_{xy}}{d(x)} = a_{xy} = a_{yx} = d(y)\,\frac{a_{yx}}{d(y)} = \nu(y)\,p(y,x). \tag{6.32}$$

Thus $\nu$ is reversible and hence stationary.

Clearly, $\nu$ is finite if and only if $E$ is finite, which by the fact that no vertex is isolated implies that $V$ is finite. However, the measure $\nu$ need not be unique. Indeed, if $G$ has two separate components, even the restriction of $\nu$ to one of the components would be stationary. We proceed by analyzing the question of uniqueness (and existence) of stationary measures.

6.4 Existence/uniqueness of stationary measures

As alluded to in the example of the random walk on a general graph, a simple obstruction to uniqueness is when there are parts of the state space $\mathcal{S}$ between which the transition happens with zero probability. We define a proper name for this situation:

Definition 6.20 We call the transition matrix $p$ (or the Markov chain itself) irreducible if for all $x,y\in\mathcal{S}$ there exists a number $n$ such that

$$p^n(x,y) = \sum_{y_1,\dots,y_{n-1}\in\mathcal{S}} p(x,y_1)\,p(y_1,y_2)\cdots p(y_{n-1},y) > 0. \tag{6.33}$$

The object $p^n(x,y)$, not to be confused with $p_n(x,y)$, is the $n$-th power of the transition matrix $p$. As is easy to check, $p^n(x,y)$ simply equals the probability that the Markov chain started at $x$ is at $y$ at time $n$, i.e., $P_x(X_n=y)=p^n(x,y)$.

Irreducibility can be characterized in terms of a stopping time.
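For a finite chain one can test (6.33) by brute force: it suffices to inspect the powers $p^1,\dots,p^k$, where $k$ is the number of states, since a shortest positive-probability path or cycle between two states never needs more than $k$ steps. A sketch of such a check (our own helper, not from the text):

```python
import numpy as np

def is_irreducible(p):
    """Check (6.33) for a finite chain: some p^n must be positive for every pair (x, y)."""
    p = np.asarray(p, dtype=float)
    k = p.shape[0]
    reached = np.zeros((k, k), dtype=bool)
    power = np.eye(k)
    for _ in range(k):            # n = 1, ..., k suffices for a k-state chain
        power = power @ p         # power is now p^n
        reached |= power > 0
    return bool(reached.all())
```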

Lemma 6.21 Consider a Markov chain $(X_n)_{n\ge0}$ on $\mathcal{S}$ and let $T_y=\inf\{n\ge1\colon X_n=y\}$ be the first time the chain visits $y$ (note that we do not count the initial state $X_0$ in this definition). Then the chain is irreducible if and only if for all $x,y$,

$$P_x(T_y<\infty) > 0. \tag{6.34}$$

Proof. This is a trivial consequence of the fact that $P_x(X_n=y)=p^n(x,y)$.


However, irreducibility alone is not sufficient to guarantee the existence and uniqueness of a stationary measure. The principal concept here is recurrence, which we have already encountered in the context of random walks:

Definition 6.22 A state $x\in\mathcal{S}$ is called recurrent if $P_x(T_x<\infty)=1$. A Markov chain is recurrent if every state is recurrent.

Theorem 6.23 Consider an irreducible Markov chain $(X_n)_{n\ge0}$ and let $x\in\mathcal{S}$ be a recurrent state. Then

$$\nu_x(y) = E_x\Bigl(\sum_{n=1}^{T_x}1_{\{X_n=y\}}\Bigr) = \sum_{n\ge1}P_x(X_n=y,\ T_x\ge n) \tag{6.35}$$

is finite for all $y\in\mathcal{S}$ and defines a stationary measure on $\mathcal{S}$. Moreover, any other stationary measure is a multiple of $\nu_x$.

The crux of the proof, and the principal reason why we need recurrence, is the following observation: if $T_x<\infty$ almost surely, we can also write $\nu_x$ as

$$\nu_x(y) = E_x\Bigl(\sum_{n=0}^{T_x-1}1_{\{X_n=y\}}\Bigr) = \sum_{n\ge1}P_x(X_{n-1}=y,\ T_x\ge n). \tag{6.36}$$

The first equality comes from the fact that if $y\ne x$ then $X_n\ne y$ for $n=0$ and $n=T_x$ anyway, while if $y=x$ then the sums inside the expectations in (6.35) and (6.36) both equal one. The second equality in (6.36) follows by a convenient relabeling.

Proof of existence. Let $x\in\mathcal{S}$ be a recurrent state. Then

$$\begin{aligned}
P_x(X_n=y,\ T_x\ge n) &= \sum_{z\in\mathcal{S}}P_x(X_{n-1}=z,\ X_n=y,\ T_x\ge n)\\
&= \sum_{z\in\mathcal{S}}E_x\bigl(P_x(X_n=y\mid\mathcal{F}_{n-1})\,1_{\{X_{n-1}=z\}}1_{\{T_x\ge n\}}\bigr)\\
&= \sum_{z\in\mathcal{S}}p(z,y)\,P_x(X_{n-1}=z,\ T_x\ge n),
\end{aligned} \tag{6.37}$$

where we used that $\{X_{n-1}=z\}$ and $\{T_x\ge n\}$ are both $\mathcal{F}_{n-1}$-measurable to derive the second line. The third line is a consequence of the fact that on $\{X_{n-1}=z\}$ we have $P_x(X_n=y\mid\mathcal{F}_{n-1})=P_x(X_n=y\mid X_{n-1})=p(z,y)$.

Summing the above over $n\ge1$, applying discrete Fubini (everything is positive) and invoking (6.35) on the left and (6.36) on the right-hand side gives us $\nu_x(y)=\sum_{z\in\mathcal{S}}\nu_x(z)\,p(z,y)$. It remains to show that $\nu_x(y)<\infty$ for all $y\in\mathcal{S}$. First note that $\nu_x(x)=1$ by definition. Next we note that we actually have

$$\nu_x(x) = \sum_{z\in\mathcal{S}}\nu_x(z)\,p^n(z,x) \ge \nu_x(y)\,p^n(y,x) \tag{6.38}$$

for all $n\ge1$ and all $y\in\mathcal{S}$. Thus $\nu_x(y)<\infty$ whenever $p^n(y,x)>0$. By irreducibility this happens for some $n$ for every $y\in\mathcal{S}$, and so $\nu_x(y)<\infty$ for all $y\in\mathcal{S}$. Hence $\nu_x$ is a stationary measure.


The proof of uniqueness provides some motivation for how $\nu_x$ was constructed:

Proof of uniqueness. Suppose $x$ is a recurrent state and let $\nu_x$ be the stationary measure in (6.35). Let $\mu$ be another stationary measure (we require that $\mu(y)<\infty$ for all $y\in\mathcal{S}$, even though, as we will see, it is enough to assume that $\mu(x)<\infty$). Stationarity of $\mu$ can also be written as

$$\mu(y) = \mu(x)\,p(x,y) + \sum_{z\ne x}\mu(z)\,p(z,y). \tag{6.39}$$

Plugging this in for $\mu(z)$ in the second term and iterating gives us

$$\mu(y) = \mu(x)\Bigl[p(x,y)+\sum_{z\ne x}p(x,z)p(z,y)+\dots+\sum_{z_1,\dots,z_n\ne x}p(x,z_1)\cdots p(z_n,y)\Bigr] + \sum_{z_0,\dots,z_n\ne x}\mu(z_0)\,p(z_0,z_1)\cdots p(z_n,y). \tag{6.40}$$

We would like to pass to the limit and conclude that the last term tends to zero. However, a direct proof of this appears unlikely and so we proceed by using inequalities. Noting that the $k$-th term in the bracket equals $P_x(X_k=y,\ T_x\ge k)$, we have

$$\mu(y) \ge \mu(x)\sum_{k=1}^{n+1}P_x(X_k=y,\ T_x\ge k) \xrightarrow[n\to\infty]{} \mu(x)\,\nu_x(y). \tag{6.41}$$

In particular, we have $\mu(y)\ge\mu(x)\,\nu_x(y)$ for all $y\in\mathcal{S}$.

Our goal is to show that equality holds. Suppose that for some $x,y$ we have $\mu(y)>\mu(x)\,\nu_x(y)$. By irreducibility, there exists $n\ge1$ such that $p^n(y,x)>0$ and so

$$\mu(x) = \sum_{z\in\mathcal{S}}\mu(z)\,p^n(z,x) > \mu(x)\sum_{z\in\mathcal{S}}\nu_x(z)\,p^n(z,x) = \mu(x)\,\nu_x(x) = \mu(x), \tag{6.42}$$

a contradiction. So $\mu(y)=\mu(x)\,\nu_x(y)$, i.e., $\mu$ is a rescaled version of $\nu_x$.

6.5 Strong Markov property

In the proofs of existence and uniqueness of the stationary measure we have barely touched upon the recurrence property. Before we delve deeper into that subject, which is what we will do in Section 6.6, let us state and prove an interesting consequence of the definition of a Markov chain.

In order to be more explicit about the whole setup, suppose that our Markov chain is defined on the measurable space $(\Omega,\mathcal{F})$, where $\Omega=\mathcal{S}^{\mathbb{N}_0}$ and $\mathcal{F}$ is the product $\sigma$-algebra. Then we can represent $X_n$ using the coordinate map $X_n(\omega)=\omega_n$. Recall the definition of the shift operator $\theta$, which acts on sequences $\omega$ by $(\theta\omega)_n=\omega_{n+1}$. For any $n\ge1$ we define $\theta^n$ to be the $n$-fold composition of $\theta$, i.e., $(\theta^n\omega)_k=\omega_{k+n}$. If $N$ is a stopping time of the filtration $\mathcal{F}_n=\sigma(X_0,\dots,X_n)$, then we let $\theta^N$ be the operator $\theta^n$ on $\{N=n\}$. On $\{N=\infty\}$ we leave $\theta^N$ undefined.


Theorem 6.24 [Strong Markov property] Consider a Markov chain $(X_n)_{n\ge0}$ with initial distribution $\mu$ and let $N$ be a stopping time. Suppose that $P_\mu(N<\infty)>0$ and let $\theta^N$ be as defined above. Then for all $B\in\mathcal{F}$,

$$E_\mu(1_B\circ\theta^N\mid\mathcal{F}_N) = P_{X_N}(B) \tag{6.43}$$

almost surely on $\{N<\infty\}$. Here $\mathcal{F}_N=\{A\in\mathcal{F}\colon A\cap\{N=n\}\in\mathcal{F}_n,\ n\ge0\}$ and $X_N$ is defined to be $X_n$ on $\{N=n\}$.

This property is called "strong" because it is a strengthening of the Markov property

$$E_\mu(1_B\circ\theta^n\mid\mathcal{F}_n) = P_{X_n}(B) \tag{6.44}$$

to random $n$. In our case the proofs of the Markov property and the strong Markov property amount more or less to the same thing. In particular, no additional assumptions are needed. This is not true in continuous time, where the strong Markov property typically fails in the absence of (rather natural) continuity conditions.

Proof of Theorem 6.24. Let $A\in\mathcal{F}_N$ be such that $A\subset\{N<\infty\}$. First we partition according to the values of $N$; this is what Durrett calls the "divide & conquer" principle:

$$E_\mu\bigl(1_A(1_B\circ\theta^N)\bigr) = \sum_{n\ge0}E_\mu\bigl(1_{A\cap\{N=n\}}(1_B\circ\theta^n)\bigr). \tag{6.45}$$

(Note that, by our assumption on $A$, we do not have to include the value $N=\infty$. Once we are on $\{N=n\}$ we can replace $\theta^N$ by $\theta^n$.) Now $A\cap\{N=n\}\in\mathcal{F}_n$ while $1_B\circ\theta^n$ is $\sigma(X_n,X_{n+1},\dots)$-measurable. This allows us to condition on $\mathcal{F}_n$ and use (6.3):

$$E_\mu\bigl(1_{A\cap\{N=n\}}(1_B\circ\theta^n)\bigr) = E_\mu\bigl(1_{A\cap\{N=n\}}E_\mu(1_B\circ\theta^n\mid\mathcal{F}_n)\bigr) = E_\mu\bigl(1_{A\cap\{N=n\}}P_{X_n}(B)\bigr). \tag{6.46}$$

Plugging this back into (6.45), we conclude that

$$E_\mu\bigl(1_A(1_B\circ\theta^N)\bigr) = E_\mu\bigl(1_A\,P_{X_N}(B)\bigr) \tag{6.47}$$

for all $A\in\mathcal{F}_N$ with $A\subset\{N<\infty\}$. Since $P_{X_N}(B)$ is $\mathcal{F}_N$-measurable, a standard argument implies that $E_\mu(1_B\circ\theta^N\mid\mathcal{F}_N)$ equals $P_{X_N}(B)$ almost surely on $\{N<\infty\}$, as claimed.

We proceed by listing some applications of the strong Markov property. Consider the stopping time $T_x$ defined by

$$T_x = \inf\{n\ge1\colon X_n=x\}. \tag{6.48}$$

Here we deliberately omit $n=0$ from the infimum, so that even under $P_x$ we may have $T_x=\infty$ almost surely. Let $T_x^n$ denote the $n$-th iterate of $T_x$, by which we mean simply $T_x$ for the sequence $\theta^{T_x^{n-1}}\omega$; see the corresponding definition in the context of random walks. Then we have the following variant of Lemma 3.23:


Lemma 6.25 Consider a Markov chain with state space $\mathcal{S}$ and let $T_x$ be as defined above. Then for all $x,y\in\mathcal{S}$,

$$P_x(T_y^n<\infty) = P_x(T_y<\infty)\,P_y(T_y<\infty)^{n-1}. \tag{6.49}$$

Proof. The event $\{T_y^n<\infty\}$ can be written as $\{T_y<\infty\}$ intersected with the shift of $\{T_y^{n-1}<\infty\}$ by $T_y$. Applying the strong Markov property we thus get

$$P_x(T_y^n<\infty) = P_x(T_y<\infty)\,P_y(T_y^{n-1}<\infty). \tag{6.50}$$

Using the same identity for the second term on the right-hand side and iterating, we arrive at (6.49).

As for the random walk, this statement allows us to characterize recurrence of $x$ in terms of the expected number of visits of the chain back to $x$.

Corollary 6.26 Let $N(y)=\sum_{n\ge1}1_{\{X_n=y\}}$. Then

$$E_x\bigl(N(y)\bigr) = \frac{P_x(T_y<\infty)}{1-P_y(T_y<\infty)}. \tag{6.51}$$

Here the right-hand side is to be interpreted as zero if the numerator vanishes and as infinity if the numerator is positive and the denominator vanishes.

Proof. This is a consequence of the fact that $E_x(N(y))=\sum_{n\ge1}P_x(T_y^n<\infty)$ and the formula (6.49).

Corollary 6.27 A state $x$ is recurrent if and only if $E_x(N(x))=\infty$. In particular, for an irreducible Markov chain either all states are recurrent or none of them are. Finally, an irreducible finite-state Markov chain is recurrent.

Proof. A state $x$ is recurrent iff $P_x(T_x<\infty)=1$, which by (6.51) holds iff $E_x(N(x))=\infty$. To show the second claim, suppose that $x$ is recurrent and let us show that so is any $y\in\mathcal{S}$. To that end, let $k$ and $l$ be numbers such that $p^k(y,x)>0$ and $p^l(x,y)>0$; these numbers exist by irreducibility. Then

$$p^{k+n+l}(y,y) \ge p^k(y,x)\,p^n(x,x)\,p^l(x,y), \tag{6.52}$$

which implies

$$E_y\bigl(N(y)\bigr) = \sum_{m\ge1}p^m(y,y) \ge \sum_{n\ge1}p^k(y,x)\,p^n(x,x)\,p^l(x,y) = p^k(y,x)\,p^l(x,y)\,E_x\bigl(N(x)\bigr). \tag{6.53}$$

But $p^k(y,x)\,p^l(x,y)>0$ and so $E_x(N(x))=\infty$ implies $E_y(N(y))=\infty$. Hence, all states of an irreducible Markov chain are recurrent as soon as one of them is. Finally, if $\mathcal{S}$ is finite, the trivial relation $\sum_{x\in\mathcal{S}}N(x)=\infty$ implies $E_x(N(x))=\infty$ for at least one $x\in\mathcal{S}$.


6.6 Recurrence, transience and stationary distributions

In the previous section we concluded that, for irreducible Markov chains, recurrence is a class property, i.e., a property that either holds for all states or for none. We have also shown that, once the chain is recurrent (on top of irreducible), there exists a stationary measure. In this section we will give conditions under which the stationary measure has finite mass, which means it can be normalized to produce a stationary distribution. To that end we introduce the following definitions:

Definition 6.28 A state $x\in\mathcal{S}$ of a Markov chain is said to be

(1) transient if $P_x(T_x<\infty)<1$,

(2) null recurrent if $P_x(T_x<\infty)=1$ but $E_x T_x=\infty$,

(3) positive recurrent if $E_x T_x<\infty$.

We will justify the terminology later. Our goal is to show that a stationary distribution exists if and only if every state of the (irreducible) chain is positive recurrent. The principal result is formulated as follows:

Theorem 6.29 Consider a Markov chain with state space $\mathcal{S}$. If there exists a stationary measure $\mu$ with $0<\mu(\mathcal{S})<\infty$, then every $x$ with $\mu(x)>0$ is recurrent. If the chain is irreducible, then

$$\mu(x) = \frac{\mu(\mathcal{S})}{E_x T_x} \tag{6.54}$$

for all $x\in\mathcal{S}$. In particular, $E_x T_x<\infty$ for all $x$, i.e., every state is positive recurrent.

Proof. Let $x$ be such that $\mu(x)>0$. The fact that $\mu$ is stationary implies that $\mu(x)=\sum_{z\in\mathcal{S}}\mu(z)\,p^n(z,x)$ for all $n\ge1$. Therefore

$$\infty = \sum_{n\ge1}\mu(x) \overset{\text{Fubini}}{=} \sum_{z\in\mathcal{S}}\mu(z)\sum_{n\ge1}p^n(z,x) = \sum_{z\in\mathcal{S}}\mu(z)\,E_z\bigl(N(x)\bigr). \tag{6.55}$$

But (6.51) implies $E_z(N(x))\le[1-P_x(T_x<\infty)]^{-1}$ and so

$$\infty \le \frac{\mu(\mathcal{S})}{1-P_x(T_x<\infty)}. \tag{6.56}$$

Since $\mu(\mathcal{S})<\infty$, we must have $P_x(T_x<\infty)=1$, i.e., $x$ is recurrent.

In order to prove the second part of the claim, we note that irreducibility implies $\mu(x)>0$ for all $x$ (unless $\mu\equiv0$, which we do not consider worth discussing), and so all states are recurrent. From (the proof of) Theorem 6.23 we glean the relation $\mu(y)=\mu(x)\,\nu_x(y)$, which implies

$$\mu(\mathcal{S}) = \mu(x)\sum_{y\in\mathcal{S}}\nu_x(y) = \mu(x)\,E_x T_x, \tag{6.57}$$

which implies (6.54). But $\mu(\mathcal{S})<\infty$ and so we must have $E_x T_x<\infty$.

We summarize the interesting part of the result in a corollary:


Corollary 6.30 If a Markov chain with state space $\mathcal{S}$ is irreducible, then the following are equivalent:

(1) Some state is positive recurrent.

(2) There exists a stationary measure $\mu$ with $\mu(\mathcal{S})<\infty$.

(3) Every state is positive recurrent.

Proof. (1)$\Rightarrow$(2): Let $x$ be positive recurrent. Then $\nu_x$ is a stationary measure with $\nu_x(\mathcal{S})=E_x T_x<\infty$. (2)$\Rightarrow$(3): This is the content of Theorem 6.29. (3)$\Rightarrow$(1): Trivial.
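With $\mu$ normalized ($\mu(\mathcal{S})=1$), formula (6.54) says $\pi(x)=1/E_x T_x$. A quick empirical check for the two-state matrix of Example 6.8 (a simulation sketch of our own; the stationary distribution $(4/9,5/9)$ was computed in Section 6.3):

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([[3/4, 1/4], [1/5, 4/5]])       # matrix (6.15)
pi = np.array([4/9, 5/9])                    # its stationary distribution

def mean_return_time(x, n_trials=20000):
    total = 0
    for _ in range(n_trials):
        state, t = x, 0
        while True:
            state = rng.choice(2, p=p[state])
            t += 1
            if state == x:                   # T_x = first n >= 1 with X_n = x
                break
        total += t
    return total / n_trials

print(mean_return_time(0), 1 / pi[0])        # both should be close to 9/4
```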

We finish this section by providing a justification for the terminology of "positive" and "null" recurrent states/Markov chains (both are class properties):

Theorem 6.31 Consider a Markov chain with state space $\mathcal{S}$. Let $N_n(y)=\sum_{m=1}^{n}1_{\{X_m=y\}}$. If $y$ is recurrent, then for all $x\in\mathcal{S}$,

$$\lim_{n\to\infty}\frac{N_n(y)}{n} = \frac{1}{E_y T_y}\,1_{\{T_y<\infty\}}, \qquad P_x\text{-a.s.} \tag{6.58}$$

Proof. Let us first consider the case $x=y$. Then recurrence implies $1_{\{T_y<\infty\}}=1$ almost surely. Define the sequence of times $\tau_n=T_y^n-T_y^{n-1}$, where $\tau_1=T_y$. By the strong Markov property, the $(\tau_n)$ are i.i.d. with the same distribution as $T_y$. In terms of the $\tau_n$'s, we have

$$N_n(y) = \sup\{k\ge0\colon \tau_1+\dots+\tau_k\le n\}, \tag{6.59}$$

i.e., $N_n(y)$ is a renewal sequence. The Renewal Theorem then gives us

$$\lim_{n\to\infty}\frac{N_n(y)}{n} = \frac{1}{E_y\tau_1} = \frac{1}{E_y T_y}, \tag{6.60}$$

$P_y$-almost surely.

Now we look at the case $x\ne y$. If $P_x(T_y=\infty)=1$ then $N_n(y)=0$ almost surely for all $n$ and there is nothing to prove. We can thus assume $P_x(T_y<\infty)>0$ and decompose according to the values of $T_y$. We will use the Markov property, which tells us that for any $A\in\mathcal{F}$ we have $P_x(\theta^m\omega\in A\mid T_y=m)=P_y(A)$. We will apply this to the event

$$A = \Bigl\{\lim_{n\to\infty}\frac{N_n(y)}{n} = \frac{1}{E_y T_y}\Bigr\}. \tag{6.61}$$

Indeed, this event occurs almost surely under $P_y$ and so we have $P_x(\theta^m\omega\in A\mid T_y=m)=1$. But on $\{\theta^m\omega\in A\}\cap\{T_y=m\}$ we have

$$\lim_{n\to\infty}\frac{N_{n+m}(y)-N_m(y)}{n} = \frac{1}{E_y T_y}, \tag{6.62}$$

which implies that $N_n(y)/n\to1/E_y T_y$. Therefore $P_x(A\mid T_y=m)=1$ for all $m$ with $P_x(T_y=m)>0$. It follows that $A$ occurs almost surely under $P_x(\cdot\mid T_y<\infty)$. Hence, the limit in (6.58) equals $1/E_y T_y$ almost surely on $\{T_y<\infty\}$ and zero almost surely on $\{T_y=\infty\}$. This proves the claim.

The upshot of this theorem is that a state is positive recurrent if it is visited with a positive density of times, and null recurrent if it is visited infinitely often but the density of visits tends to zero. Formulas of the form (6.54) and (6.58) are reminiscent of Kac's recurrence theorem from ergodic theory.

6.7 Convergence to equilibrium

Markov chains are often run on a computer in order to sample from a complicated distribution on a large state space. The idea is to define a Markov chain for which the desired distribution is stationary and then wait long enough for the chain to "equilibrate." The last aspect of Markov chains we wish to examine is this convergence to equilibrium.

When run on a computer, only one state of the Markov chain is stored at each time (this is why Markov chains are relatively easy to implement), and so we are asking about the convergence of the distribution $P_\mu(X_n\in\cdot)$. Noting that this is captured by the quantities $p^n(x,y)$, we will thus study the convergence of $p^n(x,\cdot)$ as $n\to\infty$.

For irreducible Markov chains, we can generally guarantee convergence in the Cesàro sense:

$$\lim_{n\to\infty}\frac1n\sum_{m=1}^{n}p^m(x,y) = \mu(y). \tag{6.63}$$

Indeed, subsequential limits produce stationary measures, which are zero unless the chain is positive recurrent. In the latter case the stationary measure is unique, and so every subsequential limit is the same, i.e., the Cesàro averages converge.

Unfortunately, the $p^n(x,y)$ themselves may not converge. For instance, if we have a chain that "hops" between two states, $p^n(x,y)$ will oscillate between zero and one as $n$ changes. The obstruction is clearly related to periodicity: if there were the slightest probability of not "hopping," the chain would soon get out of sync and equilibrium would be reached.

In order to classify the periodic situation, let $I_x=\{n\ge1\colon p^n(x,x)>0\}$. By standard arguments, $I_x$ is an additive semigroup (a set closed under addition). This allows us to define a number $d_x$ as the largest integer that divides all $n\in I_x$. (Since one divides all integers, such a number indeed exists.) We call $d_x$ the period of $x$.

Lemma 6.32 If the Markov chain is irreducible, then $d_x$ is the same for all $x$.

Proof. See the textbook.

Definition 6.33 A Markov chain is called aperiodic if $d_x=1$ for all $x$.


Lemma 6.34 An irreducible, aperiodic Markov chain with state space $\mathcal{S}$ satisfies the following: for all $x,y\in\mathcal{S}$ there exists $n_0=n_0(x,y)$ such that $p^n(x,y)>0$ for all $n\ge n_0$.

Proof. See (5.4) Lemma on page 314 of the textbook.

Our goal is to prove the following result:

Theorem 6.35 Consider an irreducible, aperiodic Markov chain on state space $\mathcal{S}$. Suppose there exists a stationary distribution $\pi$. Then for all $x\in\mathcal{S}$,

$$p^n(x,y) \xrightarrow[n\to\infty]{} \pi(y), \qquad y\in\mathcal{S}. \tag{6.64}$$

The proof of this theorem is based on a general technique called coupling. The idea is as follows: we run one Markov chain started at $x$ and another started at a point $z$ which was itself chosen at random from the distribution $\pi$. As long as the chains stay away from each other, we generate them independently. The first moment they collide, we glue them together and from that time on move both of them synchronously.

The upshot is that, if we observe only the chain started at $x$, we see a chain started at $x$, while if we observe the chain started at $z$, we see a chain started at $z$. But the latter was started from the stationary distribution, and so its law is stationary at each time. It follows that, provided the chains have glued, the one started from $x$ will eventually be stationary as well.

To make this precise, we have to define both chains on the same probability space. We generalize the initial distributions to any two measures $\mu$ and $\nu$ on $\mathcal{S}$. Let us therefore consider a Markov chain on $\mathcal{S}\times\mathcal{S}$ with transition probabilities

will eventually be stationary.To make this precise, we will have to define both chains on the same probabilityspace. We will generalize the initial distributions to any two measures µ and n on S .Let us therefore consider a Markov chain on S ⇥ S with transition probabilities

p

(x1, x2), (y1, y2)�

=

8

>

<

>

:

p(x1, y1)p(x2, y2) if x1 6= x2,p(x1, y1), if x1 = x2 and y1 = y2,0, otherwise,

(6.65)

and initial distribution µ⌦ n. We will use Pµ⌦n

to denote the corresponding proba-bility measure—called the coupling measure—and (X

(1)n

, X

(2)n

) to denote the coupled

process.First we will verify that each of the marginals is the original Markov chain:

Lemma 6.36 Let $(X_n^{(1)},X_n^{(2)})$ denote the coupled process under the measure $P_{\mu\otimes\nu}$. Then $(X_n^{(1)})$ is the original Markov chain on $\mathcal{S}$ with initial distribution $\mu$, while $(X_n^{(2)})$ is the original Markov chain on $\mathcal{S}$ with initial distribution $\nu$.

Proof. Let $A=\{X_k^{(1)}=x_k,\ k=0,\dots,n\}$. Abusing the notation slightly, we want to show that $P_{\mu\otimes\nu}(A)=P_\mu(A)$. Since $A$ fixes only the $X_k^{(1)}$'s, we can calculate the probability of $A$ by summing over the possible values of the $X_k^{(2)}$'s:

$$P_{\mu\otimes\nu}(A) = \sum_{(y_k)}\mu(x_0)\,\nu(y_0)\prod_{k=0}^{n-1}p\bigl((x_k,y_k),(x_{k+1},y_{k+1})\bigr). \tag{6.66}$$

Next we note that

$$\sum_{y'\in\mathcal{S}}p\bigl((x,y),(x',y')\bigr) = \begin{cases} \sum_{y'\in\mathcal{S}}p(x,x')\,p(y,y'), & \text{if } x\ne y,\\ p(x,x'), & \text{if } x=y.\end{cases} \tag{6.67}$$

In both cases the sum equals $p(x,x')$, which we note is independent of $y$. Therefore, the sums in (6.66) can be performed one by one, with the result

$$P_{\mu\otimes\nu}(A) = \mu(x_0)\prod_{k=0}^{n-1}p(x_k,x_{k+1}), \tag{6.68}$$

which is exactly $P_\mu(A)$. The second marginal is handled analogously.

Our next item of interest is the time when the chains first collide:

Lemma 6.37 Let $T=\inf\{n\ge0\colon X_n^{(1)}=X_n^{(2)}\}$. Under the conditions of Theorem 6.35,

$$P_{\mu\otimes\nu}(T<\infty) = 1 \tag{6.69}$$

for any pair of initial distributions $\mu$ and $\nu$.

Proof. We will consider an uncoupled chain on $\mathcal{S}\times\mathcal{S}$ where both original Markov chains move independently forever. This chain has transition probability

$$q\bigl((x_1,x_2),(y_1,y_2)\bigr) = p(x_1,y_1)\,p(x_2,y_2). \tag{6.70}$$

As a moment's thought reveals, the time $T$ has the same distribution in the coupled and the uncoupled chain. Therefore, we just need to prove the lemma for the uncoupled chain.

First, let us note that the uncoupled chain is irreducible (this is where aperiodicity is needed). Indeed, by Lemma 6.34 aperiodicity implies that $p^n(x_1,y_1)>0$ and $p^n(x_2,y_2)>0$ for $n$ sufficiently large, and so we also have $q^n((x_1,x_2),(y_1,y_2))>0$ for $n$ sufficiently large. Second, we observe that the uncoupled chain is recurrent. Indeed, $\hat\pi(x,y)=\pi(x)\,\pi(y)$ is a stationary distribution and, using irreducibility, every state of the chain is thus recurrent. But then, for any $x\in\mathcal{S}$, the first hitting time of $(x,x)$ is finite almost surely, which implies the same for $T$, the first hitting time of the diagonal in $\mathcal{S}\times\mathcal{S}$.

The principal idea behind coupling now reduces to the following lemma:

Lemma 6.38 [Coupling inequality] Consider the coupled Markov chain with initial distribution $\mu\otimes\nu$ and let $T=\inf\{n\ge0\colon X_n^{(1)}=X_n^{(2)}\}$. Let $\mu_n(\cdot)=P_{\mu\otimes\nu}(X_n^{(1)}\in\cdot)$ and $\nu_n(\cdot)=P_{\mu\otimes\nu}(X_n^{(2)}\in\cdot)$ be the marginals at time $n$. Then

$$\|\mu_n-\nu_n\| \le P_{\mu\otimes\nu}(T>n), \tag{6.71}$$

where $\|\mu_n-\nu_n\|=\sup_{A\subseteq\mathcal{S}}|\mu_n(A)-\nu_n(A)|$ is the variational distance of $\mu_n$ and $\nu_n$.


Proof. Let $\mathcal{S}_+=\{x\in\mathcal{S}\colon \mu_n(x)>\nu_n(x)\}$. The proof is based on the fact that

$$\|\mu_n-\nu_n\| = \mu_n(\mathcal{S}_+)-\nu_n(\mathcal{S}_+). \tag{6.72}$$

This makes it reasonable to evaluate the difference

$$\begin{aligned}
\mu_n(\mathcal{S}_+)-\nu_n(\mathcal{S}_+) &= P_{\mu\otimes\nu}(X_n^{(1)}\in\mathcal{S}_+) - P_{\mu\otimes\nu}(X_n^{(2)}\in\mathcal{S}_+)\\
&= E_{\mu\otimes\nu}\bigl(1_{\{X_n^{(1)}\in\mathcal{S}_+\}}-1_{\{X_n^{(2)}\in\mathcal{S}_+\}}\bigr)\\
&= E_{\mu\otimes\nu}\Bigl(1_{\{T>n\}}\bigl(1_{\{X_n^{(1)}\in\mathcal{S}_+\}}-1_{\{X_n^{(2)}\in\mathcal{S}_+\}}\bigr)\Bigr).
\end{aligned} \tag{6.73}$$

Here we have noted that if $T\le n$ then either both $\{X_n^{(1)}\in\mathcal{S}_+\}$ and $\{X_n^{(2)}\in\mathcal{S}_+\}$ occur or both don't. Estimating the difference of the two indicators by one, we thus get $\mu_n(\mathcal{S}_+)-\nu_n(\mathcal{S}_+)\le P_{\mu\otimes\nu}(T>n)$. Plugging this into (6.72), the desired estimate follows.

Now we are ready to prove the convergence to equilibrium:

Proof of Theorem 6.35. Consider two Markov chains, one started from $\mu$ and the other from $\nu$. By Lemmas 6.36 and 6.38, the variational distance between the distributions $\mu_n$ and $\nu_n$ of $X_n$ in these two chains is bounded by $P_{\mu\otimes\nu}(T>n)$. But Lemma 6.37 implies that $P_{\mu\otimes\nu}(T>n)$ tends to zero as $n\to\infty$, and so $\|\mu_n-\nu_n\|\to0$.

To get (6.64) we now let $\mu=\delta_x$ and $\nu=\pi$. Then $\mu_n(\cdot)=p^n(x,\cdot)$ while $\nu_n=\pi$ for all $n$. Hence we have $\|p^n(x,\cdot)-\pi\|\to0$, which means that $p^n(x,\cdot)\to\pi$ in the variational norm. This implies (6.64).
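A numerical illustration of (6.64), using the matrix of Example 6.8 (which is irreducible and aperiodic): the variational distance between $p^n(x,\cdot)$ and $\pi$ decays to zero. This is our own check, not part of the notes:

```python
import numpy as np

p = np.array([[3/4, 1/4], [1/5, 4/5]])        # matrix (6.15)
pi = np.array([4/9, 5/9])                     # its stationary distribution

power = np.eye(2)
for n in range(1, 21):
    power = power @ p                         # power = p^n
    tv = 0.5 * np.abs(power[0] - pi).sum()    # ||p^n(0, .) - pi|| in variational norm
    if n % 5 == 0:
        print(n, tv)                          # observed to decay geometrically in n
```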

The method of proof is quite general and can be adapted to other circumstances; see Lindvall's book "Lectures on the coupling method." We observe that Lemmas 6.36 and 6.38 allow us to estimate the time it takes for the two marginals to get closer than prescribed. On the basis of the proof of Lemma 6.37, the coupling time can be studied in terms of the uncoupled process, which is slightly easier to handle.

6.8 Discrete harmonic analysis

We finish with a brief section on discrete harmonic analysis. The reasons for studying harmonic functions (to be defined below) on discrete structures come from the striking connection with the simple random walk that we want to demonstrate. In the continuum, a similar connection exists with Brownian motion.

Harmonic analysis is a subject whose primary (or initial) focus is the solutions of Laplace's equation. In the discrete setting, the usual Laplacian takes a function $f\colon\mathbb{Z}^d\to\mathbb{R}$ and assigns to it the function

$$(\Delta f)(x) = \frac{1}{2d}\sum_{y\sim x}\bigl(f(y)-f(x)\bigr), \tag{6.74}$$


where $y\sim x$ means that $y$ is a nearest neighbor of $x$ on $\mathbb{Z}^d$. The Laplacian is directly related to the notion of harmonic functions:

Definition 6.39 A function $f\colon\mathbb{Z}^d\to\mathbb{R}$ is called harmonic in $\Lambda\subset\mathbb{Z}^d$ if

$$f(x) = \frac{1}{2d}\sum_{y\sim x}f(y) \tag{6.75}$$

holds for all $x\in\Lambda$. (I.e., $f$ is harmonic in $\Lambda$ if $(\Delta f)(x)=0$ for all $x\in\Lambda$.) Similarly, we call $f$ subharmonic if "$\le$" holds and superharmonic if "$\ge$" holds for all $x\in\Lambda$.

In the following we will show that harmonic functions and the simple random walk have a lot in common. For that we will need the following notation: for each $x\in\mathbb{Z}^d$, let $P_x$ denote the probability measure on simple random walks on $\mathbb{Z}^d$ started at $x$. More specifically, $P_x$ is a measure on sequences $(S_n)_{n\ge0}$ such that $S_n-S_0$ is the "usual" simple random walk and $S_0=x$ almost surely.

Lemma 6.40 Let $x\in\mathbb{Z}^d$ and let $\Lambda\subset\mathbb{Z}^d$ be an arbitrary set. Let $T=\inf\{n\ge0\colon S_n\notin\Lambda\}$ and let $f\colon\mathbb{Z}^d\to\mathbb{R}$ be a function. Then on $(\Omega,\mathcal{F},P_x)$ we have: if $f$ is subharmonic in $\Lambda$, then

$$M_n = f(S_{T\wedge n}) \tag{6.76}$$

is a submartingale with respect to the filtration $\mathcal{F}_n=\sigma(S_1,\dots,S_n)$. In particular, if $f$ is harmonic in $\Lambda$ then $M_n$ is a martingale.

Remark 6.41 If we knew that $f$ were harmonic everywhere, it would suffice to show that $f(S_n)$ is a martingale, because then the result would follow by Lemma 5.48. However, in the present case an independent proof is easier.

Proof. Clearly, it suffices to prove the statement for subharmonic $f$. Here we decompose according to the values of $T$:

$$E(M_{n+1}\mid\mathcal{F}_n) = E\bigl(f(S_{T\wedge(n+1)})1_{\{T\ge n+1\}}\,\big|\,\mathcal{F}_n\bigr) + E\bigl(f(S_{T\wedge(n+1)})1_{\{T\le n\}}\,\big|\,\mathcal{F}_n\bigr). \tag{6.77}$$

Now the first term can be written as

$$E\bigl(f(S_{T\wedge(n+1)})1_{\{T\ge n+1\}}\,\big|\,\mathcal{F}_n\bigr) = E\bigl(f(S_n+X_{n+1})1_{\{T\ge n+1\}}\,\big|\,\mathcal{F}_n\bigr) = \frac{1}{2d}\sum_{y\sim S_n}f(y)\,1_{\{T\ge n+1\}} \ge f(S_n)\,1_{\{T\ge n+1\}}. \tag{6.78}$$

Here we used that $1_{\{T\ge n+1\}}$ is $\mathcal{F}_n$-measurable to take it out of the expectation, then we wrote $S_{n+1}=S_n+X_{n+1}$, where $S_n$ is $\mathcal{F}_n$-measurable and $X_{n+1}$ is independent of $\mathcal{F}_n$. This allows us to take the expectation with respect to $X_{n+1}$. The final inequality comes from the fact that, on $\{T\ge n+1\}$, we have $S_n\in\Lambda$ and so $f$ is subharmonic at $S_n$.

In order to address the second term, we notice that $f(S_{T\wedge(n+1)})1_{\{T\le n\}}=f(S_T)1_{\{T\le n\}}$, which is $\mathcal{F}_n$-measurable. Hence we get

$$E(M_{n+1}\mid\mathcal{F}_n) \ge f(S_n)1_{\{T\ge n+1\}} + f(S_T)1_{\{T\le n\}}, \tag{6.79}$$


which is exactly $M_n$. Hence, $M_n$ is a submartingale.

As already mentioned, the core problem of harmonic analysis is to study the solutions of Laplace's equation. However, even in a finite domain there will be plenty of solutions unless we prescribe a boundary condition. This leads us to the following (discrete) Dirichlet problem for a function $f$:

$$\begin{cases} (\Delta f)(x) = 0, & x\in\Lambda,\\ f(x) = g(x), & x\in\partial\Lambda,\end{cases} \tag{6.80}$$

where $\partial\Lambda$ is the set of sites in $\Lambda^{\mathrm{c}}$ that have a neighbor in $\Lambda$. The function $g$ is the boundary condition, which we need in order to make sense of the Laplacian at all sites of $\Lambda$.

The advertised link with the theory of random walks is then provided by:

Theorem 6.42 Let $\Lambda\subset\mathbb{Z}^d$ be finite and let $T=\inf\{n\ge0\colon S_n\notin\Lambda\}$. Then $T<\infty$ almost surely under $P_x$ for every $x$, and

$$f(x) = E_x\bigl(g(S_T)\bigr), \qquad x\in\Lambda, \tag{6.81}$$

is the unique solution to the Dirichlet problem (6.80) with boundary condition $g$.

Proof. By an argument similar to one used in a homework assignment, we have $P(T>n)\le e^{-\delta n}$ for some $\delta>0$ and all $n\ge1$. Thus $T<\infty$ almost surely.

First we prove that the above $f$ is a solution to (6.80). To that end we let $x\in\Lambda$ and pick a nearest neighbor $y$ of $x$. Crucial for the argument is the observation that the sequence $(S_2,S_3,\dots)$ under the measure $P_x(\cdot\mid S_1=y)$ has the same distribution as $(S_1,S_2,\dots)$ under $P_y$. In particular, $S_T$ is the same in both sequences. Therefore,

$$f(x) = E_x\bigl(g(S_T)\bigr) = \sum_{y\sim x}E_x\bigl(g(S_T)1_{\{S_1=y\}}\bigr) = \sum_{y\sim x}\frac{1}{2d}E_y\bigl(g(S_T)\bigr) = \frac{1}{2d}\sum_{y\sim x}f(y). \tag{6.82}$$

Hence, $f$ is harmonic in $\Lambda$. But $f(x)=g(x)$ for all $x\in\partial\Lambda$, and so $f$ solves (6.80).

Next we want to show uniqueness. Let $f$ be a solution to (6.80). Then $f$ is harmonic in $\Lambda$ and so, by Lemma 6.40, $M_n=f(S_{T\wedge n})$ is a bounded martingale. Since $T<\infty$ almost surely, the Optional Stopping Theorem gives us $EM_T=EM_0$. But $EM_0=Ef(S_0)=f(x)$, while $EM_T=Ef(S_T)=Eg(S_T)$. The function $f$ thus satisfies (6.81), and hence the solution to (6.80) is unique.
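Formula (6.81) suggests a Monte Carlo scheme for the Dirichlet problem: from each $x$, run simple random walks until they exit $\Lambda$ and average $g$ over the exit sites. The following sketch (the domain, the boundary data and all names are our own illustrative choices, not from the text) does this on a box in $\mathbb{Z}^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 10
in_lam = lambda z: 0 < z[0] < L and 0 < z[1] < L     # Lambda = interior of an L x L box
g = lambda z: 1.0 if z[0] == L else 0.0              # boundary condition on the outer layer
steps = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])

def dirichlet_mc(x, n_walks=5000):
    """Monte Carlo estimate of f(x) = E_x g(S_T), cf. (6.81)."""
    total = 0.0
    for _ in range(n_walks):
        z = np.array(x)
        while in_lam(z):                              # stop at T, the first exit from Lambda
            z = z + steps[rng.integers(4)]
        total += g(z)
    return total / n_walks

print(dirichlet_mc((5, 5)))                           # the harmonic extension of g at (5, 5)
```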

Remark 6.43 In dimension $d\le2$, a similar argument allows us to conclude that the same holds even in infinite domains $\Lambda\subset\mathbb{Z}^d$. However, in dimensions $d\ge3$ the solution to (6.80) in infinite $\Lambda$ can pick up an extra factor "due" to the possibility $T=\infty$. We refer to, e.g., Lawler's book "Intersections of Random Walks" for more information.