
Source: probability.ca/jeff/teaching/1617/sta447/notes.pdf

STA447/2006 (Stochastic Processes) Lecture Notes, Winter 2017

by Jeffrey Rosenthal, University of Toronto

(Last updated: March 30, 2017)

Note: These lecture notes will be posted on the STA447/2006 course web page after the corresponding lecture material, for your convenience. However, they are just rough, point-form notes, with no guarantee of completeness or accuracy. They should in no way be regarded as a substitute for attending and actively learning from the course lectures.

Introduction:

• Course web page, outline, evaluation, etc. (www.probability.ca/sta447)

• Schedule: will take 15-minute break if you return promptly!

• Background: STA347 prerequisite – required for undergrads! (last semester? previously?)

(includes various math and stat second-year prerequisites)

• Your status: undergrad? grad? special? STA specialist? major? Act Sci? other?

• You should already know basic probability theory: probability spaces, random variables, expected value, independence, conditional probability, discrete and continuous distributions, etc. (You do not need to know measure theory.)

• This class considers stochastic processes, i.e. randomness which proceeds in time.

− Will develop their mathematical theory (with a few applications).

Markov chains:

• EXAMPLE (Frog Example):

− 20 lily pads arranged in a circle. (diagram)

− Frog starts at pad #20.

− Each minute, she jumps either straight up, or one pad clockwise, or one pad counter-clockwise, each with probability 1/3.

− (see e.g. www.probability.ca/frogwalk)

• So, P(at pad #1 after 1 step) = 1/3.

− P(at pad #20 after 1 step) = 1/3.

− P(at pad #19 after 1 step) = 1/3.

− P(at pad #2 after 2 steps) = 1/9.

− P(at pad #19 after 2 steps) = 2/9.

− etc.

• What happens in the long run?

− What is P(frog at pad #14 after 987 steps)?

− What is lim_{k→∞} P(frog at pad #14 after k steps)?


− Will the frog necessarily eventually return to pad #20?

− Will the frog necessarily eventually visit every pad?

• And what happens if we have:

− different jump probabilities?

− different arrangement of the pads?

− more pads?

− infinitely many pads?

− etc.

• A (discrete time, discrete space, time homogeneous) Markov chain is specified by three ingredients:

− A state space S, any non-empty finite or countable set. (e.g. the 20 lily pads)

− initial probabilities {νi}_{i∈S}, where νi is the probability of starting at i (at time 0). (So, νi ≥ 0, and Σ_i νi = 1.)

− transition probabilities {pij}_{i,j∈S}, where pij is the probability of jumping to j if you start at i. (So, pij ≥ 0, and Σ_j pij = 1 for all i.)

• In the frog example, S = {1, 2, 3, . . . , 20}, and

pij = 1/3 if |j − i| ≤ 1, pij = 1/3 if |j − i| = 19, and pij = 0 otherwise,

and ν20 = 1 (with νi = 0 otherwise).

• Let Xn be the Markov chain’s state at time n.

− Then P(X_{n+1} = j | X_n = i) = pij, ∀ i, j ∈ S, n = 0, 1, 2, . . .. (Doesn't depend on n: time-homogeneous.)

− Also P(X_{n+1} = j | X_0 = i_0, X_1 = i_1, X_2 = i_2, . . . , X_n = i_n) = p_{i_n j}. (Markov property.)

− Also P(X_0 = i, X_1 = j, X_2 = k) = νi pij pjk, etc.

− More generally, P(X_0 = i_0, X_1 = i_1, . . . , X_n = i_n) = ν_{i_0} p_{i_0 i_1} p_{i_1 i_2} · · · p_{i_{n−1} i_n}.

− The random sequence {X_n}_{n=0}^∞ "is" the Markov chain.

• In the frog example:

− P(X0 = 20) = 1, P(X0 = 7) = 0, etc.

− P(X1 = 1) = 1/3, P(X1 = 20) = 1/3, P(X2 = 2) = 1/9, P(X2 = 19) = 2/9, etc.

More Examples of Markov Chains:

• Example: simple random walk (s.r.w.).

− Let 0 < p < 1. (e.g. p = 1/2)

− Suppose you repeatedly bet $1.

− Each time, you have probability p of winning $1, and probability 1− p of losing $1.


− Let Xn be net gain (in dollars) after n bets.

− Then {Xn} is a Markov chain, with S = Z, ν0 = 1, and

pij = p if j = i + 1, pij = 1 − p if j = i − 1, and pij = 0 otherwise.

− What happens in the long run? Will you necessarily go broke? etc.

− (see e.g. www.probability.ca/randwalk)

• Example: Bernoulli process. (e.g. counting sunny days)

− Let 0 < p < 1. (e.g. p = 1/2)

− Repeatedly flip a “p-coin” (i.e., a coin whose probability of heads is p).

− Let Xn = # of heads on first n flips.

− Then {Xn} is a Markov chain, with S = {0, 1, 2, . . .}, X_0 = 0 (i.e. ν0 = 1), and

pij = p if j = i + 1, pij = 1 − p if j = i, and pij = 0 otherwise.

• Example: simple finite Markov chain.

− Let S = {1, 2, 3}, and X_0 = 3, and

( pij ) =
( 0    1/2  1/2 )
( 1/3  1/3  1/3 )
( 1/4  1/4  1/2 ).

− What happens in the long run? (diagram)

• Example: Ehrenfest’s Urn

− Have d balls in total, divided into two urns.

− At each time, we choose one of the d balls uniformly at random, and move it to the other urn.

− Let Xn = # balls in Urn 1 at time n.

− Then {Xn} is a Markov chain, with S = {0, 1, 2, . . . , d}, and p_{i,i−1} = i/d, and p_{i,i+1} = (d − i)/d, with pij = 0 otherwise.

− What happens in the long run? Does Xn become uniformly distributed? Does it stay close to X_0? To d/2?

• Example: human Markov chain?

− Everyone take out a coin (or borrow one).

− Then pick out two other students, one for “heads” and one for “tails”.

− Each time the frog comes to you, catch it, and flip your coin. Then toss the frog to either your heads or your tails student, depending on the result of the flip.


− What happens in the long run?

Elementary Computations:

• Let {Xn} be a Markov chain, with state space S, and transition probabilities pij, and initial probabilities νi.

• Recall that:

− P(X_0 = i_0) = ν_{i_0}.

− P(X_0 = i_0, X_1 = i_1) = ν_{i_0} p_{i_0 i_1}.

− P(X_0 = i_0, X_1 = i_1, . . . , X_n = i_n) = ν_{i_0} p_{i_0 i_1} p_{i_1 i_2} · · · p_{i_{n−1} i_n}.

− etc.

• In frog example: P(X_0 = 20, X_1 = 19, X_2 = 20) = ν20 p_{20,19} p_{19,20} = (1)(1/3)(1/3) = 1/9, etc.

• Now, let µ^{(n)}_i = P(Xn = i).

− Then µ^{(0)}_i = νi.

• What is µ^{(1)}_j in terms of νi and pij?

− µ^{(1)}_j = P(X_1 = j) = Σ_{i∈S} P(X_0 = i, X_1 = j) = Σ_{i∈S} νi pij.

− ("Law of Total Probability", "additivity", "partition")

• In matrix form:

− Write µ^{(n)} = (µ^{(n)}_1, µ^{(n)}_2, µ^{(n)}_3, . . .). [row vector]

− And write P = ( pij ) =
( p11  p12  p13  . . . )
( p21  p22  p23  . . . )
( p31  . . .           )
( . . .                ). [matrix]

− And write ν = (ν1, ν2, ν3, . . .). [row vector]

− Then µ^{(1)} = ν P = µ^{(0)} P. [matrix multiplication]

• e.g. if S = {1, 2, 3}, and µ^{(0)} = (1/7, 2/7, 4/7), and

( pij ) =
( 0    1/2  1/2 )
( 1/3  1/3  1/3 )
( 1/4  1/4  1/2 ),

then µ^{(1)}_2 = P(X_1 = 2) = µ^{(0)}_1 p12 + µ^{(0)}_2 p22 + µ^{(0)}_3 p32 = (1/7)(1/2) + (2/7)(1/3) + (4/7)(1/4) = 13/42.

• Similarly, µ^{(2)}_k = Σ_{i∈S} Σ_{j∈S} νi pij pjk, etc.

− Matrix form: µ^{(2)} = µ^{(0)} P P = µ^{(0)} P^2.

− By induction: µ^{(n)} = µ^{(0)} P^n, for n = 1, 2, 3, . . ..

− Convention: P^0 = I (identity). Then true for n = 0 too.

− e.g. in frog example, µ^{(2)}_19 = ν20 p_{20,19} p_{19,19} + ν20 p_{20,20} p_{20,19} + 0 = (1)(1/3)(1/3) + (1)(1/3)(1/3) + 0 = 2/9.
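A quick computational sketch (plain Python, exact arithmetic via `fractions`; not part of the original notes) of the identity µ^{(n)} = ν P^n for the frog chain:

```python
from fractions import Fraction

# Frog chain: 20 pads in a circle, states 1..20 stored at indices 0..19.
N = 20
P = [[Fraction(0)] * N for _ in range(N)]
for i in range(N):
    P[i][i] = Fraction(1, 3)               # jump straight up (stay)
    P[i][(i + 1) % N] = Fraction(1, 3)     # one pad clockwise
    P[i][(i - 1) % N] = Fraction(1, 3)     # one pad counter-clockwise

def step(mu, P):
    """One step mu -> mu P (law of total probability)."""
    n = len(mu)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

mu = [Fraction(0)] * N
mu[19] = Fraction(1)                       # nu: start at pad #20
for _ in range(2):                         # mu^(2) = nu P^2
    mu = step(mu, P)

# mu[18] = P(X_2 = 19) = 2/9, and mu[1] = P(X_2 = 2) = 1/9, as above.
```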


• n-step transitions: p^{(n)}_{ij} = P(X_{m+n} = j | X_m = i).

− (Again, doesn't depend on m: time-homogeneous.)

− p^{(1)}_{ij} = pij. (of course)

− What about p^{(2)}_{ij}?

− Well, p^{(2)}_{ij} = P(X_2 = j | X_0 = i) = Σ_{k∈S} P(X_2 = j, X_1 = k | X_0 = i) = Σ_{k∈S} pik pkj.

− Matrix form: P^{(2)} = ( p^{(2)}_{ij} ) = P P = P^2.

− By induction: P^{(n)} = P^n, i.e. to compute probabilities of n-step jumps, you can take nth powers of the transition matrix P.

− Convention: P^{(0)} = I = identity matrix, i.e. p^{(0)}_{ij} = δ_{ij} = 1 if i = j, and 0 otherwise.

• Observation: p^{(m+n)}_{ij} = P(X_{m+n} = j | X_0 = i) = Σ_{k∈S} P(X_{m+n} = j, X_m = k | X_0 = i) = Σ_{k∈S} p^{(m)}_{ik} p^{(n)}_{kj}.

− Matrix form: P^{(m+n)} = P^{(m)} P^{(n)}.

− (Of course, since P^{(m+n)} = P^{m+n} = P^m P^n.)

− "Chapman-Kolmogorov equations".

− Follows that e.g. p^{(m+n)}_{ij} ≥ p^{(m)}_{ik} p^{(n)}_{kj} for any state k.
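The Chapman-Kolmogorov identity can be checked mechanically for the 3-state example chain above; a small sketch (exact arithmetic, standard library only):

```python
from fractions import Fraction as F

# The 3-state example chain from these notes.
P = [[F(0), F(1, 2), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 4), F(1, 4), F(1, 2)]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, n):
    size = len(A)
    R = [[F(1) if i == j else F(0) for j in range(size)]  # P^(0) = I
         for i in range(size)]
    for _ in range(n):
        R = matmul(R, A)
    return R

lhs = matpow(P, 5)                        # P^(2+3)
rhs = matmul(matpow(P, 2), matpow(P, 3))  # P^(2) P^(3); should match lhs
```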

Classification of States:

• Shorthand: write P_i(· · ·) for P(· · · | X_0 = i). And, write E_i(· · ·) for E(· · · | X_0 = i).

• Let N(i) = #{n ≥ 1 : Xn = i} = total # of times the chain hits i. (Random variable; could be infinite.)

• Let fij = P_i(Xn = j for some n ≥ 1).

• Defn: a state i of a Markov chain is recurrent (or, persistent) if P_i(Xn = i for some n ≥ 1) = 1, i.e. if fii = 1. Otherwise, i is transient. (Previous examples? Frog? s.r.w.?)

• RECURRENCE THEOREM:

− i recurrent iff Σ_{n=1}^∞ p^{(n)}_{ii} = ∞ iff P_i(N(i) = ∞) = 1.

− And, i transient iff Σ_{n=1}^∞ p^{(n)}_{ii} < ∞ iff P_i(N(i) = ∞) = 0.

• To prove this, we first need a couple of facts:

• Also, P_i(N(i) ≥ 1) = fii, and P_i(N(i) ≥ 2) = (fii)^2, etc.

− In general, for k = 0, 1, 2, . . ., P_i(N(i) ≥ k) = (fii)^k.

• Also, recall that if Z is any non-negative-integer-valued random variable, then Σ_{k=1}^∞ P(Z ≥ k) = E(Z).
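A tiny numeric check of the tail-sum formula, for a toy distribution (the pmf values below are made up purely for illustration):

```python
# Toy pmf for a non-negative-integer-valued Z (values are arbitrary).
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

# E(Z) directly:
expectation = sum(z * p for z, p in pmf.items())

# E(Z) via the tail-sum formula sum_{k>=1} P(Z >= k):
tail_sum = sum(sum(p for z, p in pmf.items() if z >= k) for k in range(1, 4))
# Both equal 0*0.1 + 1*0.3 + 2*0.4 + 3*0.2 = 1.7
```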


• PROOF OF RECURRENCE THEOREM: First, by continuity of probabilities,

P_i(N(i) = ∞) = lim_{k→∞} P_i(N(i) ≥ k) = lim_{k→∞} (fii)^k = 1 if fii = 1, or 0 if fii < 1.

Second, using countable linearity (okay since non-negative),

Σ_{n=1}^∞ p^{(n)}_{ii} = Σ_{n=1}^∞ P_i(Xn = i) = Σ_{n=1}^∞ E_i(1_{Xn=i}) = E_i(Σ_{n=1}^∞ 1_{Xn=i}) = E_i(N(i)) = Σ_{k=1}^∞ P_i(N(i) ≥ k) = Σ_{k=1}^∞ (fii)^k, which = ∞ if fii = 1, and = fii/(1 − fii) < ∞ if fii < 1.

Q.E.D.

————————— END OF WEEK #1 ——————————

• EXAMPLE: S = {1, 2, 3, 4}, and

P =
( 1    0    0    0   )
( 1/4  1/4  1/2  0   )
( 0    0    1/2  1/2 )
( 0    0    1/3  2/3 ).

− Here f11 = 1, f22 = 1/4, f33 = 1, and f44 = 1.

− So, states 1, 3, and 4 are recurrent, but state 2 is transient.

− Also, f12 = 0 = f13 = f14 = f32 = f31.

− And, f34 = 1 = f43.

− And, f21 = 1/3 [since e.g. f21 = p21 + p22 f21 + p23 f31 + p24 f41 = (1/4) + (1/4) f21 + 0 + 0, so f21 = (1/4)/(3/4) = 1/3; alternatively, in this special case only, f21 = P_2(X_1 = 1 | X_1 ≠ 2) = (1/4)/[(1/4) + (1/2)] = 1/3].

− And, f23 = 2/3, and f24 = 2/3, etc.

− (Harder example to come on homework!)

• F-EXPANSION: fij = pij + Σ_{k≠j} pik fkj
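The value f21 = 1/3 can also be estimated by straightforward Monte Carlo on the 4-state example above; a sketch (the stopping rule uses the fact that once the walk enters {3, 4} it can never reach 1):

```python
import random

# Transition lists (state, probability) for the 4-state example chain.
P = {1: [(1, 1.0)],
     2: [(1, 0.25), (2, 0.25), (3, 0.5)],
     3: [(3, 0.5), (4, 0.5)],
     4: [(3, 1/3), (4, 2/3)]}

def jump(state):
    """Sample the next state from the current state's transition row."""
    u, acc = random.random(), 0.0
    for nxt, p in P[state]:
        acc += p
        if u < acc:
            return nxt
    return P[state][-1][0]

random.seed(0)
trials, hits = 100_000, 0
for _ in range(trials):
    s = 2
    while s == 2:          # once the walk leaves 2, its fate is decided:
        s = jump(s)        # state 1 means "hit 1"; {3,4} means "never"
    if s == 1:
        hits += 1
est = hits / trials        # should be close to f_21 = 1/3
```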

• What about e.g. Frog Example? Harder. Later!

• EXAMPLE: Simple random walk (s.r.w.). (S = Z, and p_{i,i+1} = p, and p_{i,i−1} = 1 − p.)

− Is the state 0 recurrent?

− Well, if n odd, then p^{(n)}_{00} = 0.

− If n even, then p^{(n)}_{00} = P(n/2 heads and n/2 tails on first n tosses) = C(n, n/2) p^{n/2} (1−p)^{n/2} = [n!/((n/2)!)^2] p^{n/2} (1−p)^{n/2}. [binomial distribution]

− Stirling's approximation: if n large, then n! ≈ (n/e)^n √(2πn).

− So, for n large and even,

p^{(n)}_{00} ≈ [(n/e)^n √(2πn) / ((n/2e)^{n/2} √(2πn/2))^2] p^{n/2} (1−p)^{n/2}


= [4p(1−p)]^{n/2} √(2/(πn)).

− Now, if p = 1/2, then 4p(1−p) = 1, so Σ_{n=1}^∞ p^{(n)}_{00} ≈ Σ_{n=2,4,6,...} √(2/(πn)) = ∞, so state 0 is recurrent.

− But if p ≠ 1/2, then 4p(1−p) < 1, so Σ_{n=1}^∞ p^{(n)}_{00} ≈ Σ_{n=2,4,6,...} [4p(1−p)]^{n/2} √(2/(πn)) < ∞, so state 0 is transient.

− (Similarly true for all other states besides 0, too.)
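The two cases can also be seen numerically; a sketch computing partial sums of p^{(n)}_{00} via the term ratio p^{(2(m+1))}_{00} / p^{(2m)}_{00} = [2(2m+1)/(m+1)] p(1−p), which avoids huge binomial coefficients and float underflow:

```python
def partial_sum(p, terms):
    """Sum of p^(2m)_{00} = C(2m, m) p^m (1-p)^m for m = 1..terms."""
    total, term = 0.0, 1.0          # term = p^(2m)_{00}, starting at m = 0
    for m in range(terms):
        term *= 2 * (2 * m + 1) / (m + 1) * p * (1 - p)
        total += term
    return total

grows_slow = partial_sum(0.5, 1_000)
grows_more = partial_sum(0.5, 4_000)   # p = 1/2: sums keep growing (recurrent)
settled_a = partial_sum(0.4, 2_000)
settled_b = partial_sum(0.4, 4_000)    # p != 1/2: sums have converged (transient)
```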

Communicating States:

• Say that state i communicates with state j, written i → j, if fij > 0, i.e. if it is possible to get from i to j, i.e. if ∃ m ≥ 1 with p^{(m)}_{ij} > 0.

− Write i ↔ j if both i → j and j → i.

• Say a Markov chain is irreducible if i→ j for all i, j ∈ S. (Previous examples?)

• SUM LEMMA: if i → k, and ℓ → j, and Σ_{n=1}^∞ p^{(n)}_{kℓ} = ∞, then Σ_{n=1}^∞ p^{(n)}_{ij} = ∞.

• PROOF OF SUM LEMMA: Find m, r ≥ 1 with p^{(m)}_{ik} > 0 and p^{(r)}_{ℓj} > 0. Note that p^{(m+s+r)}_{ij} ≥ p^{(m)}_{ik} p^{(s)}_{kℓ} p^{(r)}_{ℓj}. Hence,

Σ_{n=1}^∞ p^{(n)}_{ij} ≥ Σ_{n=m+r+1}^∞ p^{(n)}_{ij} = Σ_{s=1}^∞ p^{(m+s+r)}_{ij} ≥ Σ_{s=1}^∞ p^{(m)}_{ik} p^{(s)}_{kℓ} p^{(r)}_{ℓj} = p^{(m)}_{ik} p^{(r)}_{ℓj} Σ_{s=1}^∞ p^{(s)}_{kℓ} = (positive)(positive)(∞) = ∞. Q.E.D.

• SUM COROLLARY: if i↔ k, then i is recurrent iff k is recurrent.

• PROOF: Combine Sum Lemma (j = i and ℓ = k) with Recurrence Theorem. Q.E.D.

• CASES THEOREM: For an irreducible Markov chain, either

(a) Σ_{n=1}^∞ p^{(n)}_{ij} = ∞ for all i, j ∈ S, so all states are recurrent. ("recurrent Markov chain")

or (b) Σ_{n=1}^∞ p^{(n)}_{ij} < ∞ for all i, j ∈ S, so all states are transient. ("transient Markov chain")

• EXAMPLE: simple random walk. Irreducible!

− p = 1/2: case (a).

− p ≠ 1/2: case (b).

• What about Frog Example? Also irreducible, but which case?? Answer given by:

• FINITE SPACE THEOREM: an irreducible Markov chain on a finite state space always falls into case (a), i.e. Σ_{n=1}^∞ p^{(n)}_{ij} = ∞ for all i, j ∈ S, and all states are recurrent.

• PROOF OF FINITE SPACE THEOREM:


− Choose any state i ∈ S. Then by exchanging the sums (okay since non-negative), we have

Σ_{j∈S} Σ_{n=1}^∞ p^{(n)}_{ij} = Σ_{n=1}^∞ Σ_{j∈S} p^{(n)}_{ij} = Σ_{n=1}^∞ 1 = ∞.

− Since S is finite, there must be at least one j ∈ S with Σ_{n=1}^∞ p^{(n)}_{ij} = ∞.

− So, we must be in case (a). Q.E.D.

• So, in Frog Example, P(∃n ≥ 1 with Xn = 20 |X0 = 20) = 1.

− But what about P(∃n ≥ 1 with Xn = 14 |X0 = 20)??

• To continue, define Ti = min{n ≥ 1 : Xn = i}. (Ti =∞ if never hit i.)

• HIT LEMMA: If j → i with j ≠ i, then P_j(Ti < Tj) > 0.

− Intuitively obvious(?). But formal proof is:

− Since j → i, there is some possible path from j to i, i.e. there is m ∈ N and x_0, x_1, . . . , x_m with x_0 = j and x_m = i and p_{x_r x_{r+1}} > 0 for all 0 ≤ r ≤ m − 1.

− Let s = max{r : x_r = j} be the last time this path hits j.

− Then x_s, x_{s+1}, . . . , x_m is a possible path which goes from j to i without first returning to j.

− So, P_j(Ti < Tj) ≥ P_j(this path) = p_{x_s x_{s+1}} p_{x_{s+1} x_{s+2}} · · · p_{x_{m−1} x_m} > 0. Q.E.D.

• F-LEMMA: If j → i and fjj = 1, then fij = 1.

• PROOF OF F-LEMMA:

− Assume i ≠ j (otherwise trivial).

− Since j → i, P_j(Ti < Tj) > 0 by Hit Lemma.

− But 1 − fjj = P_j(Tj = ∞) ≥ P_j(Ti < Tj) P_i(Tj = ∞) = P_j(Ti < Tj) (1 − fij).

− If fjj = 1, then 1 − fjj = 0, so must have 1 − fij = 0, i.e. fij = 1. Q.E.D.

• Putting all of the above together, we obtain:

• STRONGER RECURRENCE THEOREM: If chain irreducible, then the following are equivalent (and all correspond to "case (a)"):

(1) There are k, ℓ ∈ S with Σ_{n=1}^∞ p^{(n)}_{kℓ} = ∞.

(2) For all i, j ∈ S, we have Σ_{n=1}^∞ p^{(n)}_{ij} = ∞.

(3) There is k ∈ S with fkk = 1, i.e. with k recurrent.

(4) For all j ∈ S, we have fjj = 1, i.e. all states are recurrent.

(5) For all i, j ∈ S, we have fij = 1.

• PROOF:

− (1) ⇒ (2): Sum Lemma.

− (2) ⇒ (4): Recurrence Theorem (with i = j).

− (4) ⇒ (5): F-Lemma.


− (5) ⇒ (3): Immediate.

− (3) ⇒ (1): Recurrence Theorem (with ` = k).

− Q.E.D.

• Frog Example: P(∃n ≥ 1 with Xn = 14 |X0 = 20) = f20,14 = 1, etc.

• Simple random walk with p = 1/2: P(∃ n ≥ 1 with Xn = 1,000,000 | X_0 = 0) = 1, etc. (And similarly for any conceivable pattern of values, i.e. the chain's values "fluctuate" arbitrarily.)

• Example: S = {1, 2, 3}, and

( pij ) =
( 0  1/2  1/2 )
( 0  1    0   )
( 0  0    1   ).

− Then Σ_{n=1}^∞ p^{(n)}_{12} = Σ_{n=1}^∞ (1/2) = ∞.

− And f22 = 1. Recurrent!

− But f11 = 0 < 1. Transient!

− Also f12 = 1/2 < 1.

− Not irreducible!

• Example: Simple random walk with p > 1/2.

− Irreducible.

− f00 < 1. (transient)

− Claim: f05 = 1.

− (Contradiction? No! Could still have Σ_{n=1}^∞ p^{(n)}_{05} < ∞.)

− Indeed, let Zn = Xn − X_{n−1}.

− Then P(Zn = +1) = p, P(Zn = −1) = 1 − p, and {Zn} i.i.d.

− So, by Strong Law of Large Numbers, w.p. 1, lim_{n→∞} (1/n)(Z1 + Z2 + . . . + Zn) = E(Z1) = p(1) + (1 − p)(−1) = 2p − 1 > 0.

− So, w.p. 1, lim_{n→∞} (Z1 + Z2 + . . . + Zn) = +∞.

− i.e., w.p. 1, Xn − X_0 → ∞, so Xn → ∞.

− Follows that if i < j, then fij = 1 (since must pass j when going from i to ∞).

− In particular, f05 = 1.

− (Similarly, if p < 1/2 and i > j, then fij = 1.)

− This is why the Stronger Recurrence Theorem does not include the equivalence "There are k, ℓ ∈ S with fkℓ = 1".

• CLOSED SUBSET NOTE: if chain not irreducible, but it has a closed subset C ⊆ S (i.e., pij = 0 for i ∈ C and j ∉ C) on which it is irreducible (i.e., i → j for all i, j ∈ C), then can still apply the Stronger Recurrence Theorem etc. to the chain restricted to C.


− e.g. in previous example with

P =
( 1    0    0    0   )
( 1/4  1/4  1/2  0   )
( 0    0    1/2  1/2 )
( 0    0    1/3  2/3 ),

can take C = {3, 4}, to prove that f34 = 1, Σ_{n=1}^∞ p^{(n)}_{43} = ∞, etc.

Stationary Distributions:

• What about a Markov chain’s long-run probabilities?

− Does limn→∞P[Xn = i] exist?

− What does it equal?

• Let π be a probability distribution on S, i.e. πi ≥ 0 for all i ∈ S, and∑i∈S

πi = 1.

• Defn: π is stationary for a Markov chain P = ( pij ) if Σ_{i∈S} πi pij = πj for all j ∈ S.

− Matrix notation: π P = π.

− Then by induction, π P^n = π for n = 0, 1, 2, . . ., i.e. Σ_{i∈S} πi p^{(n)}_{ij} = πj.

− Intuition: if chain starts with probabilities {πi}, then chain will keep the same probabilities one time unit later.

− That is, if µ^{(n)} = π, i.e. P(Xn = i) = πi for all i, then µ^{(n+1)} = µ^{(n)} P = π P = π, i.e. µ^{(n+1)} also equals π.

− And then, by induction, µ^{(m)} = π for all m ≥ n. ("stationary")

• Frog Example:

− Let πi = 1/20 for all i ∈ S.

————————— END OF WEEK #2 ——————————

− Then πi ≥ 0 and Σ_i πi = 1.

− Also, for all j ∈ S, Σ_{i∈S} πi pij = (1/20)(1/3 + 1/3 + 1/3) = 1/20 = πj.

− So, {πi} is a stationary distribution!

• (More generally, if chain is "doubly stochastic", i.e. Σ_{i∈S} pij = 1 for all j ∈ S, and if |S| < ∞, then if πi = 1/|S| for all i ∈ S, then {πi} is stationary [check].)

• Ehrenfest’s Urn example: (S = {0, 1, 2, . . . , d}, pij = i/d for j = i− 1, pij = (d− i)/d for

j = i+ 1)

− Does πi = 1d+1

for all i?

− Well, if e.g. j = 1, then∑i∈S πipij = 1

d+1(p01 + p21) = 1

d+1(1 + 2

d) 6= 1

d+1= πj.

− So, should not take πi = 1d+1

for all i.

− So, πi = ???

• Defn: a Markov chain is reversible (or time reversible, or satisfies detailed balance) with respect to a probability distribution {πi} if πi pij = πj pji for all i, j ∈ S.


• PROPOSITION: if chain is reversible w.r.t. {πi}, then {πi} is a stationary distribution. (Converse false.)

− PROOF: for j ∈ S, Σ_{i∈S} πi pij = Σ_{i∈S} πj pji = πj Σ_{i∈S} pji = πj. Q.E.D.

• Frog Example:

− πi = 1/20.

− If |j − i| ≤ 1 or |j − i| = 19, then πi pij = (1/20)(1/3) = πj pji.

− Otherwise both sides are 0.

− So, reversible! (easier way to check stationarity)
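A sketch verifying detailed balance (and hence, by the Proposition, stationarity) for the frog chain directly, with exact arithmetic:

```python
from fractions import Fraction

# Frog chain transition matrix (pads 1..20 at indices 0..19).
N = 20
P = [[Fraction(0)] * N for _ in range(N)]
for i in range(N):
    for j in (i, (i + 1) % N, (i - 1) % N):
        P[i][j] = Fraction(1, 3)

pi = [Fraction(1, N)] * N                  # uniform distribution

# Detailed balance: pi_i p_ij = pi_j p_ji for all i, j.
reversible = all(pi[i] * P[i][j] == pi[j] * P[j][i]
                 for i in range(N) for j in range(N))

# Stationarity: sum_i pi_i p_ij = pi_j for all j.
stationary = all(sum(pi[i] * P[i][j] for i in range(N)) == pi[j]
                 for j in range(N))
```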

• Example: S = {1, 2, 3}, p12 = p23 = p31 = 1, π1 = π2 = π3 = 1/3. Then {πi} stationary (check!), but chain is not reversible w.r.t. {πi}.

• Ehrenfest's Urn:

− New idea: perhaps each ball is equally likely to be in either Urn.

− That is, let πi = 2^{−d} C(d, i) = 2^{−d} d! / [i! (d−i)!].

− Then πi ≥ 0 and Σ_i πi = 1.

− Stationary? Need to check if Σ_{i∈S} πi pij = πj for all j ∈ S. Possible but messy. (Need the Pascal's Triangle identity that C(d−1, j−1) + C(d−1, j) = C(d, j).) Better way?

− Use reversibility!

− If j = i + 1, then

πi pij = 2^{−d} C(d, i) (d−i)/d = 2^{−d} [d! / (i!(d−i)!)] (d−i)/d = 2^{−d} (d−1)! / [i!(d−i−1)!].

Also

πj pji = 2^{−d} C(d, j) j/d = 2^{−d} [d! / (j!(d−j)!)] j/d = 2^{−d} (d−1)! / [(j−1)!(d−j)!] = 2^{−d} (d−1)! / [i!(d−i−1)!] = πi pij.

− If j = i − 1, then again πi pij = πj pji [check! or just interchange i and j].

− Otherwise both sides are 0.

− So, reversible!

− So, {πi} is a stationary distribution!

− Intuitively, πi is larger when i is close to d/2.

− But does µ^{(n)}_i → πi? We'll see!
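The same detailed-balance check can be run mechanically for a small d (here d = 6, chosen arbitrarily), with exact arithmetic:

```python
from fractions import Fraction
from math import comb

d = 6
S = range(d + 1)

def p(i, j):
    """Ehrenfest transition probabilities."""
    if j == i - 1:
        return Fraction(i, d)
    if j == i + 1:
        return Fraction(d - i, d)
    return Fraction(0)

# Candidate stationary distribution: pi_i = 2^{-d} C(d, i).
pi = [Fraction(comb(d, i), 2**d) for i in S]
assert sum(pi) == 1

reversible = all(pi[i] * p(i, j) == pi[j] * p(j, i) for i in S for j in S)
stationary = all(sum(pi[i] * p(i, j) for i in S) == pi[j] for j in S)
```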


Obstacles to Convergence:

• If chain has stationary distribution {πi}, does lim_{n→∞} P[Xn = i] = πi?

• Not necessarily!

• Example: S = {1, 2}, and ν1 = 1, and

( pij ) =
( 1  0 )
( 0  1 ).

− If π1 = π2 = 1/2 (say), then {πi} stationary (check!).

− But lim_{n→∞} P[Xn = 1] = lim_{n→∞} 1 = 1 ≠ 1/2 = π1.

− Not irreducible! ("reducible")

• Example: S = {1, 2}, and ν1 = 1, and

( pij ) =
( 0  1 )
( 1  0 ).

− Again, if π1 = π2 = 1/2, then {πi} stationary (check!).

− But P(Xn = 1) = 1 if n even, and 0 if n odd.

− So, lim_{n→∞} P[Xn = 1] does not even exist!

− "periodic"

• Defn: the period of a state i is the greatest common divisor of the set {n ≥ 1 : p^{(n)}_{ii} > 0}.

− e.g. if period of i is 2, this means that it is only possible to get from i to i in an even number of steps.

− If period of each state is 1, say chain is "aperiodic".

• Example: S = {1, 2, 3}, and p12 = p23 = p31 = 1.

− Then period of each state is 3.

• Example: S = {1, 2, 3}, and

( pij ) =
( 0    1/2  1/2 )
( 1/2  0    1/2 )
( 1/2  1/2  0   ).

− Then period of state 1 is gcd{2, 3, . . .} = 1.

− Aperiodic!

• Observation: if pii > 0, then period of i is 1 (since gcd{1, . . .} = 1).

− (Converse false, as in previous example.)

− Or, if there is some n ≥ 1 with p^{(n)}_{ii} > 0 and p^{(n+1)}_{ii} > 0, then period of i is 1 (since gcd{n, n+1, . . .} = 1).

• Frog Example: pii > 0, so chain aperiodic.

• Simple Random Walk: can only return after even number of steps, so period of each state is 2.

• Ehrenfest's Urn: again, can only return after even number of steps, so period of each state is 2.
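The period gcd{n ≥ 1 : p^{(n)}_{ii} > 0} can be computed from powers of P (up to a cutoff n_max, which is an approximation); a sketch, applied to Ehrenfest's Urn with d = 4 and to the frog chain:

```python
from math import gcd
from functools import reduce

def period(P, i, n_max=25):
    """gcd of {n in 1..n_max : p^(n)_{ii} > 0} (approximates the period)."""
    n_states = len(P)
    returns, dist = [], [row[:] for row in P]   # dist = P^n, starting at n = 1
    for n in range(1, n_max + 1):
        if n > 1:
            dist = [[sum(dist[a][k] * P[k][b] for k in range(n_states))
                     for b in range(n_states)] for a in range(n_states)]
        if dist[i][i] > 0:
            returns.append(n)
    return reduce(gcd, returns)

d = 4
ehrenfest = [[0.0] * (d + 1) for _ in range(d + 1)]
for i in range(d + 1):
    if i > 0: ehrenfest[i][i - 1] = i / d
    if i < d: ehrenfest[i][i + 1] = (d - i) / d

N = 20
frog = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in (i, (i + 1) % N, (i - 1) % N):
        frog[i][j] = 1/3

per_e = period(ehrenfest, 0)   # 2 (only even-length returns)
per_f = period(frog, 0)        # 1 (p_ii > 0, so aperiodic)
```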

• EQUAL PERIODS LEMMA: if i↔ j, then the periods of i and of j are equal.

• PROOF:


− Let the periods of i and j be t_i and t_j.

− Find r, s ∈ N with p^{(r)}_{ij} > 0 and p^{(s)}_{ji} > 0.

− Then p^{(r+s)}_{ii} ≥ p^{(r)}_{ij} p^{(s)}_{ji} > 0, so t_i divides r + s.

− Also if p^{(n)}_{jj} > 0, then p^{(r+n+s)}_{ii} ≥ p^{(r)}_{ij} p^{(n)}_{jj} p^{(s)}_{ji} > 0, so t_i divides r + n + s, hence t_i also divides n.

− So, t_i is a common divisor of {n ∈ N : p^{(n)}_{jj} > 0}.

− So, t_j ≥ t_i (since t_j is the greatest common divisor).

− Similarly, t_i ≥ t_j, so t_i = t_j. Q.E.D.

• COR: if chain irreducible, then all states have same period.

• COR: if chain irreducible and pii > 0 for some state i, then chain is aperiodic.

Markov Chain Convergence Theorem:

• MARKOV CHAIN CONVERGENCE THEOREM: If a Markov chain is irreducible, and aperiodic, and has a stationary distribution {πi}, then lim_{n→∞} p^{(n)}_{ij} = πj for all i, j ∈ S, and lim_{n→∞} P(Xn = j) = πj for any initial probabilities {νi}.

• To prove this (big) theorem, we need some lemmas.

• STATIONARY RECURRENCE LEMMA: If chain irreducible, and has stationary dist, then it is recurrent.

• PROOF:

− Suppose the chain is not recurrent.

− Then by Stronger Recurrence Theorem, for all i, j ∈ S, Σ_{n=1}^∞ p^{(n)}_{ij} < ∞.

− Hence, lim_{n→∞} p^{(n)}_{ij} = 0.

− But πj = Σ_{i∈S} πi p^{(n)}_{ij} for all n.

− Since Σ_{i∈S} sup_n |πi p^{(n)}_{ij}| ≤ Σ_{i∈S} πi = 1 < ∞, and each term πi p^{(n)}_{ij} → 0, it follows from the Dominated Convergence Theorem (or "Weierstrass M-test"?) that as n → ∞, Σ_{i∈S} πi p^{(n)}_{ij} → 0 as well.

− This implies that πj = 0 for all j ∈ S.

− But we must have Σ_{j∈S} πj = 1. Impossible!

− So, the chain must be recurrent. Q.E.D.

• NUMBER THEORY LEMMA: If a set A of positive integers is non-empty, and additive (i.e. m + n ∈ A whenever m ∈ A and n ∈ A), and aperiodic (i.e. gcd(A) = 1), then there is n0 ∈ N such that n ∈ A for all n ≥ n0.

− (Proof omitted; see e.g. Durrett p. 51 / 2nd ed. p. 24, or Rosenthal p. 92.)

• COR: If a state i is aperiodic, and fii > 0, then there is n0(i) such that p^{(n)}_{ii} > 0 for all n ≥ n0(i).

• PROOF: Let A = {n ≥ 1 : p^{(n)}_{ii} > 0}.


− Then A is non-empty since fii > 0.

− And, A is additive since p^{(m+n)}_{ii} ≥ p^{(m)}_{ii} p^{(n)}_{ii}.

− And, A is aperiodic by assumption.

− Hence, the result follows from the Number Theory Lemma. Q.E.D.

• COR: If a chain is irreducible and aperiodic, then for any states i, j ∈ S, there is n0(i, j) such that p^{(n)}_{ij} > 0 for all n ≥ n0(i, j).

• PROOF:

− Find n0(i) as above.

− Find m ∈ N such that p^{(m)}_{ij} > 0.

− Then let n0(i, j) = n0(i) + m.

− Then if n ≥ n0(i, j), then n − m ≥ n0(i), so p^{(n)}_{ij} ≥ p^{(n−m)}_{ii} p^{(m)}_{ij} > 0. Q.E.D.

• PROOF OF MARKOV CHAIN CONVERGENCE THEOREM (long!):

• Define a new Markov chain {(Xn, Yn)}_{n=0}^∞, with state space S̃ = S × S, and transition probabilities p_{(ij),(kℓ)} = pik pjℓ.

− Intuition: new chain has two coordinates, each of which is an independent copy of the original Markov chain. ("coupling")

• The new chain has stationary distribution π_{(ij)} = πi πj (because of independence).

• Furthermore, p^{(n)}_{(ij),(kℓ)} > 0 whenever n ≥ max[n0(i, k), n0(j, ℓ)].

− So, new chain is irreducible and aperiodic.

• So, by Stationary Recurrence Lemma, new chain is recurrent.

• Choose any i0 ∈ S, and set τ = inf{n ≥ 0 : Xn = Yn = i0}.

• By Stronger Recurrence Theorem, f_{(ij),(i0 i0)} = 1, i.e. P_{(ij)}(τ < ∞) = 1.

• Note also that if n ≥ m, then

P_{(ij)}(τ = m, Xn = k) = P_{(ij)}(τ = m) p^{(n−m)}_{i0,k} = P_{(ij)}(τ = m, Yn = k).

• Hence, for i, j, k ∈ S,

|p^{(n)}_{ik} − p^{(n)}_{jk}| = |P_{(ij)}(Xn = k) − P_{(ij)}(Yn = k)|
= |Σ_{m=1}^n P_{(ij)}(Xn = k, τ = m) + P_{(ij)}(Xn = k, τ > n) − Σ_{m=1}^n P_{(ij)}(Yn = k, τ = m) − P_{(ij)}(Yn = k, τ > n)|
= |P_{(ij)}(Xn = k, τ > n) − P_{(ij)}(Yn = k, τ > n)|
≤ 2 P_{(ij)}(τ > n),

which → 0 as n → ∞ since P_{(ij)}(τ < ∞) = 1.

− (Above factor of "2" not really necessary, since both terms non-negative.)

————————— END OF WEEK #3 ——————————


• Hence, it follows that

|p^{(n)}_{ij} − πj| = |Σ_{k∈S} πk (p^{(n)}_{ij} − p^{(n)}_{kj})| ≤ Σ_{k∈S} πk |p^{(n)}_{ij} − p^{(n)}_{kj}|,

which → 0 as n → ∞ since |p^{(n)}_{ij} − p^{(n)}_{kj}| → 0 for all k ∈ S (using the M-test).

• Finally, for any {νi} (again using the M-test),

lim_{n→∞} P(Xn = j) = lim_{n→∞} Σ_{i∈S} P(X_0 = i, Xn = j) = lim_{n→∞} Σ_{i∈S} νi p^{(n)}_{ij} = Σ_{i∈S} νi lim_{n→∞} p^{(n)}_{ij} = Σ_{i∈S} νi πj = πj.

Q.E.D. (phew!)

• So, for Frog Example, lim_{n→∞} P(Xn = 14) = 1/20, regardless of {νi}.

• COR: If chain irreducible and aperiodic, then it has at most one stationary distribution.

− PROOF: If it has at least one, then by the above, each one must be equal to lim_{n→∞} P(Xn = j), so they're all equal. Q.E.D.

• Example: S = {1, 2, 3}, and

( pij ) =
( 2/3  1/3  0 )
( 1/3  2/3  0 )
( 0    0    1 ).

− Stationary dist #1: π1 = π2 = 1/2 and π3 = 0.

− Stationary dist #2: π1 = π2 = 0 and π3 = 1.

− Stationary dist #3: π1 = π2 = 1/8 and π3 = 3/4.

− So, here, the stationary distribution is not unique!

− But chain is not irreducible.

Periodic Convergence:

• What about periodic chains? (e.g. s.r.w., Ehrenfest)

• CYCLIC DECOMPOSITION LEMMA: Suppose chain irreducible, with period b ≥ 2. Then there is a "cyclic" disjoint partition S = S0 ∪ S1 ∪ . . . ∪ S_{b−1} such that if i ∈ S_r for some 0 ≤ r ≤ b − 2, then Σ_{j∈S_{r+1}} pij = 1, while if i ∈ S_{b−1}, then Σ_{j∈S0} pij = 1. [picture] Furthermore, the chain P^{(b)}, when restricted to S0, is irreducible and aperiodic.

• PROOF (outline): Fix i0 ∈ S. Then for r = 0, 1, 2, . . . , b − 1, let S_r = {j ∈ S : p^{(bm+r)}_{i0 j} > 0 for some m ∈ N}. Then the stated properties are easily verified [CHECK!]. Q.E.D.

• PERIODIC CONVERGENCE THM: Suppose chain irreducible, with period b ≥ 2, and stat dist {πi}. Then ∀ i, j ∈ S, lim_{n→∞} (1/b)[p^{(n)}_{ij} + . . . + p^{(n+b−1)}_{ij}] = πj, i.e.

lim_{n→∞} (1/b) P[Xn = j or X_{n+1} = j or . . . or X_{n+b−1} = j] = πj.


• PROOF (outline): Let S0, S1, . . . , S_{b−1} be as in Cyclic Decomposition Lemma.

− Then must have π(S0) = π(S1) = . . . = π(S_{b−1}) = 1/b [check!].

− Let P̃ be the Markov chain corresponding to P^{(b)} restricted to S0.

− Then let π̃i = b πi for all i ∈ S0.

− Then π̃ is stationary for P̃ [check!].

− By usual Markov Chain Convergence Theorem, lim_{n→∞} p̃^{(n)}_{ij} = π̃j for all i, j ∈ S0, i.e. lim_{n→∞} p^{(bn)}_{ij} = b πj. (And similarly with S0 replaced by any S_r, by relabeling.)

− Then for i ∈ S0 and j ∈ S_r, we have lim_{n→∞} p^{(bn+r)}_{ij} = lim_{n→∞} Σ_{k∈S} p^{(r)}_{ik} p^{(bn)}_{kj} = Σ_{k∈S_r} p^{(r)}_{ik} lim_{n→∞} p^{(bn)}_{kj} = Σ_{k∈S_r} p^{(r)}_{ik} [b πj] = b πj, again using the M-test (and that p^{(r)}_{ik} = 0 unless k ∈ S_r).

− Hence, for i ∈ S0, lim_{n→∞} (1/b)[p^{(bn)}_{ij} + p^{(bn+1)}_{ij} + . . . + p^{(bn+b−1)}_{ij}] = (1/b)[b πj + 0] = πj for any j ∈ S.

− By relabeling, for any i, j ∈ S, lim_{n→∞} (1/b)[p^{(bn)}_{ij} + p^{(bn+1)}_{ij} + . . . + p^{(bn+b−1)}_{ij}] = πj.

− Follows (by reindexing) that lim_{n→∞} (1/b)[p^{(n)}_{ij} + . . . + p^{(n+b−1)}_{ij}] = πj. Q.E.D.

• e.g. for Ehrenfest’s Urn, if i0 = 0, then S0 = {even i ∈ S}, and S1 = {odd i ∈ S}, and

limn→∞12[p

(n)ij + p

(n+1)ij ] = πj = 2−d

(dj

).
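A numeric sketch of this averaged convergence for Ehrenfest's Urn with d = 4 (chosen small so the matrices stay tiny):

```python
from math import comb

d = 4
P = [[0.0] * (d + 1) for _ in range(d + 1)]
for i in range(d + 1):
    if i > 0: P[i][i - 1] = i / d
    if i < d: P[i][i + 1] = (d - i) / d

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pn = P
for _ in range(199):               # Pn = P^200
    Pn = matmul(Pn, P)
Pn1 = matmul(Pn, P)                # P^201

# Target: pi_j = 2^{-d} C(d, j); the chain is periodic, so P^n itself does
# not converge, but the two-step averages do.
pi = [comb(d, j) / 2**d for j in range(d + 1)]
gap = max(abs((Pn[i][j] + Pn1[i][j]) / 2 - pi[j])
          for i in range(d + 1) for j in range(d + 1))
```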

• PERIODIC CONVERGENCE COR: If a Markov chain is irreducible with stationary distribution {πi} (whether periodic or not), then for all i, j ∈ S, lim_{n→∞} (1/n)[p^{(1)}_{ij} + p^{(2)}_{ij} + . . . + p^{(n)}_{ij}] = πj, i.e. lim_{n→∞} (1/n) Σ_{ℓ=1}^n p^{(ℓ)}_{ij} = πj.

• PROOF: For aperiodic chains, this follows from the usual Markov Chain Convergence Theorem, together with "Cesàro sums". For chains with period b ≥ 2, this follows from the Periodic Convergence Theorem, again by "Cesàro sums". Q.E.D.

• UNIQUE STATIONARY COROLLARY: If Markov chain P is irreducible (not necessarily aperiodic), then it has at most one stationary distribution (just like before).

• PROOF: If it has at least one, then by the above, each one must be equal to lim_{n→∞} (1/n)[p^{(1)}_{ij} + p^{(2)}_{ij} + . . . + p^{(n)}_{ij}], so they're all equal. Q.E.D.

• Example: S = {1, 2, 3}, p11 = 1, p22 = p23 = p32 = p33 = 1/2. For any a ∈ [0, 1],

let π1 = a, and π2 = π3 = (1 − a)/2. Stationary! So, infinitely many stationary

distributions! e.g. π1 = 1 and π2 = π3 = 0, or π1 = 0 and π2 = π3 = 1/2. Not unique!

But not irreducible.

• What about simple random walk? Does it have a stationary dist?

− (Would be reversible with respect to “uniform”, but what is that??)

− Know that p^(n)_{ii} ≈ [4p(1−p)]^{n/2} √(2/πn), so p^(n)_{ii} ≤ √(2/πn) → 0.
− Then for any i, j ∈ S, find m ∈ N with p^(m)_{ji} > 0; then p^(n+m)_{ii} ≥ p^(n)_{ij} p^(m)_{ji}, so we must have p^(n)_{ij} ≤ p^(n+m)_{ii} / p^(m)_{ji} → 0 as well.

− Then, if had stat dist {π_i}, then ∀j ∈ S, π_j = Σ_{i∈S} π_i p^(n)_{ij} → 0 (using M-test).


− [Or, alternatively, would have (1/2)[p^(n)_{ij} + p^(n+1)_{ij}] → π_j and also (1/2)[p^(n)_{ij} + p^(n+1)_{ij}] → 0.]

− So, would have π_j = 0 for all j, so Σ_j π_j = 0. Impossible! No stationary distribution!

− [Aside: here Σ_j p^(n)_{ij} = 1 for all n, even though Σ_j lim_{n→∞} p^(n)_{ij} = 0. So, M-test conditions are not satisfied for that limit.]

• If S is infinite, can there ever be a stationary distribution? Yes!

• Example: S = N = {1, 2, 3, . . .}, and for i ≥ 2, pi,i = pi,i+1 = 1/4 and pi,i−1 = 1/2, and

p1,1 = 3/4 and p1,2 = 1/4.

− Let π_i = 2^{−i}, so π_i ≥ 0 and Σ_i π_i = 1.
− Then for any i ∈ S, π_i p_{i,i+1} = 2^{−i}(1/4) = 2^{−i−2}.
− Also, π_{i+1} p_{i+1,i} = 2^{−(i+1)}(1/2) = 2^{−i−2}. Equal!
− And π_i p_{ij} = 0 if |j − i| ≥ 2.
− So reversible! So, {π_i} is stationary dist.
− Also irreducible and aperiodic (easy).
− So, lim_{n→∞} P(X_n = j) = π_j = 2^{−j} for all j ∈ S.

Application – Metropolis Algorithm (Markov Chain Monte Carlo) (MCMC):

• Let S = Z, and let {πi} be any prob dist on S. Assume πi > 0 for all i.

• Can we create Markov chain transitions {p_ij} so that lim_{n→∞} p^(n)_{ij} = π_j?
• Yes! Let p_{i,i+1} = (1/2) min[1, π_{i+1}/π_i], p_{i,i−1} = (1/2) min[1, π_{i−1}/π_i], and p_{i,i} = 1 − p_{i,i+1} − p_{i,i−1}, with p_{ij} = 0 otherwise.

• Equivalent algorithmic version: Given X_{n−1}, let Y_n equal X_{n−1} ± 1 (prob 1/2 each), and U_n ∼ Uniform[0,1] (indep.), and
X_n = Y_n if U_n < π_{Y_n} / π_{X_{n−1}} ("accept"), otherwise X_n = X_{n−1} ("reject").
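This accept/reject recipe can be sketched in a few lines of Python; the function name and the example target distribution are ours, not from the notes, and this is a minimal illustration rather than the course's R code:

```python
import random

def metropolis_walk(pi, x0, steps, seed=0):
    """Simulate the +/-1 Metropolis chain targeting a pmf pi on Z.

    pi is any function i -> pi_i > 0 (normalization is not needed)."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        y = x + rng.choice((-1, 1))        # propose Y_n = X_{n-1} +/- 1, prob 1/2 each
        if rng.random() < pi(y) / pi(x):   # accept with prob min(1, pi_y / pi_x)
            x = y                          # "accept"
        path.append(x)                     # on "reject", stay at X_{n-1}
    return path

# Illustrative target: pi_i proportional to 2^{-|i|} on Z, so pi_0 = 1/3.
path = metropolis_walk(lambda i: 2.0 ** (-abs(i)), 0, 100_000)
freq0 = path.count(0) / len(path)          # long-run fraction of time at 0, near 1/3
```

Note that only the ratio π_{Yn}/π_{Xn−1} is ever used, which is why the method works for unnormalized distributions (e.g. Bayesian posteriors).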

• Then π_i p_{i,i+1} = π_i (1/2) min[1, π_{i+1}/π_i] = (1/2) min[π_i, π_{i+1}].
• Also π_{i+1} p_{i+1,i} = π_{i+1} (1/2) min[1, π_i/π_{i+1}] = (1/2) min[π_{i+1}, π_i].
• So π_i p_{ij} = π_j p_{ji} if j = i + 1, hence for all i, j ∈ S.

• So, chain is reversible w.r.t. {πi}, so {πi} stationary.

• Also irreducible and aperiodic (easy).

• So, lim_{n→∞} p^(n)_{ij} = π_j, i.e. lim_{n→∞} P[X_n = j] = π_j. Q.E.D.

• Widely used to sample from complicated distributions {π_i}, and thus estimate their probability / expected values / etc.

− [ ** Animated version available at: www.probability.ca/met ** ]

• Also works on continuous state spaces, with π a density function (e.g. the Bayesian

posterior density).


− “markov chain monte carlo” gives 2,130,000 hits in Google!

Application – Random Walks on Graphs:

• Let V be a non-empty finite or countable set.

• Let w : V × V → [0,∞) be a symmetric weight function (i.e. w(u, v) = w(v, u)).

− Usual (unweighted) case: w(u, v) = 1 if there is an edge between u and v, otherwise

w(u, v) = 0. (diagram)

− Or can have other weights, multiple edges, self-loops (w(u, u) > 0), etc.

• Let d(u) = Σ_{v∈V} w(u, v). ("degree" of vertex u)

• Define a Markov chain on S = V by p_{uv} = w(u,v)/d(u).
− Check: Σ_{v∈V} p_{uv} = Σ_{v∈V} w(u,v) / d(u) = d(u)/d(u) = 1.

− “(simple) random walk on the weighted undirected graph (V,w)”

• Usual case: each w(u, v) = 0 or 1, so from u, the chain moves to one of the d(u) vertices

connected to u with equal prob.

————————— END OF WEEK #4 ——————————

• Other examples: Irreducible? Aperiodic? Stationary distribution?

• Example: V = {1, 2, 3, 4, 5}, with w(i, i + 1) = w(i + 1, i) = 1 for i = 1, 2, 3, 4, and

w(5, 1) = w(1, 5) = 1, with w(i, j) = 0 otherwise. (“ring”) (diagram) Irreducible?

Aperiodic? Stationary distribution?

• Example: V = {0, 1, 2, . . . , K}, with w(i, 0) = w(0, i) = 1 for i = 1, 2, . . . , K, with w(i, j) = 0 otherwise. ("star") (diagram)

• Example: V = {1, 2, . . . , K}, with w(i, i + 1) = w(i + 1, i) = 1 for 1 ≤ i ≤ K − 1, with

w(i, j) = 0 otherwise. (“stick”) (diagram)

• Example: V = Z, with w(i, i + 1) = w(i + 1, i) = 1 for all i ∈ V , and w(i, j) = 0

otherwise.

− Random walk on this graph corresponds to simple random walk with p = 1/2.

• Example: V = {1, 2, . . . , 20}, with w(i, i) = 1 for 1 ≤ i ≤ 20, and w(i, i+1) = w(i+1, i) =

1 for 1 ≤ i ≤ 19, and w(20, 1) = w(1, 20) = 1, and w(i, j) = 0 otherwise.

− Random walk on this graph corresponds to the Frog Example!

• Let Z = Σ_{u∈V} d(u) = Σ_{u,v∈V} w(u, v).

− In unweighted case, Z = 2×(number of edges).

− Assume that Z is finite (it might not be, if V is infinite).

− And, assume that d(u) > 0 for all u ∈ V (so any isolated point has a self-loop), to make p_{uv} = w(u,v)/d(u) well-defined.

• Let π_u = d(u)/Z, so π_u ≥ 0 and Σ_u π_u = 1.


− Then π_u p_{uv} = [d(u)/Z][w(u,v)/d(u)] = w(u,v)/Z.
− And, π_v p_{vu} = [d(v)/Z][w(v,u)/d(v)] = w(v,u)/Z = w(u,v)/Z. Same!
− So, chain is reversible w.r.t. {π_u}.
− So, {π_u} is stationary dist.

• If graph is connected, then chain is irreducible.

• If graph is bipartite (i.e., can be divided into two subsets s.t. all links go from one to the

other), then the chain has period 2.

− Otherwise, the chain is aperiodic (since can return to u in 2 steps).

− (i.e., 1 and 2 are the only possible periods)

• This proves: GRAPH CONVERGENCE THM: for random walk on a connected non-bipartite graph, if Z < ∞, then lim_{n→∞} p^(n)_{uv} = π_v = d(v)/Z for all u, v ∈ V.
− i.e., lim_{n→∞} P[X_n = v] = d(v)/Z.
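As a small sketch (ours, not one of the course's posted files), the stationary distribution π_u = d(u)/Z can be computed directly for the Frog Example graph, i.e. 20 pads in a ring where every pad also has a self-loop:

```python
# Weighted-graph random walk: pi_u = d(u)/Z for the Frog Example graph.
V = list(range(1, 21))

def w(u, v):
    # weight 1 for a self-loop (jump straight up) or ring neighbours
    # (|u - v| = 19 handles the wrap-around edge between pads 1 and 20)
    return 1 if u == v or abs(u - v) in (1, 19) else 0

d = {u: sum(w(u, v) for v in V) for u in V}   # each degree d(u) = 3
Z = sum(d.values())                           # Z = 60
pi = {u: d[u] / Z for u in V}                 # uniform: pi_u = 1/20 = 0.05
```

Since every vertex has the same degree, π is uniform, which answers the Introduction's question about lim P(frog at pad #14 after k steps).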

• What about bipartite graphs? Use Periodic Convergence Thm!

− PERIODIC GRAPH CONVERGENCE THM: for random walk on any connected graph with Z < ∞ (whether bipartite or not), lim_{n→∞} (1/2)[p^(n)_{uv} + p^(n+1)_{uv}] = d(v)/Z.

• Example: V = {1, 2, . . . , K}, with w(i, i + 1) = w(i + 1, i) = 1 for 1 ≤ i ≤ K − 1, with

w(i, j) = 0 otherwise. (“stick”)

− Connected, but bipartite.

− p_{12} = 1, and p_{K,K−1} = 1, and p_{i,i+1} = p_{i,i−1} = 1/2 for 2 ≤ i ≤ K − 1.
− π_i = 1/(2K − 2) for i = 1, K, and π_i = 2/(2K − 2) for 2 ≤ i ≤ K − 1.
− Then, know that lim_{n→∞} (1/2)[p^(n)_{ij} + p^(n+1)_{ij}] = π_j for all j ∈ V.

• What about star? Or, star with an extra edge from 0 to 0 (a self-loop)?

Application – Gambler’s Ruin:

• Let 0 < a < c be integers, and let 0 < p < 1.

• Suppose player A starts with a dollars, and player B starts with c− a dollars.

• At each bet, A wins $1 with prob p, or loses $1 with prob 1 − p.
• Let Xn be the amount of money A has at time n.

− So, X0 = a.

• Let Ti = inf{n ≥ 0 : Xn = i} be the first time A has i dollars.

• QUESTION: what is Pa(Tc < T0), i.e. the prob that A reaches c dollars before reaching

0 (i.e., before losing all their money)?

− (see e.g. www.probability.ca/randwalk)

• Example: What does it equal if c = 10, 000, a = 9, 700, and p = 0.49?

• Example: Is it higher if c = 8, a = 6, p = 1/3 (“born rich”), or if c = 8, a = 2, p = 2/3

(“born lucky”)?


• Here {Xn} is a Markov chain (good), but there’s no limit to how long the game might

take (bad).

− So, how to solve it??

• Key: write Pa(Tc < T0) as s(a), and consider it to be a function of a.

− Can we relate the different unknown s(a) to each other?

• Clearly s(0) = 0, and s(c) = 1.

• Furthermore, on the first bet, A either wins or loses $1.

− So, for 1 ≤ a ≤ c− 1,

s(a) = Pa(Tc < T0)

= Pa(Tc < T0, X1 = X0 + 1) + Pa(Tc < T0, X1 = X0 − 1)

= P(X1 = X0 + 1) Pa(Tc < T0 |X1 = X0 + 1)

+ P(X1 = X0 − 1) Pa(Tc < T0 |X1 = X0 − 1)

= p s(a+ 1) + (1− p) s(a− 1) .

• This gives c− 1 equations for the c− 1 unknowns.

− Can solve using simple algebra!

• Re-arranging, p s(a) + (1 − p) s(a) = p s(a + 1) + (1 − p) s(a − 1).
− Hence, s(a + 1) − s(a) = [(1−p)/p] [s(a) − s(a − 1)].
− Let x = s(1) (unknown).
− Then s(1) − s(0) = x, and s(2) − s(1) = [(1−p)/p][s(1) − s(0)] = [(1−p)/p] x.
− Then s(3) − s(2) = [(1−p)/p][s(2) − s(1)] = ((1−p)/p)^2 x.
− In general, for 1 ≤ a ≤ c − 1, s(a + 1) − s(a) = ((1−p)/p)^a x.

− Hence, for 1 ≤ a ≤ c − 1,
s(a) = s(a) − s(0) = (s(a) − s(a−1)) + (s(a−1) − s(a−2)) + . . . + (s(1) − s(0))
= [((1−p)/p)^{a−1} + ((1−p)/p)^{a−2} + . . . + ((1−p)/p) + 1] x
= { [((1−p)/p)^a − 1] / [((1−p)/p) − 1] } x for p ≠ 1/2, and = a x for p = 1/2.

− But s(c) = 1, so we can solve for x:
x = [((1−p)/p) − 1] / [((1−p)/p)^c − 1] for p ≠ 1/2, and x = 1/c for p = 1/2.


• We then obtain our final Gambler's Ruin formula:
s(a) = [((1−p)/p)^a − 1] / [((1−p)/p)^c − 1] for p ≠ 1/2, and s(a) = a/c for p = 1/2.

• Example: If c = 10,000, a = 9,700, p = 0.49, then
s(a) = [(0.51/0.49)^{9,700} − 1] / [(0.51/0.49)^{10,000} − 1] ≈ 0.000006134 ≈ 1/163,000.

• Example: If c = 8, a = 6, p = 1/3 ("born rich"),
s(a) = [((2/3)/(1/3))^6 − 1] / [((2/3)/(1/3))^8 − 1] = 63/255 ≈ 0.247.

• Example: If c = 8, a = 2, p = 2/3 ("born lucky"),
s(a) = [((1/3)/(2/3))^2 − 1] / [((1/3)/(2/3))^8 − 1] = (3/4) / (255/256) ≈ 0.753.

• So, it is better to be born lucky than rich!

• Check: is s(a) continuous as a function of p, as p→ 1/2?
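The Gambler's Ruin formula is easy to evaluate directly; here is a minimal Python sketch (the function name is ours) that reproduces the three examples above:

```python
def ruin_success(a, c, p):
    """s(a) = P(reach c dollars before 0) from the Gambler's Ruin formula."""
    if p == 0.5:
        return a / c
    r = (1 - p) / p
    return (r ** a - 1) / (r ** c - 1)

s1 = ruin_success(9700, 10000, 0.49)   # about 0.000006134
s2 = ruin_success(6, 8, 1/3)           # "born rich": 63/255, about 0.247
s3 = ruin_success(2, 8, 2/3)           # "born lucky": about 0.753
```

Evaluating `ruin_success(a, c, p)` for p slightly above and below 1/2 also answers the continuity check: as p → 1/2 the ratio formula tends to a/c.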

Eventual Ruin:

• What happens to Gambler’s Ruin as c→∞?

• What is P(T0 <∞), the probability of eventual ruin?

• By continuity of probabilities, P(T0 < ∞) = lim_{c→∞} P_a(T0 < Tc).
− So, let r(a) = 1 − s(a) = P_a(T0 < Tc) = P(ruin).
− Then r(a) is like s(a), but from the other player's perspective.
− Hence, need to replace a by c − a, and replace p by 1 − p.
− Follows that r(a) = [(p/(1−p))^{c−a} − 1] / [(p/(1−p))^c − 1] for p ≠ 1/2, and r(a) = (c − a)/c for p = 1/2.

− (Check: r(a) + s(a) = 1.)

• Then compute that
P(T0 < ∞) = lim_{c→∞} P_a(T0 < Tc) = (0 − 1)/(0 − 1) = 1 for p < 1/2; = 1 for p = 1/2; = (p/(1−p))^{−a} for p > 1/2.


• e.g. if p = 2/3 and a = 2, then P(T0 < ∞) = ((2/3)/(1 − 2/3))^{−2} = 2^{−2} = 1/4.
− Hence, P(T0 = ∞) = 3/4, i.e. have probability 3/4 of never going broke.
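The c → ∞ limit can be watched numerically; a short sketch (our function name) using the r(a) formula for p ≠ 1/2:

```python
def ruin_prob(a, c, p):
    """r(a) = P_a(T_0 < T_c), the ruin formula (p != 1/2 branch)."""
    r = p / (1 - p)
    return (r ** (c - a) - 1) / (r ** c - 1)

# With p = 2/3 and a = 2, r(a) should approach (p/(1-p))^{-a} = 1/4 as c grows:
vals = [ruin_prob(2, c, 2/3) for c in (10, 20, 60)]
```

For c = 10 this gives 255/1023 ≈ 0.2493, already close to the limiting eventual-ruin probability 1/4.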

Expected Return Times:

• DEFN: the expected return time of a state i is m_i = E_i(inf{n ≥ 1 : X_n = i}).
• DEFN: a state is positive recurrent if m_i < ∞.
− (It is null recurrent if it is recurrent but m_i = ∞.)

− Connection to stationary distributions??

• Suppose a chain starts at i, and mi <∞.

− Over the long run, what fraction of time will it spend at i?

− Well, on average, the chain returns to i once every mi steps.

− So, by the SLLN, lim_{n→∞} (1/n) Σ_{k=1}^n 1_{Xk=i} = 1/m_i.
− Hence, by Bounded Convergence Theorem, lim_{n→∞} E_i[(1/n) Σ_{k=1}^n 1_{Xk=i}] = 1/m_i.
− But if the chain converges to π, then lim_{n→∞} E_i[(1/n) Σ_{k=1}^n 1_{Xk=i}] = lim_{n→∞} (1/n) Σ_{k=1}^n E_i(1_{Xk=i}) = lim_{n→∞} (1/n) Σ_{k=1}^n P_i(X_k = i) = lim_{n→∞} P_i(X_n = i) = lim_{n→∞} p^(n)_{ii} = π_i.

− (If the chain is periodic, then this still holds, by the Periodic Convergence Theorem.)

− So, we must have 1/mi = πi, i.e. mi = 1/πi!

− This proves the RETURN TIME THEOREM: If an irreducible Markov chain on a

discrete state space S has a stationary distribution π, then for any state i ∈ S, the

mean return time satisfies mi ≡ Ei(Ti) = 1/πi.

− (For more details, see e.g. Rosenthal, Theorem 8.4.9.)

• By contrast, if mj =∞ for all j, then there is no stationary distribution.

− So, for s.r.w., must have mj =∞ for all j.

− However, if chain is irreducible and S is finite, then mj < ∞ for all j, i.e. there is a

stationary distribution.
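The Return Time Theorem can be checked by simulation on the earlier infinite-state example (p_{1,1} = 3/4, p_{1,2} = 1/4; for i ≥ 2, p_{i,i−1} = 1/2, p_{i,i} = 1/4, p_{i,i+1} = 1/4), where π_1 = 1/2, so m_1 should be 2. This Monte Carlo sketch is ours, not from the notes:

```python
import random

def step(i, rng):
    """One transition of the chain on S = {1, 2, 3, ...} described above."""
    u = rng.random()
    if i == 1:
        return 1 if u < 0.75 else 2       # p_{1,1} = 3/4, p_{1,2} = 1/4
    if u < 0.5:
        return i - 1                      # p_{i,i-1} = 1/2
    if u < 0.75:
        return i                          # p_{i,i}   = 1/4
    return i + 1                          # p_{i,i+1} = 1/4

rng = random.Random(42)
trials, total = 20_000, 0
for _ in range(trials):
    x, n = step(1, rng), 1                # start at 1, count steps until return
    while x != 1:
        x, n = step(x, rng), n + 1
    total += n
m1_est = total / trials                   # should be near m_1 = 1/pi_1 = 2
```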

Martingales:

• MOTIVATION: Gambler’s ruin with p = 1/2. (So know that s(a) = a/c.)

− Let T = inf{n ≥ 0 : Xn = 0 or c} = time when game ends.

− Then E(XT ) = cP(XT = c) + 0P(XT = 0) = c s(a) + 0 (1− s(a)) = c (a/c) + 0 (1−a/c) = a.

− So E(XT ) = E(X0), i.e. “on average it stays the same”.

————————— END OF WEEK #5 ——————————


− Makes sense since E(Xn+1 |Xn = i) = (1/2)(i+ 1) + (1/2)(i− 1) = i.

− Reverse logic: If we knew that E(XT ) = E(X0) = a, then could compute that a =

c s(a) + 0 (1− s(a)), so must have s(a) = a/c. (Easier solution!)

• DEFN: A sequence {Xn}∞n=0 of random variables is a martingale if E|Xn| < ∞ for each

n, and also E(Xn+1|X0, ..., Xn) = Xn (i.e., it stays same on average).

• SPECIAL CASE: If {X_n} is a Markov chain (with E|X_n| < ∞), then E[X_{n+1} | X_0, ..., X_n] = Σ_j j P[X_{n+1} = j | X_0, ..., X_n] = Σ_j j p_{X_n, j}, so martingale if Σ_j j p_{ij} = i for all i.

• EXAMPLE: Let {Xn} be simple random walk with p = 1/2 (i.e., “simple symmetric

random walk”, or s.s.r.w.).

− Martingale, since Σ_j j p_{ij} = (i + 1)(1/2) + (i − 1)(1/2) = i.

• (Optional aside: in defn of martingale, suffices to check that E|Xn| < ∞ for all n,

and also E(Xn+1|Fn) = Xn, where {Fn} is any nested filtration for {Xn}, i.e. Fn is a

sub-σ-algebra with σ(X0, X1, . . . , Xn) ⊆ Fn ⊆ Fn+1.)

• If {X_n} martingale, then it follows from the "double-expectation formula" that E(X_{n+1}) = E[E(X_{n+1} | X_0, X_1, . . . , X_n)] = E(X_n), i.e. that E(X_n) = E(X_0) for all n.

• But what about E(XT ) for a random time T?

Stopping Times:

• DEFN: A non-negative-integer-valued random variable T is a stopping time for {Xn} if

the event {T = n} is determined by X0, X1, . . . , Xn.

− i.e., can’t look into future before deciding to stop.

− e.g. T = inf{n ≥ 0 : Xn = 5} is a valid stopping time. (=∞ if never hit 5)

− e.g. T = inf{n ≥ 0 : Xn = 0 or Xn = c} is a valid stopping time.

− e.g. T = inf{n ≥ 2 : Xn−2 = 5} is a valid stopping time.

− e.g. T = inf{n ≥ 2 : Xn−1 = 5, Xn = 6} is a valid stopping time.

− e.g. T = inf{n ≥ 0 : Xn+1 = 5} is not a valid stopping time (since it looks into the

future).

• Do we always have E(XT ) = E(X0), if T is a stopping time?

− At least if P (T <∞) = 1?

• Not necessarily!

− e.g. let {Xn} be s.s.r.w. with X0 = 0. Martingale!

− Let T = T−5 = inf{n ≥ 0 : Xn = −5}. Stopping time!

− And, P(T <∞) = 1 since s.s.r.w. is recurrent.

− But X_T = −5, so E(X_T) = −5 ≠ 0 = E(X_0).

− What went wrong? Need some boundedness conditions!


• OPTIONAL STOPPING LEMMA: If {Xn} martingale, with stopping time T which is

bounded (i.e., ∃M <∞ with P(T ≤M) = 1), then E(XT ) = E(X0).

• PROOF: Using the double-expectation formula, and then the fact that "1 − 1_{T≤k−1}" is completely determined by X_0, X_1, . . . , X_{k−1} (and thus can be treated as a constant in the conditional expectation; this fact is optional), we have:
E(X_T) − E(X_0) = E(X_T − X_0) = E[ Σ_{k=1}^T (X_k − X_{k−1}) ]
= E[ Σ_{k=1}^M (X_k − X_{k−1}) 1_{k≤T} ] = Σ_{k=1}^M E[(X_k − X_{k−1}) 1_{k≤T}]
= Σ_{k=1}^M E[(X_k − X_{k−1})(1 − 1_{T≤k−1})]
= Σ_{k=1}^M E( E[(X_k − X_{k−1})(1 − 1_{T≤k−1}) | X_0, X_1, . . . , X_{k−1}] )
= Σ_{k=1}^M E( E[(X_k − X_{k−1}) | X_0, X_1, . . . , X_{k−1}] (1 − 1_{T≤k−1}) )
= Σ_{k=1}^M E( (0)(1 − 1_{T≤k−1}) ) = 0. Q.E.D.

− Question: How does this proof break down if M =∞?

• Example: s.s.r.w., with X_0 = 0, and let T = min(10^12, inf{n ≥ 0 : X_n = −5}).
− Then T ≤ 10^12, so T bounded, so E(X_T) = E(X_0) = E(0) = 0.

− But nearly always have XT = −5. Contradiction??

− No, since by the Law of Total Expectation, 0 = E(XT ) = P(XT = −5)E(XT |XT =

−5)+P(XT 6= 5)E(XT |XT 6= 5), and E(XT |XT = −5) = −5, and P(XT = −5) ≈ 1,

and P(XT 6= 5) ≈ 0, but the equation still holds since E(XT |XT 6= 5) is huge.

• Can we apply this to the Gambler’s Ruin problem?

− No, since there T is not bounded!

− Need something more general!

• OPTIONAL STOPPING THM: If {X_n} is martingale with stopping time T, and P(T < ∞) = 1, and E|X_T| < ∞, and lim_{n→∞} E(X_n 1_{T>n}) = 0, then E(X_T) = E(X_0).

• PROOF:

− For each m ∈ N, let Sm = min(T,m). Stopping time! Bounded!

− Then by Optional Stopping Lemma, E(X_{S_m}) = E(X_0) (for any m).
− But X_{S_m} = X_{min(T,m)} = X_T 1_{T≤m} + X_m 1_{T>m} = X_T − X_T 1_{T>m} + X_m 1_{T>m}.
− So, X_T = X_{S_m} + X_T 1_{T>m} − X_m 1_{T>m}.


− So, E(X_T) = E(X_{S_m}) + E(X_T 1_{T>m}) − E(X_m 1_{T>m}). (three terms to consider)

− First term = E(X0) from above.

− Second term→ 0 as m→∞ by Dominated Convergence Thm, since E|XT | <∞ and

1T>m → 0 (since P(T <∞) = 1).

− Third term → 0 as m→∞ by assumption.

− So, E(XT )→ E(X0), i.e. E(XT ) = E(X0). Q.E.D.

• OPTIONAL STOPPING COROLLARY: If {X_n} is martingale with stopping time T, which is "bounded up to time T" (i.e., ∃M < ∞ with P(|X_n| 1_{n≤T} ≤ M) = 1 for all n), and P(T < ∞) = 1, then E(X_T) = E(X_0).

• PROOF:

− It follows that P(|X_T| ≤ M) = 1. [Formally, this holds since P(|X_T| > M) = Σ_n P(T = n, |X_T| > M) = Σ_n P(T = n, |X_n| 1_{n≤T} > M) ≤ Σ_n P(|X_n| 1_{n≤T} > M) = Σ_n (0) = 0.]

− Hence, E|X_T| ≤ M < ∞.
− Also, |E(X_n 1_{T>n})| ≤ E(|X_n| 1_{T>n}) = E(|X_n| 1_{n≤T} 1_{T>n}) ≤ E(M 1_{T>n}) = M P(T > n), which → 0 as n → ∞ since P(T < ∞) = 1.

− Hence, result follows from Optional Stopping Theorem. Q.E.D.

• Example: Gambler’s Ruin with p = 1/2, and T = inf{n ≥ 0 : Xn = 0 or Xn = c}.− Then P(T < ∞) = 1 (game must eventually end). [Formally: P(T > mc) ≤ (1 −pc)m → 0 as m→∞, since if win c times in a row then game over.]

− Also, |X_n| 1_{n≤T} ≤ c < ∞ for all n.

− So, by Optional Stopping Corollary, E(XT ) = E(X0) = a.

− Hence, as before, a = c s(a) + 0 (1− s(a)), so must have s(a) = a/c. (Easier solution!)

• What about Gambler’s Ruin with p 6= 1/2?

− Here {X_n} is not a martingale: Σ_j j p_{ij} = p(i + 1) + (1 − p)(i − 1) = i + 2p − 1 ≠ i.

− Trick: let Y_n = ((1−p)/p)^{X_n}. Then {Y_n} is Markov chain too.
− Then E(Y_{n+1} | Y_0, Y_1, . . . , Y_n) = p[Y_n ((1−p)/p)] + (1 − p)[Y_n / ((1−p)/p)] = Y_n(1 − p) + Y_n(p) = Y_n.

− So, {Yn} is a martingale!

− And, P(T <∞) = 1 as before (with the same T ).

− And, |Y_n| 1_{n≤T} ≤ max( ((1−p)/p)^0, ((1−p)/p)^c ) < ∞ for all n.
− Hence, E(Y_T) = E(Y_0) = ((1−p)/p)^a.

− But Y_T = ((1−p)/p)^c if win, or Y_T = ((1−p)/p)^0 = 1 if lose.
− Hence, ((1−p)/p)^a = E(Y_T) = s(a) ((1−p)/p)^c + [1 − s(a)](1) = 1 + s(a)[((1−p)/p)^c − 1].


− Solving, s(a) = [((1−p)/p)^a − 1] / [((1−p)/p)^c − 1]. (Again, easier solution!)

• REMINDER: Midterm next week!

• REMINDER: Reading Week (no class) the week after!

————————— END OF WEEK #6 ——————————

• (Midterm.)

————————— END OF WEEK #7 ——————————

Wald’s Theorem:

• WALD’S THM: Suppose Xn = a + Z1 + . . . + Zn, where {Zi} are iid, with finite mean

m. Let T be a stopping time for {Xn} which has finite mean, i.e. E(T ) < ∞. Then

E(XT ) = a+mE(T ).

• Special case: if m = 0, then {Xn} is a martingale, and Wald’s Thm says that E(XT ) =

a = E(X0), as usual.

• Example: {X_n} is s.s.r.w. with X_0 = 0, and T = inf{n ≥ 0 : X_n = −5}.
− Then E(X_T) = −5 ≠ 0 = E(X_0).

• PROOF: We compute (using that Z_i indep of {T ≥ i} = {T ≤ i − 1}^C) that
E(X_T) − a = E(X_T − a) = E(Z_1 + . . . + Z_T) = E[ Σ_{i=1}^T Z_i ] = E[ Σ_{i=1}^∞ Z_i 1_{T≥i} ]
= Σ_{i=1}^∞ E[Z_i 1_{T≥i}] = Σ_{i=1}^∞ E[Z_i] E[1_{T≥i}] = m Σ_{i=1}^∞ P[T ≥ i] = m E(T). Q.E.D.

− (Note: the above calculation can be justified using the Dominated Convergence Thm with dominator Y = Σ_{i=1}^∞ |Z_i| 1_{T≥i}, since E(Y) = E[Σ_{i=1}^∞ |Z_i| 1_{T≥i}] = Σ_{i=1}^∞ E[|Z_i| 1_{T≥i}] = Σ_{i=1}^∞ E|Z_i| E[1_{T≥i}] = E|Z_1| Σ_{i=1}^∞ P[T ≥ i] = E|Z_1| E(T) < ∞.)

• EXAMPLE: Gambler’s Ruin with p 6= 1/2, and T = inf{n ≥ 0 : Xn = 0 or c}. (see e.g.

www.probability.ca/randwalk)

− What is E(T ) = expected number of bets in the game?

− Well, here m = E(Zi) = p(1) + (1− p)(−1) = 2p− 1.

− Also, E(X_T) = c s(a) + 0 (1 − s(a)) = c [((1−p)/p)^a − 1] / [((1−p)/p)^c − 1].
− And, E(T) < ∞. [For example, this follows since P(T ≥ cn) ≤ (1 − p^c)^n, so E(T) = Σ_{i=1}^∞ P(T ≥ i) ≤ Σ_{j=0}^∞ c P(T ≥ cj) ≤ Σ_{j=0}^∞ c (1 − p^c)^j = c/[1 − (1 − p^c)] = c/p^c < ∞.]


− Hence, by Wald’s Thm, E(XT ) = a+mE(T ).

− So, E(T ) = 1m

(E(XT )− a) = 12p−1

(c( 1−p

p )a−1

( 1−pp )

c−1− a

).

− e.g. p = 0.49, a = 9, 700, c = 10, 000: E(T ) = 484, 997. (large!)
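The Wald's Theorem computation is one line of arithmetic; this small Python sketch (our function name) reproduces the example:

```python
def expected_duration(a, c, p):
    """E(T) for Gambler's Ruin with p != 1/2, via Wald's Theorem."""
    r = (1 - p) / p
    s = (r ** a - 1) / (r ** c - 1)     # s(a) from the Gambler's Ruin formula
    return (c * s - a) / (2 * p - 1)    # E(T) = (E(X_T) - a) / m, with m = 2p - 1

et = expected_duration(9700, 10000, 0.49)   # close to 484,997
```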

• But what about E(T ) when p = 1/2?

− Then m = 0, so the above method does not work.

• LEMMA: Let X_n = a + Z_1 + . . . + Z_n, where {Z_i} i.i.d. with mean 0 and variance v < ∞. Let Y_n = (X_n − a)^2 − nv = (Z_1 + . . . + Z_n)^2 − nv. Then {Y_n} is a martingale.

• PROOF:

− Check: E|Yn| ≤ Var(Xn) + nv = 2nv <∞.

− Also, since Z_{n+1} indep of Z_1, . . . , Z_n, Y_0, . . . , Y_n, we have
E[Y_{n+1} | Y_0, Y_1, . . . , Y_n] = E[(Z_1 + . . . + Z_n + Z_{n+1})^2 − (n+1)v | Y_0, Y_1, . . . , Y_n]
= E[(Z_1 + . . . + Z_n)^2 + (Z_{n+1})^2 + 2 Z_{n+1}(Z_1 + . . . + Z_n) − nv − v | Y_0, Y_1, . . . , Y_n]
= E[Y_n + (Z_{n+1})^2 − v + 2 Z_{n+1}(Z_1 + . . . + Z_n) | Y_0, Y_1, . . . , Y_n]
= Y_n + v − v + 2 E(Z_{n+1}) E[Z_1 + . . . + Z_n | Y_0, Y_1, . . . , Y_n] = Y_n + 0. Q.E.D.

• COR: If {Xn} is Gambler’s Ruin with p = 1/2, and T = inf{n ≥ 0 : Xn = 0 or c}, then

E(T ) = Var(XT ) = a(c− a).

• PROOF:

− Let Y_n = (X_n − a)^2 − n (since here v = 1). Martingale (by Lemma)!
− Choose M > 0, and let S_M = min(T, M). Stopping time! Bounded!
− Hence, by Optional Stopping Lemma, E[Y_{S_M}] = E[Y_0] = (a − a)^2 − 0 = 0.
− But Y_{S_M} = (X_{S_M} − a)^2 − S_M, so E(S_M) = E[(X_{S_M} − a)^2].
− As M → ∞, S_M → T (obviously). This implies that E(S_M) → E(T) (by Monotone Convergence Thm), and E[(X_{S_M} − a)^2] → E[(X_T − a)^2] (by Bounded Convergence Thm, since for any M, (X_{S_M} − a)^2 ≤ max(a^2, (c − a)^2) < ∞).
− Hence, E(T) = E[(X_T − a)^2] = Var(X_T) (since E(X_T) = a).
− But Var(X_T) = (a/c)(c − a)^2 + (1 − a/c) a^2 = (a/c)(c^2 + a^2 − 2ac) + (a^2 − a^3/c) = ac + a^3/c − 2a^2 + a^2 − a^3/c = ac − a^2 = a(c − a). Q.E.D.

• e.g. c = 10, 000, a = 9, 700, p = 1/2: E(T ) = a(c− a) = 2, 910, 000. (even larger!)
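The E(T) = a(c − a) formula can be checked by straight simulation on a small case; this Monte Carlo sketch is ours, not from the notes:

```python
import random

def mean_duration(a, c, trials, seed):
    """Estimate E(T) for fair (p = 1/2) Gambler's Ruin starting from a dollars."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, n = a, 0
        while 0 < x < c:
            x += rng.choice((-1, 1))   # fair bet: +/- 1 with prob 1/2 each
            n += 1
        total += n
    return total / trials

est = mean_duration(3, 10, 20_000, 1)   # theory: a(c - a) = 3 * 7 = 21
```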

• e.g. “Double Until You Win”:

− Let {Zi} be iid ±1 with probability 1/2 each.

− Suppose we bet 2^{i−1} dollars on the ith bet (i.e., $1, $2, $4, . . ., doubling each time).
− Then our winnings after n bets equal X_n = Σ_{i=1}^n 2^{i−1} Z_i.

− Let T = inf{n ≥ 1 : Zn = +1} be our first win.


− Then w.p. 1, T < ∞ and X_T = −1 − 2 − 4 − . . . − 2^{T−2} + 2^{T−1} = +1.

− That is, we are guaranteed (!) to be up $1 at time T , even though we start at a = 0

and have average gain m = 0 on each bet.

− Thus E(X_T) = 1 ≠ a + m E(T), even though E(T) = 2 < ∞.
− This does not contradict Wald's Theorem, since {2^{i−1} Z_i} is not iid.
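A quick simulation sketch (ours) confirms both claims: the final winnings equal $1 on every run, while the stopping time averages E(T) = 2:

```python
import random

# "Double Until You Win": bet 1, 2, 4, ... dollars until the first winning flip.
rng = random.Random(7)
trials, total_T = 10_000, 0
for _ in range(trials):
    bet, winnings, t = 1, 0, 0
    while True:
        t += 1
        if rng.random() < 0.5:    # win this bet
            winnings += bet
            break
        winnings -= bet           # lose this bet, then double the stake
        bet *= 2
    assert winnings == 1          # guaranteed up exactly $1 at time T
    total_T += t
avg_T = total_T / trials          # near E(T) = 2 (T is Geometric(1/2))
```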

Application – Sequence Waiting Times:

• Suppose we repeatedly flip a fair coin. Let τ be the first time the sequence “HTH” is

completed. What is E(τ)?

− And, is the answer the same for “THH”?

− Try it out: file www.probability.ca/sta447/Rseqwait

• One solution: use Markov chains!

• Let Xn be the partial amount of the desired sequence (HTH) that the chain has “achieved

so far” after n flips. (For example, if the flips begin with HHTTHT, then X1 = 1, X2 = 1,

X3 = 2, X4 = 0, X5 = 1, and X6 = 2.) Take X0 = 0.

− So, S = {0, 1, 2, 3}, with X0 = 0, and Xτ = 3.

− Then p01 = p12 = p23 = 1/2. (Probability of continuing the sequence.)

− Also p00 = p20 = 1/2. But instead of p10 = 1/2, have p11 = 1/2. Key!

− (That is, if you fail to match the second flip, T, then you’ve already matched the first

flip, H, for the next try.)

− For completeness, assume we “start over” as soon as we win, so p3j = p0j for all j,

i.e. p31 = p30 = 1/2.

− Thus, P =
( 1/2 1/2  0   0  )
(  0  1/2 1/2  0  )
( 1/2  0   0  1/2 )
( 1/2 1/2  0   0  )

• Compute from equation πP = π (check!) that the stationary distribution is (0.3, 0.4, 0.2, 0.1).

− So, by the Return Time Theorem, the mean time to return from state 3 to state 3 is

1/π3 = 1/0.1 = 10.

− But returning from state 3 to state 3 has the same probabilities as going from state 0

to state 3.

− Hence, the mean time to go from state 0 to state 3 is 10.

− That is, mean waiting time for HTH is 10.

− Solved it!

− Try it out: file www.probability.ca/sta447/Rseqwait
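As a direct sketch (ours, separate from the Rseqwait file), the stationarity claim πP = π and the Return Time Theorem conclusion can be checked numerically:

```python
# Check that pi = (0.3, 0.4, 0.2, 0.1) satisfies pi P = pi for the HTH chain.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0],
     [0.5, 0.0, 0.0, 0.5],
     [0.5, 0.5, 0.0, 0.0]]
pi = [0.3, 0.4, 0.2, 0.1]

# (pi P)_j = sum_i pi_i p_{ij}
piP = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
mean_return = 1 / pi[3]   # Return Time Theorem: 1/pi_3 = 10
```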

• What about THH? Is it the same?


− Here we compute similarly (check!) that P =
( 1/2 1/2  0   0  )
(  0  1/2 1/2  0  )
(  0  1/2  0  1/2 )
( 1/2 1/2  0   0  )

− Then compute (check!) that the stationary distribution is (1/8, 1/2, 1/4, 1/8).

− So, mean time to return to state 3 is 1/(1/8) = 8. Smaller!

− Try it out: file www.probability.ca/sta447/Rseqwait
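Both waiting times can also be estimated by direct Monte Carlo; this Python sketch (ours, in the spirit of the Rseqwait file) flips fair coins until the pattern appears:

```python
import random

def mean_wait(pattern, trials, seed):
    """Estimate E(tau), the number of fair flips until `pattern` first appears."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        recent, n = "", 0
        while not recent.endswith(pattern):
            # keep only the last len(pattern) flips
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            n += 1
        total += n
    return total / trials

est_hth = mean_wait("HTH", 20_000, 1)   # theory: 10
est_thh = mean_wait("THH", 20_000, 1)   # theory: 8
```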

• ANOTHER APPROACH (to HTH), USING MARTINGALES:

• Suppose that at each time n, a new “player” appears, and bets $1 on heads, then if they

win they bet $2 on tails, then if they win again they bet $4 on heads. (Each player stops

betting as soon as they either lose once or win three bets in a row.)

− Let Xn be the total amount won by all the bettors by time n.

− Then since the bets were fair, {Xn} is a martingale with stopping time τ .

− Then have (check!) that Xτ = −(τ − 3) + (−1) + (1) + (7) = −τ + 10.

− It follows that E(X_τ) = E(X_0) = 0. (Indeed, if T_m = min(τ, m), then E(X_{T_m}) = 0 by Optional Stopping Lemma, and lim_{m→∞} E(X_{T_m}) = E(X_τ) by Dominated Convergence Theorem with dominator Y = 7τ since |X_n − X_{n−1}| ≤ 7.)

− Hence, 0 = E(Xτ ) = −E(τ) + 10, whence E(τ) = 10. Same as before!

• Similarly, for THH, get that Xτ = −(τ − 3) + (−1) + (−1) + (7) = −τ + 8, whence

E(τ) = 8. Same as before!

Martingale Convergence Theorem:

• EXAMPLE: Let {Xn} be a Markov chain on S = {2m : m ∈ Z}, with X0 = 1, and

pi,2i = 1/3 and pi,i/2 = 2/3 for i ∈ S.

− Martingale, since Σ_j j p_{ij} = (2i)(1/3) + (i/2)(2/3) = i.

− What happens in the long run?

− Trick: let Y_n = log_2 X_n. Then Y_0 = 0, and {Y_n} is s.r.w. with p = 1/3, so Y_n → −∞ w.p. 1.
− Hence, X_n = 2^{Y_n} → 2^{−∞} = 0 w.p. 1.

• EXAMPLE: Let {Xn} be Gambler’s Ruin with p = 1/2. Then Xn → X w.p. 1, where

P(X = c) = a/c and P(X = 0) = 1− a/c.• MARTINGALE CONVERGENCE THM: Any non-negative martingale {Xn} converges

w.p. 1 to some random variable X (e.g. X ≡ 0).

− Intuition: since it’s non-negative (i.e., bounded on one side), it can’t “spread out”

forever. And since it’s a martingale, it can’t “drift” anywhere. So eventually it has

to stop somewhere.

− Proof omitted; see e.g. Rosenthal, p. 169.

• Example: s.s.r.w. – martingale, but not non-negative, does not converge.


• Example: s.s.r.w. stopped at zero – martingale, non-negative, converges to zero.

• Example: symmetric random walk which makes smaller and smaller jumps when near

zero – martingale, non-negative, converges to something.

• Example: s.r.w. with p = 2/3 stopped at zero – non-negative, does not converge (might

increase to infinity), but not a martingale.

• COR: Any martingale {Xn} with Xn ≥ −c for some c ∈ R converges w.p. 1 to some

random variable X.

− PROOF: Apply THM to {X_n + c}.
• COR: Any martingale {X_n} with X_n ≤ c for some c ∈ R converges w.p. 1 to some random variable X.

− PROOF: Apply THM to {−Xn + c}.

————————— END OF WEEK #8 ——————————

Application – Branching Processes:

• Let µ be any prob dist on {0, 1, 2, . . .}. (“offspring distribution”)

− Have Xn individuals at time n. (e.g., people with colds)

− Start with X0 = a individuals. Assume 0 < a <∞.

− Each of the Xn individuals at time n has a random number of offspring which is

i.i.d. ∼ µ, i.e. has i children with probability µ{i}. (diagram)

− (Just one “parent” per offspring, i.e. asexual reproduction.)

− That is, X_{n+1} = Z_{n,1} + Z_{n,2} + . . . + Z_{n,X_n}, where {Z_{n,i}}_{i=1}^{X_n} are i.i.d. ∼ µ.
− Then {X_n} is Markov chain, on state space S = {0, 1, 2, . . .}.
− p_{00} = 1, with p_{0j} = 0 for all j ≥ 1.

− pij is more complicated; in fact (optional), pij = (µ ∗ µ ∗ . . . ∗ µ)(j), a convolution of

i copies of µ.

• Will Xn = 0 for some n?

− How can martingales help?

• Let m = Σ_i i µ{i} = mean of µ. ("reproductive number")

− Assume 0 < m <∞.

− Then E(X_{n+1} | X_0, . . . , X_n) = E(Z_{n,1} + Z_{n,2} + . . . + Z_{n,X_n} | X_0, . . . , X_n) = m X_n. So, by induction, E(X_n) = m^n E(X_0) = m^n a < ∞.

• If m < 1, then E(X_n) = a m^n → 0.
− But E(X_n) = Σ_{k=0}^∞ k P(X_n = k) ≥ Σ_{k=1}^∞ P(X_n = k) = P(X_n ≥ 1).
− Hence, P(X_n ≥ 1) ≤ E(X_n) = a m^n → 0, i.e. P(X_n = 0) → 1.

− Certain extinction!


• If m > 1, then E(X_n) = a m^n → ∞.

− FACT: In this case, also P(Xn →∞) > 0. (“flourishing”)

− But assuming µ{0} > 0, still have P(X_n → ∞) < 1, indeed P(X_n → 0) > 0 (e.g., if no one has any offspring at all on the first iteration: prob = (µ{0})^a > 0).

− So, have possible extinction, but also possible flourishing.

• What if m = 1?

− Then E(Xn) = E(X0) = a for all n.

− In fact, {Xn} is a martingale, and non-negative.

− So, by Martingale Convergence Thm, must have Xn → X w.p. 1, for some random

variable X.

− But how can {Xn} converge w.p. 1? Either (a) µ{1} = 1, or (b) X = 0.

− (In all other cases, {Xn} would continue to fluctuate, i.e. not converge w.p. 1.)

− So, if non-degenerate (i.e., µ{1} < 1), then X ≡ 0, i.e. {Xn} → 0 w.p. 1.

− Certain extinction, even when m = 1!
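The critical case is easy to watch by simulation; in this sketch (ours) the offspring distribution is 0 or 2 children with probability 1/2 each, so m = 1 and µ{1} < 1, and almost all runs die out:

```python
import random

def extinct_fraction(generations, trials, seed):
    """Fraction of branching-process runs (X_0 = 1) extinct within `generations`,
    with offspring distribution mu{0} = mu{2} = 1/2 (so m = 1)."""
    rng = random.Random(seed)
    extinct = 0
    for _ in range(trials):
        x = 1
        for _ in range(generations):
            # each of the x individuals independently has 0 or 2 children
            x = sum(2 * (rng.random() < 0.5) for _ in range(x))
            if x == 0:
                break
        extinct += (x == 0)
    return extinct / trials

frac = extinct_fraction(200, 2000, 3)   # should be close to 1
```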

CONTINUOUS PROCESS: Brownian Motion:

• Let {Xn}∞n=0 be s.s.r.w., with X0 = 0.

• Represent this as Xn = Z1 + Z2 + . . . + Zn, where {Zi} are i.i.d. with P(Zi = +1) =

P(Zi = −1) = 1/2.

− That is, X0 = 0, and Xn+1 = Xn + Zn+1.

− Here E(Zi) = 0 and Var(Zi) = 1.

• Let M be a large integer, and let {Y_t^(M)} be like {X_n}, except with time speeded up by a factor of M, and space shrunk down by a factor of √M.
− That is, Y_0^(M) = 0, and Y_{(i+1)/M}^(M) = Y_{i/M}^(M) + (1/√M) Z_{i+1}. (diagram)

− Fill in {Y (M)t }t≥0 by linear interpolation. (file www.probability.ca/sta447/Rbrownian;

some images at www.probability.ca/sta447/brownian/)

• Brownian motion {B_t}_{t≥0} is (intuitively) the limit as M → ∞ of {Y_t^(M)}.
• But Y_t^(M) = (1/√M)(Z_1 + Z_2 + . . . + Z_{tM}) (at least, if tM ∈ Z; otherwise within O(1/√M), which doesn't matter when M → ∞).
− Thus, E(Y_t^(M)) = 0, and Var(Y_t^(M)) = (1/M)(tM) = t.
− So, as M → ∞, by the Central Limit Theorem, Y_t^(M) → Normal(0, t).
− CONCLUSION: B_t ∼ Normal(0, t). ("normally distributed")

• Also, if 0 < t < s, then Y_s^(M) − Y_t^(M) = (1/√M)(Z_{tM+1} + Z_{tM+2} + . . . + Z_{sM}) (at least, if tM, sM ∈ Z; otherwise within O(1/√M)).
− So, Y_s^(M) − Y_t^(M) → Normal(0, s − t), and it is independent of Y_t^(M).
− CONCLUSION: B_s − B_t ∼ Normal(0, s − t), and it's independent of B_t.


− MORE GENERALLY: if 0 ≤ t1 ≤ s1 ≤ t2 ≤ s2 ≤ . . . ≤ tk ≤ sk, then B_{si} − B_{ti} ∼ Normal(0, si − ti), and {B_{si} − B_{ti}}_{i=1}^k are all independent. ("independent normal increments")
• Finally, if 0 < t ≤ s, then Cov(B_t, B_s) = E(B_t B_s) = E(B_t[B_s − B_t + B_t]) = E(B_t[B_s − B_t]) + E((B_t)^2) = E(B_t) E(B_s − B_t) + E((B_t)^2) = (0)(0) + t = t.

− In general, Cov(Bt, Bs) = min(t, s). (“covariance structure”)

• DEFINITION: Brownian motion is a process {Bt}t≥0 satisfying the above properties, and

with continuous sample paths (i.e., the mapping t→ Bt is continuous).

− FACT: Such a process exists! (The above construction is intuitive, but a formal proof

of existence requires measure theory.)

• Example: What is E[(B2 + B3 + 1)^2]?

− Well, E[(B2 + B3 + 1)^2] = E[(B2)^2] + E[(B3)^2] + 1^2 + 2E[B2B3] + 2E[B2(1)] + 2E[B3(1)] = 2 + 3 + 1 + 2(2) + 2(0) + 2(0) = 10.

• Example: What is Var[B3 +B5 + 6]?

− Well, Var[B3 + B5 + 6] = E[(B3 + B5)^2] = E[(B3)^2] + E[(B5)^2] + 2E[B3B5] = 3 + 5 + 2(3) = 14.
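• The two examples above can be checked by Monte Carlo, using the independent-increments representation B5 = B3 + (B5 − B3) (a sketch; the seed and sample size are my choices):

```python
import random

random.seed(2)

n = 200000
vals = []
for _ in range(n):
    B3 = random.gauss(0, 3 ** 0.5)        # B_3 ~ Normal(0, 3)
    B5 = B3 + random.gauss(0, 2 ** 0.5)   # independent increment B_5 - B_3 ~ Normal(0, 2)
    vals.append(B3 + B5 + 6)
m = sum(vals) / n
v = sum((x - m) ** 2 for x in vals) / n
print(m, v)  # ≈ 6 and ≈ 14, matching the covariance-structure calculation
```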

• Aside: w.p. 1, the function t 7→ Bt is continuous everywhere, but differentiable nowhere.

• Example: Let α > 0, and let Wt = α B_{t/α^2}.

− Then Wt ∼ Normal(0, α^2(t/α^2)) = Normal(0, t). (same as for Bt)

− Also for 0 < t < s, E(WtWs) = α^2 E(B_{t/α^2} B_{s/α^2}) = α^2(t/α^2) = t.

− In fact, {Wt} has all the same properties as {Bt}.

− That is, {Wt} “is” Brownian motion, too. (“transformation”)

• If 0 < t < s, then given Br for 0 ≤ r ≤ t, what is the conditional distribution of Bs?

− Similar to above, Bs | Bt = Bt + (Bs − Bt) | Bt = Bt + Normal(0, s − t) ∼ Normal(Bt, s − t). (i.e., given Bt, Bs is normal with mean Bt, variance s − t.)

− So, in particular, E[Bs | {Br}0≤r≤t] = Bt.

− Hence, {Bt} is a (continuous-time) martingale!

− So, similar results apply just like for discrete-time martingales.

• Example: let a, b > 0, and let τ = min{t ≥ 0 : Bt = −a or b}.

− What is p ≡ P(Bτ = b)?

− Well, here {Bt} is martingale, and τ is stopping time.

− Furthermore, {Bt} is bounded up to time τ, i.e. |Bt| 1_{t≤τ} ≤ max(|a|, |b|).

− So, just like for discrete martingales, must have E(Bτ) = E(B0) = 0.

− Hence, p(b) + (1 − p)(−a) = 0, so p = a/(a+b). (as expected)

− But what is e ≡ E(τ)?

• To continue, let Yt = (Bt)^2 − t.


− Then for 0 < t < s, E[Ys | {Br}r≤t] = E[(Bs)^2 − s | {Br}r≤t] = Var[Bs | {Br}r≤t] + (E[Bs | {Br}r≤t])^2 − s = (Bt)^2 + (s − t) − s = Yt.

− Then by the double-expectation formula (see e.g. Rosenthal, Prop. 13.2.7), we also have E[Ys | {Yr}r≤t] = E[ E[Ys | {Br}r≤t] | {Yr}r≤t ] = E[Yt | {Yr}r≤t] = Yt.

− So, {Yt} is also a martingale!

• Back to τ = min{t ≥ 0 : Bt = −a or b}. What is e ≡ E(τ)?

− Well, with Yt = (Bt)^2 − t, have E(Yτ) = E((Bτ)^2 − τ) = E((Bτ)^2) − E(τ) = p b^2 + (1 − p)(−a)^2 − e = (a/(a+b)) b^2 + (b/(a+b)) a^2 − e = ab − e.

− Assuming E(Yτ) = 0, solve to get e = ab. (like for discrete Gambler’s Ruin)

− But τ is not bounded . . .

• To justify this argument, i.e. show that E(Yτ ) = 0, let τM = min(τ,M).

− Then τM is bounded, so E(YτM ) = 0.

− But Y_{τM} = (B_{τM})^2 − τM, so E(τM) = E((B_{τM})^2).

− As M → ∞, E(τM) → E(τ) by the Monotone Convergence Theorem, and E((B_{τM})^2) → E((Bτ)^2) by the Bounded Convergence Theorem.

− Therefore, E(τ) = E((Bτ)^2), i.e. E(Yτ) = 0 as above.
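• These two answers (p = a/(a+b) and E(τ) = ab) can be checked via the discrete Gambler’s Ruin analogue, where they hold exactly (a simulation sketch; the integer values a = 3, b = 5 and the seed are my choices):

```python
import random

random.seed(3)

a, b = 3, 5          # absorb at -a or +b (the discrete analogue of the BM example)
n = 20000
hits_b = 0
total_steps = 0
for _ in range(n):
    x, steps = 0, 0
    while -a < x < b:
        x += random.choice((-1, 1))   # simple symmetric random walk
        steps += 1
    hits_b += (x == b)
    total_steps += steps
p_hat = hits_b / n
e_hat = total_steps / n
print(p_hat, e_hat)  # ≈ a/(a+b) = 0.375 and ≈ a*b = 15
```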

• Example: Suppose Xt = 2 + 5t+ 3Bt for t ≥ 0.

− What are E(Xt) and Var(Xt) and Cov(Xt, Xs)?

− Well, E(Xt) = 2 + 5t, and Var(Xt) = 3^2 Var(Bt) = 9t.

− Follows that Xt ∼ Normal(2 + 5t, 9t).

− Also for 0 < t < s, Cov(Xt, Xs) = E[(Xt − 5t − 2)(Xs − 5s − 2)] = E[(3Bt)(3Bs)] = 9E[BtBs] = 9t.

− Fancy notation: dXt = 5 dt+ 3 dBt. (“diffusion”)

————————— END OF WEEK #9 ——————————

• More generally, could have Xt = x0 + µt+ σBt. (file “Rbrownian”)

− Then dXt = µ dt+ σ dBt. (µ = “drift”; σ = “volatility”; σ ≥ 0)

− Then E(Xt) = x0 + µt, and Var(Xt) = σ2t, and Cov(Xt, Xs) = σ2 min(s, t).

− Optional: Even more generally, could have dXt = µ(Xt) dt+ σ(Xt) dBt, where µ and

σ are functions, i.e. non-constant drift and volatility.


Application – Financial Modeling:

• Common model for stock price: Xt = x0 exp(µt+ σBt).

− i.e. if Yt = log(Xt), then Yt = y0 + µt+ σBt, i.e. dYt = µ dt+ σ dBt.

− That is, changes occur proportional to total price (makes sense).

− So, Yt = log(Xt) is a diffusion.

• Also assume a risk-free interest rate r, so that $1 today is worth $e^{rt} a time t later.

− Equivalently, $1 at a future time t > 0 is worth $e^{−rt} at time 0 (i.e. “today”).

− So, “discounted” stock price (in “today’s dollars”) is

Dt ≡ e^{−rt} Xt = e^{−rt} x0 exp(µt + σBt) = x0 exp((µ − r)t + σBt).

− Special case: if r = 0 then no discounting.

• Defn: A “European call option” is the option to buy the stock for some amount $K at some fixed future time S > 0.

− At time S, this is worth max(0, XS −K).

− At time 0, it’s worth only e−rS max(0, XS −K).

− But at time 0, XS is unknown (random).

• QUESTION: what is the “fair price” of this option?

− This means the fair “no-arbitrage” price, i.e. a price such that you cannot make a

guaranteed profit by buying or selling the option, combined with buying and selling

the stock.

− Note: we assume the ability to buy/sell arbitrary amounts of stock and/or at any

time, infinitely often, including going negative (i.e., “shorting” the stock or option),

with no transaction fees.

− So, what is the fair price at time 0?

− Is it simply the expected value, E[e−rS max(0, XS −K)]?

− No! This would allow for arbitrage!

• MOTIVATION – DISCRETE-TIME VERSION (cf. Durrett):

− Suppose stock price equals 100 at time 0, and is either 80 or 130 at time S, and there is an option to buy the stock at time S for K = 110.

− What is this option’s fair price??

− (Assume r = 0, i.e. no discounting.)

− Suppose at time 0 you buy x stocks (for 100 each), and y options (for c each).

− Then if the stock goes up to 130, your profit is 30x + (20− c)y. But if it goes down

to 80, your profit is −20x+ (−c)y.

− If y = −(5/2)x, then these both equal −(5/2)(8− c)x.

− Hence, there is no arbitrage iff c = 8 (fair price).

34

Page 35: STA447/2006 (Stochastic Processes) Lecture Notes, Winter ...probability.ca/jeff/teaching/1617/sta447/notes.pdf · STA447/2006 (Stochastic Processes) Lecture Notes, Winter 2017

− Here if we assign probabilities P(XS = 80) = 3/5 and P(XS = 130) = 2/5, then the

stock price is a martingale since (3/5)(80) + (2/5)(130) = 100, and also the option is

a martingale since (3/5)(0) + (2/5)(130− 110) = 8 = c.

− Thus, fair price = martingale expected value = (3/5)(0) + (2/5)(130− 110) = 8.
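• The hedging arithmetic above can be verified directly (a sketch; the function name and the choice x = 2 are mine):

```python
# Buy x shares at 100 and y options at price c, with the hedge ratio y = -(5/2)x.
def profits(c, x=2.0):
    y = -(5.0 / 2.0) * x
    up = 30 * x + (20 - c) * y      # stock rises to 130; option pays 130 - 110 = 20
    down = -20 * x + (0 - c) * y    # stock falls to 80; option pays 0
    return up, down

up, down = profits(c=8)
print(up, down)    # both 0: at c = 8 there is no guaranteed profit or loss
up2, down2 = profits(c=9)
print(up2, down2)  # both positive: any other price allows an arbitrage
```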

• BACK TO THE DIFFUSION CASE: What is the fair price?

• KEY: If µ = r − σ^2/2, then {Dt} becomes a martingale (HW#3).

• FACT: if {Dt} is a martingale, then similar to the discrete case, there is a (no-arbitrage)

continuous-time fair price process {Ct} which makes the option value into a martingale

too! (finance/actuarial classes . . . )

− Hence, under the martingale probabilities, the initial value (fair price) of the option

at time zero is the same as the expected value of the option at time S.

− (Proof? See e.g. Theorem 1.2.1 of I. Karatzas (1997), Lectures on the Mathematics

of Finance, CRM Monograph Series 8, American Mathematical Society.)

• CONCLUSION: The fair price for the option equals E[e^{−rS} max(0, XS − K)], but only after replacing µ by r − σ^2/2.

− i.e., such that XS = x0 exp([r − σ^2/2]S + σBS), where BS ∼ Normal(0, S).

• So, the fair price can be computed by an integral (with respect to a normal density).

− After some computation (HW#3), this fair price becomes:

x0 Φ( [(r + σ^2/2)S − log(K/x0)] / (σ√S) ) − e^{−rS} K Φ( [(r − σ^2/2)S − log(K/x0)] / (σ√S) ),

where Φ(u) = ∫_{−∞}^u (1/√(2π)) e^{−v^2/2} dv is the cdf of a standard normal distribution. [“Black–Scholes formula”. Do not have to memorise!]

• Note: this price does not depend on the drift (“appreciation rate”) µ, since we first have to replace µ by r − σ^2/2. This seems surprising! Intuition: if µ large, then can make good money from the stock itself, so the option doesn’t add much value . . .

• However, the price is an increasing function of the volatility σ. This makes sense, since

the option “protects” you against large drops in the stock price.
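• The displayed formula is easy to evaluate numerically, using the error-function representation Φ(u) = (1 + erf(u/√2))/2 (a sketch; the sample parameters x0 = 100, K = 110, S = 1, r = 0.05, σ = 0.2 are illustrative):

```python
import math

def Phi(u):
    # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def bs_call(x0, K, S, r, sigma):
    # the fair price formula displayed above
    d1 = ((r + sigma ** 2 / 2) * S - math.log(K / x0)) / (sigma * math.sqrt(S))
    d2 = ((r - sigma ** 2 / 2) * S - math.log(K / x0)) / (sigma * math.sqrt(S))
    return x0 * Phi(d1) - math.exp(-r * S) * K * Phi(d2)

price = bs_call(x0=100, K=110, S=1.0, r=0.05, sigma=0.2)
price_hi_vol = bs_call(x0=100, K=110, S=1.0, r=0.05, sigma=0.4)
print(price, price_hi_vol)  # the price increases with the volatility sigma
```

Note the drift µ appears nowhere in the computation, as discussed above.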

Poisson Processes:

• MOTIVATING EXAMPLE:

− Suppose an average of λ = 2.5 fires in some region (Toronto?) per day.

− Intuitively, this is caused by a very large number n of buildings, each of which has a

very small probability p of having a fire.

− Then mean = np = λ, so p = λ/n.

− Then # fires today is Binomial(n, λ/n).

− So, as n → ∞, P(#fires = k) = (n choose k) p^k (1 − p)^{n−k} = [n(n−1)(n−2) . . . (n−k+1)/k!] (λ/n)^k (1 − (λ/n))^{n−k} ≈ (n^k/k!)(λ/n)^k (1 − (λ/n))^n ≈ (1/k!) λ^k e^{−λ}.


− So, if λ = 2.5, then P(# fires = k) ≈ e^{−2.5} (2.5)^k/k!, for k = 0, 1, 2, 3, . . .

− i.e. # fires ∼ Poisson(λ) = Poisson(2.5).

− And, # fires today and tomorrow combined ≈ Poisson(2λ) = Poisson(5), etc.

− Full distribution? P(fire within next hour)? etc.

• POISSON PROCESS CONSTRUCTION:

• Let {Yn}∞n=1 be i.i.d. ∼ Exp(λ), for some λ > 0.

− So, Yn has density function λe−λy for y > 0.

− And, P(Yn > y) = e−λy for y > 0.

− And, E(Yn) = 1/λ.

• Let T0 = 0, and Tn = Y1 + Y2 + . . .+ Yn for n ≥ 1. (“nth arrival time”)

− [e.g. Tn = time of nth fire.]

• Let N(t) = max{n ≥ 0 : Tn ≤ t} = #{n ≥ 1 : Tn ≤ t} = # arrivals up to time t.

− “Counting process”. (Counts number of arrivals.)

− [e.g. N(t) = # fires between times 0 and t.]

− “Poisson process with intensity λ”

• What is distribution of N(t), i.e. P(N(t) = m)?

− Well, N(t) = m iff both Tm ≤ t and Tm+1 > t, which is iff there is 0 ≤ s ≤ t with Tm = s and Tm+1 − Tm > t − s.

− Recall that Yn ∼ Exp(λ) = Gamma(1, λ), so Tm := Y1 + Y2 + . . . + Ym ∼ Gamma(m, λ), with density function f_{Tm}(s) = [λ^m/Γ(m)] s^{m−1} e^{−λs} = [λ^m/(m−1)!] s^{m−1} e^{−λs}.

− Also P(Tm+1 − Tm > t − s) = P(Ym+1 > t − s) = e^{−λ(t−s)}. So,

P(N(t) = m) = P(Tm ≤ t, Tm+1 > t) = P(∃ 0 ≤ s ≤ t : Tm = s, Ym+1 > t − s)

= ∫_0^t f_{Tm}(s) P(Ym+1 > t − s) ds = ∫_0^t [λ^m/(m−1)!] s^{m−1} e^{−λs} e^{−λ(t−s)} ds

= [λ^m/(m−1)!] e^{−λt} ∫_0^t s^{m−1} ds = [λ^m/(m−1)!] e^{−λt} [t^m/m] = [(λt)^m/m!] e^{−λt}.

− Hence, N(t) ∼ Poisson(λt).

− So, E(N(t)) = λt, and Var(N(t)) = λt.
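• The construction is easy to simulate, and the conclusion N(t) ∼ Poisson(λt) can be sanity-checked via its mean and variance (a sketch; λ = 2.5, t = 3, and the seed are my choices):

```python
import random

random.seed(6)

lam, t = 2.5, 3.0
n = 20000
counts = []
for _ in range(n):
    T, m = 0.0, 0
    while True:
        T += random.expovariate(lam)   # inter-arrival times Y_i ~ Exp(lambda)
        if T > t:
            break
        m += 1                         # one more arrival T_m <= t
    counts.append(m)
mean = sum(counts) / n
var = sum(c * c for c in counts) / n - mean ** 2
print(mean, var)  # both ≈ lambda * t = 7.5
```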

• Now, recall the “memoryless” (or “forgetfulness”) property of the Exp(λ) distribution:

for a, b > 0, P(Yn > b+ a |Yn > a) = P(Yn > b) = e−λb.

− This means the process {N(t)} “starts over” in each new time interval.

− It follows that N(t+ s)−N(s) ∼ N(t) ∼ Poisson(λt).

− Also follows that if 0 ≤ a < b ≤ c < d, then N(d)−N(c) indep. of N(b)−N(a), and

similarly for multiple non-overlapping time intervals. (“independent increments”)


− MORE GENERALLY: if 0 ≤ t1 ≤ s1 ≤ t2 ≤ s2 ≤ . . . ≤ tk ≤ sk, then N(si) − N(ti) ∼ Poisson(λ(si − ti)), and {N(si) − N(ti)}, i = 1, . . . , k, are all independent. (“independent Poisson increments”)

• DEFN: A Poisson process with intensity λ > 0 is a collection {N(t)}t≥0 of random non-decreasing integer counts N(t), satisfying: (a) N(0) = 0; (b) N(t) ∼ Poisson(λt) for all t ≥ 0; and (c) independent Poisson increments (as above).

• MOTIVATING EXAMPLE (cont’d):

− Here, fires approximately follow a Poisson process with intensity 2.5.

− So, P(9 fires today and tomorrow combined) ≈ e^{−2(2.5)} (2 · 2.5)^9/9! = e^{−5} (5^9/9!) ≈ 0.036.

− P(at least one fire in next hour) = 1 − P(no fires in next hour) = 1 − P(N(1/24) = 0) = 1 − e^{−2.5/24} (2.5/24)^0/0! ≈ 1 − 0.90 = 0.10.

− P(exactly 3 fires in next hour) = e^{−2.5/24} (2.5/24)^3/3! ≈ 0.00017 ≈ 1/5891, etc.

• EXAMPLE: Let {N(t)} be a Poisson process with intensity λ = 2. Then

P[N(3) = 5, N(3.5) = 9] = P[N(3) = 5, N(3.5) − N(3) = 4]

= P[N(3) = 5] P[N(3.5) − N(3) = 4]

= [e^{−3λ} (3λ)^5/5!] [e^{−0.5λ} (0.5λ)^4/4!]

= (e^{−6} 6^5/120)(e^{−1} 1^4/24) = e^{−7}(2.7) ≈ 0.0025 ≈ 1/400.

• EXAMPLE: Let {N(t)} be a Poisson process with intensity λ.

− Then for 0 < t < s,

P(N(t) = 1 | N(s) = 1) = P(N(t) = 1, N(s) = 1) / P(N(s) = 1)

= P(N(t) = 1, N(s) − N(t) = 0) / P(N(s) = 1)

= [e^{−λt} (λt)^1/1!] [e^{−λ(s−t)} (λ(s−t))^0/0!] / [e^{−λs} (λs)^1/1!] = t/s.

− That is, conditional on N(s) = 1, the first event is uniform over [0, s]. (Distribution

does not depend on λ.)

• Also, e.g.

P(N(4) = 1 | N(5) = 3) = P(N(4) = 1, N(5) = 3) / P(N(5) = 3)

= P(N(4) = 1, N(5) − N(4) = 2) / P(N(5) = 3)

= (e^{−4λ} (4λ)^1/1!)(e^{−λ} λ^2/2!) / (e^{−5λ} (5λ)^3/3!)

= [(4)^1/1!][(1)^2/2!] / [(5)^3/3!] = (4/2)/(125/6) = 24/250 = 12/125.

− This also does not depend on λ. [And equals (3 choose 1)(4/5)^1(1/5)^2. Why?]

• ALTERNATIVE APPROACH: Given N(t), as h↘ 0,

− P(N(t+ h)−N(t) = 1) = λh+ o(h).

− P(N(t+ h)−N(t) ≥ 2) = o(h).

− This (together with independent increments) is another way to characterise Poisson

processes.

• NOTE: the {Ti} tend to “clump up” in various patterns just by chance alone.

− Doesn’t “mean” anything at all: they’re independent. (“Poisson clumping”)

− But it “seems” like it does have meaning!

− See e.g. www.probability.ca/pois

• Illustration:

− Suppose have PP on [0,100] with intensity λ = 1.

− Then expect about one event every distance 1. But will get clumps! e.g.:

P(∃ r ∈ [1, 100] : N(r) − N(r−1) = 4)

≥ P(∃ m ∈ {1, 2, . . . , 100} : N(m) − N(m−1) = 4)

= 1 − P(∄ m ∈ {1, 2, . . . , 100} : N(m) − N(m−1) = 4)

= 1 − (P(N(1) ≠ 4))^100 = 1 − (1 − P(N(1) = 4))^100 = 1 − (1 − 1/(24e))^100 ≈ 0.787.
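• The final bound is a direct computation (in Python):

```python
import math

# P(N(1) = 4) = e^(-1) * 1^4 / 4! = 1/(24e), then the complement-power bound above
p4 = math.exp(-1) / math.factorial(4)
prob = 1 - (1 - p4) ** 100
print(round(prob, 3))  # 0.787
```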

• APPLICATION: pedestrian deaths example (true story).

− 7 pedestrian deaths in Toronto (14 in GTA) in January 2010.

− Media hype, friends concerned, etc.

− Facts: Toronto averages about 31.9 per year, i.e. λ = 2.66 per month.

− So, P(7 or more) = Σ_{j=7}^∞ e^{−2.66} (2.66)^j/j! ≈ 1.9%, about once per 52 months, i.e. about once per 4.4 years.

− Not so rare! doesn’t “mean” anything! (Though tragic.) “Poisson clumping”

− See e.g. www.probability.ca/ped

− Later, just two in Feb 1 - Mar 15, 2010; less than expected (4), but no media!
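• The 1.9% figure is a one-line computation (in Python; λ = 2.66 as above):

```python
import math

lam = 2.66  # pedestrian deaths per month
# P(7 or more) = 1 - P(at most 6), for a Poisson(2.66) count
p_tail = 1 - sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(7))
print(round(p_tail, 3))  # ≈ 0.019, i.e. about once per 1/0.019 ≈ 52 months
```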

• APPLICATION – BUS MODEL:

− Suppose have λ buses per hour, i.e. about n buses every n/λ hours.

− Suppose the arrival times are completely random.

————————— END OF WEEK #10 ——————————


− Model this as T1, T2, . . . , Tn ∼ Uniform[0, n/λ], i.i.d.

− Then for 0 < a < b, as n → ∞,

#{i : Ti ∈ [a, b]} ∼ Binomial(n, (b − a)/(n/λ)) = Binomial(n, λ(b − a)/n) ≈ Poisson(λ(b − a)).

− Like a Poisson process!

• APPLICATION: WAITING TIME PARADOX.

− Suppose there are an average of λ buses per hour. (e.g. λ = 5)

− You arrive at the bus stop at a random time.

− What is your expected waiting time until the next bus?

− If buses are completely regular, then waiting time is ∼ Uniform[0, 1/λ], so mean = 1/(2λ) hours. (e.g. λ = 5, mean = 1/10 hours = 6 minutes)

− If buses are completely random, then they form a Poisson process, so (by memoryless property) waiting time is ∼ Exp(λ), so mean = 1/λ hours. Twice as long! (e.g. λ = 5, mean = 1/5 hours = 12 minutes)

− But same number of buses! Contradiction??

− No: you’re more likely to arrive during a longer gap.

• Aside: What about streetcars? They can’t pass each other, so they sometimes clump up

even more than do (independent) buses. (e.g. Spadina streetcar; see this paper)

• SUPERPOSITION: Suppose {N1(t)}t≥0 and {N2(t)}t≥0 are two independent Poisson

processes, with rates λ1 and λ2 respectively. Let N(t) = N1(t) +N2(t).

− Then {N(t)}t≥0 is a Poisson process with rate λ1 + λ2.

− Proof? Sum of two independent Poissons is Poisson!

• EXAMPLE:

− Suppose undergrads arrive for office hours according to a Poisson process with intensity λ1 = 5 (i.e. one every 12 minutes on average).

− And, grads arrive independently according to their own Poisson process with intensity

λ2 = 3 (i.e. one every 20 minutes on average).

− Then, what is expected number of minutes until first student arrives?

− Well, total # arrivals N(t) is Poisson process with λ = λ1 + λ2 = 5 + 3 = 8.

− Let A = time of first arrival.

− Then, P(A > t) = P(N(t) = 0) = e^{−λt} (λt)^0/0! = e^{−λt}; so A ∼ Exp(λ).

− Hence, E(A) = 1/λ = 1/8 hours, i.e. 7.5 minutes.

• THINNING: Let {N(t)}t≥0 be a Poisson process with rate λ.

− Suppose each arrival is independently of “type i” with probability pi, for i = 1, 2, 3, . . ..

(e.g. bus or streetcar, male or female, undergrad or grad, etc.)


− Let Ni(t) be number of arrivals of type i up to time t.

− THM: The {Ni(t)} are independent Poisson processes, with rates λpi.

− PROOF: “independent increments” is obvious.

− For the distribution, suppose for notational simplicity that there are just two types,

with p1 + p2 = 1.

− Need to show: P(N1(t) = j, N2(t) = k) = (e^{−λp1t} (λp1t)^j/j!)(e^{−λp2t} (λp2t)^k/k!).

− But P(N1(t) = j, N2(t) = k)

= P(j + k arrivals up to time t, of which j of type 1 and k of type 2)

= (e^{−λt} (λt)^{j+k}/(j + k)!) × ((j+k) choose j) (p1)^j (p2)^k. Equal! (Check.)

• EXAMPLE: If students arrive for office hours according to a Poisson process, and each

student is independently either undergrad (prob p1) or grad (prob p2), then # undergrads

is independent of # grads (and each follows a Poisson distribution).
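• The thinning theorem can be sanity-checked by simulation, e.g. with λ = 8 and p1 = 5/8, p2 = 3/8 as in the office-hours setup (a sketch; the seed and sample size are my choices):

```python
import random

random.seed(9)

lam, t, p1 = 8.0, 1.0, 5.0 / 8.0   # type 1 (undergrad) w.p. 5/8, type 2 (grad) w.p. 3/8
n = 20000
n1s, n2s = [], []
for _ in range(n):
    T = 0.0
    n1 = n2 = 0
    while True:
        T += random.expovariate(lam)
        if T > t:
            break
        if random.random() < p1:   # each arrival independently typed
            n1 += 1
        else:
            n2 += 1
    n1s.append(n1)
    n2s.append(n2)
m1 = sum(n1s) / n
m2 = sum(n2s) / n
cov = sum(a * b for a, b in zip(n1s, n2s)) / n - m1 * m2
print(m1, m2, cov)  # ≈ lam*p1*t = 5, ≈ lam*p2*t = 3, and ≈ 0 (independence)
```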

• ASIDE: Can also have time-inhomogeneous Poisson processes, where λ = λ(t), and N(b) − N(a) ∼ Poisson(∫_a^b λ(t) dt).

• ASIDE: Can also have Poisson processes on other regions, e.g. in two dimensions, etc.,

cf. www.probability.ca/pois

Continuous-Time, Discrete-Space Markov Processes:

• Recall: Markov chains {Xn}∞n=0 defined in discrete (integer) time.

− But Brownian motion {Bt}t≥0, and Poisson processes {N(t)}t≥0, both defined in

continuous (real) time.

− Can we define Markov processes in continuous time? Yes!

• DEFN: a continuous-time (time-homogeneous, non-explosive) Markov process, on a countable (discrete) state space S, is a collection {X(t)}t≥0 of random variables such that

P(X0 = i0, Xt1 = i1, Xt2 = i2, . . . , Xtn = in) = ν_{i0} p^{(t1)}_{i0 i1} p^{(t2−t1)}_{i1 i2} . . . p^{(tn−tn−1)}_{in−1 in},

for some initial distribution {νi}i∈S (with νi ≥ 0, and Σ_{i∈S} νi = 1), and transition probabilities {p^(t)_ij}_{i,j∈S, t≥0} (with p^(t)_ij ≥ 0, and Σ_{j∈S} p^(t)_ij = 1).

− Just like for discrete-time chains, except need to keep track of elapsed time t too.

− As with discrete chains, p^(0)_ij = δij = 1 if i = j, and 0 if i ≠ j.

• Let P(t) = (p^(t)_ij)_{i,j∈S} = matrix version.

− Then P(0) = I = identity matrix.

− Also p^(s+t)_ij = Σ_{k∈S} p^(s)_ik p^(t)_kj, i.e. P(s+t) = P(s) P(t). (“Chapman–Kolmogorov equations”, just like for discrete time)

− If µ^(t)_i = P(X(t) = i), and µ^(t) = (µ^(t)_i)_{i∈S} = row vector, and ν = (νi)_{i∈S} = row vector, then µ^(t)_j = Σ_{i∈S} νi p^(t)_ij, and µ^(t) = ν P(t), and µ^(t) P(s) = µ^(t+s), etc.


• Expect that lim_{t↘0} p^(t)_ij = p^(0)_ij = δij.

− Assume this is true. (“standard” Markov process)

− Follows that for small enough h, p^(h)_ii > 0. Hence, by Chapman–Kolmogorov, p^(nh)_ii > 0 for all small enough h > 0 and all n ∈ N. Hence, p^(t)_ii > 0 for all t ≥ 0.

• Then can compute the process’s generator as gij = lim_{t↘0} (p^(t)_ij − δij)/t = p′_ij(0). (right-handed derivative)

− So, if G = (gij)_{i,j∈S} = matrix, then G = P′(0) = lim_{t↘0} (P(t) − I)/t. (right-handed derivative)

− Here gii ≤ 0, while gij ≥ 0 for i ≠ j.

− In fact, usually (e.g. if S is finite), have

Σ_{j∈S} gij = Σ_{j∈S} lim_{t↘0} (p^(t)_ij − δij)/t = lim_{t↘0} [Σ_{j∈S} p^(t)_ij − Σ_{j∈S} δij]/t = lim_{t↘0} (1 − 1)/t = 0.

− Furthermore, if t > 0 is small, then G ≈ (P(t) − I)/t, so P(t) ≈ I + tG, i.e. p^(t)_ij ≈ δij + t gij.

• RUNNING EXAMPLE: S = {1, 2}, and

G =
( −3   3 )
(  6  −6 )

− Then for small t > 0,

P(t) ≈ I + tG =
( 1−3t    3t  )
(  6t    1−6t )

− So p^(t)_11 ≈ 1 − 3t, p^(t)_12 ≈ 3t, etc.

− e.g. if t = 0.02, then p^(0.02)_11 ≈ 1 − 3(0.02) = 0.94, p^(0.02)_12 ≈ 3(0.02) = 0.06, p^(0.02)_21 ≈ 6(0.02) = 0.12, and p^(0.02)_22 ≈ 1 − 6(0.02) = 0.88, i.e.

P(0.02) ≈
( 0.94  0.06 )
( 0.12  0.88 )

• What about for larger t?

− Well, by the Chapman–Kolmogorov equations, for any m ∈ N,

P(t) = [P(t/m)]^m = lim_{n→∞} [P(t/n)]^n = lim_{n→∞} [I + (t/n)G]^n = exp(tG) := I + tG + t^2G^2/2! + t^3G^3/3! + . . .

(matrix equation; similar to how lim_{n→∞} (1 + c/n)^n = e^c).

− (Makes sense so that e.g. P (s+t) = exp((s+ t)G) = exp(sG) exp(tG) = P (s) P (t), etc.)

− So, in principle, the generator G tells us P (t) for all t ≥ 0.

• Can we actually compute P (t) = exp(tG) this way? Yes!

• Method #1: Compute the infinite matrix sum on a computer, numerically and approximately.
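• A minimal version of Method #1 for the running example, summing the truncated series with plain Python lists (a sketch; the truncation at 50 terms and the choice t = 0.5 are mine); the result can be compared against the exact answer p^(t)_11 = (2 + e^{−9t})/3 derived by Methods #2 and #3 below:

```python
import math

def mat_mult(A, B):
    # multiply two square matrices given as lists of lists
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(G, t, terms=50):
    # truncated series exp(tG) = I + tG + (tG)^2/2! + ...
    n = len(G)
    tG = [[t * g for g in row] for row in G]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mult(term, tG)]  # (tG)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

G = [[-3.0, 3.0], [6.0, -6.0]]
t = 0.5
P = mat_exp(G, t)
exact_p11 = (2 + math.exp(-9 * t)) / 3
print(P[0][0], exact_p11)  # the two agree
```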

• Method #2: Note that in the above example, if λ1 = 0 and λ2 = −9, and w1 = (2, 1) and w2 = (1, −1), then w1G = λ1w1 = 0, and w2G = λ2w2 = −9w2. That is, {λi} are the eigenvalues of G, with corresponding left-eigenvectors {wi}.

− Now, if wi is a left-eigenvector with corresponding eigenvalue λi, then wi exp(tG) = e^{tλi} wi. (Check.) Easy!


− So, if the initial distribution is (say) ν = (1, 0), then first compute that ν = (1/3)w1 + (1/3)w2. Then,

µ^(t) = ν P(t) = ν exp(tG) = ((1/3)w1 + (1/3)w2) exp(tG)

= (1/3)e^{tλ1}w1 + (1/3)e^{tλ2}w2 = (1/3)e^{0t}(2, 1) + (1/3)e^{−9t}(1, −1) = ( (2 + e^{−9t})/3, (1 − e^{−9t})/3 ).

− So, P[Xt = 1] = p^(t)_11 = (2 + e^{−9t})/3, and P[Xt = 2] = p^(t)_12 = (1 − e^{−9t})/3.

− Check: p^(0)_11 = 1, p^(0)_12 = 0, and p^(t)_11 + p^(t)_12 = 1. (Phew.)

− (Or, by instead choosing ν = (0, 1), could compute p^(t)_21 and p^(t)_22.)

• Method #3: Note that

p′^(t)_ij = lim_{h↘0} (p^(t+h)_ij − p^(t)_ij)/h = lim_{h↘0} [ (Σ_{k∈S} p^(t)_ik p^(h)_kj) − p^(t)_ij ]/h

= lim_{h↘0} [ (Σ_{k∈S} p^(t)_ik [δkj + h gkj]) − p^(t)_ij ]/h

= lim_{h↘0} [ (p^(t)_ij + h Σ_{k∈S} p^(t)_ik gkj) − p^(t)_ij ]/h = Σ_{k∈S} p^(t)_ik gkj,

i.e. P′(t) = P(t) G. (“forward equations”)

− (Makes sense since P (t) = exp(tG), so P ′(t) = exp(tG)G = P (t) G.)

− So, in the above example,

p′^(t)_11 = p^(t)_11 g11 + p^(t)_12 g21 = (−3)p^(t)_11 + (6)p^(t)_12 = (−3)p^(t)_11 + (6)(1 − p^(t)_11) = (−9)p^(t)_11 + 6 = (−9)(p^(t)_11 − 2/3).

————————— END OF WEEK #11 ——————————

− But p′^(t)_11 = d/dt (p^(t)_11) = d/dt (p^(t)_11 − 2/3).

− So, d/dt (p^(t)_11 − 2/3) = (−9)(p^(t)_11 − 2/3).

− So, p^(t)_11 − 2/3 = Ke^{−9t}, i.e. p^(t)_11 = 2/3 + Ke^{−9t}.

− But p^(0)_11 = 1, so K = 1/3, so p^(t)_11 = 2/3 + (1/3)e^{−9t} = (2 + e^{−9t})/3.

− And then p^(t)_12 = 1 − p^(t)_11 = (1 − e^{−9t})/3.

− Same answers as before. (Phew.)

• What about LIMITING PROBABILITIES?

• In the above example, µ^(t) = ( (2 + e^{−9t})/3, (1 − e^{−9t})/3 ), so lim_{t→∞} µ^(t) = (2/3, 1/3) =: π.

− Note that Σ_{i∈S} πi gi1 = (2/3)(−3) + (1/3)(6) = 0, and Σ_{i∈S} πi gi2 = (2/3)(3) + (1/3)(−6) = 0.

− i.e., Σ_{i∈S} πi gij = 0 for all j ∈ S, i.e. πG = 0.


• Does this make sense?

− Well, as in discrete case, {πi} should be stationary.

− i.e. Σ_{i∈S} πi p^(t)_ij = πj for all j ∈ S and all t ≥ 0.

− In particular, for small t > 0,

πj = Σ_{i∈S} πi p^(t)_ij ≈ Σ_{i∈S} πi [δij + t gij] = πj + t Σ_{i∈S} πi gij.

− So, need Σ_{i∈S} πi gij = 0 for all j ∈ S.

− So, can check if {πi} is stationary by checking if Σ_{i∈S} πi gij = 0 for all j ∈ S.

• What about reversibility?

− Well, if πi gij = πj gji for all i, j ∈ S, then Σi πi gij = Σi πj gji = πj Σi gji = πj · 0 = 0, so π is stationary.

− So, again, reversibility (in the above sense) implies stationarity!

− In the above example, π1 g12 = (2/3)(3) = 2, while π2 g21 = (1/3)(6) = 2, so it’s reversible.

• Is convergence to {πi} guaranteed?

• CONTINUOUS-TIME MARKOV CONVERGENCE THEOREM: If a continuous-time M.C. is irreducible, and has a stationary distribution π, then lim_{t→∞} p^(t)_ij = πj for all i, j ∈ S.

− (Like the discrete case, but don’t need aperiodicity, since in continuous time we always have p^(t)_ii > 0, so it is automatically aperiodic.)

− Proof: For any h > 0, P(h) ≡ {p^(h)_ij} is an irreducible and aperiodic discrete-time chain with stationary distribution π, so lim_{n→∞} p^(hn)_ij = πj. Result follows (formally, using monotonicity of distance to π).

− See e.g. Durrett, 2nd ed., Theorem 4.4, p. 128.

• CONNECTION TO DISCRETE-TIME MARKOV CHAINS:

− Let {pij}i,j∈S be the transition probabilities for a discrete-time Markov chain {Xn}∞n=0.

− Let {N(t)}t≥0 be a Poisson process with intensity λ > 0.

− Then let Xt = XN(t).

− Then {Xt} is just like {Xn} except that it jumps at Poisson process event times, not

integer times. (“Exponential holding times”)

− In particular, {Xt} is a continuous-time Markov process!

− That is, we can “create” a continuous-time Markov process from a discrete-time

Markov chain.

• What is the generator of this Markov process {Xt}?

− Well, here p^(t)_ij = Pi[X_{N(t)} = j] = Σ_{n=0}^∞ Pi[N(t) = n, Xn = j] = Σ_{n=0}^∞ P[N(t) = n] p^(n)_ij = Σ_{n=0}^∞ [e^{−λt} (λt)^n/n!] p^(n)_ij.


− So, as t ↘ 0,

p^(t)_ij = Σ_{n=0}^∞ P[N(t) = n] p^(n)_ij

= P[N(t) = 0] p^(0)_ij + P[N(t) = 1] p^(1)_ij + P[N(t) = 2] p^(2)_ij + . . .

= P[N(t) = 0] (δij) + P[N(t) = 1] pij + P[N(t) = 2] p^(2)_ij + . . .

= [e^{−λt}(λt)^0/0!] δij + [e^{−λt}(λt)^1/1!] pij + O(t^2)

= (1 − λt)δij + (1 − λt)(λt)pij + O(t^2) = δij + t[λ(pij − δij)] + O(t^2).

− But as t ↘ 0, p^(t)_ij = δij + t gij + O(t^2).

− Hence, t gij = t[λ(pij − δij)], so gij = λ(pij − δij).

− That is, gii = λ(pii − 1), and for i ≠ j, gij = λ pij.

− Check: for i ≠ j, gij ≥ 0, and gii ≤ 0. Good.

− Also, Σ_{j∈S} gij = gii + Σ_{j≠i} gij = λ(pii − 1) + Σ_{j≠i} (λpij) = −λ + Σ_{j∈S} (λpij) = −λ + λ Σ_{j∈S} pij = −λ + λ(1) = 0, as it must.

• SPECIAL CASE: if X0 = 0, and p_{i,i+1} = 1 for all i, then Xn = n for all n, so Xt = X_{N(t)} = N(t) = Poisson process.

− And, the Poisson process {N(t)}t≥0 itself has generator (check):

G =
( −λ    λ    0    0  . . . )
(  0   −λ    λ    0  . . . )
(  0    0   −λ    λ  . . . )
( . . . )

Application – Queueing Theory:

• Consider a queue (i.e., a line of customers) with just one server.

− Let Tn = time of arrival of nth customer. (And set T0 = 0.)

− Let Yn = Tn − Tn−1 = inter-arrival time between (n− 1)st and nth customers.

− Let Sn = time it takes to serve the nth customer.

− Let Q(t) = number of customers in the system (i.e., waiting in the queue or being

served) at time t ≥ 0. (Assume Q(0) = 0.)

• What happens as t→∞?

• M/M/1 QUEUE: Tn − Tn−1 ∼ Exp(λ), and Sn ∼ Exp(µ), all indep., λ, µ > 0. (So {Tn} are arrival times of a Poisson process with intensity λ.)

− Then by memoryless property, {Q(t)} is a Markov process!

• GENERATOR?

− Well, let r_{a,b,t} = P[a arrivals and b served by time t].

− Then r_{a,b,t} = [e^{−λt}(λt)^a/a!] [e^{−µt}(µt)^b/b!] = e^{−(λ+µ)t} λ^a µ^b t^{a+b}/(a! b!).


− Hence, if a + b ≥ 2, then r_{a,b,t} = O(t^2) as t ↘ 0.

− Also, r_{1,0,t} = e^{−(λ+µ)t} λt and r_{0,1,t} = e^{−(λ+µ)t} µt.

− Hence, for n ≥ 0,

g_{n,n+1} = lim_{t↘0} P[Q(t) = n+1 | Q(0) = n]/t

= lim_{t↘0} (r_{1,0,t} + r_{2,1,t} + r_{3,2,t} + . . .)/t = lim_{t↘0} (r_{1,0,t} + O(t^2))/t = lim_{t↘0} (e^{−(λ+µ)t} λt + O(t^2))/t = λ.

− Similarly, for n ≥ 1,

g_{n,n−1} = lim_{t↘0} P[Q(t) = n−1 | Q(0) = n]/t

= lim_{t↘0} (r_{0,1,t} + r_{1,2,t} + r_{2,3,t} + . . .)/t = lim_{t↘0} (r_{0,1,t} + O(t^2))/t = lim_{t↘0} (e^{−(λ+µ)t} µt + O(t^2))/t = µ.

− Also if |n − m| ≥ 2 then P[Q(t) = m | Q(0) = n] = O(t^2), so g_{n,m} = 0.

− But Σ_{m=0}^∞ g_{n,m} = 0, so the generator must be given by:

G =
( −λ      λ       0      0    0  . . . )
(  µ    −λ−µ      λ      0    0  . . . )
(  0      µ     −λ−µ     λ    0  . . . )
( . . . )

i.e. g00 = −λ and gnn = −λ − µ for n ≥ 1.

− So we have solved for the queue generator matrix G.

• STATIONARY DISTRIBUTION {πi}?

− Need Σ_{i∈S} πi gij = 0 for all j ∈ S. (Or, can use reversibility: check.)

− j = 0: π0(−λ) + π1(µ) = 0, so π1 = (λ/µ)π0.

− j = 1: π0(λ) + π1(−λ−µ) + π2(µ) = 0, so π2 = (−λ/µ)π0 + ((λ+µ)/µ)π1 = (−λ/µ)π0 + (1 + λ/µ)(λ/µ)π0 = (λ/µ)^2 π0.

− Then by induction: πi = (λ/µ)^i π0, for i = 0, 1, 2, . . .

− So if λ < µ, i.e. 1/µ < 1/λ, i.e. E(Sn) < E(Tn − Tn−1), then since Σi πi = 1,

π0 = 1 / Σ_{i=0}^∞ (λ/µ)^i = 1 − (λ/µ),

and the stationary distribution is

πi = (λ/µ)^i (1 − λ/µ), i = 0, 1, 2, 3, . . .

(geometric distribution).

• Furthermore, since the process is clearly irreducible,

lim_{t→∞} P[Q(t) = i] = πi = (λ/µ)^i (1 − λ/µ).


− By contrast, if λ > µ, then Q(t)→∞ w.p. 1. (see below)

− If λ = µ, then Q(t)→∞ in probability, but not w.p. 1. (see below)

General (G/G/1) Queue:

• What if we don’t assume Exponential distributions, just that {Tn − Tn−1} are i.i.d., and {Sn} are i.i.d. (all indep.)?

• Then Q(t) is not Markovian! Have to use “cruder” methods.

• Let Dn = time of departure of the nth customer.

• Roughly speaking, Dn = Dn−1 + Sn.

− But no one served while queue is empty.

− So, actually, Dn = max(Tn, Dn−1) + Sn.

• Let Wn = max(0, Dn−1 − Tn) = the amount of time that the nth customer has to wait.

(With W0 = 0.)

• LINDLEY’S EQUATION: For n ≥ 1, Wn = max(0, Wn−1 + Sn−1 − Yn).

• PROOF:

− The (n− 1)st customer is in the system for a total time Wn−1 + Sn−1.

− But the nth customer arrives a time Yn after the (n− 1)st customer.

− If Wn−1 + Sn−1 ≤ Yn, then the nth customer doesn’t have to wait at all, so Wn = 0.

− If Wn−1 + Sn−1 ≥ Yn, then Wn = [time the (n−1)st customer is in the system] − [amount of that time that the nth customer was not present for] = (Wn−1 + Sn−1) − Yn, Q.E.D.

• LINDLEY’S COROLLARY: Wn = max_{0≤k≤n} Σ_{i=k+1}^n (Si−1 − Yi). (Here, if k = n, then the sum equals zero.)

− PROOF #1. Write it out: W0 = 0, W1 = max(0, S0 − Y1),

W2 = max(0, W1 + S1 − Y2) = max(0, max(0, S0 − Y1) + S1 − Y2) = max(0, S1 − Y2, S0 − Y1 + S1 − Y2),

W3 = max(0, W2 + S2 − Y3) = max(0, max(0, S1 − Y2, S0 − Y1 + S1 − Y2) + S2 − Y3) = max(0, S2 − Y3, S1 − Y2 + S2 − Y3, S0 − Y1 + S1 − Y2 + S2 − Y3),

etc., each corresponding to the claimed formula.

− PROOF #2: Induction on n. When n = 0, both sides are zero. If n increases to n+1,

then by Lindley’s Equation, each possible value of the “max” gets Sn − Yn+1 added

to it. And the “max with zero” is covered by allowing for the possibility k = n + 1,

Q.E.D.
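• Lindley’s Equation and its Corollary can be cross-checked numerically (a sketch; the Exponential rates, which make E(Yn) > E(Sn), and the seed are my choices):

```python
import random

random.seed(12)

n = 200
S = [random.expovariate(1.0) for _ in range(n)]            # service times S_0, ..., S_{n-1}
Y = [None] + [random.expovariate(0.8) for _ in range(n)]   # inter-arrival times Y_1, ..., Y_n

# Lindley's Equation, iterated: W_n = max(0, W_{n-1} + S_{n-1} - Y_n)
W = [0.0]
for i in range(1, n + 1):
    W.append(max(0.0, W[-1] + S[i - 1] - Y[i]))

# Lindley's Corollary, computed directly: W_m = max over k of sum_{i=k+1}^m (S_{i-1} - Y_i)
def W_direct(m):
    return max(sum(S[i - 1] - Y[i] for i in range(k + 1, m + 1)) for k in range(m + 1))

checks = [abs(W[m] - W_direct(m)) for m in (1, 50, 123, 200)]
print(max(checks))  # 0 up to floating point: the two formulas agree
```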

• THEOREM: For a general (G/G/1) single-server queue:

− (a) if E(Yn) < E(Sn), then limn→∞Wn =∞ w.p. 1.

(Hence, also limn→∞Wn = ∞ in probability, so for any M < ∞, limn→∞P(Wn >

M) = 1.)

− (b) if E(Yn) > E(Sn), then {Wn} is “bounded in probability”, i.e. for any ε > 0 there

is M <∞ such that P(Wn > M) < ε for all n ∈ N.


− (c) if E(Yn) = E(Sn), and Sn−1 and Yn are not both constant (i.e., Var(Sn−1 − Yn) > 0), then Wn → ∞ in probability (but not w.p. 1). (“Borderline” case; similar to branching processes with m = 1, and to the absolute value of s.s.r.w.)

• PROOF OF (a):

− By Lindley’s Equation, Wn+1 ≥ Wn + Sn − Yn+1.

− Here the sequence {Sn − Yn+1} is i.i.d., with mean > 0.

− So, by the SLLN, lim inf_{n→∞} Wn/n ≥ E(Sn − Yn+1) > 0, w.p. 1.

− It follows that lim_{n→∞} Wn = ∞, w.p. 1, Q.E.D.

• PROOF OF (b):

− By Lindley’s Corollary, P(Wn > M) = P( max_{0≤k≤n} Σ_{i=k+1}^n (Si−1 − Yi) > M ).

− But {Si−1 − Yi} are i.i.d., so this is equivalent to P(Wn > M) = P( max_{0≤k≤n} Σ_{i=1}^{n−k} (Si−1 − Yi) > M ).

− This is the probability that i.i.d. partial sums with negative mean (since E(Si−1 − Yi) < 0) will ever be larger than M.

− i.e., it is the probability that the maximum of a sequence of i.i.d. partial sums with

negative mean will be larger than M .

− But by the SLLN, i.i.d. partial sums with negative mean will eventually become

negative, w.p. 1.

− So, w.p. 1, only a finite number of the partial sums will have positive values.

− So, w.p. 1, the maximum value of the partial sums will be finite.

− So, as M →∞, the probability that the maximum value will be > M must converge

to zero.

− So, for any ε > 0, there is M < ∞ such that the probability that its maximum value is > M is < ε, Q.E.D.

• PROOF OF (c):

− Trickier! Omitted! For details see e.g. Grimmett & Stirzaker, 2nd ed., Theorem

11.5(4), pp. 432–435.

• REMINDER: FINAL EXAM Tues April 18, 9:00 am - 12:00 noon, in the Exam Centre

(EX) room 300. BRING YOUR STUDENT CARD, and DO NOT SIT NEXT TO

ANYONE THAT YOU KNOW.

• Good luck and best wishes! – J.R.

————————— END OF WEEK #12 ——————————
