
Random Processes (MTH6141),

2020–2021

Sasha Sodin ∗

January 3, 2021

Please drop me a note at a.sodin@qmul.ac.uk if you spot any mistakes or if you have any other suggestions for improvement. The parts marked with # … " as well as the footnotes are not examinable.

∗These notes are adapted from lecture notes of Dudley Stark, Robert Johnson, Mark Jerrum, Ilya Goldsheid, and David Ellis.


Contents

1 Introduction
  1.1 Overview
  1.2 Preliminaries

2 Markov chains – definitions, long-time behaviour
  2.1 Definition and basic properties
  2.2 Computation of multi-step transition probabilities
  2.3 Computation of multi-step transition probabilities – cont.
  2.4 Limiting distribution of Markov chains
  2.5 Equilibrium distributions
  2.6 Irreducibility
  2.7 Existence of the limiting distribution; aperiodicity
  2.8 Computation of equilibrium distribution

3 Markov chains – first step analysis
  3.1 Absorption time and first-visit time
  3.2 Absorption and first-visit probabilities
    3.2.1 Several applications
  3.3 Recurrence and transience: finite chains
  3.4 Recurrence and transience: infinite chains

4 Continuous-time Markov chains
  4.1 Motivating example and construction of a Poisson process
  4.2 Waiting times and sojourn times of Poisson processes
  4.3 Further properties of Poisson processes
  4.4 Definition of a general Markov chain
  4.5 Construction using exponential random variables
  4.6 Chapman–Kolmogorov equations
  4.7 Long-term behaviour
  4.8 Birth-death processes and applications


1 Introduction

1.1 Overview

This course is devoted to the study of stochastic processes. A stochastic process is a collection of random variables (Xt)t∈T, defined on the same probability space, indexed by a set T (the parameter space), and taking values in a set S (the state space). We shall discuss a formal definition later. In the meantime, we start with

Example 1.1 (Two-dimensional random walk). A walker starts off at a point X0 = (x0, y0) ∈ Z2. Each second, he moves from his location (xn−1, yn−1) ∈ Z2 to one of the adjacent points (xn−1 − 1, yn−1), (xn−1 + 1, yn−1), (xn−1, yn−1 − 1), (xn−1, yn−1 + 1), with equal probability:

    probability to jump to (x, y) = { 1/4, (x, y) ∈ {(xn−1 ± 1, yn−1), (xn−1, yn−1 ± 1)},
                                    { 0,   otherwise.                                     (1.1)

Here the parameter space is T = Z+ (as in most of our course), and the state space is S = Z2. See Figure 1 for an illustration.

Figure 1: The first 250 steps of a two-dimensional random walk, starting at the black dot and ending at the red one.
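Example 1.1 is easy to simulate; the sketch below (plain Python, with helper names of our choosing, not part of the notes) generates a 250-step path like the one in Figure 1.

```python
import random

def random_walk_2d(n_steps, start=(0, 0), seed=1):
    """Simulate n_steps of the simple random walk on Z^2 from Example 1.1."""
    rng = random.Random(seed)
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        # each of the four neighbours is chosen with probability 1/4, as in (1.1)
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

path = random_walk_2d(250)
print("start:", path[0], "end:", path[-1])
```

Plotting the returned list of lattice points reproduces a picture in the style of Figure 1.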

One important feature of this example is that the decision where to go on each step is based only on the location of the walker at that step (“the present”) and not on the location on the previous steps (“the past”). Stochastic processes boasting this property are called Markov processes (a formal definition will follow). Although Markov processes can describe many different phenomena which have little to do with random walks, all of them bear some mathematical similarity to random walks; therefore Example 1.1 (and Example 1.2 below) are in a sense representative.


When we see a stochastic process, the first thing we should ask ourselves is whether it is Markov. Here is an example of a process which is not Markov: replace (1.1) with

    probability to jump to (x, y) = { 1/3, (x, y) ∈ {(xn−1 ± 1, yn−1), (xn−1, yn−1 ± 1)} \ {(xn−2, yn−2)},
                                    { 0,   otherwise,

where we formally set (x−1, y−1) = (1, 0). Here we do not allow the particle to backtrack, i.e. to return to the last vertex it came from. Therefore the walker’s decision depends on the past and not only on the present, and therefore the process is not Markov. In such cases one may wonder how to make the process Markov.

Here is another example.

Example 1.2. Imagine a knight jumping on an 8 × 8 chess board, starting from a1. On each step, he makes one of the moves allowed by the chess rules with equal probability. For example, if he is currently at b3, on the next step he will jump to a1 or c1 or a5 or c5 or d2 or d4; the probability of each of these jumps is equal to 1/6. Two realisations are depicted in Figure 2.

Figure 2: A realisation of the first 15 jumps (left) and the first 150 jumps (right) of a knight starting from a1.

Here the parameter space is T = Z+, and the state space is the 8 × 8 chess board. Unlike Example 1.1, the state space is finite.

The first question that one can ask about a Markov process is what is its long-time behaviour, i.e. in which states does the process like to spend time. In the two examples above, we start the random walk at some fixed site; then we ask what is the distribution of the walker after 1000 steps. Thus, in Example 1.2 it turns out that

    P(X1000 = a1) ≈ 1/84,    P(X1000 = b4) ≈ 1/28.    (1.2)

In the first part of the course, we shall develop methods allowing us to answer such questions for Markov processes on finite state spaces (in the meanwhile, try to come up with a heuristic explanation of the inequality P(X1000 = a1) < P(X1000 = b4), which is visible even in Figure 2 (right)). Tools from linear algebra will help us a lot: we shall associate to each process a matrix, and the long-time properties of the walk will be encoded in the properties of this matrix, in particular its largest eigenvalues and the corresponding eigenvectors.
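The values in (1.2) can be checked numerically by iterating the one-step transition rule of the knight chain. The sketch below is ours (plain Python, with squares encoded as 8·rank + file, so a1 = 0 and b4 = 25), not part of the notes.

```python
def knight_moves(sq):
    """Legal knight moves from square sq, encoded as sq = 8*rank + file (0..63)."""
    r, f = divmod(sq, 8)
    deltas = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
    return [8 * (r + dr) + (f + df) for dr, df in deltas
            if 0 <= r + dr < 8 and 0 <= f + df < 8]

MOVES = {s: knight_moves(s) for s in range(64)}

# Start deterministically at a1 (square 0) and iterate pi_{n+1}(t) = sum_s pi_n(s) p_{st},
# where p_{st} = 1/deg(s) for each legal move s -> t.
pi = [1.0] + [0.0] * 63
for _ in range(1000):
    new = [0.0] * 64
    for s, mass in enumerate(pi):
        for t in MOVES[s]:
            new[t] += mass / len(MOVES[s])
    pi = new

a1, b4 = 0, 8 * 3 + 1
print(round(pi[a1], 5), round(1 / 84, 5))
print(round(pi[b4], 5), round(1 / 28, 5))
```

The two printed pairs agree to several decimal places, consistent with (1.2).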

In the second section of the first half of the course, we shall be concerned with absorption. Imagine that the squares a7 and d5 are covered with glue. Then the knight will eventually get glued to one of these two squares. What is the average time until it gets glued? What is the probability that it will get glued to a7? Such questions make sense both for the finite state space knight process and for the infinite state space random walk.

Another aspect of the absorption problem is the dichotomy between recurrence and transience. In the case of the knight walk (without gluing), it is intuitively clear that the knight will almost surely return to its point of origin (in fact, infinitely many times). That is, the knight process is recurrent. It turns out that random walk on the infinite state space Z2 is also recurrent. To emphasise that this result is not a triviality, we mention that random walk on the three-dimensional lattice Z3 is transient (not recurrent): with positive probability, it never returns to the origin. As put by G. Pólya, “A drunk man will find his way home, but a drunk bird may get lost forever.”

In the second part of the course, we shall consider continuous-time stochastic processes: for example, T can be R+, i.e. the walker can move at arbitrary (rather than just integer) times. Such processes are harder to define; we shall discuss several ways to construct them. For now, we list a few real-world processes that can be successfully modelled by continuous-time stochastic processes:

1. α-decay: X(t) = the number of α-particles emitted by a piece of uranium-238 during the time interval [0, t];

2. X(t) = the population of the UK t years after New Year’s Eve 2000;

3. Brownian motion: X(t) ∈ R3 is the location of a microscopic particle in a fluid as a function of time.

We shall mostly focus on examples similar to the first two of these, in that X(t) can only take on a finite or countable number of values.


1.2 Preliminaries

Review of main notions from basic probability: probability, conditional probability, conditional expectations. Please solve all the exercises in this section. If you find any of the topics difficult, please review it more thoroughly using an introductory textbook in probability theory.1

Probability. Throughout this module, Ω will be a probability space, i.e. the space of all possible outcomes. An event is a subset2 A ⊂ Ω. The probability of an event A is denoted by P(A) ∈ [0, 1]. The most important property of P is its σ-additivity (countable additivity): if A1, A2, A3, … is a finite or infinite sequence of pairwise disjoint events (i.e. Aj ∩ Ak = ∅ for j ≠ k), then

    P(⊎_{j≥1} Aj) = ∑_{j≥1} P(Aj).    (1.3)

The plus above the union sign is there to remind us that the events are disjoint. Another important property is P(Ω) = 1. From these two properties we deduce that if the Aj form a partition, i.e. a sequence of pairwise disjoint events the union of which is the full space Ω, then ∑_{j≥1} P(Aj) = 1.

Exercise 1.2.1. If A1, A2, A3, … are arbitrary (not necessarily disjoint) events, then

    P(⋃_{j≥1} Aj) ≤ ∑_{j≥1} P(Aj).    (1.4)

Exercise 1.2.2. Let A1, A2 ⊂ Ω be arbitrary events. Prove that

P(A1 ∪ A2) ≥ P(A1) + P(A2)− 1 .

Example 1.3 (Fair die). Ω = {1, 2, 3, 4, 5, 6}; P(A) = |A|/6 (where |A| is the size of A). The events odd = {1, 3, 5} and even = {2, 4, 6} form a partition, and indeed their probabilities add up to 1. The events odd = {1, 3, 5} and (≤ 3) = {1, 2, 3} are not disjoint, and

    P(odd ∪ (≤ 3)) = P({1, 2, 3, 5}) = 2/3 < 1/2 + 1/2 = P(odd) + P(≤ 3).
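The arithmetic in Example 1.3 can be verified by direct enumeration; a sketch (ours, not from the notes) using Python’s fractions module so that no rounding occurs:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(A):
    # uniform probability measure on the fair die: P(A) = |A| / 6
    return Fraction(len(A), len(omega))

odd = {1, 3, 5}
le3 = {1, 2, 3}

print(P(odd | le3))        # probability of the union {1, 2, 3, 5}
print(P(odd) + P(le3))     # strictly larger, since the events overlap
```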

Independence, conditional probability. One of the most important properties of events is independence. In Example 1.3, the events odd and ≤ 3 are not independent, since knowing that the outcome is odd increases the chances that

1e.g. Chapter 2 of the book “Introduction to Probability” by Dimitri P. Bertsekas and John N. Tsitsiklis; orthe online book of Grinstead and Snell https://math.dartmouth.edu/~prob/prob/prob.pdf, pages 133–162 and 139–141; or the website Random: http://www.randomservices.org/random/prob/Conditional.html; or Scott Sheffield’s slides http://math.mit.edu/~sheffield/600/Lecture26.pdf

2Formally, not every subset of Ω is an event. You are most welcome to ignore this nuance during this term.


it is ≤ 3. Two events A and B are called independent if P(A ∩ B) = P(A) × P(B), which means (more or less) that the fraction of B occupied by A ∩ B is equal to the fraction of Ω occupied by A.

If A and B are two events and P(B) > 0, we can define the conditional probability

    P(A|B) = P(A ∩ B) / P(B).

In our example, P(odd | ≤ 3) = 2/3 (whereas if the two events were independent, the conditional probability would be equal to P(odd) = 1/2).

From (1.3) we obtain

Theorem 1.4 (Theorem of total probability). If B1, B2, … form a partition of Ω and P(Bj) > 0 for each j, then for any event A

    P(A) = ∑_{j≥1} P(A|Bj) × P(Bj).

In our example,

    P(odd) = 1/2 = (2/3) × (1/2) + (1/3) × (1/2) = P(odd | ≤ 3) × P(≤ 3) + P(odd | > 3) × P(> 3).

Random variables, expectation. A random variable is a function X : Ω → R. One can think of an event A as a special case of a random variable taking just two values: 1 on A and 0 on Ω \ A. To avoid abuse of notation, we denote

    1A(ω) = { 1, ω ∈ A,
            { 0, ω ∉ A,

and call it the indicator of A.

The expectation of a random variable X taking on a finite or countable number of values x1, x2, … is defined as3

    EX = ∑_{j≥1} P(X = xj) xj.

Clearly, E1A = P(A). It follows from (1.3) that for any sequence of random variables Xj,

    E ∑_{j≥1} Xj = ∑_{j≥1} EXj.    (1.5)

3In the case of countably many values, the sum may diverge, e.g. if P(X = n) = 1/(n(n + 1)), n = 1, 2, … We tacitly assume that this never occurs, unless explicitly stated otherwise as in (3.15).


In particular,

    E ∑_{j≥1} 1Aj = ∑_{j≥1} P(Aj).    (1.6)

Exercise 1.2.3. In the setting of Example 1.3, check directly that E(1odd + 1≤3) = P(odd) + P(≤ 3).

Independence. Two random variables X1 and X2 taking on finitely or countably many values are called independent if for any x1 and x2 the events {X1 = x1} and {X2 = x2} are independent. This is equivalent to each of the following two properties:

1. for any sets of values E1 and E2, the events {X1 ∈ E1} and {X2 ∈ E2} are independent;

2. for any functions f1, f2, E f1(X1)f2(X2) = E f1(X1) × E f2(X2).

Similarly, one defines independence of more than two random variables: X1, X2, … are jointly independent if for any k and any functions f1, …, fk,

    E ∏_{j=1}^{k} fj(Xj) = ∏_{j=1}^{k} E fj(Xj).

Exercise 1.2.4. Construct three random variables such that any two of them are independent but the three are not jointly independent. (Hint: it suffices to consider random variables taking just two values, say, 0 and 1.)

Conditional expectation. The expectation of a random variable is a generalisation of the probability of an event. Similarly, the conditional expectation is a generalisation of conditional probability. First, let X be a random variable taking values x1, x2, …, and let B be an event with P(B) > 0. The conditional probabilities

    P(X = xj | B) = P({X = xj} ∩ B) / P(B)    (1.7)

add up to one (why?), therefore they define a probability distribution called the conditional distribution of X given B.

Warning. The conditional distribution of X is not the distribution of any naturally defined random variable. In probability theory, there is no such thing as a conditional random variable!

The conditional expectation of X given B is the mean of this distribution:

    E(X|B) = ∑_{j≥1} P(X = xj | B) xj.


Example 1.5. In the notation of Example 1.3, let X be the outcome of the die. Then E(X|odd) = 3 and E(X|even) = 4.

Exercise 1.2.5 (Theorem of total probability – bis). Suppose X is a random variable (taking on finitely or countably many values), and let B1, B2, … be a sequence of events forming a partition of Ω. Show that

    EX = ∑_{j≥1} E(X|Bj) P(Bj).

Instead of an event B, we can condition X on a random variable Y: for each value y that Y attains with positive probability, we can define the conditional expectation E(X|Y = y). Then we construct a random variable E(X|Y) which is equal to E(X|Y = y) on the event {Y = y}. For example, if Y = 1B,

    E(X|1B) = { E(X|B)      on B,
              { E(X|Ω \ B)  on Ω \ B.

Note that E(X|1B) is constant on B and on Ω \ B. Similarly, in the general situation, if y1, y2, … are the values which Y takes with positive probability, the events {Y = yj} form a partition of Ω, and E(X|Y) is constant on each part.

Exercise 1.2.6. Let X, Y be two random variables (taking on finitely or countably many values). Prove the following.

1. There exists a function g : R → R such that E(X|Y) = g(Y).

2. (The theorem of total probability – ter.) E(X|Y) has the same expectation as X (i.e. E E(X|Y) = EX).

3. E(h(Y)|Y) = h(Y) for any function h : R → R.

Exercise 1.2.7. If X and Y are independent, then E(X|Y) is the constant random variable identically equal to E(X).

Example 1.6. Let X = Y + Z, where Y ∼ Bernoulli(1/2) and Z ∼ Bernoulli(1/3) are independent random variables. We know that P(Y = 0) = P(Y = 1) = 1/2, P(Z = 1) = 1/3, P(Z = 0) = 2/3. We see that

    P(X = 1|Y = 0) = P(Z = 1|Y = 0) = P(Z = 1) = 1/3,
    P(X = 0|Y = 0) = P(Z = 0|Y = 0) = P(Z = 0) = 2/3,
    P(X = 2|Y = 1) = P(Z = 1|Y = 1) = P(Z = 1) = 1/3,
    P(X = 1|Y = 1) = P(Z = 0|Y = 1) = P(Z = 0) = 2/3.


Thus,

    E(X|Y = 0) = (1)(1/3) + (0)(2/3) = 1/3

and

    E(X|Y = 1) = (2)(1/3) + (1)(2/3) = 4/3.

The random variable E(X|Y) equals 1/3 when Y = 0 and 4/3 when Y = 1. We may express this as E(X|Y) = Y + 1/3. It has expectation

    E(E(X|Y)) = (1/3)P(Y = 0) + (4/3)P(Y = 1) = (1/3)(1/2) + (4/3)(1/2) = 5/6.

It follows that E(X) = 5/6. We may show that fact directly:

    E(X) = E(Y) + E(Z) = 1/2 + 1/3 = 5/6.
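The computations of Example 1.6 can be reproduced by enumerating the four outcomes of the pair (Y, Z); a sketch (ours, with the helper name chosen here) working from the definition of E(X|Y = y):

```python
from fractions import Fraction as F

# Joint distribution of the independent pair (Y, Z): Y ~ Bernoulli(1/2), Z ~ Bernoulli(1/3).
joint = {(y, z): py * pz
         for y, py in [(0, F(1, 2)), (1, F(1, 2))]
         for z, pz in [(0, F(2, 3)), (1, F(1, 3))]}

def cond_exp_X_given(y):
    """E(X | Y = y) for X = Y + Z, computed from the conditional distribution."""
    pB = sum(p for (yy, _), p in joint.items() if yy == y)
    return sum((yy + z) * p for (yy, z), p in joint.items() if yy == y) / pB

EX = sum((y + z) * p for (y, z), p in joint.items())
print(cond_exp_X_given(0), cond_exp_X_given(1), EX)
```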

Remark 1.7. While the notion of conditional expectation E(X|Y) requires the conditioned random variable X to take values in R (so that we can average these values), the random variable Y may take values in an arbitrary set. Accordingly, if X, Y, Z are three random variables of which X is real-valued, E(X|Y, Z) is the conditional expectation of X given the random variable (Y, Z). More generally, we shall consider conditional expectations such as E(X|Y1, Y2, Y3, Y4, …).

Exercise 1.2.8. Compute E(X|Y, Z) in Example 1.6.

Exercise 1.2.9. Let X be a random variable taking values in R, and let Y, Z be arbitrary random variables. Prove that

E (E(X|Y, Z)|Y ) = E(X|Y ) .

Exercise 1.2.10. Let N have a Poisson distribution with parameter 1. Conditioned on N = n, let X have a uniform distribution:

    P(X = i | N = n) = 1/n, i = 1, 2, …, n.

Find E(X|N) and E(X).

Exercise 1.2.11. Suppose that N has a Binom(20, 1/4) distribution and, conditionally on N, X has a Binom(N, 1/2) distribution. What is the expectation of X?


2 Markov chains – definitions, long-time behaviour

2.1 Definition and basic properties

Following the definition of discrete-time stochastic processes, we present a construction of an important subclass – Markov chains. A Markov chain can be visualised using a transition graph and a transition matrix. Our goal is to get an idea which kinds of processes can be described by Markov chains, and to compute the joint distribution of the random variables forming such a Markov chain.

Definition 2.1. Let S be a finite or countable set. A discrete-time stochastic process with state space S is a sequence of random variables

    (Xn)n≥0 = (X0, X1, X2, …)

defined on the same probability space and taking values in S.

Example 2.2. The following are discrete-time stochastic processes.

1. Xn – independent, identically distributed random variables taking the values ±1 with equal probability. Here S = {−1, +1}.

2. Sn = X1 + · · · + Xn, where the Xn are as above, form a stochastic process with state space Z.

3. Yn = Xn + Xn+1 form a stochastic process with state space {−2, 0, 2}.

4. Zn = Yn mod 2 = { 1, Yn is odd; 0, Yn is even }; here S = {0, 1}.4

The random variables in the first example are independent, which makes them easy to analyse. On the other hand, many real-world processes can not be well approximated by sequences of independent random variables. For example, the weather on day n strongly depends on the weather on day n − 1. This is a motivation to consider a class of stochastic processes which allow for dependence but are still amenable to analysis.

Now we present the construction of stochastic processes of this class, called (homogeneous) Markov chains. The construction is inductive: if we have constructed X0, …, Xn, then the conditional distribution of Xn+1 given X0, …, Xn will depend only on Xn (similarly to the random walk which we have discussed in the introduction (Example 1.1): the distribution of Xn+1 is uniform on the 4 vertices adjacent to Xn, independently of Xn−1 etc.).

4We can of course replace S with any larger set S′ ⊃ S.


For a formal definition, fix a state space S (which we always assume to be finite or countable). Also fix an array P = (pss′)s,s′∈S of real numbers, called transition probabilities, such that

    for each s, s′ ∈ S,  pss′ ≥ 0;    (2.1)

    for each s ∈ S,  ∑_{s′∈S} pss′ = 1.    (2.2)

The number pss′ will represent the probability to jump from s to s′; the conditions above guarantee that the probabilities are between zero and one and add up to one. Finally, fix a probability distribution π0 on S, i.e. a list of numbers π0(s) ≥ 0 that add up to one (the initial distribution). Once we are given P and π0, we construct a stochastic process (X0, X1, X2, X3, …) as follows. First, X0 is chosen at random according to the initial distribution π0. Then for each n ≥ 0 the conditional distribution of Xn+1 given X0, X1, …, Xn is given by

    P(Xn+1 = sn+1 | Xn = sn, …, X1 = s1, X0 = s0) = psnsn+1,  for any s0, s1, …, sn+1 ∈ S.    (2.3)

The relations (2.3) determine the joint distribution of X0, X1, …: for any n,

    P(X0 = s0, X1 = s1, …, Xn = sn)
        = P(X0 = s0, X1 = s1, …, Xn−1 = sn−1) × P(Xn = sn | X0 = s0, …, Xn−1 = sn−1)
        = P(X0 = s0, X1 = s1, …, Xn−1 = sn−1) psn−1sn = · · ·
        = π0(s0) ps0s1 ps1s2 · · · psn−1sn.    (2.4)

Thus we have proved that our construction indeed defines a stochastic process.5

We recapitulate this as (a somewhat pedantic)

Proposition 2.3. For any initial distribution π0 on S and any array P satisfying (2.1) and (2.2) there exists a stochastic process (Xn)n≥0 on S such that X0 is distributed according to π0 and the conditional distribution of Xn given Xn−1 = s is given by the numbers (pss′)s′∈S. The joint distribution of the random variables Xn is given by (2.4).
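The inductive construction behind Proposition 2.3 translates directly into a sampling routine. The sketch below is ours (the dictionary encoding and the two-state ‘weather’ chain are a made-up illustration, not an example from the notes).

```python
import random

def sample_chain(pi0, P, n, seed=None):
    """Sample (X0, ..., Xn): draw X0 from pi0, then X_{k+1} from the row P[X_k]."""
    rng = random.Random(seed)

    def draw(dist):
        # dist is a {state: probability} dictionary; draw one state from it
        states, weights = zip(*dist.items())
        return rng.choices(states, weights=weights)[0]

    xs = [draw(pi0)]
    for _ in range(n):
        xs.append(draw(P[xs[-1]]))
    return xs

# A hypothetical two-state weather chain, in the spirit of the motivation above:
P = {'sunny': {'sunny': 0.8, 'rainy': 0.2},
     'rainy': {'sunny': 0.5, 'rainy': 0.5}}
print(sample_chain({'sunny': 1.0}, P, 10, seed=0))
```

Note that the routine only ever looks at the current state when drawing the next one – exactly the Markov property of (2.3).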

Remark 2.4. The important feature of the formula (2.3) is that the conditional probabilities depend only on the value of Xn and not on Xn−1, Xn−2, etc. This feature is called the Markov property of the process (Xn)n≥0.

5Formally, we need to make sure that

    ∑_{s0,s1,…,sn∈S} P(X0 = s0, X1 = s1, …, Xn = sn) = 1.    (2.5)

Exercise 2.1.1. Prove the relation (2.5) by induction on n.


Example 2.5 (A ‘rooms and doors’ Markov chain). Here is a simple example of a Markov chain. There are 5 rooms, labelled 1, 2, 3, 4, 5. There are doors connecting certain pairs of rooms, indicated by the diagram in Figure 3 (we draw a line between two rooms if and only if there is a door connecting them).

Figure 3: The arrangement of the rooms in Example 2.5

We start in room 1. At the end of each minute, if we are in a certain room, we choose at random one of the doors leading from that room (we choose each door with equal probability), and we go through that door to the room beyond. We let Xn denote the label of the room we are in after n minutes (so X0 = 1, and X1 = 2 or 5, with P(X1 = 2) = 1/2 and P(X1 = 5) = 1/2).

The discrete-time stochastic process (X0, X1, X2, …) is a Markov chain with state space S = {1, 2, 3, 4, 5}, initial distribution

    π0(s) = { 1, s = 1,
            { 0, otherwise,

and transition probabilities

    pss′ = { 1/d(s), if there is a door between room s and room s′,
           { 0,      otherwise,

where d(s) denotes the number of doors leading from room s (for each s ∈ {1, 2, 3, 4, 5}).

The relation (2.4) allows us to compute joint distributions, e.g.

    P(X0 = 1, X1 = 2, X2 = 3, X3 = 2, X4 = 5) = π0(1) p12 p23 p32 p25
        = 1 × (1/2) × (1/3) × (1/3) × (1/3) = 1/54.


Some further questions to ask are:

• After 1000 steps, what is the probability to be at room 3?6

• Which proportion of the steps do we spend at room 5?

• What is the probability that we reach room 4 before reaching room 5?

• What is the expectation of the length of time we spend in room 3, before reaching room 4?

• What is the expectation of the time at which we first return to room 1?

By the end of the first half of the course, you should be able to answer all of these questions!

Exercise 2.1.2. Find which of the processes in Example 2.2 are Markov chains, and for those write down the transition probabilities and the initial distribution.

Exercise 2.1.3. A bomb lying in a storehouse explodes at the beginning of each minute with probability 1/3, independently of the history.

1. Devise a Markov chain describing the evolution of the bomb. Write down the transition probabilities and the initial distribution.

2. What is the distribution of the time (in minutes) from the moment the bomb is placed in the storehouse until the explosion?

Exercise 2.1.4. Open the webpage http://homepages.math.uic.edu/~leon/mcs425-s08/handouts/char_freq2.pdf and read the paragraph before the table.

1. Devise a simple Markov chain model for English text (hint: there should be 26 states).

2. Explain how to compute the transition probabilities from the data on that webpage (you do not need to actually compute all of them, but please compute a few, e.g. pDO and pOG. A calculator may help here.) Please make sure that (2.2) is not violated!

3. Suppose a text is generated at random according to the model that you have constructed, starting with the letter ‘D’. What is the probability that the first three letters will form the word ‘DOG’?

4. What is the most likely four-letter word starting with the letter ‘M’?

6After 2 steps, the probability to be at room 1 is (1/2)(1/3) + (1/2)(1/3) = 1/3 (why?). However, this method is not feasible for a large number of steps.


Transition matrices and transition graphs. There are two convenient ways to visualise the transition probabilities of a Markov chain. The first one is a transition graph: we draw a graph the vertices of which are the states in S, and we draw a directed edge s → s′ labelled with the transition probability pss′ for each s, s′ such that pss′ > 0.

For example, the transition probabilities in the ‘rooms and doors’ Markov chain of Example 2.5 are visualised by the transition graph in Figure 4.

Figure 4: Transition graph of the “rooms and doors” Markov chain

If the state space happens to be the set S = {1, 2, …, s}, it is also convenient to arrange the transition probabilities in an s × s matrix (pss′)_{s,s′=1}^{s}, called the transition matrix, which we denote by the same letter P.

For example, the transition matrix of the “rooms and doors” Markov chain of Example 2.5 is given by

    P = [  0    1/2   0    0   1/2
          1/3    0   1/3   0   1/3
           0    1/3   0   1/3  1/3
           0     0    1    0    0
          1/3   1/3  1/3   0    0  ].

Sanity check: the sum of the numbers in each row should be equal to one (why?). Such matrices are called stochastic.
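The sanity check can be automated; the sketch below (ours, not from the notes) rebuilds the matrix P of Example 2.5 from the list of doors read off Figure 3 and verifies that every row sums to one.

```python
from fractions import Fraction

doors = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (3, 5)]  # door list from Figure 3
rooms = range(1, 6)
# adjacency: rooms reachable from s through a door
adj = {s: sorted({b for a, b in doors if a == s} | {a for a, b in doors if b == s})
       for s in rooms}

# p_{ss'} = 1/d(s) if there is a door between s and s', 0 otherwise
P = [[Fraction(1, len(adj[s])) if t in adj[s] else Fraction(0) for t in rooms]
     for s in rooms]

for row in P:
    assert sum(row) == 1  # every row of a stochastic matrix sums to one
print(P[0])  # transition probabilities out of room 1: 0, 1/2, 0, 0, 1/2
```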


Remark 2.6. Transition probabilities are arranged in a matrix not just for graphic convenience. It turns out that the linear-algebraic properties of these matrices carry important information regarding the Markov chain!

Example 2.7. An alien travels in the solar system. When he arrives at a site (the sun, the earth or the moon), he forgets all the past, and decides on the next step using the transition probabilities in Figure 5. The motion of this alien can be modelled by a Markov chain.

Figure 5: The transition graph for Example 2.7 (an alien travelling around the Solar System)

If the initial distribution is uniform:

    π0(À) = π0(Ê) = π0(Á) = 1/3,

then

    P(Ê À Ê Á Ê) = (1/3) × (1/4) × (1/2) × (3/4) × (2/3) = 1/48.

If we label À = 1, Ê = 2, Á = 3, we get the transition matrix

    P = [  0    1/2  1/2
          1/4    0   3/4
          1/3   2/3   0  ].

Note: if we choose a different labelling of the sites, we get a different transitionmatrix!

Exercise 2.1.5. Draw the transition graph for the Markov chain of Exercise 2.1.3 (bomb in a storehouse), and write down the transition matrix.

Exercise 2.1.6. Alice and Bob vote in each parliamentary election. If, in a certain election, Alice and Bob vote for the same party, they vote for it again in the next election. If they vote for different parties, next time each of them switches their opinion independently with probability 1/4.


1. Devise a Markov model to describe their votes in the n-th election, under the simplifying assumptions that they are immortal, the parliamentary system in the UK is stable, and there are only two parties (C and L).

2. Draw the transition graph and write down the transition matrix (note that you need to number the states!).

3. Assume that in the n-th election Alice voted C and Bob voted L. What is the (conditional) probability that in the (n + 2)-nd election Alice will vote L and Bob will vote C?

4. Assume that in the n-th election Alice and Bob both voted L. What is the (conditional) probability that in the (n − 1)-st election Alice voted C?

Exercise 2.1.7. A walker starts at a point S0 = s ∈ N. At the end of each second, he stays at his previous position with probability 2/3 and moves one step to the right (from position k to k + 1) with probability 1/3.

1. Draw the transition graph of the corresponding Markov chain. (It has to be infinite, so of course you can only draw a finite piece.) If you are not afraid of infinite matrices, write down the transition matrix.

2. Show that for any 0 ≤ n < n′ the difference Sn′ − Sn is binomially distributed: Sn′ − Sn ∼ Binom(n′ − n, 1/3).


2.2 Computation of multi-step transition probabilities

First, the distribution of Xn+1 is computed from the distribution of Xn. Then we turn to multi-step transition probabilities. Transition probabilities describe the conditional distribution of Xn+1 given Xn; similarly, r-step transition probabilities will describe the conditional distribution of Xn+r given Xn. At the end of the section, we preview a linear-algebraic method to compute multi-step transition probabilities; this method will be developed in an orderly fashion in the next section.

The first lemma tells us how to compute the probability distribution πn of Xn inductively (recall that the initial distribution π0 is given).

Lemma 2.8. Let (Xn)n≥0 be a Markov chain with finite or countable state space S and transition probabilities pss′. If πn : S → [0, 1] is the probability distribution of Xn, i.e. πn(s) = P(Xn = s), then for any s′ ∈ S

πn+1(s′) = ∑_{s∈S} πn(s) pss′ .   (2.6)

Proof. By the theorem of total probability (Theorem 1.4),

πn+1(s′) = ∑_{s∈S} P(Xn = s) P(Xn+1 = s′ | Xn = s) = ∑_{s∈S} πn(s) pss′ .

The following observation is central to the sequel. Suppose S = {1, · · · , s}; then we can write a probability distribution π on S as a row vector π = (π(1), · · · , π(s)). With this notation, (2.6) can be written as

πn+1 = πnP , (2.7)

i.e. on every step the distribution is multiplied by the transition matrix from the right.

Example 2.9. In Example 2.7 (alien in the Solar System),

π1 = π0 P = (1/3, 1/3, 1/3) P = (7/36, 7/18, 5/12) ,

where

P =
[  0    1/2  1/2 ]
[ 1/4    0   3/4 ]
[ 1/3   2/3   0  ] ,

and similarly

π2 = π1 P = (7/36, 7/18, 5/12) P = (17/72, 27/72, 28/72) .

That is, P(X0 = Á) = 1/3, P(X1 = Á) = 5/12, P(X2 = Á) = 28/72.


Corollary 2.10. Let (Xn)n≥0 be a Markov chain with finite state space S = {1, · · · , s} and transition matrix P. Then

πn+r = πn P^r .
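# The recursion (2.6)–(2.7) and Corollary 2.10 can be checked mechanically. The sketch below is supplementary (it is not part of the notes proper); it is plain Python with exact rational arithmetic, and it reproduces the numbers of Example 2.9:

```python
from fractions import Fraction as F

def step(pi, P):
    """One application of (2.6)/(2.7): right-multiply the row vector pi by P."""
    n = len(pi)
    return [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]

# Transition matrix of the alien chain from Example 2.9.
P = [[F(0),    F(1, 2), F(1, 2)],
     [F(1, 4), F(0),    F(3, 4)],
     [F(1, 3), F(2, 3), F(0)]]

pi0 = [F(1, 3), F(1, 3), F(1, 3)]   # uniform initial distribution
pi1 = step(pi0, P)                   # distribution of X1
pi2 = step(pi1, P)                   # distribution of X2, i.e. pi0 P^2

assert pi1 == [F(7, 36), F(7, 18), F(5, 12)]
assert pi2 == [F(17, 72), F(27, 72), F(28, 72)]
```

Exact fractions avoid any rounding, so the result agrees with the hand computation digit for digit. "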

Exercise 2.2.1. A grasshopper jumps between three flowers, labelled 1, 2 and 3. He starts at one of the flowers, and he jumps at the end of each minute. Whenever he is about to jump, he chooses at random one of the two flowers he is not currently sitting on, choosing each with probability 1/2, and he jumps to that flower. Let Xn be the label of the flower he is sitting on after n minutes (so that, for example, if Xn = 2 then Xn+1 is either 1 or 3). Then (X0, X1, X2, . . .) is a Markov chain with state space S = {1, 2, 3}.

1. Draw the transition graph and write down the corresponding transition matrix.

2. Suppose P(X0 = 1) = P(X0 = 2) = 1/2 and P(X0 = 3) = 0. Find the probability distribution of X1 and X2.

Exercise 2.2.2. A Markov chain has state space {1, 2, 3, 4} and transition matrix

[  0   1/3  2/3   0  ]
[ 1/4   0   1/2  1/4 ]
[ 1/3  1/3  1/3   0  ]
[  0    0    0    1  ] .

1. Draw the transition graph of the Markov chain.

2. Find the following probabilities.

a) P(X1 = 3 | X0 = 2),

b) P(X2 = 3 | X1 = 2),

c) P(X2 = 3 | X1 = 2, X0 = 1),

d) P(X2 = 3, X1 = 2 | X0 = 1),

e) P(X2 = 3 | X0 = 2).

Exercise 2.2.3. In the Land of Oz a snowy day is always followed by a rainy day, a rainy day is always followed by a nice day, and a nice day is equally likely to be followed by a nice day, a rainy day or a snowy day.

1. Draw the transition graph for this Markov chain with state space {N, R, S} for nice, rain, snow and set up the transition probability matrix.

2. Suppose that it is rainy on 8 October. What is the probability that it will be nice on 11 October?


Exercise 2.2.4. Suppose Xn is a Markov chain with state space S = {0, 1, 2} and transition matrix

[ 0.3  0.2  0.5 ]
[ 0.4  0.0  0.6 ]
[ 0.5  0.2  0.3 ] .

Assuming that P(X0 = 0) = 0.7, P(X0 = 1) = 0.1, P(X0 = 2) = 0.2,

1. Compute P(X0 = 2, X1 = 2, X2 = 1).

2. Compute P(X4 = 1, X0 = 2).

3. Compute P(X55 = 0 |X53 = 1).

Multi-step transition probabilities The next question we ask is what is the conditional distribution of Xn+r given Xn. For example, what is the probability distribution of the grasshopper from Exercise 2.2.1 after r steps? Intuitively, when r is large, the probabilities P(Xr = j) (j = 1, 2, 3) should be approximately equal (to each other and hence to 1/3).

Definition 2.11. Let (Xn)n≥0 be a Markov chain with state space S and transition probabilities pss′. The numbers

p(r)_{ss′} = ∑_{s1,··· ,sr−1 ∈ S} p_{ss1} p_{s1s2} · · · p_{sr−2 sr−1} p_{sr−1 s′}   (2.8)

are called the r-step transition probabilities.

As expected, p(r)_{ss′} is equal to the conditional probability that Xn+r = s′ conditioned on Xn = s:

P(Xn+r = s′ | Xn = s) = P(Xn = s, Xn+r = s′) / P(Xn = s)
  = ∑_{s1,··· ,sr−1 ∈ S} P(Xn = s, Xn+1 = s1, · · · , Xn+r−1 = sr−1, Xn+r = s′) / P(Xn = s)
  = ∑_{s1,··· ,sr−1 ∈ S} p_{ss1} p_{s1s2} · · · p_{sr−2 sr−1} p_{sr−1 s′} = p(r)_{ss′} .   (2.9)

Exercise 2.2.5. Compute all the two-step transition probabilities in Exercise 2.2.1 (Grasshopper).

If S = {1, · · · , s}, (2.8) can be concisely rewritten as

(p(r)_{ss′})_{s,s′=1}^{s} = P^r ,

i.e. the matrix of r-step transition probabilities is the r-th power of the transition matrix. This is clearly consistent with Corollary 2.10.


Exercise 2.2.6. Let (Xn) be a Markov chain on a finite or countable state space S, with transition probabilities P and initial distribution π0, and let m ≥ 1. Show that the random variables (Xmn)n≥1 form a Markov chain, and find the transition probabilities.

Diagonalization: preview The point is that there are much better ways to raise a matrix to the r-th power than using the definition. Recall that a square matrix P is called diagonalizable (over C) if there exists an invertible matrix M (with complex entries) and a diagonal matrix Λ (with complex entries) such that MΛM−1 = P. Calculating the r-th power of a diagonal matrix is easy: if

Λ =
[ λ1   0  ···   0  ]
[  0  λ2  ···   0  ]
[  ⋮   ⋮   ⋱    ⋮  ]
[  0   0  ···  λs  ] ,

then

Λ^r =
[ λ1^r   0   ···   0   ]
[  0   λ2^r  ···   0   ]
[  ⋮    ⋮     ⋱    ⋮   ]
[  0    0    ···  λs^r ] .

It follows that if P is diagonalisable with MΛM−1 = P , then

P r = (MΛM−1)r = (MΛM−1)(MΛM−1) · · · (MΛM−1) = MΛrM−1 ,

so we can calculate P r (fairly) easily.
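# A supplementary sketch (plain Python with exact fractions; the matrix below is made up for illustration and is not one of the examples in the notes). For a stochastic matrix P with eigenvalues 1 and 1/4 and eigenvectors (1, 1)⊤ and (2, −1)⊤, raising P to a power via MΛ^rM^{−1} agrees exactly with repeated multiplication:

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(A, r):
    """A^r by repeated multiplication (r >= 0)."""
    n = len(A)
    R = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(r):
        R = matmul(R, A)
    return R

# A made-up 2x2 stochastic matrix with eigenvalues 1 and 1/4;
# the columns of M are the eigenvectors (1,1)^T and (2,-1)^T.
P = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
M = [[F(1), F(2)], [F(1), F(-1)]]
Minv = [[F(1, 3), F(2, 3)], [F(1, 3), F(-1, 3)]]

r = 5
Lam_r = [[F(1), F(0)], [F(0), F(1, 4) ** r]]   # Lambda^r is diagonal
via_diag = matmul(matmul(M, Lam_r), Minv)      # M Lambda^r M^{-1}
assert via_diag == matpow(P, r)                # agrees with direct P^r
```

Raising the diagonal matrix to the r-th power costs s scalar powers, which is the whole point of diagonalisation. "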

Example 2.12 (A simple weather model). The following Markov chain is a simple model for the weather. There are two states, 1 = ‘sunny’ and 2 = ‘rainy’; Xn represents the state of the weather on the (n + 1)-th day of the year, so Xn = 1 or 2 for all n ≥ 0. The transition probabilities are

p1,1 = 0.7, p1,2 = 0.3, p2,1 = 0.6, p2,2 = 0.4,

the transition graph is given in Figure 6 and the transition matrix

P = ( 0.7  0.3 ; 0.6  0.4 ) ,   (2.10)

where rows are separated by semicolons.

The matrix (2.10) is easy to diagonalise. Indeed, the sum of the entries in each row is one (this is true for any transition matrix), hence the vector (1, 1)⊤ is an eigenvector with eigenvalue λ = 1. The trace of P is equal to 1.1, hence


[Figure 6: The transition graph of the simple weather model of Example 2.12 — states 1 (sunny) and 2 (rainy), with loop probabilities 0.7 and 0.4 and cross edges 0.3 and 0.6.]

the second eigenvalue is equal to 0.1; the corresponding eigenvector (x, y)⊤ can be found from the equation P (x, y)⊤ = 0.1 (x, y)⊤:

0.7x + 0.3y = 0.1x
0.6x + 0.4y = 0.1y,

the solution of which is (x, y)⊤ = (1, −2)⊤ (or any vector proportional to it). Hence

P ( 1  1 ; 1  −2 ) = ( 1  1 ; 1  −2 ) ( 1  0 ; 0  0.1 ) ,

i.e.

P = ( 1  1 ; 1  −2 ) ( 1  0 ; 0  0.1 ) ( 1  1 ; 1  −2 )^{−1} .   (2.11)

Still focusing on our example, we observe that as r → ∞,

( 1  0 ; 0  0.1 )^r → ( 1  0 ; 0  0 ) ,

hence

P^r → ( 1  1 ; 1  −2 ) ( 1  0 ; 0  0 ) ( 1  1 ; 1  −2 )^{−1} = ( 2/3  1/3 ; 2/3  1/3 ) .

That is, the probability that the weather is sunny r days from now tends to 2/3 as r → ∞, regardless of the weather today. This matches our common sense: the weather in five years hardly depends on whether it rains today or not. In subsequent lectures, we shall discuss the counterpart of this phenomenon for other Markov chains.

Exercise 2.2.7. Compute the 1000-step transition probabilities for the weather Markov chain of Example 2.12. You are welcome to rely on (2.11).

Exercise 2.2.8. For each r ≥ 1, compute the r-step transition probabilities for the bomb Markov chain of Exercise 2.1.3, and find their limit as r → ∞.


Remark 2.13. Unfortunately, the transition matrix of a Markov chain can fail to be diagonalizable over C: for example,

P =
[ 1/2  1/2   0  ]
[  0   1/2  1/2 ]
[  0    0    1  ]

is a stochastic matrix (so it is the transition matrix of some Markov chain), but it is not diagonalizable over C.

Exercise 2.2.9 (for enthusiasts). Check that the above matrix is not diagonalisable over C.


2.3 Computation of multi-step transition probabilities – cont.

Review of basic notions from linear algebra: eigenvalues, eigenvectors, diagonalisation, algebraic and geometric multiplicity. Please make sure that you can use them confidently (including computations with complex numbers!). The all-ones vector is an eigenvector of any transition matrix with eigenvalue λ = 1. Diagonalisation helps us compute multi-step transition probabilities.

Linear-algebraic preliminaries In this paragraph we review the necessary preliminaries. The exposition is somewhat brief; more details can be found in any reasonable textbook on linear algebra.

Recall that if P is an s × s complex matrix, a complex number λ ∈ C is said to be an eigenvalue of P if there exists a column-vector v ∈ C^s with v ≠ 0, such that Pv = λv; such a column-vector v is said to be an eigenvector of P with eigenvalue λ. An equivalent definition is as follows: λ is an eigenvalue of P iff it is a root of the characteristic polynomial of P, defined by

χP (t) = det(P − t1).

A matrix P is diagonalizable if there exist an invertible M and a diagonal Λ = diag(λ1, · · · , λs) such that P = MΛM−1. This relation is equivalent to PM = MΛ, which means that the columns vj of M satisfy Pvj = λjvj; in other words, vj is an eigenvector of P with eigenvalue λj. Thus P is diagonalizable if and only if it has an eigenbasis, i.e. a basis consisting of eigenvectors.

The eigenvalues of P are exactly the roots of χP, therefore P has s eigenvalues, if multiplicities are counted properly. Each eigenvalue has at least one eigenvector corresponding to it, and eigenvectors corresponding to different eigenvalues are linearly independent. Therefore we have a sufficient condition for diagonalizability:

Fact 2.14. Every s × s complex matrix that has s distinct eigenvalues is diagonalizable over C.

This condition is not necessary (not even for stochastic matrices). A necessary and sufficient condition is given in terms of the multiplicities of the eigenvalues.

Definition 2.15. If P is an s × s complex matrix and λ is an eigenvalue of P, the geometric multiplicity γP(λ) of λ is the dimension of the space of eigenvectors {v : Pv = λv}.

Clearly, P is diagonalizable if and only if the sum of the geometric multiplicities of all the eigenvalues is equal to s. This can be further restated using

Definition 2.16. If P is an s × s complex matrix and λ is an eigenvalue of P, the algebraic multiplicity µP(λ) of λ is the maximal integer m such that


(t − λ)^m is a factor of χP(t). Informally, it is the ‘multiplicity of λ’ as a root of the characteristic polynomial χP(t).

By the fundamental theorem of algebra, the algebraic multiplicities of all the eigenvalues add up to s. On the other hand, it is not hard to see that µP(λ) ≥ γP(λ) for each eigenvalue λ. Therefore the geometric multiplicities add up to s if and only if each of them is equal to the corresponding algebraic multiplicity. We state this as

Fact 2.17. If P is an s× s complex matrix, then P is diagonalisable over C ifand only if γP (λ) = µP (λ) for every eigenvalue λ of P .

This leads to the following algorithm for checking diagonalizability and diagonalizing P:

1. Compute the characteristic polynomial χP(t), find its roots λj and their algebraic multiplicities µP(λj).

2. For each j, solve the linear equation Pv = λjv; compute the dimension γP(λj) of the space of solutions. If, for some j, γP(λj) < µP(λj), the matrix is not diagonalisable. Otherwise, find a basis in the space of solutions.

3. Take as M the matrix the columns of which are the basis vectors from (2), and as Λ the diagonal matrix in which each λj appears µP(λj) times on the diagonal.

Example 2.18. Consider the matrix

P =
[ 7  7  7 ]
[ 7  7  7 ]
[ 7  7  7 ] .

Its characteristic polynomial is (subtracting the first row from the second and the third)

χP(t) = det( 7−t  7  7 ; 7  7−t  7 ; 7  7  7−t ) = det( 7−t  7  7 ; t  −t  0 ; t  0  −t ) = (7 − t)t² + 7t² + 7t² = (21 − t)t² .

Therefore the eigenvalues are λ = 0 and λ = 21, with algebraic multiplicities µP(0) = 2 and µP(21) = 1. For each of the two, we set up the system of equations Pv = λv: for λ = 0,

7x + 7y + 7z = 0

has a two-dimensional space of solutions, whereas for λ = 21,

−14x + 7y + 7z = 0
7x − 14y + 7z = 0
7x + 7y − 14z = 0

has a one-dimensional space of solutions. Hence γP(0) = 2 = µP(0) and γP(21) = 1 = µP(21), i.e. the matrix is diagonalisable. We find a basis for each of these two spaces (of course, the choice of the basis is not unique!):

(1, −1, 0)⊤ ,  (0, 1, −1)⊤ ;   (1, 1, 1)⊤ .

Finally, P = MΛM−1, where

Λ =
[ 0  0   0 ]
[ 0  0   0 ]
[ 0  0  21 ] ,

M =
[  1   0  1 ]
[ −1   1  1 ]
[  0  −1  1 ] .
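# The outcome of Example 2.18 can be verified mechanically (supplementary sketch, plain Python): instead of inverting M by hand, we check the equivalent identity PM = MΛ, which — since M is invertible — is the same as P = MΛM^{−1}:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[7, 7, 7], [7, 7, 7], [7, 7, 7]]

# Eigenbasis found in Example 2.18 (columns of M), eigenvalues on
# the diagonal of Lam in the same order: 0, 0, 21.
M = [[1, 0, 1], [-1, 1, 1], [0, -1, 1]]
Lam = [[0, 0, 0], [0, 0, 0], [0, 0, 21]]

# P = M Lam M^{-1} is equivalent to P M = M Lam, since M is invertible.
assert matmul(P, M) == matmul(M, Lam)
```

The same trick (checking PM = MΛ instead of P = MΛM⁻¹) is convenient whenever the inverse is tedious to compute. "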

Exercise 2.3.1. Which of the following matrices are diagonalizable?

A = ( 4  2 ; −2  0 ) ,  B = ( 3  −1 ; 1  0 ) ,  C = ( 1  2  3 ; 2  4  6 ; 3  6  9 )

Finally, we recall

Fact 2.19. For any matrix P, there exists an invertible matrix M such that T = M−1PM is of the form

T =
[ λ1  ∗  ∗   0   0  ···              ]
[  0  λ1 ∗   0   0  ···              ]
[  0  0  λ1  0   0  ···              ]
[  0  0  0   λ2  ∗  ···              ]
[  0  0  0   0   λ2 ···              ]
[              ···                   ]
[  0  0  0   0   0  ···  λk  ∗   ∗  ]
[  0  0  0   0   0  ···  0   λk  ∗  ]
[  0  0  0   0   0  ···  0   0   λk ]   (2.12)

where λ1, · · · , λk are the distinct eigenvalues, and for each j the matrix T contains a µP(λj) × µP(λj) upper triangular block with λj on the main diagonal.


Eigenvalues of transition matrices In the case of stochastic matrices, the computation of the eigenvalues is facilitated by

Lemma 2.20. If P is a stochastic matrix, then the all-1's column-vector 1 = (1, 1, . . . , 1)⊤ is an eigenvector of P with eigenvalue 1.

Proof. Let u = (1, 1, . . . , 1)⊤. Then for all i ∈ S, we have (Pu)i = ∑_{j∈S} P_{i,j} u_j = ∑_{j∈S} P_{i,j} = 1 = u_i, so Pu = u, as required.
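# A short numerical confirmation of Lemma 2.20, using the weather matrix of Example 2.12 in exact arithmetic (supplementary sketch, plain Python):

```python
from fractions import Fraction as F

# Weather matrix of Example 2.12, written with exact fractions.
P = [[F(7, 10), F(3, 10)], [F(3, 5), F(2, 5)]]

# Row sums of a stochastic matrix are 1, so applying P to the
# all-ones column vector gives back the all-ones vector (Lemma 2.20).
u = [F(1), F(1)]
Pu = [sum(P[i][j] * u[j] for j in range(2)) for i in range(2)]
assert Pu == u    # Pu = u: eigenvalue 1 with eigenvector (1, 1)^T
```

The same check works verbatim for any stochastic matrix of any size. "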

Example 2.21. Consider the Markov chain with transition graph in Figure 7, and transition matrix equal to

P =
[ 1/2  1/2   0  ]
[  0   1/2  1/2 ]
[ 1/2   0   1/2 ]

[Figure 7: The transition graph for Example 2.21 — three states arranged in a cycle; each state has a loop with probability 1/2 and an edge with probability 1/2 to the next state.]

The characteristic polynomial of P is given by

χP(t) = det(P − t1) = det( 1/2−t  1/2  0 ; 0  1/2−t  1/2 ; 1/2  0  1/2−t ) = (1/2 − t)³ + 1/8 ,

which has three roots: (1 + 1)/2 = 1, (1 + ω)/2 and (1 + ω̄)/2, where ω = (−1 + i√3)/2 and ω̄ = (−1 − i√3)/2 are the non-real cubic roots of unity. The eigenvalues are simple, hence P is diagonalisable. The eigenvector corresponding to 1 is (1, 1, 1)⊤; the eigenvectors corresponding to (1 + ω)/2 and (1 + ω̄)/2 are

(1, ω, ω̄)⊤ ,  (1, ω̄, ω)⊤ ,

respectively. Hence finally

P = ( 1  1  1 ; 1  ω  ω̄ ; 1  ω̄  ω ) · diag(1, (1 + ω)/2, (1 + ω̄)/2) · ( 1  1  1 ; 1  ω  ω̄ ; 1  ω̄  ω )^{−1} .


As r → ∞,

P^r → ( 1  1  1 ; 1  ω  ω̄ ; 1  ω̄  ω ) · diag(1, 0, 0) · ( 1  1  1 ; 1  ω  ω̄ ; 1  ω̄  ω )^{−1} =
[ 1/3  1/3  1/3 ]
[ 1/3  1/3  1/3 ]
[ 1/3  1/3  1/3 ] .

Example 2.22. Consider the Markov chain with transition graph in Figure 8, and transition matrix

P =
[ 1/4  1/2  1/4 ]
[ 1/4  1/4  1/2 ]
[ 1/2  1/4  1/4 ]

[Figure 8: The transition graph for Example 2.22 — three states in a cycle; each state has a loop with probability 1/4, an edge with probability 1/2 to the next state, and an edge with probability 1/4 to the remaining state.]

The characteristic polynomial of P is given by

χP(t) = det(P − t1) = det( 1/4−t  1/2  1/4 ; 1/4  1/4−t  1/2 ; 1/2  1/4  1/4−t ) = −t³ + 3t²/4 + 3t/16 + 1/16 .

The eigenvalue t = 1 has to be a root, hence we factor out t − 1 and then find the remaining two roots:

χP(t) = −t³ + 3t²/4 + 3t/16 + 1/16 = (t − 1)(−t² − t/4 − 1/16)
      = −(t − 1)(t − (−1 + i√3)/8)(t − (−1 − i√3)/8) = −(t − 1)(t − ω/4)(t − ω̄/4) ,

where as before ω = (−1 + i√3)/2 and ω̄ = (−1 − i√3)/2 are the non-real cubic roots of unity. The eigenvector corresponding to 1 is (1, 1, 1)⊤; the eigenvectors corresponding to ω/4 and ω̄/4 are

(1, ω, ω̄)⊤ ,  (1, ω̄, ω)⊤ ,


respectively. Hence finally

P = ( 1  1  1 ; 1  ω  ω̄ ; 1  ω̄  ω ) · diag(1, ω/4, ω̄/4) · ( 1  1  1 ; 1  ω  ω̄ ; 1  ω̄  ω )^{−1} .

As r → ∞,

P^r → ( 1  1  1 ; 1  ω  ω̄ ; 1  ω̄  ω ) · diag(1, 0, 0) · ( 1  1  1 ; 1  ω  ω̄ ; 1  ω̄  ω )^{−1} =
[ 1/3  1/3  1/3 ]
[ 1/3  1/3  1/3 ]
[ 1/3  1/3  1/3 ] .

# Food for thought: why do the transition matrices of Examples 2.21 and 2.22 have the same eigenvectors and the same limit of P^r? "
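# The limit computed in Example 2.22 can also be confirmed numerically without any diagonalisation (supplementary sketch, plain Python): repeated squaring gives P^64, whose entries are indistinguishable from 1/3, since the other two eigenvalues have modulus |ω/4| = 1/4:

```python
# Transition matrix of Example 2.22, in decimal form.
P = [[0.25, 0.5, 0.25], [0.25, 0.25, 0.5], [0.5, 0.25, 0.25]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

Pr = P
for _ in range(6):      # squaring six times yields P^(2^6) = P^64
    Pr = matmul(Pr, Pr)

# Every entry of P^64 is within 1e-12 of 1/3, matching the limit above.
assert all(abs(x - 1/3) < 1e-12 for row in Pr for x in row)
```

Squaring converges so quickly here because the subleading eigenvalues are contracted by a factor (1/4)^r. "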

Exercise 2.3.2. Diagonalize the transition matrix of the grasshopper Markov chain from Exercise 2.2.1 and find the limit of P^r as r → ∞.

Exercise 2.3.3. Three distinguishable tokens, A, B and C, are placed in a line on a table. On each time step, two of the three tokens are selected, with all possibilities equally likely. The positions of the two selected tokens are interchanged.

1. Model this process as a Markov chain. How many states does it have? Draw a transition graph of the Markov chain.

2. Determine P(X100 = ABC | X0 = ABC) and P(X100 = BAC | X0 = ABC). (Here, we are using a natural convention for labelling the states of the Markov chain.)

Exercise 2.3.4. A taxi driver works in Newham, Tower Hamlets, and the City of London. A journey from Newham ends in Newham or in Tower Hamlets, with probability 1/2 for each of the options. A journey from Tower Hamlets ends in Newham or in the City of London, with probability 1/2 for each of the options. A journey starting in the City of London ends in the City of London or in Tower Hamlets, with probability 1/2 for each of the options. The driver makes 100 consecutive journeys (so that the next journey starts in the borough in which the previous one ended), with the first one starting in Newham. Find the probability that his last journey ends in Newham.

Exercise 2.3.5. Any 2 × 2 stochastic matrix P = ( p11  p12 ; p21  p22 ) is diagonalizable, with eigenvalues 1 and p11 + p22 − 1.


2.4 Limiting distribution of Markov chains

The limiting distribution describes the distribution of Xn in the n → ∞ limit. We formally define it, and discuss examples of Markov chains in which it exists and examples in which it does not exist.

The analysis of Examples 2.12 and 2.22 suggests the following

Definition 2.23. Let P = (pss′)s,s′∈S be the array of transition probabilities of a Markov chain (Xn)n≥0 on a finite or countable state space S. A probability distribution π on S (i.e. a function π : S → [0, 1] such that ∑_{s∈S} π(s) = 1) is said to be the limiting distribution for P if for any s, s′ ∈ S

lim_{r→∞} p(r)_{ss′} = π(s′) .   (2.13)

Remark 2.24. Note that the definition depends only on the transition probabilities and not on the initial conditions, even if we speak of the limiting distribution of a Markov chain. Thus we require (2.13) to hold even for those s for which P(X0 = s) = 0.

Example 2.25. The limiting distribution for the Markov chain of Example 2.12 is

π2.12(s) = { 2/3, s = sunny ; 1/3, s = rainy } .

In Example 2.22, π2.22(s) = 1/3 for s = 1, 2, 3 (i.e. π is the uniform distribution). In the notation of row vectors:

π2.12 = (2/3, 1/3) ,  π2.22 = (1/3, 1/3, 1/3) .

Some transition matrices do not have a limiting distribution. Here are a few examples. It may be helpful to draw a transition graph for each of these!

Example 2.26 (Chain with non-communicating parts).

P =
[ 1/2  1/2   0    0  ]
[ 1/2  1/2   0    0  ]
[  0    0   1/3  2/3 ]
[  0    0   1/3  2/3 ]

Here P = P² = P³ = · · · , and therefore

lim_{r→∞} p(r)_{ss′} = { σ12(s′) , s ∈ {1, 2} ; σ34(s′) , s ∈ {3, 4} } ,

where

σ12 = (1/2, 1/2, 0, 0) ,  σ34 = (0, 0, 1/3, 2/3) .

That is, we have one limit for one subset of the states, and another one for another subset.

Example 2.27 (Parity-respecting chain).

P =
[  0    0   1/3  2/3 ]
[  0    0   1/3  2/3 ]
[ 1/2  1/2   0    0  ]
[ 1/2  1/2   0    0  ]

Here P = P³ = P⁵ = · · · , while

P² = P⁴ = · · · =
[ 1/2  1/2   0    0  ]
[ 1/2  1/2   0    0  ]
[  0    0   1/3  2/3 ]
[  0    0   1/3  2/3 ] ,

and therefore the limit lim_{r→∞} p(r)_{ss′} does not exist. We have different sequential limits along the even and odd subsequences of integers.

Exercise 2.4.1. Construct a Markov chain for which the sequence p(r)_{ss′} would have exactly seven sequential limits as r → ∞.

Example 2.28 (Probability escapes to infinity). Let X1, X2, · · · be independent, identically distributed Bernoulli(1/2) random variables. Then Sn = S0 + X1 + · · · + Xn form a Markov chain with state space S = Z+. For each s, s′ ∈ Z+,

lim_{n→∞} P(Sn > s′ | S0 = s) = 1 ,

which means that lim_{r→∞} p(r)_{ss′} ≡ 0. That is, the limit exists, but it is not a probability distribution.

These three examples are in a certain sense representative: appropriate conditions ruling them out ensure that there is a limiting distribution. We shall focus on the case of Markov chains with finite state space; in this case, we do not need to worry about the situation of Example 2.28.

Exercise 2.4.2. Find the limiting distribution for Exercise 2.1.3 (“bomb in a storehouse”).

Exercise 2.4.3. A 2 × 2 transition matrix P has a limiting distribution unless

P = ( 1  0 ; 0  1 )  or  P = ( 0  1 ; 1  0 ) .


Exercise 2.4.4. A stochastic matrix P has a limiting distribution π if and only if for any initial distribution π0 one has

lim_{r→∞} π0 P^r = π .


2.5 Equilibrium distributions

An equilibrium distribution is a probability distribution on the state space which is preserved by the Markov chain (equivalently, is an eigenvector of the transposed transition matrix with eigenvalue 1). If the state space is finite, an equilibrium distribution always exists, but it is not necessarily unique. If a chain has a limiting distribution, then it is the unique equilibrium distribution.

We postpone the discussion of existence of limiting distributions and start with a more general notion.

Definition 2.29. Let P = (pss′)s,s′∈S be the array of transition probabilities of a Markov chain (Xn)n≥0 on a finite or countable state space S. A probability distribution π on S is said to be an equilibrium (= stationary = invariant) distribution for P if

∀s′ ∈ S   ∑_{s∈S} π(s) pss′ = π(s′) ;   (2.14)

in other words, if Xn has distribution π, then also Xn+1 has distribution π. In the case S = {1, · · · , s}, (2.14) can be concisely written as

πP = π ,   (2.15)

i.e. π⊤ is an eigenvector of P⊤ with eigenvalue 1.
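# Condition (2.14)/(2.15) is straightforward to test mechanically (supplementary sketch, plain Python with exact fractions). Here it is applied to the weather matrix of Example 2.12: the distribution (2/3, 1/3) from Example 2.25 is indeed invariant, while the uniform distribution is not:

```python
from fractions import Fraction as F

def is_equilibrium(pi, P):
    """Check (2.14)/(2.15): the row vector pi satisfies pi P == pi."""
    n = len(pi)
    piP = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
    return piP == list(pi)

P = [[F(7, 10), F(3, 10)], [F(3, 5), F(2, 5)]]    # weather matrix

assert is_equilibrium([F(2, 3), F(1, 3)], P)      # the limiting distribution
assert not is_equilibrium([F(1, 2), F(1, 2)], P)  # uniform is not invariant
```

Since the check is a finite system of linear equations, it can also be used to find all equilibrium distributions by solving πP = π together with ∑ π(s) = 1. "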

Remark 2.30. P and P⊤ have the same eigenvalues (in fact, with the same algebraic and geometric multiplicities), hence by Lemma 2.20 λ = 1 is always an eigenvalue of P⊤. However, it is not a priori clear whether one of such eigenvectors is a probability distribution. We later show that this is indeed the case, i.e. an equilibrium distribution always exists.

Remark 2.31. Any eigenvalue of a transition matrix is at most one in absolute value. Indeed, if Pv = λv then for any s ∈ S

λ v(s) = ∑_{s′∈S} p_{ss′} v(s′) .

Suppose |λ| > 1. Dividing both sides by λ and taking the maximum over s, we obtain:

max_s |v(s)| = (1/|λ|) max_s |∑_{s′} p_{ss′} v(s′)| ≤ (1/|λ|) max_s ∑_{s′} p_{ss′} |v(s′)| ≤ (1/|λ|) max_{s′} |v(s′)| ∑_{s′} p_{ss′} = (1/|λ|) max_s |v(s)| ,

which, since |λ| > 1, forces max_s |v(s)| = 0 and v ≡ 0, contradicting the assumption that v is an eigenvector.
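# The key estimate behind Remark 2.31 — applying a stochastic matrix never increases max_s |v(s)| — can be probed on random stochastic matrices (supplementary sketch, plain Python; the tiny tolerance only guards against floating-point rounding):

```python
import random

random.seed(0)

def random_stochastic(n):
    """A random n x n stochastic matrix: nonnegative rows summing to 1."""
    M = []
    for _ in range(n):
        row = [random.random() for _ in range(n)]
        s = sum(row)
        M.append([x / s for x in row])
    return M

for _ in range(100):
    n = random.randint(2, 6)
    P = random_stochastic(n)
    v = [random.uniform(-1, 1) for _ in range(n)]
    Pv = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    # max_s |(Pv)(s)| <= max_s |v(s)|: averaging cannot increase the maximum
    assert max(abs(x) for x in Pv) <= max(abs(x) for x in v) + 1e-12
```

Each entry of Pv is a weighted average of the entries of v, which is the probabilistic way to read the inequality. "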


Lemma 2.32. Suppose P is the array of transition probabilities of a Markov chain on a finite or countable state space S. If π is the limiting distribution for P, then it is the unique equilibrium distribution.

Proof. Assume that S is finite. If π is the limiting distribution, then for any s, s′ ∈ S,

lim_{r→∞} p(r)_{ss′} = π(s′) .

Choosing an arbitrary s∗ ∈ S in place of s, we have:

∑_{s∈S} π(s) pss′ = ∑_{s∈S} lim_{r→∞} p(r)_{s∗s} pss′ = lim_{r→∞} ∑_{s∈S} p(r)_{s∗s} pss′ = lim_{r→∞} p(r+1)_{s∗s′} = π(s′) .

This holds for any s′, hence π is an equilibrium distribution. # (For infinite S, an extra argument is required to exchange the sum and the limit. We omit the details.7) " If π̃ is another equilibrium distribution, then it is also invariant under r-fold application of P, i.e. for any s′ ∈ S and any r ≥ 1

π̃(s′) = ∑_{s∈S} π̃(s) p(r)_{ss′} ,

whence

π̃(s′) = lim_{r→∞} ∑_{s∈S} π̃(s) p(r)_{ss′} .   (2.16)

On the other hand,

lim_{r→∞} p(r)_{ss′} = π(s′) ,

hence

lim_{r→∞} ∑_{s∈S} π̃(s) p(r)_{ss′} = ∑_{s∈S} π̃(s) lim_{r→∞} p(r)_{ss′} = ∑_{s∈S} π̃(s) π(s′) = π(s′) .   (2.17)

From (2.16) and (2.17), π̃ = π.

Example 2.33. Let P be the s × s identity matrix. It defines the boring Markov chain X0 = X1 = X2 = · · · . Clearly, it has no limiting distribution (unless s = 1), and equally clearly, any distribution is an equilibrium distribution.

Example 2.34. Consider the transition matrix

P =
[ 0  1/3  2/3 ]
[ 0   1    0  ]
[ 0   0    1  ] .

7Hint: Let ε > 0. Then there exists a finite subset Sε ⊂ S such that ∑_{s∈Sε} π(s) ≥ 1 − ε.


It does not have a limiting distribution (why not?) Let us find all the equilibrium distributions. If π = (p, q, r) is a distribution, we should have p, q, r ≥ 0 and p + q + r = 1. Then

πP = (0, q + p/3, r + 2p/3) ,

therefore π is an equilibrium distribution if and only if p = 0. That is, the set of equilibrium distributions consists of all the distributions of the form (0, q, 1 − q) for some q ∈ [0, 1].

Exercise 2.5.1. In the setting of Example 2.26 (a chain with non-communicating parts):

1. Show that σ12 and σ34 are both equilibrium distributions.

2. Find all the equilibrium distributions.

Exercise 2.5.2. In the setting of Example 2.27:

1. Show that neither (1/2, 1/2, 0, 0) nor (0, 0, 1/3, 2/3) is an equilibrium distribution.

2. Find an equilibrium distribution.

3. Show that the equilibrium distribution you have found is unique.

Exercise 2.5.3. Reconsider the Markov chain from Exercise 2.1.4 (model for English text). Does it have an equilibrium distribution? What is its significance?

Exercise 2.5.4. Show that in Example 2.28 (sums of Bernoulli random variables) there is no equilibrium distribution.

Exercise 2.5.5. Consider the Markov chain on state space Z+, with transition probabilities as follows:

pij = { 1, if i = 0 and j = 1 ; 1/3, if i ≥ 1 and j = i + 1 ; 2/3, if i ≥ 1 and j = i − 1 ; 0, otherwise. }

Does this Markov chain have an equilibrium distribution? If so, what is it?

If the state space is finite, there is always an equilibrium distribution.

Proposition 2.35. Let P = (pss′)s,s′∈S be the array of transition probabilities of a Markov chain (Xn)n≥0 on a finite state space S. Then P has an equilibrium distribution.

#


Proof. Without loss of generality S = {1, · · · , s}, so we can use linear-algebraic notation. Let µ be an arbitrary probability distribution on S. Define a sequence of probability distributions

πN = (1/N) ∑_{n=0}^{N−1} µP^n .

Each component of πN lives in the compact interval [0, 1], therefore we can choose a subsequence πNk such that each component πNk(s) converges to a limit π(s) ∈ [0, 1]. Then

∑_s π(s) = ∑_s lim_{k→∞} πNk(s) = lim_{k→∞} ∑_s πNk(s) = 1 ,

i.e. the limit π is indeed a probability distribution.8 It remains to show that π is an equilibrium distribution:

πP = lim_{k→∞} πNk P = lim_{k→∞} (1/Nk) ∑_{n=0}^{Nk−1} µP^n P = lim_{k→∞} (1/Nk) ∑_{n=0}^{Nk−1} µP^{n+1}
   = lim_{k→∞} (1/Nk) ∑_{n=1}^{Nk} µP^n = lim_{k→∞} (1/Nk) [ ∑_{n=0}^{Nk−1} µP^n − µ + µP^{Nk} ] .

Each component of (1/Nk)µ and of (1/Nk)µP^{Nk} is bounded by 1/Nk and hence tends to zero. Therefore

πP = lim_{k→∞} (1/Nk) ∑_{n=0}^{Nk−1} µP^n = π .

"

Exercise 2.5.6. In the setting of Example 2.26 (a chain with non-communicating parts), which equilibrium distributions can be obtained using the construction from the proof above?

8Here we used the assumption that S is finite. For infinite S, this step does not work, and indeed the proposition is not valid!


2.6 Irreducibility

If a Markov chain has a finite state space and is irreducible, i.e. the state space can not be split in two parts such that one can never get from the first one to the second one, then the equilibrium distribution is unique, and moreover λ = 1 is a simple eigenvalue of the transition matrix. The strong law of large numbers tells us that this equilibrium distribution describes the statistics of visits to each vertex in the long-time limit.

Uniqueness of the equilibrium distribution We would like to check whether the equilibrium distribution is unique. Inspecting Example 2.26, we see that the state space consists of two parts, S = {1, 2} ∪ {3, 4}, such that one can not reach either part from the other one. In this case there is no natural reason to expect the equilibrium distribution to be unique. The following definition singles out a class of Markov chains for which one can reach any state from any other one.

Definition 2.36. Let P = (pss′)s,s′∈S be the array of transition probabilities of a Markov chain (Xn)n≥0 on a finite or countable state space S. P is said to be irreducible if for any s, s′ ∈ S there exists r = r(s, s′) ≥ 1 such that p(r)_{ss′} > 0. Otherwise, it is called reducible.

For example, the transition matrix of Example 2.26 (chain with non-communicating parts) is reducible. The transition matrix of Example 2.27 (parity-respecting chain) is irreducible. Here is a further example:

Example 2.37. The transition matrix

P = ( 1/2  1/2 ; 0  1 )

is reducible.
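# Irreducibility in the sense of Definition 2.36 is a property of the transition graph, so it can be tested by a breadth-first search from every state (supplementary sketch, plain Python); here it is applied to the matrices of Examples 2.26 and 2.27:

```python
from collections import deque

def irreducible(P):
    """True iff from every state, every state (including itself) is
    reachable in at least one step along positive-probability edges,
    as in Definition 2.36."""
    n = len(P)
    for start in range(n):
        # states reachable from `start` in one or more steps
        seen = set(j for j in range(n) if P[start][j] > 0)
        queue = deque(seen)
        while queue:
            i = queue.popleft()
            for j in range(n):
                if P[i][j] > 0 and j not in seen:
                    seen.add(j)
                    queue.append(j)
        if len(seen) < n:
            return False
    return True

half, third = 0.5, 1/3
P_2_26 = [[half, half, 0, 0], [half, half, 0, 0],
          [0, 0, third, 2/3], [0, 0, third, 2/3]]     # Example 2.26
P_2_27 = [[0, 0, third, 2/3], [0, 0, third, 2/3],
          [half, half, 0, 0], [half, half, 0, 0]]     # Example 2.27

assert not irreducible(P_2_26)   # non-communicating parts: reducible
assert irreducible(P_2_27)       # parity-respecting chain: irreducible
```

Note that only the pattern of zero and nonzero entries matters, not the actual probabilities. "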

Exercise 2.6.1. Determine which of the following transition matrices are irreducible:

P1 =
[  0   1/2  1/2   0    0  ]
[  0    0   1/2  1/2   0  ]
[  0    0    0   1/2  1/2 ]
[  0    0    0    0    1  ]
[ 1/2   0    0    0   1/2 ]

P2 =
[  0    0   1/2  1/2   0  ]
[  0    0   1/2  1/2   0  ]
[  0   1/2   0    0   1/2 ]
[  0    0    0    0    1  ]
[ 1/2   0    0    0   1/2 ]

P3 =
[  0   1/2  1/2   0    0  ]
[  0   1/2   0   1/2   0  ]
[ 1/4   0   1/4  1/4  1/4 ]
[  0    0    0    0    1  ]
[  0    0    0   1/2  1/2 ]

Theorem 2.38 (Perron–Frobenius). Let P be a (finite) stochastic matrix. If P is irreducible, then P has a unique equilibrium distribution π. # Moreover, one is a simple eigenvalue of P and P⊤: µP(1) = µ_{P⊤}(1) = 1. "


Proof. Choose R sufficiently large so that for each s, s′ ∈ S there exists 0 ≤ r ≤ R − 1 such that p^{(r)}_{ss′} > 0. Denote

P̄ = (1/R) ∑_{r=0}^{R−1} P^r .

Then every equilibrium distribution for P is also an equilibrium distribution for P̄. The advantage of P̄ is that all of its matrix entries are strictly positive.

Let v be an eigenvector of P (and hence also of P̄) with eigenvalue 1. As in Remark 2.31,

max_s |v(s)| = max_s |∑_{s′} p̄_{ss′} v(s′)| ≤ max_s ∑_{s′} p̄_{ss′} |v(s′)| ≤ (max_{s′} |v(s′)|) max_s ∑_{s′} p̄_{ss′} = max_s |v(s)| ,

whence all the inequalities (and particularly the first one) are equalities. In particular, all the components of v have the same argument, i.e. v is proportional to a probability vector. If v and v′ are two different probability vectors which are both eigenvectors of P with eigenvalue 1, then their difference is again an eigenvector. However, it is not proportional to a probability vector, since the sum of its coordinates is zero. The contradiction shows that γ_P(1) = 1; in particular, there is a unique equilibrium distribution.

# Assume that µ_P(1) > 1. Then by Fact 2.19 there exists a vector w such that Pw = w + cv for some c ≠ 0. Then

∑_{s∈S} |(P^r w)(s)| → ∞ , r → ∞ ,

which contradicts the fact that for any w

∑_{s∈S} |(Pw)(s)| ≤ ∑_{s∈S} |w(s)| .

"

Exercise 2.6.2. Check that the reducible transition matrix from Example 2.37 has a unique equilibrium distribution (and in fact a limiting distribution). Thus irreducibility is sufficient but not necessary for the uniqueness of the equilibrium distribution.
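Numerically, the unique equilibrium distribution of an irreducible chain can be found by solving πP = π together with the normalisation ∑_s π(s) = 1. A sketch (the code and the name `equilibrium` are mine, not from the notes):

```python
import numpy as np

def equilibrium(P):
    """Solve pi P = pi, sum(pi) = 1 as an overdetermined linear system."""
    n = P.shape[0]
    # stack (P^T - I) pi = 0 with the normalisation row of ones = 1
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# The parity-respecting chain (2.18) is irreducible, so pi is unique.
P = np.array([[0,   0,   1/3, 2/3],
              [0,   0,   1/3, 2/3],
              [1/2, 1/2, 0,   0  ],
              [1/2, 1/2, 0,   0  ]])
pi = equilibrium(P)
print(np.allclose(pi @ P, pi))   # True
```

Because the stacked system is consistent, least squares recovers the exact equilibrium, here (1/4, 1/4, 1/6, 1/3).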

Law of large numbers For a Markov chain (X_n)_{n≥0}, denote:

#s(N) = #{ 0 ≤ n ≤ N − 1 | X_n = s } , N ≥ 1 , s ∈ S .


Theorem 2.39. Let (X_n)_{n≥0} be a Markov chain on a finite state space S, with irreducible transition array P and arbitrary initial distribution. Then for each s ∈ S

P( lim_{N→∞} #s(N)/N = π(s) ) = 1 ,

where π is the unique equilibrium distribution for P.

We do not prove this theorem.
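Although we do not prove Theorem 2.39, it is easy to test by simulation. The sketch below (mine, not part of the notes) runs the parity-respecting chain (2.18) for many steps and compares the visit frequencies #s(N)/N with the equilibrium distribution (1/4, 1/4, 1/6, 1/3); note that the theorem applies even though this chain has no limiting distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0,   0,   1/3, 2/3],
              [0,   0,   1/3, 2/3],
              [1/2, 1/2, 0,   0  ],
              [1/2, 1/2, 0,   0  ]])
N = 50_000
counts = np.zeros(4)
state = 0
for _ in range(N):
    counts[state] += 1
    # one step of the chain: next state drawn from row `state` of P
    state = rng.choice(4, p=P[state])
print(counts / N)   # close to (0.25, 0.25, 0.1667, 0.3333)
```

With N = 50 000 the frequencies agree with π to roughly 1/√N, i.e. to two decimal places.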

Example 2.40 (The Ehrenfest Urn model). Suppose there are two urns, urn A and urn B, containing a total of four balls. Every minute, I choose uniformly at random one of the four balls (so I choose each with probability 1/4), and I move it from the urn it is currently in to the other urn. Estimate the proportion of time for which urn A is empty, over a very long period of time.

We model this situation as a Markov chain (X_0, X_1, . . .) with 5 states; state i is the state in which urn A contains exactly i balls. So S = {0, 1, 2, 3, 4}, and the transition graph is a path on the states 0, 1, 2, 3, 4, with rightward transition probabilities 1, 3/4, 1/2, 1/4 (from states 0, 1, 2, 3 respectively) and leftward transition probabilities 1/4, 1/2, 3/4, 1 (from states 1, 2, 3, 4 respectively). [Transition graph not reproduced.]

The transition matrix is therefore

P = ( 0    1    0    0    0
      1/4  0    3/4  0    0
      0    1/2  0    1/2  0
      0    0    3/4  0    1/4
      0    0    0    1    0   ) .

This Markov chain is clearly irreducible, so it has a unique equilibrium distribution, π = (π(0), π(1), π(2), π(3), π(4)) say, and π satisfies the equations πP = π, ∑_{i=0}^{4} π(i) = 1, i.e.

(1/4)π(1) = π(0) ,
π(0) + (1/2)π(2) = π(1) ,
(3/4)π(1) + (3/4)π(3) = π(2) ,
(1/2)π(2) + π(4) = π(3) ,
(1/4)π(3) = π(4) ,
π(0) + π(1) + π(2) + π(3) + π(4) = 1 .


The first and fifth equations give π(1) = 4π(0) and π(3) = 4π(4) respectively. Substituting the first equation into the second gives π(0) + (1/2)π(2) = 4π(0), so π(2) = 6π(0); similarly, substituting the fifth equation into the fourth gives π(2) = 6π(4). Hence π(4) = π(0), and so π(1) = π(3) = 4π(0) and π(2) = 6π(0). Substituting these into the last equation gives 16π(0) = 1, so π(0) = 1/16 and therefore

π = (1/16, 4/16, 6/16, 4/16, 1/16) = (1/16, 1/4, 3/8, 1/4, 1/16).

In particular, π(0) = 1/16, so by the previous theorem, we have

P( #{ n ∈ {0, 1, . . . , N − 1} : urn A is empty at time n } / N → 1/16 ) = 1 ,

hence, over a very long period of time, urn A will be empty roughly (1/16)th of the time, with probability 1.
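The computation above can be double-checked numerically; this is a small sketch of mine, not part of the notes:

```python
import numpy as np

# Ehrenfest urn transition matrix (Example 2.40) and the equilibrium
# distribution derived above; verify pi P = pi and that pi sums to one.
P = np.array([[0,   1,   0,   0,   0  ],
              [1/4, 0,   3/4, 0,   0  ],
              [0,   1/2, 0,   1/2, 0  ],
              [0,   0,   3/4, 0,   1/4],
              [0,   0,   0,   1,   0  ]])
pi = np.array([1/16, 1/4, 3/8, 1/4, 1/16])
print(np.allclose(pi @ P, pi), np.isclose(pi.sum(), 1.0))   # True True
```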

Exercise 2.6.3. On each evening, the owner of a car washes it with probability 0.6. Independently of that event, every night a dirty rain pours on the car with probability 0.2. In the long run, on what fraction of the mornings is the car clean?

Exercise 2.6.4. An individual either drives his car or walks in going from his home to his office in the morning, and from his office to his home in the afternoon. He uses the following strategy: if it is raining in the morning, then he drives the car, provided it is at home to be taken. Similarly, if it is raining in the afternoon and his car is at the office, then he drives the car home. He walks on any morning or afternoon that it is not raining or the car is not where he is. Assume that, independently of the past, it rains during successive mornings and afternoons with constant probability p.

1. In the long run, on what fraction of days does our man walk in the rain?

2. What if he owns two cars?


2.7 Existence of the limiting distribution; aperiodicity

A transition matrix has a limiting distribution unless it has a non-trivial period, i.e. a natural number R > 1 such that p^{(r)}_{ss} can be positive only for r divisible by R. A transition matrix without such periods is called aperiodic. In linear-algebraic terms, all the eigenvalues of an irreducible aperiodic matrix are strictly less than one in absolute value, except for the simple eigenvalue λ = 1.

We have seen that an irreducible finite Markov chain always has a unique equilibrium distribution, but it does not necessarily have a limiting distribution.

Let us return to Example 2.27. The transition matrix

P = ( 0    0    1/3  2/3
      0    0    1/3  2/3
      1/2  1/2  0    0
      1/2  1/2  0    0   )        (2.18)

is irreducible, but there is no limiting distribution. In this case, it is clear what the problem is: the states are divided into two classes, S = {1, 2} ∪ {3, 4}; if we start in the first class, we must be in the first class on all the even steps, and in the second class on the odd steps (and vice versa). Here is another example:

P = ( 0    0    1/2  1/2  0    0
      0    0    1/3  2/3  0    0
      0    0    0    0    1/3  2/3
      0    0    0    0    2/3  1/3
      2/3  1/3  0    0    0    0
      3/4  1/4  0    0    0    0   )        (2.19)

Exercise 2.7.1. Check that the transition matrix from (2.19) is irreducible.

The transition matrix (2.19) has no limiting distribution. Indeed, the states are divided into three classes: S_0 = {1, 2}, S_1 = {3, 4}, and S_2 = {5, 6}. If we start from an initial state in S_0, we will have X_0, X_3, X_6, · · · ∈ S_0, X_1, X_4, X_7, · · · ∈ S_1 and X_2, X_5, X_8, · · · ∈ S_2. Hence, for s ∈ S_0, p^{(3r)}_{ss′} is non-zero only for s′ ∈ S_0, whereas p^{(3r+1)}_{ss′} is non-zero only for s′ ∈ S_1.

To rule out examples such as (2.18) and (2.19), we introduce several definitions.

Definition 2.41. Let P be a transition array of a Markov chain on a finite or countable state space S. The period of a state s ∈ S is defined as

R(s) = gcd{ r ≥ 1 | p^{(r)}_{ss} > 0 } .

If p^{(r)}_{ss} = 0 for all r ≥ 1, we set R(s) = 0. If R(s) = 1 for all s ∈ S, we say that P is aperiodic.
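For a small chain, R(s) can be computed directly from the definition by accumulating the gcd of the return times up to some cut-off. The sketch below is mine (the function name and the arbitrary cut-off r_max = 50 are assumptions, not from the notes):

```python
import numpy as np
from math import gcd

def period(P, s, r_max=50):
    """R(s): gcd of the return times r <= r_max with p^(r)_{ss} > 0."""
    R = 0                        # gcd(0, r) = r, so R accumulates the gcd
    Pr = np.eye(len(P))
    for r in range(1, r_max + 1):
        Pr = Pr @ P
        if Pr[s, s] > 1e-12:
            R = gcd(R, r)
    return R

# The parity-respecting chain (2.18): every state has period 2.
P = np.array([[0,   0,   1/3, 2/3],
              [0,   0,   1/3, 2/3],
              [1/2, 1/2, 0,   0  ],
              [1/2, 1/2, 0,   0  ]])
print([period(P, s) for s in range(4)])   # [2, 2, 2, 2]
```

The truncation is harmless here: once two coprime return times have been seen, the gcd has stabilised.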


Exercise 2.7.2. If P is irreducible, then R(s) is the same for all s ∈ S. In particular, if P is irreducible and there exists s ∈ S such that p_{ss} > 0, then P is aperiodic.

Exercise 2.7.3. For P from (2.18), verify that R(s) = 2 for all s ∈ S.

Exercise 2.7.4. For P from (2.19), verify that R(s) = 3 for all s ∈ S.

Theorem 2.42. Let P = (p_{ss′})_{s,s′∈S} be the array of transition probabilities of a Markov chain (X_n)_{n≥0} on a finite state space S. If P is irreducible and aperiodic, then

1. all the eigenvalues of P except for the simple eigenvalue 1 are strictly less than 1 in absolute value (|λ| < 1);

2. P has a limiting distribution.

We shall prove Theorem 2.42 after discussing a few examples.

Example 2.43. Let

P = ( 0    1    0    0    0    0    0    0
      0    0    1    0    0    0    0    0
      0    0    0    1    0    0    0    0
      0    0    0    0    1    0    0    0
      0    1/6  0    0    0    5/6  0    0
      0    0    1/7  0    0    0    6/7  0
      0    0    0    1/8  0    0    0    7/8
      1/9  0    0    0    8/9  0    0    0   ) .

Then p^{(r)}_{11} > 0 for r = 8, 12, 16, · · · . Hence R(1) = 4. Similarly (or invoking Exercise 2.7.2), we have that R(s) = 4 for all s.

Example 2.44. Let

P = ( 0    1    0    0    0    0    0    0
      0    0    1    0    0    0    0    0
      0    0    0    1    0    0    0    0
      0    0    0    0    1    0    0    0
      0    1/6  0    0    0    5/6  0    0
      0    0    1/7  0    0    0    6/7  0
      0    0    0    1/8  0    0    0    7/8
      1/9  0    0    0    7/9  0    0    1/9 ) .

Now p^{(r)}_{11} > 0 for r = 8 and also for r = 9. Hence R(1) = 1. Similarly (or invoking Exercise 2.7.2), we have that R(s) = 1 for all s. In particular, P is aperiodic (consistently with Exercise 2.7.2).


Exercise 2.7.5. For each of the following transition matrices, say whether it is irreducible, aperiodic, or neither. Which of them have a unique equilibrium distribution, and what is it? Which of them have a limiting distribution? Justify your answers.

P1 = ( 1/2  1/2  0    0
       0    1/3  2/3  0
       0    0    1/2  1/2
       1/3  0    0    2/3 ) ,   P2 = ( 2/5  1/5  2/5
                                       0    1    0
                                       0    0    1   )        (2.20)

P3 = ( 1/2  1/2  0
       0    1/2  1/2
       0    0    1   ) ,   P4 = ( 1/2  1/4  1/4
                                  0    0    1
                                  0    1    0   ) ,   P5 = ( 0    1/2  0    1/2
                                                             0    0    1    0
                                                             0    1/2  0    1/2
                                                             1    0    0    0   )        (2.21)

Exercise 2.7.6. Consider the transition matrix

P_p = ( 0  1/2  p  1/2 − p
        0  0    0  1
        0  0    0  1
        1  0    0  0       )

where 0 ≤ p ≤ 1/2 is a parameter.

1. For which values of p is this Markov chain irreducible? Justify your answer.

2. For which values of p is this Markov chain aperiodic? Justify your answer.

3. For the range of values of p covered by both (a) and (b), calculate the limiting distribution of the Markov chain (as a function of p).

4. Let p be such that the Markov chain is irreducible but not aperiodic. What is P(X_{1000} = 1 | X_0 = 1) and why?

Exercise 2.7.7. There are five light bulbs in a room, and no other sources of light. Each light bulb can either be on or off. Every minute, one of the five light bulbs is chosen at random (each is chosen with probability 1/5), and its switch is flipped (if it was on, it is turned off, and if it was off, it is turned on).

1. Show how to model the level of light in the room (after t minutes) as a Markov chain with six states.

2. Show that this Markov chain does not have a limiting distribution.

3. Does this Markov chain have an equilibrium distribution? If so, find an equilibrium distribution, and say whether or not it is the only equilibrium distribution.


Exercise 2.7.8. Let n ∈ N with n ≥ 2. A grasshopper jumps about between n flowers, labelled 1, 2, . . . , n. Every minute, if he is on a particular flower, he chooses at random one of the other n − 1 flowers (he chooses each of them with the same probability, 1/(n − 1)), and he jumps to that other flower. Suppose he starts on flower 1. Write down the proportion of time he spends on flower 2, in the long run. In other words, write down

lim_{N→∞} (number of visits to flower 2 between time 0 and time N − 1) / N .

For which n does the corresponding Markov chain have a limiting distribution? (Justify your answer.) Find this limiting distribution (your answer should depend on n).

Exercise 2.7.9. An airline reservation system has a single computer, which breaks down on any given day with probability p > 0. It takes two days to restore a failed computer to normal service. Consider a Markov chain defined by taking as states the pairs (x, y), where x is the number of machines in operating condition at the end of a day and y is 1 if a day’s labour has been expended on a machine, and 0 otherwise. The state space is S = {(1, 0), (0, 0), (0, 1)} and the transition probability matrix is

P = ( q  p  0
      0  0  1
      1  0  0 )

where q = 1 − p.

1. Show that P is irreducible and aperiodic.

2. Compute the system availability, which is defined to be the probability, in the long run, that a machine is working at the end of the day.

3. Suppose that the repair facility is improved and that it now takes only one day to restore a failed computer to normal service. Form a Markov chain appropriate to the new situation and determine the new system availability. Is the new system availability larger or smaller than it was in part b)?

Exercise 2.7.10. Recall Example 1.2 (random walk of a knight).

1. Describe the process using a Markov chain (no need to draw the transition graph or to write down all the transition probabilities!)

2. Is the transition matrix irreducible? Is it aperiodic?


3. For which other chess pieces is the analogous random walk irreducible? For which pieces is it aperiodic? (Please ignore capturing, castling, en passant, promotion, etc.)

#

Proof of Theorem 2.42

Step 1 First, we show that the first part of the theorem implies the secondpart:

Proposition 2.45. Let P be a stochastic matrix. If all the eigenvalues of P except for the simple eigenvalue 1 are strictly less than 1 in absolute value (|λ| < 1), then P has a limiting distribution.

Proof. Consider the special case when P is diagonalisable. Then P = MΛM^{−1}, where

Λ = diag(1, λ_2, · · · , λ_s) , max_{2≤j≤s} |λ_j| < 1 .

The first column of M is the eigenvector of P corresponding to the eigenvalue 1, i.e. a column of ones, 1. Similarly, P^T = (M^{−1})^T Λ M^T, whence the first column of (M^{−1})^T is the corresponding eigenvector of P^T, i.e. the unique equilibrium distribution π, transposed and perhaps multiplied by some number c ∈ C. Thus

P^r = M Λ^r M^{−1} → M E_{11} M^{−1} ,

where E_{11} is the matrix whose only non-zero element is a 1 in the top-left corner. The right-hand side is equal to

c 1 π = ( cπ(1)  cπ(2)  · · ·  cπ(s)
          cπ(1)  cπ(2)  · · ·  cπ(s)
           · · ·
          cπ(1)  cπ(2)  · · ·  cπ(s) ) .

This has to be a stochastic matrix, hence in fact c = 1, and we see that π is the limiting distribution.

If P is not diagonalisable, we need to use Fact 2.19 to write P = MTM^{−1}, where T is block upper triangular:

T = ( 1  0    0    0    · · ·
      0  λ_2  ∗    ∗    · · ·
      0  0    λ_2  ∗    · · ·
      0  0    0    λ_2  · · ·
      · · ·
      0  0    0    0    · · ·  λ_k  ∗    ∗
      0  0    0    0    · · ·  0    λ_k  ∗
      0  0    0    0    · · ·  0    0    λ_k )        (2.22)


Then we need to prove

Lemma 2.46. Let T be an upper triangular matrix (i.e. T_{jk} = 0 for j > k) such that the diagonal entries satisfy |T_{jj}| < 1. Then lim_{r→∞} T^r = 0.

Example 2.47. If T is 2 × 2, then

T = ( a  b
      0  c ) ,

T^r = ( a^r  b (a^r − c^r)/(a − c)
        0    c^r                   )   if a ≠ c ,

T^r = ( a^r  b r a^{r−1}
        0    a^r         )   if a = c ,

and in both cases T^r → 0 as r → ∞.
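Lemma 2.46, and the example above, are easy to illustrate numerically; the following is a small check of mine, not part of the notes. Even with a large off-diagonal entry, the powers tend to zero because the geometric decay of a^r beats the polynomial factor.

```python
import numpy as np

# upper triangular, repeated diagonal entry 0.9, large off-diagonal entry
T = np.array([[0.9, 100.0],
              [0.0, 0.9  ]])
Tr = np.linalg.matrix_power(T, 500)
# entry (0,1) is b * r * a^(r-1) = 100 * 500 * 0.9^499, already negligible
print(np.abs(Tr).max() < 1e-10)   # True
```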

Exercise 2.7.11. Prove the general case of Lemma 2.46.

Step 2 Now we prove a special case of Theorem 2.42.

Lemma 2.48. Let P = (p_{ss′})_{s,s′∈S} be the array of transition probabilities of a Markov chain (X_n)_{n≥0} on a finite state space S. The conclusion of Theorem 2.42 holds if p_{ss′} > 0 for any s, s′ ∈ S.

Clearly, the condition p_{ss′} > 0 implies that P is irreducible and aperiodic (why?).

Proof. Let v be an eigenvector of P with eigenvalue λ ≠ 1, |λ| = 1. Similarly to the proof of Theorem 2.38,

max_s |v(s)| = max_s |∑_{s′} p_{ss′} v(s′)| ≤ max_s ∑_{s′} p_{ss′} |v(s′)| ≤ (max_{s′} |v(s′)|) max_s ∑_{s′} p_{ss′} = max_s |v(s)| ,

whence all the inequalities are equalities. This means that all the v(s) have the same argument, i.e. without loss of generality v(s) ≥ 0. Choose s such that v(s) > 0; then we obtain the impossible equality

λ v(s) = ∑_{s′∈S} p_{ss′} v(s′) ,

in which the left-hand side is not a non-negative real number while the right-hand side is. This proves the first conclusion; the second one follows from Proposition 2.45.


Step 3 The general case of Theorem 2.42 is reduced to the special case that we have just proved using

Lemma 2.49. Let P be a stochastic matrix. If P is irreducible and aperiodic, there exists r_0 ≥ 1 such that p^{(r)}_{ss′} > 0 for all s, s′ ∈ S and r ≥ r_0.

Having the lemma in hand, we just apply the conclusion of Step 2 to P^{r_0}.

Proof. First consider s′ = s. Denote

R(s) = { r ≥ 1 | p^{(r)}_{ss} > 0 } .

Clearly, if r, r′ ∈ R(s), then also r + r′ ∈ R(s). Let us show that R(s) contains all sufficiently large natural numbers, r ≥ r_*(s). Since the gcd of R(s) is equal to 1 (aperiodicity), there exist r_1, · · · , r_m ∈ R(s) such that

gcd(r_1, · · · , r_m) = 1 .

Let r ∈ N. By Euclid’s algorithm, one can find a_1, · · · , a_m ∈ Z such that a_1 r_1 + · · · + a_m r_m = r. Unfortunately, the a_j are usually not all non-negative. Divide a_j by r_1 · · · r_{j−1} r_{j+1} · · · r_m with remainder:

a_j = b_j r_1 · · · r_{j−1} r_{j+1} · · · r_m + c_j , 0 ≤ c_j < r_1 · · · r_{j−1} r_{j+1} · · · r_m .

Now,

r = a_1 r_1 + · · · + a_m r_m = a′_1 r_1 + · · · + a′_m r_m ,

where

a′_1 = a_1 + (b_2 + · · · + b_m) r_2 r_3 · · · r_m , a′_2 = c_2 , a′_3 = c_3 , · · · , a′_m = c_m .

Then a′_2, · · · , a′_m ≥ 0. Now, if r ≥ r_*(s) = m r_1 · · · r_m,

∑_{j=2}^{m} a′_j r_j = ∑_{j=2}^{m} c_j r_j < m r_1 · · · r_m = r_*(s) ≤ r ,

hence also a′_1 > 0.

Now consider s ≠ s′. By irreducibility, there exists r_+(s, s′) > 0 such that p^{(r_+(s,s′))}_{ss′} > 0. Then for r ≥ r_+(s, s′) + r_*(s) we have p^{(r)}_{ss′} > 0. Thus the conclusion of the lemma holds for

r_0 = max_s r_*(s) + max_{s≠s′} r_+(s, s′) .

"


2.8 Computation of equilibrium distribution

We discuss two classes of transition matrices for which equilibrium distributions can be easily found. One class consists of transitive matrices, which are invariant under conjugation by a sufficiently large permutation group. The second class consists of transition matrices describing the random walk on a graph.

Transitive Markov chains The first important case is when the states are indistinguishable.

Definition 2.50. A transition array P on a finite or countable state space S is called transitive (not to be confused with transient!) if for any s, s′ ∈ S there exists a permutation σ : S ↔ S such that σ(s) = s′ and for any s_1, s_2 ∈ S

p_{σ(s_1)σ(s_2)} = p_{s_1 s_2} .

Example 2.51. Let

P = ( 0    2/3  1/3  0
      0    0    2/3  1/3
      1/3  0    0    2/3
      2/3  1/3  0    0   ) ;

all the states look the same (the transition graph is invariant under rotation), so we expect the uniform distribution to be an (or, rather, the unique) equilibrium distribution.

This is justified by

Proposition 2.52. If P is transitive and |S| < ∞, then the uniform distribution on S is an equilibrium distribution for P.

Proof. It suffices to check that ∑_{s∈S} p_{ss′} does not depend on s′. To this end, observe that if s′, s′′ ∈ S, there is a permutation σ of S which takes s′ to s′′, and then

∑_{s∈S} p_{ss′′} = ∑_{s∈S} p_{σ(s)s′′} = ∑_{s∈S} p_{σ(s)σ(s′)} = ∑_{s∈S} p_{ss′} .
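As the proof shows, for a transitive chain the column sums of P are constant, so the uniform distribution is an equilibrium distribution. A quick numerical check of Proposition 2.52 on Example 2.51 (a sketch of mine, not part of the notes):

```python
import numpy as np

# Example 2.51: rotation-invariant chain on 4 states
P = np.array([[0,   2/3, 1/3, 0  ],
              [0,   0,   2/3, 1/3],
              [1/3, 0,   0,   2/3],
              [2/3, 1/3, 0,   0  ]])
u = np.full(4, 1/4)          # uniform distribution
# column sums all equal to 1 (doubly stochastic), hence u P = u
print(np.allclose(P.sum(axis=0), 1.0), np.allclose(u @ P, u))   # True True
```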

Exercise 2.8.1. Find all the equilibrium distributions of

P1 = ( 1/2  1/4  1/8  1/8
       1/4  1/8  1/8  1/2
       1/8  1/8  1/2  1/4
       1/8  1/2  1/4  1/8 ) ,

P2 = ( 1/2  1/4  0    0    1/4  0
       1/4  1/2  0    0    1/4  0
       0    0    1/4  1/4  0    1/2
       0    0    1/4  1/2  0    1/4
       1/4  1/4  0    0    1/2  0
       0    0    1/2  1/4  0    1/4 ) .

If you rely on Proposition 2.52, please verify the assumptions!


Combinatorial graphs Consider the following generalisation of the ‘doors and rooms’ Markov chain of Example 2.5: we are given a collection of rooms 1, · · · , s, some of which are connected by doors. Then set

p_{ss′} = { 1/d(s) , s has a door to s′
            0 ,      otherwise              (2.23)

where d(s) is the number of doors in room s. Graphically, we have a graph G, i.e. a set of vertices some pairs of which are connected by undirected edges (loops are not allowed), and d(s) is the degree, i.e. the number of edges incident to the vertex s (we assume that d(s) > 0, i.e. there are no isolated vertices). To emphasise the distinction between these graphs and the transition graphs of Markov chains, we tautologically call the former combinatorial graphs, and refer to the resulting Markov chain as the random walk on G.

Proposition 2.53. Let G be a finite graph. Then π(s) = d(s) / ∑_{s′} d(s′) is an equilibrium distribution.

Proof.

∑_{s∈S} π(s) p_{ss′} = ∑_{s connected to s′} [ d(s) / ∑_{s′′} d(s′′) ] · [ 1/d(s) ] = d(s′) / ∑_{s′′} d(s′′) = π(s′) .

Example 2.54. Consider the graph in Figure 9. The sum of the degrees is equal to 16 (twice the number of edges); therefore the distribution

π = (1/16, 3/16, 1/8, 1/4, 1/8, 3/16, 1/16)

(where π(s) = d(s)/16) is an equilibrium distribution for the transition matrix defined as in (2.23).

Proposition 2.55. Let G be a combinatorial graph. The corresponding transition matrix is irreducible if and only if G is connected.

We leave the proof as an exercise.

Exercise 2.8.2. In Example 2.5, what proportion of the steps do we spend at room 5 (in the long run)?

Now we ask when the transition matrix is aperiodic.

A graph is called bipartite if the set of vertices can be split into two parts such that no edge has both ends in the same part.


Figure 9: The arrangement of the rooms in Example 2.54 (seven rooms, labelled 1–7; figure not reproduced).

Example 2.56. If s is even, the cycle graph on {1, · · · , s} with edges

(1, 2), (2, 3), (3, 4), · · · , (s, 1)

is bipartite. Indeed, if s is even, then every edge connects an even vertex to an odd one. For odd s, the cycle graph is not bipartite. Indeed, if the cycle is bipartite, 1 belongs to one of the parts; let us call it A. Then also 3 ∈ A and 5 ∈ A and so forth, and then s ∈ A. But then also 2 ∈ A, 4 ∈ A and so forth, so A contains all the vertices. Contradiction.

Proposition 2.57. A graph is bipartite if and only if it does not contain a cycle of odd length.

Proof. Clearly, if a graph contains an odd cycle, it is not bipartite. Suppose the graph does not contain an odd cycle; let us show that it is bipartite. Without loss of generality, the graph is connected. Pick a vertex v; let A consist of the vertices w which are connected to v by some path of even length, and let B be the complement of A. If two vertices in A were connected to one another, there would be a cycle of odd length (why?). Similarly, two vertices in B cannot be connected to one another. Hence the graph is bipartite.

Corollary 2.58. If a graph is not bipartite, it has a vertex s with R(s) = 1.

Proof. Pick a vertex on an odd cycle (of length ≥ 3, else the statement is trivial). Its period has to divide both 2 and an odd number, hence it is equal to 1.

Corollary 2.59. The random walk on a combinatorial graph has a limiting distribution if and only if the graph is connected and not bipartite.

Exercise 2.8.3. Rigorously formulate and prove the relations (1.2) (random walkof a knight on a chessboard).


3 Markov chains – first step analysis

3.1 Absorption time and first-visit time

Absorbing state (example: an enclosure with a hungry tiger). We compute the average time until the walker is absorbed by the tiger (the mean absorption time), first in the example and then for a general Markov chain. The solution is found using a well-posed system of linear equations.

Motivating example Consider the following Markov chain. The state space is a set of enclosures in London Zoo: S = {elephant (1), goat (2), penguin (3), tiger (4)}. The transition probabilities are as follows:

p_{ss′} = { 1/3 , s ≠ tiger and s′ ≠ s
            1 ,   s = s′ = tiger
            0 ,   otherwise

In this situation it is natural to call the state (4) an absorbing state: if X_n = tiger for some value of n, then also X_{n+1} = X_{n+2} = · · · = tiger. In the current situation, it is clear that with probability one the chain will eventually reach the absorbing state and stay there. One of the questions we could ask is: what is the distribution of the absorption time

T = min{ n ≥ 0 | X_n = tiger } ?        (3.1)

It turns out that the expectation of (3.1) can be easily computed, as follows. Clearly, T = 0 on the event X_0 = 4, hence E(T | X_0 = 4) = 0. By the Markov property and the formula of total probability,

E(T | X_0 = 1) = ∑_{s=1}^{4} p_{1,s} E(T | X_0 = 1, X_1 = s)
              = ∑_{s=1}^{4} p_{1,s} E(T | X_1 = s)
              = (1/3) E(T | X_1 = 2) + (1/3) E(T | X_1 = 3) + (1/3) E(T | X_1 = 4) .

Now, the conditional distribution of (X_1, X_2, · · · ) given X_1 = s coincides with the distribution of (X_0, X_1, · · · ) with the initial condition X_0 = s. Therefore

E(T |X1 = s) = E(T |X0 = s) + 1 ,


whence

E(T | X_0 = 1) = (1/3) E(T | X_0 = 2) + (1/3) E(T | X_0 = 3) + (1/3) E(T | X_0 = 4) + 1
              = (1/3) E(T | X_0 = 2) + (1/3) E(T | X_0 = 3) + 1 .

Similarly,

E(T | X_0 = 2) = (1/3) E(T | X_0 = 1) + (1/3) E(T | X_0 = 3) + 1 ,
E(T | X_0 = 3) = (1/3) E(T | X_0 = 1) + (1/3) E(T | X_0 = 2) + 1 .

Thus

E(T | X_0 = 1) = E(T | X_0 = 2) = E(T | X_0 = 3) = 3 ,

and for an arbitrary initial distribution π_0

E T = 3(π_0(1) + π_0(2) + π_0(3)) = 3 − 3π_0(4) .

We shall now develop and generalise this argument.

Mean absorption time

Definition 3.1. Let P = (p_{ss′})_{s,s′∈S} be the array of transition probabilities of a Markov chain (X_n)_{n≥0} on a finite or countable state space S. A state s ∈ S is called absorbing if p_{ss} = 1. More generally, if A ⊂ S is such that ∑_{s′∈A} p_{ss′} = 1 for any s ∈ A, then A is called an absorbing set.

Exercise 3.1.1. Let P be a transition array. If |S| > 1 and P has an absorbingstate, it is not irreducible.

Let P be a transition array, and let A ⊂ S be a non-empty absorbing set. Denote

T_A = min{ n ≥ 0 | X_n ∈ A } .

Define ν_s = E(T_A | X_0 = s). Clearly,

∀s ∈ A : ν_s = 0 .        (3.2)

For s ∈ S \ A,

ν_s = E(T_A | X_0 = s) = ∑_{s′∈S} p_{ss′} E(T_A | X_1 = s′)
    = ∑_{s′∈S} p_{ss′} (E(T_A | X_0 = s′) + 1)
    = ∑_{s′∈S} p_{ss′} ν_{s′} + 1        (3.3)


and, plugging in (3.2), we finally obtain

∀s ∈ S \ A : ν_s = ∑_{s′∈S\A} p_{ss′} ν_{s′} + 1 .        (3.4)

Example 3.2. In the Zoo Markov chain described above,

P = ( 0    1/3  1/3  1/3
      1/3  0    1/3  1/3
      1/3  1/3  0    1/3
      0    0    0    1   ) ,

thus for A = {4} we get the same system of equations

ν_1 = (1/3)ν_2 + (1/3)ν_3 + 1
ν_2 = (1/3)ν_1 + (1/3)ν_3 + 1
ν_3 = (1/3)ν_1 + (1/3)ν_2 + 1

as above.
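In matrix form the system reads (1 − P_A)ν = 1, where P_A is P with the rows and columns of the absorbing set erased; a numerical sketch of mine (not part of the notes) for the zoo chain:

```python
import numpy as np

# London Zoo chain: states 1..4, state 4 (tiger) absorbing
P = np.array([[0,   1/3, 1/3, 1/3],
              [1/3, 0,   1/3, 1/3],
              [1/3, 1/3, 0,   1/3],
              [0,   0,   0,   1  ]])
PA = P[:3, :3]                       # erase the row and column of state 4
nu = np.linalg.solve(np.eye(3) - PA, np.ones(3))
print(nu)   # approximately [3, 3, 3], as computed above
```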

In this case, the system of equations (3.4) has a unique solution. The following lemma ensures that this is the case whenever the absorption time is well defined.

Lemma 3.3. Let P be a stochastic matrix, and let A ⊂ S be a subset which can be reached from any point in S:

∀s ∈ S \ A ∃ r > 0, s′ ∈ A : p^{(r)}_{ss′} > 0 .        (3.5)

Then the linear system (3.4) has a unique solution.

Remark 3.4. If (3.5) fails, i.e. A cannot be reached from some state s, then clearly ν_s = ∞.

Proof. Without loss of generality, A = {|S| − |A| + 1, · · · , |S|}. Let P_A be the top-left (|S| − |A|) × (|S| − |A|) submatrix of P; the system (3.4) can then be written as (1 − P_A)ν = 1. We need to show that the homogeneous system (1 − P_A)v = 0 has only the trivial solution, i.e. 1 is not an eigenvalue of P_A. Recall that 1 is an eigenvalue of P since the rows add up to one; now we have erased part of the columns, therefore it is reasonable to expect that the largest eigenvalue will decrease. To make this formal, we employ an argument similar to the one in the proof of Theorem 2.38. Assume that P_A v = v. Denote

P̄_A = (p̄_{A,ss′})_{s,s′∈S\A} = (1/R) ∑_{r=0}^{R−1} P_A^r ,


where R is large enough to ensure that

∀s ∈ S \ A ∃ 1 ≤ r < R, s′ ∈ A : p^{(r)}_{ss′} > 0 .

Then ∑_{s′∈S\A} p̄_{A,ss′} < 1 for any s ∈ S \ A, whence

∑_{s∈S\A} |v(s)| ≤ ∑_{s,s′∈S\A} p̄_{A,ss′} |v(s′)| < ∑_{s′∈S\A} |v(s′)| ,

i.e. v ≡ 0.

This completes the proof of

Theorem 3.5. Let P be a transition array on a finite state space S, and let A ⊂ S be an absorbing set satisfying (3.5). Then the mean absorption times ν_s are uniquely determined by (3.2) and (3.4).

Remark 3.6. The theorem applies equally well to chains without absorbing states. Let S = {1, 2, 3, 4}, and let

p_{ss′} = { 1/3 , s′ ≠ s
            0 ,   otherwise        (3.6)

(i.e. the tiger is replaced with a friendly crocodile). Let (Y_n) be the stochastic process thus obtained. What is

E(T^Y_4 | Y_0 = s) , where T^Y_4 = min{ n ≥ 0 | Y_n = 4 } ?

We observe that the answer is the same as for the London Zoo Markov chain above. Indeed, let

X_n = { Y_n , n < T^Y_4
        4 ,   n ≥ T^Y_4 .

Clearly, X_n has the same distribution as the London Zoo process, and T^Y_4 = T^X_4. Thus

E(T^Y_4 | Y_0 = s) = E(T^X_4 | X_0 = s) = { 0 , s = 4
                                            3 , otherwise .

This is an instance of a coupling argument.

Example 3.7. Consider the random walk on the combinatorial graph in Figure 10. What is the expected time until we reach the first or the fifth vertex, when the walk starts from s ∈ {1, 2, 3, 4, 5}?
Solution: we are looking for ν_s = E(T_A | X_0 = s) for A = {1, 5}. A is not an


Figure 10: The combinatorial graph from Example 3.7 (a path on the vertices 1, 2, 3, 4, 5).

absorbing set; however, the previous analysis still applies, as follows. Clearly, ν_1 = ν_5 = 0. For the other values of s, we set up the linear system

ν_2 = (ν_1 + ν_3)/2 + 1 , ν_3 = (ν_2 + ν_4)/2 + 1 , ν_4 = (ν_3 + ν_5)/2 + 1

and find that ν_2 = ν_4 = 3 and ν_3 = 4.

Exercise 3.1.2. In Example 2.5, what is the expected time of the first arrival in Room 3? What is the expected time of the first return to Room 1?


3.2 Absorption and first-visit probabilities

Suppose a Markov chain has several absorbing states, so that we are eventually absorbed in one of them. What is the probability of being absorbed in a specific absorbing state? The solution is found using a well-posed system of linear equations.

Then several applications are discussed. First, we study the one-dimensional random walk (restricted to a finite segment of the integers), and apply the result to the problem of Gambler’s Ruin. Then we consider a sequence of independent random variables and compute the expected waiting time until a given sequence of outcomes is observed.

Suppose again P is a finite transition array and A ⊂ S is a set (by the coupling argument above, we can assume without loss of generality that A is absorbing). Assume that (3.5) holds, i.e. for any initial condition the chain eventually reaches A (with full probability). Then we can study the distribution of the A-valued random variable X_{T_A} (“X with subscript T_A”), which tells us what is the first vertex in A that we visit. Denote

u_{sa} = P(X_{T_A} = a | X_0 = s) , s ∈ S , a ∈ A .

If A is absorbing, these are absorption probabilities; in the general case, they could be called first-visit probabilities.

For s ∈ A, T_A = 0, hence

u_{sa} = { 1 , a = s
           0 , s ∈ A \ {a} .

For s ∈ S \ A,

u_{sa} = ∑_{s′} p_{ss′} P(X_{T_A} = a | X_1 = s′) = ∑_{s′} p_{ss′} u_{s′a}
       = ∑_{s′∈S\A} p_{ss′} u_{s′a} + ∑_{s′∈A} p_{ss′} u_{s′a} = ∑_{s′∈S\A} p_{ss′} u_{s′a} + p_{sa} .

According to Lemma 3.3, this linear system has a unique solution. Thus we have proved:

Theorem 3.8. Let P be a transition array on a finite state space S, and let A ⊂ S be a set satisfying (3.5). Then the probabilities u_{sa} are uniquely determined by the relations

u_{sa} = { ∑_{s′∈S\A} p_{ss′} u_{s′a} + p_{sa} , s ∈ S \ A
           0 , s ∈ A \ {a}
           1 , s = a .        (3.7)


Remark 3.9. In matrix form, the linear equations can be concisely written as
\[ (\mathbb{1} - P_A) \vec{u}_a = \vec{p}_a , \]
where, as above, $P_A$ is obtained by crossing out from P the rows and columns corresponding to the states in A, $\vec{u}_a$ is the vector with components $u_{sa}$, and $\vec{p}_a$ is the vector with components $p_{sa}$ (s runs over S \ A).

Example 3.10. For the random walk on the 5-vertex combinatorial graph of Example 3.7, let A = {1, 5}. Solving the equations, we obtain:
\[ u_{1,1} = 1 , \quad u_{2,1} = 3/4 , \quad u_{3,1} = 1/2 , \quad u_{4,1} = 1/4 , \quad u_{5,1} = 0 , \]
and clearly $u_{s,5} = 1 - u_{s,1}$.
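In practice, the system $(\mathbb{1} - P_A)\vec{u}_a = \vec{p}_a$ of Remark 3.9 can be solved numerically. Here is a minimal sketch, assuming (as the values above suggest) that the graph of Example 3.7 is the path 1–2–3–4–5; the variable names are ours:

```python
import numpy as np

# Absorption probabilities for simple random walk on the path 1-2-3-4-5,
# absorbed at A = {1, 5}.  We solve (I - P_A) u_a = p_a, where P_A restricts
# the transition matrix to the transient states S \ A = {2, 3, 4}.
Q = np.array([[0.0, 0.5, 0.0],   # transitions among states 2, 3, 4
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
p1 = np.array([0.5, 0.0, 0.0])   # one-step probabilities of jumping into state 1

u1 = np.linalg.solve(np.eye(3) - Q, p1)  # (u_{2,1}, u_{3,1}, u_{4,1})
```

The solver returns $u_{2,1} = 3/4$, $u_{3,1} = 1/2$, $u_{4,1} = 1/4$, matching the values above.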

Exercise 3.2.1. A caterpillar crawls along the edges of a cube, starting from the vertex marked 0 in Figure 11. Upon reaching a vertex, the caterpillar continues along one of the three edges incident to this vertex, with probability 1/3 for each. The vertices marked α and β are covered with glue. What is the probability that the caterpillar will get glued to the vertex α? What about the vertex β?

Figure 11: The caterpillar's journey (Exercise 3.2.1). The cube's vertices are labelled 0, 1, 2, 3, 4, 5, α, β.

Exercise 3.2.2. Let us modify Example 2.40 (Ehrenfest urns) as follows: the game ends when one of the two urns is empty. What is the expected length of the game? What is the probability that the left urn will be empty at the end of the game?

Exercise 3.2.3. Find a simple example of a Markov chain with a finite state space and with two absorbing states, for which the probability that the process eventually reaches an absorbing state is strictly between 0 and 1. Do the 'first-step analysis' equations for finding the probability of absorption in a particular absorbing state in your chain have a unique solution? If not, how would you identify the correct solution in your example?


Exercise 3.2.4. A Markov chain with state space {1, 2, 3, 4} has transition matrix
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1/6 & 1/6 & 1/6 & 1/2 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \]

1. Which states are absorbing?

2. Find the probability that the chain ends up in state 1, given that it starts in state 2.

Exercise 3.2.5. A Markov chain with state space {1, 2, 3, 4, 5} has transition matrix
\[ \begin{pmatrix} 0 & 2/3 & 1/3 & 0 & 0 \\ 0 & 1/3 & 1/2 & 0 & 1/6 \\ 0 & 1/2 & 0 & 1/4 & 1/4 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} . \]

The process starts in state 1.

1. State 5 is an absorbing state. Which other state is absorbing?

2. Calculate the probability that the process is absorbed in the state 5.

3. Calculate the expectation of the time of absorption.

4. Calculate the expectation of the number of visits to state 2 before absorption.

5. Suppose that you gain £10 for each visit to state 2 and lose £5 for each visit to state 3. Calculate the expectation of the amount of money you gain.

3.2.1 Several applications

One-dimensional random walk and Gambler's ruin Consider the following Markov chain: S = {L, L+1, ⋯, M},
\[ p_{ss'} = \begin{cases} p , & s \neq L, M , \ s' = s+1 \\ 1-p , & s \neq L, M , \ s' = s-1 \\ 1 , & s = s' \in \{L, M\} \\ 0 , & \text{otherwise} \end{cases} \tag{3.8} \]


for some p ∈ (0, 1]. Verbally, a walker moves right with probability p and left with probability 1 − p, and gets absorbed at the boundary. Let us find the probability $u_s = u_{s;L,M}$ that the walker starting at s is absorbed at L. According to Theorem 3.8,
\[ u_L = 1 , \quad u_M = 0 , \qquad u_s = p\, u_{s+1} + (1-p)\, u_{s-1} , \quad L < s < M . \tag{3.9} \]

For $p = \tfrac12$ it is clear that the solution is the linear function interpolating between $u_L = 1$ and $u_M = 0$, i.e.
\[ u_s = \frac{M-s}{M-L} . \]

(Cf. Example 3.10!) For other p, we observe that (3.9) is a (second-order) linear difference equation with constant coefficients, so it is natural to look for a solution of the form $\xi^s$. The exponent ξ has to satisfy
\[ \xi = p \xi^2 + (1-p) . \]
This equation has two solutions, $\xi = 1$ and $\xi = \xi_p = \frac{1-p}{p}$, thus we try $u_s = c_1 + c_2 \xi_p^s$. Plugging in s = L, M we determine $c_1$ and $c_2$:
\[ u_s = \begin{cases} \dfrac{\xi_p^M - \xi_p^s}{\xi_p^M - \xi_p^L} , & p \neq \tfrac12 \\[2mm] \dfrac{M-s}{M-L} , & p = \tfrac12 . \end{cases} \tag{3.10} \]

Let us summarise our findings. Consider the random walk $(X_n)_{n \geq 0}$ on $\mathbb{Z}$ with transition probabilities
\[ p_{ss'} = \begin{cases} p , & s' = s+1 \\ 1-p , & s' = s-1 \\ 0 , & \text{otherwise} , \end{cases} \tag{3.11} \]
where p ∈ (0, 1]. For $L \in \mathbb{Z}$, denote
\[ T_L = \min\{ n \geq 0 \mid X_n = L \} . \]
If $X_n$ never equals L, set $T_L = \infty$.

Theorem 3.11. The following holds in the setting described above.

1. For any L ≤ s ≤ M, L ≠ M, one has
\[ \mathbb{P}(T_L < T_M \mid X_0 = s) = \begin{cases} \dfrac{\xi_p^M - \xi_p^s}{\xi_p^M - \xi_p^L} , & p \neq \tfrac12 \\[2mm] \dfrac{M-s}{M-L} , & p = \tfrac12 . \end{cases} \]


2. For any L ≤ s,
\[ \mathbb{P}(T_L < \infty \mid X_0 = s) = \begin{cases} 1 , & p \leq \tfrac12 \\[1mm] \left( \dfrac{1-p}{p} \right)^{s-L} , & p > \tfrac12 . \end{cases} \]

Proof. We have proved the first part above. The second part follows by taking M → ∞:
\[ \mathbb{P}(T_L < \infty \mid X_0 = s) = \lim_{M \to \infty} \mathbb{P}(T_L < T_M \mid X_0 = s) . \]

As an application, assume that a gambler starts with a fortune s ∈ N and plays successive games in which he wins £1 with probability p and loses £1 with probability 1 − p. He quits the game if he either reaches a fortune M > s or loses all of his money. The probability of gambler's ruin, i.e. of reaching the state 0 before M, is given by

\[ u_s = \begin{cases} \dfrac{\xi_p^M - \xi_p^s}{\xi_p^M - 1} , & p \neq \tfrac12 \\[2mm] 1 - \dfrac{s}{M} , & p = \tfrac12 . \end{cases} \tag{3.12} \]
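The formula (3.12) is straightforward to evaluate numerically; here is a minimal sketch (the function name `ruin_probability` is ours):

```python
def ruin_probability(s, M, p):
    """Probability of ruin (reaching 0 before M) for a gambler who starts
    with fortune s and wins each 1-pound bet with probability p; formula (3.12)."""
    if p == 0.5:
        return 1 - s / M
    xi = (1 - p) / p
    return (xi**M - xi**s) / (xi**M - 1)
```

For a fair game the formula reduces to the linear interpolation $1 - s/M$, while an unfavourable game ($p < \tfrac12$) makes ruin more likely than that, and a favourable one less likely.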

Remark 3.12. Note that for $p = \tfrac12$ (a fair game), the expectation of the fortune of the gambler at the end of the game is equal to
\[ M(1 - u_s) = s , \]
i.e. to the initial fortune of the gambler. The same is true if the gambler stops after a fixed number of steps. It turns out that this is the case for any strategy which is honest, i.e. does not require the gambler to know the results of future games. A formal statement of this result lies outside the scope of our course.

Exercise 3.2.6. A gambler starts with a fortune s ∈ N and plays successive games in which he wins £1 with probability p and loses £1 with probability 1 − p. He quits the game if he either reaches a fortune M > s or loses all of his money.

1. What is the expected number $\nu_s$ of games that the gambler plays until he quits?

2. Check that the number $\nu_s$ from part 1 and the number $u_s$ from (3.12) are related by
\[ s + (2p-1)\nu_s = (1-u_s)M . \]
Try to find an informal explanation of this equation.


Outcomes of independent trials We start with the following question. A fair coin is tossed until the sequence HTH is observed. What is the expected number of tosses?

It is natural to construct a Markov chain. At first glance, it seems that we need 8 states, recording the last three tosses (HHH, HHT, HTH, HTT, THH, THT, TTH, TTT), as described by the graph in Figure 12. On second thought,

Figure 12: Coin tosses. All the arrows carry probability 1/2, except for the red arrow from HTH to itself, which comes with probability 1.

we need to add seven more states for the initial steps: ∅, H, T, HH, HT, TH, and TT, so that we start from ∅, go to either H or T, and so forth. Thus Theorem 3.5 yields a system of 15 equations in 15 variables.

Simplification #1: fewer states One way to simplify the problem is to replace our Markov chain with a simpler one, which carries less information.

State 3: The last three tosses were HTH;

State 2: The last two tosses were HT;

State 1: The last toss was an H, and the last three tosses were not HTH;

State 0: None of the above (i.e., the last two tosses were TT, or there has only been one toss so far and it was a T, or there have been no tosses so far).

Notice that state i corresponds to the situation where 'we have just seen the first i tosses we need'. This Markov chain has the following transition graph.


[Transition graph on {0, 1, 2, 3}: the arrows 0 → 1, 1 → 2, 2 → 3, and the loops 0 → 0 and 1 → 1, each carry probability 1/2; 2 → 0 has probability 1/2, and the loop 3 → 3 has probability 1.]

There is just one absorbing state, namely the state 3. Let T denote the time of first absorption. For each i ∈ {0, 1, 2, 3}, let $\nu_i = \mathbb{E}(T \mid X_0 = i)$. (We want to calculate $\nu_0$.) Then, by the above corollary, $\nu_0, \nu_1, \nu_2, \nu_3$ satisfy the following system of simultaneous linear equations:
\[ \nu_0 = 1 + \tfrac12 \nu_0 + \tfrac12 \nu_1 , \qquad \nu_1 = 1 + \tfrac12 \nu_1 + \tfrac12 \nu_2 , \qquad \nu_2 = 1 + \tfrac12 \nu_0 , \qquad \nu_3 = 0 . \]
Simplifying the first two of these equations gives:
\[ \tfrac12 \nu_0 = 1 + \tfrac12 \nu_1 , \qquad \tfrac12 \nu_1 = 1 + \tfrac12 \nu_2 , \qquad \nu_2 = 1 + \tfrac12 \nu_0 . \]
Substituting the third of these equations into the second gives $\nu_1 = 3 + \tfrac12 \nu_0$, and substituting this into the first gives $\nu_0 = 5 + \tfrac12 \nu_0$, so $\nu_0 = 10$.
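The same system can be solved in matrix form, $\nu = (\mathbb{1} - Q)^{-1} \mathbf{1}$, where Q holds the transition probabilities among the transient states. A minimal sketch (variable names are ours):

```python
import numpy as np

# First-step equations for the HTH chain on the transient states {0, 1, 2}
# (state 3 is absorbing, with nu_3 = 0): nu = 1 + Q nu.
Q = np.array([[0.5, 0.5, 0.0],   # from 0: T -> 0, H -> 1
              [0.0, 0.5, 0.5],   # from 1: H -> 1, T -> 2
              [0.5, 0.0, 0.0]])  # from 2: T -> 0 (H leads to absorption)

nu = np.linalg.solve(np.eye(3) - Q, np.ones(3))
```

The solver reproduces $\nu_0 = 10$, $\nu_1 = 8$, $\nu_2 = 6$, in agreement with the computation above.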

Exercise 3.2.7. A standard die is rolled repeatedly until the sum of two consecutive rolls is exactly 4.

1. Show how to model this process using a Markov chain with about 40 states.

2. Show how to model this process using a Markov chain with a substantially smaller number of states.

3. Calculate the expectation of the number of rolls made in total.

Exercise 3.2.8. You have four fair coins. You toss them all so that they randomly fall heads or tails. Those that fall tails in the first toss you pick up and toss again. You toss again those that show tails after the second toss, and so on, until all show heads.

1. Find the expected number of tosses.

2. Find the probability that the final toss involves only one coin.


Simplification #2: using the fairness of the game # Now we briefly describe an approach that simplifies the computations even further. Imagine the following game. Before the n-th step of the game, a gambler joins the game with a fortune of £1. He bets his money on a head. If he loses, he is out of the game. If he wins, he bets his £2 on a tail on the next step. If he wins again, he bets his £4 on a head. The game stops when the sequence HTH is observed.

If HTH is observed on the steps T − 2, T − 1, T, then the T-th gambler leaves the game with £2 (a gain of £1), the (T − 2)-nd with £8 (a gain of £7), and all the other ones are ruined. Thus the total gain of the gamblers is equal to
\[ 1 + 7 - (T - 2) = 10 - T . \]
If we believe in Remark 3.12, this random variable should have zero mean, i.e. $\mathbb{E}\, T = 10$. Let us justify this without appealing to Remark 3.12.

For each state s, define the value of s to be the total gain of the gamblers on the last three steps; see Figure 13. We can also define val(HT) = 4, et cetera. Now

Figure 13: Coin tosses (bis). The values of the states: HHH/£2, HHT/£4, HTH/£10, HTT/£0, THH/£2, THT/£4, TTH/£2, TTT/£0.

we note that
\[ \mathrm{val}(s) + 1 = \sum_{s'} p_{ss'} \, \mathrm{val}(s') , \]
since each gambler's bet is fair (with probability 1/2 he loses his stake and with probability 1/2 he doubles it), while the newly joined gambler contributes £1. These equations look very similar to those of Theorem 3.5, and indeed, if we set $\nu_s = 10 - \mathrm{val}(s)$, we get a solution of the latter equations! Thus we have shown that
\[ \nu_0 = 10 , \quad \nu_1 = 8 , \quad \nu_2 = 6 \]


without any computation.
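The fairness identity $\mathrm{val}(s) + 1 = \sum_{s'} p_{ss'} \mathrm{val}(s')$ can be sanity-checked mechanically over the eight states of Figure 13; this sketch (our code, not part of the notes) walks over all non-absorbing states:

```python
# Values of the eight coin-toss states (in pounds), from Figure 13;
# the state HTH is absorbing.
val = {'HHH': 2, 'HHT': 4, 'HTH': 10, 'HTT': 0,
       'THH': 2, 'THT': 4, 'TTH': 2, 'TTT': 0}

def step(s, toss):
    """State after one more toss: keep only the last three outcomes."""
    return (s + toss)[-3:]

checked = []
for s in val:
    if s == 'HTH':
        continue  # the identity is only claimed for non-absorbing states
    avg = (val[step(s, 'H')] + val[step(s, 'T')]) / 2  # sum over s' of p(s,s') val(s')
    checked.append(avg == val[s] + 1)
```

All seven non-absorbing states pass the check, confirming the "fair game" bookkeeping behind the shortcut.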

Exercise 3.2.9. A monkey is typing on a typewriter whose only keys are the capital letters A through Z of the Roman alphabet. It types a letter per second; each letter appears with probability 1/26, independently of the previous text. What is the expected time it will take for the word ABRACADABRA to appear?

Exercise 3.2.10. Solve Exercise 3.2.7 without computations.

"


3.3 Recurrence and transience: finite chains

A state is recurrent if a chain starting from this state almost surely eventually returns to it. Otherwise the state is called transient. If the state space is finite, a state is transient if and only if a walk starting from this state can be eventually absorbed in an absorbing set not containing our state (colloquially, we return to the origin unless we are eaten by a tiger).

Consider a Markov chain $(X_n)$ with finite or countable state space S and transition probabilities $p_{ss'}$. For s ∈ S, let
\[ \tau_s = \min\{ n \geq 1 \mid X_n = s \} , \qquad f_s = \mathbb{P}(\tau_s < \infty \mid X_0 = s) . \]

Definition 3.13. The state s is called recurrent if $f_s = 1$, and transient otherwise.

For example, any absorbing state is recurrent. If A ⊂ S is an absorbing set, and s ∈ S \ A is such that, for some r ≥ 1,
\[ \mathbb{P}(X_r \in A \mid X_0 = s) > 0 , \tag{3.13} \]
then s is transient. In the case of a finite state space, these simple observations allow us to determine whether a state is transient or not.

Theorem 3.14. Consider a Markov chain $(X_n)$ with finite state space S and transition probabilities $p_{ss'}$, and let s ∈ S. The following properties are equivalent:

1. s is transient;

2. there exists an absorbing set A ⊂ S \ {s} such that (3.13) holds for some r ≥ 1.

Proof. (2) =⇒ (1) is clear. To prove the converse, we assume that (3.13) fails for all r ≥ 1, and show that s is recurrent. Let
\[ A = \big\{ a \in S \,\big|\, \forall r \geq 1 \ \ p^{(r)}_{as} = 0 \big\} \]
be the set of all states from which one can never reach s. It is clear that A is an absorbing set (why?). Next, by assumption (the negation of (3.13)),
\[ \forall r \geq 1 \quad \mathbb{P}(X_r \in A \mid X_0 = s) = 0 , \]
i.e. a chain starting at s will never visit A:
\[ \mathbb{P}(\exists r \geq 0 : X_r \in A \mid X_0 = s) = 0 . \]


According to the definition of A, there exists R such that
\[ p = \min_{s' \in S \setminus A} \, \max_{1 \leq r \leq R} p^{(r)}_{s's} > 0 , \]
i.e.
\[ \min_{s' \in S \setminus A} \mathbb{P}\big( \{X_1 = s\} \cup \{X_2 = s\} \cup \cdots \cup \{X_R = s\} \,\big|\, X_0 = s' \big) \geq p . \]
Denote
\[ E_k = \{X_{kR+1} = s\} \cup \{X_{kR+2} = s\} \cup \cdots \cup \{X_{(k+1)R} = s\} ; \]
then for any k ≥ 1,
\[ \min_{s' \in S \setminus A} \mathbb{P}(E_k \mid X_{kR} = s') \geq p , \qquad \max_{s' \in S \setminus A} \mathbb{P}(\Omega \setminus E_k \mid X_{kR} = s') \leq 1 - p . \]
This implies that the numbers $q_k = \mathbb{P}(\Omega \setminus (E_1 \cup \cdots \cup E_k) \mid X_0 = s)$ satisfy
\[ \begin{aligned} q_k &= \mathbb{P}\big( \Omega \setminus E_k \,\big|\, \{X_0 = s\} \cap (\Omega \setminus (E_1 \cup \cdots \cup E_{k-1})) \big) \, q_{k-1} \\ &\leq \max_{s' \in S \setminus A} \mathbb{P}(\Omega \setminus E_k \mid X_{kR} = s') \, q_{k-1} \leq (1-p) q_{k-1} \leq \cdots \leq (1-p)^k , \end{aligned} \]
where we used the Markov property and the fact that, conditionally on $X_0 = s$, one has $X_{kR} \in S \setminus A$ almost surely. In particular, $q_k \to 0$ as k → ∞, i.e. s is recurrent:
\[ \mathbb{P}\Big( \bigcup_{k \geq 1} E_k \,\Big|\, X_0 = s \Big) = 1 . \]

Exercise 3.3.1. Find the recurrent and transient states of the chains whose transition graphs are depicted in Figure 14. [Answer: the recurrent states are marked in green.]

Figure 14: The transition graphs from Exercise 3.3.1. The probabilities corresponding to the arrows can be arbitrary positive numbers.


Exercise 3.3.2. Find the recurrent and the transient states for the following transition matrices:
\[ \begin{pmatrix} 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 2/3 & 0 \\ 0 & 2/3 & 1/3 & 0 \\ 1/3 & 1/3 & 1/4 & 1/12 \end{pmatrix} , \qquad \begin{pmatrix} 1/3 & 0 & 0 & 2/3 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 1/3 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 3/4 & 0 & 0 & 1/4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/3 & 0 & 0 & 2/3 & 0 & 0 \\ 0 & 0 & 0 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 0 & 0 & 0 & 0 & 0 & 3/4 & 0 & 0 & 1/4 \\ 2/3 & 0 & 0 & 0 & 0 & 0 & 1/3 & 0 & 0 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 0 & 0 & 0 \\ 0 & 0 & 1/4 & 0 & 0 & 0 & 0 & 0 & 3/4 \end{pmatrix} . \]


3.4 Recurrence and transience: infinite chains

A state in a countable Markov chain is transient if and only if the mean number of returns to it is finite. Simple random walk is recurrent on $\mathbb{Z}$ and $\mathbb{Z}^2$, but not on $\mathbb{Z}^3$.

If the state space is infinite, the criterion of Theorem 3.14 is no longer valid.

Example 3.15. Consider the random walk on $\mathbb{Z}$:
\[ S = \mathbb{Z} , \qquad p_{ss'} = \begin{cases} p , & s' = s+1 \\ 1-p , & s' = s-1 \\ 0 , & \text{otherwise.} \end{cases} \]

For p ∈ (0, 1) there are no proper absorbing sets at all, so the condition of Theorem 3.14 fails for every such p. On the other hand, we have:

Claim. The state s = 0 is recurrent if and only if $p = \tfrac12$.

Proof. Indeed, observe that
\[ \mathbb{P}(\tau_0 < \infty \mid X_0 = 0) = p \, \mathbb{P}(\tau_0 < \infty \mid X_1 = 1) + (1-p) \, \mathbb{P}(\tau_0 < \infty \mid X_1 = -1) . \]
Let us compute the probability $\mathbb{P}(\tau_0 < \infty \mid X_1 = 1)$. It is equal to the probability that the process $(Y_n)$ on $\mathbb{Z}_+$ with the transition probabilities (3.11) is absorbed at 0. According to Theorem 3.11, part (2), this is equal to
\[ \mathbb{P}(\tau_0 < \infty \mid X_1 = 1) = \min\Big( \frac{1-p}{p} , 1 \Big) . \]
Similarly,
\[ \mathbb{P}(\tau_0 < \infty \mid X_1 = -1) = \min\Big( \frac{p}{1-p} , 1 \Big) . \]
If $p = \tfrac12$, both expressions are equal to 1, hence $\mathbb{P}(\tau_0 < \infty \mid X_0 = 0) = 1$. If $p \neq \tfrac12$, one of the expressions is strictly less than one, hence $\mathbb{P}(\tau_0 < \infty \mid X_0 = 0) < 1$.

Unfortunately, the analysis above cannot be directly extended to other Markov chains, such as the two-dimensional random walk. Therefore we develop another approach to recurrence.

Denote by $\#_s$ the number of visits to s, i.e.
\[ \#_s = \# \{ n \geq 1 : X_n = s \} = \sum_{n=1}^{\infty} \mathbb{1}_{X_n = s} . \]

Proposition 3.16. Consider a Markov chain on a finite or countable state space S, and let s ∈ S.


1. If $f_s < 1$, then, conditionally on $X_0 = s$, $\#_s \sim \mathrm{Geom}(1 - f_s)$, i.e.
\[ \mathbb{P}(\#_s = k \mid X_0 = s) = (1 - f_s) f_s^k , \qquad k \geq 0 . \]

2. If $f_s = 1$ (i.e. s is recurrent), then $\mathbb{P}(\#_s = \infty \mid X_0 = s) = 1$.

Proof. Let us show that the conditional probability
\[ \mathbb{P}(\#_s \geq k+1 \mid X_0 = s , \, \#_s \geq k) \]
is equal to $f_s$. Denote by $\tau_s^{(k)}$ the time of the k-th return to s:
\[ \tau_s^{(1)} = \tau_s , \qquad \tau_s^{(k+1)} = \min\{ n > \tau_s^{(k)} \mid X_n = s \} . \]
If $\#_s = k$, then $\tau_s^{(k+1)} = \tau_s^{(k+2)} = \cdots = \infty$. Then, by the theorem of total probability,
\[ \mathbb{P}(\#_s \geq k+1 \mid X_0 = s) = \sum_{n=k}^{\infty} \mathbb{P}(\#_s \geq k+1 \mid X_0 = s , \, \tau_s^{(k)} = n) \, \mathbb{P}(\tau_s^{(k)} = n \mid X_0 = s) . \]
Now, by the Markov property,
\[ \mathbb{P}(\#_s \geq k+1 \mid X_0 = s , \, \tau_s^{(k)} = n) = \mathbb{P}(\exists m > n : X_m = s \mid X_n = s) = f_s , \]
hence
\[ \mathbb{P}(\#_s \geq k+1 \mid X_0 = s , \, \#_s \geq k) = f_s , \]
as claimed. Iterating, this implies that
\[ \mathbb{P}(\#_s \geq k \mid X_0 = s) = f_s^k . \]

Exercise 3.4.1. Find the distribution of #0 in Example 3.15.

Theorem 3.17. Consider a Markov chain on a finite or countable state space S. A state s ∈ S is recurrent if and only if
\[ \sum_{r=1}^{\infty} p^{(r)}_{ss} = \infty . \tag{3.14} \]

Proof. If (3.14) fails, then
\[ \mathbb{E}(\#_s \mid X_0 = s) = \sum_{n=1}^{\infty} \mathbb{E}(\mathbb{1}_{X_n = s} \mid X_0 = s) = \sum_{n=1}^{\infty} \mathbb{P}(X_n = s \mid X_0 = s) = \sum_{n=1}^{\infty} p^{(n)}_{ss} < \infty , \]


whence $\#_s$ is almost surely finite, and then s is transient according to Proposition 3.16. If (3.14) holds, then
\[ \mathbb{E}(\#_s \mid X_0 = s) = \infty , \tag{3.15} \]
whence s is recurrent (again, according to Proposition 3.16).

One-dimensional random walk Let us recover the conclusion of Example 3.15 using Theorem 3.17. Observe that $p^{(2n+1)}_{00} = 0$, while
\[ p^{(2n)}_{00} = \binom{2n}{n} p^n (1-p)^n . \]

If $p \neq \tfrac12$, then $4p(1-p) < 1$, whence
\[ \binom{2n}{n} p^n (1-p)^n \leq 2^{2n} p^n (1-p)^n = (4p(1-p))^n \]
decays exponentially with n, and
\[ \sum_{n \geq 1} \binom{2n}{n} p^n (1-p)^n \leq \sum_{n \geq 1} (4p(1-p))^n = \frac{4p(1-p)}{1 - 4p(1-p)} < \infty , \]
and the random walk is transient. If $p = \tfrac12$, we need more precise bounds.

Fact 3.18 (Stirling's approximation). $\sqrt{n}\,(n/e)^n \leq n! \leq 100 \sqrt{n}\,(n/e)^n$.

This implies:
\[ \binom{2n}{n} 4^{-n} = \frac{(2n)!}{(n!)^2} \, 4^{-n} \geq \frac{\sqrt{2n}\,(2n/e)^{2n}}{\big(100\sqrt{n}\,(n/e)^n\big)^2} \, 4^{-n} \geq \frac{1}{10^4 \sqrt{n}} , \]
whence
\[ \sum_{n \geq 1} \binom{2n}{n} 4^{-n} \geq \sum_{n \geq 1} \frac{1}{10^4 \sqrt{n}} = \infty \]
and the random walk is recurrent, i.e. all of its states are recurrent.
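The dichotomy between the two cases is easy to see numerically. The sketch below (our code) computes $p^{(2n)}_{00} = \binom{2n}{n} p^n (1-p)^n$ via the product identity $\binom{2n}{n} 4^{-n} = \prod_{k=1}^{n} \frac{2k-1}{2k}$, which avoids huge intermediate integers; for $p = \tfrac12$ the terms decay like $1/\sqrt{\pi n}$ (the sharp form of Stirling's approximation), so their sum diverges, while for $p \neq \tfrac12$ they are dominated by a geometric sequence:

```python
import math

def return_prob(n, p):
    """p^(2n)_00 = C(2n, n) p^n (1-p)^n for the one-dimensional random walk."""
    t = 1.0  # running value of C(2n, n) / 4^n
    for k in range(1, n + 1):
        t *= (2 * k - 1) / (2 * k)
    return t * (4 * p * (1 - p)) ** n

# p != 1/2: terms are below the geometric sequence (4 p (1-p))^n = 0.96^n.
terms_biased = [return_prob(n, 0.6) for n in range(1, 200)]
# p = 1/2: the term at n = 1000 is still of order 1/sqrt(pi * 1000).
t_fair = return_prob(1000, 0.5)
```

The geometric domination makes the biased sum finite, while the $1/\sqrt{\pi n}$ decay for the fair walk is too slow for convergence.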

Exercise 3.4.2. Prove that (with full probability) the simple ($p = \tfrac12$) one-dimensional random walk visits all the integer points infinitely many times.

Exercise 3.4.3. Consider the chain with state space $\mathbb{Z}$ and transition probabilities
\[ p_{ss'} = \begin{cases} p , & s' = s+1 , \ s < 0 \\ 1-p , & s' = s-1 , \ s < 0 \\ q , & s' = s+1 , \ s \geq 0 \\ 1-q , & s' = s-1 , \ s \geq 0 . \end{cases} \]
For which p, q ∈ [0, 1] is 0 a recurrent state?


Exercise 3.4.4. Consider the following array of transition probabilities on the state space $\mathbb{Z}_+ = \{0, 1, 2, \cdots\}$:
\[ p_{ss'} = \begin{cases} p , & s' = s+2 , \ s > 1 \\ 1-p , & s' = s-2 , \ s > 1 \\ q , & s' = s = 0 \\ 1-q , & s' = 1 , \ s = 0 \\ 1 , & s' = 3 , \ s = 1 \\ 0 , & \text{otherwise.} \end{cases} \]

1. Draw the transition graph of this chain.

2. For each p, q ∈ [0, 1], determine which states are recurrent and which ones are transient.

Exercise 3.4.5. In this exercise sign(s) is +1 if s is positive and −1 if s is negative. Consider the following array of transition probabilities on the state space $\mathbb{Z}$:
\[ p_{ss'} = \begin{cases} p , & s' = s + \mathrm{sign}(s) , \ s \neq 0 \\ q , & s' = s - \mathrm{sign}(s) , \ s \neq 0 \\ 1-p-q , & s' = -s , \ s \neq 0 \\ 1 , & s' = s = 0 \\ 0 , & \text{otherwise.} \end{cases} \]
For each p ∈ [0, 1] and q ∈ [0, 1 − p], determine which states are recurrent and which ones are transient.

d-dimensional random walk

Theorem 3.19 (Pólya). The simple d-dimensional random walk is recurrent for d = 1, 2 and transient for d ≥ 3.

Proof. The basic idea is as follows. Denote by $N_1$ the number of moves in the direction of the x-axis (out of the first n moves), by $N_2$ the number of moves in the direction of the y-axis, and so forth. On each step, the probability of making a move in the direction of the x-axis is 1/d, therefore $\mathbb{E} N_1 = n/d$ and we can expect that usually $N_1 \approx n/d$. Suppose we had an exact equality $N_j = n/d$ for each 1 ≤ j ≤ d, and suppose moreover that all these numbers were even. The random walk returns to the origin if and only if, for each j, the number of steps in the direction of the j-th vector of the standard basis is equal to the number of steps in the opposite direction. By the computation that we have done for the one-dimensional walk, this probability is of order
\[ \prod_{j=1}^{d} \frac{1}{\sqrt{N_j}} \approx \frac{\mathrm{const}}{n^{d/2}} . \tag{3.16} \]
Now we see that the sum of these numbers diverges if d = 1, 2 and converges if d ≥ 3.

# Let us justify this argument. We have already settled the case d = 1.

Of course, we cannot say that $N_j = n/d$. However, we can show that usually $N_j \geq n/(2d)$. Indeed, by the (exponential) Chebyshev inequality,
\[ \mathbb{P}\Big( N_j \leq \frac{n}{2d} \Big) = \mathbb{P}\big( e^{-tN_j} \geq e^{-tn/(2d)} \big) \leq e^{tn/(2d)} \, \mathbb{E}\, e^{-tN_j} . \tag{3.17} \]
Each $N_j$ is a sum of n independent Bernoulli(1/d) random variables $\nu_i$, hence
\[ \mathbb{E}\, e^{-tN_j} = \mathbb{E}\, e^{-t \sum_{i=1}^{n} \nu_i} = \mathbb{E} \prod_{i=1}^{n} e^{-t\nu_i} = \prod_{i=1}^{n} \mathbb{E}\, e^{-t\nu_i} = \Big( 1 - \frac{1 - e^{-t}}{d} \Big)^n \leq \Big( 1 - \frac{t - t^2/2}{d} \Big)^n \leq e^{-(t - t^2/2)n/d} , \]
where we have used that $e^{-t} \leq 1 - t + t^2/2$. Therefore
\[ (3.17) \leq e^{-t(1-t)n/(2d)} . \]
Choosing $t = \tfrac12$, we obtain
\[ \mathbb{P}\Big( \min_j N_j \leq \frac{n}{2d} \Big) = \mathbb{P}\Big( \exists\, 1 \leq j \leq d : N_j \leq \frac{n}{2d} \Big) \leq d\, e^{-n/(8d)} . \tag{3.18} \]

This allows us to complete the proof for d ≥ 3. Indeed,
\[ \begin{aligned} \mathbb{P}(X_n = 0) &= \mathbb{P}\Big( X_n = 0 \,\Big|\, \min_j N_j \geq \frac{n}{2d} \Big)\, \mathbb{P}\Big( \min_j N_j \geq \frac{n}{2d} \Big) + \mathbb{P}\Big( X_n = 0 \,\Big|\, \min_j N_j \leq \frac{n}{2d} \Big)\, \mathbb{P}\Big( \min_j N_j \leq \frac{n}{2d} \Big) \\ &\leq \mathbb{P}\Big( X_n = 0 \,\Big|\, \min_j N_j \geq \frac{n}{2d} \Big) + \mathbb{P}\Big( \min_j N_j \leq \frac{n}{2d} \Big) \\ &\leq \mathbb{P}\Big( X_n = 0 \,\Big|\, \min_j N_j \geq \frac{n}{2d} \Big) + d\, e^{-n/(8d)} . \end{aligned} \]

Now, by Stirling's approximation,
\[ \mathbb{P}(X_n = 0 \mid N_1, \cdots, N_d) = \prod_{j=1}^{d} \begin{cases} 2^{-N_j} \binom{N_j}{N_j/2} , & N_j \text{ is even} \\ 0 , & N_j \text{ is odd} \end{cases} \ \leq \ \prod_{j=1}^{d} \frac{10^4}{\sqrt{N_j}} ,
\]


hence
\[ \mathbb{P}\Big( X_n = 0 \,\Big|\, \min_j N_j \geq \frac{n}{2d} \Big) \leq \Big( \frac{\sqrt{2d} \times 10^4}{\sqrt{n}} \Big)^d = \frac{C}{n^{d/2}} , \]
and finally
\[ \sum_{n=1}^{\infty} \mathbb{P}(X_n = 0) \leq \sum_{n=1}^{\infty} \Big[ \frac{C}{n^{d/2}} + d\, e^{-n/(8d)} \Big] < \infty . \]
Thus the random walk in d ≥ 3 is transient.

Now let us show that the random walk in two dimensions is recurrent. We are in trouble if one of the $N_j$ is odd: then $X_n$ cannot be zero. Luckily, we have

Exercise 3.4.6. Prove that if n is even, $\mathbb{P}(\text{all } N_j \text{ are even}) \geq 3^{-d}$.

Now we argue as before: for even n,
\[ \begin{aligned} \mathbb{P}(X_n = 0) &\geq \mathbb{P}(X_n = 0 \mid N_1 \text{ and } N_2 \text{ are even}) \, \mathbb{P}(N_1 \text{ and } N_2 \text{ are even}) \\ &\geq 3^{-2} \, \mathbb{P}(X_n = 0 \mid N_1 \text{ and } N_2 \text{ are even}) \geq 3^{-2} \prod_{j=1}^{2} \frac{1}{10^4 \sqrt{n}} = \frac{c}{n} \end{aligned} \]
(using the lower bound in Stirling's approximation and $N_j \leq n$), whence
\[ \sum_{n=1}^{\infty} \mathbb{P}(X_n = 0) = \infty \]
and we have recurrence. "

Remark 3.20. The same conclusion holds for the diagonal random walk, defined as follows:
\[ p_{ss'} = \begin{cases} 2^{-d} , & |s'_j - s_j| = 1 \ \text{for all } 1 \leq j \leq d \\ 0 , & \text{otherwise.} \end{cases} \]
In this case the main idea of the argument is the same, but the implementation is simpler: the coordinates of the walk are independent, hence the n-step return probability $p^{(n)}_{00}$ is exactly equal to the d-th power of the same probability in dimension one. Thus the estimate (3.16) concludes the proof.

If you are not convinced, compare Figure 15 with Figure 1.
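For the diagonal walk the independence of coordinates gives the exact formula $p^{(2n)}_{00} = \big( \binom{2n}{n} 4^{-n} \big)^d$, so the Pólya dichotomy can be observed directly on partial sums. A sketch (our code; `partial_sums` is an assumed helper name):

```python
def partial_sums(d, N):
    """Partial sums of p^(2n)_00 = (C(2n, n) / 4^n)^d for the diagonal walk."""
    t = 1.0             # t = C(2n, n) / 4^n, via t_n = t_{n-1} * (2n-1)/(2n)
    total, sums = 0.0, {}
    for n in range(1, N + 1):
        t *= (2 * n - 1) / (2 * n)
        total += t ** d
        sums[n] = total
    return sums

s2 = partial_sums(2, 20000)  # terms ~ 1/(pi n): logarithmic divergence
s3 = partial_sums(3, 20000)  # terms ~ (pi n)^(-3/2): convergent series
```

Between n = 200 and n = 20000 the d = 2 sum still gains more than 1, while the d = 3 sum barely moves, in line with the divergence/convergence claim of (3.16).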

Exercise 3.4.7. k one-dimensional random walkers leave the origin and independently start performing simple random walk.

1. Describe the stochastic process as a Markov chain.

2. For which k will the walkers almost surely meet again at the origin?

3. For which k will the walkers almost surely meet again?


Figure 15: The first 2000 steps from the path of a three-dimensional random walk


4 Continuous-time Markov chains

In the second part of the course, we consider continuous-time stochastic processes. For us, a continuous-time stochastic process is a collection $(X(t))_{t \in \mathbb{R}_+}$ of random variables defined on the same probability space Ω, indexed by t ∈ [0, ∞) and taking values in a state space S. We do not assume that X(t) depends continuously on t (this will almost never hold in our examples); however, some assumptions are implicitly made.


4.1 Motivating example and construction of a Poisson process

A lion has an infinite supply of rabbits in his den; each rabbit weighs 1 kg. At the end of each hour, the lion eats a rabbit with probability p > 0. Denote by $X^{(1)}(t)$ the mass of the lion after t hours.⁹ Clearly, the values of $X^{(1)}(n)$ at natural n form a Markov chain with state space $\mathbb{Z}_+$ and transition probabilities
\[ p_{ss'} = \begin{cases} p , & s' = s+1 \\ 1-p , & s' = s \\ 0 , & \text{otherwise.} \end{cases} \]
Equivalently,
\[ X^{(1)}(t) = X^{(1)}(0) + \sum_{1 \leq n \leq t} \xi_n , \tag{4.1} \]

where $\xi_n \sim \mathrm{Bernoulli}(p)$ are independent Bernoulli variables. Observe that for 0 ≤ t ≤ t′,
\[ X^{(1)}(t') - X^{(1)}(t) = \sum_{t < n \leq t'} \xi_n , \]
whence

for any $0 \leq t_1 \leq t'_1 \leq t_2 \leq t'_2 \leq \cdots \leq t_J \leq t'_J$, the random variables $(X^{(1)}(t'_j) - X^{(1)}(t_j))_{j=1}^{J}$ are independent, and $X^{(1)}(t'_j) - X^{(1)}(t_j) \sim \mathrm{Binom}(\lfloor t'_j \rfloor - \lceil t_j \rceil, p)$. (4.2)

One drawback of this model is that it assumes that the lion has a meal exactly at the end of an hour. Perhaps a more realistic model would go as follows: every minute the lion eats a rabbit with probability p/60. Let us denote the new process by $X^{(60)}(t)$. For this model, (4.2) is replaced with

for any (real) $0 \leq t_1 \leq t'_1 \leq t_2 \leq t'_2 \leq \cdots \leq t_J \leq t'_J$, the random variables $(X^{(60)}(t'_j) - X^{(60)}(t_j))_{j=1}^{J}$ are independent, and $X^{(60)}(t'_j) - X^{(60)}(t_j) \sim \mathrm{Binom}(\lfloor 60 t'_j \rfloor - \lceil 60 t_j \rceil, p/60)$. (4.3)

Going further, let us allow the lion to have a meal every k-th part of an hour, with probability p/k. We obtain a process $X^{(k)}(t)$ which satisfies

for any $0 \leq t_1 \leq t'_1 \leq t_2 \leq t'_2 \leq \cdots \leq t_J \leq t'_J$, the random variables $(X^{(k)}(t'_j) - X^{(k)}(t_j))_{j=1}^{J}$ are independent, and $X^{(k)}(t'_j) - X^{(k)}(t_j) \sim \mathrm{Binom}(\lfloor k t'_j \rfloor - \lceil k t_j \rceil, p/k)$. (4.4)

⁹ Digestion is ignored in this model.


Figure 16: Three models for the lion eating rabbits: $X^{(1)}(t)$ in blue, $X^{(60)}(t)$ in orange and X(t) in red. Here p = 0.7, and the lion is assumed to have zero mass at time zero.

At this stage, it is natural to take the limit as k → ∞. Recall that a Poisson random variable $M \sim \mathrm{Poisson}(r)$ with parameter r is a random variable taking values in $\mathbb{Z}_+$ so that
\[ \mathbb{P}(M = m) = \frac{r^m}{m!} e^{-r} . \]

Exercise 4.1.1. If $M \sim \mathrm{Poisson}(r)$, then $\mathbb{E} M = \mathrm{Var}\, M = r$.

Observation. Let r > 0. Then $\mathrm{Binom}(N, r/N) \to \mathrm{Poisson}(r)$ as N → ∞, i.e. for any m ≥ 0,
\[ \binom{N}{m} (r/N)^m (1 - r/N)^{N-m} \to \frac{r^m}{m!} e^{-r} . \]

Proof.
\[ \begin{aligned} \lim_{N \to \infty} \binom{N}{m} (r/N)^m (1 - r/N)^{N-m} &= \lim_{N \to \infty} \frac{N(N-1)\cdots(N-m+1)}{m!} (r/N)^m (1 - r/N)^{N-m} \\ &= \frac{r^m}{m!} \times \lim_{N \to \infty} \frac{N(N-1)\cdots(N-m+1)}{N^m} \times (1 - r/N)^N \times (1 - r/N)^{-m} \\ &= \frac{r^m}{m!} \times 1 \times e^{-r} \times 1 = \frac{r^m}{m!} e^{-r} . \end{aligned} \]
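The convergence is fast enough to check directly; this sketch (our code, with assumed helper names `binom_pmf` and `poisson_pmf`) compares the two distributions for r = 3 and a large N:

```python
import math

def binom_pmf(N, m, q):
    """Binomial probability of m successes in N trials of success probability q."""
    return math.comb(N, m) * q**m * (1 - q)**(N - m)

def poisson_pmf(r, m):
    """Poisson probability of m events with mean r."""
    return r**m / math.factorial(m) * math.exp(-r)

r, N = 3.0, 10**6
# Largest pointwise discrepancy over the first few values of m:
gap = max(abs(binom_pmf(N, m, r / N) - poisson_pmf(r, m)) for m in range(20))
```

For N of order $10^6$ the two probability mass functions already agree to several decimal places.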


This suggests the following: if we take k → ∞, we should obtain a process X(t) which satisfies

for any $0 \leq t_1 \leq t'_1 \leq t_2 \leq t'_2 \leq \cdots \leq t_J \leq t'_J$, the random variables $(X(t'_j) - X(t_j))_{j=1}^{J}$ are independent, and $X(t'_j) - X(t_j) \sim \mathrm{Poisson}(p(t'_j - t_j))$. (4.5)

This limit is called a Poisson process.

Definition 4.1. Let λ ∈ [0, ∞) (not necessarily between 0 and 1!). The Poisson point process with intensity λ is a process X(t) taking values in $\mathbb{Z}_+$ such that

for any $0 \leq t_1 \leq t'_1 \leq t_2 \leq t'_2 \leq \cdots \leq t_J \leq t'_J$, the random variables $X(0), (X(t'_j) - X(t_j))_{j=1}^{J}$ are independent, and $X(t'_j) - X(t_j) \sim \mathrm{Poisson}(\lambda(t'_j - t_j))$. (4.6)

A limiting procedure similar to the one described above, i.e.
\[ X(t) = X(0) + \lim_{k \to \infty} \sum_{0 < n \leq kt} \xi^{(k)}_n , \qquad \xi^{(k)}_n \sim \mathrm{Bernoulli}(\lambda/k) , \]
explains why such a process exists (# however, a fully rigorous argument would require discussing the notion of convergence of stochastic processes ").

We also need to explain why the joint distribution of X(t) is uniquely determined by the distribution of X(0) and the property (4.6). This follows from the fact that for any $t_1 < t_2 < \cdots < t_J$,
\[ \begin{aligned} X(t_1) &= X(0) + (X(t_1) - X(0)) , \\ X(t_2) &= X(0) + (X(t_1) - X(0)) + (X(t_2) - X(t_1)) , \\ X(t_3) &= X(0) + (X(t_1) - X(0)) + (X(t_2) - X(t_1)) + (X(t_3) - X(t_2)) , \ \cdots \end{aligned} \]
where the terms in the brackets are independent with explicitly given distributions.
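The telescoping decomposition above is also how one simulates the process: the values $(X(t_1), \cdots, X(t_J))$ are obtained by summing independent Poisson increments. A sketch, taking X(0) = 0 (the function name `sample_poisson_process` is ours):

```python
import numpy as np

def sample_poisson_process(times, lam, rng):
    """Sample (X(t_1), ..., X(t_J)) of a Poisson process of intensity lam
    by summing independent Poisson increments over the gaps between times."""
    gaps = np.diff(np.concatenate(([0.0], times)))
    increments = rng.poisson(lam * gaps)  # one Poisson variable per gap
    return np.cumsum(increments)

rng = np.random.default_rng(0)
times = np.array([0.5, 1.0, 2.0])
samples = np.array([sample_poisson_process(times, 3.0, rng) for _ in range(20000)])
```

Every sampled path is non-decreasing, and the empirical mean of X(2) is close to λ · 2 = 6, as Exercise 4.1.1 predicts.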

Convention. If not specified otherwise, we set X(0) = 0.

Example 4.2. A uranium-238 nucleus can disintegrate into thorium-234, emitting an α-particle (a helium nucleus). The probability that a given nucleus will disintegrate in t seconds is approximately $1.5 \times 10^{-18} \times t$. However, a cubic centimetre of uranium contains approximately $2 \times 10^{23}$ nuclei, therefore on average we expect it to emit $3 \times 10^5$ particles per second. Let X(t) be the number of particles emitted during the first t seconds. We would like to devise a model that would describe the joint distribution of $(X(t))_{t \geq 0}$, i.e. the distribution of X(t) as a stochastic process. Let k ≫ 1 be a huge number, and divide


the time into short intervals of length 1/k seconds. The expected number of nuclei to decay in one tiny time interval is equal to
\[ (2 \times 10^{23}) \times (1.5 \times 10^{-18})/k = 3 \times 10^5 / k . \]
If $k \gg 10^5$, the probability that more than one nucleus will decay in a single tiny interval is
\[ \leq \binom{2 \times 10^{23}}{2} \big( 1.5 \times 10^{-18}/k \big)^2 \approx 4.5 \times (10^5/k)^2 \ll 3 \times 10^5 / k , \]

i.e. it is negligible with respect to the probability that a single nucleus will decay. Therefore the random variables
\[ \xi_n = \#\text{ of particles decaying in the } n\text{-th interval} \]
are approximately Bernoulli($3 \times 10^5/k$). Since k is very large, we can approximate X(t) by a Poisson process with parameter $\lambda = 3 \times 10^5 \ \mathrm{sec}^{-1}$. Thus
\[ \mathbb{P}(\text{no particles are emitted in } 0.01 \text{ msec}) \approx e^{-3 \times 10^5 \times 10^{-5}} = e^{-3} \approx 1/20 , \]
while
\[ \mathbb{P}(\text{five particles are emitted in } 0.02 \text{ msec}) \approx \frac{6^5}{5!} e^{-6} \approx 1/6 . \]
We see that the Poisson approximation is particularly useful on time intervals of order 0.01 msec. On much shorter intervals typically no particles are emitted, while on longer intervals the number of emitted particles is large.

Here are a few more computations that we can make.

1. What is the probability that 5 particles are emitted during an interval of length 0.03 msec, of which none are emitted during the last 0.02 msec? Answer: (3^5/5!) e^{−3} × e^{−6} = (3^5/5!) e^{−9} = (81/40) e^{−9}.

2. What is the conditional probability that 5 particles are emitted during an interval of length 0.03 msec, given that none are emitted during the last 0.02 msec? Answer: (3^5/5!) e^{−3} = (81/40) e^{−3}.
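These figures are easy to verify numerically; the following Python sketch (not part of the notes) recomputes each probability from the Poisson formula:

```python
import math

rate = 3e5  # emissions per second, from the example

# P(no particles in 0.01 msec): Poisson(rate * 1e-5) at k = 0
p_none = math.exp(-rate * 1e-5)                 # = e^{-3} ~ 1/20

# P(five particles in 0.02 msec): Poisson(6) at k = 5
p_five = 6**5 / math.factorial(5) * math.exp(-6)   # ~ 1/6

# Computation 1: 5 particles in 0.03 msec, none of them in the last 0.02 msec
p1 = 3**5 / math.factorial(5) * math.exp(-3) * math.exp(-6)   # = (81/40) e^{-9}
# Computation 2: same event, conditioned on "none in the last 0.02 msec"
p2 = p1 / math.exp(-6)                                        # = (81/40) e^{-3}

print(round(p_none, 4), round(p_five, 4), p1, p2)
```

The printed values confirm that e^{−3} ≈ 0.05 and (6^5/5!)e^{−6} ≈ 0.16.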

Exercise 4.1.2. Defects occur along the length of a filament at a rate of λ per foot.

1. Model the situation using a Poisson process.

2. Calculate the probability that there are no defects in the first foot of the filament.

3. Calculate the conditional probability that there are no defects in the second foot of the filament, given that the first foot contained a single defect.

Exercise 4.1.3. Students arrive at the lecture hall according to a Poisson process of intensity 3 students/min, starting from 8:55 a.m.

1. What is the probability that at 9:05 Sasha will start lecturing to an empty classroom?

2. What is the conditional probability that there will be more than 30 students present at 9:05, given that 28 students arrived by 9:00?

The Poisson process as a Markov process The Poisson process is constructed as a limit of Markov processes. Therefore it is natural to view it as a Markov process in its own right. Here we introduce some definitions which anticipate the general theory that we shall develop in later sections.

For each t ≤ t′ and arbitrary t_1 < ⋯ < t_k < t, n_1 ≤ ⋯ ≤ n_k ≤ n ≤ m, consider the conditional probabilities

P(X(t′) = m | X(t) = n, X(t_1) = n_1, …, X(t_k) = n_k) .

These can be rewritten as

P(X(t′) − X(t) = m − n | X(t) − X(t_k) = n − n_k, …, X(t_2) − X(t_1) = n_2 − n_1) ,

and by the property (4.6) this is equal to

P(X(t′) − X(t) = m − n) = ((λ(t′ − t))^{m−n}/(m − n)!) e^{−λ(t′−t)} .

Thus we have:

P(X(t′) = m | X(t) = n, X(t_1) = n_1, …, X(t_k) = n_k)
= P(X(t′) = m | X(t) = n) = ((λ(t′ − t))^{m−n}/(m − n)!) e^{−λ(t′−t)} .

The first equality is a Markov-type property. The second one shows that the Markov chain is homogeneous: the transition probabilities depend only on the difference between t′ and t. For each τ ≥ 0, set up an array of transition probabilities

P_{mn}(τ) = P(X(t + τ) = m | X(t) = n) = ((λτ)^{m−n}/(m − n)!) e^{−λτ} .    (4.7)

We clearly have:

P_{mn}(τ + τ′) = ∑_{ℓ=0}^{∞} P_{mℓ}(τ) P_{ℓn}(τ′) .    (4.8)
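Relation (4.8) can be checked numerically for the Poisson transition probabilities (4.7). In the sketch below (not part of the notes) the infinite sum over ℓ truncates, because P_{ℓn}(τ′) = 0 for ℓ < n and P_{mℓ}(τ) = 0 for ℓ > m:

```python
import math

def P(m, n, tau, lam=1.0):
    """Transition probability (4.7): from state n to state m in time tau."""
    if m < n:
        return 0.0
    return (lam * tau) ** (m - n) / math.factorial(m - n) * math.exp(-lam * tau)

t, tp = 0.7, 1.3
m, n = 5, 2
# Only the intermediate states n <= l <= m contribute to the sum in (4.8).
lhs = sum(P(m, l, t) * P(l, n, tp) for l in range(n, m + 1))
rhs = P(m, n, t + tp)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```

The equality is exactly the binomial theorem applied to (λτ + λτ′)^{m−n}.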


This reminds us of the relation

p_{mn}^{(r+r′)} = ∑_{ℓ=0}^{∞} p_{mℓ}^{(r)} p_{ℓn}^{(r′)}

which we would have for a discrete-time Markov chain (with state space Z_+). We can express P(τ) in terms of P(τ/2), and further in terms of P(τ/4), etc., but, unlike the case of discrete-time chains, there is no minimal unit of time. What we could do instead is take τ → 0. We clearly have

lim_{h→0+} P_{mn}(h) = P_{mn}(0) = { 1 , m = n ; 0 , otherwise } ,

but this is not very helpful. It is more useful to compute the derivative at zero:

lim_{h→0+} (P_{mn}(h) − P_{mn}(0))/h = L_{mn} = { −λ , m = n ; λ , m = n + 1 ; 0 , otherwise } .    (4.9)

The array L = (L_{mn})_{m,n∈Z_+} is called the infinitesimal generator of the Poisson process. The reason for the name is that L together with the Markov property (4.8) characterises the Poisson process (we do not make a formal statement at the moment).

It is convenient to rewrite (4.9) using the following

Notation. Write f(h) = o(h) if lim_{h→0+} f(h)/h = 0. For example, h^2 = o(h) and h^{−3}e^{−1/h} = o(h), but √h ≠ o(h) and h/10000 ≠ o(h).

Then

P_{mn}(h) = δ_{mn} + L_{mn}h + o(h) = { 1 − λh + o(h) , m = n ; λh + o(h) , m = n + 1 ; o(h) , otherwise } .    (4.10)


4.2 Waiting times and sojourn times of Poisson processes

Definition 4.3. Let X(t) be a Poisson process with X(0) = 0. For each n ∈ N, the n-th arrival time (or waiting time) is defined as

W_n = min{ t > 0 : X(t) ≥ n } ,

i.e. the time of the n-th jump of X(t). The differences

S_n = { W_1 , n = 1 ; W_n − W_{n−1} , n ≥ 2 }

are called the sojourn or interarrival times of X(t).

Our goal is to find the distribution of these random variables.

Recall that an exponential random variable with parameter λ > 0 is a random variable ξ taking values in R_+ such that

P(ξ > t) = e^{−λt} , t ≥ 0 .

Equivalently, the probability density function of ξ is equal to λe^{−λt}. We use the notation ξ ∼ Exp(λ). We have:

E ξ = ∫_0^∞ P(ξ > t) dt = ∫_0^∞ e^{−λt} dt = 1/λ .

Fact 4.4 (memoryless property). If ξ ∼ Exp(λ), then the conditional distribution of ξ − t conditioned on ξ ≥ t is also Exp(λ), for any t > 0.

Theorem 4.5. Let X(t) be a Poisson process of intensity λ with X(0) = 0. The sojourn times S_n are independent, identically distributed Exp(λ) random variables.

Proof. For n = 1, S_1 ∼ Exp(λ):

P(S_1 > s) = P(X(s) = 0) = P(X(s) − X(0) = 0) = e^{−λs} .

# For n ≥ 2, we shall check that the conditional distribution of S_n given S_1, …, S_{n−1} is Exp(λ). To this end, observe that

P(S_n > s | S_1, …, S_{n−1})
= P(X(S_1 + ⋯ + S_{n−1} + s) = n − 1 | S_1, …, S_{n−1})
= P(X(W_{n−1} + s) − X(W_{n−1}) = 0 | S_1, …, S_{n−1}) .


Given S_1, …, S_{n−1}, we know W_{n−1} = S_1 + ⋯ + S_{n−1}. Now, conditionally on W_{n−1} = τ, the variables S_1, …, S_{n−1} depend only on (X(t))_{t≤τ}, hence by the Markov property

P(X(W_{n−1} + s) − X(W_{n−1}) = 0 | W_{n−1} = τ, S_1, …, S_{n−1})
= P(X(τ + s) − X(τ) = 0 | X(τ) = n − 1) = e^{−λs} ,

as claimed. "

Warning. Please exercise caution when conditioning on continuous random variables; careless use leads to paradoxes10.

Remark 4.6. The property which we have just proved also characterises the Poisson process: if S_1, S_2, … are independent Exp(λ) random variables, then

X(t) = max{ n : S_1 + ⋯ + S_n ≤ t }

defines a Poisson process.
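Remark 4.6 also gives a practical way to simulate a Poisson process. A minimal sketch (the function name is our own, not part of the notes): build X(t) by accumulating Exp(λ) gaps and counting how many jumps fall before t. A Monte Carlo average then recovers E X(t) = λt:

```python
import random

def poisson_process_value(t, lam, rng):
    """X(t) built from i.i.d. Exp(lam) sojourn times, as in Remark 4.6."""
    total, n = 0.0, 0
    while True:
        total += rng.expovariate(lam)   # next sojourn time S_{n+1} ~ Exp(lam)
        if total > t:
            return n
        n += 1

rng = random.Random(0)
lam, t, trials = 2.0, 3.0, 20000
samples = [poisson_process_value(t, lam, rng) for _ in range(trials)]
mean = sum(samples) / trials
print(mean)   # should be close to E X(t) = lam * t = 6
```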

Theorem 4.7. For each n, the arrival time W_n has the Erlang(n, λ) distribution, i.e. its probability density function is given by

f_n(t) = { (λ^n t^{n−1}/(n−1)!) e^{−λt} , t ≥ 0 ; 0 , otherwise }

Proof. For each n,

P(W_n > t) = P(X(t) < n) = ∑_{m=0}^{n−1} (λ^m t^m/m!) e^{−λt} .

Hence the probability density function of W_n is equal to

f_n(t) = −(d/dt) P(W_n > t)
= −∑_{m=0}^{n−1} [ (λ^m t^{m−1}/(m−1)!) e^{−λt} − (λ^{m+1} t^m/m!) e^{−λt} ]
= −∑_{m=0}^{n−1} (λ^m t^{m−1}/(m−1)!) e^{−λt} + ∑_{m=1}^{n} (λ^m t^{m−1}/(m−1)!) e^{−λt}
= (λ^n t^{n−1}/(n−1)!) e^{−λt} .
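The tail formula P(W_n > t) = P(X(t) < n) from the proof can be checked against a Monte Carlo simulation that builds W_n as a sum of n independent Exp(λ) sojourn times (Theorem 4.5). A sketch with illustrative parameters:

```python
import math, random

def p_wn_gt(n, t, lam):
    """P(W_n > t) = P(X(t) < n), the tail formula from the proof."""
    return sum((lam * t) ** m / math.factorial(m) for m in range(n)) * math.exp(-lam * t)

rng = random.Random(1)
lam, n, t, trials = 1.5, 4, 2.0, 20000
# W_n is a sum of n independent Exp(lam) sojourn times.
hits = sum(1 for _ in range(trials)
           if sum(rng.expovariate(lam) for _ in range(n)) > t)
print(hits / trials, p_wn_gt(n, t, lam))   # the two values should be close
```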

Exercise 4.2.1. Customers enter a shop according to a Poisson process with rate 15 per minute. Let C(t) be the number of customers who have entered the shop after it has been open for t minutes.

10See https://en.wikipedia.org/wiki/Borel–Kolmogorov_paradox


1. Calculate the following:

a) P(C(20) = 3),

b) P(C(20) = 3 | C(10) = 1),

c) P(C(10) = 1, C(20) = 3),

d) P(C(20) = 1 | C(10) = 3),

e) P(C(10) = 1 | C(20) = 3).

(Your answers should be expressed in terms of powers of e, where necessary, but simplified as much as possible in all other ways. Briefly explain your answers with reference to the properties of the Poisson process.)

2. What is the probability that the second customer arrives within the first 15 minutes? (Again, simplify, but leave your answers in terms of powers of e where necessary.)

3. Suppose that each customer spends exactly 10 minutes in the shop. Find the distribution of the number of customers in the shop after it has been open for 1 hour. Explain your reasoning.

4. What is the expected time it takes until the third customer enters the store?

5. Divide an hour into 30 “even” minutes 0, 2, 4, …, 58 and 30 “odd” minutes 1, 3, 5, …, 59. Find the distribution of the number of customers who arrive during the even minutes of a given hour.

Exercise 4.2.2. A store is open from 10:00 in the morning to 4:00 in the afternoon. The store has a bargain hour from 1:00 to 2:00. Suppose that customers enter the store according to a Poisson process and that on average 30 customers enter per hour.

1. What is the probability distribution of the number of customers that arrive during the bargain hour in a single day? What is the expected number of customers that arrive during the bargain hour in a single day?

2. Suppose that it is known that 500 customers entered the store on a particular day. What is the probability distribution of the number of customers that arrived during the bargain hour? What is the expected number of customers that arrived during the bargain hour?

Exercise 4.2.3. Buses arrive at a bus stop according to a Poisson process of rate 8 per hour (i.e., 2/15 per minute). I arrive at the bus stop and I take the first bus that arrives.


1. What is the probability that my waiting time is less than 5 minutes?

2. What is the probability that my waiting time is between 5 and 10 minutes?

3. What is the conditional probability that my waiting time is less than 20 minutes, given that no buses arrive in the first 10 minutes?

Exercise 4.2.4. I arrive at a bus stop at a random time (for concreteness, uniformly between 9 a.m. and 10 a.m.). What is my expected waiting time under the following assumptions:

1. I am in Zurich, and the buses are equally spaced in time, 10 minutes apart?

2. I am in London, and the buses arrive according to a Poisson process with rate 6 per hour?

How would you explain the difference in your answers to the two parts to a non-mathematician? (Notice that the expected number of buses per hour is the same, namely 6, in both cases!)


4.3 Further properties of Poisson processes

Lemma 4.8 (The Superposition Lemma). Let λ, µ > 0. Let (X(t) : t ≥ 0) be a Poisson process of rate λ, and let (Y(t) : t ≥ 0) be a Poisson process of rate µ. Suppose that (X(t) : t ≥ 0) and (Y(t) : t ≥ 0) are independent of one another. Then the stochastic process (X(t) + Y(t) : t ≥ 0) is a Poisson process of rate λ + µ.

#

Proof. This follows from the infinitesimal description of the Poisson process (see (4.10) and further below).

"

Example 4.9. Suppose alpha particles arrive at a Geiger counter according to a Poisson process of rate 4 per hour, and beta particles arrive at the Geiger counter according to a Poisson process of rate 6 per hour; suppose further that these two processes are independent. What is the probability that exactly 3 particles in total (alpha particles plus beta particles) arrive at the Geiger counter in a 15-minute experiment?

Solution: let X(t) denote the number of alpha particles arriving in the first t hours of the experiment, and let Y(t) denote the number of beta particles arriving in the first t hours of the experiment. Then (X(t) : t ≥ 0) is a Poisson process of rate 4, and (Y(t) : t ≥ 0) is a Poisson process of rate 6. Hence, by the Superposition Lemma, (X(t) + Y(t) : t ≥ 0) is a Poisson process of rate 4 + 6 = 10. The probability we want is therefore

P(X(0.25) + Y(0.25) = 3) = P(Poisson(10 × 0.25) = 3)
= P(Poisson(2.5) = 3)
= e^{−2.5} · 2.5^3/3! = e^{−5/2} · 5^3/(2^3 · 6) = 125/(48 e^2 √e) .
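A quick sanity check of both the Superposition Lemma and the arithmetic: simulate the two independent processes over the 15 minutes and compare the empirical frequency of "3 particles in total" with P(Poisson(2.5) = 3). The sampler below (our own helper, not part of the notes) builds a Poisson variable from unit-rate exponential gaps, as in Remark 4.6:

```python
import math, random

rng = random.Random(2)

def poisson_sample(mu, rng):
    """Count Exp(1) arrivals up to 'time' mu: the count is Poisson(mu)."""
    total, n = 0.0, 0
    while True:
        total += rng.expovariate(1.0)
        if total > mu:
            return n
        n += 1

# Alpha particles: rate 4/hour over 0.25 h -> Poisson(1);
# beta particles: rate 6/hour over 0.25 h -> Poisson(1.5).
trials = 20000
freq = sum(1 for _ in range(trials)
           if poisson_sample(1.0, rng) + poisson_sample(1.5, rng) == 3) / trials

exact = 2.5**3 / math.factorial(3) * math.exp(-2.5)   # Poisson(2.5) at 3
print(freq, exact)   # both should be close to 0.21
```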

Theorem 4.10. Let λ > 0, and let (X(t) : t ≥ 0) be a Poisson process of rate λ. Let s > 0 and let n ∈ N. Suppose we condition on X(s) = n. Then the (conditional) joint distribution of the waiting times (W_1, W_2, …, W_n) can be found as follows. Let U_1, U_2, …, U_n be mutually independent random variables which are each uniformly distributed on [0, s]. Choose a permutation π ∈ S_n such that

U_{π(1)} ≤ U_{π(2)} ≤ ⋯ ≤ U_{π(n)} ,

i.e. the permutation π puts the random variables U_1, U_2, …, U_n in non-decreasing order of the values they take. Then the (conditional) joint distribution of the arrival times (W_1, W_2, …, W_n) is identical to the joint distribution of

(U_{π(1)}, U_{π(2)}, …, U_{π(n)}) .


This theorem says that, if, for example, particles arrive at a detector according to a Poisson process of rate λ per minute, and we are told that after 60 minutes (say), exactly 100 particles have arrived, then we can ‘generate’ the first 100 arrival times T_1, T_2, …, T_100 (with the correct conditional distribution) by taking 100 independent random variables U_1, U_2, …, U_100 each uniformly distributed on the time interval [0, 60], and then reordering them so that they are in increasing order. This is useful for calculation, as we will see below.

Proof. We give a proof for the special case n = 1. Conditionally on X(s) = 1, the probability of X(s′) = 1 (where 0 ≤ s′ ≤ s) is equal to

P(X(s′) = X(s) = 1)/P(X(s) = 1) = (λs′ e^{−λs′} × e^{−λ(s−s′)})/(λs e^{−λs}) = s′/s = P(U_1 ≤ s′) .

Example 4.11. Suppose telephone calls arrive at a call centre according to a Poisson process of rate 100 per hour, and each caller stays on the phone for exactly 10 minutes. The call centre opens at 9 am one morning. By 10 am, just five people have called, and there were enough customer service advisors to answer each call as soon as it came in. However, there is a power-cut at 10 am and the call centre does not reopen for the rest of the morning. Conditional on this information, find the expected total duration of all the calls that morning.

Solution: let us measure time in minutes from 9 am. Let X(t) be the number of calls that arrive between time 0 and time t. Then (X(t) : t ≥ 0) is a Poisson process of rate 5/3 per minute. We are told that X(60) = 5. By Theorem 4.10, conditional on X(60) = 5, we can generate the arrival times of the five calls by generating five independent random variables each uniformly distributed on the time interval [0, 60]. Let these random variables be U_1, U_2, U_3, U_4, U_5; then the arrival times of the five calls are U_1, U_2, U_3, U_4, U_5. For each i ∈ {1, 2, 3, 4, 5}, let D_i be the duration of the call that comes in at time U_i. We are being asked to find

E[D_1 + D_2 + D_3 + D_4 + D_5].

Since each D_i has the same distribution, by the linearity of expectation we have

E[D_1 + D_2 + D_3 + D_4 + D_5] = 5 E[D_1].

So we just have to find E[D_1]. If U_1 ≤ 50, then D_1 = 10, but if U_1 > 50, then D_1 = 60 − U_1. Recall that the random variable U_1 is uniformly distributed on the time interval [0, 60]. Hence,

E[D_1] = E[D_1 | U_1 ≤ 50] × P(U_1 ≤ 50) + E[D_1 | U_1 > 50] × P(U_1 > 50)
= 10 × 50/60 + E[D_1 | U_1 > 50] × 10/60
= 25/3 + (1/6) E[D_1 | U_1 > 50] .    (4.11)


Now, conditional on U_1 > 50, the conditional distribution of U_1 is uniform on the time interval (50, 60]. (Check this!) So E[U_1 | U_1 > 50] = 55. Since D_1 = 60 − U_1 whenever U_1 > 50, we have

E[D_1 | U_1 > 50] = E[60 − U_1 | U_1 > 50] = 60 − E[U_1 | U_1 > 50] = 60 − 55 = 5.

Substituting this into (4.11) gives

E[D_1] = 25/3 + (1/6) × 5 = 55/6.

Hence,

E[D_1 + D_2 + D_3 + D_4 + D_5] = 5 E[D_1] = 5 × 55/6 = 275/6.

So the expected total duration of all the calls that morning is 45 minutes and 50 seconds.
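The answer can be confirmed by a Monte Carlo experiment that implements Theorem 4.10 directly: conditional on X(60) = 5, the five arrival times are i.i.d. Uniform[0, 60] (their ordering is irrelevant for the total duration). A sketch, not part of the notes:

```python
import random

rng = random.Random(3)
trials = 20000
total = 0.0
for _ in range(trials):
    # Theorem 4.10: given X(60) = 5, arrival times are 5 i.i.d. Uniform[0, 60].
    arrivals = [rng.uniform(0.0, 60.0) for _ in range(5)]
    # Each call lasts 10 min, cut short by the power-cut at t = 60.
    total += sum(min(10.0, 60.0 - u) for u in arrivals)
mean = total / trials
print(mean)   # should be close to 275/6 = 45.83...
```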

Exercise 4.3.1. A fisherman catches eel according to a Poisson process of rate 2 fish/hour, and mackerel according to a Poisson process of rate 3 fish/hour. Find the probability distribution function of the time from his arrival at the fishing pier until he catches the second fish.

Exercise 4.3.2. District Line trains arrive at the Mile End underground station according to a Poisson process of rate 3 trains/hour. Central Line trains arrive according to a Poisson process of rate 4 trains/hour.

1. What is the expected waiting time until the first train arrives?

2. Conditionally on the event that 8 trains arrive from 9 am to 9:40 am, what is the probability that no trains arrived between 9:10 and 9:20?


4.4 Definition of a general Markov chain

Recall that a (discrete-time) Markov chain was defined using the following data: a (finite or countable) state space S; an initial distribution π_0 on S; an array of transition probabilities (p_{ss′})_{s,s′∈S} which are non-negative, (2.1), and add up to one for each s, (2.2).

To construct a continuous-time Markov chain, we still need a state space and an initial distribution. Instead of the transition probabilities, we have an array of jump rates or transition intensities (λ_{ss′})_{s≠s′} which should satisfy:

∀ s ≠ s′ : λ_{ss′} ≥ 0 ;    (4.12)

∀ s : ∑_{s′≠s} λ_{ss′} < ∞ .    (4.13)

We can think of these as follows: the sum λ_s = ∑_{s′≠s} λ_{ss′} controls the rate of jumps (jumping ahead, we can say that the time from the arrival at a state s until the departure will have an Exp(λ_s) distribution), whereas the probabilities p_{ss′} = λ_{ss′}/λ_s determine the distribution of the state s′ to which we jump.

Example 4.12. The Poisson process of intensity λ is obtained by taking S = Z_+ and

λ_{ss′} = { λ , s′ = s + 1 ; 0 , otherwise }

Definition 4.13. Let S be a finite or countable state space, π_0 a probability distribution on S, and (λ_{ss′})_{s≠s′} an array satisfying (4.12) and (4.13). A stochastic process (X(t))_{t≥0} taking values in S is called a continuous-time Markov chain with initial distribution π_0 and jump rates λ_{ss′} if

1. X(0) is distributed according to π_0;

2. there exist P_{ss′} : [0, ∞) → [0, 1] such that for any 0 ≤ t_1 ≤ t_2 ≤ ⋯ ≤ t_J ≤ t ≤ t′ and s_1, …, s_J, s, s′,

P(X(t′) = s′ | X(t_1) = s_1, …, X(t_J) = s_J, X(t) = s) = P_{ss′}(t′ − t) ;    (4.14)

3. as h → 0+,

P_{ss′}(h) = δ_{ss′} + L_{ss′}h + o(h) ,    (4.15)

where we employed the following notation:

λ_s = ∑_{s′≠s} λ_{ss′} ,  L_{ss′} = { λ_{ss′} , s′ ≠ s ; −λ_s , s′ = s } ,  δ_{ss′} = { 0 , s′ ≠ s ; 1 , s′ = s }


The array L = (L_{ss′}) is called the infinitesimal generator of X(t).

There are two important foundational questions which we shall discuss very briefly: first, given L and π_0, does such a process exist? (yes); second, is it unique? (no, but uniqueness holds for a subclass which we shall single out).

Example 4.14. Let S = {1, 2}, and let

λ_{ss′} = { a , (s, s′) = (1, 2) ; b , (s, s′) = (2, 1) ; 0 , otherwise }

Pictorially, we have the graph in Figure 17. In spite of the similarity to, say, Figure 6, note that the numbers represent intensities rather than probabilities.

Figure 17: The jump rates of the two-state continuous-time Markov chain of Example 4.14 (left) and a realisation, starting from X(0) = 1 (right).

Birth processes

Example 4.15. The following example is called a birth process. Let S = Z_+, and let λ_0, λ_1, ⋯ ≥ 0. Let

λ_{ss′} = { λ_s , s′ = s + 1 ; 0 , otherwise }

The special case λ_s ≡ λ corresponds to the Poisson process of intensity λ. Another special case is λ_n = nλ for some λ > 0. This process is called the linear or Yule birth process.

As we now show, the linear birth process describes the growth of a population (e.g. of yeast cells) where reproduction is asexual, where population growth is not limited by a lack of food, space or other resources, and where death does not occur. (This is the case, for example, when one yeast cell is placed in a large vat with plenty of space and food, in the early stages of population growth before cells start to die.)

Lemma 4.16. Let λ > 0. Suppose a population starts with one individual (member) at time 0. Suppose each individual in the population produces offspring (who are new members of the population) via asexual reproduction, according to a Poisson process of rate λ. Suppose these Poisson processes are mutually independent. Finally, suppose that no member of the population dies. Let X(t) be the number of individuals in the population at time t. Then X(t) is a linear birth process with birth rate λ, and therefore birth parameters λ_n = nλ for all n ∈ N.

Proof. It is clear that the process we have constructed is a continuous-time Markov chain. Let Y_i(t) denote the number of offspring produced by the i-th individual by time t, so that the Y_i are independent Poisson processes of rate λ. Now,

P(X(t + h) = m + 1 | X(t) = m)
= P{∃ i s.t. Y_i(t + h) − Y_i(t) = 1, Y_j(t + h) − Y_j(t) = 0 for all j ≠ i}
= m P{Y_1(t + h) − Y_1(t) = 1, Y_j(t + h) − Y_j(t) = 0 for all j ≠ 1}
= m(λh + o(h))(1 − λh + o(h))^{m−1}    (using independence)
= m(λh + o(h))(1 − (m − 1)λh + o(h))    (using a binomial expansion)
= mλh + o(h).

Moreover, we have

P(X(t + h) = m | X(t) = m) = P{Y_j(t + h) − Y_j(t) = 0 for all j}
= (1 − λh + o(h))^m
= 1 − mλh + o(h).

Hence for m′ ≠ m, m + 1,

P(X(t + h) = m′ | X(t) = m) = o(h) .

Note that we have not proved that the linear birth process is uniquely determined by its generator. This is indeed the case; see Proposition 4.21.


4.5 Construction using exponential random variables

One can construct a continuous-time Markov chain mimicking the property of the Poisson process stated in Remark 4.6. Take an infinite sequence of independent Exp(1) random variables E_n, and let p_{ss′} = λ_{ss′}/λ_s as above. Construct an auxiliary discrete-time Markov chain Y_n with initial distribution π_0 and transition probabilities p_{ss′}. Then let W_0 = 0, and for each n ≥ 1:

1. let W_n = W_{n−1} + E_n/λ_{Y_{n−1}};

2. define X(t) = Y_{n−1} for W_{n−1} ≤ t < W_n.

If (almost surely) W_∞ = lim_{n→∞} W_n = ∞, then our definition is complete. We shall assume that we are in this situation; later we shall see that this is not always the case, i.e. our construction does not always give a Markov chain in the sense of Definition 4.13.

Let us check that if W_∞ = ∞, the process that we have constructed satisfies the required properties. First, note that we could generate the random variables dynamically: after defining W_0 = 0 and choosing Y_0 ∼ π_0,

1. generate an Exp(1) random variable E_n, and set W_n = W_{n−1} + E_n/λ_{Y_{n−1}};

2. define X(t) = Y_{n−1} for W_{n−1} ≤ t < W_n, and choose Y_n so that Y_n = s with probability p_{Y_{n−1} s}.

This shows that X(t) is indeed a Markov process, i.e. some P_{ss′} exist. To check (4.15), observe that for s′ ≠ s

P_{ss′}(h) = P(X(h) = s′ | X(0) = s) = P(Y_1 = s′, W_1 < h | X(0) = s) + o(h)
= λ_s h p_{ss′} + o(h) = λ_{ss′} h + o(h) .
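The two-step construction above translates directly into a simulation routine. A sketch (the function and the dictionary representation of the rates are our own, not part of the notes), run here on the two-state chain of Example 4.14 with illustrative rates a = 1, b = 2:

```python
import random

def simulate_ctmc(rates, x0, t_max, rng):
    """Jump-chain construction of Section 4.5: hold an Exp(lambda_s) time at
    state s, then jump to s' with probability lambda_{s s'} / lambda_s.
    `rates` maps each state to a dict {next_state: intensity}."""
    path, t, state = [(0.0, x0)], 0.0, x0
    while True:
        out = rates[state]
        lam_s = sum(out.values())
        if lam_s == 0:                     # absorbing state: no more jumps
            return path
        t += rng.expovariate(lam_s)        # sojourn time ~ Exp(lambda_s)
        if t > t_max:
            return path
        r, acc = rng.random() * lam_s, 0.0
        for s2, lam in out.items():        # pick s2 with prob. lam / lam_s
            acc += lam
            if r <= acc:
                state = s2
                break
        path.append((t, state))

rng = random.Random(4)
rates = {1: {2: 1.0}, 2: {1: 2.0}}         # Example 4.14 with a = 1, b = 2
path = simulate_ctmc(rates, 1, 50.0, rng)
print(path[:5])
```

The recorded path is a list of (jump time, new state) pairs; for this chain the states alternate between 1 and 2.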

Example 4.17. Consider Example 4.14. What is the expected time until we reach state 2? The time to reach state 2 is equal to W_1 ∼ Exp(a), hence the expectation is equal to E W_1 = 1/a.

Computation of transition probabilities

Example 4.18. Consider the birth process in Example 4.15. We have:

P_{ss}(t) = P(W_1 > t | X(0) = s) = e^{−λ_s t} .    (4.16)


Further, conditioning on W_1 = t_1 we have

P_{s,s+1}(t) = P(W_1 ≤ t < W_2 | X(0) = s)
= ∫_0^t dt_1 λ_s e^{−λ_s t_1} × e^{−λ_{s+1}(t−t_1)}
= λ_s e^{−λ_{s+1} t} ∫_0^t dt_1 e^{(λ_{s+1}−λ_s)t_1}
= λ_s e^{−λ_{s+1} t} · (1/(λ_{s+1} − λ_s)) (e^{(λ_{s+1}−λ_s)t} − 1)
= (λ_s/(λ_{s+1} − λ_s)) (e^{−λ_s t} − e^{−λ_{s+1} t}) ,    (4.17)

and so forth.

Exercise 4.5.1. Compute P_{s,s+2}(t).

Explosion Consider the birth process in Example 4.15. If λ_n = e^n, a realisation of the process looks like the graph in Figure 18. What happened here?

Figure 18: A birth process with λ_n = e^n (left); zoom in around t = 2.1 (right).

We have:

E W_∞ = ∑_{n≥1} E(E_n/λ_n) = ∑_{n≥1} 1/λ_n = ∑_{n≥1} e^{−n} < ∞ ,

hence W_∞ < ∞ almost surely.

Thus the birth process that we have constructed is not a continuous-time Markov chain in the sense of Definition 4.13. We can still turn it into a Markov chain, for example, as follows: take infinitely many independent, identically distributed copies of the process we have constructed; denote them X^{(j)}(t), and denote the corresponding explosion times W_∞^{(j)}. Then set

X(t) = X^{(J+1)}( t − ∑_{j=1}^{J} W_∞^{(j)} ) ,  for ∑_{j=1}^{J} W_∞^{(j)} ≤ t < ∑_{j=1}^{J+1} W_∞^{(j)} .


The process we have defined satisfies the conditions of Definition 4.13. It is intuitively clear that it is not unique: the definition of X(t) for t ≥ W_∞^{(1)} can be modified without changing the infinitesimal generator.
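For λ_n = e^n the expected explosion time is ∑_{n≥1} e^{−n} = 1/(e − 1) ≈ 0.582, and a direct simulation of the sojourn-time sums confirms this. A sketch (the truncation at 60 terms is our own choice; the neglected tail is of order e^{−60}):

```python
import math, random

# Birth process with lambda_n = e^n: E W_infty = sum_{n>=1} e^{-n} = 1/(e - 1).
expected = 1.0 / (math.e - 1.0)           # ~ 0.582

rng = random.Random(5)
trials, terms = 20000, 60                 # beyond 60 terms the tail is negligible
mean = sum(
    sum(rng.expovariate(math.exp(n)) for n in range(1, terms + 1))
    for _ in range(trials)
) / trials
print(mean, expected)                     # the two values should be close
```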

Definition 4.19. A Markov chain which has (with positive probability) infinitely many transitions in finite time is called explosive, and the first accumulation point of transition times is called the explosion time. Otherwise, it is called non-explosive.

As hinted by the above discussion, a Markov chain is non-explosive if and only if it is uniquely determined by its infinitesimal generator. We do not prove this here; however, we state a sufficient condition for being non-explosive.

Proposition 4.20. A continuous-time Markov chain is non-explosive if (4.13) holds in the following stronger form:

sup_s ∑_{s′≠s} λ_{ss′} < ∞ .    (4.18)

In particular, any Markov chain on a finite state space is non-explosive.

For birth processes, there is a necessary and sufficient condition.

Proposition 4.21. A birth process with parameters λ_n is explosive if and only if ∑_n 1/λ_n < ∞.

Proof. Similarly to the example considered above (λ_n = e^n), if ∑_n 1/λ_n < ∞, then

E W_∞ < ∞ ,

whence the process is explosive. We do not prove the opposite direction.

Exercise 4.5.2. Consider the birth process with parameters λ_s = (s + 1)^α, α ∈ R. For which α is this process explosive?

Exercise 4.5.3. Consider a continuous-time Markov chain with state space S = {−2, −1, 0, 1, 2, …}. We suppose that

λ_{−2,−1} = λ_{−1,−2} = 1 ,  λ_{0,−1} = 2 ,  λ_{0,1} = 1 ,  λ_{n,n+1} = n^2 (n ≥ 1) ,

and all the other transition rates are zero. Suppose the process is started at X(0) = 0. What is the probability of explosion?


4.6 Chapman–Kolmogorov equations

Recall that the multi-step transition probabilities of a discrete-time Markov chain satisfy

p_{ss′}^{(r+ρ)} = ∑_{s″} p_{ss″}^{(r)} p_{s″s′}^{(ρ)} .

Theorem 4.22 (Chapman–Kolmogorov relations). Suppose X(t) is a continuous-time Markov chain as in Definition 4.13.

1. For any t, τ ≥ 0 and s, s′ ∈ S,

∑_{s″∈S} P_{ss″}(t) P_{s″s′}(τ) = P_{ss′}(t + τ) .    (4.19)

2. (backward equation) For any t ≥ 0 and s, s′ ∈ S,

(d/dt) P_{ss′}(t) = ∑_{s″∈S} L_{ss″} P_{s″s′}(t) .    (4.20)

3. (forward equation) If X(t) is non-explosive, one has for any t ≥ 0 and s, s′ ∈ S,

(d/dt) P_{ss′}(t) = ∑_{s″∈S} P_{ss″}(t) L_{s″s′} .    (4.21)

In matrix form, the first relation can be written as P(t)P(τ) = P(t + τ), while the other two assert that P′(t) = LP(t) = P(t)L.

Idea of proof. The relation (4.19) follows from the total probability formula. The other two relations should follow from the first one by differentiating in t or in τ. This is a complete proof for the case of finite state spaces; for an infinite state space, the exchange of sums and derivatives requires justification, which we shall not discuss here.

The relations (4.20) and (4.21) are differential equations for P(t). In some examples, they can be solved explicitly.

Example 4.23. Consider the birth process with parameters λ_n. The backward equation says that

P′_{ss′}(t) = ∑_{s″} L_{ss″} P_{s″s′}(t) = −λ_s P_{ss′}(t) + λ_s P_{s+1,s′}(t) ,

while the forward equation is

P′_{ss′}(t) = ∑_{s″} P_{ss″}(t) L_{s″s′} = −λ_{s′} P_{ss′}(t) + λ_{s′−1} P_{s,s′−1}(t) .


If we postulate that P_{ss′} ≡ 0 for s′ < s, either the forward or the backward equation yields

P′_{ss}(t) = −λ_s P_{ss}(t)  ⟹  P_{ss}(t) = e^{−λ_s t} .

Next,

P′_{s,s+1}(t) = −λ_s P_{s,s+1}(t) + λ_s P_{s+1,s+1}(t) = −λ_{s+1} P_{s,s+1}(t) + λ_s P_{ss}(t) .

If we use the first equation, we get

P′_{s,s+1}(t) = −λ_s P_{s,s+1}(t) + λ_s e^{−λ_{s+1} t} .

The trick is to look for a solution of the form

P_{s,s+1}(t) = A e^{−λ_s t} + B e^{−λ_{s+1} t} .

If we postulate that P_{s,s+1} is of this form, we get

P′_{s,s+1}(t) = −A λ_s e^{−λ_s t} − B λ_{s+1} e^{−λ_{s+1} t} ;

equating this with

−λ_s P_{s,s+1}(t) + λ_s e^{−λ_{s+1} t} = −λ_s (A e^{−λ_s t} + B e^{−λ_{s+1} t}) + λ_s e^{−λ_{s+1} t} ,

we obtain the equations

−λ_{s+1} B = −λ_s B + λ_s ,
A + B = 0  (since P_{s,s+1}(0) = 0),

whence

−A = B = λ_s/(λ_s − λ_{s+1}) .

Finally,

P_{s,s+1}(t) = (λ_s/(λ_{s+1} − λ_s)) e^{−λ_s t} − (λ_s/(λ_{s+1} − λ_s)) e^{−λ_{s+1} t} .

Using the forward equation instead would give the same result. These results are consistent with (4.16) and (4.17).
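The closed form can also be checked against a crude numerical integration of the forward equation P′_{s,s+1}(t) = −λ_{s+1}P_{s,s+1}(t) + λ_s P_{ss}(t), with P_{ss}(t) = e^{−λ_s t}. A sketch with hypothetical birth parameters (the Euler scheme and the parameter values are our own choices):

```python
import math

lam = {0: 1.0, 1: 3.0, 2: 5.0}   # hypothetical birth parameters lambda_s

def closed_form(s, t):
    """P_{s,s+1}(t) as in (4.17)."""
    a, b = lam[s], lam[s + 1]
    return a / (b - a) * (math.exp(-a * t) - math.exp(-b * t))

# Forward Euler for the forward equation
#   P'_{s,s+1}(t) = -lambda_{s+1} P_{s,s+1}(t) + lambda_s P_{ss}(t),
# with P_{ss}(t) = e^{-lambda_s t} and initial condition P_{s,s+1}(0) = 0.
s, t_end, steps = 0, 1.0, 200_000
h = t_end / steps
p = 0.0
for k in range(steps):
    t = k * h
    p += h * (-lam[s + 1] * p + lam[s] * math.exp(-lam[s] * t))

print(p, closed_form(s, t_end))   # the two values should agree closely
```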

Exercise 4.6.1. Compute P_{s,s+2}(t) using the Chapman–Kolmogorov equations.

Exercise 4.6.2. Individuals in a population reproduce according to the following rules. At time 0, there is one individual. While there are at most two individuals, each individual gives birth to offspring asexually according to a Poisson process of rate 4 per hour. (These Poisson processes are independent of one another.) The moment there are three (or more) individuals, however, all reproduction stops and the population remains at size 3 forever. Let X(t) denote the number of individuals in the population at time t hours. Then (X(t) : t ≥ 0) is a birth process with X(0) = 1.

1. Write down the birth parameters of this birth process. Justify your answer, briefly.

2. Define p_n(t) = P(X(t) = n) for each n ∈ N. Write down two differential equations satisfied by p_1(t) and p_2(t) (and involving only p_1(t), p_2(t) and their derivatives).

3. Solve them, to derive explicit formulæ for p_1(t) and p_2(t).

4. Deduce an explicit formula for p_3(t) in terms of t.

5. Calculate the probability that there are three members of the population after exactly 2 hours.

6. Find the expected time of the second birth.

Exercise 4.6.3. Individuals in a population reproduce according to the following rules. At time 0, there are two individuals. Each time a new individual is born (and also at time 0), all the individuals in the population divide themselves into as many separate pairs as they can make (so ⌊m/2⌋ pairs if there are m individuals), and then each pair gives birth to offspring according to a Poisson process of rate 2 per year, until the next birth in the population takes place. (All of the Poisson processes are independent of one another.) Let X(t) denote the number of individuals in the population at time t years. Then (X(t) : t ≥ 0) is a birth process with X(0) = 2.

1. Write down the birth parameters of this birth process. Justify your answer, briefly.

2. Write down three differential equations satisfied by p_2(t), p_3(t) and p_4(t) (and involving only p_2(t), p_3(t), p_4(t) and their derivatives).

3. Solve them.

4. Calculate the probability that there are at least five members of the population after three years.

5. Find the expected time of the third birth.

Exercise 4.6.4. Let λ > 0. Let (X(t) : t ≥ 0) be a Yule (or linear) birth process with X(0) = 1 and with birth parameters given by λ_n = nλ for all n ∈ N. Let p_n(t) = P(X(t) = n) for all n ∈ N.

1. Write down the system of differential equations satisfied by the functions p_n(t).

2. Check that they have solution p_n(t) = e^{−λt}(1 − e^{−λt})^{n−1}, by differentiating this function and substituting in.

3. Use this to write down the distribution of X(t), and find formulas for its expectation and its variance, in terms of t alone.

4. Find the expected time of the third birth.
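As a non-examinable aside, the claimed distribution of the Yule process can be sanity-checked by simulation: in state n the holding time is Exp(nλ), so sample paths can be generated directly. The Python sketch below (the function name, seed and sample size are ours, not part of the notes) compares the empirical mean of X(t) with the value $e^{\lambda t}$ that follows from the geometric form of $p_n(t)$.

```python
import math
import random

def simulate_yule(lam, t, rng):
    """One sample of a Yule process with X(0) = 1 observed at time t:
    in state n the holding time is Exp(n * lam), then the state jumps to n + 1."""
    n, clock = 1, 0.0
    while True:
        clock += rng.expovariate(n * lam)
        if clock > t:
            return n
        n += 1

rng = random.Random(2021)
lam, t = 1.0, 1.0
samples = [simulate_yule(lam, t, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
# If p_n(t) = e^{-lam*t} (1 - e^{-lam*t})^{n-1}, then E X(t) = e^{lam*t}.
```

With 20,000 samples the empirical mean agrees with $e^{\lambda t}$ to within a couple of standard errors.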

Finite state spaces For chains on finite state spaces, the equations can also be solved explicitly. Assume for simplicity that the infinitesimal generator is diagonalisable: $L = -M\Lambda M^{-1}$, where $\Lambda = \operatorname{diag}(\Lambda_1, \dots, \Lambda_s)$ (we denote the eigenvalues by $-\Lambda_1, -\Lambda_2$, etc. to emphasise that they are non-positive). Then the backward equation $P' = LP$ implies that
$$(M^{-1} P M)' = M^{-1} P' M = (M^{-1} L M)\,(M^{-1} P M) = -\Lambda (M^{-1} P M) ,$$
whence
$$P(t) = M \operatorname{diag}\bigl(e^{-\Lambda_1 t}, \dots, e^{-\Lambda_s t}\bigr) M^{-1} .$$

Example 4.24. Consider the two-state chain of Example 4.14, with
$$L = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}.$$
Clearly, $L\mathbf{1} = 0$ (as is true for any infinitesimal generator). Hence the eigenvalues of $L$ are $0$ and $-(a+b)$. The corresponding eigenvectors are $\binom{1}{1}$ and $\binom{a}{-b}$. Thus
$$L = \begin{pmatrix} 1 & a \\ 1 & -b \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & -(a+b) \end{pmatrix} \begin{pmatrix} 1 & a \\ 1 & -b \end{pmatrix}^{-1}$$
and
$$P(t) = \begin{pmatrix} 1 & a \\ 1 & -b \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & e^{-(a+b)t} \end{pmatrix} \begin{pmatrix} 1 & a \\ 1 & -b \end{pmatrix}^{-1}
= \begin{pmatrix} 1 & a \\ 1 & -b \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & e^{-(a+b)t} \end{pmatrix} \begin{pmatrix} \frac{b}{a+b} & \frac{a}{a+b} \\ \frac{1}{a+b} & -\frac{1}{a+b} \end{pmatrix}
= \begin{pmatrix} \frac{b + a e^{-(a+b)t}}{a+b} & \frac{a - a e^{-(a+b)t}}{a+b} \\ \frac{b - b e^{-(a+b)t}}{a+b} & \frac{a + b e^{-(a+b)t}}{a+b} \end{pmatrix}.$$
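The closed form above can be verified numerically: since $P(t) = e^{tL}$, a truncated power series for the matrix exponential should reproduce the same matrix. A small pure-Python sketch (the helper `expm2` and the chosen values of a, b, t are ours, for illustration only):

```python
import math

def expm2(L, t, terms=80):
    """exp(tL) for a 2x2 matrix L, via the truncated power series
    sum_k (tL)^k / k!  (adequate here, since the entries of tL are small)."""
    term = [[1.0, 0.0], [0.0, 1.0]]    # running term (tL)^k / k!, starting at k = 0
    total = [[1.0, 0.0], [0.0, 1.0]]   # partial sum of the series
    for k in range(1, terms):
        term = [[sum(term[i][r] * t * L[r][j] for r in range(2)) / k
                 for j in range(2)] for i in range(2)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

a, b, t = 2.0, 3.0, 0.7
L = [[-a, a], [b, -b]]
P_series = expm2(L, t)

# Closed form from the diagonalisation above
e = math.exp(-(a + b) * t)
P_formula = [[(b + a * e) / (a + b), (a - a * e) / (a + b)],
             [(b - b * e) / (a + b), (a + b * e) / (a + b)]]
```

The two matrices agree to floating-point accuracy, and each row of $P(t)$ sums to 1, as a transition matrix must.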


4.7 Long-term behaviour

We briefly state the continuous-time counterparts of the definitions and results we saw for discrete-time Markov chains.

Definition 4.25. Let L be the infinitesimal generator of a continuous-time Markov chain on a finite or countable state space S, and let P(t) be the corresponding transition probabilities.

1. A probability distribution π on S is called an equilibrium distribution if, for any t, πP(t) = π. This is equivalent to πL = 0.

2. A probability distribution π on S is called a limiting distribution if, for any initial distribution π0,
$$\lim_{t \to \infty} \pi_0 P(t) = \pi .$$

Similarly to discrete-time chains, a limiting distribution is the unique equilibrium distribution. Unlike the discrete-time case, a unique equilibrium distribution is always the limiting distribution. #
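For instance (an illustration of ours, using the two-state chain of Example 4.24): the distribution π = (b/(a+b), a/(a+b)) satisfies πL = 0, and the rows of P(t) approach π as t grows, so it is both the equilibrium and the limiting distribution. A quick numerical check in Python:

```python
import math

a, b = 2.0, 3.0
L = [[-a, a], [b, -b]]
pi = [b / (a + b), a / (a + b)]   # candidate equilibrium distribution

# Check the equilibrium equation pi L = 0.
piL = [pi[0] * L[0][j] + pi[1] * L[1][j] for j in range(2)]

# Rows of P(t) for a large t, using the closed form from Example 4.24:
# both rows should be (numerically) equal to pi.
t = 50.0
e = math.exp(-(a + b) * t)
P = [[(b + a * e) / (a + b), (a - a * e) / (a + b)],
     [(b - b * e) / (a + b), (a + b * e) / (a + b)]]
```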

Definition 4.26. An infinitesimal generator L of a continuous-time Markov chain with finite or countable state space S is called irreducible if for any s, s′ ∈ S there exists t > 0 such that Pss′(t) > 0.

Exercise 4.7.1. Let s, s′ ∈ S. Show that if, for some t > 0, we have Pss′(t) > 0, then this holds for all t > 0.

Finite chains

Theorem 4.27. If an infinitesimal generator L of a continuous-time Markov chain with finite state space S is irreducible, then it has a limiting distribution.

Exercise 4.7.2. A certain type of component has two states: 0 = OFF and 1 = OPERATING. In state 0, the process remains there a random length of time, which is exponentially distributed with parameter α, and then moves to state 1. The time in state 1 is exponentially distributed with parameter β, after which the process returns to state 0. The system has two of these components, A and B, with distinct parameters (see Table 1). In order for the system to operate, at least one of components A and B must be operating (a parallel system). Assume that the component stochastic processes are independent of one another. Determine the long-run probability that the system is operating by


Component   failure rate   repair rate
A           α_A            β_A
B           α_B            β_B

Table 1: The parameters of the machines in Exercise 4.7.2

1. Considering each component separately as a two-state Markov chain and using their statistical independence;

2. Considering the system as a four-state Markov chain.

Exercise 4.7.3. A bear of little brain named Pooh is fond of honey. Bees producing honey are located in three trees: tree A, tree B and tree C. Tending to be somewhat forgetful, Pooh goes back and forth among these three honey trees randomly (in a Markovian manner) as follows: from A, Pooh goes next to B or C with probability 1/2 each; from B, Pooh goes next to A with probability 3/4, and to C with probability 1/4; from C, Pooh always goes next to A. Pooh stays a random time at each tree. (Assume that the travel times can be ignored.) Pooh stays at each tree an exponential length of time, with the mean being 5 hours at tree A or B, but with mean 4 hours at tree C.

1. Construct a continuous-time Markov chain modelling this situation.

2. Find the limiting proportion of time that Pooh spends at each honey tree.

"


4.8 Birth-death processes and applications

A birth-death process with birth parameters λ0, λ1, λ2, . . . and death parameters µ1, µ2, . . . is a continuous-time process X(t) on state space $\mathbb{Z}_{\geq 0}$ with jump rates
$$\lambda_{s s'} = \begin{cases} \lambda_s , & s' = s+1 , \\ \mu_s , & s' = s-1 , \\ 0 , & \text{otherwise.} \end{cases}$$

We state without proof a sufficient condition for non-explosion:
$$\sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{\mu_{k+1} \cdots \mu_n}{\lambda_k \cdots \lambda_n} = \infty . \tag{4.22}$$
Note that in the case of pure birth processes, this condition boils down to $\sum \lambda_n^{-1} = \infty$. We shall only consider processes satisfying (4.22).

Example 4.28. The following rough model describes the spread of infection in a large and interacting population. The time that it takes for a person to recover from Covid-19 is an exponential random variable with parameter µ. Once a person is infected, he starts infecting other people according to a Poisson process with parameter λ. Then the size X(t) of the currently infected population is described by a birth-death process with parameters λn = λn and µn = µn. The most basic question is whether the infection dies out, i.e. whether X(t) is almost surely absorbed at n = 0, or it grows to infinity, or the distribution of X(t) has a non-trivial limit.

The backward equations for birth-death processes are
$$P'_{0n}(t) = -\lambda_0 P_{0n}(t) + \lambda_0 P_{1n}(t) ,$$
$$P'_{mn}(t) = \mu_m P_{m-1,n}(t) - (\lambda_m + \mu_m) P_{mn}(t) + \lambda_m P_{m+1,n}(t) \quad (m \geq 1) .$$
The forward equations are
$$P'_{m0}(t) = -\lambda_0 P_{m0}(t) + \mu_1 P_{m1}(t) ,$$
$$P'_{mn}(t) = \lambda_{n-1} P_{m,n-1}(t) - (\lambda_n + \mu_n) P_{mn}(t) + \mu_{n+1} P_{m,n+1}(t) \quad (n \geq 1) .$$

Unlike the case of pure birth processes, neither the forward nor the backward equations have a triangular form that would allow for a general explicit solution. However, some computations are possible.

Example 4.29. Consider Example 4.28. We have
$$(\mathbb{E}X(t))' = (\lambda - \mu)\,\mathbb{E}X(t) ,$$
whence
$$\mathbb{E}X(t) = e^{(\lambda - \mu)t}\,\mathbb{E}X(0) .$$
This immediately implies that the process almost surely dies out if R = λ/µ < 1. One can show that the process also dies out if R = 1. On the other hand, if R > 1, the process almost surely grows exponentially.

For general birth-death processes, we shall determine whether a limiting distribution exists, and find it if it does.

If π is the limiting distribution, it is in particular a unique equilibrium distribution. Therefore πL = 0, i.e.

$$-\lambda_0 \pi(0) + \mu_1 \pi(1) = 0 ,$$
$$\lambda_{s-1}\,\pi(s-1) - (\lambda_s + \mu_s)\,\pi(s) + \mu_{s+1}\,\pi(s+1) = 0 , \quad s \geq 1 . \tag{4.23}$$

Claim. Fix π(0), and assume that all the λs and µs are strictly positive. The unique solution of the equations (4.23) is given by
$$\pi(1) = \frac{\lambda_0}{\mu_1}\,\pi(0) , \quad \pi(2) = \frac{\lambda_0 \lambda_1}{\mu_1 \mu_2}\,\pi(0) , \quad \dots , \quad \pi(s) = \frac{\lambda_0 \lambda_1 \cdots \lambda_{s-1}}{\mu_1 \mu_2 \cdots \mu_s}\,\pi(0) , \quad \dots$$

Proof. We argue by induction on s. The statement is clear for s = 1. If it holds for s − 1 and s, the second equation of (4.23) yields
$$\pi(s+1) = \frac{1}{\mu_{s+1}} \left( (\lambda_s + \mu_s)\,\frac{\lambda_0 \cdots \lambda_{s-1}}{\mu_1 \cdots \mu_s} - \lambda_{s-1}\,\frac{\lambda_0 \cdots \lambda_{s-2}}{\mu_1 \cdots \mu_{s-1}} \right) \pi(0) = \frac{\lambda_0 \cdots \lambda_s}{\mu_1 \cdots \mu_{s+1}}\,\pi(0) .$$

Thus if the series
$$1 + \frac{\lambda_0}{\mu_1} + \frac{\lambda_0 \lambda_1}{\mu_1 \mu_2} + \cdots + \frac{\lambda_0 \lambda_1 \cdots \lambda_{s-1}}{\mu_1 \mu_2 \cdots \mu_s} + \cdots \tag{4.24}$$
converges, there is a unique equilibrium distribution (which is also a limiting distribution), while if the series (4.24) diverges, there is no equilibrium distribution. In the latter case, X(t) → ∞ almost surely as t → ∞.
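Numerically, the claim translates into a simple recipe: accumulate the products λ0 · · · λ_{s−1}/(µ1 · · · µs) up to a truncation point and normalise. The sketch below (the function name, truncation point and test rates are ours) applies it to constant rates λn = 1/2, µn = 1, where the equilibrium distribution should come out geometric:

```python
def bd_equilibrium(lam, mu, nmax):
    """Truncate the series (4.24) at nmax and normalise: a numerical
    approximation to the equilibrium distribution when the series converges.
    lam, mu are callables giving the birth rate lam(n) and death rate mu(n)."""
    w = [1.0]                          # w[s] = lam(0)...lam(s-1) / (mu(1)...mu(s))
    for s in range(1, nmax + 1):
        w.append(w[-1] * lam(s - 1) / mu(s))
    total = sum(w)
    return [x / total for x in w]

# Constant rates lambda_n = 1/2, mu_n = 1: the equilibrium is geometric,
# pi(s) = (1/2) (1/2)^s, and the truncation error at nmax = 80 is negligible.
pi = bd_equilibrium(lambda n: 0.5, lambda n: 1.0, 80)
```

The output also satisfies the detailed-balance form of (4.23): λ_{s−1} π(s−1) = µ_s π(s) for every s.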

Example 4.30 (Queue with one server). Suppose λn = λ for all n ≥ 0 and µn = µ for all n ≥ 1. This can be used to model the following situation: customers wait in a queue to be served by a server; the customers arrive according to a Poisson process of rate λ (i.e. the times between arrivals are independent Exp(λ)), the time it takes to serve a customer is Exp(µ)-distributed, and the serving times of the customers are independent.


In this case, π(s) = (λ/µ)^s π(0), for s ≥ 0. For a limiting distribution we need $\sum_{s=0}^{\infty} \pi(s) = 1$, i.e.,
$$\pi(0) \sum_{s=0}^{\infty} \left( \frac{\lambda}{\mu} \right)^s = 1 .$$

If λ ≥ µ then the sum does not converge and we do not have a limiting distribution. (The expected length of the queue will tend to infinity with time.) If λ < µ then the geometric series converges to µ/(µ − λ), and hence π(0) = 1 − λ/µ. In this case there is a limiting distribution given by
$$\mathbb{P}(Q(t) = s) \to \pi(s) = \left( 1 - \frac{\lambda}{\mu} \right) \left( \frac{\lambda}{\mu} \right)^s .$$

The limiting distribution of Q(t) is essentially geometric; specifically
$$Q(t) \sim \mathrm{Geom}(1 - \lambda/\mu) .$$

At equilibrium, letting ϱ = λ/µ,
$$\mathbb{E}(Q(t)) = \sum_{s=1}^{\infty} s\,\pi(s) = \sum_{s=1}^{\infty} s\,\varrho^s (1 - \varrho) = \varrho \sum_{s=1}^{\infty} s\,\varrho^{s-1} (1 - \varrho) = \frac{\varrho}{1 - \varrho} .$$
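The value ϱ/(1 − ϱ) can be checked against a long simulation of the queue itself: the time-average of Q(t) over a long horizon should approach the equilibrium mean. A Python sketch (the function name, seed and parameter choices are ours); with λ = 1, µ = 2 we have ϱ = 1/2, so the predicted mean queue length is 1:

```python
import random

def mm1_time_average(lam, mu, horizon, rng):
    """Time-average queue length for the single-server queue, simulated
    directly: the holding rate is lam + mu while the queue is busy, lam when empty."""
    q, clock, area = 0, 0.0, 0.0
    while clock < horizon:
        rate = lam + (mu if q > 0 else 0.0)
        hold = min(rng.expovariate(rate), horizon - clock)
        area += q * hold               # accumulate the integral of Q(t)
        clock += hold
        if clock >= horizon:
            break
        # from an empty queue every jump is an arrival; otherwise an arrival
        # happens with probability lam / (lam + mu), a departure otherwise
        q += 1 if (q == 0 or rng.random() < lam / rate) else -1
    return area / horizon

rng = random.Random(3)
lam, mu = 1.0, 2.0                     # rho = 1/2, so rho / (1 - rho) = 1
avg_len = mm1_time_average(lam, mu, 50000.0, rng)
```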

Exercise 4.8.1 (Queue with two servers). Suppose λn = λ, µ1 = µ and µ2 = µ3 = · · · = 2µ. For which λ and µ does there exist a limiting distribution? Find it! What is its mean?

Exercise 4.8.2 (Queue with infinitely many servers). Suppose λn = λ and µn = nµ. Prove that for any λ, µ > 0 there exists an equilibrium distribution, and find it. What is its mean?
