
  • Nicolas Privault

Topics in Discrete-Time Stochastic Processes

    With random interactions and algorithms

    MAS728 Part I

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html


  • Preface

Data science, machine learning and artificial intelligence are now ubiquitous in engineering applications as well as in everyday life. They rely on powerful algorithms which are sometimes regarded as opaque when fed with input data and producing output for analysis. The aim of this book is to provide solid foundations in random processes for a thorough understanding of such algorithms. This includes mastering basic concepts in stochastic processes, such as the Markov property, and gaining exposure to various applications in machine learning and data science, such as unsupervised learning and reinforcement learning. Concrete application examples are provided via experiments and simulations based on computer codes.

Nicolas Privault
2020

    " v

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

  • vi "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

  • Contents

1  A Summary of Markov Chains ................................... 1
   1.1  Markov (1856-1922) property ............................. 1
   1.2  Hitting probabilities ................................... 7
   1.3  Mean hitting and absorption times ....................... 10
   1.4  Mean number of returns .................................. 14
   1.5  Classification of states ................................ 16

2  Phase-Type Distributions ..................................... 23
   2.1  Negative binomial distribution .......................... 23
   2.2  Hitting time distribution ............................... 26
   2.3  Mean hitting times ...................................... 30

3  Random Walks and Recurrence .................................. 33
   3.1  Distribution and hitting times .......................... 33
   3.2  Return times ............................................ 36
   3.3  Hitting probabilities and hitting times ................. 42
   3.4  Recurrence of symmetric random walks .................... 45
   3.5  Reflected random walk ................................... 51
   3.6  Conditioned random walk ................................. 55

4  Random Walk with Cookies on the Half-Line .................... 63
   4.1  Hitting times and probabilities ......................... 63
   4.2  Recurrence .............................................. 66
   4.3  Mean hitting times ...................................... 68
   4.4  Number of eaten cookies ................................. 70
   4.5  Conditional results ..................................... 74

5  Convergence to Equilibrium ................................... 81
   5.1  Limiting and stationary distributions ................... 81
   5.2  Markov Chain Monte Carlo (MCMC) ......................... 87
   5.3  Transition bounds and contractivity ..................... 90
   5.4  Distance from stationarity .............................. 92
   5.5  Mixing times ............................................ 96
   5.6  Coupling ................................................ 99

6  Ising Model .................................................. 101
   6.1  Voter model ............................................. 101
   6.2  Irreducibility, aperiodicity and recurrence ............. 104
   6.3  Limiting and stationary distributions ................... 106
   6.4  Simulation .............................................. 109

7  Meta Search Engines .......................................... 113
   7.1  Markovian modeling of ranking ........................... 113
   7.2  Limiting and stationary distributions ................... 115
   7.3  Matrix perturbation ..................................... 116
   7.4  State ranking ........................................... 119

8  Probabilistic Automata ....................................... 123
   8.1  Pattern recognition ..................................... 123
   8.2  Winning streaks ......................................... 128
   8.3  Synchronizing automata .................................. 132
   8.4  Average synchronization times ........................... 134

9  Hidden Markov Model .......................................... 139
   9.1  Emission matrix ......................................... 139
   9.2  Hidden state estimation ................................. 141
   9.3  Forward-backward algorithm .............................. 143
   9.4  Baum-Welch algorithm .................................... 146
   9.5  Numerical example ....................................... 147

10 Markov Decision Processes .................................... 153
   10.1  Construction ........................................... 153
   10.2  Reinforcement learning ................................. 156
   10.3  Example - deterministic MDP ............................ 159
   10.4  Example - stochastic MDP ............................... 162

11 Poisson Point Processes ...................................... 169
   11.1  Spatial Poisson (1781-1840) processes .................. 169
   11.2  Poisson functionals .................................... 171
   11.3  Poisson stochastic integrals ........................... 173
   11.4  Transformations of Poisson measures .................... 177

12 Boolean Model ................................................ 181
   12.1  Boolean-Poisson model .................................. 181
   12.2  Coverage probabilities ................................. 185
   12.3  Percolation in the Boolean model ....................... 188

Appendix: Probability Generating Functions ...................... 191

Index ........................................................... 195

Author index .................................................... 199

References ...................................................... 201

    " ix

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

  • x "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

  • List of Figures

1.1  NGram Viewer output for the term "Markov chains" ............ 2

3.1  Graph of 120 = (10 choose 7) paths with n = 5 and k = 2* .... 35
3.2  Sample path of the random walk (Sn)n∈N ...................... 42
3.3  Sample paths of the random walk (Sn)n∈N ..................... 44
3.4  Sample path of the random walk (Sn)n∈N ...................... 45
3.5  Last return to state 0 at time k = 10 ....................... 49
3.6  Sample path of the random walk (Sn)n∈N ...................... 55
3.7  Sample paths of the random walk (Sn)n∈N ..................... 58

4.1  Random walk with cookies* ................................... 64

5.1  Global balance condition .................................... 84
5.2  Detailed balance condition (discrete time) .................. 87
5.3  Graphs of distance to stationarity d(n) and its upper bound (1 − θ)^n ... 98

6.1  Simulation of the voter model with N = 199* ................. 101
6.2  Simulation of the voter model with N = 199* ................. 102
6.3  Simulation of the voter model with N = 3* ................... 109
6.4  Probability of a majority of "+" as a function of p ∈ [0, 1] ... 111

7.1  Stationary distribution as a function of ε ∈ [0, 1] ......... 120
7.2  Mean return times as functions of ε ∈ [0, 1] ................ 121

9.1  Plot of estimate$hmm$emissionProbs[1,] ...................... 148
9.2  Plot of estimate$hmm$emissionProbs[2,] ...................... 148
9.3  Plot of η ↦ (M0,η/M0,"_")((M1,"_" − M1,η)/M1,"_")² .......... 149
9.4  Frequency analysis of alphabet letters ...................... 149
9.5  Plot of estimate$hmm$emissionProbs[3,] ...................... 151
9.6  Plot of estimate$hmm$emissionProbs[1,] ...................... 151

10.1  Optimal value function with p = 0 .......................... 166
10.2  Optimal value function with 0 < p < 1/2 .................... 166
10.3  Optimal value function with p = 1/2 ........................ 167
10.4  Optimal value function with 1/2 < p ≤ 1 .................... 167

11.1  Poisson point process samples .............................. 170
11.2  Poisson point process sample on the plane .................. 171
11.3  Gamma Lévy density ......................................... 174

12.1  Boolean model in dimension three ........................... 182
12.2  Boolean model with uniform radii in dimension two .......... 183
12.3  Boolean model with exponential radii in dimension two ...... 183
12.4  Two-dimensional Boolean model built on a Poisson point process on R² × [0, ∞) ... 184
12.5  Two-dimensional Boolean model built on a Poisson point process on R² × [0, ∞) ... 184
12.6  One-dimensional Boolean model built on a Poisson point process on R × [0, ∞) ... 185
12.7  Set Cz in the one-dimensional Boolean model ................ 186
12.8  Set Cᶜz in the one-dimensional Boolean model ............... 187
12.9  Three-dimensional Boolean model ............................ 190
12.10 Three-dimensional Boolean model with clipped spheres ....... 190

* Animated figures (work in Acrobat Reader).

    xii "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

  • Chapter 1. A Summary of Markov Chains

    This chapter begins with a review of discrete-time Markov processes and their matrix-based transition probabilities, followed by the computation of hitting probabilities, mean hitting and absorption times, and mean number of returns.

    1.1  Markov (1856-1922) property ............. 1
    1.2  Hitting probabilities ................... 7
    1.3  Mean hitting and absorption times ....... 10
    1.4  Mean number of returns .................. 14
    1.5  Classification of states ................ 16

    1.1 Markov (1856-1922) property

    Consider a discrete-time stochastic process (Zn)n∈N taking values in a discrete state space S, typically S = Z. The S-valued process (Zn)n∈N is said to be Markov, or to have the Markov property, if, for all n ≥ 1, the probability distribution of Zn+1 is determined by the state Zn of the process at time n, and does not depend on the past values of Zk for k = 0, 1, . . . , n − 1.

    " 1

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://en.wikipedia.org/wiki/Andrey_Markovhttps://en.wikipedia.org/wiki/Markov_propertyhttps://www.ntu.edu.sg/home/nprivault/indext.html

  • Fig. 1.1: NGram Viewer output for the term "Markov chains".

    In other words, for all n ≥ 1 and all i0, i1, . . . , in, j ∈ S we have

    P(Zn+1 = j | Zn = in, Zn−1 = in−1, . . . , Z0 = i0) = P(Zn+1 = j | Zn = in).

    In particular, we have

    P(Zn+1 = j | Zn = in, Zn−1 = in−1) = P(Zn+1 = j | Zn = in),

    and

    P(Z2 = j | Z1 = i1, Z0 = i0) = P(Z2 = j | Z1 = i1).

    In addition, we have the following facts.

    1. Chain rule. The first order transition probabilities can be used for the complete computation of the probability distribution of the process, as

    P(Zn = in, Zn−1 = in−1, . . . , Z0 = i0) = P(Zn = in | Zn−1 = in−1) · · · P(Z1 = i1 | Z0 = i0) P(Z0 = i0),   (1.1)

    or

    P(Zn = in, Zn−1 = in−1, . . . , Z1 = i1 | Z0 = i0) = P(Zn = in | Zn−1 = in−1) · · · P(Z1 = i1 | Z0 = i0),   (1.2)

    i0, i1, . . . , in ∈ S.

    2. By the law of total probability applied under P to the events A_{i0} := {Z1 = i1 and Z0 = i0}, i0 ∈ S, we also have

    P(Z1 = i1) = ∑_{i0∈S} P(Z1 = i1, Z0 = i0) = ∑_{i0∈S} P(Z1 = i1 | Z0 = i0) P(Z0 = i0), i1 ∈ S,   (1.3)

    2 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

    and similarly, under the probability measure P(· | Z0 = i0),

    P(Z2 = i2 | Z0 = i0) = ∑_{i1∈S} P(Z2 = i2 and Z1 = i1 | Z0 = i0) = ∑_{i1∈S} P(Z2 = i2 | Z1 = i1) P(Z1 = i1 | Z0 = i0),

    i0, i2 ∈ S.

    Transition matrices

    In the sequel we will assume that the Markov chain (Zn)n∈N is time homogeneous, i.e. the probability P(Zn+1 = j | Zn = i) is independent of n ∈ N. In this case, the random evolution of a Markov chain (Zn)n∈N is determined by the data of

    Pi,j := P(Z1 = j | Z0 = i), i, j ∈ S,   (1.4)

    which coincides with the probability P(Zn+1 = j | Zn = i) for all n ∈ N. This data can be encoded into a matrix indexed by S² = S × S, called the transition matrix of the Markov chain:

    [ Pi,j ]_{i,j∈S} = [ P(Z1 = j | Z0 = i) ]_{i,j∈S},

    also written on S := Z as

    P = [ Pi,j ]_{i,j∈S} =

        ⎡  ⋱      ⋮       ⋮      ⋮      ⋮      ⋮     ⋰  ⎤
        ⎢ · · ·  P−2,−2  P−2,−1  P−2,0  P−2,1  P−2,2  · · · ⎥
        ⎢ · · ·  P−1,−2  P−1,−1  P−1,0  P−1,1  P−1,2  · · · ⎥
        ⎢ · · ·  P0,−2   P0,−1   P0,0   P0,1   P0,2   · · · ⎥
        ⎢ · · ·  P1,−2   P1,−1   P1,0   P1,1   P1,2   · · · ⎥
        ⎢ · · ·  P2,−2   P2,−1   P2,0   P2,1   P2,2   · · · ⎥
        ⎣  ⋰      ⋮       ⋮      ⋮      ⋮      ⋮     ⋱  ⎦ .

    The notion of transition matrix is related to that of (weighted) adjacency matrix in graph theory.

    By the law of total probability applied to the probability measure P(· | Z0 = i), we also have the equality

    " 3

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

    ∑_{j∈S} P(Z1 = j | Z0 = i) = P( ⋃_{j∈S} {Z1 = j} | Z0 = i ) = P(Ω) = 1, i ∈ S,   (1.5)

    i.e. the rows of the transition matrix satisfy the condition

    ∑_{j∈S} Pi,j = 1,

    for every row index i ∈ S.

    Using the matrix notation P = (Pi,j)i,j∈S and Relation (1.1), we find

    P(Zn = in, Zn−1 = in−1, . . . , Z0 = i0) = Pin−1,in · · · Pi0,i1 P(Z0 = i0),

    i0, i1, . . . , in ∈ S, and we rewrite (1.3) as

    P(Z1 = i) = ∑_{j∈S} P(Z1 = i | Z0 = j) P(Z0 = j) = ∑_{j∈S} Pj,i P(Z0 = j), i ∈ S.   (1.6)

    A state k ∈ S is said to be absorbing if Pk,k = 1.

    In case the Markov chain (Zk)k∈N takes values in the finite state space S = {0, 1, . . . , N}, its (N + 1) × (N + 1) transition matrix will simply have the form

    [ Pi,j ]_{0≤i,j≤N} =

        ⎡ P0,0  P0,1  P0,2  · · ·  P0,N ⎤
        ⎢ P1,0  P1,1  P1,2  · · ·  P1,N ⎥
        ⎢ P2,0  P2,1  P2,2  · · ·  P2,N ⎥
        ⎢  ⋮     ⋮     ⋮     ⋱      ⋮  ⎥
        ⎣ PN,0  PN,1  PN,2  · · ·  PN,N ⎦ .

    Still on the finite state space S = {0, 1, . . . , N}, Relation (1.6) can be restated in the language of matrix and vector products using the shorthand notation

    η = πP,   (1.7)

    where

    η := [P(Z1 = 0), . . . , P(Z1 = N)] = [η0, η1, . . . , ηN] ∈ R^{N+1}

    is the row vector "distribution of Z1",

    π := [P(Z0 = 0), . . . , P(Z0 = N)] = [π0, . . . , πN] ∈ R^{N+1}

    4 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

  • is the row vector representing the probability distribution of Z0, and

    [η0, η1, . . . , ηN] = [π0, . . . , πN] ×

        ⎡ P0,0  P0,1  P0,2  · · ·  P0,N ⎤
        ⎢ P1,0  P1,1  P1,2  · · ·  P1,N ⎥
        ⎢ P2,0  P2,1  P2,2  · · ·  P2,N ⎥
        ⎢  ⋮     ⋮     ⋮     ⋱      ⋮  ⎥
        ⎣ PN,0  PN,1  PN,2  · · ·  PN,N ⎦ .   (1.8)

    Higher-order transition probabilities

    As noted above, the transition matrix P is a convenient way to record the probabilities P(Zn+1 = j | Zn = i), i, j ∈ S, into an array of data.

    However, it is much more than that, as already hinted at in Relation (1.7). Suppose for example that we are interested in the two-step transition probability

    P(Zn+2 = j | Zn = i).

    This probability does not appear in the transition matrix P, but it can be computed by first step analysis, applying the law of total probability to the probability measure P(· | Zn = i), as follows.

    i) 2-step transitions. Denoting by S the state space of the process, we have

    P(Zn+2 = j | Zn = i) = ∑_{l∈S} P(Zn+2 = j and Zn+1 = l | Zn = i)
                         = ∑_{l∈S} P(Zn+2 = j | Zn+1 = l) P(Zn+1 = l | Zn = i)
                         = ∑_{l∈S} Pi,l Pl,j
                         = [P²]i,j, i, j ∈ S,

    where we used (1.4).

    ii) k-step transitions. More generally, we have the following result.

    Proposition 1.1. For all k ∈ N we have the relation

    [ P(Zn+k = j | Zn = i) ]_{i,j∈S} = [ [P^k]i,j ]_{i,j∈S} = P^k.   (1.9)

    " 5

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

    Proof. We prove (1.9) by induction. Clearly, the statement holds for k = 0 and k = 1. Next, for all k ∈ N, we have

    P(Zn+k+1 = j | Zn = i)
      = ∑_{l∈S} P(Zn+k+1 = j and Zn+k = l | Zn = i)
      = ∑_{l∈S} P(Zn+k+1 = j, Zn+k = l, Zn = i) / P(Zn = i)
      = ∑_{l∈S} [ P(Zn+k+1 = j, Zn+k = l, Zn = i) / P(Zn+k = l and Zn = i) ] × [ P(Zn+k = l and Zn = i) / P(Zn = i) ]
      = ∑_{l∈S} P(Zn+k+1 = j | Zn+k = l and Zn = i) P(Zn+k = l | Zn = i)
      = ∑_{l∈S} P(Zn+k+1 = j | Zn+k = l) P(Zn+k = l | Zn = i)
      = ∑_{l∈S} P(Zn+k = l | Zn = i) Pl,j.

    We have just checked that the family of matrices

    [ P(Zn+k = j | Zn = i) ]_{i,j∈S}, k ≥ 1,

    satisfies the same induction relation as the matrix power P^k, i.e.

    [P^{k+1}]i,j = ∑_{l∈S} [P^k]i,l Pl,j,

    and the same initial condition, hence by induction on k ≥ 0 the equality

    [ P(Zn+k = j | Zn = i) ]_{i,j∈S} = [ [P^k]i,j ]_{i,j∈S} = P^k

    holds not only for k = 0 and k = 1, but also for all k ∈ N. □

    The matrix product relation

    P^{m+n} = P^m P^n = P^n P^m,

    which reads

    [P^{m+n}]i,j = ∑_{l∈S} [P^m]i,l [P^n]l,j = ∑_{l∈S} [P^n]i,l [P^m]l,j, i, j ∈ S,

    can now be interpreted as

    P(Zn+m = j | Z0 = i) = ∑_{l∈S} P(Zm = j | Z0 = l) P(Zn = l | Z0 = i)
                         = ∑_{l∈S} P(Zn = j | Z0 = l) P(Zm = l | Z0 = i),

    i, j ∈ S, which is called the Chapman-Kolmogorov equation.
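    Proposition 1.1 and the Chapman-Kolmogorov equation lend themselves to a quick numerical check: k-step transition probabilities are matrix powers, and P^{m+n} = P^m P^n. A short sketch with an illustrative transition matrix (not one from the text):

```python
import numpy as np

# Illustrative transition matrix on S = {0, 1, 2}.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

# k-step transition probabilities as matrix powers (Proposition 1.1).
P2 = np.linalg.matrix_power(P, 2)
P5 = np.linalg.matrix_power(P, 5)
P7 = np.linalg.matrix_power(P, 7)

# Chapman-Kolmogorov relation: P^{2+5} = P^2 P^5.
print(np.allclose(P7, P2 @ P5))          # True

# Every power of P is again a transition matrix: rows sum to 1.
print(np.allclose(P5.sum(axis=1), 1.0))  # True
```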

    1.2 Hitting probabilities

    Starting with this section, we introduce the systematic use of the first step analysis technique. The main applications of first step analysis are the computation of hitting probabilities, mean hitting and absorption times, mean first return times, and average number of returns to a given state.

    Hitting probabilities

    Let us consider a Markov chain (Zn)n∈N with state space S, and let A ⊂ S denote a subset of S, as in the following example with S = {0, 1, 2, 3, 4, 5} and A := {0, 2, 4}.

    (Figure: a transition graph on the six states S = {0, 1, 2, 3, 4, 5}, with the subset A = {0, 2, 4} highlighted.)

    We are interested in the first time TA the chain hits the subset A, with

    TA := inf{n ≥ 0 : Zn ∈ A},   (1.10)

    with TA = 0 if Z0 ∈ A, and

    TA = ∞ if {n ≥ 0 : Zn ∈ A} = ∅,

    i.e. if Zn ∉ A for all n ∈ N. In case the transition matrix P satisfies

    " 7

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://en.wikipedia.org/wiki/Sydney_Chapman_(mathematician)https://en.wikipedia.org/wiki/Andrey_Kolmogorovhttps://www.ntu.edu.sg/home/nprivault/indext.html

    Pk,l = 1{k=l} for all k, l ∈ A,   (1.11)

    the set A ⊂ S is said to be absorbing.

    We aim at computing the hitting probabilities

    gl(k) = P(Z_{TA} = l and TA < ∞ | Z0 = k)

    of hitting the set A ⊂ S through state l ∈ A starting from k ∈ S, where Z_{TA} represents the location of the chain (Zn)n∈N at the hitting time TA. This computation can be achieved by first step analysis, using the law of total probability for the probability measure P(· | Z0 = k) and the Markov property, as follows.

    Proposition 1.2. Assume that (1.11) holds. The hitting probabilities

    gl(k) := P(Z_{TA} = l and TA < ∞ | Z0 = k), k ∈ S, l ∈ A,

    satisfy the equation

    gl(k) = ∑_{m∈S} Pk,m gl(m) = Pk,l + ∑_{m∈S\A} Pk,m gl(m),   (1.12)

    k ∈ S \ A, l ∈ A, under the boundary conditions

    gl(k) = P(Z_{TA} = l and TA < ∞ | Z0 = k) = 1{k=l} = { 1 if k = l,
                                                          { 0 if k ≠ l,   k, l ∈ A,

    which hold since TA = 0 whenever one starts from Z0 ∈ A.

    Proof. For all k ∈ S \ A we have TA ≥ 1 given that Z0 = k, hence we can write

    gl(k) = P(Z_{TA} = l and TA < ∞ | Z0 = k)
          = ∑_{m∈S} P(Z_{TA} = l and TA < ∞ | Z1 = m and Z0 = k) P(Z1 = m | Z0 = k)
          = ∑_{m∈S} P(Z_{TA} = l and TA < ∞ | Z1 = m) P(Z1 = m | Z0 = k)
          = ∑_{m∈S} Pk,m P(Z_{TA} = l and TA < ∞ | Z1 = m)
          = ∑_{m∈S} Pk,m P(Z_{TA} = l and TA < ∞ | Z0 = m)
          = ∑_{m∈S} Pk,m gl(m), k ∈ S \ A, l ∈ A,

    where the relation

    P(Z_{TA} = l and TA < ∞ | Z1 = m) = P(Z_{TA} = l and TA < ∞ | Z0 = m)

    8 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

    follows by time homogeneity, since the hitting probability does not depend on the initial time at which the chain is started. □

    Equation (1.12) can be rewritten in matrix form as

    gl = P gl, l ∈ A,

    where gl is a column vector, i.e.

    ⎡ gl(0) ⎤   ⎡ P0,0  P0,1  P0,2  · · ·  P0,N ⎤   ⎡ gl(0) ⎤
    ⎢   ⋮   ⎥ = ⎢ P1,0  P1,1  P1,2  · · ·  P1,N ⎥ × ⎢   ⋮   ⎥ ,   l ∈ A,
    ⎣ gl(N) ⎦   ⎢  ⋮     ⋮     ⋮     ⋱      ⋮  ⎥   ⎣ gl(N) ⎦
                ⎣ PN,0  PN,1  PN,2  · · ·  PN,N ⎦

    under the boundary condition

    gl(k) = P(Z_{TA} = l and TA < ∞ | Z0 = k) = 1{l}(k) = { 1, k = l,
                                                           { 0, k ≠ l,

    for all k, l ∈ A. See e.g. Theorem 3.4 page 40 of Karlin and Taylor (1981) for a uniqueness result for the solution of such equations.

    In addition, the hitting probabilities gl(k) = P(Z_{TA} = l and TA < ∞ | Z0 = k) satisfy the condition

    1 = P(TA = ∞ | Z0 = k) + ∑_{l∈A} P(Z_{TA} = l and TA < ∞ | Z0 = k)
      = P(TA = ∞ | Z0 = k) + ∑_{l∈A} gl(k),   (1.13)

    for all k ∈ S.

    Note that we may have P(TA = ∞ | Z0 = k) > 0; for example, in the following chain with A = {0} and k = 1 we have

    P(T0 = ∞ | Z0 = 1) = 0.2.

    " 9

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

  • 0 1 210.8

    0.21

    More generally, if f : A −→ R is a function on the domain A, letting

    gA(k) := IE[ f(Z_{TA}) | Z0 = k ] = ∑_{l∈A} f(l) P(Z_{TA} = l and TA < ∞ | Z0 = k), k ∈ S,

    by linearity we find the Dirichlet problem

    gA = P gA,

    under the boundary condition

    gA(k) = f(k), k ∈ A,

    see e.g. Theorem 5.3 in Privault (2008) for a continuous-time analog.

    1.3 Mean hitting and absorption times

    We are now interested in the mean hitting time

    hA(k) := IE[TA | Z0 = k]

    it takes for the chain to hit the set A ⊂ S starting from a state k ∈ S. In case the set A is absorbing, we refer to hA(k) as the mean absorption time into A starting from the state k. Clearly, since TA = 0 whenever Z0 = k ∈ A, we have

    hA(k) = 0 for all k ∈ A.

    Proposition 1.3. The mean hitting times

    hA(k) := IE[TA | Z0 = k], k ∈ S,

    satisfy the equations

    hA(k) = 1 + ∑_{l∈S} Pk,l hA(l) = 1 + ∑_{l∈S\A} Pk,l hA(l), k ∈ S \ A,   (1.14)

    under the boundary conditions

    hA(k) = IE[TA | Z0 = k] = 0, k ∈ A.

    Proof. For all k ∈ S \ A, by first step analysis, using the law of total expectation applied to the probability measure P(· | Z0 = k) and the Markov property,

    10 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

    we have

    hA(k) = IE[TA | Z0 = k]
          = ∑_{l∈S} IE[ TA 1{Z1=l} | Z0 = k ]
          = (1 / P(Z0 = k)) ∑_{l∈S} IE[ TA 1{Z1=l} 1{Z0=k} ]
          = (1 / P(Z0 = k)) ∑_{l∈S} IE[ TA 1{Z1=l and Z0=k} ]
          = ∑_{l∈S} IE[ TA | Z1 = l and Z0 = k ] P(Z1 = l and Z0 = k) / P(Z0 = k)
          = ∑_{l∈S} IE[ TA | Z1 = l and Z0 = k ] P(Z1 = l | Z0 = k)
          = ∑_{l∈S} IE[ 1 + TA | Z0 = l ] P(Z1 = l | Z0 = k)
          = ∑_{l∈S} (1 + IE[TA | Z0 = l]) P(Z1 = l | Z0 = k)
          = ∑_{l∈S} P(Z1 = l | Z0 = k) + ∑_{l∈S} P(Z1 = l | Z0 = k) IE[TA | Z0 = l]
          = 1 + ∑_{l∈S} P(Z1 = l | Z0 = k) IE[TA | Z0 = l]
          = 1 + ∑_{l∈S} Pk,l hA(l), k ∈ S \ A,

    where we used the relation

    IE[TA | Z1 = l, Z0 = k] = 1 + IE[TA | Z0 = l].

    Hence we have

    hA(k) = 1 + ∑_{l∈S} Pk,l hA(l), k ∈ S \ A,   (1.15)

    under the boundary conditions

    hA(k) = IE[TA | Z0 = k] = 0, k ∈ A,   (1.16)

    and Condition (1.16) implies that (1.15) becomes

    hA(k) = 1 + ∑_{l∈S\A} Pk,l hA(l), k ∈ S \ A. □

    " 11

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

    The equations (1.14) can be rewritten in matrix form as

            ⎡ 1 ⎤
    hA  =   ⎢ ⋮ ⎥ + P hA,
            ⎣ 1 ⎦

    by considering only the rows with index k ∈ Aᶜ = S \ A, under the boundary conditions

    hA(k) = 0, k ∈ A.
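    The system (1.14) is solved the same way as the hitting probabilities, by keeping only the rows indexed by S \ A. Continuing with the same hypothetical symmetric ruin chain on S = {0, ..., 4} with A = {0, 4}:

```python
import numpy as np

# Same illustrative ruin chain: 0 and 4 absorbing, interior states
# move +/-1 with probability 1/2 each.
N = 4
P = np.zeros((N + 1, N + 1))
P[0, 0] = P[N, N] = 1.0
for k in range(1, N):
    P[k, k - 1] = P[k, k + 1] = 0.5

interior = list(range(1, N))
Pi = P[np.ix_(interior, interior)]

# Equation (1.14) restricted to the interior rows, with h = 0 on A:
# (I - P_interior) h = (1, ..., 1)^T.
h = np.linalg.solve(np.eye(len(interior)) - Pi, np.ones(len(interior)))

print(h)   # k(N - k) for this symmetric walk: [3, 4, 3]
```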

    First return times

    Consider now the first return time Trj to state j ∈ S, defined by

    Trj := inf{n ≥ 1 : Xn = j},

    with

    Trj = ∞ if Xn ≠ j for all n ≥ 1.

    Note that, in contrast with the definition (1.10) of the hitting time Tj, the infimum is taken here over n ≥ 1, as it takes at least one step out of the initial state in order to return to state j. Nevertheless, we have Tj = Trj if the chain is started from a state i different from j.

    We denote by

    µj(i) := IE[ Trj | X0 = i ] ≥ 1

    the mean return time to state j ∈ S after starting from state i ∈ S.

    Mean return times can also be computed by first step analysis. We have

    µj(i) = IE[ Trj | X0 = i ]
          = 1 × P(X1 = j | X0 = i) + ∑_{l∈S, l≠j} P(X1 = l | X0 = i) (1 + IE[ Trj | X0 = l ])
          = Pi,j + ∑_{l∈S, l≠j} Pi,l (1 + µj(l))
          = Pi,j + ∑_{l∈S, l≠j} Pi,l + ∑_{l∈S, l≠j} Pi,l µj(l)
          = ∑_{l∈S} Pi,l + ∑_{l∈S, l≠j} Pi,l µj(l)
          = 1 + ∑_{l∈S, l≠j} Pi,l µj(l),

    hence

    µj(i) = 1 + ∑_{l∈S, l≠j} Pi,l µj(l), i, j ∈ S.   (1.17)

    See e.g. Theorem 5.9 page 49 of Karlin and Taylor (1981) for a uniqueness result for the solution of such equations.

    Hitting times vs return times

    Note that the time Tri to return to state i is always at least one by construction, hence µi(i) ≥ 1 and cannot vanish, while we always have hi(i) = 0 as a boundary condition, i ∈ S. On the other hand, for i ≠ j we have by definition

    hi(j) = IE[ Ti | X0 = j ] = IE[ Tri | X0 = j ] = µi(j),

    and for i = j the mean return time µj(j) can be computed from the hitting times hj(l), l ≠ j, by first step analysis as

    µj(j) = ∑_{l∈S} Pj,l (1 + hj(l))
          = Pj,j + ∑_{l≠j} Pj,l (1 + hj(l))
          = ∑_{l∈S} Pj,l + ∑_{l≠j} Pj,l hj(l)
          = 1 + ∑_{l≠j} Pj,l hj(l), j ∈ S,   (1.18)

    which is in agreement with (1.17) when i = j.
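    Relation (1.18) can be illustrated numerically on a small chain; the two-state matrix below is a made-up example, not one from the text. The mean return time to state 0 is computed from the hitting time h0(1) and cross-checked by a Monte Carlo simulation.

```python
import numpy as np

# Illustrative two-state chain.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Mean hitting time of state 0 from state 1 solves h0(1) = 1 + P[1,1] h0(1).
h01 = 1.0 / (1.0 - P[1, 1])

# Relation (1.18): mu_0(0) = 1 + sum over l != 0 of P[0, l] h0(l).
mu00 = 1.0 + P[0, 1] * h01
print(mu00)   # 1.2

# Cross-check by simulating return times to state 0.
rng = np.random.default_rng(0)
n_runs, total = 10000, 0
for _ in range(n_runs):
    state, steps = 0, 0
    while True:
        state = rng.choice(2, p=P[state])
        steps += 1
        if state == 0:
            break
    total += steps
print(total / n_runs)   # close to 1.2, up to sampling error
```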

    Markov chains with rewards

    Let (Xn)n∈N be a Markov chain with state space S and transition matrix P = (Pi,j)i,j∈S. Let us derive the first step analysis equation for the value function

    V(k) := IE[ ∑_{n≥0} R(Xn) | X0 = k ], k ∈ S,   (1.19)

    " 13

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

defined as the total accumulated reward obtained after starting from state k, where R : S → ℝ is a reward function.* We have
$$
\begin{aligned}
V(k) &= \mathbb{E}\Big[ \sum_{n \geq 0} R(X_n) \,\Big|\, X_0 = k \Big] \\
&= \sum_{m \in S} P_{k,m} \Big( R(k) + \mathbb{E}\Big[ \sum_{n \geq 1} R(X_n) \,\Big|\, X_1 = m \Big] \Big) \\
&= \sum_{m \in S} P_{k,m} R(k) + \sum_{m \in S} P_{k,m}\, \mathbb{E}\Big[ \sum_{n \geq 1} R(X_n) \,\Big|\, X_1 = m \Big] \\
&= R(k) \sum_{m \in S} P_{k,m} + \sum_{m \in S} P_{k,m}\, \mathbb{E}\Big[ \sum_{n \geq 0} R(X_n) \,\Big|\, X_0 = m \Big] \\
&= R(k) + \sum_{m \in S} P_{k,m} V(m), \qquad k \in S.
\end{aligned}
$$
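The fixed-point equation V = R + PV can be solved on a small chain with an absorbing state where the reward vanishes, so that the series (1.19) converges. A minimal sketch (the chain and reward values below are illustrative, not from the text):

```python
# Solve V = R + P V on an illustrative chain with absorbing state 0 and
# R(0) = 0, so that the series (1.19) converges (values are not from the text).
P = [[1.0, 0.0, 0.0],
     [0.3, 0.4, 0.3],
     [0.5, 0.2, 0.3]]
R = [0.0, 1.0, 2.0]

# On the transient states {1, 2}, V solves (I - Q) V = (R(1), R(2)) where Q is
# the restriction of P to {1, 2}; solve the 2x2 system by Cramer's rule.
a11, a12 = 1 - P[1][1], -P[1][2]
a21, a22 = -P[2][1], 1 - P[2][2]
det = a11 * a22 - a12 * a21
V1 = (R[1] * a22 - a12 * R[2]) / det
V2 = (a11 * R[2] - R[1] * a21) / det

# Verify the fixed-point property V(k) = R(k) + sum_m P[k][m] V(m), V(0) = 0.
V = [0.0, V1, V2]
for k in (1, 2):
    assert abs(V[k] - (R[k] + sum(P[k][m] * V[m] for m in range(3)))) < 1e-12
```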

    1.4 Mean number of returns

Let
$$ R_j := \sum_{n \geq 1} \mathbb{1}_{\{X_n = j\}} \tag{1.20} $$
denote the number of returns to state j by the chain $(X_n)_{n \in \mathbb{N}}$.

In the sequel, we let
$$ p_{ij} = \mathbb{P}\big(T^r_j < \infty \mid X_0 = i\big) = \mathbb{P}(X_n = j \text{ for some } n \geq 1 \mid X_0 = i), \qquad i, j \in S, $$
denote the probability of return to state j in finite time† starting from state i.

Proposition 1.4. The probability distribution of the number of returns $R_j$ to state j given that $\{X_0 = i\}$ is given by
$$ \mathbb{P}(R_j = m \mid X_0 = i) = \begin{cases} 1 - p_{ij}, & m = 0, \\ p_{ij} \times (p_{jj})^{m-1} \times (1 - p_{jj}), & m \geq 1. \end{cases} $$

* We always assume that R(·) and $(X_n)_{n \in \mathbb{N}}$ are such that the series in (1.19) are convergent.
† When i ≠ j, $p_{ij}$ is the probability of visiting state j in finite time after starting from state i.

    14 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

In case i = j, $R_i$ is simply the number of returns to state i starting from state i, and it has the geometric distribution
$$ \mathbb{P}(R_i = m \mid X_0 = i) = (1 - p_{ii})(p_{ii})^m, \qquad m \geq 0. \tag{1.21} $$

Proposition 1.5. We have
$$ \mathbb{P}(R_j < \infty \mid X_0 = i) = \begin{cases} 1 - p_{ij}, & \text{if } p_{jj} = 1, \\ 1, & \text{if } p_{jj} < 1. \end{cases} $$
We also have
$$ \mathbb{P}(R_j = \infty \mid X_0 = i) = \begin{cases} p_{ij}, & \text{if } p_{jj} = 1, \\ 0, & \text{if } p_{jj} < 1. \end{cases} $$

In particular, if $p_{jj} = 1$, i.e. state j is recurrent, we have
$$ \mathbb{P}(R_j = m \mid X_0 = i) = 0, \qquad m \geq 1, $$
and in this case,
$$ \mathbb{P}(R_j < \infty \mid X_0 = i) = \mathbb{P}(R_j = 0 \mid X_0 = i) = 1 - p_{ij}, $$
$$ \mathbb{P}(R_j = \infty \mid X_0 = i) = 1 - \mathbb{P}(R_j < \infty \mid X_0 = i) = p_{ij}. $$

On the other hand, when i = j we find
$$ \mathbb{P}(R_i < \infty \mid X_0 = i) = \sum_{m \geq 0} \mathbb{P}(R_i = m \mid X_0 = i) = (1 - p_{ii}) \sum_{m \geq 0} (p_{ii})^m = \begin{cases} 0, & \text{if } p_{ii} = 1, \\ 1, & \text{if } p_{ii} < 1, \end{cases} \tag{1.22} $$
hence
$$ \mathbb{P}(R_i = \infty \mid X_0 = i) = \begin{cases} 1, & \text{if } p_{ii} = 1, \\ 0, & \text{if } p_{ii} < 1, \end{cases} \tag{1.23} $$
i.e. the number of returns to a recurrent state is infinite with probability one.

The notion of mean number of returns will be needed for the classification of states of Markov chains in Section 1.5. When $p_{jj} < 1$ we have $\mathbb{P}(R_j < \infty \mid X_0 = i) = 1$ and

    " 15

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

$$
\begin{aligned}
\mathbb{E}[R_j \mid X_0 = i] &= \sum_{m \geq 0} m\, \mathbb{P}(R_j = m \mid X_0 = i) \qquad (1.24) \\
&= (1 - p_{jj})\, p_{ij} \sum_{m \geq 1} m\, (p_{jj})^{m-1} \\
&= \frac{p_{ij}}{1 - p_{jj}}, \qquad (1.25)
\end{aligned}
$$
hence $\mathbb{E}[R_j \mid X_0 = i] < \infty$ if $p_{jj} < 1$. If $p_{j,j} = 1$ then $\mathbb{E}[R_j \mid X_0 = i] = \infty$ unless $p_{i,j} = 0$, in which case $\mathbb{P}(R_j = 0 \mid X_0 = i) = 1$ and $\mathbb{E}[R_j \mid X_0 = i] = 0$. In particular, when i = j we find the next proposition.

Proposition 1.6. The mean number of returns to state i is given by
$$ \mathbb{E}[R_i \mid X_0 = i] = \frac{p_{ii}}{1 - p_{ii}}, $$
and it is finite, i.e. $\mathbb{E}[R_i \mid X_0 = i] < \infty$, if and only if $p_{ii} < 1$.
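Proposition 1.6 can be illustrated by a Monte Carlo experiment; the two-state chain below, with return probability $p_{00} = 1/2$, is an illustrative choice, not from the text.

```python
import random

random.seed(0)

# Illustrative chain on {0, 1}: from 0, stay at 0 w.p. 1/2 or jump to the
# absorbing state 1, so p00 = 1/2 and Proposition 1.6 predicts
# E[R0 | X0 = 0] = p00 / (1 - p00) = 1.
def returns_to_zero():
    count, state = 0, 0
    while state == 0:
        state = 0 if random.random() < 0.5 else 1
        if state == 0:
            count += 1
    return count

n = 100_000
estimate = sum(returns_to_zero() for _ in range(n)) / n
assert abs(estimate - 1.0) < 0.05   # Monte Carlo estimate close to the exact value 1
```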

    1.5 Classification of states

In this chapter we present the notions of communicating, transient and recurrent states, as well as the concept of irreducibility of a Markov chain. We also examine the notions of positive and null recurrence, periodicity, and aperiodicity of such chains. Those topics will be important when analysing the long-run behavior of Markov chains in the next chapter.

    Communicating states

Definition 1.7. A state j ∈ S is said to be accessible from another state i ∈ S, and we write i ⟶ j, if there exists a finite integer n ≥ 0 such that
$$ [P^n]_{i,j} = \mathbb{P}(X_n = j \mid X_0 = i) > 0. $$

In other words, it is possible to travel from i to j with non-zero probability in a certain number of steps. We also say that state i leads to state j, and when i ≠ j we have
$$ \mathbb{P}\big(T^r_j < \infty \mid X_0 = i\big) \geq \mathbb{P}\big(T^r_j \leq n \mid X_0 = i\big) \geq \mathbb{P}(X_n = j \mid X_0 = i) > 0. $$

    16 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

In case i ⟶ j and j ⟶ i we say that i and j communicate, and we write i ⟷ j.

The binary relation "⟷" is called an equivalence relation, as it satisfies the following properties:

a) Reflexivity: for all i ∈ S we have i ⟷ i.

b) Symmetry: for all i, j ∈ S, the relation i ⟷ j is equivalent to j ⟷ i.

c) Transitivity: for all i, j, k ∈ S such that i ⟷ j and j ⟷ k, we have i ⟷ k.

The equivalence relation "⟷" induces a partition of S into disjoint classes $A_1, A_2, \ldots, A_m$ such that $S = A_1 \cup \cdots \cup A_m$, and

a) we have i ⟷ j for all i, j ∈ A_q, and

b) we have i ⟷̸ j whenever i ∈ A_p and j ∈ A_q with p ≠ q.

The sets $A_1, A_2, \ldots, A_m$ are called the communicating classes of the chain.

Definition 1.8. A Markov chain whose state space is made of a unique communicating class is said to be irreducible; otherwise the chain is said to be reducible.
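Communicating classes can be computed from the positivity pattern of P alone. A minimal sketch using a depth-first search for accessibility (the example matrix is illustrative):

```python
# Communicating classes from the positivity pattern of an illustrative
# transition matrix: i -> j iff j is reachable from i, i <-> j iff both hold.
P = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0],
     [0.3, 0.3, 0.4]]
d = len(P)

def accessible(i):
    """States reachable from i with positive probability (depth-first search)."""
    seen, stack = {i}, [i]
    while stack:
        k = stack.pop()
        for j in range(d):
            if P[k][j] > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

reach = [accessible(i) for i in range(d)]
classes = {frozenset(j for j in range(d) if i in reach[j] and j in reach[i])
           for i in range(d)}
irreducible = len(classes) == 1
print(sorted(map(sorted, classes)), irreducible)   # [[0, 1], [2]] False
```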

    Recurrent states

Definition 1.9. A state i ∈ S is said to be recurrent if, starting from state i, the chain returns to state i within a finite (random) time, with probability 1, i.e.
$$ p_{i,i} := \mathbb{P}\big(T^r_i < \infty \mid X_0 = i\big) = \mathbb{P}(X_n = i \text{ for some } n \geq 1 \mid X_0 = i) = 1. \tag{1.26} $$

The next Proposition 1.10 uses the number of returns $R_i$ to state i defined in (1.20), and relies on the geometric distribution (1.21) of $R_i$ given that $X_0 = i$.

    Proposition 1.10. For any state i ∈ S, the following statements are equivalent:

    " 17

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://en.wikipedia.org/wiki/Equivalence_relation#Definitionhttps://en.wikipedia.org/wiki/Equivalence_relation#Definitionhttps://www.ntu.edu.sg/home/nprivault/indext.html

i) the state i ∈ S is recurrent, i.e. $p_{i,i} = 1$,

ii) the number of returns to i ∈ S is a.s.* infinite, i.e.
$$ \mathbb{P}(R_i = \infty \mid X_0 = i) = 1, \quad \text{i.e.} \quad \mathbb{P}(R_i < \infty \mid X_0 = i) = 0, \tag{1.27} $$

iii) the mean number of returns to i ∈ S is infinite, i.e.
$$ \mathbb{E}[R_i \mid X_0 = i] = \infty, \tag{1.28} $$

iv) we have
$$ \sum_{n \geq 1} f^{(n)}_{i,i} = 1, \tag{1.29} $$
where $f^{(n)}_{i,i} := \mathbb{P}\big(T^r_i = n \mid X_0 = i\big)$, n ≥ 1, is the distribution of $T^r_i$.

As a consequence of (1.28), we have the following result.

Corollary 1.11. A state i ∈ S is recurrent if and only if
$$ \sum_{n \geq 1} [P^n]_{i,i} = \infty, $$
i.e. the above series diverges.

Corollary 1.11 admits the following consequence, which shows that any state communicating with a recurrent state is itself recurrent. In other words, recurrence is a class property, as all states in a given communicating class are recurrent as soon as one of them is recurrent.

Corollary 1.12. Class property. Let j ∈ S be a recurrent state. Then any state i ∈ S that communicates with state j is also recurrent.

A communicating class A ⊂ S is therefore recurrent if any of its states is recurrent.

    Transient states

A state i ∈ S is said to be transient when it is not recurrent, i.e., by (1.26),
$$ p_{i,i} = \mathbb{P}\big(T^r_i < \infty \mid X_0 = i\big) = \mathbb{P}(X_n = i \text{ for some } n \geq 1 \mid X_0 = i) < 1, \tag{1.30} $$
or
$$ \mathbb{P}\big(T^r_i = \infty \mid X_0 = i\big) > 0. $$

Proposition 1.13. For any state i ∈ S, the following statements are equivalent:

i) the state i ∈ S is transient, i.e. $p_{i,i} < 1$,

    * “Almost surely”.

    18 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

ii) the number of returns to i ∈ S is a.s.* finite, i.e.
$$ \mathbb{P}(R_i = \infty \mid X_0 = i) = 0, \quad \text{i.e.} \quad \mathbb{P}(R_i < \infty \mid X_0 = i) = 1, \tag{1.31} $$

iii) the mean number of returns to i ∈ S is finite, i.e.
$$ \mathbb{E}[R_i \mid X_0 = i] < \infty. \tag{1.32} $$

In other words, a state i ∈ S is transient if and only if
$$ \mathbb{P}(R_i < \infty \mid X_0 = i) > 0, $$
which by (1.22) is equivalent to
$$ \mathbb{P}(R_i < \infty \mid X_0 = i) = 1, $$
i.e. the number of returns to state i ∈ S is finite with a non-zero probability which is necessarily equal to one. As a consequence of Corollary 1.11 we also have the following result.

Corollary 1.14. A state i ∈ S is transient if and only if
$$ \sum_{n \geq 1} [P^n]_{i,i} < \infty, $$
i.e. the above series converges.

Similarly to Corollary 1.12, Corollary 1.14 admits the following consequence, which shows that any state communicating with a transient state is itself transient. Therefore, transience is also a class property, as all states in a given communicating class are transient as soon as one of them is transient.

Corollary 1.15. Class property. Let j ∈ S be a transient state. Then any state i ∈ S that communicates with state j is also transient.

Proof. If a state i ∈ S communicates with a transient state j, then i is also transient (otherwise state j would be recurrent by Corollary 1.12). □

A communicating class A ⊂ S is therefore transient if any of its states is transient. By Corollary 1.14 and the relation
$$ \sum_{n \geq 0} [P^n]_{i,j} = \big[(\mathrm{Id} - P)^{-1}\big]_{i,j}, \qquad i, j \in S, \tag{1.33} $$
we find that all states of a finite state space could only be transient if the matrix Id − P were invertible. However, 0 is clearly an eigenvalue of Id − P with eigenvector $[1, 1, \ldots, 1]^\top$, therefore Id − P is not invertible, and finite chains admit at least one recurrent state, as noted in Theorem 1.16 below.

* "Almost surely".

See also Althoen et al. (1993) for an application of (1.33) to the Snakes and Ladders game.

Clearly, any absorbing state is recurrent, and any state that leads to an absorbing state distinct from itself is transient.

We close this section with the following result for Markov chains with finite state space.

Theorem 1.16. Let $(X_n)_{n \in \mathbb{N}}$ be a Markov chain with finite state space S. Then $(X_n)_{n \in \mathbb{N}}$ has at least one recurrent state.

    Positive vs null recurrence

The expected time of return (or mean recurrence time) to a state i ∈ S is given by
$$ \mu_i(i) := \mathbb{E}\big[T^r_i \mid X_0 = i\big] = \sum_{n \geq 1} n\, \mathbb{P}\big(T^r_i = n \mid X_0 = i\big) = \sum_{n \geq 1} n\, f^{(n)}_{i,i}. $$

Recall that a state i is recurrent when $\mathbb{P}\big(T^r_i < \infty \mid X_0 = i\big) = 1$, i.e. when the random return time $T^r_i$ is almost surely finite starting from state i. However, the recurrence property yields no information on the finiteness of its expectation $\mu_i(i) = \mathbb{E}\big[T^r_i \mid X_0 = i\big]$.

Definition 1.17. A recurrent state i ∈ S is said to be:

a) positive recurrent if the mean return time to i is finite, i.e.
$$ \mu_i(i) = \mathbb{E}\big[T^r_i \mid X_0 = i\big] < \infty, $$

b) null recurrent if the mean return time to i is infinite, i.e.
$$ \mu_i(i) = \mathbb{E}\big[T^r_i \mid X_0 = i\big] = \infty. $$

The following Theorem 1.18 shows in particular that a Markov chain with finite state space cannot have any null recurrent state, cf. e.g. Corollary 2.3 in Kijima (1997), and also Corollary 3.7 in Asmussen (2003).

Theorem 1.18. Assume that the state space S of a Markov chain $(X_n)_{n \in \mathbb{N}}$ is finite. Then all recurrent states in S are also positive recurrent.

    20 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

As a consequence of Definition 1.8, Corollary 1.12, and Theorems 1.16 and 1.18, we have the following corollary.

Corollary 1.19. Let $(X_n)_{n \in \mathbb{N}}$ be an irreducible Markov chain with finite state space S. Then all states of $(X_n)_{n \in \mathbb{N}}$ are positive recurrent.

    Periodicity and aperiodicity

Given a state i ∈ S, consider the set
$$ \{ n \geq 1 : [P^n]_{i,i} > 0 \} $$
of integers which represent the possible travel times from state i to itself.

Definition 1.20. The period of the state i ∈ S is the greatest common divisor of the set
$$ \{ n \geq 1 : [P^n]_{i,i} > 0 \}. $$

A state having period 1 is said to be aperiodic, which is the case in particular if $P_{i,i} > 0$, i.e. when the state admits a returning loop with nonzero probability.

In particular, any absorbing state is both aperiodic and recurrent. A recurrent state i ∈ S is said to be ergodic if it is both positive recurrent and aperiodic.

If $[P^n]_{i,i} = 0$ for all n ≥ 1 then the set $\{ n \geq 1 : [P^n]_{i,i} > 0 \}$ is empty, and by convention the period of state i is defined to be 0. In this case, state i is also transient.

Note also that if
$$ \{ n \geq 1 : [P^n]_{i,i} > 0 \} $$
contains two distinct numbers that are relatively prime to each other (i.e. their greatest common divisor is 1), then state i is aperiodic.

Proposition 1.21 shows that periodicity is a class property, as all states in a given communicating class have the same period.

Proposition 1.21. Class property. All states that belong to a same communicating class have the same period.

A Markov chain is said to be aperiodic when all of its states are aperiodic. Note that any state that communicates with an aperiodic state is itself aperiodic. In particular, if a communicating class contains an aperiodic state then the whole class is aperiodic.
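The period of a state can be computed as the gcd of the return times observed in the first few matrix powers; a minimal sketch on an illustrative 3-cycle (each state has period 3), with the scanning horizon an assumption of the example:

```python
from math import gcd

# Period of a state as gcd{n >= 1 : [P^n]_{i,i} > 0}, scanning a finite horizon
# of matrix powers.  The 3-cycle below is illustrative: each state has period 3.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
d = len(P)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def period(i, horizon=30):
    g, Pn = 0, P
    for n in range(1, horizon + 1):
        if Pn[i][i] > 0:
            g = gcd(g, n)
        Pn = matmul(Pn, P)
    return g   # 0 if no return is possible within the horizon

assert [period(i) for i in range(d)] == [3, 3, 3]
```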

    " 21

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

Notes

See e.g. Chen and Hong (2012) for statistical testing of the Markov property in time series.

    22 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

Chapter 2
Phase-Type Distributions

Phase-type distributions (Neuts (1981)) are used in insurance, risk management and actuarial science to model heavy-tailed random claim sizes appearing for example in reserve and surplus processes. They provide a class of probability distributions depending on a wide range of parameters which can be used for fitting to actual data and are suitable for Monte Carlo simulation. See e.g. Latouche and Ramaswami (1999) for further reading.

2.1 Negative binomial distribution
2.2 Hitting time distribution
2.3 Mean hitting times

    2.1 Negative binomial distribution

Given p ∈ [0, 1], consider a two-state Markov chain $(X_n)_{n \in \mathbb{N}}$ on the state space {0, 1}, with transition matrix
$$ P = \begin{bmatrix} 1 & 0 \\ q & p \end{bmatrix}, $$
with q := 1 − p. We note that

i) State 0 is absorbing, i.e. $\mathbb{P}(X_{n+1} = 0 \mid X_n = 0) = 1$, and

ii) The first hitting time
$$ T_0 := \inf\{ n \geq 0 : X_n = 0 \} $$
of state 0 starting from state 1 has the geometric distribution with parameter p given by
$$ \mathbb{P}\big(T_0 = k \mid X_0 = 1\big) = (1 - p)\, p^{k-1}, \qquad k \geq 1. $$

    " 23

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://en.wikipedia.org/wiki/Discrete_phase-type_distributionhttps://www.ntu.edu.sg/home/nprivault/indext.html

More generally, given d ≥ 1, consider a (d + 1)-state Markov chain $(X_n)_{n \in \mathbb{N}}$ on the state space {0, 1, . . . , d}, with transition matrix
$$
P = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
q & p & 0 & \cdots & 0 & 0 \\
0 & q & p & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & q & p & 0 \\
0 & 0 & \cdots & 0 & q & p
\end{bmatrix},
$$
with q := 1 − p. In this case,

i) State 0 is absorbing, i.e. $\mathbb{P}(X_{k+1} = 0 \mid X_k = 0) = 1$, and

ii) The first hitting time $T_0$ of state 0 starting from state d has the (shifted) negative binomial distribution
$$ \mathbb{P}\big(T_0 = k \mid X_0 = d\big) = \binom{k-1}{k-d} (1 - p)^d\, p^{k-d}, \qquad k \geq d. $$

The idea of phase-type distributions is to generalize the above modeling by considering a discrete-time Markov chain $(X_n)_{n \in \mathbb{N}}$ on {0, 1, . . . , d} having d transient* states {1, 2, . . . , d}, and 0 as absorbing state. The geometric and negative binomial distributions above are the simplest examples of such discrete phase-type distributions.

Clearly, the first row of P has to be [1, 0, . . . , 0] because state 0 is absorbing, and the remainder of the matrix can take the form [α, Q]. Hence the transition matrix P of the chain $(X_n)_{n \in \mathbb{N}}$ takes the form

$$
P = [P_{i,j}]_{0 \leq i,j \leq d} =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
\alpha_1 & Q_{1,1} & \cdots & Q_{1,d} \\
\alpha_2 & Q_{2,1} & \cdots & Q_{2,d} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_d & Q_{d,1} & \cdots & Q_{d,d}
\end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ \alpha & Q \end{bmatrix},
$$
where α is the column vector $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_d]^\top$ and Q is the d × d matrix
$$
Q = \begin{bmatrix}
Q_{1,1} & \cdots & Q_{1,d} \\
\vdots & \ddots & \vdots \\
Q_{d,1} & \cdots & Q_{d,d}
\end{bmatrix}.
$$

* Here the transience condition implies that $\mathbb{P}(T_0 < \infty \mid X_0 = i) = 1$ for all i = 1, 2, . . . , d; it will be ensured by assuming that Id − Q is invertible, see § 1.5 for details.

    24 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

In addition, every row of the d × (d + 1) matrix [α, Q] has to add up to one, i.e. we have the relation
$$ \alpha_k + \sum_{l=1}^d Q_{k,l} = 1, \qquad k = 1, \ldots, d, \tag{2.1} $$
which is used to show the following lemma.

Lemma 2.1. We have the relation α = (Id − Q)e, where Id denotes the d × d identity matrix and
$$ e := [1, 1, \ldots, 1]^\top $$
denotes the column vector of ones.

Proof. Relation (2.1) can be rewritten as
$$
(\mathrm{Id} - Q)e =
\begin{bmatrix}
1 - Q_{1,1} & -Q_{1,2} & \cdots & -Q_{1,d} \\
-Q_{2,1} & 1 - Q_{2,2} & \cdots & -Q_{2,d} \\
\vdots & \vdots & \ddots & \vdots \\
-Q_{d,1} & -Q_{d,2} & \cdots & 1 - Q_{d,d}
\end{bmatrix}
\times
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
=
\begin{bmatrix}
1 - Q_{1,1} - \cdots - Q_{1,d} \\
1 - Q_{2,1} - \cdots - Q_{2,d} \\
\vdots \\
1 - Q_{d,1} - \cdots - Q_{d,d}
\end{bmatrix}
=
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_d \end{bmatrix},
$$
which shows that α = (Id − Q)e. □

The next proposition can be intuitively interpreted by noting that, since state 0 is absorbing, the n-step behavior of the chain on the states {1, 2, . . . , d} is entirely determined by the matrix $Q^n$: when 1 ≤ i, j ≤ d, one cannot travel through state 0 when moving from i to j in any number n ≥ 1 of time steps.

Proposition 2.2. We have
$$ P^n = \begin{bmatrix} 1 & 0 \\ (\mathrm{Id} - Q^n)e & Q^n \end{bmatrix}, \qquad n \geq 0. \tag{2.2} $$

Proof. We proceed by induction on n ≥ 0. Clearly, the conclusion holds for n = 0, and also at the rank n = 1 since α = (Id − Q)e. Next, we assume that the relation (2.2) holds at the rank n ≥ 0. In this case, we have
$$
P^{n+1} = P \times P^n
= \begin{bmatrix} 1 & 0 \\ \alpha & Q \end{bmatrix}
\times \begin{bmatrix} 1 & 0 \\ (\mathrm{Id} - Q^n)e & Q^n \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ \alpha + Q(\mathrm{Id} - Q^n)e & Q^{n+1} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ (\mathrm{Id} - Q^{n+1})e & Q^{n+1} \end{bmatrix},
$$
since
$$ \alpha + Q(\mathrm{Id} - Q^n)e = (\mathrm{Id} - Q)e + (Q - Q^{n+1})e = (\mathrm{Id} - Q^{n+1})e. \qquad \square $$

    2.2 Hitting time distribution

In this section we show that the probability distribution of the first hitting time $T_0$ of state 0 after starting from state i ≥ 1 can be computed using the vector α and the matrix Q.

Proposition 2.3. For all i = 1, 2, . . . , d we have
$$ \mathbb{P}\big(T_0 = n \mid X_0 = i\big) = [Q^{n-1}\alpha]_i, \qquad n \geq 1. \tag{2.3} $$

Proof. We partition the event $\{T_0 = n\}$ as
$$ \{T_0 = n\} = \bigcup_{k=1}^d \{X_n = 0 \text{ and } X_{n-1} = k\}, $$
and note that, since $[P^{n-1}]_{i,k} = [Q^{n-1}]_{i,k}$ and $\alpha_k = P_{k,0}$, k = 1, 2, . . . , d, we have

    26 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

$$
\begin{aligned}
\mathbb{P}\big(T_0 = n \mid X_0 = i\big)
&= \mathbb{P}\Big( \bigcup_{k=1}^d \{T_0 = n,\, X_{n-1} = k\} \,\Big|\, X_0 = i \Big) \\
&= \sum_{k=1}^d \mathbb{P}(T_0 = n,\, X_{n-1} = k \mid X_0 = i) \\
&= \sum_{k=1}^d \mathbb{P}(X_n = 0,\, X_{n-1} = k \mid X_0 = i) \\
&= \sum_{k=1}^d \mathbb{P}(X_n = 0 \mid X_{n-1} = k)\, \mathbb{P}(X_{n-1} = k \mid X_0 = i) \\
&= \sum_{k=1}^d [P^{n-1}]_{i,k}\, P_{k,0} \\
&= \sum_{k=1}^d \alpha_k\, [Q^{n-1}]_{i,k} \\
&= [Q^{n-1}\alpha]_i, \qquad n \geq 1. \qquad \square
\end{aligned}
$$

From now on we assume that the initial distribution of $X_0$ is given by the d-dimensional column vector $\beta = [\beta_1, \beta_2, \ldots, \beta_d]^\top$, i.e.
$$ \beta_i = \mathbb{P}(X_0 = i), \qquad i = 1, 2, \ldots, d, $$
with $\mathbb{P}(X_0 = 0) = 0$.

Proposition 2.4. The probability distribution of $T_0$ is given by
$$ \mathbb{P}(T_0 = n) = \beta^\top Q^{n-1} \alpha, \qquad n \geq 1. $$

Proof. By (2.3), we have
$$
\begin{aligned}
\mathbb{P}(T_0 = n) &= \sum_{i=1}^d \mathbb{P}(T_0 = n \mid X_0 = i)\, \mathbb{P}(X_0 = i) \\
&= \sum_{i=1}^d \beta_i\, \big[Q^{n-1}\alpha\big]_i \\
&= \sum_{i=1}^d \beta_i \sum_{k=1}^d \alpha_k\, [Q^{n-1}]_{i,k} \\
&= \beta^\top Q^{n-1} \alpha, \qquad n \geq 1. \qquad \square
\end{aligned}
$$
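Proposition 2.4 can be checked numerically on the chain of Section 2.1, whose absorption time is (shifted) negative binomial; the parameters p and d below are illustrative.

```python
from math import comb

# Check P(T0 = n) = beta^T Q^{n-1} alpha on the chain of Section 2.1,
# whose absorption time is (shifted) negative binomial; p, d are illustrative.
p, d = 0.4, 3
q = 1 - p

# Q, alpha for states {1, ..., d} (index k-1 <-> state k), beta = start in d.
Q = [[p if j == i else q if j == i - 1 else 0.0 for j in range(d)]
     for i in range(d)]
alpha = [q] + [0.0] * (d - 1)
beta = [0.0] * (d - 1) + [1.0]

def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def pmf(n):
    v = alpha
    for _ in range(n - 1):       # v = Q^{n-1} alpha
        v = mat_vec(Q, v)
    return sum(b * x for b, x in zip(beta, v))

for n in range(1, 10):
    expected = comb(n - 1, n - d) * q**d * p**(n - d) if n >= d else 0.0
    assert abs(pmf(n) - expected) < 1e-12
```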

Since the states {1, 2, . . . , d} are transient, Corollary 1.14 shows that the matrix inverse $(\mathrm{Id} - sQ)^{-1}$ exists and is given by the series
$$ (\mathrm{Id} - sQ)^{-1} = \sum_{k \geq 0} s^k Q^k, \qquad s \in (-1, 1]. \tag{2.4} $$

We note that $T_0$ is finite with probability one, since
$$
\begin{aligned}
\mathbb{P}(T_0 < \infty) &= \sum_{n=0}^\infty \mathbb{P}(T_0 = n)
= \sum_{n=1}^\infty \beta^\top Q^{n-1} \alpha
= \beta^\top \sum_{n=0}^\infty Q^n \alpha \\
&= \beta^\top (\mathrm{Id} - Q)^{-1} \alpha
= \beta^\top (\mathrm{Id} - Q)^{-1} (\mathrm{Id} - Q) e
= \sum_{i=1}^d \beta_i
= \sum_{i=1}^d \mathbb{P}(X_0 = i) = 1.
\end{aligned}
$$

Corollary 2.5. The cumulative distribution function $\mathbb{P}(T_0 \leq n)$ of $T_0$ is given in terms of the vectors β, e, and the matrix $Q^n$ as
$$ \mathbb{P}(T_0 \leq n) = 1 - \beta^\top Q^n e, \qquad n \geq 0. \tag{2.5} $$

Proof. We have
$$
\begin{aligned}
\mathbb{P}(T_0 \leq n) &= \sum_{k=1}^n \mathbb{P}(T_0 = k)
= \sum_{k=1}^n \beta^\top Q^{k-1} \alpha \\
&= \beta^\top (\mathrm{Id} - Q^n)(\mathrm{Id} - Q)^{-1} \alpha
= \beta^\top (\mathrm{Id} - Q^n) e
= 1 - \beta^\top Q^n e, \qquad n \geq 1. \qquad \square
\end{aligned}
$$

    28 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

Alternatively, using the relation α = (Id − Q)e and a telescoping sum, Relation (2.5) can be recovered as
$$
\begin{aligned}
\mathbb{P}(T_0 \leq n) &= \sum_{k=1}^n \mathbb{P}(T_0 = k)
= \sum_{k=1}^n \beta^\top Q^{k-1} \alpha
= \sum_{k=1}^n \beta^\top Q^{k-1} (\mathrm{Id} - Q) e \\
&= \sum_{k=0}^{n-1} \beta^\top Q^k e - \sum_{k=1}^n \beta^\top Q^k e
= \beta^\top e - \beta^\top Q^n e
= 1 - \beta^\top Q^n e, \qquad n \geq 1.
\end{aligned}
$$

We can also rewrite $\mathbb{P}(T_0 \leq n)$ as the probability of not being in any state i = 1, 2, . . . , d at time n, as
$$
\begin{aligned}
\mathbb{P}(T_0 \leq n) &= 1 - \sum_{k=1}^d \mathbb{P}(X_n = k)
= 1 - \sum_{k=1}^d \sum_{i=1}^d \beta_i\, \mathbb{P}(X_n = k \mid X_0 = i) \\
&= 1 - \sum_{k=1}^d \sum_{i=1}^d \beta_i\, [Q^n]_{i,k}
= 1 - \beta^\top Q^n e, \qquad n \geq 0.
\end{aligned}
$$

Alternatively, we could also write
$$
\begin{aligned}
\mathbb{P}(T_0 \leq n) &= \mathbb{P}(X_n = 0)
= \sum_{i=1}^d \beta_i\, \mathbb{P}(X_n = 0 \mid X_0 = i)
= \sum_{i=1}^d \beta_i\, [P^n]_{i,0} \\
&= \sum_{i=1}^d \beta_i\, [(\mathrm{Id} - Q^n)e]_i
= \sum_{i=1}^d \beta_i - \sum_{i=1}^d \beta_i\, [Q^n e]_i
= 1 - \beta^\top Q^n e, \qquad n \geq 0.
\end{aligned}
$$

We refer to the Appendix for the definition of the Probability Generating Function (PGF) of a discrete random variable.

Proposition 2.6. The probability generating function
$$ G_{T_0}(s) := \mathbb{E}\big[s^{T_0}\big] = \sum_{k \geq 0} s^k\, \mathbb{P}(T_0 = k) $$
of $T_0$ is given by
$$ G_{T_0}(s) = s\, \beta^\top (\mathrm{Id} - sQ)^{-1} (\mathrm{Id} - Q) e. \tag{2.6} $$

Proof. By (2.7) we have $\mathbb{P}(T_0 < \infty) = 1$, hence

$$
\begin{aligned}
G_{T_0}(s) &= \sum_{k \geq 0} s^k\, \mathbb{P}(T_0 = k)
= \mathbb{P}(X_0 = 0) + \sum_{k \geq 1} s^k\, \beta^\top Q^{k-1} \alpha \\
&= s \sum_{k \geq 0} s^k\, \beta^\top Q^k \alpha
= s\, \beta^\top \sum_{k \geq 0} s^k Q^k \alpha \\
&= s\, \beta^\top (\mathrm{Id} - sQ)^{-1} \alpha
= s\, \beta^\top (\mathrm{Id} - sQ)^{-1} (\mathrm{Id} - Q) e,
\end{aligned}
$$
where we applied Lemma 2.1 and (2.4). □

We note that
$$ \mathbb{P}(T_0 < \infty) = G_{T_0}(1) = \beta^\top (\mathrm{Id} - Q)^{-1} (\mathrm{Id} - Q) e = \beta^\top e = 1, \tag{2.7} $$
which shows that state 0 is reached in finite time with probability one.

    2.3 Mean hitting times

Using the probability generating function $s \mapsto G_{T_0}(s)$, we compute the first and second moments $\mathbb{E}[T_0]$ and $\mathbb{E}[T_0^2]$ of $T_0$. By differentiating (2.6) with respect to s we have
$$ G'_{T_0}(s) = \beta^\top (\mathrm{Id} - sQ)^{-1} \alpha + s\, \beta^\top Q (\mathrm{Id} - sQ)^{-2} \alpha, $$

    30 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

hence*
$$
\begin{aligned}
\mathbb{E}[T_0] = G'_{T_0}(1^-)
&= \beta^\top (\mathrm{Id} - Q)^{-1} \alpha + \beta^\top Q (\mathrm{Id} - Q)^{-2} \alpha \\
&= \beta^\top (\mathrm{Id} - Q)(\mathrm{Id} - Q)^{-2} \alpha + \beta^\top Q (\mathrm{Id} - Q)^{-2} \alpha \\
&= \beta^\top (\mathrm{Id} - Q)^{-2} \alpha
= \beta^\top (\mathrm{Id} - Q)^{-1} e.
\end{aligned}
$$
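The formula $\mathbb{E}[T_0] = \beta^\top (\mathrm{Id} - Q)^{-1} e$ can be checked on the chain of Section 2.1 started from state d, for which the mean absorption time is d/q; the parameters below are illustrative, and the triangular system is solved by forward substitution.

```python
# Check E[T0] = beta^T (Id - Q)^{-1} e on the chain of Section 2.1 started
# from state d, for which E[T0 | X0 = d] = d / q; p, d are illustrative.
p, d = 0.4, 3
q = 1 - p

# m = (Id - Q)^{-1} e solves (Id - Q) m = e; for the bidiagonal Q of
# Section 2.1 the system is triangular: row i reads (1 - p) m_i - q m_{i-1} = 1.
m = [0.0] * d
for i in range(d):
    m[i] = (1 + (q * m[i - 1] if i > 0 else 0.0)) / (1 - p)

assert abs(m[d - 1] - d / q) < 1e-12   # mean absorption time from state d
```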

By differentiating (2.6) further, we also have
$$ G''_{T_0}(s) = 2\, \beta^\top Q (\mathrm{Id} - sQ)^{-2} \alpha + 2s\, \beta^\top Q^2 (\mathrm{Id} - sQ)^{-3} \alpha, $$
hence
$$
\begin{aligned}
\mathbb{E}[T_0(T_0 - 1)] = G''_{T_0}(1^-)
&= 2\, \beta^\top Q (\mathrm{Id} - Q)^{-2} \alpha + 2\, \beta^\top Q^2 (\mathrm{Id} - Q)^{-3} \alpha \\
&= 2\, \beta^\top Q (\mathrm{Id} - Q)^{-3} \alpha
= 2\, \beta^\top Q (\mathrm{Id} - Q)^{-2} e,
\end{aligned}
$$
and
$$
\begin{aligned}
\mathbb{E}[T_0^2] &= \mathbb{E}[T_0(T_0 - 1)] + \mathbb{E}[T_0] \\
&= 2\, \beta^\top Q (\mathrm{Id} - Q)^{-2} e + \beta^\top (\mathrm{Id} - Q)^{-1} e \\
&= 2\, \beta^\top Q (\mathrm{Id} - Q)^{-2} e + \beta^\top (\mathrm{Id} - Q)(\mathrm{Id} - Q)^{-2} e \\
&= \beta^\top (\mathrm{Id} + Q)(\mathrm{Id} - Q)^{-2} e.
\end{aligned}
$$
More generally, by (13.11) we could also compute the factorial moment
$$ \mathbb{E}[T_0(T_0 - 1) \cdots (T_0 - k + 1)] = G^{(k)}_{T_0}(1^-) = k!\, \beta^\top Q^{k-1} (\mathrm{Id} - Q)^{-k} e, $$
for all k ≥ 1.

* Here, $G'(1^-)$ denotes the derivative on the left at the point s = 1.

    " 31

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

  • 32 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

Chapter 3
Random Walks and Recurrence

In this chapter we recall some basic facts on Bernoulli and absorbing random walks. Detailed proofs of the results on Bernoulli and absorbing random walks can be found in Chapter 3 of Privault (2018). The results on reflected random walks are given with detailed proofs, and will be used in Chapter 4 on random walks in a cookie environment, or excited random walks. See § 1.2 and Proposition 1.3 of Hairer (2016) for the general theory of recurrence of Markov chains and its application to random walks.

3.1 Distribution and hitting times
3.2 Return times
3.3 Hitting probabilities and hitting times
3.4 Recurrence of symmetric random walks
3.5 Reflected random walk
3.6 Conditioned random walk

    3.1 Distribution and hitting times

Let $\{e_1, e_2, \ldots, e_d\}$ denote the canonical basis of $\mathbb{R}^d$, i.e.
$$ e_k = (0, \ldots, 0, \underbrace{1}_{\text{position } k}, 0, \ldots, 0), \qquad k = 1, 2, \ldots, d. $$

The unrestricted $\mathbb{Z}^d$-valued random walk $(S_n)_{n \geq 0}$, also called the Bernoulli random walk, is defined by
$$ S_n = \sum_{k=1}^n X_k = X_1 + \cdots + X_n, \qquad n \geq 0, $$
started at
$$ S_0 = \vec{0} = (\underbrace{0, 0, \ldots, 0}_{d \text{ times}}), $$
where the random walk increments
$$ X_n \in \{e_1, e_2, \ldots, e_d, -e_1, -e_2, \ldots, -e_d\}, \qquad n \geq 1, $$
form an independent and identically distributed (i.i.d.) family $(X_n)_{n \geq 1}$ of random variables with distribution
$$ \mathbb{P}(X_n = e_k) = p_k, \qquad \mathbb{P}(X_n = -e_k) = q_k, \qquad k = 1, 2, \ldots, d, $$
such that
$$ \sum_{k=1}^d p_k + \sum_{k=1}^d q_k = 1. $$

    One-dimensional random walk

When d = 1, the random walk increments $(X_k)_{k \geq 1}$ form an independent and identically distributed (i.i.d.) family of Bernoulli random variables with distribution
$$ \mathbb{P}(X_k = +1) = p, \qquad \mathbb{P}(X_k = -1) = q, \qquad k \geq 1, $$
with p + q = 1. In this case the random walk can only evolve by going up or down by one unit at each time step, over the state space $\mathbb{Z}$. We have
$$ \mathbb{P}(S_{n+1} = k + 1 \mid S_n = k) = p \quad \text{and} \quad \mathbb{P}(S_{n+1} = k - 1 \mid S_n = k) = q, \qquad k \in \mathbb{Z}. $$
We also have

$$ \mathbb{E}[S_n \mid S_0 = 0] = \mathbb{E}\Big[\sum_{k=1}^n X_k\Big] = \sum_{k=1}^n \mathbb{E}[X_k] = n(2p - 1) = n(p - q), $$
and the variance can be computed as
$$ \mathrm{Var}[S_n \mid S_0 = 0] = \mathrm{Var}\Big[\sum_{k=1}^n X_k\Big] = \sum_{k=1}^n \mathrm{Var}[X_k] = 4npq. $$

The distribution of $S_{2n}$ is given by
$$ \mathbb{P}(S_{2n} = 2k \mid S_0 = 0) = \binom{2n}{n+k} p^{n+k} q^{n-k}, \qquad -n \leq k \leq n, \tag{3.1} $$

    34 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://www.ntu.edu.sg/home/nprivault/indext.html

and we note that in an even number of time steps, $(S_n)_{n \in \mathbb{N}}$ can only reach an even state in $\mathbb{Z}$ starting from 0. Similarly, in an odd number of time steps, $(S_n)_{n \in \mathbb{N}}$ can only reach an odd state in $\mathbb{Z}$ starting from 0. In Figure 3.1 we enumerate the $120 = \binom{10}{7} = \binom{10}{3}$ possible paths corresponding to n = 5 and k = 2.

[Fig. 3.1: Graph of the $120 = \binom{10}{7} = \binom{10}{3}$ paths with n = 5 and k = 2.]
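Formula (3.1) can be checked by a Monte Carlo experiment; the parameters n, k, p and the seed below are illustrative.

```python
from math import comb
import random

random.seed(1)

# Monte Carlo check of (3.1); n, k, p and the seed are illustrative.
p, n, k = 0.5, 5, 2
q = 1 - p
exact = comb(2 * n, n + k) * p**(n + k) * q**(n - k)   # = 120/1024

trials = 200_000
hits = 0
for _ in range(trials):
    s = sum(1 if random.random() < p else -1 for _ in range(2 * n))
    hits += (s == 2 * k)
estimate = hits / trials
assert abs(estimate - exact) < 0.01
```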

    Two-dimensional random walk

When d = 2, the random walk can return to state $\vec{0}$ in 2n time steps via

• k forward steps in the direction $e_1$,
• k backward steps in the direction $-e_1$,
• n − k forward steps in the direction $e_2$,
• n − k backward steps in the direction $-e_2$,

where k ranges from 0 to n. For each k = 0, 1, . . . , n the number of ways to arrange those four types of moves among 2n time steps is the multinomial coefficient
$$ \binom{2n}{k,\, k,\, n-k,\, n-k} = \frac{(2n)!}{k!\,k!\,(n-k)!\,(n-k)!}, $$

hence, since every such sequence of 2n moves occurs with probability $(p_1q_1)^k (p_2q_2)^{n-k}$, by summation over k = 0, 1, . . . , n we find
$$
\begin{aligned}
\mathbb{P}\big(S_{2n} = \vec{0}\,\big)
&= \sum_{k=0}^n \frac{(2n)!}{(k!)^2((n-k)!)^2}\, (p_1 q_1)^k (p_2 q_2)^{n-k} \\
&= \frac{(2n)!}{(n!)^2} \sum_{k=0}^n \binom{n}{k}^2 (p_1 q_1)^k (p_2 q_2)^{n-k}. \tag{3.2}
\end{aligned}
$$

    Multidimensional random walk

Given $i_1, i_2, \ldots, i_d \in \mathbb{N}$, we count all paths starting from $\vec{0}$ and returning to $\vec{0}$ via $i_k$ "forward" steps in the direction $e_k$ and $i_k$ "backward" steps in the direction $-e_k$, k = 1, 2, . . . , d.

In order to come back to $\vec{0}$ we need to take $i_1$ forward steps and $i_1$ backward steps in the direction $e_1$, and similarly for $i_2, \ldots, i_d$. The number of ways to arrange such paths is given by the multinomial coefficient
$$ \binom{2n}{i_1, i_1, i_2, i_2, \ldots, i_d, i_d} = \frac{(2n)!}{(i_1!)^2 \cdots (i_d!)^2}, $$
and by summation over all possible indices $i_1, i_2, \ldots, i_d \geq 0$ satisfying $i_1 + \cdots + i_d = n$, weighting each path by its probability $\prod_{k=1}^d (p_kq_k)^{i_k}$, we find

$$
\begin{aligned}
\mathbb{P}\big(S_{2n} = \vec{0}\,\big)
&= \sum_{\substack{i_1 + \cdots + i_d = n \\ i_1, i_2, \ldots, i_d \geq 0}} \binom{2n}{i_1, i_1, i_2, i_2, \ldots, i_d, i_d} \prod_{k=1}^d (p_k q_k)^{i_k} \\
&= \sum_{\substack{i_1 + \cdots + i_d = n \\ i_1, i_2, \ldots, i_d \geq 0}} \frac{(2n)!}{(i_1!)^2 \cdots (i_d!)^2} \prod_{k=1}^d (p_k q_k)^{i_k}. \tag{3.3}
\end{aligned}
$$
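In the symmetric case $p_k = q_k = 1/(2d)$ with d = 2, formula (3.2) reduces to $\binom{2n}{n}^2/4^{2n}$ by the Vandermonde identity $\sum_k \binom{n}{k}^2 = \binom{2n}{n}$; a small numerical sketch of this reduction:

```python
from math import comb, factorial

# In the symmetric case p_k = q_k = 1/4, d = 2, formula (3.2) reduces to
# C(2n, n)^2 / 4^{2n} via sum_k C(n, k)^2 = C(2n, n) (Vandermonde).
def return_prob_2d(n):
    total = sum(factorial(2 * n) // (factorial(k)**2 * factorial(n - k)**2)
                for k in range(n + 1))   # number of closed 2n-step paths
    return total / 4**(2 * n)            # each path has probability 4^{-2n}

for n in range(1, 8):
    assert abs(return_prob_2d(n) - comb(2 * n, n)**2 / 4**(2 * n)) < 1e-12
```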

    3.2 Return times

    We letTr0 := inf{n > 1 : S0 = 0)

    denote the first return time to 0 of the one-dimensional random walk(Sn)n>0.

    Proposition 3.1. The probability distribution P(Tr0 = n | S0 = 0) of the firstreturn time Tr0 to 0 is given by

    P(Tr0 = 2n | S0 = 0

    )=

    12n− 1

    (2nn

    )(pq)n, n > 1,

    with P(Tr0 = 2n + 1 | S0 = 0) = 0, n > 0.

    36 "

    This version: August 27, 2020 https://www.ntu.edu.sg/home/nprivault/indext.html

    https://en.wikipedia.org/wiki/Multinomial_theorem#Multinomial_coefficientshttps://www.ntu.edu.sg/home/nprivault/indext.html

Proof. We first note that the set of paths joining $S_0 = 0$ to $S_{2n} = 0$ without returning to 0 in between splits into the set of paths joining $S_1 = 1$ to $S_{2n-1} = 1$ without hitting 0, and the set of paths joining $S_1 = -1$ to $S_{2n-1} = -1$ without hitting 0. By reflection, to each of the $\frac{(2n-2)!}{(n-1)!\,n!}$ paths joining $S_1 = 1$ to $S_{2n-1} = 1$ without crossing 0 between time 1 and time 2n − 1 we can associate a reflected path joining $S_1 = -1$ to $S_{2n-1} = -1$ without crossing 0.

[Figure: a path from $S_1 = 1$ to $S_{2n-1} = 1$ and its reflection from $S_1 = -1$ to $S_{2n-1} = -1$.]

On the other hand, every path joining $S_1 = 1$ to $S_{2n-1} = 1$ by crossing 0 can be associated to a unique reflected path joining $S_1 = 1$ to $S_{2n-1} = -1$. The number of such paths is
$$ \binom{2n-2}{n-2} = \binom{2n-2}{n}. $$

[Figure: a path from $S_1 = 1$ to $S_{2n-1} = 1$ crossing 0, and its reflection ending at $S_{2n-1} = -1$.]

Therefore, the number of paths joining $S_1 = 1$ to $S_{2n-1} = 1$ without crossing 0 between time 1 and time 2n − 1 is
$$ \binom{2n-2}{n-1} - \binom{2n-2}{n-2} = \frac{(2n-2)!}{(n-1)!\,(n-1)!} - \frac{(2n-2)!}{(n-2)!\,n!} = \frac{\big(n^2 - n(n-1)\big)(2n-2)!}{n!\,n!} = \frac{(2n-2)!}{(n-1)!\,n!}. $$

Adding the number of paths joining $S_1 = 1$ to $S_{2n-1} = 1$ without crossing 0 to the number of paths joining $S_1 = -1$ to $S_{2n-1} = -1$ without crossing 0, we get the total number of paths joining $S_0 = 0$ to $S_{2n} = 0$ without returning to 0 between time 0 and time 2n:
$$ 2\, \frac{(2n-2)!}{(n-1)!\,n!} = \frac{2n\,(2n-2)!}{n!\,n!} = \frac{1}{2n-1} \binom{2n}{n}. $$
Since each such path has n up steps and n down steps, it occurs with probability $(pq)^n$, which yields the stated formula. □

Let
$$ G_{T^r_0} : [-1, 1] \longrightarrow \mathbb{R}, \qquad s \longmapsto G_{T^r_0}(s), $$
denote the Probability Generating Function (PGF) of the random variable $T^r_0$, defined by
$$ G_{T^r_0}(s) := \mathbb{E}\big[s^{T^r_0} \mathbb{1}_{\{T^r_0 < \infty\}} \,\big|\, S_0 = 0\big] = \sum_{n \geq 0} s^n\, \mathbb{P}\big(T^r_0 = n \mid S_0 = 0\big), $$
−1 ≤ s ≤ 1, cf. (13.8). Recall that the knowledge of $G_{T^r_0}(s)$ provides certain information on the distribution of $T^r_0$, such as the probability
$$ \mathbb{P}\big(T^r_0 < \infty \mid S_0 = 0\big) = \mathbb{E}\big[\mathbb{1}_{\{T^r_0 < \infty\}} \,\big|\, S_0 = 0\big] = G_{T^r_0}(1). $$

Proposition 3.2. The PGF of $T^r_0$ is given by
$$ G_{T^r_0}(s) = 1 - \big(1 - 4pqs^2\big)^{1/2}, \qquad s \in [-1, 1]. \tag{3.4} $$

Proof. By Proposition 3.1, the probability distribution $\mathbb{P}(T_0^r = n \mid S_0=0)$ of the first return time $T_0^r$ to 0 is given by
\[
\mathbb{P}(T_0^r = 2k\mid S_0=0) = \frac{1}{2k-1}\binom{2k}{k}(pq)^k, \qquad k\geq 1,
\]
with $\mathbb{P}(T_0^r = 2k+1\mid S_0=0) = 0$, $k\in\mathbb{N}$. By applying a Taylor expansion to $s\mapsto 1-(1-4pqs^2)^{1/2}$ in (3.4), we get
\begin{align*}
G_{T_0^r}(s) &= \sum_{n\geq 0} s^n\,\mathbb{P}(T_0^r=n\mid S_0=0) \\
&= \sum_{k\geq 1} s^{2k}\,\mathbb{P}(T_0^r=2k\mid S_0=0) \\
&= \sum_{k\geq 1} \frac{s^{2k}}{2k-1}\binom{2k}{k}(pq)^k \\
&= \sum_{k\geq 1} \frac{s^{2k}}{k!}\,\frac{1}{2k-1}\,\frac{1\times 2\times\cdots\times(2k-1)\times(2k)}{1\times 2\times\cdots\times(k-1)\times k}\,(pq)^k \\
&= \sum_{k\geq 1} \frac{s^{2k}}{k!}\,\frac{1}{2k-1}\,\big(1\times 3\times 5\times\cdots\times(2k-3)\times(2k-1)\big)(2pq)^k \\
&= \frac12\sum_{k\geq 1}\frac{s^{2k}(4pq)^k}{k!}\Big(1-\frac12\Big)\times\cdots\times\Big(k-1-\frac12\Big) \\
&= 1-\sum_{k\geq 0}\frac{1}{k!}(-4pqs^2)^k\Big(\frac12-0\Big)\Big(\frac12-1\Big)\times\cdots\times\Big(\frac12-(k-1)\Big) \\
&= 1-(1-4pqs^2)^{1/2},
\end{align*}
where we used the Taylor expansion
\[
(1+x)^\alpha = \sum_{k\geq 0}\frac{x^k}{k!}\,\alpha(\alpha-1)\times\cdots\times(\alpha-(k-1))
\]
for $\alpha = 1/2$. $\square$

The distribution
\[
\mathbb{P}(T_0^r = 2k\mid S_0=0) = \frac{(4pq)^k}{k!}\,\frac12\Big(1-\frac12\Big)\times\cdots\times\Big(k-1-\frac12\Big)
= \frac{(4pq)^k}{2\,k!}\prod_{m=1}^{k-1}\Big(m-\frac12\Big)
= \frac{1}{2k-1}\binom{2k}{k}(pq)^k, \qquad k\geq 1,
\]
can be recovered from the relation
\[
\mathbb{P}(T_0^r = n\mid S_0=0) = \frac{1}{n!}\,\frac{\partial^n}{\partial s^n}G_{T_0^r}(s)\Big|_{s=0}, \qquad n\geq 0.
\]
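The equality between the product form of the Taylor coefficients and the closed form of Proposition 3.1 can be verified in exact rational arithmetic. The sketch below introduces the helper names `p_return_closed` and `p_return_taylor` for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def p_return_closed(k, p):
    """P(T0r = 2k | S0 = 0) = C(2k, k) (pq)^k / (2k - 1)."""
    q = 1 - p
    return Fraction(comb(2 * k, k), 2 * k - 1) * (p * q) ** k

def p_return_taylor(k, p):
    """Same probability via the Taylor coefficients of 1 - (1 - 4pqs^2)^{1/2}:
    (4pq)^k / (2 k!) * prod_{m=1}^{k-1} (m - 1/2)."""
    q = 1 - p
    prod = Fraction(1)
    for m in range(1, k):
        prod *= Fraction(2 * m - 1, 2)     # factor m - 1/2
    return (4 * p * q) ** k * prod / (2 * factorial(k))

p = Fraction(1, 3)
for k in range(1, 10):
    assert p_return_closed(k, p) == p_return_taylor(k, p)
```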

Proposition 3.3. The probability that the first return to 0 occurs within a finite time is
\[
\mathbb{P}(T_0^r<\infty\mid S_0=0) = 2\min(p,q), \tag{3.5}
\]
and we have
\[
\mathbb{P}(T_0^r=\infty\mid S_0=0) = |2p-1| = |p-q|. \tag{3.6}
\]
Proof. Using (3.4) and the identity $1-4pq = (p+q)^2-4pq = (p-q)^2$, we have
\[
\mathbb{P}(T_0^r<\infty\mid S_0=0) = \mathbb{E}\big[\mathbf{1}_{\{T_0^r<\infty\}}\,\big|\,S_0=0\big] = G_{T_0^r}(1) = 1-(1-4pq)^{1/2} = 1-|p-q| = 2\min(p,q). \qquad\square
\]
Hence $\mathbb{P}(T_0^r<\infty\mid S_0=0) = 2\min(p,q) > 0$, whereas in the symmetric case (or fair game) $p=q=1/2$ we find that
\[
\mathbb{P}(T_0^r<\infty\mid S_0=0) = 1 \quad\text{and}\quad \mathbb{P}(T_0^r=\infty\mid S_0=0) = 0,
\]
i.e. the symmetric random walk is recurrent, as it returns to 0 with probability one and has a single communicating class, see Corollary 1.12.

i) In the non-symmetric case $p\neq q$, by (3.6), the time $T_0^r$ needed to return to state 0 is infinite with probability
\[
\mathbb{P}(T_0^r=\infty\mid S_0=0) = |p-q| > 0,
\]
hence
\[
\mathbb{E}[T_0^r\mid S_0=0] = \infty. \tag{3.7}
\]
Starting from $S_0=k\geq 1$, the mean hitting time of state 0 equals
\[
\mathbb{E}[T_0^r\mid S_0=k] =
\begin{cases}
\infty & \text{if } q\leq p, \\[1mm]
\dfrac{k}{q-p} & \text{if } q>p,
\end{cases}
\tag{3.8}
\]
see Exercise 3.2 in Privault (2018).

ii) In the symmetric case $p=q=1/2$ we have $\mathbb{P}(T_0^r<\infty\mid S_0=0)=1$, and nevertheless
\[
\mathbb{E}[T_0^r\mid S_0=0] = \infty. \tag{3.9}
\]
Indeed, we have
\[
\mathbb{E}[T_0^r\mid S_0=0] = \mathbb{E}\big[T_0^r\mathbf{1}_{\{T_0^r<\infty\}}\,\big|\,S_0=0\big] = \sum_{k\geq 1}2k\,\mathbb{P}\big(T_0^r=2k\,\big|\,S_0=0\big) = 2\sum_{k\geq 1}\frac{k}{2k-1}\binom{2k}{k}(pq)^k.
\]
When $p=q=1/2$, we find
\[
\mathbb{E}\big[T_0^r\mathbf{1}_{\{T_0^r<\infty\}}\,\big|\,S_0=0\big] = \sum_{k\geq 1}\frac{2k}{2k-1}\binom{2k}{k}\frac{1}{2^{2k}}. \tag{3.10}
\]
By Stirling's approximation $k!\simeq (k/e)^k\sqrt{2\pi k}$ as $k$ tends to $\infty$, we have
\[
\frac{2k}{2k-1}\,\frac{1}{2^{2k}}\binom{2k}{k} = \frac{2k}{2k-1}\,\frac{(2k)!}{2^{2k}(k!)^2} \simeq_{k\to\infty} \frac{1}{\sqrt{\pi k}},
\]
from which the divergence of (3.10), and therefore (3.9), is recovered by the limit comparison test with the divergent series $\sum_{k\geq 1}1/\sqrt{\pi k}$.

The probability of hitting state 0 in finite time starting from any state $k\geq 1$ is given by
\[
\mathbb{P}(T_0^r<\infty\mid S_0=k) = \min\bigg(1,\Big(\frac qp\Big)^k\bigg), \qquad k\geq 1, \tag{3.11}
\]
i.e.
\[
\mathbb{P}(T_0^r=\infty\mid S_0=k) = \max\bigg(0,\,1-\Big(\frac qp\Big)^k\bigg), \qquad k\geq 1.
\]
Using the independence of increments of the random walk $(S_n)_{n\in\mathbb{N}}$, one can also show that the probability generating function of the first passage time
\[
T_k = \inf\{n\geq 0 : S_n = k\}
\]
to any level $k\geq 1$ is given by
\[
G_{T_k}(s) = \bigg(\frac{1-\sqrt{1-4pqs^2}}{2qs}\bigg)^k, \qquad 4pqs^2<1, \quad q\leq p, \tag{3.12}
\]
from which the distribution of $T_k$ can be computed given the series expansion of $G_{T_k}(s)$.
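Relation (3.5) can be sanity-checked by Monte Carlo simulation. The sketch below truncates at a finite horizon, which is a reasonable proxy here because an asymmetric walk that fails to return drifts away linearly; the function name, horizon, and sample size are illustrative choices.

```python
import random

def estimate_return_prob(p, n_paths=20000, horizon=500, seed=42):
    """Monte Carlo estimate of P(T0r <= horizon | S0 = 0) for the
    one-dimensional walk with up-probability p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        s = 0
        for _ in range(horizon):
            s += 1 if rng.random() < p else -1
            if s == 0:               # first return to the origin
                hits += 1
                break
    return hits / n_paths

p = 0.3
est = estimate_return_prob(p)
assert abs(est - 2 * min(p, 1 - p)) < 0.02   # (3.5): exact value is 0.6
```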

3.3 Hitting probabilities and hitting times

Let
\[
T_L := \inf\{n\geq 0 : S_n = L\}
\]
denote the first hitting time of $L$ by the one-dimensional random walk $(S_n)_{n\geq 0}$, and let
\[
T_0 := \inf\{n\geq 0 : S_n = 0\}
\]
denote the first hitting time of 0 by the process $(S_n)_{n\geq 0}$.

Fig. 3.2: Sample path of the random walk $(S_n)_{n\in\mathbb{N}}$.

Proposition 3.4. (Relation (2.2.27) in Privault (2018)). In the non-symmetric case $p\neq q$, the event
\[
\{T_0<T_L\} = \bigcup_{n\in\mathbb{N}}\{S_n=0\} \tag{3.13}
\]
has the conditional probability
\[
\mathbb{P}(T_0<T_L\mid S_0=k) = \frac{(q/p)^k-(q/p)^L}{1-(q/p)^L} = \frac{(p/q)^{L-k}-1}{(p/q)^L-1}, \tag{3.14}
\]
or
\[
\mathbb{P}(T_L<T_0\mid S_0=k) = \frac{(p/q)^{L-k}-(p/q)^L}{1-(p/q)^L} = \frac{1-(q/p)^k}{1-(q/p)^L}, \tag{3.15}
\]
$k=0,1,\ldots,L$.

In the symmetric case $p=q=1/2$, we find
\[
\mathbb{P}(T_0<T_L\mid S_0=k) = 1-\frac kL, \quad\text{or}\quad \mathbb{P}(T_L<T_0\mid S_0=k) = \frac kL, \tag{3.16}
\]
$k=0,1,\ldots,L$, see Relation (2.2.28) in Privault (2018). When the number $L$ of states becomes large, we obtain the probability of hitting the origin starting from state $k$ as
\[
f_\infty(k) := \lim_{L\to\infty}\mathbb{P}(T_0<T_L\mid S_0=k) = \min\bigg(1,\Big(\frac qp\Big)^k\bigg) =
\begin{cases}
1 & \text{if } q\geq p, \\[1mm]
\Big(\dfrac qp\Big)^k & \text{if } p>q,
\end{cases}
\qquad k\geq 0. \tag{3.17}
\]
Similarly, for all $k\geq 0$ we have
\[
\lim_{L\to\infty}\mathbb{P}(T_L<T_0\mid S_0=k) =
\begin{cases}
0 & \text{if } p\leq q, \\[1mm]
1-\Big(\dfrac qp\Big)^k & \text{if } p>q,
\end{cases}
\]
which represents the probability that the one-dimensional random walk $(S_n)_{n\in\mathbb{N}}$ “escapes to infinity”.
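Formulas (3.14)-(3.16) can be cross-checked against a direct solution of the first-step equations $f(k) = p\,f(k+1) + q\,f(k-1)$ with $f(0)=1$, $f(L)=0$, in exact arithmetic. In the sketch below (function name introduced here), the solution is written as $f(k) = a_k + b_k f(1)$ and the unknown $f(1)$ is fixed by the boundary condition at $L$.

```python
from fractions import Fraction

def hit_0_before_L(p, L):
    """Solve f(k) = P(T0 < TL | S0 = k) from f(k) = p f(k+1) + q f(k-1)
    with f(0) = 1, f(L) = 0, writing f(k) = a_k + b_k * f(1)."""
    q = 1 - p
    a = [Fraction(1), Fraction(0)]
    b = [Fraction(0), Fraction(1)]
    for k in range(1, L):
        a.append((a[k] - q * a[k - 1]) / p)   # f(k+1) = (f(k) - q f(k-1)) / p
        b.append((b[k] - q * b[k - 1]) / p)
    t = -a[L] / b[L]                          # enforce f(L) = 0
    return [a[k] + b[k] * t for k in range(L + 1)]

p = Fraction(2, 5); q = 1 - p; L = 6
f = hit_0_before_L(p, L)
r = q / p
for k in range(L + 1):
    assert f[k] == (r ** k - r ** L) / (1 - r ** L)          # (3.14)

f_sym = hit_0_before_L(Fraction(1, 2), L)
assert f_sym == [1 - Fraction(k, L) for k in range(L + 1)]   # (3.16)
```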

Mean hitting times

Let now
\[
T_{0,L} := \inf\{n\geq 0 : S_n = 0 \text{ or } S_n = L\}
\]
denote the time* until any of the states 0 or $L$ is reached by $(S_n)_{n\in\mathbb{N}}$, with $T_{0,L} = +\infty$ in case neither state is ever reached, see Figure 3.3.

Fig. 3.3: Sample paths of the random walk $(S_n)_{n\in\mathbb{N}}$.

From Proposition 3.4, we note that
\begin{align*}
\mathbb{P}(T_{0,L}<\infty\mid S_0=k) &= \mathbb{P}(T_0<T_L\mid S_0=k) + \mathbb{P}(T_L<T_0\mid S_0=k) \\
&= \frac{(p/q)^{L-k}-1}{(p/q)^L-1} + \frac{(q/p)^k-1}{(q/p)^L-1} \\
&= \frac{(q/p)^L\big((p/q)^{L-k}-1\big) - \big((p/q)^{L-k}-1\big) + (p/q)^L\big((q/p)^k-1\big) - \big((q/p)^k-1\big)}{\big((p/q)^L-1\big)\big((q/p)^L-1\big)} \\
&= \frac{(q/p)^k-(q/p)^L-(p/q)^{L-k}+1+(p/q)^{L-k}-(p/q)^L-(q/p)^k+1}{\big((p/q)^L-1\big)\big((q/p)^L-1\big)} \\
&= 1, \qquad k=0,1,\ldots,L.
\end{align*}

Proposition 3.5. (Relation (2.3.11) in Privault (2018)). When $p\neq q$, the mean hitting time
\[
h_L(k) := \mathbb{E}[T_{0,L}\mid S_0=k]
\]
starting from $S_0=k\in\{0,1,\ldots,L\}$ can be computed as
\[
h_L(k) = \mathbb{E}[T_{0,L}\mid S_0=k] = \frac{1}{q-p}\bigg(k - L\,\frac{1-(q/p)^k}{1-(q/p)^L}\bigg), \qquad k=0,1,2,\ldots,L. \tag{3.18}
\]
In the symmetric case $p=q=1/2$ we get
\[
h_L(k) = \mathbb{E}[T_{0,L}\mid S_0=k] = k(L-k), \qquad k=0,1,2,\ldots,L, \tag{3.19}
\]
see Relation (2.3.17) in Privault (2018). In particular, we note that
\[
\mathbb{E}[T_{0,L}\mid S_0=k] < +\infty, \qquad k=0,1,2,\ldots,L.
\]

* The notation “inf” stands for “infimum”, meaning the smallest $n\geq 0$ such that $S_n=0$ or $S_n=L$, if such an $n$ exists.
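Formulas (3.18) and (3.19) can likewise be verified by solving the first-step equations $h(k) = 1 + p\,h(k+1) + q\,h(k-1)$ with $h(0)=h(L)=0$ exactly (a sketch, with the function name introduced here):

```python
from fractions import Fraction

def mean_absorption_time(p, L):
    """Solve h(k) = 1 + p h(k+1) + q h(k-1), h(0) = h(L) = 0,
    writing h(k) = a_k + b_k * h(1) and enforcing h(L) = 0."""
    q = 1 - p
    a = [Fraction(0), Fraction(0)]
    b = [Fraction(0), Fraction(1)]
    for k in range(1, L):
        a.append((a[k] - 1 - q * a[k - 1]) / p)  # h(k+1) = (h(k) - 1 - q h(k-1)) / p
        b.append((b[k] - q * b[k - 1]) / p)
    t = -a[L] / b[L]
    return [a[k] + b[k] * t for k in range(L + 1)]

L = 7
assert mean_absorption_time(Fraction(1, 2), L) == \
    [Fraction(k * (L - k)) for k in range(L + 1)]             # (3.19)

p = Fraction(2, 3); q = 1 - p; r = q / p
h = mean_absorption_time(p, L)
for k in range(L + 1):
    assert h[k] == (k - L * (1 - r ** k) / (1 - r ** L)) / (q - p)   # (3.18)
```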

3.4 Recurrence of symmetric random walks

The question of recurrence of the $d$-dimensional random walk was first solved in Pólya (1921). The treatment proposed in this section is based on Champion et al. (2007). We consider the symmetric $\mathbb{Z}^d$-valued random walk
\[
S_n = X_1 + \cdots + X_n, \qquad n\geq 0,
\]
started at $S_0 = \vec 0 = (0,0,\ldots,0)$, where $(X_n)_{n\geq 1}$ is a sequence of independent uniformly distributed random variables
\[
X_n \in \{e_1,e_2,\ldots,e_d,-e_1,-e_2,\ldots,-e_d\}, \qquad n\geq 1,
\]
with distribution
\[
\mathbb{P}(X_n = e_k) = \mathbb{P}(X_n = -e_k) = \frac{1}{2d}, \qquad k=1,2,\ldots,d.
\]
Let
\[
T_{\vec 0}^{\,r} := \inf\{n\geq 1 : S_n = \vec 0\}
\]
denote the time of first return* to $\vec 0 = (0,0,\ldots,0)$ of the random walk $(S_n)_{n\in\mathbb{N}}$ started at $\vec 0$, with the convention $\inf\emptyset = +\infty$, see Figure 3.4. The random walk is said to be recurrent if $\mathbb{P}(T_{\vec 0}^{\,r}<\infty) = 1$.

Fig. 3.4: Sample path of the random walk $(S_n)_{n\in\mathbb{N}}$.

* Recall that the notation “inf” stands for “infimum”, meaning the smallest $n\geq 1$ such that $S_n=\vec 0$, with $T_{\vec 0}^{\,r} = +\infty$ if no such $n$ exists.

Recurrence of the one-dimensional random walk

When $d=1$ we can now compute $\mathbb{P}(S_{2n}=0)$, $n\geq 1$, and deduce that the one-dimensional random walk is recurrent, i.e. we have $\mathbb{P}(T_0^r<\infty)=1$. For this we will use Stirling's approximation $n!\simeq (n/e)^n\sqrt{2\pi n}$ as $n$ tends to $\infty$. When $d=1$ we have
\[
\mathbb{P}(S_{2n}=0) = \frac{1}{2^{2n}}\binom{2n}{n} = \frac{(2n)!}{2^{2n}(n!)^2} \simeq_{n\to\infty} \frac{1}{\sqrt{\pi n}}
\]
by Stirling's approximation, hence
\[
\sum_{n\geq 0}\mathbb{P}(S_n=0) = \infty,
\]
and by Corollary 1.11 or Corollary 3.12 we conclude that $\mathbb{P}(T_0^r<\infty)=1$, i.e. we recover the fact that the one-dimensional symmetric random walk is recurrent.
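Both the Stirling estimate and the divergence of the series can be observed numerically (a sketch; the cutoffs are arbitrary illustrative choices):

```python
from math import comb, pi, sqrt

# P(S_{2n} = 0) = C(2n, n) / 4^n versus the Stirling estimate 1/sqrt(pi n)
for n in (10, 100, 1000):
    exact = comb(2 * n, n) / 4 ** n
    assert abs(exact * sqrt(pi * n) - 1) < 1 / n   # relative error ~ 1/(8n)

# partial sums of sum_n P(S_{2n} = 0), which grow like 2 sqrt(n/pi)
partial = sum(comb(2 * n, n) / 4 ** n for n in range(1, 1001))
assert partial > 30
```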

Recurrence of the two-dimensional random walk

Proposition 3.6. When $d=2$ and $p_1=q_1=p_2=q_2=1/4$ the two-dimensional random walk is recurrent, i.e. we have $\mathbb{P}(T_{\vec 0}^{\,r}<\infty)=1$.

Proof. Recall that when $d=2$, by (3.2) we have
\begin{align*}
\mathbb{P}(S_{2n}=\vec 0) &= \Big(\frac14\Big)^{2n}\sum_{k=0}^n \frac{(2n)!}{(k!)^2((n-k)!)^2} \\
&= \frac{(2n)!}{4^{2n}(n!)^2}\sum_{k=0}^n\binom nk^2
= \frac{(2n)!}{4^{2n}(n!)^2}\binom{2n}{n}
= \frac{((2n)!)^2}{4^{2n}(n!)^4} \simeq_{n\to\infty} \frac{1}{\pi n},
\end{align*}
where we used the combinatorial identity*
\[
\binom{2n}{n} = \sum_{k=0}^n\binom nk^2
\]

* This identity can be proved by noting that the number $\binom{2n}{n}$ of ways to draw $n$ balls among $2n$ balls can be obtained by summing the number of ways to draw exactly $k$ white balls and $n-k$ black balls for $k=0,1,\ldots,n$.

and Stirling's approximation. This yields
\[
\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0) = \infty.
\]
We conclude by Corollary 1.11 or Corollary 3.12, which show that $\mathbb{P}(T_{\vec 0}^{\,r}<\infty)=1$. $\square$
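The combinatorial identity and the $1/(\pi n)$ asymptotics used in the proof can be checked directly (a sketch):

```python
from math import comb, pi

# Vandermonde-type identity: sum_k C(n, k)^2 = C(2n, n)
for n in range(1, 40):
    assert sum(comb(n, k) ** 2 for k in range(n + 1)) == comb(2 * n, n)

# P(S_{2n} = 0) = (C(2n, n) / 4^n)^2 for the two-dimensional walk, ~ 1/(pi n)
for n in (50, 500):
    p2n = (comb(2 * n, n) / 4 ** n) ** 2
    assert abs(p2n * pi * n - 1) < 1 / n   # relative error ~ 1/(4n)
```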

Recurrence of $d$-dimensional random walks, $d\geq 3$

We will use the following lemma, see Lemma 4 in Champion et al. (2007).

Lemma 3.7. Let $n = a_n d + b_n$, where $a_n$ is a nonnegative integer and $b_n\in\{0,1,\ldots,d-1\}$. We have
\[
i_1!\,i_2!\cdots i_d! \geq (a_n!)^d(a_n+1)^{b_n}
\]
for all nonnegative integers $i_1,i_2,\ldots,i_d$ such that $i_1+\cdots+i_d = n$, $d\geq 1$.

Proposition 3.8. The random walk $(S_n)_{n\in\mathbb{N}}$ is not recurrent in dimension $d\geq 3$.

Proof. By (3.3) we have
\[
\mathbb{P}(S_{2n}=\vec 0) = \frac{1}{(2d)^{2n}}\sum_{\substack{i_1+\cdots+i_d=n\\ i_1,i_2,\ldots,i_d\geq 0}}\frac{(2n)!}{(i_1!)^2\cdots(i_d!)^2}.
\]
Using the bound
\[
i_1!\,i_2!\cdots i_d! \geq (a_n!)^d(a_n+1)^{b_n}
\]
for $n = i_1+\cdots+i_d$ from Lemma 3.7 and the Euclidean division $n = a_n d + b_n$ with $b_n\in\{0,1,\ldots,d-1\}$, we have
\begin{align*}
\sum_{n\geq 1}\mathbb{P}(S_{2n}=\vec 0) &= \sum_{n\geq 1}\frac{1}{(2d)^{2n}}\binom{2n}{n}\sum_{\substack{i_1+\cdots+i_d=n\\ i_1,i_2,\ldots,i_d\geq 0}}\frac{(n!)^2}{(i_1!)^2\cdots(i_d!)^2} \\
&\leq \sum_{n\geq 1}\frac{1}{(2d)^{2n}}\binom{2n}{n}\frac{n!}{(a_n!)^d(a_n+1)^{b_n}}\sum_{\substack{i_1+\cdots+i_d=n\\ i_1,i_2,\ldots,i_d\geq 0}}\frac{n!}{i_1!\cdots i_d!} \\
&\leq \sum_{n\geq 1}\frac{1}{(2d)^{2n}}\binom{2n}{n}\frac{n!\,d^n}{(a_n!)^d a_n^{b_n}}
= \sum_{n\geq 1}\frac{(2n)!}{2^{2n}d^n\,n!\,(a_n!)^d a_n^{b_n}},
\end{align*}

from the formula
\[
d^n = \sum_{\substack{i_1+\cdots+i_d=n\\ i_1,i_2,\ldots,i_d\geq 0}}\frac{n!}{i_1!\cdots i_d!},
\]
which follows from the multinomial identity
\[
\bigg(\sum_{l=1}^n x_l\bigg)^k = k!\sum_{\substack{d_1+\cdots+d_n=k\\ d_1\geq 0,\ldots,d_n\geq 0}}\frac{x_1^{d_1}}{d_1!}\cdots\frac{x_n^{d_n}}{d_n!}. \tag{3.20}
\]

Next, applying Stirling's approximation to $n!$, $(2n)!$ and $a_n!$, and using the limit $\lim_{m\to\infty}(1+x/m)^m = e^x$, $x\in\mathbb{R}$, we have
\begin{align*}
\frac{(2n)!}{2^{2n}d^n\,n!\,(a_n!)^d a_n^{b_n}} &\simeq \frac{(2n/e)^{2n}\sqrt{4\pi n}}{2^{2n}d^n(n/e)^n\sqrt{2\pi n}\,\big((a_n/e)^{a_n}\sqrt{2\pi a_n}\big)^d a_n^{b_n}} \\
&= \frac{\sqrt 2}{(2\pi)^{d/2}}\,\frac{n^n}{e^{b_n}(a_n d)^n a_n^{d/2}} \\
&= \frac{\sqrt 2}{(2\pi)^{d/2}}\,\frac{(1-b_n/n)^{-n}}{e^{b_n}a_n^{d/2}} \\
&\leq \frac{\sqrt 2\,d^{d/2}}{(2\pi)^{d/2}}\,\frac{(1-(d-1)/n)^{-n}}{(a_n d)^{d/2}} \\
&\simeq \frac{\sqrt 2\,d^{d/2}e^{d-1}}{(2\pi)^{d/2}}\,\frac{1}{n^{d/2}},
\end{align*}
since $a_n d\simeq n$ as $n$ goes to infinity, from the relation $a_n d/n = 1-b_n/n$. We conclude that there exists a constant $C>0$ such that for all $n$ sufficiently large, we have
\[
\frac{(2n)!}{2^{2n}d^n\,n!\,(a_n!)^d a_n^{b_n}} \leq \frac{C}{n^{d/2}}, \tag{3.21}
\]
hence the random walk is not recurrent when $d\geq 3$. Indeed, (3.21) shows that
\[
\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0) < \infty,
\]
hence $\mathbb{P}(T_{\vec 0}^{\,r}=\infty) > 0$ by Corollary 1.11 or Corollary 3.12. $\square$
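For $d=3$, the summands of $\sum_n\mathbb{P}(S_{2n}=\vec 0)$ can be computed exactly from (3.3), and their $O(n^{-3/2})$ decay observed for small $n$ (a sketch; the function name and cutoff are illustrative):

```python
from fractions import Fraction
from math import comb, factorial

def p_origin_2n(n):
    """Exact P(S_{2n} = 0) for the symmetric walk on Z^3, from (3.3):
    C(2n, n) / 6^{2n} * sum over i+j+k=n of (n! / (i! j! k!))^2."""
    s = 0
    for i in range(n + 1):
        for j in range(n - i + 1):
            k = n - i - j
            m = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
            s += m * m
    return Fraction(comb(2 * n, n) * s, 6 ** (2 * n))

assert p_origin_2n(1) == Fraction(1, 6)   # one step out, one step straight back
terms = [float(p_origin_2n(n)) for n in range(1, 16)]
assert all(a > b for a, b in zip(terms, terms[1:]))   # decreasing in n
assert terms[-1] * 15 ** 1.5 < 0.5        # consistent with O(n^{-3/2}) decay
```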

    Conditions for recurrence

    In Corollary 3.12 below we provide an alternative proof of Corollary 1.11.


Proposition 3.9. The probability distribution $\mathbb{P}(T_{\vec 0}^{\,r}=n)$, $n\geq 1$, satisfies the convolution equation
\[
\mathbb{P}(S_n=\vec 0) = \sum_{k=2}^n \mathbb{P}(T_{\vec 0}^{\,r}=k)\,\mathbb{P}(S_{n-k}=\vec 0), \qquad n\geq 1.
\]
Proof. We partition the event $\{S_n=\vec 0\}$ into
\[
\{S_n=\vec 0\} = \bigcup_{k=2}^n\big\{S_{n-k}=\vec 0,\ S_{n-k+1}\neq\vec 0,\ \ldots,\ S_{n-1}\neq\vec 0,\ S_n=\vec 0\big\}, \qquad n\geq 1,
\]
according to the time of last return to state $\vec 0$ before time $n$, with $\mathbb{P}(S_1=\vec 0)=0$ since we are starting from $S_0=\vec 0$, see Figure 3.5.

Fig. 3.5: Last return to state 0 at time k = 10.

Then we have
\begin{align*}
\mathbb{P}(S_n=\vec 0) &:= \mathbb{P}(S_n=\vec 0\mid S_0=\vec 0) \\
&= \sum_{k=2}^n \mathbb{P}\big(S_{n-k}=\vec 0,\ S_{n-k+1}\neq\vec 0,\ldots,S_{n-1}\neq\vec 0,\ S_n=\vec 0\,\big|\,S_0=\vec 0\big) \\
&= \sum_{k=2}^n \mathbb{P}\big(S_{n-k+1}\neq\vec 0,\ldots,S_{n-1}\neq\vec 0,\ S_n=\vec 0\,\big|\,S_{n-k}=\vec 0,\ S_0=\vec 0\big)\,\mathbb{P}\big(S_{n-k}=\vec 0\,\big|\,S_0=\vec 0\big) \\
&= \sum_{k=2}^n \mathbb{P}\big(S_1\neq\vec 0,\ldots,S_{k-1}\neq\vec 0,\ S_k=\vec 0\,\big|\,S_0=\vec 0\big)\,\mathbb{P}\big(S_{n-k}=\vec 0\,\big|\,S_0=\vec 0\big) \\
&= \sum_{k=2}^n \mathbb{P}\big(T_{\vec 0}^{\,r}=k\,\big|\,S_0=\vec 0\big)\,\mathbb{P}\big(S_{n-k}=\vec 0\,\big|\,S_0=\vec 0\big) \\
&= \sum_{k=2}^n \mathbb{P}\big(S_{n-k}=\vec 0\big)\,\mathbb{P}\big(T_{\vec 0}^{\,r}=k\big), \qquad n\geq 1. \qquad\square
\end{align*}
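For the one-dimensional walk, the convolution equation can be verified exactly, using $\mathbb{P}(S_{2m}=0) = \binom{2m}{m}/4^m$ and the first-return law of Proposition 3.1 (a sketch; the truncation level `N` is arbitrary):

```python
from fractions import Fraction
from math import comb

N = 20
u = [Fraction(comb(n, n // 2), 2 ** n) if n % 2 == 0 else Fraction(0)
     for n in range(N + 1)]                  # u_n = P(S_n = 0)
f = [Fraction(0)] * (N + 1)                  # f_n = P(T0r = n)
for n in range(2, N + 1, 2):
    m = n // 2
    f[n] = Fraction(comb(2 * m, m), (2 * m - 1) * 4 ** m)

# convolution equation: u_n = sum_{k=2}^n f_k u_{n-k}, n >= 1
for n in range(1, N + 1):
    assert u[n] == sum(f[k] * u[n - k] for k in range(2, n + 1))
```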


Lemma 3.10. For all $m\geq 1$ we have
\[
1 - \frac{1}{\displaystyle\sum_{n=0}^m \mathbb{P}(S_n=\vec 0)}
\;\leq\; \sum_{n=2}^m \mathbb{P}(T_{\vec 0}^{\,r}=n)
\;\leq\; \frac{\displaystyle\sum_{n=2}^{2m}\mathbb{P}(S_n=\vec 0)}{\displaystyle\sum_{n=0}^m\mathbb{P}(S_n=\vec 0)}. \tag{3.22}
\]
Proof. We start by showing that
\[
\sum_{n=1}^m \mathbb{P}(S_n=\vec 0) = \sum_{k=2}^m \mathbb{P}(T_{\vec 0}^{\,r}=k)\sum_{l=0}^{m-k}\mathbb{P}(S_l=\vec 0).
\]
We have
\begin{align*}
\sum_{n=1}^m\mathbb{P}(S_n=\vec 0) &= \sum_{n=1}^m\sum_{k=2}^n \mathbb{P}(T_{\vec 0}^{\,r}=k)\,\mathbb{P}(S_{n-k}=\vec 0) \\
&= \sum_{k=2}^m\sum_{n=k}^m \mathbb{P}(T_{\vec 0}^{\,r}=k)\,\mathbb{P}(S_{n-k}=\vec 0) \\
&= \sum_{k=2}^m \mathbb{P}(T_{\vec 0}^{\,r}=k)\sum_{l=0}^{m-k}\mathbb{P}(S_l=\vec 0) \\
&\leq \sum_{k=2}^m \mathbb{P}(T_{\vec 0}^{\,r}=k)\sum_{l=0}^{m}\mathbb{P}(S_l=\vec 0) \\
&= \bigg(\sum_{n=0}^m\mathbb{P}(S_n=\vec 0)\bigg)\bigg(\sum_{n=2}^m\mathbb{P}(T_{\vec 0}^{\,r}=n)\bigg).
\end{align*}

On the other hand, we have
\begin{align*}
\sum_{n=1}^{2m}\mathbb{P}(S_n=\vec 0) &= \sum_{n=2}^{2m}\mathbb{P}(T_{\vec 0}^{\,r}=n)\sum_{l=0}^{2m-n}\mathbb{P}(S_l=\vec 0) \\
&\geq \sum_{n=2}^{m}\mathbb{P}(T_{\vec 0}^{\,r}=n)\sum_{l=0}^{2m-n}\mathbb{P}(S_l=\vec 0) \\
&\geq \sum_{n=2}^{m}\mathbb{P}(T_{\vec 0}^{\,r}=n)\sum_{l=0}^{m}\mathbb{P}(S_l=\vec 0). \qquad\square
\end{align*}
By letting $m$ tend to $\infty$ in (3.22) we get the following corollary.


Corollary 3.11. We have
\[
\mathbb{P}(T_{\vec 0}^{\,r}<\infty) = 1 - \frac{1}{\displaystyle\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0)} = 1 - \frac{1}{1+\mathbb{E}[R_{\vec 0}\mid S_0=\vec 0]}.
\]
Proof. By Lemma 3.10, letting $m$ tend to infinity in (3.22), we have
\[
1 - \frac{1}{\displaystyle\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0)}
\leq \sum_{n\geq 2}\mathbb{P}(T_{\vec 0}^{\,r}=n) = \mathbb{P}(T_{\vec 0}^{\,r}<\infty)
\leq \frac{\displaystyle\sum_{n\geq 2}\mathbb{P}(S_n=\vec 0)}{\displaystyle\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0)}
= 1 - \frac{1}{\displaystyle\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0)}. \qquad\square
\]

As a consequence of Corollary 3.11 we have the following corollary. Note that the sum of the series
\[
\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0) = \sum_{n\geq 0}\mathbb{E}\big[\mathbf{1}_{\{S_n=\vec 0\}}\big] = \mathbb{E}\bigg[\sum_{n\geq 0}\mathbf{1}_{\{S_n=\vec 0\}}\bigg]
\]
represents the average number of visits to state $\vec 0$. We also have
\[
\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0) = \sum_{n\geq 0}[P^n]_{\vec 0,\vec 0} = \big[(I-P)^{-1}\big]_{\vec 0,\vec 0}.
\]
Corollary 3.12. The $d$-dimensional symmetric random walk is recurrent, i.e. $\mathbb{P}(T_{\vec 0}^{\,r}<\infty)=1$, if and only if
\[
\sum_{n\geq 0}\mathbb{P}(S_n=\vec 0) = \infty.
\]
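For the one-dimensional walk, Corollary 3.11 can be checked numerically against (3.5), using $\mathbb{P}(S_{2m}=0) = \binom{2m}{m}(pq)^m$; the series converges since $4pq<1$ when $p\neq q$. A sketch (the truncation point is arbitrary but more than sufficient here):

```python
def green_sum(p, N):
    """Truncated sum_{n=0}^{2N} P(S_n = 0) = sum_{m=0}^{N} C(2m, m) (pq)^m
    for the one-dimensional walk, computed iteratively to avoid overflow."""
    q = 1 - p
    term, total = 1.0, 1.0          # m = 0 term
    for m in range(1, N + 1):
        term *= 2 * (2 * m - 1) / m * (p * q)   # ratio of consecutive terms
        total += term
    return total

p = 0.3
G = green_sum(p, 500)               # close to the limit 1/|p - q| = 2.5
assert abs((1 - 1 / G) - 2 * min(p, 1 - p)) < 1e-9   # Corollary 3.11 vs (3.5)
```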

3.5 Reflected random walk

We now consider a reflected random walk $(S_n)_{n\geq 0}$ with transition probabilities
\[
\mathbb{P}(S_{n+1}=k+1\mid S_n=k) = p, \qquad k=0,1,\ldots,L-1,
\]
\[
\mathbb{P}(S_{n+1}=k-1\mid S_n=k) = q, \qquad k=1,2,\ldots,L-1,
\]
with
\[
\mathbb{P}(S_{n+1}=0\mid S_n=0) = q \quad\text{and}\quad \mathbb{P}(S_{n+1}=L\mid S_n=L) = 1,
\]
for all $n\in\mathbb{N}=\{0,1,2,\ldots\}$, where $q=1-p$ and $p\in(0,1]$.

Proposition 3.13. State $L$ is eventually reached with probability one after starting from any state $k\in\{0,1,\ldots,L\}$.

Proof. Let
\[
g(k) := \mathbb{P}(T_L<\infty\mid S_0=k)
\]
denote the probability that state $L$ is reached in finite time after starting from state $k\in\{0,1,\ldots,L\}$. Using first step analysis we can write down the difference equations satisfied by $g(k)$, $k=0,1,\ldots,L-1$, as
\[
g(k) = p\,g(k+1) + q\,g(k-1), \qquad k=1,2,\ldots,L-1, \tag{3.23}
\]
with
\[
g(0) = p\,g(1) + q\,g(0) \tag{3.24}
\]
for $k=0$, and the boundary condition $g(L)=1$. In order to solve (3.23)-(3.24) for $g(k) := \mathbb{P}(T_L<\infty\mid S_0=k)$, $k=0,1,\ldots,L$, we observe that any constant function $g(k)=C$ is a solution of both (3.23) and (3.24), and the boundary condition $g(L)=1$ yields $C=1$, hence
\[
g(k) = \mathbb{P}(T_L<\infty\mid S_0=k) = 1
\]
for all $k=0,1,\ldots,L$. $\square$
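Proposition 3.13, together with the mean value given in Proposition 3.14 below, can be sanity-checked by simulating the reflected walk (a sketch; the function name, seed, horizon and sample size are illustrative):

```python
import random

def time_to_hit_L(k, L, p, rng, horizon=100000):
    """Simulate the reflected walk (P(0 -> 0) = q, absorption at L)
    started at k; return the first time L is reached, or None."""
    s = k
    for n in range(horizon):
        if s == L:
            return n
        if s == 0:
            s = 1 if rng.random() < p else 0
        else:
            s += 1 if rng.random() < p else -1
    return None

rng = random.Random(7)
L, k = 5, 2
times = [time_to_hit_L(k, L, 0.5, rng) for _ in range(20000)]
assert all(t is not None for t in times)           # Proposition 3.13
mean = sum(times) / len(times)
assert abs(mean - (L + k + 1) * (L - k)) < 1.0     # (L+k+1)(L-k) = 24
```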

Let
\[
h(k) := \mathbb{E}[T_L\mid S_0=k]
\]
denote the expected time until state $L$ is reached after starting from state $k\in\{0,1,\ldots,L\}$.

Proposition 3.14. We have
\[
h(k) = \mathbb{E}[T_L\mid S_0=k] = \frac{L-k}{p-q} + \frac{q}{(p-q)^2}\bigg(\Big(\frac qp\Big)^L - \Big(\frac qp\Big)^k\bigg), \qquad k=0,1,\ldots,L,
\]
when $p\neq q$, and
\[
h(k) = \mathbb{E}[T_L\mid S_0=k] = (L+k+1)(L-k), \qquad k=0,1,\ldots,L,
\]
when $p=q=1/2$.

Proof. Using first step analysis we can write down the difference equations satisfied by $h(k)$ for $k=0,1,\ldots,L-1$, as
\[
h(k) = 1 + p\,h(k+1) + q\,h(k-1), \qquad k=1,2,\ldots,L-1, \tag{3.25}
\]
with
\[
h(0) = 1 + p\,h(1) + q\,h(0) \tag{3.26}
\]
for $k=0$, and the boundary condition $h(L)=0$. We compute $h(k) = \mathbb{E}[T_L\mid S_0=k]$ for all $k=0,1,\ldots,L$ by solving the equations (3.25)-(3.26) for $k=1,2,\ldots,L-1$.

(i) Case $p\neq q$. The solution of the associated homogeneous equation
\[
h(k) = p\,h(k+1) + q\,h(k-1), \qquad k=1,2,\ldots,L-1,
\]
has the form
\[
h(k) = C_1 + C_2(q/p)^k, \qquad k=1,2,\ldots,L-1,
\]
and we can check that $k\mapsto k/(q-p)$ is a particular solution of (3.25). Hence the general solution of (3.25) has the form
\[
h(k) = \frac{k}{q-p} + C_1 + C_2(q/p)^k, \qquad k=0,1,\ldots,L,
\]
with
\[
0 = h(L) = \frac{L}{q-p} + C_1 + C_2(q/p)^L,
\]
and, by (3.26),
\[
p\,h(0) = p(C_1 + C_2)
\]