Introduction to Stochastic Processes

Thibaut Mastrolia

MAA 305 – Bachelor program, third year, École Polytechnique – Fall 2019


General Introduction

These lecture notes are associated with the course "MAA 305 – Probability: Stochastic Processes", intended for third-year students of the Bachelor program of the École Polytechnique. It follows the courses "MAA 203 – Probability" and "MAA 301 – Integration".

The guideline of these lecture notes is the random walk, introduced in Chapter 1 by following exactly the book An Introduction to Markov Processes, by D.W. Stroock. This reference is adapted as an exercise in this first chapter. The second chapter focuses on the definition of a stochastic process together with the notion of stopping time. It also introduces Gaussian processes. It follows the lecture notes Martingale en temps discret et chaînes de Markov, by Nizar Touzi (lecture notes of the engineering program), and is also contained in the other references mentioned. The third chapter introduces the construction of the conditional expectation and follows Chapter 9 of Probability with Martingales, by D. Williams. The fourth chapter investigates discrete martingale theory and is based on Chapters 10, 11 and 12 of Probability with Martingales, by D. Williams. The fifth chapter, on discrete Markov chains, follows the same lines as Chapter 2 in Adventures in Stochastic Processes, by S. Resnick. The sixth chapter is adapted from Chapter 6 in Adventures in Stochastic Processes, by S. Resnick, and studies the Brownian motion.

These lecture notes are illustrated with examples and exercises, with corrections at the end of the document.

This version: November 2019. I thank the students of the Bachelor program for their remarks and questions, which helped to improve this course.


Contents

0.1 Measurable space and integrability
0.2 Some convergence results

1 Introduction with the Random Walk
  1.1 Nearest Neighbor Random Walk on Z: an introductory exercise
    1.1.1 Preliminary definitions
    1.1.2 Law of passage time
    1.1.3 Passage time at the point a ≠ 0
    1.1.4 Time of first return
  1.2 Recurrence of random walk on Zd
  1.3 Recurrence of the symmetric random walk in Z2
  1.4 Transience in Zd for d ≥ 3

2 Stochastic Processes: Definitions and Examples
  2.1 Stochastic process
  2.2 Stopping time
  2.3 Information available at a stopping time
  2.4 The random walk
  2.5 The branching process
  2.6 Gaussian vectors
    2.6.1 Refreshers on pairs of random variables
    2.6.2 Definition
    2.6.3 Properties
  2.7 Exercises

3 Conditional Expectation
  3.1 Intuitive example for discrete random variables and conditional expectation with respect to an event
    3.1.1 Intuitive example and discrete random variable
    3.1.2 Conditional expectation with respect to an event
  3.2 Conditional expectation: definition, existence, uniqueness
    3.2.1 Projection in L2
    3.2.2 Fundamental existence theorem
  3.3 Some properties
  3.4 Application: the case of continuous random variables admitting a density
  3.5 Conditional expectation and independence
  3.6 Exercises

4 Introduction to Discrete Martingale Theory
  4.1 Discrete martingale, submartingale and supermartingale
  4.2 Examples
  4.3 Predictable processes and martingale
  4.4 Stopped martingale and Doob's stopping Theorem
  4.5 (Sub)martingale inequalities
  4.6 Decomposition of martingales
  4.7 Convergence theorems
  4.8 Exercises

5 Discrete Markov Chains: a First Approach
  5.1 Markov chain: first definitions and property
  5.2 Decomposition of the state space
  5.3 Transience, recurrence and periodicity
    5.3.1 Transience and recurrence
    5.3.2 Periodicity
    5.3.3 Solidarity properties
    5.3.4 Special case: Markov chain with finite number of states
  5.4 Invariant measures and stationary distribution
  5.5 Strong law of large numbers for Markov chains
  5.6 Exercises

6 The Brownian Motion: a Good Place to End
  6.1 The Brownian motion
  6.2 The Brownian motion as a rescaled random walk
  6.3 Some properties
    6.3.1 Exercises

7 Solutions to exercises
  7.1 Random walk
  7.2 Stochastic processes
  7.3 Conditional expectations
  7.4 Martingales
  7.5 Markov Chains


Reminders on Probability Theory

We refer to the lecture notes of Giovani Conforti in MAA 203. In addition to it, we introduce some fundamental definitions. We denote by B(R) the Borel σ−algebra (see Definition 3.5 in Conforti, MAA 203), which is the σ−algebra generated by the closed intervals of R. In other words,

B(R) = σ(π(R)),

where π(R) = {[a, b], −∞ < a ≤ b < +∞}.

0.1 Measurable space and integrability

Definition 0.1.1. Let (E, E) be a measurable space. A function f from (E, E) into (R, B(R)) is measurable if the preimage of any Borelian by f is in E, i.e.

f^{−1}([a, b]) ∈ E, ∀a ≤ b.

The function f is then said to be E−measurable.

Definition 0.1.2. Let (Ω,A,P) be a probability space and Z be a real random variable on this space. The σ−algebra σ(Z) generated by Z consists of the sets {Z ∈ B} where B ∈ B(R).

Definition 0.1.3 (Measurable random variables). Let (Ω,A,P) be a probability space and (E, E) be a measurable space. Then, a random variable X from Ω into E is a measurable function X from Ω into E. We also say that X is an A−measurable random variable.

When E = R and E = B(R), we call the random variable X a real random variable.

In other words, X is a real random variable if for any a < b

{ω ∈ Ω, X(ω) ∈ [a, b]} ∈ A.

For any random variable X, the σ−algebra generated by X, denoted by σ(X), is the smallest σ−algebra on the probability space considered such that X is σ(X)−measurable. It is given by

σ(X) = σ({X^{−1}(A), A ∈ E}).

Proposition 0.1.1. Let G and A be two σ-algebras such that G ⊂ A. Then, any G−measurable random variable is A−measurable.


We denote by Lp(A,P), with p ≥ 1, the space of p−integrable and A−measurable real random variables under P, i.e. X ∈ Lp(A,P) if E[|X|^p] < +∞, where the expectation is taken with respect to P. We set ‖X‖p := E[|X|^p]^{1/p}, which defines a norm on the space Lp(A,P). When there is no ambiguity on the σ-algebra and the probability, we simply write L2 for L2(A,P). We recall that L2 is endowed with a scalar product defined by 〈X,Y〉_2 := E[XY] for any X, Y ∈ L2.

0.2 Some convergence results

We recall in this section some very useful results to obtain convergences of sequences of random variables. This section is a reminder of the course "Measure and Integration" MAA 301. We recall that for a sequence of events (En) we define

lim sup_n En := ⋂_m ⋃_{n≥m} En
             = {En infinitely often}
             = {ω ∈ Ω, for every m there exists n ≥ m such that ω ∈ En}
             = {ω ∈ Ω, ω ∈ En for infinitely many n}.

Proposition 0.2.1 (Borel-Cantelli Lemma, see for instance 2.7 in (Williams 1991)). Let (En)n∈N be a sequence of events on a probability space (Ω,A,P) such that ∑n P(En) < ∞. Then,

P(lim sup_n En) = 0.
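As a quick numerical illustration (a Python sketch, not part of the course; the events and parameters are chosen arbitrarily): take independent events En = {Un ≤ 1/n²} with Un uniform on [0, 1], so that ∑n P(En) = ∑n 1/n² < ∞; the Borel-Cantelli Lemma predicts that almost surely only finitely many En occur, and a simulation indeed typically sees only a handful of small indices.

import random

rng = random.Random(0)
N = 100_000
# E_n = {U_n <= 1/n^2}: independent events whose probabilities are summable
occurred = [n for n in range(1, N + 1) if rng.random() <= 1.0 / n ** 2]
print(occurred)  # typically only a few very small indices appear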

Theorem 0.2.1 (Monotone convergence theorem, see for instance 5.3 in (Williams 1991)). Let (Xn)n be a sequence of random variables with values in [0,∞) such that Xn ≤ Xn+1 almost surely. Assume that Xn converges to X almost surely. Then

lim_{n→+∞} E[Xn] = E[lim_{n→+∞} Xn] = E[X].

Theorem 0.2.2 (Dominated convergence theorem, see for instance 5.9 in (Williams 1991)). Let (Xn) be a sequence of random variables on a probability space (Ω,A,P). Assume that

• Xn converges almost surely to a random variable X when n goes to +∞,

• there exists an integrable random variable Y ∈ L1 with values in R+ (independent of n) such that |Xn| ≤ Y;

then lim_n Xn = X in L1, i.e. E[|Xn − X|] −→ 0 as n → +∞.


Chapter 1

Introduction with the Random Walk

Main reference: (Stroock 2000) An Introduction to Markov Processes, D.W. Stroock, Springer (2000). This reference is adapted as an exercise.

1.1 Nearest Neighbor Random Walk on Z: an introductory exercise

This section refers to 1.1 of (Stroock 2000). The answers can be found in the reference mentioned.

1.1.1 Preliminary definitions

In this section, we identify N∗ with the set of positive integers, that is {1, 2, . . .}. Let p ∈ (0, 1) and consider a family {Bn, n ∈ N∗} of {−1, 1}-valued, identically distributed and (mutually) independent random variables which take the value 1 with probability p. We say that Bn follows a Rademacher law with parameter p. We set q = 1 − p. Let now n ∈ N∗ and (εi)1≤i≤n ∈ {−1, 1}^n. We have

P(B1 = ε1, . . . , Bn = εn) = p^{N(E)} q^{n−N(E)},

where E := (ε1, . . . , εn) and N(E) is the number of indexes m such that εm = 1.

Question 1. Let Sn(E) = ∑_{m=1}^{n} εm. Prove that N(E) = (n + Sn(E))/2.

We now set

X0 = 0, Xn = ∑_{m=1}^{n} Bm, n ∈ N∗.

The sequence of random variables (Xn)n is the first example of a stochastic process in this course and is called a nearest neighbor random walk on Z.
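To fix ideas, here is a minimal Python simulation of this nearest neighbor random walk (an illustration only, not part of the course; the function name and parameters are ours):

import random

def random_walk(n_steps, p, rng):
    # returns (X_0, ..., X_{n_steps}) with i.i.d. steps equal to +1 w.p. p and -1 w.p. 1-p
    x, path = 0, [0]
    for _ in range(n_steps):
        x += 1 if rng.random() < p else -1
        path.append(x)
    return path

rng = random.Random(2019)
print(random_walk(10, 0.5, rng))  # one trajectory of length 10 started from X_0 = 0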

Question 2a. Prove that P(X0 = 0) = 1 and compute P(Xn − Xn−1 = ε | X0, . . . , Xn−1) for ε = 1 and ε = −1.


Question 2b. Assume now that X0 = x0 ∈ Z (just for this question!). What can you say about P(Xn = k | X0 = x0, . . . , Xn−1 = ℓ) and P(Xn = k | Xn−1 = ℓ)?

We recall that P(Xn − Xn−1 = ε | X0, . . . , Xn−1) is an abuse of notation for the conditional probability P(Xn − Xn−1 = ε | σ(X0, . . . , Xn−1)), where σ(X0, . . . , Xn−1) is the σ−algebra generated by X0, . . . , Xn−1.

The interpretation of the result obtained in the previous question is very intuitive: starting from zero at time 0, the jump of the random walk between times n − 1 and n is independent of where it has been up to time n − 1.

1.1.2 Law of passage time

We now turn to the computation of P(Xn = m).

Question 3. Let n ∈ N∗. What are the possible values of Xn? If n is odd (resp. even), what do you deduce concerning Xn?

Question 4. By using Question 1 and Question 3, compute P(Xn = m).

Question 5. Show that P(Xn = m) = pP(Xn−1 = m− 1) + qP(Xn−1 = m+ 1).

1.1.3 Passage time at the point a ≠ 0

A more challenging question is to study the first passage time at a point a ∈ Z for the random walk on Z. Let a ∈ Z∗, where Z∗ ≡ Z \ {0}. We set

ζa := inf{n ≥ 1, Xn = a}, or +∞ if Xn ≠ a for any n ≥ 1.

ζa is the first passage time at the point a. We want to compute P(ζa = n). For the sake of simplicity, we restrict to the case a ∈ N∗. The case where a is negative can be obtained similarly by exchanging p and q.

Question 6. Assume that n < a; what can you say about P(ζa = n)? And if n and a do not have the same parity?

From now on, we assume that a ≤ n and that a and n have the same parity.

Question 7. Prove that P(ζa = n) = pP(ζa > n− 1, Xn−1 = a− 1).

Question 8. Deduce that

P(ζa = n) = Na(n) p^{(n+a)/2} q^{(n−a)/2},

where Na(n) denotes the number of (n−1)-tuples E ∈ {−1, 1}^{n−1} such that S_ℓ(E) ≤ a−1 for any 0 ≤ ℓ ≤ n−1 and S_{n−1}(E) = a−1.


Question 9. Let N̄a(n) be the number of (n−1)-tuples E ∈ {−1, 1}^{n−1} such that S_ℓ(E) ≥ a for some 0 ≤ ℓ ≤ n−1 and S_{n−1}(E) = a−1. Deduce that

\binom{n−1}{(n+a)/2 − 1} = Na(n) + N̄a(n).

It thus remains to compute N̄a(n). We use the so-called reflection principle. We introduce the following sets:

• Pa(n) is the set of n-tuples (s0, . . . , s_{n−1}) ∈ Z^n such that s0 = 0, s_ℓ − s_{ℓ−1} ∈ {−1, 1} for any 1 ≤ ℓ ≤ n−1, and there exists an index m ∈ {1, . . . , n−1} such that sm ≥ a.

• La(n) is the set of n-tuples (s0, . . . , s_{n−1}) in Pa(n) such that s_{n−1} = a−1.

• Ua(n) is the set of n-tuples (s0, . . . , s_{n−1}) in Pa(n) such that s_{n−1} = a+1.

Question 10. What is the relation between N̄a(n) and La(n)?

Question 11. Prove that

\binom{n−1}{(n+a)/2} = #Ua(n) = #La(n).

Hint: First, note that (s0, . . . , s_{n−1}) ∈ Ua(n) if and only if s0 = 0, s_ℓ − s_{ℓ−1} ∈ {−1, 1} for any 1 ≤ ℓ ≤ n−1 and s_{n−1} = a+1, to deduce the first equality. Then, for s := (s0, . . . , s_{n−1}) ∈ Pa(n), define the map f by f(s) := inf{0 ≤ k ≤ n−1, s_k ≥ a} and define the reflection map R by

R(s) = (s̄0, . . . , s̄_{n−1}), s ∈ Pa(n),

with s̄m = sm if 0 ≤ m ≤ f(s) and s̄m = 2a − sm otherwise. Note then that R maps La(n) into Ua(n) and conversely maps Ua(n) into La(n), and that R ∘ R(s) = s.

Question 12. Prove that

Na(n) = \binom{n−1}{(n+a)/2 − 1} − \binom{n−1}{(n+a)/2}

and deduce that

P(ζa = n) = (|a|/n) P(Xn = a)

when n ≥ a with n and a having the same parity, and 0 otherwise.
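The identity of Question 12 can be checked numerically by brute force (a Python sketch with arbitrarily chosen values of p, a and n, not part of the course): enumerating all ±1 sequences of length n, the probability that the first passage at a occurs exactly at time n should coincide with (|a|/n) P(Xn = a).

from itertools import product

p, a, n = 0.3, 3, 9          # illustration values (n >= a, same parity)
q = 1 - p
first_passage_prob = 0.0     # will hold P(zeta_a = n)
prob_Xn_equals_a = 0.0       # will hold P(X_n = a)
for eps in product((1, -1), repeat=n):
    prob = p ** eps.count(1) * q ** eps.count(-1)
    path, s = [], 0
    for e in eps:
        s += e
        path.append(s)
    if path[-1] == a:
        prob_Xn_equals_a += prob
        if a not in path[:-1]:           # the walk has not visited a before time n
            first_passage_prob += prob
print(first_passage_prob, abs(a) / n * prob_Xn_equals_a)  # the two values coincide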

Recall that our aim is to determine whether P(ζa < +∞) = 1.

Question 13. Check that for a > 0, we can write ζa = fa(B1, . . . , Bn, . . . ), where fa is a map from {−1, 1}^{N∗} into N∗ such that

fa(ε1, . . . , εn, . . . ) > n ⇐⇒ ∑_{i=1}^{m} εi < a for any 1 ≤ m ≤ n.

Question 14. Prove that P(ζ_{a+1} < ∞) = P(ζ1 < ∞) P(ζa < ∞), and deduce that for any a ∈ N∗

P(ζ1 < ∞) = 1 =⇒ P(ζa < ∞) = 1.

It thus remains to compute P(ζ1 <∞).


Question 15. Show that P(ζ1 <∞) = lims→1, s<1 E[sζ1 ].

Question 16. We set ua(s) = E[s^{ζa}] for a ∈ N∗ and s ∈ (−1, 1). Prove that

u_{a+1}(s) = ua(s) u1(s)

and deduce that ua(s) = u1(s)^a.

Question 17. Prove that u1(s) = ps + qs u1(s)^2.

Question 18. Show that

E[s^{ζa}] = ( (1 − √(1 − 4pqs²)) / (2qs) )^a for |s| < 1,

and deduce that

P(ζa < ∞) = 1_{p≥q} + (p/q)^a 1_{p<q}.

Interpretations?

Similarly, we can prove that

E[s^{ζa}] = ( (1 − √(1 − 4pqs²)) / (2ps) )^{|a|}, |s| ≤ 1, a < 0,

and for any a ∈ Z∗

P(ζa < ∞) = { 1        if a ∈ N∗ and p ≥ q, or −a ∈ N∗ and p ≤ q,
            { (p/q)^a  if a ∈ N∗ and p < q, or −a ∈ N∗ and p > q.        (1.1)

1.1.4 Time of first return

We now define ρ0 := inf{n ≥ 1, Xn = 0}. We want to study whether P(ρ0 < ∞) = 1.

Question 19. Prove that P(X1 = 1, ρ0 < ∞) = pP(ζ−1 < ∞) and P(X1 = −1, ρ0 < ∞) = qP(ζ1 < ∞), and deduce that P(ρ0 < ∞) = 2(p ∧ q), where p ∧ q := min(p, q).

We have proven the following theorem.

Theorem 1.1.1. The random walk {Xn, n ≥ 0} on Z returns to 0 with probability 1 if and only if it is symmetric, i.e. p = q = 1/2.

1.2 Recurrence of random walk on Zd

This section refers to 1.2.1 and 1.2.2 in (Stroock 2000). We have seen in the previous section that, starting from 0, a symmetric random walk always comes back to 0. This is one step forward in the study of the asymptotic behavior of stochastic processes. The time of first return is often called the recurrence time; that is to say, the random walk is recurrent if P(ρ0 < ∞) = 1. Walks that are not recurrent are said to be transient.



We have already seen that a nearest neighbor random walk on Z is recurrent if and only if it is symmetric. We are interested in the same kind of results in higher dimensions. Here, we discuss some recurrence properties for general random walks in Zd.

We first define the set Nd as the set of nearest neighbors of the origin 0d in Zd. It consists of the 2d points of Zd for which (d − 1) coordinates are 0 and the remaining coordinate is in {−1, 1}.

Definition 1.2.1 (Nearest neighbor random walk on Zd). We consider a family of independent and identically distributed Nd−valued random variables B1, . . . , Bn, . . . A nearest neighbor random walk on Zd is a family {Xn, n ≥ 0} such that

X0 = 0d and Xn = ∑_{m=1}^{n} Bm, n ≥ 1.

When B1 is uniformly distributed on Nd, the random walk is said to be symmetric.

Equivalently, such a family {Xn, n ≥ 0} satisfies

P(X0 = 0d) = 1 and for any n ≥ 1, ε ∈ Nd,

P(Xn − Xn−1 = ε | X0, . . . , Xn−1) = pε, with pε = P(B1 = ε).

We denote by ρ0 the first return to the origin: ρ0 = n if n ≥ 1, Xn = 0d and Xm ≠ 0d for any 1 ≤ m < n, and ρ0 = +∞ if no such n ≥ 1 exists.

Definition 1.2.2. The random walk is recurrent (resp. transient) if P(ρ0 < ∞) = 1 (resp. P(ρ0 < ∞) < 1).

We now denote by ρ0^{(n)} the nth return to 0, that is

ρ0^{(1)} = ρ0,

and for n ≥ 2,

ρ0^{(n−1)} < ∞ =⇒ ρ0^{(n)} = inf{m > ρ0^{(n−1)}, Xm = 0d}, and ρ0^{(n−1)} = ∞ =⇒ ρ0^{(n)} = ∞.

Question 20. Prove that

P(ρ0^{(n+1)} < ∞) = P(ρ0^{(n)} < ∞) P(ρ0 < ∞),

and deduce that

P(ρ0^{(n)} < ∞) = P(ρ0 < ∞)^n, n ≥ 1.

Interpretation?

We now define T0 := ∑_{n=0}^{∞} 1_{0d}(Xn), that is, the number of times that X reaches the origin. Since X0 = 0d, we have T0 ≥ 1.


Question 21. Prove that

E[T0] = 1/(1 − P(ρ0 < ∞)) = 1/P(ρ0 = ∞).

Hint: Note that T0 > n ⇐⇒ ρ0^{(n)} < ∞ for n ≥ 1.

Question 22. Deduce the following proposition

Proposition 1.2.1. T0 is a random variable such that

P(T0 <∞) > 0 =⇒ E[T0] <∞,

E[T0] =∞ =⇒ P(T0 =∞) = 1.

Finally, by noting that (easy exercise!)

E[T0] = ∑_{n=0}^{∞} P(Xn = 0d),    (1.2)

we deduce from Question 21 that

{Xn, n ≥ 0} is recurrent if and only if ∑_{n=0}^{∞} P(Xn = 0d) = ∞.    (1.3)
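Criterion (1.3) can be explored numerically (a Python sketch, not part of the course; the function and parameters are ours, and a finite horizon only suggests the behaviour of the infinite sum): the empirical mean number of visits to the origin keeps growing with the horizon for d = 1, 2, while it stabilises for d = 3, in agreement with E[T0] = 1/P(ρ0 = ∞).

import random

def mean_visits_to_origin(d, n_steps, trials, rng):
    # Monte Carlo estimate of E[#{0 <= n <= n_steps : X_n = 0_d}] for the symmetric walk on Z^d
    total = 0
    for _ in range(trials):
        x = [0] * d
        visits = 1                        # X_0 = 0_d
        for _ in range(n_steps):
            i = rng.randrange(d)          # pick a coordinate uniformly
            x[i] += rng.choice((-1, 1))   # move +-1 along that coordinate
            if all(c == 0 for c in x):
                visits += 1
        total += visits
    return total / trials

rng = random.Random(1)
for d in (1, 2, 3):
    print(d, mean_visits_to_origin(d, 5_000, 200, rng))
# the estimate keeps increasing with n_steps for d = 1, 2 (recurrence)
# and stays bounded for d = 3 (transience)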

1.3 Recurrence of the symmetric random walk in Z2

This section refers to 1.2.3 in (Stroock 2000). We aim at finding a lower bound for P(Xn = 0d) for a special set of indexes n, such that the sum over this set of indexes goes to +∞. Hence, from (1.3), we will derive a sufficient condition (only sufficient, because it is a lower bound!) on the dimension d ensuring that the random walk in Zd is recurrent.

The intuition for a symmetric random walk is the following: when the random walk is symmetric, we expect 0d to be the most likely position at even times.

Question 23. Prove that

P(X2n = k) ≤ ∑_{j∈Zd} |P(Xn = j)|²,

and deduce that if {Xn, n ≥ 0} is symmetric then

max_{j∈Zd} P(X2n = j) = P(X2n = 0d).

Question 24: Application d = 1. Deduce from Question 23 that for d = 1, the symmetric random walk in Z is recurrent.

In order to deal with the general case, we need the following lemma.

Lemma 1.3.1. If the random walk {Xn, n ≥ 0} is symmetric, then E[|Xn|²] = n.


Proof. Note that each coordinate of Bn is a centred random variable with variance 1/d. By mutual independence, X_n^{(i)} is centred with variance n/d, where X_n^{(i)} denotes the ith coordinate of Xn. Hence E[|Xn|²] = ∑_{i=1}^{d} Var(X_n^{(i)}) = n.

Question 25. Prove that

1/2 ≤ P(|X2n| < 2√n) ≤ (4√n + 1)^d P(X2n = 0d) ≤ 2^{d−1}(4^d n^{d/2} + 1) P(X2n = 0d),

to deduce the following theorem.

Theorem 1.3.2. The symmetric nearest neighbor random walk on Z2 is recurrent.

1.4 Transience in Zd for d ≥ 3

This section refers to 1.2.4 in (Stroock 2000) and is trickier than the previous sections; it can be omitted on a first reading. The lower bound obtained in Question 25 is not enough to conclude, for d ≥ 3, on the recurrence property of the associated random walk. We are thus looking for an upper bound in this section.

Upper bound for d = 1. Question 26. Prove that for the symmetric random walk on Z we have

P(X2n = 0) ≤ e^{3/2} (2√n − 1)^{−1}.    (1.4)

Hint: Find a lower bound for P(X2n = 2j)/P(X2n = 0), recalling that log(1 − x) ≥ −3x/2 when 0 ≤ x ≤ 1/2.

General symmetric random walk in Zd. From now on, for any ε ∈ Nd we assume that pε = 1/(2d).

Question 27. We define a sequence of d mutually independent symmetric nearest neighbor random walks on Z by (X_{i,n}, n ≥ 0)_{1≤i≤d}. We consider a sequence of {1, . . . , d}−valued, mutually independent and uniformly distributed random variables {In, n ≥ 0}, independent of (X_{i,n}, n ≥ 0)_{1≤i≤d}. We set

N_{i,0} = 0 and N_{i,n} = ∑_{m=1}^{n} 1_{{i}}(Im), n ≥ 1, 1 ≤ i ≤ d.

We finally define, for each n ≥ 0,

Yn = (X_{1,N_{1,n}}, . . . , X_{d,N_{d,n}}).

Check that (Yn)n≥0 defines a random walk and therefore has the same distribution as (Xn)n≥0.

Question 28. Deduce that

P(X2n = 0d) ≤ e^{3d/2} (2√(n/d) − 1)^{−d} + P(N_{i,2n} ≤ n/d for some 1 ≤ i ≤ d).


Question 29. Prove that

P(N_{i,2n} ≤ n/d for some 1 ≤ i ≤ d) ≤ d P(N_{1,2n} ≤ n/d) ≤ d e^{−n/d²}.

Question 30. Deduce from Question 28 and Question 29, together with Proposition 1.2.1, that the following theorem holds:

Theorem 1.4.1. The symmetric nearest neighbor random walk on Zd with d ≥ 3 is transient.


Chapter 2

Stochastic Processes: Definitions and Examples

In this chapter (Ω,A,P) denotes a probability space.

2.1 Stochastic process

Definition 2.1.1. Let X := (Xn)n∈N be a sequence of random variables on (Ω,A) with values in a measurable space (E, E). We say that X is a stochastic process.

An example of a stochastic process is the random walk introduced in the first chapter, where (E, E) is (Zd, B(Rd)).

We now introduce the notion of information available at any time n.

Definition 2.1.2 (Filtration). A filtration on A is an increasing (in the sense of inclusion of sets) sequence F := (Fn)n≥0 of sub-σ-algebras of A. We say that (Ω,A,F,P) is a filtered probability space.

The monotonicity in the definition of a filtration means that the information accumulates along time without forgetting the past.

Example 1. Let X be a stochastic process in the sense of Definition 2.1.1. Then

FXn := σ(Xi, 0 ≤ i ≤ n), n ≥ 0,

is a filtration on A called the natural filtration of X. In particular, if E = R and X0 is constant, then FX0 = {∅,Ω}.

This definition of filtration allows us to detail further the information structure of a stochastic process.

Definition 2.1.3. Let X := (Xn)n∈N be a stochastic process on (Ω,A) and F := (Fn)n∈N be a filtration on A. Then

• X is adapted to F if Xn is Fn−measurable for any n ∈ N,

• X is F−predictable if Xn is Fn−1−measurable for n ≥ 0, where F−1 := {∅,Ω}.


2.2 Stopping time

We now introduce the notion of stopping time associated with a filtration F, fixed in this section.

Definition 2.2.1. A stopping time τ associated with F is a random variable with values in N ∪ {∞} such that

{τ = n} ∈ Fn, for all n ≥ 0.

We also have the following characterization

Proposition 2.2.1. τ is a stopping time associated with F if and only if

{τ ≤ n} ∈ Fn, n ≥ 0.

Proof. Assume that τ is a stopping time. Then

{τ ≤ n} = ⋃_{i=0}^{n} {τ = i} ∈ Fn,

by the stability of a σ-algebra with respect to countable unions of its elements. Conversely, assume that τ satisfies, for each non-negative integer i, {τ ≤ i} ∈ Fi. Note that {τ = n} = {τ ≤ n} ∩ {τ ≤ n − 1}^c, where A^c denotes the complement of an event A (that is, A ∪ A^c = Ω and A ∩ A^c = ∅). Since {τ ≤ n − 1} ∈ Fn−1 ⊂ Fn by the monotonicity property of a filtration, we deduce (from the stability of Fn under taking the complement of an event and the union of two of its events) that

{τ = n} = {τ ≤ n} ∩ {τ ≤ n − 1}^c ∈ Fn.

We now give some stability properties of stopping times

Proposition 2.2.2 (Stability of stopping times). We consider a filtration F associated with each stopping time considered in this proposition. Let τ, θ be two stopping times. Then, τ ∧ θ, τ ∨ θ and τ + θ are stopping times.

Proof. We use Proposition 2.2.1 for the proofs. We begin by proving that τ ∨ θ is a stopping time. From the stability of a σ-algebra under countable intersections, we have

{τ ∨ θ ≤ n} = {τ ≤ n} ∩ {θ ≤ n} ∈ Fn.

We turn to τ ∧ θ. Note that {τ ∧ θ > n}^c = {τ ∧ θ ≤ n}. Hence, it is enough to show that {τ ∧ θ > n} ∈ Fn, since a σ-algebra is stable under complementation. We note that

{τ ∧ θ > n} = {τ > n} ∩ {θ > n} ∈ Fn,

since {τ > n} ∩ {θ > n} = ({τ ≤ n} ∪ {θ ≤ n})^c ∈ Fn, by the stability of a σ-algebra under complementation and countable unions of events. Thus {τ ∧ θ > n} ∈ Fn and so {τ ∧ θ ≤ n} ∈ Fn.



We now turn to τ + θ. Note that

{τ + θ ≤ n} = ⋃_{k=0}^{n} ({τ ≤ n − k} ∩ {θ = k}).

Since for any k ≤ n we have (from the monotonicity of F) {τ ≤ n − k} ∈ Fn−k ⊂ Fn and {θ = k} ∈ Fk ⊂ Fn, we deduce that {τ + θ ≤ n} ∈ Fn.

Example 2 (First reaching time). Let X be an F−adapted, (Rd,B(Rd))−valued process. For any Borelian A ∈ B(Rd) we define the first reaching time by

τA := inf{n ≥ 0, Xn ∈ A},

with the convention inf ∅ = +∞. Then, τA is a stopping time.

Proof. Note that

{τA ≤ n} = ⋃_{k≤n} {Xk ∈ A} = ⋃_{k≤n} X_k^{−1}(A).

Since X is F−adapted, we deduce that for each k ≤ n, X_k^{−1}(A) ∈ Fk ⊂ Fn. Hence, by stability under countable unions, we deduce that {τA ≤ n} ∈ Fn.
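For a concrete picture (a Python sketch, not part of the course; the names are ours), the first reaching time of a set A by a simulated trajectory is computed by scanning the path, and deciding whether {τA ≤ n} holds only requires X0, . . . , Xn, which is exactly the adaptedness used in the proof above.

import random

def first_reaching_time(path, target_set):
    # returns inf{n >= 0 : path[n] in target_set}, or None if the set is not reached on this horizon
    for n, x in enumerate(path):
        if x in target_set:
            return n
    return None   # plays the role of tau_A = +infinity on a finite horizon

rng = random.Random(3)
path = [0]
for _ in range(1_000):
    path.append(path[-1] + rng.choice((-1, 1)))
print(first_reaching_time(path, {5}))   # first time the walk of Chapter 1 hits a = 5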

2.3 Information available at a stopping time

Here (Ω,A,P,F) is a filtered probability space. We define the information available at a stopping time τ by setting

Fτ := {A ∈ A : A ∩ {τ = n} ∈ Fn for all n ≥ 0}.

Proposition 2.3.1. Let τ, θ be two stopping times. Then

(i) Fτ is a sub-σ-algebra of A,

(ii) {τ ≤ θ}, {θ ≤ τ} and {τ = θ} are in Fτ ∩ Fθ,

(iii) if τ ≤ θ, then Fτ ⊂ Fθ,

(iv) for any F-adapted stochastic process X, the random variable Xτ is Fτ− measurable.

Proof. Proof of (i). It is clear that Ω ∈ Fτ since each Fn is a σ-algebra. Assume now that A ∈ Fτ; then A^c ∩ {τ = n} = {τ = n} \ (A ∩ {τ = n}) = {τ = n} ∩ (A ∩ {τ = n})^c ∈ Fn. Hence, A^c ∈ Fτ. Let now (Ai)_{i∈N} be a sequence of events such that Ai ∈ Fτ for any i ∈ N. Then,

(⋃_{i∈N} Ai) ∩ {τ = n} = ⋃_{i∈N} (Ai ∩ {τ = n});

since (Ai ∩ {τ = n}) ∈ Fn, we deduce that ⋃_{i∈N}(Ai ∩ {τ = n}) ∈ Fn and so ⋃_{i∈N} Ai ∈ Fτ.

Proof of (ii). Proving the property for {τ ≤ θ} or {θ ≤ τ} is equivalent. We prove it for {τ ≤ θ}. Note that {τ ≤ θ} ∩ {τ = n} = {θ ≥ n} ∩ {τ = n} = {θ ≤ n − 1}^c ∩ {τ = n} ∈ Fn, since {θ < n} = {θ ≤ n − 1} ∈ Fn−1 ⊂ Fn. Thus {τ ≤ θ} ∈ Fτ.


Similarly, {τ ≤ θ} ∩ {θ = n} = {θ = n} ∩ {τ ≤ n} ∈ Fn, hence {τ ≤ θ} ∈ Fθ. Consequently, {τ ≤ θ} ∈ Fτ ∩ Fθ.

That {τ = θ} ∈ Fτ ∩ Fθ is then clear.

Proof of (iii). Let A be in Fτ and assume that τ ≤ θ. Then, {θ = n} ⊂ {τ ≤ n}, thus

A ∩ {θ = n} = A ∩ ({τ ≤ n} ∩ {θ = n}) = (A ∩ {τ ≤ n}) ∩ {θ = n}.

Since A ∩ {τ ≤ n} = ⋃_{k≤n}(A ∩ {τ = k}) ∈ Fn, we deduce that A ∩ {θ = n} ∈ Fn, thus A ∈ Fθ.

Proof of (iv). Let A be in E. We have {Xτ ∈ A} ∩ {τ = n} = {Xn ∈ A} ∩ {τ = n} ∈ Fn, since X is F−adapted. Hence, {Xτ ∈ A} ∈ Fτ for any A ∈ E.

2.4 The random walk

It is our guideline example. It is clearly a stochastic process. No more about it for the moment.

2.5 The branching process

Section 1.4 in (Resnick 2005).

We consider a population starting with a progenitor forming generation 0. This progenitor can split into k offspring with probability pk. These offspring constitute the first generation. We assume that each of the first-generation offspring independently splits into a random number of offspring, and the probability that the offspring of an individual of the first generation is of size k is again pk. And so on, until the possible extinction occurring when all the members of the same generation fail to have offspring. This kind of model was first introduced for nuclear fission experiments, for cascading neutrons. Mathematically, we define a sequence of i.i.d. non-negative integer-valued random variables {Z_{n;j}, n ≥ 1, j ≥ 1} taking the value k ∈ N with probability pk. We define a branching process {Zn, n ≥ 0} by Z0 = 1 (one progenitor), Z1 := Z_{1;1} (the first generation), then by induction

Zn := Z_{n;1} + · · · + Z_{n;Z_{n−1}},

where Z_{n;j} is the number of members of the nth generation which are offspring of the jth member of the (n−1)st generation. Note that Zn = 0 of course implies Zn+1 = 0. We also remark that Z_{n−1} is independent of the Z_{n;j}.

Extinction probability: we denote by {extinction} the event that extinction occurs. We have

{extinction} = ⋃_{n≥1} {Zn = 0}.

We also define the probability of extinction by

π := P({extinction}).

We have in particular the following result


Theorem 2.5.1. If m := E[Z1] ≤ 1, then π = 1. If m > 1, then π < 1 and π is the unique solution in [0, 1) of

s = P(s),

where P(s) = E[s^{Z1}] = ∑_{k≥0} pk s^k.

The proof of this theorem is extracted from Section 1.4 in (Resnick 2005) and adapted here as an exercise.

Question 1. By setting Pn(s) := E[s^{Zn}] for any n ≥ 1 and s ∈ [0, 1], prove that

Pn(s) = Pn−1(P (s)) = P (Pn−1(s)).

Question 2. Let πn := P(Zn = 0). Prove that (πn)n≥1 is non-decreasing and converges to π.

Question 3. Deduce that π_{n+1} = P(πn), and so that π is a solution to P(s) = s.

Question 4. Assume now that q is another solution to P(s) = s with q ∈ [0, 1]; prove that π ≤ q. This means that π is the smallest solution in [0, 1] of P(s) = s.

Question 5. Prove that P is convex and, by noting that P(1) = 1, prove that the graphs of P and the identity on [0, 1] have at most 2 crossing points.

Question 6. Deduce that if m ≤ 1 then π = 1, and that if m > 1 then π < 1.
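As a numerical illustration of Theorem 2.5.1 and of Questions 2-4 (a Python sketch with an arbitrarily chosen offspring law, not part of the course): for p0 = 0.25, p1 = 0.25, p2 = 0.5 we have m = 1.25 > 1, and iterating π_{n+1} = P(πn) from π0 = 0 converges to the smallest root of P(s) = s.

def P(s, probs):
    # generating function P(s) = sum_k p_k s^k
    return sum(p * s ** k for k, p in enumerate(probs))

probs = (0.25, 0.25, 0.5)   # hypothetical offspring law p_0, p_1, p_2 (mean m = 1.25 > 1)
pi = 0.0
for _ in range(200):        # pi_{n+1} = P(pi_n) is non-decreasing and converges to pi
    pi = P(pi, probs)
print(pi)                   # approx 0.5, the smallest solution in [0, 1] of P(s) = s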

2.6 Gaussian vectors

2.6.1 Refreshers on pairs of random variables

We begin with a refresher on the existence of a density for a pair of random variables in the continuous case. This follows the case of the joint law of discrete random variables seen in the lectures of MAA 202. We refer to Sections 8.3 and 8.4 in (Williams 1991).

Definition 2.6.1. We say that a pair of real random variables (X,Y) has a density with respect to the Lebesgue measure on R² if there exists a function f_{(X,Y)} from R² into R+ such that for any a1 < b1 and a2 < b2 we have

P((X,Y) ∈ [a1, b1] × [a2, b2]) = ∫_{[a1,b1]} ∫_{[a2,b2]} f_{(X,Y)}(x, y) dx dy,

and for any function h from R² into R we have

E[h(X,Y)] = ∫_{R²} h(x, y) f_{(X,Y)}(x, y) dx dy,

under the condition ∫_{R²} |h(x, y)| f_{(X,Y)}(x, y) dx dy < +∞.

Proposition 2.6.1. Assume that the pair of random variables (X,Y) admits a density f_{(X,Y)}. Then X and Y admit respective density functions fX and fY such that

fX(x) = ∫_R f_{(X,Y)}(x, y) dy,   fY(y) = ∫_R f_{(X,Y)}(x, y) dx.

fX (resp. fY) is called the marginal with respect to X (resp. Y).


Remark 2.6.2. Be careful: the proposition says that if (X,Y) has a density, then X and Y have a density. In general, if X and Y have a density, there is no reason that the pair (X,Y) has a density! See Exercise ST2. When X and Y are independent (see 8.4 in (Williams 1991) or Exercise ST3), the product of the marginals is the density of the pair, and conversely.

2.6.2 Definition

Definition 2.6.3. A normal law with mean m and variance 0 is said to be degenerate; it is the law of the constant m.

Definition 2.6.4. A random vector X with values in R^d is called a Gaussian vector if any linear combination of its components is a Gaussian random variable (possibly degenerate, if the combination is constant).

For any Gaussian vector X = (X1, . . . , Xd)^⊤, its mean vector is the vector E[X] of R^d defined by E[X] = (E[Xi])_{1≤i≤d}, and its variance/covariance matrix Σ is the d × d matrix such that

Σ_{i,j} = Cov(Xi, Xj).

We recall that Cov(X,Y) = E[XY] − E[X]E[Y], together with Cov(X,Y) = Cov(Y,X) (symmetry), and that for any α, β ∈ R we have the linearity property Cov(αX, βY) = αβ Cov(X,Y).

Warning: we note that a vector of INDEPENDENT normal random variables is a Gaussian vector, since any linear combination of independent normal random variables has a normal law.

However, a vector of normal random variables is NOT in general a Gaussian vector (it is well known that a sum of normal random variables is not necessarily normal itself; see Exercise ST4).

2.6.3 Properties

In all this section, ⊤ denotes the transposition operator, so that for any row (resp. column) vector u, the corresponding column (resp. row) vector is u^⊤, and for any matrix M we denote by M^⊤ its classical transpose.

Proposition 2.6.2. Let X = (X1, . . . , Xd)^⊤ be a Gaussian vector in R^d and denote m := E[X] and Σ its covariance matrix. The characteristic function of X exists and is given for any u ∈ R^d by

E[e^{i〈u,X〉}] = e^{i〈u,m〉 − (1/2)〈u,Σu〉}.

Proof. Note that 〈u,X〉 has a normal distribution with mean 〈u,m〉 and variance u^⊤Σu.

In particular, we have the following corollary (admitted: the proof that f below is indeed a density relies on a change of variables in R^d and is not part of the academic program of this course), which says that a Gaussian vector has a density as soon as its variance matrix is non-degenerate, coinciding with a multivariate normal distribution.



Corollary 2.6.5. Let X = (X1, . . . , Xd)^⊤ be a Gaussian vector in R^d and denote m := E[X] and Σ its covariance matrix. Assume that det(Σ) ≠ 0. Then, X admits a multidimensional normal distribution with mean m and variance matrix Σ, with density on R^d given by

f(x) = (1 / ((2π)^{d/2} √(det(Σ)))) exp(−〈x − m, Σ^{−1}(x − m)〉 / 2),   x ∈ R^d.

We denote by Nd(m,Σ) the law of a random variable having the density f.

Proposition 2.6.3. Let A be a d × d matrix and B a vector in R^d. Let X be a Gaussian vector with mean m and variance matrix Σ. Then AX + B is a Gaussian vector (possibly degenerate) with mean Am + B and variance matrix AΣA^⊤.

Proposition 2.6.4 (Independence property). Let X = (X1, . . . , Xd)^⊤ be a Gaussian vector in R^d. Then,

∀(i, j) ∈ {1, . . . , d}²,   Xi and Xj are independent ⇐⇒ Cov(Xi, Xj) = 0.

Proof. The direct part is obvious. Conversely, we use Proposition 2.6.2 together with Exercise ST3.

The previous proposition is very useful in statistics, since it says that as soon as your data form a Gaussian vector, the covariance is enough to check the independence of the components.
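The statements of Propositions 2.6.3 and 2.6.4 are easy to check empirically (a Python/NumPy sketch, not part of the course; the vectors m, B and the matrices Σ, A are arbitrary illustration values): sample X ~ N2(m, Σ), apply the affine map AX + B, and compare the empirical mean and covariance with Am + B and AΣA^⊤.

import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
A = np.array([[2.0, 0.0], [1.0, 3.0]])
B = np.array([1.0, -1.0])

X = rng.multivariate_normal(m, Sigma, size=100_000)   # rows are i.i.d. samples of X
Y = X @ A.T + B                                       # samples of AX + B
print(Y.mean(axis=0), A @ m + B)                      # empirical vs. theoretical mean
print(np.cov(Y.T), A @ Sigma @ A.T)                   # empirical vs. theoretical covariance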

2.7 Exercises

Exercise ST1a. Let {Xn, n ≥ 0} be a stochastic process on (Ω,A,P) taking values in the measurable space (E, E) and τ a stopping time. Then the map Xτ defined from Ω into E by

Xτ(ω) := X_{τ(ω)}(ω)

is a random variable on (Ω,A,P).

Exercise ST1b [Compound process]. Let {Zn, n ≥ 1} be a stochastic process such that all the variables Zi are independent and identically distributed with mean 0 and finite variance σ². Let

Tn := ∑_{i=1}^{n} Zi,

and let M be a stopping time with respect to the filtration (F_n^Z)_n such that E[M] < ∞. Compute E[TM] and Var(TM).

Exercise ST2 [Same marginals but not the same law]. Let (X,Y) and (X′,Y′) be two pairs of random variables with respective density functions

f_{(X,Y)}(x, y) = (1/4)(1 + xy) 1_{[−1,1]²}(x, y)   and   f_{(X′,Y′)}(x′, y′) = (1/4) 1_{[−1,1]²}(x′, y′).

1. Check that these functions are indeed density functions.

2. Show that the law of (X,Y) and the law of (X′,Y′) are different.

3. Show that (X,Y) and (X′,Y′) have the same marginal laws, i.e. X and X′ have the same law, and Y and Y′ have the same law.


Exercise ST3. Let X and Y be two real random variables with densities fX and fY respectively. Then:

X and Y are independent ⇐⇒ (X,Y) admits a density f_{(X,Y)} given by f_{(X,Y)}(x, y) = fX(x) fY(y).

Exercise ST4 [A vector of Gaussian r.v. is not necessarily a Gaussian vector]. Let X1 have the normal law N(0, 1) and set X2 := εX1, where ε is a random variable independent of X1 such that P(ε = 1) = 1 − P(ε = −1) = 1/2. What is the law of X2? Is (X1, X2) a Gaussian vector?

Exercise ST5. Let X be a centred Gaussian vector in R² with covariance matrix

Q := (  2  −1 )
     ( −1   2 )

1. Let A be a 2 × 2 matrix. Recall what AX is (mean and covariance matrix).

2. Find a 2 × 2 matrix A such that the components of AX are independent.

Hint: We recall that if a matrix M is symmetric, then there exists an orthogonal matrix O such that O^⊤MO is diagonal.


Chapter 3

Conditional Expectation

References

• Lecture notes of MAA 203 by Giovani Conforti.

• Probability with Martingales, David Williams, Cambridge University Press, 1991.

• Martingale en temps discret et chaînes de Markov, Nizar Touzi, lecture notes of the engineering program (second year), 2009.

This chapter refers mainly to Chapter 9 of (Williams 1991) and follows the same lines as (Touzi 2009).

3.1 Intuitive example for discrete random variables and conditional expectation with respect to an event

3.1.1 Intuitive example and discrete random variable

Let (Ω,A,P) be a probability space and X,Z be real random variables such that

• X takes the distinct values xi, 1 ≤ i ≤ m

• Z takes the distinct values zj , 1 ≤ j ≤ n.

We recall that the conditional probability of X given the event {Z = zj}, with P(Z = zj) > 0, is given by

P[X = xi | Z = zj] := P[(X,Z) = (xi, zj)] / P[Z = zj],   1 ≤ i ≤ m, 1 ≤ j ≤ n,    (3.1)

i.e. among all outcomes such that Z = zj holds, the quantity P[X = xi | Z = zj] is the frequency of the event {X = xi}. The conditional expectation is thus defined by

E[X | Z = zj] = ∑_{i=1}^{m} xi P[X = xi | Z = zj],   1 ≤ j ≤ n.


We thus define a random variable ξ := E[X|Z], called the conditional expectation of X with respect to Z, as follows:

if Z(ω) = zj, then ξ(ω) := E[X | Z = zj] =: ξj.    (3.2)

Note that ξ is fully determined by the realisation of Z. In other words, and in view of (3.2), we formally have

ξ is σ(Z)−measurable.    (3.3)

In addition to that, note that ξ takes the constant value ξj on the event {Z = zj}. Hence, we obtain

E[ξ 1_{Z=zj}] = ξj P(Z = zj) = ∑_{i=1}^{m} xi P(X = xi | Z = zj) P(Z = zj) = E[X 1_{Z=zj}].

We note that

E[X | Z = zj] = ξj = E[X 1_{Z=zj}] / P(Z = zj).

If we now set Gj = {Z = zj}, the previous equality says E[ξ 1_{Gj}] = E[X 1_{Gj}]. Note that for any G ∈ σ(Z), the random variable 1_G can be written as a sum of 1_{Gj} with Gj = {Z = zj}. We thus obtain

E[ξ 1_G] = E[X 1_G],   ∀ G ∈ σ(Z).    (3.4)

In other words,

〈X − E[X|Z], 1_G〉_2 = 0,   ∀ G ∈ σ(Z).

We thus note that X − E[X|Z] is orthogonal to the vector space formed by all the σ(Z)−measurable random variables. Geometrically speaking, (3.3) and (3.4) emphasize that ξ is the L²-orthogonal projection of X on the space of σ(Z)-measurable random variables. So ξ is the best σ(Z)−measurable random variable approximating X in L².
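The discrete construction above can be made completely explicit on a toy example (a Python sketch with a hypothetical joint law, not from the notes): compute ξj = E[X | Z = zj] from (3.1)-(3.2) and check property (3.4) on the generating events Gj = {Z = zj}.

# hypothetical joint law P(X = x, Z = z) on {0, 1, 2} x {0, 1}
joint = {(0, 0): 0.10, (1, 0): 0.25, (2, 0): 0.15,
         (0, 1): 0.20, (1, 1): 0.05, (2, 1): 0.25}

def cond_exp(z):
    # E[X | Z = z] = sum_x x * P(X = x | Z = z)
    pz = sum(p for (x, zz), p in joint.items() if zz == z)
    return sum(x * p for (x, zz), p in joint.items() if zz == z) / pz

for z in (0, 1):
    xi = cond_exp(z)
    lhs = xi * sum(p for (x, zz), p in joint.items() if zz == z)   # E[xi 1_{Z=z}]
    rhs = sum(x * p for (x, zz), p in joint.items() if zz == z)    # E[X 1_{Z=z}]
    print(z, xi, lhs, rhs)   # lhs == rhs, which is property (3.4) on G = {Z = z}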

3.1.2 Conditional expectation with respect to an event

Assume that X is a random variable (not necessarily discrete). For any event B ⊂ Ω with P(B) > 0 we define

E[X|B] := E[X 1_B] / P(B).

Again,

E[X 1_B] = E[(E[X 1_B]/P(B)) 1_B] = E[E[X|B] 1_B].

Our aim is to extend this definition to more general information structures.

Particular case: X is a discrete random variable. Assume that Y is a real random variable such that P(Y ∈ A) > 0 with A ∈ B(R). Hence,

E[X | Y ∈ A] = E[X 1_{Y∈A}] / P(Y ∈ A) = (∑_i xi P(X = xi, Y ∈ A)) / P(Y ∈ A) = ∑_i xi P(X = xi | Y ∈ A).


3.2 Conditional expectation: definition, existence, uniqueness

3.2.1 Projection in L2

We now consider a general probability space (Ω,A,P) and a sub-σ-algebra F of A. Let X be a real random variable such that X ∈ L²(A,P). In view of the previous intuitive section, we define the following projection operator, in the sense of the scalar product in L², by

P_F(X) := argmin{ E[|X − Y|²], Y ∈ L²(F,P) }.

In other words, P_F(X) has to be seen as the projection of the A−measurable random variable X on the set of F−measurable (square integrable) random variables.

Lemma 3.2.1. The operator P_F is well defined on (Ω,A,P) and is such that

(i) E[X 1_F] = E[P_F(X) 1_F], ∀F ∈ F, X ∈ L²(A,P);

(ii) X ≥ 0 =⇒ P_F(X) ≥ 0, P−a.s.

Proof. Proof of (i) (partially admitted). We admit (see Theorem 6.11 in (Williams 1991)) that the operator P_F is well defined and that there exists a unique random variable ξ := P_F(X) in L²(F,P) such that

〈X − ξ, G〉_2 = E[(X − ξ)G] = 0,   ∀G ∈ L²(F,P).    (3.5)

(In particular, one can check that this operator is linear.) Taking G = 1_F ∈ L²(F,P) with F ∈ F, we obtain Property (i).

Proof of (ii). Assume now that X ≥ 0. Set again ξ := P_F(X) and consider the event F := {ξ ≤ 0} ∈ F. Then, using (i), 0 ≤ E[X 1_F] = E[ξ 1_F] = E[−ξ⁻] ≤ 0, where ξ⁻ := −min(ξ, 0) denotes the negative part of ξ. Since by definition ξ⁻ ≥ 0, we deduce that ξ⁻ = 0; thus (ii) holds.

As a consequence of (i) with F = Ω, we have E[PF (X)] = E[X].

3.2.2 Fundamental existence theorem

We can now state the following fundamental existence theorem.

Theorem 3.2.2 (See Theorem 9.2 in (Williams 1991)). For any A−measurable real random variable X such that E[|X|] < +∞, there exists a random variable ξ such that

(a) ξ is F−measurable,

(b) E[|ξ|] < +∞,

(c) for any F ∈ F, we have E[X 1_F] = E[ξ 1_F].

Moreover, if ξ̃ is another random variable satisfying (a)−(b)−(c), then ξ = ξ̃, P−a.s.

We can now define rigorously the conditional expectation.


Definition 3.2.3. A random variable satisfying the properties (a)−(b)−(c) of the previous theorem is called a version of the conditional expectation of X given F, denoted by E[X|F]; it is unique P−a.s.

Remark 3.2.4. We insist on the fact that the conditional expectation of a random variable is (in general) a random variable. That is why it is defined P−a.s. Any random variable ξ satisfying (a)−(b)−(c) is called a version of the conditional expectation. All versions are equal almost surely.

Remark 3.2.5. If F = σ(Y1, . . . , Yn), where (Yi)1≤i≤n is a family of random variables, we simply write E[X|Y1, . . . , Yn] for E[X|F].

Proof of Theorem 3.2.2, see Section 9.5 in (Williams 1991). The proof is divided into three steps.

Step 1: Uniqueness under existence. We begin by proving uniqueness. Let ξ and ξ̃ satisfy (a)−(b)−(c). Then, for any F ∈ F we have E[(ξ − ξ̃)1_F] = 0. Since (a) holds for both ξ and ξ̃, the event {ξ − ξ̃ ≥ 0} is in F. Thus, by choosing F := {ξ − ξ̃ ≥ 0}, we deduce from (c) that E[(ξ − ξ̃)⁺] = 0 and so (ξ − ξ̃)⁺ = 0, a.s. Similarly, by choosing F := {ξ − ξ̃ ≤ 0}, we get (ξ − ξ̃)⁻ = 0, a.s. Therefore, ξ = ξ̃ almost surely.

We now turn to the existence of such ξ.

Step 2: Existence for X ≥ 0. Assume first that X is a non-negative random variable such that E[|X|] < +∞. We consider the random variable Xn := X ∧ n for n ≥ 1. Xn is bounded and so is in L²(A,P). According to Lemma 3.2.1, we can define the random variable ξn := P_F(Xn), which satisfies (a)−(b)−(c). We now prove that

(ξn)n∈N is an increasing sequence of random variables.    (3.6)

Note that E[((Xn+1 − Xn) − (ξn+1 − ξn)) 1_G] = 0 for any G ∈ F, by definition of P_F. By the uniqueness of the projection operator (see (3.5)), we obtain

P_F(Xn+1 − Xn) = ξn+1 − ξn.

Since Xn+1 ≥ Xn, we deduce from Property (ii) of Lemma 3.2.1 that (3.6) is true.

We now set ξ := lim_n ξn. Note that ξ ≥ 0. Hence, ξ satisfies (a), and by the monotone convergence theorem (c) is also satisfied. Moreover,

E[ξ] = E[lim_n ξn] = lim_n E[ξn] = lim_n E[Xn] ≤ E[X],

where the first equality is the definition of ξ, the second equality follows from the monotone convergence theorem, the third equality is a consequence of (i) of Lemma 3.2.1 (applied with F = Ω to ξn and Xn), and the last inequality is obtained by noting that Xn ≤ X. Hence, (b) is satisfied.

Step 3: Existence. Let now X be a random variable such that E[|X|] < +∞. Since X = X⁺ − X⁻, we can apply Step 2 to X⁺ and X⁻, with ξ⁺ := P_F(X⁺) and ξ⁻ := P_F(X⁻). Since the operator P_F is linear, we easily check that ξ := ξ⁺ − ξ⁻ satisfies (a)−(b)−(c).


Figure 3.1: Geometrical illustration of the conditional expectation in L². Extracted from the course "Probability refresher" (2016-2017) of Lucas Gérin.

Example 3. Let F := {∅,Ω} and X a random variable such that E[|X|] < +∞. Then, Condition (a) says that ξ := E[X|F] is constant, so that ξ = E[ξ]. Condition (c) gives the explicit value of this constant: ξ = E[X]. In other words, the conditional expectation of X given the trivial σ-algebra is its expectation.

Example 4. Let X be a random variable such that E[|X|] < +∞ and set ξ := E[X|F] with F := σ(X). Choose F⁺ := {X − ξ ≥ 0} ∈ σ(X). From Condition (c) we get E[(X − ξ) 1_{X−ξ≥0}] = 0 and thus (X − ξ)⁺ = 0, P−a.s. Similarly, by considering F⁻ := {X − ξ ≤ 0} ∈ σ(X), we get (X − ξ)⁻ = 0, P−a.s. Consequently,

X = E[X|σ(X)], P−a.s.

In particular, as soon as X is square integrable, we note that the orthogonal projection on L²(F,P) satisfies the properties required by the conditional expectation.

Corollary 3.2.6. For any A−measurable real random variable X such that E[|X|²] < +∞, the conditional expectation E[X|F] coincides with P_F(X).

3.3 Some properties

Let X be a random variable defined on the probability space (Ω,A,P) such that E[|X|] < +∞. Let F be a sub-σ-algebra of A. Then

1. E[E[X|F ]] = E[X],

2. if X is F−measurable, then E[X|F ] = X,

3. (Linearity) for any α, β ∈ R and any random variables (X,Y) defined on the probability space (Ω,A,P) such that E[|X| + |Y|] < +∞, we have

E[αX + βY |F ] = αE[X|F ] + βE[Y |F ],

4. (Positivity) if X ≥ 0, P− a.s. then E[X|F ] ≥ 0, P− a.s.,


5. (cvMonotone) if (Xn)n is an increasing sequence of non-negative random variables converging almost surely to X, then

lim_n E[Xn|F] = E[X|F], P−a.s.,

6. (cvFatou) if (Xn)n is a sequence of random variables with Xn ≥ 0, P−a.s., then

lim inf_n E[Xn|F] ≤ E[lim inf_n Xn|F], P−a.s.,

7. (cvDominated) let (Xn)n be a sequence of random variables such that there exists a positive random variable V (independent of n) satisfying |Xn| ≤ V, P−a.s., with E[V] < +∞, and assume that Xn converges almost surely to X; then

lim_n E[Xn|F] = E[X|F], P−a.s.,

8. (Jensen) let φ be a convex function on R such that E[|φ(X)|] < +∞ then

φ(E[X|F ]) ≤ E[φ(X)|F ], P− a.s.,

and in particular ‖E[X|F ]‖p ≤ ‖X‖p for any p ≥ 1.

9. (Tower property) Let G be a sub-σ algebra of F , that is G ⊂ F ⊂ A, then

E[E[X|F ]|G] = E[X|G], P− a.s..

Remark 3.3.1. As a consequence of Property 4. (together with linearity), if X and Y are two integrable random variables such that X ≤ Y, we have E[X|F] ≤ E[Y|F]. In particular,

|E[X|F]| ≤ E[|X| | F].

Remark 3.3.2. The tower property 9. is very natural if we see the conditional expectation as a projection operator: it says that projecting X on the set of F−measurable random variables and then on the set of G−measurable random variables is equivalent to projecting X directly on the smallest set, i.e. the set of G−measurable random variables.

Proof of the properties. We refer to Section 9.8 in (Williams 1991). Property 1. is obtained by taking F = Ω in Theorem 3.2.2 (c). Property 2. is trivial.

Property 3. We want to compute E[αX + βY|F]. Note that αE[X|F] + βE[Y|F] is by definition F-measurable. Now we have

E[|αE[X|F] + βE[Y|F]|] ≤ |α| E[|E[X|F]|] + |β| E[|E[Y|F]|] < +∞,

by definition of E[X|F] and E[Y|F]. For any F ∈ F, by using the linearity of the expectation together with 1. and the property (c) of Theorem 3.2.2 applied to E[X|F] and E[Y|F], we have

E[(αE[X|F ] + βE[Y |F ])1F ] = αE[X1F ] + βE[Y 1F ] = E[(αX + βY )1F ].

Hence, by using the uniqueness of the conditional expectation, Property 3. follows.


Property 4. We have seen it in the second step of the proof of Theorem 3.2.2.

Property 5. For each n we denote by Yn a version of E[Xn|F]. By Properties 3. and 4. we deduce that Yn ≥ 0 and that the sequence (Yn)n is non-decreasing. We take

Y := lim sup_n Yn.

Then, Y is F−measurable and lim_n Yn = Y. Thus, since by definition

E[Yn 1_F] = E[Xn 1_F], F ∈ F,

we deduce from the classical monotone convergence theorem for the expectation that for any F ∈ F

E[X 1_F] = lim_n E[Xn 1_F] = lim_n E[Yn 1_F] = E[Y 1_F].

Then Y = E[X|F], so that

lim_n E[Xn|F] = lim_n Yn = Y = E[X|F].

Property 6. Exercise (use the classical Fatou lemma).

Property 7. Let F ∈ F . We note that

|E[Xn|F ]1F | ≤ E[|Xn||F ] ≤ E[V |F ],

and E[E[V|F]] = E[V] < ∞. Hence, by the dominated convergence theorem (applied twice),

E[X 1_F] = lim_n E[Xn 1_F] = lim_n E[E[Xn 1_F | F]] = lim_n E[E[Xn|F] 1_F] = E[(lim_n E[Xn|F]) 1_F],

and since lim_n E[Xn|F] is F-measurable and integrable, we deduce that

lim_n E[Xn|F] = E[X|F].

Property 8. See (Williams 1991) Property (h).

Property 9. This property is a consequence of the definition of the conditional expectation together with Proposition 0.1.1.

Proposition 3.3.1 ("Taking out what is known", Section 9.7 in (Williams 1991)). Let X be an A−measurable random variable and Y be an F−measurable random variable. We assume that E[|XY|] < +∞ and E[|X|] < +∞. Then

E[XY |F ] = Y E[X|F ], P− a.s.


Proof. Step 1: when Y is an indicator function. Let Y = 1A with A ∈ F . Then, for any F ∈ F we have

E[Y E[X|F ]1F ] = E[E[X|F ]1A∩F ] = E[X1A∩F ] = E[XY 1F ],

where we have used the definition of Y , then the fact that A ∩ F ∈ F together with Property (c) in Theorem 3.2.2, then the definition of Y again.

Step 2: when X is non-negative. Assume that X is non-negative. We can extend this property to any Y of the form Σ_{i=1}^m ai 1_{Ai} with m ∈ N∗, where (ai)1≤i≤m is a sequence of reals and (Ai)1≤i≤m a sequence of events in F , by using the linearity of the conditional expectation (Property 3. above). Hence, from the monotone convergence theorem (Property 5. above), the property extends to any non-negative Y such that E[XY ] < +∞ and E[X] < +∞.

Step 3: the general case. Consider now general X and Y such that E[|XY |] < +∞ and E[|X|] < +∞. By decomposing X = X+ −X− and Y = Y + − Y − and by using the linearity of the conditional expectation, we deduce the result from Step 2.

The following proposition shows that when the random variable X and the σ-algebra used for the conditioning are independent of another σ-algebra, this independent σ-algebra plays no role in the conditional expectation. The proof is only sketched after the statement, and we refer to the proof of (k), Section 9.7 in (Williams 1991).

Proposition 3.3.2 (Role of the independence, see (k) Section 9.7 in (Williams 1991)). Let X ∈ L1(A,P) and F ,H be two sub-σ-algebras of A such that H is independent of σ(σ(X),F). Then:

E[X|σ(F ,H)] = E[X|F ].

As a consequence of this proposition, if H is independent of σ(X), then (by taking F := {∅,Ω})

E[X|H] = E[X].

Sketch of the proof of Proposition 3.3.2: we consider the case where X is non-negative (as usual...) and we prove that E[X1A] = E[E[X|F ]1A] for any A ∈ σ(F ,H). The uniqueness of the conditional expectation then gives E[X|σ(F ,H)] = E[X|F ]. A useful result of measure theory based on π-systems shows that it is enough to prove this for A = F ∩ H with F ∈ F and H ∈ H. We then obtain the result by noting that

E[E[X|F ] 1_{F∩H}] = E[E[X|F ] 1F ] E[1H ] = E[X 1F ] E[1H ] = E[X 1_{F∩H}].

3.4 Application: the case of continuous random variables admitting a density.

We now provide an important representation theorem for the conditional expectation in this case.

Theorem 3.4.1 (Section 9.6 in (Williams 1991)). Let X and Y be two continuous real random variables such that (X,Y ) admits a density f(X,Y ). We moreover assume that

fY (y) = ∫_R f(X,Y )(x, y) dx > 0, y ∈ R.



Let h be a function from R into R such that E[|h(X)|] < +∞. Then

E[h(X)|Y ] = g(Y ),

with

g(y) = ∫_R h(x) [f(X,Y )(x, y) / fY (y)] dx.

The function x 7−→ fX|Y (x) = Φ(x, Y ), where

Φ(x, y) = f(X,Y )(x, y) / fY (y), x, y ∈ R,

is called the conditional density of X given Y .

Proof. Note that σ(Y ) is generated by the sets {ω : Y (ω) ∈ B} for Borel sets B ∈ B(R). We compute, for any B ∈ B(R),

E[h(X) 1_{Y ∈B}] = ∫_R ∫_R h(x) 1_{y∈B} f(X,Y )(x, y) dx dy
                = ∫_R fY (y) 1_{y∈B} ( ∫_R h(x) [f(X,Y )(x, y)/fY (y)] dx ) dy
                = ∫_R fY (y) 1_{y∈B} g(y) dy
                = E[g(Y ) 1_{Y ∈B}].

Hence, g(Y ) satisfies property (c) of Theorem 3.2.2. Moreover, g(Y ) is obviously σ(Y )−measurable and, by noting that ∫_R f(X,Y )(x, y) dy = fX(x), we get

E[|g(Y )|] ≤ ∫_R fY (y) ∫_R |h(x)| [f(X,Y )(x, y)/fY (y)] dx dy = ∫_R |h(x)| fX(x) dx = E[|h(X)|] < ∞.

Hence, according to Theorem 3.2.2, we have

E[h(X)|Y ] = g(Y ).
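As a sanity check of this representation, here is a short numerical sketch, added only as an illustration; the toy joint density f(X,Y )(x, y) = x + y on [0, 1]^2 and the choice h(x) = x are assumptions made for the example.

# Compare g(y) = ∫ h(x) f(X,Y)(x,y) dx / fY(y) with a Monte Carlo estimate
# of E[h(X) | Y close to y] for the density f(x,y) = x + y on [0,1]^2.
import numpy as np

rng = np.random.default_rng(0)

def sample_joint(n):
    # rejection sampling from f(x,y) = x + y on [0,1]^2 (bounded above by 2)
    x, y, u = rng.random(4 * n), rng.random(4 * n), rng.random(4 * n)
    keep = u * 2.0 <= x + y
    return x[keep][:n], y[keep][:n]

def g(y, h=lambda x: x):
    xs = np.linspace(0.0, 1.0, 2001)
    dx = xs[1] - xs[0]
    fXY = xs + y                       # f(X,Y)(x, y)
    fY = np.sum(fXY) * dx              # ≈ ∫ f(X,Y)(x, y) dx
    return np.sum(h(xs) * fXY) * dx / fY

X, Y = sample_joint(200_000)
y0, eps = 0.7, 0.02
mc = X[np.abs(Y - y0) < eps].mean()    # crude estimate of E[X | Y ≈ y0]
print(g(y0), mc)                       # the two values should be close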

3.5 Conditional expectation and independence

We also have the following very useful result.

Proposition 3.5.1. Let X and Y be two real random variables. Let h be a map from R2 into R such that E[|h(X,Y )|] < ∞. If X and Y are independent then

E[h(X,Y )|Y ] = H(Y ), with H(y) := E[h(X, y)].

Proof. We focus on the particular case where the pair (X,Y ) admits a density f(X,Y ) on R2. The general case follows by considering integrals with respect to the law of X, denoted by PX, and the law of Y , denoted by PY , for which the independence4 is equivalent to P(X,Y )(dxdy) = PX(dx) ⊗ PY (dy), where P(X,Y )(dxdy) is the law of (X,Y ).

Let E ∈ σ(Y ); then there exists a Borel set A such that E = {Y ∈ A}. We aim at proving that E[h(X,Y )1E ] = E[H(Y )1E ]. Since X and Y are independent, f(X,Y )(x, y) = fX(x) fY (y), where fX (resp. fY ) denotes the density of X (resp. Y ). From Fubini's Theorem, we have

E[h(X,Y ) 1E ] = ∫_{R^2} h(x, y) 1A(y) fX(x) fY (y) dx dy
              = ∫_R ( ∫_R h(x, y) fX(x) dx ) 1A(y) fY (y) dy
              = ∫_R E[h(X, y)] 1A(y) fY (y) dy
              = ∫_R H(y) 1A(y) fY (y) dy = E[H(Y ) 1E ].

Remark 3.5.1. In particular, let X and Y be two real random variables. Let h be a map from R into R such that E[|h(X)|] < ∞. If X and Y are independent then E[h(X)|Y ] = E[h(X)].
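A minimal Monte Carlo sanity check of this proposition, added here as an illustration; the choices X ~ Exp(1), Y ~ U(0, 1) independent and h(x, y) = sin(xy) are assumptions of the example (for X ~ Exp(1) one has H(y) = E[sin(Xy)] = y/(1 + y^2)).

import numpy as np

rng = np.random.default_rng(1)
n = 500_000
X = rng.exponential(1.0, n)
Y = rng.random(n)                      # independent of X

y0, eps = 0.6, 0.01
mask = np.abs(Y - y0) < eps            # condition on Y ≈ y0
estimate = np.sin(X[mask] * Y[mask]).mean()
print(estimate, y0 / (1 + y0 ** 2))    # the two numbers should be close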

3.6 Exercises

Exercise CE1. Let (X,Y ) be a pair of real random variables with density on R2 such that:

(i) X has a law Γ(2, λ) (i.e. with density fX(x) = λ^2 x e^{−λx} 1_{x≥0}),

(ii) the conditional law of Y given X is a uniform law on [0, X] (in other words, the conditional density of Y given X = x is fY |X=x(y) = (1/x) 1_{0<y<x}).

1. Give the density of (X,Y ) together with the law of Y .

2. Find the conditional density of X given Y .

3. Compute:

(a) E[XY ] (Hint: E[X^2] = 6/λ^2),

(b) E[Y |X], E[X|Y ], E[X +XY |Y ] and E[E[Y |X]].

Exercise CE2 Let α ∈ R. Let X and Y be two random variables such that

• X − Y is independent of Y ,

• The law of X − Y is Gaussian centred with variance σ2 − ν2 where ν2 < σ2.

• The law of Y is Gaussian with variance ν2.

Compute E[e^{αX − α^2σ^2/2} | Y ].

4Similarly to Exercise ST3


Exercise CE3 The lifetime of a machine in days is a random variable τ with mass function f . Given that the machine is working after t days, what is the mean subsequent lifetime of the machine when (a) f(x) = 1/(N+1), x ∈ {0, . . . , N}, (b) f(x) = 2^{−x}, x ∈ N∗?
Hint: First prove that for any random variables X,Y such that X takes values in N (or any subset of N) we have E[X|Y ∈ A] = Σ_{i≥0} P(X > i|Y ∈ A) for any subset A of N.

Exercise CE4 (Conditioning and stopping time) Prove that for any stopping times τ and θ on the filtered probability space (Ω,A,P,F), and for any (real) random variable X such that E[|X|] < +∞, we have

E[E[X|Fτ ]|Fθ] = E[E[X|Fθ]|Fτ ] = E[X|Fτ∧θ].

Exercise CE5 How should we define Var(Y |X) the variance of Y given X? Show that

Var(Y ) = E[Var(Y |X)] + Var(E[Y |X]).

X and Y are random variables with correlation5 ρ. Show that E[Var(Y |X)] ≤ (1−ρ2)Var(Y ).

Exercise CE6 A coin shows heads with probability p. Let Xn be the number of throws required to get a run of n consecutive heads. Prove that E[Xn] = Σ_{k=1}^n p^{−k}.

Exercise CE7 [Gaussian vector: E[X|Y ] by direct computations] Let (X,Y ) have a bivariate normal density with zero means, respective variances σ_X^2 and σ_Y^2 and correlation ρ, with corresponding density

f(X,Y )(x, y) = 1/(2π σX σY √(1− ρ^2)) exp( − [(x/σX)^2 + (y/σY )^2 − 2ρxy/(σX σY )] / (2(1− ρ^2)) ).

Show that E[X|Y ] = ρ (σX/σY ) Y and Var(X|Y ) = σ_X^2 (1− ρ^2).

Exercise CE7bis [Gaussian vector: E[X|Y ] by using the properties of a Gaussian vector] Let (X,Y ) be a Gaussian vector with mean 0 and variance matrix

Σ = ( σ_X^2   r   )
    (   r   σ_Y^2 )

Compute E[X|Y ]. Hint: find a ∈ R such that Cov(X − aY, Y ) = 0 and use Proposition 2.6.4.

Exercise CE8. Consider the random walk on Z. With the notations of Section 1.1.4, compute E[ρ0|ρ0 < +∞]. Hint: use Section 3.1.2.

5We recall that the correlation ρ is defined by ρ = (E[XY ]− E[X]E[Y ]) / (√Var(X) √Var(Y )).


Chapter 4

Introduction to Discrete Martingale Theory

In this chapter we focus on the average behavior of a random process over time. We follow Chapter 10 of (Williams 1991). We consider throughout this chapter a filtered probability space (Ω,F ,F := (Fn)n≥0,P) where F is a filtration such that

F0 ⊂ F1 ⊂ · · · ⊂ Fn ⊂ · · · ⊂ F .

4.1 Discrete martingale, submartingale and supermartingale

Definition 4.1.1 (martingale). A stochastic process X is called a martingale relative to the filtration (Fn)n≥0 and the probability1 P if

(i) X is F-adapted, i.e. for any n ≥ 0 the random variable Xn is Fn-measurable,

(ii) E[|Xn|] < +∞, for any n ≥ 0,

(iii) E[Xn|Fn−1] = Xn−1, P− a.s. for any n ≥ 1.

A supermartingale relative to the filtration (Fn)n≥0 and the probability P is a stochastic process X satisfying (i)− (ii) and

E[Xn|Fn−1] ≤ Xn−1, P− a.s. n ≥ 1.

A submartingale relative to the filtration (Fn)n≥0 and the probability P is a stochastic process X satisfying (i)− (ii) and

E[Xn|Fn−1] ≥ Xn−1, P− a.s. n ≥ 1.

When there is no ambiguity on the filtration and the probability, we omit mentioning them.

Remark 4.1.2. In other words, a martingale is constant on average; a supermartingale decreases on average; a submartingale increases on average.

1E here is the expectation under the probability P.


We have the following first properties

Proposition 4.1.1. Let X be a stochastic process.

1. X is martingale if and only if it is a submartingale and a supermartingale.

2. X is a supermartingale if and only if −X is a submartingale.

3. If X is a martingale (resp. supermartingale, submartingale) such that X0 is an integrable random variable, then M := X − X0 is a martingale (resp. supermartingale, submartingale).

4. If X is a supermartingale, then for any m < n

E[Xn|Fm] ≤ Xm.

5. If g is a convex (resp. concave) function and X = (Xn)n≥0 is a martingale such that E[|g(Xn)|] < +∞ for any n, then (g(Xn))n∈N is a submartingale (resp. supermartingale).

Proof. 1., 2. and 3. are direct consequences of Definition 4.1.1. Property 4. is a consequence of the tower property in Section 3.3. Property 5. is a consequence of the Jensen inequality in Section 3.3.

Property 3. shows in particular that we can focus on processes which are null at 0.

4.2 Examples

Sum of independent centered random variables. Let X1, X2, . . . be a sequence of independent random variables with E[|Xk|] < +∞ and E[Xk] = 0 for any k ≥ 1. We define S0 = 0 and

Sn := X1 + · · ·+Xn, F0 := {∅,Ω}, Fn := σ(X1, . . . , Xn).

Then, S satisfies properties (i)− (ii) in Definition 4.1.1 relative to (Fn)n and

E[Sn+1|Fn] = E[Sn|Fn] + E[Xn+1|Fn] = Sn + E[Xn+1] = Sn,

by using the fact that Xn+1 is independent of Fn. Hence, S is a martingale with respect to the filtration (Fn)n≥0.
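Here is a small simulation sketch of this example, added only as an illustration (the ±1 increments and the parameters are arbitrary choices): grouping many simulated paths according to the value of S_n, the average of S_{n+1} within each group should be close to that value, which is the defining martingale property E[S_{n+1}|F_n] = S_n.

import numpy as np

rng = np.random.default_rng(2)
n_paths, n = 200_000, 10
steps = rng.choice([-1, 1], size=(n_paths, n + 1))    # X_1, ..., X_{n+1}
S_n = steps[:, :n].sum(axis=1)                        # S_n
S_np1 = S_n + steps[:, n]                             # S_{n+1}

for s in [-4, -2, 0, 2, 4]:
    mask = S_n == s
    print(s, S_np1[mask].mean())    # each average should be close to s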


Closed martingale Let (Fn)n be a filtration and ξ be a random variable such that E[|ξ|] < +∞. We define Mn := E[ξ|Fn]. Note that M satisfies properties (i)− (ii) in Definition 4.1.1. By using the tower property of conditional expectations, we have

E[Mn|Fn−1] = E[E[ξ|Fn]|Fn−1] = E[ξ|Fn−1] = Mn−1.

Hence, M is a martingale. This kind of martingale plays a very specific role in martingale theory, which leads to the following definition.

Definition 4.2.1 (Closed martingale). A stochastic process X = (Xn)n≥0 is a closed martingale if there exists an integrable real random variable ξ such that Xn = E[ξ|Fn] for any n ∈ N.

4.3 Predictable processes and martingale

We now introduce the notion of predictability of a stochastic process. The typical example is a game for which you have to determine a strategy at time n given all the history up to time n− 1. Hence we have the following definition.

Definition 4.3.1. A stochastic process (Cn)n≥0 is F−predictable if for any n ≥ 1 the random variable Cn is Fn−1−measurable.

In particular, any F−predictable process is F−adapted.

As an example, consider a stochastic process (Xn)n∈N such that Xn is your winning at time n and let Cn be your stake on game n. Then, your total winnings up to time n are

Yn := Σ_{1≤k≤n} Ck (Xk −Xk−1) =: (C ·X)n, Y0 = 0.

We have the following proposition (proof in Section 10.7 in (Williams 1991); for the proof of 3., you can use Proposition 3.3.1 above together with the Cauchy-Schwarz inequality).

Proposition 4.3.1. 1. Let C be a uniformly bounded (with respect to n and ω) non-negative predictable process and let X be a supermartingale (resp. submartingale). Then, Y := C ·X is a supermartingale (resp. submartingale) such that Y0 = 0.

2. Let C be a uniformly bounded predictable process and let X be a martingale. Then, Y := C ·X is a martingale such that Y0 = 0.

3. In 1. and 2. we can replace the boundedness condition on C by E[|Cn|2] < +∞ for any n ∈ N, provided E[|Xn|2] < +∞ for any n.

Proof. Property 1. We have

E[Yn − Yn−1|Fn−1] = E[Cn(Xn −Xn−1)|Fn−1] = Cn E[Xn −Xn−1|Fn−1] ≤ 0 (resp. ≥ 0),

since Cn is Fn−1−measurable and nonnegative.
Property 2. It is similar, but we do not need a sign condition on Cn.


Property 3. The boundedness of C was only used to ensure that E[|Yn|] < +∞. If instead of the boundedness condition we assume that E[|Cn|2] + E[|Xn|2] < ∞ for any n, then

E[|Yn|] = E[|Σ_{1≤k≤n} Ck(Xk −Xk−1)|] ≤ E[Σ_{1≤k≤n} |Ck| |Xk −Xk−1|]
        ≤ Σ_{1≤k≤n} E[|Ck|2 + |Xk −Xk−1|2]
        ≤ Σ_{1≤k≤n} E[|Ck|2] + 2(E[|Xk|2] + E[|Xk−1|2]) < +∞.
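The following sketch illustrates the martingale transform numerically; it is not part of the original notes, and it assumes that X is the symmetric ±1 random walk and uses the predictable stake C_n = 1 + |sign(X_{n-1})|, which depends only on the past. By Proposition 4.3.1, Y = (C · X) is a martingale, so E[Y_n] should stay equal to E[Y_0] = 0.

import numpy as np

rng = np.random.default_rng(3)
n_paths, n = 100_000, 50
incr = rng.choice([-1, 1], size=(n_paths, n))
X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(incr, axis=1)], axis=1)

Y = np.zeros(n_paths)
for k in range(1, n + 1):
    C_k = 1.0 + np.abs(np.sign(X[:, k - 1]))    # F_{k-1}-measurable stake
    Y += C_k * (X[:, k] - X[:, k - 1])          # builds (C · X)_n

print(Y.mean())    # should be close to 0 = E[Y_0]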

4.4 Stopped martingale and Doob’s stopping Theorem

We now study how a stochastic process retains its martingale properties when it is stopped at a stopping time τ . We have in particular the following first stopping theorem (with respect to the filtration F considered in all this chapter).

Theorem 4.4.1. Let X be a stochastic process and τ be a stopping time.

(i) If X is a supermartingale (resp. submartingale), then the stopped process Xτ := (Xτ∧n)n∈N is a supermartingale (resp. submartingale). In particular

E[Xτ∧n] ≤ E[X0] (resp. E[Xτ∧n] ≥ E[X0]).

(ii) If X is a martingale, then the stopped process Xτ := (Xτ∧n)n∈N is a martingale. In particular

E[Xτ∧n] = E[X0].

Proof. We consider the stake process Cτ defined for any nonnegative integer n by Cτ_n := 1_{τ≥n}. In other words, Cτ_n(ω) = 1 if τ(ω) ≥ n and 0 otherwise, for any ω ∈ Ω. We also define

(Cτ ·X)n := Xτ∧n −X0.

Note that Cτ is predictable since {Cτ_n = 0} = {τ ≥ n}^c = {τ ≤ n − 1} ∈ Fn−1. It is also (clearly) bounded and nonnegative. Hence, (i) follows from Property 1. in Proposition 4.3.1 and (ii) follows from Property 2. of the same proposition, with Cτ as the chosen predictable stochastic process.

⚠ The theorem states that the stopped process Xτ inherits the (super/sub)martingale property. Consider for instance our guideline example: the random walk X on Z. We know that X is a martingale. Let τ := inf{n, Xn = 1}. We have seen that P(τ < +∞) = 1. The theorem states that E[Xτ∧n] = E[X0] for any n. However,

1 = E[Xτ ] ≠ E[X0] = 0.

It is in general wrong to replace Xτ∧n by Xτ in the previous theorem (even if we would like to...). It is however true when τ is bounded (which is not the case for this τ).
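A short simulation of this warning (the horizon and the number of paths below are arbitrary choices): for the symmetric walk and τ = inf{n, X_n = 1}, the stopped value X_{τ∧n} has mean 0 for every fixed horizon n, even though X_τ = 1 on {τ < +∞}, an event of probability 1.

import numpy as np

rng = np.random.default_rng(4)
n_paths, horizon = 50_000, 200
incr = rng.choice([-1, 1], size=(n_paths, horizon))
X = np.cumsum(incr, axis=1)                      # X_1, ..., X_horizon

hit = (X == 1)
first_hit = np.where(hit.any(axis=1), hit.argmax(axis=1), horizon - 1)
X_stopped = X[np.arange(n_paths), first_hit]     # X_{τ ∧ horizon}

print(X_stopped.mean())           # ≈ 0 = E[X_{τ∧n}], as in Theorem 4.4.1
print(hit.any(axis=1).mean())     # fraction of paths with τ ≤ horizon (< 1)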


Theorem 4.4.2. Let τ be a bounded stopping time (that is, there exists N ∈ R+ such that τ ≤ N, P − a.s.) and let X be a supermartingale (resp. submartingale, resp. martingale). Then, the random variable Xτ is integrable and

E[Xτ ] ≤ E[X0] (resp. E[Xτ ] ≥ E[X0], resp. E[Xτ ] = E[X0]). (4.1)

Proof. The proof is a consequence of Theorem 4.4.1 by choosing n = N .

Exercise M0

1. Show that if τ is almost surely finite and X is a uniformly bounded supermartingale (resp. martingale), that is there exists K ∈ R+ such that |Xn| ≤ K for any n with K independent of n, then (4.1) still holds.

2. Assume now that τ is almost surely finite and X is a non-negative supermartingale.Then

E[Xτ ] ≤ E[X0].

4.5 (Sub)martingale inequalities

We now provide a very useful inequality which allows us to control the maximum of a submartingale.

Theorem 4.5.1 (Doob’s maximal inequalities). Let X be a submartingale. We define the process X? for any n ∈ N by X?_n := sup_{k≤n} Xk.

1. For any c > 0 we have, for any n ∈ N,

c P(X?_n ≥ c) ≤ E[Xn 1_{X?_n ≥ c}].

2. Let p > 1, and assume that the submartingale X is non-negative and such that E[|Xn|p] < +∞ for any n ∈ N. Then E[|X?_n|p] < +∞ and2

‖X?_n‖p ≤ (p/(p− 1)) ‖Xn‖p, n ∈ N.

Proof. Proof of 1. We consider the stopping time τ^c := inf{k ≥ 0, Xk ≥ c}. We get3

E[Xn 1_{X?_n ≥ c}] = E[Xn 1_{τ^c ≤ n}]
                  = E[E[Xn 1_{τ^c ≤ n}|F_{τ^c}]]
                  = E[E[Xn|F_{τ^c}] 1_{τ^c ≤ n}]
                  ≥ E[X_{τ^c} 1_{τ^c ≤ n}]
                  ≥ c P(X?_n ≥ c).

Proof of 2. Let q be such that 1/p + 1/q = 1 (in other words, q = p/(p−1)). From Fubini’s Theorem, we have

L := ∫_0^{+∞} p c^{p−1} P(X?_n ≥ c) dc = E[∫_0^{X?_n} p c^{p−1} dc] = E[|X?_n|p].

2We recall the notation ‖Y ‖p := E[|Y |p]^{1/p} for any random variable Y .
3Here we use Theorem 4.4.2, since τ^c is bounded by n on the event {τ^c ≤ n}.


From 1. we have

L := ∫_0^{+∞} p c^{p−1} P(X?_n ≥ c) dc ≤ R := ∫_0^{+∞} p c^{p−2} E[Xn 1_{X?_n ≥ c}] dc.

We now compute R, using Hölder's inequality for the last line:

R = E[Xn ∫_0^{X?_n} p c^{p−2} dc]
  = E[Xn (p/(p− 1)) (X?_n)^{p−1}]
  = q E[Xn |X?_n|^{p−1}]
  ≤ q ‖Xn‖p ‖(X?_n)^{p−1}‖q = q ‖Xn‖p E[(X?_n)^p]^{1/q}.

Hence,

L = E[|X?_n|p] ≤ R = q ‖Xn‖p ‖(X?_n)^{p−1}‖q = q ‖Xn‖p E[(X?_n)^p]^{1/q},

which gives

‖X?_n‖p = E[|X?_n|p]^{1/p} ≤ q ‖Xn‖p.

Remark 4.5.2. As a consequence, these inequalities also hold for martingales (apply the result to the non-negative submartingale |X| when X is a martingale).
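A quick Monte Carlo check of the L^2 case, added as an illustration; it assumes that S is the symmetric ±1 random walk, so that |S| is a non-negative submartingale and Doob's inequality gives E[(max_{k≤n} |S_k|)^2] ≤ 4 E[|S_n|^2].

import numpy as np

rng = np.random.default_rng(5)
n_paths, n = 100_000, 100
S = np.cumsum(rng.choice([-1, 1], size=(n_paths, n)), axis=1)

lhs = (np.abs(S).max(axis=1) ** 2).mean()    # E[(S*_n)^2]
rhs = 4 * (S[:, -1] ** 2).mean()             # (p/(p-1))^2 E[|S_n|^2] with p = 2
print(lhs, rhs, lhs <= rhs)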

4.6 Decomposition of martingales

In this section (following Section 12.11 in (Williams 1991)), we provide a powerful result ensuring that any adapted and integrable process can be decomposed into a martingale part and a predictable part.

Theorem 4.6.1 (Doob decomposition). Let X be an adapted process such that E[|Xn|] < +∞ for any n ∈ N.

(a) There exists a martingale M null at 0 and a predictable process A null at 0 such that for any n ∈ N

Xn = X0 +Mn +An, P− a.s. (4.2)

Moreover, this decomposition is unique in the indistinguishability sense, that is if there exist another martingale M̃ and another predictable process Ã, both null at 0, such that X = X0 + M̃ + Ã, then P(Mn = M̃n, An = Ãn, ∀n) = 1.

(b) X is a sub(resp. super)martingale if and only if the process A appearing in (4.2) is an increasing (resp. a decreasing) process, i.e. An ≤ An+1 for all n ∈ N, P − a.s. (resp. An ≥ An+1 for all n ∈ N, P− a.s.).

Proof. Proof of (a). Uniqueness of the representation under existence. Assume that there exist two martingales M, M̃ and two predictable processes A, Ã, all null at 0, such that for any n ∈ N

Xn = X0 +Mn +An = X0 + M̃n + Ãn.

Then An − Ãn = M̃n − Mn is a martingale which is also predictable; taking the conditional expectation with respect to Fn−1 gives An − Ãn = M̃n−1 − Mn−1 = An−1 − Ãn−1, and by induction A = Ã. Then, we deduce that M = M̃.


Proof of (a). Existence of the representation. We set

A0 = 0 and An := Σ_{k=1}^n E[Xk −Xk−1|Fk−1], n ≥ 1. (4.3)

Hence, A is a predictable process (see Chapter 3 for the measurability of conditional expectations). Now we define

Mn := Xn −X0 −An.

This process is adapted and integrable (since Xn is integrable and since An is a sum of conditional expectations, which are by definition integrable). We compute

E[Mn+1|Fn] = E[Xn+1 −X0 −An+1|Fn]
           = E[Xn+1 − E[Xn+1 −Xn|Fn] − Σ_{k=1}^n E[Xk −Xk−1|Fk−1] | Fn] −X0
           = Xn − Σ_{k=1}^n E[Xk −Xk−1|Fk−1] −X0 = Mn,

where we have used the tower property for the last line. Hence, M is a martingale, so Decomposition (4.2) holds.
Proof of (b). It is a direct consequence of (4.3).

We have in particular the following important corollary (proof left as an exercise, as a consequence of the Jensen inequality).

Corollary 4.6.2. Assume that M is a martingale such that ‖Mn‖2 < +∞ for any n ∈ N. Then, (M_n^2)n is a submartingale and there exists an (essentially unique) martingale N and an increasing predictable process A such that for any n ∈ N

M_n^2 = Nn + An, P− a.s.
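A small numerical sketch of this corollary (it assumes that M is the symmetric ±1 random walk, a choice made only for the illustration): in that case A_n = Σ_{k≤n} E[(M_k − M_{k−1})^2 | F_{k−1}] = n, so N_n = M_n^2 − n should be a martingale, and in particular E[M_n^2 − n] = 0 for every n.

import numpy as np

rng = np.random.default_rng(6)
n_paths, n = 100_000, 60
M = np.cumsum(rng.choice([-1, 1], size=(n_paths, n)), axis=1)

for k in [10, 30, 60]:
    print(k, (M[:, k - 1] ** 2 - k).mean())    # each value should be close to 0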

4.7 Convergence theorems

As a consequence of Corollary 4.6.2, if M is a martingale, then the sequence u defined for anyn by un := E[|Mn|2] is nondecreasing. In other words, as soon as this sequence in bounded,we know that there exists a limit point m? to the sequence (un)n. We have in particular thefollowing lemma

Lemma 4.7.1. Let M be a martingale such that supn E[|Mn|2] < +∞. Then, (E[|Mn|2])nconverges to some positive real m?.

This is of course a priori not enough to have convergence in L2. We can however prove that it indeed holds under this boundedness condition.

Theorem 4.7.2. Consider a martingale M which is uniformly bounded in L2, that is

supn E[|Mn|2] < +∞.

Then there exists a random variable M∞ ∈ L2 such that M converges almost surely and in L2 to M∞.


Proof. From Lemma 4.7.1, the sequence u := (E[|Mn|2])n is nondecreasing and converges to some m?. Note now that for any n, p ∈ N,

E[Mn+p Mn] = E[Mn E[Mn+p|Fn]] = E[|Mn|2].

Thus,

E[(Mn+p −Mn)2] = E[|Mn+p|2]− E[|Mn|2] ≤ m? − E[|Mn|2].

We deduce that lim_{n→+∞} ( sup_p E[(Mn+p −Mn)2] ) = 0. In other words, the sequence (Mn)n∈N is a Cauchy sequence in L2. Hence, it converges in L2 to some M∞ ∈ L2. We now turn to the almost sure convergence. We set

Vn := sup_{i,j≥n} |Mi −Mj |.

The sequence V is non-increasing and supn |Vn| < +∞, P−a.s. (since M is uniformly bounded in L2, it is uniformly bounded in L1). Hence, there exists a random variable V such that Vn → V, P− a.s. Note that for any ρ > 0

P(V > ρ) ≤ P(Vn > ρ)
         = P( sup_{i,j≥n} |Mi −Mj | > ρ)
         ≤ P( sup_{i≥n} |Mi −Mn| + sup_{j≥n} |Mj −Mn| > ρ)
         ≤ P( sup_{i≥n} |Mi −Mn| > ρ/2)
         = P( sup_{i≥n} |Mi −Mn|2 > ρ2/4)
         ≤ (4/ρ2) E[ sup_{i≥n} |Mi −Mn|2] ≤ (16/ρ2) sup_{i≥n} E[|Mi −Mn|2] −→ 0,

by using the Chebyshev inequality and Remark 4.5.2 together with the Cauchy property in L2 of M . Hence, V = 0 almost surely, so (Mn)n∈N is almost surely a Cauchy sequence. Hence, it converges to M∞ almost surely.

As an application, we now provide the law of large numbers for martingales.

Theorem 4.7.3 (Law of large numbers for square integrable martingales). Let M = (Mn)n∈N be a martingale such that E[|Mn|2] < +∞ for any n and such that

Σ_{n≥1} (1/n^2) E[|Mn −Mn−1|2] < +∞.

Then

(1/n) Mn −→ 0, P− a.s. and in L2.

Proof. We define the process X by Xn := Σ_{k=1}^n (1/k)(Mk −Mk−1) for any n ≥ 1. Note that X is a martingale which satisfies the assumptions of Theorem 4.7.2. Hence, there exists a random variable X∞ ∈ L2 such that Xn −→ X∞ almost surely and in L2. We now compute4

(1/n) Mn = (1/n) Σ_{i=1}^n i (Xi −Xi−1)
         = Xn − (1/n) Σ_{i=1}^n (i− (i− 1)) Xi−1
         = Xn − (1/n) Σ_{i=1}^n Xi−1.

Hence, by using Cesàro’s Lemma5, we deduce that (1/n) Mn −→ 0, P− a.s. and in L2.

We now provide an alternative proof of the law of large numbers, which we recall below.

Theorem 4.7.4 (Law of large numbers). Let (Xn)n≥1 be a sequence of independent and identically distributed integrable random variables. Then

lim_{n→+∞} (1/n) Σ_{i=1}^n Xi = E[X1], a.s.

Proof. Without any loss of generality, we assume that E[X1] = 0. We set

Mn := Σ_{i=1}^n (Xi 1_{|Xi|≤i} − E[Xi 1_{|Xi|≤i}]), n ≥ 1.

Note that M is a martingale. We can also verify that Mn ∈ L2 for any n. In addition to that, there exists a positive constant C such that

Σ_{k=1}^n (1/k^2) E[|Mk −Mk−1|2] ≤ Σ_{k=1}^n (2/k^2) E[|X1|2 1_{|X1|≤k}]
                                 = E[|X1|2 Σ_{k=1}^n (2/k^2) 1_{|X1|≤k}]
                                 ≤ C E[|X1|2 ∫_{|X1|∨1}^{∞} dt/t^2]
                                 ≤ C E[|X1|2 / (|X1| ∨ 1)] ≤ C E[|X1|] < +∞.

Hence, we deduce from Theorem 4.7.3 that (1/n) Mn converges to 0 almost surely. From the dominated convergence theorem, we know that E[X1 1_{|X1|≤k}] −→ E[X1] = 0 as k → +∞. Thus, by Cesàro's Lemma,

(1/n) Σ_{i=1}^n Xi 1_{|Xi|≤i} −→ 0, a.s.

4The argument here is in fact related to Kronecker’s Lemma (see Section 12.7 in (Williams 1991)).
5Recall Cesàro’s Lemma: suppose that (bn) is an increasing sequence of strictly positive real numbers converging to +∞ and let (vn) be a converging sequence of real numbers such that vn −→ v∞ ∈ R. Then

(1/bn) Σ_{k=1}^n (bk − bk−1) vk −→ v∞, n→ +∞.


Now, by noting that Σ_{i≥1} P(|Xi| ≥ i) = Σ_{i≥1} P(|X1| ≥ i) ≤ E[|X1|] < +∞, we deduce from the Borel-Cantelli Lemma that P − a.s. there exists N ∈ N such that for any i ≥ N we have |Xi| ≤ i. Therefore

lim_{n→+∞} (1/n) Σ_{i=1}^n Xi = lim_{n→+∞} (1/n) Σ_{i=1}^n Xi 1_{|Xi|≤i} = 0.

4.8 Exercises

Exercise M1 Let (Xk)k≥1 be a sequence of nonnegative independent random variables with E[Xk] = 1 for any k ≥ 1. We define M0 = 1 and

Mn := Π_{k=1}^n Xk, F0 := {∅,Ω}, Fn := σ(X1, . . . , Xn).

Prove that M is an (Fn)n−martingale.

Exercise M2 Consider a stochastic process S such that

Sn = N1 + · · ·+Nn,

where (Ni)i∈N is a sequence of independent and identically distributed random variables with a centred normal law with variance σ^2. Prove that for any λ ∈ R the stochastic process Mλ defined by

Mλ_n := e^{λSn − nλ^2σ^2/2}

is a martingale.

Exercise M3 Gambler's ruin
Pierre is at the Casino. He decides to play on a machine such that he wins 1 euro with probability p and loses 1 euro with probability 1− p. Pierre arrives with K euros at the casino and, since Pierre is very reasonable, he decides to stop the game when either he has won N euros or he has lost his K initial euros.
Mathematical model. Let (Xn)n≥1 be a sequence of iid random variables such that

P(X1 = 1) = p and P(X1 = −1) = 1− p,

with p ∈ ]0, 1/2[ ∪ ]1/2, 1[. We set Fn = σ(X1, . . . , Xn). Let K and N be integers such that 0 ≤ K ≤ N . We set S0 = K and Sn = K + X1 + . . . + Xn for any n ≥ 1. We define T = inf{n ≥ 0 : Sn = 0 or Sn = N}. For any n ≥ 0, we set

Mn = ((1− p)/p)^{Sn}.

1. Prove that (Mn)n is an (Fn)n martingale.

2. By considering the stopped martingale (Mn∧T )n, compute P(ST = 0) and P(ST = N).

3. Interpretations?


Exercise M4 Recall our guideline example: the random walk on Z. Consider a family {Bn, n ∈ N∗} of {−1, 1}-valued identically distributed and (mutually) independent random variables such that

P(B1 = 1) = P(B1 = −1) = 1/2.

We define X0 = 0 and Xn = Σ_{k≤n} Bk, and we set

τ := inf{n, Xn = 1}.

We take the filtration F generated by the random variables B, that is

F := (Fn)n, with Fn := σ(B1, . . . , Bn) = σ(X0, . . . , Xn).

1. Prove that τ is a stopping time with respect to the filtration F.

2. Prove that E[e^{θB1}] = cosh(θ) for any θ ∈ R and deduce that Mθ := (Mθ_n)n, defined for any n by

Mθ_n := (sech(θ))^n e^{θXn},

is a martingale, with sech(θ) := 1/cosh(θ).

3. Assume that θ > 0. Show that

E[Mθ_τ ] = 1, and E[sech(θ)^τ ] = e^{−θ}.

4. Deduce that P(τ < +∞) = 1.

Exercise M5 Let (Xn)n be a sequence of independent random variables such that

P(Xn = 100^n) = P(Xn = −100^n) = 1/10^n, and P(Xn = 0) = 1− 2/10^n.

Prove that Mn := Σ_{i=1}^n Xi is a martingale with respect to F := (Fn)n with Fn := σ(X1, . . . , Xn). Convergence?


Chapter 5

Discrete Markov Chains: a First Approach

In this section, we follow

• (Resnick 2005) Adventure in stochastic processes, S. Resnick, Birkhäuser (2005) (Chapter 2 "Markov Chains").

5.1 Markov chain: first definitions and property

A Markov chain is a particular process which can be summed up as follows: conditional on its present value, the future of a Markov chain is independent of its past. That is to say: a Markov chain at a fixed time depends on the past only through the last value taken. As we will see below, the random walk introduced in the first chapter is a Markov chain, since the increments are independent. We consider throughout this section a stochastic process X := (Xn)n∈N on a fixed probability space (Ω,A,P) such that each random variable Xn takes values1 in E = N or in a finite set E = {0, . . . , N} with N > 0.

Definition 5.1.1. The process X is a Markov chain if for any n ∈ N∗ and j, i0, . . . , in−1 ∈ E we have

P(Xn = j|X0 = i0, . . . , Xn−1 = in−1) = P(Xn = j|Xn−1 = in−1). (5.1)

Moreover, if P(Xn = j|Xn−1 = i) = P(X1 = j|X0 = i) =: pij for any n ≥ 1 and any i, j ∈ E, we say that the Markov chain is homogeneous. The term pij is called the transition probability from state i to state j. We define the transition matrix associated with the homogeneous Markov chain X, denoted by P , of size |E| × |E|, by P = (pij)i,j∈E.

FROM NOW ON WE CONSIDER ONLY HOMOGENEOUS MARKOV CHAINS.
Moreover, any transition matrix will be of the form

P = ( p00 p01 . . . p0n . . . )
    ( p10 p11 . . . p1n . . . )
    ( . . . . . . . . . . . . )

1Here is the main point of the indexation of a Markov chain. In fact all the theory can be written as soon as X takes values in a countable set of elements. Since any countable set is in bijection with N, we identify this set with N itself: we just have to re-index the elements. For instance, the guideline example, the random walk on Z, is in fact covered by this theory, just by reindexing its values in Z as values in N. This indexation is made to manipulate matrices more naturally, as we will see below.


Example 5 (Microcredit example). We assume that the borrower of a loan can be in two states: she can apply for a loan or she can be the recipient of a loan. We assume that the recipient of a loan automatically gets another loan for the next period, except if she is bankrupt; in this case, she loses her right to obtain a new loan automatically. We assume that the beneficiary of a loan is able to reimburse it with probability β, and so goes bankrupt with probability 1− β. If she is bankrupt, she becomes an applicant for a loan for the next period. As an applicant for a loan, she obtains the loan with probability α and does not get it with probability 1− α. We denote by (Xk)k∈N a sequence of random variables with values in {0, 1} such that if Xk = 0 (resp. Xk = 1), the agent applies for the loan (resp. receives the loan) at time k. We note that X is a homogeneous Markov chain with state space {0, 1}. Transition probabilities are given by

P(Xk+1 = 0|Xk = 0) = 1− α, P(Xk+1 = 1|Xk = 0) = α,

P(Xk+1 = 0|Xk = 1) = 1− β, P(Xk+1 = 1|Xk = 1) = β.

The transition matrix is

P = ( 1− α   α )
    ( 1− β   β )

Note that in the previous example, the coefficients of the matrix satisfy very particular properties, leading us to introduce the notion of stochastic matrix.

Definition 5.1.2 (Stochastic matrix). Let P be a matrix of size |E| × |E|. We say that P = (pij)i,j∈E is a stochastic matrix if

• pij ≥ 0 for any i, j ∈ E,

• Σ_{j∈E} pij = 1, for any i ∈ E.

Proposition 5.1.1. Any transition matrix associated to a Markov chain is a stochastic matrix.

Proof. Let P = (pij)i,j∈E be a transition matrix. pij ≥ 0 is clear. Now

Σ_{j∈E} pij = Σ_{j∈E} P(X1 = j|X0 = i) = Σ_{j∈E} P(X1 = j,X0 = i) / P(X0 = i) = 1.

The very useful property associated with (homogeneous) Markov chains is based on the fact that the conditional probabilities of interest can be computed through matrix manipulations. Let P = (pij)i,j∈E be a transition matrix. We define the matrix P^2 as the classical square of P , i.e. the matrix P^2 = (p^{(2)}_{ij}) where

p^{(2)}_{ij} = Σ_{k∈E} pik pkj .

By induction, we define P^{n+1} = (p^{(n+1)}_{ij}), the matrix P to the power n+ 1, by

p^{(n+1)}_{ij} := Σ_{k∈E} p^{(n)}_{ik} pkj = Σ_{k∈E} pik p^{(n)}_{kj} .


Definition 5.1.3 (n-step transition probability). Let m,n ≥ 0. We define the n-step transition probability starting at time m from state i to j by

P(Xm+n = j|Xm = i).

Of course, if X is a homogeneous Markov chain with transition matrix P then P(Xm+1 = j|Xm = i) = pij for any i, j ∈ E. The following important theorem shows that for a homogeneous Markov chain, P(Xm+n = j|Xm = i) does not depend on m.

Theorem 5.1.4 (Chapman-Kolmogorov). Let X be a homogeneous Markov chain with transition matrix P . Then, for any m,n ≥ 0 and any (i0, . . . , in) ∈ E^{n+1} we have

P(Xm = i0, . . . , Xm+n = in) = P(Xm = i0) pi0,i1 . . . pin−1,in , (5.2)

and

p^{(n)}_{ij} = P(Xn = j|X0 = i) = P(Xm+n = j|Xm = i), i, j ∈ E. (5.3)

Proof of Chapman-Kolmogorov’s Theorem. For the proof of (5.2), we refer to Proposition 2.1.1 in (Resnick 2005). We turn to (5.3). For n = 1 we have P(Xm+1 = j|Xm = i) = pij from the definition of a homogeneous Markov chain. Assume now that for some n ≥ 1 we have P(Xm+n = j|Xm = i) = p^{(n)}_{ij}. Then, for any i, j ∈ E

P(Xm+n+1 = j|Xm = i) = Σ_{k∈E} P(Xm+n+1 = j, Xm+n = k|Xm = i)
= Σ_{k∈E} P(Xm+n+1 = j, Xm+n = k, Xm = i) / P(Xm = i)
= Σ_{k∈E} [P(Xm+n+1 = j, Xm+n = k, Xm = i) / P(Xm+n = k, Xm = i)] · [P(Xm+n = k, Xm = i) / P(Xm = i)]
= Σ_{k∈E} P(Xm+n+1 = j|Xm+n = k, Xm = i) P(Xm+n = k|Xm = i)
= Σ_{k∈E} p^{(n)}_{ik} pkj
= p^{(n+1)}_{ij},

where we have used the Markov property, the homogeneity of X and the induction assumption for the penultimate equality. Since this holds for any m ≥ 0, it is in particular true for m = 0, and we thus get (5.3).

We now turn to the relation between the laws of the process at two different times. We have the following corollary (see Corollary 2.3.2 in (Resnick 2005)).

Corollary 5.1.5. Let X be a (homogeneous) Markov chain. By setting

πn := (P(Xn = i))i∈E , n ≥ 0,

identified as a row vector2 of size |E|, we have

πn+m = πm P^n, and thus πn = π0 P^n, n,m ≥ 0.

2Let u be a row vector. Then u(i) denotes its ith entry.


Proof. Let j ∈ E. We get, by using (5.3),

πn+m(j) = P(Xn+m = j)
        = Σ_{i∈E} P(Xn+m = j|Xm = i) P(Xm = i)
        = Σ_{i∈E} p^{(n)}_{ij} πm(i)
        = (πm P^n)(j).

Example 6 (Proposition 2.3.3 in (Resnick 2005)). We set

P = ( 1− a    a  )
    (   b   1− b )

with a, b ∈ (0, 1). Explain why this matrix can be seen as a transition matrix corresponding to a Markov chain with state space {1, 2}. Deduce from the previous corollary that

P^n = 1/(a+ b) ( [ b  a ; b  a ] + (1− a− b)^n [ a  −a ; −b  b ] ),

where the semicolon separates the rows of each matrix.
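A numerical check of this closed-form expression against the plain matrix power (the values of a, b and n are arbitrary choices made for the illustration):

import numpy as np

a, b, n = 0.3, 0.6, 7
P = np.array([[1 - a, a],
              [b, 1 - b]])

closed_form = (np.array([[b, a], [b, a]])
               + (1 - a - b) ** n * np.array([[a, -a], [-b, b]])) / (a + b)
print(np.allclose(np.linalg.matrix_power(P, n), closed_form))    # True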

5.2 Decomposition of the state space

We want to see the connections between all the possible states. We first introduce the first time at which a state j ∈ E is reached, as follows:

τj := inf{n ≥ 0 : Xn = j}.

More generally, for any subset B of E we set

τB := inf{n ≥ 0 : Xn ∈ B},

which is the first time at which the Markov chain enters B. From now on we use the following notation:

Pi(·) = P(·|X0 = i), i ∈ E.

Definition 5.2.1. We say that j is accessible from i (or equivalently, j is said consequent to i), which we write i −→ j, if

Pi(τj < +∞) > 0.

States j and i communicate if i −→ j and j −→ i, that we write i←→ j.

Note that we have obviously i←→ i.

Proposition 5.2.1. We have i −→ j if and only if

∃n ≥ 0, (Pn)i,j > 0.


Proof. Assume first that there exists some n ≥ 0 such that (Pn)i,j > 0. Hence, by noting that

{Xn = j} ⊂ {τj ≤ n} ⊂ {τj < +∞},

we get

0 < (Pn)ij ≤ Pi(τj < +∞).

Conversely, assume that for all n ≥ 0 we have (Pn)ij = 0. Then

Pi(τj < +∞) = lim_n Pi(τj ≤ n)
            = lim_n Pi(∪_{k=0}^n {Xk = j})
            ≤ lim sup_n Σ_{k=0}^n Pi(Xk = j)
            = lim sup_n Σ_{k=0}^n (P^k)_{ij}
            = 0.

Therefore i −→ j =⇒ ∃n ≥ 0, (Pn)i,j > 0.

Proposition 5.2.2. The relation ←→ previously defined is an equivalence relation, meaning that

• i←→ i (reflexive relation)

• i←→ j if and only if j ←→ i (symmetric relation)

• if i←→ j and j ←→ k then i←→ k (transitive relation).

Proof. The reflexivity and the symmetry are clear by definition of the relation ←→. Assume now that i ←→ j and j ←→ k. According to Proposition 5.2.1, there exist n,m > 0 such that (Pn)ij > 0 and (Pm)jk > 0. From the Chapman-Kolmogorov result (5.3), we deduce that

(P^{m+n})_{ik} = Σ_{r∈E} (P^n)_{i,r} (P^m)_{r,k} ≥ (P^n)_{i,j} (P^m)_{j,k} > 0.

Hence, i −→ k and we can show similarly that k −→ i.

In other words, we can decompose the state space E into disjoint exhaustive equivalence classes modulo the relation ←→. More precisely, take for instance the state indexed by 1: we define the class C1 as the set of all indexes j such that j ←→ 1. For any j ∈ C1 the class Cj coincides with C1, so we can consider only the class C1. We apply the same procedure to the indexes outside the class C1 and denote the next class by C2, and so on. We can thus build a family of equivalence classes (Ci)i∈I with |I| ≤ |E| such that

Ci ∩ Cj = ∅, i ≠ j, and ∪_{i∈I} Ci = E.

Roughly speaking, we make "packs" of states such that one state is in the same pack as another if and only if they communicate.
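For a finite state space, these classes can be computed mechanically from the transition matrix. The sketch below (added as an illustration) uses the criterion of Proposition 5.2.1, noting that for N states it is enough to test (P^n)_{ij} > 0 for n = 0, ..., N−1; it is checked on the matrix of Example 8 below.

import numpy as np

def communication_classes(P):
    N = P.shape[0]
    reach = np.eye(N, dtype=bool)       # n = 0: every state reaches itself
    Q = np.eye(N)
    for _ in range(N - 1):
        Q = Q @ P
        reach |= Q > 0                  # i −→ j iff (P^n)_{ij} > 0 for some n
    comm = reach & reach.T              # i ←→ j
    classes, seen = [], set()
    for i in range(N):
        if i not in seen:
            cls = {j for j in range(N) if comm[i, j]}
            classes.append(sorted(cls))
            seen |= cls
    return classes

P = np.array([[.5, .5, 0, 0], [.5, .5, 0, 0], [0, 0, .5, .5], [0, 0, .5, .5]])
print(communication_classes(P))    # [[0, 1], [2, 3]], the two classes of Example 8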


Example 7 (Deterministically monotone Markov chain). We consider the Markov chain with transition matrix of size N given by Pi,j = 1 if j = i+ 1, PN,N = 1 and 0 otherwise. In this case i −→ i+ 1 for any 1 ≤ i ≤ N − 1 and by induction i −→ j for any j > i.

Hence, there are N equivalence classes defined by Ci = {i} for any 1 ≤ i ≤ N (a state communicates only with itself).

Example 8. Consider the Markov chain on the state space {0, 1, 2, 3} with transition matrix

( 1/2 1/2  0   0  )
( 1/2 1/2  0   0  )
(  0   0  1/2 1/2 )
(  0   0  1/2 1/2 )

We have two equivalence classes:

C1 = {0, 1} and C2 = {2, 3}.

Definition 5.2.2. A Markov chain is irreducible if the state space consists of only one equivalence class, in other words if all the states communicate with each other.

Neither Example 7 nor Example 8 is irreducible.

Definition 5.2.3. We say that a set of states C ⊂ E is closed if for any i ∈ C we have

Pi(τCc =∞) = 1. (5.4)

In other words, if the chain starts in the set C it never leaves C. If C = {j} is closed, we say that j is an absorbing state.

Proposition 5.2.3. A set C is closed if and only if for any i ∈ C and j ∈ Cc we have pij = 0. Consequently, j is absorbing if and only if pjj = 1.

Proof. Assume first that C is closed. Then for any i ∈ C and any j ∈ Cc we have

pij = Pi(X1 = j) ≤ Pi(τCc = 1) = 0,

since Pi(τCc =∞) = 1. Hence, pij = 0.

Conversely, let i ∈ C ⊂ E and assume that for any j ∈ Cc we have pij = 0. Then,

Pi(τCc = 1) ≤ Σ_{j∈Cc} pij = 0.

Note now that

Pi(τCc ≤ 2) = Pi(τCc = 1) + Pi(τCc = 2)
            = 0 + Pi(X1 ∈ C, X2 ∈ Cc)
            = Σ_{j∈Cc} Σ_{k∈C} pik pkj
            = 0.

By induction, we prove that for any n ≥ 1, Pi(τCc ≤ n) = 0. Taking the limit when n goes to +∞, we get that Pi(τCc <∞) = 0, hence C is closed.


Note that for a closed set, it is possible to enter it but impossible to escape once inside. For instance, in Example 7, for any k ≤ N the set C := {k, . . . , N} is closed (since the Markov chain here goes forward with probability 1), although k− 1 −→ k and k− 1 ∈ Cc. Concerning Example 8, the equivalence classes C1 and C2 are closed and it is impossible to enter C1 from C2 and conversely.

5.3 Transience, recurrence and periodicity

To echo the first chapter of these lecture notes, we now study the probability of going from one state to another in a finite number of steps. This leads us to introduce the notions of recurrent (or persistent) and transient states and the frequency of reaching a state.

5.3.1 Transience and recurrence

Definition 5.3.1. Let X be a Markov chain on a state space E. A state i is called recurrent (sometimes called persistent) if the chain starting from i returns to i in a finite number of steps almost surely. Otherwise, the state is called transient.

More formally, we introduce the notion of hitting time as follows: let X be a Markov chain starting at state i, that is to say P(X0 = i) = 1. We set τi(0) = 0 and

τi(1) = inf{m ≥ 1, Xm = i},

the first time at which the Markov chain returns to i. τi(1) is called the hitting time related to i. If τi(1) < ∞, we set

τi(2) := inf{m > τi(1), Xm = i},

the second time at which X returns to i, and so on: by induction, assuming that τi(1) < · · · < τi(n) < +∞, we define

τi(n+ 1) := inf{m > τi(n), Xm = i}.

Hence,

if Pi(τi(1) < ∞) = 1, then i is recurrent,
if Pi(τi(1) = ∞) > 0, then i is transient.

We also introduce a stronger notion related to recurrence, meaning that not only is the first return time finite, but its expectation is finite as well.

Definition 5.3.2. A state i is called positive recurrent if it satisfies3

Ei[τi(1)] < +∞.

We aim at providing a tractable criterion to decide whether a state is recurrent or transient. It is therefore quite natural to introduce the distribution of the hitting time (when the Markov chain starts at some specific state j), given by

f^{(n)}_{jk} := Pj [τk(1) = n], n ≥ 0, k, j ∈ E.

3Here the notation Ei[·] means E[·|X0 = i].


Note that since τk(1) ≥ 1 we necessarily have f^{(0)}_{jk} = 0. Thus the probability of hitting k starting from j is

fjk := Σ_{n=0}^∞ f^{(n)}_{jk} = Pj(τk(1) < ∞).

Obviously,

(R) i is recurrent if and only if fii = 1,

(R)+ i is positive recurrent if and only if mi := Ei[τi(1)] = Σ_{n=0}^∞ n f^{(n)}_{ii} < ∞.

As a consequence of (R) we have

(T) i is transient if and only if fii < 1.

We now define the generating function associated with the hitting time by setting, for any 0 < s < 1 and any i, j ∈ E,

Fij(s) = Σ_{n=0}^∞ f^{(n)}_{ij} s^n,

together with

Pij(s) := Σ_{n=0}^∞ p^{(n)}_{ij} s^n.

Lemma 5.3.3. Let i ∈ E.

1. For any i ∈ E

p^{(n)}_{ii} = Σ_{k=0}^n f^{(k)}_{ii} p^{(n−k)}_{ii}, n ≥ 1, (5.5)

and for any 0 < s < 1

Pii(s) = 1/(1− Fii(s)). (5.6)

2. For any j ≠ i ∈ E we have

p^{(n)}_{ij} = Σ_{k=0}^n f^{(k)}_{ij} p^{(n−k)}_{jj}, (5.7)

and for any 0 < s < 1 we have

Pij(s) = Fij(s) Pjj(s). (5.8)

Proof. We refer to the proof of Proposition 2.6.1 in (Resnick 2005).

Theorem 5.3.4. We have the following results:

i is recurrent ⇐⇒ Σ_{n=0}^∞ p^{(n)}_{ii} = ∞,

i is transient ⇐⇒ Σ_{n=0}^∞ p^{(n)}_{ii} < +∞.

And the mean time to return to the state i is mi = F′ii(1).


Proof. Recall from (R) that i is recurrent if and only if fii = 1, and so if and only if Fii(1) = 1. According to (5.6), if Fii(1) = 1 we have

Σ_{n=0}^∞ p^{(n)}_{ii} = Pii(1) = lim_{s→1, s<1} Pii(s) = lim_{s→1, s<1} 1/(1− Fii(s)) = +∞,

so that Fii(1) = 1 if and only if Σ_{n=0}^∞ p^{(n)}_{ii} = ∞. The proof for the transience follows the same lines with (T). Finally, mi = F′ii(1) is obviously satisfied in view of the previous definitions.

The previous result has a more explicit interpretation. We define the number of visits of the state j ∈ E by the Markov chain X after time 0 by

Nj := Σ_{n≥1} 1_{Xn=j}.

Note that for any i ≠ j

Ei[Nj ] = Σ_{n≥1} Pi(Xn = j) = Σ_{n≥1} p^{(n)}_{ij}.

In other words: starting from i, the state i is recurrent if and only if the expected number of visits by the chain to state i is infinite. We have in particular the very nice property:

Proposition 5.3.5 (Proposition 2.6.3 in (Resnick 2005)). For any i, j ∈ E and non-negative integer k

Pi(Nj = k) = 1− fij if k = 0, and Pi(Nj = k) = fij f^{k−1}_{jj} (1− fjj) if k ≥ 1. (5.9)

(a) If j is transient then for all states i ∈ E we have

Pi(Nj < ∞) = 1, Ei[Nj ] = fij/(1− fjj) = Σ_{n≥1} p^{(n)}_{ij} < +∞, (5.10)

and Nj is geometrically distributed with

Pj(Nj = k) = (1− fjj)(fjj)^k, k ≥ 0.

(b) If j is recurrent then

Pi(Nj = ∞) = fij .

5.3.2 Periodicity

We now turn to the frequency of returns to a state. We define the period associated with any state i by

D(i) := gcd{n ≥ 1, p^{(n)}_{ii} > 0},

where gcd means greatest common divisor. If D(i) > 1 we say that the state i is periodic; otherwise the state is called aperiodic. By convention, when there is no n ≥ 1 such that p^{(n)}_{ii} > 0, we set D(i) = 1.


Remark 5.3.6. Guideline example: the random walk4. We recall that the random walk is the process X such that

X0 = 0, Xn = Σ_{m=1}^n Bm, n ∈ N∗,

where the Bm are iid random variables taking value 1 with probability p ∈ (0, 1) and −1 with probability 1− p, on the state space Z. The period of 0 is thus 2, since p^{(n)}_{00} = 0 unless n is even.

5.3.3 Solidarity properties

We now show that if one state has some properties and communicates with another state, the latter inherits these properties. This is what we call solidarity.

Proposition 5.3.1. Assume that i←→ j then

1a. i is recurrent if and only if j is recurrent,

1b. i is transient if and only if j is transient,

2. i has period D(i) if and only if j has period D(i).

Proof. In this proof we take two states i and j such that i ←→ j. Hence, there exist n,m > 0 such that p^{(n)}_{ij} > 0 and p^{(m)}_{ji} > 0.

We begin by showing 1a. By noting that P^{m+n+k} = P^m P^k P^n, we have

p^{(n+m+k)}_{jj} = Σ_{α,β∈E} p^{(m)}_{jα} p^{(k)}_{αβ} p^{(n)}_{βj} ≥ p^{(m)}_{ji} p^{(k)}_{ii} p^{(n)}_{ij}.

Assume that i is recurrent; then we deduce from Theorem 5.3.4 that Σ_{k≥0} p^{(k)}_{ii} = ∞. Therefore

Σ_{q≥1} p^{(q)}_{jj} ≥ Σ_{k≥1} p^{(m+n+k)}_{jj} ≥ p^{(m)}_{ji} p^{(n)}_{ij} Σ_{k≥1} p^{(k)}_{ii} = ∞.

Thus, j is recurrent. Symmetrically, if j is recurrent then i is also recurrent. As a consequence of the dichotomy recurrence/transience for a state, we deduce that 1b. holds.

We now turn to the proof of 2. Suppose that i has period D(i). Then, recalling that p^{(0)}_{ii} = 1, we have

p^{(m+n)}_{jj} ≥ p^{(m)}_{ji} p^{(n)}_{ij} > 0.

In other words, there exists a positive integer κ1 such that n + m = κ1 D(j). Note that for any positive integer k such that p^{(k)}_{ii} > 0 we have

p^{(m+n+k)}_{jj} ≥ p^{(m)}_{ji} p^{(n)}_{ij} p^{(k)}_{ii} > 0.

Hence, there exists a positive integer κ2 such that n+m+k = κ2 D(j) for such k. Consequently, for any k such that p^{(k)}_{ii} > 0 we have

k = (κ2 − κ1) D(j).

Therefore, D(j) is a divisor of D(i) and D(j) ≤ D(i). Symmetrically, we also have D(i) ≤ D(j), thus D(i) = D(j).

4Again, even if it is valued in Z instead of N∗, we have the same results by indexing the random walk with any bijection from Z into N∗.


5.3.4 Special case: Markov chains with a finite number of states

When E is finite, not all states can be transient.

Theorem 5.3.7. If E is finite (that is, E can be identified with {0, . . . ,m} for some m ≥ 1), then there exists at least one recurrent state. In particular, if the corresponding Markov chain is irreducible, then all the states are recurrent.

Proof. Let E = {0, . . . ,m} with m ∈ N∗ and assume that all states are transient. For any j with 0 ≤ j ≤ m we know from (5.10) that Σ_{n≥1} p^{(n)}_{ij} < +∞ for any 0 ≤ i ≤ m. Therefore,

lim_{n→+∞} p^{(n)}_{ij} = 0.

Moreover,

Σ_{j=0}^m p^{(n)}_{ij} = 1.

Since this sum has finitely many terms, we deduce that Σ_{j=0}^m p^{(n)}_{ij} −→ 0 when n goes to +∞, which leads to a contradiction. Hence, not all states can be transient. If moreover the Markov chain is irreducible, all the states communicate and we deduce from the solidarity properties that all states are recurrent.

5.4 Invariant measures and stationary distribution

Definition 5.4.1 (Stationary process). Consider a stochastic process (Yn)n≥0. We say that this process is stationary if for any non-negative integer m and positive integer k the process (Yi)0≤i≤m has the same law as (Yi)k≤i≤m+k.

Definition 5.4.2 (Stationary distribution of a Markov chain). Consider a Markov chain on the state space E with transition matrix P . Let π := (πi)i∈E be a probability distribution on E. It is called a stationary distribution for the Markov chain with matrix P if5

π = πP,

that is πi = Σ_{k∈E} πk Pki for any i ∈ E.

Remark 5.4.3. By iteration, note that if π is a stationary distribution then

π = πP = πP^n, n ≥ 0.

The notion of stationary distribution is at the heart of the theory: it shows that if the chain starts with such a distribution, then it keeps it at any later time.

We now define the distribution of a Markov chain when the initial distribution of X0 is given by the probability π on E. In other words, for any i ∈ E we have P(X0 = i) = πi. We set:

Pπ(·) = Σ_{i∈E} P(·|X0 = i) πi.

5Again we identify an element of Rd as a row vector and its ith coordinate by πi.


Proposition 5.4.1. Under the probability Pπ, the Markov chain (Xn)n≥0 is a stationary process, that is

Pπ(Xn = i0, . . . , Xn+k = ik) = πi0 pi0i1 . . . pik−1ik = Pπ(X0 = i0, . . . , Xk = ik),

for any n, k ≥ 0 and i0, . . . , ik ∈ E.

Remark 5.4.4. As a corollary of the previous result with k = 0, we have

Pπ(Xn = i) = πi, i ∈ E.

Proof of Proposition 5.4.1. Note that Pπ(Xn = i0) = Σ_{i∈E} πi p^{(n)}_{ii0}. Hence, by the Chapman-Kolmogorov Theorem (5.2) together with Remark 5.4.3 we get

Pπ(Xn = i0, . . . , Xn+k = ik) = Σ_{i∈E} πi p^{(n)}_{ii0} pi0i1 . . . pik−1ik
                              = πi0 pi0i1 . . . pik−1ik
                              = Pπ(X0 = i0, . . . , Xk = ik).

In general, it is not certain that a stationary distribution exists for a Markov chain. In particular, requiring the total mass to be 1 can be challenging. We introduce a weaker notion of invariance, of which the stationary distribution is a particular case.

Definition 5.4.5 (Invariant measure). Let ν := (νi)i∈E be a sequence of non-negative constants6. We say that ν is an invariant measure if

ν = νP.

Remark 5.4.6. In particular, if ν is an invariant measure such that |ν| := Σ_{i∈E} νi < +∞, then

π := ν/|ν|

is a stationary distribution for the Markov chain.

We have the following very nice result concerning the existence of a stationary distribution (or of an invariant measure in the weaker case).

Theorem 5.4.7 (recurrence gives an invariant measure; positive recurrence gives a stationary distribution). Let i ∈ E be a recurrent state and define for any j ∈ E

νj := Σ_{n≥0} Pi(Xn = j, τi(1) > n) = Ei[Σ_{n=0}^{τi(1)−1} 1_{Xn=j}].

Then ν is an invariant measure.
Assume moreover that the state i is positive recurrent, that is Ei[τi(1)] < +∞. Then, by setting for any j ∈ E

πj := νj / Ei[τi(1)],

we get that π is a stationary distribution.

6In other words, ν is a measure on E.


Proof. We have to verify that ν = νP . For the proof of this part we refer to the proof of Proposition 2.12.2 in (Resnick 2005). We only prove here that if i is positive recurrent, the measure π defined in the theorem is a probability measure. Note that

|ν| := Σ_{j∈E} νj = Σ_{j∈E} Σ_{n≥0} Pi(Xn = j, τi(1) > n)
     = Σ_{n≥0} Pi(τi(1) > n)
     = Ei[τi(1)].

Hence, by scaling νj with |ν| we get a probability measure.

Theorem 5.4.8 (uniqueness up to a multiplicative constant). Assume that the Markov chain is recurrent and irreducible.

1. Assume that ν and ν̃ are two invariant measures. Then, there exists a constant c > 0 such that ν̃ = cν.

2. If the Markov chain is irreducible and positive recurrent, then there exists a unique stationary distribution π given by

πj = 1/Ej [τj(1)], j ∈ E.

Proof. First assume that 1. holds. We prove that 2. is satisfied. Since the Markov chain is positive recurrent, we know from Theorem 5.4.7 that there exists a stationary distribution. From 1. we deduce that if π and π̃ are two stationary distributions (and so two invariant measures), there exists a constant c > 0 such that π̃ = cπ. But since Σ_{j∈E} πj = Σ_{j∈E} π̃j = 1, we deduce that c = 1 and so the stationary distribution is unique. Since for any i ∈ E we have Pi(Xn = i, τi(1) > n) = 1 if n = 0 and 0 otherwise (by definition of τi(1)), we deduce that

πi := νi / Ei[τi(1)] = 1 / Ei[τi(1)].

Concerning the proof of 1., we refer to the proof of Proposition 2.12.3 in (Resnick 2005).
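The following sketch illustrates part 2. of the theorem on a small chain (the matrix below is an arbitrary choice): we solve π = πP numerically and compare π_j with 1 / E_j[τ_j(1)], the latter being estimated by simulating return times to state j.

import numpy as np

rng = np.random.default_rng(8)
P = np.array([[0.7, 0.3],
              [0.6, 0.4]])

# stationary distribution: left eigenvector of P for the eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

def mean_return_time(j, n_runs=20_000):
    total = 0
    for _ in range(n_runs):
        state, t = j, 0
        while True:
            state = rng.choice(2, p=P[state])
            t += 1
            if state == j:
                break
        total += t
    return total / n_runs

print(pi, [1 / mean_return_time(j) for j in range(2)])    # the two should match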

5.5 Strong law of large numbers for Markov chains

Theorem 5.5.1 (Law of large numbers for Markov chains). Assume that the Markov chain is irreducible and positive recurrent and denote by π its unique stationary distribution. Then

lim_{N→+∞} (1/N) Σ_{n=0}^N f(Xn) = π(f) := Σ_{j∈E} f(j) πj , P− a.s.,

where f is chosen such that the last term is well defined.

Remark 5.5.2. Note that this result can be rewritten

lim_{N→+∞} (1/N) Σ_{n=0}^N f(Xn) = Eπ[f(X0)],

where X0 has the distribution π. That is why we call it the law of large numbers for Markov chains.


Corollary 5.5.3. Under the assumptions of Theorem 5.5.1, assume moreover that f is bounded. Then

lim_{N→+∞} (1/N) Σ_{n=0}^N Ei[f(Xn)] = π(f).

Proof. Since f is bounded, there exists some constant M > 0 such that

|(1/N) Σ_{n=1}^N f(Xn)| ≤ M, P− a.s.

By using the dominated convergence theorem, we can take the expectation in Theorem 5.5.1to get the result.

Exercise MC0 Deduce from Corollary 5.5.3 that if the Markov chain is irreducible and positive recurrent, then for any i, j ∈ E

lim_{N→+∞} (1/N) ∑_{n=1}^{N} p^{(n)}_{ij} = π_j.

5.6 Exercises

Exercise MC1 Let p ∈ [0, 1/2]. Classify the chain (number of classes, recurrence, periodicity) of the following transition matrix

M = ( 1−2p   2p     0
       p    1−2p    p
       0     2p    1−2p )

Compute p^{(n)}_{ij} and the mean recurrence times of the states. Find the invariant measure.

Exercise MC2 Ehrenfest model. This model was proposed by Tatiana and Paul Ehrenfest in 1907 to explain the second law of thermodynamics. It has since been extended to economics and sociology to study the spread of technologies in a society.

We consider N particles shared between two containers A and B. The system evolves as follows: at time n, a randomly chosen particle moves from one container to the other. That is, if there were i particles in A at time n, there will be i − 1 particles at time n + 1 if the moving particle was chosen in A (which happens with probability i/N), or i + 1 particles at time n + 1 if the moving particle was chosen in B (which happens with probability (N − i)/N).
We set X_n the number of particles in A at time n, starting with X_0 = x_0 ∈ {0, . . . , N}.

1. Describe the possible states of X_n and show that this is a Markov chain with transition matrix P to be determined.

2. Let N = 2. Draw a graph representing the evolution of the system. What can you say about this Markov chain (irreducibility? periodicity? recurrence? invariant measure?).


3. Same question with N = 3.

4. General case: let N ∈ N*. Invariant probability? Interpretation?

5. Compute lim_{n→+∞} (1/n) ∑_{i=1}^{n} X_i.

Exercise MC3 We consider the following matrix

P = ( q_0  p_0   0    0   · · ·
      q_1   0   p_1   0   · · ·
      q_2   0    0   p_2  · · ·
       ⋮    ⋮    ⋮          ⋱   )

with 0 < p_i = 1 − q_i < 1, i ≥ 0, defining a Markov chain with states in N.

1. Prove that this chain is irreducible.

2. Compute f^{(n)}_{00} for any n ≥ 0.

3. We set u_n := ∏_{i=0}^{n} p_i for n ≥ 0. Prove that f^{(n)}_{00} = u_{n−2} − u_{n−1} for n ≥ 2 and ∑_{n=1}^{N+1} f^{(n)}_{00} = 1 − u_N.

4. Prove that ∏_{i=0}^{∞} p_i > 0 if and only if ∑_i (1 − p_i) < ∞.

5. Deduce that any i ≥ 0 is recurrent if and only if ∑_i (1 − p_i) = ∞.

Exercise MC4 Random walk on Z. Let (B_i)_i be a sequence of iid random variables. We recall that X_n = ∑_{i=1}^{n} B_i, with P(B_i = 1) = p = 1 − P(B_i = −1), is a random walk on Z for n ≥ 1, with X_0 = 0.

1. Assume that p > 1 − p. Deduce from the law of large numbers that P(lim_{n→+∞} X_n = +∞) = 1. What can you say about the state 0? And if p < 1 − p?

2. Assume now that p = 1/2. Using Theorem 1.1.1, explain why 0 is recurrent.

3. We propose to prove the recurrence of 0 without using the result of the first chapter. For any n ≥ 0, compute p^{(2n+1)}_{00} and prove that

p^{(2n)}_{00} = \binom{2n}{n} (1/2)^n (1/2)^n.

4. Prove that for n large enough p^{(2n)}_{00} ∼ (πn)^{−1/2}.
Hint: we recall Stirling's formula: n! ∼ √(2π) e^{−n} n^{n+1/2} for n large enough.

5. Deduce that 0 is recurrent.

Random walk on Z^d. We are now in the framework of Definition 1.2.1.

6. Prove that P(X_{2n} = 0) ∼ (πn)^{−d/2} as n → +∞.

7. Deduce that the symmetric random walk is recurrent for d ≤ 2 and transient for d ≥ 3.


Exercise MC5 We now consider the branching process as studied in Section 2.5. We assume that p_1 < 1.

1. Explain why (Z_n) is a homogeneous Markov chain.

2. What can you say about the state 0?

3. Assume that p_0 = 0. Compute f_{kk} for any k ≥ 1 and deduce that any k ≥ 1 is transient.

4. Assume now that p_0 = 1. What can you say about any state k ≥ 1?

5. We finally investigate the case 0 < p_0 < 1. Prove that any k ≥ 1 is again transient.

6. Explain why
P(Z_n → +∞) = 1 − P(extinction).

7. Explain why we say that the simple branching process exhibits an instability.


Chapter 6

The Brownian Motion: a Good Place to End

In this section, we mainly follow

• (Resnick 2005) Adventure in stochastic processes, S. Resnick, Birkhäuser (2005) (Chapter 6, "Brownian motion").

This chapter concludes our course by introducing one of the most famous stochastic processes with continuous paths, named the Brownian motion.

The Brownian motion, also called the Wiener process, was described for the first time in 1827 by the botanist Robert Brown, who observed the movements of pollen particles of the Clarkia pulchella (a North American flower). The particularity of this process is its irregularity: one can show that, almost surely, its paths are monotone on no time interval, however small the interval. We can nevertheless derive very interesting properties of this process, as we will see below. We first extend the definition of a stochastic process to continuous time.

Definition 6.0.1. Let (Ω, F, P) be a probability space. A stochastic process X in continuous time (with real values) is a map from R+ × Ω into R. For any t ∈ R+, X_t := X(t, ·) is a random variable on the probability space, and for any given ω ∈ Ω the map t ↦ X(t, ω) is a trajectory of X.

6.1 The Brownian motion

Definition 6.1.1 (Brownian motion). A stochastic process B := (B_t)_{t≥0} is a standard Brownian motion if

1. B_0 = 0,

2. with probability 1, the paths of B are continuous, that is

P({ω : t ↦ B_t(ω) is continuous}) = 1,

3. the time increments of B are independent, that is for any 0 ≤ t_i < t_{i+1} ≤ t_j < t_{j+1} we have

B_{t_{j+1}} − B_{t_j} is independent of B_{t_{i+1}} − B_{t_i},


4. for any 0 ≤ s ≤ t, the random variable B_t − B_s has a normal distribution with mean 0 and variance t − s.

Of course, as a consequence of 1. and 4., B_t has a normal distribution with mean 0 and variance t.

Warning: Property 3. says that the increments of B are independent, and only its increments. In particular, B_t is NOT independent of B_s.

The existence of such a process can be established by several different proofs (for instance, one can construct the Brownian motion starting from iid standard normal random variables; see Section 6.3 in (Resnick 2005)). In the next section, we emphasize the link between a random walk on Z and a Brownian motion.
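Although the construction of the Brownian motion is not detailed here, its defining properties are easy to visualise on a discrete time grid: summing independent N(0, Δt) increments gives an approximation of a Brownian path. The sketch below (Python/NumPy; the seed, horizon and grid size are arbitrary choices for the illustration) also checks empirically that Var(B_t − B_s) is close to t − s, in line with Property 4.

    import numpy as np

    rng = np.random.default_rng(2)
    T, n = 1.0, 1000
    dt = T / n
    t = np.linspace(0.0, T, n + 1)

    # Grid construction: B_0 = 0 and independent N(0, dt) increments, summed up.
    B = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n))))
    print(B[:5])                                   # one approximate Brownian path on the grid t

    # Empirical check of Property 4 over many paths: Var(B_t - B_s) is close to t - s.
    paths = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(5000, n)), axis=1)
    s_idx, t_idx = 300, 800                        # grid times s = 0.3 and t = 0.8
    print(np.var(paths[:, t_idx - 1] - paths[:, s_idx - 1]), t[t_idx] - t[s_idx])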

6.2 The Brownian motion as a rescaled random walk

We recall our guideline example, the random walk on Z:

X_0 = 0, X_n = ∑_{m=1}^{n} Y_m, n ∈ N*,

where (Y_m)_{m∈N*} is a sequence of iid random variables such that E[Y_n] = 0 and Var(Y_n) = 1. We introduce the following continuous-time stochastic process

B_n(t) = X_{⌊nt⌋} / √n, t ≥ 0 and n ∈ N*,

where ⌊x⌋ denotes the classical floor function (or integer part) of a real x, that is the greatest integer less than or equal to x. First note that

X_{⌊nt⌋}/√n = (X_{⌊nt⌋}/√⌊nt⌋) · (√⌊nt⌋/√n).

According to the central limit theorem, the sequence (X_{⌊nt⌋}/√⌊nt⌋)_n converges in law to a standard normal random variable N(0, 1). In addition to that, we have

nt − 1 < ⌊nt⌋ ≤ nt,

hence, for n large enough,

√(nt − 1)/√n < √⌊nt⌋/√n ≤ √(nt)/√n.

We deduce that √⌊nt⌋/√n converges to √t. Thus

B_n(t) = X_{⌊nt⌋}/√n converges in law to a random variable B_t,

with B_t ∼ N(0, t), t ≥ 0.

We now compute the increments of B_n(t). We have


B_n(t) − B_n(s) = ( ∑_{j=⌊ns⌋+1}^{⌊nt⌋} Y_j )/√n  =(in law)  X_{⌊nt⌋−⌊ns⌋}/√n  −→(in law, n→+∞)  √(t − s) N(0, 1) ∼ B_t − B_s.

Now, for any 0 ≤ t_1 < · · · < t_p, the variables

B_n(t_1), B_n(t_2) − B_n(t_1), . . . , B_n(t_p) − B_n(t_{p−1})

are independent, since these increments are built from disjoint blocks of the independent Y_j. We deduce that for any x_j ∈ R, 1 ≤ j ≤ p,

P(B_n(t_1) ≤ x_1, . . . , B_n(t_p) − B_n(t_{p−1}) ≤ x_p) = P(B_n(t_1) ≤ x_1) × · · · × P(B_n(t_p) − B_n(t_{p−1}) ≤ x_p)
−→ P(B_{t_1} ≤ x_1) × · · · × P(B_{t_p} − B_{t_{p−1}} ≤ x_p).

Therefore

(B_n(t_1), B_n(t_2) − B_n(t_1), . . . , B_n(t_p) − B_n(t_{p−1})) −→ (B_{t_1}, B_{t_2} − B_{t_1}, . . . , B_{t_p} − B_{t_{p−1}}),

and the limit increments B_{t_1}, B_{t_2} − B_{t_1}, . . . , B_{t_p} − B_{t_{p−1}} are independent.

Even if B satisfies Properties 1., 3. and 4. of the Brownian motion in Definition 6.1.1, the process (B_n(t), t ≥ 0) is not continuous. In order to ensure the continuity of paths, we define the continuous version of B_n(t) by

B^{(c)}_n(t) := B_n(t) + (nt − ⌊nt⌋) Y_{⌊nt⌋+1}/√n.

This process is continuous (see Figure 6.1 in (Resnick 2005)) and converges in law to a process (B_t)_{t≥0} having the same law as B. Hence this process, built from the random walk as a limit in distribution, satisfies the properties of a Brownian motion.
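A short simulation makes the convergence above tangible. The sketch below assumes the symmetric step distribution P(Y_m = ±1) = 1/2 (one particular choice with mean 0 and variance 1); the values of n, the time grid and the number of replications are arbitrary. It builds the linearly interpolated process B_n^{(c)} and checks that, for large n, its value at time t = 1 has approximately mean 0 and variance 1, as for B_1.

    import numpy as np

    rng = np.random.default_rng(3)

    def rescaled_walk(n, t_grid):
        """B_n^{(c)}(t) = X_{[nt]}/sqrt(n) + (nt - [nt]) * Y_{[nt]+1}/sqrt(n), evaluated on t_grid."""
        steps = rng.choice([-1, 1], size=int(n * t_grid[-1]) + 1)   # iid Y_m, mean 0, variance 1
        X = np.concatenate(([0], np.cumsum(steps)))                  # X_0 = 0, X_k = Y_1 + ... + Y_k
        k = np.floor(n * t_grid).astype(int)
        return (X[k] + (n * t_grid - k) * steps[k]) / np.sqrt(n)

    t_grid = np.linspace(0.0, 1.0, 501)
    endpoints = np.array([rescaled_walk(10_000, t_grid)[-1] for _ in range(2000)])
    print(endpoints.mean(), endpoints.var())      # should be close to 0 and to t = 1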

6.3 Some properties

We now provide some nice properties satisfied by the Brownian motion, strongly reminiscent of the properties of the random walk.

Proposition 6.3.1. Let B be a Brownian motion on a probability space (Ω, F, P).

1. (Shift property) For any s ≥ 0, the process B^{(s)} defined by

B^{(s)}_t := B_{t+s} − B_s, t ≥ 0,

is a Brownian motion.

2. (Time scaling property) For any c > 0 the process

√c B_{t/c}, t ≥ 0,

is a Brownian motion.

3. (Symmetry) The process −B is a Brownian motion.

4. (Gaussian process) B is a Brownian motion if and only if it has continuous paths and it is a zero-mean Gaussian process with B_0 = 0 and

Cov(B_t, B_s) = inf(s, t), s, t ≥ 0.

5. (Law of large numbers)

lim_{t→+∞} B_t/t = 0, almost surely.

6. (Time reversal) The process M defined for any t > 0 by M_t = t B_{1/t} and M_0 = 0 is a Brownian motion.

Proof. Exercise.

This is the starting point of stochastic calculus theory, which is developed in the Master program.
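Properties 2. and 4. can be checked empirically on simulated grid approximations of B. The sketch below uses arbitrary grid parameters and a fixed seed, and only illustrates (it does not prove) the covariance and scaling identities.

    import numpy as np

    rng = np.random.default_rng(4)
    n, dt, n_paths = 800, 0.001, 10_000
    B = np.concatenate([np.zeros((n_paths, 1)),
                        np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n)), axis=1)], axis=1)

    s_idx, t_idx = 250, 700                  # grid times s = 0.25 and t = 0.70
    print(np.mean(B[:, s_idx] * B[:, t_idx]), min(s_idx, t_idx) * dt)   # Cov(B_s, B_t) ~ inf(s, t)

    c = 4.0                                   # Property 2: sqrt(c) * B_{t/c} has variance t
    print(np.var(np.sqrt(c) * B[:, int(t_idx / c)]), t_idx * dt)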

6.3.1 Exercises

Exercise MB1

1. Show that a process (X_t)_{t∈R+} with continuous paths is a Brownian motion if and only if

i') X_0 = 0,
ii') (X_t)_{t∈R+} is a centered Gaussian process,
iii') Cov(X_s, X_t) = inf(s, t).

2. Show that if (W_t) is a Brownian motion, then X_t = t W_{1/t} for t > 0, with X_0 = 0, is also a Brownian motion.

Exercise MB2 Let T > 0 and N ∈ N*.

1. Compute E[ ∑_{k=1}^{N} (W_{kT/N} − W_{(k−1)T/N})² ].

2. Deduce that E[ ( ∑_{k=1}^{N} (W_{kT/N} − W_{(k−1)T/N})² − T )² ] = 2T²/N.

Exercise MB3

1. Let t > 0 and N ∈ N*. What can you say about the random variable I_N = (t/N) ∑_{k=1}^{N} W_{kt/N}? And about its behavior when N → +∞?

2. Deduce that

X_t = ∫_0^t W_s ds

is a centred Gaussian random variable and compute its characteristic function.


Chapter 7

Solutions to exercises

7.1 Random walk

We refer to the first chapter in the book of Stroock.

7.2 Stochastic processes

Exercise ST1a We have to prove that X_τ is A-measurable. Let A ∈ E. Then

X_τ^{−1}(A) = ⋃_{k≥0} [ {τ = k} ∩ (X_k)^{−1}(A) ],

and for any k ≥ 0 we have {τ = k} ∈ F_k ⊂ A and (X_k)^{−1}(A) ∈ A, since X_k is a random variable on (Ω, A, P). Hence, (X_τ)^{−1}(A) ∈ A.

Exercise ST1b We compute

E[ ∑_{i=1}^{M} Z_i ] = E[ ∑_{i=1}^{∞} 1_{i≤M} Z_i ].

Since {i ≤ M} = {M ≤ i − 1}^c ∈ F^Z_{i−1}, by using the independence of Z_i from Z_k, k ≤ i − 1, we deduce

E[T_M] = ∑_i P(i ≤ M) E[Z_i] = 0.

By using the same argument, since Z_j is independent of Z_i and of {M ≥ j}, we have

E[T_M²] = E[ ( ∑_{i=1}^{M} Z_i )² ] = ∑_{i=1}^{∞} E[1_{i≤M} |Z_i|²] + 2 ∑_{1≤i<j<∞} E[Z_i Z_j 1_{M≥j}].

Remember that i < j, so that Z_i 1_{M≥j} is independent of Z_j. Thus

E[T_M²] = ∑_{i=1}^{∞} P(i ≤ M) E[|Z_i|²] + 2 ∑_{1≤i<j<∞} E[Z_j] E[Z_i 1_{M≥j}]
        = σ² ∑_{i=1}^{∞} P(i ≤ M)
        = σ² E[M],

where we recall that

∑_{i≥1} P(M ≥ i) = ∑_{i≥1} ∑_{k≥i} P(M = k) = ∑_{k≥1} ∑_{i=1}^{k} P(M = k) = ∑_{k≥1} k P(M = k) = E[M].
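Both Wald identities E[T_M] = 0 and E[T_M²] = σ²E[M] can be observed numerically. The sketch below assumes Z_i iid N(0, σ²) and takes the stopping time M = inf{i : Z_i > 1}, a hypothetical choice with E[M] < ∞ made only for the illustration; the number of runs and the seed are also arbitrary.

    import numpy as np

    rng = np.random.default_rng(5)
    sigma = 1.0
    n_runs = 20_000

    T_vals, M_vals = [], []
    for _ in range(n_runs):
        total, i = 0.0, 0
        while True:
            z = rng.normal(0.0, sigma)
            total += z
            i += 1
            if z > 1.0:              # M = inf{i : Z_i > 1}, a stopping time with finite mean
                break
        T_vals.append(total)
        M_vals.append(i)

    T_vals, M_vals = np.array(T_vals), np.array(M_vals)
    print(T_vals.mean())                                     # close to 0: E[T_M] = 0
    print(np.mean(T_vals**2), sigma**2 * M_vals.mean())      # the two quantities should agree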

Exercise ST2

1. It is clear that f_{(X,Y)} ≥ 0 and f_{(X′,Y′)} ≥ 0 on R². Note now that ∫_{−1}^{1} ∫_{−1}^{1} xy dxdy = 0 and ∫_{−1}^{1} ∫_{−1}^{1} (1/4) dxdy = 1. Thus

∫_{−1}^{1} ∫_{−1}^{1} f_{(X,Y)}(x, y) dxdy = ∫_{−1}^{1} ∫_{−1}^{1} f_{(X′,Y′)}(x, y) dxdy = 1.

2. We have

P(X ≥ 0, Y ≥ 0) = (1/4) ∫_0^1 ∫_0^1 (1 + xy) dxdy = (1/4)( ∫_0^1 dx ∫_0^1 dy + ∫_0^1 x dx ∫_0^1 y dy ) = (1/4)(1 + (1/2) × (1/2)) = 5/16,

P(X′ ≥ 0, Y′ ≥ 0) = (1/4) ∫_0^1 ∫_0^1 dxdy = 1/4.

Hence (X, Y) and (X′, Y′) do not have the same law.

3. Recall that

f_{X′}(x) = ∫_R f_{(X′,Y′)}(x, y) dy = (1/4) ∫_R 1_{x∈[−1,1]} 1_{y∈[−1,1]} dy = 1_{x∈[−1,1]} (1/4) ∫_{−1}^{1} dy.

Hence f_{X′}(x) = (1/2) 1_{[−1,1]}(x), and similarly f_{Y′}(y) = (1/2) 1_{[−1,1]}(y). Moreover

f_X(x) = (1/4) 1_{[−1,1]}(x) ∫_{−1}^{1} (1 + xy) dy = (1/2) 1_{[−1,1]}(x),

and similarly f_Y(y) = (1/2) 1_{[−1,1]}(y). Hence the two couples have the same marginals, although they do not have the same joint law.

Exercise ST3 Assume that X and Y are independent. Then for any A, B ∈ B(R),

∫_A ∫_B f(x, y) dy dx = P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B) = ∫_A f_X(x) dx ∫_B f_Y(y) dy.

Hence, for any A, B ∈ B(R),

∫_A ∫_B f(x, y) dy dx = ∫_A ∫_B f_X(x) f_Y(y) dx dy.

We deduce that if X and Y are independent then f(x, y) = f_X(x) f_Y(y).
Conversely, if f(x, y) = f_X(x) f_Y(y), we have for any A, B ∈ B(R),

P(X ∈ A, Y ∈ B) = ∫_A ∫_B f(x, y) dy dx = ∫_A ∫_B f_X(x) f_Y(y) dy dx = ∫_A f_X(x) dx ∫_B f_Y(y) dy = P(X ∈ A) P(Y ∈ B).

Hence, X and Y are independent.


Exercise ST4 We compute, for any x ∈ R,

P(X_2 ≤ x) = P(X_2 ≤ x, ε = 1) + P(X_2 ≤ x, ε = −1) = P(X_1 ≤ x, ε = 1) + P(−X_1 ≤ x, ε = −1).

By using the independence, we get

P(X_2 ≤ x) = P(X_1 ≤ x) P(ε = 1) + P(−X_1 ≤ x) P(ε = −1) = (1/2)( P(X_1 ≤ x) + P(X_1 ≥ −x) ).

By noting that P(X_1 ≥ −x) = P(X_1 ≤ x) (an easy and classical exercise with a change of variable), we get

P(X_2 ≤ x) = P(X_1 ≤ x).

Thus, X_2 ∼ N(0, 1). However, (X_1, X_2) is not Gaussian, since X_1 + X_2 is equal to 0 with probability 1/2 and is normally distributed (and equal to 2X_1) with probability 1/2.

Exercise ST5 Let X be a centred Gaussian vector in R² with covariance matrix

Q := (  2  −1
       −1   2 )

1. AX is a Gaussian vector with mean (0, 0)^T and covariance matrix A Q A^T.

2. Find a 2 × 2 matrix A such that the components of AX are independent. We use a diagonalization of Q with an orthogonal change-of-basis matrix.

– Eigenvalues: 1 and 3, with respective eigenvectors (1/√2)(1, 1) and (1/√2)(−1, 1). We denote by O the change-of-basis matrix, hence:

O = (1/√2) ( 1  −1
             1   1 ),    O^{−1} = O^T = (1/√2) (  1   1
                                                 −1   1 )

Hence Q = O Λ O^T, with

Λ = ( 1  0
      0  3 )

By taking A = O^T we deduce that AX = (1/√2)(X_1 + X_2, X_1 − X_2)^T is a Gaussian vector, since for any α, β we have

α(AX)_1 + β(AX)_2 = (1/√2)( (α + β)X_1 + (α − β)X_2 ),

which is normally distributed for any (α, β) ∈ R², and A Q A^T = Λ, hence the components of AX are independent from Proposition 2.6.4.
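The diagonalisation above is easy to verify numerically. In the sketch below, numpy.linalg.eigh recovers the eigenvalues 1 and 3 together with an orthogonal matrix O, and a Monte Carlo draw of X ∼ N(0, Q) confirms that the empirical covariance of AX is (approximately) diagonal; the sample size and seed are arbitrary.

    import numpy as np

    Q = np.array([[2.0, -1.0],
                  [-1.0, 2.0]])

    # Orthogonal diagonalisation Q = O Lambda O^T, as in the solution above.
    eigvals, O = np.linalg.eigh(Q)          # orthonormal eigenvectors of the symmetric matrix Q
    print(eigvals)                           # 1 and 3

    A = O.T                                  # taking A = O^T decorrelates the components
    print(A @ Q @ A.T)                       # ~ diag(1, 3)

    # Monte Carlo check: sample X ~ N(0, Q) and look at the empirical covariance of AX.
    rng = np.random.default_rng(6)
    X = rng.multivariate_normal([0.0, 0.0], Q, size=100_000)
    print(np.cov((X @ A.T).T))               # off-diagonal entries close to 0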

7.3 Conditional expectations

Exercise CE1

1. f_{(X,Y)}(x, y) = f_{Y|X=x}(y) × f_X(x) = λ² e^{−λx} 1_{0<y<x}. We get

f_Y(y) = ∫_R λ² e^{−λx} 1_{0<y<x} dx = λ 1_{y>0} e^{−λy},

so Y ∼ E(λ).


2. Let y > 0. We have

f_{X|Y=y}(x) = f_{(X,Y)}(x, y)/f_Y(y) = λ² e^{−λx} 1_{0<y<x} × (1/λ) e^{λy} = λ e^{−λ(x−y)} 1_{0<y<x}.

3. a.

E[XY] = ∫_R ∫_R xy λ² e^{−λx} 1_{0<y<x} dy dx = ∫_0^{+∞} λ² x e^{−λx} (x²/2) dx = (1/2) ∫_0^{+∞} x³ λ² e^{−λx} dx = (1/2) E[X²] = 3/λ².

b. We compute, for x > 0,

∫_R y f_{Y|X=x}(y) dy = ∫_0^{+∞} y (1/x) 1_{0<y<x} dy = (1/x) ∫_0^x y dy = x/2,

hence E[Y|X] = X/2.
Remark: E[XY] = E[E[XY|X]] = E[X E[Y|X]] = E[X²/2] = 3/λ².

c. Let y > 0. We compute

∫_R x λ e^{−λ(x−y)} 1_{0<y<x} dx = e^{λy} ∫_y^{+∞} x λ e^{−λx} dx = e^{λy} ( [−x e^{−λx}]_y^{+∞} + ∫_y^{+∞} e^{−λx} dx ) = y + 1/λ.

Hence, E[X|Y] = Y + 1/λ.

d. E[X + XY | Y] = E[X|Y](1 + Y).

e. E[E[Y|X]] = E[X/2] = 1/λ. Remark: E[E[Y|X]] = E[Y].
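The formulas of CE1 can be checked by simulation: by construction, X has the density λ²x e^{−λx} (a Gamma(2, λ) law) and, given X = x, Y is uniform on (0, x). The sketch below (arbitrary λ, sample size and seed) compares the empirical values of E[XY], E[Y] and a rough version of E[X|Y] = Y + 1/λ with the formulas above.

    import numpy as np

    rng = np.random.default_rng(7)
    lam = 2.0
    n = 1_000_000

    # Sample from f_{(X,Y)}(x, y) = lam^2 e^{-lam x} 1_{0<y<x}:
    # X ~ Gamma(shape 2, rate lam) and, given X = x, Y ~ Uniform(0, x).
    X = rng.gamma(shape=2.0, scale=1.0 / lam, size=n)
    Y = rng.uniform(0.0, X)

    print(np.mean(X * Y), 3.0 / lam**2)          # E[XY] = 3/lambda^2
    print(np.mean(Y), 1.0 / lam)                 # E[Y] = E[X/2] = 1/lambda
    print(np.mean(X[Y > 1.0]), np.mean(Y[Y > 1.0]) + 1.0 / lam)   # consequence of E[X|Y] = Y + 1/lambda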

Exercise CE2 By using the "taking out what is known" property (Proposition 3.3.1) with Y ↔ e^{αY}, which is σ(Y)-measurable, we have

E[e^{αX − α²σ²/2} | Y] = e^{−α²σ²/2} E[e^{α(X−Y)} e^{αY} | Y] = e^{−α²σ²/2} e^{αY} E[e^{α(X−Y)} | Y].

From Proposition 3.5.1, and more specifically Remark 3.5.1, since X − Y is independent of Y, we have

E[e^{αX − α²σ²/2} | Y] = e^{−α²σ²/2} e^{αY} E[e^{α(X−Y)}].

Since X − Y ∼ N(0, σ² − ν²), we deduce that E[e^{α(X−Y)}] = e^{α²(σ²−ν²)/2}. Hence

E[e^{αX − α²σ²/2} | Y] = e^{αY − α²ν²/2}.


Exercise CE3 We first prove the hint:

∑_{i≥0} P(X > i | Y ∈ A) = ∑_{i≥0} ∑_{k=i+1}^{∞} P(X = k | Y ∈ A)
 = ∑_{k=1}^{∞} ∑_{i=0}^{k−1} P(X = k | Y ∈ A)
 = ∑_{k=1}^{∞} P(X = k | Y ∈ A) ∑_{i=0}^{k−1} 1
 = ∑_{k=1}^{∞} k P(X = k | Y ∈ A)
 = E[X | Y ∈ A].

Hence, we have to compute in both cases E[τ − t | τ > t].

Case a: for any r ∈ {0, . . . , N − 1} we have

P(τ > r) = ∑_{i=r+1}^{N} P(τ = i) = (N − r)/(N + 1).

Let t ∈ {0, . . . , N − 1} and r ∈ {0, . . . , N − t}. By noting that {τ > t + r} ⊂ {τ > t}, so that P(τ > t + r | τ > t) = P(τ > t + r)/P(τ > t), we compute

E[τ − t | τ > t] = ∑_{r=0}^{N−t} P(τ > t + r | τ > t) = ∑_{r=0}^{N−t} P(τ > t + r)/P(τ > t)
 = ∑_{r=0}^{N−t} (N − r − t)/(N − t) = ∑_{r=0}^{N−t} r/(N − t) = (N − t)(N − t + 1)/(2(N − t)) = (N − t + 1)/2.

Case b: for any r ∈ N we have

P(τ > r) = ∑_{i=r+1}^{∞} P(τ = i) = ∑_{i=r+1}^{∞} 2^{−i} = 2^{−r}.

Again by noting that {τ > t + r} ⊂ {τ > t}, so that P(τ > t + r | τ > t) = P(τ > t + r)/P(τ > t), we get

E[τ − t | τ > t] = ∑_{r=0}^{∞} P(τ > t + r | τ > t) = ∑_{r=0}^{∞} P(τ > t + r)/P(τ > t) = ∑_{r=0}^{∞} 2^{−t−r}/2^{−t} = ∑_{r=0}^{∞} 2^{−r} = 2.

Exercise CE4 We first restrict the study to the case where X is non-negative. We set ξ := E[X | F_{τ∧θ}]. We aim at showing that ξ = E[E[X | F_τ] | F_θ]. Let A ∈ F_θ. Then by using Property (c) in Theorem 3.2.2 together with the linearity of the conditional expectation we get

E[ξ 1_A] = E[E[X | F_{τ∧θ}] 1_A 1_{τ≤θ}] + E[E[X | F_{τ∧θ}] 1_A 1_{τ>θ}].

In other words,

E[ξ 1_A] = E[E[X | F_τ] 1_A 1_{τ≤θ}] + E[E[X | F_θ] 1_A 1_{τ>θ}].

Note that {τ > θ} ⊂ {τ ≥ θ} ∈ F_θ from (ii) in Proposition 2.3.1. Hence, by Proposition 3.3.1 we have E[E[X | F_θ] 1_A 1_{τ>θ}] = E[E[X 1_A 1_{τ>θ} | F_θ]] = E[X 1_A 1_{τ>θ}] = E[E[X 1_A 1_{τ>θ} | F_τ]]. Consequently

E[ξ 1_A] = E[E[X | F_{τ∧θ}] 1_A 1_{τ≤θ}] + E[E[X 1_A 1_{τ>θ} | F_τ]]
 = E[E[X | F_τ] 1_A 1_{τ≤θ}] + ∑_{i≥0} ∑_{j<i} E[E[X 1_A 1_{τ=i} 1_{θ=j} | F_i]]
 = E[E[X | F_τ] 1_A 1_{τ≤θ}] + ∑_{i≥0} ∑_{j<i} E[E[X | F_i] 1_A 1_{τ=i} 1_{θ=j}]
 = E[E[X | F_τ] 1_A 1_{τ≤θ}] + E[E[X | F_τ] 1_A 1_{τ>θ}]
 = E[E[X | F_τ] 1_A].

We deduce from the uniqueness of the conditional expectation that ξ = E[E[X | F_τ] | F_θ]. We prove similarly that ξ = E[E[X | F_θ] | F_τ]. For the general case, when X is only assumed to satisfy E[|X|] < ∞, we write as usual X = X⁺ − X⁻ and conclude by the linearity of the conditional expectation.

Exercise CE5 The natural definition should be

Var(Y | X = x) = E[(Y − E[Y | X = x])² | X = x].

We now compute

Var(Y) = E[(Y − E[Y])²] = E[ E[(Y − E[Y|X] + E[Y|X] − E[Y])² | X] ] = E[Var(Y|X)] + Var(E[Y|X]).

Now we assume (without loss of generality) that E[X] = E[Y] = 0 and that X, Y are ρ-correlated. By the Cauchy–Schwarz inequality,

(E[XY])² = (E[X E[Y|X]])² ≤ E[X²] E[(E[Y|X])²].

Hence

E[Var(Y|X)] = E[ E[(Y − E[Y|X])² | X] ] = E[(Y − E[Y|X])²] = E[Y²] − E[(E[Y|X])²]
 ≤ E[Y²] − (E[XY])²/E[X²] = Var(Y) − (E[XY])²/E[X²] = Var(Y)(1 − ρ²).


Exercise CE6 We compute

E[X_n] = E[E[X_n | X_{n−1}]] = E[p(X_{n−1} + 1) + (1 − p)(X_{n−1} + 1 + X̃_n)],

where X̃_n has the same distribution as X_n. Hence,

E[X_n] = (1 + E[X_{n−1}])/p,    E[X_1] = 1/p.

It remains to solve

u_n = (1 + u_{n−1})/p,    u_1 = 1/p.

We get

u_n = 1/(p − 1) + p^{−(n−1)} ( 1/p − 1/(p − 1) ) = (1 − p^{−n})/(p − 1).

Exercise CE7 We compute

f_{X|Y=y}(x) = f_{(X,Y)}(x, y)/f_Y(y)
 = [ 1/(2π σ_X σ_Y √(1−ρ²)) exp( −1/(2(1−ρ²)) ( (x/σ_X)² + (y/σ_Y)² − 2ρxy/(σ_X σ_Y) ) ) ] / [ 1/(√(2π) σ_Y) exp( −(1/2)(y/σ_Y)² ) ].

Hence

f_{X|Y=y}(x) = 1/(σ_X √(2π) √(1−ρ²)) exp( −1/(2(1−ρ²)) ( (x/σ_X)² + ρ²(y/σ_Y)² − 2ρxy/(σ_X σ_Y) ) ).

From Theorem 3.4.1 we get E[X|Y] = g(Y) with

g(y) = ∫_R x f_{X|Y=y}(x) dx = ∫_R x/(σ_X √(2π) √(1−ρ²)) exp( −1/(2(1−ρ²)σ_X²) (x − ρy σ_X/σ_Y)² ) dx = ρ (σ_X/σ_Y) y.

We also deduce that Var(X|Y) = σ_X²(1 − ρ²).
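In the Gaussian case the conditional expectation coincides with the best linear predictor, so the slope ρσ_X/σ_Y and the residual variance σ_X²(1 − ρ²) can be recovered by a simple regression on simulated data. The sketch below uses hypothetical values of σ_X, σ_Y and ρ and an arbitrary sample size.

    import numpy as np

    rng = np.random.default_rng(8)
    sx, sy, rho = 2.0, 0.5, -0.6
    cov = np.array([[sx**2, rho * sx * sy],
                    [rho * sx * sy, sy**2]])
    X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T

    # E[X|Y] = rho (sigma_X / sigma_Y) Y: least-squares slope of X on Y through the origin.
    slope = np.sum(X * Y) / np.sum(Y * Y)
    print(slope, rho * sx / sy)

    # Var(X|Y) = sigma_X^2 (1 - rho^2): variance of the residual X - E[X|Y].
    print(np.var(X - slope * Y), sx**2 * (1 - rho**2))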

Exercise CE7bis We look for a such that Cov(X − aY, Y) = 0. We have

Cov(X − aY, Y) = 0 ⟺ Cov(X, Y) − a Cov(Y, Y) = 0 ⟺ a = Cov(X, Y)/Var(Y),

so we set a* = r/σ_Y². Now we compute

E[X|Y] = E[X − a*Y + a*Y | Y] = E[X − a*Y] + a*Y = 0 + a*Y.

Note that by definition r = ρ σ_X σ_Y, and we recover the previous formula of CE7.


Exercise CE8 We have P(X_1 = 1, ρ_0 = m) = P(X_1 = 1, ζ_{−1} ∘ Σ_1 = m − 1) and P(X_1 = −1, ρ_0 = m) = P(X_1 = −1, ζ_1 ∘ Σ_1 = m − 1). Hence, on the event {ρ_0 < ∞}, for |s| < 1,

E[s^{ρ_0}] = ∑_{n≥1} P(ρ_0 = 2n) s^{2n}
 = ∑_{n≥1} ( P(X_1 = 1, ρ_0 = 2n) + P(X_1 = −1, ρ_0 = 2n) ) s^{2n}
 = ∑_{n≥1} ( p P(ζ_{−1} = 2n − 1) + q P(ζ_1 = 2n − 1) ) s^{2n}
 = p s E[s^{ζ_{−1}}] + q s E[s^{ζ_1}]
 = 1 − √(1 − 4pq s²).

We now compute

E[ρ_0 s^{ρ_0}] = s (d/ds) E[s^{ρ_0}] = 4pq s²/√(1 − 4pq s²),    |s| < 1.

From the monotone convergence theorem we have

lim_{s→1, s<1} E[ρ_0 s^{ρ_0}] = E[ρ_0 1_{ρ_0<+∞}] = 4pq/√(1 − 4pq) = 4pq/|p − q|,

since 1 = (p + q)² = p² + q² + 2pq = (p − q)² + 4pq. We recall that

P(ρ_0 < ∞) = 2(p ∧ q).

Thus

E[ρ_0 | ρ_0 < ∞] = (4pq/|p − q|) × 1/(2(p ∧ q)) = 2(p ∨ q)/|p − q| = 1 + 1/|p − q|.

7.4 Martingales

Exercise M0 1. Dominated convergence theorem. 2. Fatou’s Lemma.

Exercise M1 M_n is non-negative and F_n-measurable by definition. We compute

E[M_{n+1} | F_n] = E[X_{n+1} M_n | F_n] = M_n E[X_{n+1} | F_n] = M_n E[X_{n+1}] = M_n,

by using Proposition 3.3.1 for the second equality and Remark 3.5.1 for the third equality.

Exercise M2 See CE2. Easy exercise by noting that S_{n+1} = X_{n+1} + S_n and by using that E[e^{λ N(0,σ²)}] = e^{λ²σ²/2}.


Exercise M3 M_n is non-negative and F_n-measurable. We compute

E[M_{n+1} − M_n | F_n] = E[ ((1−p)/p)^{S_n + X_{n+1}} − ((1−p)/p)^{S_n} | F_n ]
 = E[ ((1−p)/p)^{S_n} ( ((1−p)/p)^{X_{n+1}} − 1 ) | F_n ]
 = ((1−p)/p)^{S_n} E[ ((1−p)/p)^{X_{n+1}} − 1 | F_n ],

where we have used again Proposition 3.3.1 for the last equality. Since X_{n+1} is independent of F_n, we deduce from Remark 3.5.1 that

E[ ((1−p)/p)^{X_{n+1}} − 1 | F_n ] = E[ ((1−p)/p)^{X_{n+1}} − 1 ]
 = ( (1−p)/p − 1 ) P(X_{n+1} = 1) + ( p/(1−p) − 1 ) P(X_{n+1} = −1)
 = ( (1−p)/p − 1 ) p + ( p/(1−p) − 1 )(1 − p)
 = 0.

Hence E[M_{n+1} − M_n | F_n] = 0, so by linearity of the conditional expectation we deduce that

E[M_{n+1} | F_n] = M_n.

We turn to question 2. We first assume that T < ∞, P-a.s. Let n ≥ 0. The random variable n ∧ T := min(n, T) is a bounded stopping time. Hence, according to Theorem 4.4.2, we deduce from question 1 that

E[M_{n∧T}] = E[M_0] = ((1−p)/p)^K.

Note now that, by the definition of T, S_{n∧T} ∈ {0, . . . , N}, so

|M_{n∧T}| ≤ max{ 1, ((1−p)/p)^N }.

From the dominated convergence theorem (since the previous upper bound is independent of n) we get

lim_n E[M_{n∧T}] = E[lim_n M_{n∧T}] = E[M_T] = ((1−p)/p)^K.

Now, note that S_T takes two possible values, 0 and N, so that

P(S_T = 0) + P(S_T = N) = 1. (7.1)

Moreover M_T = 1 if S_T = 0 and M_T = ((1−p)/p)^N if S_T = N. Consequently

((1−p)/p)^K = E[M_T (1_{S_T=0} + 1_{S_T=N})] = 1 × P(S_T = 0) + ((1−p)/p)^N P(S_T = N). (7.2)

Hence (7.1) and (7.2) lead to a system of two equations with two unknowns. We obtain

P(S_T = N) = ( ((1−p)/p)^K − 1 ) / ( ((1−p)/p)^N − 1 ) = 1 − P(S_T = 0).

We now turn to the fact that T < ∞ almost surely. The case p = 1/2 is excluded since it coincides with the symmetric random walk, and we know that in this case T is finite almost surely (recurrence of the random walk on Z); there exist several proofs of this result. Since the X_i are iid, we deduce from the law of large numbers that

S_n/n −→ E[X_1] = 2p − 1, P-a.s., as n → +∞.

If 0 < p < 1/2, then −1 < 2p − 1 < 0, hence S_n converges to −∞, so T < +∞. Otherwise, if p ∈ (1/2, 1), then 2p − 1 ∈ (0, 1) and S_n converges to +∞, so T < +∞.
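The ruin probability obtained above is easy to compare with a direct simulation of the stopped walk. The sketch below uses hypothetical values of p, K and N (with p ≠ 1/2, so the formula applies); hit_N simulates the walk started at K until it reaches 0 or N.

    import numpy as np

    rng = np.random.default_rng(9)
    p, K, N = 0.45, 3, 10                 # hypothetical parameters: start at K, absorb at 0 or N
    r = (1 - p) / p                        # requires p != 1/2

    def hit_N(p, K, N):
        s = K
        while 0 < s < N:
            s += 1 if rng.random() < p else -1
        return s == N

    runs = 20_000
    empirical = np.mean([hit_N(p, K, N) for _ in range(runs)])
    exact = (r**K - 1) / (r**N - 1)        # P(S_T = N) from the martingale argument above
    print(empirical, exact)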

Exercise M4 See 10.12 in (Williams 1991).

Exercise M5 E[X_n] = 0 for any n. Moreover

E[M_{n+1} − M_n | F_n] = E[X_{n+1} | F_n] = 0,

since X_{n+1} is independent of F_n. It is thus a martingale.
Note that ∑_n 2 · 10^{−n} < +∞. Hence, by the Borel–Cantelli lemma, X_n converges almost surely to 0, so M_n converges almost surely. Now, if X_n = 100^n then M_n ≥ 100^n/2. In other words

E[|M_n|] ≥ E[|M_n| 1_{X_n=100^n}] ≥ (100^n/2) P(X_n = 100^n) = 10^n/2 → +∞.

Hence, M does not converge in L¹, and so it does not converge in L^p for any p ≥ 1.

7.5 Markov Chains

Exercise MC1 If p = 0 then we have 3 classes, each reduced to an absorbing state. Otherwise, all the states communicate, and so they are all recurrent. For the period: assume first that p = 1/2. Then 2 → 3 → 2 and 2 → 1 → 2 (with probabilities 1/2 then 1), and there is no other possibility, hence D(2) = 2, so all the states have period 2. Assume now that p ∈ (0, 1/2). Then 1 → 2 → 1 (probabilities 2p then p) gives a loop of length 2, and 1 → 2 → 2 → 1 (probabilities 2p, 1 − 2p, p) gives a loop of length 3, hence D(1) = 1; since all the states communicate, all the states are aperiodic. To compute p^{(n)}_{ij} we diagonalize the matrix; we get

M = B Λ B^{−1},

with

B = ( 1   1   1
      1   0  −1
      1  −1   1 ),

B^{−1} = ( 1/4   1/2   1/4
           1/2    0   −1/2
           1/4  −1/2   1/4 ),

Λ = ( 1    0      0
      0  1−2p     0
      0    0    1−4p ),

so that M^n = B Λ^n B^{−1}, and we easily get the p^{(n)}_{ij}. In particular:

p^{(n)}_{11} = p^{(n)}_{33} = 1/4 + (1/2)(1−2p)^n + (1/4)(1−4p)^n,    p^{(n)}_{22} = 1/2 + (1/2)(1−4p)^n.


We see that all the states are recurrent, and we recover the period depending on whether p = 1/2 or not.

Now recall from Theorem 5.3.4 that m_i = F'_{ii}(1). We have F_{ii}(s) = 1 − P_{ii}(s)^{−1}. Thus we compute, for s ∈ (0, 1),

P_{11}(s) = P_{33}(s) = ∑_{n≥0} p^{(n)}_{11} s^n = ∑_{n≥0} ( 1/4 + (1/2)(1−2p)^n + (1/4)(1−4p)^n ) s^n
 = 1/(4(1−s)) + 1/(2(1−s(1−2p))) + 1/(4(1−s(1−4p)))

and

P_{22}(s) = ∑_{n≥0} p^{(n)}_{22} s^n = 1/(2(1−s)) + 1/(2(1−s(1−4p))).

We get

m_1 = m_3 = lim_{s→1} P'_{11}(s)/P_{11}(s)² = 4,    m_2 = 2.

Invariant probability: (1/4, 1/2, 1/4).
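The diagonalisation and the closed-form expressions for p^{(n)}_{11} and p^{(n)}_{22} can be verified numerically; in the sketch below, the value of p and the power n are arbitrary, and state 1 of the text corresponds to index 0 in the arrays.

    import numpy as np

    p, n = 0.2, 7                                          # hypothetical values for illustration
    M = np.array([[1 - 2*p, 2*p, 0.0],
                  [p, 1 - 2*p, p],
                  [0.0, 2*p, 1 - 2*p]])

    B = np.array([[1.0, 1.0, 1.0],
                  [1.0, 0.0, -1.0],
                  [1.0, -1.0, 1.0]])
    Lam = np.diag([1.0, 1 - 2*p, 1 - 4*p])

    print(np.allclose(M, B @ Lam @ np.linalg.inv(B)))      # M = B Lambda B^{-1}

    Mn = np.linalg.matrix_power(M, n)
    print(Mn[0, 0], 0.25 + 0.5*(1 - 2*p)**n + 0.25*(1 - 4*p)**n)   # p^{(n)}_{11}
    print(Mn[1, 1], 0.5 + 0.5*(1 - 4*p)**n)                        # p^{(n)}_{22}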

Exercise MC2 X_n takes values in {0, . . . , N}. We have

P(X_{n+1} = i − 1 | X_n = i) = i/N, i = 1, . . . , N,
P(X_{n+1} = i + 1 | X_n = i) = (N − i)/N, i = 0, . . . , N − 1,
P(X_{n+1} = j | X_n = i) = 0 whenever |j − i| ≠ 1.

Transition matrix:

P = (  0      1        0        0      · · ·     0
      1/N     0     (N−1)/N     0      · · ·     0
       0     2/N       0     (N−2)/N   · · ·     0
       ⋮               ⋱        ⋱        ⋱        ⋮
       0     · · ·            (N−1)/N    0      1/N
       0     · · ·               0        1       0  )

The case N = 2 was treated in MC1 (it corresponds to the matrix M with p = 1/2).

For N = 3 we get

P = (  0    1    0    0
      1/3   0   2/3   0
       0   2/3   0   1/3
       0    0    1    0  )

The chain is irreducible, hence recurrent (because the state space is finite), with period 2. Invariant probability:

π_0 = (1/3)π_1,   π_1 = π_0 + (2/3)π_2,   π_2 = (2/3)π_1 + π_3,   π_3 = (1/3)π_2,   π_0 + π_1 + π_2 + π_3 = 1.


We get

π = (1/8, 3/8, 3/8, 1/8).

Invariant measure for a general N. Ansatz:

π_i = \binom{N}{i} / 2^N, i = 0, . . . , N.

Indeed, we have

π_i = π_{i−1} (1 − (i−1)/N) + π_{i+1} (i+1)/N, 1 ≤ i ≤ N − 1,
π_0 = π_1 (1/N), π_N = π_{N−1} (1/N).

By induction,

π_i = π_0 \binom{N}{i}.

Hence, since ∑ π_i = 1, we get

1 = π_0 ∑_{i=0}^{N} \binom{N}{i} = π_0 2^N.
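The binomial ansatz can be verified numerically for any fixed N; the sketch below builds the Ehrenfest transition matrix and checks that π_i = \binom{N}{i}/2^N satisfies πP = π (the value N = 6 is an arbitrary choice for the illustration).

    import numpy as np
    from math import comb

    N = 6                                              # hypothetical number of particles
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i > 0:
            P[i, i - 1] = i / N                        # a particle of A moves to B
        if i < N:
            P[i, i + 1] = (N - i) / N                  # a particle of B moves to A

    pi = np.array([comb(N, i) / 2**N for i in range(N + 1)])
    print(np.allclose(pi @ P, pi), np.isclose(pi.sum(), 1.0))   # binomial(N, 1/2) is invariant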

Exercise MC3 See Example 2.9.1 in (Resnick 2005).

Exercise MC4 See Example 2.9.2 in (Resnick 2005): application to the random walk.

Exercise MC5 See Example 2.9.3 in (Resnick 2005): the branching process.
