Math 735: Stochastic Analysis
1. Introduction and review
2. Notions of convergence
3. Continuous time stochastic processes
4. Information and conditional expectation
5. Martingales
6. Poisson process and Brownian motion
7. Stochastic integration
8. Covariation and Ito’s formula
9. Stochastic differential equations
10. Diffusion processes
11. General Markov processes
12. Probability distributions on function spaces
13. Numerical schemes
14. Change of measure
15. Filtering
16. Finance
17. Technical lemmas
18. Appendix
1. Introduction
• The basic concepts of probability: Models of experiments
• Sample space and events
• Probability measures
• Random variables
• Closure properties of collection of random variables
• The distribution of a random variable
• Definition of the expectation
• Properties of expectations
• Jensen’s inequality
Experiments
Probability models experiments in which repeated trials typically result in different outcomes.
As a means of understanding the "real world," probability identifies surprising regularities in highly irregular phenomena.
If we roll a die 100 times we anticipate that about a sixth of the time the roll is 5.
If that doesn't happen, we suspect that something is wrong with the die or the way it was rolled.
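A quick simulation makes the regularity concrete. The following is a minimal numpy sketch (the trial count and seed are arbitrary illustration choices):

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # 100,000 rolls of a fair die, values 1..6
freq = np.mean(rolls == 5)                # relative frequency of the event {roll is 5}
print(freq, 1 / 6)                        # freq should be close to 1/6 ≈ 0.1667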
Probabilities of events
Events are statements about the outcome of the experiment: the roll is 6, the rat died, the television set is defective.
The anticipated regularity is that
P(A) ≈ (# times A occurs)/(# of trials)
This presumption is called the relative frequency interpretation of probability.
“Definition” of probability
The probability of an event A should be
P(A) = lim_{n→∞} (# times A occurs in first n trials)/n
The mathematical problem: Make sense out of this.
The real world relationship: Probabilities are predictions about the future.
Random variables
In performing an experiment, numerical measurements or observations are made. Call these random variables since they vary randomly.
Give the quantity a name: X
{X = a} and {a < X < b} are statements about the outcome of the experiment, that is, are events.
The distribution of a random variable
If X_k is the value of X observed on the kth trial, then we should have
P{X ∈ A} = lim_{n→∞} #{k ≤ n : X_k ∈ A}/n
This collection of probabilities determines the distribution of X.
The sample space
The possible outcomes of the experiment form a set Ω called the sample space.
Each event (statement about the outcome) can be identified with the subset of the sample space for which the statement is true.
The collection of events
If
A = {ω ∈ Ω : statement I is true for ω}
B = {ω ∈ Ω : statement II is true for ω}
Then
A ∩ B = {ω ∈ Ω : statement I and statement II are true for ω}
A ∪ B = {ω ∈ Ω : statement I or statement II is true for ω}
A^c = {ω ∈ Ω : statement I is not true for ω}
Let F be the collection of events. Then A, B ∈ F should imply that A ∩ B, A ∪ B, and A^c are all in F. F is an algebra of subsets of Ω.
In fact, we assume that F is a σ-algebra (closed under countable unions and complements).
The probability measure
Each event A ∈ F is assigned a probability P (A) ≥ 0.
From the relative frequency interpretation, we must have
P (A ∪B) = P (A) + P (B)
for disjoint events A and B and, by induction, if A_1, . . . , A_m are disjoint,
P(∪_{k=1}^m A_k) = ∑_{k=1}^m P(A_k)   (finite additivity)
In fact, we assume countable additivity: If A_1, A_2, . . . are disjoint events, then
P(∪_{k=1}^∞ A_k) = ∑_{k=1}^∞ P(A_k).
P (Ω) = 1.
A probability space is a measure space
A measure space (M, M, µ) consists of a set M, a σ-algebra of subsets M, and a nonnegative function µ defined on M that satisfies µ(∅) = 0 and countable additivity.
A probability space is a measure space (Ω,F , P ) satisfying P (Ω) = 1.
Random variables
If X is a random variable, then we must know the value of X if we know the outcome ω ∈ Ω of the experiment. Consequently, X is a function defined on Ω.
The statement X ≤ c must be an event, so
{X ≤ c} = {ω : X(ω) ≤ c} ∈ F.
In other words, X is a measurable function on (Ω, F, P).
R(X) will denote the range of X: R(X) = {x ∈ R : x = X(ω), ω ∈ Ω}.
Borel sets
Definition 1.1 The Borel subsets B(R) of R form the smallest σ-algebra of subsets of R containing (−∞, c] for all c ∈ R.
The Borel subsets B(R^d) form the smallest σ-algebra of subsets of R^d containing the open subsets of R^d.
Note that every continuous function on R^d is Borel measurable.
Closure properties of the collection of random variables
Lemma 1.2 Let X1, X2, . . . be R-valued random variables.
a) If f is a Borel measurable function on R^d, then Y = f(X_1, . . . , X_d) is a random variable.
b) sup_n X_n, inf_n X_n, lim sup_{n→∞} X_n, and lim inf_{n→∞} X_n are random variables.
Distributions
Definition 1.3 The distribution of an R-valued random variable X is the Borel measure defined by µ_X(B) = P{X ∈ B}, B ∈ B(R).
µ_X is called the measure induced by the function X.
Discrete distributions
Definition 1.4 A random variable is discrete or has a discrete distribution if and only if R(X) is countable.
If X is discrete, the distribution of X is determined by the probability mass function
p_X(x) = P{X = x}, x ∈ R(X).
Note that ∑_{x∈R(X)} P{X = x} = 1.
Examples
Binomial distribution
P{X = k} = \binom{n}{k} p^k (1 − p)^{n−k}, k = 0, 1, . . . , n,
for some positive integer n and some 0 ≤ p ≤ 1.
Poisson distribution
P{X = k} = e^{−λ} λ^k / k!, k = 0, 1, . . .
for some λ > 0.
Absolutely continuous distributions
Definition 1.5 The distribution of X is absolutely continuous if and only if there exists a nonnegative function f_X such that
P{a < X ≤ b} = ∫_a^b f_X(x) dx, a < b ∈ R.
Then f_X is the probability density function for X.
Examples
Normal distribution
f_X(x) = (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)}
Exponential distribution
f_X(x) = λe^{−λx} for x ≥ 0, and f_X(x) = 0 for x < 0.
Expectations
If X is discrete, then letting R(X) = {a_1, a_2, . . .},
X = ∑_i a_i 1_{A_i}
where A_i = {X = a_i}.
If ∑_i |a_i| P(A_i) < ∞, then
E[X] = ∑_i a_i P{X = a_i} = ∑_i a_i P(A_i).
For general X, let
Y_n = ⌊nX⌋/n, Z_n = ⌈nX⌉/n.
Then Y_n ≤ X ≤ Z_n, so we must have E[Y_n] ≤ E[X] ≤ E[Z_n]. Specifically, if ∑_k |k| P{k < X ≤ k + 1} < ∞, which is true if and only if E[|Y_n|] < ∞ and E[|Z_n|] < ∞ for all n (we will say that X is integrable), then define
E[X] ≡ lim_{n→∞} E[Y_n] = lim_{n→∞} E[Z_n]. (1.1)
Notation: E[X] = ∫_Ω X dP = ∫_Ω X(ω) P(dω).
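The bracketing in (1.1) is easy to see numerically. A minimal numpy sketch (the exponential distribution and sample size are arbitrary illustration choices):

import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=1_000_000)  # samples of an integrable X with E[X] = 2

for n in [1, 2, 10, 100]:
    Yn = np.floor(n * X) / n  # Y_n ≤ X
    Zn = np.ceil(n * X) / n   # X ≤ Z_n
    print(n, Yn.mean(), Zn.mean())  # E[Y_n] ≤ E[X] ≤ E[Z_n]; the gap is at most 1/n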
Properties
Lemma 1.6 (Monotonicity) If P{X ≤ Y} = 1 and X and Y are integrable, then E[X] ≤ E[Y].
Lemma 1.7 (Positivity) If P{X ≥ 0} = 1 and X is integrable, then E[X] ≥ 0.
Lemma 1.8 (Linearity) If X and Y are integrable and a, b ∈ R, then aX + bY is integrable and
E[aX + bY ] = aE[X] + bE[Y ].
Jensen’s inequality
Lemma 1.9 Let X be a random variable and ϕ : R → R be convex. If E[|X|] < ∞ and E[|ϕ(X)|] < ∞, then ϕ(E[X]) ≤ E[ϕ(X)].
Proof. If ϕ is convex, then for each x, ϕ^+(x) = lim_{y→x+} (ϕ(y) − ϕ(x))/(y − x) exists and
ϕ(y) ≥ ϕ(x) + ϕ+(x)(y − x).
Setting µ = E[X],
E[ϕ(X)] ≥ E[ϕ(µ) + ϕ+(µ)(X − µ)] = ϕ(µ) + ϕ+(µ)E[X − µ] = ϕ(µ).
Consequences of countable additivity
P (Ac) = 1− P (A)
If A ⊂ B, then P (B) ≥ P (A).
If A_1 ⊂ A_2 ⊂ · · ·, then P(∪_{k=1}^∞ A_k) = lim_{n→∞} P(A_n):
P(∪_{k=1}^∞ A_k) = P(∪_{k=1}^∞ (A_k ∩ A_{k−1}^c)) = ∑_{k=1}^∞ P(A_k ∩ A_{k−1}^c)
= lim_{n→∞} ∑_{k=1}^n P(A_k ∩ A_{k−1}^c) = lim_{n→∞} P(A_n)
If A_1 ⊃ A_2 ⊃ · · ·, then P(∩_{k=1}^∞ A_k) = lim_{n→∞} P(A_n):
A_n = (∩_{k=1}^∞ A_k) ∪ (∪_{k=n}^∞ (A_k ∩ A_{k+1}^c))
Expectations of nonnegative functions
If P{X ≥ 0} = 1 and ∑_{l=0}^∞ l P{l < X ≤ l + 1} = ∞ or P{X = ∞} > 0, we will define E[X] = ∞.
Note, however, whenever I write E[X] I mean that E[X] is finite unless I explicitly allow E[X] = ∞.
2. Notions of convergence
• Convergence of random variables
• Convergence in probability
• Bounded Convergence Theorem
• Monotone Convergence Theorem
• Fatou’s lemma
• Dominated Convergence Theorem
• Linear spaces and norms
• Lp spaces
Convergence of random variables
a) X_n → X a.s. iff P{ω : lim_{n→∞} X_n(ω) = X(ω)} = 1.
b) X_n → X in probability iff ∀ε > 0, lim_{n→∞} P{|X_n − X| > ε} = 0.
c) X_n converges to X in distribution (denoted X_n ⇒ X) iff
lim_{n→∞} P{X_n ≤ x} = P{X ≤ x} ≡ F_X(x)
for all x at which FX is continuous.
Relationship among notions of convergence
Theorem 2.1 a) implies b) implies c).
Proof. (a ⇒ b) P{|X_n − X| > ε} ≤ P{sup_{m≥n} |X_m − X| > ε} and
lim sup_{n→∞} P{|X_n − X| > ε} ≤ P(∩_n {sup_{m≥n} |X_m − X| > ε}) ≤ P{lim_{n→∞} X_n ≠ X} = 0
(b ⇒ c) Let ε > 0. Then
P{X_n ≤ x} − P{X ≤ x + ε} ≤ P{X_n ≤ x, X > x + ε} ≤ P{|X_n − X| > ε}
and hence lim sup P{X_n ≤ x} ≤ P{X ≤ x + ε}. Similarly,
lim inf P{X_n ≤ x} ≥ P{X ≤ x − ε}
Since ε is arbitrary, the implication follows.
Convergence in probability
Lemma 2.2 a) If X_n → X in probability and Y_n → Y in probability, then aX_n + bY_n → aX + bY in probability.
b) If Q : R → R is continuous and X_n → X in probability, then Q(X_n) → Q(X) in probability.
c) If X_n → X in probability and X_n − Y_n → 0 in probability, then Y_n → X in probability.
Remark 2.3 (b) and (c) hold with convergence in probability replaced by convergence in distribution; however, (a) is not in general true for convergence in distribution.
Bounded Convergence Theorem
Theorem 2.4 Suppose that X_n ⇒ X and that there exists a constant b such that P{|X_n| ≤ b} = 1. Then E[X_n] → E[X].
Proof. Let {x_i} be a partition of R such that F_X is continuous at each x_i. Then
∑_i x_i P{x_i < X_n ≤ x_{i+1}} ≤ E[X_n] ≤ ∑_i x_{i+1} P{x_i < X_n ≤ x_{i+1}}
and taking limits we have
∑_i x_i P{x_i < X ≤ x_{i+1}} ≤ lim inf_{n→∞} E[X_n] ≤ lim sup_{n→∞} E[X_n] ≤ ∑_i x_{i+1} P{x_i < X ≤ x_{i+1}}
As max |x_{i+1} − x_i| → 0, the left and right sides converge to E[X], giving the theorem.
Convergence of bounded truncation
Lemma 2.5 Let X ∈ [0, ∞] a.s. (allowing P{X = ∞} > 0). Then lim_{M→∞} E[X ∧ M] = E[X].
Proof. Check the result first for X having a discrete distribution and then extend to general X by approximation.
Monotone Convergence Theorem
Theorem 2.6 Suppose 0 ≤ X_n ≤ X and X_n → X ∈ [0, ∞] in probability. Then lim_{n→∞} E[X_n] = E[X] (allowing ∞ = ∞).
Proof. For M > 0,
E[X] ≥ E[X_n] ≥ E[X_n ∧ M] → E[X ∧ M]
where the convergence on the right follows from the bounded convergence theorem. It follows that
E[X ∧ M] ≤ lim inf_{n→∞} E[X_n] ≤ lim sup_{n→∞} E[X_n] ≤ E[X]
and the result follows by Lemma 2.5.
Example
Lemma 2.7 Suppose that P{Y_k ≥ 0} = 1 and ∑_{k=1}^∞ E[Y_k] < ∞. Then
Y ≡ ∑_{k=1}^∞ Y_k < ∞ a.s.
and E[Y] = ∑_{k=1}^∞ E[Y_k].
Proof. By the monotone convergence theorem,
E[Y] = lim_{n→∞} E[∑_{k=1}^n Y_k] = ∑_{k=1}^∞ E[Y_k] < ∞.
Since E[Y] < ∞, P{Y < ∞} = 1.
Fatou’s lemma
Lemma 2.8 If Xn ≥ 0 and Xn ⇒ X , then lim inf E[Xn] ≥ E[X].
Proof. Since E[Xn] ≥ E[Xn ∧M ] we have
lim inf E[Xn] ≥ lim inf E[Xn ∧M ] = E[X ∧M ].
By the Monotone Convergence Theorem E[X ∧ M] → E[X] and the lemma follows.
Dominated Convergence Theorem
Theorem 2.9 Assume X_n ⇒ X, Y_n ⇒ Y, |X_n| ≤ Y_n, and E[Y_n] → E[Y] < ∞. Then E[X_n] → E[X].
Proof. For simplicity, assume in addition that X_n + Y_n ⇒ X + Y and Y_n − X_n ⇒ Y − X (otherwise consider subsequences along which (X_n, Y_n) ⇒ (X, Y)). Then by Fatou's lemma, lim inf E[X_n + Y_n] ≥ E[X + Y] and lim inf E[Y_n − X_n] ≥ E[Y − X]. From these observations, lim inf E[X_n] + lim E[Y_n] ≥ E[X] + E[Y], and hence lim inf E[X_n] ≥ E[X]. Similarly lim inf E[−X_n] ≥ E[−X] and lim sup E[X_n] ≤ E[X].
Markov inequality
Lemma 2.10
P{|X| > a} ≤ E[|X|]/a, a > 0.
Proof. Note that |X| ≥ a 1_{|X|>a}. Taking expectations proves the desired inequality.
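A one-line Monte Carlo check of the inequality (a minimal sketch; the exponential distribution, with E[|X|] = 1, is an arbitrary illustration choice):

import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=1_000_000)

for a in [1.0, 2.0, 5.0]:
    lhs = np.mean(np.abs(X) > a)    # P{|X| > a}
    rhs = np.mean(np.abs(X)) / a    # E[|X|]/a
    print(a, lhs, rhs, lhs <= rhs)  # the bound holds, loosely for large a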
Linear spaces
A set L is a real linear space if there is a notion of scalar multiplication (a, f) ∈ R × L → af ∈ L and addition (f, g) ∈ L × L → f + g ∈ L with the following properties:
1. Associativity: f + (g + h) = (f + g) + h
2. Commutativity: f + g = g + f
3. Existence of identity: f + 0 = f
4. Existence of an inverse: f + (−f) = 0
5. Distributivity: a(f + g) = af + ag and (a+ b)f = af + bf
6. Compatible with multiplication in R: a(bf) = (ab)f
7. Scalar identity: 1f = f
Norms
Definition 2.11 ‖·‖ : L → [0, ∞) is a norm if
1. ‖af‖ = |a|‖f‖
2. ‖f + g‖ ≤ ‖f‖+ ‖g‖ (triangle inequality)
3. ‖f‖ = 0 implies f = 0.
Lp spaces
For 1 ≤ p < ∞, L^p is the collection of random variables X with E[|X|^p] < ∞, and the L^p-norm is defined by ‖X‖_p = E[|X|^p]^{1/p}.
L^∞ is the collection of random variables X such that P{|X| ≤ c} = 1 for some c < ∞, and ‖X‖_∞ = inf{c : P{|X| ≤ c} = 1}.
Properties of Lp norms
1) ‖X‖_p = 0 implies X = 0 a.s.
2) |E[XY]| ≤ ‖X‖_p ‖Y‖_q, 1/p + 1/q = 1.
3) ‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p.
Inequalities for p = q = 2
Schwarz inequality: Note that
0 ≤ E[(aX + bY)²] = a²E[X²] + 2abE[XY] + b²E[Y²].
Assume that E[XY] ≤ 0 (otherwise replace X by −X) and take a, b > 0. Then
−E[XY] ≤ (a/(2b)) E[X²] + (b/(2a)) E[Y²].
Take a = ‖Y‖_2 and b = ‖X‖_2.
Triangle inequality: We have
‖X + Y‖_2² = E[(X + Y)²]
= E[X²] + 2E[XY] + E[Y²]
≤ ‖X‖_2² + 2‖X‖_2‖Y‖_2 + ‖Y‖_2²
= (‖X‖_2 + ‖Y‖_2)².
Norms determine metrics
It follows that r_p(X, Y) = ‖X − Y‖_p defines a metric on L^p, the space of random variables satisfying E[|X|^p] < ∞. (Note that we identify two random variables that differ on a set of probability zero.)
A sequence in a metric space is Cauchy if
lim_{n,m→∞} r_p(X_n, X_m) = 0
and a metric space is complete if every Cauchy sequence has a limit.
Completeness of Lp spaces
Lemma 2.12 For 1 ≤ p ≤ ∞, Lp is complete.
Proof. For example, in the case p = 1, suppose {X_n} is Cauchy and let n_k satisfy sup_{m>n_k} ‖X_m − X_{n_k}‖_1 = sup_{m>n_k} E[|X_m − X_{n_k}|] ≤ 4^{−k}. Then ∑_{k=1}^∞ E[|X_{n_{k+1}} − X_{n_k}|] < ∞, so by Lemma 2.7, Y = ∑_{k=1}^∞ |X_{n_{k+1}} − X_{n_k}| < ∞ a.s., and with probability one, the series
X ≡ X_{n_1} + ∑_{k=1}^∞ (X_{n_{k+1}} − X_{n_k}) = lim_{k→∞} X_{n_k}
is absolutely convergent. It follows that |X − X_{n_k}| ≤ Y, and by the dominated convergence theorem and the Cauchy property,
lim_{m→∞} ‖X_m − X‖_1 ≤ lim_{k,m→∞} (‖X − X_{n_k}‖_1 + ‖X_{n_k} − X_m‖_1) = 0.
More on convergence in probability
Since by the Markov inequality
P{|X_n − X| ≥ ε} ≤ E[|X_n − X|^p]/ε^p,
convergence in L^p implies convergence in probability.
Lemma 2.13 Convergence in probability is metrizable by taking
ρ_0(X, Y) = inf{ε > 0 : P{|X − Y| ≥ ε} ≤ ε},
and the space of real-valued random variables with metric ρ_0 (sometimes denoted L^0) is complete.
3. Continuous time stochastic processes.
• Random variables in a function space
• Properties of cadlag functions
• Filtrations
• Stopping times
• Poisson process
• Brownian motion
General assumption: (Ω,F , P ) is a complete probability space.
Random variables in a function space
A continuous time stochastic process is a random function defined on the time interval [0, ∞).
For each ω ∈ Ω, X(·, ω) is a real or vector-valued function (or more generally, E-valued for some complete, separable metric space E).
We assume that all stochastic processes are cadlag, that is, for each ω ∈ Ω, X(·, ω) is a right continuous function with left limits at each t > 0.
D_E[0, ∞) will denote the collection of cadlag E-valued functions on [0, ∞). D_E[0, ∞) is sometimes referred to as Skorohod space.
Properties of cadlag functions
Lemma 3.1 For each ε > 0, a cadlag function has, at most, finitely many discontinuities of magnitude greater than ε in any compact time interval, and hence at most countably many discontinuities in [0, ∞).
Proof. If there were infinitely many values of t ∈ [0, T] with
r(x(t), x(t−)) > ε,
this set, call it Γ_{ε,T}, would have a right or left limit point, destroying the cadlag property. The collection of all discontinuities is the union of Γ_{ε,T} over rational ε and T and, hence, is countable.
Cadlag processes are determined by countably many time points
If X is a cadlag process, then it is completely determined by the countable family of random variables {X(t) : t rational}.
It is possible to define a metric on D_E[0, ∞) so that it becomes a complete, separable metric space.
The distribution of an E-valued, cadlag process is then defined by µ_X(B) = P{X(·) ∈ B} for B ∈ B(D_E[0, ∞)).
Process distribution determined by finite dimensionaldistributions
Theorem 3.2 Let X be an E-valued, cadlag process. Then µ_X on D_E[0, ∞) is determined by its finite dimensional distributions {µ_{t_1,t_2,...,t_n} : 0 ≤ t_1 ≤ t_2 ≤ · · · ≤ t_n; n ≥ 0}, where
µ_{t_1,t_2,...,t_n}(Γ) = P{(X(t_1), X(t_2), . . . , X(t_n)) ∈ Γ}, Γ ∈ B(E^n).
Filtrations
Definition 3.3 A collection of σ-algebras {F_t}, satisfying
F_s ⊆ F_t ⊆ F
for all s ≤ t, is called a filtration.
A stochastic process X is adapted to a filtration {F_t} if X(t) is F_t-measurable for all t ≥ 0.
A filtration {F_t} is complete if F_0 contains all events of probability zero and is right continuous if F_t = ∩_{s>t} F_s.
F_t is interpreted as corresponding to the information available at time t (the amount of information increasing as time progresses). If a process is adapted, then the state of the process at time t is part of the information available at time t.
The natural filtration corresponding to a process
Let X be a stochastic process. Then F_t^X = σ(X(s) : s ≤ t) denotes the smallest σ-algebra such that X(s) is F_t^X-measurable for all s ≤ t. {F_t^X} is called the natural filtration generated by X.
Sometimes the term "natural filtration" is used for the right continuous completion of {F_t^X}. We will denote the right continuous completion of F_t^X by F̄_t^X, that is, assuming (Ω, F, P) is complete,
F̄_t^X = ∩_{s>t} (σ(N) ∨ F_s^X),
where σ(N) denotes the σ-algebra generated by the null sets in F.
Classes of stochastic processes
An E-valued stochastic process X adapted to {F_t} is a Markov process with respect to {F_t} if
E[f(X(t + s))|F_t] = E[f(X(t + s))|X(t)]
for all t, s ≥ 0 and f ∈ B(E), the bounded, measurable functions on E.
A real-valued stochastic process X adapted to {F_t} is a martingale with respect to {F_t} if
E[X(t+ s)|Ft] = X(t) (3.1)
for all t, s ≥ 0.
Stopping times
Definition 3.4 A random variable τ with values in [0, ∞] is an {F_t}-stopping time if
{τ ≤ t} ∈ F_t, ∀ t ≥ 0.
Lemma 3.5 Let X be a cadlag stochastic process that is {F_t}-adapted. If K is closed, τ_K = inf{t : X(t) or X(t−) ∈ K} is a stopping time.
Proof.
{τ_K ≤ t} = {X(t) ∈ K} ∪ (∩_n ∪_{s<t, s∈Q} {X(s) ∈ K^{1/n}}), where K^ε = {x : inf_{y∈K} |x − y| < ε}.
In general, for B ∈ B(R), τ_B = inf{t : X(t) ∈ B} is not a stopping time; however, if (Ω, F, P) is complete and the filtration {F_t} is complete and right continuous, then for any B ∈ B(R), τ_B is a stopping time.
Closure properties of the collection of stopping times
If τ , τ1, τ2 . . . are stopping times and c ≥ 0 is a constant, then
1) τ1 ∨ τ2 and τ1 ∧ τ2 are stopping times.
2) τ + c, τ ∧ c, and τ ∨ c are stopping times.
3) supk τk is a stopping time.
4) If {F_t} is right continuous, then
inf_k τ_k, lim inf_{k→∞} τ_k, lim sup_{k→∞} τ_k
are stopping times.
Discrete approximation of stopping times
Lemma 3.6 Let τ be a stopping time and for n = 1, 2, . . ., define
τ_n = (k + 1)/2^n, if k/2^n ≤ τ < (k + 1)/2^n, k = 0, 1, . . . .
Then {τ_n} is a decreasing sequence of stopping times converging to τ.
Proof. Observe that
{τ_n ≤ t} = {τ_n ≤ [2^n t]/2^n} = {τ < [2^n t]/2^n} ∈ F_t.
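The dyadic discretization itself is easy to picture numerically. A minimal sketch (exponential draws simply stand in for realized values of τ; the measurability claim is the content of the lemma, not of this computation):

import numpy as np

rng = np.random.default_rng(3)
tau = rng.exponential(size=5)  # stand-in realizations of a stopping time

for n in [1, 2, 4, 8]:
    tau_n = (np.floor(tau * 2**n) + 1) / 2**n  # tau_n = (k+1)/2^n on {k/2^n <= tau < (k+1)/2^n}
    print(n, np.max(tau_n - tau))              # tau_n > tau, and tau_n - tau <= 2^(-n)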
Information at a stopping time
Definition 3.7 For a stopping time τ , define
F_τ = {A ∈ F : A ∩ {τ ≤ t} ∈ F_t, ∀ t ≥ 0}.
Then F_τ is a σ-algebra and is interpreted as representing the information available to an observer at the random time τ. Occasionally, one also uses
F_{τ−} = σ{A ∩ {t < τ} : A ∈ F_t, t ≥ 0} ∨ F_0.
Properties of the information σ-algebras
Lemma 3.8 If τ_1 and τ_2 are stopping times and τ_1 ≤ τ_2, then F_{τ_1} ⊂ F_{τ_2}.
Proof. Let A ∈ F_{τ_1}. Then
A ∩ {τ_2 ≤ t} = (A ∩ {τ_1 ≤ t}) ∩ {τ_2 ≤ t} ∈ F_t,
and hence A ∈ F_{τ_2}.
Lemma 3.9 τ is F_τ-measurable.
Lemma 3.10 If X is cadlag and {F_t}-adapted and τ is a stopping time, then X(τ) is F_τ-measurable and X(τ ∧ ·) is {F_t}-adapted.
Proof. Let τ_n be as in Lemma 3.6. Then
{X(τ_n ∧ t) ≤ c} = ∪_k ({X(k/2^n ∧ t) ≤ c} ∩ {τ_n ∧ t = k/2^n ∧ t}) ∈ F_t,
and X(τ_n ∧ t) is F_t-measurable. By the right continuity of X,
lim_{n→∞} X(τ_n ∧ t) = X(τ ∧ t)
and X(τ ∧ t) is F_t-measurable.
To see that X(τ) is F_τ-measurable, note that
{X(τ) ≤ c} ∩ {τ ≤ t} = ({X(t) ≤ c} ∩ {τ = t}) ∪ ({X(τ ∧ t) ≤ c} ∩ {τ < t}) ∈ F_t.
4. Information and conditional expectation
• Information
• Independence
• Conditional expectation
• Properties of conditional expectations
• Jensen’s inequality
• Functions of known and unknown random variables
• Convergence of conditional expectations
Information
Information obtained by observations of the outcome of a random experiment is represented by a sub-σ-algebra D of the collection of events F. If D ∈ D, then the observer "knows" whether or not the outcome is in D.
Independence
Two σ-algebras D1,D2 are independent if
P (D1 ∩D2) = P (D1)P (D2), ∀D1 ∈ D1, D2 ∈ D2.
An S-valued random variable Y is independent of a σ-algebra D if
P({Y ∈ B} ∩ D) = P{Y ∈ B} P(D), ∀ B ∈ B(S), D ∈ D.
Random variables X and Y are independent if σ(X) and σ(Y) are independent, that is, if
P({X ∈ B_1} ∩ {Y ∈ B_2}) = P{X ∈ B_1} P{Y ∈ B_2}.
Conditional expectation
Interpretation of conditional expectation in L².
Problem: Approximate X ∈ L² using information represented by D such that the mean square error is minimized, i.e., find the D-measurable random variable Y that minimizes E[(X − Y)²].
Solution: Suppose Y is a minimizer. For any ε ≠ 0 and any D-measurable random variable Z ∈ L²,
E[|X − Y|²] ≤ E[|X − Y − εZ|²] = E[|X − Y|²] − 2εE[Z(X − Y)] + ε²E[Z²].
Hence 2εE[Z(X − Y)] ≤ ε²E[Z²]. Since ε is arbitrary, E[Z(X − Y)] = 0 and hence
E[ZX] = E[ZY ] (4.1)
for every D-measurable Z with E[Z2] <∞.
Definition of conditional expectation
Let X be an integrable random variable (that is, E[|X|] < ∞). The conditional expectation of X, denoted E[X|D], is the unique (up to changes on events of probability zero) random variable Y satisfying
a) Y is D-measurable.
b) ∫_D X dP = ∫_D Y dP for all D ∈ D. (∫_D X dP = E[1_D X])
Existence is discussed in the Appendix.
Condition (b) implies that (4.1) holds for all bounded D-measurable random variables.
Verifying Condition (b)
Lemma 4.1 Let C ⊂ F be a collection of events such that Ω ∈ C and C is closed under intersections, that is, if D_1, D_2 ∈ C, then D_1 ∩ D_2 ∈ C. If X and Y are integrable and
∫_D X dP = ∫_D Y dP (4.2)
for all D ∈ C, then (4.2) holds for all D ∈ σ(C) (the smallest σ-algebra containing C).
Proof. The lemma follows by the Dynkin class theorem.
Discrete case
Assume that D = σ(D_1, D_2, . . .) where ∪_{i=1}^∞ D_i = Ω and D_i ∩ D_j = ∅ whenever i ≠ j. Let X be any F-measurable random variable. Then
E[X|D] = ∑_{i=1}^∞ (E[X 1_{D_i}]/P(D_i)) 1_{D_i}
a) The right hand side is D-measurable.
b) Any D ∈ D can be written as D = ∪_{i∈A} D_i, where A ⊂ {1, 2, 3, . . .}. Therefore,
∫_D ∑_{i=1}^∞ (E[X 1_{D_i}]/P(D_i)) 1_{D_i} dP = ∑_{i=1}^∞ (E[X 1_{D_i}]/P(D_i)) ∫_{D∩D_i} 1_{D_i} dP (monotone convergence theorem)
= ∑_{i∈A} (E[X 1_{D_i}]/P(D_i)) P(D_i)
= ∫_D X dP
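This formula is directly computable. A minimal sketch on a sampled space (the uniform sample space, the choice X = ω², and the four-cell partition are all illustration choices):

import numpy as np

rng = np.random.default_rng(4)
omega = rng.uniform(size=100_000)  # sample points from Omega = [0,1) with uniform P
X = omega**2                       # a random variable on this space
cell = (4 * omega).astype(int)     # partition D_i = [i/4, (i+1)/4), i = 0, 1, 2, 3

# E[X|D] = sum_i (E[X 1_{D_i}]/P(D_i)) 1_{D_i}: constant on each cell of the partition
cond = np.zeros_like(X)
for i in range(4):
    cond[cell == i] = X[cell == i].mean()  # Monte Carlo version of E[X 1_{D_i}]/P(D_i)

print(X.mean(), cond.mean())       # E[E[X|D]] = E[X], property (1) on the next slide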
Properties of conditional expectation
Assume that X and Y are integrable random variables and that D is a sub-σ-algebra of F.
1) E[E[X|D]] = E[X]. Just take D = Ω in condition (b).
2) If X ≥ 0 then E[X|D] ≥ 0. The property holds because Y = E[X|D] is D-measurable and ∫_D Y dP = ∫_D X dP ≥ 0 for every D ∈ D. Therefore, Y must be nonnegative a.s.
3) E[aX + bY|D] = aE[X|D] + bE[Y|D]. It is obvious that the RHS is D-measurable, being the linear combination of two D-measurable random variables. Also,
∫_D (aX + bY) dP = a ∫_D X dP + b ∫_D Y dP
= a ∫_D E[X|D] dP + b ∫_D E[Y|D] dP
= ∫_D (aE[X|D] + bE[Y|D]) dP.
4) If X ≥ Y then E[X|D] ≥ E[Y|D]. Use properties (2) and (3) for Z = X − Y.
5) If X is D-measurable, then E[X|D] = X .
6) If Y is D-measurable and YX is integrable, then E[YX|D] = Y E[X|D]. First assume that Y is a simple random variable, i.e., let {D_i}_{i=1}^∞ be a partition of Ω, D_i ∈ D, c_i ∈ R, and define Y = ∑_{i=1}^∞ c_i 1_{D_i}. Then
∫_D YX dP = ∫_D (∑_{i=1}^∞ c_i 1_{D_i}) X dP = ∑_{i=1}^∞ c_i ∫_{D∩D_i} X dP
= ∑_{i=1}^∞ c_i ∫_{D∩D_i} E[X|D] dP = ∫_D (∑_{i=1}^∞ c_i 1_{D_i}) E[X|D] dP
= ∫_D Y E[X|D] dP
For general Y, approximate by a sequence {Y_n}_{n=1}^∞ of simple random variables, for example, defined by Y_n = k/n if k/n ≤ Y < (k+1)/n, k ∈ Z. Then Y_n converges to Y, and the result follows by the dominated convergence theorem.
7) If X is independent of D, then E[X|D] = E[X]. Independence implies that for D ∈ D, E[X 1_D] = E[X] P(D), so
∫_D X dP = E[X 1_D] = E[X] ∫_Ω 1_D dP = ∫_D E[X] dP
Since E[X] is D-measurable, E[X] = E[X|D].
8) If D_1 ⊂ D_2 then E[E[X|D_2]|D_1] = E[X|D_1]. Note that if D ∈ D_1 then D ∈ D_2. Therefore,
∫_D X dP = ∫_D E[X|D_2] dP = ∫_D E[E[X|D_2]|D_1] dP.
Convex functions
A function φ : R → R is convex if and only if for all x and y in R, and λ in [0, 1], φ(λx + (1 − λ)y) ≤ λφ(x) + (1 − λ)φ(y).
Let x_1 < x_2 and y ∈ R. Then
(φ(x_2) − φ(y))/(x_2 − y) ≥ (φ(x_1) − φ(y))/(x_1 − y). (4.3)
Now assume that x_1 < y < x_2 and let x_2 converge to y from above. The left side of (4.3) is bounded below, and its value decreases as x_2 decreases to y. Therefore, the right derivative φ^+ exists at y and
−∞ < φ^+(y) = lim_{x_2→y+} (φ(x_2) − φ(y))/(x_2 − y) < +∞.
Moreover,
φ(x) ≥ φ(y) + φ^+(y)(x − y), ∀x ∈ R. (4.4)
Jensen’s Inequality
Lemma 4.2 If φ is convex then
E[φ(X)|D] ≥ φ(E[X|D]).
Proof. Define M : Ω → R as M = φ+(E[X|D]). From (4.4),
φ(X) ≥ φ(E[X|D]) +M(X − E[X|D]),
and
E[φ(X)|D] ≥ E[φ(E[X|D])|D] + E[M(X − E[X|D])|D]
= φ(E[X|D]) + M E[(X − E[X|D])|D]
= φ(E[X|D]) + M(E[X|D] − E[E[X|D]|D])
= φ(E[X|D]) + M(E[X|D] − E[X|D])
= φ(E[X|D])
Functions of known and unknown random variables
Lemma 4.3 Let X be an S_1-valued, D-measurable random variable and Y be an S_2-valued random variable independent of D. Suppose that ϕ : S_1 × S_2 → R is a Borel measurable function and that ϕ(X, Y) is integrable. Define ψ(x) = E[ϕ(x, Y)]. Then E[ϕ(X, Y)|D] = ψ(X).
Proof. For C ∈ B(S_1 × S_2), define ψ_C(x) = E[1_C(x, Y)]. ψ(X) is D-measurable as X is. For D ∈ D, define µ(C) = E[1_D 1_C(X, Y)] and ν(C) = E[1_D ψ_C(X)]. (µ and ν are measures by the monotone convergence theorem.) If A ∈ B(S_1) and B ∈ B(S_2),
µ(A×B) = E[1D1A(X)1B(Y )]
= E[1D1A(X)]E[1B(Y )]
= E[1D1A(X)E[1B(Y )]] = ν(A×B),
and µ = ν by Lemma 17.3, giving the lemma for ϕ = 1C , C ∈ B(S1 ×S2). For general ϕ, approximate by simple functions.
More general version
Lemma 4.4 Let Y be an S_2-valued random variable (not necessarily independent of D). Suppose that ϕ : S_1 × S_2 → R is a bounded measurable function. Then there exists a measurable ψ : Ω × S_1 → R such that for each x ∈ S_1,
ψ(ω, x) = E[ϕ(x, Y)|D](ω) a.s.
and
E[ϕ(X, Y)|D](ω) = ψ(ω, X(ω)) a.s.
for every D-measurable random variable X .
Example
Let Y : Ω → N be independent of the i.i.d. random variables {X_i}_{i=1}^∞. Then
E[∑_{i=1}^Y X_i | σ(Y)] = Y · E[X_1]. (4.5)
Identity (4.5) follows by taking ϕ(X, Y)(ω) = ∑_{i=1}^{Y(ω)} X_i(ω) and noting that ψ(y) = E[∑_{i=1}^y X_i] = y E[X_1].
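A short simulation check of (4.5), comparing cell averages of the random sum against Y E[X_1] (a minimal sketch; the Poisson law for Y and the exponential law for the X_i, with E[X_1] = 2, are arbitrary illustration choices):

import numpy as np

rng = np.random.default_rng(5)
Y = rng.poisson(3.0, size=50_000)  # N-valued, independent of the X_i
S = np.array([rng.exponential(2.0, size=y).sum() for y in Y])  # S = sum_{i=1}^{Y} X_i

for y in [1, 3, 5]:
    print(y, S[Y == y].mean(), 2.0 * y)  # sample mean of S given Y = y vs. y E[X_1]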
Convergence of conditional expectations
Since
E[|E[X|D] − E[Y|D]|^p] = E[|E[X − Y|D]|^p]   (using linearity)
≤ E[E[|X − Y|^p | D]]   (using Jensen's inequality)
= E[|X − Y|^p]
we have
Lemma 4.5 Let {X_n}_{n=0}^∞ be a sequence of random variables and p ≥ 1. If lim_{n→∞} E[|X − X_n|^p] = 0, then lim_{n→∞} E[|E[X|D] − E[X_n|D]|^p] = 0.
5. Martingales
• Definitions
• Optional sampling theorem
• Doob’s inequalities
• Local martingales
• Quadratic variation
• Martingale convergence theorem
Definitions
A stochastic process X adapted to a filtration {F_t} is a martingale with respect to {F_t} if
E[X(t+ s)|Ft] = X(t) (5.1)
for all t, s ≥ 0. It is a submartingale if
E[X(t+ s)|Ft] ≥ X(t) (5.2)
and a supermartingale if
E[X(t+ s)|Ft] ≤ X(t). (5.3)
Optional sampling theorem
Theorem 5.1 Let X be a martingale and τ_1, τ_2 be stopping times. Then for every t ≥ 0,
E[X(t ∧ τ_2)|F_{τ_1}] = X(t ∧ τ_1 ∧ τ_2).
If τ_2 < ∞ a.s., E[|X(τ_2)|] < ∞ and lim_{t→∞} E[|X(t)| 1_{τ_2>t}] = 0, then
E[X(τ_2)|F_{τ_1}] = X(τ_1 ∧ τ_2).
The same results hold for sub- and supermartingales with = replaced by ≥ (submartingales) and ≤ (supermartingales).
Proof. See, for example, Ethier and Kurtz [2], Theorem 2.2.13.
Doob’s inequalities
Theorem 5.2 If X is a non-negative submartingale, then
P{sup_{s≤t} X(s) ≥ x} ≤ E[X(t)]/x
and for α > 1,
E[sup_{s≤t} X(s)^α] ≤ (α/(α − 1))^α E[X(t)^α].
Proof. Let τ_x = inf{t : X(t) ≥ x} and set τ_2 = t and τ_1 = τ_x. Then from the optional sampling theorem we have that
E[X(t)|F_{τ_x}] ≥ X(t ∧ τ_x) ≥ 1_{τ_x≤t} X(τ_x) ≥ x 1_{τ_x≤t} a.s.
so we have that
E[X(t)] ≥ x P{τ_x ≤ t} = x P{sup_{s≤t} X(s) ≥ x}
See Ethier and Kurtz, Proposition 2.2.16 for the second inequality.
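The first inequality is easy to observe in simulation. A minimal sketch using |M| for a simple random walk martingale M (the path count, horizon, and threshold are arbitrary illustration choices):

import numpy as np

rng = np.random.default_rng(6)
paths, steps = 100_000, 200
xi = rng.choice([-1.0, 1.0], size=(paths, steps))  # mean-zero steps, so M is a martingale
M = np.cumsum(xi, axis=1)
X = np.abs(M)                                      # |M| is a nonnegative submartingale

x = 20.0
lhs = np.mean(X.max(axis=1) >= x)  # P{sup_{s<=t} X(s) >= x}
rhs = X[:, -1].mean() / x          # E[X(t)]/x
print(lhs, rhs, lhs <= rhs)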
Convex transformations
Lemma 5.3 If M is a martingale and ϕ is convex with E[|ϕ(M(t))|] < ∞, then
X(t) ≡ ϕ(M(t))
is a submartingale.
Proof.
E[ϕ(M(t + s))|F_t] ≥ ϕ(E[M(t + s)|F_t])
by Jensen's inequality.
From the above lemma, it follows that if M is a martingale, then
P{sup_{s≤t} |M(s)| ≥ x} ≤ E[|M(t)|]/x (5.4)
and
E[sup_{s≤t} |M(s)|²] ≤ 4E[M(t)²]. (5.5)
Local martingales
M is a local martingale if there exists a sequence of stopping times {τ_n} such that lim_{n→∞} τ_n = ∞ a.s. and for each n, M^{τ_n} ≡ M(· ∧ τ_n) is a martingale. {τ_n} is called a localizing sequence for M.
The total variation of Y up to time t is defined as
T_t(Y) ≡ sup ∑ |Y(t_{i+1}) − Y(t_i)|
where the supremum is over all partitions of the interval [0, t]. Y is an FV-process if T_t(Y) < ∞ for each t > 0.
Fundamental Theorem of local martingales
Theorem 5.4 Let M be a local martingale, and let δ > 0. Then there exist local martingales M̃ and A satisfying M = M̃ + A such that A is FV and the discontinuities of M̃ are bounded by δ.
Proof. See Protter [3], Theorem III.13.
One consequence of this theorem is that any local martingale can be decomposed into an FV process and a local square integrable martingale. Specifically, if γ_c = inf{t : |M̃(t)| ≥ c}, then M̃(· ∧ γ_c) is a square integrable martingale. (Note that |M̃(· ∧ γ_c)| ≤ c + δ.)
Quadratic variation
The quadratic variation of a process Y is defined as
[Y]_t = lim_{max |t_{i+1}−t_i|→0} ∑ (Y(t_{i+1}) − Y(t_i))²
where convergence is in probability.
The limit exists if for every ε > 0 there exists a δ > 0 such that for every partition {t_i} of the interval [0, t] satisfying max |t_{i+1} − t_i| ≤ δ,
P{|[Y]_t − ∑ (Y(t_{i+1}) − Y(t_i))²| ≥ ε} ≤ ε.
Quadratic variation for FV processes
Lemma 5.5 If Y is FV, then [Y]_t = ∑_{s≤t} (Y(s) − Y(s−))² = ∑_{s≤t} ∆Y(s)², where the summation is over the points of discontinuity and ∆Y(s) ≡ Y(s) − Y(s−) is the jump in Y at time s.
For any partition of [0, t],
∑ (Y(t_{i+1}) − Y(t_i))² − ∑_{|Y(t_{i+1})−Y(t_i)|>ε} (Y(t_{i+1}) − Y(t_i))² ≤ ε T_t(Y).
Quadratic variation of martingales
Proposition 5.6 a) If M is a local martingale, then [M]_t exists and is right continuous.
b) If M is a square integrable martingale, then the limit
lim_{max |t_{i+1}−t_i|→0} ∑ (M(t_{i+1}) − M(t_i))²
exists in L¹, and if M(0) = 0, E[M(t)²] = E[[M]_t].
Proof. See, for example, Ethier and Kurtz [2], Proposition 2.3.4.
Square integrable martingales
Lemma 5.7 If M is a square integrable martingale with M(0) = 0, then E[M(t)²] = E[[M]_t].
Proof. Write M(t) = ∑_{i=0}^{m−1} (M(t_{i+1}) − M(t_i)), 0 = t_0 < · · · < t_m = t.
E[M(t)²] = E[(∑_{i=0}^{m−1} (M(t_{i+1}) − M(t_i)))²] (5.6)
= E[∑_{i=0}^{m−1} (M(t_{i+1}) − M(t_i))² + ∑_{i≠j} (M(t_{i+1}) − M(t_i))(M(t_{j+1}) − M(t_j))].
For t_i < t_{i+1} ≤ t_j < t_{j+1},
E[(M(t_{i+1}) − M(t_i))(M(t_{j+1}) − M(t_j))] (5.7)
= E[E[(M(t_{i+1}) − M(t_i))(M(t_{j+1}) − M(t_j))|F_{t_j}]]
= E[(M(t_{i+1}) − M(t_i))(E[M(t_{j+1})|F_{t_j}] − M(t_j))]
= 0.
The lemma follows by the L¹ convergence in Proposition 5.6.
Examples
If M(t) = N(t) − λt where N(t) is a Poisson process with parameter λ, then [M]_t = N(t), and since M(t) is square integrable, the limit exists in L¹.
For standard Brownian motion W, [W]_t = t. To check this identity, apply the law of large numbers to
∑_{k=1}^{[nt]} (W(k/n) − W((k−1)/n))².
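Numerically, the sum of squared increments settles on t as the mesh shrinks. A minimal sketch over [0, 1] (the grid sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(7)
t = 1.0
for n in [10, 100, 10_000]:
    dW = rng.normal(0.0, np.sqrt(t / n), size=n)  # increments W(k/n) - W((k-1)/n)
    print(n, np.sum(dW**2))                       # sum of squared increments -> t = 1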
Martingale properties
Proposition 5.8 If M is a square integrable {F_t}-martingale, then M(t)² − [M]_t is an {F_t}-martingale. In particular, if W is standard Brownian motion, then W(t)² − t is a martingale.
Proof. For t, s ≥ 0, let t = u_1 < u_2 < · · · < u_n = t + s be a partition of [t, t + s]. Since E[(M(u_{j+1}) − M(u_j))(M(u_{i+1}) − M(u_i))|F_t] = 0 for i ≠ j, by the L¹ convergence,
E[M(t + s)²|F_t] = E[(M(t + s) − M(t))²|F_t] + M(t)²
= E[(∑_{i=1}^{n−1} (M(u_{i+1}) − M(u_i)))²|F_t] + M(t)²
= E[∑_{i=1}^{n−1} (M(u_{i+1}) − M(u_i))²|F_t] + M(t)²
= E[[M]_{t+s} − [M]_t|F_t] + M(t)².
Martingale convergence theorem
Theorem 5.9 Let X be a submartingale satisfying sup_t E[|X(t)|] < ∞. Then lim_{t→∞} X(t) exists a.s.
Proof. See, for example, Durrett [1], Theorem 4.2.10.
6. Poisson process and Brownian motion
• Poisson process
• Basic assumptions
• The Poisson process as a renewal process
• Brownian motion
Poisson process
A Poisson process is a model for a series of random observations occurring in time. For example, the process could model the arrivals of customers in a bank, the arrivals of telephone calls at a switch, or the counts registered by radiation detection equipment.
Let N(t) denote the number of observations by time t. We assume that N is a counting process, that is, the observations come one at a time, so N is constant except for jumps of +1. For t < s, N(s) − N(t) is the number of observations in the time interval (t, s].
Basic assumptions
1) The observations occur one at a time.
2) Numbers of observations in disjoint time intervals are indepen-dent random variables, that is, N has independent increments.
3) The distribution of N(t+ a)−N(t) does not depend on t.
Theorem 6.1 Under assumptions 1), 2), and 3), there is a constant λ such that N(s) − N(t) is Poisson distributed with parameter λ(s − t), that is,
P{N(s) − N(t) = k} = ((λ(s − t))^k / k!) e^{−λ(s−t)}.
If Theorem 6.1 holds, then we refer to N as a Poisson process with parameter λ. If λ = 1, we will call N the unit Poisson process.
Time inhomogeneous Poisson processes
Lemma 6.2 If (1) and (2) hold and Λ(t) = E[N(t)] is continuous and Λ(0) = 0, then
N(t) = Y (Λ(t)),
where Y is a unit Poisson process.
Jump times
Let N be a Poisson process with parameter λ, and let S_k be the time of the kth observation. Then
P{S_k ≤ t} = P{N(t) ≥ k} = 1 − ∑_{i=0}^{k−1} ((λt)^i / i!) e^{−λt}, t ≥ 0.
Differentiating to obtain the probability density function gives
f_{S_k}(t) = λ(λt)^{k−1} e^{−λt}/(k − 1)! for t ≥ 0, and f_{S_k}(t) = 0 for t < 0.
The Poisson process as a renewal process
The Poisson process can also be viewed as the renewal process based on a sequence of exponentially distributed random variables.
Theorem 6.3 Let T1 = S1 and for k > 1, Tk = Sk − Sk−1. Then T1, T2, . . .
are independent and exponentially distributed with parameter λ.
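Theorem 6.3 gives a direct way to simulate the process: cumulate exponential interarrival times. A minimal sketch (λ, the horizon, and the replicate count are arbitrary illustration choices):

import numpy as np

rng = np.random.default_rng(8)
lam, t = 2.0, 10.0

def N_t():
    S = np.cumsum(rng.exponential(1.0 / lam, size=100))  # jump times S_k; 100 jumps cover [0, 10]
    return np.searchsorted(S, t, side='right')           # N(t) = number of jumps by time t

counts = np.array([N_t() for _ in range(20_000)])
print(counts.mean(), counts.var())  # both near lam * t = 20, as expected for a Poisson count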
Gaussian distributions
(ξ_1, . . . , ξ_d) has a Gaussian distribution on R^d if ∑_{k=1}^d a_k ξ_k is a real-valued Gaussian random variable for each (a_1, . . . , a_d) ∈ R^d.
Recall that the density function of a Gaussian (normal) random variable with expectation µ and variance σ² is given by
f_{µ,σ}(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
Lemma 6.4 If (ξ_1, . . . , ξ_d) is Gaussian distributed, then the joint distribution is determined by µ_i = E[ξ_i] and Cov(ξ_i, ξ_j), i, j = 1, . . . , d.
Brownian motion
Standard Brownian motion W is a Gaussian process with E[W(t)] = 0 and Cov(W(t), W(s)) = t ∧ s.
Equivalently, standard Brownian motion is a Gaussian process with mean zero and stationary, independent increments satisfying
Var(W(t + s) − W(t)) = s.
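Both characterizations suggest the standard simulation: cumulate independent Gaussian increments on a grid, then check the covariance. A minimal sketch (grid size, horizon, and path count are arbitrary):

import numpy as np

rng = np.random.default_rng(9)
n, T, paths = 1000, 1.0, 10_000
dt = T / n
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(paths, n)), axis=1)  # W at grid times i*dt

i_s, i_t = 250 - 1, 500 - 1  # grid indices for s = 0.25 and t = 0.5
print(W[:, i_t].mean())                    # E[W(t)] = 0
print(np.cov(W[:, i_s], W[:, i_t])[0, 1])  # Cov(W(s), W(t)) = s ∧ t = 0.25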
Properties of Brownian motion
Proposition 6.5 Standard Brownian motion, W, is both a martingale and a Markov process.
Proof. Let Ft = σ(W (s) : s ≤ t). Then
E[W (t+ s)|Ft] = E[W (t+ s)−W (t) +W (t)|Ft]
= E[W (t+ s)−W (t)|Ft] + E[W (t)|Ft]
= E[W (t+ s)−W (t)] + E[W (t)|Ft]
= E[W(t)|F_t] = W(t)
Define T (s)f(x) = E[f(x+W (s))], and note that
E[f(W (t+ s))|Ft] = E[f(W (t+ s)−W (t) +W (t))|Ft]
= T (s)f(W (t))
= E[f(W (t+ s))|W (t)]
7. Stochastic integrals
• Definition
• Existence for finite variation processes
• Existence for square integrable martingales
• L2 isometry
• General existence result
• Semimartingales
• Approximation of stochastic integrals
• Change of integrator
• Change of time variable
• Other definitions
Stochastic integrals for cadlag processes
Let X and Y be cadlag processes, and let {t_i} denote a partition of the interval [0, t]. If the limit as max |t_{i+1} − t_i| → 0 exists in probability,
∫_0^t X(s−) dY(s) ≡ lim ∑ X(t_i)(Y(t_{i+1}) − Y(t_i)). (7.1)
For W, standard Brownian motion,
∫_0^t W(s) dW(s) = lim ∑ W(t_i)(W(t_{i+1}) − W(t_i)) (7.2)
= lim ∑ (W(t_i)W(t_{i+1}) − (1/2)W(t_{i+1})² − (1/2)W(t_i)²) + ∑ ((1/2)W(t_{i+1})² − (1/2)W(t_i)²)
= (1/2)W(t)² − lim (1/2) ∑ (W(t_{i+1}) − W(t_i))²
= (1/2)W(t)² − (1/2)t.
Significance of evaluation at the left end point
If we replace t_i by t_{i+1} in (7.2), we obtain
lim ∑ W(t_{i+1})(W(t_{i+1}) − W(t_i))
= lim ∑ (−W(t_i)W(t_{i+1}) + (1/2)W(t_{i+1})² + (1/2)W(t_i)²) + ∑ ((1/2)W(t_{i+1})² − (1/2)W(t_i)²)
= (1/2)W(t)² + lim (1/2) ∑ (W(t_{i+1}) − W(t_i))²
= (1/2)W(t)² + (1/2)t.
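Both limits are visible numerically. A minimal sketch comparing the left and right endpoint sums on a fine grid of [0, 1]:

import numpy as np

rng = np.random.default_rng(10)
n, t = 100_000, 1.0
dW = rng.normal(0.0, np.sqrt(t / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])  # W(t_i) on the grid t_i = i*t/n

left = np.sum(W[:-1] * dW)   # sum W(t_i)(W(t_{i+1}) - W(t_i))
right = np.sum(W[1:] * dW)   # sum W(t_{i+1})(W(t_{i+1}) - W(t_i))
print(left, 0.5 * W[-1]**2 - 0.5 * t)   # left sum ≈ W(t)²/2 − t/2
print(right, 0.5 * W[-1]**2 + 0.5 * t)  # right sum ≈ W(t)²/2 + t/2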
Similarly, if N is a Poisson process
lim ∑ N(t_i)(N(t_{i+1}) − N(t_i)) = ∑_{k=1}^{N(t)} (k − 1)
while
lim ∑ N(t_{i+1})(N(t_{i+1}) − N(t_i)) = ∑_{k=1}^{N(t)} k
Definition of the stochastic integral
For any partition {t_i} of [0, ∞), 0 = t_0 < t_1 < t_2 < . . ., and any cadlag x and y, define
S(t, {t_i}, x, y) = ∑ x(t_i)(y(t ∧ t_{i+1}) − y(t ∧ t_i)).
Definition 7.1 For stochastic processes X and Y, define Z = ∫X_− dY if for each T > 0 and each ε > 0, there exists a δ > 0 such that
P{sup_{t≤T} |Z(t) − S(t, {t_i}, X, Y)| ≥ ε} ≤ ε
for all partitions {t_i} satisfying max |t_{i+1} − t_i| ≤ δ.
Example
If X is piecewise constant, that is, for some collection of random variables {ξ_i} and random variables {τ_i} satisfying 0 = τ_0 < τ_1 < · · ·,
X = ∑ ξ_i 1_{[τ_i,τ_{i+1})},
then
∫_0^t X(s−) dY(s) = ∑ ξ_i (Y(t ∧ τ_{i+1}) − Y(t ∧ τ_i))
= ∑ X(τ_i)(Y(t ∧ τ_{i+1}) − Y(t ∧ τ_i)).
Conditions for existence: Finite variation processes
The total variation of Y up to time t is defined as
T_t(Y) ≡ sup ∑ |Y(t_{i+1}) − Y(t_i)|
where the supremum is over all partitions of the interval [0, t].
Proposition 7.2 T_t(f) < ∞ for each t > 0 if and only if there exist monotone increasing functions f_1, f_2 such that f = f_1 − f_2. If T_t(f) < ∞, then f_1 and f_2 can be selected so that T_t(f) = f_1 + f_2. If f is cadlag, then T_t(f) is cadlag.
Proof. Note that
T_t(f) − f(t) = sup ∑ (|f(t_{i+1}) − f(t_i)| − (f(t_{i+1}) − f(t_i)))
is an increasing function of t, as is T_t(f) + f(t).
Existence
Theorem 7.3 If Y is of finite variation, then ∫X_− dY exists for all X, ∫X_− dY is cadlag, and if Y is continuous, ∫X_− dY is continuous. (Recall that we are assuming throughout that X is cadlag.)
Proof. Let {t_i}, {s_i} be partitions. Let {u_i} be a refinement of both. Then there exist k_i, l_i, k′_i, l′_i such that
Y(t_{i+1}) − Y(t_i) = ∑_{j=k_i}^{l_i} (Y(u_{j+1}) − Y(u_j))
Y(s_{i+1}) − Y(s_i) = ∑_{j=k′_i}^{l′_i} (Y(u_{j+1}) − Y(u_j)).
Define t(u) = t_i for t_i ≤ u < t_{i+1} and s(u) = s_i for s_i ≤ u < s_{i+1}, so that
|S(t, {t_i}, X, Y) − S(t, {s_i}, X, Y)| (7.3)
= |∑ X(t(u_i))(Y(u_{i+1} ∧ t) − Y(u_i ∧ t)) − ∑ X(s(u_i))(Y(u_{i+1} ∧ t) − Y(u_i ∧ t))|
≤ ∑ |X(t(u_i)) − X(s(u_i))| |Y(u_{i+1} ∧ t) − Y(u_i ∧ t)|.
There is a measure µ_Y such that T_t(Y) = µ_Y(0, t]. Since |Y(b) − Y(a)| ≤ µ_Y(a, b], the right side of (7.3) is less than
∑ |X(t(u_i)) − X(s(u_i))| µ_Y(u_i ∧ t, u_{i+1} ∧ t]
= ∑ ∫_{(u_i∧t, u_{i+1}∧t]} |X(t(u−)) − X(s(u−))| µ_Y(du)
= ∫_{(0,t]} |X(t(u−)) − X(s(u−))| µ_Y(du).
As max |t_{i+1} − t_i| → 0 and max |s_{i+1} − s_i| → 0,
lim |X(t(u−)) − X(s(u−))| = 0,
so
∫_{(0,t]} |X(t(u−)) − X(s(u−))| µ_Y(du) → 0 (7.4)
by the bounded convergence theorem. Since the integral in (7.4) is monotone in t, the convergence is uniform on bounded time intervals.
Representation of quadratic variation
Note that
∑ (Y(t_{i+1}) − Y(t_i))² = Y(t)² − Y(0)² − 2 ∑ Y(t_i)(Y(t_{i+1}) − Y(t_i))
so that
[Y]_t = Y(t)² − Y(0)² − 2 ∫_0^t Y(s−) dY(s)
and [Y]_t exists if and only if ∫Y_− dY exists.
Conditions for existence: Square integrable martingales
If M is a square integrable martingale and X is bounded (by a constant) and adapted, then for any partition {t_i},
Y(t) = S(t, {t_i}, X, M) = ∑ X(t_i)(M(t ∧ t_{i+1}) − M(t ∧ t_i))
is a square-integrable martingale. (In fact, each summand is a square-integrable martingale.)
Theorem 7.4 Suppose M is a local square integrable {F_t}-martingale and X is cadlag and {F_t}-adapted. Then ∫X_− dM exists.
Reduction to bounded X and square integrable martingale M
Lemma 7.5 Let X and M be as in Theorem 7.4. Let {τ_n} be a localizing sequence for M and define X_k = (X ∧ k) ∨ (−k). Then
P{sup_{t≤T} |S(t, {t_i}, X, M) − S(t, {s_i}, X, M)| ≥ ε}
≤ P{τ_n ≤ T} + P{sup_{t≤T} |X(t)| > k}
+ P{sup_{t≤T} |S(t, {t_i}, X_k, M^{τ_n}) − S(t, {s_i}, X_k, M^{τ_n})| ≥ ε}
Proof. [of Theorem 7.4] By Lemma 7.5, it is enough to consider M a square integrable martingale and X satisfying |X(t)| ≤ C. Then, for any partition {t_i}, S(t, {t_i}, X, M) is a square integrable martingale. For two partitions {t_i} and {s_i}, define {u_i}, t(u), and s(u) as in the proof of Theorem 7.3. Recall that t(u_i), s(u_i) ≤ u_i, so X(t(u)) and X(s(u)) are {F_u}-adapted.
By Doob's inequality and the properties of martingales,
E[sup_{t≤T} (S(t, {t_i}, X, M) − S(t, {s_i}, X, M))²] (7.5)
≤ 4E[(S(T, {t_i}, X, M) − S(T, {s_i}, X, M))²]
= 4E[(∑ (X(t(u_i)) − X(s(u_i)))(M(u_{i+1} ∧ T) − M(u_i ∧ T)))²]
= 4E[∑ (X(t(u_i)) − X(s(u_i)))²(M(u_{i+1} ∧ T) − M(u_i ∧ T))²]
= 4E[∑ (X(t(u_i)) − X(s(u_i)))²([M]_{u_{i+1}∧T} − [M]_{u_i∧T})].
[M] is nondecreasing and so determines a measure by µ_{[M]}(0, t] = [M]_t, and it follows that
E[∑ (X(t(u_i)) − X(s(u_i)))²([M]_{u_{i+1}∧T} − [M]_{u_i∧T})] (7.6)
= E[∫_{(0,T]} (X(t(u−)) − X(s(u−)))² µ_{[M]}(du)],
since X(t(u)) and X(s(u)) are constant between u_i and u_{i+1}. Since
|∫_{(0,t]} (X(t(u)) − X(s(u)))² µ_{[M]}(du)| ≤ 4C² µ_{[M]}(0, t],
the right side of (7.6) goes to zero as max |t_{i+1} − t_i| → 0 and max |s_{i+1} − s_i| → 0. Consequently, ∫_0^t X(s−) dM(s) exists by the completeness of L², or more precisely, by the completeness of the space of processes with norm
‖Z‖_T = √(E[sup_{t≤T} |Z(t)|²]).
Completeness in a space of stochastic processes
Lemma 7.6 Let H_T be the space of cadlag, R-valued stochastic processes on [0, T] with norm ‖Z‖_T = √(E[sup_{t≤T} |Z(t)|²]). Then H_T is complete.
Proof. Let {Z_n} be a Cauchy sequence, and let {n_k} be an increasing subsequence satisfying ‖Z_m − Z_{n_k}‖_T ≤ 4^{−k} for m ≥ n_k. Since
P{sup_{t≤T} |Z_{n_{k+1}}(t) − Z_{n_k}(t)| ≥ 2^{−k}} ≤ 4^{−k},
Z(t) = lim_{k→∞} Z_{n_k}(t) = Z_{n_1}(t) + ∑_{k=1}^∞ (Z_{n_{k+1}}(t) − Z_{n_k}(t))
converges almost surely, uniformly in t ∈ [0, T]. By Fatou's lemma,
E[sup_{t≤T} |Z(t) − Z_{n_k}(t)|²] ≤ lim_{m→∞} E[sup_{t≤T} |Z_{n_m}(t) − Z_{n_k}(t)|²] ≤ ∑_{l=k}^∞ 4^{−2l},
and
lim sup_{m→∞} ‖Z − Z_m‖_T ≤ lim_{k→∞} lim sup_{m→∞} (‖Z − Z_{n_k}‖_T + ‖Z_{n_k} − Z_m‖_T) = 0.
Continuity properties of integrals
Corollary 7.7 If M is a square integrable martingale and X is adapted, then ∫X_− dM is cadlag. If, in addition, M is continuous, then ∫X_− dM is continuous. If |X| ≤ C for some constant C > 0, then ∫X_− dM is a square integrable martingale.
L2 isometry
Proposition 7.8 Suppose M is a square integrable martingale and
E[∫_0^t X(s−)² d[M]_s] < ∞.
Then ∫X_− dM is a square integrable martingale with
E[(∫_0^t X(s−) dM(s))²] = E[∫_0^t X(s−)² d[M]_s]. (7.7)
Remark 7.9 If W is standard Brownian motion, the identity becomes
E[(∫_0^t X(s−) dW(s))²] = E[∫_0^t X²(s) ds].
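The Brownian form of the isometry is easy to check by Monte Carlo with the left endpoint sums used throughout this section. A minimal sketch with the illustrative integrand X = W (so both sides equal t²/2):

import numpy as np

rng = np.random.default_rng(11)
paths, n, t = 50_000, 200, 1.0
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W = np.hstack([np.zeros((paths, 1)), np.cumsum(dW, axis=1)])

X = W[:, :-1]                              # X(s) = W(s) evaluated at left endpoints
I = np.sum(X * dW, axis=1)                 # approximate ∫_0^t X(s−)dW(s), one value per path
print(np.mean(I**2))                       # E[(∫ X dW)²]
print(np.mean(np.sum(X**2, axis=1) * dt))  # E[∫ X² ds]; both ≈ t²/2 = 0.5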
Proof for simple processes
Proof. Suppose X(t) = ∑ ξ_i 1_{[t_i,t_{i+1})} is an adapted simple process.
E[(∫_0^t X(s−) dM(s))²] = E[∑ X(t_i)²(M(t_{i+1}) − M(t_i))²]
= E[∑ X(t_i)²([M]_{t_{i+1}} − [M]_{t_i})]
= E[∫_0^t X²(s−) d[M]_s].
Proof for bounded adapted processes
Let X be bounded with |X(t)| ≤ C, and for a sequence of partitionstni with limn→∞ supi |tni+1 − tni | = 0, define
Xn(t) = X(tni ), for tni ≤ t < tni+1.∫ t
0Xn(s−)dM(s) =
∑X(tni ) (M(t ∧ tni+1)−M(t ∧ tni )) →
∫ t
0X(s−)dM(s),
where the convergence is in L2. It follows that∫X−dM is a martin-
gale, and
E
[(∫ t
0
X(s−)dM(s)
)2]
= limn→∞
E
[(∫ t
0
Xn(s−)dM(s)
)2]
= limn→∞
E
[∫ t
0
X2n(s−)d[M ]s
]= E
[∫ t
0
X2(s−)d[M ]s
].
The last equality holds by the dominated convergence theorem.
General cadlag, adapted X
DefineXk(t) = (k∧X(t))∨(−k). Then∫ t
0 Xk(s−)dM(s) →∫ t
0 X(s−)dM(s)in probability, and by Fatou’s lemma,
lim infk→∞
E
[(∫ t
0Xk(s−)dM(s)
)2]≥ E
[(∫ t
0X(s−)dM(s)
)2].
limk→∞
E
[(∫ t
0Xk(s−)dM(s)
)2]
= limk→∞
E
[∫ t
0X2
k(s−)d[M ]s
](7.8)
= limk→∞
E
[∫ t
0X2(s−) ∧ k2d[M ]s
]= E
[∫ t
0X2(s−)d[M ]s
]<∞,
so
E
[∫ t
0X2(s−)d[M ]s
]≥ E
[(∫ t
0X(s−)dM(s)
)2].
Since (7.7) holds for bounded X,
E[(∫_0^t X_k(s−) dM(s) − ∫_0^t X_j(s−) dM(s))²] (7.9)
= E[(∫_0^t (X_k(s−) − X_j(s−)) dM(s))²]
= E[∫_0^t |X_k(s−) − X_j(s−)|² d[M]_s]
Since |X_k(s) − X_j(s)|² ≤ 4X(s)², the dominated convergence theorem implies the right side of (7.9) converges to zero as j, k → ∞. Consequently,
∫_0^t X_k(s−) dM(s) → ∫_0^t X(s−) dM(s)
in L², and the left side of (7.8) converges to E[(∫_0^t X(s−) dM(s))²], giving (7.7).
General existence
If ∫_0^t X(s−) dY_1(s) and ∫_0^t X(s−) dY_2(s) exist, then ∫_0^t X(s−) d(Y_1(s) + Y_2(s)) exists and is given by the sum of the other integrals.
Corollary 7.10 If Y = M + V where M is an {F_t}-local martingale and V is an {F_t}-adapted finite variation process, then ∫X_− dY exists for all cadlag, adapted X, ∫X_− dY is cadlag, and if Y is continuous, ∫X_− dY is continuous.
Proof. If M is a local square integrable martingale, then there exists a sequence of stopping times {τ_n} such that M^{τ_n} defined by M^{τ_n}(t) = M(t ∧ τ_n) is a square-integrable martingale. But for t < τ_n,
∫_0^t X(s−) dM(s) = ∫_0^t X(s−) dM^{τ_n}(s),
and hence ∫X_− dM exists. Linearity gives existence for any Y that is the sum of a local square integrable martingale and an adapted FV process. But Theorem 5.4 states that any local martingale is the sum of a local square integrable martingale and an adapted FV process, so the corollary follows.
Semimartingales
Y is an {F_t}-semimartingale if and only if Y = M + V, where M is a local martingale with respect to {F_t} and V is an {F_t}-adapted finite variation process.
Lemma 7.11 If Y is a semimartingale, then Y can be written as Y = M + V, where M is a local square integrable martingale and V is finite variation.
Proof. The result follows by Theorem 5.4.
In particular, we can take M to have discontinuities uniformly bounded by a constant.
Semimartingales with bounded jumps
Lemma 7.12 Let Y be a semimartingale satisfying δ ≥ sup |∆Y(s)| for some δ > 0. Then there exist a local square integrable martingale M and a finite variation process V such that Y = M + V,
sup |M(s) − M(s−)| ≤ δ
sup |V(s) − V(s−)| ≤ 2δ.
Proof. Let Y = M̃ + Ṽ be a decomposition of Y into a local martingale and an FV process. By Theorem 5.4, there exists a local martingale M with discontinuities bounded by δ and an FV process A such that M̃ = M + A. Defining V = A + Ṽ = Y − M, we see that the discontinuities of V are bounded by 2δ.
Closure properties of the collection of semimartingales
Let S be the collection of {F_t}-semimartingales.
Lemma 7.13 a) S is linear.
b) If τ is an {F_t}-stopping time and Y ∈ S, then Y^τ ∈ S. (Y^τ(t) = Y(t ∧ τ))
c) If ϕ is convex and Y ∈ S, then ϕ ∘ Y ∈ S.
Proof. If Y_1, Y_2 ∈ S and Y_i = M_i + V_i, M_i a local martingale and V_i finite variation, then for a, b ∈ R, aM_1 + bM_2 is a local martingale and aV_1 + bV_2 is finite variation.
If τ is an {F_t}-stopping time, M is a local martingale, and V is finite variation, then M^τ is a local martingale and V^τ is finite variation.
Part (c) is Protter [3], Theorem IV.47.
Integrals against finite variation processes
Lemma 7.14 If V is of finite variation, then Z(t) = ∫_0^t X(s−) dV(s) is of finite variation.
Proof. For partitions {t_i} of (a, b],
|Z(b) − Z(a)| = lim |∑ X(t_i)(V(t_{i+1}) − V(t_i))|
≤ lim ∑ |X(t_i)| |V(t_{i+1}) − V(t_i)|
≤ lim ∑ |X(t_i)| (T_{t_{i+1}}(V) − T_{t_i}(V))
= ∫_a^b |X(s−)| dT_s(V)
and hence
T_t(Z) ≤ ∫_0^t |X(s−)| dT_s(V). (7.10)
Stochastic integrals against local square integrable martingales
Lemma 7.15 Let M be a local square integrable martingale, and let X be cadlag and adapted. Then Z(t) = ∫_0^t X(s−) dM(s) is a local square integrable martingale.
Proof. There exist τ_1 ≤ τ_2 ≤ · · ·, τ_n → ∞, such that M^{τ_n} = M(· ∧ τ_n) is a square integrable martingale. Define
γ_n = inf{t : |X(t)| ∨ |X(t−)| ≥ n},
and note that lim_{n→∞} γ_n = ∞. Then setting X_n(t) = (X(t) ∧ n) ∨ (−n),
Z(t ∧ τ_n ∧ γ_n) = ∫_0^{t∧γ_n} X_n(s−) dM^{τ_n}(s)
is a square integrable martingale, and hence Z is a local square integrable martingale.
Stochastic integrals are semimartingales
Lemma 7.16 If Y is a semimartingale and X is cadlag and adapted, then ∫X_− dY is a semimartingale.
Proof. Let Y = M + V, where the discontinuities of M are bounded by 1, and define
∫_0^t X(s−) dY(s) = ∫_0^t X(s−) dM(s) + ∫_0^t X(s−) dV(s) = M̃(t) + Ṽ(t)
Then the first term on the right is a local square integrable martingale and the second term on the right is a finite variation process. In particular, T_t(Ṽ) ≤ ∫_0^t |X(s−)| dT_s(V), and letting
τ_n = inf{t : |X(t)| ∨ |X(t−)| ≥ n or |M(t)| ∨ |M(t−)| ≥ n},
M̃^{τ_n}(t) = ∫_0^t X^{τ_n}(s−) dM^{τ_n}(s)
is a square integrable martingale.
Basic estimate for stochastic integrals
Lemma 7.17 Let Y = M + V be a semimartingale where M is a local square-integrable martingale and V is a finite variation process. Let σ be a stopping time for which E[[M]_{t∧σ}] = E[M(t ∧ σ)²] < ∞, and let τ_c = inf{t : |X(t)| ∨ |X(t−)| ≥ c}. Then
P{sup_{s≤t} |∫_0^s X(u−) dY(u)| > K}
≤ P{σ ≤ t} + P{sup_{s<t} |X(s)| ≥ c} + P{sup_{s≤t∧σ∧τ_c} |∫_0^s X(u−) dM(u)| > K/2}
+ P{sup_{s≤t∧τ_c} |∫_0^s X(u−) dV(u)| > K/2}
≤ P{σ ≤ t} + P{sup_{s<t} |X(s)| ≥ c} + (16/K²) E[∫_0^{t∧σ∧τ_c} |X(s−)|² d[M]_s]
+ P{∫_0^{t∧τ_c} |X(s−)| dT_s(V) > K/2}
≤ P{σ ≤ t} + P{sup_{s<t} |X(s)| ≥ c} + 16c² E[[M]_{t∧σ}]/K² + P{T_t(V) ≥ (2c)^{−1} K}.
Proof. The first and third inequalities are immediate. The second follows by applying Doob's inequality to the square integrable martingale
∫_0^{s∧σ∧τ_c} X(u−) dM(u)
and by (7.10).
Approximation of stochastic integrals
Corollary 7.18 Suppose Y is a semimartingale, X_1, X_2, X_3, . . . are cadlag and adapted, and
lim_{n→∞} sup_{t≤T} |X_n(t) − X(t)| = 0 (7.11)
in probability for each T > 0. Then X is cadlag and adapted and
lim_{n→∞} sup_{t≤T} |∫_0^t X_n(s−) dY(s) − ∫_0^t X(s−) dY(s)| = 0
in probability.
Proof. In Lemma 7.17, M is a local, square-integrable martingale, so P{σ ≤ t} can be made arbitrarily small. Replacing X by X_n − X in the first inequality, the remaining terms converge to zero by (7.11) and various applications of the bounded and dominated convergence theorems.
Approximation with random partitions
Theorem 7.19 Let Y be a semimartingale and X be cadlag and adapted. For each n, let 0 = τ_0^n ≤ τ_1^n ≤ τ_2^n ≤ · · · be stopping times, and suppose that lim_{k→∞} τ_k^n = ∞ and lim_{n→∞} sup_k |τ_{k+1}^n − τ_k^n| = 0. Then for each T > 0,
lim_{n→∞} sup_{t≤T} |S(t, {τ_k^n}, X, Y) − ∫_0^t X(s−) dY(s)| = 0.
Proof. If Y is FV, then the proof is exactly the same as for Theorem 7.3 (which is an ω by ω argument). If Y is a square integrable martingale and X is bounded by a constant, then defining τ^n(u) = τ_k^n for τ_k^n ≤ u < τ_{k+1}^n,
E[(S(t, {τ_k^n}, X, Y) − ∫_0^t X(s−) dY(s))²] = E[∫_0^t (X(τ^n(u−)) − X(u−))² d[Y]_u]
and the result follows by the dominated convergence theorem.
Change of integrator

Lemma 7.20 Let Y be a semimartingale, and let X and U be cadlag and adapted. Suppose Z(t) = ∫_0^t X(s−)dY(s). Then

∫_0^t U(s−)dZ(s) = ∫_0^t U(s−)X(s−)dY(s).
Proof. Let {t_i} be a partition of [0,∞), define t(s) = t_i for t_i ≤ s < t_{i+1}, and note that s → U(t(s)) is a cadlag, adapted process. Then

∫_0^t U(s−)dZ(s) = lim Σ U(t_i)(Z(t∧t_{i+1}) − Z(t∧t_i))
= lim Σ U(t_i∧t) ∫_{t∧t_i}^{t∧t_{i+1}} X(s−)dY(s)
= lim Σ ∫_{t∧t_i}^{t∧t_{i+1}} U(t_i∧t)X(s−)dY(s)
= lim Σ ∫_{t∧t_i}^{t∧t_{i+1}} U(t(s−))X(s−)dY(s)
= lim ∫_0^t U(t(s−))X(s−)dY(s)
= ∫_0^t U(s−)X(s−)dY(s).

The last limit follows from the fact that U(t(s−)) → U(s−) as max|t_{i+1} − t_i| → 0, by splitting the integral into martingale and finite variation parts and arguing as in the proofs of Theorems 7.3 and 7.4.
Example

Let τ be a stopping time. Then U(t) = 1_{[0,τ)}(t) is cadlag and adapted, and

Y^τ(t) = Y(t∧τ) = ∫_0^t 1_{[0,τ)}(s−)dY(s)

and

∫_0^{t∧τ} X(s−)dY(s) = ∫_0^t 1_{[0,τ)}(s−)X(s−)dY(s) = ∫_0^t X(s−)dY^τ(s).
Localization

Let τ be a stopping time, and define Y^τ by Y^τ(t) = Y(τ∧t) and X^{τ−} by setting X^{τ−}(t) = X(t) for t < τ and X^{τ−}(t) = X(τ−) for t ≥ τ.

If Y is a local martingale, then Y^τ is a local martingale.

If X is cadlag and adapted, then X^{τ−} is cadlag and adapted.

If τ = inf{t : X(t) ∨ X(t−) ≥ c}, then X^{τ−} ≤ c. Note that

S(t∧τ, {t_i}, X, Y) = S(t, {t_i}, X^{τ−}, Y^τ). (7.12)

Lemma 7.21 If Y is a semimartingale, X is cadlag and adapted, and τ is a stopping time, then

∫_0^{t∧τ} X(s−)dY(s) = ∫_0^t X^{τ−}(s−)dY^τ(s).
Approximation by bounded semimartingales

Lemma 7.22 Let Y = M + V be a semimartingale, and assume (without loss of generality) that sup_s |ΔM(s)| ≤ δ. Let

A(t) = sup_{s≤t}(|M(s)| + |V(s)| + [M]_s + T_s(V))

and σ_δ = inf{t : A(t) ≥ δ}, and define M^δ ≡ M^{σ_δ}, V^δ ≡ V^{σ_δ−}, and Y^δ ≡ M^δ + V^δ. Then Y^δ(t) = Y(t) for t < σ_δ, lim_{δ→∞} σ_δ = ∞, |Y^δ| ≤ 2δ, sup_s |ΔY^δ(s)| ≤ 3δ, [M^δ]_t ≤ δ + δ^2, and T_t(V^δ) ≤ δ.
Change of time variable

∫_0^t X(s−)dY(s) = lim Σ X(t_i)(Y(t∧t_{i+1}) − Y(t∧t_i)),

where the {t_i} are a partition of [0,∞). By Theorem 7.19, the same limit holds if we replace the t_i by stopping times. The following lemma is a consequence of this observation.

Lemma 7.23 Let Y be an {F_t}-semimartingale, X be cadlag and {F_t}-adapted, and γ be continuous and nondecreasing with γ(0) = 0. For each u, assume γ(u) is an {F_t}-stopping time. Then G_t = F_{γ(t)} is a filtration, Y∘γ is a {G_t}-semimartingale, X∘γ is cadlag and {G_t}-adapted, and

∫_0^{γ(t)} X(s−)dY(s) = ∫_0^t X∘γ(s−)dY∘γ(s). (7.13)

(Recall that if X is {F_t}-adapted, then X(τ) is F_τ-measurable.)
Proof.

∫_0^t X∘γ(s−)dY∘γ(s) = lim Σ X∘γ(t_i)(Y(γ(t_{i+1}∧t)) − Y(γ(t_i∧t)))
= lim Σ X∘γ(t_i)(Y(γ(t_{i+1})∧γ(t)) − Y(γ(t_i)∧γ(t)))
= ∫_0^{γ(t)} X(s−)dY(s),

where the last limit follows by Theorem 7.19. That Y∘γ is an {F_{γ(t)}}-semimartingale follows from the optional sampling theorem.
Defining a time change

Lemma 7.24 Let A be strictly increasing and adapted with A(0) = 0, and let γ(u) = inf{s : A(s) > u}. Then γ is continuous and nondecreasing, and γ(u) is an {F_t}-stopping time.

Proof. {γ(u) ≤ t} = {A(t) ≥ u} ∈ F_t.
For A and γ as in Lemma 7.24, define B(t) = A∘γ(t), and note that B(t) ≥ t.

Lemma 7.25 Let A, γ, and B be as above, and suppose that Z(t) is nondecreasing with Z(0) = 0. Then

∫_0^{γ(t)} Z(s−)dA(s) = ∫_0^t Z∘γ(s−)dA∘γ(s)
= ∫_0^t Z∘γ(s−)d(B(s) − s) + ∫_0^t Z∘γ(s−)ds
= Z∘γ(t)(B(t) − t) − ∫_0^t (B(s) − s)dZ∘γ(s) − [B, Z∘γ]_t + ∫_0^t Z∘γ(s)ds,

and hence

∫_0^{γ(t)} Z(s−)dA(s) ≤ Z∘γ(t−)(B(t) − t) + ∫_0^t Z∘γ(s)ds.
Connection to Protter's text

The approach to stochastic integration taken here differs somewhat from that taken in Protter [3] in that we assume that all integrands are cadlag and do not introduce the notion of predictability. In fact, however, predictability is simply hidden from view and is revealed in the requirement that the integrands are evaluated at the left end points in the definition of the approximating partial sums. If X is a cadlag integrand in our definition, then the left continuous process X(·−) is the predictable integrand in the usual theory. Consequently, our notation

∫X_− dY and ∫_0^t X(s−)dY(s)

emphasizes this connection.
The definition in Protter

Protter [3] defines H(t) to be simple and predictable if

H(t) = Σ_{i=0}^m ξ_i 1_{(τ_i,τ_{i+1}]}(t),

where τ_0 < τ_1 < ··· are {F_t}-stopping times and the ξ_i are F_{τ_i}-measurable. Note that H is left continuous. Protter defines H·Y by

H·Y(t) = Σ ξ_i (Y(τ_{i+1}∧t) − Y(τ_i∧t)).
Defining

X(t) = Σ ξ_i 1_{[τ_i,τ_{i+1})}(t),

we have H(t) = X(t−) and

H·Y(t) = ∫_0^t X(s−)dY(s),

so the definitions of the stochastic integral are consistent for simple functions. Protter extends the definition of H·Y by continuity, and Corollary 7.18 ensures that the definitions are consistent for all H satisfying H(t) = X(t−), where X is cadlag and adapted.
A metric on cadlag, adapted processes

Let D_R[0,t] denote the space of cadlag, adapted real-valued processes on [0,t]. For Z_1, Z_2 ∈ D_R[0,t], define

ρ_t(Z_1,Z_2) = inf{ε > 0 : P{sup_{s≤t} |Z_1(s) − Z_2(s)| > ε} < ε}.

Lemma 7.26 ρ_t is a metric on D_R[0,t], and (D_R[0,t], ρ_t) is complete.

Note that ∫_0^∞ e^{−t}ρ_t(Z_1,Z_2)dt defines a metric on D_R[0,∞).
A metric on caglad, adapted processes

By Lemma 7.17, if σ is a stopping time,

P{sup_{s≤t} |∫_0^s X_1(u−)dY(u) − ∫_0^s X_2(u−)dY(u)| > δ} (7.14)
≤ P{σ ≤ t} + (4/δ)√(E[∫_0^{t∧σ} |X_1(s−) − X_2(s−)|^2 d[M]_s])
  + P{∫_0^t |X_1(s−) − X_2(s−)|dT_s(V) > δ/2}.

Let Γ_t(X_1,X_2) be the collection of δ > 0 such that there exists a stopping time σ for which the right side of (7.14) is less than δ. Define d^{M,V}_t(X_1,X_2) = inf Γ_t(X_1,X_2), and let D^−_R[0,t] denote the space of adapted processes that are left continuous with right limits at every s ∈ [0,t].
The stochastic integral as a continuous mapping

Lemma 7.27 d^{M,V}_t(X_1,X_2) is a metric on D^−_R[0,t], and the mapping X ∈ D^−_R[0,t] → ∫X dY ∈ D_R[0,t] is uniformly continuous and hence has a unique extension to the completion of (D^−_R[0,t], d^{M,V}_t).
Proof. Let δ_1 ∈ Γ_t(X_1,X_2) and δ_2 ∈ Γ_t(X_2,X_3). Then

P{σ_1 ≤ t} + P{σ_2 ≤ t} ≥ P{σ_1∧σ_2 ≤ t},

(4/δ_1)√(E[∫_0^{t∧σ_1} |X_1(s−) − X_2(s−)|^2 d[M]_s]) + (4/δ_2)√(E[∫_0^{t∧σ_2} |X_2(s−) − X_3(s−)|^2 d[M]_s])
≥ (4/(δ_1+δ_2))√(E[∫_0^{t∧σ_1∧σ_2} |X_1(s−) − X_3(s−)|^2 d[M]_s]),

and

P{∫_0^t |X_1(s−) − X_2(s−)|dT_s(V) > δ_1/2} + P{∫_0^t |X_2(s−) − X_3(s−)|dT_s(V) > δ_2/2}
≥ P{∫_0^t |X_1(s−) − X_3(s−)|dT_s(V) > (δ_1+δ_2)/2},

and hence δ_1 + δ_2 ∈ Γ_t(X_1,X_3), so the triangle inequality follows.
8. Covariation and Ito’s formula
• Covariation
• Properties of cadlag functions
• Computing covariation
• Covariation of stochastic integrals
• Ito’s formula
• Integration by parts
• Kronecker’s lemma
Quadratic covariation

The covariation of Y_1, Y_2 is defined by

[Y_1,Y_2]_t ≡ lim Σ_i (Y_1(t_{i+1}∧t) − Y_1(t_i∧t))(Y_2(t_{i+1}∧t) − Y_2(t_i∧t)), (8.1)

where the {t_i} are partitions of [0,∞) and the limit is in probability as max|t_{i+1} − t_i| → 0.

Note that

[Y_1 + Y_2, Y_1 + Y_2]_t = [Y_1]_t + 2[Y_1,Y_2]_t + [Y_2]_t.

Lemma 8.1 If Y_1, Y_2 are semimartingales, then [Y_1,Y_2]_t exists and

[Y_1,Y_2]_t = Y_1(t)Y_2(t) − Y_1(0)Y_2(0) − ∫_0^t Y_1(s−)dY_2(s) − ∫_0^t Y_2(s−)dY_1(s).
Proof.

[Y_1,Y_2]_t = lim Σ_i (Y_1(t_{i+1}∧t) − Y_1(t_i∧t))(Y_2(t_{i+1}∧t) − Y_2(t_i∧t))
= lim ( Σ (Y_1(t_{i+1}∧t)Y_2(t_{i+1}∧t) − Y_1(t_i∧t)Y_2(t_i∧t))
  − Σ Y_1(t_i∧t)(Y_2(t_{i+1}∧t) − Y_2(t_i∧t))
  − Σ Y_2(t_i∧t)(Y_1(t_{i+1}∧t) − Y_1(t_i∧t)) )
= Y_1(t)Y_2(t) − Y_1(0)Y_2(0) − ∫_0^t Y_1(s−)dY_2(s) − ∫_0^t Y_2(s−)dY_1(s).
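The limit in (8.1) is easy to observe numerically. The following sketch (not part of the notes; it assumes NumPy and uses two correlated Brownian motions, for which [Y_1,Y_2]_t = ρt) evaluates the partition sum on a fine grid:

    import numpy as np

    rng = np.random.default_rng(0)
    t, n, rho = 1.0, 100_000, 0.5
    dt = t / n
    # increments of two Brownian motions with correlation rho
    dY1 = rng.normal(0.0, np.sqrt(dt), n)
    dY2 = rho * dY1 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, np.sqrt(dt), n)
    # partition sum in (8.1) along the grid t_i = i*dt
    print(np.sum(dY1 * dY2), "should be close to", rho * t)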
Increments and discontinuities of cadlag functions

Lemma 8.2 Let X be cadlag. For ε > 0, let

D^X_ε(t) = {s ≤ t : |ΔX(s)| ≥ ε}.

Then

lim sup_{max|t_{i+1}−t_i|→0} max_{(t_i,t_{i+1}]∩D^X_ε(t)=∅} |X(t_{i+1}) − X(t_i)| ≤ ε.

Proof. Suppose not. Then there exist a_n < b_n ≤ t and s ≤ t such that (a_n,b_n]∩D^X_ε(t) = ∅, b_n − a_n → 0, a_n → s, b_n → s, and lim sup |X(b_n) − X(a_n)| > ε. Either a_n < b_n < s, a_n < s ≤ b_n, or s ≤ a_n < b_n. In the first case, lim_{n→∞} |X(b_n) − X(a_n)| = |X(s−) − X(s−)| = 0; in the second, lim_{n→∞} |X(b_n) − X(a_n)| = |X(s) − X(s−)| ≤ ε; and in the third, lim_{n→∞} |X(b_n) − X(a_n)| = |X(s) − X(s)| = 0, a contradiction.
Computation of covariation

Lemma 8.3 Let Y be a finite variation process and X be cadlag. Then

[X,Y]_t = Σ_{s≤t} ΔX(s)ΔY(s).

Remark 8.4 Note that this sum will be zero if X and Y have no simultaneous jumps. In particular, if either X or Y is a finite variation process and either X or Y is continuous, then [X,Y] = 0.
Proof. On [0,t], X has only finitely many discontinuities with |ΔX(s)| ≥ ε. Then

lim_{max|t_{i+1}−t_i|→0} Σ (X(t_{i+1}) − X(t_i))(Y(t_{i+1}) − Y(t_i))
= lim_{max|t_{i+1}−t_i|→0} Σ_{(t_i,t_{i+1}]∩D^X_ε(t)≠∅} (X(t_{i+1}) − X(t_i))(Y(t_{i+1}) − Y(t_i))
+ lim_{max|t_{i+1}−t_i|→0} Σ_{(t_i,t_{i+1}]∩D^X_ε(t)=∅} (X(t_{i+1}) − X(t_i))(Y(t_{i+1}) − Y(t_i)),

where the first term on the right converges to

Σ_{s∈D^X_ε(t)} ΔX(s)ΔY(s)

and the second term on the right is bounded by

ε Σ |Y(t_{i+1}) − Y(t_i)| ≤ ε T_t(Y).
Estimating quadratic variation

Since Σ a_i b_i ≤ √(Σ a_i^2) √(Σ b_i^2), it follows that [X,Y]_t ≤ √([X]_t [Y]_t). From

[X − Y]_t = [X]_t − 2[X,Y]_t + [Y]_t
[X − Y]_t + 2([X,Y]_t − [Y]_t) = [X]_t − [Y]_t
[X − Y]_t + 2[X − Y, Y]_t = [X]_t − [Y]_t,

it follows that

|[X]_t − [Y]_t| ≤ [X − Y]_t + 2√([X − Y]_t [Y]_t). (8.2)

Assuming that [Y]_t < ∞, we have that [X − Y]_t → 0 implies [X]_t → [Y]_t.
Continuous dependence: Martingales

Lemma 8.5 Let M_n, n = 1, 2, 3, . . ., be square-integrable martingales with lim_{n→∞} E[(M_n(t) − M(t))^2] = 0 for all t. Then E[|[M_n]_t − [M]_t|] → 0.

Proof. Since

E[|[M_n]_t − [M]_t|] ≤ E[[M_n − M]_t] + 2E[√([M_n − M]_t [M]_t)]
≤ E[[M_n − M]_t] + 2√(E[[M_n − M]_t] E[[M]_t]),

we have the L^1 convergence of the quadratic variation.
Continuous dependence: Finite variation processes

Lemma 8.6 Suppose sup_n T_t(Y_n) < ∞, sup_{s≤t} |X_n(s) − X(s)| → 0, and sup_{s≤t} |Y_n(s) − Y(s)| → 0 for each t > 0. Then lim_{n→∞} [X_n,Y_n]_t = [X,Y]_t.

Proof. Note that T_t(Y) ≤ sup_n T_t(Y_n) and

|Σ_{s≤t, |ΔX_n(s)|≤ε} ΔX_n(s)ΔY_n(s)| ≤ ε Σ_{s≤t} |ΔY_n(s)| ≤ ε T_t(Y_n).

Since ΔX_n(s) → ΔX(s) and ΔY_n(s) → ΔY(s),

lim sup |[X_n,Y_n]_t − [X,Y]_t| = lim sup |Σ ΔX_n(s)ΔY_n(s) − Σ ΔX(s)ΔY(s)| ≤ ε lim sup(T_t(Y_n) + T_t(Y)).
Continuous dependence: Semimartingales

Lemma 8.7 Let Y_i = M_i + V_i, Y^n_i = M^n_i + V^n_i, i = 1, 2, n = 1, 2, . . ., be semimartingales with M^n_i a local square integrable martingale and V^n_i finite variation. Suppose that there exist stopping times γ_k such that γ_k → ∞ as k → ∞ and, for each t ≥ 0,

lim_{n→∞} E[(M^n_i(t∧γ_k) − M_i(t∧γ_k))^2] = 0,

sup_{i,n} T_t(V^n_i) < ∞, and

lim_{n→∞} sup_{s≤t} |V^n_i(s) − V_i(s)| = 0.

Then [Y^n_1, Y^n_2]_t → [Y_1,Y_2]_t.

Proof. The result follows from Lemmas 8.5 and 8.6 by writing

[Y^n_1, Y^n_2]_t = [M^n_1, M^n_2]_t + [M^n_1, V^n_2]_t + [V^n_1, Y^n_2]_t.
Covariation of stochastic integrals

Lemma 8.8 For i = 1, 2, let Y_i be a semimartingale, X_i be cadlag and adapted, and

Z_i(t) = ∫_0^t X_i(s−)dY_i(s).

Then

[Z_1,Z_2]_t = ∫_0^t X_1(s−)X_2(s−)d[Y_1,Y_2]_s.
Proof. First verify the identity for piecewise constant X_i. Then approximate general X_i by piecewise constant processes and use Lemma 8.7 to pass to the limit.

Let τ^ε_0 = 0 and

τ^ε_{k+1} = inf{t > τ^ε_k : |X_1(t) − X_1(τ^ε_k)| ∨ |X_1(t−) − X_1(τ^ε_k)| ∨ |X_2(t) − X_2(τ^ε_k)| ∨ |X_2(t−) − X_2(τ^ε_k)| ≥ ε},

and let X^ε_i(t) = Σ_{k=0}^∞ X_i(τ^ε_k)1_{[τ^ε_k,τ^ε_{k+1})}(t). Then sup_t |X^ε_i(t) − X_i(t)| ≤ ε, and X^ε_i is cadlag and adapted by Problem 4.

Define Z^ε_i(t) = ∫_0^t X^ε_i(s−)dY_i(s). Then

[Z^ε_1, Z^ε_2]_t = ∫_0^t X^ε_1(s−)X^ε_2(s−)d[Y_1,Y_2]_s

by direct calculation.
Let Y_i = M_i + V_i. Assume that M_1 and M_2 are square integrable martingales (otherwise localize) and that X_1 and X_2 are bounded by a constant (otherwise truncate). Then

sup_{r≤t} |∫_0^r X_i(s−)dV_i(s) − ∫_0^r X^ε_i(s−)dV_i(s)| ≤ ∫_0^t |X_i(s−) − X^ε_i(s−)|dT_s(V_i) → 0,

and, applying the L^2-isometry,

E[(∫_0^t X_i(s−)dM_i(s) − ∫_0^t X^ε_i(s−)dM_i(s))^2] = E[∫_0^t (X_i(s−) − X^ε_i(s−))^2 d[M_i]_s] → 0,

verifying the conditions of Lemma 8.7.
Integrals against quadratic variations

Lemma 8.9 Let X be cadlag and adapted and Y be a semimartingale. Then

lim_{max|t_{i+1}−t_i|→0} Σ X(t_i)(Y(t_{i+1}∧t) − Y(t_i∧t))^2 = ∫_0^t X(s−)d[Y]_s. (8.3)

Proof. Observing that

(Y(t_{i+1}∧t) − Y(t_i∧t))^2 = Y^2(t_{i+1}∧t) − Y^2(t_i∧t) − 2Y(t_i∧t)(Y(t_{i+1}∧t) − Y(t_i∧t))

and applying Lemma 7.20, the left side of (8.3) equals

∫_0^t X(s−)dY^2(s) − ∫_0^t 2X(s−)Y(s−)dY(s).

Since [Y]_t = Y^2(t) − Y^2(0) − ∫_0^t 2Y(s−)dY(s), the lemma follows.
Ito's formula

Theorem 8.10 Let f ∈ C^2, and let Y be a semimartingale. Then

f(Y(t)) = f(Y(0)) + ∫_0^t f'(Y(s−))dY(s) + ∫_0^t (1/2)f''(Y(s−))d[Y]_s (8.4)
+ Σ_{s≤t} ( f(Y(s)) − f(Y(s−)) − f'(Y(s−))ΔY(s) − (1/2)f''(Y(s−))(ΔY(s))^2 ).
Continuous part of quadratic variation

The discontinuities in [Y]_s satisfy Δ[Y]_s = (ΔY(s))^2, so defining the continuous part of the quadratic variation by

[Y]^c_t = [Y]_t − Σ_{s≤t} (ΔY(s))^2,

(8.4) becomes

f(Y(t)) = f(Y(0)) + ∫_0^t f'(Y(s−))dY(s) + ∫_0^t (1/2)f''(Y(s−))d[Y]^c_s (8.5)
+ Σ_{s≤t} ( f(Y(s)) − f(Y(s−)) − f'(Y(s−))ΔY(s) ).
Proof (of Theorem 8.10). Define

Γ_f(x,y) = ( f(y) − f(x) − f'(x)(y−x) − (1/2)f''(x)(y−x)^2 ) / (y−x)^2.

Then

f(Y(t)) = f(Y(0)) + Σ (f(Y(t_{i+1})) − f(Y(t_i))) (8.6)
= f(Y(0)) + Σ f'(Y(t_i))(Y(t_{i+1}) − Y(t_i))
+ (1/2) Σ f''(Y(t_i))(Y(t_{i+1}) − Y(t_i))^2
+ Σ Γ_f(Y(t_i), Y(t_{i+1}))(Y(t_{i+1}) − Y(t_i))^2.

The first three terms converge by previous lemmas.
γ_f(ε) ≡ sup_{|x−y|≤ε} Γ_f(x,y) is a continuous function of ε, and lim_{ε→0} γ_f(ε) = 0. Let D^Y_ε(t) = {s ≤ t : |Y(s) − Y(s−)| ≥ ε}. Then

Σ Γ_f(Y(t_i), Y(t_{i+1}))(Y(t_{i+1}) − Y(t_i))^2
= Σ_{(t_i,t_{i+1}]∩D^Y_ε(t)≠∅} Γ_f(Y(t_i), Y(t_{i+1}))(Y(t_{i+1}) − Y(t_i))^2
+ Σ_{(t_i,t_{i+1}]∩D^Y_ε(t)=∅} Γ_f(Y(t_i), Y(t_{i+1}))(Y(t_{i+1}) − Y(t_i))^2,

where the second term on the right is bounded by

e({t_i}, Y) ≡ γ_f( max_{(t_i,t_{i+1}]∩D^Y_ε(t)=∅} |Y(t_{i+1}) − Y(t_i)| ) Σ (Y(t_{i+1}) − Y(t_i))^2

and

lim sup_{max|t_{i+1}−t_i|→0} e({t_i}, Y) ≤ γ_f(ε)[Y]_t.
The product rule and integration by parts

Let X and Y be semimartingales. Then

X(t)Y(t) = X(0)Y(0) + Σ (X(t_{i+1})Y(t_{i+1}) − X(t_i)Y(t_i))
= X(0)Y(0) + Σ X(t_i)(Y(t_{i+1}) − Y(t_i)) + Σ Y(t_i)(X(t_{i+1}) − X(t_i))
+ Σ (Y(t_{i+1}) − Y(t_i))(X(t_{i+1}) − X(t_i))
= X(0)Y(0) + ∫_0^t X(s−)dY(s) + ∫_0^t Y(s−)dX(s) + [X,Y]_t.

Note that this identity generalizes the usual product rule and provides us with a formula for integration by parts:

∫_0^t X(s−)dY(s) = X(t)Y(t) − X(0)Y(0) − ∫_0^t Y(s−)dX(s) − [X,Y]_t. (8.7)
Solution of a stochastic differential equation

As an application of (8.7), consider the stochastic differential equation dX = −αX dt + dY, or in integrated form,

X(t) = X(0) − ∫_0^t αX(s)ds + Y(t).

Using the integrating factor e^{αt} (there is no covariation term, since e^{αt} is continuous and of finite variation),

e^{αt}X(t) = X(0) + ∫_0^t e^{αs}dX(s) + ∫_0^t X(s−)de^{αs}
= X(0) − ∫_0^t αX(s)e^{αs}ds + ∫_0^t e^{αs}dY(s) + ∫_0^t X(s)αe^{αs}ds,

which gives

X(t) = e^{−αt}(X(0) + ∫_0^t e^{αs}dY(s)) = X(0)e^{−αt} + ∫_0^t e^{−α(t−s)}dY(s).
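As a numerical sanity check of the last formula, the sketch below (an illustration, not part of the notes; it assumes Y = W, a standard Brownian motion, and NumPy) compares an Euler scheme for dX = −αX dt + dY with the explicit solution, discretizing ∫_0^t e^{−α(t−s)}dY(s) on the same grid:

    import numpy as np

    rng = np.random.default_rng(1)
    alpha, t, n, x0 = 2.0, 1.0, 10_000, 1.0
    h = t / n
    dY = rng.normal(0.0, np.sqrt(h), n)   # increments of Y = W

    # Euler scheme for dX = -alpha*X dt + dY
    x = x0
    for dy in dY:
        x += -alpha * x * h + dy

    # explicit solution X(t) = X(0)e^{-alpha t} + int_0^t e^{-alpha(t-s)} dY(s)
    s = np.arange(n) * h
    explicit = x0 * np.exp(-alpha * t) + np.sum(np.exp(-alpha * (t - s)) * dY)
    print(x, "vs", explicit)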
Kronecker's lemma

Lemma 8.11 Let A be positive and nondecreasing with lim_{t→∞} A(t) = ∞. Define

Z(t) = ∫_0^t (1/A(s−))dY(s).

If lim_{t→∞} Z(t) exists a.s., then lim_{t→∞} Y(t)/A(t) = 0 a.s.

Note that

∫_0^t A(s−)dZ(s) = ∫_0^t A(s−)(1/A(s−))dY(s) = Y(t) − Y(0).
Proof. By (8.7),

A(t)Z(t) = Y(t) − Y(0) + ∫_0^t Z(s−)dA(s) + ∫_0^t (1/A(s−))d[Y,A]_s. (8.8)

Rearranging (8.8) gives

Y(t)/A(t) = Z(t) − (1/A(t))∫_0^t Z(s−)dA(s) − (1/A(t)) Σ_{s≤t} (ΔY(s)/A(s−))ΔA(s) + Y(0)/A(t).

The difference between the first and second terms on the right converges to zero. Convergence of Z implies lim_{t→∞} ΔY(t)/A(t−) = 0, so the third term on the right converges to zero, giving the result.
A law of large numbers for martingales

Proposition 8.12 Let M be a local square integrable martingale. Suppose that there exists a constant C such that

sup_t |M(t) − M(t−)|/(t+1) ≤ C a.s.

and

∫_0^∞ (1/(s+1)^2)d[M]_s < ∞ a.s.

Then

lim_{t→∞} M(t)/(t+1) = 0 a.s.

Proof. Let

τ_c = inf{t : ∫_0^t (1/(s+1)^2)d[M]_s ≥ c}.
Then

Z_c(t) = ∫_0^{t∧τ_c} (1/(1+s))dM(s)

is a square integrable martingale with

E[Z_c(t)^2] = E[∫_0^{t∧τ_c} (1/(1+s)^2)d[M]_s] ≤ c + C^2,

and hence lim_{t→∞} Z_c(t) exists a.s. By Lemma 8.11,

lim_{t→∞} M(t∧τ_c)/(t+1) = 0.

Since lim_{c→∞} P{τ_c = ∞} = 1, the proposition follows.
Vector-valued semimartingales

An R^m-valued process Y = (Y_1, . . . , Y_m)^T is a semimartingale if each component is an R-valued semimartingale for a fixed filtration {F_t}.

If X is a cadlag, {F_t}-adapted, M^{d×m}-valued process, then

∫_0^t X_− dY = lim Σ X(t_i)(Y(t∧t_{i+1}) − Y(t∧t_i)).

In particular, if Z = ∫_0^t X_− dY, then

Z_k(t) = Σ_{l=1}^m ∫_0^t X_{kl}(s−)dY_l(s).
Ito's formula for vector-valued semimartingales

Theorem 8.13 Let Y(t) = (Y_1(t), Y_2(t), . . . , Y_m(t))^T (a column vector) be an R^m-valued semimartingale (that is, each component is a semimartingale). Let f ∈ C^2(R^m). Then

f(Y(t)) = f(Y(0)) + Σ_{k=1}^m ∫_0^t ∂_k f(Y(s−))dY_k(s)
+ Σ_{k,l=1}^m (1/2) ∫_0^t ∂_k∂_l f(Y(s−))d[Y_k,Y_l]_s
+ Σ_{s≤t} ( f(Y(s)) − f(Y(s−)) − Σ_{k=1}^m ∂_k f(Y(s−))ΔY_k(s)
  − Σ_{k,l=1}^m (1/2)∂_k∂_l f(Y(s−))ΔY_k(s)ΔY_l(s) ).
Defining

[Y_k,Y_l]^c_t = [Y_k,Y_l]_t − Σ_{s≤t} ΔY_k(s)ΔY_l(s), (8.9)

we have

f(Y(t)) = f(Y(0)) + Σ_{k=1}^m ∫_0^t ∂_k f(Y(s−))dY_k(s) (8.10)
+ Σ_{k,l=1}^m (1/2) ∫_0^t ∂_k∂_l f(Y(s−))d[Y_k,Y_l]^c_s
+ Σ_{s≤t} ( f(Y(s)) − f(Y(s−)) − Σ_{k=1}^m ∂_k f(Y(s−))ΔY_k(s) ).
9. Stochastic differential equations
• Some examples
• Gronwall inequality
• Uniqueness
• Local existence
• Gronwall inequality for SDEs
• Euler approximation
• Existence
• Moment estimates
Examples

Consider the Ito equation

X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds. (9.1)

Define Y(t) = (W(t), t)^T and F(X) = (σ(X), b(X)). Then

X(t) = X(0) + ∫_0^t F(X(s−))dY(s). (9.2)

Consider the stochastic difference equation

X_{n+1} = X_n + σ(X_n)ξ_{n+1} + b(X_n)h, (9.3)

where the ξ_i are iid and h > 0. Define Y_{h,1}(t) = Σ_{k=1}^{[t/h]} ξ_k, Y_{h,2}(t) = [t/h]h, and X_h(t) = X_{[t/h]}. Then

X_h(t) = X_h(0) + ∫_0^t F(X_h(s−))dY_h(s).
Stochastic differential equation

Let Y be an R^m-valued semimartingale, F : R^d → M^{d×m}, and U an R^d-valued, cadlag, adapted process. X is a solution of

X(t) = U(t) + ∫_0^t F(X(s−))dY(s) (9.4)

if X is adapted and (9.4) holds a.s.
Gronwall inequality

Lemma 9.1 Suppose that A is cadlag and nondecreasing, X is cadlag, and

0 ≤ X(t) ≤ ε + ∫_0^t X(s−)dA(s). (9.5)

Then X(t) ≤ ε e^{A(t)}.
Proof. Iterating (9.5),

X(t) ≤ ε + ∫_0^t X(s−)dA(s)
≤ ε + εA(t) + ∫_0^t ∫_0^{s−} X(u−)dA(u)dA(s)
≤ ε + εA(t) + ε∫_0^t A(s−)dA(s) + ∫_0^t ∫_0^{s−} ∫_0^{u−} X(σ−)dA(σ)dA(u)dA(s).

Since A is of finite variation, so that [A]^c_t ≡ 0, Ito's formula yields

e^{A(t)} = 1 + ∫_0^t e^{A(s−)}dA(s) + Σ_{s≤t} (e^{A(s)} − e^{A(s−)} − e^{A(s−)}ΔA(s))
≥ 1 + ∫_0^t e^{A(s−)}dA(s)
≥ 1 + A(t) + ∫_0^t ∫_0^{s−} e^{A(u−)}dA(u)dA(s)
≥ 1 + A(t) + ∫_0^t A(s−)dA(s) + ∫_0^t ∫_0^{s−} ∫_0^{u−} e^{A(v−)}dA(v)dA(u)dA(s).

Continuing the iteration, we see that X(t) ≤ ε e^{A(t)}.
Uniqueness for ODEs

Theorem 9.2 Consider the ordinary differential equation in R^d

Ẋ = dX/dt = F(X),

or in integrated form,

X(t) = X(0) + ∫_0^t F(X(s))ds. (9.6)

Suppose F is Lipschitz, that is, |F(x) − F(y)| ≤ L|x − y| for some constant L. Then for each x_0 ∈ R^d, there exists a unique solution of (9.6) with X(0) = x_0.
Proof (of uniqueness). Suppose X_i(t) = X_i(0) + ∫_0^t F(X_i(s))ds, i = 1, 2. Then

|X_1(t) − X_2(t)| ≤ |X_1(0) − X_2(0)| + ∫_0^t |F(X_1(s)) − F(X_2(s))|ds
≤ |X_1(0) − X_2(0)| + ∫_0^t L|X_1(s) − X_2(s)|ds.

By Gronwall's inequality (take A(t) = Lt),

|X_1(t) − X_2(t)| ≤ |X_1(0) − X_2(0)|e^{tL}.

Hence, if X_1(0) = X_2(0), then X_1(t) ≡ X_2(t).
An inequality for stochastic integrals

Lemma 9.3 Let Y = M + V be a semimartingale as in Lemma 7.17, X be a cadlag, adapted process, and τ be a finite stopping time. Then for any stopping time σ for which E[[M]_{(τ+t)∧σ}] < ∞,

P{sup_{s≤t} |∫_τ^{τ+s} X(u−)dY(u)| > K} (9.7)
≤ P{σ ≤ τ+t} + P{sup_{τ≤s<τ+t} |X(s)| > c}
+ 16c^2 E[[M]_{(τ+t)∧σ} − [M]_{τ∧σ}]/K^2 + P{T_{τ+t}(V) − T_τ(V) ≥ (2c)^{−1}K}.

Proof. The proof is the same as for Lemma 7.17.
Uniqueness of solutions of SDEs

We consider stochastic differential equations of the form

X(t) = U(t) + ∫_0^t F(X(s−))dY(s), (9.8)

where Y is an R^m-valued semimartingale, U is a cadlag, adapted R^d-valued process, and F : R^d → M^{d×m}.

Theorem 9.4 Suppose that there exists L > 0 such that

|F(x) − F(y)| ≤ L|x − y|.

Then there is at most one solution of (9.8).
General nonanticipating equations

Remark 9.5 One can treat more general equations of the form

X(t) = U(t) + ∫_0^t F(X, s−)dY(s), (9.9)

where F : D_{R^d}[0,∞) → D_{M^{d×m}}[0,∞) satisfies

sup_{s≤t} |F(x,s) − F(y,s)| ≤ L sup_{s≤t} |x(s) − y(s)| (9.10)

for all x, y ∈ D_{R^d}[0,∞) and t ≥ 0. Note that, defining x^t by x^t(s) = x(s∧t), (9.10) implies that F is nonanticipating in the sense that F(x,t) = F(x^t,t) for all x ∈ D_{R^d}[0,∞) and all t ≥ 0.
Proof. By Lemma 9.3, for each stopping time τ ≤ T a.s. and t, δ > 0, there exists a constant K_τ(t,δ) such that

P{sup_{s≤t} |∫_τ^{τ+s} X(u−)dY(u)| ≥ K_τ(t,δ)} ≤ δ

for all cadlag, adapted X satisfying |X| ≤ 1. (Take c = 1 in (9.7).) Furthermore, K_τ can be chosen so that for each δ > 0, lim_{t→0} K_τ(t,δ) = 0.

Suppose X and X̃ satisfy (9.8). Let τ_0 = inf{t : |X(t) − X̃(t)| > 0}, and suppose P{τ_0 < ∞} > 0. Select r, δ, t > 0 such that P{τ_0 < r} > δ and LK_{τ_0∧r}(t,δ) < 1. Note that if τ_0 < ∞, then

X(τ_0) − X̃(τ_0) = ∫_0^{τ_0} (F(X(s−)) − F(X̃(s−)))dY(s) = 0. (9.11)
Define τ_ε = inf{s : |X(s) − X̃(s)| ≥ ε}. Since |X(s) − X̃(s)| ≤ ε for s < τ_ε,

|F(X(s)) − F(X̃(s))| ≤ εL, s < τ_ε,

and

|∫_0^{τ_ε} (F(X(s−)) − F(X̃(s−)))dY(s)| = |X(τ_ε) − X̃(τ_ε)| ≥ ε.

Consequently, for r > 0, letting τ^r_0 = τ_0∧r, we have

P{τ_ε − τ^r_0 ≤ t}
≤ P{sup_{s≤t∧(τ_ε−τ^r_0)} |X(τ^r_0 + s) − X̃(τ^r_0 + s)| ≥ ε}
≤ P{sup_{s≤t∧(τ_ε−τ^r_0)} |∫_0^{τ^r_0+s} F(X(u−))dY(u) − ∫_0^{τ^r_0+s} F(X̃(u−))dY(u)| ≥ ε}
≤ P{sup_{s≤t∧(τ_ε−τ^r_0)} |∫_0^{τ^r_0+s} F(X(u−))dY(u) − ∫_0^{τ^r_0+s} F(X̃(u−))dY(u)| ≥ εLK_{τ^r_0}(t,δ)}
≤ δ.
Since the right side does not depend on ε and lim_{ε→0} τ_ε = τ_0, it follows that P{τ_0 − τ_0∧r < t} ≤ δ and hence that P{τ_0 < r} ≤ δ, contradicting the assumption on δ and proving that τ_0 = ∞ a.s.
Approximation by bounded semimartingales

Lemma 9.6 Let Y = M + V be a semimartingale with M(0) = V(0) = 0, and assume (without loss of generality) that sup_s |ΔM(s)| ≤ δ. Let

A(t) = sup_{s≤t}(|M(s)| + |V(s)| + [M]_s + T_s(V))

and σ_δ = inf{t : A(t) ≥ δ}, and define M^δ ≡ M^{σ_δ}, V^δ ≡ V^{σ_δ−}, and Y^δ ≡ M^δ + V^δ. Then Y^δ(t) = Y(t) for t < σ_δ, lim_{δ→∞} σ_δ = ∞, |Y^δ| ≤ 2δ, sup_s |ΔY^δ(s)| ≤ 2δ, sup_s |ΔV^δ(s)| ≤ δ, [M^δ]_t ≤ δ + δ^2, and T_t(V^δ) ≤ δ.
Local existence of solutions

Let Y^δ be defined as in Lemma 9.6, and recall

σ_δ = inf{t : A(t) ≥ δ},

where A(t) = sup_{s≤t}(|M(s)| + |V(s)| + [M]_s + T_s(V)). Suppose

X_δ(t) = U(t) + ∫_0^t F(X_δ(s−))dY^δ(s).

Then, since Y^δ(t) = Y(t) for t < σ_δ, X(t) ≡ X_δ(t), t < σ_δ, is a solution of (9.8) on the interval [0, σ_δ). Defining

X(σ_δ) = U(σ_δ) + ∫_0^{σ_δ} F(X_δ(s−))dY(s),

the solution extends to [0, σ_δ].
Recursively, defining σ^1_δ = σ_δ and

σ^{k+1}_δ = inf{t > σ^k_δ : A(t) − A(σ^k_δ) ≥ δ},

define Y^δ_k as Y^δ with Y replaced by Y_k(t) = Y(σ^k_δ + t) − Y(σ^k_δ). Assume that X satisfies (9.8) for t ∈ [0, σ^k_δ] and that X_k satisfies

X_k(t) = X(σ^k_δ) + U(σ^k_δ + t) − U(σ^k_δ) + ∫_0^t F(X_k(s−))dY^δ_k(s).

Extend X by defining X(t) = X_k(t − σ^k_δ) for σ^k_δ < t < σ^{k+1}_δ and

X(σ^{k+1}_δ) = U(σ^{k+1}_δ) + ∫_0^{σ^{k+1}_δ} F(X(s−))dY(s).
A Gronwall inequality for SDEs

Let Y be an R^m-valued semimartingale, and let F : R^d → M^{d×m} satisfy |F(x) − F(y)| ≤ L|x − y|. For i = 1, 2, let U_i be cadlag and adapted, and let X_i satisfy

X_i(t) = U_i(t) + ∫_0^t F(X_i(s−))dY(s). (9.12)

Lemma 9.7 Let d = m = 1. Suppose that Y = M + V, where M is a local square-integrable martingale, V is a finite variation process, and M(0) = V(0) = 0. Suppose that there exists δ > 0 such that sup_t |ΔM(t)| ≤ δ, sup_t |ΔV(t)| ≤ 2δ, and T_t(V) ≤ δ, and that c(δ) ≡ 1 − 18L^2δ^2 > 0. Let

A_0(t) = 12L^2[M]_t + 3L^2δT_t(V) + t, (9.13)

and define γ(u) = inf{t : A_0(t) > u}. Then

E[sup_{s≤γ(u)} |X_1(s) − X_2(s)|^2] ≤ (3/c(δ)) e^{u/c(δ)} E[sup_{s≤γ(u)} |U_1(s) − U_2(s)|^2]. (9.14)
Proof. Note that

|X_1(t) − X_2(t)|^2 ≤ 3|U_1(t) − U_2(t)|^2 + 3|∫_0^t (F(X_1(s−)) − F(X_2(s−)))dM(s)|^2
+ 3|∫_0^t (F(X_1(s−)) − F(X_2(s−)))dV(s)|^2.

Doob's inequality implies

E[sup_{t≤γ(u)} |∫_0^t (F(X_1(s−)) − F(X_2(s−)))dM(s)|^2] ≤ 4E[∫_0^{γ(u)} |F(X_1(s−)) − F(X_2(s−))|^2 d[M]_s], (9.15)

and Jensen's inequality implies

E[sup_{t≤γ(u)} |∫_0^t (F(X_1(s−)) − F(X_2(s−)))dV(s)|^2] ≤ E[T_{γ(u)}(V) ∫_0^{γ(u)} |F(X_1(s−)) − F(X_2(s−))|^2 dT_s(V)]. (9.16)
Letting Z(t) = sup_{s≤t} |X_1(s) − X_2(s)|^2 and using the Lipschitz condition and the assumption that T_t(V) ≤ δ, it follows that

E[Z∘γ(u)] ≤ 3E[sup_{s≤γ(u)} |U_1(s) − U_2(s)|^2] (9.17)
+ 12L^2 E[∫_0^{γ(u)} |X_1(s−) − X_2(s−)|^2 d[M]_s]
+ 3L^2δ E[∫_0^{γ(u)} |X_1(s−) − X_2(s−)|^2 dT_s(V)]
≤ 3E[sup_{s≤γ(u)} |U_1(s) − U_2(s)|^2] + E[∫_0^{γ(u)} Z(s−)dA_0(s)]
≤ 3E[sup_{s≤γ(u)} |U_1(s) − U_2(s)|^2] + E[(A_0∘γ(u) − u)Z∘γ(u−)] + E[∫_0^u Z∘γ(s−)ds],

where the last inequality follows by Lemma 7.25.
Since 0 ≤ A_0∘γ(u) − u ≤ sup_t ΔA_0(t) ≤ 18L^2δ^2, (9.17) implies

c(δ)E[Z∘γ(u)] ≤ 3E[sup_{s≤γ(u)} |U_1(s) − U_2(s)|^2] + ∫_0^u E[Z∘γ(s−)]ds,

and (9.14) follows by Gronwall's inequality.

Note that the above calculation is valid only if the expectations on the right of (9.15) and (9.16) are finite. This potential problem can be eliminated by defining τ_K = inf{t : |X_1(t) − X_2(t)| ≥ K} and replacing γ(u) by γ(u)∧τ_K. Observing that |X_1(s−) − X_2(s−)| ≤ K for s ≤ τ_K, the estimates in (9.17) imply (9.14) with γ(u) replaced by γ(u)∧τ_K. Letting K → ∞ gives (9.14) as originally stated.
Euler approximation

Under the hypotheses of Lemma 9.7, consider the following approximation: X_n(0) = U(0) and

X_n((k+1)/n) = X_n(k/n) + U((k+1)/n) − U(k/n) + F(X_n(k/n))(Y((k+1)/n) − Y(k/n)).

Let η_n(t) = k/n for k/n ≤ t < (k+1)/n. Extend X_n to all t ≥ 0 by setting

X_n(t) = U(t) + ∫_0^t F(X_n∘η_n(s−))dY(s).

Adding and subtracting the same term yields

X_n(t) = U(t) + ∫_0^t (F(X_n∘η_n(s−)) − F(X_n(s−)))dY(s) + ∫_0^t F(X_n(s−))dY(s)
≡ U(t) + D_n(t) + ∫_0^t F(X_n(s−))dY(s).
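For the Ito case Y = (W, t)^T of (9.1), the recursion above is the familiar Euler (Euler–Maruyama) scheme. A minimal sketch (assuming NumPy; σ, b, and the parameter values are placeholders, and U(t) ≡ X(0)):

    import numpy as np

    def euler(x0, sigma, b, t, n, rng):
        """X_{k+1} = X_k + sigma(X_k) dW_k + b(X_k) h: the recursion above with U constant."""
        h = t / n
        x = x0
        for dw in rng.normal(0.0, np.sqrt(h), n):
            x = x + sigma(x) * dw + b(x) * h
        return x

    rng = np.random.default_rng(2)
    print(euler(1.0, sigma=lambda x: 0.3, b=lambda x: -2.0 * x, t=1.0, n=1000, rng=rng))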
Estimate of perturbation

Assume |F(x)| ≤ b. Since

|X_n(s−) − X_n∘η_n(s−)| ≤ |U(s−) − U∘η_n(s−)| + b|Y(s−) − Y∘η_n(s−)|,

E[sup_{s≤T} |D_n(s)|^2] ≤ 2E[sup_{t≤T} (∫_0^t (F(X_n∘η_n(s−)) − F(X_n(s−)))dM(s))^2]
+ 2E[sup_{t≤T} (∫_0^t (F(X_n∘η_n(s−)) − F(X_n(s−)))dV(s))^2]
≤ 8L^2 E[∫_0^T |X_n∘η_n(s−) − X_n(s−)|^2 d[M]_s]
+ 2δL^2 E[∫_0^T |X_n∘η_n(s−) − X_n(s−)|^2 dT_s(V)]
→ 0.
Convergence of Euler approximation

Lemma 9.8 Assume that the conditions of Lemma 9.7 hold. Then {X_n} is a Cauchy sequence in D_R[0,∞) and converges in probability to a solution of

X(t) = U(t) + ∫_0^t F(X(s−))dY(s).

Proof. By (9.14),

E[sup_{s≤γ(u)} |X_n(s) − X_m(s)|^2] ≤ (3/c(δ)) e^{u/c(δ)} E[sup_{s≤γ(u)} |D_n(s) − D_m(s)|^2].
Existence theorem

The localization argument gives the following theorem.

Theorem 9.9 Let Y be an R^m-valued semimartingale, U a cadlag, adapted, R^d-valued process, and F : R^d → M^{d×m} bounded and satisfying |F(x) − F(y)| ≤ L|x − y|. Then there exists a unique solution of

X(t) = U(t) + ∫_0^t F(X(s−))dY(s). (9.18)
Boundedness condition

For general Lipschitz F, the theorem implies existence and uniqueness up to τ_k = inf{t : |F(X(t))| ≥ k} (replace F by a bounded function that agrees with F on the set {x : |F(x)| ≤ k}). The global existence question becomes whether or not lim_k τ_k = ∞. F is locally Lipschitz if for each k > 0 there exists an L_k such that

|F(x) − F(y)| ≤ L_k|x − y| for all |x| ≤ k, |y| ≤ k.

Note that if F is locally Lipschitz, and ρ_k is a smooth nonnegative function satisfying ρ_k(x) = 1 when |x| ≤ k and ρ_k(x) = 0 when |x| ≥ k+1, then F_k(x) = ρ_k(x)F(x) is globally Lipschitz and bounded.
Moment estimates

Consider the scalar Ito equation

X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds.

Then by Ito's formula and Lemma 7.20,

X(t)^2 = X(0)^2 + ∫_0^t 2X(s)σ(X(s))dW(s) + ∫_0^t 2X(s)b(X(s))ds + ∫_0^t σ^2(X(s))ds.

Define τ_k = inf{t : |X(t)| ≥ k}. Then

|X(t∧τ_k)|^2 = |X(0)|^2 + ∫_0^{t∧τ_k} 2X(s)σ(X(s))dW(s) + ∫_0^{t∧τ_k} (2X(s)b(X(s)) + σ^2(X(s)))ds.
Since

∫_0^{t∧τ_k} 2X(s)σ(X(s))dW(s) = ∫_0^t 1_{[0,τ_k)}(s)2X(s)σ(X(s))dW(s)

has a bounded integrand, the integral is a martingale. Therefore,

E[|X(t∧τ_k)|^2] = E[|X(0)|^2] + ∫_0^t E[1_{[0,τ_k)}(s)(2X(s)b(X(s)) + σ^2(X(s)))]ds.

Assume 2xb(x) + σ^2(x) ≤ K_1 + K_2|x|^2 for some K_i > 0. (Note that this assumption holds if both b(x) and σ(x) are globally Lipschitz.) Then

m_k(t) ≡ E[|X(t∧τ_k)|^2] = E[|X(0)|^2] + ∫_0^t E[1_{[0,τ_k)}(s)(2X(s)b(X(s)) + σ^2(X(s)))]ds
≤ m_0 + K_1 t + ∫_0^t K_2 m_k(s)ds ≤ (m_0 + K_1 t)e^{K_2 t}.
Note that

|X(t∧τ_k)|^2 = (1_{{τ_k>t}}|X(t)| + 1_{{τ_k≤t}}|X(τ_k)|)^2,

and we have

k^2 P(τ_k ≤ t) ≤ E[|X(t∧τ_k)|^2] ≤ (m_0 + K_1 t)e^{K_2 t}.

Consequently, as k → ∞, P(τ_k ≤ t) → 0 and X(t∧τ_k) → X(t). By Fatou's lemma, E[|X(t)|^2] ≤ (m_0 + K_1 t)e^{K_2 t}.
Uniformly bounded moments

Suppose 2xb(x) + σ^2(x) ≤ K_1 − ε|x|^2. (For example, consider the equation X(t) = X(0) − ∫_0^t αX(s)ds + W(t).) Then

e^{εt}|X(t)|^2 ≤ |X(0)|^2 + ∫_0^t e^{εs}2X(s)σ(X(s))dW(s)
+ ∫_0^t e^{εs}(2X(s)b(X(s)) + σ^2(X(s)))ds + ∫_0^t εe^{εs}|X(s)|^2 ds
≤ |X(0)|^2 + ∫_0^t e^{εs}2X(s)σ(X(s))dW(s) + ∫_0^t e^{εs}K_1 ds
≤ |X(0)|^2 + ∫_0^t e^{εs}2X(s)σ(X(s))dW(s) + (K_1/ε)(e^{εt} − 1),

and hence

e^{εt}E[|X(t)|^2] ≤ E[|X(0)|^2] + (K_1/ε)(e^{εt} − 1),
E[|X(t)|^2] ≤ e^{−εt}E[|X(0)|^2] + (K_1/ε)(1 − e^{−εt}).
Vector case

Assume X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds, where σ is a d×m matrix, b is a d-dimensional vector, and W is an m-dimensional standard Brownian motion. Then

|X(t)|^2 = |X(0)|^2 + ∫_0^t 2X(s)^T σ(X(s))dW(s) + ∫_0^t (2X(s)·b(X(s)) + Σ_{i,k} σ^2_{ik}(X(s)))ds
= |X(0)|^2 + ∫_0^t 2X(s)^T σ(X(s))dW(s) + ∫_0^t (2X(s)·b(X(s)) + trace(σ(X(s))σ(X(s))^T))ds.

As in the univariate case, if

2x·b(x) + trace(σ(x)σ(x)^T) ≤ K_1 − ε|x|^2,

then E[|X(t)|^2] is uniformly bounded.
10. Stochastic differential equations for diffusion processes
• The generator for a diffusion process
• Exit distributions
• Expected exit time
• Dirichlet problem
• Parabolic equations
• Feynman-Kac formula
• Markov property
• Strong Markov property
• Forward equations
• Stationary distributions
• Reflecting diffusions
Differential operators and diffusion processes

Consider

X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds,

where X is R^d-valued, W is an m-dimensional standard Brownian motion, σ is a d×m matrix-valued function, and b is an R^d-valued function. For a C^2 function f,

f(X(t)) = f(X(0)) + Σ_{i=1}^d ∫_0^t ∂_i f(X(s))dX_i(s) + (1/2) Σ_{1≤i,j≤d} ∫_0^t ∂_i∂_j f(X(s))d[X_i,X_j]_s.
Computation of covariation

The covariation satisfies

[X_i,X_j]_t = ∫_0^t Σ_k σ_{ik}(X(s))σ_{jk}(X(s))ds = ∫_0^t a_{ij}(X(s))ds,

where a = ((a_{ij})) = σ·σ^T, that is, a_{ij}(x) = Σ_k σ_{ik}(x)σ_{jk}(x).
Definition of the generator

Let

Lf(x) = Σ_{i=1}^d b_i(x)∂_i f(x) + (1/2) Σ_{i,j} a_{ij}(x)∂_i∂_j f(x).

Then

f(X(t)) = f(X(0)) + ∫_0^t ∇f^T(X(s))σ(X(s))dW(s) + ∫_0^t Lf(X(s))ds.

Since a = σ·σ^T,

Σ ξ_iξ_j a_{ij} = ξ^Tσσ^Tξ = |σ^Tξ|^2 ≥ 0,

a is nonnegative definite, and L is an elliptic differential operator.

L is called the generator for the corresponding diffusion process.
The generator for Brownian motion

If

X(t) = X(0) + W(t),

then ((a_{ij}(x))) = I and Lf(x) = (1/2)Δf(x).
Exit distributions in one dimension

If d = m = 1 and a(x) = σ^2(x), then

Lf(x) = (1/2)a(x)f''(x) + b(x)f'(x).

Suppose Lf(x) = 0. Then

f(X(t)) = f(X(0)) + ∫_0^t f'(X(s))σ(X(s))dW(s).

Fix α < β, and define τ = inf{t : X(t) ∉ (α,β)}. If sup_{α<x<β} |f'(x)σ(x)| < ∞, then

f(X(t∧τ)) = f(X(0)) + ∫_0^t 1_{[0,τ)}(s)f'(X(s))σ(X(s))dW(s)

is a martingale, and

E[f(X(t∧τ)) | X(0) = x] = f(x).
Formula for the exit distribution

Theorem 10.1 Let f satisfy Lf = 0. Suppose sup_{α<x<β} |f'(x)σ(x)| < ∞, sup_{α<x<β} f(x) < ∞, and τ < ∞ a.s. Then

E[f(X(τ)) | X(0) = x] = f(x) (10.1)

and

P(X(τ) = β | X(0) = x) = (f(x) − f(α))/(f(β) − f(α)).

Proof. By (10.1),

f(α)P(X(τ) = α | X(0) = x) + f(β)P(X(τ) = β | X(0) = x) = f(x),

and

P(X(τ) = β | X(0) = x) = (f(x) − f(α))/(f(β) − f(α)). (10.2)
Finiteness of exit time

Let Lg(x) = 1. Then

g(X(t)) = g(X(0)) + ∫_0^t g'(X(s))σ(X(s))dW(s) + t,

and, assuming sup_{α<x<β} |g'(x)σ(x)| < ∞,

g(X(t∧τ)) − t∧τ = g(X(0)) + ∫_0^{t∧τ} g'(X(s))σ(X(s))dW(s)

is a martingale, and hence

E[g(X(t∧τ)) | X(0) = x] = g(x) + E[t∧τ].
Expected exit time

Theorem 10.2 If

sup_{α<x<β} |g'(x)σ(x)| < ∞ and C = sup_{α≤x≤β} |g(x)| < ∞,

then 2C ≥ E[t∧τ], so 2C ≥ E[τ] and τ < ∞ a.s.

By (10.2),

E[τ | X(0) = x] = E[g(X(τ)) | X(0) = x] − g(x)
= g(β)(f(x) − f(α))/(f(β) − f(α)) + g(α)(f(β) − f(x))/(f(β) − f(α)) − g(x).
Solving the equations

(1/2)a(x)f''(x) + b(x)f'(x) = 0

(d/dx) log f'(x) = f''(x)/f'(x) = −2b(x)/a(x),

and hence

f'(x) = C exp{−∫_{x_0}^x (2b(y)/a(y))dy}.

(1/2)a(x)g''(x) + b(x)g'(x) = 1

g''(x) + (2b(x)/a(x))g'(x) = 2/a(x)

(d/dx)( exp{∫_{x_0}^x (2b(y)/a(y))dy} g'(x) ) = exp{∫_{x_0}^x (2b(y)/a(y))dy} (2/a(x)).
Example

X(t) = X(0) + ∫_0^t σ√(X(s))dW(s),

so Lf(x) = (1/2)σ^2 x f''(x). Note that Lf(x) = 0 for f(x) = x, so taking α = 0, if τ < ∞ a.s.,

P{X(τ) = 0 | X(0) = x} = (β − x)/β.

Solving Lg = 1,

g'(x) = (2/σ^2) log x + C, g(x) = (2/σ^2) x log x

for C = 2/σ^2. It follows that

E[τ | X(0) = x] = (2/σ^2)(x log β − x log x) = (2x/σ^2) log(β/x).
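Both formulas can be checked by simulation. A crude sketch (not part of the notes; it assumes NumPy, and the Euler step and exit detection introduce discretization bias, so agreement is only approximate):

    import numpy as np

    rng = np.random.default_rng(3)
    sigma, x0, beta, h = 1.0, 0.5, 1.0, 1e-3
    taus, hit_zero = [], 0
    for _ in range(1000):
        x, t = x0, 0.0
        while 0.0 < x < beta:
            x += sigma * np.sqrt(max(x, 0.0)) * rng.normal(0.0, np.sqrt(h))
            t += h
        taus.append(t)
        hit_zero += x <= 0.0
    print(hit_zero / 1000, "vs", (beta - x0) / beta)                   # P{X(tau)=0}
    print(np.mean(taus), "vs", 2 * x0 / sigma**2 * np.log(beta / x0))  # E[tau]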
Dirichlet problems

Lf(x) = 0, x ∈ D
f(x) = h(x), x ∈ ∂D (10.3)

for D ⊂ R^d.

Definition 10.3 A function f is Holder continuous with Holder exponent δ > 0 if

|f(x) − f(y)| ≤ L|x − y|^δ

for some L > 0.
Existence of solutions

Theorem 10.4 Suppose D is a bounded, smooth domain, there exists ε > 0 such that

inf_{x∈D} Σ a_{ij}(x)ξ_iξ_j ≥ ε|ξ|^2,

and a_{ij}, b_i, and h are Holder continuous. Then there exists a unique C^2 (in the interior of D) solution f of the Dirichlet problem (10.3).
Representation of the solution of the Dirichlet problem

Let

X(t,x) = x + ∫_0^t σ(X(s,x))dW(s) + ∫_0^t b(X(s,x))ds. (10.4)

Define τ = τ(x) = inf{t : X(t,x) ∉ D}. If f is C^2 and bounded and satisfies (10.3), then

f(x) = E[f(X(t∧τ, x))],

and, assuming τ < ∞ a.s., f(x) = E[f(X(τ,x))]. By the boundary condition,

f(x) = E[h(X(τ,x))]. (10.5)

Conversely, define f by (10.5), and f will be, at least in some weak sense, a solution of (10.3). Note that if there is a C^2, bounded solution f and τ < ∞, then f must be given by (10.5), proving uniqueness of C^2, bounded solutions.
Harmonic functions

If Δf = 0 (i.e., f is harmonic) on R^d, and W is standard Brownian motion, then f(x + W(t)) is a martingale (at least a local martingale).
Parabolic equations

Suppose u is bounded and satisfies

u_t = Lu, u(0,x) = f(x).

By Ito's formula, for a smooth function v(t,x),

v(t,X(t)) = v(0,X(0)) + (local) martingale + ∫_0^t (v_s(s,X(s)) + Lv(s,X(s)))ds.

For fixed r > 0, define v(t,x) = u(r−t, x). Then (∂/∂t)v(t,x) = −u_1(r−t, x), where u_1(t,x) = (∂/∂t)u(t,x). Since u_1 = Lu and Lv(t,x) = Lu(r−t, x), v(t,X(t)) is a martingale. Consequently,

E[u(r−t, X(t,x))] = u(r,x),

and setting t = r,

u(r,x) = E[u(0, X(r,x))] = E[f(X(r,x))].
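For L = (1/2)d²/dx² (Brownian motion) and f(x) = cos x, this representation can be checked against the closed form, since E[cos(x + W(t))] = e^{−t/2} cos x. A sketch (assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(4)
    t, x, n = 0.7, 0.3, 200_000
    w = rng.normal(0.0, np.sqrt(t), n)             # W(t) for n independent paths
    u_mc = np.mean(np.cos(x + w))                  # u(t,x) = E[f(X(t,x))], f = cos
    print(u_mc, "vs", np.exp(-t / 2) * np.cos(x))  # exact solution of u_t = u_xx/2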
Equations with a potential

We have seen that the solution of

u_t = Lu, u(0,x) = f(x)

is given by u(t,x) = E[f(X(t,x))]. We can also represent the solution of

u_t = Lu + βu, u(0,x) = f(x),

where β is a function of x.
Feynman-Kac formula

Applying Ito's formula,

u(t_0 − t, X(t,x)) e^{∫_0^t β(X(z,x))dz}
= u(t_0, x) + ∫_0^t e^{∫_0^s β(X(z,x))dz} ∇_x u(t_0 − s, X(s,x))^T σ(X(s,x))dW(s)
+ ∫_0^t e^{∫_0^s β(X(z,x))dz} ( β(X(s,x))u(t_0 − s, X(s,x)) + Lu(t_0 − s, X(s,x)) − u_1(t_0 − s, X(s,x)) )ds,

and hence

u(t,x) = E[f(X(t,x)) e^{∫_0^t β(X(z,x))dz}].
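A Monte Carlo sketch of this representation (an illustration under stated assumptions: X(t,x) = x + W(t), i.e. L = (1/2)d²/dx², f = cos, and a constant potential β_0 chosen so the answer has the closed form e^{(β_0 − 1/2)t} cos x; for nonconstant β only the weight line changes):

    import numpy as np

    rng = np.random.default_rng(5)
    t, x, beta0, n_paths, n_steps = 0.5, 0.1, 0.8, 10_000, 200
    h = t / n_steps
    X = x + np.cumsum(rng.normal(0.0, np.sqrt(h), (n_paths, n_steps)), axis=1)
    beta = lambda y: beta0 + 0.0 * y             # constant potential (placeholder)
    weight = np.exp(h * beta(X).sum(axis=1))     # approximates exp(int_0^t beta(X(z,x)) dz)
    u_mc = np.mean(np.cos(X[:, -1]) * weight)
    print(u_mc, "vs", np.exp((beta0 - 0.5) * t) * np.cos(x))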
Properties of X(t,x)

Assume that

|σ(x) − σ(y)| ≤ K|x − y|, |b(x) − b(y)| ≤ K|x − y|

for some constant K. Applying Ito's formula and the Gronwall inequality,

E[|X(t,x) − X(t,y)|^n] ≤ C(t)|x − y|^n. (10.6)

Theorem 10.5 There is a version of X(t,x) such that the mapping (t,x) → X(t,x) is continuous a.s.

Proof. The proof is based on Kolmogorov's criterion for continuity of processes indexed by R^d.
Brownian motion with respect to a filtration

Given a filtration {F_t}, W is called an {F_t}-standard Brownian motion if

1) W is {F_t}-adapted,
2) W is a standard Brownian motion,
3) W(r + ·) − W(r) is independent of F_r.

For example, if W is an {F_t}-Brownian motion, then

E[f(W(t+r) − W(r)) | F_r] = E[f(W(t))].
The process after time r

Let W_r(t) ≡ W(r+t) − W(r). Note that W_r is an {F_{r+t}}-Brownian motion. We have

X(r+t, x) = X(r,x) + ∫_r^{r+t} σ(X(s,x))dW(s) + ∫_r^{r+t} b(X(s,x))ds
= X(r,x) + ∫_0^t σ(X(r+s, x))dW_r(s) + ∫_0^t b(X(r+s, x))ds.

Define X_r(t,x) such that

X_r(t,x) = x + ∫_0^t σ(X_r(s,x))dW_r(s) + ∫_0^t b(X_r(s,x))ds.
Markov property

Then X(r+t, x) = X_r(t, X(r,x)). Intuitively, X(r+t, x) = H_t(X(r,x), W_r) for some function H, and by the independence of X(r,x) and W_r,

E[f(X(r+t, x)) | F_r] = E[f(H_t(X(r,x), W_r)) | F_r] = u(t, X(r,x)),

where u(t,z) = E[f(H_t(z, W_r))]. Hence

E[f(X(r+t, x)) | F_r] = E[f(X(r+t, x)) | X(r,x)],

that is, the Markov property holds for X.
Proof by discrete approximation

Define η_n(t) = k/n for k/n ≤ t < (k+1)/n, and let

X_n(t,x) = x + ∫_0^t σ(X_n(η_n(s), x))dW(s) + ∫_0^t b(X_n(η_n(s), x))ds.

Suppose that z ∈ C_{R^m}[0,∞). Then

H_n(t,x,z) = x + ∫_0^t σ(H_n(η_n(s), x, z))dz(s) + ∫_0^t b(H_n(η_n(s), x, z))ds

is well-defined. Note that X_n(t,x) = H_n(t,x,W) and

X(r+t, x) = X_r(t, X(r,x)) = lim_{n→∞} X^n_r(t, X(r,x)) = lim_{n→∞} H_n(t, X(r,x), W_r).
Markov property

E[f(X(r+t, x)) | F_r] = lim_{n→∞} E[f(H_n(t, X(r,x), W_r)) | F_r]
= lim_{n→∞} E[f(H_n(t, X(r,x), W_r)) | X(r,x)]
= E[f(X(r+t, x)) | X(r,x)].
Strong Markov property for Brownian motion

Theorem 10.6 Let W be an {F_t}-Brownian motion and let τ be an {F_t}-stopping time. Define F^τ_t = F_{τ+t}. Then W_τ(t) ≡ W(τ+t) − W(τ) is an {F^τ_t}-Brownian motion. In particular, W_τ is independent of F_τ.

Proof. Let

τ_n = (k+1)/n when k/n ≤ τ < (k+1)/n.

Then clearly τ_n > τ. We claim that

E[f(W(τ_n + t) − W(τ_n)) | F_{τ_n}] = E[f(W(t))],

that is, for A ∈ F_{τ_n},

∫_A f(W(τ_n + t) − W(τ_n))dP = P(A)E[f(W(t))].
Since A ∩ {τ_n = k/n} ∈ F_{k/n},

LHS = Σ_k ∫_{A∩{τ_n=k/n}} f(W(k/n + t) − W(k/n))dP
= Σ_k P(A ∩ {τ_n = k/n}) E[f(W(k/n + t) − W(k/n))]
= Σ_k P(A ∩ {τ_n = k/n}) E[f(W(t))]
= E[f(W(t))]P(A).

Since F_{τ_n} ⊃ F_τ, E[f(W(τ_n + t) − W(τ_n)) | F_τ] = E[f(W(t))], and letting n → ∞,

E[f(W(τ + t) − W(τ)) | F_τ] = E[f(W(t))]. (10.7)

Since τ + s is a stopping time, (10.7) holds with τ replaced by τ + s, and it follows that W_τ has independent Gaussian increments and hence is a Brownian motion.
Strong Markov property

For

X(t,x) = x + ∫_0^t σ(X(s,x))dW(s) + ∫_0^t b(X(s,x))ds

and a stopping time τ,

X(τ+t, x) = X(τ,x) + ∫_0^t σ(X(τ+s, x))dW_τ(s) + ∫_0^t b(X(τ+s, x))ds.

By the same argument as for the Markov property, we have

E[f(X(τ+t, x)) | F_τ] = u(t, X(τ,x)),

where u(t,x) = E[f(X(t,x))]. This identity is the strong Markov property. (Note that both the Markov and strong Markov properties are verified assuming uniqueness.)
Equations for probability distributions

If X is a solution of an Ito equation, then

f(X(t)) − ∫_0^t Lf(X(s))ds (10.8)

is a (local) martingale for all f in a specified collection of functions which we denote D(L), the domain of L. If

dX = σ(X)dW + b(X)dt

and

Lf(x) = (1/2) Σ_{i,j} a_{ij}(x) (∂^2/∂x_i∂x_j) f(x) + Σ_i b_i(x) (∂/∂x_i) f(x) (10.9)

with

((a_{ij}(x))) = σ(x)σ^T(x),

then (10.8) is a martingale for all f ∈ C^2_c (= D(L)). (C^2_c denotes the C^2 functions with compact support.)
The forward equation

Since f(X(t)) − ∫_0^t Lf(X(s))ds is a martingale,

E[f(X(t))] = E[f(X(0))] + E[∫_0^t Lf(X(s))ds] = E[f(X(0))] + ∫_0^t E[Lf(X(s))]ds.

Let ν_t(Γ) = P{X(t) ∈ Γ}. Then for all f in the domain of L,

∫f dν_t = ∫f dν_0 + ∫_0^t ∫Lf dν_s ds, (10.10)

which is a weak form of the equation

(d/dt)ν_t = L^*ν_t. (10.11)
Uniqueness for the forward equation

Theorem 10.7 Let Lf be given by (10.9) with a and b continuous, and let {ν_t} be probability measures on R^d satisfying (10.10) for all f ∈ C^2_c(R^d). If dX = σ(X)dW + b(X)dt has a unique solution for each initial condition, then P{X(0) ∈ ·} = ν_0 implies P{X(t) ∈ ·} = ν_t.
Adjoint operator

In nice situations, ν_t(dx) = p_t(x)dx. Then L^* should be a differential operator satisfying

∫_{R^d} p Lf dx = ∫_{R^d} f L^*p dx.

Example 10.8 Let d = 1. Integrating by parts, we have

∫_{−∞}^∞ p(x)((1/2)a(x)f''(x) + b(x)f'(x))dx
= (1/2)p(x)a(x)f'(x) |_{−∞}^∞ − ∫_{−∞}^∞ f'(x)((1/2)(d/dx)(a(x)p(x)) − b(x)p(x))dx.

The first term is zero, and integrating by parts again we have

∫_{−∞}^∞ f(x)(d/dx)((1/2)(d/dx)(a(x)p(x)) − b(x)p(x))dx,

so L^*p = (d/dx)((1/2)(d/dx)(a(x)p(x)) − b(x)p(x)).

Example 10.9 Let Lf = (1/2)f'' (Brownian motion). Then L^*p = (1/2)p'', that is, L is self adjoint.
Stationary distributions

Suppose ∫Lf dπ = 0 for all f in the domain of L. Then

∫f dπ = ∫f dπ + ∫_0^t ∫Lf dπ ds,

and hence ν_t ≡ π gives a solution of (10.10). Under appropriate conditions, in particular those of Theorem 10.7, if P{X(0) ∈ ·} = π and f(X(t)) − ∫_0^t Lf(X(s))ds is a martingale for all f ∈ D(L), then P{X(t) ∈ ·} = π, i.e., π is a stationary distribution for X.
Stationary distributions for one-dimensional diffusions

Let d = 1. Assuming π(dx) = π(x)dx,

(d/dx)( (1/2)(d/dx)(a(x)π(x)) − b(x)π(x) ) = 0,

so (1/2)(d/dx)(a(x)π(x)) − b(x)π(x) is a constant; let the constant be 0, so that (1/2)(d/dx)(a(x)π(x)) = b(x)π(x). Applying the integrating factor exp{−∫_0^x (2b(z)/a(z))dz} to get a perfect differential,

(1/2)e^{−∫_0^x (2b(z)/a(z))dz}(d/dx)(a(x)π(x)) − b(x)e^{−∫_0^x (2b(z)/a(z))dz}π(x) = 0
a(x)e^{−∫_0^x (2b(z)/a(z))dz}π(x) = C
π(x) = (C/a(x)) e^{∫_0^x (2b(z)/a(z))dz}.

Assume a(x) > 0 for all x. The condition for the existence of a stationary distribution is

∫_{−∞}^∞ (1/a(x)) e^{∫_0^x (2b(z)/a(z))dz}dx < ∞.
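The formula for π is easy to evaluate by quadrature. A sketch (assuming NumPy) for the illustrative case b(x) = −x, a(x) ≡ 1, where π(x) ∝ e^{−x²} is the N(0, 1/2) density:

    import numpy as np

    x = np.linspace(-6.0, 6.0, 4001)
    a, b = np.ones_like(x), -x
    # I(x) = int 2b/a dz; the lower limit only shifts the normalizing constant
    I = np.concatenate(([0.0], np.cumsum(2 * b[:-1] / a[:-1] * np.diff(x))))
    pi = np.exp(I - I.max()) / a
    pi /= np.trapz(pi, x)                    # normalize
    print(np.trapz(x**2 * pi, x), "vs 0.5")  # variance of N(0, 1/2)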
Diffusion with a boundary

Suppose

X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds + Λ(t)

with X(t) ≥ 0, and that Λ is nondecreasing and increases only when X(t) = 0. Then

f(X(t)) − ∫_0^t Lf(X(s))ds

is a martingale if f ∈ C^2_c and f'(0) = 0. Integrating by parts,

∫_0^∞ p(x)Lf(x)dx = [(1/2)p(x)a(x)f'(x)]_0^∞ (= 0) − ∫_0^∞ f'(x)((1/2)(d/dx)(a(x)p(x)) − b(x)p(x))dx
= [−f(x)((1/2)(d/dx)(a(x)p(x)) − b(x)p(x))]_0^∞ + ∫_0^∞ f(x)L^*p(x)dx.
Adjoint operator with boundary conditions

L^*p(x) = (d/dx)((1/2)(d/dx)(a(x)p(x)) − b(x)p(x))

for p satisfying ((1/2)a'(0) − b(0))p(0) + (1/2)a(0)p'(0) = 0. The density for the distribution of the process should satisfy

(d/dt)p_t = L^*p_t,

and the stationary density satisfies (d/dx)((1/2)(d/dx)(a(x)π(x)) − b(x)π(x)) = 0 subject to the boundary condition. The boundary condition implies (1/2)(d/dx)(a(x)π(x)) − b(x)π(x) = 0, and hence

π(x) = (c/a(x)) e^{∫_0^x (2b(z)/a(z))dz}, x ≥ 0.
Reflecting Brownian motion

Let X(t) = X(0) + σW(t) − bt + Λ(t), where a = σ^2 and b > 0 are constant. Then

π(x) = (2b/σ^2) e^{−(2b/σ^2)x},

so the stationary distribution is exponential.
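This can be checked by simulating the reflected path. The sketch below (assuming NumPy) uses the Skorokhod reflection map X(t) = Y(t) − min_{s≤t}(Y(s) ∧ 0) for Y(t) = σW(t) − bt, and compares the time average with the exponential mean σ²/(2b):

    import numpy as np

    rng = np.random.default_rng(6)
    sigma, b, h, n = 1.0, 0.5, 1e-3, 2_000_000
    y = np.cumsum(sigma * rng.normal(0.0, np.sqrt(h), n) - b * h)  # Y = sigma*W - b*t
    x = y - np.minimum.accumulate(np.minimum(y, 0.0))              # reflection at 0
    print(x.mean(), "vs", sigma**2 / (2 * b))  # Exp(2b/sigma^2) has mean sigma^2/(2b)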
11. Stochastic equations for general Markov processes in R^d

• Poisson random measures

• Stochastic integrals for space-time Poisson random measures

• Stochastic integrals for centered space-time Poisson random measures
• Stochastic equations for Markov processes
• Martingale problems
Poisson distribution

Definition 11.1 A random variable X has a Poisson distribution with parameter λ > 0 (write X ∼ Poisson(λ)) if for each k ∈ {0, 1, 2, . . .},

P{X = k} = (λ^k/k!)e^{−λ}.

Then E[X] = λ, Var(X) = λ, and the characteristic function of X is

E[e^{iθX}] = e^{λ(e^{iθ}−1)}.
Sums of independent Poisson random variables

Since the characteristic function of a random variable characterizes its distribution, a direct computation gives

Proposition 11.2 If X_1, X_2, . . . are independent random variables with X_i ∼ Poisson(λ_i) and Σ_{i=1}^∞ λ_i < ∞, then

X = Σ_{i=1}^∞ X_i ∼ Poisson(Σ_{i=1}^∞ λ_i).
Poisson sums of Bernoulli random variables

Proposition 11.3 Let N ∼ Poisson(λ), and suppose that Y_1, Y_2, . . . are i.i.d. Bernoulli random variables with parameter p ∈ [0,1]. If N is independent of the Y_i, then Σ_{i=1}^N Y_i ∼ Poisson(λp).

For j = 1, . . . , m, let e_j be the vector in R^m that has all its entries equal to zero, except for the jth, which is 1. For θ, y ∈ R^m, let ⟨θ,y⟩ = Σ_{j=1}^m θ_j y_j.

Proposition 11.4 Let N ∼ Poisson(λ). Suppose that Y_1, Y_2, . . . are independent R^m-valued random variables such that for all k ≥ 1 and j ∈ {1, . . . , m},

P{Y_k = e_j} = p_j,

where Σ_{j=1}^m p_j = 1. Define X = (X_1, . . . , X_m)^T = Σ_{k=1}^N Y_k. If N is independent of the Y_k, then X_1, . . . , X_m are independent random variables and X_j ∼ Poisson(λp_j).
Poisson random measures

Let (U, d_U) be a complete, separable metric space, and let ν be a σ-finite measure on U. Let N(U) denote the collection of counting measures on U.

Definition 11.5 A Poisson random measure on U with mean measure ν is a random counting measure ξ (that is, an N(U)-valued random variable) such that

a) for A ∈ B(U), ξ(A) has a Poisson distribution with expectation ν(A);

b) ξ(A) and ξ(B) are independent if A ∩ B = ∅.

For f ∈ M(U), f ≥ 0, define

ψ_ξ(f) = E[exp{−∫_U f(u)ξ(du)}] = exp{−∫(1 − e^{−f})dν}.

(Verify the second equality by approximating f by simple functions.)
Existence

Proposition 11.6 Suppose that ν is a measure on U such that ν(U) < ∞. Then there exists a Poisson random measure with mean measure ν.

Proof. The case ν(U) = 0 is trivial, so assume that ν(U) ∈ (0,∞). Let N be a Poisson random variable defined on a probability space (Ω,F,P) with E[N] = ν(U). Let X_1, X_2, . . . be iid U-valued random variables such that for every A ∈ B(U),

P{X_j ∈ A} = ν(A)/ν(U),

and assume that N is independent of the X_j.

Define ξ by ξ(A) = Σ_{k=1}^N 1_{{X_k∈A}}. In other words, ξ = Σ_{k=1}^N δ_{X_k}, where, for each x ∈ U, δ_x is the Dirac mass at x.

Extend the existence result to σ-finite measures by partitioning U = ∪_i U_i, where ν(U_i) < ∞.
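The construction in the proof is directly implementable. A sketch (assuming NumPy) with U = [0,1] and ν = λ · Lebesgue, checking E[ξ(A)] = ν(A) and the near-zero covariance of counts in disjoint sets:

    import numpy as np

    rng = np.random.default_rng(7)
    lam, reps = 5.0, 50_000                # nu = lam * Lebesgue on U = [0, 1]
    cA, cB = np.empty(reps), np.empty(reps)
    for i in range(reps):
        pts = rng.uniform(0.0, 1.0, rng.poisson(lam))  # N ~ Poisson(nu(U)), iid X_j
        cA[i] = np.sum(pts < 0.3)          # xi(A), A = [0, 0.3)
        cB[i] = np.sum(pts >= 0.7)         # xi(B), B = [0.7, 1]
    print(cA.mean(), "vs", lam * 0.3)      # E[xi(A)] = nu(A)
    print(np.cov(cA, cB)[0, 1], "vs 0")    # counts in disjoint sets are independent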
Identities

Let ξ be a Poisson random measure with mean measure ν.

Lemma 11.7 Suppose f ∈ M(U), f ≥ 0. Then

E[∫f(y)ξ(dy)] = ∫f(y)ν(dy).

Lemma 11.8 Suppose ν is nonatomic, and let f ∈ M(N(U)×U), f ≥ 0. Then

E[∫_U f(ξ,y)ξ(dy)] = E[∫_U f(ξ + δ_y, y)ν(dy)].
Proof. Suppose 0 ≤ f ≤ 1_{U_0}, where ν(U_0) < ∞. Let U_0 = ∪_k U^n_k, where the U^n_k are disjoint and diam(U^n_k) ≤ n^{−1}. If ξ(U^n_k) is 0 or 1, then

∫_{U^n_k} f(ξ,y)ξ(dy) = ∫_{U^n_k} f(ξ(·∩U^{n,c}_k) + δ_y, y)ξ(dy).

Consequently, if max_k ξ(U^n_k) ≤ 1,

∫_{U_0} f(ξ,y)ξ(dy) = Σ_k ∫_{U^n_k} f(ξ(·∩U^{n,c}_k) + δ_y, y)ξ(dy).

Since ξ(U_0) < ∞, for n sufficiently large, max_k ξ(U^n_k) ≤ 1, and

E[∫_U f(ξ,y)ξ(dy)] = lim_{n→∞} Σ_k E[∫_{U^n_k} f(ξ(·∩U^{n,c}_k) + δ_y, y)ξ(dy)]
= lim_{n→∞} Σ_k E[∫_{U^n_k} f(ξ(·∩U^{n,c}_k) + δ_y, y)ν(dy)]
= E[∫_U f(ξ + δ_y, y)ν(dy)].

(The second equality uses Lemma 11.7 and the independence of ξ(·∩U^{n,c}_k) and the restriction of ξ to U^n_k.)
Note that the last equality follows from the fact that

f(ξ(·∩U^{n,c}_k) + δ_y, y) ≠ f(ξ + δ_y, y)

only if ξ(U^n_k) > 0, and hence, assuming 0 ≤ f ≤ 1_{U_0},

|Σ_k ∫_{U^n_k} f(ξ(·∩U^{n,c}_k) + δ_y, y)ν(dy) − ∫_{U_0} f(ξ + δ_y, y)ν(dy)| ≤ Σ_k ξ(U^n_k)ν(U^n_k),

where, defining U^n(y) = U^n_k if y ∈ U^n_k, the expectation of the right side is

Σ_k ν(U^n_k)^2 = ∫_{U_0} ν(U^n(y))ν(dy) ≤ ∫_{U_0} ν(U_0∩B_{1/n}(y))ν(dy),

and lim_{n→∞} ν(U_0∩B_{1/n}(y)) = 0, since ν is nonatomic.
Space-time Poisson random measures

Let ξ be a Poisson random measure on U×[0,∞) with mean measure ν×ℓ (where ℓ denotes Lebesgue measure). Then

ξ(A,t) ≡ ξ(A×[0,t]) is a Poisson process with parameter ν(A), and

ξ̃(A,t) ≡ ξ(A×[0,t]) − ν(A)t is a martingale.

Definition 11.9 ξ is {F_t}-compatible if, for each A ∈ B(U), ξ(A,·) is {F_t}-adapted and, for all t, s ≥ 0, ξ(A×(t,t+s]) is independent of F_t.
Stochastic integrals for Poisson random measures

For i = 1, . . . , m, let t_i < r_i and A_i ∈ B(U), and let η_i be F_{t_i}-measurable. Let X(u,t) = Σ_i η_i 1_{A_i}(u)1_{[t_i,r_i)}(t), and note that

X(u,t−) = Σ_i η_i 1_{A_i}(u)1_{(t_i,r_i]}(t). (11.1)

Define

I_ξ(X,t) = ∫_{U×[0,t]} X(u,s−)ξ(du×ds) = Σ_i η_i ξ(A_i×(t_i∧t, r_i∧t]).

Then

E[|I_ξ(X,t)|] ≤ E[∫_{U×[0,t]} |X(u,s−)|ξ(du×ds)] = ∫_{U×[0,t]} E[|X(u,s)|]ν(du)ds,

and if the right side is finite, E[I_ξ(X,t)] = ∫_{U×[0,t]} E[X(u,s)]ν(du)ds.
Estimates in L^{1,0}

|I_ξ(X,t)| ∧ 1 ≤ ∫_{U×[0,t]} |X(u,s−)| ∧ 1 ξ(du×ds)

and

E[sup_{t≤T} |I_ξ(X,t)| ∧ 1] ≤ ∫_{U×[0,T]} E[|X(u,s)| ∧ 1]ν(du)ds.

Definition 11.10 Let L^{1,0}(U,ν) denote the space of B(U)×B[0,∞)×F-measurable mappings (u,s,ω) → X(u,s,ω) such that ∫_0^∞ e^{−s} ∫_U E[|X(u,s)| ∧ 1]ν(du)ds < ∞.
Extension of the integral

Let S_− denote the collection of B(U)×B[0,∞)×F-measurable mappings (u,t,ω) → Σ_{i=1}^m η_i(ω)1_{A_i}(u)1_{(t_i,r_i]}(t) defined as in (11.1).

Lemma 11.11

d_{1,0}(X,Y) = ∫_0^∞ e^{−s} ∫_U E[|X(u,s) − Y(u,s)| ∧ 1]ν(du)ds

defines a metric on L^{1,0}(U,ν), and the definition of I_ξ extends to the closure of S_− in L^{1,0}(U,ν).
The predictable σ-algebra

Warning: Let N be a unit Poisson process. Then ∫_0^∞ e^{−s}E[|N(s) − N(s−)| ∧ 1]ds = 0, but P{∫_0^t N(s)dN(s) ≠ ∫_0^t N(s−)dN(s)} = 1 − e^{−t}.

Definition 11.12 Let (Ω,F,P) be a probability space and let {F_t} be a filtration in F. The σ-algebra P of predictable sets is the smallest σ-algebra in B(U)×B[0,∞)×F containing sets of the form A×(t_0, t_0+r_0]×B for A ∈ B(U), t_0, r_0 ≥ 0, and B ∈ F_{t_0}.

Remark 11.13 Note that for B ∈ F_{t_0}, 1_{A×(t_0,t_0+r_0]×B}(u,t,ω) is left continuous in t and adapted, and that the mapping (u,t,ω) → X(u,t−,ω), where X(u,t−,ω) is defined in (11.1), is P-measurable.
Predictable processes

Definition 11.14 A stochastic process X on U×[0,∞) is predictable if the mapping (u,t,ω) → X(u,t,ω) is P-measurable.

Lemma 11.15 If the mapping (u,t,ω) → X(u,t,ω) is B(U)×B[0,∞)×F-measurable and adapted and is left continuous in t, then X is predictable.

Proof. Let 0 = t^n_0 < t^n_1 < ··· with t^n_{i+1} − t^n_i ≤ n^{−1}. Define X_n(u,t,ω) = X(u,t^n_i,ω) for t^n_i < t ≤ t^n_{i+1}. Then X_n is predictable and lim_{n→∞} X_n(u,t,ω) = X(u,t,ω) for all (u,t,ω).
Stochastic integrals for predictable processes

Lemma 11.16 Let G ∈ P, B ∈ B(U) with ν(B) < ∞, and b > 0. Then 1_{B×[0,b]}(u,t)1_G(u,t,ω) is a predictable process,

I_ξ(1_{B×[0,b]}1_G, t)(ω) = ∫_{U×[0,t]} 1_{B×[0,b]}(u,s)1_G(u,s,ω)ξ(du×ds, ω) a.s., (11.2)

and

E[∫_{U×[0,t]} 1_{B×[0,b]}(u,s)1_G(u,s,·)ξ(du×ds)] = E[∫_{U×[0,t]} 1_{B×[0,b]}(u,s)1_G(u,s,·)ν(du)ds]. (11.3)
Proof. Let

A = {∪_{i=1}^m A_i×(t_i, t_i+r_i]×G_i : t_i, r_i ≥ 0, A_i ∈ B(U), G_i ∈ F_{t_i}}.

Then A is an algebra, (11.2) holds by definition, and (11.3) holds by direct calculation. The collection of G that satisfy (11.2) and (11.3) is closed under increasing unions and decreasing intersections, and the monotone class theorem (see Theorem 4.1 of the Appendix of [2]) gives the lemma.
Equivalence of the stochastic integral and the integral against ξ

Lemma 11.17 Let X be a predictable process satisfying

∫_0^∞ e^{−s} ∫_U E[|X(u,s)| ∧ 1]ν(du)ds < ∞.

Then ∫_{U×[0,t]} |X(u,s)|ξ(du×ds) < ∞ a.s. and

I_ξ(X,t)(ω) = ∫_{U×[0,t]} X(u,s,ω)ξ(du×ds, ω) a.s.

Proof. Approximate by simple functions.
Consequences of predictability

Lemma 11.18 If X is predictable and ∫_{U×[0,t]} |X(u,s)| ∧ 1 ν(du)ds < ∞ a.s. for all t, then

∫_{U×[0,t]} |X(u,s)|ξ(du×ds) < ∞ a.s. (11.4)

and ∫_{U×[0,t]} X(u,s)ξ(du×ds) exists a.s.

Proof. Let τ_c = inf{t : ∫_{U×[0,t]} |X(u,s)| ∧ 1 ν(du)ds ≥ c}, and consider X_c(u,s) = 1_{[0,τ_c]}(s)X(u,s). Then X_c satisfies the conditions of Lemma 11.17, so

∫_{U×[0,t]} |X(u,s)| ∧ 1 ξ(du×ds) < ∞ a.s.

Consequently, ξ{(u,s) : s ≤ t, |X(u,s)| > 1} < ∞ a.s., so (11.4) holds.
Martingale properties

Theorem 11.19 Suppose X is predictable and ∫_{U×[0,t]} E[|X(u,s)|]ν(du)ds < ∞ for each t > 0. Then

∫_{U×[0,t]} X(u,s)ξ(du×ds) − ∫_0^t ∫_U X(u,s)ν(du)ds

is an {F_t}-martingale.
Proof. Let A ∈ F_t and define X_A(u,s) = 1_A X(u,s)1_{(t,t+r]}(s). Then X_A is predictable and

E[1_A ∫_{U×(t,t+r]} X(u,s)ξ(du×ds)] = E[∫_{U×[0,t+r]} X_A(u,s)ξ(du×ds)]
= E[∫_{U×[0,t+r]} X_A(u,s)ν(du)ds]
= E[1_A ∫_{U×(t,t+r]} X(u,s)ν(du)ds],

and hence

E[∫_{U×(t,t+r]} X(u,s)ξ(du×ds) | F_t] = E[∫_{U×(t,t+r]} X(u,s)ν(du)ds | F_t].
Local martingales

Lemma 11.20 If ∫_{U×[0,t]} |X(u,s)|ν(du)ds < ∞ a.s. for all t ≥ 0, then

∫_{U×[0,t]} X(u,s)ξ(du×ds) − ∫_{U×[0,t]} X(u,s)ν(du)ds

is a local martingale.

Proof. If τ is a stopping time and X is predictable, then 1_{[0,τ]}(t)X(u,t) is predictable. Let τ_c = inf{t > 0 : ∫_{U×[0,t]} |X(u,s)|ν(du)ds ≥ c}. Then

∫_{U×[0,t∧τ_c]} X(u,s)ξ(du×ds) − ∫_{U×[0,t∧τ_c]} X(u,s)ν(du)ds
= ∫_{U×[0,t]} 1_{[0,τ_c]}(s)X(u,s)ξ(du×ds) − ∫_{U×[0,t]} 1_{[0,τ_c]}(s)X(u,s)ν(du)ds

is a martingale.
•First •Prev •Next •Go To •Go Back •Full Screen •Close •Quit 265
Representation of counting processes
Let U = [0,∞) and ν = ℓ (Lebesgue measure). Let λ be a nonnegative, predictable process, and define G = {(u,t) : u ≤ λ(t)}. Then

  N(t) = ∫_{[0,∞)×[0,t]} 1_G(u,s) ξ(du×ds) = ∫_{[0,∞)×[0,t]} 1_{[0,λ(s)]}(u) ξ(du×ds)

is a counting process with intensity λ.

Stochastic equation for a counting process

Let λ : D_c[0,∞)×[0,∞) → [0,∞) satisfy λ(z,t) = λ(z(·∧t),t), t ≥ 0 (λ is nonanticipating), with λ(z,·) cadlag for each z ∈ D_c[0,∞). Consider

  N(t) = ∫_{[0,∞)×[0,t]} 1_{[0,λ(N,s−)]}(u) ξ(du×ds).
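This representation translates directly into simulation by thinning. Below is a minimal Python sketch, assuming the intensity functional is bounded by a known constant lam_max; the self-limiting intensity in the example is a hypothetical choice, used only to illustrate dependence on the past of N.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_counting_process(lam, lam_max, T):
    """Thinning sketch of N(t) = ∫ 1_{[0, lam(N, s-)]}(u) ξ(du×ds).

    Points of a rate-lam_max Poisson process are kept when an independent
    uniform mark u in [0, lam_max] falls below the current intensity --
    the Poisson-random-measure representation restricted to u ≤ lam_max.
    `lam(jumps, t)` is a hypothetical nonanticipating intensity functional
    of the jump times so far; it must satisfy lam <= lam_max.
    """
    t, jumps = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)   # next point of ξ with u ≤ lam_max
        if t > T:
            return np.array(jumps)
        u = rng.uniform(0.0, lam_max)          # the "u" coordinate of the point
        if u <= lam(jumps, t):                 # keep the point: u ≤ λ(N, t-)
            jumps.append(t)

# Example: self-limiting intensity λ(N, t) = 2/(1 + N(t-)) ≤ 2
jumps = simulate_counting_process(lambda J, t: 2.0 / (1 + len(J)), 2.0, 10.0)
print(len(jumps), "jumps by time 10")
```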
Semimartingale property
Corollary 11.21 If X is predictable and ∫_{U×[0,t]} |X(u,s)| ∧ 1 ν(du)ds < ∞ a.s. for all t, then ∫_{U×[0,t]} |X(u,s)| ξ(du×ds) < ∞ a.s. and

  ∫_{U×[0,t]} X(u,s) ξ(du×ds)
   = ∫_{U×[0,t]} 1_{|X(u,s)|≤1} X(u,s) ξ(du×ds) − ∫_0^t ∫_U 1_{|X(u,s)|≤1} X(u,s) ν(du)ds   [local martingale]
   + ∫_0^t ∫_U 1_{|X(u,s)|≤1} X(u,s) ν(du)ds + ∫_{U×[0,t]} 1_{|X(u,s)|>1} X(u,s) ξ(du×ds)   [finite variation]

is a semimartingale.
Stochastic integrals for centered Poisson random mea-sures
Let ξ̃(du×ds) = ξ(du×ds) − ν(du)ds denote the centered Poisson random measure.

For X(u,t) = Σ_i η_i 1_{A_i}(u) 1_{(t_i,r_i]}(t), define, as in (11.1),

  I_{ξ̃}(X,t) = ∫_{U×[0,t]} X(u,s−) ξ̃(du×ds) = ∫_{U×[0,t]} X(u,s) ξ(du×ds) − ∫_0^t ∫_U X(u,s) ν(du)ds,

and note that

  E[I_{ξ̃}(X,t)²] = ∫_{U×[0,t]} E[X(u,s)²] ν(du)ds

if the right side is finite. Then I_{ξ̃}(X,·) is a square-integrable martingale.
Extension of integral
The integral extends to predictable integrands satisfying

  ∫_{U×[0,t]} |X(u,s)|² ∧ |X(u,s)| ν(du)ds < ∞  a.s.,   (11.5)

so that

  ∫_{U×[0,t∧τ]} X(u,s) ξ̃(du×ds) = ∫_{U×[0,t]} 1_{[0,τ]}(s) X(u,s) ξ̃(du×ds)   (11.6)

is a martingale for any stopping time τ satisfying

  E[∫_{U×[0,t∧τ]} |X(u,s)|² ∧ |X(u,s)| ν(du)ds] < ∞,

and (11.6) is a local square integrable martingale if

  ∫_{U×[0,t]} |X(u,s)|² ν(du)ds < ∞  a.s.
Quadratic variation
Note that if X is predictable and ∫_{U×[0,t]} |X(u,s)| ∧ 1 ν(du)ds < ∞ a.s., t ≥ 0, then

  ∫_{U×[0,t]} |X(u,s)|² ∧ 1 ν(du)ds < ∞  a.s., t ≥ 0,

and

  [I_ξ(X,·)]_t = ∫_{U×[0,t]} X²(u,s) ξ(du×ds).

Similarly, if ∫_{U×[0,t]} |X(u,s)|² ∧ |X(u,s)| ν(du)ds < ∞ a.s.,

  [I_{ξ̃}(X,·)]_t = ∫_{U×[0,t]} X²(u,s) ξ(du×ds).
Semimartingale properties
Theorem 11.22 Let Y be a cadlag, adapted process. If X satisfies (11.4), then I_ξ(X,·) is a semimartingale and

  ∫_0^t Y(s−) dI_ξ(X,s) = ∫_{U×[0,t]} Y(s−)X(u,s) ξ(du×ds),

and if X satisfies (11.5), then I_{ξ̃}(X,·) is a semimartingale and

  ∫_0^t Y(s−) dI_{ξ̃}(X,s) = ∫_{U×[0,t]} Y(s−)X(u,s) ξ̃(du×ds).
Levy processes
Theorem 11.23 Let U = ℝ and ∫_ℝ |u|² ∧ 1 ν(du) < ∞. Then

  Z(t) = ∫_{[−1,1]×[0,t]} u ξ̃(du×ds) + ∫_{[−1,1]^c×[0,t]} u ξ(du×ds)

is a process with stationary, independent increments with

  E[e^{iθZ(t)}] = exp{t ∫_ℝ (e^{iθu} − 1 − iθu 1_{[−1,1]}(u)) ν(du)}.
Proof. By Itô's formula,

  e^{iθZ(t)} = 1 + ∫_0^t iθ e^{iθZ(s−)} dZ(s) + Σ_{s≤t} (e^{iθZ(s)} − e^{iθZ(s−)} − iθ e^{iθZ(s−)} ΔZ(s))
   = 1 + ∫_{[−1,1]×[0,t]} iθ e^{iθZ(s−)} u ξ̃(du×ds) + ∫_{[−1,1]^c×[0,t]} iθ e^{iθZ(s−)} u ξ(du×ds)
    + ∫_{ℝ×[0,t]} (e^{iθ(Z(s−)+u)} − e^{iθZ(s−)} − iθ e^{iθZ(s−)} u) ξ(du×ds)
   = 1 + ∫_{[−1,1]×[0,t]} iθ e^{iθZ(s−)} u ξ̃(du×ds)
    + ∫_{ℝ×[0,t]} e^{iθZ(s−)} (e^{iθu} − 1 − iθu 1_{[−1,1]}(u)) ξ(du×ds).

Taking expectations,

  ϕ(θ,t) = 1 + ∫_{ℝ×[0,t]} ϕ(θ,s)(e^{iθu} − 1 − iθu 1_{[−1,1]}(u)) ν(du)ds,

so ϕ(θ,t) = exp{t ∫_ℝ (e^{iθu} − 1 − iθu 1_{[−1,1]}(u)) ν(du)}.
Scaling
Let U = ℝ and write ξ = Σ δ_{(u_i,s_i)}. Define ξ_{a,b} = Σ δ_{(au_i,bs_i)}. Then ξ_{a,b} is a Poisson random measure with mean measure b^{−1} ν_a(du)ds, where ν_a((c,d]) = ν((a^{−1}c, a^{−1}d]). If ν has a density γ, then γ_a(u) = a^{−1}γ(a^{−1}u). Let

  Z_{a,b}(t) = ∫_{[−1,1]×[0,t]} u ξ̃_{a,b}(du×ds) + ∫_{[−1,1]^c×[0,t]} u ξ_{a,b}(du×ds)
   = ∫_{[−a^{−1},a^{−1}]×[0,b^{−1}t]} au ξ̃(du×ds) + ∫_{[−a^{−1},a^{−1}]^c×[0,b^{−1}t]} au ξ(du×ds)
   = aZ(b^{−1}t) + ∫_{−∞}^∞ au(1_{[−1,1]}(u) − 1_{[−a^{−1},a^{−1}]}(u)) ν(du) b^{−1}t.

Example: γ(u) = c|u|^{−1−α}. Then the mean measure for ξ_{a,b} is ca^α|u|^{−1−α} b^{−1} du ds, and the "drift" term on the right vanishes by symmetry. Consequently, if b = a^α, then Z_{a,b}(t) = aZ(a^{−α}t) has the same distribution as Z.
Approximation of Levy processes
For 0 < ε < 1, let

  Z_ε(t) = ∫_{([−1,−ε)∪(ε,1])×[0,t]} u ξ̃(du×ds) + ∫_{[−1,1]^c×[0,t]} u ξ(du×ds)
   = ∫_{((−∞,−ε)∪(ε,∞))×[0,t]} u ξ(du×ds) − t ∫_{[−1,−ε)∪(ε,1]} u ν(du),

that is, throw out all jumps of size less than or equal to ε and the corresponding centering. Then

  E[|Z_ε(t) − Z(t)|²] = t ∫_{[−ε,ε]} u² ν(du).

Consequently, since Z_ε − Z is a square integrable martingale, Doob's inequality gives

  lim_{ε→0} E[sup_{s≤t} |Z_ε(s) − Z(s)|²] = 0.
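The approximation Z_ε is a compound Poisson process plus centering, so it is easy to simulate. The following Python sketch assumes the symmetric density γ(u) = c|u|^{−1−α} from the scaling example, so the centering term vanishes; the sampler and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def levy_compound_poisson(T, eps, alpha=1.5, c=1.0, n_paths=1):
    """Sketch of Z_eps for ν(du) = c|u|^{-1-α}du: keep jumps |u| > eps.

    Jumps with |u| > eps form a compound Poisson process with total rate
    2c·eps^{-α}/α and magnitudes |u| Pareto(α) on (eps, ∞); by symmetry
    the centering over eps < |u| <= 1 is zero.  Returns Z_eps(T).
    """
    rate = 2 * c * eps ** (-alpha) / alpha
    out = np.empty(n_paths)
    for i in range(n_paths):
        n = rng.poisson(rate * T)
        mags = eps * (rng.pareto(alpha, size=n) + 1.0)  # density ∝ u^{-1-α}, u > eps
        signs = rng.choice([-1.0, 1.0], size=n)
        out[i] = np.sum(signs * mags)                   # centering term is 0 here
    return out

# The discarded variance is t·∫_{-eps}^{eps} u² ν(du) = 2c·t·eps^{2-α}/(2-α) → 0.
print(levy_compound_poisson(1.0, 0.01, n_paths=3))
```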
Summary on stochastic integrals
If X is predictable and ∫_{U×[0,t]} |X(u,s)| ∧ 1 ν(du)ds < ∞ a.s. for all t, then

  ∫_{U×[0,t]} |X(u,s)| ξ(du×ds) < ∞  a.s.

and

  ∫_{U×[0,t]} X(u,s) ξ(du×ds)
   = ∫_{U×[0,t]} 1_{|X(u,s)|≤1} X(u,s) ξ(du×ds) − ∫_0^t ∫_U 1_{|X(u,s)|≤1} X(u,s) ν(du)ds   [local martingale]
   + ∫_0^t ∫_U 1_{|X(u,s)|≤1} X(u,s) ν(du)ds + ∫_{U×[0,t]} 1_{|X(u,s)|>1} X(u,s) ξ(du×ds)   [finite variation]

is a semimartingale.
Centered Poisson random measruresIf X is predictable and∫
U×[0,t]|X(u, s)|2 ∧ |X(u, s)|ν(du)ds <∞ a.s.,
then∫U×[0,t]
X(u, s)ξ(du× ds)
= limε→0+
∫U×[0,t]
1|X(u,s)|≥ε(s)X(u, s)ξ(du× ds)
= limε→0+
(∫U×[0,t]
1|X(u,s)|≥εX(u, s)ξ(du× ds)−∫ t
0
∫U
1|X(u,s)|≥εX(u, s)ν(du)ds
)exists and is a local martingale.
•First •Prev •Next •Go To •Go Back •Full Screen •Close •Quit 277
Markov processes
Markov chain: X_{n+1} = H(X_n, η_{n+1}), {η_n} iid and independent of X_0.

ℝ^d-valued Markov process:

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds   (11.7)
   + ∫_{U_1×[0,t]} α_1(X(s−),u) ξ̃(du×ds) + ∫_{U_2×[0,t]} α_2(X(s−),u) ξ(du×ds),

where σ : ℝ^d → M^{d×m}, b : ℝ^d → ℝ^d, and for each compact K ⊂ ℝ^d,

  sup_{x∈K} ( |σ(x)| + |b(x)| + ∫_{U_1} |α_1(x,u)|² ν(du) + ∫_{U_2} |α_2(x,u)| ∧ 1 ν(du) ) < ∞.   (11.8)
Uniqueness and the Markov property
If X is a solution of (11.7), then

  X(t) = X(r) + ∫_r^t σ(X(s))dW(s) + ∫_r^t b(X(s))ds
   + ∫_{U_1×(r,t]} α_1(X(s−),u) ξ̃(du×ds) + ∫_{U_2×(r,t]} α_2(X(s−),u) ξ(du×ds).

Uniqueness implies X(r) is independent of W(·+r) − W(r) and ξ(A×(r,·]) and that {X(t), t ≥ r} is determined by X(r), W(·+r) − W(r), and ξ(A×(r,·]), which gives the Markov property.
Conditions for uniqueness
Lipschitz condition:

  |σ(x) − σ(y)| + |b(x) − b(y)|
   + √(∫_{U_1} |α_1(x,u) − α_1(y,u)|² ν(du)) + ∫_{U_2} |α_2(x,u) − α_2(y,u)| ν(du)
   ≤ M|x − y|.
Proof of uniqueness
Recall the proof of uniqueness used for the general martingale SDE. Let

  Z(t) = ∫_0^t X_1(s)dW(s) + ∫_0^t X_2(s)ds + ∫_{U_1×[0,t]} X_3(s−,u) ξ̃(du×ds) + ∫_{U_2×[0,t]} X_4(s−,u) ξ(du×ds)
   = Z_1(t) + Z_2(t) + Z_3(t) + Z_4(t).
Integral estimate
Then there exists K_τ(t,δ) such that

  P{sup_{s≤t} |Z(τ+s) − Z(τ)| ≥ K_τ(t,δ)} ≤ Σ_{k=1}^4 P{sup_{s≤t} |Z_k(τ+s) − Z_k(τ)| ≥ K_τ(t,δ)/4}
   ≤ 4√(E[∫_τ^{τ+t} |X_1(s)|² ds])/K_τ(t,δ) + 4E[∫_τ^{τ+t} |X_2(s)| ds]/K_τ(t,δ)
    + 4√(E[∫_{U_1×[τ,τ+t]} |X_3(s,u)|² ν(du)ds])/K_τ(t,δ) + 4E[∫_{U_2×[τ,τ+t]} |X_4(s,u)| ν(du)ds]/K_τ(t,δ)
   ≤ δ

for all X_1, X_2, X_3, X_4 satisfying

  |X_1(s)| + |X_2(s)| + √(∫_{U_1} |X_3(s,u)|² ν(du)) + ∫_{U_2} |X_4(s,u)| ν(du) ≤ 1.
Martingale problems
An E-valued process X is an {F_t}-Markov process if X is {F_t}-adapted and E[g(X(t+s))|F_t] = E[g(X(t+s))|X(t)], g ∈ B(E).

The generator of a Markov process determines its short time behavior:

  E[g(X(t+Δt)) − g(X(t))|F_t] ≈ Ag(X(t))Δt.

Definition 11.24 X is a solution of the martingale problem for A if and only if there exists a filtration {F_t} such that X is {F_t}-adapted and

  g(X(t)) − g(X(0)) − ∫_0^t Ag(X(s))ds   (11.9)

is an {F_t}-martingale for each g ∈ D(A).

For ν ∈ P(E), X is a solution of the martingale problem for (A,ν) if X is a solution of the martingale problem for A and X(0) has distribution ν.
Generator for the SDE
Let

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds
   + ∫_{U_1×[0,t]} α_1(X(s−),u) ξ̃(du×ds) + ∫_{U_2×[0,t]} α_2(X(s−),u) ξ(du×ds).

Then

  f(X(t)) − f(X(0)) − ∫_0^t Af(X(s))ds
   = ∫_0^t ∇f(X(s))^T σ(X(s))dW(s)
    + ∫_{U_1×[0,t]} (f(X(s−)+α_1(X(s−),u)) − f(X(s−))) ξ̃(du×ds)
    + ∫_{U_2×[0,t]} (f(X(s−)+α_2(X(s−),u)) − f(X(s−))) ξ̃(du×ds).

Note that, assuming (11.8), f ∈ C_c²(ℝ^d), and that X exists for all t ≥ 0, the right side is a local square integrable martingale.
Form of the generator
  Af(x) = ½ Σ_{i,j} a_{ij}(x) ∂_i∂_j f(x) + b(x)·∇f(x)
   + ∫_{U_1} (f(x+α_1(x,u)) − f(x) − α_1(x,u)·∇f(x)) ν(du)
   + ∫_{U_2} (f(x+α_2(x,u)) − f(x)) ν(du),

where a(x) = σ(x)σ(x)^T. Let D(A) be a collection of functions for which Af is bounded. Then a solution of the SDE is a solution of the martingale problem for A.
Pure jump processes
Consider

  Af(x) = λ(x) ∫_E (f(y) − f(x)) µ(x,dy),

λ ≥ 0, µ a transition function.

The corresponding Markov process stays in a state x for an exponential length of time with parameter λ(x) and then jumps to a new point with distribution µ(x,·).

There exist a space U_0, a probability measure ν_0 ∈ P(U_0), and a measurable mapping H : E×U_0 → E such that µ(x,Γ) = ν_0{u : H(x,u) ∈ Γ}, that is,

  ∫_{U_0} f(H(x,u)) ν_0(du) = ∫_E f(y) µ(x,dy).
Stochastic equation for a pure jump process
Let ξ be a Poisson random measure on U_0×[0,∞)×[0,∞) with mean measure ν_0×ℓ×ℓ. Then there exists an E-valued process satisfying

  X(t) = X(0) + ∫_{U_0×[0,∞)×[0,t]} 1_{[0,λ(X(s−))]}(u_1) (H(X(s−),u_0) − X(s−)) ξ(du_0×du_1×ds)

up to

  τ_∞ = lim_{n→∞} inf{t : ∫_{U_0×[0,∞)×[0,t]} 1_{[0,λ(X(s−))]}(u_1) ξ(du_0×du_1×ds) ≥ n}.
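Since the process holds in a state x for an Exp(λ(x)) time and then jumps according to µ(x,·), it can be simulated directly. A minimal Python sketch follows, with a hypothetical jump sampler H_sample standing in for H and ν_0, and a jump cap standing in for the explosion time τ_∞.

```python
import numpy as np

rng = np.random.default_rng(2)

def pure_jump_path(x0, lam, H_sample, T, max_jumps=10_000):
    """Sketch of the pure-jump process with generator
    Af(x) = lam(x) ∫ (f(y) - f(x)) mu(x, dy).

    Holds in state x an Exp(lam(x)) time, then jumps to y = H(x, u), u ~ nu0;
    `H_sample(x, rng)` is a hypothetical sampler for mu(x, ·).
    """
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    for _ in range(max_jumps):          # stand-in for stopping before τ_∞
        rate = lam(x)
        if rate <= 0:
            break                       # absorbing state
        t += rng.exponential(1.0 / rate)
        if t > T:
            break
        x = H_sample(x, rng)
        times.append(t); states.append(x)
    return np.array(times), np.array(states)

# Example: birth-death chain with lam(x) = 1 + x and ±1 jumps
ts, xs = pure_jump_path(0, lambda x: 1.0 + x,
                        lambda x, r: x + (1 if x == 0 or r.random() < 0.6 else -1),
                        T=5.0)
print(xs[-1])
```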
Lipschitz condition
With

  α_2(x,u) = 1_{[0,λ(x)]}(u_1)(H(x,u_0) − x),

we have

  ∫_U |α_2(x,u) − α_2(y,u)| ν(du)
   ≤ ∫_U |1_{[0,λ(x)]}(u_1) − 1_{[0,λ(y)]}(u_1)| |H(x,u_0) − x| ν_0(du_0)du_1
    + ∫_U 1_{[0,λ(y)]}(u_1) |H(x,u_0) − H(y,u_0) − (x−y)| ν_0(du_0)du_1
   ≤ ∫_{U_0} |λ(x) − λ(y)| |H(x,u_0) − x| ν_0(du_0)
    + ∫_{U_0} λ(y) |H(x,u_0) − H(y,u_0) − (x−y)| ν_0(du_0).

Exercise 11.25 Try estimating ∫ |α_2(x,u) − α_2(y,u)|² ν(du).
Dynkin’s identity
Lemma 11.26 Suppose X is a solution of the martingale problem for A. Then for a stopping time τ,

  E[f(X(t∧τ))] = E[f(X(0))] + E[∫_0^{t∧τ} Af(X(s))ds].

Proof. Apply the optional sampling theorem to the martingale (11.9).
Moment estimates
Suppose

  |a(x)| + |b(x)|² + ∫_{U_1} |α_1(x,u)|² ν(du) + ∫_{U_2} |α_2(x,u)|² ν(du) + (∫_{U_2} |α_2(x,u)| ν(du))² ≤ K_1 + K_2|x|².   (11.10)

Then for f(x) = |x|², Af(x) ≤ C_1 + C_2|x|², where

  Af(x) = ½ Σ_{i,j} a_{ij}(x) ∂_i∂_j f(x) + b(x)·∇f(x)
   + ∫_{U_1} (f(x+α_1(x,u)) − f(x) − α_1(x,u)·∇f(x)) ν(du)
   + ∫_{U_2} (f(x+α_2(x,u)) − f(x)) ν(du).
Truncation argument
Suppose X satisfies (11.7) and sup_{x,u} |α_1(x,u)| ≤ c. Let Y satisfy

  Y(t) = X(0) + ∫_0^t σ(Y(s))dW(s) + ∫_0^t b(Y(s))ds   (11.11)
   + ∫_{U_1×[0,t]} α_1(Y(s−),u) ξ̃(du×ds)
   + ∫_{U_2×[0,t]} [c/(c∨|α_2(Y(s−),u)|)] α_2(Y(s−),u) ξ(du×ds).

Then Y agrees with X until the first time that |X(t) − X(t−)| > c. If (11.10) holds, then for the generator A_c of (11.11), A_c f(x) ≤ C_1 + C_2|x|² also.
Let τ_c = inf{t : |X(t)| ≥ c/2}, and note that τ_c ≤ inf{t : |X(t) − X(t−)| > c}. Consequently, if t < τ_c, then |Y(t)| = |X(t)| < c/2. If τ_c ≤ t and |X(τ_c) − X(τ_c−)| ≤ c, then |X(t∧τ_c)| = |Y(t∧τ_c)| ≥ c/2. If τ_c ≤ t and |X(τ_c) − X(τ_c−)| > c, then |Y(τ_c) − Y(τ_c−)| = c and |Y(t∧τ_c)| ≥ c/2. Consequently,

  |X(t∧τ_c)| ∧ (c/2) ≤ |Y(t∧τ_c)|.   (11.12)

Let f(x) = |x|² for |x| ≤ 3c/2 and be constant for |x| sufficiently large. Then sup_x |A_c f(x)| < ∞ and, assuming |X(0)| ≤ 3c/2,

  E[|Y(t∧τ_c)|²] = E[|X(0)|²] + E[∫_0^{t∧τ_c} A_c f(Y(s))ds]
   ≤ E[|X(0)|²] + E[∫_0^{t∧τ_c} (C_1 + C_2|Y(s)|²)ds]
   ≤ E[|X(0)|²] + E[∫_0^t (C_1 + C_2|Y(s∧τ_c)|²)ds],

and hence, by Gronwall's inequality,

  E[|Y(t∧τ_c)|²] ≤ (E[|X(0)|²] + C_1 t) e^{C_2 t}.   (11.13)

By (11.12) and (11.13),

  E[(|X(t)| ∧ (c/2))²] ≤ E[(|X(t∧τ_c)| ∧ (c/2))²] ≤ (E[|X(0)|²] + C_1 t) e^{C_2 t}.

Consequently, letting c → ∞, the monotone convergence theorem gives

  E[|X(t)|²] ≤ (E[|X(0)|²] + C_1 t) e^{C_2 t}.
12. Probability distributions on function spaces
• The space D_{ℝ^d}[0,∞)
• Probability distributions on D
• The Markov and strong Markov properties
• Convergence in D
• The martingale central limit theorem
• Convergence of stochastic integrals
• Approximation of stochastic differential equations
• Wong-Zakai corrections
• Diffusion approximations for Markov chains
The space DRd[0,∞)
D_{ℝ^d}[0,∞) denotes the space of cadlag ℝ^d-valued functions on [0,∞).

B(D_{ℝ^d}[0,∞)) = σ({x ∈ D : x(t) ∈ Γ} : t ≥ 0, Γ ∈ B(ℝ^d)) is the Borel σ-algebra for an appropriately defined metric.

Example: Suppose d = 1. Then

  {x ∈ D : sup_{s≤t} x(s) ≤ c} = {x ∈ D : x(t) ≤ c} ∩ ∩_{s≤t, s∈ℚ} {x ∈ D : x(s) ≤ c},

so x ∈ D → sup_{s≤t} x(s) ∈ ℝ is a measurable mapping from D_ℝ[0,∞) into ℝ.
Probability distributions on D
If X is a cadlag, ℝ^d-valued process, then µ_X(Γ) = P{X ∈ Γ} defines a probability measure on B(D_{ℝ^d}[0,∞)).

Lemma 12.1 If X is a cadlag, ℝ^d-valued process, then µ_X is determined by the finite dimensional distributions of X, that is, there exists a unique µ on B(D_{ℝ^d}[0,∞)) satisfying

  µ{x : x(t_i) ∈ G_i, i = 1,...,m} = P{X(t_i) ∈ G_i, i = 1,...,m}.
The Markov and strong Markov properties
Lemma 12.2 Let X be a cadlag {F_t}-Markov process. Then for each Γ ∈ B(D_{ℝ^d}[0,∞)), P{X(r+·) ∈ Γ|F_r} = P{X(r+·) ∈ Γ|X(r)}. If X is strong Markov, then for each {F_t}-stopping time τ with τ < ∞ a.s.,

  P{X(τ+·) ∈ Γ|F_τ} = P{X(τ+·) ∈ Γ|X(τ)}.

Proof. Let Γ = {x : x(t_i) ∈ G_i, i = 1,...,m}. By induction on m,

  P{X(τ+·) ∈ Γ|F_τ} = E[Π_{i=1}^m 1_{G_i}(X(τ+t_i))|F_τ] = E[Π_{i=1}^m 1_{G_i}(X(τ+t_i))|X(τ)] = P{X(τ+·) ∈ Γ|X(τ)}.

By the Dynkin-class theorem, the identity extends to all of B(D_{ℝ^d}[0,∞)).
Convergence in D
D_{ℝ^d}[0,∞), the space of cadlag, ℝ^d-valued functions.

x_n → x ∈ D_{ℝ^d}[0,∞) in the Skorohod (J_1) topology if and only if there exist strictly increasing λ_n mapping [0,∞) onto [0,∞) such that for each T > 0,

  lim_{n→∞} sup_{t≤T} (|λ_n(t) − t| + |x_n∘λ_n(t) − x(t)|) = 0.

The Skorohod topology is metrizable so that D_{ℝ^d}[0,∞) is a complete, separable metric space.

Note that 1_{[1+1/n,∞)} → 1_{[1,∞)} in D_ℝ[0,∞), but (1_{[1+1/n,∞)}, 1_{[1,∞)}) does not converge in D_{ℝ²}[0,∞).
Some mappings on DE[0,∞)
π_t : D_{ℝ^d}[0,∞) → ℝ^d,  π_t(x) = x(t);  C_{π_t} = {x ∈ D_{ℝ^d}[0,∞) : x(t) = x(t−)}

G_t : D_ℝ[0,∞) → ℝ,  G_t(x) = sup_{s≤t} x(s);  C_{G_t} = {x ∈ D_ℝ[0,∞) : lim_{s→t−} G_s(x) = G_t(x)} ⊃ C_{π_t}

G : D_ℝ[0,∞) → D_ℝ[0,∞), G(x)(t) = G_t(x), is continuous.

H_t : D_{ℝ^d}[0,∞) → ℝ,  H_t(x) = sup_{s≤t} r(x(s), x(s−));  C_{H_t} = {x ∈ D_{ℝ^d}[0,∞) : lim_{s→t−} H_s(x) = H_t(x)} ⊃ C_{π_t}

H : D_{ℝ^d}[0,∞) → D_ℝ[0,∞), H(x)(t) = H_t(x), is continuous.
Proofs
Suppose x_n∘λ_n(t) → x(t) and λ_n(t) → t uniformly on bounded time intervals. Then

  G(x_n)∘λ_n = G(x_n∘λ_n) → G(x)

uniformly on bounded time intervals, and the continuity of G : D_ℝ[0,∞) → D_ℝ[0,∞) follows. For fixed t and h > 0,

  G_{t−h}(x) = lim_{n→∞} G_{t−h}(x_n∘λ_n) ≤ liminf_{n→∞} G_t(x_n) ≤ limsup_{n→∞} G_t(x_n) ≤ lim_{n→∞} G_{t+h}(x_n∘λ_n) = G_{t+h}(x).
Level crossing times
τ_c : D_ℝ[0,∞) → [0,∞),  τ_c(x) = inf{t : x(t) > c}

τ_c^− : D_ℝ[0,∞) → [0,∞),  τ_c^−(x) = inf{t : x(t) ≥ c or x(t−) ≥ c}

C_{τ_c} = C_{τ_c^−} = {x : τ_c(x) = τ_c^−(x)}

Note that τ_c^−(x) ≤ τ_c(x) and that x_n → x implies

  τ_c^−(x) ≤ liminf_{n→∞} τ_c^−(x_n) ≤ limsup_{n→∞} τ_c(x_n) ≤ τ_c(x).
Convergence in distribution
(S, d) complete, separable metric space
X_n, S-valued random variables.

X_n converges in distribution to X (P_{X_n} converges weakly to P_X) if for each bounded continuous f on S,

  lim_{n→∞} E[f(X_n)] = E[f(X)].

Denote convergence in distribution by X_n ⇒ X.

Equivalent statements: X_n converges in distribution to X if and only if

  liminf_{n→∞} P{X_n ∈ A} ≥ P{X ∈ A}, each open A,

or equivalently

  limsup_{n→∞} P{X_n ∈ B} ≤ P{X ∈ B}, each closed B.
Skorohod representation theorem
Theorem 12.3 Suppose that X_n ⇒ X. Then there exist a probability space (Ω,F,P) and random variables X̃_n and X̃ such that X̃_n has the same distribution as X_n, X̃ has the same distribution as X, and X̃_n → X̃ a.s.

Continuous mapping theorem

Corollary 12.4 Let G : S → E and define

  C_G = {x ∈ S : G is continuous at x}.

Suppose X_n ⇒ X and that P{X ∈ C_G} = 1. Then G(X_n) ⇒ G(X).
The martingale central limit theorem
Theorem 12.5 Let {M_n} be martingales such that for each t ≥ 0,

  lim_{n→∞} E[sup_{s≤t} |M_n(s) − M_n(s−)|] = 0,  lim_{n→∞} [M_n]_t = ct

in probability. Then M_n ⇒ √c W.

Theorem 12.6 (Vector-valued version) If for each 1 ≤ i ≤ d,

  lim_{n→∞} E[sup_{s≤t} |M_n^i(s) − M_n^i(s−)|] = 0,

and for each 1 ≤ i, j ≤ d,

  [M_n^i, M_n^j]_t → c_{ij} t,

then M_n ⇒ σW, where W is d-dimensional standard Brownian motion and σ is a symmetric d×d matrix satisfying σ² = c = ((c_{ij})).
Donsker invariance principle
Theorem 12.7 Let {ξ_i} be iid ℝ-valued random variables with E[ξ_i] = 0 and Var(ξ_i) = σ². Define

  X_n(t) = (1/√n) Σ_{i=1}^{[nt]} ξ_i.

Then X_n ⇒ σW.

Proof. X_n is a martingale and, by the law of large numbers,

  [X_n]_t = (1/n) Σ_{i=1}^{[nt]} ξ_i² → σ²t.
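A quick numerical illustration of the rescaling in Theorem 12.7 (a sketch; the choice of ±1 steps, with mean 0 and variance 1, is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def rescaled_walk(n, T=1.0):
    """Donsker rescaling X_n(t) = n^{-1/2} Σ_{i<=nt} ξ_i on a grid.

    ξ_i are iid uniform on {-1, 1}, so X_n should look like a standard
    Brownian motion path for large n.
    """
    steps = rng.choice([-1.0, 1.0], size=int(n * T))
    return np.concatenate([[0.0], np.cumsum(steps)]) / np.sqrt(n)

# Distributional check at t = 1: Var X_n(1) ≈ σ² = 1
samples = np.array([rescaled_walk(1000)[-1] for _ in range(2000)])
print(samples.var())   # ≈ 1
```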
Probability estimates for SIs
Y = M + V

M a local square-integrable martingale, V a finite variation process, and

  X = Σ ξ_i 1_{[τ_i,τ_{i+1})} ∈ S_0.

Then ∫_0^t X_− dY = Σ ξ_i (Y(t∧τ_{i+1}) − Y(t∧τ_i)). Assume |X| ≤ 1. For a localizing stopping time σ with E[[M]_{t∧σ}] < ∞,

  P{sup_{s≤t} |∫_0^s X_− dY| > K}
   ≤ P{σ ≤ t} + P{sup_{s≤t∧σ} |∫_0^s X_− dM| > K/2} + P{sup_{s≤t} |∫_0^s X_− dV| > K/2}
   ≤ P{σ ≤ t} + 16E[[M]_{t∧σ}]/K² + P{T_t(V) ≥ K/2},

where T_t(V) denotes the total variation of V on [0,t].
Good integrator condition
If Y is a semimartingale, then
  {∫ X dY : X ∈ S_0, |X| ≤ 1}

is stochastically bounded.

Y satisfying this stochastic boundedness condition is a good integrator.

Bichteler-Dellacherie: Y is a good integrator if and only if Y is a semimartingale.
Markov chains
X_{k+1} = H(X_k, ξ_{k+1}), where ξ_1, ξ_2, ... are iid:

  P{X_{k+1} ∈ B|X_0, ξ_1,...,ξ_k} = P{X_{k+1} ∈ B|X_k}.

Example: X_{k+1}^n = X_k^n + σ(X_k^n)(1/√n)ξ_{k+1} + b(X_k^n)(1/n).

Assume E[ξ_k] = 0 and Var(ξ_k) = 1. Define

  X_n(t) = X_{[nt]}^n,  A_n(t) = [nt]/n,  W_n(t) = (1/√n) Σ_{k=1}^{[nt]} ξ_k.

Then

  X_n(t) = X_n(0) + ∫_0^t σ(X_n(s−))dW_n(s) + ∫_0^t b(X_n(s−))dA_n(s).

Can we conclude X_n ⇒ X satisfying

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds ?
Wong-Zakai example
W a standard Brownian motion in R
  (d/dt) W_n(t) = n(W((k+1)/n) − W(k/n)),  k/n ≤ t < (k+1)/n,

i.e., W_n is the piecewise linear interpolation of W. Let

  X_n(t) = X_n(0) + ∫_0^t σ(X_n(s))dW_n(s) + ∫_0^t b(X_n(s))ds.

Can we conclude X_n ⇒ X satisfying

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds ?
Uniformity conditions
Uniform tightness (UT): For S_0^n, the collection of piecewise constant, {F_t^n}-adapted processes,

  H_t^0 = ∪_{n=1}^∞ { |∫_0^t Z(s−)dY_n(s)| : Z ∈ S_0^n, sup_{s≤t} |Z(s)| ≤ 1 }

is stochastically bounded.

Uniformly controlled variations (UCV): For Y_n = M_n + A_n, {T_t(A_n), n = 1, 2, ...} is stochastically bounded, and there exist stopping times {τ_n^α} such that

  sup_n E[[M_n]_{t∧τ_n^α}] < ∞,  lim_{α→∞} sup_n P{τ_n^α ≤ α} = 0.

A sequence of semimartingales {Y_n} that converges in distribution and satisfies either UT or UCV will be called good.
Basic convergence theorem
Theorem 12.8 Let (X_n, Y_n) be {F_t^n}-adapted in D_{M^{km}×ℝ^m}[0,∞), with Y_n = M_n + A_n an {F_t^n}-semimartingale. Assume that {Y_n} satisfies either UT or UCV. If

  (X_n, Y_n) ⇒ (X, Y)

in the Skorohod topology on D_{M^{km}×ℝ^m}[0,∞), THEN

  (X_n, Y_n, ∫X_n dY_n) ⇒ (X, Y, ∫X dY)

in D_{M^{km}×ℝ^m×ℝ^k}[0,∞).
Convergence for SDEs
Theorem 12.9 Let {Y_n} be a good sequence of semimartingales and {U_n} a sequence of adapted, cadlag processes. Let X_n satisfy

  X_n(t) = U_n(t) + ∫_0^t F_n(X_n(s−))dY_n(s).

Suppose (Y_n, U_n) ⇒ (Y, U) and

  sup_{x∈K} |F_n(x) − F(x)| → 0

for compact K, where F is bounded and continuous. Then {(X_n, Y_n, U_n)} is relatively compact and any limit point satisfies

  X(t) = U(t) + ∫_0^t F(X(s−))dY(s).
Wong-Zakai example
  W_n(t) = W(([nt]+1)/n) − (([nt]+1)/n − t) n (W(([nt]+1)/n) − W([nt]/n)) = Y_n(t) + Z_n(t)

Assume V_n = Y_n + Z_n, where {Y_n} is good and Z_n ⇒ 0.

Assume {∫Z_n dZ_n}, {[Y_n, Z_n]}, and {[Z_n]} are good.

  X_n(t) = X_n(0) + ∫_0^t F(X_n(s−))dV_n(s)
   = X_n(0) + ∫_0^t F(X_n(s−))dY_n(s) + ∫_0^t F(X_n(s−))dZ_n(s)
Integration by parts
Integrate by parts using

  F(X_n(t)) = F(X_n(0)) + ∫_0^t F′(X_n(s−))F(X_n(s−))dV_n(s) + R_n(t).

R_n can be estimated in terms of

  [V_n] = [Y_n] + 2[Y_n, Z_n] + [Z_n].

For G = F′F,

  ∫_0^t F(X_n(s−))dZ_n(s)
   = F(X_n(t))Z_n(t) − F(X_n(0))Z_n(0) − ∫_0^t Z_n(s−)dF(X_n(s)) − [F∘X_n, Z_n]_t
   ≈ −∫_0^t Z_n(s−)G(X_n(s−))dY_n(s) − ∫_0^t G(X_n(s−))Z_n(s−)dZ_n(s)
    − ∫_0^t G(X_n(s−))d([Y_n,Z_n]_s + [Z_n]_s) − ∫_0^t Z_n(s−)dR_n(s) − [R_n, Z_n]_t.
Limit Theorem
Theorem 12.10 Assume V_n = Y_n + Z_n, where {Y_n} is good and Z_n ⇒ 0, and that {∫Z_n dZ_n}, {[Y_n,Z_n]}, and {[Z_n]} are good. If

  (X_n(0), Y_n, Z_n, ∫Z_n dZ_n, [Y_n,Z_n]) ⇒ (X(0), Y, 0, H, K),

then {X_n} is relatively compact and any limit point satisfies

  X(t) = X(0) + ∫_0^t F(X(s−))dY(s) + ∫_0^t F′(X(s−))F(X(s−))d(H(s) − K(s)).

Note: For the Wong-Zakai example, H(t) − K(t) = ½t.
Wong-Zakai correction
  W_n(t) = W(([nt]+1)/n) − (([nt]+1)/n − t) n (W(([nt]+1)/n) − W([nt]/n)) = Y_n(t) + Z_n(t)

  ∫_0^t Z_n(s−)dZ_n(s) = −∫_0^t (([ns]+1)/n − s) n² (W(([ns]+1)/n) − W([ns]/n))² ds + Σ_{k=1}^{[nt]} Z_n(k/n −)ΔZ_n(k/n)
   = −∫_{[nt]n^{−1}}^t (···)ds − Σ_{k=1}^{[nt]} ½ (W(k/n) − W((k−1)/n))²
   → −½ t

and

  [Y_n, Z_n]_t = −Σ_{k=1}^{[nt]} (W((k+1)/n) − W(k/n))² → −t.
Diffusion limit
Let W_n be the piecewise linear interpolation and

  X_n(t) = X(0) + ∫_0^t σ(X_n(s))dW_n(s) + ∫_0^t b(X_n(s))ds

for Lipschitz b and continuously differentiable σ. Then X_n converges to the solution of

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t (½σ(X(s))σ′(X(s)) + b(X(s)))ds.
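For σ(x) = x and b = 0 the correction can be checked in closed form: solving the ODE driven by the piecewise linear W_n gives exp(W_n(t)) (the Stratonovich solution), while the Itô equation dX = X dW gives exp(W(t) − t/2). A Python sketch under these illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def wong_zakai_demo(n=2000, T=1.0, x0=1.0):
    """Sketch: drive dX = σ(X) dW_n with W_n piecewise linear, σ(x) = x.

    Solving the ODE exactly on each linear piece gives X_n(T) = x0*exp(W(T)),
    while the Itô solution is x0*exp(W(T) - T/2); the gap is exactly the
    (1/2)σσ' Wong-Zakai drift.
    """
    dW = rng.normal(0.0, np.sqrt(T / n), size=n)
    W = dW.sum()
    x_interp = x0 * np.exp(W)            # limit of the W_n-driven equation
    x_ito = x0 * np.exp(W - T / 2)       # Itô SDE dX = X dW
    return x_interp, x_ito

a, b = wong_zakai_demo()
print(a / b)   # = exp(T/2) here: the Wong-Zakai correction factor
```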
Approximation of empirical CDF
Let {ξ_i} be iid and uniform on [0,1], and let N_n(t) = Σ_{k=1}^n 1_{[ξ_k,1]}(t), 0 ≤ t ≤ 1. Define F_t^n = σ(N_n(u) : u ≤ t). For t ≤ s ≤ 1,

  E[N_n(s)|F_t^n] = E[N_n(t) + N_n(s) − N_n(t)|F_t^n]
   = N_n(t) + E[N_n(s) − N_n(t)|F_t^n]
   = N_n(t) + (n − N_n(t))(s−t)/(1−t),

and

  M_n(t) = N_n(t) − ∫_0^t (n − N_n(s))/(1−s) ds

is a martingale.
Scaling limit for empirical CDF
Define F_n(t) ≡ N_n(t)/n and B_n(t) = √n(F_n(t) − t) = (N_n(t) − nt)/√n. Then

  B_n(t) = (1/√n)(N_n(t) − nt)
   = (1/√n)(M_n(t) + nt − √n ∫_0^t B_n(s)/(1−s) ds − nt)
   = M̃_n(t) − ∫_0^t B_n(s)/(1−s) ds,

where M̃_n(t) = M_n(t)/√n. Note that [M̃_n]_t = F_n(t) and, by the law of large numbers, [M̃_n]_t → t. Since F_n(t) ≤ 1, the convergence is in L¹, and Theorem 12.5 implies M̃_n ⇒ W. Therefore, B_n ⇒ B, where

  B(t) = W(t) − ∫_0^t B(s)/(1−s) ds,

at least if we restrict our attention to [0, 1−ε] for some ε > 0.
Convergence on full interval
Observe that

  E∫_{1−ε}^1 |B_n(s)|/(1−s) ds = ∫_{1−ε}^1 E[|B_n(s)|]/(1−s) ds
   ≤ ∫_{1−ε}^1 √(E[B_n(s)²])/(1−s) ds ≤ ∫_{1−ε}^1 √(s−s²)/(1−s) ds = ∫_{1−ε}^1 √(s/(1−s)) ds,

which is finite and tends to zero as ε → 0. It follows that for any δ > 0,

  sup_n P{sup_{1−ε≤s≤1} |B_n(1) − B_n(s)| ≥ δ} → 0

as ε → 0. The process B is known as the Brownian bridge.
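The convergence B_n ⇒ B can be checked numerically: at a fixed t the variance of B_n(t) should approach t(1−t), the Brownian bridge variance. A Python sketch with illustrative sample sizes:

```python
import numpy as np

rng = np.random.default_rng(5)

def empirical_bridge(n, grid):
    """B_n(t) = sqrt(n)(F_n(t) - t) for the empirical CDF of n uniforms.

    A sketch illustrating B_n ⇒ Brownian bridge.
    """
    xi = rng.uniform(size=n)
    Fn = np.searchsorted(np.sort(xi), grid, side="right") / n
    return np.sqrt(n) * (Fn - grid)

grid = np.linspace(0, 1, 101)
B = np.array([empirical_bridge(500, grid) for _ in range(2000)])
print(B[:, 50].var(), 0.5 * (1 - 0.5))   # both ≈ 0.25 at t = 0.5
```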
Continuous time Markov chains
X a continuous time Markov chain in ℤ with

  P(X(t+h) = j|X(t) = i) = q_{ij}h + o(h),  i ≠ j.

Write

  X(t) = X(0) + Σ_{l∈ℤ} l N_l(t),

where N_l counts the number of jumps of X of size l at or before time t. Define β_l(x) = q_{x,x+l}. Then

  E[N_l(t+h) − N_l(t)|F_t] = q_{X(t),X(t)+l} h + o(h) = β_l(X(t))h + o(h),

and

  M_l(t) ≡ N_l(t) − ∫_0^t β_l(X(s))ds

is a martingale (or at least a local martingale). If we define τ_l(n) = inf{t : N_l(t) = n}, then for each n, M_l(·∧τ_l(n)) is a martingale.
A stochastic “equation”
Assume Σ_l |l|β_l(x) < ∞, and define b(x) ≡ Σ_l lβ_l(x). Then

  X(t) = X(0) + Σ_l l N_l(t) = X(0) + Σ_l l M_l(t) + Σ_l l ∫_0^t β_l(X(s))ds
   = X(0) + Σ_l l M_l(t) + ∫_0^t b(X(s))ds.

Note that [M_l]_t = N_l(t) and [M_l, M_k]_t = [N_l, N_k]_t = 0 for l ≠ k, so the M_l are orthogonal. If E[∫_0^t Σ_l l²β_l(X(s))ds] < ∞, then

  E[(Σ_l l M_l(t))²] = Σ_l l² E[M_l(t)²] = E[∫_0^t Σ_l l²β_l(X(s))ds],

and Σ_l l M_l(t) is a square integrable martingale.
Diffusion approximations for Markov chains
Let X_n(t) = X_n(0) + Σ_l (l/n) N_l^n(t), where

  P{X_n(t+h) = X_n(t) + l/n | F_t^n} = n²β_l^n(X_n(t))h + o(h).

Define

  M_l^n(t) ≡ N_l^n(t) − ∫_0^t n²β_l^n(X_n(s))ds,

so

  [M_l^n]_t = N_l^n(t),  E[[M_l^n]_t] = n²E[∫_0^t β_l^n(X_n(s))ds].
Then, setting b_n(x) = n Σ_l lβ_l^n(x),

  X_n(t) = X_n(0) + Σ_l (l/n) M_l^n(t) + ∫_0^t b_n(X_n(s))ds
   = X_n(0) + M_n(t) + ∫_0^t b_n(X_n(s))ds,

  [M_n]_t = Σ_l (l²/n²) N_l^n(t),

and

  [M_n]_t − ∫_0^t Σ_l l²β_l^n(X_n(s))ds

should be a martingale.
Conditions for convergence
Define σ_n²(x) = Σ_l l²β_l^n(x) and assume that for each k > 0,

  sup_{|x|≤k} |b_n(x) − b(x)| → 0,  sup_{|x|≤k} |σ_n²(x) − σ²(x)| → 0,

and inf_{|x|≤k} σ²(x) > 0. Define

  W_n(t) = ∫_0^t (1/σ_n(X_n(s−))) dM_n(s).
Convergence of Wn
  [W_n]_t = ∫_0^t (1/σ_n²(X_n(s−))) d[M_n]_s
   = Σ_l ∫_0^t (l²/(n²σ_n²(X_n(s−)))) dN_l^n(s)
   = Σ_l ∫_0^t (l²/(n²σ_n²(X_n(s−)))) dM_l^n(s) + t
   ≡ U_n(t) + t.

Note that U_n is a martingale, and under modest assumptions,

  [U_n]_t = Σ_l ∫_0^t (l⁴/(n⁴σ_n⁴(X_n(s−)))) dN_l^n(s) → 0.
Limiting equation
Since M_n(t) = ∫_0^t σ_n(X_n(s−))dW_n(s),

  X_n(t) = X_n(0) + ∫_0^t σ_n(X_n(s−))dW_n(s) + ∫_0^t b_n(X_n(s))ds.

Since W_n ⇒ W, X_n converges to a solution of

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s))ds.
Moran model in population genetics
n population size
Assuming two genetic types:
• X_n(t), the fraction of the population that is of Type 1
• at rate n², a randomly selected individual is killed and replaced by the offspring of another randomly selected individual
• n^{−1}µ_1, the mutation probability that the offspring of a Type 1 parent is of Type 2
• n^{−1}µ_2, the mutation probability that the offspring of a Type 2 parent is of Type 1
Transition intensities
  P{X_n(t+h) = X_n(t) + 1/n | F_t^n} = n²(1−X_n(t))(X_n(t)(1−n^{−1}µ_1) + (1−X_n(t))n^{−1}µ_2)h + o(h)

  P{X_n(t+h) = X_n(t) − 1/n | F_t^n} = n²X_n(t)(X_n(t)n^{−1}µ_1 + (1−X_n(t))(1−n^{−1}µ_2))h + o(h)

  β_{−1}^n(x) = x(xn^{−1}µ_1 + (1−x)(1−n^{−1}µ_2))
  β_1^n(x) = (1−x)(x(1−n^{−1}µ_1) + (1−x)n^{−1}µ_2)
Feller diffusion approximation
  b_n(x) = n Σ_l lβ_l^n(x)
   = −x²µ_1 + x(1−x)(µ_2−µ_1) + (1−x)²µ_2
   = (1−x)µ_2 − xµ_1 = µ_2 − (µ_1+µ_2)x

and

  σ_n²(x) = Σ_l l²β_l^n(x) = 2x(1−x) + O(n^{−1}).

Limiting SDE:

  X(t) = X(0) + ∫_0^t √(2X(s)(1−X(s)))dW(s) + ∫_0^t (µ_2 − (µ_1+µ_2)X(s))ds.
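A Python sketch of the limiting SDE (an Euler discretization with illustrative parameters; near stationarity the mean fraction of Type 1 should be close to µ_2/(µ_1+µ_2), the mean of the Beta stationary law of this diffusion):

```python
import numpy as np

rng = np.random.default_rng(6)

def moran_limit(mu1, mu2, T, dt=1e-3, x0=0.5):
    """Euler sketch of dX = (mu2 - (mu1+mu2)X) dt + sqrt(2X(1-X)) dW."""
    x, t = x0, 0.0
    while t < T:
        drift = mu2 - (mu1 + mu2) * x
        diff = np.sqrt(max(2 * x * (1 - x), 0.0))
        x += drift * dt + diff * rng.normal(0.0, np.sqrt(dt))
        x = min(max(x, 0.0), 1.0)   # keep the fraction in [0, 1]
        t += dt
    return x

print(np.mean([moran_limit(1.0, 1.0, T=2.0) for _ in range(200)]))
# ≈ mu2/(mu1+mu2) = 0.5 near stationarity
```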
13. Numerical schemes
• Euler methods
Euler methods
Consider

  X(t) = U(t) + ∫_0^t F(X(s−))dY(s).

The simplest numerical scheme is, of course, the Euler scheme. Specifying a mesh 0 = t_0 < t_1 < ···, define X_0 recursively by setting X_0(0) = X(0) and

  X_0(t_{k+1}) = X_0(t_k) + U(t_{k+1}) − U(t_k) + F(X_0(t_k))ΔY(t_k),

where ΔY(t_k) = Y(t_{k+1}) − Y(t_k).

Extend the definition of X_0 to all t by X_0(t) = X_0(t_k) for t_k ≤ t < t_{k+1}. Define Y_0 by Y_0(t) = Y(t_k) for t_k ≤ t < t_{k+1} and similarly for U_0. Then

  X_0(t) = U_0(t) + ∫_0^t F(X_0(s−))dY_0(s).
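A Python sketch of this scheme on a fixed mesh; the driving processes are supplied as sampled paths, and the geometric-Brownian-motion example is illustrative:

```python
import numpy as np

def euler_sde(U, Y, F, x0=None):
    """Euler scheme X0(t_{k+1}) = X0(t_k) + U(t_{k+1})-U(t_k) + F(X0(t_k))ΔY(t_k).

    U and Y are arrays of the driving processes sampled on the mesh
    t_0 < t_1 < ...; F is the coefficient function.  Returns the
    piecewise-constant Euler approximation on the same mesh.
    """
    X = np.empty_like(Y, dtype=float)
    X[0] = U[0] if x0 is None else x0
    for k in range(len(Y) - 1):
        X[k + 1] = X[k] + (U[k + 1] - U[k]) + F(X[k]) * (Y[k + 1] - Y[k])
    return X

# Example: dX = X dW with X(0) = 1 (U constant), Y a Brownian path
rng = np.random.default_rng(7)
n, T = 1000, 1.0
t = np.linspace(0, T, n + 1)
W = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(T / n), n))])
X = euler_sde(np.ones_like(t), W, lambda x: x)
print(X[-1], np.exp(W[-1] - T / 2))   # Euler vs exact Itô solution
```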
“Goodness” of discrete approximations
Define β(t) = t_k for t_k ≤ t < t_{k+1}. Then Y_0 = Y∘β.

Note that the following lemma would allow for a mesh determined by {F_t}-stopping times.

Lemma 13.1 Let Y be an {F_t}-semimartingale. For each n, let β_n be a nonnegative, nondecreasing process such that for each u ≥ 0, β_n(u) is an {F_t}-stopping time. If β_n(u) → u a.s. for each u ≥ 0, then Y∘β_n → Y a.s. in the Skorohod topology and {Y∘β_n} is a good sequence.

Proof. Note that Y_n = Y∘β_n is a semimartingale with respect to the filtration {F_{β_n(t)}}. To verify uniform tightness, let X be adapted to {F_{β_n(t)}}, X = Σ_i ξ_i 1_{[t_i,t_{i+1})}, and |X| ≤ 1. Define

  X_n(u) = Σ_i X(t_i) 1_{[β_n(t_i),β_n(t_{i+1}))}(u).

Then

  ∫_0^t X(s−)dY_n(s) = Σ X(t_i)(Y(β_n(t_{i+1}∧t)) − Y(β_n(t_i∧t))) = ∫_0^{β_n(t)} X_n(u−)dY(u).

Since

  P{|∫_0^{β_n(t)} X_n(u−)dY(u)| ≥ K} ≤ P{|∫_0^T 1_{[0,β_n(t))}(u)X_n(u−)dY(u)| ≥ K} + P{β_n(t) > T},

the uniform tightness of {Y∘β_n} follows from the uniform tightness of {Y}. (The "sequence" with Y_n = Y, n = 1, 2, ..., is uniformly tight.)
Consistency for the Euler method
Theorem 13.2 Assume that F is bounded and continuous and that the solution of

  X(t) = U(t) + ∫_0^t F(X(s−))dY(s)   (13.1)

is unique. For each n, let {τ_k^n} be an increasing sequence of stopping times such that lim_{n→∞} sup_k |τ_{k+1}^n − τ_k^n| = 0. Let X_n(0) = U(0) and

  X_n(τ_{k+1}^n) = X_n(τ_k^n) + U(τ_{k+1}^n) − U(τ_k^n) + F(X_n(τ_k^n))(Y(τ_{k+1}^n) − Y(τ_k^n)),

and extend X_n to be constant on [τ_k^n, τ_{k+1}^n) for all k. Then X_n ⇒ X.
Another representation of the Euler scheme

For an increasing sequence of stopping times {τ_k}, define η(t) = τ_k for τ_k ≤ t < τ_{k+1}. Then the solution of

  X_0(t) = U(t) + ∫_0^t F(X_0∘η(s−))dY(s)

satisfies

  X_0(τ_{k+1}) = X_0(τ_k) + U(τ_{k+1}) − U(τ_k) + F(X_0(τ_k))(Y(τ_{k+1}) − Y(τ_k)).
Asymptotics for the error
Theorem 13.3 Let Y be an {F_t}-semimartingale and F be bounded and continuously differentiable. For each n, let {τ_k^n} be an increasing sequence of {F_t}-stopping times. Define η_n(t) = τ_k^n for τ_k^n ≤ t < τ_{k+1}^n, and let X_n satisfy

  X_n(t) = U(t) + ∫_0^t F(X_n∘η_n(s−))dY(s).

Let {α_n} be a positive sequence converging to infinity, set V_n = α_n(X_n − X), and define Z_n by

  Z_n^{ij}(t) = α_n ∫_0^t (Y_i(s−) − Y_i∘η_n(s−))dY_j(s).

Suppose that {Z_n} is a good sequence with (Y, Z_n) ⇒ (Y, Z). Then V_n ⇒ V satisfying

  V(t) = Σ_i ∫_0^t ∇F_i(X(s−))V(s−)dY_i(s)   (13.2)
   + Σ_{ij} ∫_0^t Σ_k ∂_k F_i(X(s−))F_{kj}(X(s−))dZ^{ij}(s).
Example
Let

  Y(t) = (W(t), t)^T,

where W is an (m−1)-dimensional standard Brownian motion. Let η_n(t) = [nt]/n. Then, taking α_n = √n, (Y, Z_n) ⇒ (Y, Z), where Z is independent of Y, Z^{im} = Z^{mi} = 0, and for 1 ≤ i, j ≤ m−1, the Z^{ij} are independent mean zero Brownian motions with E[(Z^{ij}(t))²] = ½t.
Proof. Under the hypotheses of the theorem, the solution of (13.1) is unique, and (X_n, X, Y, Z_n) ⇒ (X, X, Y, Z). For simplicity, assume k = m = 1. Then, noting that

  X_n(s−) − X_n∘η_n(s−) = F(X_n∘η_n(s−))(Y(s−) − Y∘η_n(s−)),

  V_n(t) = ∫_0^t α_n (F(X_n(s−)) − F(X(s−)))dY(s) − ∫_0^t α_n (F(X_n(s−)) − F(X_n∘η_n(s−)))dY(s)
   = ∫_0^t [(F(X_n(s−)) − F(X(s−)))/(X_n(s−) − X(s−))] V_n(s−)dY(s)
    − ∫_0^t [(F(X_n∘η_n(s−) + F(X_n∘η_n(s−))(Y(s−) − Y∘η_n(s−))) − F(X_n∘η_n(s−)))(Y(s−) − Y∘η_n(s−))^{−1}] dZ_n(s).

Let τ_n^a = inf{t : |V_n(t)| > a}. Then {V_n(·∧τ_n^a)} is relatively compact, and any limit point will satisfy (13.2) on the time interval [0, τ^a], where τ^a = inf{t : |V(t)| > a}. But τ^a → ∞ as a → ∞, so V_n ⇒ V.
14. Change of measure
• Absolute continuity and the Radon Nikodym theorem
• Applications of absolute continuity
• Bayes formula
• Local absolute continuity
• Martingales under a change of measure
• Change of measure for Brownian motion
• Change of measure for Poisson processes
Absolute continuity and the Radon-Nikodym theorem
Definition 14.1 Let P and Q be probability measures on (Ω,F). Then P is absolutely continuous with respect to Q (P << Q) if and only if Q(A) = 0 implies P(A) = 0.

Theorem 14.2 If P << Q, then there exists a random variable L ≥ 0 such that

  P(A) = E^Q[1_A L] = ∫_A L dQ,  A ∈ F.

Consequently, Z is P-integrable if and only if ZL is Q-integrable, and

  E^P[Z] = E^Q[ZL].

Standard notation: dP/dQ = L.
Maximum likelihood estimation
Suppose for each α ∈ A, P_α(Γ) = ∫_Γ L_α dQ and

  L_α = H(α, X_1, X_2, ..., X_n)

for random variables X_1,...,X_n. The maximum likelihood estimate α̂ for the "true" parameter α_0 ∈ A based on observations of X_1,...,X_n is the value of α that maximizes H(α, X_1, X_2, ..., X_n).

For example, under certain conditions the distribution of

  X_α(t) = X(0) + ∫_0^t σ(X_α(s))dW(s) + ∫_0^t b(X_α(s), α)ds

will be absolutely continuous with respect to the distribution of X satisfying

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s).   (14.1)
Sufficiency
If dP_α = L_α dQ, where

  L_α(X, Y) = H_α(X)G(X, Y),

then X is a sufficient statistic for α. Without loss of generality, we can assume E^Q[G(X,Y)] = 1 and hence dQ̃ = G(X,Y)dQ defines a probability measure.

Example 14.3 If (X_1,...,X_n) are iid N(µ,σ²) under P_{(µ,σ)} and Q = P_{(0,1)}, then

  L_{(µ,σ)} = σ^{−n} exp{ −[(1−σ²)/(2σ²)] Σ_{i=1}^n X_i² + (µ/σ²) Σ_{i=1}^n X_i − nµ²/(2σ²) },

so (Σ_{i=1}^n X_i², Σ_{i=1}^n X_i) is a sufficient statistic for (µ,σ).
Parameter estimates and sufficiency
Theorem 14.4 If θ̂(X,Y) is an estimator of θ(α) and ϕ is convex, then

  E^{P_α}[ϕ(θ(α) − θ̂(X,Y))] ≥ E^{P_α}[ϕ(θ(α) − E^{Q̃}[θ̂(X,Y)|X])].

Proof. By Jensen's inequality for conditional expectations,

  E^{P_α}[ϕ(θ(α) − θ̂(X,Y))] = E^{Q̃}[ϕ(θ(α) − θ̂(X,Y))H_α(X)]
   = E^{Q̃}[E^{Q̃}[ϕ(θ(α) − θ̂(X,Y))|X] H_α(X)]
   ≥ E^{Q̃}[ϕ(θ(α) − E^{Q̃}[θ̂(X,Y)|X]) H_α(X)].
Other applications
Finance: Asset pricing models depend on finding a change of measure under which the price process becomes a martingale.

Stochastic control: For a controlled diffusion process

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s) + ∫_0^t b(X(s), u(s))ds,

where the control only enters the drift coefficient, the controlled process can be obtained from an uncontrolled process satisfying (14.1) via a change of measure.
Bayes Formula
Recall that Y = E[Z|D] if Y is D-measurable and, for each D ∈ D, ∫_D Y dP = ∫_D Z dP.

Lemma 14.5 (Bayes Formula) If dP = LdQ, then

  E^P[Z|D] = E^Q[ZL|D] / E^Q[L|D].   (14.2)

Proof. Clearly the right side of (14.2) is D-measurable. Let D ∈ D. Then

  ∫_D (E^Q[ZL|D]/E^Q[L|D]) dP = ∫_D (E^Q[ZL|D]/E^Q[L|D]) L dQ
   = ∫_D (E^Q[ZL|D]/E^Q[L|D]) E^Q[L|D] dQ
   = ∫_D E^Q[ZL|D] dQ = ∫_D ZL dQ = ∫_D Z dP.
Examples

For real-valued random variables with a joint density, (X,Y) ~ f_{XY}(x,y), conditional expectations can be computed by

  E[g(Y)|X = x] = ∫_{−∞}^∞ g(y)f_{XY}(x,y)dy / f_X(x),

that is, setting h(x) equal to the right side, E[g(Y)|X] = h(X).

For general random variables, suppose X and Y are independent on (Ω,F,Q). Let L = H(X,Y) ≥ 0 with E^Q[H(X,Y)] = 1. Define ν_Y(Γ) = Q{Y ∈ Γ} and dP = H(X,Y)dQ. Bayes formula becomes

  E^P[g(Y)|X] = E^Q[g(Y)H(X,Y)|X] / E^Q[H(X,Y)|X] = ∫g(y)H(X,y)ν_Y(dy) / ∫H(X,y)ν_Y(dy).
Local absolute continuity
Theorem 14.6 Let (Ω,F) be a measurable space, and let P and Q be probability measures on F. Suppose D_n ⊂ D_{n+1} and that for each n, P|_{D_n} << Q|_{D_n}. Define L_n = (dP/dQ)|_{D_n}. Then {L_n} is a nonnegative {D_n}-martingale on (Ω,F,Q) and L = lim_{n→∞} L_n satisfies E^Q[L] ≤ 1. If E^Q[L] = 1, then P << Q on D = ∨_n D_n.

Proof. If D ∈ D_n ⊂ D_{n+1}, then P(D) = E^Q[L_n 1_D] = E^Q[L_{n+1} 1_D], which implies E[L_{n+1}|D_n] = L_n. If E[L] = 1, then L_n → L in L¹, so

  P(D) = E^Q[L 1_D],  D ∈ ∪_n D_n,

hence for all D ∈ ∨_n D_n.

Proposition 14.7 P << Q on D if and only if P{lim_{n→∞} L_n < ∞} = 1.

Proof. The dominated convergence theorem implies

  P{sup_n L_n ≤ K} = lim_{m→∞} E^Q[1_{sup_{n≤m} L_n ≤ K} L_m] = E^Q[1_{sup_n L_n ≤ K} L].

Letting K → ∞, we see that E^Q[L] = 1.
Martingales and change of measure

(See [4], Section III.6.)

Let {F_t} be a filtration and assume that P|_{F_t} << Q|_{F_t} and that L(t) is the corresponding Radon-Nikodym derivative. Then, as before, L is an {F_t}-martingale on (Ω,F,Q).

Lemma 14.8 Z is a P-local martingale if and only if LZ is a Q-local martingale.

Proof. For a bounded stopping time τ, Z(τ) is P-integrable if and only if L(τ)Z(τ) is Q-integrable. Furthermore, if L(τ∧t)Z(τ∧t) is Q-integrable, then L(t)Z(τ∧t) is Q-integrable and

  E^Q[L(τ∧t)Z(τ∧t)] = E^Q[L(t)Z(τ∧t)].

By Bayes formula, E^P[Z(t+h) − Z(t)|F_t] = 0 if and only if

  E^Q[L(t+h)(Z(t+h) − Z(t))|F_t] = 0,

which is equivalent to

  E^Q[L(t+h)Z(t+h)|F_t] = E^Q[L(t+h)Z(t)|F_t] = L(t)Z(t),

so Z is a martingale under P if and only if LZ is a martingale under Q.
Semimartingale decompositions under a change of measure

Theorem 14.9 If M is a Q-local martingale, then

  Z(t) = M(t) − ∫_0^t (1/L(s)) d[L,M]_s   (14.3)

is a P-local martingale. (Note that the integrand is 1/L(s), not 1/L(s−).)

Proof. Note that LM − [L,M] is a Q-local martingale. We need to show that LZ is a Q-local martingale. But letting V denote the second term on the right of (14.3), we have

  L(t)V(t) = ∫_0^t V(s−)dL(s) + ∫_0^t L(s)dV(s),

and hence

  L(t)Z(t) = L(t)M(t) − [L,M]_t − ∫_0^t V(s−)dL(s).

Both terms on the right are Q-local martingales.
Change of measure for Brownian motion

Let W be standard Brownian motion, and let ξ be an adapted process. Define

  L(t) = exp{∫_0^t ξ(s)dW(s) − ½∫_0^t ξ²(s)ds}   (14.4)

and note that L(t) = 1 + ∫_0^t ξ(s)L(s)dW(s). Then L(t) is a local martingale.

Assume E^Q[L(t)] = 1 for all t ≥ 0. Then L is a martingale. Fix a time T, and restrict attention to the probability space (Ω, F_T, Q). On F_T, define dP = L(T)dQ.

For t < T, let A ∈ F_t. Then, since L is a martingale,

  P(A) = E^Q[1_A L(T)] = E^Q[1_A E^Q[L(T)|F_t]] = E^Q[1_A L(t)].
New Brownian motion
Theorem 14.10 W̃(t) = W(t) − ∫_0^t ξ(s)ds is a standard Brownian motion on (Ω, F_T, P).

Proof. Since W̃ is continuous and [W̃]_t = t a.s., it is enough to show that W̃ is a local martingale (and hence a martingale). But since W is a Q-martingale and [L,W]_t = ∫_0^t ξ(s)L(s)ds, Theorem 14.9 gives the desired result.
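The statement can be checked by Monte Carlo: reweighting by L(T) makes W acquire the drift ∫ξ ds. A Python sketch with constant ξ = θ (an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(8)

def girsanov_check(theta=0.7, T=1.0, n=500, paths=20000):
    """Monte Carlo sketch of the Girsanov change of measure.

    Under Q, W is standard BM and, for constant ξ = θ,
    L(T) = exp(θ W(T) - θ²T/2).  Under dP = L(T)dQ, W should have
    mean θT, i.e. W̃ = W - θt is a standard BM.
    """
    dW = rng.normal(0.0, np.sqrt(T / n), size=(paths, n))
    WT = dW.sum(axis=1)
    L = np.exp(theta * WT - 0.5 * theta**2 * T)
    print("E^Q[L(T)]            ≈", L.mean())         # ≈ 1 (martingale)
    print("E^P[W(T)] = E^Q[W L] ≈", (WT * L).mean())  # ≈ θT: drift under P

girsanov_check()
```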
Changing the drift of a diffusion
Suppose that

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s)

and set ξ(s) = b(X(s)). Note that X is a diffusion with generator ½σ²(x)f″(x). Define

  L(t) = exp{∫_0^t b(X(s))dW(s) − ½∫_0^t b²(X(s))ds},

and assume that E^Q[L(T)] = 1 (e.g., if b is bounded). Set dP = L(T)dQ on (Ω, F_T).
Transformed SDE
Define W̃(t) = W(t) − ∫_0^t b(X(s))ds. Then

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s)   (14.5)
   = X(0) + ∫_0^t σ(X(s))dW̃(s) + ∫_0^t σ(X(s))b(X(s))ds,

so under P, X is a diffusion with generator

  Af(x) = ½σ²(x)f″(x) + σ(x)b(x)f′(x).   (14.6)
Conditions that imply local absolute continuity
Let σ and b be locally bounded and W be a standard Brownian motion.

Condition 14.11 If for n = 1, 2, ..., X_n satisfies

  X_n(t) = X(0) + ∫_0^t σ(X_n(s))dW(s) + ∫_0^t σ(X_n(s))b(X_n(s))ds

for t ≤ τ_n = inf{s : |X_n(s)| ≥ n}, then lim_{n→∞} P{τ_n ≤ T} = 0 for each T > 0.

Theorem 14.12 Suppose Condition 14.11 holds, and let W be a Brownian motion on (Ω,F,Q). If X is a solution of

  X(t) = X(0) + ∫_0^t σ(X(s))dW(s)

on (Ω,F,Q), then for each T, E^Q[L(T)] = 1.
Proof. Let L(t) = exp{∫_0^t b(X(s))dW(s) − ½∫_0^t b²(X(s))ds}, and define τ_n = inf{s : |X(s)| ≥ n}. Then E^Q[L(T∧τ_n)] = 1 and we can define dP = L(T∧τ_n)dQ on F_{T∧τ_n}. On (Ω, F_{T∧τ_n}, P),

  W̃(t∧τ_n) = W(t∧τ_n) − ∫_0^{t∧τ_n} b(X(s))ds

is a Brownian motion stopped at τ_n, and

  X(t) = X(0) + ∫_0^t σ(X(s))dW̃(s) + ∫_0^t σ(X(s))b(X(s))ds

for t ≤ T∧τ_n. Then {(X, τ_n), n = 1, 2, ...} satisfies Condition 14.11, and since

  P{sup_n L(T∧τ_n) > K} ≤ P{ sup_{r≤T∧τ_m} (∫_0^r b(X(s))dW̃(s) + ½∫_0^r b²(X(s))ds) > log K } + P{τ_m ≤ T},

we can apply Proposition 14.7 to conclude that P << Q on F_T, that is, E^Q[L(T)] = 1.
Change of measure for Poisson processes
Theorem 14.13 Let N be a unit Poisson process on (Ω,F,Q) that is compatible with {F_t}. If Λ is nonnegative, {F_t}-predictable, and satisfies

  ∫_0^t Λ(s)ds < ∞  a.s., t ≥ 0,

then

  L(t) = exp{ ∫_0^t ln Λ(s)dN(s) − ∫_0^t (Λ(s) − 1)ds }

satisfies

  L(t) = 1 + ∫_0^t (Λ(s) − 1)L(s−)d(N(s) − s)   (14.7)

and is a Q-local martingale. If E[L(T)] = 1 and we define dP = L(T)dQ on F_T, then N(t) − ∫_0^t Λ(s)ds is a P-local martingale.
Proof. By Theorem 14.9,

  Z(t) = N(t) − t − ∫_0^t (1/L(s))(Λ(s) − 1)L(s−)dN(s) = ∫_0^t (1/Λ(s))dN(s) − t

is a local martingale under P. Consequently,

  ∫_0^t Λ(s)dZ(s) = N(t) − ∫_0^t Λ(s)ds

is a local martingale under P.
Construction of counting processes by change of measure

Let J[0,∞) denote the collection of nonnegative integer-valued cadlag functions that are constant except for jumps of +1. Suppose that λ : J[0,∞)×[0,∞) → [0,∞),

  ∫_0^t λ(x,s)ds < ∞,  t ≥ 0, x ∈ J[0,∞),

and that λ(x,s) = λ(x(·∧s),s) (that is, λ is nonanticipating). If we take Λ(t) = λ(N,t) and let τ_n = inf{t : N(t) = n}, then defining dP = L(τ_n)dQ on F_{τ_n}, N(·∧τ_n) on (Ω, F_{τ_n}, P) has the same distribution as Ñ(·∧τ̃_n), where Ñ is the solution of

  Ñ(t) = ∫_{[0,∞)×[0,t]} 1_{[0,λ(Ñ,s−)]}(u) ξ(du×ds)

for ξ a Poisson random measure with Lebesgue mean measure and τ̃_n = inf{t : Ñ(t) = n}.
Change of measure for Poisson random measures

Let ξ be a Poisson random measure on U×[0,∞) with mean measure ν(du)×dt, and let λ be a positive, predictable process satisfying

  ∫_{U×[0,t]} (λ(u,s) − 1)² ∧ |λ(u,s) − 1| ν(du)ds < ∞  a.s., t ≥ 0.

Let M_λ(t) = ∫_{U×[0,t]} (λ(u,s) − 1) ξ̃(du×ds) and let L be the solution of

  L(t) = 1 + ∫_0^t L(s−)dM_λ(s) = 1 + ∫_{U×[0,t]} (λ(u,s) − 1)L(s−) ξ̃(du×ds).   (14.8)

Then L is a local martingale. If ∫_{U×[0,t]} |λ(u,s) − 1| ν(du)ds < ∞ a.s., t ≥ 0, then

  L(t) = exp{ ∫_{U×[0,t]} log λ(u,s) ξ(du×ds) − ∫_{U×[0,t]} (λ(u,s) − 1) ν(du)ds },

and in general,

  L(t) = exp{ ∫_{U×[0,t]} log λ(u,s) ξ̃(du×ds) + ∫_{U×[0,t]} (log λ(u,s) − λ(u,s) + 1) ν(du)ds }.
Intensity for the transformed counting measure
If E[L(T)] = 1, then for A ∈ B(U) with ν(A) < ∞,

  M_A(t) = ∫_{A×[0,t]} λ(u,s) ξ̃(du×ds)

is a local martingale under Q and

  [M_A, L]_t = ∫_{A×[0,t]} λ(u,s)(λ(u,s) − 1)L(s−) ξ(du×ds).

Consequently,

  Z_A(t) = ∫_{A×[0,t]} λ(u,s) ξ̃(du×ds) − ∫_{A×[0,t]} (1/L(s)) λ(u,s)(λ(u,s) − 1)L(s−) ξ(du×ds)
   = ξ(A,t) − ∫_{A×[0,t]} λ(u,s) ν(du)ds

is a local martingale under dP = L(T)dQ.
Change for stochastic equations
Let U = U_1 ∪ U_2, U_1 ∩ U_2 = ∅, and

  Af(x) = ∫_{U_1} (f(x+α(x,u)) − f(x) − α(x,u)·∇f(x)) ν(du)
   + ∫_{U_2} (f(x+α(x,u)) − f(x)) ν(du).

Let D(A) = C_c²(ℝ^d) and suppose that Af is bounded for f ∈ D(A). Then X satisfying

  X(t) = X(0) + ∫_{U_1×[0,t]} α(X(s−),u) ξ̃(du×ds) + ∫_{U_2×[0,t]} α(X(s−),u) ξ(du×ds)

is a solution of the martingale problem for A.
New martingale problem
Let λ(u,s) = λ(u, X(s−)). Under Q,

  f(X(t)) − f(X(0)) − ∫_0^t Af(X(s))ds = ∫_{U×[0,t]} (f(X(s−)+α(X(s−),u)) − f(X(s−))) ξ̃(du×ds)

is a local martingale, so under P,

  f(X(t)) − f(X(0)) − ∫_0^t Af(X(s))ds
   − ∫_0^t ∫_U (f(X(s)+α(X(s),u)) − f(X(s)))(λ(u,X(s)) − 1) ν(du)ds
   = ∫_{U×[0,t]} (f(X(s−)+α(X(s−),u)) − f(X(s−)))(ξ(du×ds) − λ(u,X(s)) ν(du)ds)

is a local martingale.
New generator
X is a solution of the martingale problem for

  Af(x) = ∫_{U_1} (f(x+α(x,u)) − f(x) − α(x,u)·∇f(x)) λ(u,x) ν(du)
   + ∫_{U_2} (f(x+α(x,u)) − f(x)) λ(u,x) ν(du)
   + (∫_{U_1} α(x,u)(λ(u,x) − 1) ν(du)) · ∇f(x).
15. Filtering
• Observation of a signal in noise
• Continuous time filtering in Gaussian white noise
• Zakai equation
• Kushner-Stratonovich equation
• Point process observations
Observation of a signal in noise

Signal: X_1, X_2, ...

Observation: Y_k = h(X_k) + ζ_k, {ζ_k} iid "noise".

Filtering problem: Compute π_n(Γ) = P{X_n ∈ Γ|F_n^Y}.

Suppose ζ_k has a strictly positive density γ with respect to Lebesgue measure.

Theorem 15.1 Suppose that under Q, the Y_k are iid with density γ(z) and are independent of {X_k}. Then

  L_n = Π_{k=1}^n γ(Y_k − h(X_k))/γ(Y_k)

is a martingale, and under dP = L_n dQ, (Y_1,...,Y_n) has the same distribution as

  (h(X_1) + ζ_1, ..., h(X_n) + ζ_n).
Proof.

  E^Q[g(Y_1,...,Y_n)L_n]
   = ∫_{ℝ^d} ··· ∫_{ℝ^d} g(z_1,...,z_n) (Π_{k=1}^n γ(z_k − h(X_k))/γ(z_k)) Π_{k=1}^n γ(z_k) dz_1···dz_n
   = ∫_{ℝ^d} ··· ∫_{ℝ^d} g(z_1,...,z_n) Π_{k=1}^n γ(z_k − h(X_k)) dz_1···dz_n
   = ∫_{ℝ^d} ··· ∫_{ℝ^d} g(h(X_1)+z_1,...,h(X_n)+z_n) Π_{k=1}^n γ(z_k) dz_1···dz_n
   = E^Q[g(h(X_1)+ζ_1,...,h(X_n)+ζ_n)].
Recursive solution
Suppose {X_k} is a Markov chain in E with transition function P(dz|x). Let

  π_n(dx) = P{X_n ∈ dx|F_n^Y}.

Then

  E^P[g(X_n)|F_n^Y] = E^Q[g(X_n)L_n|F_n^Y] / E^Q[L_n|F_n^Y]
   = E^Q[g(X_n) Π_{k=1}^n γ(Y_k − h(X_k))|F_n^Y] / E^Q[Π_{k=1}^n γ(Y_k − h(X_k))|F_n^Y]
   = E^Q[E^Q[g(X_n) Π_{k=1}^n γ(Y_k − h(X_k))|F_{n−1}^X ∨ F_n^Y]|F_n^Y] / E^Q[E^Q[Π_{k=1}^n γ(Y_k − h(X_k))|F_{n−1}^X ∨ F_n^Y]|F_n^Y]
   = E^Q[∫_E g(z)γ(Y_n − h(z))P(dz|X_{n−1}) Π_{k=1}^{n−1} γ(Y_k − h(X_k))|F_n^Y] / E^Q[∫_E γ(Y_n − h(z))P(dz|X_{n−1}) Π_{k=1}^{n−1} γ(Y_k − h(X_k))|F_n^Y],

giving the recursion

  ∫_E g(x)π_n(dx) = ∫_E ∫_E g(z)γ(Y_n − h(z))P(dz|x)π_{n−1}(dx) / ∫_E ∫_E γ(Y_n − h(z))P(dz|x)π_{n−1}(dx).
Unnormalized conditional distributions
Define φ_0(dz) = π_0(dz) and

  ∫_E g(z)φ_n(dz) = ∫_E ∫_E g(z)γ(Y_n − h(z))P(dz|x)φ_{n−1}(dx).

Then

  ∫_E g(x)π_n(dx) = ∫_E g(x)φ_n(dx) / φ_n(E).
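The recursion for φ_n is exactly what a weighted-particle (bootstrap) filter implements: propagate particles by P(dz|x) and reweight by the likelihood γ(Y_n − h(z)). A Python sketch with a hypothetical linear-Gaussian model and no resampling step:

```python
import numpy as np

rng = np.random.default_rng(9)

def filter_step(particles, weights, y, step_sample, h, gamma):
    """One step of the unnormalized recursion
    φ_n(g) = ∫∫ g(z) γ(Y_n - h(z)) P(dz|x) φ_{n-1}(dx),
    as a weighted-particle sketch.  `step_sample`, `h`, `gamma` are
    hypothetical model ingredients (transition sampler, observation
    function, noise density).
    """
    particles = step_sample(particles, rng)          # z ~ P(dz | x)
    weights = weights * gamma(y - h(particles))      # likelihood reweighting
    return particles, weights / weights.sum()        # π_n = φ_n / φ_n(E)

# Example: X_n = 0.9 X_{n-1} + N(0, 0.5²), Y_n = X_n + N(0, 1)
gauss = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
p = rng.normal(size=5000); w = np.full(5000, 1 / 5000)
for y in [0.3, -0.1, 0.8]:                           # a toy observation record
    p, w = filter_step(p, w, y,
                       lambda x, r: 0.9 * x + 0.5 * r.normal(size=x.shape),
                       lambda x: x, gauss)
print((p * w).sum())    # estimate of E[X_n | Y_1..Y_n]
```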
Continuous time filtering in Gaussian white noise
Suppose Y_k^{(n)} = h(X_k^{(n)})(1/n) + (1/√n)ζ_k, where X_k^{(n)} ≈ X(k/n) and the ζ_k are iid with mean zero and variance σ². Then Y_n(t) = Σ_{k=1}^{[nt]} Y_k^{(n)} is approximately

  Y(t) = ∫_0^t h(X(s))ds + σW(t).

Then

  E^P[g(X(t))|F_t^Y] = E^Q[g(X(t))L(t)|F_t^Y] / E^Q[L(t)|F_t^Y],

where under Q, X and Y are independent, Y is a Brownian motion with mean zero and variance σ²t, and

  L(t) = exp{ ∫_0^t (h(X(s))/σ²)dY(s) − ½∫_0^t (h²(X(s))/σ²)ds },

that is,

  L(t) = 1 + ∫_0^t (h(X(s))/σ²)L(s)dY(s).
Monte Carlo solution
Let X_1, X_2, ... be iid copies of X that are independent of Y under Q, and let

  L_i(t) = 1 + ∫_0^t (h(X_i(s))/σ²)L_i(s)dY(s).

Note that

  φ(g,t) ≡ E^Q[g(X(t))L(t)|F_t^Y] = E^Q[g(X_i(t))L_i(t)|F_t^Y].

Claim (a conditional law of large numbers):

  (1/n) Σ_{i=1}^n g(X_i(t))L_i(t) → E^Q[g(X(t))L(t)|F_t^Y].
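A Python sketch of the claim: each particle X_i carries a weight L_i solved from dL_i = σ^{−2}h(X_i)L_i dY, and weighted averages estimate φ(g,t) (and, after normalizing, π(g,t)). The model and observation path below are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(10)

def zakai_monte_carlo(y_increments, dt, sigma=1.0, n_particles=4000):
    """Sketch of φ(g,t) ≈ (1/n) Σ g(X_i(t)) L_i(t) for the toy model
    dX = -X dt + dB,  h(x) = x,  Y(t) = ∫ h(X) ds + σ W(t).

    Each particle's weight is an Euler step of dL = (h/σ²) L dY.
    """
    x = rng.normal(size=n_particles)
    L = np.ones(n_particles)
    for dy in y_increments:
        L *= 1.0 + (x / sigma**2) * dy                          # dL = (h/σ²) L dY
        x += -x * dt + rng.normal(0, np.sqrt(dt), n_particles)  # signal step
    return (x * L).mean() / L.mean()                            # π(g,t), g(x) = x

# Illustrative observation increments for a quick run:
dt = 0.01
dY = rng.normal(0, np.sqrt(dt), size=200)
print(zakai_monte_carlo(dY, dt))
```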
Zakai equation
Assume X is a diffusion,

  X(t) = X(0) + ∫_0^t σ(X(s))dB(s) + ∫_0^t b(X(s))ds,

where under Q, B and Y are independent. Since

  g(X(t)) = g(X(0)) + ∫_0^t g′(X(s))σ(X(s))dB(s) + ∫_0^t Ag(X(s))ds,

  g(X(t))L(t) = g(X(0)) + ∫_0^t L(s)dg(X(s)) + ∫_0^t g(X(s))dL(s)
   = g(X(0)) + ∫_0^t L(s)g′(X(s))σ(X(s))dB(s) + ∫_0^t L(s)Ag(X(s))ds
    + ∫_0^t σ^{−2}g(X(s))h(X(s))L(s)dY(s).

(There is no covariation term since B and Y are independent under Q.)
Monte Carlo derivation of Zakai equation
  X_i(t) = X_i(0) + ∫_0^t σ(X_i(s))dB_i(s) + ∫_0^t b(X_i(s))ds,

where the (X_i(0), B_i) are iid copies of (X(0), B). Then

  g(X_i(t))L_i(t) = g(X_i(0)) + ∫_0^t L_i(s)g′(X_i(s))σ(X_i(s))dB_i(s) + ∫_0^t L_i(s)Ag(X_i(s))ds
   + ∫_0^t σ^{−2}g(X_i(s))h(X_i(s))L_i(s)dY(s),

and hence, averaging and letting n → ∞,

  φ(g,t) = φ(g,0) + ∫_0^t φ(Ag,s)ds + ∫_0^t σ^{−2}φ(gh,s)dY(s).
Kushner-Stratonovich equation
  π(g,t) = E^P[g(X(t))|F_t^Y] = φ(g,t)/φ(1,t)
   = φ(g,0)/φ(1,0) + ∫_0^t (1/φ(1,s))dφ(g,s) − ∫_0^t (φ(g,s)/φ(1,s)²)dφ(1,s)
    + ∫_0^t (φ(g,s)/φ(1,s)³)d[φ(1,·)]_s − ∫_0^t (1/φ(1,s)²)d[φ(g,·),φ(1,·)]_s
   = π(g,0) + ∫_0^t π(Ag,s)ds + ∫_0^t σ^{−2}(π(gh,s) − π(g,s)π(h,s))dY(s)
    + ∫_0^t σ^{−2}π(g,s)π(h,s)²ds − ∫_0^t σ^{−2}π(gh,s)π(h,s)ds
   = π(g,0) + ∫_0^t π(Ag,s)ds + ∫_0^t σ^{−2}(π(gh,s) − π(g,s)π(h,s))(dY(s) − π(h,s)ds).
Counting process observations

Model: X is a diffusion,

  X(t) = X(0) + ∫_0^t σ(X(s))dB(s) + ∫_0^t b(X(s))ds,   (15.1)

and

  Y(t) = ∫_{[0,∞)×[0,t]} 1_{[0,λ(X(s−))]}(u) ξ(du×ds),

where ξ is a Poisson random measure with Lebesgue mean measure, independent of B.

Reference measure construction: Under Q, X is the diffusion given by (15.1) and Y is an independent, unit Poisson process. The change of measure is given by (14.7):

  L(t) = 1 + ∫_0^t (λ(X(s)) − 1)L(s−)d(Y(s) − s).
Zakai equation
  g(X(t))L(t) = g(X(0)) + ∫_0^t g(X(s))dL(s) + ∫_0^t L(s)dg(X(s))
   = g(X(0)) + ∫_0^t L(s)g′(X(s))σ(X(s))dB(s) + ∫_0^t L(s)Ag(X(s))ds
    + ∫_0^t g(X(s))(λ(X(s)) − 1)L(s−)d(Y(s) − s).

The unnormalized conditional distribution φ(g,t) = E^Q[g(X(t))L(t)|F_t^Y] satisfies

  φ(g,t) = φ(g,0) + ∫_0^t φ((A−C)g,s)ds + ∫_0^t φ(Cg,s−)dY(s),

where Cg(x) = (λ(x) − 1)g(x).
Kushner-Stratonovich equation
For π(g,t) ≡ φ(g,t)/φ(1,t),

  π(g,t) = π(g,0) + ∫_0^t (π(Ag,s) − π(λg,s) + π(λ,s)π(g,s))ds
   + ∫_0^t [(π(λg,s−) − π(λ,s−)π(g,s−))/π(λ,s−)] dY(s).
Solution of the Zakai equation
Let {T(t)} be the semigroup given by

  T(t)f(x) = E[f(X(t)) e^{−∫_0^t (λ(X(s))−1)ds}].

Suppose the jump times of Y satisfy 0 < τ_1 < ··· < τ_m < t < τ_{m+1}. Then

  φ(g,t) = φ(T(t−τ_m)g, τ_m)
  φ(g,τ_{k+1}−) = φ(T(τ_{k+1}−τ_k)g, τ_k)
  φ(g,τ_{k+1}) = φ(λg, τ_{k+1}−) = φ(T(τ_{k+1}−τ_k)λg, τ_k).

If φ(dx,t) = φ(x,t)dx, then

  φ(·,t) = T*(t−τ_m)φ(·,τ_m)
  φ(·,τ_{k+1}−) = T*(τ_{k+1}−τ_k)φ(·,τ_k)
  φ(·,τ_{k+1}) = λ(·)φ(·,τ_{k+1}−) = λ(·)T*(τ_{k+1}−τ_k)φ(·,τ_k).
Point process observations
Model: X is adapted to {F_t}, and

  Y(t,Γ) − ∫_0^t ∫_Γ λ(X(s),u) ν(du)ds

is an {F_t}-martingale for all Γ ∈ B(U) with ν(Γ) < ∞. Assume

  sup_x ∫_U |λ(x,u) − 1| ∧ |λ(x,u) − 1|² ν(du) < ∞.

Reference measure construction: Under Q, X is the diffusion given by (15.1) and Y is an independent Poisson random measure with mean measure ν, with centered measure Ỹ(du×ds) = Y(du×ds) − ν(du)ds. The change of measure is given by (14.8):

  L(t) = 1 + ∫_0^t L(s−)dM_λ(s) = 1 + ∫_{U×[0,t]} (λ(X(s),u) − 1)L(s−) Ỹ(du×ds).
Zakai equation
  g(X(t))L(t) = g(X(0)) + ∫_0^t g(X(s))dL(s) + ∫_0^t L(s)dg(X(s))
   = g(X(0)) + ∫_0^t L(s)g′(X(s))σ(X(s))dB(s) + ∫_0^t L(s)Ag(X(s))ds
    + ∫_{U×[0,t]} g(X(s))(λ(X(s),u) − 1)L(s−) Ỹ(du×ds).

Assume that ν(U) < ∞. Setting λ̄(x) = ∫_U λ(x,u)ν(du) and Cg(x) = (λ̄(x) − ν(U))g(x) = ∫_U (λ(x,u) − 1)ν(du) g(x), the unnormalized conditional distribution φ(g,t) = E^Q[g(X(t))L(t)|F_t^Y] satisfies

  φ(g,t) = φ(g,0) + ∫_0^t φ((A−C)g,s)ds + ∫_{U×[0,t]} φ((λ(·,u) − 1)g, s−) Y(du×ds).
Solution of the Zakai equation
Let {T(t)} be the semigroup given by

  T(t)f(x) = E[f(X(t)) e^{−∫_0^t (λ̄(X(s))−ν(U))ds}].

Suppose Y = Σ_k δ_{(τ_k,u_k)} and 0 < τ_1 < ··· < τ_m < t < τ_{m+1}. Then

  φ(g,t) = φ(T(t−τ_m)g, τ_m)
  φ(g,τ_{k+1}−) = φ(T(τ_{k+1}−τ_k)g, τ_k)
  φ(g,τ_{k+1}) = φ(λ(·,u_{k+1})g, τ_{k+1}−) = φ(T(τ_{k+1}−τ_k)λ(·,u_{k+1})g, τ_k).

If φ(dx,t) = φ(x,t)dx, then

  φ(·,t) = T*(t−τ_m)φ(·,τ_m)
  φ(·,τ_{k+1}−) = T*(τ_{k+1}−τ_k)φ(·,τ_k)
  φ(·,τ_{k+1}) = λ(·,u_{k+1})φ(·,τ_{k+1}−) = λ(·,u_{k+1})T*(τ_{k+1}−τ_k)φ(·,τ_k).
16. Finance
• Model of a market
• No arbitrage condition
• Pricing with a bond
• Pricing with a money market account
• Black-Scholes formula
Model of a market
Consider financial activity over a time interval [0,T] modeled by a probability space (Ω,F,P).

Assume that there is a "fair casino" or market which is complete in the sense that at time 0, for each event A ∈ F, a price Q(A) ≥ 0 is fixed for a bet or a contract that pays one dollar at time T if and only if A occurs.

Assume that the market is frictionless in that an investor can either buy or sell the contract at the same price, and that it is liquid in that there is always a buyer or seller available. Also assume that Q(Ω) < ∞.

An investor can construct a portfolio by buying or selling a variety of contracts (possibly countably many) in arbitrary multiples.
Price and payoff of a portfolio
If a_i is the "quantity" of a contract for A_i (a_i < 0 corresponds to selling the contract), then the payoff at time T is

  Σ_i a_i 1_{A_i}.

Require Σ_i |a_i|Q(A_i) < ∞ (only a finite amount of money changes hands) so that the initial cost of the portfolio is (unambiguously)

  Σ_i a_i Q(A_i).

The market has no arbitrage if no combination (buying and selling) of countably many policies with a net cost of zero results in a positive profit at no risk.
No arbitrage condition
If Σ_i |a_i|Q(A_i) < ∞,

  Σ_i a_i Q(A_i) = 0,  and  Σ_i a_i 1_{A_i} ≥ 0 a.s.,

then

  Σ_i a_i 1_{A_i} = 0 a.s.
Consequences of the no arbitrage condition
Lemma 16.1 Assume that there is no arbitrage. If P(A) = 0, then Q(A) = 0. If Q(A) = 0, then P(A) = 0.

Proof. Suppose P(A) = 0 and Q(A) > 0. Buy one unit of Ω and sell Q(Ω)/Q(A) units of A. Then

  Cost = Q(Ω) − (Q(Ω)/Q(A))Q(A) = 0
  Payoff = 1 − (Q(Ω)/Q(A))1_A = 1 a.s.,

which contradicts the no arbitrage assumption.

Now suppose Q(A) = 0. Buy one unit of A. The cost of the portfolio is Q(A) = 0 and the payoff is 1_A ≥ 0. So by the no arbitrage assumption, 1_A = 0 a.s., that is, P(A) = 0.
Price monotonicity
Lemma 16.2 If there is no arbitrage and A ⊂ B, then Q(A) ≤ Q(B), with strict inequality if P(A) < P(B).

Proof. Suppose P(B) > 0 (otherwise Q(A) = Q(B) = 0) and Q(B) ≤ Q(A). Buy one unit of B and sell Q(B)/Q(A) units of A. Then

  Cost = Q(B) − (Q(B)/Q(A))Q(A) = 0
  Payoff = 1_B − (Q(B)/Q(A))1_A = 1_{B−A} + (1 − Q(B)/Q(A))1_A ≥ 0,

and Payoff = 0 a.s. implies Q(B) = Q(A) and P(B−A) = 0.
Q must be a measure
Theorem 16.3 If there is no arbitrage, Q must be a measure on F.

Proof. Let A_1, A_2, ... be disjoint and A = ∪_{i=1}^∞ A_i. Assume P(A_i) > 0 for some i. (Otherwise, Q(A) = Q(A_i) = 0.)

Let ρ ≡ Σ_i Q(A_i), and buy one unit of A and sell Q(A)/ρ units of A_i for each i. Then

  Cost = Q(A) − (Q(A)/ρ) Σ_i Q(A_i) = 0
  Payoff = 1_A − (Q(A)/ρ) Σ_i 1_{A_i} = (1 − Q(A)/ρ)1_A.

If Q(A) ≤ ρ, the payoff is nonnegative, so it must vanish a.s. and Q(A) = ρ.

If Q(A) ≥ ρ, sell one unit of A and buy Q(A)/ρ units of each A_i to reach the same conclusion.
Equivalence of measures
Theorem 16.4 If there is no arbitrage, then Q << P and P << Q (P and Q are equivalent measures).

Proof. The result follows from Lemma 16.1.
Pricing general payoffs
If X and Y are random variables satisfying X ≤ Y a.s., then no arbitrage should mean

  Q(X) ≤ Q(Y).

It follows that for any Q-integrable X, the price of X is

  Q(X) = ∫ X dQ.

By the Radon-Nikodym theorem, dQ = LdP for some nonnegative, integrable random variable L, and

  Q(X) = E^P[XL].
Assets that can be traded at intermediate times
F_t represents the information available at time t.

B(t) is the price at time t of a bond that is worth $1 at time T (e.g., B(t) = e^{−r(T−t)}); that is, at any time 0 ≤ t ≤ T, B(t) is the price of a contract that pays exactly $1 at time T.

Note that B(0) = Q(Ω).

Define Q̃(A) = Q(A)/B(0), so that Q̃ is a probability measure.
Martingale properties of tradable assets
Let S(t) be the price at time t of another tradable asset; that is, S(t) is the buying or selling price at time t of an asset that will be worth S(T) at time T. S must be F_t-adapted.

For any stopping time τ ≤ T, we can buy one unit of the asset at time 0, sell the asset at time τ, and use the money received (S(τ)) to buy S(τ)/B(τ) units of the bond. Since the payoff for this strategy is S(τ)/B(τ) (the value of the bonds at time T), we must have

S(0) = ∫ (S(τ)/B(τ)) dQ = E^{Q̄}[B(0) S(τ)/B(τ)].
Theorem 16.5 If S is the price of a tradable asset, then S/B is a martingale on (Ω, F, Q̄).
Characterizing martingales by stopping
Theorem 16.6 Let M be an F_t-adapted process. If E[M(τ)] = E[M(0)] for every bounded F_t-stopping time τ, then M is a martingale.

Proof. Let t < s and A ∈ F_t. Define τ = t on A and τ = s on A^c. Then τ is a stopping time and

E[M(s)] = E[M(0)] = E[M(τ)] = E[1_A M(t)] + E[1_{A^c} M(s)].

Consequently, E[1_A M(s)] = E[1_A M(t)], A ∈ F_t, and hence

E[M(s)|F_t] = M(t).
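One direction of Theorem 16.6 is easy to see in simulation. A minimal sketch (the martingale and the stopping rule are illustrative choices): for a symmetric random walk M with M(0) = 0 and the bounded stopping time τ = min{first time |M| ≥ 3, 50}, the sample average of M(τ) should be approximately E[M(0)] = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 10**5, 50

# A symmetric random walk is a martingale with M(0) = 0.
M = np.cumsum(rng.choice([-1.0, 1.0], size=(n_paths, n_steps)), axis=1)

# Bounded stopping time: first time |M| >= 3, capped at n_steps.
hit = np.abs(M) >= 3
tau = np.where(hit.any(axis=1), hit.argmax(axis=1), n_steps - 1)

print(M[np.arange(n_paths), tau].mean())   # ≈ 0 = E[M(0)]
```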
Pricing tradable assets in a market with a money-market account
Instead of a bond, suppose that there is an account that pays interest at time t at rate R(t). Then $1 invested in the account at time zero is worth L(T) = e^{∫_0^T R(s)ds} at time T. Consequently, if Q is an arbitrage-free pricing measure,

1 = ∫ e^{∫_0^T R(s)ds} dQ.

Define dP = L(T) dQ, and note that P is a probability measure.

If S is another tradable asset, we must have

S(0) = ∫ S(τ) e^{∫_τ^T R(s)ds} dQ = E^P[S(τ) e^{−∫_0^τ R(s)ds}],

so the discounted asset value S(t) e^{−∫_0^t R(s)ds} is a martingale under P.
Pricing general contracts
Let V(T) be an F_T-measurable random variable, and suppose that V(T) is the value of a contract at time T. If the contract is tradable at intermediate times, its discounted price V(t) e^{−∫_0^t R(s)ds} must be a martingale under P, and hence

V(t) = E^P[V(T) e^{−∫_t^T R(s)ds} | F_t].  (16.1)

Taking V(T) ≡ 1, the price of a bond must be

B(t) = E^P[e^{−∫_t^T R(s)ds} | F_t].
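Once a model for R is specified, the bond price can be estimated by Monte Carlo. A minimal sketch at t = 0, assuming (purely for illustration; no such model appears in the notes) a Vasicek-type short rate under P and an Euler discretization of ∫_0^T R(s) ds:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_steps, n_paths = 1.0, 200, 10**5
dt = T / n_steps

# Hypothetical Vasicek short rate dR = kappa*(rbar - R)dt + eta*dW under P.
kappa, rbar, eta, r0 = 1.0, 0.05, 0.01, 0.03

R = np.full(n_paths, r0)
int_R = np.zeros(n_paths)          # accumulates the integral of R over [0, T]
for _ in range(n_steps):
    int_R += R * dt
    R += kappa * (rbar - R) * dt + eta * np.sqrt(dt) * rng.standard_normal(n_paths)

print(np.mean(np.exp(-int_R)))     # B(0) = E^P[exp(-integral of R over [0, T])]
```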
Are the two approaches to pricing consistent?
Consider

∫ (S(τ)/B(τ)) dQ = E^P[(S(τ)/B(τ)) e^{−∫_0^T R(s)ds}]
  = E^P[(S(τ)/B(τ)) E^P[e^{−∫_0^T R(s)ds} | F_τ]]
  = E^P[S(τ) e^{−∫_0^τ R(s)ds}]
  = S(0),

where the third equality uses E^P[e^{−∫_0^T R(s)ds} | F_τ] = e^{−∫_0^τ R(s)ds} B(τ).

Note that

dP = e^{∫_0^T R(s)ds} dQ  and  dQ̄ = B(0)^{−1} dQ

are both probability measures. Discounted prices are martingales under P, while prices normalized by the bond are martingales under Q̄. If R is deterministic, P = Q̄.
A market with one stock and a money market
Suppose we start with a market consisting of one stock with price Sand a money market with interest rate R where
S(t) = S(0) + ∫_0^t α(s)S(s) ds + ∫_0^t σ(s)S(s) dW(s)

and α, σ, and R are adapted to F_t^W. Suppose that we are free to trade using the information given by F_t^W, so that a portfolio worth X(0) at time zero pursuing an adapted trading strategy ∆ is worth

X(T) = X(0) + ∫_0^T ∆(t) dS(t) + ∫_0^T R(t)(X(t) − ∆(t)S(t)) dt
  = X(0) + ∫_0^T ∆(t)σ(t)S(t) dW(t) + ∫_0^T (R(t)X(t) + ∆(t)(α(t) − R(t))S(t)) dt
Constraints on the pricing measure
Define Θ(t) = (α(t) − R(t))/σ(t). Then

X(T) = X(0) + ∫_0^T ∆(t)σ(t)S(t) dW(t) + ∫_0^T ∆(t)σ(t)Θ(t)S(t) dt + ∫_0^T R(t)X(t) dt.

If the market is complete and there is no arbitrage, any pricing measure Q must satisfy ∫ X(T) dQ = X(0), and under the corresponding P̃, dP̃ = e^{∫_0^T R(s)ds} dQ, X(t) e^{−∫_0^t R(s)ds} must be a martingale. Setting D(t) = e^{−∫_0^t R(s)ds},

X(t)D(t) = X(0) + ∫_0^t ∆(s)σ(s)D(s)S(s)(dW(s) + Θ(s) ds),

and the integral must be a martingale under P̃.
Choice of P̃

Recall that we want

X(t)D(t) = X(0) + ∫_0^t ∆(s)σ(s)D(s)S(s)(dW(s) + Θ(s) ds)

to be a martingale under P̃. Taking ξ = −Θ in (14.4), that is, let

L(t) = exp{−∫_0^t Θ(s) dW(s) − (1/2) ∫_0^t Θ²(s) ds},

and define dP̃|_{F_t} = L(t) dP. Then the required condition holds, at least for ∆ satisfying

E^{P̃}[∫_0^t (∆(s)σ(s)D(s)S(s))² ds] < ∞.
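For constant Θ this construction can be checked by simulation. A minimal sketch (Θ and T are arbitrary illustrative values): L(T) should average to one, and weighting by L(T) should make W(t) + Θt centered, consistent with it being a Brownian motion under P̃.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n, Theta = 1.0, 10**6, 0.4          # constant market price of risk (illustrative)

W_T = np.sqrt(T) * rng.standard_normal(n)    # W(T) under P

# Girsanov density for constant Theta:
L_T = np.exp(-Theta * W_T - 0.5 * Theta**2 * T)

print(L_T.mean())                       # ≈ 1: L is a mean-one martingale
print(np.mean(L_T * (W_T + Theta * T))) # ≈ 0: W + Theta*t is centered under the
                                        #      reweighted (risk-neutral) measure
```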
Risk-neutral measure

Recall that Θ(t) = (α(t) − R(t))/σ(t) and

D(t)S(t) = S(0) + ∫_0^t σ(s)D(s)S(s) dW(s) + ∫_0^t σ(s)Θ(s)D(s)S(s) ds
  = S(0) + ∫_0^t σ(s)D(s)S(s) dW̃(s),

where W̃(t) = W(t) + ∫_0^t Θ(s) ds is a Brownian motion under P̃.

Θ(t) is called the market price of risk. Under P̃, the market price of risk is zero and hence the model is "risk neutral."
Black-Scholes formula
Assuming that R(t) ≡ r and

S(t) = S(0) + ∫_0^t σS(s) dW(s) + ∫_0^t αS(s) ds,

then by (16.1), a contract with payoff h(S(T)) has a price at time t of the form

V(t) = E^{P̃}[h(S(T)) e^{−r(T−t)} | F_t],

which is q(t, S(t)) for

q(t, x) = E[h(x exp{σW(T − t) + (r − (1/2)σ²)(T − t)}) e^{−r(T−t)}].
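For a call, h(s) = (s − K)^+, this expectation can be evaluated in closed form (the familiar Black-Scholes call formula). A minimal sketch comparing a Monte Carlo evaluation of q(0, x) with the closed form; all parameter values are illustrative:

```python
import numpy as np
from scipy.stats import norm

x, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0   # illustrative parameters

# Monte Carlo evaluation of q(0, x) for h(s) = max(s - K, 0).
rng = np.random.default_rng(4)
W_T = np.sqrt(T) * rng.standard_normal(10**6)
S_T = x * np.exp(sigma * W_T + (r - 0.5 * sigma**2) * T)
mc = np.exp(-r * T) * np.mean(np.maximum(S_T - K, 0.0))

# Closed-form Black-Scholes call price.
d1 = (np.log(x / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
d2 = d1 - sigma * np.sqrt(T)
bs = x * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

print(mc, bs)   # the two values agree up to Monte Carlo error
```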
Multiple tradeable assets

How reasonable is the assumption that there exists a pricing measure Q? Start with a model for a collection of tradeable assets. For example, let

X(t) = X(0) + ∫_0^t σ(X(s)) dW(s) + ∫_0^t b(X(s)) ds,

or, more generally, just assume that X is a vector semimartingale. Allow certain trading strategies producing a payoff at time T:

Y(T) = Y(0) + ∑_i ∫_0^T H_i(s−) dX_i(s).

Arbitrage exists if there is a trading strategy satisfying

Y(T) = ∑_i ∫_0^T H_i(s−) dX_i(s) ≥ 0 a.s.

with P{Y(T) > 0} > 0.
First fundamental “theorem”
Theorem 16.7 (Meta theorem) There is no arbitrage if and only if there exists a probability measure Q equivalent to P under which the X_i are martingales.
Problems:
• What trading strategies are allowable?
• The definition of no arbitrage above is, in general, too weak to give the theorem.
Example
Assume that B(t) ≡ 1 and that there is a single asset satisfying

X(t) = X(0) + ∫_0^t σX(s) dW(s) + ∫_0^t bX(s) ds = X(0) + ∫_0^t σX(s) dW̃(s),

where W̃ is a standard Brownian motion under an equivalent martingale measure Q.

Let T = 1 and for some stopping time τ < T (to be determined), let H(t) = 1/(σX(t)(1 − t)), 0 ≤ t < τ, and H(t) = 0 for t ≥ τ. Then for t < τ,

∫_0^t H(s) dX(s) = ∫_0^t (1 − s)^{−1} dW̃(s) = Ŵ(∫_0^t (1 − s)^{−2} ds),

where Ŵ is a standard Brownian motion under Q. Let

τ̂ = inf{u : Ŵ(u) = 1}, and define τ by ∫_0^τ (1 − s)^{−2} ds = τ̂.

Since ∫_0^t (1 − s)^{−2} ds → ∞ as t → 1, τ < 1 a.s. Then with probability 1,

∫_0^1 H(s) dX(s) = 1,

an arbitrage: the strategy has zero initial cost but pays 1 with probability one.
Admissible trading strategies
The trading strategy denoted (x, H_1, . . . , H_d) is admissible if for

V(t) = x + ∑_i ∫_0^t H_i(s−) dX_i(s)

there exists a constant a such that

inf_{0≤t≤T} V(t) ≥ −a  a.s.
Definitions
No arbitrage: If (0, H_1, . . . , H_d) is an admissible trading strategy and ∑_i ∫_0^T H_i(s−) dX_i(s) ≥ 0 a.s., then

∑_i ∫_0^T H_i(s−) dX_i(s) = 0 a.s.

No free lunch with vanishing risk: If (0, H_1^n, . . . , H_d^n) are admissible trading strategies and

lim_{n→∞} ‖0 ∧ ∑_i ∫_0^T H_i^n(s−) dX_i(s)‖_∞ = 0,

then

|∑_i ∫_0^T H_i^n(s−) dX_i(s)| → 0

in probability.
First fundamental theorem
Theorem 16.8 (Delbaen and Schachermayer) Let X = (X_1, . . . , X_d) be a bounded semimartingale defined on (Ω, F, P), and let F_t = σ(X(s) : s ≤ t). Then there exists an equivalent martingale measure defined on F_T if and only if there is no free lunch with vanishing risk.
Second fundamental “theorem”
Theorem 16.9 (Meta theorem) If there is no arbitrage, then the market is complete if and only if the equivalent martingale measure is unique.

Problems:

• What prices are "determined" by the allowable trading strategies?

• Specifically, how can one "close up" the collection of attainable payoffs?
Second fundamental theorem
Theorem 16.10 If there exists an equivalent martingale measure, then it is unique if and only if the set of replicable, bounded payoffs is "complete" in the sense that

{x + ∑_i ∫_0^T H_i(s−) dX_i(s) : H_i simple} ∩ L^∞(P)

is weak* dense in L^∞(P, F_T).
Extension to general B

For general B, if we assume that after time 0 all wealth V must either be invested in the assets X_i or the bond B, then the number of units of the bond held is

(V(t) − ∑_i H_i(t)X_i(t)) / B(t),

and

V(t) = V(0) + ∑_i ∫_0^t H_i(s−) dX_i(s) + ∫_0^t ((V(s−) − ∑_i H_i(s−)X_i(s−)) / B(s−)) dB(s).

Applying Itô's formula, we have

V(t)/B(t) = V(0)/B(0) + ∑_i ∫_0^t H_i(s−) d(X_i(s)/B(s)),

which should be a martingale under Q̄.
17. Technical lemmas
• Dynkin-class theorem
The Dynkin-class theorem
A collection D of subsets of Ω is a Dynkin class if Ω ∈ D; A, B ∈ D and A ⊂ B imply B − A ∈ D; and A_n ∈ D with A_1 ⊂ A_2 ⊂ · · · implies ∪_n A_n ∈ D.

Theorem 17.1 Let S be a collection of subsets of Ω such that A, B ∈ S implies A ∩ B ∈ S. If D is a Dynkin class with S ⊂ D, then σ(S) ⊂ D.

σ(S) denotes the smallest σ-algebra containing S.

Example 17.2 If Q_1 and Q_2 are probability measures on Ω, then {B : Q_1(B) = Q_2(B)} is a Dynkin class.
Proof. Let D(S) be the smallest Dynkin-class containing S.
If A, B ∈ S, then A^c = Ω − A, B^c = Ω − B, and A^c ∪ B^c = Ω − A ∩ B are in D(S).

Consequently, A^c ∪ B^c − A^c = A ∩ B^c, A^c ∪ B = Ω − A ∩ B^c, A^c ∩ B^c = A^c ∪ B − B, and A ∪ B = Ω − A^c ∩ B^c are in D(S).

For A ∈ S, {B : A ∪ B ∈ D(S)} is a Dynkin class containing S, and hence containing D(S).

Consequently, for A ∈ D(S), {B : A ∪ B ∈ D(S)} is a Dynkin class containing S and hence D(S).

It follows that A, B ∈ D(S) implies A ∪ B ∈ D(S). But if D(S) is closed under finite unions, it is closed under countable unions, so D(S) is a σ-algebra containing S, and σ(S) ⊂ D(S) ⊂ D.
Equality of two measures
Lemma 17.3 Let µ and ν be finite measures on (M, M). Let S ⊂ M be closed under finite intersections. Suppose that µ(M) = ν(M) and µ(B) = ν(B) for each B ∈ S. Then µ(B) = ν(B) for each B ∈ σ(S).

Proof. Since µ(M) = ν(M), {B : µ(B) = ν(B)} is a Dynkin class containing S and hence contains σ(S).

For example: M = ℝ^d, S = {∏_{i=1}^d (−∞, c_i] : c_i ∈ ℝ}. If

P{X_1 ≤ c_1, . . . , X_d ≤ c_d} = P{Y_1 ≤ c_1, . . . , Y_d ≤ c_d},  c_1, . . . , c_d ∈ ℝ,

then

P{(X_1, . . . , X_d) ∈ B} = P{(Y_1, . . . , Y_d) ∈ B},  B ∈ B(ℝ^d).
18. Appendix
• Existence of conditional expectations
18.1. Existence of conditional expectations
Lemma 18.1 Let M be a closed linear subspace of L², and let X ∈ L². Then there exists a unique Y ∈ M such that E[(X − Y)²] = inf_{Z∈M} E[(X − Z)²].
Proof. Let ρ = inf_{Z∈M} E[(X − Z)²], and let Y_n ∈ M satisfy lim_{n→∞} E[(X − Y_n)²] = ρ. Then noting that

E[(Y_n − Y_m)²] = E[(X − Y_n)²] + E[(X − Y_m)²] − 2E[(X − Y_n)(X − Y_m)],

we have

4ρ ≤ E[(2X − (Y_n + Y_m))²]
  = E[(X − Y_n)²] + E[(X − Y_m)²] + 2E[(X − Y_n)(X − Y_m)]
  = 2E[(X − Y_n)²] + 2E[(X − Y_m)²] − E[(Y_n − Y_m)²],

and it follows that {Y_n} is Cauchy in L². By completeness, there exists Y such that Y = lim_{n→∞} Y_n (and Y ∈ M, since M is closed), and ρ = E[(X − Y)²].

Note that uniqueness also follows from the inequality.
Definition 18.2 Let X, Y ∈ L2. Then X and Y are orthogonal (X ⊥ Y ) if and only if E[XY ] = 0.
Lemma 18.3 Let M be a closed linear subspace of L², and let X ∈ L². Then the best approximation constructed in Lemma 18.1 is the unique Y ∈ M such that (X − Y) ⊥ Z for every Z ∈ M.
Proof. Suppose Z ∈ M . Then
E[(X − Y )2] ≤ E[(X − (Y + aZ))2]
= E[(X − Y )2]− 2aE[Z(X − Y )] + a2E[Z2].
Since a may be either positive or negative, we must have
E[Z(X − Y )] = 0.
Uniqueness follows from the fact that E[Z(X − Y1)] = 0 and E[Z(X − Y2)] = 0 for all Z ∈ M implies
E[(Y1 − Y2)2] = E[(Y1 − Y2)(X − Y2)]− E[(Y1 − Y2)(X − Y1)] = 0.
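In finite dimensions, Lemmas 18.1 and 18.3 reduce to ordinary least squares. A small numerical sketch (expectations are approximated by sample averages, an illustrative shortcut): the residual of a least-squares fit is orthogonal to everything in the subspace spanned by the regressors.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10**5

# Columns of Z span a finite-dimensional subspace M of L^2;
# E[.] is approximated by an average over n sample points.
Z = rng.standard_normal((n, 3))
X = Z @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

# Best L^2 approximation Y of X in M (least squares, Lemma 18.1).
coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
Y = Z @ coef

# Orthogonality (Lemma 18.3): E[Z(X - Y)] = 0 for every Z in M.
print(Z.T @ (X - Y) / n)   # ≈ (0, 0, 0) up to floating-point error
```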
Lemma 18.4 Let M be a closed linear subspace of L², and for X ∈ L², denote the Y from Lemma 18.1 by P_M X. Then P_M is a linear operator on L², that is,

P_M(a_1 X_1 + a_2 X_2) = a_1 P_M X_1 + a_2 P_M X_2.
Proof. Since

E[Z(a_1 X_1 + a_2 X_2 − (a_1 P_M X_1 + a_2 P_M X_2))] = a_1 E[Z(X_1 − P_M X_1)] + a_2 E[Z(X_2 − P_M X_2)] = 0,

the conclusion follows by the uniqueness in Lemma 18.3.
Let D ⊂ F be a sub-σ-algebra, and let L²(D) be the linear space of D-measurable random variables in L². Define
E[X|D] = P_{L²(D)} X.

Then by orthogonality (Lemma 18.3),

E[X 1_D] = E[E[X|D] 1_D], D ∈ D.
We extend the definition to L¹.
Definition 18.5 Let X ∈ L1. Then E[X|D] is the unique D-measurable random variable satisfying
E[X 1_D] = E[E[X|D] 1_D], D ∈ D.
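When D is generated by a finite partition, E[X|D] is simply the average of X over each cell, and the defining property can be verified directly. A minimal sketch with an illustrative partition and X:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10**6

U = rng.integers(0, 4, size=n)        # D = sigma(U): a four-cell partition
X = U + rng.standard_normal(n)        # an integrable X

# E[X|D] is constant on each cell, equal to the cell average of X.
cond = np.zeros(n)
for k in range(4):
    cell = U == k
    cond[cell] = X[cell].mean()

# Defining property E[X 1_D] = E[E[X|D] 1_D] for a set D in the sigma-algebra:
D = U <= 1
print((X * D).mean(), (cond * D).mean())   # the two agree
```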
Lemma 18.6 Let X1, X2 ∈ L1, X1 ≥ X2 a.s., and suppose Y1 = E[X1|D] and Y2 = E[X2|D]. Then Y1 ≥ Y2 a.s.
Proof. Let D = {Y_2 > Y_1}. Then

0 ≤ E[(X_1 − X_2)1_D] = E[(Y_1 − Y_2)1_D] ≤ 0,

so E[(Y_1 − Y_2)1_D] = 0 and P(D) = 0.
Lemma 18.7 Let X ∈ L¹, X ≥ 0. Then

E[X|D] = lim_{c→∞} E[X ∧ c|D].  (18.1)
Proof. Note that the right side of (18.1) (call it Y) is D-measurable and for D ∈ D,

E[X 1_D] = lim_{c→∞} E[(X ∧ c) 1_D] = lim_{c→∞} E[E[X ∧ c|D] 1_D] = E[Y 1_D],

where the first and last equalities hold by the monotone convergence theorem and the middle equality holds by definition.
List of topics

1. Review of probability