
Stochastic Calculus

Laura Ballotta

MSc Financial Mathematics, October 2008

© Laura Ballotta - Do not reproduce without permission.


Table of Contents

1. Review of Measure Theory and Probability Theory

(a) The basic framework: the probability space

(b) Random variables

(c) Conditional expectation

(d) Change of measure

2. Stochastic processes

(a) Some introductory definitions

(b) Classes of processes

3. Brownian motions

(a) The martingale property

(b) Construction of a Brownian motion

(c) The variation process of a Brownian motion

(d) The reflection principle and functionals of a Brownian motion

(e) Correlated Brownian motions

(f) Simulating trajectories of the Brownian motion - part 1

4. Ito Integrals and Ito Calculus

(a) Motivation

(b) The construction of the Ito integral

(c) Ito processes and stochastic calculus

(d) Stochastic differential equations

(e) Steady-state distribution

(f) The Brownian bridge and stratified Monte Carlo

5. The Change of Measure for Brownian Motions

(a) Change of probability measure: the martingale problem

(b) PDE detour

(c) Feynman-Kac representation

(d) Martingale representation theorem


References

[1] Grimmett, G. and D. Stirzaker (2003). Probability and Random Processes. Oxford University Press.

[2] Mikosch, T. (2004). Elementary Stochastic Calculus, with Finance in View. World Scientific Publishing Co Pte Ltd.

[3] Shreve, S. (2004). Stochastic Calculus for Finance II - Continuous-Time Models. Springer Finance.


Introduction

This set of lecture notes will take you through the theory of Brownian motions and stochastic calculus which is required for a sound understanding of modern option pricing theory and modelling of the term structure of interest rates.

As the theory of stochastic processes has its own special “language”, the first chapter is devoted to introducing this new notation, but also to some revision of the basic concepts in probability theory required in the following chapters. Particular attention is given to the conditional expectation operator, which is the building block of modern mathematical finance. This will allow us to introduce the idea of a martingale, which underpins the theory of contingent claim pricing. Once these concepts are clear and well understood, we will devote the rest of the module to the Brownian motion and the rules of calculus that go with it. These will be our main “tools” for financial applications, which are explored in great detail in the module “Mathematical Models for Financial Derivatives”.

As the Brownian motion by construction links us to a prespecified distribution of the increments of the process, we will introduce very briefly a more general class of processes which can be used in the context of mathematical finance. However, the full investigation of these processes and their applications will be the focus of the module “Advanced Stochastic Modelling in Finance”, which runs in Term 2.

The material in this booklet covers the entire module; however, it is far from being exhaustive, and students are strongly recommended to do some self-reading. Some references have been provided above.

Each chapter contains a number of sample exam questions, some in the form of solved examples, others in the form of exercises for you to practice. Solutions to these exercises will be posted on CitySpace at some point before the end of term, together with the solutions to the exam papers that you will find in the very last chapter of this booklet.

Needless to say, waiting for these solutions to become available before attempting the exercises on your own will not help you much in preparing for the exam itself. You need to test yourself first!


1 Review of Measure Theory and Probability Theory

1.1 The basic framework: the probability space

Imagine a random experiment like the toss of a coin, or the prices of securities traded in the market in the next period of time. Imagine that we want to explore the features of this random experiment in order to make appropriate and informed decisions. These features could be: the expected price of the security tomorrow, or its volatility; the characteristics of the tails of the price distribution (if, for example, you need to calculate some risk measure such as VaR, or shortfall expectation).

In order to be able to do all this, we need appropriate tools describing the random experiment in such a way that we can extract all this information, i.e. we need a mathematical model of the random experiment. This is represented by the so-called probability space.

Definition 1 (Probability space) We denote the probability space by the triplet

Θ := (Ω,F , P) .

A probability space can be considered as a mathematical model of a random experiment.

This definition is telling us that the probability space is made up of three building blocks, which we are going to explore one by one.

The first piece of the probability space is Ω, which represents our sample space, i.e. the set of all possible outcomes of the random experiment.

Example 1 Let the random experiment be defined as: choose a number from the unit interval [0, 1]. Then Ω = {ω : 0 ≤ ω ≤ 1} = [0, 1].

Example 2 Assume now that the random experiment you are interested in is the evolution of a stock price over an infinite time horizon, when only 2 states of nature can occur, i.e. up or down. Then Ω = the set of all infinite sequences of ups and downs = {ω = ω1ω2ω3...}, where ωn is the result at the n-th period.

The second piece you need in order to have a probability space is F, which is called a σ-algebra. The σ-algebra of a random experiment can be interpreted as the collection of all possible histories of the random experiment itself. Formally, it is defined as follows.

Definition 2 (σ-algebra) Given a set Ω, a collection F of subsets of Ω is a σ-algebra if:

1. ∅ ∈ F

2. A ∈ F implies Aᶜ ∈ F


3. Am ∈ F for all m implies ⋃_{m=1}^∞ Am ∈ F (infinite union).

Example 3 1. F = {∅, Ω} is a σ-algebra

2. Consider some event A ⊂ Ω. Then the σ-algebra generated by A is F = {∅, Ω, A, Aᶜ}.

3. Consider the sample space defined above for the evolution of the stock price in a 2-state economy, i.e. Ω = the set of infinite sequences of ups and downs, and define

AU = {ω : ω1 = U}, AD = {ω : ω1 = D}.

The σ-algebra generated by these two sets is

F(1) = {∅, Ω, AU, AD}.

Now consider the sets

AUU = {ω : ω1 = U, ω2 = U}, AUD = {ω : ω1 = U, ω2 = D}, ADU = {ω : ω1 = D, ω2 = U}, ADD = {ω : ω1 = D, ω2 = D}.

Then

F(2) = {∅, Ω, AUU, AUD, ADU, ADD, AUUᶜ, AUDᶜ, ADUᶜ, ADDᶜ, AU, AD, AUU ∪ ADU, AUU ∪ ADD, ADU ∪ AUD, AUD ∪ ADD}

is the corresponding σ-algebra.
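On a finite sample space, a σ-algebra generated by a partition can be enumerated explicitly: it consists of all unions of the atoms. The following Python sketch (the two-period truncation of Ω and all variable names are illustrative, not part of the notes) recovers the 16 sets of F(2):

```python
from itertools import combinations

# Sample space for the first two stock price movements (a finite truncation
# of the infinite-sequence space in the example).
omega = frozenset(["UU", "UD", "DU", "DD"])

# The atoms A_UU, A_UD, A_DU, A_DD generate F(2).
atoms = [frozenset([w]) for w in sorted(omega)]

# On a finite space, the sigma-algebra generated by a partition is the
# collection of all unions of atoms (the empty union giving the empty set).
sigma = set()
for r in range(len(atoms) + 1):
    for combo in combinations(atoms, r):
        sigma.add(frozenset().union(*combo))

print(len(sigma))  # 16 = 2^4 sets, matching the listing of F(2) above
# Closure under complement, as required of a sigma-algebra:
print(all(omega - a in sigma for a in sigma))  # True
```

The count 2⁴ = 16 reflects the fact that every set in F(2) is determined by which of the four atoms it contains.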

Example 4 The Borel σ-algebra B on R is the σ-algebra generated by the open subsets of R.

Every σ-algebra has a set of properties that will be useful in the future.

Theorem 3 The σ-algebra has the following properties:

1. Ω ∈ F .

2. Am ∈ F implies ⋂_{m=1}^∞ Am ∈ F.


Proof. 1) ∅ ∈ F by definition, hence ∅ᶜ = Ω ∈ F by definition as well (apply properties 1 and 2 from the previous definition).

2) By assumption, Am ∈ F for every m; hence Amᶜ ∈ F, which implies that ⋃_{m=1}^∞ Amᶜ ∈ F. By De Morgan's law (b)¹:

⋃_{m=1}^∞ Amᶜ = (⋂_{m=1}^∞ Am)ᶜ,

therefore (⋂_{m=1}^∞ Am)ᶜ ∈ F. From the definition of σ-algebra, it follows that

[(⋂_{m=1}^∞ Am)ᶜ]ᶜ ∈ F,

and consequently ⋂_{m=1}^∞ Am ∈ F.

The last piece of our probability space is represented by the symbol P. This is called the probability measure, and you can consider it as a sort of “metric” that measures the likelihood of a specific event or history of the random experiment.

Definition 4 A probability measure P is a set function P : F → [0, 1] such that:

1. P (Ω) = 1

2. For any sequence of disjoint events Am, P(⋃_{m=1}^∞ Am) = Σ_{m=1}^∞ P(Am).

Based on this definition, you can show that

P(∅) = 0;

P(A ∪ B) = P(A) + P(B) for disjoint A and B;

P(Aᶜ) = 1 − P(A).

Moreover, we can define independent events: two events, A and B, are independent if and only if P(A ∩ B) = P(A)P(B).

Example 5 Consider the previous example of the evolution of the stock price over an infinite time horizon, so that Ω = {ω = ω1ω2ω3...}, and AU = {ω : ω1 = U}, AD = {ω : ω1 = D}. Assume that the different up/down movements at each time step are independent, and let

P(AU) = p; P(AD) = q = 1 − p.

¹Proposition (De Morgan's laws) (a) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ. More generally: (⋃_m Am)ᶜ = ⋂_m Amᶜ. (b) (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ. Generalising: (⋂_m Am)ᶜ = ⋃_m Amᶜ.

Proof. (a) Assume x ∈ ⋂_{m=1}^∞ Amᶜ. Then x ∈ Amᶜ ∀m. Hence x ∉ Am ∀m, which implies x ∉ ⋃_{m=1}^∞ Am. Therefore x ∈ (⋃_{m=1}^∞ Am)ᶜ.

(b) Assume x ∈ ⋃_{m=1}^∞ Amᶜ; then x ∈ Amᶜ for some m. Hence x ∉ Am for the same m. Therefore x ∉ ⋂_{m=1}^∞ Am and hence x ∈ (⋂_{m=1}^∞ Am)ᶜ. The other direction of each statement can be proved in a similar fashion.


Then

P(AUU) = p²; P(AUD) = P(ADU) = pq; P(ADD) = q².

Further, P(AUUᶜ) = 1 − p²; similarly, you can calculate the probability of each other set in F(2). Moreover, if AUUU = {ω : ω1 = U, ω2 = U, ω3 = U}, you can calculate that P(AUUU) = p³. And so on. Hence, in the limit you can conclude that the probability of the sequence UUU... is zero. The same applies, for example, to the sequence UDUD...; in fact this sequence is the intersection of the events {ω1 = U}, {ω1ω2 = UD}, {ω1ω2ω3 = UDU}, .... From this example, we can conclude that every single sequence in Ω has probability zero.
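A quick Monte Carlo sketch in Python makes this concrete: the empirical frequency of “the first n moves are all up” decays like pⁿ, suggesting probability zero for the full sequence UUU... in the limit. The truncation at 20 steps and the value p = 0.5 are illustrative choices, not part of the notes.

```python
import random

random.seed(42)
p = 0.5            # illustrative up probability
n_paths = 100_000
n_steps = 20       # truncate the infinite sequences at 20 moves

# count_all_up[n] counts simulated paths whose first n moves are all up.
count_all_up = [0] * (n_steps + 1)
for _ in range(n_paths):
    run = 0        # length of the initial run of ups (capped at n_steps)
    while run < n_steps and random.random() < p:
        run += 1
    for n in range(run + 1):
        count_all_up[n] += 1

# The empirical frequencies track p^n, which tends to zero as n grows.
for n in (1, 2, 3, 10):
    print(n, count_all_up[n] / n_paths, p ** n)
```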

In the previous example, we have shown that

P(every movement is up) = 0;

this suggests that this event is “sure” not to happen. Similarly, since the above is true, we are sure to get at least one down movement in the sequence, although we do not know exactly when in the sequence. Because of this fact, and the fact that the infinite sequence UUU... is in the sample space (which means that it still is a possible outcome), mathematicians have come up with a somewhat strange way of saying it: we will get at least one down movement almost surely.

Definition 5 Let (Ω, F, P) be a probability space. If A ∈ F is such that

P(A) = 1,

we say that the event A occurs almost surely (a.s.).

Now, in order to introduce the next definition, consider the following, maybe a little silly, example. Assume that you want to measure the length of a room, and assume you express this measure in meters and centimeters. It turns out that the room is 4.30 m long. Now assume that you want to change the reference system and express the length of the room in terms of feet and inches. Then, the room is about 14 ft long. But in the process of switching from one reference system to the other, the room did not change: it did not shrink; it did not expand. The same applies to events and probability measures. The idea is given in the following.

Definition 6 (Absolutely continuous/equivalent probability measure) Given two probability measures P and P∗ defined on the same σ-algebra F, then:

i) P is absolutely continuous with respect to P∗, i.e. P << P∗, if P(A) = 0 whenever P∗(A) = 0, ∀A ∈ F.

ii) If P << P∗ and also P∗ << P, then P ∼ P∗, i.e. P and P∗ are equivalent measures. Thus, for P ∼ P∗ the following are equivalent:

• P(A) = 0 ⇔ P∗(A) = 0 (same null sets)

• P(A) = 1 ⇔ P∗(A) = 1 (same a.s. sets)


• P(A) > 0 ⇔ P∗(A) > 0 (same sets of positive measure)

Example 6 Consider a closed interval [a, b], for 0 ≤ a ≤ b ≤ 1, and consider the experiment of choosing a number from this interval. Define the following measure

P(the number chosen is in [a, b]) = P[a, b] := b − a.

But you can also define a different metric P∗, according to which

P∗(the number chosen is in [a, b]) = P∗[a, b] := b² − a².

As there is a conversion factor that helps you to switch between meters and feet, so that 4.30 m is about 14 ft, there is also a conversion factor between probability measures. However, this conversion factor depends on a few objects that we have not met yet. Therefore, the discussion of this last feature is postponed to the end of this unit.

Exercise 1 Let A and B belong to some σ-algebra F. Show that F contains the sets A ∩ B, A\B, and A∆B, where ∆ denotes the symmetric difference operator, i.e.

A∆B = {x : x ∈ A, x ∉ B or x ∉ A, x ∈ B}.

Exercise 2 Show that for every function f : Ω → R the following hold:

1. f⁻¹(⋃_n An) = ⋃_n f⁻¹(An);

2. f⁻¹(⋂_n An) = ⋂_n f⁻¹(An);

3. f⁻¹(Aᶜ) = (f⁻¹(A))ᶜ,

for any subsets An, A of R.

Exercise 3 Let F be a σ-algebra of subsets of Ω and suppose that B ∈ F. Show that G = {A ∩ B : A ∈ F} is a σ-algebra of subsets of B.

Exercise 4 Let P be a probability measure on F. Show that P has the following properties:

1. for any A, B ∈ F such that A ∩ B = ∅, P(A ∪ B) = P(A) + P(B);

2. for any A, B ∈ F such that A ⊂ B, P(A) ≤ P(B) [Hint: use the fact that for any two sets A and B such that A ⊂ B, B = A ∪ (B\A), where we define B\A := {x : x ∈ B, x ∉ A} (difference operator for sets)];

3. for any A, B ∈ F such that A ⊂ B, P(B\A) = P(B) − P(A).


1.2 Random variables

So far, we have considered random events, like an up or down movement in the stock price over the next period of time, and the likelihood of such events occurring, as described by the probability measure. The next step in which you might be interested is to “quantify” the outcome of the random event; for example, you might want to know how much the stock price is going to change if an up or down movement occurs in the next time period. In order to do this, you need the idea of a random variable.

Definition 7 (Random variable) Let (Ω, F, P) be a probability space. A random variable X is a function X : Ω → R such that {ω ∈ Ω : X(ω) ≤ x} ∈ F ∀x ∈ R.

Note that if B is a set of the form B = (−∞, x] for some x ∈ R, then Definition 7 says precisely that X⁻¹(B) ∈ F. Since sets of this form generate the Borel σ-algebra B, it follows that X⁻¹(B) ∈ F for every B ∈ B. In other words, any random variable is a measurable function², i.e. a numerical quantity whose value is determined by the random experiment of choosing some ω ∈ Ω.
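On a finite sample space, measurability can be checked mechanically: X is measurable with respect to the σ-algebra generated by a partition exactly when X is constant on each atom. A small Python sketch (the helper `is_measurable` and the numerical values are hypothetical illustrations, not part of the notes):

```python
# F-measurability on a finite sample space: X is measurable with respect to
# the sigma-algebra generated by a partition iff X is constant on each atom.
def is_measurable(X, atoms):
    """X: dict mapping outcomes to values; atoms: partition generating F."""
    return all(len({X[w] for w in atom}) == 1 for atom in atoms)

# Hypothetical illustration on the two-period tree: sigma-algebra of the
# first move, with atoms {UU, UD} and {DU, DD}.
atoms_G1 = [{"UU", "UD"}, {"DU", "DD"}]

X1 = {"UU": 8, "UD": 8, "DU": 2, "DD": 2}    # depends only on the first move
X2 = {"UU": 16, "UD": 4, "DU": 4, "DD": 1}   # depends on both moves

print(is_measurable(X1, atoms_G1))  # True
print(is_measurable(X2, atoms_G1))  # False: X2 is not constant on {UU, UD}
```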

Example 7 Consider once again the random experiment of the evolution of the stock price over an infinite time horizon in a 2-state economy, described in Example 3. Let us define the stock prices by the formulae:

S0(ω) = 4;

S1(ω) = 8 if ω1 = up; 2 if ω1 = down;

S2(ω) = 16 if ω1 = ω2 = up; 4 if ω1 ≠ ω2; 1 if ω1 = ω2 = down.

All of these are random variables, assigning a numerical value to each sequence of up and down movements in the stock price at each time period. Example 5 tells us how to calculate the probability that the random variable S takes any of these values; for example

P(S1(ω) = 8) = P(AU) = p;

P(S2(ω) = 4) = P(ADU ∪ AUD) = 2pq.
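The law of a discrete random variable such as S2 can be computed by brute-force enumeration of the paths. A Python sketch (the doubling/halving rule follows Example 7; the value p = 0.4 is an illustrative choice, not part of the notes):

```python
from itertools import product

p = 0.4      # illustrative up probability; q = 1 - p
S0 = 4

def stock(path):
    """Stock price after the moves in `path`: doubles on U, halves on D."""
    s = S0
    for move in path:
        s = s * 2 if move == "U" else s / 2
    return s

# Law of S2: accumulate the probability of every two-move path by value.
law = {}
for path in product("UD", repeat=2):
    prob = 1.0
    for move in path:
        prob *= p if move == "U" else 1 - p
    law[stock(path)] = law.get(stock(path), 0.0) + prob

print(law)  # P(S2 = 16) = p^2, P(S2 = 4) = 2pq, P(S2 = 1) = q^2
```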

The above example shows that we can associate to any random variable another function measuring the likelihood of the outcomes. This is what we call the law of X. Precisely, by law of X we mean a probability measure on (R, B), LX : B → [0, 1], such that

LX(B) = P(X ∈ B) ∀B ∈ B.

²Definition (Measurable function) Let F be a σ-algebra on Ω and f : Ω → R. For A ⊆ R let

f⁻¹(A) = {ω ∈ Ω : f(ω) ∈ A};

then, f is called F-measurable if f⁻¹(E) ∈ F ∀E ∈ B, where f⁻¹(E) is called the pre-image of E.


In general, we prefer to speak in terms of the distribution of a random variable; this is a function FX : R → [0, 1] defined as

FX(a) = P(X ≤ a) = P({ω : X(ω) ≤ a}).

This is the law of X for any set B of the form B = (−∞, a], i.e. FX(a) = LX((−∞, a]). In some special cases, we can describe the distribution function of a random variable X in even more detail. The first case is that of a discrete random variable, like the one introduced in Example 7, which assigns lumps of mass to events. For this random variable, we can express the distribution function as

FX(a) = P(X ≤ a) = Σ_{x ≤ a} pX(x),

where pX(x) is the probability mass function of X. If instead the random variable X spreads the mass continuously over the real line, then we have a continuous random variable and

FX(a) = P(X ≤ a) = ∫_{−∞}^a fX(x) dx,   (1)

where fX(x) denotes the density function of X.

Exercise 5 Let X be a random variable. Show that the distribution FX of X defined by

FX(A) = P(X ∈ A) = P(X⁻¹(A)), A ∈ B(R),

is a probability measure on the σ-algebra B(R).

Remark 1 (A matter of notation) From equation (1), we see that we could write the density function as

fX(x) = dFX(x)/dx = dP(ω)/dx ∀x ∈ R.

The expectation E of a random variable X on (Ω, F, P) is then defined by:

E[X] = ∫_Ω X(ω) dP(ω) = ∫_{−∞}^∞ x dFX(x).

The expectation returns the mean of the distribution; you might also be interested in the dispersion around the mean, a feature described by the variance of a random variable. Further features that characterize the distribution of a random variable are the skewness (degree of asymmetry) and the kurtosis (behaviour of the tails). These features are described by the moments (from the mean) of a random variable, which can be recovered via the moment generating function (MGF)

MX(k) = E[e^{kX}] = ∫_{−∞}^∞ e^{kx} dFX(x).


Example 8 A few (and very important, as we will use them throughout the entire year) examples of random variables:

1. The Poisson random variable is an example of a discrete random variable. More precisely, a Poisson random variable N ∼ Poi(λ) with rate λ has probability mass function

pN(n) = e^{−λ} λⁿ / n!,

from which it follows that

E(N) = λ = Var(N); MN(k) = e^{λ(e^k − 1)}.

2. The normal (or Gaussian) random variable X ∼ N(µ, σ²) is a continuous random variable defined by the density function

fX(x) = (1 / (σ√(2π))) e^{−(x−µ)²/(2σ²)}.

You can easily show that

E(X) = µ; Var(X) = σ²; MX(k) = e^{kµ + k²σ²/2}.

3. Assume X ∼ Γ(α, λ), α > 0. Then X is a non-negative random variable which follows a Gamma distribution; its density function is given by

f(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx},

where Γ(α) is the Gamma function, which is defined as

Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx,

and has the property that³

Γ(α) = (α − 1) Γ(α − 1).

This means that

Γ(α) = (α − 1)!

when α is a positive integer. The MGF of X is

MX(k) = (λ^α / Γ(α)) ∫_0^∞ x^{α−1} e^{−x(λ−k)} dx.

3Why don’t you try to prove this last property... just integrate by parts.


Set y = x(λ − k) (for k < λ); then

MX(k) = (λ^α / Γ(α)) ∫_0^∞ (y / (λ − k))^{α−1} e^{−y} / (λ − k) dy = (λ / (λ − k))^α.

Note that if α = 1, then X follows an exponential distribution with rate λ. Using the MGF you can show that the Gamma random variable has mean µ = α/λ and variance ν = α/λ². The parameter α is the shape parameter, whilst λ is the rate (inverse scale) parameter.
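You can sanity-check the closed form MX(k) = (λ/(λ − k))^α by Monte Carlo, estimating E[e^{kX}] from Gamma samples. A Python sketch (the parameter values are illustrative, not from the notes; note that Python's `random.gammavariate` is parametrised by shape and scale = 1/λ):

```python
import math
import random

random.seed(0)
alpha, lam, k = 3.0, 2.0, 0.5    # shape, rate, MGF argument (k < lam)

# Python's gammavariate takes (shape, scale); for rate lam, scale = 1/lam.
n = 200_000
est = sum(math.exp(k * random.gammavariate(alpha, 1.0 / lam))
          for _ in range(n)) / n

closed_form = (lam / (lam - k)) ** alpha    # (lambda / (lambda - k))^alpha
print(est, closed_form)  # the Monte Carlo estimate is close to (4/3)^3
```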

Moment generating functions suffer the disadvantage that the integrals which definethem may not always be finite.

Example 9 A Cauchy random variable X has density function

f(x) = 1 / (π(1 + x²)), x ∈ R.

Hence the MGF of X is given by

MX(k) = ∫_{−∞}^∞ e^{kx} / (π(1 + x²)) dx.

This is an improper integral of the 1st kind which does not converge unless k = 0 (which of course is of no use...). In fact, if you perform the limit comparison test against 1/x^α with α = 2, you obtain:

lim_{x→∞} [e^{kx}/(π(1 + x²))] / (1/x²) = (1/π) lim_{x→∞} e^{kx} = 0 if k < 0; ∞ if k > 0,

lim_{x→−∞} [e^{kx}/(π(1 + x²))] / (1/x²) = (1/π) lim_{x→−∞} e^{kx} = 0 if k > 0; ∞ if k < 0.

Hence, for every k ≠ 0 the integral diverges at one of the two tails, and the MGF of a Cauchy random variable does not exist.

Characteristic functions are an equally useful class of functions, and their finiteness is always guaranteed.

Definition 8 The characteristic function of X is the function φX : R → C defined by

φX(u) = E(e^{iuX}),

where i = √(−1).

This is a common transformation and is often called the Fourier transform of the density f of X, if this quantity exists. In this case

φX(u) = ∫ e^{iux} dF(x) = ∫ e^{iux} f(x) dx.


The characteristic function of a random variable has several nice properties. Firstly, it always exists and is finite (e^{iuX} is bounded, hence in L¹): note that

φX(u) = E(e^{iuX}) = E(cos(uX) + i sin(uX)),

hence⁴

|cos(uX) + i sin(uX)| = √(cos²(uX) + sin²(uX)) = 1.

Then

|E(e^{iuX})| ≤ E(|e^{iuX}|) = 1.

Moreover:

1. if X and Y are independent random variables, φ_{X+Y}(u) = φX(u) φY(u);

2. if a, b ∈ R and Y = aX + b, then φY(u) = e^{iub} φX(au).

1.2.1 Examples of characteristic functions

Calculations of integrals involving complex numbers are not always pleasant; usually you should know about contour integration... but for our purposes you can get away with only knowing about analytic continuation.

Analytic continuation provides a way of extending the domain over which a complex function is defined. Let us start from a complex function f (like the characteristic function); this function is complex differentiable at z0 and has derivative A if and only if

f(z) = f(z0) + A(z − z0) + o(z − z0) as z → z0.

A complex function is said to be analytic on a region D if it is complex differentiable at every point in D (i.e. it has no singularities, that is, points at which the function “blows up” or becomes degenerate). Now, let f1 and f2 be analytic functions on domains D1 and D2 respectively, with D1 ⊂ D2, such that f1 = f2 on D1 ∩ D2 = D1. Then f2 is called the analytic continuation of f1 to D2. Moreover, if it exists, the analytic continuation of f1 to D2 is unique.

Consider now the MGF MX of some random variable X; we can say that the function

MX(z) = ∫_{−∞}^∞ f(x) e^{zx} dx, z ∈ C,

is the analytic continuation of MX to the complex plane, if it respects the condition above. Then the characteristic function of X, φX, is the restriction of MX to the imaginary axis, i.e.

φX(u) = MX(iu).

And now, let’s calculate some characteristic functions.

⁴Note that this is the modulus of the complex number z = cos(uX) + i sin(uX), and you can interpret the notation as a norm.


1. Let X ∼ N(0, 1). The characteristic function is

φX(u) = (1/√(2π)) ∫_{−∞}^∞ e^{iux − x²/2} dx.

Now consider the real valued function

MX(k) = (1/√(2π)) ∫_{−∞}^∞ e^{kx − x²/2} dx = e^{k²/2},

i.e. the MGF of X. Since MX is finite on the whole of R, it has an analytic continuation to the complex plane given by

MX(z) = (1/√(2π)) ∫_{−∞}^∞ e^{zx − x²/2} dx = e^{z²/2}, z ∈ C.

Therefore, by analytic continuation,

φX(u) = MX(iu) = e^{−u²/2}.

2. Let X be a Poisson random variable with rate λ. You can apply the same argument as above (i.e. analytic continuation) to show that

φX(u) = MX(iu) = e^{λ(e^{iu} − 1)}.

3. Consider now the Gamma distribution. Analytic continuation implies that

φX(u) = (λ / (λ − iu))^α.

4. Assume X is a Cauchy random variable, i.e.

f(x) = 1 / (π(1 + x²)).

We cannot use the analytic continuation argument because the MGF is not finite (can you spot why?). Here you need to use contour integration and the residue theorem. You should obtain that

φX(u) = e^{−|u|}.
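Since characteristic functions always exist, they can be estimated by simulation even for the Cauchy distribution, whose MGF does not. A Python sketch comparing Monte Carlo estimates of E[e^{iuX}] with the closed forms above (sample size, seed and the value of u are illustrative; a standard Cauchy variate is generated by the inverse-CDF transform tan(π(V − 1/2)) for V uniform on [0, 1]):

```python
import cmath
import math
import random

random.seed(1)
n = 200_000
u = 1.2    # argument of the characteristic function (illustrative)

def empirical_cf(samples, u):
    """Monte Carlo estimate of E[exp(iuX)]."""
    return sum(cmath.exp(1j * u * x) for x in samples) / len(samples)

normal = [random.gauss(0.0, 1.0) for _ in range(n)]
# Standard Cauchy via the inverse-CDF transform tan(pi * (V - 1/2)).
cauchy = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

est_normal = empirical_cf(normal, u)
est_cauchy = empirical_cf(cauchy, u)
print(abs(est_normal - cmath.exp(-u * u / 2)))   # small (target e^{-u^2/2})
print(abs(est_cauchy - math.exp(-abs(u))))       # small (target e^{-|u|})
```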

1.3 Conditional expectation

At the beginning of this Unit, we talked about the problem of setting up a mathematical model of a random experiment, in order to support our decision process. Specifically, we talked about informed decisions, and we have seen that information in the probability


space is captured by the σ-algebra. Then, in the previous section, we have seen how to quantify a random event by using random variables.

Now, consider as always that some random experiment is performed, whose outcome is some ω ∈ Ω. Imagine that we are given some information, G, about this possible outcome, not enough to know the precise value of ω, but enough to narrow down the possibilities. Then, we can use this information to estimate, although not precisely, the value of the random variable X(ω). Such an estimate is represented by the conditional expectation of X given G.

In order to understand the definition of conditional expectation, we first need to familiarize ourselves with the indicator function. Precisely, we use the notation 1A for

1A(ω) = 1 if ω ∈ A; 0 otherwise.

Hence 1A is a random variable which follows a Bernoulli distribution, taking value 1 with probability P(A), and 0 with probability P(Aᶜ). Hence E[1A] = P(A). Properties of the indicator function are listed below.

1. 1A + 1Aᶜ = 1A∪Aᶜ = 1Ω = 1;

2. 1A∩B = 1A 1B.

Now, we are ready for the following.

Definition 9 (Axiomatic definition-Kolmogorov) Let (Ω, F, P) be a probability space and X a random variable with E|X| < ∞. Let G be a sub σ-algebra of F. Then the random variable Y = E[X|G] is the conditional expectation of X with respect to G if:

1. Y is G-measurable (Y ∈ G).

2. E|Y| < ∞.

3. ∀A ∈ G: E(Y 1A) = E(X 1A), i.e. ∫_A Y dP = ∫_A X dP.

The idea is that, if X and G are somehow connected, we can expect the information contained in G to reduce our uncertainty about X. In other words, we can better predict X with the help of G. In fact, Definition 9 is telling us that, although the estimate of X based on G is itself a random variable, the value of the estimate E[X|G] can be determined from the information in G (property 1). Further, Y is an unbiased estimator of X (property 3 with A = Ω).

Example 10 Consider once again the stock price evolution described in Example 7. Suppose you are told that the outcome of the first stock price movement is “up”. You can now use this information to estimate the value of S2:

E[S2(ω) | up] = 12p + 4.


In this case, G = σ(AU), the σ-algebra generated by AU. Similarly,

E[S2(ω) | down] = 3p + 1,

and G = σ(AD). Question: what is

E[S2(ω) | AUD]?
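On the finite two-period tree, conditional expectation given an event reduces to a weighted average, E[S2 | B] = E[S2 1B]/P(B), which lets you check the values above numerically. A Python sketch (the value p = 0.4 is an illustrative choice, not part of the notes):

```python
from itertools import product

p = 0.4
q = 1 - p

# Two-period prices from Example 7 and the corresponding path probabilities.
S2 = {"UU": 16, "UD": 4, "DU": 4, "DD": 1}
prob = {"".join(w): (p if w[0] == "U" else q) * (p if w[1] == "U" else q)
        for w in product("UD", repeat=2)}

def cond_exp(event):
    """E[S2 | event] = E[S2 * 1_event] / P(event) on the finite tree."""
    p_event = sum(prob[w] for w in event)
    return sum(S2[w] * prob[w] for w in event) / p_event

print(cond_exp({"UU", "UD"}))  # E[S2 | up]   = 12p + 4
print(cond_exp({"DU", "DD"}))  # E[S2 | down] = 3p + 1
```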

Theorem 10 The conditional expectation has the following properties:

1. E [E (X |G )] = E [X] , i.e. E [Y ] = E [X].

2. If G = {∅, Ω} (smallest σ-algebra), E[X |G] = E[X].

3. If G = F , E [X |G ] = X.

4. If X ∈ G, E [X |G ] = X

5. If Z ∈ G, then E [ZX |G ] = ZE [X |G ] = ZY

6. Let G0 ⊂ G, E [E(X |G ) |G0 ] = E [X |G0 ] .

7. Let G0 ⊂ G, E [E (X |G0 ) |G ] = E [X |G0 ] .

8. If X is independent of G, then E [X |G ] = E [X]

Proof. One by one:

1. Check point 3 in the previous definition for A = Ω (remember that Ω ∈ G...): E[Y 1Ω] = E[X 1Ω], but 1Ω = 1.

2. Check point 3 in the axiomatic definition. For A = ∅, we have

∫_∅ Y dP = ∫_∅ X dP = 0.

For A = Ω:

E[X 1Ω] = E[X],

E[E(X) 1Ω] = E[X],

in virtue of property 1. Hence both sides return E[X].

3. Verify the definition of conditional expectation on X for G = F :

• X ∈ F because it is F -measurable by definition of random variable.

• E |X| < ∞ by assumption (axiomatic definition).

• E (Y 1A) = E (X1A) ∀A ∈ G.

Page 18: Stochastic Calculus Notes 1/5

18 1 REVIEW OF MEASURE THEORY AND PROBABILITY THEORY

In this case you have available the entire “history” of X. Hence you know everything and therefore there is no uncertainty left.

4. If X ∈ G, then we go back to the same situation as depicted in (3).

5. We prove this property for the simple case of an indicator function; hence, assume Z = 1B for some B ∈ G; then condition 3 in the definition of conditional expectation reads:

∀A ∈ G: E(ZX 1A) = E(X 1A 1B) = E(X 1A∩B).

But since A ∩ B ∈ G, condition 3 implies

E(X 1A∩B) = E(Y 1A∩B) = E(Y 1A 1B) = E(ZY 1A).

The extension to the case of a more general random variable relies on the construction of a random variable as the limit of sums of indicator functions. However, this is beyond the scope of this unit.

6. Let Y = E[X|G] and Z = E[X|G0]. If A ∈ G0, then E(Z 1A) = E(X 1A); but since G0 ⊂ G, A ∈ G as well, and by definition E(Y 1A) = E(X 1A). Therefore E(Z 1A) = E(Y 1A) ∀A ∈ G0.

7. Let Z = E[X|G0]; then Z ∈ G0. Since G0 ⊂ G, it follows that Z ∈ G. Therefore E[Z|G] = Z.

8. ∀A ∈ G: E(X 1A) = E(X) E(1A) = E[E(X) 1A].
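Property 1 (and the tower properties 6-7 with the trivial σ-algebra as G0) can be verified numerically on the two-period tree of Example 7: projecting S2 onto the first-move information and then averaging gives back E[S2]. A Python sketch (p = 0.4 is illustrative; `proj` is our own helper name):

```python
from itertools import product

p, q = 0.4, 0.6    # illustrative probabilities
paths = ["".join(t) for t in product("UD", repeat=2)]
prob = {w: (p if w[0] == "U" else q) * (p if w[1] == "U" else q) for w in paths}
X = {"UU": 16, "UD": 4, "DU": 4, "DD": 1}   # S2 from Example 7

def proj(first):
    """E[X | first move], i.e. the value of E[X|G] on the atom of G."""
    num = sum(X[w] * prob[w] for w in paths if w[0] == first)
    den = sum(prob[w] for w in paths if w[0] == first)
    return num / den

Y = {w: proj(w[0]) for w in paths}          # Y = E[X|G], constant on atoms

lhs = sum(Y[w] * prob[w] for w in paths)    # E[E(X|G)]
rhs = sum(X[w] * prob[w] for w in paths)    # E[X]
print(lhs, rhs)  # the two numbers agree
```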

Exercise 6 Let X1, X2, ... be identically distributed random variables with mean µ, and let N be a random variable taking values in the non-negative integers and independent of the Xi. Let S = X1 + X2 + ... + XN. Show that E(S|N) = µN and deduce that E(S) = µE(N).

Exercise 7 We define the conditional variance of a random variable X given a σ-algebra F by

Var(X|F) = E[(X − E(X|F))²|F].

Show that

Var(X) = E[Var(X|F)] + Var[E(X|F)].

1.4 Change of measure

Let us go back to the example of measuring the length of a room and of wishing to dothis using different references. If you want to convert meters in feet, you need a “bridge”between the two (1 ft = 0.30 meters). There is something equivalent to this also forprobability measures and it is defined as follows.


Theorem 11 (Radon-Nikodym) If P and P∗ are two probability measures on (Ω, F) such that P ∼ P∗, then there exists a random variable Y ∈ F such that

P∗(A) = ∫_A Y dP = E[Y 1A], ∀A ∈ F.   (2)

Y is called the Radon-Nikodym derivative of P∗ with respect to P and is also written as

Y = dP∗/dP.

Remark 2 From the discussion in Section 1.1, it should be obvious by now that Y is not a proper derivative, but rather something like a likelihood ratio.

Example 11 Consider Example 6. Here we defined two metrics on the interval [a, b], 0 ≤ a ≤ b ≤ 1:

P(the number chosen is in [a, b]) = P[a, b] := b − a;

P∗(the number chosen is in [a, b]) = P∗[a, b] := b² − a².

We could be more specific and say that

P[a, b] = ∫_a^b dω = ∫_{[a,b]} dP(ω);

P∗[a, b] = ∫_a^b 2ω dω = ∫_{[a,b]} 2ω dP(ω).

The last equation is (2) with Y(ω) = 2ω.
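Equation (2) with Y(ω) = 2ω can be checked by simulation: sampling ω uniformly (i.e. under P) and averaging Y(ω) 1_{[a,b]}(ω) should recover P∗[a, b] = b² − a². A Python sketch (the interval endpoints and sample size are illustrative choices):

```python
import random

random.seed(7)
n = 400_000
a, b = 0.2, 0.7    # illustrative subinterval of [0, 1]

# Sample omega uniformly (the measure P on [0,1]) and average
# Y(omega) * 1_{[a,b]}(omega) with Y(omega) = 2*omega, as in equation (2).
est = sum(2 * w for w in (random.random() for _ in range(n))
          if a <= w <= b) / n

print(est, b * b - a * a)  # Monte Carlo estimate vs P*[a, b] = b^2 - a^2
```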

Exercise 8 Consider the usual probability space (Ω, F, P) and a standard normal random variable X, i.e. X ∼ N(0, 1). Define a new random variable Y as Y = X + θ, and let P̃ be another probability measure on Ω, defined by

dP̃/dP = Z,

where

Z = e^{−θX − θ²/2}.

Show that Y ∼ N(0, 1) on (Ω, F, P̃).

Note that for any random variable X,

E∗[X] = ∫ X dP∗ = ∫ XY dP = E[XY].


Theorem 12 (Bayes formula) Let P and P∗ be two equivalent probability measures on the same measurable space (Ω, F) and let

Y = dP∗/dP

be the Radon-Nikodym derivative of P∗ with respect to P. Furthermore, let X be a random variable on (Ω, F, P∗) such that E∗|X| < ∞ and let G ⊆ F be a sub σ-algebra of F. Then the following generalised version of the Bayes formula holds:

E∗[X|G] = E[XY|G] / E[Y|G].

Proof. Let Z = E∗[X|G]. By definition: Z ∈ G, E∗|Z| < ∞ and E∗(Z 1A) = E∗(X 1A) ∀A ∈ G. Hence

∫_A Z dP∗ = ∫_A X dP∗
⇔ ∫_A ZY dP = ∫_A XY dP
⇔ E(ZY 1A) = E(XY 1A).

Now

E(XY 1A) = E[E(XY|G) 1A];
E(ZY 1A) = E[E(ZY|G) 1A], ∀A ∈ G.

Then

E[(E(ZY|G) − E(XY|G)) 1A] = 0 ∀A ∈ G,

which implies that E(ZY|G) = E(XY|G). Since Z ∈ G, E(XY|G) = E(ZY|G) = Z E(Y|G), and the result follows on dividing by E(Y|G).

We will use this rule to link expectations calculated in a particular “universe” to the ones calculated in another universe.
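The generalised Bayes formula is easy to verify on a finite sample space, where both sides reduce to elementary weighted averages. A Python sketch on the two-period tree of Example 7 (the choice of P with p = 0.4 and a uniform P∗ is purely illustrative; G is the σ-algebra generated by the first move):

```python
from itertools import product

omega = ["".join(t) for t in product("UD", repeat=2)]
P  = {"UU": 0.16, "UD": 0.24, "DU": 0.24, "DD": 0.36}  # p = 0.4 under P
Ps = {w: 0.25 for w in omega}                           # p* = 0.5 under P*
Y  = {w: Ps[w] / P[w] for w in omega}                   # Radon-Nikodym derivative
X  = {"UU": 16, "UD": 4, "DU": 4, "DD": 1}              # S2 from Example 7

def cond_exp(f, measure, first):
    """Conditional expectation of f given {first move} under `measure`."""
    ws = [w for w in omega if w[0] == first]
    return sum(f[w] * measure[w] for w in ws) / sum(measure[w] for w in ws)

XY = {w: X[w] * Y[w] for w in omega}
results = {}
for first in ("U", "D"):
    lhs = cond_exp(X, Ps, first)                          # E*[X|G] on the atom
    rhs = cond_exp(XY, P, first) / cond_exp(Y, P, first)  # Bayes formula
    results[first] = (lhs, rhs)
    print(first, lhs, rhs)  # the two columns agree on each atom
```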

1.5 Some more exercises

1. a) Formally define the components of any probability space Θ = (Ω,F ,P) .

b) Let Ω = {1, 2, 3, 4, 5} and let U be the collection

U = {{1, 2, 3}, {3, 4, 5}}.

Find the smallest σ-algebra F(U) generated by U.

c) Define X : Ω → R by

X (1) = X (2) = 0; X (3) = 10; X (4) = X (5) = 1.

Define the condition of F-measurability for X. Check if X is measurable with respect to F(U).


d) Define Y : Ω → R by

Y (1) = 0; Y (2) = Y (3) = Y (4) = Y (5) = 1.

Find the σ-algebra F(Y) generated by Y and show that Y is F(Y)-measurable.

2. Let X be a non-negative random variable defined on a probability space (Ω, F, P) with exponential distribution, that is

P(X ≤ x) = F_X(x) = 1 − e^{−λx}, x ≥ 0,

where λ is a positive constant. Let λ̃ be another positive constant, and define

Z = (λ̃/λ) e^{−(λ̃−λ)X}.

Define P̃ by

P̃(A) = ∫_A Z dP for all A ∈ F.

(a) Show that P̃(Ω) = 1.

(b) Compute the cumulative distribution function

P̃(X ≤ x) for x ≥ 0

for the random variable X under the probability measure P̃.
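A Monte Carlo sketch for exercise 2: simulating X under P and averaging the weight Z estimates P̃(Ω) for part (a), and averaging Z over {X ≤ x} estimates the CDF in part (b). The target CDF 1 − e^{−λ̃x} used in the final comparison is the conjectured answer to part (b), stated here as an assumption rather than the worked solution.

```python
# Monte Carlo check of the exponential change of measure; lam2 plays
# the role of the second constant (written with a tilde in the text).
import math
import random

random.seed(1)
lam, lam2, x, n = 1.0, 2.0, 0.8, 200_000

tot_z, tot_zx = 0.0, 0.0
for _ in range(n):
    X = random.expovariate(lam)                      # X ~ Exp(lam) under P
    Z = (lam2 / lam) * math.exp(-(lam2 - lam) * X)   # density dP~/dP
    tot_z += Z
    if X <= x:
        tot_zx += Z

print(abs(tot_z / n - 1.0) < 0.02)                          # (a): P~(Omega) = E[Z] = 1
print(abs(tot_zx / n - (1 - math.exp(-lam2 * x))) < 0.02)   # (b): conjectured CDF
```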

A Set theory: quick reminder

For further references, you can look at Grimmett and Stirzaker, and Schaum (Chapter 2).

A.1 Sets, elements and subsets

• a ∈ A stands for “a is an element of the set A”;

• if a ∈ A implies (⇒, in short) a ∈ B, then A is a subset of B, written A ⊆ B, which is read “A is contained in B”;

• A = B ⇐⇒ (read: “if and only if”) A ⊆ B and B ⊆ A;

• Negations: a ∉ A; A ⊈ B; A ≠ B;

• If A ⊆ B and A ≠ B, then A ⊂ B (proper subset);

• An example: let A = {1, 3, 5, 7, 9}; B = {1, 2, 3, 4, 5}; C = {3, 5}; then


• C ⊂ A

• C ⊂ B

• A ⊈ B

• B ⊈ A

• Sets can be specified in

– tabular form (roster method): A = {1, 3, 5, 7, 9}
– set-builder form (property method): B = {x : x is an even integer, x > 0}

• Special sets:

– Universal set U
– Empty set ∅: S = {x : x is a positive integer, x² = 3} = ∅

A.2 Union and intersection

• Union of A and B: the set of all elements which belong to A, to B, or to both:

A ∪ B := {x : x ∈ A or x ∈ B}

• Intersection of A and B: the set of all elements which belong to both A and B:

A ∩ B := {x : x ∈ A and x ∈ B}

• If A ∩ B = ∅, then A and B are disjoint.

• If A ⊆ B, then

A ∪ B = B
A ∩ B = A

A.2.1 Properties

• A ∪ ∅ = A; A ∩ ∅ = ∅

• If A ⊆ U , then A ∪ U = U and A ∩ U = A

• Commutative Law

A ∪ B = B ∪ A
A ∩ B = B ∩ A


• Associative Law

(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)

• Distributive Law

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

• Idempotent Law

A ∪ A = A
A ∩ A = A

A.3 Complements and difference

• The (absolute) complement of A is defined as

A^c = {x : x ∈ U, x ∉ A},

i.e. the set of elements which do not belong to A;

• The relative complement of B with respect to A (or difference of A and B) is defined as

A\B = {x : x ∈ A, x ∉ B}

! Note that

A\B = A ∩ B^c
A\(B ∪ C) = (A\B) ∩ (A\C)
A\(B ∩ C) = (A\B) ∪ (A\C)

Example 12 Let

U = {1, 2, 3, 4, 5, ...}
A = {1, 2, 3}
B = {3, 4, 5, 6, 7}

then

A^c = {4, 5, 6, ...}
A\B = {1, 2}

Note:

• A ∪ A^c = U

• A ∩ A^c = ∅


A.3.1 Properties

• (A^c)^c = A

• if A ⊂ B, then B^c ⊂ A^c

• De Morgan Laws

– (A ∪ B)^c = A^c ∩ B^c
– (A ∩ B)^c = A^c ∪ B^c
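The De Morgan laws can likewise be verified with Python sets (example values chosen arbitrarily).

```python
# De Morgan's laws on small sets: complements taken within a universe U.
U = set(range(10))
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}

assert U - (A | B) == (U - A) & (U - B)   # (A∪B)^c = A^c ∩ B^c
assert U - (A & B) == (U - A) | (U - B)   # (A∩B)^c = A^c ∪ B^c
print("De Morgan laws hold")
```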

A.4 Further definitions

• A × B := {(x, y) : x ∈ A, y ∈ B} is the Cartesian product of A and B;

• A is finite if it is empty or if it consists of exactly n elements, where n is a positive integer;

• otherwise A is infinite;

• A is countable if it is finite or if its elements can be listed in the form of a sequence (countably infinite);

• otherwise A is uncountable.

Example 13

• A = {letters of the English alphabet} (finite)

• D = {days of the week} (finite)

• R = {x : x is a river on Earth} (finite)

• Y = {x : x is a positive integer, x is even} = {2, 4, 6, 8, ...} (countably infinite)

• I = {x : 0 ≤ x ≤ 1} (uncountable)

B Modes of convergence of a random variable

Let {X_m}_{m∈N} be a sequence of random variables, and let X be another random variable. Then:

• ALMOST SURE CONVERGENCE: X_m →^{a.s.} X if the event {ω ∈ Ω : X_m(ω) → X(ω) as m → ∞} has probability 1.

• CONVERGENCE IN PROBABILITY: X_m →^{P} X if, ∀ε > 0,

lim_{m→∞} P(|X_m − X| > ε) = 0.


• CONVERGENCE IN L^p (convergence in p-th mean): X_m →^{L^p} X if

lim_{m→∞} E(|X_m − X|^p) = 0.

• CONVERGENCE IN DISTRIBUTION: X_m →^{D} X if

lim_{m→∞} P(X_m ≤ x) = P(X ≤ x)

for every x ∈ R at which the distribution function of X is continuous.
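Convergence in probability can be seen at work in a simulation (an illustration not taken from the notes): the sample mean of m Uniform(0, 1) draws converges in probability to 1/2, so the tail probability P(|X_m − 1/2| > ε) shrinks as m grows. The sample sizes, ε and trial count below are arbitrary choices.

```python
# Convergence in probability of sample means of Uniform(0,1) draws:
# estimate P(|X_m - 1/2| > eps) for small and large m and compare.
import random

random.seed(2)
eps, trials = 0.05, 2_000

def tail_prob(m):
    """Estimate P(|mean of m uniforms - 0.5| > eps) by simulation."""
    bad = 0
    for _ in range(trials):
        mean = sum(random.random() for _ in range(m)) / m
        if abs(mean - 0.5) > eps:
            bad += 1
    return bad / trials

p10, p500 = tail_prob(10), tail_prob(500)
print(p10 > p500)   # the tail probability decreases as m grows
```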

B.1 Further convergences

• MONOTONE CONVERGENCE: if 0 ≤ X_m ↑ X a.s., then E(X_m) ↑ E(X), or equivalently lim_{m→∞} E(X_m) = E(lim_{m→∞} X_m) = E(X), since X = lim_{m→∞} X_m.

• DOMINATED CONVERGENCE: if X_m → X a.s. and |X_m| ≤ Y with E(Y) < ∞, then

E(|X_m − X|) → 0.

In other words, E(X_m) → E(X), i.e. lim_{m→∞} E(X_m) = E(X).

• BOUNDED CONVERGENCE THEOREM: if X_m → X a.s. and |X_m| ≤ K for some constant K, then

E(|X_m − X|) → 0;

this is implied by dominated convergence (take Y ≡ K).