
Numerical methods in mathematical finance

Tobias Jahnke

Karlsruher Institut für Technologie
Fakultät für Mathematik

Institut für Angewandte und Numerische Mathematik

[email protected]

© Tobias Jahnke, Karlsruhe 2015

Version: October 12, 2015


Preface

These notes are the basis of my lecture Numerical methods in mathematical finance given at Karlsruhe Institute of Technology in the winter term 2014/15. The purpose of these notes is to help students who have missed parts of the course to fill these gaps, and to provide a service for those students who can concentrate better if they do not have to copy what I write on the blackboard.

It is not the purpose of these notes, however, to replace the lecture itself, or to write a text which could compete with the many excellent books about the subject. This is why the style of presentation is rather sketchy. As a rule of thumb, one could say that these notes only cover what I write during the lecture, but not everything I say.

There are still many typos and possibly also other mistakes. Of course, I will try to correct any mistake I find as soon as possible, but please be aware of the fact that you cannot rely on these notes.

Karlsruhe, winter term 2014/15
Tobias Jahnke


Contents

I Mathematical models in option pricing

1 Options and arbitrage
  1.1 European options
  1.2 More types of options
  1.3 Arbitrage and modelling assumptions
  1.4 Arbitrage bounds
  1.5 A simple discrete model

2 Stochastic differential equations
  2.1 The Wiener process
  2.2 Stochastic differential equations and the Ito formula
  2.3 Martingales
  2.4 The Feynman-Kac formula
  2.5 Extension to higher dimensions

3 The Black-Scholes equation
  3.1 Geometric Brownian motion
  3.2 Derivation of the Black-Scholes equation
  3.3 Black-Scholes formulas
  3.4 Risk-neutral valuation and equivalent martingale measures
  3.5 Extensions

II Numerical methods

4 Binomial methods
  4.1 Derivation
  4.2 Algorithm
  4.3 Discrete Black-Scholes formula

5 Numerical methods for stochastic differential equations
  5.1 Motivation
  5.2 Euler-Maruyama method
    5.2.1 Derivation
    5.2.2 Weak and strong convergence
    5.2.3 Existence and uniqueness of solutions of SDEs
    5.2.4 Strong convergence of the Euler-Maruyama method
    5.2.5 Weak convergence of the Euler-Maruyama method
  5.3 Higher-order methods
  5.4 Numerical methods for systems of SDEs
  5.5 Mean-square error of the Monte-Carlo simulation

6 Pseudo-random numbers and Monte Carlo simulation
  6.1 Pseudo-random numbers
    6.1.1 Uniform pseudo-random numbers
    6.1.2 Normal pseudo-random numbers
    6.1.3 Correlated normal random vectors
  6.2 Monte-Carlo integration and variance reduction
  6.3 Quasi Monte Carlo methods

7 Finite-difference methods for parabolic differential equations
  7.1 Motivation and model problem
  7.2 Space discretization with finite differences
  7.3 Time discretization with Runge-Kutta methods
  7.4 Approximation of the heat equation in time and space
  7.5 Application to the Black-Scholes equation
  7.6 Non-smooth initial data

8 Finite-difference methods for Asian options
  8.1 Modelling Asian options
  8.2 Finite-difference methods for convection-diffusion equations
  8.3 Analysis of numerical methods applied to the transport equation

9 Finite-difference methods for American options
  9.1 Modelling American options
  9.2 Discretization
  9.3 An iterative method for linear complementary problems
  9.4 Summary: Pricing American options with the projected SOR method

A Some definitions from probability theory

B The Ito integral
  B.1 The Wiener process
  B.2 Construction of the Ito integral
  B.3 Sketch of the proof of the Ito formula (Theorem 2.2.2)

Part I

Mathematical models in option pricing

Chapter 1

Options and arbitrage

References: [BK04, Sey09]

1.1 European options

Financial markets trade investments into stocks of a company, commodities (e.g. oil, gold), etc. Stocks and commodities are risky assets, because their future value cannot be predicted. Bonds are considered as riskless assets in this lecture. If B(t0) is invested at time t0 into a bond with a risk-free interest rate r > 0, then the value of the bond at time t ≥ t0 is simply

B(t) = e^{r(t−t0)} B(t0). (1.1)

Simplifying assumption: continuous payment of interest.
Spot contract: buy or sell an asset (e.g. a stock, a commodity etc.) with immediate delivery.
Financial derivatives: contracts about future payments or deliveries with certain conditions:

1. Forwards and futures: agreement between two parties to buy or sell an asset at a certain time in the future for a certain delivery price

2. Swaps: contracts regulating an exchange of cash flows at different future times (e.g. currency swaps, interest rate swaps, credit default swaps)

3. Options

Definition 1.1.1 (European option)

• A European call option is a contract which gives the holder (= buyer) of the option the right to buy an underlying risky asset at a future maturity date (expiration time) T at a fixed exercise price (strike) K from the writer (= seller) of the option.
Typical assets: stocks, parcels of stocks, stock indices, currencies, commodities, ...
Difference to forwards and futures: At maturity the holder can choose if he wants to buy the asset or not.


• European put option: Similar to a call option, but vice versa, i.e. the holder can sell the underlying to the writer.

Example: At time t = 0, Mr. J. buys 5 European call options. Each of these options gives him the right to buy 10 shares of the company KIT at maturity T > 0 at the exercise price of K = 120 € per share.

• Case 1: At time t = T, the market price of KIT is 150 € per share. Mr. J. exercises his options, i.e. he buys 5 · 10 = 50 KIT shares at the price of K = 120 € per share and sells the shares on the market for 150 € per share. Hence, he wins 50 · 30 = 1500 €.

• Case 2: At time t = T, the market price of KIT is 100 € per share. Hence, Mr. J. does not exercise his options.

What are options good for?

• Speculation

• Hedging (“insurance” against changing market values)

Since an option gives an advantage to the holder, the option has a certain value. For given T and K the value V(t, S) of the option must depend on the time t and the current price S of the underlying. For a European option we know that the value at the maturity T is

V(T, S) = (S − K)^+ := max{S − K, 0}   (European call)
V(T, S) = (K − S)^+ := max{K − S, 0}   (European put).

The functions S ↦ (S − K)^+ and S ↦ (K − S)^+ are called the payoff functions of a call or put, respectively.

The goal of this course is to answer the following question:

What is the fair price V (t, S) of an option for t < T?

Why is this question important? In order to sell/buy an option, we need to know the fair price. Why is this question non-trivial? Because the value of the risky asset is random. In particular, the price S(T) at the future expiration time T is not yet known when we buy/sell the option at time t = 0.
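The payoff functions defined above are straightforward to evaluate; the following Python sketch (function names are our own, chosen for illustration) replays Mr. J.'s example:

```python
def call_payoff(S, K):
    """Payoff (S - K)^+ of a European call at maturity."""
    return max(S - K, 0.0)

def put_payoff(S, K):
    """Payoff (K - S)^+ of a European put at maturity."""
    return max(K - S, 0.0)

# Mr. J.'s example with strike K = 120:
print(call_payoff(150.0, 120.0))  # 30.0 per share, so 50 shares yield 1500
print(call_payoff(100.0, 120.0))  # 0.0: the option is not exercised
```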

1.2 More types of options

Variations of the basic principle:

• European options can be exercised only at the maturity date.


• American options can be exercised at any time before and including the maturity date.

• Bermuda options can be exercised at a set of times.

The names “European”, “American”, “Bermuda” etc. have no geographical meaning. American options can be traded in Europe, European options can be traded in the USA, etc.

• Vanilla options = standard options, i.e. European, American or Bermuda calls/puts

• Exotic options = non-standard options

Examples for exotic options:

• Path-dependent option: The payoff function does not only depend on the price S(T) of the underlying at time T, but on the entire path t ↦ S(t) for t ∈ [0, T].

Asian options: The payoff function depends on the average price, e.g.

( (1/T) ∫_0^T S(t) dt − K )^+   (payoff of an average price call).

Barrier options: The payoff depends on whether the price of the underlying has crossed a certain (upper or lower) barrier.

Lookback options: The payoff depends on max_{t∈[0,T]} S(t) or min_{t∈[0,T]} S(t).

• Options on several assets:

Basket options: The payoff depends on the weighted sum of the prices S_i of several assets, e.g.

( ∑_{i=1}^{d} c_i S_i − K )^+,   c_i > 0   (payoff of a basket call).

Rainbow options: The payoff depends on the relation between the assets, e.g. max{S_1, . . . , S_d}.

• Binary options: The payoff function has only two possible values

• Compound options: Options on options

Remark: There are even more types of options.

1.3 Arbitrage and modelling assumptions

Example. Consider

• a stock with price S(t)

• a European call option with maturity T = 1, strike K = 100, and value V (t, S(t))


• a bond with price B(t)

Initial data: S(0) = 100, B(0) = 100, V(0) = 10.
Assumption: At time t = 1, we either have

“up”: B(1) = 110, S(1) = 120
or
“down”: B(1) = 110, S(1) = 80

At t = 0, Mrs. C. buys 0.4 bonds, one call option and sells 0.5 stock (“short selling”).Value of the portfolio at t = 0:

0.4 ·B(0) + 1 · V (0)− 0.5 · S(0) = 0.4 · 100 + 1 · 10− 0.5 · 100 = 0

Value of the portfolio at t = 1 is

0.4 · B(1) + 1 · V(1, S(1)) − 0.5 · S(1),   where V(1, S(1)) = (S(1) − K)^+.

Two cases:

“up”: 0.4 · 110 + 1 · (120− 100)+ − 0.5 · 120 = 44 + 20− 60 = 4

“down”: 0.4 · 110 + 1 · (80− 100)+ − 0.5 · 80 = 44 + 0− 40 = 4

In both cases, Mrs. C. wins 4 € without any risk or investment! Why is this possible? Because the price V(0) = 10 of the option is too low!
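Mrs. C.'s bookkeeping can be replayed in a few lines of Python; this is a sketch using only the numbers from the example (the portfolio weights 0.4 / 1 / −0.5 are taken from the text):

```python
def portfolio_t0(B0=100.0, S0=100.0, V0=10.0):
    # 0.4 bonds + 1 call option - 0.5 stocks (short sale)
    return 0.4 * B0 + 1.0 * V0 - 0.5 * S0

def portfolio_t1(S1, B1=110.0, K=100.0):
    # at t = 1 the option is worth its payoff (S(1) - K)^+
    return 0.4 * B1 + max(S1 - K, 0.0) - 0.5 * S1

print(portfolio_t0())       # ≈ 0: no initial investment
print(portfolio_t1(120.0))  # ≈ 4 in the "up" scenario
print(portfolio_t1(80.0))   # ≈ 4 in the "down" scenario
```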

Definition 1.3.1 (Arbitrage) Arbitrage is the existence of a portfolio, which

• requires no initial investment, and

• which cannot cause any loss, but possibly yields a gain.

Remark. A bond will always yield a risk-less gain, but it requires an investment.

Assumptions for modelling an idealized market:

(A1) Arbitrage is impossible (no-arbitrage principle).

(A2) There is a risk-free interest rate r > 0 which applies for all credits. Continuous payment of interest according to (1.1).

(A3) No transaction costs, taxes, etc. Trading is possible at any time. Any fraction of an asset can be sold. Liquid market, i.e. selling an asset does not change its value significantly.

(A4) A seller can sell assets he/she does not own yet (“short selling”, cf. Mrs. C. above)

(A5) No dividends on the underlying asset are paid.


Remark. Discrete payment of interest: obtain r · ∆t · B(0) after time ∆t. Value at t = n∆t:

B(t) = (1 + r ∆t)^n B(0) = (1 + r t/n)^n B(0).

For n −→ ∞ and ∆t −→ 0:

lim_{n→∞} (1 + r t/n)^n B(0) = e^{rt} B(0) = B(t)

(continuous payment of interest).
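The limit n → ∞ is easy to observe numerically; a small sketch (the values of r, t and B(0) are hypothetical, chosen only for illustration):

```python
import math

r, t, B0 = 0.05, 1.0, 100.0
for n in (1, 12, 365, 100_000):
    # discrete compounding with n interest payments up to time t
    print(n, (1.0 + r * t / n) ** n * B0)
print("limit:", math.exp(r * t) * B0)  # continuous compounding e^{rt} B(0)
```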

1.4 Arbitrage bounds

Consider European options with strike K > 0 and maturity T on an underlying with price S(t). Let V_P(t, S) and V_C(t, S) be the values of a put option and a call option, respectively.

Lemma 1.4.1 (Put-call parity) Under the assumptions (A1)-(A5) we have

S(t) + V_P(t, S(t)) − V_C(t, S(t)) = e^{−r(T−t)} K

for all t ∈ [0, T].

Proof. Buy one stock, buy a put, write (sell) a call. Then, the value of this portfolio is

π(t) = S(t) + V_P(t, S(t)) − V_C(t, S(t))

and at maturity

π(T) = S(T) + V_P(T, S(T)) − V_C(T, S(T)) = S(T) + (K − S(T))^+ − (S(T) − K)^+ = K.

Hence, the portfolio is risk-less. No arbitrage: The profit of the portfolio must be the same as the profit for investing π(t) into a bond at time t:

K = π(T) = e^{r(T−t)} π(t)   =⇒   e^{−r(T−t)} K = π(t) = S(t) + V_P(t, S(t)) − V_C(t, S(t)).
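The key identity π(T) = K holds pathwise, i.e. for every possible terminal price S(T); this can be checked mechanically:

```python
# Verify S + (K - S)^+ - (S - K)^+ = K for a range of terminal prices.
K = 100.0
for S in (0.0, 37.5, 100.0, 152.25, 1000.0):
    pi_T = S + max(K - S, 0.0) - max(S - K, 0.0)
    assert pi_T == K
print("pi(T) = K for all tested S(T)")
```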

Lemma 1.4.2 (Bounds for European calls and puts) Under the assumptions (A1)-(A5), the following inequalities hold for all t ∈ [0, T] and all S = S(t) ≥ 0:

(S − e^{−r(T−t)} K)^+ ≤ V_C(t, S) ≤ S   (1.2)
(e^{−r(T−t)} K − S)^+ ≤ V_P(t, S) ≤ e^{−r(T−t)} K   (1.3)

Proof.

• It is obvious that V_C(t, S) ≥ 0 and V_P(t, S) ≥ 0 for all t ∈ [0, T] and S ≥ 0.


• Assume that V_C(t, S(t)) > S(t) for some S(t) ≥ 0. Write (sell) a call, buy the stock, and put the difference δ := V_C(t, S(t)) − S(t) > 0 in your pocket. At t = T, there are two scenarios:

If S(T) > K: must sell the stock at the price K to the owner of the call. Gain: K + δ > 0.
If S(T) ≤ K: Gain: S(T) + δ > 0.

=⇒ Arbitrage! Contradiction!

• Put-call parity: since V_P(t, S) ≥ 0,

S − e^{−r(T−t)} K = V_C(t, S) − V_P(t, S) ≤ V_C(t, S).

This proves (1.2). The proof of (1.3) is left as an exercise.

Remark. Similar inequalities can be shown for American options (exercise).

1.5 A simple discrete model

Consider

• a stock with price S(t)

• a European option with maturity T , strike K, and value V (t, S(t))

• a bond with price B(t) = e^{rt} B(0)

Suppose that the initial data S(0) = S0 and B(0) = 1 are known, and that (A1)-(A5) hold. Goal: Find V(0, S0).

Simplifying assumption: At time t = T, there are only two scenarios:

“up”: S(T) = u · S0 with probability p
or
“down”: S(T) = d · S0 with probability 1 − p.

Assumption: 0 < d ≤ e^{rT} ≤ u and p ∈ (0, 1). In both cases, we have B(T) = e^{rT} B(0) = e^{rT}.

Replication strategy: Construct a portfolio with c1 bonds and c2 stocks such that

c1 B(t) + c2 S(t) = V(t, S(t))

for t ∈ {0, T}. For t = T, this means

case “up”:   c1 e^{rT} + c2 u S0 = V(T, u S0) =: Vu
case “down”: c1 e^{rT} + c2 d S0 = V(T, d S0) =: Vd


Vu and Vd are known if u and d are known. The unique solution is (check!)

c1 = (u Vd − d Vu) / ((u − d) e^{rT}),   c2 = (Vu − Vd) / ((u − d) S0).

Hence, the fair price of the option is

V(0, S0) = c1 B(0) + c2 S0 = (u Vd − d Vu) / ((u − d) e^{rT}) + (Vu − Vd) / (u − d)

(recall B(0) = 1), which yields (check!)

V(0, S0) = e^{−rT} ( q Vu + (1 − q) Vd )   with   q := (e^{rT} − d) / (u − d).   (1.4)

Remark: The value of the option does not depend on p.
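Formula (1.4) is easy to implement. Below is a sketch; the numerical values are hypothetical and chosen only for illustration, and indeed p never enters the computation:

```python
import math

def one_period_price(S0, r, T, u, d, payoff):
    """Fair price (1.4) in the one-period model."""
    q = (math.exp(r * T) - d) / (u - d)   # risk-neutral probability
    Vu = payoff(u * S0)
    Vd = payoff(d * S0)
    return math.exp(-r * T) * (q * Vu + (1.0 - q) * Vd)

call = lambda S: max(S - 100.0, 0.0)      # European call with strike K = 100
V0 = one_period_price(S0=100.0, r=0.05, T=1.0, u=1.2, d=0.8, payoff=call)
print(V0)  # ≈ 11.95
```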

Since 0 < d ≤ e^{rT} ≤ u by assumption, q ∈ [0, 1] can be seen as a probability. Now, define a new probability distribution P_q by

P_q( S(T) = u S0 ) = q,   P_q( S(T) = d S0 ) = 1 − q

(q instead of p). Then, we have

P_q( V(T, S(T)) = Vu ) = q,   P_q( V(T, S(T)) = Vd ) = 1 − q

and hence

q Vu + (1 − q) Vd = E_q( V(T, S(T)) )

can be regarded as the expectation of the payoff V(T, S(T)) with respect to P_q. In (1.4), this expectation is multiplied by a discounting factor e^{−rT}.
Interpretation: In order to have an amount of B(t) at time t, we have to invest B(0) = e^{−rt} B(t) into a bond at time t = 0.

The probability q has the property that

E_q(S(T)) = q u S0 + (1 − q) d S0 = ((e^{rT} − d)/(u − d)) u S0 + ((u − e^{rT})/(u − d)) d S0 = e^{rT} S0.

Hence, the expected (with respect to P_q) value of S(T) is exactly the amount we obtain when we invest S0 into a bond. Therefore, P_q is called the risk-neutral probability.

Moral of the story so far:

Under the risk-neutral probability, the price of a European option is the discounted expectation of the payoff.


Chapter 2

Stochastic differential equations

Let (Ω, F, P) be a probability space¹: Ω ≠ ∅ is a set, F is a σ-algebra (or σ-field) on Ω, and P : F −→ [0, 1] is a probability measure.
A probability space is complete if F contains all subsets G of Ω with P-outer measure zero, i.e. with

P*(G) := inf{ P(F) : F ∈ F and G ⊂ F } = 0.

Any probability space can be completed. Hence, we can assume that every probability space in this lecture is complete.

2.1 The Wiener process

Definition 2.1.1 (Stochastic process) Let T be an ordered set (e.g. T = [0, ∞), T = N). A stochastic process is a family X = {Xt : t ∈ T} of random variables

Xt : Ω −→ R^d.

Below, we will often simply write Xt instead of {Xt : t ∈ T}. Equivalent notations: X(t, ω), X(t), Xt(ω), Xt, ... For a fixed ω ∈ Ω, the function t ↦ Xt(ω) is called a realization (or path or trajectory) of X.

The path of a stochastic process is associated to some ω ∈ Ω. As time evolves, more information about ω becomes available.

Example (cf. chapter 2 in [Shr04]). Toss a coin three times. Possible results are:

ω1 ω2 ω3 ω4 ω5 ω6 ω7 ω8

HHH HHT HTH HTT THH THT TTH TTT

(H = heads, T = tails).

1See “2.2.2 What is (Ω,F ,P) anyway?” in the book [CT04] for a nice discussion of this concept.


• Before the first toss, we only know that ω ∈ Ω.

• After the first toss, we know if the final result will belong to

{HHH, HHT, HTH, HTT}   or to   {THH, THT, TTH, TTT}.

These sets are “resolved by the information”. Hence, we know in which of the sets

{ω1, ω2, ω3, ω4},   {ω5, ω6, ω7, ω8}

ω is.

• After the second toss, the sets

{HHH, HHT}, {HTH, HTT}, {THH, THT}, {TTH, TTT}

are resolved, and we know in which of the sets

{ω1, ω2}, {ω3, ω4}, {ω5, ω6}, {ω7, ω8}

ω is.

This motivates the following definition.

Definition 2.1.2 (Filtration)

• A filtration is a family {Ft : t ≥ 0} of sub-σ-algebras of F such that Fs ⊂ Ft for all t ≥ s ≥ 0. A filtration models the fact that more and more information about a process is known as time evolves.

• If {Xt : t ≥ 0} is a family of random variables and Xt is Ft-measurable, then {Xt : t ≥ 0} is adapted to (or nonanticipating with respect to) {Ft : t ≥ 0}. Interpretation: At time t we know for each set S ∈ Ft if ω ∈ S or not. The value of Xt is revealed at time t.

• For every s ∈ [0, t] let σ(Xs) be the σ-algebra generated by Xs, i.e. the smallest σ-algebra on Ω containing the sets

X_s^{−1}(B) for all B ∈ B,

where B denotes the Borel σ-algebra. By definition σ(Xs) is the smallest σ-algebra for which Xs is measurable.

A very important stochastic process is the Wiener process; cf. section B.1 in the appendix.

Filtration of the Wiener process. The natural filtration of the Wiener process on [0, T] is given by

{Ft : t ∈ [0, T]},   Ft = σ(Ws : s ∈ [0, t])

(cf. Definition 2.1.2). For technical reasons, however, it is more advantageous to use an augmented filtration called the standard Brownian filtration. See pp. 50-51 in [Ste01] for details.


2.2 Stochastic differential equations and the Ito formula

Definition 2.2.1 (SDE) A stochastic differential equation (SDE) is an equation of the form

X(t) = X(0) + ∫_0^t f(s, X(s)) ds + ∫_0^t g(s, X(s)) dW(s).   (2.1)

The solution X(t) of (2.1) is called an Ito process.

The last term is an Ito integral, with W(t) denoting the Wiener process. An informal construction of the Ito integral can be found in the appendix of these notes and will be presented in the problem class. The functions f : R × R −→ R and g : R × R −→ R are called drift and diffusion coefficients, respectively. These functions are typically given, while X(t) = X(t, ω) is unknown.
This equation is actually not a differential equation, but an integral equation! Often people write

dXt = f(t, Xt) dt + g(t, Xt) dWt

as a shorthand notation for (2.1). Some people even “divide by dt” in order to make the equation look like a differential equation, but this is more than audacious since “dWt/dt” does not make sense.

Two special cases:

• If g(t, X(t)) ≡ 0, then (2.1) is reduced to

X(t) = X(0) + ∫_0^t f(s, X(s)) ds.

If X(t) is differentiable, this is equivalent to the initial value problem

dX(t)/dt = f(t, X(t)),   X(0) = X0.

• For f(t, X(t)) ≡ 0, g(t, X(t)) ≡ 1 and X(0) = 0, (2.1) turns into

X(t) = ∫_0^t dW(s) = W(t) − W(0) = W(t).


Computing Riemann integrals via the basic definition is usually very tedious. The fundamental theorem of calculus provides an alternative which is more convenient in most cases. For Ito integrals, the situation is similar: The approximation via elementary functions which is used to define the Ito integral is rarely used to compute the integral. What is the counterpart of the fundamental theorem of calculus for the Ito integral?

Theorem 2.2.2 (Ito formula) Let Xt be the solution of the SDE

dXt = f(t, Xt) dt + g(t, Xt) dWt

and let F(t, x) be a function with continuous partial derivatives ∂F/∂t, ∂F/∂x, and ∂²F/∂x². Then, we have for Yt := F(t, Xt) that

dYt = (∂F/∂t) dt + (∂F/∂x) dXt + (1/2)(∂²F/∂x²) g² dt
    = ( ∂F/∂t + (∂F/∂x) f + (1/2)(∂²F/∂x²) g² ) dt + (∂F/∂x) g dWt   (2.2)

with f = f(t, Xt), g = g(t, Xt), ∂F/∂x = ∂F(t, Xt)/∂x, and so on.

Notation. From now on, the partial derivatives of some function u(t, x) will be denoted by

∂_t u := ∂u/∂t,   ∂_x u := ∂u/∂x,   ∂²_x u := ∂²u/∂x²

and so on. Evaluations of the derivatives of F are to be understood in the sense of, e.g.,

∂_x F(s, Xs) := ∂_x F(t, x) |_{(t,x)=(s,Xs)}

and so on.

The proof of Theorem 2.2.2 is sketched in the Appendix, cf. B.3.

Remarks:

1. If y(t) is a smooth deterministic function, then according to the chain rule the derivative of t ↦ F(t, y(t)) is

d/dt F(t, y(t)) = ∂_t F(t, y(t)) + ∂_x F(t, y(t)) · dy(t)/dt

and in shorthand notation

dF = ∂_t F dt + ∂_x F dy.

The Ito formula can be considered as a stochastic version of the chain rule, but the term (1/2) ∂²_x F · g² dt is surprising since such a term does not appear in the deterministic chain rule.


2. Let f(t, Xt) = 0, g(t, Xt) = 1, Xt = Wt and suppose that F(t, x) = F(x) does not depend on t. Then, the Ito formula yields for Yt := F(Wt) that

dYt = F′(Wt) dWt + (1/2) F″(Wt) dt

which is the shorthand notation for

F(Wt) = F(W0) + ∫_0^t F′(Ws) dWs + (1/2) ∫_0^t F″(Ws) ds.

This can be seen as a counterpart of the fundamental theorem of calculus. Again, the last term is surprising, because for a suitable deterministic function v(t) = v_t we obtain

F(v_t) = F(v_0) + ∫_0^t F′(v_s) dv_s.

Example 1. Consider the integral

∫_0^t Ws dWs.

Xt := Wt solves the SDE with f(t, Xt) ≡ 0 and g(t, Xt) ≡ 1. For

F(t, x) = x²,   Yt = F(t, Xt) = Xt² = Wt²

the Ito formula

dYt = ( ∂_t F + ∂_x F · f + (1/2) ∂²_x F · g² ) dt + ∂_x F · g dWt

yields

d(Wt²) = 0 + 0 + (1/2) · 2 · 1² dt + 2 Wt · 1 dWt = dt + 2 Wt dWt

=⇒ Wt dWt = (1/2) ( d(Wt²) − dt ).

This means that

∫_0^t Ws dWs = (1/2) ∫_0^t d(Ws²) − (1/2) ∫_0^t ds = (1/2) Wt² − (1/2) t.
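The identity ∫_0^t Ws dWs = ½Wt² − ½t can also be tested numerically: approximate the Ito integral by the left-point sums used in its construction. A sketch (step number and seed are arbitrary choices):

```python
import math
import random

random.seed(1)
t, n = 1.0, 200_000
dt = t / n
W, ito_sum = 0.0, 0.0
for _ in range(n):
    dW = random.gauss(0.0, math.sqrt(dt))
    ito_sum += W * dW          # left-point evaluation, as in the Ito construction
    W += dW
print(ito_sum, 0.5 * W * W - 0.5 * t)  # the two values agree for small dt
```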


Example 2. The solution of the SDE

dYt = µ Yt dt + σ Yt dWt

with constants µ, σ ∈ R and deterministic initial value Y0 ∈ R is given by

Yt = Y0 exp( (µ − σ²/2) t + σ Wt ).

This process is called a geometric Brownian motion and is often used in mathematical finance to model stock prices (see below).

Proof. Let f(t, Xt) ≡ 0, g(t, Xt) ≡ 1, Xt = Wt as before, but now with

F(t, x) = Y0 exp( (µ − σ²/2) t + σ x )

and derivatives

∂_t F(t, x) = (µ − σ²/2) F(t, x),   ∂^i_x F(t, x) = σ^i F(t, x),   i ∈ {1, 2}.

Hence, the Ito formula applied to Yt = F(t, Xt) = F(t, Wt) yields

dYt = ( (µ − σ²/2) Yt + 0 + (1/2) σ² Yt · 1² ) dt + σ Yt · 1 dWt = µ Yt dt + σ Yt dWt.
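The explicit solution can be sampled directly, since Wt ~ N(0, t). As a sanity check (all parameters hypothetical), the Monte-Carlo mean of Yt should be close to Y0 e^{µt}, because E[exp(σWt)] = exp(σ²t/2):

```python
import math
import random

random.seed(42)
mu, sigma, Y0, t = 0.1, 0.2, 1.0, 1.0
N = 100_000
total = 0.0
for _ in range(N):
    W = random.gauss(0.0, math.sqrt(t))              # W_t ~ N(0, t)
    total += Y0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * W)
print(total / N, Y0 * math.exp(mu * t))              # both ≈ e^{0.1}
```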

2.3 Martingales

Recall the definition of the expectation of a random variable X:

E(X) = ∫_Ω X(ω) dP(ω).

Definition 2.3.1 (conditional expectation) Let X be an integrable random variable, and let G be a sub-σ-algebra of F. Then, Y is a conditional expectation of X with respect to G if Y is G-measurable and if

E(X 1_A) = E(Y 1_A) for all A ∈ G,

i.e. if

∫_A X(ω) dP(ω) = ∫_A Y(ω) dP(ω) for all A ∈ G.

In this case, we write Y = E(X | G).


“This definition is not easy to love. Fortunately, love is not required.”J.M. Steele in [Ste01], p. 45.

Interpretation. E(X | G) is a random variable on (Ω, G, P) and hence on (Ω, F, P), too. Roughly speaking, E(X | G) is the best approximation of X detectable by the events in G. The more G is refined, the better E(X | G) approximates X.

Examples.

1. If G = {Ω, ∅}, then E(X | G) = E(X).

2. If G = F, then E(X | G) = X.

3. If F ∈ F with P(F) > 0 and

G = {∅, F, Ω \ F, Ω},

then it can be shown that

E(X | G)(ω) = (1/P(F)) ∫_F X dP if ω ∈ F,
E(X | G)(ω) = (1/P(Ω \ F)) ∫_{Ω\F} X dP if ω ∈ Ω \ F.
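The third example can be made concrete on the coin-toss space from Section 2.1; the following sketch assumes the uniform measure, takes X = number of heads, and F = “first toss is heads” (all choices ours, for illustration):

```python
omega = ["HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT"]
P = {w: 1.0 / 8.0 for w in omega}          # uniform probability measure
X = {w: w.count("H") for w in omega}       # X = number of heads
F = {w for w in omega if w[0] == "H"}      # the event "first toss is heads"

def cond_exp(w):
    """E(X | G)(w) for G = {emptyset, F, complement of F, Omega}."""
    A = F if w in F else set(omega) - F
    return sum(X[v] * P[v] for v in A) / sum(P[v] for v in A)

print(cond_exp("HTT"))  # 2.0: mean number of heads given the first toss is H
print(cond_exp("TTH"))  # 1.0: mean number of heads given the first toss is T
```

Note that the result is constant on F and on its complement, i.e. E(X | G) is G-measurable, as the definition requires.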

Lemma 2.3.2 (Properties of the conditional expectation) For all integrable random variables X and Y and all sub-σ-algebras G ⊂ F, the conditional expectation has the following properties:

• Linearity: E(X + Y | G) = E(X | G) + E(Y | G)

• Positivity: If X ≥ 0, then E(X | G) ≥ 0.

• Tower property: If H ⊂ G ⊂ F are sub-σ-algebras, then

E( E(X | G) | H ) = E(X | H).

• E( E(X | G) ) = E(X)

• Factorization property: If Y is G-measurable and |XY| and |Y| are integrable, then

E(XY | G) = Y E(X | G).

Proof: Exercise.

Definition 2.3.3 (martingale) Let Xt be a stochastic process which is adapted to a filtration {Ft : t ≥ 0} of F. If

1. E(|Xt|) < ∞ for all 0 ≤ t < ∞, and

2. E(Xt | Fs) = Xs for all 0 ≤ s ≤ t < ∞,

then Xt is called a martingale. A martingale Xt is called continuous if there is a set Ω0 ⊂ Ω with P(Ω0) = 1 such that the path t ↦ Xt(ω) is continuous for all ω ∈ Ω0.


Interpretation: A martingale models a fair game. Observing the game up to time s does not give any advantage for future times.

Examples. It can be shown that each of the following processes is a continuous martingale with respect to the standard Brownian filtration:

Wt,   Wt² − t,   exp( α Wt − (α²/2) t ).

Proof: Exercise.
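A necessary condition for the martingale property is that the expectation is constant in t (take s = 0 in 2.). For the three examples this means E[Wt] = 0, E[Wt² − t] = 0 and E[exp(αWt − α²t/2)] = 1, which a quick Monte-Carlo sketch can confirm (α, t and the sample size are arbitrary; this checks only the constant-expectation condition, not the full conditional property):

```python
import math
import random

random.seed(0)
alpha, t, N = 0.5, 2.0, 200_000
m1 = m2 = m3 = 0.0
for _ in range(N):
    W = random.gauss(0.0, math.sqrt(t))
    m1 += W
    m2 += W * W - t
    m3 += math.exp(alpha * W - 0.5 * alpha * alpha * t)
print(m1 / N, m2 / N, m3 / N)  # ≈ 0, ≈ 0, ≈ 1
```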

Theorem 2.3.4 (The Ito integral as a martingale) The Ito integral

X(t, ω) = ∫_0^t u(s, ω) dW(s, ω)

of a function u ∈ H²[0, T] is a continuous martingale with respect to the standard Brownian filtration.

Proof: Theorem 6.2, p. 83 in [Ste01].

Remark: The space H²[0, T] is defined in Definition B.2.1 in the appendix. If u ∈ L²_loc (see step 4 in section B.2 in the appendix), then the Ito integral is only a local martingale; cf. Proposition 7.7 in [Ste01].

2.4 The Feynman-Kac formula

Let X_t be the solution of the SDE

dX_t = f(t, X_t) dt + g(t, X_t) dW_t,   t ∈ [t_0, T],   X_{t_0} = ξ

with suitable functions f and g. Let u(t, x) be the solution of the (deterministic) partial differential equation (PDE)

∂_t u(t, x) + f(t, x) ∂_x u(t, x) + (1/2) g²(t, x) ∂_x² u(t, x) = 0,   t ∈ [t_0, T], x ∈ R

with terminal condition

u(T, x) = ψ(x)

for some ψ : R → R. Apply the Ito formula (Theorem 2.2.2) to u(t, X_t):

du(t, X_t) = ( ∂_t u(t, X_t) + f(t, X_t) ∂_x u(t, X_t) + (1/2) g²(t, X_t) ∂_x² u(t, X_t) ) dt + g(t, X_t) ∂_x u(t, X_t) dW_t,

where the term in brackets is = 0 because u solves the PDE.


Equivalently:

ψ(X_T) = u(T, X_T) = u(t_0, X_{t_0}) + ∫_{t_0}^T g(t, X_t) ∂_x u(t, X_t) dW_t = u(t_0, ξ) + ∫_{t_0}^T g(t, X_t) ∂_x u(t, X_t) dW_t

Taking the expectation and applying Lemma B.2.7 yields the Feynman-Kac formula (Richard Feynman, Mark Kac)

E( ψ(X_T) ) = u(t_0, ξ).

Remark: This derivation is informal, because we have tacitly assumed that all terms exist. See, e.g., Chapter 15 in [Ste01] for a correct proof.

2.5 Extension to higher dimensions

In order to model options on several underlying assets (e.g. basket options), we have to consider vector-valued Ito integrals and SDEs. A d-dimensional SDE takes the form

X_j(t) = X_j(0) + ∫_0^t f_j(s, X(s)) ds + Σ_{k=1}^m ∫_0^t g_jk(s, X(s)) dW_k(s),   j = 1, …, d   (2.3)

for d, m ∈ N and suitable functions

f_j : R × R^d → R,   g_jk : R × R^d → R.

W_1(s), …, W_m(s) are scalar Wiener processes which are pairwise independent. (2.3) is equivalent to

X(t) = X(0) + ∫_0^t f(s, X(s)) ds + ∫_0^t g(s, X(s)) dW(s)   (2.4)

with vectors

W(t) = ( W_1(t), …, W_m(t) )^T ∈ R^m,
f(t, x) = ( f_1(t, x), …, f_d(t, x) )^T ∈ R^d

and a matrix

g(t, x) = ( g_11(t,x) ⋯ g_1m(t,x) ; ⋮ ⋱ ⋮ ; g_d1(t,x) ⋯ g_dm(t,x) ) ∈ R^{d×m}


Theorem 2.5.1 (Multi-dimensional Ito formula) Let X_t be the solution of the SDE (2.4) and let F : [0,∞) × R^d → R^n be a function with continuous partial derivatives ∂_t F, ∂_{x_j} F, and ∂_{x_j}∂_{x_k} F. Then, the process Y(t) := F(t, X_t) satisfies

dY_ℓ(t) = ∂_t F_ℓ(t, X_t) dt
        + Σ_{i=1}^d ∂_{x_i} F_ℓ(t, X_t) · f_i(t, X_t) dt
        + (1/2) Σ_{i=1}^d Σ_{j=1}^d ∂_{x_i}∂_{x_j} F_ℓ(t, X_t) · ( Σ_{k=1}^m g_ik(t, X_t) g_jk(t, X_t) ) dt
        + Σ_{i=1}^d ∂_{x_i} F_ℓ(t, X_t) · Σ_{k=1}^m g_ik(t, X_t) dW_k

or equivalently

dY_ℓ = ( ∂_t F_ℓ + f^T ∇F_ℓ + (1/2) tr( g^T (∇²F_ℓ) g ) ) dt + (∇F_ℓ)^T g dW(t)

where ∇F_ℓ is the gradient and ∇²F_ℓ is the Hessian of F_ℓ, and where tr(A) = Σ_{j=1}^m a_jj is the trace of a matrix A = (a_ij)_{i,j} ∈ R^{m×m}.

Proof: Similar to the case d = m = 1.

Final remarks.

1. Existence and uniqueness of solutions. Ordinary differential equations can have multiple solutions with the same initial value, and solutions do not necessarily exist for all times. Hence, we cannot expect that every SDE has a unique solution. As in the ODE case, however, existence and uniqueness can be shown under certain assumptions concerning the coefficients f and g. See 4.5 in [KP99] for details.

2. Ito vs. Stratonovich. The Ito integral is not the only stochastic integral; the Stratonovich integral is a famous alternative. The Stratonovich integral has the advantage that the ordinary chain rule remains valid, i.e. the additional term in the Ito formula does not appear when the Stratonovich integral is used. Its disadvantage is the fact that Stratonovich integrals are not martingales, whereas Ito integrals are. Stratonovich integrals can be transformed into Ito integrals and vice versa. See 3.1, 3.3 in [Øks03] and 3.5, 4.9 in [KP99].
Actually, the SDEs (2.1) or (2.3) should be called "Ito SDEs" or "SDEs of Ito type". Since only Ito SDEs and no Stratonovich SDEs will appear in this lecture, however, we simply use the term "SDE" for "Ito SDE".


Chapter 3

The Black-Scholes equation

References: [BK04, GJ10, Sey09]

Goal: Find equations to determine the value of an option on a single underlying asset. Throughout this chapter, we make the assumptions (A1)-(A5) from 1.3 unless otherwise stated.

3.1 Geometric Brownian motion

First step: Model the price of the underlying by a suitable process S_t.
For the value of a bond with interest rate r > 0, we have B_t = B_0 e^{rt}. Try to "stochastify" this equation with a Wiener process in order to model the underlying.

First attempt: S_t = S_0 e^{at} + σW_t for some a, σ ∈ R. Problem: S_t can have negative values. Not good.

Second attempt: For the bond we have

ln B_t = ln B_0 + rt,

which motivates the ansatz

ln S_t = ln S_0 + at + σW_t

for some a, σ ∈ R to model the underlying. The parameter σ is called the volatility. Applying exp(·) gives

S_t = S_0 exp(at + σW_t)

and hence S_t ≥ 0 if S_0 ≥ 0. In fact, S_t is the geometric Brownian motion from 2.2 and solves the SDE

dS_t = µS_t dt + σS_t dW_t


with µ = a + σ²/2. Interpretation:

dS_t / S_t       =  µ dt                 +  σ dW_t
relative change  =  deterministic trend  +  random fluctuations

Lemma 3.1.1 (moments of GBM) The geometric Brownian motion

S_t = S_0 exp(at + σW_t),   a = µ − σ²/2

with µ ∈ R, σ ∈ R and fixed (deterministic) initial value S_0 has the following properties:

1. E(S_t) = S_0 e^{µt}

2. E(S_t²) = S_0² e^{(2µ+σ²)t}

3. V(S_t) = S_0² e^{2µt} ( e^{σ²t} − 1 )

Proof: Exercise.
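The moment formulas can also be checked by direct simulation. The following sketch (assuming `numpy` is available; all parameter values are illustrative) draws samples of S_t and compares the empirical mean and variance with properties 1. and 3.:

```python
import numpy as np

S0, mu, sigma, t = 100.0, 0.05, 0.2, 1.0            # illustrative parameters
a = mu - 0.5 * sigma**2                             # a = mu - sigma^2/2

rng = np.random.default_rng(0)
W = np.sqrt(t) * rng.standard_normal(200_000)       # W_t ~ N(0, t)
S = S0 * np.exp(a * t + sigma * W)                  # samples of S_t

mean_mc, var_mc = S.mean(), S.var()
mean_exact = S0 * np.exp(mu * t)                                       # property 1
var_exact = S0**2 * np.exp(2 * mu * t) * (np.exp(sigma**2 * t) - 1.0)  # property 3
```

With 200000 samples the empirical moments match the formulas up to the usual Monte-Carlo error of order 1/√n.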

Definition 3.1.2 (log-normal distribution) A vector-valued random variable X(ω) ∈ R^d is log-normal (= log-normally distributed) if ln X = (ln X_1, …, ln X_d)^T ∈ R^d is normally distributed, i.e. ln X ∼ N(ξ, Σ) for some ξ ∈ R^d and a symmetric, positive definite matrix Σ ∈ R^{d×d}. The expectation and the covariance matrix have the entries

E_i(X) = e^{ξ_i + Σ_ii/2},

V_ij(X) = E( (E_i(X) − X_i)(E_j(X) − X_j) ) = e^{ξ_i + ξ_j + (Σ_ii + Σ_jj)/2} ( e^{Σ_ij} − 1 ).

For d = 1 and Σ = σ² the corresponding density is

φ(x) = φ(x, ξ, σ) =  ( 1/(√(2π) σ x) ) exp( −(ln x − ξ)² / (2σ²) )   if x > 0,
                     0                                               else.

Proof: Exercise.

Example. The (one-dimensional) geometric Brownian motion

S_t = S_0 exp(at + σW_t)

is log-normal, because

ln S_t = ln S_0 + at + σW_t ∼ N( ln S_0 + at, σ²t ).


3.2 Derivation of the Black-Scholes equation

Situation: S_t value of an underlying, B_t value of a bond.

Goal: Determine the fair price V_t of an option.

Replication strategy: Consider a portfolio containing a_t ∈ R underlyings and b_t ∈ R bonds such that

V_t = a_t S_t + b_t B_t

(cf. section 1.5). Assume that the portfolio is self-financing: no cash inflow or outflow, i.e. buying an item must be financed by selling another one. Consequence:

V_{t+δ} − V_t = ( a_{t+δ} S_{t+δ} − a_t S_t ) + ( b_{t+δ} B_{t+δ} − b_t B_t )
             ≈ a_t ( S_{t+δ} − S_t ) + b_t ( B_{t+δ} − B_t )

for all t ≥ 0 and small δ > 0. For δ → 0 we obtain (in an integral sense)

dV_t = a_t dS_t + b_t dB_t.

Now suppose that

dS_t = µS_t dt + σS_t dW_t,   dB_t = rB_t dt   (3.1)

with µ, σ, r ∈ R. This yields

dV_t = a_t ( µS_t dt + σS_t dW_t ) + b_t ( rB_t dt ) = ( a_t µS_t + b_t rB_t ) dt + a_t σS_t dW_t.   (3.2)

Now assume that the value of the option is a function of t and S_t, i.e. V_t = V(t, S_t). Apply the Ito formula:

dV(t, S_t) = ( ∂_t V(t, S_t) + ∂_S V(t, S_t) · µS_t + (1/2) ∂_S² V(t, S_t) · σ²S_t² ) dt + ∂_S V(t, S_t) · σS_t dW_t   (3.3)

Equating the dW_t-terms in (3.2) and (3.3) yields

a_t = ∂_S V(t, S_t),

while equating the dt-terms yields

a_t µS_t + b_t rB_t = ∂_t V(t, S_t) + ∂_S V(t, S_t) · µS_t + (1/2) ∂_S² V(t, S_t) · σ²S_t²

⇒ b_t rB_t = ∂_t V(t, S_t) + (1/2) ∂_S² V(t, S_t) · σ²S_t²

⇒ b_t = ( 1/(B_t r) ) ( ∂_t V(t, S_t) + (1/2) ∂_S² V(t, S_t) · σ²S_t² )


if we assume that B_t r ≠ 0. The formulas for a_t and b_t yield

V(t, S_t) = a_t S_t + b_t B_t = ∂_S V(t, S_t) · S_t + ( 1/(B_t r) ) ( ∂_t V(t, S_t) + (1/2) ∂_S² V(t, S_t) · σ²S_t² ) B_t.

Since this is true for every value of S_t, we can consider S = S_t as a parameter. Multiplying with r yields the Black-Scholes equation

∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + rS ∂_S V(t, S) − rV(t, S) = 0.

Fischer Black and Myron Scholes 1973, Robert Merton 1973; Nobel Prize in Economics 1997.

The Black-Scholes equation is a partial differential equation (PDE): It involves partial derivatives with respect to t and S. This PDE must be solved backwards in time: instead of an initial condition, we have the terminal condition

V(T, S) = ψ(S)

where T is the expiration time and ψ(S) is the payoff function, i.e. ψ(S) = (S − K)⁺ for a call and ψ(S) = (K − S)⁺ for a put.
The Black-Scholes equation must be solved for S ∈ R⁺ := [0, ∞) because only non-negative prices make sense. At the boundary S = 0, no boundary condition is required, because

lim_{S→0} (σ²/2) S² ∂_S² V(t, S) = 0,   lim_{S→0} rS ∂_S V(t, S) = 0

if V is sufficiently smooth. For S = 0, we obtain

0 = ∂_t V(t, 0) − rV(t, 0)   ⇒   V(t, 0) = e^{−r(T−t)} V(T, 0).   (3.4)

This yields V(t, 0) = 0 for calls and V(t, 0) = e^{−r(T−t)} K for puts.

Remark. Surprisingly, the parameter µ from (3.1) does not appear in the Black-Scholesequation. A similar observation has been made for the simple discrete model from 1.5.

3.3 Black-Scholes formulas

First goal: Solve the Black-Scholes equation for a European call, i.e.

∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + rS ∂_S V(t, S) − rV(t, S) = 0,   t ∈ [0, T], S > 0
V(T, S) = (S − K)⁺

with parameters r, σ, K, T > 0.


Step 1: Transformation to the heat equation

Define new variables:

x(S) = ln(S/K),           x : (0, ∞) → (−∞, ∞)
τ(t) = (σ²/2)(T − t),     τ : [0, T] → [0, σ²T/2]
w(τ, x) = V(t, S)/K,      w : [0, σ²T/2] × (−∞, ∞) → R

Derivatives in new variables:

∂_t V(t, S) = K ∂_τ w(τ, x) · dτ/dt = −K (σ²/2) ∂_τ w(τ, x)

∂_S V(t, S) = K ∂_x w(τ, x) · dx/dS = (K/S) ∂_x w(τ, x)   ( because dx/dS = (1/(S/K)) · (1/K) = 1/S )

∂_S² V(t, S) = … = (K/S²) ( ∂_x² w(τ, x) − ∂_x w(τ, x) )

Insert into the Black-Scholes equation:

0 = ∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + rS ∂_S V(t, S) − rV(t, S)
  = −K (σ²/2) ∂_τ w(τ, x) + (σ²/2) S² (K/S²) ( ∂_x² w(τ, x) − ∂_x w(τ, x) ) + rS (K/S) ∂_x w(τ, x) − rK w(τ, x)

Divide by K σ²/2:

∂_τ w(τ, x) = ∂_x² w(τ, x) − ∂_x w(τ, x) + c ∂_x w(τ, x) − c w(τ, x)

with c := 2r/σ². Next, we eliminate the last three terms. Ansatz:

u(τ, x) = e^{−αx−βτ} w(τ, x),   α, β ∈ R

Substitute:

∂_τ u(τ, x) = −β u(τ, x) + e^{−αx−βτ} ∂_τ w(τ, x)
            = −β u(τ, x) + e^{−αx−βτ} ( ∂_x² w(τ, x) + (c − 1) ∂_x w(τ, x) − c w(τ, x) )

Since

∂_x w(τ, x) = ∂_x ( e^{αx+βτ} u(τ, x) ) = α e^{αx+βτ} u(τ, x) + e^{αx+βτ} ∂_x u(τ, x)
∂_x² w(τ, x) = α² e^{αx+βτ} u(τ, x) + 2α e^{αx+βτ} ∂_x u(τ, x) + e^{αx+βτ} ∂_x² u(τ, x)

it follows that

∂_τ u(τ, x) = −β u(τ, x) + ( α² u(τ, x) + 2α ∂_x u(τ, x) + ∂_x² u(τ, x) ) + (c − 1)( α u(τ, x) + ∂_x u(τ, x) ) − c u(τ, x)
            = ∂_x² u(τ, x) + ( 2α + (c − 1) ) ∂_x u(τ, x) + ( −β + α² + (c − 1)α − c ) u(τ, x)


Hence, the terms including u(τ, x) and ∂_x u(τ, x) vanish if

−β + α² + (c − 1)α − c = 0   and   2α + (c − 1) = 0.

The solution is

α = −(c − 1)/2,   β = −(c + 1)²/4 = −(1 − α)².

With these parameters, u(τ, x) solves the heat equation

∂_τ u(τ, x) = ∂_x² u(τ, x),   x ∈ R, τ ∈ [0, σ²T/2]

with initial condition

u(0, x) = e^{−αx} w(0, x) = e^{−αx} V(T, S)/K = e^{−αx} (S − K)⁺/K = e^{−αx} (eˣ − 1)⁺

since S = K eˣ.

Step 2: Solving the heat equation

Lemma 3.3.1 (solution of the heat equation) Let u_0 : R → R be a continuous function which satisfies the growth condition

|u_0(x)| ≤ M e^{γx²}

with constants M > 0 and γ ∈ R. Then, the function

u(τ, x) = ( 1/√(4πτ) ) ∫_{−∞}^{∞} exp( −(x − ξ)²/(4τ) ) u_0(ξ) dξ

is the unique solution of the heat equation

∂_τ u(τ, x) = ∂_x² u(τ, x),   x ∈ R, τ > 0

and for all x ∈ R we have

lim_{τ→0} u(τ, x) = u_0(x).

Proof. The fact that u solves the PDE can be checked by substituting and computing the partial derivatives (exercise). The last assertion can be verified via the transformation η = (ξ − x)/√(4τ) (exercise). Uniqueness follows from the maximum principle.


By a tedious¹ calculation, it can be shown that

u(τ, x) = exp( (1 − α)x + (1 − α)²τ ) Φ(d_1) − exp( −αx + α²τ ) Φ(d_2)

Φ(x) = ( 1/√(2π) ) ∫_{−∞}^{x} e^{−s²/2} ds   (3.5)

d_{1/2} = ( ln(S/K) + (r ± σ²/2)(T − t) ) / ( σ√(T − t) )   (3.6)

Remark: Φ(x) is the cumulative distribution function of the standard normal distribution.

Step 3: Inverse transform

Since β = −(1 − α)² and β + α² = 2α − 1 = −c, it follows that

V(t, S) = K w(τ, x) = K exp(αx + βτ) u(τ, x)
        = K exp(αx + βτ) exp( (1 − α)x + (1 − α)²τ ) Φ(d_1) − K exp(αx + βτ) exp( −αx + α²τ ) Φ(d_2)
        = K exp(x) Φ(d_1) − K exp( (β + α²)τ ) Φ(d_2)        [ K exp(x) = S, β + α² = −c, cτ = r(T − t) ]
        = S Φ(d_1) − K exp( −r(T − t) ) Φ(d_2)

Check boundary: Since lim_{S→0} Φ( d_{1/2}(S) ) = 0 we obtain

lim_{S↘0} V(t, S) = lim_{S↘0} [ S Φ(d_1(S)) − K e^{−r(T−t)} Φ(d_2(S)) ] = 0,

which is consistent with (3.4). ✓

Check terminal condition: At t = T the discount factor equals 1, so formally

V(T, S) = S Φ(d_1) − K Φ(d_2).

By definition of d_{1/2} = d_{1/2}(t),

lim_{t→T} d_{1/2}(t) = lim_{t→T} ( ln(S/K) + (r ± σ²/2)(T − t) ) / ( σ√(T − t) ) = lim_{t→T} ln(S/K) / ( σ√(T − t) )
                     =  ∞    if S > K
                        0    if S = K
                       −∞    if S < K

and hence

lim_{t→T} Φ( d_{1/2}(t) ) =  1    if S > K
                             1/2  if S = K
                             0    if S < K

⇒ lim_{t→T} V(t, S) =  S − K  if S > K
                       0      if S = K
                       0      if S < K

i.e. lim_{t→T} V(t, S) = (S − K)⁺. ✓

1... so tedious that we do not even dare to ask the reader to prove this as an exercise.


All in all, we have shown the following

Theorem 3.3.2 (Black-Scholes formula for calls) If r, σ, K, T > 0, then the Black-Scholes formula

V(t, S) = S Φ(d_1) − K exp( −r(T − t) ) Φ(d_2)

with Φ and d_{1/2} from (3.5) and (3.6), respectively, is the (unique) solution of the Black-Scholes equation for European calls, i.e.

∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + rS ∂_S V(t, S) − rV(t, S) = 0,   t ∈ [0, T], S > 0
V(T, S) = (S − K)⁺.
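The Black-Scholes formula translates directly into a short program. The sketch below evaluates Φ via the error function from Python's standard library; it is an illustration, not part of the lecture notes:

```python
from math import log, sqrt, exp, erf

def Phi(x):
    """Cumulative distribution function of the standard normal distribution (3.5)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(t, S, K, T, r, sigma):
    """Black-Scholes value of a European call at time t <= T and price S > 0."""
    tau = T - t
    if tau == 0.0:
        return max(S - K, 0.0)          # terminal condition V(T,S) = (S-K)^+
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)         # d2 uses r - sigma^2/2, cf. (3.6)
    return S * Phi(d1) - K * exp(-r * tau) * Phi(d2)
```

For example, bs_call(0, 100, 100, 1, 0.05, 0.2) returns approximately 10.45.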

Corollary 3.3.3 (Black-Scholes formula for puts) The Black-Scholes equation for a European put

∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + rS ∂_S V(t, S) − rV(t, S) = 0,   t ∈ [0, T], S > 0
V(T, S) = (K − S)⁺

with r, σ, K, T > 0 has the unique solution

V(t, S) = K exp( −r(T − t) ) Φ(−d_2) − S Φ(−d_1)

with Φ and d_{1/2} from (3.5) and (3.6), respectively.

Proof. Let V_C(t, S) be the value of a call with the same T and K. The put-call parity (Lemma 1.4.1) and Theorem 3.3.2 imply

V(t, S) = e^{−r(T−t)} K + V_C(t, S) − S
        = e^{−r(T−t)} K + S Φ(d_1) − K exp( −r(T − t) ) Φ(d_2) − S
        = e^{−r(T−t)} K ( 1 − Φ(d_2) ) + S ( Φ(d_1) − 1 )
        = e^{−r(T−t)} K Φ(−d_2) − S Φ(−d_1)

because Φ(x) + Φ(−x) = 1.
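The corollary and its proof can be checked numerically: the sketch below implements the put formula and verifies the put-call parity used in the proof (illustrative code with arbitrary parameter values):

```python
from math import log, sqrt, exp, erf

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def d12(t, S, K, T, r, sigma):
    """d1 and d2 from (3.6)."""
    tau = T - t
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return d1, d1 - sigma * sqrt(tau)

def bs_call(t, S, K, T, r, sigma):
    d1, d2 = d12(t, S, K, T, r, sigma)
    return S * Phi(d1) - K * exp(-r * (T - t)) * Phi(d2)

def bs_put(t, S, K, T, r, sigma):
    """Corollary 3.3.3: V(t,S) = K e^{-r(T-t)} Phi(-d2) - S Phi(-d1)."""
    d1, d2 = d12(t, S, K, T, r, sigma)
    return K * exp(-r * (T - t)) * Phi(-d2) - S * Phi(-d1)
```

Evaluating both functions confirms bs_put = e^{−r(T−t)}K + bs_call − S up to rounding errors.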

[Figure: V(t, S) of a European call (left, K = 100, T = 1) and a European put (right, K = 100, T = 1), plotted over S ∈ [0, 200] for t = 0, 0.25, 0.5, 0.75, 1.]


Definition 3.3.4 (Greeks) For a European option with value V(t, S) we define "the greeks"

delta:       Δ = ∂_S V        theta:  θ = ∂_t V
gamma:       Γ = ∂_S² V       rho:    ρ = ∂_r V
vega/kappa:  κ = ∂_σ V

These partial derivatives can be considered as "condition numbers" which measure the sensitivity of V(t, S) with respect to the corresponding parameters. This information is important for stock brokers.

Remark: Explicit formulas for the greeks can be derived from the Black-Scholes formulas (exercise).
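Greeks can also be approximated by finite differences of the Black-Scholes formula. The sketch below only assumes the standard result Δ = Φ(d_1) for a European call (one of the explicit formulas left as an exercise above) and checks it against a central difference quotient; everything here is illustrative:

```python
from math import log, sqrt, exp, erf

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(t, S, K, T, r, sigma):
    tau = T - t
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * Phi(d1) - K * exp(-r * tau) * Phi(d2)

def delta_fd(t, S, K, T, r, sigma, h=1e-4):
    """Central finite difference approximation of Delta = dV/dS."""
    return (bs_call(t, S + h, K, T, r, sigma) - bs_call(t, S - h, K, T, r, sigma)) / (2.0 * h)

def delta_exact(t, S, K, T, r, sigma):
    """For a European call, Delta = Phi(d1) (standard result, assumed here)."""
    tau = T - t
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return Phi(d1)
```

The central difference has error O(h²), so both values agree to many digits.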

3.4 Risk-neutral valuation and equivalent martingale measures

In 1.5 we have seen that in the simplified two-scenario model the value of an option can be priced by replication. The same strategy was applied to the refined model in the previous section. In the simple situation considered in 1.5, the value of an option turned out to be the discounted expectation of the payoff under the risk-neutral probability. In this subsection, we will see that this is also true for the refined model from 3.2.

Theorem 3.4.1 (Option price as discounted expectation) If V(t, S) is the solution of the Black-Scholes equation

∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + rS ∂_S V(t, S) − rV(t, S) = 0,   t ∈ [0, T], S > 0
V(T, S) = ψ(S)

with payoff function ψ(S), then

V(t⋆, S⋆) = e^{−r(T−t⋆)} ∫_0^∞ ψ(x) φ(x, ξ, β) dx   (3.7)

for all t⋆ ∈ [0, T] and S⋆ > 0. The function φ is the density of the log-normal distribution (cf. Definition 3.1.2) with parameters

ξ = ln S⋆ + ( r − σ²/2 )(T − t⋆),   β = σ√(T − t⋆).   (3.8)

The assertion can be shown by showing that the above representation coincides with the Black-Scholes formulas for puts and calls. Such a proof, however, involves several changes of variables in the integral representations and rather tedious calculations. We give a shorter and more elegant proof:

Proof. Step 1: In our derivation of the Black-Scholes model, we have assumed that

dS_t = µS_t dt + σS_t dW_t,

i.e. that the price of the underlying is a geometric Brownian motion with drift µS_t; cf. (3.1). It turned out, however, that the parameter µ does not appear in the Black-Scholes equation. Hence, we can choose µ = r and consider the SDE

dS_t = rS_t dt + σS_t dW_t,   t ∈ [t⋆, T]
S_{t⋆} = S⋆

as a model for the stock price.

Step 2: The function u(t, S) := e^{r(T−t)} V(t, S) solves the PDE

∂_t u(t, S) + (σ²/2) S² ∂_S² u(t, S) + rS ∂_S u(t, S) = 0,   t ∈ [0, T]

because

∂_t u(t, S) + (σ²/2) S² ∂_S² u(t, S) + rS ∂_S u(t, S)
= −r e^{r(T−t)} V(t, S) + e^{r(T−t)} ∂_t V(t, S) + (σ²/2) S² e^{r(T−t)} ∂_S² V(t, S) + rS e^{r(T−t)} ∂_S V(t, S)
= e^{r(T−t)} ( −rV(t, S) + ∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + rS ∂_S V(t, S) )
= 0,

since the term in brackets vanishes by the Black-Scholes equation. Moreover, u satisfies the terminal condition

u(T, S) = V(T, S) = ψ(S).

Step 3: Applying the Feynman-Kac formula (cf. 2.4) with f(t, S) = rS and g(t, S) = σS yields

E( ψ(S_T) ) = u(t⋆, S⋆) = e^{r(T−t⋆)} V(t⋆, S⋆)

and thus

V(t⋆, S⋆) = e^{−r(T−t⋆)} E( ψ(S_T) ).

We know that S_T is log-normal, i.e.

E( ψ(S_T) ) = ∫_0^∞ ψ(x) φ(x, ξ, β) dx

where φ(x, ξ, β) is the density of the log-normal distribution with parameters (3.8); cf. the example after Definition 3.1.2.


Interpretation. We know from Definition 3.1.2 that

E(S_T) = ∫_0^∞ x φ(x, ξ, β) dx = exp( ξ + β²/2 )
       = exp( ln S⋆ + ( r − σ²/2 )(T − t⋆) + (1/2)( σ√(T − t⋆) )² )
       = exp( ln S⋆ + r(T − t⋆) )
       = S⋆ exp( r(T − t⋆) ).

This means that for µ = r the expected value of the stock is exactly the money obtained by investing S⋆ into a bond at time t⋆ and waiting until T. Hence, the log-normal distribution with parameters (3.8) defines the risk-neutral probability; cf. 1.5. The integral in (3.7) is precisely the expected payoff under the risk-neutral probability, and (3.7) states that the price of the option is obtained by discounting the expected payoff.

A different perspective. Consider now the geometric Brownian motion

dS_t = µS_t dt + σS_t dW_t

with µ ≠ r. Since E(S_t) = S_0 e^{µt}, an investor expects µ > r as a compensation for the risk, because otherwise he might prefer to invest into the riskless bond B_t = B_0 e^{rt}. The term

γ = (µ − r)/σ

is called market price of risk, and we have

dS_t = rS_t dt + σS_t ( γ dt + dW_t ) = rS_t dt + σS_t dW_t^γ,   with W_t^γ = γt + W_t.

Problem: W_t^γ is not a Wiener process under the probability measure P, because E(W_t^γ) = γt + E(W_t) = γt ≠ 0 for t > 0 and µ ≠ r.

Question: Is there another probability measure Q such that W_t^γ is a Wiener process under Q?

Definition 3.4.2 (equivalent martingale measure) Let (Ω, F, P) be a probability space with filtration {F_t : t ≥ 0}. A probability measure Q is called an equivalent martingale measure or risk-neutral probability if there is a random variable Y > 0 such that

• Q(A) = E(1_A · Y) = ∫_A Y(ω) dP(ω) for all events A ∈ F, and

• e^{−rt} S_t is a martingale under Q with respect to the filtration {F_t : t ≥ 0}.


Remark. The first property implies that P(A) > 0⇐⇒ Q(A) > 0 (“equivalent”).

Now let

Y_T := exp( −γW_T − (γ²/2) T )   and   Q(A) = E(1_A · Y_T).

Then, Girsanov's theorem states that W_t^γ = γt + W_t is a Wiener process under Q (see e.g. 4.4 in [Ben04], 8.6 in [Øks03]). Moreover, the Ito formula yields

d( e^{−rt} S_t ) = −r e^{−rt} S_t dt + e^{−rt} ( rS_t dt + σS_t dW_t^γ ) = σ e^{−rt} S_t dW_t^γ.

Hence, e^{−rt} S_t is a martingale under Q, and Q is an equivalent martingale measure. All in all, we have exchanged

µ → r,   P → Q,   W_t → W_t^γ.

If µ = r, then P = Q and W_t = W_t^γ. Now we are back in the situation of Theorem 3.4.1, and it follows that

V(t⋆, S⋆) = e^{−r(T−t⋆)} E_Q( ψ(S_T) )   (3.9)

where S_t is the solution of

dS_t = rS_t dt + σS_t dW_t^γ,   t ∈ [t⋆, T]
S_{t⋆} = S⋆.

General pricing formula. Up to now, we have only considered European options, i.e. options with a payoff that depends only on the value of the underlying at maturity. For Asian or barrier options, the pricing formula (3.9) can be generalized to

V_t = e^{−r(T−t)} E_Q( V_T | F_t )

(cf. 5.2.4 in [Shr04]).

Fundamental theorems of option pricing:

• If a market model has at least one equivalent martingale measure, then there is no arbitrage possibility (cf. Theorem 5.4.7 in [Shr04]).

• Consider a market model with at least one equivalent martingale measure. Then, the equivalent martingale measure is unique if and only if the model is complete, i.e. if every derivative (options, forwards, futures, swaps, ...) can be replicated (hedged) (cf. Theorem 5.4.9 in [Shr04]).


3.5 Extensions

The “standard” Black-Scholes model can be generalized in several ways:

• Options with d > 1 underlyings (e.g. basket options) are modeled by the d-dimensional Black-Scholes equation

∂_t V + (1/2) Σ_{i,j=1}^d ρ_ij σ_i σ_j S_i S_j ∂_{S_i} ∂_{S_j} V + r Σ_{i=1}^d S_i ∂_{S_i} V − rV = 0

where V = V(t, S_1, …, S_d), r, σ_i > 0 and ρ_ij ∈ [−1, 1] are the correlation coefficients.

• Non-constant interest rate and volatility: r = r(t, S), σ = σ(t, S)

• Stochastic volatility: Either σ = σ(ω) is a random variable with known distribution or σ = σ_t(ω) is a stochastic process.

• Dividends: When a dividend δ·S_t with δ ≥ 0 is paid at time t, the price of the underlying drops by the same amount due to the no-arbitrage assumption. Hence, a continuous flow of dividends can be modeled by

dS_t = (µ − δ) S_t dt + σS_t dW_t,

which yields the Black-Scholes equation

∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + (r − δ) S ∂_S V(t, S) − rV(t, S) = 0.

Black-Scholes formulas with dividends: 4.5.1 in [GJ10].

• Nonzero transaction costs =⇒ nonlinear Black-Scholes equation

• Discontinuous underlyings: Jump-diffusion models, Black-Scholes PDE with addi-tional integral term

Remark: Some of these extensions will be considered in the lecture Numerical methods in mathematical finance II (summer term).


Part II

Numerical methods


Chapter 4

Binomial methods

Situation: Let S(t) be the value of an underlying, and let V(t, S) be the value of an option with maturity T > 0.

Assumptions: Assume (A1)-(A5) from Section 1.3.

Goal: Approximate V (t, S) by a numerical method.

Idea: Refine the simple discrete model from 1.5 such that it approximates the continuous-time Black-Scholes model.

Remark: For European calls/puts, such an approximation is not necessary, because V(t, S) can be computed via the Black-Scholes formula. Nevertheless, such options will serve as a model problem. The numerical method can be extended to other types of options.

4.1 Derivation

Discretize the time interval [0, T]: Choose N ∈ N, let τ = T/N and t_n = n·τ.
Let S_n be the price of the underlying at time t_n. Bond: B(t) = B(0) e^{rt}

Additional assumptions:

1. For a given price S_n, the price at t_{n+1} = t_n + τ is

S_{n+1} =  u·S_n  with probability p
           d·S_n  with probability 1 − p

with (unknown) u > 1, 0 < d < 1, p ∈ [0, 1].

2. The expected profit from investing into the underlying is the same as for the bond:

E( S_{n+1} | S_n ) = e^{rτ} S_n.

3. E( S²_{n+1} | S_n ) = e^{(2r+σ²)τ} S²_n with given volatility σ ∈ R.

Remark: In the continuous-time model where S(t) is modeled by a geometric Brownian motion, the last two conditions hold for S_n = S(t_n) if µ = r (risk-neutral pricing).


Compute u, d, p:

1. e^{rτ} S_n = E( S_{n+1} | S_n ) = u S_n p + d S_n (1 − p)   ⇔   p = ( e^{rτ} − d )/( u − d )

   Since p ∈ [0, 1], we have d ≤ e^{rτ} ≤ u.

2. e^{(2r+σ²)τ} S²_n = E( S²_{n+1} | S_n ) = (u S_n)² p + (d S_n)² (1 − p)

Only two conditions for three unknowns u, d, p. Choose third condition:

3. u · d = 1 (as an additional requirement)

Solution of 1.-3.:

u = β + √(β² − 1),   β := (1/2) ( e^{−rτ} + e^{(r+σ²)τ} )   (4.1)
d = 1/u = β − √(β² − 1),   p = ( e^{rτ} − d )/( u − d ).
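The solution (4.1) can be verified numerically: the sketch below computes u, d, p and checks the two moment conditions (the parameter values are illustrative):

```python
from math import exp, sqrt

def crr_parameters(r, sigma, tau):
    """Solution (4.1) of conditions 1.-3."""
    beta = 0.5 * (exp(-r * tau) + exp((r + sigma**2) * tau))
    u = beta + sqrt(beta**2 - 1.0)
    d = 1.0 / u                                   # condition 3: u*d = 1
    p = (exp(r * tau) - d) / (u - d)              # condition 1
    return u, d, p

r, sigma, tau = 0.05, 0.2, 0.01                   # illustrative values
u, d, p = crr_parameters(r, sigma, tau)
m1 = u * p + d * (1.0 - p)                        # should equal e^{r tau}        (condition 1)
m2 = u**2 * p + d**2 * (1.0 - p)                  # should equal e^{(2r+sigma^2) tau} (condition 2)
```

Both moment conditions hold up to rounding errors, and p lies in [0, 1].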

Replication strategy: Consider a portfolio containing a ∈ R underlyings and b ∈ R bonds such that

a S_n + b B(t_n) = V(t_n, S_n) =: V_n

It follows that

E( V_{n+1} | V_n ) = a E( S_{n+1} | S_n ) + b e^{rτ} B(t_n) = e^{rτ} ( a S_n + b B(t_n) ) = e^{rτ} V_n   (4.2)

4.2 Algorithm

Cox, Ross & Rubinstein 1979

1. Forward phase: initialization of the tree. For all n = 0, …, N and j = 0, …, n let

S_jn = u^j d^{n−j} S(0) = (approximate) price of the underlying at time t_n after j "ups" and n − j "downs".

The condition d · u = 1 implies that

S(0) = S_00 = S_12 = S_24 = …
S_11 = S_23 = S_35 = …
S_01 = S_13 = S_25 = …

At t_n there are only n + 1 possible values S_0n, …, S_nn of the underlying.


S_00 = S(0)
for n = 0, 1, 2, …, N − 1
    S_{0,n+1} = d S_{0,n}
    for j = 0, …, n
        S_{j+1,n+1} = u S_{j,n}
    end
end

2. Backward phase: compute the option values. Let V_jn be the value of the option after j "ups" and n − j "downs" of the underlying. At maturity, we have

V_jN = ψ(S_jN),   ψ(S_jN) =  (S_jN − K)⁺   (call)
                             (K − S_jN)⁺   (put)

Use (4.2):

e^{rτ} V_jn = E( V(t_{n+1}) | V_jn ) = p V_{j+1,n+1} + (1 − p) V_{j,n+1}

⇒ V_jn = e^{−rτ} ( p V_{j+1,n+1} + (1 − p) V_{j,n+1} )

(a) European options:

for j = 0, …, N
    V_jN = ψ(S_jN)
end
for n = N − 1, N − 2, …, 0
    for j = 0, …, n
        V_jn = e^{−rτ} ( p V_{j+1,n+1} + (1 − p) V_{j,n+1} )
    end
end

Result: V_00

(b) American options: Check in each step if early exercise is advantageous. At time t_n the value of the option must not be less than ψ(S_jn) for all j.


[Figure 4.1: Illustration of the binomial method (S(0) = 5, T = 1, N = 10): tree of the possible values S_jn over time.]

for j = 0, …, N
    V_jN = ψ(S_jN)
end
for n = N − 1, N − 2, …, 0
    for j = 0, …, n
        V_jn = e^{−rτ} ( p V_{j+1,n+1} + (1 − p) V_{j,n+1} )
        V_jn = max{ V_jn, ψ(S_jn) }
    end
end

Result: V_00
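The forward and backward phases can be combined into a short program. The following sketch (illustrative, with the payoff ψ passed as a function) implements the European and the American variant of the algorithm above:

```python
from math import exp, sqrt

def binomial_method(S0, T, r, sigma, N, payoff, american=False):
    """Binomial method (Cox, Ross & Rubinstein) with u, d, p from (4.1).
    payoff: terminal payoff function psi, e.g. lambda s: max(s - K, 0.0)."""
    tau = T / N
    beta = 0.5 * (exp(-r * tau) + exp((r + sigma**2) * tau))
    u = beta + sqrt(beta**2 - 1.0)
    d = 1.0 / u
    p = (exp(r * tau) - d) / (u - d)
    disc = exp(-r * tau)
    # maturity values V_jN = psi(S_jN) with S_jN = u^j d^{N-j} S0
    V = [payoff(S0 * u**j * d**(N - j)) for j in range(N + 1)]
    # backward phase
    for n in range(N - 1, -1, -1):
        V = [disc * (p * V[j + 1] + (1.0 - p) * V[j]) for j in range(n + 1)]
        if american:  # early exercise: V_jn = max{V_jn, psi(S_jn)}
            V = [max(V[j], payoff(S0 * u**j * d**(n - j))) for j in range(n + 1)]
    return V[0]
```

For a European call the result converges to the Black-Scholes value as N grows; for an American put the result is at least as large as for the European put, since early exercise is an additional right.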

Remarks.

1. The result V_00 is only an approximation for the true value V(0, S(0)) of the option, because the price process has been approximated.

2. The result V_00 ≈ V(0, S(0)) depends on the initial value S(0). For a different value of S(0), the entire computation must be repeated.

3. An efficient implementation of the binomial method requires only O(N) operations; see [Hig02].

Examples: see slides


4.3 Discrete Black-Scholes formula

Lemma 4.3.1 Let V(t, S) be the value of a European option with payoff function ψ(S) and maturity T > 0. Then, the binomial method yields the approximation

V_00 = e^{−rT} Σ_{j=0}^N B(j, N, p) ψ(S_jN)

where B(j, N, p) = ( N choose j ) p^j (1 − p)^{N−j} is the binomial distribution.

Proof: exercise.

Remarks:

1. This result explains the name “binomial method”.

2. Interpretation: V00 is the discounted expected payoff under a suitable probability;cf. 1.5 and 3.4.

Proposition 4.3.2 (Discrete Black-Scholes formula) For a European call with maturity T and strike K, the binomial method yields the approximation

V_00 = S(0) Ψ(m, N, q) − K exp(−rT) Ψ(m, N, p)

where

q = u p e^{−rτ}
m = min{ 0 ≤ j ≤ N : S_jN − K ≥ 0 }
Ψ(m, N, p) = Σ_{j=m}^N B(j, N, p)

Proof: exercise.
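Proposition 4.3.2 can be tested against the backward recursion from 4.2: both programs below should produce the same number up to rounding errors (an illustrative sketch):

```python
from math import exp, sqrt, comb

def crr_parameters(r, sigma, tau):
    """u, d, p from (4.1)."""
    beta = 0.5 * (exp(-r * tau) + exp((r + sigma**2) * tau))
    u = beta + sqrt(beta**2 - 1.0)
    d = 1.0 / u
    p = (exp(r * tau) - d) / (u - d)
    return u, d, p

def call_discrete_bs(S0, K, T, r, sigma, N):
    """Discrete Black-Scholes formula from Proposition 4.3.2."""
    tau = T / N
    u, d, p = crr_parameters(r, sigma, tau)
    q = u * p * exp(-r * tau)
    m = min(j for j in range(N + 1) if S0 * u**j * d**(N - j) >= K)
    def Psi(prob):
        return sum(comb(N, j) * prob**j * (1.0 - prob)**(N - j) for j in range(m, N + 1))
    return S0 * Psi(q) - K * exp(-r * T) * Psi(p)

def call_backward(S0, K, T, r, sigma, N):
    """Backward recursion of the binomial method (Section 4.2) for comparison."""
    tau = T / N
    u, d, p = crr_parameters(r, sigma, tau)
    V = [max(S0 * u**j * d**(N - j) - K, 0.0) for j in range(N + 1)]
    for n in range(N - 1, -1, -1):
        V = [exp(-r * tau) * (p * V[j + 1] + (1.0 - p) * V[j]) for j in range(n + 1)]
    return V[0]
```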

Question: What happens if we let N −→∞ and τ = T/N −→ 0?

For simplicity, we consider a slightly different binomial method. For given σ > 0 let

ũ = ũ(τ) = e^{σ√τ}   and   d̃ = d̃(τ) = 1/ũ = e^{−σ√τ}   (4.3)

and suppose that u and d from (4.1) are replaced by ũ and d̃. It can be shown, however, that

ũ(τ) − u = O( τ^{3/2} ).   (4.4)

If τ ≤ (σ/r)², then we have d̃ ≤ e^{rτ} ≤ ũ as before.


Proposition 4.3.3 (Convergence of the discrete Black-Scholes formula)
Consider a European call with maturity T and strike K. Let V_00 = V_00^(N) be the approximation given by the binomial method with τ = T/N, and with u and d replaced by ũ and d̃, respectively. Then, V_00^(N) converges to the value given by the (continuous) Black-Scholes formula:

lim_{N→∞} V_00^(N) = S(0) Φ(d_1) − K exp(−rT) Φ(d_2)

Φ(x) = ( 1/√(2π) ) ∫_{−∞}^x e^{−s²/2} ds

d_{1/2} = ( ln( S(0)/K ) + ( r ± σ²/2 ) T ) / ( σ√T )

Proof: See 3.3 in [GJ10] (use central limit theorem).


Chapter 5

Numerical methods for stochastic differential equations

5.1 Motivation

According to 3.4 the value of a European option is the discounted expected payoff under the risk-neutral probability:

V(0, S_0) = e^{−rT} E_Q( ψ(S(T)) )

For the standard Black-Scholes model:

V(0, S_0) = e^{−rT} ∫_0^∞ ψ(x) φ(x, ξ, β) dx

with log-normal density φ and parameters

ξ = ln S_0 + (r − σ²/2) T,    β = σ√T.

Two ways to price the option:

1. Quadrature formula. Let w(x) := ψ(x) φ(x, ξ, β). Choose 0 ≤ x_min < x_max such that w(x) ≈ 0 for x ∉ [x_min, x_max]: x_min = K and x_max sufficiently large for calls, x_min = 0 and x_max = K for puts. Choose a large N ∈ ℕ, let h = (x_max − x_min)/N and x_k = x_min + kh. Approximate

∫_0^∞ w(x) dx ≈ ∫_{x_min}^{x_max} w(x) dx = Σ_{k=0}^{N−1} ∫_{x_k}^{x_{k+1}} w(x) dx ≈ Σ_{k=0}^{N−1} h Σ_{j=1}^{s} b_j w(x_k + c_j h)

with suitable nodes c_j ∈ [0, 1] and weights b_j.
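As an illustration (not part of the original notes), the composite rule above with the midpoint rule (s = 1, b_1 = 1, c_1 = 1/2) can be used to price a European call; the parameters and truncation point x_max below are chosen for the example only.

```python
import math

def lognormal_density(x, xi, beta):
    """Log-normal density phi(x, xi, beta) of S(T)."""
    return (math.exp(-(math.log(x) - xi)**2 / (2 * beta**2))
            / (x * beta * math.sqrt(2 * math.pi)))

def quadrature_call(S0, K, r, sigma, T, xmax, N):
    """Composite midpoint rule for V(0,S0) = e^{-rT} int_K^{xmax} (x-K) phi(x) dx."""
    xi = math.log(S0) + (r - sigma**2 / 2) * T   # parameters of the log-normal density
    beta = sigma * math.sqrt(T)
    xmin = K                                     # the call payoff vanishes below K
    h = (xmax - xmin) / N
    total = 0.0
    for k in range(N):
        x = xmin + (k + 0.5) * h                 # midpoint node: c_1 = 1/2, weight b_1 = 1
        total += h * (x - K) * lognormal_density(x, xi, beta)
    return math.exp(-r * T) * total
```

With a fine grid the result matches the Black-Scholes value of the call up to the quadrature and truncation errors.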


2. Monte-Carlo method. In the Black-Scholes model, S(t) is defined by the SDE

dS(t) = rS(t) dt + σS(t) dW(t),   t ∈ [0, T],   S_0 given

(risk-neutral drift, µ = r). Solution: geometric Brownian motion

S(t) = S_0 exp( (r − σ²/2) t + σW(t) ).

This is the process which corresponds to φ(x, ξ, β), because S(T) is log-normal with the same parameters. Estimate the expected payoff as follows:

• Generate many realizations S(T, ω_1), . . . , S(T, ω_m), m ∈ ℕ “large”.

• Approximate

V(0, S_0) ≈ e^{−rT} (1/m) Σ_{j=1}^{m} ψ( S(T, ω_j) )

Consider now a more complicated price process:

dS(t) = rS(t) dt + σ(t)S(t) dW^1(t)    (5.1a)
dσ²(t) = κ( θ − σ²(t) ) dt + ν( ρ dW^1(t) + √(1 − ρ²) dW^2(t) )    (5.1b)

This is the Heston model (Steven L. Heston, 1993) with parameters r, κ, θ, ν > 0, initial values S_0, σ_0, independent scalar Wiener processes W^1(t), W^2(t), and correlation ρ ∈ [−1, 1]. Now the volatility is not a parameter, but a stochastic process defined by a second SDE. We do not have an explicit formula for S(t) and σ(t), but the Monte-Carlo approach is still feasible:

• Choose N ∈ ℕ, define the step-size τ = T/N and t_n = nτ. For each ω_1, . . . , ω_m compute approximations

X_n^(1)(ω_j) ≈ S(t_n, ω_j),   X_n^(2)(ω_j) ≈ σ²(t_n, ω_j),   n = 0, . . . , N

by solving the SDEs (5.1a), (5.1b) numerically.

• Approximate

V(0, S_0) ≈ e^{−rT} (1/m) Σ_{j=1}^{m} ψ( X_N^(1)(ω_j) )    where X_N^(1)(ω_j) ≈ S(T, ω_j).

The Monte-Carlo approach even works for other types of options. As an example, consider an Asian option with payoff

ψ( t ↦ S(t) ) = ( S(T) − (1/T) ∫_0^T S(t) dt )_+

(average strike call).


Now the payoff depends on the entire path t ↦ S(t). We approximate

S(t_n, ω_j) ≈ X_n^(1)(ω_j),   (1/T) ∫_0^T S(t, ω_j) dt ≈ (1/(N+1)) Σ_{n=0}^{N} X_n^(1)(ω_j)

and hence

V(0, S_0) ≈ e^{−rT} (1/m) Σ_{j=1}^{m} ( X_N^(1)(ω_j) − (1/(N+1)) Σ_{n=0}^{N} X_n^(1)(ω_j) )_+

Remark: In the original paper, Heston derives an explicit Black-Scholes-type formula for European options by means of characteristic functions. Hence, European options in the Heston model can also be priced by quadrature formulas, but for Asian options this is not possible.
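The path-dependent estimator above can be sketched as follows (Python; illustrative parameters). Under the Black-Scholes dynamics the grid values S(t_n) can be sampled exactly from the geometric Brownian motion, so no discretization method for the SDE is needed in this example.

```python
import math, random

def mc_average_strike_call(S0, r, sigma, T, N, m, seed=2):
    """MC estimate of e^{-rT} E[(S(T) - (1/T) int_0^T S dt)_+];
    the time integral is replaced by the grid average over t_0, ..., t_N."""
    rng = random.Random(seed)
    tau = T / N
    total = 0.0
    for _ in range(m):
        S, path_sum = S0, S0              # running price and sum S(t_0) + ... + S(t_n)
        for _ in range(N):
            Z = rng.gauss(0.0, 1.0)       # exact GBM increment over one step
            S *= math.exp((r - sigma**2 / 2) * tau + sigma * math.sqrt(tau) * Z)
            path_sum += S
        avg = path_sum / (N + 1)          # (1/(N+1)) sum_n X_n^(1)
        total += max(S - avg, 0.0)        # average strike call payoff
    return math.exp(-r * T) * total / m

price = mc_average_strike_call(100.0, 0.05, 0.2, 1.0, 50, 20000)
```

The estimate fluctuates with the number of paths m; only its order of magnitude is meaningful here.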

Goal: Construct and analyze numerical methods for SDEs.

5.2 Euler-Maruyama method

5.2.1 Derivation

Consider the one-dimensional SDE

dX(t) = f(t, X(t)) dt + g(t, X(t)) dW(t),   t ∈ [0, T],   X(0) = X_0

with suitable functions f and g and a given initial value X_0. Choose N ∈ ℕ, define the step-size τ = T/N and t_n = nτ. Then

X(t_{n+1}) = X(t_n) + ∫_{t_n}^{t_{n+1}} f(s, X(s)) ds + ∫_{t_n}^{t_{n+1}} g(s, X(s)) dW(s)
           ≈ X(t_n) + τ f(t_n, X(t_n)) + g(t_n, X(t_n)) ( W(t_{n+1}) − W(t_n) ) =: X(t_n) + τ f(t_n, X(t_n)) + g(t_n, X(t_n)) ∆W_n.

Replacing X(t_n) by X_n and “≈” by “=” yields the

Euler-Maruyama method (Gisiro Maruyama 1955, Leonhard Euler 1768-70): For n = 0, . . . , N − 1 let ∆W_n = W(t_{n+1}) − W(t_n) and

X_{n+1} = X_n + τ f(t_n, X_n) + g(t_n, X_n) ∆W_n.

Hope that X_n ≈ X(t_n): the SDE is solved exactly by X(t), and the recursion is solved exactly by X_n; the question is whether the X_n approximate the exact solution values X(t_n).


The exact solution X(t_n) and the numerical approximation X_n are random variables. For every path t ↦ W(t, ω) of the Wiener process, a different result is obtained. X(t) is called a strong solution if t ↦ W(t, ω) is given, and a weak solution if t ↦ W(t, ω) can be chosen. Approximation of weak solutions: for each n, generate a random number Z_n ∼ N(0, 1) and let

∆W_n = √τ Z_n.
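A minimal Euler-Maruyama sketch in Python (the Black-Scholes SDE serves as test equation because E(S(T)) = S_0 e^{rT} is known; parameter values are illustrative only):

```python
import math, random

def euler_maruyama(f, g, X0, T, N, rng):
    """One path of X_{n+1} = X_n + tau f(t_n, X_n) + g(t_n, X_n) dW_n, dW_n = sqrt(tau) Z_n."""
    tau = T / N
    X = X0
    for n in range(N):
        dW = math.sqrt(tau) * rng.gauss(0.0, 1.0)   # Wiener increment
        X = X + tau * f(n * tau, X) + g(n * tau, X) * dW
    return X

# Weak sanity check with dS = r S dt + sigma S dW: E(S(T)) = S0 e^{rT}.
r, sigma, S0, T = 0.05, 0.2, 1.0, 1.0
rng = random.Random(3)
m = 20000
mean = sum(euler_maruyama(lambda t, x: r * x, lambda t, x: sigma * x,
                          S0, T, 50, rng) for _ in range(m)) / m
```

The sample mean should be close to e^{0.05} ≈ 1.0513, up to the Monte-Carlo noise and the weak discretization error.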

Question: Does X_n really approximate X(t_n)? In which sense? How accurately?

5.2.2 Weak and strong convergence

Definition 5.2.1 (strong and weak convergence) Let T > 0, N ∈ ℕ, τ = T/N and t_n = nτ. An approximation X_n(ω) ≈ X(t_n, ω) converges

• strongly with order γ > 0, if there is a constant C > 0 independent of τ such that

max_{n=0,...,N} E( |X(t_n) − X_n| ) ≤ C τ^γ

for all sufficiently small τ, and

• weakly with order γ > 0 with respect to a function F : ℝ → ℝ, if there is a constant C > 0 independent of τ such that

max_{n=0,...,N} | E[ F(X(t_n)) ] − E[ F(X_n) ] | ≤ C τ^γ

for all sufficiently small τ.

Remarks:

• Strong convergence corresponds to path-wise convergence; weak convergence corresponds to convergence of moments (if F(x) = x^k) or of probabilities (if F(x) = 1_{[a,b]}).

• Strong convergence matters for Asian options, weak convergence for European options.

• Strong convergence of order γ implies weak convergence of order γ with respect to F(x) = x (exercise).

5.2.3 Existence and uniqueness of solutions of SDEs

Theorem 5.2.2 (existence and uniqueness)
Let f : ℝ_+ × ℝ → ℝ and g : ℝ_+ × ℝ → ℝ be functions with the following properties:

• Lipschitz condition: There is a constant L ≥ 0 such that

|f(t, x) − f(t, y)| ≤ L|x − y|,   |g(t, x) − g(t, y)| ≤ L|x − y|    (5.2)

for all x, y ∈ ℝ and t ≥ 0.


• Linear growth condition: There is a constant K ≥ 0 such that

|f(t, x)|² ≤ K( 1 + |x|² ),   |g(t, x)|² ≤ K( 1 + |x|² )    (5.3)

for all x ∈ ℝ and t ≥ 0.

Then, the SDE

dX(t) = f(t, X(t)) dt + g(t, X(t)) dW(t),   t ∈ [0, T]

with deterministic initial value X(0) = X_0 has a continuous adapted solution with

sup_{t∈[0,T]} E( X²(t) ) < ∞.

If both X(t) and X̃(t) are such solutions, then

P( X(t) = X̃(t) for all t ∈ [0, T] ) = 1.

Proof: Theorem 9.1 in [Ste01] or Theorem 4.5.3 in [KP99].

Remark: The assumptions can be weakened.

5.2.4 Strong convergence of the Euler-Maruyama method

For simplicity, we only consider the autonomous SDE

dX(t) = f(X(t)) dt + g(X(t)) dW(t),   t ∈ [0, T]

and the Euler-Maruyama approximation

X_{n+1} = X_n + τ f(X_n) + g(X_n) ∆W_n

with X(0) = X_0, T > 0, N ∈ ℕ, τ = T/N, t_n = nτ. We assume that f = f(x) and g = g(x) satisfy the Lipschitz condition (5.2). In the autonomous case, this implies the linear growth condition (5.3) (exercise).

Theorem 5.2.3 (strong error of the Euler-Maruyama method) Under these conditions, there is a constant C, independent of τ, such that

max_{n=0,...,N} E( |X(t_n) − X_n| ) ≤ C τ^{1/2}

for all sufficiently small τ.

For the proof we need the following


Lemma 5.2.4 (Gronwall) Let α : [0, T] → ℝ_+ be a positive integrable function. If there are constants a > 0 and b > 0 such that

0 ≤ α(t) ≤ a + b ∫_0^t α(s) ds

for all t ∈ [0, T], then α(t) ≤ a e^{bt}.

Proof: exercise.

Proof of Theorem 5.2.3.
Strategy:

• Define the step function

Y(t) = Σ_{n=0}^{N−1} 1_{[t_n, t_{n+1})}(t) X_n for t ∈ [0, T),   Y(T) := X_N.

For n = 0, . . . , N − 1 this means that Y(t) = X_n ⟺ t ∈ [t_n, t_{n+1}).

• Define α(s) := sup_{r∈[0,s]} E( |Y(r) − X(r)|² ) and prove the Gronwall inequality

0 ≤ α(t) ≤ Cτ + b ∫_0^t α(s) ds.    (5.4)

• Apply Gronwall's lemma. This yields α(t) ≤ τ C_2 with C_2 = C e^{bT} for all t ∈ [0, T].

• Since¹ E(|Z|) ≤ √( E(Z²) ) for random variables Z, it follows that

max_{n=0,...,N} E( |X_n − X(t_n)| ) ≤ sup_{t∈[0,T]} E( |Y(t) − X(t)| ) ≤ sup_{t∈[0,T]} √( E( |Y(t) − X(t)|² ) ) = √(α(T)) ≤ √(τ C_2).

Main challenge: Prove the Gronwall inequality (5.4). Choose a fixed t ∈ [0, T] and let n be the index with t ∈ [t_n, t_{n+1}).

¹ Elementary calculation: 0 ≤ V(Z) = E[ (Z − E(Z))² ] = E[ Z² − 2Z E(Z) + E(Z)² ] = E(Z²) − (E(Z))², and hence (E(Z))² ≤ E(Z²).


Derive an integral representation of the error:

Y(t) = X_n = X_0 + Σ_{k=0}^{n−1} (X_{k+1} − X_k)
     = X_0 + Σ_{k=0}^{n−1} ( τ f(X_k) + g(X_k) ∆W_k )
     = X_0 + Σ_{k=0}^{n−1} ∫_{t_k}^{t_{k+1}} f(X_k) ds + Σ_{k=0}^{n−1} ∫_{t_k}^{t_{k+1}} g(X_k) dW(s)
     = X_0 + ∫_0^{t_n} f(Y(s)) ds + ∫_0^{t_n} g(Y(s)) dW(s)

This is not an SDE, because the integrals run over [0, t_n] instead of [0, t]. Comparing with the exact solution

X(t) = X(0) + ∫_0^t f(X(s)) ds + ∫_0^t g(X(s)) dW(s)

yields the error representation

Y(t) − X(t) = ∫_0^{t_n} [ f(Y(s)) − f(X(s)) ] ds + ∫_0^{t_n} [ g(Y(s)) − g(X(s)) ] dW(s)
             − ∫_{t_n}^t f(X(s)) ds − ∫_{t_n}^t g(X(s)) dW(s)
            =: T_1 + T_2 − T_3 − T_4.

The Cauchy-Schwarz inequality gives

(T_1 + T_2 − T_3 − T_4)² = ( (1, 1, −1, −1) · (T_1, T_2, T_3, T_4) )² ≤ 4 ‖(T_1, T_2, T_3, T_4)‖_2² = 4 (T_1² + T_2² + T_3² + T_4²)

and hence

E( |Y(t) − X(t)|² ) ≤ 4 E( T_1² + T_2² + T_3² + T_4² ).


First term: For functions u ∈ L²([0, t_n]) the Cauchy-Schwarz inequality yields

( ∫_0^{t_n} u(s) · 1 ds )² ≤ ∫_0^{t_n} |u(s)|² ds · ∫_0^{t_n} 1² ds = t_n ∫_0^{t_n} |u(s)|² ds.    (5.5)

Using the Lipschitz bound (5.2), we obtain

E(T_1²) = E[ ( ∫_0^{t_n} [ f(Y(s)) − f(X(s)) ] ds )² ]
        ≤ t_n E[ ∫_0^{t_n} | f(Y(s)) − f(X(s)) |² ds ]
        ≤ T L² ∫_0^{t_n} E( |Y(s) − X(s)|² ) ds
        ≤ T L² ∫_0^t α(s) ds    (t instead of t_n)

because t ≥ t_n by assumption.

Second term: It follows from the Ito isometry (Theorem B.2.5) and the Lipschitz bound (5.2) that

E(T_2²) = E[ ( ∫_0^{t_n} [ g(Y(s)) − g(X(s)) ] dW(s) )² ] = E[ ∫_0^{t_n} | g(Y(s)) − g(X(s)) |² ds ]
        ≤ L² ∫_0^{t_n} E( |Y(s) − X(s)|² ) ds ≤ L² ∫_0^t α(s) ds

because t ≥ t_n by assumption.


Third term: Equation (5.5) and the linear growth bound (5.3) yield

E(T_3²) = E[ ( ∫_{t_n}^t f(X(s)) ds )² ] ≤ (t − t_n) E[ ∫_{t_n}^t | f(X(s)) |² ds ]
        ≤ τ K E[ ∫_{t_n}^t ( 1 + |X(s)|² ) ds ] ≤ c τ²

because Theorem 5.2.2 states that E( 1 + |X(s)|² ) remains bounded on [t_n, t].

Last term: Using the Ito isometry and the linear growth bound (5.3) it follows that

E(T_4²) = E[ ( ∫_{t_n}^t g(X(s)) dW(s) )² ] = E[ ∫_{t_n}^t | g(X(s)) |² ds ]
        ≤ K E[ ∫_{t_n}^t ( 1 + |X(s)|² ) ds ] ≤ c τ.

These bounds yield the Gronwall inequality (5.4) with b = 4(T + 1)L² and with C depending on K and sup_{s∈[0,T]} E( 1 + |X(s)|² ).
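The rate τ^{1/2} of Theorem 5.2.3 can be observed experimentally. The sketch below (illustrative, for geometric Brownian motion, where the exact solution is available) drives the discretization with an explicitly accumulated Brownian path and estimates E|X_N − X(T)| for two step-sizes.

```python
import math, random

def strong_error(r, sigma, S0, T, N, m, seed):
    """Estimate E|X_N - X(T)| for Euler-Maruyama applied to GBM; exact solution as reference."""
    rng = random.Random(seed)
    tau = T / N
    err = 0.0
    for _ in range(m):
        X, W = S0, 0.0
        for _ in range(N):
            dW = math.sqrt(tau) * rng.gauss(0.0, 1.0)
            X += tau * r * X + sigma * X * dW       # one Euler-Maruyama step
            W += dW                                 # accumulate the same Brownian path
        exact = S0 * math.exp((r - sigma**2 / 2) * T + sigma * W)
        err += abs(X - exact)
    return err / m

e_coarse = strong_error(0.05, 0.2, 1.0, 1.0, 8, 2000, seed=4)
e_fine = strong_error(0.05, 0.2, 1.0, 1.0, 64, 2000, seed=4)
```

Refining τ by a factor of 8 should reduce the strong error by roughly √8 ≈ 2.8, in agreement with the order 1/2.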

5.2.5 Weak convergence of the Euler-Maruyama method

Theorem 5.2.5 (weak error of the Euler-Maruyama method) Under the conditions of 5.2.4, there is a constant C, independent of τ, such that

max_{n=0,...,N} | E[ F(X(t_n)) ] − E[ F(X_n) ] | ≤ C τ

for all sufficiently small τ and all smooth functions F.

Proof. Define a piecewise linear interpolation: In addition to the piecewise constant Y(t), we define the piecewise linear interpolation

Z(t) = X_n + (t − t_n) f(X_n) + g(X_n) ( W(t) − W(t_n) ) for t ∈ [t_n, t_{n+1}).


Since Y(t) = X_n for t ∈ [t_n, t_{n+1}), this is equivalent to

Z(t) = X(0) + ∫_0^t f(Y(s)) ds + ∫_0^t g(Y(s)) dW(s),

or dZ = f(Y) dt + g(Y) dW(t).

Properties:

• Z(t_n) = X_n = Y(t_n) for all n = 0, . . . , N.

• For every δ ∈ [0, τ], Z(t_n + δ) is the Euler-Maruyama approximation after one step with step-size δ and initial value Z(t_n) = X_n.

• t ↦ Z(t, ω) is continuous with probability 1.

Choose n ∈ {1, . . . , N} and consider the error at time t_n.
Apply the Feynman-Kac formula: Let u(t, x) be the solution of the PDE

∂_t u(t, x) + f(x) ∂_x u(t, x) + (1/2) g²(x) ∂²_x u(t, x) = 0,   t ∈ [0, t_n],   x ∈ ℝ

with terminal condition u(t_n, x) = F(x). Apply the Ito formula to u(t, Z(t)):

du(t, Z) = ( ∂_t u(t, Z) + f(Y) ∂_x u(t, Z) + (1/2) g²(Y) ∂²_x u(t, Z) ) dt + g(Y) ∂_x u(t, Z) dW(t)
         = ( [ f(Y) − f(Z) ] ∂_x u(t, Z) + (1/2) [ g²(Y) − g²(Z) ] ∂²_x u(t, Z) ) dt + g(Y) ∂_x u(t, Z) dW(t),

where ∂_t u(t, Z) has been eliminated by means of the PDE.

Equivalently:

u(t_n, Z(t_n)) = u(0, Z(0)) + ∫_0^{t_n} [ f(Y) − f(Z) ] ∂_x u(t, Z) dt
               + (1/2) ∫_0^{t_n} [ g²(Y) − g²(Z) ] ∂²_x u(t, Z) dt
               + ∫_0^{t_n} g(Y) ∂_x u(t, Z) dW(t)

By construction: u(t_n, Z(t_n)) = u(t_n, X_n) = F(X_n).
Feynman-Kac (see 2.4): u(0, Z(0)) = u(0, X(0)) = E[ F(X(t_n)) ].
This yields


| E[ F(X_n) ] − E[ F(X(t_n)) ] | ≤ T_1 + T_2,   where

T_1 := ∫_0^{t_n} | E( [ f(Y) − f(Z) ] ∂_x u(t, Z) ) | dt,
T_2 := (1/2) ∫_0^{t_n} | E( [ g²(Y) − g²(Z) ] ∂²_x u(t, Z) ) | dt.

Bounds for T_1 and T_2: Define

G(t, x) = [ f(Y(t)) − f(x) ] ∂_x u(t, x)

and apply the Ito formula to G(t, Z):

dG(t, Z) = ( ∂_t G(t, Z) + ∂_x G(t, Z) · f(Y) + (1/2) ∂²_x G(t, Z) · g²(Y) ) dt + ∂_x G(t, Z) · g(Y) dW(t)

Equivalently:

G(t, Z(t)) = G(t_{n−1}, Z(t_{n−1})) + ∫_{t_{n−1}}^t ∂_t G(s, Z) ds + ∫_{t_{n−1}}^t ∂_x G(s, Z) · f(Y) ds
           + (1/2) ∫_{t_{n−1}}^t ∂²_x G(s, Z) · g²(Y) ds + ∫_{t_{n−1}}^t ∂_x G(s, Z) · g(Y) dW(s)

where Z = Z(s) and Y = Y(s), and where G(t_{n−1}, Z(t_{n−1})) = 0 by the definition of G. Taking expectations (the Ito integral has expectation zero) gives

E[ G(t, Z(t)) ] = ∫_{t_{n−1}}^t E( ∂_t G(s, Z) ) ds + ∫_{t_{n−1}}^t E( ∂_x G(s, Z) · f(Y) ) ds + (1/2) ∫_{t_{n−1}}^t E( ∂²_x G(s, Z) · g²(Y) ) ds.

It can be shown that all three integrands remain bounded. It follows that

| E[ G(t, Z(t)) ] | ≤ C (t − t_{n−1}) ≤ C τ.


Consequence:

T_1 = ∫_0^{t_n} | E( [ f(Y) − f(Z) ] ∂_x u(t, Z) ) | dt = ∫_0^{t_n} | E[ G(t, Z(t)) ] | dt ≤ C t_n τ ≤ C T τ.

In a similar way, it can be shown that T_2 ≤ C t_n τ ≤ C T τ. This proves that

| E[ F(X(t_n)) ] − E[ F(X_n) ] | ≤ C τ.

5.3 Higher-order methods

Consider again the one-dimensional SDE

dX(t) = f(X(t)) dt + g(X(t)) dW(t),   t ∈ [0, T],   X(0) = X_0

with suitable functions f and g and a given initial value X_0.

Goal: Construct numerical methods with higher order.

Caution! Numerical methods for ordinary differential equations can in general not be extended to stochastic differential equations!

Example: Heun's method.

Heun's method for the ODE ẏ(t) = f(y(t)) with initial value y(0) = y_0 takes the form

ỹ_{n+1} = y_n + τ f(y_n)
y_{n+1} = y_n + (τ/2) ( f(y_n) + f(ỹ_{n+1}) ).

It is similar to the trapezoidal rule, but explicit. The natural modification of this method for SDEs is

X̃_{n+1} = X_n + τ f(X_n) + g(X_n) ∆W_n
X_{n+1} = X_n + (τ/2) ( f(X_n) + f(X̃_{n+1}) ) + (1/2) ( g(X_n) + g(X̃_{n+1}) ) ∆W_n.

Consider the special case f(x) ≡ 0, g(x) = x. For the exact solution X(t), it follows that

E(X(t)) = E(X_0) + ∫_0^t E[ f(X(s)) ] ds + E[ ∫_0^t g(X(s)) dW(s) ] = E(X_0),

since f ≡ 0 and the Ito integral has expectation zero,


i.e. E(X(t)) is constant. The method simplifies to

X̃_{n+1} = X_n + X_n ∆W_n
X_{n+1} = X_n + (1/2) ( X_n + X̃_{n+1} ) ∆W_n,

or equivalently

X_{n+1} = X_n + X_n ∆W_n + (1/2) X_n (∆W_n)².

This yields

E(X_{n+1}) = E(X_n) + E(X_n ∆W_n) + (1/2) E( X_n (∆W_n)² ) = E(X_n) + (τ/2) E(X_n),

since E(X_n ∆W_n) = 0 and E( X_n (∆W_n)² ) = τ E(X_n), and thus for N → ∞ and τ = T/N → 0

lim_{τ→0} E(X_N) = lim_{τ→0} (1 + τ/2)^N X_0 = lim_{N→∞} ( 1 + T/(2N) )^N X_0 = e^{T/2} X_0.

Hence, the method is not consistent! No convergence!
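The failure of the stochastic Heun variant can be reproduced numerically. A sketch (illustrative; f ≡ 0, g(x) = x, X_0 = 1 and T = 1 as in the example above):

```python
import math, random

def heun_sde_mean(T, N, m, seed=5):
    """Sample mean of X_N for the 'natural' Heun scheme applied to dX = X dW, X_0 = 1."""
    rng = random.Random(seed)
    tau = T / N
    total = 0.0
    for _ in range(m):
        X = 1.0
        for _ in range(N):
            dW = math.sqrt(tau) * rng.gauss(0.0, 1.0)
            Xp = X + X * dW                 # predictor step (f = 0, g(x) = x)
            X = X + 0.5 * (X + Xp) * dW     # corrector: X + X dW + X dW^2 / 2
        total += X
    return total / m

mean = heun_sde_mean(1.0, 20, 40000)
# The exact solution satisfies E(X(1)) = 1, but the scheme drifts towards e^{1/2}.
```

The sample mean stays near e^{1/2} ≈ 1.65 instead of 1, no matter how small τ is chosen.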

Stochastic Taylor expansions

These are an important tool for the construction of higher-order methods. For smooth F : ℝ → ℝ, the Ito formula yields

dF(X) = ( F′(X) f(X) + (1/2) F″(X) g²(X) ) dt + F′(X) g(X) dW(t) =: L⁰F(X) dt + L¹F(X) dW(t)

(no time derivative, because F = F(x) does not depend on t) or equivalently

F(X(s)) = F(X(t_n)) + ∫_{t_n}^s L⁰F(X(r)) dr + ∫_{t_n}^s L¹F(X(r)) dW(r)    (5.6)


Let F(x) = f(x) and F(x) = g(x), respectively, and substitute into the SDE:

X(t) = X(t_n) + ∫_{t_n}^t f(X(s)) ds + ∫_{t_n}^t g(X(s)) dW(s)
     = X(t_n) + (t − t_n) f(X(t_n)) + T_11 + T_12 + g(X(t_n)) [ W(t) − W(t_n) ] + T_21 + T_22    (5.7)

with the double integrals

T_11 = ∫_{t_n}^t ∫_{t_n}^s L⁰f(X(r)) dr ds,     T_12 = ∫_{t_n}^t ∫_{t_n}^s L¹f(X(r)) dW(r) ds,
T_21 = ∫_{t_n}^t ∫_{t_n}^s L⁰g(X(r)) dr dW(s),  T_22 = ∫_{t_n}^t ∫_{t_n}^s L¹g(X(r)) dW(r) dW(s)

and with

L⁰f = f′ · f + (1/2) f″ · g²,   L¹f = f′ · g,   L⁰g = g′ · f + (1/2) g″ · g²,   L¹g = g′ · g.

If all double integrals T_ij are ignored and t = t_{n+1}, we obtain the Euler-Maruyama method.

The Milstein method

Since

E[ ( W(t_{n+1}) − W(t_n) )² ] = t_{n+1} − t_n,

we conjecture that for t → t_n the dominant integral term is

T_22 = ∫_{t_n}^t ∫_{t_n}^s L¹g(X(r)) dW(r) dW(s).

Ignoring T_11, T_12 and T_21 yields the approximation

X(t) ≈ X(t_n) + (t − t_n) f(X(t_n)) + g(X(t_n)) ( W(t) − W(t_n) ) + T_22.


In order to approximate T_22, we apply (5.6) with F(x) = L¹g(x) and ignore higher-order terms:

T_22 = ∫_{t_n}^t ∫_{t_n}^s L¹g(X(r)) dW(r) dW(s)
     ≈ L¹g(X(t_n)) ∫_{t_n}^t ∫_{t_n}^s dW(r) dW(s)
     = g′(X(t_n)) · g(X(t_n)) ∫_{t_n}^t ∫_{t_n}^s dW(r) dW(s)    (5.8)

The integral can be explicitly computed:

∫_{t_n}^t ∫_{t_n}^s dW(r) dW(s) = ∫_{t_n}^t [ W(s) − W(t_n) ] dW(s)
  = ∫_{t_n}^t W(s) dW(s) − W(t_n) ∫_{t_n}^t dW(s)
  = ∫_0^t W(s) dW(s) − ∫_0^{t_n} W(s) dW(s) − W(t_n) [ W(t) − W(t_n) ]
  = (1/2) ( W²(t) − t ) − (1/2) ( W²(t_n) − t_n ) − W(t_n) [ W(t) − W(t_n) ]
  = (1/2) W²(t) + (1/2) W²(t_n) − W(t_n) W(t) + (1/2) (t_n − t)
  = (1/2) [ W(t) − W(t_n) ]² − (1/2) (t − t_n)

For t = t_{n+1} = t_n + τ, this yields the

Milstein method (Grigori N. Milstein 1974): For n = 0, . . . , N − 1 let ∆W_n = W(t_{n+1}) − W(t_n) and

X_{n+1} = X_n + τ f(X_n) + g(X_n) ∆W_n + g′(X_n) · g(X_n) · (1/2) [ (∆W_n)² − τ ].

Strong order 1, weak order 1 (cf. 10.3 in [KP99])

Remark: Milstein = Euler-Maruyama + additional term. If g′(x) ≡ 0 (“additive noise”), then the Milstein method coincides with the Euler-Maruyama method.
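For geometric Brownian motion (g(x) = σx, hence g′(x) g(x) = σ²x) the smaller strong error of the Milstein method can be seen directly. The sketch below (illustrative parameters) drives both schemes with the same Wiener increments and compares them with the exact solution:

```python
import math, random

def strong_errors(r, sigma, S0, T, N, m, seed=6):
    """Estimate E|X_N - X(T)| for Euler-Maruyama and Milstein on the same Brownian paths."""
    rng = random.Random(seed)
    tau = T / N
    err_em, err_mil = 0.0, 0.0
    for _ in range(m):
        Xe, Xm, W = S0, S0, 0.0
        for _ in range(N):
            dW = math.sqrt(tau) * rng.gauss(0.0, 1.0)
            Xe += tau * r * Xe + sigma * Xe * dW          # Euler-Maruyama step
            Xm += (tau * r * Xm + sigma * Xm * dW
                   + sigma**2 * Xm * 0.5 * (dW * dW - tau))  # Milstein term g'(x)g(x) = sigma^2 x
            W += dW
        exact = S0 * math.exp((r - sigma**2 / 2) * T + sigma * W)
        err_em += abs(Xe - exact)
        err_mil += abs(Xm - exact)
    return err_em / m, err_mil / m

e_em, e_mil = strong_errors(0.05, 0.2, 1.0, 1.0, 32, 2000)
```

The Milstein error is clearly smaller, reflecting strong order 1 versus order 1/2.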


Problem: We have to compute the derivative g′. This is difficult if g is complicated or not explicitly given (i.e. there is no formula for g).
Idea: Avoid g′ by using a (sufficiently accurate) approximation. Let

X̂_{n+1} = X_n + τ f(X_n) + √τ g(X_n)

(similar to Euler-Maruyama, but with √τ instead of ∆W_n). Taylor expansion yields

g(X̂_{n+1}) = g(X_n) + g′(X_n) [ X̂_{n+1} − X_n ] + O( |X̂_{n+1} − X_n|² )
           = g(X_n) + g′(X_n) [ τ f(X_n) + √τ g(X_n) ] + O(τ)
           = g(X_n) + g′(X_n) g(X_n) √τ + O(τ)

and hence

g′(X_n) g(X_n) = ( g(X̂_{n+1}) − g(X_n) ) / √τ + O(√τ).

This yields the

Stochastic Milstein-Runge-Kutta method: For n = 0, . . . , N − 1 let ∆W_n = W(t_{n+1}) − W(t_n) and

X̂_{n+1} = X_n + τ f(X_n) + √τ g(X_n)
X_{n+1} = X_n + τ f(X_n) + g(X_n) ∆W_n + ( ( g(X̂_{n+1}) − g(X_n) ) / √τ ) · (1/2) [ (∆W_n)² − τ ].

Strong order 1, weak order 1 (in spite of the additional approximation).

Higher weak order

Go back to (5.7), i.e.

X(t) = X(t_n) + (t − t_n) f(X(t_n)) + T_11 + T_12 + g(X(t_n)) [ W(t) − W(t_n) ] + T_21 + T_22


and approximate each double integral by freezing the integrand at t_n:

T_11 = ∫_{t_n}^t ∫_{t_n}^s L⁰f(X(r)) dr ds ≈ ∫_{t_n}^t ∫_{t_n}^s L⁰f(X(t_n)) dr ds = ( (t − t_n)²/2 ) L⁰f(X(t_n))
T_12 = ∫_{t_n}^t ∫_{t_n}^s L¹f(X(r)) dW(r) ds ≈ L¹f(X(t_n)) I_(1,0)(t)
T_21 = ∫_{t_n}^t ∫_{t_n}^s L⁰g(X(r)) dr dW(s) ≈ L⁰g(X(t_n)) I_(0,1)(t)
T_22 ≈ L¹g(X(t_n)) (1/2) ( [ W(t) − W(t_n) ]² − (t − t_n) )    (cf. 5.8)

with

I_(1,0)(t) = ∫_{t_n}^t ∫_{t_n}^s dW(r) ds = ∫_{t_n}^t ( W(s) − W(t_n) ) ds
I_(0,1)(t) = ∫_{t_n}^t ∫_{t_n}^s dr dW(s) = ∫_{t_n}^t (s − t_n) dW(s).

Applying the integration by parts formula

∫_a^b u(s) dW(s) = u(b) W(b) − u(a) W(a) − ∫_a^b W(s) u′(s) ds    (5.9)

(see exercises) with a = t_n, b = t, u(s) = s gives

∫_{t_n}^t s dW(s) = t W(t) − t_n W(t_n) − ∫_{t_n}^t W(s) ds
  = t W(t) − t_n W(t_n) − ∫_{t_n}^t [ W(s) − W(t_n) ] ds − (t − t_n) W(t_n)
  = t [ W(t) − W(t_n) ] − I_(1,0)(t)

and thus

I_(0,1)(t) = ∫_{t_n}^t (s − t_n) dW(s) = ∫_{t_n}^t s dW(s) − t_n [ W(t) − W(t_n) ] = (t − t_n) [ W(t) − W(t_n) ] − I_(1,0)(t).


Hence, only I_(1,0)(t) has to be computed. For weak convergence all random variables can be replaced by other random variables with the same moments. It can be shown (exercise) that

E( I_(1,0)(t) ) = 0,   E( I_(1,0)²(t) ) = (t − t_n)³/3,   E( I_(1,0)(t) [ W(t) − W(t_n) ] ) = (t − t_n)²/2.

Let Z_1 ∼ N(0, 1) and Z_2 ∼ N(0, 1) be independent and W(t) − W(t_n) = √(t − t_n) Z_1. If we let

Y_n := (τ^{3/2}/2) ( Z_1 + Z_2/√3 ),

then Y_n has the same properties as I_(1,0)(t_{n+1}), i.e.

E(Y_n) = 0,   E(Y_n²) = τ³/3,   E(Y_n ∆W_n) = τ²/2

(exercise). For t = t_{n+1}, this yields the following method:

X_{n+1} = X_n + τ f(X_n) + (τ²/2) L⁰f(X_n) + L¹f(X_n) Y_n
        + g(X_n) ∆W_n + L⁰g(X_n) [ τ ∆W_n − Y_n ] + L¹g(X_n) (1/2) ( ∆W_n² − τ ),

where the four new terms approximate T_11, T_12, T_21 and T_22, respectively. Weak order 2 (no proof).

Simplification: The weak order is not reduced if ∆W_n is replaced by a “cheaper” random variable with the same moments. Let ∆V_n ∈ { √(3τ), −√(3τ), 0 } with

P( ∆V_n = √(3τ) ) = P( ∆V_n = −√(3τ) ) = 1/6,   P( ∆V_n = 0 ) = 2/3.

It can be checked (exercise) that

E( (∆V_n)^k ) = E( (∆W_n)^k ) for k = 1, . . . , 5.
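The moment identity can be verified directly: the moments of N(0, τ) are 0, τ, 0, 3τ², 0 for k = 1, . . . , 5, and the three-point variable ∆V_n reproduces them exactly. A short check (τ = 0.01 as an example value):

```python
import math

tau = 0.01
values = [math.sqrt(3 * tau), -math.sqrt(3 * tau), 0.0]   # support of Delta V_n
probs = [1 / 6, 1 / 6, 2 / 3]                             # probabilities 1/6, 1/6, 2/3

def moment(k):
    """k-th moment of the three-point random variable Delta V_n."""
    return sum(p * v**k for v, p in zip(values, probs))

normal_moments = {1: 0.0, 2: tau, 3: 0.0, 4: 3 * tau**2, 5: 0.0}  # moments of N(0, tau)
```

The odd moments cancel by symmetry; the even ones match because the mass 1/3 sits at distance √(3τ).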

Moreover, we can replace Y_n by

Ȳ_n = (τ/2) ∆V_n,

because

E(Ȳ_n) = 0,   E(Ȳ_n²) = τ³/4 = τ³/3 + O(τ³),   E(Ȳ_n ∆V_n) = τ²/2.


5.4 Numerical methods for systems of SDEs

Consider now the vector-valued SDE

X(t) = X(0) + ∫_0^t f(s, X(s)) ds + ∫_0^t g(s, X(s)) dW(s)

with

X(t) ∈ ℝ^d,   W(t) ∈ ℝ^m,   f : ℝ × ℝ^d → ℝ^d,   and   g : ℝ × ℝ^d → ℝ^{d×m}.

Notation as in 2.5. The Euler-Maruyama method can be readily extended to vector-valued SDEs: For n = 0, . . . , N − 1 let ∆W_n = W(t_{n+1}) − W(t_n) and

X_{n+1} = X_n + τ f(t_n, X_n) + g(t_n, X_n) ∆W_n.

Strong order 1/2, weak order 1; the proof is similar. What about the Milstein method?

• Case d ≥ 1 and m = 1: Straightforward extension. For n = 0, . . . , N − 1 let ∆W_n = W(t_{n+1}) − W(t_n) and

X_{n+1} = X_n + τ f(t_n, X_n) + g(t_n, X_n) ∆W_n + J_g(t_n, X_n) g(t_n, X_n) (1/2) [ (∆W_n)² − τ ]

where J_g = [ ∂_{x_k} g_j ]_{j,k} is the Jacobian of g. Strong order 1, weak order 1.

• Case d ≥ 1 and m > 1: More complicated. Adapting the derivation via stochastic Taylor expansions yields

X_{n+1}^(j) = X_n^(j) + τ f_j(t_n, X_n) + Σ_{k=1}^{m} g_{jk}(t_n, X_n) ∆W_n^(k)
            + Σ_{i,k=1}^{m} Σ_{l=1}^{d} ∂_{x_l} g_{jk}(t_n, X_n) · g_{li}(t_n, X_n) ∫_{t_n}^{t_{n+1}} ∫_{t_n}^{s} dW_i(θ) dW_k(s)

where X_n^(j) and ∆W_n^(j) denote the j-th entry of X_n and ∆W_n, respectively. Similar to the scalar case, the derivatives of g can be avoided by a Runge-Kutta-type approach. The stochastic double integrals cannot, in general, be computed analytically. These integrals are solutions of small systems of SDEs, which have to be approximated numerically. Details: 5.3 in [GJ10].


5.5 Mean-square-error of the Monte-Carlo simulation

Consider a European option with payoff function ψ and price process

dS(t) = f(t, S(t)) dt + g(t, S(t)) dW(t),   t ∈ [0, T].    (5.10)

Standard Monte-Carlo method (cf. section 5.1):

• Choose N ∈ ℕ, let τ = T/N and t_n = nτ for n = 0, . . . , N. Generate m ∈ ℕ paths t ↦ W(t, ω_j) of the Wiener process (j = 1, . . . , m). For each path, compute approximations S_n(ω_j) ≈ S(t_n, ω_j) by solving (5.10) with a numerical method of weak order γ.

• Approximate the expectation of the payoff:

E_Q[ ψ(S(T)) ] ≈ (1/m) Σ_{j=1}^{m} ψ( S(T, ω_j) ) ≈ (1/m) Σ_{j=1}^{m} ψ( S_N(ω_j) ) =: V̄

(the option price is then the discounted value e^{−rT} V̄).

Two sources of error:

• The expectation is estimated from finitely many samples.

• The exact S(T, ω_j) is approximated by a numerical method.

Both errors are measured by the mean-square-error.

Definition 5.5.1 (Mean-square-error) Let θ̂ be an estimator for an unknown (deterministic) quantity θ. Then, the mean-square-error of θ̂ is

MSE(θ̂) = E[ (θ̂ − θ)² ] =(⋆) V(θ̂) + E(θ̂ − θ)².

The term E(θ̂ − θ) = E(θ̂) − θ is called the bias. Notation: E(X)² := ( E(X) )² ≠ E(X²).

Proof of (⋆):

V(θ̂) + E(θ̂ − θ)² = ( E(θ̂²) − E(θ̂)² ) + ( E(θ̂)² − 2 E(θ̂) θ + θ² ) = E( θ̂² − 2 θ̂ θ + θ² ) = E[ (θ̂ − θ)² ].

Applying this with E = E_Q and

θ = E( ψ(S(T)) ),   θ̂ = V̄ = (1/m) Σ_{j=1}^{m} ψ( S_N(ω_j) )

yields

MSE(V̄) = (1/m²) V( Σ_{j=1}^{m} ψ(S_N(ω_j)) ) + ( (1/m) Σ_{j=1}^{m} E[ ψ(S_N(ω_j)) ] − E[ ψ(S(T)) ] )²
        = V( ψ(S_N) )/m + ( E[ ψ(S_N) ] − E[ ψ(S(T)) ] )²
        ≤ C/m + C τ^{2γ}.


Consequence:

√MSE(V̄) ∼ C √( m^{−1} + τ^{2γ} ).

Slow convergence with respect to m!

Example: Euler-Maruyama method (weak order γ = 1). If ε > 0 is a given error tolerance, then achieving

MSE(V̄) ≈ ε² with C/m + C τ² = O(ε²) requires m = O(ε^{−2}) and τ = O(ε).

Since τ = T/N, we have to compute m = O(ε^{−2}) simulations with N = O(ε^{−1}) time-steps. Hence, the total numerical work (= total number of time-steps) is O(ε^{−3}).

The computational costs can be reduced by Multi-Level Monte Carlo methods, cf. part 2of the lecture.


Chapter 6

Pseudo-random numbers and Monte-Carlo simulation

6.1 Pseudo-random numbers

Stochastic simulations are based on random variables. In order to approximate the weak solution of an SDE, for example, the Wiener increment ∆W_n = W(t_{n+1}) − W(t_n) is simulated by drawing a random number Z_n ∼ N(0, 1) and letting ∆W_n = √τ Z_n; cf. 5.2.

Question: What does it mean to “draw a random number”? How can a computer generate a random number?

Computers can only generate pseudo-random numbers, i.e. sequences of numbers which seem to be random, but which are actually generated by a deterministic algorithm. Hence, simulation results can be reproduced if necessary. Every such sequence is periodic, but with a very large period.

6.1.1 Uniform pseudo-random numbers

First goal: generate uniformly distributed pseudo-random numbers X_i ∈ [0, 1]. Notation: X_i ∼ U(0, 1). Matlab command: rand(...)

Method 1: Linear congruential generator

Choose M ∈ ℕ and a, b, X_0 ∈ {0, 1, . . . , M − 1} and let

X_i = (a X_{i−1} + b) mod M,   U_i = X_i/M   (i = 1, 2, 3, . . .)

Reminder: x mod y = z ⟺ x = ny + z for some n ∈ ℕ_0 and z ∈ {0, 1, . . . , y − 1}. The entire sequence depends on the “seed” X_0. “Bad” parameters must be avoided:


• a ≠ 0

• If b = 0, then we must choose X_0 ≠ 0.

• a ≠ 1 (too predictable)

By definition U_i ∈ { 0, 1/M, 2/M, . . . , (M−1)/M }, i.e. M is the number of possible values of U_i. Hence, M should be chosen very large. Since X_i ∈ {0, 1, . . . , M − 1}, the sequence (X_i)_i is periodic with period ≤ M.

Matlab: a = 7⁵, b = 0, M = 2³¹ − 1, period ≈ 2 · 10⁹. Too small!
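A minimal implementation of this generator (the “minimal standard” generator of Park and Miller) takes only a few lines; the check at the end is a classical test value from the literature: starting from seed X_0 = 1, the 10000th iterate is 1043618065.

```python
M = 2**31 - 1   # modulus 2147483647
a = 7**5        # multiplier 16807
b = 0

def lcg(x):
    """One step X_i = (a X_{i-1} + b) mod M; U_i = X_i / M is the uniform sample."""
    return (a * x + b) % M

# Since b = 0, the seed X_0 must not be 0.
x = 1
for _ in range(10000):
    x = lcg(x)
```

The reproducibility of the whole sequence from the single seed X_0 is exactly what makes stochastic simulations repeatable.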

Method 2: Fibonacci generator

Choose k, l, M ∈ ℕ, let m = max{k, l}, generate X_1, . . . , X_{m−1} with Method 1, and let

X_i = (X_{i−k} + X_{i−l}) mod M,   U_i = X_i/M   (i = m, m + 1, m + 2, . . .)

Matlab: k = 31, l = 63, M = 2⁶⁴, period ≈ 2¹²⁴ > 2 · 10³⁷

Method 3: Combined multiple recursive generator

Choose M_1, M_2, M ∈ ℕ very large and a, b, c, d ∈ ℕ large, and let

X_i = (a X_{i−2} − b X_{i−3}) mod M_1
Y_i = (c Y_{i−1} − d Y_{i−3}) mod M_2
Z_i = (X_i − Y_i) mod (M − 1),   U_i = Z_i/M if Z_i ≠ 0, and U_i = (M − 1)/M if Z_i = 0.

Matlab parameters: p. 108 in [GJ10].

Remark: There are many more methods.

6.1.2 Normal pseudo-random numbers

Idea: Transform uniform pseudo-random numbers to obtain normal pseudo-random numbers.

Method 1: Inversion

Let U ∼ U(0, 1), i.e. P(U ≤ x) = x for all x ∈ [0, 1]. Let F : ℝ → [0, 1] be a strictly increasing probability distribution function. Hence, F^{−1} : [0, 1] → ℝ exists, and if X := F^{−1}(U), then

P(X ≤ x) = P( U ≤ F(x) ) = F(x).


Hence, F is the distribution function of X. Apply this to the normal distribution

F(x) = (1/√(2π)) ∫_{−∞}^x e^{−s²/2} ds.

Problem: There is no explicit formula for F or F^{−1}. Numerical inversion with Newton's method is ill-conditioned: if u ≈ 0 or u ≈ 1, then small perturbations of u cause large perturbations of F^{−1}(u).

Method 2: Box-Muller method

Let X ∈ ℝ^d be a random variable with density f : ℝ^d → ℝ, and let A := {x ∈ ℝ^d : f(x) > 0}. Let g : A → B := g(A) ⊂ ℝ^d be invertible with continuously differentiable inverse g^{−1}. If Y = g(X), then

P(Y ∈ C) = P( g(X) ∈ C ) = P( X ∈ g^{−1}(C) ) = ∫_{g^{−1}(C)} f(x) dx = ∫_C f( g^{−1}(y) ) · |det J_{g^{−1}}(y)| dy

for all Borel sets C ⊂ B, where J_{g^{−1}} denotes the Jacobian of g^{−1}. Hence, the function

y ↦ f( g^{−1}(y) ) · |det J_{g^{−1}}(y)|

is the density of Y = g(X).

Use this to transform the uniform distribution into the normal distribution. Let d = 1, A = [0, 1], f(x) = 1_A(x) and seek g such that for all y ∈ B

f( g^{−1}(y) ) · |det J_{g^{−1}}(y)| = | (g^{−1})′(y) |,   since f(g^{−1}(y)) = 1 for g^{−1}(y) ∈ A,

must equal (1/√(2π)) e^{−y²/2}.

Problem: There is no explicit formula for such a g. Idea: Transform in ℝ² instead of ℝ. Let A = (0, 1) × (0, 1), f(x) = 1_A(x) and

g(x) = ( √(−2 ln x_1) cos(2πx_2), √(−2 ln x_1) sin(2πx_2) ),   x = (x_1, x_2) ∈ A.

The inverse is (exercise)¹

g^{−1}(y) = ( exp( −(y_1² + y_2²)/2 ),  (1/(2π)) arctan(y_2/y_1) ).

¹ The arctan cannot yield negative values because y_1 and y_2 are not arbitrary – these values are coupled according to the definition of g.


and it can be shown (exercise) that

|det J_{g^{−1}}(y)| = (1/(2π)) exp(−(y_1² + y_2²)/2)

is the density of the standard normal distribution in R². Hence:

g_1(X), g_2(X) ∼ N(0, 1) ⟺ X_1, X_2 ∼ U(0, 1).

Box–Muller algorithm (G. E. P. Box and M. E. Muller, 1958): Generate uniformly distributed random numbers U_1, U_2 ∼ U(0, 1) and let

Z_1 = √(−2 ln U_1) cos(2πU_2) ∼ N(0, 1),  Z_2 = √(−2 ln U_1) sin(2πU_2) ∼ N(0, 1).
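The two formulas translate directly into code. A minimal sketch (illustrative, not part of the notes; the function name is chosen freely):

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent U(0,1) samples to two independent
    N(0,1) samples via the Box-Muller transform."""
    r = math.sqrt(-2.0 * math.log(u1))  # radius sqrt(-2 ln U1)
    return (r * math.cos(2.0 * math.pi * u2),
            r * math.sin(2.0 * math.pi * u2))

z1, z2 = box_muller(random.random(), random.random())
```

Note that one pair (U_1, U_2) yields one pair (Z_1, Z_2) of independent standard normal numbers.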

Method 3: Polar method

Goal: Avoid trigonometric functions.

If U_i ∼ U(0, 1), then V_i = 2U_i − 1 ∼ U(−1, 1). Reject (V_1, V_2) if V := V_1² + V_2² ≥ 1. The accepted samples are uniformly distributed on the unit disk (with density f(x) = 1/π), and it can be shown (exercise) that

(W_1, W_2)^T = ( V, (1/(2π)) arctan(V_2/V_1) )^T

is uniformly distributed on (0, 1) × (0, 1). Hence

Z_1 = √(−2 ln W_1) cos(2πW_2) ∼ N(0, 1),  Z_2 = √(−2 ln W_1) sin(2πW_2) ∼ N(0, 1)

and by definition

cos(2πW_2) = V_1/√V,  sin(2πW_2) = V_2/√V.

Polar method (G. Marsaglia): For i ∈ {1, 2} generate uniform random numbers U_i ∈ (0, 1) and let V_i = 2U_i − 1.

• If V := V_1² + V_2² ≥ 1: reject and start again.

• Else: Let

Z_1 = (V_1/√V) √(−2 ln V) ∼ N(0, 1),  Z_2 = (V_2/√V) √(−2 ln V) ∼ N(0, 1).

The probability that V < 1 is π/4. Hence, about 21.46% of the random tuples (V_1, V_2) are rejected. Nevertheless, the polar method is usually more efficient than the standard Box–Muller method, because no trigonometric functions have to be evaluated.
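The rejection loop and the transformation can be sketched as follows (illustrative code, not part of the notes; the `rng` parameter is added so the sampler can be driven by any uniform generator):

```python
import math
import random

def polar_method(rng=random.random):
    """Marsaglia's polar method: rejection sampling on the unit disk,
    then transformation to two independent N(0,1) samples."""
    while True:
        v1 = 2.0 * rng() - 1.0
        v2 = 2.0 * rng() - 1.0
        v = v1 * v1 + v2 * v2
        if 0.0 < v < 1.0:  # reject points outside the unit disk
            # (V_i/sqrt(V)) * sqrt(-2 ln V)
            factor = math.sqrt(-2.0 * math.log(v) / v)
            return v1 * factor, v2 * factor

z1, z2 = polar_method()
```

No cos/sin evaluations are needed, which is the point of the method.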


6.1.3 Correlated normal random vectors

Let X(ω) ∈ Rd, µ ∈ Rd and let Σ ∈ Rd×d be symmetric and positive definite.

Goal: Generate random vectors X ∼ N(µ, Σ), i.e.

P(X ∈ B) = ∫_B 1/√((2π)^d det(Σ)) · exp( −(1/2)(x − µ)^T Σ^{−1} (x − µ) ) dx

for all Borel sets B ⊂ R^d; cf. Definition B.1.1. The matrix ρ ∈ R^{d×d} with entries

ρ_{ij} = Σ_{ij} / √(Σ_{ii} Σ_{jj})

is called the correlation matrix.

Reminder: Every symmetric, positive definite matrix Σ ∈ R^{d×d} has a Cholesky decomposition Σ = L L^T with a lower triangular matrix L, i.e. L_{ij} = 0 if i < j. Proof by induction (exercise).

If z ∈ R^d and x = Lz, then

z^T z = (L^{−1}x)^T (L^{−1}x) = x^T (LL^T)^{−1} x = x^T Σ^{−1} x.

For A ⊂ R^d and B := {z = L^{−1}x : x ∈ A} we have

∫_B 1/√((2π)^d) · exp(−(1/2) z^T z) dz = ∫_A 1/√((2π)^d) · 1/|det(L)| · exp(−(1/2) x^T Σ^{−1} x) dx
                                       = ∫_A 1/√((2π)^d det(Σ)) · exp(−(1/2) x^T Σ^{−1} x) dx

because (det(L))² = det(L) · det(L^T) = det(Σ). Consequence:

Z ∼ N (0, I) =⇒ X = LZ ∼ N (0,Σ) =⇒ X + µ ∼ N (µ,Σ).
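The consequence above is exactly how correlated normal vectors are generated in practice: factorize Σ once, then map i.i.d. standard normals. A self-contained sketch in pure Python (illustrative code, not part of the notes):

```python
import math

def cholesky(sigma):
    """Cholesky factor L of a symmetric positive definite matrix,
    Sigma = L L^T, computed row by row."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def correlated_normal(mu, L, z):
    """X = mu + L z is N(mu, Sigma) when z holds i.i.d. N(0,1) samples.
    Only the lower triangle of L is used."""
    d = len(mu)
    return [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
```

In production one would of course use a library routine (e.g. a LAPACK-based Cholesky factorization) instead of the hand-written loop.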


6.2 Monte-Carlo integration and variance reduction

Example: European basket call with d ∈ N underlyings modelled by geometric Brownian motion

dS(t) = rS dt + diag(σ) diag(S) L dW(t),  S(t⋆) = S⋆

with σ = (σ_1, . . . , σ_d), r > 0, S = (S_1, . . . , S_d), dW = (dW_1, . . . , dW_d) and a lower triangular matrix L ∈ R^{d×d}.

Payoff function (cf. 1.2):

ψ(x) = ( Σ_{i=1}^d c_i x_i − K )_+,  c_i > 0.

As in the scalar case (cf. 3.4), it can be shown that the value of the option is the discounted expected payoff

V(t⋆, S⋆) = e^{−r(T−t⋆)} E_Q(ψ(S_T)) = e^{−r(T−t⋆)} ∫_0^∞ ⋯ ∫_0^∞ ψ(x)φ(x, ξ, β) dx_1 ⋯ dx_d,  g(x) := ψ(x)φ(x, ξ, β),

where φ is the multivariate log-normal distribution. The parameters ξ and β depend on S⋆, T − t⋆, and on the covariance matrix Σ = LL^T. In order to price the option, we thus have to approximate the d-dimensional integral

∫_0^∞ ⋯ ∫_0^∞ g(x) dx_1 ⋯ dx_d.

Approximation by quadrature as in 5.1:

• Truncation: Choose 0 ≤ x_min^{(i)} < x_max^{(i)} such that

∫_{x_min^{(d)}}^{x_max^{(d)}} ⋯ ∫_{x_min^{(1)}}^{x_max^{(1)}} g(x) dx_1 ⋯ dx_d ≈ ∫_0^∞ ⋯ ∫_0^∞ g(x) dx_1 ⋯ dx_d

• Discretization: For every i = 1, . . . , d choose a large N^{(i)} ∈ N, let h^{(i)} = (x_max^{(i)} − x_min^{(i)})/N^{(i)} and x_k^{(i)} = x_min^{(i)} + k h^{(i)}

• Approximate by quadrature (here: midpoint rule):

∫_{x_min^{(d)}}^{x_max^{(d)}} ⋯ ∫_{x_min^{(1)}}^{x_max^{(1)}} g(x) dx_1 ⋯ dx_d ≈ h^{(1)} · … · h^{(d)} Σ_{k_1=1}^{N^{(1)}} ⋯ Σ_{k_d=1}^{N^{(d)}} g( x_{k_1−1}^{(1)} + (1/2)h^{(1)}, . . . , x_{k_d−1}^{(d)} + (1/2)h^{(d)} )


Problem: Function evaluations at N^{(1)} · … · N^{(d)} points are needed, e.g. N^d evaluations if N^{(i)} = N for all i. Exponential growth for d → ∞: the “curse of dimension”. Very expensive or impossible for d ≫ 1.

Solutions? Sparse grids (→ summer term) or (Quasi-)Monte-Carlo integration.

Monte Carlo integration

Consider a bounded domain D ⊂ R^d, a function f : D → R and a density φ : D → R. As in 5.1 approximate

E_φ(f) := ∫_D f(x)φ(x) dx ≈ (1/m) Σ_{j=1}^m f(X_j)  (6.1)

where the X_j ∈ D are random vectors with

P(X_j ∈ A) = ∫_A φ(x) dx  for all measurable A ⊂ D.
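The estimator (6.1) takes only a few lines of code. A minimal sketch (illustrative, not part of the notes; the test integrand is an arbitrary choice):

```python
import random

def mc_integrate(f, sample, m):
    """Monte Carlo approximation of E_phi(f) = int f(x) phi(x) dx,
    where sample() draws one X_j with density phi."""
    return sum(f(sample()) for _ in range(m)) / m

# Example: int_0^1 x^2 dx = 1/3, with phi the uniform density on [0,1]
random.seed(0)
est = mc_integrate(lambda x: x * x, random.random, 20000)
```

The same function works for any dimension d; only `sample` and `f` change, which is why Monte Carlo does not suffer from the curse of dimension in the same way as tensor-product quadrature.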

Question: How accurate is this approximation?

For simplicity consider only the case d = 1. If d > 1, then the following result can be applied to each entry of the random vector.

Lemma 6.2.1 (Chebyshev’s inequality) Let (Ω, F, P) be a probability space, and let δ > 0. If Z : Ω → R is square integrable, i.e. if ∫_Ω |Z|² dP exists, then

P(|Z − E(Z)| ≥ δ) ≤ V(Z)/δ²

where V(Z) = E(|Z − E(Z)|²) is the variance of Z.

Proof. Define χ_δ(ω) = 1 if |Z(ω) − E(Z)| ≥ δ, and χ_δ(ω) = 0 else. Then χ_δ(ω) ≤ |Z(ω) − E(Z)|/δ for all ω ∈ Ω by construction, and hence

P(|Z − E(Z)| ≥ δ) = ∫_Ω χ_δ(ω) dP(ω) = ∫_Ω χ_δ²(ω) dP(ω) ≤ (1/δ²) ∫_Ω |Z(ω) − E(Z)|² dP(ω) = V(Z)/δ².


Now let Y_m := (1/m) Σ_{j=1}^m f(X_j) ≈ E_φ(f) and assume that the X_j are independent with

E(f(X_j)) = E_φ(f),  V(f(X_j)) = σ²

for all j = 1, . . . , m and some σ > 0. Consequence:

E(Y_m) = E_φ(f),
V(Y_m) = V( (1/m) Σ_{j=1}^m f(X_j) ) = (1/m²) Σ_{j=1}^m V(f(X_j)) = σ²/m.

Applying Lemma 6.2.1 to Y_m yields for all δ > 0

P(|Y_m − E_φ(f)| ≥ δ) ≤ V(Y_m)/δ² = σ²/(δ²m).

Now choose ε > 0 and let δ := σ/√(εm):

P(|Y_m − E_φ(f)| ≥ σ/√(εm)) ≤ ε

or equivalently

P(|Y_m − E_φ(f)| < σ/√(εm)) > 1 − ε.

Interpretation: Good approximation with high probability ⟺ ε small and σ/√(εm) small.

Slow convergence: For fixed ε and σ, reducing the error by a factor of 10 comes at the cost of increasing the number of samples by a factor of 100.

Variance reduction

Idea: Try to decrease σ to improve the accuracy.

Method 1: Decomposition

Let g : D → R be a function such that

E_φ(g) = ∫_D g(x)φ(x) dx ≈ ∫_D f(x)φ(x) dx = E_φ(f)


and such that E_φ(g) can be computed analytically. Let

Y_m = (1/m) Σ_{j=1}^m f(X_j) ≈ E_φ(f) (as before),
Y_m^⋆ = (1/m) Σ_{j=1}^m g(X_j) ≈ E_φ(g),

and approximate

E_φ(f) ≈ Ỹ_m := Y_m − Y_m^⋆ + E_φ(g).

Let Cov(Z_1, Z_2) be the covariance of two random variables Z_1 and Z_2, i.e.

Cov(Z_1, Z_2) = E( (Z_1 − E(Z_1))(Z_2 − E(Z_2)) ).

Since

0 ≤ V(Z_1 − Z_2) = V(Z_1) + V(Z_2) − 2 Cov(Z_1, Z_2)  (6.2)

it follows that

Cov(Z_1, Z_2) ≤ (1/2)( V(Z_1) + V(Z_2) ).  (6.3)

Idea: If g ≈ f, then we expect that Cov(Y_m, Y_m^⋆) is nearly maximal, i.e.

Cov(Y_m, Y_m^⋆) ≈ (1/2)( V(Y_m) + V(Y_m^⋆) ).

Hence, the new estimator Ỹ_m has a smaller variance: since V(E_φ(g)) = 0, (6.2) gives

V(Ỹ_m) = V(Y_m − Y_m^⋆) = V(Y_m) + V(Y_m^⋆) − 2 Cov(Y_m, Y_m^⋆) ≈ 0.
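The decomposition estimator can be sketched in a few lines. In the example below, f(x) = e^x on U(0, 1) and the control g(x) = 1 + x with E(g) = 3/2 are illustrative choices, not taken from the lecture:

```python
import math
import random

def decomposition_estimate(f, g, eg_exact, sample, m):
    """Decomposition estimator: E(f) ~ (1/m) sum_j (f(X_j) - g(X_j)) + E(g),
    where E(g) = eg_exact is known analytically."""
    acc = 0.0
    for _ in range(m):
        x = sample()
        acc += f(x) - g(x)  # only the residual f - g is sampled
    return acc / m + eg_exact

# Exact value: int_0^1 exp(x) dx = e - 1
random.seed(1)
est = decomposition_estimate(math.exp, lambda x: 1.0 + x, 1.5, random.random, 5000)
```

Because only the small residual f − g is estimated by sampling, the variance of the estimator is much smaller than for plain Monte Carlo with the same m.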

Method 2: Antithetic variates

Assumption: Y_m is generated with normal random variables X_j ∼ N(0, 1).

Since −X_j ∼ N(0, 1), too, we define Y_m^− := (1/m) Σ_{j=1}^m f(−X_j) ≈ E_φ(f) and put Ỹ_m = (1/2)(Y_m + Y_m^−). Applying

V(Z_1 + Z_2) = V(Z_1) + V(Z_2) + 2 Cov(Z_1, Z_2)

we obtain, using V(Y_m^−) = V(Y_m),

V(Ỹ_m) = (1/4) V(Y_m + Y_m^−) = (1/4)( V(Y_m) + V(Y_m^−) + 2 Cov(Y_m, Y_m^−) )
        = (1/2)( V(Y_m) + Cov(Y_m, Y_m^−) ).  (6.4)


If Cov(Y_m, Y_m^−) > 0, then (6.3) yields

Cov(Y_m, Y_m^−) ≤ (1/2)( V(Y_m) + V(Y_m^−) ) = V(Y_m)

and hence it follows from (6.4) that

V(Ỹ_m) ≤ V(Y_m) (⟹ at least not worse).

If Cov(Y_m, Y_m^−) ≤ 0, then (6.4) yields

V(Ỹ_m) ≤ (1/2) V(Y_m) (⟹ smaller variance).
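A minimal sketch of the antithetic estimator (illustrative, not part of the notes): each draw X_j is paired with −X_j, which has the same distribution.

```python
import random

def antithetic_estimate(f, m, rng=random.gauss):
    """Antithetic estimator tilde Y_m = (Y_m + Y_m^-)/2 with X_j ~ N(0,1)."""
    acc = 0.0
    for _ in range(m):
        x = rng(0.0, 1.0)
        acc += 0.5 * (f(x) + f(-x))  # pair X_j with -X_j
    return acc / m

# For f with an odd part plus a constant, the odd part cancels exactly:
random.seed(2)
est = antithetic_estimate(lambda x: x + 1.0, 100)
```

For f(x) = x + 1 the pairing gives (f(x) + f(−x))/2 = 1 for every sample, i.e. zero variance; this is the extreme case Cov(Y_m, Y_m^−) = −V(Y_m).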

Variance reduction by antithetic variates for SDEs

For SDE-based Monte-Carlo methods (cf. section 5.5), the mean-square error depends on V(ψ(S_N)). This can be reduced by approximating the SDE

dS(t) = f(t, S(t)) dt + g(t, S(t)) dW(t),  t ∈ [0, T],

with Euler–Maruyama and antithetic variates:

S^+_{n+1} = S^+_n + τ f(t_n, S^+_n) + g(t_n, S^+_n) ΔW_n,
S^−_{n+1} = S^−_n + τ f(t_n, S^−_n) − g(t_n, S^−_n) ΔW_n,  n = 0, . . . , N − 1,

and then using the values S̃_n = (1/2)(S^+_n + S^−_n).
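The paired scheme can be sketched as follows (illustrative code, not part of the notes; the GBM parameters at the end are arbitrary):

```python
import math
import random

def euler_maruyama_antithetic(f, g, s0, T, N, rng=random.gauss):
    """One pair of Euler-Maruyama paths driven by +Delta W_n and -Delta W_n;
    returns the averaged terminal value (S_N^+ + S_N^-)/2."""
    tau = T / N
    sp = sm = s0
    for n in range(N):
        t = n * tau
        dw = math.sqrt(tau) * rng(0.0, 1.0)  # Delta W_n ~ N(0, tau)
        sp = sp + tau * f(t, sp) + g(t, sp) * dw
        sm = sm + tau * f(t, sm) - g(t, sm) * dw
    return 0.5 * (sp + sm)

# Geometric Brownian motion dS = r S dt + sigma S dW, r = 0.05, sigma = 0.2
random.seed(3)
s_bar = euler_maruyama_antithetic(
    lambda t, s: 0.05 * s, lambda t, s: 0.2 * s, 1.0, 1.0, 100)
```

Both paths reuse the same increments ΔW_n, so the extra cost per sample is roughly one additional drift/diffusion evaluation, not a second set of random numbers.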

6.3 Quasi Monte Carlo methods

Let C = [0, 1]^d be the d-dimensional unit cube. Monte Carlo integration²:

E(f) := ∫_C f(x) dx ≈ (1/m) Σ_{j=1}^m f(X_j)

with uniformly distributed random vectors X_j ∈ C.

Problem: Tuples of uniform random numbers are usually not homogeneously distributed in space. Quasi Monte Carlo methods use the same formula, but replace the random vectors X_j by deterministic low-discrepancy point sequences.

² This corresponds to (6.1) with φ(x) = 1_C(x)/vol(C). Since vol(C) = 1 and 1_C(x) = 1 for all x ∈ C, the density φ can be omitted.


[Figure: quasi-random numbers vs. pseudo-random numbers in the unit square [0, 1]²]

Definition 6.3.1 (Discrepancy)

1. Let 𝓡 be the set of all axially parallel d-dimensional rectangles R ⊂ C. The discrepancy of the points x_1, . . . , x_m ∈ C ⊂ R^d is

D_m := sup_{R∈𝓡} | (# of x_i in R)/m − vol(R) |

where vol(R) = ∫_R 1 dx denotes the volume of R.

2. The star discrepancy D_m^⋆ is defined as D_m, but the supremum is only taken over those R for which (0, . . . , 0) is one of the corners.

3. A sequence (x_k)_{k∈N} of points x_k ∈ C is called a low-discrepancy sequence if

D_m = O( (log m)^d / m ).

In this case, the x_k are called quasi-random vectors.

Properties (without proofs):

• D_m^⋆ ≤ D_m ≤ 2^d D_m^⋆

• The Koksma–Hlawka theorem provides the deterministic error bound

| E(f) − (1/m) Σ_{j=1}^m f(X_j) | ≤ TV(f) · D_m^⋆

where TV(f) is the total variation of f; cf. (B.2). Numerical tests show that this bound is often too pessimistic.

• It can be shown that

E(D_m) = O( √((log log m)/m) )

for randomly chosen sequences.


Examples:

For d = 1 the sequence with

x_j = (2j − 1)/(2m),  j = 1, . . . , m,

has D_m^⋆ = 1/(2m). This value is optimal. But: this sequence can only be used if m is known a priori, and if m is changed, then all values change.

Van der Corput sequence:

1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, 1/16, . . .

Algorithm: Represent the index j ∈ N as a binary number

j = Σ_{k=0}^L d_k 2^k = (d_L d_{L−1} . . . d_1 d_0)_2,  d_k ∈ {0, 1},

and define

η_2(j) = Σ_{k=0}^L d_k 2^{−k−1} = (.d_0 d_1 . . . d_L)_2.

Interpretation: Reverse the binary digits and put the radix point in front of the sequence. Example: j = 6 = (110)_2 yields d_2 = d_1 = 1, d_0 = 0, and hence η_2(6) = 0 + 1/4 + 1/8 = 3/8.

Generalization: For an arbitrary base b ∈ N define the radical-inverse function

η_b(j) = Σ_k d_k b^{−k−1}

where d_k ∈ {0, 1, . . . , b − 1} are the coefficients from the representation j = Σ_k d_k b^k.

The Halton sequence generates quasi-random vectors in the hypercube C = [0, 1]^d by letting

x_j = ( η_{p_1}(j), . . . , η_{p_d}(j) )

where p_1, . . . , p_d are prime numbers with p_i ≠ p_j for i ≠ j.

Other possibility: Sobol sequence
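The radical-inverse function and the Halton construction fit into a few lines (illustrative sketch, not part of the notes; the default prime bases are an arbitrary choice for d = 3):

```python
def radical_inverse(j, b):
    """Radical-inverse function eta_b(j): reverse the base-b digits of j
    and place them after the radix point."""
    x, scale = 0.0, 1.0 / b
    while j > 0:
        j, d = divmod(j, b)  # peel off the lowest base-b digit
        x += d * scale
        scale /= b
    return x

def halton(j, primes=(2, 3, 5)):
    """j-th Halton point in [0,1]^d for pairwise distinct prime bases."""
    return tuple(radical_inverse(j, p) for p in primes)
```

For b = 2 this reproduces the van der Corput sequence, e.g. η_2(6) = 3/8.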


Chapter 7

Finite-difference methods for parabolic differential equations

7.1 Motivation and model problem

Reminder (cf. 3.2): The value of a European option is the solution of the Black-Scholes equation

∂_t V(t, S) + (σ²/2) S² ∂_S² V(t, S) + rS ∂_S V(t, S) − rV(t, S) = 0

with terminal condition

V(T, S) = ψ(S) (payoff function).

Notation: T > 0 maturity, r > 0 interest rate, σ ∈ R volatility, S price of the underlying.

More complicated market models (e.g. with volatility σ = σ(t, S)) lead to similar PDEs for which no solution formulas are available; cf. 3.5.

Question: Numerical methods?

Basic types of PDEs

• Elliptic PDEs: Poisson equation

−∆u(x) = f(x), f(x) given, with the Laplace operator ∆u = Σ_{k=1}^d ∂²_{x_k} u(x)

• Parabolic PDEs: Heat equation

∂_t u(t, x) = ∆u(t, x)


• Hyperbolic PDEs: Wave equation

∂_t² u(t, x) = ∆u(t, x)

The Black-Scholes equation is a parabolic PDE and can be transformed to the heat equation; cf. 3.3. Therefore, we will consider the following

Model problem: Heat equation on an interval with Dirichlet boundary conditions.

∂_t u(t, x) = ∂_x² u(t, x),  t ∈ [0, t_end], x ∈ (a, b)  (PDE)  (7.1a)
u(t, a) = u_a(t), u(t, b) = u_b(t),  t ∈ [0, t_end]  (boundary conditions)  (7.1b)
u(0, x) = u_0(x),  x ∈ [a, b]  (initial condition)  (7.1c)

The parameters a, b, t_end, the boundary values u_a(t) and u_b(t) and the initial data u_0(x) are given.

Notation: We say that g ∈ C^j([a, b]) if and only if x ↦ g(x) is j times continuously differentiable on (a, b) and all derivatives can be extended to [a, b]. Moreover, let

‖g‖_∞ = max_{x∈[a,b]} |g(x)|

denote the maximum norm on [a, b].

7.2 Space discretization with finite differences

Choose 1 < m ∈ N, let h = (b − a)/m and x_k = a + k·h for k = 0, . . . , m.
Goal: Find v : [0, t_end] → R^{m−1} such that the entries v_k(t) of the vector v(t) approximate the values of the exact solution at the inner grid points, i.e.

v_k(t) ≈ u(t, x_k),  k = 1, . . . , m − 1,

for all t ∈ [0, t_end]. Approximate spatial derivatives by difference quotients.

Lemma 7.2.1 (difference quotients) The derivatives of a function y : [a, b] → R can be approximated as follows:

• If y ∈ C²([a, b]), then

max_{k=1,...,m−1} | y′(x_k) − (y(x_k + h) − y(x_k))/h | ≤ Ch ‖y″‖_∞  (7.2)

• If y ∈ C²([a, b]), then

max_{k=1,...,m−1} | y′(x_k) − (y(x_k) − y(x_k − h))/h | ≤ Ch ‖y″‖_∞  (7.3)


• If y ∈ C³([a, b]), then

max_{k=1,...,m−1} | y′(x_k) − (y(x_k + h) − y(x_k − h))/(2h) | ≤ Ch² ‖d³y/dx³‖_∞  (7.4)

• If y ∈ C⁴([a, b]), then

max_{k=1,...,m−1} | y″(x_k) − (y(x_k + h) − 2y(x_k) + y(x_k − h))/h² | ≤ Ch² ‖d⁴y/dx⁴‖_∞  (7.5)

The constant C is independent of h and can have different values in each case.

Proof: Use Taylor’s theorem (exercise).

Since y(x_k ± h) = y(x_{k±1}), applying (7.5) to the heat equation yields

∂_t u(t, x_k) = ∂_x² u(t, x_k) ≈ ( u(t, x_{k+1}) − 2u(t, x_k) + u(t, x_{k−1}) )/h²  (7.6)

for all k = 1, . . . , m − 1 and t ∈ [0, t_end]. The boundary values u(t, x_0) = u_a(t) and u(t, x_m) = u_b(t) are known from (7.1b).

Reformulation in matrix-vector notation: Define

u(t) = ( u(t, x_1), . . . , u(t, x_{m−1}) )^T ∈ R^{m−1}  (7.7)
g(t) = ( u_a(t), 0, 0, . . . , 0, 0, u_b(t) )^T ∈ R^{m−1}  (7.8)

and the tridiagonal matrix A ∈ R^{(m−1)×(m−1)} with entries −2 on the diagonal, 1 on the first sub- and superdiagonal, and 0 elsewhere.  (7.9)

Then (7.6) is equivalent (check!) to

u′(t) ≈ (1/h²) A u(t) + (1/h²) g(t).

The approximation v(t) ≈ u(t) is now defined as the solution of the initial value problem

v′(t) = (1/h²) A v(t) + (1/h²) g(t)  (7.10a)
v(0) = u(0).  (7.10b)

Hence, the space discretization turns the PDE into an ODE.
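The semi-discrete right-hand side (1/h²)(A v + g) never needs the matrix A explicitly; the stencil can be applied directly. A minimal sketch (illustrative, not part of the notes; the function name is chosen freely):

```python
def heat_rhs(v, h, ga, gb):
    """Right-hand side of the semi-discrete heat equation (7.10a):
    (A v + g)/h^2, where A is the tridiagonal matrix (7.9) and the
    vector g carries the Dirichlet boundary values ga = u_a(t), gb = u_b(t)."""
    n = len(v)
    out = []
    for k in range(n):
        left = v[k - 1] if k > 0 else ga      # boundary value enters via g
        right = v[k + 1] if k < n - 1 else gb
        out.append((left - 2.0 * v[k] + right) / (h * h))
    return out
```

Any ODE solver can now be applied to v′ = heat_rhs(v, h, u_a(t), u_b(t)), which is exactly the method-of-lines viewpoint of this section.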


Properties of the discretization matrix

Lemma 7.2.2 For k = 1, . . . , m − 1, the vectors

ν_k := ( sin(kπ/m), sin(2kπ/m), . . . , sin((m−1)kπ/m) )^T ∈ R^{m−1}

are eigenvectors of the matrix A, and the corresponding eigenvalues are

λ_k := 2( cos(kπ/m) − 1 ) ∈ (−4, 0).

Proof: Exercise.

Definition 7.2.3 (scaled norm) Let |·|_m be the scaled vector norm defined by

|v|_m = ( (1/(m−1)) Σ_{i=1}^{m−1} v_i² )^{1/2} = √(v^T v)/√(m−1),  v ∈ R^{m−1}.

For m = 2, this coincides with the 2-norm of a vector without scaling, i.e. |v|_2 = √(v^T v).

The induced matrix norm of a matrix M ∈ R^{(m−1)×(m−1)} is again denoted by |M|_m. Note that

|M|_m = sup_{v≠0} |Mv|_m/|v|_m = sup_{v≠0} ( (1/√(m−1)) √(v^T M^T M v) ) / ( (1/√(m−1)) √(v^T v) ) = sup_{v≠0} |Mv|_2/|v|_2 = |M|_2.  (7.11)

Remark: If f ∈ L²([a, b]) is square integrable with f(a) = f(b) = 0 and v_k = f(x_k) for k = 1, . . . , m − 1, then

‖f‖_{L²([a,b])} = ( ∫_a^b |f(x)|² dx )^{1/2} ≈ ( ((b−a)/m) Σ_{k=1}^{m−1} |f(x_k)|² )^{1/2} ≈ √(b − a) · |v|_m

for sufficiently large m.

Lemma 7.2.4 For any s ≥ 0, 1 < m ∈ N and h = (b − a)/m we have

|e^{sA}|_m ≤ 1,  |(I − sA)^{−1}|_m ≤ 1

where e^M = Σ_{k=0}^∞ M^k/k! is the matrix exponential function and I is the identity matrix.


Proof. Since A is symmetric, there is an orthogonal matrix Q such that A = QΛQ^T with

Λ = diag(λ_1, . . . , λ_{m−1}),  λ_k = 2( cos(kπ/m) − 1 ) < 0.

Hence,

A² = QΛQ^T QΛQ^T = QΛ²Q^T,  A^k = QΛ^k Q^T,

and as a consequence

e^{sA} = Σ_{k=0}^∞ Q (sΛ)^k/k! Q^T = Q e^{sΛ} Q^T.

Since Q is orthogonal, it follows from (7.11) that |Q|_m = |Q|_2 = 1. Since e^{sΛ} is a diagonal matrix and sλ_k ≤ 0, we have¹

|e^{sΛ}|_m = max_{k=1,...,m−1} |e^{sλ_k}| ≤ 1.

This yields the bound

|e^{sA}|_m ≤ |Q|_m · |e^{sΛ}|_m · |Q^T|_m ≤ 1.

In a similar way, it can be checked that

(I − sA)^{−1} = Q (I − sΛ)^{−1} Q^T

which yields the bound

|(I − sA)^{−1}|_m ≤ |Q|_m · |(I − sΛ)^{−1}|_m · |Q^T|_m = max_{k=1,...,m−1} 1/(1 − sλ_k) ≤ 1

since sλ_k ≤ 0.

¹ If D = diag(d_1, . . . , d_{m−1}) is a diagonal matrix and d_max := max_{k=1,...,m−1} |d_k| is the largest (in modulus) diagonal entry, then for every v ∈ R^{m−1}

|Dv|_2 = √( Σ_{k=1}^{m−1} (d_k v_k)² ) ≤ √( Σ_{k=1}^{m−1} (d_max v_k)² ) = d_max |v|_2.

This means that |D|_2 ≤ d_max. Now suppose that ℓ is the index where the maximum is attained, i.e. d_max = |d_ℓ|. If we choose v to be the ℓ-th canonical basis vector (i.e. v_k = 0 for k ≠ ℓ and v_ℓ = 1), then |Dv|_2 = d_max|v|_2. Hence, |D|_2 = d_max, and with (7.11) we also obtain |D|_m = d_max.


Theorem 7.2.5 (error of the space discretization) Let u = u(t, x) be the solution of the model problem (7.1a)–(7.1c). Assume that t ↦ u(t, x) ∈ C¹([0, t_end]) and that x ↦ u(t, x) ∈ C⁴([a, b]) for all t ∈ [0, t_end] with

sup_{s∈[0,t_end]} ‖∂_x⁴ u(s, ·)‖_∞ < ∞.

Then, there is a constant C such that

|u(t) − v(t)|_m ≤ C t_end h² sup_{s∈[0,t_end]} ‖∂_x⁴ u(s, ·)‖_∞

for all t ∈ [0, t_end].

Proof. By definition of u we have

u_k′(t) = ∂_t u(t, x_k) = ∂_x² u(t, x_k) = ( u(t, x_{k+1}) − 2u(t, x_k) + u(t, x_{k−1}) )/h² + r_k(t)

for k = 1, . . . , m − 1, with the remainder term r_k(t) bounded by

|r_k(t)| ≤ Ch² ‖∂_x⁴ u(t, ·)‖_∞

according to Lemma 7.2.1. Setting r := (r_1, . . . , r_{m−1})^T, this is equivalent to

u′(t) = (1/h²) A u(t) + (1/h²) g(t) + r(t).

Comparing with (7.10a) shows that the error solves the ODE

(d/dt)( u(t) − v(t) ) = (1/h²) A ( u(t) − v(t) ) + r(t)

with initial value u(0) − v(0) = 0. The solution is given by the variation-of-constants formula

u(t) − v(t) = e^{tA/h²}( u(0) − v(0) ) + ∫_0^t e^{(t−s)A/h²} r(s) ds = ∫_0^t e^{(t−s)A/h²} r(s) ds,

and hence, using |e^{(t−s)A/h²}|_m ≤ 1 from Lemma 7.2.4,

|u(t) − v(t)|_m ≤ ∫_0^t |e^{(t−s)A/h²}|_m |r(s)|_m ds ≤ ∫_0^t Ch² ‖∂_x⁴ u(s, ·)‖_∞ ds ≤ C t_end h² sup_{s∈[0,t_end]} ‖∂_x⁴ u(s, ·)‖_∞.

Remark. The regularity assumptions are very strong and exclude payoff functions asinitial data.


7.3 Time discretization with Runge-Kutta methods

(a) Runge-Kutta methods

Consider the initial value problem

y′(t) = f(t, y),  t ∈ [t_0, t_end],  y(t_0) = y_0  (7.12)

with an appropriate function f : [t_0, t_end] × R^d → R^d (e.g. y ↦ f(t, y) Lipschitz continuous). Choose N ∈ N, let τ = (t_end − t_0)/N and t_n = t_0 + nτ.

Goal: Find approximations y_n ≈ y(t_n), n = 0, 1, . . . , N. Ansatz:

y(t_{n+1}) = y(t_n) + ∫_{t_n}^{t_{n+1}} y′(s) ds = y(t_n) + ∫_{t_n}^{t_{n+1}} f(s, y(s)) ds.

Approximate the integral by the quadrature formula

y(t_n + τ) ≈ y(t_n) + τ Σ_{i=1}^s b_i f( t_n + c_iτ, y(t_n + c_iτ) )

with s ∈ N and coefficients b_i and c_i. Apply the same procedure to y(t_n + c_iτ): Approximate

y(t_n + c_iτ) = y(t_n) + ∫_{t_n}^{t_n+c_iτ} f(s, y(s)) ds ≈ y(t_n) + τ Σ_{j=1}^s a_{ij} f( t_n + c_jτ, y(t_n + c_jτ) ),  i = 1, . . . , s,

with coefficients a_{ij}. This yields the Runge-Kutta method: For each n = 0, . . . , N − 1 solve the nonlinear system

Y_i = y_n + τ Σ_{j=1}^s a_{ij} f( t_n + c_jτ, Y_j ),  i = 1, . . . , s  (7.13a)

(e.g. by a version of Newton’s method) and let

y_{n+1} = y_n + τ Σ_{i=1}^s b_i f( t_n + c_iτ, Y_i ).  (7.13b)

Each Runge-Kutta method is characterized by its coefficients a_{ij}, b_i, c_j. These are represented in the Butcher tableau:

c_1 | a_11 · · · a_1s
 ⋮  |  ⋮          ⋮
c_s | a_s1 · · · a_ss
----+---------------
    | b_1  · · · b_s


Examples:

• Explicit Euler method: y_{n+1} = y_n + τf(t_n, y_n)

• Implicit Euler method: y_{n+1} = y_n + τf(t_{n+1}, y_{n+1})

• Trapezoidal rule: y_{n+1} = y_n + (τ/2)( f(t_n, y_n) + f(t_{n+1}, y_{n+1}) )

Butcher tableaus (explicit Euler, implicit Euler, trapezoidal rule):

0 | 0        1 | 1        0 | 0    0
--+--        --+--        1 | 1/2  1/2
  | 1          | 1        --+---------
                            | 1/2  1/2

The Runge-Kutta method (7.13a), (7.13b) is explicit if a_{ij} = 0 for all j ≥ i.

Notation: The approximation of a Runge-Kutta method after n steps with step-size τ and initial value y⋆ is denoted by Φ_τ^n(y⋆).

(b) Order and order conditions

Definition 7.3.1 A Runge-Kutta method applied to (7.12) has order p if

‖y(t_1) − y_1‖ ≤ Cτ^{p+1}  (7.14)

for all sufficiently smooth functions f and sufficiently small step-sizes τ > 0. The constant may depend on f but must be independent of τ.

Remark: The term ‖y(t_1) − y_1‖ = ‖y(t_1) − Φ_τ(y_0)‖ is the local error, i.e. the approximation error after only one step of the method. If the method is stable, then (7.14) implies the bound

‖y(t_n) − y_n‖ ≤ Cτ^p,  y_n = Φ_τ^n(y_0),

for the global error after n steps; cf. chapter II.3 in [HNW10].

Order conditions (cf. chapter II.2 in [HNW10]):

Order 1: Σ_{j=1}^s b_j = 1,  Σ_{j=1}^s a_{ij} = c_i for all i = 1, . . . , s  (7.15)

Order 2: Σ_{j=1}^s b_j c_j = 1/2 and (7.15)  (7.16)

Order 3: Σ_{j=1}^s b_j c_j² = 1/3,  Σ_{i=1}^s b_i Σ_{j=1}^s a_{ij} c_j = 1/6 and (7.15), (7.16)  (7.17)

More complicated conditions for higher order. Both Euler methods have order 1, the trapezoidal rule has order 2.


(c) A-stability

Consider the heat equation after space discretization (cf. (7.10))

v′(t) = (1/h²) A v(t) + (1/h²) g(t).

For simplicity: Assume homogeneous boundary conditions in (7.1b), i.e. u(t, a) = u(t, b) = 0 and hence g(t) = 0. As in the proof of Lemma 7.2.4 we consider the eigendecomposition

A = QΛQ^T,  Λ = diag(λ_1, . . . , λ_{m−1}),  λ_k < 0  (7.18)

with |Q|_m ≤ 1 and λ_k ∈ (−4, 0). The exact solution

v(t) = exp(tA/h²) v(0) = Q exp(tΛ/h²) Q^T v(0)

remains bounded for all t ≥ 0 because

|v(t)|_m ≤ |exp(tΛ/h²)|_m · |v(0)|_m ≤ |v(0)|_m,  since |exp(tΛ/h²)|_m ≤ 1.

Does the numerical approximation have the same property?

Explicit Euler method:

v_{n+1} = v_n + (τ/h²) A v_n,  Φ_τ^n(v_0) = ( I + (τ/h²)A )^n v_0.

The approximations are bounded by

|v_n|_m ≤ | I + (τ/h²)A |_m^n |v_0|_m = | I + (τ/h²)Λ |_m^n |v_0|_m = max_{k=1,...,m−1} | 1 + τλ_k/h² |^n |v_0|_m.

Hence, the numerical solution remains bounded for all n ∈ N if

| 1 + τλ_k/h² | ≤ 1 ⟺ τ ≤ 2h²/|λ_k|

for all k = 1, . . . , m − 1. Since max_{k=1,...,m−1} |λ_k| ≈ 4, we obtain the stability condition

τ ≤ h²/2.

This is a severe restriction, because h ≪ 1 must be small to ensure an acceptable accuracy of the spatial approximation. For larger step-sizes, the norm of the numerical solution may tend to ∞ whereas the exact solution remains bounded. Reducing the step-size τ, however, increases the number of steps and hence the numerical costs. Inefficient!


Definition 7.3.2 (A-stability) The initial value problem

y′ = λy,  t ≥ 0,  y(0) = y_0,  (7.19)

is called Dahlquist’s test equation. A one-step method (e.g. Runge-Kutta) is called A-stable if the numerical solution (y_n)_{n∈N} of (7.19) with arbitrary λ ∈ C, Re(λ) ≤ 0 and arbitrary step-size τ > 0 remains bounded for all n ∈ N.

Remark. If ν_k is the k-th eigenvector of A, then

v′(t) = (1/h²) A v(t),  v(0) = ν_k

is equivalent to (7.19) with λ := λ_k/h² < 0.

The explicit Euler method (and every other explicit Runge-Kutta method) is not A-stable.

Implicit Euler method:

The implicit Euler method

v_{n+1} = v_n + (τ/h²) A v_{n+1}, or equivalently ( I − (τ/h²)A ) v_{n+1} = v_n,

is the simplest A-stable method. Hence, we expect that the corresponding numerical approximation of

v′(t) = (1/h²) A v(t)

remains bounded without any step-size restrictions. This is indeed the case: Lemma 7.2.4 implies that

|v_n|_m ≤ | ( I − (τ/h²)A )^{−1} |_m |v_{n−1}|_m ≤ |v_{n−1}|_m

and hence |v_n|_m ≤ |v_0|_m for all n ∈ N.

Trapezoidal rule:

The trapezoidal rule is A-stable (exercise).
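The stability condition τ ≤ h²/2 for the explicit Euler method can be observed numerically. The following sketch (illustrative, not part of the notes; all names chosen freely) applies one explicit Euler step of the semi-discrete heat equation with homogeneous boundary conditions and compares a stable and an unstable step-size on oscillatory initial data:

```python
def step_explicit_euler(v, tau, h, ga=0.0, gb=0.0):
    """One explicit Euler step v_{n+1} = v_n + (tau/h^2)(A v_n + g)."""
    n = len(v)
    out = []
    for k in range(n):
        left = v[k - 1] if k > 0 else ga
        right = v[k + 1] if k < n - 1 else gb
        out.append(v[k] + tau / (h * h) * (left - 2.0 * v[k] + right))
    return out

h = 0.1
v0 = [(-1.0) ** k for k in range(9)]   # highly oscillatory initial data
stable = v0
for _ in range(50):
    stable = step_explicit_euler(stable, h * h / 4.0, h)   # tau = h^2/4 <= h^2/2
unstable = v0
for _ in range(50):
    unstable = step_explicit_euler(unstable, h * h, h)     # tau = h^2 > h^2/2
```

With τ = h²/4 all amplification factors 1 + τλ_k/h² lie in (0, 1) and the solution decays; with τ = h² the most oscillatory mode is amplified by a factor of almost 3 per step and the iteration blows up.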

(d) Error bound for the implicit Euler method

Let 〈z, z̃〉 = z^T z̃ denote the Euclidean scalar product on R^d with norm |z| = √〈z, z〉.


Theorem 7.3.3 (error bound for the implicit Euler method) Let f : [t_0, t_end] × R^d → R^d be C¹ and assume that f satisfies the one-sided Lipschitz condition

〈f(t, z) − f(t, z̃), z − z̃〉 ≤ ℓ|z − z̃|²  (7.20)

with a constant ℓ ∈ R for all t ∈ [t_0, t_end] and z, z̃ ∈ R^d. Let y(t) be the exact solution of the initial-value problem

y′ = f(t, y),  t ∈ [t_0, t_end],  y(t_0) = y_0,

and let y_n ≈ y(t_n) be the approximations computed with the implicit Euler method with step-size τ = (t_end − t_0)/N. If τℓ < 1, then the global error is bounded by

max_{n=0,...,N} |y_n − y(t_n)| ≤ (C/2) max_{t∈[t_0,t_end]} |y″(t)| · τ  (7.21)

with

C := 1/|ℓ| if ℓ < 0,  C := t_end − t_0 if ℓ = 0,  C := ( e^{ℓ(t_end−t_0)/(1−τℓ)} − 1 )/ℓ if ℓ > 0.

Remarks:

• The assumption τℓ < 1 guarantees that the nonlinear problem which has to be solved in each time-step has indeed a solution; cf. Satz 75.1, p. 561, in [HB09].

• For large positive ℓ ≫ 1, the implicit Euler method is usually not suited, because then the condition τℓ < 1 imposes very small time-steps. The error bound also shows that this condition is not sufficient for an accurate approximation: If τℓ < 1 but τℓ ≈ 1, then e^{ℓ(t_end−t_0)/(1−τℓ)} ≫ 1 and hence C ≫ 1.

In many applications, however, we have ℓ ≤ 0. In this case, the assumption τℓ < 1 is not a restriction.

• For Dahlquist’s test equation (7.19) with real λ < 0, we can choose ℓ = λ.

• If ℓ < 0 and ℓ ≈ 0, then the constant C = 1/|ℓ| in the error bound is very large. In this case, however, we can simply choose the slightly larger ℓ = 0 in the one-sided Lipschitz condition (7.20). This yields an error bound with C = t_end − t_0.

Proof. For simplicity consider only autonomous ODEs, i.e. y′ = f(y). Implicit Euler:

y_{n+1} = Φ_τ(y_n) = y_n + τf(y_{n+1}) = y_n + τf( Φ_τ(y_n) ).  (7.22)


Step 1: Local error. Taylor expansion of the exact solution:

y(t_n) = y(t_{n+1}) − τ y′(t_{n+1}) + d_{n+1} = y(t_{n+1}) − τ f( y(t_{n+1}) ) + d_{n+1}  (7.23)

with |d_{n+1}| ≤ Cτ², where C := (1/2) max_{t∈[t_0,t_end]} |y″(t)|.

Substituting (7.23) for y(t_{n+1}) and (7.22) for Φ_τ(y(t_n)), the one-sided Lipschitz condition (7.20) and the Cauchy-Schwarz inequality yield

|y(t_{n+1}) − Φ_τ(y(t_n))|²
= 〈 y(t_n) + τf(y(t_{n+1})) − y(t_n) − τf(Φ_τ(y(t_n))), y(t_{n+1}) − Φ_τ(y(t_n)) 〉 − 〈 d_{n+1}, y(t_{n+1}) − Φ_τ(y(t_n)) 〉
≤ τℓ |y(t_{n+1}) − Φ_τ(y(t_n))|² + |d_{n+1}| · |y(t_{n+1}) − Φ_τ(y(t_n))|

and hence

(1 − τℓ) |y(t_{n+1}) − Φ_τ(y(t_n))|² ≤ |d_{n+1}| · |y(t_{n+1}) − Φ_τ(y(t_n))|.

Since 1 − τℓ > 0 by assumption, it follows that

|y(t_{n+1}) − Φ_τ(y(t_n))| ≤ |d_{n+1}|/(1 − τℓ) ≤ Cτ²/(1 − τℓ).

Step 2: Stability. For any z, z̃ ∈ R^d we have, by (7.20),

|Φ_τ(z) − Φ_τ(z̃)|² = 〈 z + τf(Φ_τ(z)) − z̃ − τf(Φ_τ(z̃)), Φ_τ(z) − Φ_τ(z̃) 〉
≤ |z − z̃| · |Φ_τ(z) − Φ_τ(z̃)| + τℓ |Φ_τ(z) − Φ_τ(z̃)|².

Using again that 1 − τℓ > 0 by assumption, this yields

|Φ_τ(z) − Φ_τ(z̃)| ≤ |z − z̃|/(1 − τℓ).

For the error after k steps, this implies

|Φ_τ^k(z) − Φ_τ^k(z̃)| ≤ |z − z̃|/(1 − τℓ)^k.


Step 3: Error accumulation and global error. We represent the global error by the telescoping sum (“Lady Windermere’s fan”)

y(t_n) − y_n = Φ_τ^0( y(t_n) ) − Φ_τ^n(y_0) = Σ_{k=0}^{n−1} ( Φ_τ^k( y(t_{n−k}) ) − Φ_τ^{k+1}( y(t_{n−k−1}) ) ).  (7.24)

From steps 1 and 2, we know that

[Figure 7.1: Lady Windermere’s fan. The term was coined in [HNW10] as an allusion to a play of the same title by Oscar Wilde. Thanks to Lukas Baron for providing this picture.]

| Φ_τ^k( y(t_{n−k}) ) − Φ_τ^{k+1}( y(t_{n−k−1}) ) | ≤ (1/(1 − τℓ)^k) | y(t_{n−k}) − Φ_τ( y(t_{n−k−1}) ) | ≤ Cτ²/(1 − τℓ)^{k+1}.

Taking norms in (7.24) and applying the triangle inequality thus gives for ℓ ≠ 0

|y(t_n) − y_n| ≤ ( Cτ²/(1 − τℓ) ) Σ_{k=0}^{n−1} (1 − τℓ)^{−k} = ( Cτ²/(1 − τℓ) ) · ( (1 − τℓ)^{−n} − 1 )/( (1 − τℓ)^{−1} − 1 ) = Cτ · ( (1 − τℓ)^{−n} − 1 )/ℓ.

• If ℓ < 0, then

( (1 − τℓ)^{−n} − 1 )/ℓ = ( 1 − (1 + τ|ℓ|)^{−n} )/|ℓ| ≤ 1/|ℓ|

because (1 + τ|ℓ|)^{−n} > 0.


• If ℓ > 0, then 1 − τℓ < 1. By assumption, we also know that 1 − τℓ > 0, i.e. 1 − τℓ ∈ (0, 1) and hence (1 − τℓ)^{−n} > 1. This yields

(1 − τℓ)^{−n} = ( 1 + τℓ/(1 − τℓ) )^n ≤ ( e^{τℓ/(1−τℓ)} )^n = e^{nτℓ/(1−τℓ)} ≤ e^{ℓ(t_end−t_0)/(1−τℓ)}

because 1 + ξ ≤ e^ξ for all ξ ∈ R and nτ ≤ t_end − t_0 for all n = 0, . . . , N.

• The case ℓ = 0 is left as an exercise.

General principle: Consistency + stability =⇒ convergence

7.4 Approximation of the heat equation in time and space

Back to the heat equation:
\[
\begin{aligned}
\partial_t u(t,x) &= \partial_x^2 u(t,x) && t \in [0,t_{\mathrm{end}}],\; x \in [a,b]\\
u(t,a) = u_a(t),\quad u(t,b) &= u_b(t) && t \in [0,t_{\mathrm{end}}]\\
u(0,x) &= u_0(x) && x \in [a,b],
\end{aligned}
\]
cf. (7.1a)-(7.1c). Semi-discretization in space leads to the initial-value problem
\[
v'(t) = \frac{1}{h^2} A v(t) + \frac{1}{h^2} g(t), \qquad v(0) = u(0)
\]
with $v_k(t) \approx u(t,x_k)$ for $k = 1, \dots, m-1$; cf. (7.10).

Apply the implicit Euler method:
\[
w_{n+1} = w_n + \frac{\tau}{h^2} A w_{n+1} + \frac{\tau}{h^2} g(t_{n+1}).
\]

Remark: To avoid confusion, we point out that $v_k(t) \in \mathbb{R}$ is a scalar for each $t$ (namely the $k$-th entry of $v(t)$), whereas $w_n \in \mathbb{R}^{m-1}$ is a vector for each $n$. The $k$-th entry of $w_n$ is denoted by $w_{n,k}$. By definition, we expect that
\[
w_n \approx v(t_n), \qquad w_{n,k} \approx v_k(t_n) \approx u(t_n, x_k).
\]
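As a quick illustration (a minimal sketch of our own, not part of the original notes; all function and variable names are ours), an implicit Euler run for the semi-discretized heat equation with homogeneous boundary data ($g \equiv 0$) can be coded as follows:

```python
import numpy as np

def implicit_euler_heat(u0, tend, m, N, length=np.pi):
    """Implicit Euler for the semi-discretized heat equation
    v' = (1/h^2) A v with homogeneous Dirichlet boundary data (g = 0)."""
    h = length / m
    tau = tend / N
    e = np.ones(m - 2)
    A = -2.0 * np.eye(m - 1) + np.diag(e, 1) + np.diag(e, -1)
    M = np.eye(m - 1) - (tau / h**2) * A   # solve M w_{n+1} = w_n each step
    w = u0.copy()
    for _ in range(N):
        w = np.linalg.solve(M, w)
    return w

m, N = 50, 100
x = np.linspace(0.0, np.pi, m + 1)[1:-1]   # interior points x_1, ..., x_{m-1}
u0 = np.minimum(x, np.pi - x)              # "hat" profile with a kink
w = implicit_euler_heat(u0, tend=1.0, m=m, N=N)
```

Since $M$ is an M-matrix, the scheme preserves positivity, and the maximum of the profile decays in time, as expected from the continuous problem.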

In order to apply Theorem 7.3.3, we need the following


Lemma 7.4.1 For every $1 < m \in \mathbb{N}$ and $h = (b-a)/m$ there is a constant $\ell_m < 0$ such that the function
\[
F(t,w) = \frac{1}{h^2} A w + \frac{1}{h^2} g(t)
\]
satisfies the one-sided Lipschitz condition
\[
\bigl\langle F(t,w) - F(t,\tilde w),\, w - \tilde w \bigr\rangle \le \ell_m |w - \tilde w|^2
\]
for all $t \in [t_0, t_{\mathrm{end}}]$ and $w, \tilde w \in \mathbb{R}^{m-1}$.

Proof. The eigendecomposition $A = Q\Lambda Q^T$ from (7.18) yields
\[
\bigl\langle F(t,w) - F(t,\tilde w),\, w - \tilde w \bigr\rangle
= \frac{1}{h^2} \bigl\langle A(w - \tilde w),\, w - \tilde w \bigr\rangle
= \frac{1}{h^2} \bigl\langle \Lambda Q^T (w - \tilde w),\, Q^T (w - \tilde w) \bigr\rangle
\le \underbrace{\frac{1}{h^2} \max_{k=1,\dots,m-1} \lambda_k}_{=:\,\ell_m < 0} \bigl| Q^T (w - \tilde w) \bigr|^2
= \ell_m |w - \tilde w|^2.
\]

Remark. The constant $\ell_m$ depends on $m$, but since $\ell_m < 0$ we can simply choose $\ell = 0$ in order to obtain a one-sided Lipschitz bound with a constant independent of $m$.

Corollary 7.4.2 (total error in time and space) Under the assumptions of Theorem 7.2.5 the approximation obtained with the implicit Euler method in time and finite differences in space is bounded by
\[
\max_{n=1,\dots,N} |u(t_n) - w_n|_m \le K t_{\mathrm{end}} (h^2 + \tau)
\]
where
\[
u(t) = \bigl(u(t,x_1), \dots, u(t,x_{m-1})\bigr)^T \in \mathbb{R}^{m-1}. \tag{7.25}
\]
The constant $K$ does not depend on $h$ or $\tau$, but on the regularity of the solution.

Proof. For all $n \in \mathbb{N}$ and $1 < m \in \mathbb{N}$, Theorems 7.2.5 and 7.3.3 imply
\[
|u(t_n) - w_n|_m \le |u(t_n) - v(t_n)|_m + \underbrace{|v(t_n) - w_n|_m}_{\le |v(t_n) - w_n|_2}
\le C_1 t_{\mathrm{end}} h^2 \sup_{t \in [0,t_{\mathrm{end}}]} \bigl\| \partial_x^4 u(t,\cdot) \bigr\|_\infty
+ t_{\mathrm{end}} \frac{C_2}{2} \max_{t \in [0,t_{\mathrm{end}}]} |v''(t)|_2 \cdot \tau.
\]


If the initial-value problem for $v(t)$ is solved with the trapezoidal rule instead of the implicit Euler method, we obtain the error bound
\[
\max_{n=1,\dots,N} |u(t_n) - w_n|_m \le K t_{\mathrm{end}} (h^2 + \tau^2)
\]
because the order of the trapezoidal rule is 2. (The combination of the trapezoidal rule in time and finite differences in space is called the Crank-Nicolson method in the literature.) Nevertheless, the results obtained with the implicit Euler method are sometimes better, because this method is L-stable, so that errors committed in previous steps are damped; cf. IV.3 in [HW10].

If a Runge-Kutta method of order $p$ is applied, then one would expect that the corresponding total error could be bounded by $K t_{\mathrm{end}}(h^2 + \tau^p)$. Unfortunately, this is in general not the case! The reason is that the error analysis for ODEs is based on regularity assumptions on the right-hand side which are not satisfied in the case of the heat equation. Roughly speaking, this is due to the fact that for $m \to \infty$ we have $|A|/h^2 \to \infty$, and the problem becomes "infinitely stiff". This leads to order reduction; cf. IV.15 in [HW10].

Solving the linear systems

A disadvantage of implicit Runge-Kutta methods is the fact that in each step a system of (typically nonlinear) equations has to be solved. In the case of the heat equation, this system is linear. If the implicit Euler method is applied to (7.10), then we have to solve
\[
M w_{n+1} = w_n + \frac{\tau}{h^2} g(t_{n+1}) =: z \qquad \text{with } M := I - \frac{\tau}{h^2} A.
\]
It can be shown that there is a decomposition $M = L D L^T$ with
\[
D = \operatorname{diag}(d_1, \dots, d_{m-1}), \qquad
L = \begin{pmatrix}
1 & 0 & \cdots & \cdots & \cdots & 0\\
l_2 & 1 & 0 & & & \vdots\\
0 & l_3 & 1 & 0 & & \vdots\\
\vdots & \ddots & \ddots & \ddots & \ddots & \vdots\\
\vdots & & 0 & l_{m-2} & 1 & 0\\
0 & \cdots & \cdots & 0 & l_{m-1} & 1
\end{pmatrix},
\]
and $D$ and $L$ can be computed with $O(m)$ operations. Hence, the solution of $M w_{n+1} = z$ can be computed by solving $L y = z$ by forward substitution and then $D L^T w_{n+1} = y$ by backward substitution. One can just as well use the Cholesky decomposition $M = L L^T$ to solve the linear system. It can be shown that the Cholesky decomposition does not produce any fill-in, i.e. $L_{jk} = 0$ for $j \notin \{k, k+1\}$.
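The $O(m)$ factorization and the two substitution sweeps can be sketched as follows (a minimal illustration of ours, not taken from the notes; a dense matrix is assembled only to verify the result):

```python
import numpy as np

def ldlt_tridiag(diag, off):
    """LDL^T factorization of a symmetric tridiagonal matrix with main
    diagonal `diag` and sub-/superdiagonal `off`; O(m) operations."""
    m = len(diag)
    d = np.empty(m)
    l = np.empty(m - 1)
    d[0] = diag[0]
    for k in range(1, m):
        l[k - 1] = off[k - 1] / d[k - 1]
        d[k] = diag[k] - l[k - 1] ** 2 * d[k - 1]
    return d, l

def solve_ldlt(d, l, z):
    """Solve (L D L^T) w = z: forward substitution, diagonal scaling,
    backward substitution."""
    m = len(d)
    y = z.astype(float).copy()
    for k in range(1, m):            # L y = z
        y[k] -= l[k - 1] * y[k - 1]
    y /= d                            # D u = y
    for k in range(m - 2, -1, -1):    # L^T w = u
        y[k] -= l[k] * y[k + 1]
    return y

# Check against a dense solve for M = I - (tau/h^2) A
m, tau, h = 9, 0.01, 0.1
diag = np.full(m, 1.0 + 2.0 * tau / h**2)
off = np.full(m - 1, -tau / h**2)
M = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
z = np.sin(np.arange(1, m + 1))
d, l = ldlt_tridiag(diag, off)
w = solve_ldlt(d, l, z)
```

The factorization touches each row once, which is the $O(m)$ cost claimed above.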


7.5 Application to the Black-Scholes equation

Consider a European capped symmetric power call, i.e. a European call option with payoff
\[
V(T,S) = \min\bigl(L, ((S-K)^+)^p\bigr) \tag{7.26}
\]
and maturity $T > 0$, strike $K > 0$ and parameters $p, L > 0$.

• "capped": $V(T,S) \le L$ is bounded

• "power": $((S-K)^+)^p$ instead of $(S-K)^+$

• "symmetric": $((S-K)^+)^p$ instead of $(S^p - K)^+$

For $p = 1$ and $L = \infty$ we recover the standard European call. The value $V(t,S)$ of the option evolves according to the Black-Scholes equation
\[
\partial_t V(t,S) + \frac{\sigma^2}{2} S^2 \partial_S^2 V(t,S) + r S \partial_S V(t,S) - r V(t,S) = 0, \qquad S \in [0,\infty),\; t \in [0,T]
\]
with volatility $\sigma > 0$, interest rate $r > 0$ and terminal condition (7.26).

(a) Truncation of the domain

Replace $[0,\infty)$ by $[0,\bar S]$ with a sufficiently large $\bar S > K + \sqrt[p]{L}$ and boundary condition
\[
V(t, \bar S) = L e^{-r(T-t)}.
\]

(b) Time inversion

Let $u(t,S) := V(T-t, S)$. Then the problem reads
\[
\begin{aligned}
\partial_t u &= \frac{\sigma^2}{2} S^2 \partial_S^2 u + r S \partial_S u - r u, && S \in [0,\bar S],\; t \in [0,T]\\
u(0,S) &= \min\bigl(L, ((S-K)^+)^p\bigr) && S \in [0,\bar S]\\
u(t,\bar S) &= L e^{-rt} && t \in [0,T]
\end{aligned}
\]
No boundary conditions for $u(t,0)$ are required, cf. 3.2.

(c) Space discretization

Choose $1 < m \in \mathbb{N}$, let $h = \bar S/m$ and $S_k = k \cdot h$ for $k = 0, \dots, m$. Approximate
\[
\partial_S^2 u(t, S_k) \approx \frac{u(t,S_{k+1}) - 2u(t,S_k) + u(t,S_{k-1})}{h^2}, \qquad
\partial_S u(t, S_k) \approx \frac{u(t,S_{k+1}) - u(t,S_{k-1})}{2h}
\]


for all $k = 1, \dots, m-1$ and $t \in [0,T]$. This yields the ODE
\[
v'(t) = M v(t) + g(t), \qquad
M := \frac{\sigma^2}{2} D^2 \frac{1}{h^2} A + r D \frac{1}{2h} B - r I
\]
with $A$ as before and
\[
D = \operatorname{diag}(S_1, \dots, S_{m-1}) \in \mathbb{R}^{(m-1)\times(m-1)}, \qquad
B = \begin{pmatrix}
0 & 1 & 0 & \cdots & \cdots & 0\\
-1 & 0 & 1 & \ddots & & \vdots\\
0 & -1 & \ddots & \ddots & \ddots & \vdots\\
\vdots & \ddots & \ddots & \ddots & 1 & 0\\
\vdots & & \ddots & -1 & 0 & 1\\
0 & \cdots & \cdots & 0 & -1 & 0
\end{pmatrix} \in \mathbb{R}^{(m-1)\times(m-1)},
\]
\[
g(t) = \bigl(0, \dots, 0, g_{m-1}(t)\bigr)^T, \qquad
g_{m-1}(t) = \Bigl( \frac{\sigma^2}{2h^2} S_{m-1}^2 + \frac{r}{2h} S_{m-1} \Bigr) u(t, \bar S).
\]
The initial condition is
\[
v(0) = \bigl(u(0,S_1), \dots, u(0,S_{m-1})\bigr)^T.
\]

(d) Time discretization

Solve the ODE with the implicit Euler method
\[
(I - \tau M) w_{n+1} = w_n + \tau g(t_{n+1})
\]
or the trapezoidal rule
\[
(I - \tau M/2) w_{n+1} = (I + \tau M/2) w_n + \frac{\tau}{2} \bigl( g(t_n) + g(t_{n+1}) \bigr).
\]
Final result: $w_{n,k} \approx u(t_n, S_k)$ for all $n = 0, \dots, N$ and $k = 1, \dots, m-1$.
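Assembling $M$ can be sketched as follows (our own illustration with arbitrarily chosen parameter values, not from the notes); the interior rows of $M$ then carry exactly the finite-difference coefficients derived above:

```python
import numpy as np

def assemble_M(m, Smax, sigma, r):
    """Assemble M = (sigma^2/2) D^2 A / h^2 + r D B / (2h) - r I for the
    Black-Scholes space discretization on [0, Smax]."""
    h = Smax / m
    S = h * np.arange(1, m)                       # S_1, ..., S_{m-1}
    e = np.ones(m - 2)
    A = -2.0 * np.eye(m - 1) + np.diag(e, 1) + np.diag(e, -1)
    B = np.diag(e, 1) - np.diag(e, -1)
    D = np.diag(S)
    M = (sigma**2 / 2) * (D @ D @ A) / h**2 \
        + r * (D @ B) / (2 * h) - r * np.eye(m - 1)
    return M, S, h

sigma, r = 0.3, 0.05
M, S, h = assemble_M(m=8, Smax=4.0, sigma=sigma, r=r)
```

Row $k$ of $M$ has diagonal entry $-\sigma^2 S_k^2/h^2 - r$ and off-diagonal entries $\frac{\sigma^2}{2} S_k^2/h^2 \pm \frac{r}{2h} S_k$, which can be read off directly from the central differences.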

(e) Numerical experiments

See slides. The numerical examples show that the expected order of convergence is not achieved in practice, because the payoff function (i.e. the initial data) does not have the required $C^4$ regularity. Nevertheless, we still observe convergence of the methods at some lower order. This calls for an explanation.


7.6 Non-smooth initial data

(a) Parabolic smoothing

For our analysis, we consider again the model problem (7.1), i.e. the heat equation on an interval with Dirichlet boundary conditions. This time, we choose $(a,b) = (0,\pi)$ and $u_a(t) = u_b(t) \equiv 0$ for simplicity:
\[
\begin{aligned}
\partial_t u(t,x) &= \partial_x^2 u(t,x) && t \in [0,t_{\mathrm{end}}],\; x \in (0,\pi) && (7.27a)\\
u(t,0) = u(t,\pi) &= 0 && t \in [0,t_{\mathrm{end}}] && (7.27b)\\
u(0,x) &= u_0(x) && x \in [0,\pi] && (7.27c)
\end{aligned}
\]

Theorem 7.6.1 (Solution of the heat equation) If $u_0$ is continuous and piecewise continuously differentiable with finitely many "kinks", then the unique solution of (7.27a)-(7.27c) is given by the Fourier series
\[
u(t,x) = \sum_{k=1}^\infty c_k \sin(kx) e^{-k^2 t} \qquad \text{with} \quad c_k = \frac{2}{\pi} \int_0^\pi u_0(x) \sin(kx)\,dx. \tag{7.28}
\]
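The series (7.28) can be illustrated numerically. The sketch below is our own (not from the notes); it uses the hat function as an example of initial data with a kink, computes the $c_k$ by trapezoidal quadrature, and evaluates the truncated series:

```python
import numpy as np

def fourier_heat(u0, t, x, K=200):
    """Evaluate the series (7.28), truncated after K terms, with the
    coefficients c_k computed by trapezoidal quadrature on [0, pi]."""
    xs = np.linspace(0.0, np.pi, 2001)
    dx = xs[1] - xs[0]
    val = 0.0
    for k in range(1, K + 1):
        f = u0(xs) * np.sin(k * xs)
        ck = (2.0 / np.pi) * (np.sum(f) - 0.5 * (f[0] + f[-1])) * dx
        val += ck * np.sin(k * x) * np.exp(-(k**2) * t)
    return val

hat = lambda x: np.minimum(x, np.pi - x)   # continuous, one kink at pi/2
u_init = fourier_heat(hat, t=0.0, x=np.pi / 2)   # should reproduce u0
u_later = fourier_heat(hat, t=0.5, x=np.pi / 2)  # dominated by the k=1 mode
```

At $t = 0.5$ all modes with $k \ge 2$ are damped by at least $e^{-2}$ relative to the first, so the value is essentially $c_1 e^{-1/2}$, illustrating the smoothing discussed below.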

Sketch of the proof. It can be shown that the Fourier series of the initial data
\[
u_0(x) = \sum_{k=1}^\infty c_k \sin(kx) \qquad \text{with} \quad c_k = \frac{2}{\pi} \int_0^\pi u_0(x) \sin(kx)\,dx
\]
converges uniformly, and that
\[
\sum_{k=1}^\infty |c_k| < \infty; \tag{7.29}
\]

cf. §6, 2.8 in volume 2 of [FK08]. Hence, $\lim_{k\to\infty} c_k = 0$, and $c := \sup_{k\in\mathbb{N}} |c_k| < \infty$ exists. Choose a fixed $t > 0$. Since the Fourier series
\[
u(t,x) = \sum_{k=1}^\infty c_k \sin(kx) e^{-k^2 t} \tag{7.30}
\]
is dominated by (7.29), it converges uniformly, and $u(t,\cdot)$ is thus continuous. It is known that
\[
\sum_{k=1}^\infty k^p e^{-k^2 t} \tag{7.31}
\]
converges for every fixed $p \in \mathbb{N}$. Hence, the series
\[
\sum_{k=1}^\infty k c_k e^{-k^2 t} \cos(kx) \tag{7.32}
\]


converges uniformly, because
\[
|k c_k e^{-k^2 t} \cos(kx)| \le c\,|k e^{-k^2 t}|. \tag{7.33}
\]
This means that (7.32) coincides with $\partial_x u(t,x)$. In a similar way, we obtain from (7.31) with $p = 2$ that
\[
-\sum_{k=1}^\infty k^2 c_k e^{-k^2 t} \sin(kx) = \partial_x^2 u(t,x). \tag{7.34}
\]
Now it can easily be checked that each term of the Fourier series (7.30) solves (7.27a) and (7.27b). (7.27c) follows from the fact that
\[
\int_0^\pi \sin(jx)\sin(kx)\,dx = \begin{cases} 0 & \text{if } j \ne k,\\ \frac{\pi}{2} & \text{if } j = k. \end{cases}
\]
Uniqueness follows from the maximum principle. See exercises for details.

Remark:

1. The function
\[
\tilde u(t,x) = u_0 + \frac{u_\pi - u_0}{\pi}\,x + u(t,x)
\]
solves (7.27a) in the special case $a = 0$, $b = \pi$ with arbitrary but constant boundary values $u_0$ and $u_\pi$. Solutions on arbitrary intervals can be constructed by rescaling.

2. Since (7.1a) involves $\partial_x^2 u(t,x)$, one may expect that all solutions of the PDE are twice continuously differentiable with respect to $x$. Theorem 7.6.1 shows, however, that solutions with lower regularity of the initial data exist.

3. All terms in the series representation (7.30) oscillate in space due to the term $\sin(kx)$, and the higher $k$, the faster the oscillations. The $k$-th term, however, is multiplied with $c_k e^{-k^2 t}$, i.e. it decays exponentially as time evolves, and the larger $k$, the faster the decay. This has a surprising consequence:


Theorem 7.6.2 (Parabolic smoothing) If $u_0$ is continuous and piecewise continuously differentiable with finitely many "kinks", and if $u(t,x)$ is the solution of (7.27a)-(7.27c), then
\[
x \mapsto u(t,x) \in C^\infty(0,\pi) \tag{7.35}
\]
for every $t > 0$.

This means that after an arbitrarily short time the solution is infinitely smooth although the initial data may have a much lower regularity. Hence, only $t \approx 0$ is critical for the approximation with finite differences.

Proof. For any $q \in \mathbb{N}$, the argument in the proof of Theorem 7.6.1 can be applied to
\[
\pm\sum_{k=1}^\infty k^{2q+1} c_k e^{-k^2 t} \cos(kx) \qquad \text{and} \qquad \pm\sum_{k=1}^\infty k^{2q} c_k e^{-k^2 t} \sin(kx).
\]
This yields the assertion.

(b) Alternative error bound for the implicit Euler method

After discretizing (7.27) in space with finite differences, we obtain the ODE
\[
v'(t) = \frac{1}{h^2} A v(t), \qquad v(0) = u(0) \tag{7.36}
\]
with $A \in \mathbb{R}^{(m-1)\times(m-1)}$ defined in (7.9). Let $w_n \approx v(t_n)$ be the approximation given by the implicit Euler method:
\[
w_{n+1} = w_n + \frac{\tau}{h^2} A w_{n+1}, \qquad w_0 = v(0)
\]
with $N \in \mathbb{N}$, $\tau = t_{\mathrm{end}}/N$, $n = 0, \dots, N-1$.

Theorem 7.6.3 For all $n = 1, \dots, N$, the error of the implicit Euler method is bounded by
\[
|v(t_n) - w_n|_m \le C\,\frac{\tau}{t_n}\,|v(0)|_m
\]
with a constant $C \ge 0$ which does not depend on $m$, $n$ or $\tau$.


Interpretation. This result differs from the corresponding error bounds in sections 7.3 and 7.4 in several ways. First, the term $1/t_n$ is new. This term means that for small $n$ (i.e. in the first steps) the error can be very large, but it also means that for large $t_n$ the error will vanish. This is exactly what we have observed in the numerical simulations of the power option in section 7.5 (e). Moreover, the error does not depend on the term
\[
\max_{t \in [t_0, t_{\mathrm{end}}]} |v''(t)|_2 = \max_{t \in [t_0, t_{\mathrm{end}}]} \frac{|A^2 v(t)|_2}{h^4} \ge \frac{|A^2 v(0)|_2}{h^4}. \tag{7.37}
\]
Note that
\[
\frac{|A^2 v(0)|_2}{h^4} \longrightarrow \infty \qquad \text{for } m \longrightarrow \infty,\; h = \frac{\pi}{m} \longrightarrow 0 \tag{7.38}
\]
if the initial data are not $C^4$. A more general result is shown in [Tho06], Theorem 7.2, p. 117.

Numerical illustration: See slides.

Proof. As in the proof of Lemma 7.2.4 we consider the diagonalization of $A$, i.e. $A = Q\Lambda Q^T$ with
\[
Q \text{ orthogonal}, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_{m-1}), \qquad \lambda_k \in (-4,0) \text{ eigenvalues}.
\]
Since $Q^T Q = I$, the function $y(t) := Q^T v(t)$ solves the ODE
\[
y'(t) = Q^T v'(t) = \frac{1}{h^2} Q^T Q \Lambda Q^T v(t) = \frac{1}{h^2} \Lambda y(t),
\]
and hence $y(t) = e^{t\Lambda/h^2} y(0)$. In a similar way, we transform the approximations $w_n$ of the implicit Euler method: if we let $y_n = Q^T w_n$, then
\[
y_n = y_{n-1} + \frac{\tau}{h^2} \Lambda y_n, \qquad
y_n = \Bigl(I - \frac{\tau}{h^2}\Lambda\Bigr)^{-1} y_{n-1} = \Bigl(I - \frac{\tau}{h^2}\Lambda\Bigr)^{-n} y_0.
\]
Hence, we only have to show that
\[
\Bigl| e^{t_n \lambda/h^2} - \Bigl(1 - \frac{\tau}{h^2}\lambda\Bigr)^{-n} \Bigr| \le C\,\frac{\tau}{t_n}
\qquad \text{for all } \lambda \in (-4,0),\; h > 0,\; n = 1, \dots, N,
\]
or, with $\mu = \tau\lambda/h^2$, $\tau/t_n = 1/n$ and $R(\mu) = \frac{1}{1-\mu}$, that
\[
|e^{n\mu} - R^n(\mu)| \le \frac{C}{n} \qquad \text{for all } \mu \in (-\infty, 0) \text{ and } n = 1, \dots, N. \tag{7.39}
\]

Case 1: $\mu \in [-1, 0)$. Comparing the Taylor expansion
\[
e^\mu = 1 + \mu + \int_0^{|\mu|} \int_0^s e^{-r}\,dr\,ds
\]


with
\[
R(\mu) = 1 + \mu + \frac{\mu^2}{1-\mu}
\]
yields for all $\mu \in [-1, 0)$ the "local" error bound
\[
|e^\mu - R(\mu)| \le \int_0^{|\mu|} \int_0^s \underbrace{|e^{-r}|}_{\le 1}\,dr\,ds + \frac{\mu^2}{1-\mu}
\le \underbrace{\int_0^{|\mu|} \int_0^s 1\,dr\,ds}_{=\,\mu^2/2} + \mu^2 = \frac{3}{2}\mu^2. \tag{7.40}
\]

Moreover, it can be shown² that
\[
R(\mu) \le e^{\gamma\mu}, \qquad \gamma = \log(2) \approx 0.693 < 1 \qquad \text{for all } \mu \in [-1, 0). \tag{7.41}
\]

Substituting (7.40) and (7.41) into the telescoping sum
\[
e^{n\mu} - R^n(\mu) = \sum_{k=0}^{n-1} R(\mu)^{n-1-k} \bigl(e^\mu - R(\mu)\bigr) e^{k\mu}
\]
yields
\[
|e^{n\mu} - R^n(\mu)| \le \sum_{k=0}^{n-1} R^{n-1-k}(\mu)\, |e^\mu - R(\mu)|\, e^{k\mu}
\le \frac{3}{2}\mu^2 \sum_{k=0}^{n-1} e^{\gamma\mu(n-1-k)} e^{k\mu}
\le \frac{3}{2}\mu^2 e^{\gamma\mu(n-1)} \sum_{k=0}^{n-1} e^{k\mu(1-\gamma)}.
\]
Since $1 - \gamma = 1 - \log(2) > 0$ and $\mu < 0$, it follows that
\[
\sum_{k=0}^{n-1} e^{k\mu(1-\gamma)} \le \sum_{k=0}^{n-1} 1 = n
\]
and hence for $\mu \in [-1, 0)$
\[
|e^{n\mu} - R^n(\mu)| \le \frac{3}{2} n\mu^2 e^{\gamma\mu n} \underbrace{e^{-\gamma\mu}}_{\le e^\gamma}
= \frac{3}{2} \underbrace{(\mu n)^2 e^{\gamma\mu n} e^\gamma}_{\le C}\, \frac{1}{n}.
\]

²To prove this, we let $f(\mu) = e^{\gamma\mu}(1-\mu) - 1$ and show that $f(\mu) \ge 0$ for all $\mu \in [-1,0]$. Since $f(-1) = f(0) = 0$, it is enough to show that $f$ has a maximum but no minimum in $(-1,0)$.


Case 2: $\mu < -1$. In this case, we have
\[
e^{n\mu} \le e^{-n} \le \frac{1}{n}, \qquad
R(\mu) \le \frac{1}{2} = e^{-\gamma} \;\Longrightarrow\; R^n(\mu) \le e^{-n\gamma} \le \frac{C}{n}
\]
and hence
\[
|e^{n\mu} - R^n(\mu)| \le |e^{n\mu}| + |R^n(\mu)| \le \frac{C}{n}.
\]
Both cases together prove (7.39) and hence Theorem 7.6.3.
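The bound (7.39) can also be probed numerically. The following sketch (ours, not from the notes) scans a wide range of negative $\mu$ and several values of $n$ and records the largest scaled error $n\,|e^{n\mu} - R^n(\mu)|$, which should stay bounded:

```python
import numpy as np

# Numerical check of (7.39): with R(mu) = 1/(1 - mu), the scaled error
# n * |e^{n mu} - R(mu)^n| should remain bounded uniformly in n and mu < 0.
mus = -np.logspace(-3, 3, 200)          # mu from -0.001 down to -1000
worst = 0.0
for n in [1, 2, 5, 10, 100, 1000]:
    err = np.abs(np.exp(n * mus) - (1.0 - mus) ** (-float(n)))
    worst = max(worst, float(np.max(n * err)))
```

The observed supremum is of moderate size (roughly $2e^{-2} \approx 0.27$ in the limit $n \to \infty$ with $n\mu$ fixed), well below the theoretical constant from the proof.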


Chapter 8

Finite-difference methods for Asian options

8.1 Modelling Asian options

Asian options are options with a payoff function which depends on the mean of the underlying:

• Arithmetic mean:
\[
\frac{1}{T}\int_0^T S_t\,dt \qquad \text{or} \qquad \frac{1}{N}\sum_{n=1}^N S_{n\tau} \quad \text{with } \tau = \frac{T}{N}
\]

• Geometric mean:
\[
\exp\Bigl(\frac{1}{T}\int_0^T \log S_t\,dt\Bigr) \qquad \text{or} \qquad
\exp\Bigl(\frac{1}{N}\sum_{n=1}^N \log S_{n\tau}\Bigr) = \Bigl(\prod_{n=1}^N S_{n\tau}\Bigr)^{1/N}
\]
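The equality of the two forms of the discrete geometric mean follows by taking logarithms; a quick numerical check (our sketch, with arbitrary positive samples):

```python
import math
import random

# Check the identity exp((1/N) * sum(log S)) = (prod S)^(1/N)
# for positive samples S_1, ..., S_N.
random.seed(0)
N = 8
S = [math.exp(random.gauss(0.0, 0.3)) for _ in range(N)]  # positive values
lhs = math.exp(sum(math.log(s) for s in S) / N)
rhs = math.prod(S) ** (1.0 / N)
```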

Possible payoff functions ($\bar S_T$ denotes the mean):
\[
\begin{aligned}
\text{Average price call:} &\quad (\bar S_T - K)^+\\
\text{Average price put:} &\quad (K - \bar S_T)^+\\
\text{Average strike call:} &\quad (S_T - \bar S_T)^+\\
\text{Average strike put:} &\quad (\bar S_T - S_T)^+
\end{aligned}
\]

For a given function $f : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ we define $A_t = \int_0^t f(\theta, S_\theta)\,d\theta$.

• If $f(\theta, S_\theta) = S_\theta$, then $\frac{A_T}{T}$ is the arithmetic mean.

• If $f(\theta, S_\theta) = \log(S_\theta)$, then $\exp\bigl(\frac{A_T}{T}\bigr)$ is the geometric mean.


The value of the option depends on $t$, $S$ and $A$, i.e. $V = V(t,S,A)$.

Goal: Derive a PDE for $V(t,S,A)$. Assume that $S_t$ is a geometric Brownian motion with constant parameters $\mu, \sigma > 0$. We obtain a system of SDEs:
\[
\begin{aligned}
dS_t &= \mu S_t\,dt + \sigma S_t\,dW_t\\
dA_t &= f(t,S_t)\,dt + 0\cdot dW_t
\end{aligned}
\]
Apply the two-dimensional Ito formula and obtain for $V = V(t, S_t, A_t)$
\[
dV = \Bigl( \partial_t V + \mu S_t \partial_S V + \frac{\sigma^2}{2} S_t^2 \partial_S^2 V + f(t,S_t) \partial_A V \Bigr) dt + \sigma S_t \partial_S V\,dW_t.
\]

By mimicking the derivation of the Black-Scholes equation, we obtain
\[
\partial_t V + \frac{\sigma^2}{2} S^2 \partial_S^2 V + r S \partial_S V + \underbrace{f(t,S)\,\partial_A V}_{\text{new}} - rV = 0, \qquad V = V(t,S,A). \tag{8.1}
\]
New difficulties for the numerical solution:

• $V(t,S,A)$ depends on three variables $\to$ space discretization in two dimensions is necessary.

• No term of the type $(\dots)\partial_A^2 V$ appears (consequences: see below).

• (8.1) cannot be transformed to the heat equation.

Reduction to a one-dimensional PDE

Consider an Asian average strike call with arithmetic mean:
\[
f(t,S) = S, \qquad A_t = \int_0^t S_\theta\,d\theta, \qquad V(T,S,A) = \Bigl(S - \frac{1}{T}A\Bigr)^+.
\]
In this special case, the PDE (8.1) can be reduced to a PDE in two variables.

Ansatz: For $S > 0$ define $R(S,A) := \frac{A}{S}$ and assume that $V(t,S,A) = S \cdot H(t,R)$ with an unknown function $H(t,R)$. The partial derivatives of $R(S,A) = A/S$ are
\[
\partial_S R(S,A) = -\frac{A}{S^2} = -\frac{R}{S}, \qquad \partial_A R(S,A) = \frac{1}{S},
\]
and for $V(t,S,A) = S \cdot H\bigl(t, R(S,A)\bigr)$ we compute

• $\partial_t V = S \cdot \partial_t H$

• $\partial_S V = H + S \cdot \partial_R H \cdot \partial_S R = H - R \cdot \partial_R H$

• $\partial_S^2 V = \partial_R H \cdot \partial_S R - \partial_S R \cdot \partial_R H - R \cdot \partial_R^2 H \cdot \partial_S R = \frac{R^2}{S} \cdot \partial_R^2 H$

• $\partial_A V = S \cdot \partial_R H \cdot \partial_A R = \partial_R H$
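The identity $\partial_S V = H - R\,\partial_R H$ can be checked with a finite difference; the smooth sample function $H$ below is our arbitrary choice (the check works for any smooth $H$):

```python
import math

# Finite-difference check of dV/dS = H - R * dH/dR for
# V(t, S, A) = S * H(t, A/S), with an arbitrary smooth sample H.
H = lambda t, R: (1.0 + t) * math.exp(-R)
dHdR = lambda t, R: -(1.0 + t) * math.exp(-R)
V = lambda t, S, A: S * H(t, A / S)

t, S, A, eps = 0.3, 2.0, 1.5, 1e-6
R = A / S
dVdS_fd = (V(t, S + eps, A) - V(t, S - eps, A)) / (2 * eps)
dVdS_formula = H(t, R) - R * dHdR(t, R)
```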


These equations turn (8.1) into
\[
S \cdot \partial_t H + \frac{\sigma^2}{2} S^2 \cdot \frac{R^2}{S} \cdot \partial_R^2 H + rS (H - R \cdot \partial_R H) - rS \cdot H + \underbrace{f(t,S)}_{=S} \partial_R H = 0.
\]
Dividing both sides by $S > 0$ yields the PDE
\[
\partial_t H + \frac{\sigma^2}{2} R^2 \partial_R^2 H + (1 - rR)\partial_R H = 0. \tag{8.2}
\]
This PDE only depends on $t$ and $R$, but not on $S$ or $A$. Hence, we can determine $H(t,R)$ by solving (8.2). Then, $V(t,S,A) = S \cdot H(t,R)$ is the solution of (8.1).

• Terminal condition:
\[
H(T,R) = \frac{1}{S} V(T,S,A) = \frac{1}{S}\Bigl(S - \frac{1}{T}A\Bigr)^+ = \Bigl(1 - \frac{R}{T}\Bigr)^+.
\]

• Asymptotic behaviour for $R \to \infty$: Since $R(S,A) = A/S$ and $A_t = \int_0^t S_\theta\,d\theta$ is bounded, we have $R \to \infty \iff S \to 0$. Since $V(t,0,A) = 0$ and $\partial_S V(t,0,A) = 0$ for a call with $A > 0$, we obtain
\[
\lim_{R\to\infty} H(t,R) = \lim_{S\to 0} \frac{1}{S} V(t,S,A)
= \lim_{S\to 0} \frac{1}{S}\Bigl( V(t,0,A) + S\,\partial_S V(t,0,A) + O(S^2) \Bigr) = 0.
\]
This agrees with the terminal condition, because
\[
H(T,R) = \Bigl(1 - \frac{R}{T}\Bigr)^+ \longrightarrow 0 \qquad \text{for } R \longrightarrow \infty.
\]

• Boundary condition for $R = 0$: For $0 \approx R > 0$, equation (8.2) implies
\[
\partial_t H(t,R) + \frac{\sigma^2}{2} R^2 \partial_R^2 H(t,R) + \partial_R H(t,R) \approx 0. \tag{8.3}
\]
(The term $R^2 \partial_R^2 H(t,R)$ cannot be omitted because we do not know if $\partial_R^2 H(t,R)$ remains bounded.) Assume that the solution $H$ is bounded for $R \approx 0$. Then it follows that
\[
\lim_{R\to 0} R^2 \partial_R^2 H(t,R) = 0. \tag{8.4}
\]
Proof by contradiction: Assume that $R^2 \partial_R^2 H(t,R) = C_0 \ne 0$ for $R \approx 0$. Integrating $\partial_R^2 H(t,R) = C_0/R^2$ twice yields
\[
H(t,R) = -C_0 \log R + C_1 R + C_2, \qquad C_1, C_2 \in \mathbb{R}
\]
and hence $|H(t,R)| \to \infty$ for $R \to 0$, which contradicts the assumption that $H$ is bounded for $R \approx 0$. Hence, (8.3) and (8.4) yield the boundary condition
\[
\partial_t H(t,0) + \partial_R H(t,0) = 0.
\]


Numerical illustration: See slides.

8.2 Finite-difference methods for convection-diffusion equations

Model problem. Consider the PDE
\[
\partial_t u + a \partial_x u = b \partial_x^2 u, \qquad x \in \mathbb{R},\; t \ge 0 \tag{8.5}
\]
with $a, b \ge 0$. This is the prototype of a convection-diffusion equation.

• Convection term: $a\partial_x u$

• Diffusion term: $b\partial_x^2 u$

Special cases:

For $a = 0$: $\partial_t u = b\partial_x^2 u$, the heat equation (parabolic).
For $b = 0$: $\partial_t u + a\partial_x u = 0$, the transport equation (hyperbolic).

In the second case, the solution is $u(t,x) = u_0(x - at)$, because
\[
\partial_t u(t,x) + a \partial_x u(t,x) = u_0'(x-at)(-a) + a u_0'(x-at) = 0.
\]

Numerical experiment: Solve (8.5) numerically for $t \in [0, t_{\mathrm{end}}]$ and $x \in (0,L)$ with homogeneous Dirichlet boundary conditions ($t_{\mathrm{end}} > 0$, $L > 0$).

Space discretisation: Choose $m \in \mathbb{N}$, define mesh-size $h = L/m$ and mesh points $x_j = jh$ for $j = 0, \dots, m$.

Goal: Compute approximations $u_j^n \approx u(t_n, x_j)$.

Central finite differences in space:
\[
\partial_x u(t_n, x_j) \approx \frac{u_{j+1}^n - u_{j-1}^n}{2h}, \qquad j = 1, \dots, m-1,
\]
\[
\partial_x^2 u(t_n, x_j) \approx \frac{u_{j+1}^n - 2u_j^n + u_{j-1}^n}{h^2}, \qquad u_0 = u_m = 0 \;\text{(boundary condition)}.
\]

Matrix-vector notation: Define $u^n = (u_1^n, \dots, u_{m-1}^n)^T$ and $A = -aA_1 + bA_2 \in \mathbb{R}^{(m-1)\times(m-1)}$ with
\[
A_1 = \frac{1}{2h}\begin{pmatrix}
0 & 1 & & \\
-1 & \ddots & \ddots & \\
& \ddots & \ddots & 1\\
& & -1 & 0
\end{pmatrix}, \qquad
A_2 = \frac{1}{h^2}\begin{pmatrix}
-2 & 1 & & \\
1 & \ddots & \ddots & \\
& \ddots & \ddots & 1\\
& & 1 & -2
\end{pmatrix}.
\]


Time discretisation: Choose $N \in \mathbb{N}$, define step-size $\tau = t_{\mathrm{end}}/N$ and time points $t_n = n\tau$ for $n = 0, \dots, N$.

Implicit Euler method:
\[
u^{n+1} = u^n + \tau A u^{n+1}.
\]
Trapezoidal rule (Crank-Nicolson method):
\[
u^{n+1} = u^n + \frac{\tau}{2} A (u^{n+1} + u^n).
\]

Result (see slides): In the case $b = 0$, the approximation given by the trapezoidal rule shows erroneous oscillations and negative values if the step-size $\tau$ is too large. The numerical solution obtained with the implicit Euler method is strongly damped. Why?
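The damping of the implicit Euler method versus the norm conservation of the trapezoidal rule can be seen directly in the pure transport case $b = 0$: since $A = -aA_1$ is skew-symmetric, the Crank-Nicolson step is an orthogonal map, while the implicit Euler step strictly decreases the Euclidean norm. A minimal sketch (ours, not from the notes):

```python
import numpy as np

# Pure transport (b = 0): A = -a*A1 is skew-symmetric.
a, L, m, tau = 1.0, 1.0, 40, 0.05
h = L / m
e = np.ones(m - 2)
A1 = (np.diag(e, 1) - np.diag(e, -1)) / (2 * h)
A = -a * A1
I = np.eye(m - 1)

x = np.linspace(0, L, m + 1)[1:-1]
u0 = np.exp(-100 * (x - 0.5) ** 2)           # localized peak

u_ie = np.linalg.solve(I - tau * A, u0)       # one implicit Euler step
u_cn = np.linalg.solve(I - tau / 2 * A,
                       (I + tau / 2 * A) @ u0)  # one Crank-Nicolson step
```

For skew-symmetric $A$, $\|(I-\tau A)u\|^2 = \|u\|^2 + \tau^2\|Au\|^2$, which explains the strict damping of the implicit Euler method; the Cayley transform $(I-\tfrac{\tau}{2}A)^{-1}(I+\tfrac{\tau}{2}A)$ is orthogonal.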

Remark:

• Since $A_2$ is symmetric, all eigenvalues are real and (according to Lemma 7.2.2) negative.

• Since $A_1$ is skew-symmetric (i.e. $A_1^T = -A_1$), all eigenvalues are purely imaginary: if $A_1 v = \lambda v$ and $v \ne 0$, then
\[
\lambda v^* v = v^* A_1 v = -v^* A_1^T v = -(A_1 v)^* v = -(\lambda v)^* v = -\bar\lambda v^* v.
\]
This means that $\lambda = -\bar\lambda$ and hence $\operatorname{Re}(\lambda) = 0$.

8.3 Analysis of numerical methods applied to the transport equation

Consider again the PDE (8.5), but now for $a > 0$ and $b = 0$ (transport equation, hyperbolic):
\[
\partial_t u = -a \partial_x u, \qquad x \in \mathbb{R},\; t \ge 0. \tag{8.6}
\]
In order to avoid technical difficulties with boundary conditions, we consider this equation and the corresponding numerical methods on the entire real line $\mathbb{R}$. For a fixed $k \in \mathbb{R}$ the function
\[
u(t,x) = \exp(i\omega t - ikx) \tag{8.7}
\]
solves (8.6) if and only if the dispersion relation
\[
\omega = ak
\]
holds. Is a similar result true for the numerical methods?


Implicit Euler method:
\[
u_j^{n+1} = u_j^n - \frac{\tau a}{2h}\bigl( u_{j+1}^{n+1} - u_{j-1}^{n+1} \bigr). \tag{8.8}
\]
Suppose that the numerical approximations have the form
\[
u_j^n = \exp(iwt_n - ikx_j) = \exp(iwn\tau - ikjh) \tag{8.9}
\]
with the same $k$, but a perturbed $w \approx \omega$. Goal: Find $w$.

Substituting (8.9) into (8.8) and using
\[
u_j^{n+1} = e^{iw\tau} u_j^n, \qquad u_{j+1}^{n+1} = e^{iw\tau} e^{-ikh} u_j^n, \qquad u_{j-1}^{n+1} = e^{iw\tau} e^{ikh} u_j^n
\]
yields
\[
e^{iw\tau} u_j^n \Bigl( 1 + \frac{\tau a}{2h}(e^{-ikh} - e^{ikh}) \Bigr) = u_j^n.
\]
With $e^{-ikh} - e^{ikh} = -2i\sin(kh)$, this is equivalent to
\[
e^{iw\tau}\Bigl( 1 - i\,\frac{\tau a}{h}\sin(kh) \Bigr) = 1. \tag{8.10}
\]

Let $\alpha$ and $\beta$ be the real and imaginary parts of $w$, i.e. $w = \alpha + i\beta$ and
\[
e^{iw\tau} = e^{i\alpha\tau} e^{-\beta\tau} = e^{-\beta\tau}\bigl( \cos(\alpha\tau) + i\sin(\alpha\tau) \bigr).
\]
We substitute this into (8.10) and obtain
\[
e^{-\beta\tau}\Bigl( \cos(\alpha\tau) + \sin(\alpha\tau)\frac{\tau a}{h}\sin(kh) \Bigr)
+ i e^{-\beta\tau}\Bigl( \sin(\alpha\tau) - \cos(\alpha\tau)\frac{\tau a}{h}\sin(kh) \Bigr) = 1.
\]
Comparing the real and imaginary parts of both sides of the equation yields
\[
\cos(\alpha\tau) + \sin(\alpha\tau)\frac{\tau a}{h}\sin(kh) = e^{\beta\tau} \tag{8.11a}
\]
\[
e^{-\beta\tau}\Bigl( \sin(\alpha\tau) - \cos(\alpha\tau)\frac{\tau a}{h}\sin(kh) \Bigr) = 0. \tag{8.11b}
\]
Since $e^{-\beta\tau} > 0$, it follows from (8.11b) that
\[
\tan(\alpha\tau) = \frac{\sin(\alpha\tau)}{\cos(\alpha\tau)} = \frac{\tau a}{h}\sin(kh)
\qquad\Longrightarrow\qquad
\alpha = \frac{1}{\tau}\arctan\Bigl( \frac{\tau a}{h}\sin(kh) \Bigr) \tag{8.12}
\]

under the assumption that $\alpha\tau \ne (\ell + 1/2)\pi$ for all $\ell \in \mathbb{Z}$. Now that $\alpha$ is known, we obtain $\beta$ from (8.11a):
\[
\beta = \frac{1}{\tau}\log\Bigl( \cos(\alpha\tau) + \sin(\alpha\tau)\frac{\tau a}{h}\sin(kh) \Bigr)
= \frac{1}{\tau}\log\Bigl( \cos(\alpha\tau)\Bigl( 1 + \tan(\alpha\tau)\frac{\tau a}{h}\sin(kh) \Bigr) \Bigr)
= \frac{1}{\tau}\log\Bigl( \cos(\alpha\tau)\Bigl( 1 + \frac{\tau^2 a^2}{h^2}\sin^2(kh) \Bigr) \Bigr).
\]


Interpretation: Now we know that the approximations given by the implicit Euler method have the form
\[
u_j^n = \exp\bigl( i(\alpha + i\beta)t_n - ikx_j \bigr) = \exp(-\beta t_n)\exp\bigl( i\alpha t_n - ikx_j \bigr). \tag{8.13}
\]
If $\beta > 0$, then the term $\exp(-\beta t_n)$ tends to zero for increasing $n$. This explains the "damping" observed in the numerical simulations. Note that
\[
\beta > 0 \iff \cos(\alpha\tau)\Bigl( 1 + \frac{\tau^2 a^2}{h^2}\sin^2(kh) \Bigr) > 1.
\]

Next, we show that $\alpha = \operatorname{Re}(w)$ from (8.12) approximates the exact value $\omega = ak$. It can be shown that $|\sin(x)/x| < 1$ for all $x \ne 0$. Hence, we know that
\[
\Bigl| \frac{\tau a}{h}\sin(kh) \Bigr| \le \tau a |k| \Bigl| \frac{\sin(kh)}{kh} \Bigr| \le \tau a |k|.
\]
If $\tau < 1/(a|k|)$, we can thus apply the series
\[
\arctan(x) = \sum_{\ell=0}^\infty (-1)^\ell \frac{x^{2\ell+1}}{2\ell+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \dots
\]
and obtain with $|\sin(kh)/h| \le |k|$ that
\[
\alpha = \frac{1}{\tau}\arctan\Bigl( \frac{\tau a}{h}\sin(kh) \Bigr)
= \frac{1}{\tau}\Bigl( \frac{\tau a}{h}\sin(kh) + O(\tau^3) \Bigr)
= a\,\frac{\sin(kh)}{h} + O(\tau^2)
= ak + O(k^3 h^2) + O(\tau^2).
\]
Hence, (8.12) can be interpreted as a numerical dispersion relation which approximates the exact dispersion relation $\omega = ak$.
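The numerical dispersion relation (8.12) can be checked against $\omega = ak$: halving both $\tau$ and $h$ should reduce the error roughly by a factor of 4, consistent with the $O(k^3h^2) + O(\tau^2)$ expansion. A small sketch (ours, with arbitrary parameter values):

```python
import math

# alpha from (8.12) versus the exact dispersion relation omega = a*k.
a, k = 1.0, 2.0

def alpha(tau, h):
    return math.atan(tau * a / h * math.sin(k * h)) / tau

err1 = abs(alpha(0.01, 0.01) - a * k)
err2 = abs(alpha(0.005, 0.005) - a * k)   # tau and h halved
```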

Hence, (8.12) can be interpreted as a numerical dispersion relation which approximatesthe exact dispersion relation ω = ak.

The exact solution u(t, x) = exp(iωt− ikx) = exp(ik(at−x)) travels with constant speeda to the right, and the propagation speed is independent of k. This is not true for thenumerical solution (8.13), because its propagation speed α = α(k) given by (8.12) dependson k. Now suppose that f is a function represented by the inverse Fourier transform

f(x) =1√2π

∫ ∞−∞

eikxf(k) dk

where f is the Fourier transform of f . If we use this function as initial data for thetransport equation, then our result impies that each “part” eikx will travel at a slightlydifferent speed. This behaviour is called numerical dispersion and is the reason why thelocalised peak in the numerical example (see slides) is “smeared out” as time evolves.


Trapezoidal rule: Now we carry out a similar analysis for the trapezoidal rule
\[
u_j^{n+1} = u_j^n - \frac{\tau a}{4h}\bigl( u_{j+1}^{n+1} - u_{j-1}^{n+1} + u_{j+1}^n - u_{j-1}^n \bigr)
\]
or equivalently
\[
u_j^{n+1} + \frac{\tau a}{4h}\bigl( u_{j+1}^{n+1} - u_{j-1}^{n+1} \bigr) = u_j^n - \frac{\tau a}{4h}\bigl( u_{j+1}^n - u_{j-1}^n \bigr). \tag{8.14}
\]

As before, we suppose that the numerical approximations have the form (8.9) and substitute this into (8.14). This yields
\[
e^{iw\tau} u_j^n \Bigl( 1 + \frac{\tau a}{4h}(e^{-ikh} - e^{ikh}) \Bigr) = u_j^n \Bigl( 1 - \frac{\tau a}{4h}(e^{-ikh} - e^{ikh}) \Bigr).
\]
With $e^{-ikh} - e^{ikh} = -2i\sin(kh)$ and $\sigma := \frac{\tau a}{2h}\sin(kh)$, this is equivalent to
\[
e^{iw\tau}(1 - i\sigma) = 1 + i\sigma.
\]
Applying the magic formula
\[
\frac{1 + ix}{1 - ix} = \exp(2i\arctan(x))
\]
yields $e^{iw\tau} = \exp(2i\arctan(\sigma))$ and, with the series representation of $\arctan(x)$,
\[
w = \frac{2}{\tau}\arctan(\sigma) = \frac{2}{\tau}\arctan\Bigl( \frac{\tau a}{2h}\sin(kh) \Bigr)
= \frac{2}{\tau}\Bigl( \frac{\tau a}{2h}\sin(kh) + O(\tau^3) \Bigr)
= a\,\frac{\sin(kh)}{h} + O(\tau^2)
= ak + O(k^3 h^2) + O(\tau^2)
\]

for $\tau < 2/(a|k|)$. The consequence of the approximation $w \approx \omega$ is the same as before: different "parts" of the numerical solution travel at different speeds due to numerical dispersion. This explains the erroneous oscillations observed in the approximation given by the trapezoidal rule. Note, however, that $w$ is real in the case of the trapezoidal rule. Hence, there is no damping term $\exp(-\beta t_n)$.
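The magic formula, and the fact that the trapezoidal amplification factor $(1+i\sigma)/(1-i\sigma)$ has modulus 1 (no damping), can be verified numerically (our sketch):

```python
import cmath
import math

# Deviations of (1+ix)/(1-ix) from exp(2i*arctan(x)), and of its
# modulus from 1, over a few sample values of x.
xs = [-2.0, -0.3, 0.0, 0.7, 5.0]
dev_formula = max(abs((1 + 1j * x) / (1 - 1j * x) - cmath.exp(2j * math.atan(x)))
                  for x in xs)
dev_modulus = max(abs(abs((1 + 1j * x) / (1 - 1j * x)) - 1.0) for x in xs)
```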

Convection-dominated problems in mathematical finance? The phenomena analyzed in this section make the approximation of hyperbolic differential equations difficult. For problems in mathematical finance, however, these phenomena only appear if the convection part is much "stronger" than the diffusion part. When is this the case?

1. Black-Scholes equation (in forward time):
\[
\partial_t V = \frac{\sigma^2}{2} S^2 \partial_S^2 V + r S \partial_S V - r V
\]


In the new variables
\[
x = \ln(S), \qquad w(t,x) = V(t,S)
\]
this is equivalent to
\[
\partial_t w(t,x) = \frac{\sigma^2}{2}\partial_x^2 w(t,x) + \Bigl( r - \frac{\sigma^2}{2} \Bigr)\partial_x w(t,x) - r w(t,x).
\]
Hence, the convection part dominates if $r \not\approx 0$ and $\sigma \approx 0$. This situation is not very likely in practice.

2. PDE (8.2) for the price of an Asian option (after reduction to one dimension):
\[
\partial_t H + \frac{\sigma^2}{2} R^2 \partial_R^2 H + (1 - rR)\partial_R H = 0, \qquad H = H(t,R)
\]
The convection part dominates for $R \approx 0$, no matter which parameters $\sigma$ and $r$ are chosen. This explains why we have observed oscillations in the numerical simulation of (8.2). This effect, however, only appears if $\tau$ is rather large and $\sigma$ is rather small. Here, the implicit Euler method is a good alternative, because the "smearing-out" effect is not critical.


Chapter 9

Finite-difference methods for American options

9.1 Modelling American options

American options can be exercised before maturity. Mathematical model?

Properties of American options: If $V_C^{Am}(t,S)$ is the value of an American call, $V_P^{Eu}(t,S)$ is the value of a European put, etc., then
\[
(K - S)^+ \le V_P^{Am}(t,S) \le K \tag{9.1}
\]

\[
\begin{aligned}
V_P^{Eu}(t,S) &\le V_P^{Am}(t,S)\\
V_C^{Eu}(t,S) &= V_C^{Am}(t,S) && \text{if no dividends are paid}\\
V_C^{Eu}(t,S) &\le V_C^{Am}(t,S) && \text{if dividends are paid}
\end{aligned}
\]
Proof: Proposition 2.7 and Remark 2.9 in [GJ10].


In the entire chapter, we consider American puts with no dividends and drop the indices, i.e. $V(t,S) = V_P^{Am}(t,S)$. American calls (with dividends) can be treated in a similar way.

For every $t \in [0,T)$, there is a unique $0 \le S^\star(t) < K$ such that
\[
\begin{aligned}
V(t,S) &> (K-S)^+ \quad \text{for } S > S^\star(t) &&\Longrightarrow \text{no early exercise} && (9.2a)\\
V(t,S) &= (K-S)^+ \quad \text{for } S \le S^\star(t) &&\Longrightarrow \text{early exercise} && (9.2b)
\end{aligned}
\]
Sketch of the proof. For $S = 0$ the inequalities (9.1) imply $K \le V(t,0) \le K$ and hence
\[
V(t,0) = K = (K - 0)^+.
\]


Since $V(t,S) > 0$ for $t < T$, it follows that
\[
V(t,S) > 0 = (K-S)^+ \qquad \text{for } S \ge K.
\]
A monotonicity argument yields the existence and uniqueness of $0 \le S^\star(t) < K$.

For fixed $t$, $S^\star(t)$ is called the contact point, and the function $t \mapsto S^\star(t)$ is called the early-exercise curve, because the option should be exercised before time $T$ if $S \le S^\star(t)$.

For $S \le S^\star(t)$, the value of the option is known. For $S > S^\star(t)$ the option is not exercised and can thus be modeled by the Black-Scholes equation
\[
\partial_t V + \mathcal{A} V = 0, \qquad \mathcal{A} V = \frac{\sigma^2}{2} S^2 \partial_S^2 V + r S \partial_S V - r V.
\]
If $S \mapsto V(t,S)$ is $C^1$ for all $t \in [0,T)$, then it follows that
\[
\partial_S V\bigl(t, S^\star(t)\bigr) = -1.
\]

Hence, we have to solve the free boundary value problem
\[
\begin{aligned}
\partial_t V(t,S) + \mathcal{A} V(t,S) &= 0 && \text{for } S > S^\star(t),\; t \in [0,T) && \text{(PDE)} && (9.3a)\\
V(T,S) &= (K-S)^+ && \text{for } S \ge 0 && \text{(terminal cond.)} && (9.3b)\\
V\bigl(t, S^\star(t)\bigr) &= \bigl(K - S^\star(t)\bigr)^+ && \text{for } t \in [0,T) && \text{(Dirichlet b.c.)} && (9.3c)\\
\partial_S V\bigl(t, S^\star(t)\bigr) &= -1 && \text{for } t \in [0,T) && \text{(Neumann b.c.)} && (9.3d)
\end{aligned}
\]
The boundary $S^\star(t)$ changes in time and depends on the solution. Goal: Reformulate the problem without $S^\star(t)$.

We know that for $S \le S^\star(t) < K$, the value of the option is $V(t,S) = (K-S)^+ = K - S$. This function, however, does not solve the Black-Scholes equation (9.3a), because then
\[
\partial_t V(t,S) + \mathcal{A} V(t,S)
= \underbrace{\partial_t (K-S)}_{=0} + \mathcal{A}(K-S)
= \frac{\sigma^2}{2} S^2 \underbrace{\partial_S^2 (K-S)}_{=0} + rS \underbrace{\partial_S (K-S)}_{=-1} - r(K-S)
= -rK < 0
\]
for all $S \le S^\star(t)$. However, the function $V(t,S) = K - S$ solves the Black-Scholes inequality
\[
\partial_t V(t,S) + \mathcal{A} V(t,S) \le 0 \qquad \text{for } S \ge 0,\; t \in [0,T],
\]


and we know:
\[
\begin{aligned}
S > S^\star(t) &\iff V(t,S) > (K-S)^+ &&\iff \partial_t V(t,S) + \mathcal{A} V(t,S) = 0 &&\iff \text{hold}\\
S \le S^\star(t) &\iff V(t,S) = (K-S)^+ &&\iff \partial_t V(t,S) + \mathcal{A} V(t,S) < 0 &&\iff \text{exercise}
\end{aligned}
\]

Hence, the free boundary value problem (9.3) is equivalent to the linear complementarity problem
\[
\begin{aligned}
\bigl( V(t,S) - (K-S)^+ \bigr)\bigl( \partial_t V(t,S) + \mathcal{A} V(t,S) \bigr) &= 0 \qquad S \ge 0,\; t \in [0,T]\\
-\bigl( \partial_t V(t,S) + \mathcal{A} V(t,S) \bigr) &\ge 0\\
V(t,S) - (K-S)^+ &\ge 0
\end{aligned}
\]
with terminal condition
\[
V(T,S) = (K-S)^+
\]
and boundary condition
\[
V(t,0) = K.
\]

9.2 Discretization

(a) Transformation to the heat equation.

As in 3.3 we use the following transformation:

x(S) = ln(S/K),   θ(t) = (σ²/2)(T − t),   c = 2r/σ²

E(θ, x) = exp((c − 1)x/2 + (c + 1)²θ/4)

u(θ, x) = (1/K) E(θ, x) V(t, S)

ψ(θ, x) = E(θ, x)(1 − eˣ)⁺

The new function u(θ, x) solves the transformed complementarity problem

(u(θ, x) − ψ(θ, x))(∂_θ u(θ, x) − ∂²_x u(θ, x)) = 0   for x ∈ ℝ, θ ∈ [0, σ²T/2]   (9.4a)
∂_θ u(θ, x) − ∂²_x u(θ, x) ≥ 0   (9.4b)
u(θ, x) − ψ(θ, x) ≥ 0   (9.4c)

with initial condition

u(0, x) = ψ(0, x)


and “boundary conditions”

lim_{x→−∞} (u(θ, x) − ψ(θ, x)) = 0

lim_{x→∞} u(θ, x) = 0

Proof: exercise

(b) Truncation and discretization in time and space.

Truncate the computational domain: Consider x ∈ [x_min, x_max] instead of x ∈ ℝ.
Choose 1 < m ∈ ℕ, let h = (x_max − x_min)/m and x_k = x_min + kh.
Choose N ∈ ℕ, let τ = σ²T/(2N) and θ_n = nτ.

Goal: Compute approximations w_{n,k} ≈ u(θ_n, x_k).
Reminder: The Crank-Nicolson discretization of the heat equation

∂_θ u(θ, x) − ∂²_x u(θ, x) = 0

is

0 = w_{n+1,k} − w_{n,k} − (τ/(2h²))(w_{n+1,k+1} − 2w_{n+1,k} + w_{n+1,k−1}) − (τ/(2h²))(w_{n,k+1} − 2w_{n,k} + w_{n,k−1})

for n = 0, ..., N − 1 and k = 1, ..., m − 1. In vector notation:

0 = (I − (τ/(2h²))A) w_{n+1} − (I + (τ/(2h²))A) w_n − (τ/(2h²))(g_{n+1} + g_n)

with A defined in (7.9) and

w_n = (w_{n,1}, ..., w_{n,m−1})ᵀ,
g_n = (ψ(θ_n, x_min), 0, ..., 0)ᵀ.

For vectors (y_1, ..., y_d)ᵀ and (z_1, ..., z_d)ᵀ with nonnegative entries, we have

y_k z_k = 0 for all k = 1, ..., d  ⟺  yᵀz = 0.

This motivates the discretization

(w_{n+1} − ψ_{n+1})ᵀ ((I − (τ/(2h²))A) w_{n+1} − (I + (τ/(2h²))A) w_n − (τ/(2h²))(g_{n+1} + g_n)) = 0   (9.5a)
w_{n+1} − ψ_{n+1} ≥ 0   (9.5b)
(I − (τ/(2h²))A) w_{n+1} − (I + (τ/(2h²))A) w_n − (τ/(2h²))(g_{n+1} + g_n) ≥ 0   (9.5c)

with ψ_n = (ψ(θ_n, x_1), ..., ψ(θ_n, x_{m−1}))ᵀ for the transformed linear complementarity problem (9.4). This has to be solved for n = 0, ..., N − 1.
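To make the system (9.5) concrete, the two Crank-Nicolson matrices can be assembled in a few lines. This is a minimal NumPy sketch, not code from the lecture; it assumes that A from (7.9) is the usual tridiagonal second-difference matrix tridiag(1, −2, 1), and m, h, τ are illustrative values:

```python
import numpy as np

# illustrative discretization parameters (not from the lecture)
m, h, tau = 50, 0.1, 0.005

# assumption: A from (7.9) is the tridiagonal second-difference matrix tridiag(1, -2, 1)
A = (np.diag(-2.0 * np.ones(m - 1))
     + np.diag(np.ones(m - 2), 1)
     + np.diag(np.ones(m - 2), -1))

lam = tau / (2.0 * h**2)
M_impl = np.eye(m - 1) - lam * A   # (I - tau/(2h^2) A), multiplies w_{n+1} in (9.5)
M_expl = np.eye(m - 1) + lam * A   # (I + tau/(2h^2) A), multiplies w_n in (9.5)

# M_impl is symmetric and positive definite, as Theorem 9.3.1 requires
print(np.allclose(M_impl, M_impl.T), np.all(np.linalg.eigvalsh(M_impl) > 0))
```

With this choice of A, the eigenvalues of M_impl are 1 + (τ/(2h²))·|μ| > 1 for eigenvalues μ ∈ (−4, 0) of A, so the matrix passed to the projected SOR method below is indeed symmetric positive definite.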


9.3 An iterative method for linear complementarity problems

Consider the linear complementarity problem

(w − v)ᵀ(Mw − b) = 0
w − v ≥ 0
Mw − b ≥ 0

with given M ∈ ℝ^{d×d}, given v, b ∈ ℝ^d and unknown w ∈ ℝ^d. The problem (9.5) is obtained for w = w_{n+1}, M := I − (τ/(2h²))A, v = ψ_{n+1} and

b = (I + (τ/(2h²))A) w_n + (τ/(2h²))(g_{n+1} + g_n).

Numerical method?

(a) Iterative methods for linear systems

First, consider only the linear system Mw = b with M = (M_ik) ∈ ℝ^{d×d}. Instead of direct methods (e.g. Gaussian elimination) we consider iterative methods based on the decomposition

M = D − L − U

with

D = diag(M_11, ..., M_dd)   (the diagonal part of M),
L = −(strict lower triangular part of M), i.e. L_ik = −M_ik for i > k and L_ik = 0 otherwise,
U = −(strict upper triangular part of M), i.e. U_ik = −M_ik for i < k and U_ik = 0 otherwise.

Assume that D_kk = M_kk > 0. This is the case, e.g., if M is symmetric and positive definite. By definition:

Mw = b  ⟺  Dw = (L + U)w + b

Idea: Turn this into a fixed-point iteration. This yields the Jacobi iteration

w^{(j+1)} = D⁻¹((L + U)w^{(j)} + b),   j = 0, 1, 2, ...


Hope that the sequence (w^{(j)})_{j∈ℕ₀} converges to a fixed point.

Often a better convergence rate is observed with the Gauss-Seidel iteration:

w^{(j+1)} = D⁻¹(Lw^{(j+1)} + Uw^{(j)} + b),   j = 0, 1, 2, ...

This method seems to be implicit, since w^{(j+1)} appears on the right-hand side. A closer look reveals, however, that the Gauss-Seidel iteration is explicit, because the entries of w^{(j+1)} can be computed one after the other:

w^{(j+1)}_1 = (1/D_11)(0 + Σ_{k=2}^{d} U_{1k} w^{(j)}_k + b_1)
w^{(j+1)}_2 = (1/D_22)(L_{21} w^{(j+1)}_1 + Σ_{k=3}^{d} U_{2k} w^{(j)}_k + b_2)
⋮
w^{(j+1)}_i = (1/D_ii)(Σ_{k=1}^{i−1} L_{ik} w^{(j+1)}_k + Σ_{k=i+1}^{d} U_{ik} w^{(j)}_k + b_i),   i = 1, ..., d.
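The two iterations can be sketched in a few lines of code. This is a hedged NumPy illustration (not code from the lecture), applied to a small symmetric positive definite example system; note that (L + U)w = (D − M)w, which the Jacobi step uses:

```python
import numpy as np

def jacobi_step(M, b, w):
    """One Jacobi sweep: w_new = D^{-1}((L + U) w + b), where M = D - L - U."""
    d = np.diag(M)                          # diagonal of M, i.e. D as a vector
    return ((np.diag(d) - M) @ w + b) / d   # (L + U) w = (D - M) w

def gauss_seidel_step(M, b, w):
    """One Gauss-Seidel sweep: the entries are computed one after the other."""
    w = w.copy()
    for i in range(len(b)):
        # entries 0..i-1 are already updated (L part), entries i+1.. are old (U part)
        s = M[i, :i] @ w[:i] + M[i, i+1:] @ w[i+1:]
        w[i] = (b[i] - s) / M[i, i]
    return w

# demo on a small symmetric positive definite system
M = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
w = np.zeros(3)
for _ in range(100):
    w = gauss_seidel_step(M, b, w)
print(np.allclose(M @ w, b))  # True: the fixed point solves M w = b
```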

Generalization: successive overrelaxation (SOR) method:

For j = 0, 1, 2, ...
    For k = 1, ..., d
        ŵ^{(j)}_k = (1/D_kk)[Lw^{(j+1)} + Uw^{(j)} + b]_k   (k-th entry)
        w^{(j+1)}_k = w^{(j)}_k + r(ŵ^{(j)}_k − w^{(j)}_k)
    end
end

with relaxation parameter r ∈ ℝ. If r = 1, then w^{(j+1)} = ŵ^{(j)}, and we obtain the Gauss-Seidel method. For r ∈ (0, 1), w^{(j+1)}_k is an interpolation between w^{(j)}_k and ŵ^{(j)}_k. In practice, however, typically r > 1 is chosen.

(b) The projected SOR method for linear complementarity problems

Back to the linear complementarity problem:

(w − v)ᵀ(Mw − b) = 0   (9.6a)
w − v ≥ 0   (9.6b)
Mw − b ≥ 0   (9.6c)


In general, we cannot expect that Mw = b: there may be entries k where [Mw − b]_k > 0. The complementarity problem is equivalent to

min{[Mw − b]_k , [w − v]_k} = 0   for all k = 1, ..., d

⟺ min{[D⁻¹((D − L − U)w − b)]_k , [w − v]_k} = 0   for all k = 1, ..., d

(note that D⁻¹((D − L − U)w − b) = w − D⁻¹((L + U)w + b))

⟺ max{[D⁻¹((L + U)w + b)]_k , v_k} = w_k   for all k = 1, ..., d.

This motivates the projected SOR method:

For j = 0, 1, 2, ...
    For k = 1, ..., d
        ŵ^{(j)}_k = (1/D_kk)[Lw^{(j+1)} + Uw^{(j)} + b]_k
        w^{(j+1)}_k = max{w^{(j)}_k + r(ŵ^{(j)}_k − w^{(j)}_k) , v_k}
    end
end
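A direct transcription of this algorithm into code might look as follows. This is a minimal NumPy sketch, not code from the lecture; a simple stopping criterion has been added, and it is tested on a small 2×2 complementarity problem whose unconstrained solution M⁻¹b violates w ≥ v:

```python
import numpy as np

def projected_sor(M, b, v, r=1.5, tol=1e-12, max_iter=10_000):
    """Projected SOR for the LCP  (w - v)^T (M w - b) = 0,  w >= v,  M w >= b,
    with M symmetric positive definite and r in (1, 2); cf. Theorem 9.3.1."""
    w = v.copy()                       # starting vector satisfying w >= v
    for _ in range(max_iter):
        w_old = w.copy()
        for k in range(len(b)):
            # Gauss-Seidel value: hat_w_k = (b_k - sum_{j != k} M_kj w_j) / M_kk
            s = M[k, :k] @ w[:k] + M[k, k+1:] @ w[k+1:]
            hat_wk = (b[k] - s) / M[k, k]
            # relax, then project onto the constraint w_k >= v_k
            w[k] = max(w[k] + r * (hat_wk - w[k]), v[k])
        if np.linalg.norm(w - w_old) < tol:   # simple stopping criterion
            break
    return w

# demo: 2x2 LCP whose unconstrained solution M^{-1} b = (-1, -1) violates w >= v
M = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([-1.0, -1.0])
v = np.zeros(2)
w = projected_sor(M, b, v)
# the three conditions of (9.6) hold at the computed solution
print(np.all(w >= v - 1e-10), np.all(M @ w - b >= -1e-10),
      abs((w - v) @ (M @ w - b)) < 1e-8)
```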

Theorem 9.3.1 (Cryer) Let v, b ∈ ℝ^d, r ∈ (1, 2) and assume that M ∈ ℝ^{d×d} is symmetric and positive definite. Then the linear complementarity problem (9.6) has a unique solution, and the iterates w^{(j)} of the projected SOR method converge to the solution.

The proof of uniqueness of the solution will be based on the following

Lemma 9.3.2 If M ∈ ℝ^{d×d} is symmetric and positive definite, then the linear complementarity problem (9.6) is equivalent to the problem to find a w ∈ ℝ^d such that

w ≥ v and J(w) ≤ J(y) for all y ≥ v,   (9.7)

where J is the functional J(y) = ½ yᵀMy − bᵀy.

Proof. Let w be a solution of (9.6) and let y ≥ v. Then we have

J(y) − J(w) = ½(y − w)ᵀM(y − w) + (y − w)ᵀ(Mw − b) ≥ (y − w)ᵀ(Mw − b)

because M is positive definite. Since y ≥ v by assumption and w is the solution of (9.6), we have

(y − w)ᵀ(Mw − b) = (y − v)ᵀ(Mw − b) − (w − v)ᵀ(Mw − b) ≥ 0

because (y − v)ᵀ(Mw − b) ≥ 0 and (w − v)ᵀ(Mw − b) = 0,


which proves that J(y) ≥ J(w). Conversely, assume that w is the solution of (9.7). This means, in particular, that w ≥ v. We show that Mw − b ≥ 0 holds, too. Define y = w + εe_k for ε > 0, where e_k is the k-th unit vector. Then clearly y ≥ w ≥ v and

0 ≤ J(y) − J(w) = (ε²/2) e_kᵀMe_k + ε e_kᵀ(Mw − b) = (ε²/2) M_kk + ε(Mw − b)_k.

Dividing by ε and letting ε → 0 yields (Mw − b)_k ≥ 0 for arbitrary k and hence Mw − b ≥ 0. It remains to show that (Mw − b)ᵀ(w − v) = 0. Assume that there is a k such that both

(Mw − b)_k > 0 and (w − v)_k > 0.   (9.8)

If we choose ε > 0 so small that y := w − εe_k ≥ v, it follows for sufficiently small ε that

0 ≤ J(y) − J(w) = (ε²/2) M_kk − ε(Mw − b)_k < 0,

which yields a contradiction. For every k we thus have either (Mw − b)_k = 0 or (w − v)_k = 0, and hence (Mw − b)ᵀ(w − v) = 0.

Proof of Theorem 9.3.1. First we show uniqueness of the solution of the linear complementarity problem. Assume that w and w̃ are both solutions of (9.6). Then

0 = J(w̃) − J(w)
  = ½(w̃ − w)ᵀM(w̃ − w) + (w̃ − w)ᵀ(Mw − b)
  = ½(w̃ − w)ᵀM(w̃ − w) + (w̃ − v)ᵀ(Mw − b) − (w − v)ᵀ(Mw − b)
  ≥ ½(w̃ − w)ᵀM(w̃ − w) ≥ 0

because (w̃ − v)ᵀ(Mw − b) ≥ 0 and (w − v)ᵀ(Mw − b) = 0, and hence w = w̃ because M is positive definite.
Next, we prove that the iterates w^{(j)} of the SOR method converge to a solution of the minimization problem (9.7). This implies existence of a solution of the linear complementarity problem (9.6).

Step 1: For all j ∈ ℕ₀ and k = 1, ..., d it can be shown that there is an r_{jk} ∈ [0, r] such that

w^{(j+1)}_k = w^{(j)}_k + r_{jk}(ŵ^{(j)}_k − w^{(j)}_k).   (9.9)

Details: See p. 212 in [GJ10].


Step 2: Let w^{(j,k)} be the vector obtained with the projected SOR method for given numbers j and k. If k < d, then only the entries with indices 1, ..., k have been updated in the inner loop:

w^{(j,k)} = (w^{(j+1)}_1, ..., w^{(j+1)}_k, w^{(j)}_{k+1}, ..., w^{(j)}_d)ᵀ   for k ∈ {1, ..., d − 1}
w^{(j,0)} := (w^{(j)}_1, ..., w^{(j)}_d)ᵀ = w^{(j)}   (no update yet in step j)
w^{(j,d)} := (w^{(j+1)}_1, ..., w^{(j+1)}_d)ᵀ = w^{(j+1)}   (all updates in step j completed)

Show that the sequence J(w^{(j,k)}) converges when the indices are changed in the following order:

(j, k) = (0, 1), (0, 2), ..., (0, d), (1, 1), (1, 2), ..., (1, d), (2, 1), (2, 2), ....   (9.10)

By definition, we have that

w^{(j,k)} − w^{(j,k−1)} = (w^{(j+1)}_k − w^{(j)}_k) e_k   (k = 1, ..., d).   (9.11)

Since only the first k − 1 entries of w^{(j,k−1)} have been updated, it follows from the definition of ŵ^{(j)}_k in the projected SOR method that

(Mw^{(j,k−1)} − b)_k = (Dw^{(j,k−1)} − Lw^{(j,k−1)} − Uw^{(j,k−1)} − b)_k
  = (Dw^{(j)} − Lw^{(j+1)} − Uw^{(j)} − b)_k
  = D_kk (w^{(j)}_k − ŵ^{(j)}_k)
  = M_kk (w^{(j)}_k − ŵ^{(j)}_k).   (9.12)

For r_{jk} > 0 we thus obtain from (9.11) and (9.12) that

J(w^{(j,k)}) − J(w^{(j,k−1)})
  = ½ (w^{(j,k)} − w^{(j,k−1)})ᵀ M (w^{(j,k)} − w^{(j,k−1)}) + (w^{(j,k)} − w^{(j,k−1)})ᵀ (Mw^{(j,k−1)} − b)
  = ½ (w^{(j+1)}_k − w^{(j)}_k)² e_kᵀMe_k + (w^{(j+1)}_k − w^{(j)}_k) e_kᵀ(Mw^{(j,k−1)} − b)
  = (M_kk/2)(w^{(j+1)}_k − w^{(j)}_k)² + (w^{(j+1)}_k − w^{(j)}_k) M_kk (w^{(j)}_k − ŵ^{(j)}_k),

where (9.11) was inserted in the second step, and e_kᵀMe_k = M_kk and (9.12) were used in the last step.

Equation (9.9) implies

w^{(j)}_k − ŵ^{(j)}_k = (w^{(j)}_k − w^{(j+1)}_k)/r_{jk}.


Since r < 2 by assumption, this yields

J(w^{(j,k)}) − J(w^{(j,k−1)}) = −(M_kk/2)(2/r_{jk} − 1)(w^{(j+1)}_k − w^{(j)}_k)²   (9.13)
  ≤ −(M_kk/2)(2/r − 1)(w^{(j+1)}_k − w^{(j)}_k)² ≤ 0.

If r_{jk} = 0, then w^{(j+1)}_k = w^{(j)}_k and hence w^{(j,k)} = w^{(j,k−1)}. Hence, the sequence (J(w^{(j,k)}))_{j,k} is monotonically decreasing. Next, we show that J(y) is bounded from below. Let λ_min be the smallest eigenvalue of M. Since M is symmetric and positive definite, it follows that λ_min > 0 and yᵀMy ≥ λ_min‖y‖². Together with the Cauchy-Schwarz inequality, this yields

J(y) = ½ yᵀMy − bᵀy ≥ (λ_min/2)‖y‖² − ‖b‖·‖y‖ = (1/(2λ_min))(λ_min‖y‖ − ‖b‖)² − ‖b‖²/(2λ_min) ≥ −‖b‖²/(2λ_min).

As a consequence, the sequence (J(w^{(j,k)}))_{j,k} converges.

Step 3: We show that (w^{(j)}_k)_j converges, too. If r_{jk} > 0, then (9.13) implies

|w^{(j+1)}_k − w^{(j)}_k|² = (2r_{jk}/(M_kk(r_{jk} − 2)))(J(w^{(j,k)}) − J(w^{(j,k−1)}))
  = (2r_{jk}/(M_kk(2 − r_{jk})))(J(w^{(j,k−1)}) − J(w^{(j,k)}))
  ≤ (2r/(2 − r)) · (1/min_k M_kk) · (J(w^{(j,k−1)}) − J(w^{(j,k)})) → 0

in the limit specified by (9.10), because r_{jk} ≤ r by Step 1 and r < 2 by assumption. This means that (w^{(j)}_k)_j is a Cauchy sequence for every k, and hence the limit

w̄_k := lim_{j→∞} w^{(j)}_k

exists for every k.

Step 4: We show that w̄ = (w̄_1, ..., w̄_d)ᵀ solves the linear complementarity problem (9.6). Passing to the limit j → ∞ in the two lines of the projected SOR method gives

ŵ_k := lim_{j→∞} ŵ^{(j)}_k = M⁻¹_kk (Lw̄ + Uw̄ + b)_k = w̄_k − M⁻¹_kk (Mw̄ − b)_k,
w̄_k = max{w̄_k + r(ŵ_k − w̄_k) , v_k} = max{w̄_k − r M⁻¹_kk (Mw̄ − b)_k , v_k}.

This yields

min{r M⁻¹_kk (Mw̄ − b)_k , w̄_k − v_k} = 0,

which is equivalent to (9.6).

9.4 Summary: Pricing American options with the projected SOR method

• Start: Free boundary problem (9.3) with solution V(t, S).

• Reformulation as a linear complementarity problem.

• Transformation:

V(t, S) → u(θ, x),   Black-Scholes inequality → heat inequality

⟹ transformed linear complementarity problem (9.4).

• Truncation: Restrict x ∈ ℝ to x ∈ [x_min, x_max], choose boundary conditions.

• Discretize time and space: w_{n,k} ≈ u(θ_n, x_k).

• Algorithm:
For n = 0, 1, ..., N − 1 (time points)
    Solve the linear complementarity problem (9.5) with the projected SOR method:
    For j = 0, 1, 2, ... (iteration number)
        For k = 1, ..., m − 1 (entry number)
            ...
        end
    end
end

• Transform back.

— FIN —


Appendix A

Some definitions from probability theory

Definition A.0.1 (Probability space) The triple (Ω, F, P) is called a probability space if the following holds:

1. Ω ≠ ∅ is a set, and F is a σ-algebra (or σ-field) on Ω, i.e. a family of subsets of Ω with the following properties:

• ∅ ∈ F
• If F ∈ F, then Ω \ F ∈ F
• If F_i ∈ F for all i ∈ ℕ, then ⋃_{i=1}^{∞} F_i ∈ F

The pair (Ω, F) is called a measurable space.

2. P : F → [0, 1] is a probability measure, i.e.

• P(∅) = 0 and P(Ω) = 1
• If F_i ∈ F for all i ∈ ℕ are pairwise disjoint (i.e. F_i ∩ F_j = ∅ for i ≠ j), then

P(⋃_{i=1}^{∞} F_i) = Σ_{i=1}^{∞} P(F_i).

Definition A.0.2 (Borel σ-algebra) If U is a family of subsets of Ω, then the σ-algebra generated by U is

F_U = ⋂ {F : F is a σ-algebra on Ω and U ⊂ F}.

If U is the collection of all open subsets of a topological space Ω (e.g. Ω = ℝ^d), then B = F_U is called the Borel σ-algebra on Ω. The elements B ∈ B are called Borel sets.

For the rest of this section, (Ω, F, P) is a probability space.


Definition A.0.3 (Measurable functions, random variables)

• A function X : Ω → ℝ^d is called F-measurable if

X⁻¹(B) := {ω ∈ Ω : X(ω) ∈ B} ∈ F

for all Borel sets B ∈ B. If (Ω, F, P) is a probability space, then every F-measurable function is called a random variable.

• Random variables X_1, ..., X_n are called independent if

P(⋂_{i=1}^{n} X_i⁻¹(A_i)) = ∏_{i=1}^{n} P(X_i⁻¹(A_i))

for all A_1, ..., A_n ∈ B.

• If X : Ω → ℝ^d is any function, then the σ-algebra generated by X is the smallest σ-algebra on Ω containing all the sets X⁻¹(B) for B ∈ B. Notation: F_X = σ{X}. F_X is the smallest σ-algebra with respect to which X is measurable.


Appendix B

The Ito integral

Let (Ω, F, P) be a complete probability space.

B.1 The Wiener process

Robert Brown 1827, Louis Bachelier 1900, Albert Einstein 1905, Norbert Wiener 1923

Definition B.1.1 (Normal distribution) A random variable X : Ω → ℝ^d with d ∈ ℕ is normal if it has a multivariate normal (Gaussian) distribution with mean µ ∈ ℝ^d and a symmetric, positive definite covariance matrix Σ ∈ ℝ^{d×d}, i.e.

P(X ∈ B) = ∫_B ((2π)^d det(Σ))^{−1/2} exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)) dx

for all Borel sets B ⊂ ℝ^d. Notation: X ∼ N(µ, Σ).

Remarks:

1. If X ∼ N(µ, Σ), then E(X) = µ and Σ = (σ_ij) with σ_ij = E[(X_i − µ_i)(X_j − µ_j)].

2. Standard normal distribution ⇔ µ = 0, Σ = I (identity matrix).

3. If X ∼ N(µ, Σ) and Y = v + TX for some v ∈ ℝ^d and a regular matrix T ∈ ℝ^{d×d}, then

Y ∼ N(v + Tµ, TΣTᵀ).   (B.1)

Definition B.1.2 (Wiener process, Brownian motion)
(a) A continuous-time stochastic process {W_t : t ∈ [0, T)} is called a standard Brownian motion or standard Wiener process if it has the following properties:

1. W_0 = 0 (with probability one)

2. Independent increments: For all 0 ≤ t_1 < t_2 < ... < t_n < T the random variables

W_{t_2} − W_{t_1}, W_{t_3} − W_{t_2}, ..., W_{t_n} − W_{t_{n−1}}

are independent.

3. W_t − W_s ∼ N(0, t − s) for any 0 ≤ s < t < T.

4. There is an Ω̃ ⊂ Ω with P(Ω̃) = 1 such that t ↦ W_t(ω) is continuous for all ω ∈ Ω̃.

(b) If W^{(1)}_t, ..., W^{(d)}_t are independent one-dimensional Wiener processes, then W_t = (W^{(1)}_t, ..., W^{(d)}_t) is called a d-dimensional Wiener process, and

W_t − W_s ∼ N(0, (t − s)I).

Remark: This process will serve as the “source of randomness” in our model of the financial market.

Notation: W_t = W_t(ω) = W(t, ω) = W(t)

Numerical simulation of a Wiener process (d = 1). Choose a step-size τ > 0, put t_n = n·τ and W_0 = 0.

For n = 0, 1, 2, 3, ...
    Generate a random number Z_n ∼ N(0, 1)
    W_{n+1} = W_n + √τ · Z_n

For τ → 0 the interpolation of W_0, W_1, W_2, ... approximates a path of the Wiener process.
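The simulation loop above can be vectorized in a few lines; a minimal NumPy sketch (the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def wiener_path(T=1.0, N=1000, seed=0):
    """Simulate W_0, W_{t_1}, ..., W_{t_N} on [0, T]: W_{n+1} = W_n + sqrt(tau) Z_n."""
    rng = np.random.default_rng(seed)
    tau = T / N
    Z = rng.standard_normal(N)                              # Z_n ~ N(0, 1)
    W = np.concatenate(([0.0], np.cumsum(np.sqrt(tau) * Z)))
    t = np.linspace(0.0, T, N + 1)
    return t, W

t, W = wiener_path()
print(W[0])  # 0.0: every path starts in the origin
```

Plotting t against W (e.g. with matplotlib) shows the typical rough Brownian path; by Definition B.1.2, W[-1] is N(0, T)-distributed.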

How smooth is a path of a Wiener process? Consider only d = 1.

Hölder continuity and non-differentiability

A function f : (a, b) → ℝ is Hölder continuous of order α for some α ∈ [0, 1] if there is a constant C such that

|f(t) − f(s)| ≤ C|t − s|^α for all s, t ∈ (a, b).

If α = 1, then f is Lipschitz continuous.
If α > 0, then f is uniformly continuous.
If α = 0, then f is bounded.

With probability one, a path of the Wiener process on a bounded interval is Hölder continuous of every order α ∈ [0, ½). For α ≥ ½, however, with probability one the path is not Hölder continuous of order α. In particular, a path of the Wiener process is nowhere differentiable with probability one.

Proofs: [Ste01], Chapter 5


Unbounded total variation

Let [a, b] be an interval and let

P_N = (t_n)_{n=0}^{N},   a = t_0 < t_1 < ... < t_N = b

be a partition of [a, b] with mesh width |P_N| = max_n |t_n − t_{n−1}|.
Example: equidistant partition, τ = (b − a)/N, t_n = a + n·τ.
The total variation of a function f : (a, b) → ℝ is

TV_{a,b}(f) = lim_{N→∞, |P_N|→0} Σ_{n=1}^{N} |f(t_n) − f(t_{n−1})|.   (B.2)

If f is differentiable and f′ is integrable, then it can be shown that

TV_{a,b}(f) = ∫_a^b |f′(t)| dt.

Conversely: if a function f has bounded total variation, then its derivative exists at almost all x ∈ [a, b].
Consequence: A path of the Wiener process has unbounded total variation with probability one.

B.2 Construction of the Ito integral

References: [KP99, Øks03, Shr04, Ste01]

The model considered in 1.5 is clearly too simple: only two discrete times, only two possible prices of S(T).

Goal: Construct a more realistic model for the dynamics of S(t).

Naive ansatz: an ordinary differential equation plus random noise,

dX/dt = f(t, X) + g(t, X)Z(t),   Z(t) = ?

Apply the explicit Euler method: Choose t ≥ 0 and N ∈ ℕ, let τ = t/N, t_n = n·τ and define approximations X_n ≈ X(t_n) by

X_{n+1} = X_n + τf(t_n, X_n) + τg(t_n, X_n)Z(t_n)   (n = 0, 1, 2, ...).

In the special case f(t, X) = 0 and g(t, X) = 1, we want that X_n = W(t_n) is the Wiener process, i.e. we postulate that

W(t_{n+1}) = W(t_n) + τZ(t_n).


This yields

X_{n+1} = X_n + τf(t_n, X_n) + g(t_n, X_n)(W(t_{n+1}) − W(t_n))

and after N steps

X_N = X_0 + τ Σ_{n=0}^{N−1} f(t_n, X_n) + Σ_{n=0}^{N−1} g(t_n, X_n)(W(t_{n+1}) − W(t_n)).   (B.3)

Keep t fixed, let N → ∞, τ = t/N → 0. Then, (B.3) should somehow converge to

X(t) = X(0) + ∫_0^t f(s, X(s)) ds + ∫_0^t g(s, X(s)) dW(s).   (B.4)

Problem: We cannot define the last integral in (B.4) as a pathwise Riemann-Stieltjes integral! When N → ∞, the sum

Σ_{n=0}^{N−1} g(t_n, X_n(ω))(W(t_{n+1}, ω) − W(t_n, ω))

diverges with probability one, because a path of the Wiener process has unbounded total variation with probability one.

New goal: Define the integral

I_t[u](ω) = ∫_0^t u(s, ω) dW_s(ω)

in a “reasonable” way for the following class of functions.

Definition B.2.1 Let (Ω, F, P) be a probability space, and let {F_t : t ∈ [0, T]} be the standard Brownian filtration. Then, we define H²[0, T] to be the class of functions

u = u(t, ω),   u : [0, T] × Ω → ℝ

with the following properties:

• (t, ω) ↦ u(t, ω) is (B × F)-measurable.

• u is adapted to {F_t : t ∈ [0, T]}, i.e. u(t, ·) is F_t-measurable.

• E(∫_0^T u²(t, ω) dt) < ∞


Step 1: Ito integral for elementary functions

Definition B.2.2 (Elementary functions) A function φ ∈ H²[0, T] is called elementary if it is a stochastic step function of the form

φ(t, ω) = a_0(ω) 1_{[0,0]}(t) + Σ_{n=0}^{N−1} a_n(ω) 1_{(t_n,t_{n+1}]}(t)
        = a_0(ω) 1_{[0,t_1]}(t) + Σ_{n=1}^{N−1} a_n(ω) 1_{(t_n,t_{n+1}]}(t)

with a partition 0 = t_0 < t_1 < ... < t_{N−1} < t_N = T. The random variables a_n must be F_{t_n}-measurable with E(a_n²) < ∞. Here and below,

1_{[c,d]}(t) = 1 if t ∈ [c, d], 0 else   (B.5)

is the indicator function of an interval [c, d].

(Sketch: a stochastic step function.)

For 0 ≤ c < d ≤ T, the only reasonable way to define the Ito integral of an indicator function 1_{(c,d]} is

I_T[1_{(c,d]}](ω) = ∫_0^T 1_{(c,d]}(s) dW(s, ω) = ∫_c^d dW(s, ω) = W(d, ω) − W(c, ω).

Hence, by linearity, we define the Ito integral of an elementary function by

I_T[φ](ω) = Σ_{n=0}^{N−1} a_n(ω)(W(t_{n+1}, ω) − W(t_n, ω)).
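For a concrete adapted elementary integrand, this definition can be evaluated directly on simulated paths. The following Monte Carlo sketch (NumPy, illustrative parameters, not from the lecture) takes φ(t, ω) = W_{t_n}(ω) on (t_n, t_{n+1}], so that I_T[φ] = Σ_n W_{t_n} ΔW_n, and compares E(I_T[φ]²) with E ∫_0^T φ² dt = Σ_n t_n (t_{n+1} − t_n):

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, paths = 1.0, 200, 20_000
tau = T / N

# phi(t, w) = W_{t_n} on (t_n, t_{n+1}] is an adapted elementary function
dW = np.sqrt(tau) * rng.standard_normal((paths, N))  # increments Delta W_n
W = np.cumsum(dW, axis=1) - dW                       # W_{t_n}, starting at W_0 = 0
I = np.sum(W * dW, axis=1)                           # I_T[phi], one value per path

lhs = np.mean(I**2)                      # Monte Carlo estimate of E(I_T[phi]^2)
rhs = tau * np.sum(np.arange(N) * tau)   # E int_0^T phi^2 dt = sum_n t_n * tau
print(lhs, rhs)  # both close to T^2/2 = 0.5, up to Monte Carlo error
```

The agreement of the two numbers is exactly the statement of the Ito isometry proved next.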

Lemma B.2.3 (Ito isometry for elementary functions) For all elementary functions we have

E(I_T[φ]²) = E(∫_0^T φ²(t, ω) dt)

or equivalently

‖I_T[φ]‖_{L²(dP)} = ‖φ‖_{L²(dt×dP)}

with

‖φ‖_{L²(dt×dP)} = (∫_Ω ∫_0^T φ²(t, ω) dt dP)^{1/2} = (E ∫_0^T φ²(t, ω) dt)^{1/2}.


Proof. Since

φ²(t, ω) = a_0²(ω) 1_{[0,0]}(t) + Σ_{n=0}^{N−1} a_n²(ω) 1_{(t_n,t_{n+1}]}(t),

we obtain

E(∫_0^T φ²(t, ω) dt) = Σ_{n=0}^{N−1} E(a_n²)(t_{n+1} − t_n)   (B.6)

for the right-hand side. If we let ΔW_n = W(t_{n+1}) − W(t_n), then

I_T[φ]² = (Σ_{n=0}^{N−1} a_n ΔW_n)² = Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} a_n a_m ΔW_n ΔW_m.   (B.7)

By definition, the Wiener process has independent increments with E(ΔW_n) = 0 and E(ΔW_n²) = V(ΔW_n) = t_{n+1} − t_n. If n > m, then a_n a_m ΔW_m is F_{t_n}-measurable, and since ΔW_n is independent of F_{t_n}, it follows that

E(a_n a_m ΔW_n ΔW_m) = E(a_n a_m ΔW_m) E(ΔW_n) = 0   if n ≠ m,
E(a_n a_m ΔW_n ΔW_m) = E(a_n²)(t_{n+1} − t_n)   if n = m.

Hence, taking the expectation of (B.7) gives

E(I_T[φ]²) = Σ_{n=0}^{N−1} E(a_n²)(t_{n+1} − t_n).   (B.8)

Comparing (B.6) and (B.8) yields the assertion.

Step 2: Ito integral on H²[0, T]

Lemma B.2.4 For any u ∈ H²[0, T] there is a sequence (φ_k)_{k∈ℕ} of elementary functions φ_k ∈ H²[0, T] such that

lim_{k→∞} ‖u − φ_k‖_{L²(dt×dP)} = 0.

Proof: Section 6.6 in [Ste01].

Let u ∈ H²[0, T] and let (φ_k)_{k∈ℕ} be elementary functions such that

u = lim_{k→∞} φ_k in L²(dt × dP)


as in Lemma B.2.4. The linearity of I_T[·] and Lemma B.2.3 yield

‖I_T[φ_j] − I_T[φ_k]‖_{L²(dP)} = ‖I_T[φ_j − φ_k]‖_{L²(dP)} = ‖φ_j − φ_k‖_{L²(dt×dP)} → 0

for j, k → ∞. Hence, (I_T[φ_k])_k is a Cauchy sequence in the Hilbert space L²(dP). Thus, (I_T[φ_k])_k converges in L²(dP), and we can define

I_T[u] = lim_{k→∞} I_T[φ_k].

The choice of the sequence does not matter: If (ψ_k)_{k∈ℕ} are elementary functions with u = lim_{k→∞} ψ_k in L²(dt × dP), then by Lemma B.2.3 we obtain for k → ∞

‖I_T[φ_k] − I_T[ψ_k]‖_{L²(dP)} = ‖I_T[φ_k − ψ_k]‖_{L²(dP)}
  = ‖φ_k − ψ_k‖_{L²(dt×dP)}
  ≤ ‖φ_k − u‖_{L²(dt×dP)} + ‖u − ψ_k‖_{L²(dt×dP)} → 0.

Theorem B.2.5 (Ito isometry) For all u ∈ H²[0, T] we have

‖I_T[u]‖_{L²(dP)} = ‖u‖_{L²(dt×dP)}.

Proof: Exercise.

Step 3: The Ito integral as a process

So far we have defined the Ito integral I_T[u](ω) over the interval [0, T] for fixed T. For applications in mathematical finance, however, we want to consider

{I_t[u](ω) : t ∈ [0, T]}

as a stochastic process.

If u(s, ω) ∈ H²[0, T], then 1_{[0,t]}(s)u(s, ω) ∈ H²[0, T]. Can we define I_t[u](ω) by I_T[1_{[0,t]}u](ω)?

Theorem B.2.6 For any u ∈ H²[0, T] there is a process {X_t : t ∈ [0, T]} that is a continuous martingale with respect to the standard Brownian filtration {F_t} such that the event

{ω ∈ Ω : X_t(ω) = I_T[1_{[0,t]}u](ω)}

has probability one for each t ∈ [0, T].

A proof can be found in [Ste01], Theorem 6.2, pages 83-84.


Step 4: The Ito integral on L²_loc[0, T]

So far we have defined the Ito integral for functions u ∈ H²[0, T]; cf. Definition B.2.1. Such functions must satisfy

E(∫_0^T u²(t, ω) dt) < ∞,   (B.9)

and this condition is sometimes too restrictive. With some more work, the Ito integral can be extended to all functions

u = u(t, ω),   u : [0, T] × Ω → ℝ

with the following properties:

• (t, ω) ↦ u(t, ω) is (B × F)-measurable.

• u is adapted to {F_t : t ∈ [0, T]}.

• P(∫_0^T u²(t, ω) dt < ∞) = 1

This class is called L²_loc[0, T]. The first two conditions are the same as for H²[0, T], but the third condition is weaker than (B.9). If y : ℝ → ℝ is continuous, then u(t, ω) = y(W(t, ω)) ∈ L²_loc[0, T], because t ↦ y(W(t, ω)) is continuous with probability one and hence bounded on [0, T].

Details: Chapter 7 in [Ste01].

Notation

The process X constructed above is called the Ito integral (Ito Kiyoshi, 1944) of u ∈ L²_loc[0, T] and is denoted by

X(t, ω) = ∫_0^t u(s, ω) dW(s, ω).

The Ito integral over an arbitrary interval [a, b] ⊂ [0, T] is defined by

∫_a^b u(s, ω) dW(s, ω) = ∫_0^b u(s, ω) dW(s, ω) − ∫_0^a u(s, ω) dW(s, ω).

Alternative notations:

∫_a^b u(s, ω) dW(s, ω) = ∫_a^b u(s, ω) dW_s(ω) = ∫_a^b u_s(ω) dW_s(ω) = ∫_a^b u_s dW_s


Properties of the Ito integral

Lemma B.2.7 Let c ∈ ℝ and u, v ∈ L²_loc[0, T]. The Ito integral on [a, b] ⊂ [0, T] has the following properties:

1. Linearity:

∫_a^b (cu(s, ω) + v(s, ω)) dW_s(ω) = c ∫_a^b u(s, ω) dW_s(ω) + ∫_a^b v(s, ω) dW_s(ω)

with probability one.

2. E(∫_a^b u(s, ω) dW_s(ω)) = 0

3. ∫_a^t u(s, ω) dW_s(ω) is F_t-measurable for t ≥ a.

4. Ito isometry on [a, b]:

E[(∫_a^b u(s, ω) dW_s(ω))²] = E(∫_a^b u²(s, ω) ds)

(cf. Theorem B.2.5).

5. Martingale property: The Ito integral

X(t, ω) = ∫_0^t u(s, ω) dW(s, ω)

of a function u ∈ H²[0, T] is a continuous martingale with respect to the standard Brownian filtration; cf. Theorem 2.3.4. If u ∈ L²_loc[0, T], then the Ito integral is only a local martingale; cf. Proposition 7.7 in [Ste01].

The first four properties can be shown by considering elementary functions and passing to the limit.


B.3 Sketch of the proof of the Ito formula (Theorem 2.2.2)

• Equation (2.2) is the shorthand notation for

Y_t = Y_0 + ∫_0^t (∂_t F(s, X_s) + ∂_x F(s, X_s)·f(s, X_s) + ½ ∂²_x F(s, X_s)·g²(s, X_s)) ds + ∫_0^t ∂_x F(s, X_s)·g(s, X_s) dW_s.

Assume that F is twice continuously differentiable with bounded partial derivatives. (Otherwise F can be approximated by such functions with uniform convergence on compact subsets of [0, ∞) × ℝ.)
Assume that (t, ω) ↦ f(t, X_t(ω)) and (t, ω) ↦ g(t, X_t(ω)) are elementary functions. (Otherwise approximate by elementary functions.) Hence, there is a partition 0 = t_0 < t_1 < ... < t_N = t such that

f(t, X_t(ω)) = f(0, X_0(ω)) 1_{[0,t_1]}(t) + Σ_{n=1}^{N−1} f(t_n, X_{t_n}(ω)) 1_{(t_n,t_{n+1}]}(t)

and the same equation with f replaced by g.

• Notation: For the rest of the proof, we define

f^{(n)} := f(t_n, X_{t_n}),   F^{(n)} := F(t_n, X_{t_n}),
g^{(n)} := g(t_n, X_{t_n}),   ∂_t F^{(n)} := ∂_t F(t_n, X_{t_n})

and so on, and

Δt_n = t_{n+1} − t_n,   ΔX_n = X_{t_{n+1}} − X_{t_n},   ΔW_n = W_{t_{n+1}} − W_{t_n}.

Since f and g are elementary functions, we have

X_{t_n} = X_0 + ∫_0^{t_n} f(s, X_s) ds + ∫_0^{t_n} g(s, X_s) dW_s
        = X_0 + Σ_{k=0}^{n−1} f^{(k)} Δt_k + Σ_{k=0}^{n−1} g^{(k)} ΔW_k

and hence

ΔX_n = X_{t_{n+1}} − X_{t_n} = f^{(n)} Δt_n + g^{(n)} ΔW_n.


• Telescoping sum:

Y_t = Y_{t_N} = Y_0 + Σ_{n=0}^{N−1} (Y_{t_{n+1}} − Y_{t_n}) = Y_0 + Σ_{n=0}^{N−1} (F^{(n+1)} − F^{(n)})

Apply Taylor's theorem:

F^{(n+1)} − F^{(n)} = ∂_t F^{(n)}·Δt_n + ∂_x F^{(n)}·ΔX_n + ½ ∂²_t F^{(n)}·(Δt_n)² + ∂_t ∂_x F^{(n)}·Δt_n ΔX_n + ½ ∂²_x F^{(n)}·(ΔX_n)² + R_n(Δt_n, ΔX_n)

with a remainder term R_n. Insert this into the telescoping sum.

• Consider the limit N → ∞, Δt_n → 0 with respect to ‖·‖_{L²(dP)}. For the first two terms, this yields

lim_{N→∞} Σ_{n=0}^{N−1} ∂_t F^{(n)}·Δt_n = lim_{N→∞} Σ_{n=0}^{N−1} ∂_t F(t_n, X_{t_n})·Δt_n = ∫_0^t ∂_t F(s, X_s) ds

and

lim_{N→∞} Σ_{n=0}^{N−1} ∂_x F^{(n)}·ΔX_n
  = lim_{N→∞} Σ_{n=0}^{N−1} ∂_x F^{(n)}·f^{(n)} Δt_n + lim_{N→∞} Σ_{n=0}^{N−1} ∂_x F^{(n)}·g^{(n)} ΔW_n
  = ∫_0^t ∂_x F(s, X_s)·f(s, X_s) ds + ∫_0^t ∂_x F(s, X_s)·g(s, X_s) dW_s.

• Next, we investigate the “∂²_x F^{(n)} term”. Since

(ΔX_n)² = (f^{(n)} Δt_n + g^{(n)} ΔW_n)²,

we have

½ Σ_{n=0}^{N−1} ∂²_x F^{(n)}·(ΔX_n)² = ½ Σ_{n=0}^{N−1} ∂²_x F^{(n)}·(f^{(n)})² (Δt_n)²   (B.10)
  + Σ_{n=0}^{N−1} ∂²_x F^{(n)}·f^{(n)} g^{(n)} Δt_n ΔW_n   (B.11)
  + ½ Σ_{n=0}^{N−1} ∂²_x F^{(n)}·(g^{(n)})² (ΔW_n)².   (B.12)


For the right-hand side of (B.10), we obtain

‖Σ_{n=0}^{N−1} ∂²_x F^{(n)}·(f^{(n)})² (Δt_n)²‖²_{L²(dP)} = E[(Σ_{n=0}^{N−1} ∂²_x F^{(n)}·(f^{(n)})² (Δt_n)²)²] → 0.

With the abbreviation α^{(n)} := ∂²_x F^{(n)}·f^{(n)} g^{(n)} we obtain for the right-hand side of (B.11) that

‖Σ_{n=0}^{N−1} α^{(n)} Δt_n ΔW_n‖²_{L²(dP)} = E[(Σ_{n=0}^{N−1} α^{(n)} Δt_n ΔW_n)²]
  = Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} E(α^{(n)} α^{(m)} ΔW_n ΔW_m) Δt_n Δt_m.

Since

E(α^{(n)} α^{(m)} ΔW_n ΔW_m) = E(α^{(n)} α^{(m)} ΔW_n) E(ΔW_m) = 0

for n < m, and similarly for m < n, only the terms with n = m have to be considered, which yields

‖Σ_{n=0}^{N−1} α^{(n)} Δt_n ΔW_n‖²_{L²(dP)} = Σ_{n=0}^{N−1} E((α^{(n)})²) (Δt_n)² E[(ΔW_n)²] = Σ_{n=0}^{N−1} E((α^{(n)})²) (Δt_n)³ → 0

because E[(ΔW_n)²] = Δt_n.

The third term (B.12), however, has a non-zero limit: We show that

lim_{N→∞} ½ Σ_{n=0}^{N−1} ∂²_x F^{(n)}·(g^{(n)})² (ΔW_n)² = ½ ∫_0^t ∂²_x F(s, X_s)·(g(s, X_s))² ds,

which yields the strange additional term in the Ito formula. With the abbreviation β^{(n)} = ½ ∂²_x F^{(n)}·(g^{(n)})² we have

‖Σ_{n=0}^{N−1} β^{(n)}((ΔW_n)² − Δt_n)‖²_{L²(dP)} = E[(Σ_{n=0}^{N−1} β^{(n)}((ΔW_n)² − Δt_n))²]
  = E[Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} β^{(n)} β^{(m)} ((ΔW_n)² − Δt_n)((ΔW_m)² − Δt_m)].


For $n < m$ we have
\[
E\Bigl[ \beta^{(n)} \beta^{(m)} \bigl( (\Delta W_n)^2 - \Delta t_n \bigr) \bigl( (\Delta W_m)^2 - \Delta t_m \bigr) \Bigr]
= E\Bigl[ \beta^{(n)} \beta^{(m)} \bigl( (\Delta W_n)^2 - \Delta t_n \bigr) \Bigr]\,
\underbrace{E\bigl[ (\Delta W_m)^2 - \Delta t_m \bigr]}_{=0}
= 0
\]
and vice versa for $n > m$. Hence, only the terms with $n = m$ have to be considered, and we obtain
\[
\Bigl\| \sum_{n=0}^{N-1} \beta^{(n)} \bigl( (\Delta W_n)^2 - \Delta t_n \bigr) \Bigr\|_{L^2(dP)}^2
= E\Biggl[ \sum_{n=0}^{N-1} \bigl(\beta^{(n)}\bigr)^2 \bigl( (\Delta W_n)^2 - \Delta t_n \bigr)^2 \Biggr]
= \underbrace{\sum_{n=0}^{N-1} E\Bigl[ \bigl(\beta^{(n)}\bigr)^2 \Bigr]\, E\Bigl[ \bigl( (\Delta W_n)^2 - \Delta t_n \bigr)^2 \Bigr]}_{\longrightarrow\,0}
\]
according to Exercise 5.
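For reference, the moment computation behind Exercise 5 is a short Gaussian calculation: since $\Delta W_n \sim \mathcal{N}(0, \Delta t_n)$ has fourth moment $3(\Delta t_n)^2$,

```latex
\begin{align*}
E\bigl[ \bigl( (\Delta W_n)^2 - \Delta t_n \bigr)^2 \bigr]
&= E\bigl[ (\Delta W_n)^4 \bigr] - 2\,\Delta t_n\, E\bigl[ (\Delta W_n)^2 \bigr] + (\Delta t_n)^2 \\
&= 3(\Delta t_n)^2 - 2(\Delta t_n)^2 + (\Delta t_n)^2 = 2(\Delta t_n)^2 .
\end{align*}
```

Hence the sum above is bounded by $2 \max_n \Delta t_n \cdot \sum_n E\bigl[ (\beta^{(n)})^2 \bigr]\,\Delta t_n$, which tends to zero as the mesh width does.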

• With essentially the same arguments, it can be shown that
\[
\lim_{N\to\infty} \frac{1}{2} \sum_{n=0}^{N-1} \partial_t^2 F^{(n)} \cdot (\Delta t_n)^2 = 0,
\qquad
\lim_{N\to\infty} \sum_{n=0}^{N-1} \partial_t \partial_x F^{(n)} \cdot \Delta t_n\,\Delta X_n = 0,
\]
and that the remainder term from the Taylor expansion can be neglected when the limit is taken.
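Putting the pieces together, the Itô formula can be checked numerically. The sketch below (our own illustration) takes $F(t,x) = x^2$ and $X = W$ (i.e. $f \equiv 0$, $g \equiv 1$), for which the formula reads $W_t^2 = 2\int_0^t W_s\,dW_s + t$:

```python
import numpy as np

# Numerical check of the Ito formula for F(t,x) = x^2 and X = W
# (f = 0, g = 1):  W_t^2 = 2 * int_0^t W_s dW_s + t,
# with the integral discretized as a left-endpoint sum.
rng = np.random.default_rng(3)
t, N = 1.0, 100_000
dW = rng.normal(0.0, np.sqrt(t / N), N)
W = np.concatenate(([0.0], np.cumsum(dW)))

lhs = W[-1]**2                        # F(t, W_t) - F(0, W_0)
ito_integral = np.sum(W[:-1] * dW)    # left-endpoint sum for int W dW
rhs = 2.0 * ito_integral + t          # stochastic integral + correction
print(lhs, rhs)                       # nearly equal for large N
```

Without the correction term $t$, i.e. with the chain rule of classical calculus, the two sides would disagree by approximately $t$, which is precisely the contribution of (B.12).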

