numerical pde methods for pricing path dependent options

Numerical PDE Methodsfor

Pricing Path Dependent Options

July 13-14, 2006

University of Waterloo

c©Peter Forsyth and Ken Vetzal 2006

P.A. Forsyth∗ K.R. Vetzal†

April 20, 2006

Contents1 Instructor Biographies 5

2 The First Option Trade 6

3 The Black-Scholes Equation 63.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73.3 A Simple Example: The Two State Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73.4 A hedging strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83.5 Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83.6 Geometric Brownian motion with drift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.6.1 Ito’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.6.2 Some uses of Ito’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143.6.3 Some more uses of Ito’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.7 The Black-Scholes Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173.8 Hedging in Continuous Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183.9 The option price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183.10 American early exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

∗School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1, [email protected], tel: (519)888-4567x4415, fax: (519) 885-1208

†Centre for Advanced Studies in Finance, University of Waterloo, [email protected]

1

4 The Risk Neutral World 19

5 Monte Carlo Methods 215.1 Monte Carlo Error Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235.2 Random Numbers and Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235.3 The Box-Muller Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.3.1 An improved Box Muller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265.4 Speeding up Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285.5 Estimating the mean and variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295.6 Low Discrepancy Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305.7 Correlated Random Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315.8 Integration of Stochastic Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5.8.1 The Brownian Bridge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335.8.2 Strong and Weak Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5.9 Matlab and Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6 The Binomial Model 386.1 A No-arbitrage Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

7 More on Ito’s Lemma 42

8 Derivative Contracts on non-traded Assets and Real Options 458.1 Derivative Contracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458.2 A Forward Contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

8.2.1 Convenience Yield . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

9 Discrete Hedging 499.1 Delta Hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509.2 Gamma Hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529.3 Vega Hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

10 Jump Diffusion 5410.1 The Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5510.2 The Jump Diffusion Pricing Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5610.3 An Alternate Derivation of the Pricing Equation for Jump Diffusion . . . . . . . . . . . . . . . . . . 58

11 Mean Variance Portfolio Optimization 6111.1 Special Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6211.2 The Portfolio Allocation Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6211.3 Adding a Risk-free asset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6511.4 Criticism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6611.5 Individual Securities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

12 Stocks for the Long Run? 70

13 Further Reading 7213.1 General Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7213.2 More Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7213.3 More Technical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

14 Basic Finite Difference Approximations 74

15 Overview of Discretizaton Methods for PDEs 76

2

16 Introduction to Finite Volume Methods 7616.1 Finite Volume Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7716.2 A Hyperbolic Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8016.3 Upstream Weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8216.4 Positive Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

17 Finite Volume II 84

18 Alternate Derivation: Finite Difference Discretization with Forward Backward Differencing 8618.1 Why use Upstream Weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8918.2 Finite Difference and Finite Volume: Is There a Difference? . . . . . . . . . . . . . . . . . . . . . . 9218.3 A Complete Explicit Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

19 Stability 9319.1 Implicit Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9519.2 Stability of Implicit Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9519.3 Stability of the Crank-Nicolson Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

19.3.1 More on Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9919.4 Crank-Nicolson von Neumann Analysis: Log Transformation . . . . . . . . . . . . . . . . . . . . . . 100

19.4.1 von Neumann Stability Analysis: The Details . . . . . . . . . . . . . . . . . . . . . . . . . . 10019.5 Positive Coefficient Discretizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10319.6 Positive Coefficient Methods and the Black-Scholes Equation . . . . . . . . . . . . . . . . . . . . . . 10519.7 Technical Point: M -matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10619.8 Suppressing Oscillations in Crank-Nicolson Timestepping . . . . . . . . . . . . . . . . . . . . . . . 10719.9 Projection of Initial Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10819.10The Problem With Crank-Nicolson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10919.11A Simple Timestep Selector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

20 Constraints: American Options and Barriers 11220.1 The Penalty Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11320.2 Implicit vs. Explicit Handling of the American Constraint . . . . . . . . . . . . . . . . . . . . . . . . 11620.3 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11620.4 Numerical tests of implicit and explicit handling of the American constraint . . . . . . . . . . . . . . 11820.5 Analysis of Constant Timestep Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11820.6 Using A Timestep Selector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12020.7 Variable Timestep Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12120.8 Comparison with Lattice Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12220.9 Barriers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

21 Discrete Dollar Dividends 128

22 Relationship of Finite Difference Methods to Lattice Methods 128

23 Uncertain volatility, transaction cost models 13223.1 Uncertain Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13223.2 Transaction Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13323.3 Solution of the nonlinear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13323.4 Uncertain Volatility Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

24 Finite Volume Upwinding vs. Forward/Backward Differencing 13524.1 An example from a Bond Pricing Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

25 Boundary Conditions: a Second Look 140

3

26 Path-Dependent Options - The Augmented StateSpace Approach 14226.1 A Note on Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14226.2 Asian Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14226.3 Error Analysis: Asian Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

26.3.1 Running Sum Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14626.3.2 Convergence to Continuously Observed Limit . . . . . . . . . . . . . . . . . . . . . . . . . . 146

26.4 Parisian Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14626.5 Shout Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

26.5.1 Boundary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15126.5.2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15226.5.3 An Object-Oriented Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15226.5.4 A Parallel Version for Multiprocessor Architectures . . . . . . . . . . . . . . . . . . . . . . . 157

27 Convertible Bonds with Credit Risk 15927.1 Convertible Bonds: no credit risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15927.2 A Risky Bond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16027.3 A Convertible Bond with Credit Risk: A Simplified Case . . . . . . . . . . . . . . . . . . . . . . . . 16127.4 Convertible Bonds with Risky Corporate Debt: the General Case . . . . . . . . . . . . . . . . . . . . 16227.5 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

27.5.1 T &F Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16327.5.2 T &F splitting: a simple case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16427.5.3 T &F splitting: call just before expiry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16527.5.4 T &F splitting: general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

27.6 A New General Decomposition of the Convertible Bond . . . . . . . . . . . . . . . . . . . . . . . . 16627.6.1 Some special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

27.7 Numerical Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16727.7.1 T &F Model: Numerical Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16827.7.2 V&F Model: Numerical Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

27.8 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17027.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

28 Jump Diffusion: Numerical Methods 17628.1 Mathematical model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17728.2 Implicit Discretization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

28.2.1 Discretization of the Integral Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17828.2.2 Discretization of the Full PIDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18228.2.3 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

28.3 Crank-Nicolson Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18428.4 Fixed Point Iteration Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18628.5 Numerical issues when pricing options under the log-normal process . . . . . . . . . . . . . . . . . . 187

28.5.1 Wrap around and DFT Evaluation of the Correlation . . . . . . . . . . . . . . . . . . . . . . 18828.5.2 American Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

28.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19228.6.1 European Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19228.6.2 Non-Smooth Payoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19528.6.3 Probability Density Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19728.6.4 Exotic Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20028.6.5 Comparison with Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

28.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20328.8 Von Neumann Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

28.8.1 Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20428.8.2 Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20428.8.3 Non-uniform FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

4

29 Matlab Exercises 21029.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21029.2 Exercise #1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21229.3 Exercise #2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21429.4 Exercise #3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21629.5 Exercise #4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21729.6 Exercise #5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21929.7 Exercise #6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22129.8 Exercise #7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22429.9 Exercise #8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22529.10Exercise #9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22629.11Exercise #10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

1 Instructor BiographiesPeter Forsyth is a Professor in the School of Computer Science at the University of Waterloo. During 1995-98, hewas the Director of the Institute for Computer Research at Waterloo. Prior to joining Waterloo in 1987, he was a SeniorSimulation Scientist with the Computer Modelling Group (1979-1985) and was also President of Dynamic ReservoirSystems (1985-87). Both of these positions were based in Calgary. Peter has research interests in computationalfinance, iterative solution of large sparse matrices, and computational fluid dynamics. He has been collaborating withmembers of the Centre for Advanced Studies in Finance at Waterloo on developing algorithms for pricing and hedgingcomplex options. He is currently a member of the Editorial Boards of Journal of Computational Finance and AppliedMathematical Finance. Further information can be found at

http://www.scicom.uwaterloo.ca/˜paforsyt/E-mail: [email protected]

Ken Vetzal is an Associate Professor with the Centre for Advanced Studies in Finance at the University of Waterloo.He holds a Ph.D. in Finance from the University of Toronto. His research interests are primarily in the area of numericalvaluation of complex derivative instruments. He has recently published work in journals such as Advances in Futuresand Options Research, Applied Mathematical Finance, Journal of Banking and Finance, Journal of ComputationalFinance, and Journal of Fixed Income. Further information can be found at

http://www.arts.uwaterloo.ca/ACCT/people/vetzal.htmE-mail: [email protected]

5

“Men wanted for hazardous journey, small wages, bitter cold, long months of complete darkness, con-stant dangers, safe return doubtful. Honour and recognition in case of success.” Advertisement placedby Earnest Shackleton in 1914. He received 5000 replies. An example of extreme risk-seeking behaviour.Hedging with options is used to mitigate risk, and would not appeal to members of Shackleton’s expedi-tion.

2 The First Option TradeMany people think that options and futures are recent inventions. However, options have a long history, going back toancient Greece.

As recorded by Aristotle in Politics, the fifth century BC philosopher Thales of Miletus took part in a sophisticatedtrading strategy. The main point of this trade was to confirm that philosophers could become rich if they so chose. Thisis perhaps the first rejoinder to the famous question “If you are so smart, why aren’t you rich?” which has doggedacademics throughout the ages.

Thales observed that the weather was very favourable to a good olive crop, which would result in a bumper harvestof olives. If there was an established Athens Board of Olives Exchange, Thales could have simply sold olive futuresshort (a surplus of olives would cause the price of olives to go down). Since the exchange did not exist, Thales puta deposit on all the olive presses surrounding Miletus. When the olive crop was harvested, demand for olive pressesreached enormous proportions (olives were not a storable commodity). Thales then sublet the presses for a profit. Notethat by placing a deposit on the presses, Thales was actually manufacturing an option on the olive crop, i.e. the mosthe could lose was his deposit. If had sold short olive futures, he would have been liable to an unlimited loss, in theevent that the olive crop turned out bad, and the price of olives went up. In other words, he had an option on a futureof a non-storable commodity.

3 The Black-Scholes EquationThis is the basic PDE used in option pricing. We will derive this PDE for a simple case below. Things get much morecomplicated for real contracts.

3.1 BackgroundOver the past few years derivative securities (options, futures, and forward contracts) have become essential tools forcorporations and investors alike. Derivatives facilitate the transfer of financial risks. As such, they may be used tohedge risk exposures or to assume risks in the anticipation of profits. To take a simple yet instructive example, a goldmining firm is exposed to fluctuations in the price of gold. The firm could use a forward contract to fix the price ofits future sales. This would protect the firm against a fall in the price of gold, but it would also sacrifice the upsidepotential from a gold price increase. This could be preserved by using options instead of a forward contract.

Individual investors can also use derivatives as part of their investment strategies. This can be done throughdirect trading on financial exchanges. In addition, it is quite common for financial products to include some form ofembedded derivative. Any insurance contract can be viewed as a put option. Consequently, any investment whichprovides some kind of protection actually includes an option feature. Standard examples include deposit insuranceguarantees on savings accounts as well as the provision of being able to redeem a savings bond at par at any time.These types of embedded options are becoming increasingly common and increasingly complex. A prominent currentexample are investment guarantees being offered by insurance companies (“segregated funds”) and mutual funds. Insuch contracts, the initial investment is guaranteed, and gains can be locked-in (reset) a fixed number of times per yearat the option of the contract holder. This is actually a very complex put option, known as a shout option. How muchshould an investor be willing to pay for this insurance? Determining the fair market value of these sorts of contracts isa problem in option pricing.

6

Stock Price = $20

Stock Price = $22Option Price = $1

Stock Price = $18Option Price = $0

FIGURE 3.1: A simple case where the stock value can either be $22 or $18, with a European call option, K = $21.

3.2 DefinitionsLet’s consider some simple European put/call options. At some time T in the future (the expiry or exercise date) theholder has the right, but not the obligation, to

• Buy an asset at a prescribed price K (the exercise or strike price). This is a call option.

• Sell the asset at a prescribed price K (the exercise or strike price). This is a put option.

At expiry time T , we know with certainty what the value of the option is, in terms of the price of the underlying assetS,

Payoff = max(S−K,0) for a callPayoff = max(K −S,0) for a put (3.1)

Note that the payoff from an option is always non-negative, since the holder has a right but not an obligation. Thiscontrasts with a forward contract, where the holder must buy or sell at a prescribed price.

3.3 A Simple Example: The Two State TreeThis example is taken from Options, futures, and other derivatives, by John Hull. Suppose the value of a stock iscurrently $20. It is known that at the end of three months, the stock price will be either $22 or $18. We assume thatthe stock pays no dividends, and we would like to value a European call option to buy the stock in three months for$21. This option can have only two possible values in three months: if the stock price is $22, the option is worth $1,if the stock price is $18, the option is worth zero. This is illustrated in Figure 3.1.

In order to price this option, we can set up an imaginary portfolio consisting of the option and the stock, in such away that there is no uncertainty about the value of the portfolio at the end of three months. Since the portfolio has norisk, the return earned by this portfolio must be the risk-free rate.

Consider a portfolio consisting of a long (positive) position of δ shares of stock, and short (negative) one calloption. We will compute δ so that the portfolio is riskless. If the stock moves up to $22 or goes down to $18, then thevalue of the portfolio is

Value if stock goes up = $22δ−1Value if stock goes down = $18δ−0 (3.2)

7

So, if we choose δ = .25, then the value of the portfolio is

Value if stock goes up = $22δ−1 = $4.50Value if stock goes down = $18δ−0 = $4.50 (3.3)

So, regardless of whether the stock moves up or down, the value of the portfolio is $4.50. A risk-free portfolio mustearn the risk free rate. Suppose the current risk-free rate is 12%, then the value of the portfolio today must be thepresent value of $4.50, or

4.50× e−.12×.25 = 4.367

The value of the stock today is $20. Let the value of the option be V . The value of the portfolio is

20× .25−V = 4.367→ V = .633

3.4 A hedging strategySo, if we sell the above option (we hold a short position in the option), then we can hedge this position in the followingway. Today, we sell the option for $.633, borrow $4.367 from the bank at the risk free rate (this means that we haveto pay the bank back $4.50 in three months), which gives us $5.00 in cash. Then, we buy .25 shares at $20.00 (thecurrent price of the stock). In three months time, one of two things happens

• The stock goes up to $22, our stock holding is now worth $5.50, we pay the option holder $1.00, which leavesus with $4.50, just enough to pay off the bank loan.

• The stock goes down to $18.00. The call option is worthless. The value of the stock holding is now $4.50, whichis just enough to pay off the bank loan.

Consequently, in this simple situation, we see that the theoretical price of the option is the cost for the seller to set upportfolio, which will precisely pay off the option holder and any bank loans required to set up the hedge, at the expiryof the option. In other words, this is price which a hedger requires to ensure that there is always just enough moneyat the end to net out at zero gain or loss. If the market price of the option was higher than this value, the seller couldsell at the higher price and lock in an instantaneous risk-free gain. Alternatively, if the market price of the option waslower than the theoretical, or fair market value, it would be possible to lock in a risk-free gain by selling the portfolioshort. Any such arbitrage opportunities are rapidly exploited in the market, so that for most investors, we can assumethat such opportunities are not possible (the no arbitrage condition), and therefore that the market price of the optionshould be the theoretical price.

Note that this hedge works regardless of whether or not the stock goes up or down. Once we set up this hedge, wedon’t have a care in the world. The value of the option is also independent of the probability that the stock goes up to$22 or down to $18. This is somewhat counterintuitive.

3.5 Brownian MotionBefore we consider a model for stock price movements, let’s consider the idea of Brownian motion with drift. SupposeX is a random variable, and in time t → t +dt, X → X +dX , where

dX = αdt +σdZ (3.4)

where αdt is the drift term, σ is the volatility, and dZ is a random term. The dZ term has the form

dZ = φ√

dt (3.5)

where φ is a random variable drawn from a normal distribution with mean zero and variance one (φ ∼ N(0,1), i.e. φ isnormally distributed).

8

If E is the expectation operator, then

E(φ) = 0 E(φ2) = 1 . (3.6)

Now in a time interval dt, we have

E(dX) = E(αdt)+E(σdZ)

= αdt , (3.7)

and the variance of dX , denoted by Var(dX) is

Var(dX) = E([dX −E(dX)]2)

= E([σdZ]2)

= σ2dt . (3.8)

Let’s look at a discrete model to understand this process more completely. Suppose that we have a discrete latticeof points. Let X = X0 at t = 0. Suppose that at t = ∆t,

X0 → X0 +∆h ; with probability pX0 → X0 −∆h ; with probability q (3.9)

where p+q = 1. Assume that

• X follows a Markov process, i.e. the probability distribution in the future depends only on where it is now.

• The probability of an up or down move is independent of what happened in the past.

• X can move only up or down ∆h.

At any lattice point X0 + i∆h, the probability of an up move is p, and the probability of a down move is q. Theprobabilities of reaching any particular lattice point for the first three moves are shown in Figure 3.2. Each move takesplace in the time interval t → t +∆t.

Let ∆X be the change in X over the interval t → t +∆t. Then

E(∆X) = (p−q)∆hE([∆X ]2) = p(∆h)2 +q(−∆h)2

= (∆h)2, (3.10)

so that the variance of ∆X is (over t → t +∆t)

Var(∆X) = E([∆X ]2)− [E(∆X)]2

= (∆h)2 − (p−q)2(∆h)2

= 4pq(∆h)2 . (3.11)

Now, suppose we consider the distribution of X after n moves, so that t = n∆t. The probability of j up moves, and(n− j) down moves (P(n, j)) is

P(n, j) =n!

j!(n− j)! p jqn− j (3.12)

which is just a binomial distribution. Now, if Xn is the value of X after n steps on the lattice, then

E(Xn −X0) = nE(∆X)

Var(Xn −X0) = nVar(∆X) , (3.13)

9

X0

X0 - ∆h

X0 - 2∆h

X0 + 2∆h

X0 + ∆hp

q

p2

q2

q3

p3

2pq

3p2q

3pq2

X0 + 3∆h

X0 - 3∆h

FIGURE 3.2: Probabilities of reaching the discrete lattice points for the first three moves.

which follows from the properties of a binomial distribution, (each up or down move is independent of previousmoves). Consequently, from equations (3.10, 3.11, 3.13) we obtain

E(Xn −X0) = n(p−q)∆h

=t

∆t (p−q)∆h

Var(Xn −X0) = n4pq(∆h)2

=t

∆t 4pq(∆h)2 (3.14)

Now, we would like to take the limit at ∆t → 0 in such a way that the mean and variance of X , after a finite time tis independent of ∆t, and we would like to recover

dX = αdt +σdZE(dX) = αdt

Var(dX) = σ2dt (3.15)

as ∆t → 0. Now, since 0 ≤ p,q ≤ 1, we need to choose ∆h = Const√

∆t. Otherwise, from equation (3.14) we get thatVar(Xn −X0) is either 0 or infinite after a finite time. (Stock variances do not have either of these properties, so this isobviously not a very interesting case).

Let’s choose ∆h = σ√

∆t, which gives (from equation (3.14))

E(Xn −X0) = (p−q)σt√∆t

Var(Xn−X0) = t4pqσ2 (3.16)

Now, for E(Xn −X0) to be independent of ∆t as ∆t → 0, we must have

(p−q) = Const.√

∆t (3.17)

10

If we choose

p−q =ασ√

∆t (3.18)

we get

p =12 [1+

ασ√

∆t]

q =12 [1− α

σ√

∆t] (3.19)

Now, putting together equations (3.16-3.19) gives

E(Xn −X0) = αt

Var(Xn −X0) = tσ2(1− α2

σ2 ∆t)

= tσ2 ; ∆t → 0 . (3.20)

Now, let’s imagine that X(tn)−X(t0) = Xn −X0 is very small, so that Xn −X0 ' dX and tn − t0 ' dt, so that equation(3.20) becomes

E(dX) = α dtVar(dX) = σ2 dt . (3.21)

which agrees with equations (3.7-3.8). Hence, in the limit as ∆t → 0, we can interpret the random walk for X on thelattice (with these parameters) as the solution to the stochastic differential equation (SDE)

dX = α dt +σ dZdZ = φ

√dt. (3.22)

Consider the case where α = 0,σ = 1, so that dX = dZ =' Z(ti)−Z(ti−1) = Zi −Zi−1 = Xi −Xi−1. Now we canwrite

Z t

0dZ = lim

∆t→0∑i(Zi+1 −Zi) = (Zn −Z0) . (3.23)

From equation (3.20) (α = 0,σ = 1) we have

E(Zn −Z0) = 0Var(Zn −Z0) = t . (3.24)

Now, if n is large (∆t → 0), recall that the binomial distribution (3.12) tends to a normal distribution. From equation(3.24), we have that the mean of this distribution is zero, with variance t, so that

(Zn −Z0) ∼ N(0, t)

=

Z t

0dZ . (3.25)

In other words, after a finite time t,R t

0 dZ is normally distributed with mean zero and variance t (the limit of a binomialdistribution is a normal distribution).

Recall that have that Zi − Zi−1 =√

∆t with probability p and Zi − Zi−1 = −√

∆t with probability q. Note that(Zi −Zi−1)2 = ∆t, with certainty, so that we can write

(Zi −Zi−1)2 ' (dZ)2 = ∆t . (3.26)

To summarize• We can interpret the SDE

dX = α dt +σ dZdZ = φ

√dt. (3.27)

as the limit of a discrete random walk on a lattice as the timestep tends to zero.

11

• Var(dZ) = dt, otherwise, after any finite time, the Var(Xn −X0) is either zero or infinite.

• We can integrate the term dZ to obtainZ t

0dZ = Z(t)−Z(0)

∼ N(0, t) . (3.28)

Going back to our lattice example, note that the total distance traveled over any finite interval of time becomesinfinite,

E(|∆X |) = ∆h (3.29)

so that the the total distance traveled in n steps is

n∆h =t

∆t ∆h

=tσ√∆t

(3.30)

which goes to infinity as ∆t → 0. Similarly,

∆x∆t = ±∞ . (3.31)

Consequently, Brownian motion is very jagged at every timescale. These paths are not differentiable, i.e. dxdt does not

exist, so we cannot speak of

E(dxdt ) (3.32)

but we can possibly define

E(dx)dt . (3.33)

3.6 Geometric Brownian motion with driftOf course, the actual path followed by stock is more complex than the simple situation described above. More realisti-cally, we assume that the relative changes in stock prices (the returns) follow Brownian motion with drift. We supposethat in an infinitesimal time dt, the stock price S changes to S+dS, where

dSS = µdt +σdZ (3.34)

where µ is the drift rate, σ is the volatility, and dZ is the increment of a Wiener process,

dZ = φ√

dt (3.35)

where φ ∼ N(0,1). Equations (3.34) and (3.35) are called geometric Brownian motion with drift. So, superimposed onthe upward (relative) drift is a (relative) random walk. The degree of randomness is given by the volatility σ. Figure3.3 gives an illustration of ten realizations of this random process for two different values of the volatility. In this case,we assume that the drift rate µ equals the risk free rate.

Note that

E(dS) = E(σSdZ +µSdt)= µSdt

since E(dZ) = 0 (3.36)

12

0 2 4 6 8 10 12Time (years)

0

100

200

300

400

500

600

700

800

900

1000As

set P

rice

($)

Risk FreeReturn

Low Volatility Caseσ = .20 per year

0 2 4 6 8 10 12Time (years)

0

100

200

300

400

500

600

700

800

900

1000

Asse

t Pric

e ($

)

Risk FreeReturn

High Volatility Caseσ = .40 per year

FIGURE 3.3: Realizations of asset price following geometric Brownian motion. Left: low volatility case; right: highvolatility case. Risk-free rate of return r = .05.

and that the variance of dS is

Var[dS] = E(dS2)− [E(dS)]2

= E(σ2S2dZ2)

= σ2S2dt (3.37)

so that σ is a measure of the degree of randomness of the stock price movement.Equation (3.34) is a stochastic differential equation. The normal rules of calculus don’t apply, since for example

dZdt = φ

1√dt

→ ∞ as dt → 0 .

The study of these sorts of equations uses results from stochastic calculus. However, for our purposes, we need onlyone result, which is Ito’s Lemma (see Derivatives: the theory and practice of financial engineering, by P. Wilmott).Suppose we have some function G = G(S, t), where S follows the stochastic process equation (3.34), then, in smalltime increment dt, G → G+dG, where

dG =

(

µS ∂G∂S +

σ2S2

2∂2G∂S2 +

∂G∂t

)

dt +σS ∂G∂S dZ (3.38)

An informal derivation of this result is given in the following section.

3.6.1 Ito’s Lemma

We give an informal derivation of Ito’s lemma (3.38). Suppose we have a variable S which follows

dS = a(S, t)dt +b(S, t)dZ (3.39)

where dZ is the increment of a Weiner process.Now since

dZ2 = φ2dt (3.40)

13

where φ is a random variable drawn from a normal distribution with mean zero and unit variance, we have that, if E isthe expectation operator, then

E(φ) = 0 E(φ2) = 1 (3.41)

so that the expected value of dZ2 isE(dZ2) = dt (3.42)

Now, it can be shown (see Section 7) that in the limit as dt → 0, we have that φ2dt becomes non-stochastic, so thatwith probability one

dZ2 → dt as dt → 0 (3.43)

Now, suppose we have some function G = G(S, t), then

dG = GSdS+Gtdt +GSSdS2

2 + ... (3.44)

Now (from (3.39) )

(dS)2 = (adt +b dZ)2

= a2dt2 +ab dZdt +b2dZ2 (3.45)

Since dZ = O(√

dt) and dZ2 → dt, equation (3.45) becomes

(dS)2 = b2dZ2 +O((dt)3/2) (3.46)

or(dS)2 → b2dt as dt → 0 (3.47)

Now, equations(3.39,3.44,3.47) give

dG = GSdS+Gtdt +GSSdS2

2 + ...

= GS(a dt +b dZ)+dt(Gt +GSSb2

2 )

= GSb dZ +(aGS +GSSb2

2 +Gt)dt (3.48)

So, we have the result that ifdS = a(S, t)dt +b(S, t)dZ (3.49)

and if G = G(S, t), then

dG = GSb dZ +(a GS +GSSb2

2 +Gt)dt (3.50)

Equation (3.38) can be deduced by setting a = µS and b = σS in equation (3.50).

3.6.2 Some uses of Ito’s Lemma

Suppose we have

dS = µdt +σdZ . (3.51)

If µ,σ = Const., then this can be integrated (from t = 0 to t = t) exactly to give

S(t) = S(0)+µt +σ(Z(t)−Z(0)) (3.52)

and from equation (3.28)

Z(t)−Z(0) ∼ N(0, t) (3.53)

14

Note that when we say that we solve a stochastic differential equation exactly, this means that we have an expres-sion for the distribution of S(T ).

Suppose instead we use the more usual geometric Brownian motion

dS = µSdt +σSdZ (3.54)

Let F(S) = logS, and use Ito’s Lemma

dF = FSSσdZ +(FSµS+FSSσ2S2

2 +Ft)dt

= (µ− σ2

2 )dt +σdZ , (3.55)

so that we can integrate this to get

F(t) = F(0)+(µ− σ2

2 )t +σ(Z(t)−Z(0)) (3.56)

or, since S = eF ,

S(t) = S(0)exp[(µ− σ2

2 )t +σ(Z(t)−Z(0))] . (3.57)

Unfortunately, these cases are about the only situations where we can exactly integrate the SDE (constant σ,µ).

3.6.3 Some more uses of Ito’s Lemma

We can often use Ito’s Lemma and some algebraic tricks to determine some properties of distributions. Let

dX = a(X , t) dt +b(X , t) dZ , (3.58)

then if G = G(X), then

dG =

[

aGX +Gt +b2

2 GXX

]

dt +GXb dZ . (3.59)

If E[X ] = X , then (b(X , t) and dZ are independent)

E[dX ] = d E[S] = dX= E[a dt]+E[b] E[dZ]

= E[a dt] , (3.60)

so that

d Xdt = E[a] = a

X = E[

Z t

0a dt

]

. (3.61)

Let G = E[(X − X)2] = var(X), then

dG = E [dG]

= E[2(X − X)a−2(X − X)a+b2] dt +E[2b(X − X)]E[dZ]

= E[b2 dt]+E[2(X − X)(a− a) dt] , (3.62)

which means that

G = var(X) = E[

Z t

0b2 dt

]

+E[

Z t

02(a− a)(X − X) dt

]

. (3.63)

15

In a particular case, we can sometimes get more useful expressions. If

dS = µS dt +σS dZ (3.64)

with µ,σ constant, then

E[dS] = dS = E[µS] dt= µS dt , (3.65)

so that

dS = µS dtS = S0eµt . (3.66)

Now, let G(S) = S2, so that E[G] = G = E[S2], then (from Ito’s Lemma)

d G = E[2µS2 +σ2S2] dt +E[2S2σ]E[dZ]

= E[2µS2 +σ2S2] dt= (2µ+σ2)G dt , (3.67)

so that

G = G0e(2µ+σ2)t

E[S2] = S20e(2µ+σ2)t . (3.68)

From equations (3.66) and (3.68) we then have

var(S) = E[S2]− (E[S])2

= E[S2]− S2

= S20e2µt(eσ2t −1)

= S2(eσ2t −1) . (3.69)

One can use the same ideas to compute the skewness, E[(S− S)3]. If G(S) = S3 and G = E[G(S)] = E[S3], then

dG = E[µS ·3S2 +σ2S2/2 ·3 ·2S] dt +E[3S2σS]E[dZ]

= E[3µS3 +3σ2S3]

= 3(µ+σ2)G , (3.70)

so that

G = E[S3]

= S30e3(µ+σ2)t . (3.71)

We can then obtain the skewness from

E[(S− S)3] = E[S3−2S2S−2SS2 + S3]

= E[S3]−2SE[S2]− S3 . (3.72)

Equations (3.66, 3.68, 3.71) can then be substituted into equation (3.72) to get the desired result.

16

3.7 The Black-Scholes AnalysisAssume

• The stock price follows geometric Brownian motion, equation (3.34).

• The risk-free rate of return is a constant r.

• There are no arbitrage opportunities, i.e. all risk-free portfolios must earn the risk-free rate of return.

• Short selling is permitted (i.e. we can own negative quantities of an asset).Suppose that we have an option whose value is given by V = V (S, t). Construct an imaginary portfolio, consisting

of one option, and a number of (−(αh)) of the underlying asset. (If (αh) > 0, then we have sold the asset short, i.e.we have borrowed an asset, sold it, and are obligated to give it back at some future date).

The value of this portfolio P isP = V − (αh)S (3.73)

In a small time dt, P → P+dP,dP = dV − (αh)dS (3.74)

Note that in equation (3.74) we not included a term (αh)SS. This is actually a rather subtle point, since we shall see(later on) that (αh) actually depends on S. However, if we think of a real situation, at any instant in time, we mustchoose (αh), and then we hold the portfolio while the asset moves randomly. So, equation (3.74) is actually the changein the value of the portfolio, not a differential. If we were taking a true differential then equation (3.74) would be

dP = dV − (αh)dS−Sd(αh)

but we have to remember that (αh) does not change over a small time interval, since we pick (αh), and then S changesrandomly. We are not allowed to peek into the future, (otherwise, we could get rich without risk, which is not permittedby the no-arbitrage condition) and hence (αh) is not allowed to contain any information about future asset pricemovements. The principle of no peeking into the future is why Ito stochastic calculus is used. Other forms of stochasticcalculus are used in Physics applications (i.e. turbulent flow).

Substituting equations (3.34) and (3.38) into equation (3.74) gives

dP = σS(

VS − (αh))

dZ +

(

µSVS +σ2S2

2 VSS +Vt −µ(αh)S)

dt (3.75)

We can make this portfolio riskless over the time interval dt, by choosing (αh) =VS in equation (3.75). This eliminatesthe dZ term in equation (3.75). (This is the analogue of our choice of the amount of stock in the riskless portfolio forthe two state tree model.) So, letting

(αh) = VS (3.76)

then substituting equation (3.76) into equation (3.75) gives

dP =

(

Vt +σ2S2

2 VSS

)

dt (3.77)

Since P is now risk-free in the interval t → t +dt, then no-arbitrage says that

dP = rPdt (3.78)

Therefore, equations (3.77) and (3.78) give

rPdt =

(

Vt +σ2S2

2 VSS

)

dt (3.79)

SinceP = V − (αh)S = V −VSS (3.80)

17

then substituting equation (3.80) into equation (3.79) gives

Vt +σ2S2

2 VSS + rSVS− rV = 0 (3.81)

which is the Black-Scholes equation. Note the rather remarkable fact that equation (3.81) is independent of the driftrate µ.

Equation (3.81) is solved backwards in time from the option expiry time t = T to the present t = 0.

3.8 Hedging in Continuous TimeWe can construct a hedging strategy based on the solution to the above equation. Suppose we sell an option at price Vat t = 0. Then we carry out the following

• We sell one option worth V . (This gives us V in cash initially).

• We borrow (S ∂V∂S −V) from the bank.

• We buy ∂V∂S shares at price S.

At every instant in time, we adjust the amount of stock we own so that we always have ∂V∂S shares. Note that this is

a dynamic hedge, since we have to continually rebalance the portfolio. Cash will flow into and out of the bank account,in response to changes in S. If the amount in the bank is positive, we receive the risk free rate of return. If negative,then we borrow at the risk free rate.

So, our hedging portfolio will be

• Short one option worth V .

• Long ∂V∂S shares at price S.

• V −S ∂V∂S cash in the bank account.

At any instant in time (including the terminal time), this portfolio can be liquidated and any obligations implied bythe short position in the option can be covered, at zero gain or loss, regardless of the value of S. Note that given thereceipt of the cash for the option, this strategy is self-financing.

3.9 The option priceSo, we can see that the price of the option valued by the Black-Scholes equation is the market price of the option atany time. If the price was higher then the Black-Scholes price, we could construct the hedging portfolio, dynamicallyadjust the hedge, and end up with a positive amount at the end. Similarly, if the price was lower than the Black-Scholesprice, we could short the hedging portfolio, and end up with a positive gain. By the no-arbitrage condition, this shouldnot be possible.

Note that we are not trying to predict the price movements of the underlying asset, which is a random process. Thevalue of the option is based on a hedging strategy which is dynamic, and must be continuously rebalanced. The priceis the cost of setting up the hedging portfolio. The Black-Scholes price is not the expected payoff.

The price given by the Black-Scholes price is not the value of the option to a speculator, who buys and holdsthe option. A speculator is making bets about the underlying drift rate of the stock (note that the drift rate does notappear in the Black-Scholes equation). For a speculator, the value of the option is given by an equation similar to theBlack-Scholes equation, except that the drift rate appears. In this case, the price can be interpreted as the expectedpayoff based on the guess for the drift rate. But this is art, not science!

18

3.10 American early exerciseActually, most options traded are American options, which have the feature that they can be exercised at any time.Consequently, an investor acting optimally, will always exercise the option if the value falls below the payoff orexercise value. So, the value of an American option is given by the solution to equation (3.81) with the additionalconstraint

V (S, t) ≥

max(S−K,0) for a callmax(K −S,0) for a put (3.82)

Note that since we are working backwards in time, we know what the option is worth in future, and therefore we candetermine the optimal course of action.

In order to write equation (3.81) in more conventional form, define τ = T − t, so that equation (3.81) becomes

Vτ =σ2S2

2 VSS + rSVS− rV

V (S,τ = 0) =

max(S−K,0) for a callmax(K −S,0) for a put

V (0,τ) → Vτ = −rV

V (S = ∞,τ) →

' S for a call' 0 for a put (3.83)

If the option is American, then we also have the additional constraints

V (S,τ) ≥

max(S−K,0) for a callmax(K −S,0) for a put (3.84)

Define the operator

LV ≡Vτ − (σ2S2

2 VSS + rSVS− rV) (3.85)

and let V (S,0) = V ∗. More formally, the American option pricing problem can be stated as

LV ≥ 0V −V∗ ≥ 0

(V −V∗)LV = 0 (3.86)

4 The Risk Neutral WorldSuppose instead of valuing an option using the above no-arbitrage argument, we wanted to know the expected valueof the option. We can imagine that we are buying and holding the option, and not hedging. If we are consideringthe value of risky cash flows in the future, then these cash flows should be discounted at an appropriate discount rate,which we will call ρ (i.e. the riskier the cash flows, the higher ρ).

Consequently the value of an option today can be considered to the be the discounted future value. This is simplythe old idea of net present value. Regard S today as known, and let V (S+dS, t +dt) be the value of the option at somefuture time t +dt, which is uncertain, since S evolves randomly. Thus

V (S, t) =1

1+ρdt E(V (S+dS, t +dt)) (4.1)

where E(...) is the expectation operator, i.e. the expected value of V (S+dS, t +dt) given that V = V (S, t) at t = t. Wecan rewrite equation (4.1) as (ignoring terms of o(dt), where o(dt) represents terms that go to zero faster than dt )

ρdtV(S, t) = E(V (S, t)+dV)−V(S, t) . (4.2)

19

Since we regard V as the expected value, so that E[V(S, t)] = V (S, t), and then

E(V (S, t)+dV)−V(S, t) = E(dV) , (4.3)

so that equation (4.2) becomes

ρdtV(S, t) = E(dV ) . (4.4)

Assume thatdSS = µdt +σdZ . (4.5)

From Ito’s Lemma (3.38) we have that

dV =

(

Vt +σ2S2

2 VSS +µSVS

)

dt +σSVS dZ . (4.6)

Noting that

E(dZ) = 0 (4.7)

then

E(dV ) =

(

Vt +σ2S2

2 VSS +µSVS

)

dt . (4.8)

Combining equations (4.4-4.8) gives

Vt +σ2S2

2 VSS +µSVS−ρV = 0 . (4.9)

Equation (4.9) is the PDE for the expected value of an option. If we are not hedging, maybe this is the value thatwe are interested in, not the no-arbitrage value. However, if this is the case, we have to estimate the drift rate µ, andthe discount rate ρ. Estimating the appropriate discount rate is always a thorny issue.

Now, note the interesting fact, if we set ρ = r and µ = r in equation (4.9) then we simply get the Black-Scholesequation (3.81).

This means that the no-arbitrage price of an option is identical to the expected value if ρ = r and µ = r. In otherwords, we can determine the no-arbitrage price by pretending we are living in a world where all assets drift at rate r,and all investments are discounted at rate r. This is the so-called risk neutral world.

This result is the source of endless confusion. It is best to think of this as simply a mathematical fluke. This doesnot have any reality. Investors would be very stupid to think that the drift rate of risky investments is r. I’d ratherjust buy risk-free bonds in this case. There is in reality no such thing as a risk-neutral world. Nevertheless, this resultis useful for determining the no-arbitrage value of an option using a Monte Carlo approach. Using this numericalmethod, we simply assume that

dS = rSdt +σSdZ (4.10)

and simulate a large number of random paths. If we know the option payoff as a function of S at t = T , then wecompute

V (S,0) = e−rT EQ(V (S,T )) (4.11)

which should be the no-arbitrage value.Note the EQ in the above equation. This makes it clear that we are taking the expectation in the risk neutral world

(the expectation in the Q measure). This contrasts with the real-world expectation (the P measure).Suppose we want to know the expected value (in the real world) of an asset which pays V (S, t = T ) at t = T in the

future. Then, the expected value (today) is given by solving

Vt +σ2S2

2 VSS +µSVS = 0 . (4.12)

20

where we have dropped the discounting term. In particular, suppose we are going to receive V = S(t = T ), i.e. just theasset at t = T . Assume that the solution to equation (4.12) is V = Const. A(t)S, and we find that

V = Const. Seµ(T−t) . (4.13)

Noting that we receive V = S at t = T means that

V = Seµ(T−t) . (4.14)

Today, we can acquire the asset for price S(t = 0). At t = T , the asset is worth S(t = T ). Equation (4.14) then says that

E[V (S(t = 0), t = 0)] = E[S(t = 0)] = S(t = 0)eµ(T) (4.15)

In other words, if

dS = Sµ dt +Sσ dZ (4.16)

then (setting t = T )

E[S] = Seµt . (4.17)

Recall that the exact solution to equation (4.16) is (equation (3.57))

S(t) = S(0)exp[(µ− σ2

2 )t +σ(Z(t)−Z(0))] . (4.18)

So that we have just shown that E[S] = Seµt by using a simple PDE argument and Ito’s Lemma. Isn’t this easier thanusing brute force statistics? PDEs are much more elegant.

5 Monte Carlo MethodsThis brings us to the simplest numerical method for computing the no-arbitrage value of an option. Suppose that weassume that the underlying process is

dSS = rdt +σdZ (5.1)

then we can simulate a path forward in time, starting at some price today S0, using a forward Euler timesteppingmethod (Si = S(ti))

Si+1 = Si +Si(r∆t +σφi√∆t) (5.2)

where ∆t is the finite timestep, and φi is a random number which is N(0,1). Note that at each timestep, we generate anew random number. After N steps, with T = N∆t, we have a single realized path. Given the payoff function of theoption, the value for this path would be

Value = Payo f f (SN) . (5.3)

For example, if the option was a European call, then

Value = max(SN −K,0)

K = Strike Price (5.4)

Suppose we run a series of trials, m = 1, ...,M, and denote the payoff after the m′th trial as payo f f (m). Then, theno-arbitrage value of the option is

Option Value = e−rT E(payo f f )

' e−rT 1M

m=M∑

m=1payo f f (m) . (5.5)

Recall that these paths are not the real paths, but are the risk neutral paths.Now, we should remember that we are

21

1. approximating the solution to the SDE by forward Euler, which has O(∆t) truncation error.

2. approximating the expectation by the mean of many random paths. This Monte Carlo error is of size O(1/√

M),which is slowly converging.

There are thus two sources of error in the Monte Carlo approach: timestepping error and sampling error.The slow rate of convergence of Monte Carlo methods makes these techniques unattractive except when the option

is written on several (i.e. more than three) underlying assets. As well, since we are simulating forward in time, wecannot know at a given point in the forward path if it is optimal to exercise or hold an American style option. Thisis easy if we use a PDE method, since we solve the PDE backwards in time, so we always know the continuationvalue and hence can act optimally. However, if we have more than three factors, PDE methods become very expensivecomputationally. As well, if we want to determine the effects of discrete hedging, for example, a Monte Carlo approachis very easy to implement.

The error in the Monte Carlo method is then

Error = O(

max(∆t, 1√M

)

)

∆t = timestepM = number of Monte Carlo paths (5.6)

Now, it doesn’t make sense to drive the Monte Carlo error down to zero if there is O(∆t) timestepping error. We shouldseek to balance the timestepping error and the sampling error. In order to make these two errors the same order, weshould choose M = O( 1

(∆t)2 ). This makes the total error O(∆t). We also have that

Complexity = O(

M∆t

)

= O(

1(∆t)3

)

∆t = O(

(Complexity)−1/3)

(5.7)

and hence

Error = O(

1( Complexity)1/3

)

. (5.8)

In practice, the convergence in terms of timestep error is often not done. People just pick a timestep, i.e. one day,and increase the number of Monte Carlo samples until they achieve convergence in terms of sampling error, and ignorethe timestep error. Sometimes this gives bad results!

Note that the exact solution to Geometric Brownian motion (3.57) has the property that the asset value S can neverreach S = 0 if S(0) > 0, in any finite time. However, due to the approximate nature of our Forward Euler method forsolving the SDE, it is possible that a negative or zero Si can show up. We can do one of three things here, in this case

• Cut back the timestep at this point in the simulation so that S is positive.

• Set S = 0 and continue. In this case, S remains zero for the rest of this particular simulation.

• Use Ito’s Lemma, and determine the SDE for logS, i.e. if F = logS, then, from equation (3.55), we obtain (withµ = r)

dF = (r− σ2

2 )dt +σdZ , (5.9)

so that now, if F < 0, there is no problem, since S = eF , and if F < 0, this just means that S is very small. Wecan use this idea for any stochastic process where the variable should not go negative.

22

Usually, most people set S = 0 and continue. As long as the timestep is not too large, this situation is probably dueto an event of low probability, hence any errors incurred will not affect the expected value very much. If negative Svalues show up many times, this is a signal that the timestep is too large.

In the case of simple Geometric Brownian motion, where r,σ are constants, then the SDE can be solved exactly,and we can avoid timestepping errors (see Section 3.6.2). In this case

S(T ) = S(0)exp[(r− σ2

2 )T +σφ√

T ] (5.10)

where φ ∼N(0,1). I’ll remind you that equation (5.10) is exact. For these simple cases, we should always use equation(5.10). Unfortunately, this does not work in more realistic situations.

Monte Carlo is popular because

• It is simple to code. Easily handles complex path dependence.

• Easily handles multiple assets.

The disadvantages of Monte Carlo methods are

• It is difficult to apply this idea to problems involving optimal decision making (e.g. American options).

• It is hard to compute the Greeks (VS,VSS), which are the hedging parameters, very accurately.

• MC converges slowly.

5.1 Monte Carlo Error EstimatorsThe sampling error can be estimated via a statistical approach. If the estimated mean of the sample is

µ =e−rT

M

m=M∑

m=1payo f f (m) (5.11)

and the standard deviation of the estimate is

ω =

(

1M−1

m=M∑

m=1(e−rT payo f f (m)− µ)2

)1/2

(5.12)

then the 95% confidence interval for the actual value V of the option is

µ− 1.96ω√M

< V < µ+1.96ω√

M(5.13)

Note that in order to reduce this error by a factor of 10, the number of simulations must be increased by 100.The timestep error can be estimated by running the problem with different size timesteps, comparing the solutions.

5.2 Random Numbers and Monte CarloThere are many good algorithms for generating random sequences which are uniformly distributed in [0,1]. Seefor example, (Numerical Recipies in C++., Press et al, Cambridge University Press, 2002). As pointed out in thisbook, often the system supplied random number generators, such as rand in the standard C library, and the infamousRANDU IBM function, are extremely bad. The Matlab functions appear to be quite good. For more details, please lookat (Park and Miller, ACM Transactions on Mathematical Software, 31 (1988) 1192-1201). Another good generatoris described in (Matsumoto and Nishimura, “The Mersenne Twister: a 623 dimensionally equidistributed uniformpseudorandom number generator,” ACM Transactions on Modelling and Computer Simulation, 8 (1998) 3-30.) Codecan be downloaded from the authors Web site.

However, we need random numbers which are normally distributed on [−∞,+∞], with mean zero and varianceone (N(0,1)).

23

Suppose we have uniformly distributed numbers on [0,1], i.e. the probability of obtaining a number between x andx+dx is

p(x)dx = dx ; 0 ≤ x ≤ 1= 0 ; otherwise (5.14)

Let’s take a function of this random variable y(x). How is y(x) distributed? Let p(y) be the probability distribution ofobtaining y in [y,y+dy]. Consequently, we must have (recall the law of transformation of probabilities)

p(x)|dx| = p(y)|dy|

or

p(y) = p(x)∣

∣

∣

∣

dxdy

∣

∣

∣

∣

. (5.15)

Suppose we want p(y) to be normal,

p(y) =e−y2/2√

2π. (5.16)

If we start with a uniform distribution, p(x) = 1 on [0,1], then from equations (5.15-5.16) we obtain

dxdy =

e−y2/2√

2π. (5.17)

Now, for x ∈ [0,1], we have that the probability of obtaining a number in [0,x] isZ x

0dx′ = x , (5.18)

but from equation (5.17) we have

dx′ =e−(y′)2/2√

2πdy′ . (5.19)

So, there exists a y such that the probability of getting a y′ in [−∞,y] is equal to the probability of getting x′ in [0,x],Z x

0dx′ =

Z y

−∞

e−(y′)2/2√

2πdy′ , (5.20)

or

x =Z y

−∞

e−(y′)2/2√

2πdy′ . (5.21)

So, if we generate uniformly distributed numbers x on [0,1], then to determine y which are N(0,1), we do the following

• Generate x

• Find y such that

x =1√2π

Z y

−∞e−(y′)2/2dy′ . (5.22)

We can write this last step as

y = F(x) (5.23)

where F(x) is the inverse cumulative normal distribution.

24

5.3 The Box-Muller AlgorithmStarting from random numbers which are uniformly distributed on [0,1], there is actually a simpler method for obtain-ing random numbers which are normally distributed.

If p(x) is the probability of finding x∈ [x,x+dx] and if y = y(x), and p(y) is the probability of finding y∈ [y,y+dy],then, from equation (5.15) we have

|p(x)dx| = |p(y)dy| (5.24)

or

p(y) = p(x)∣

∣

∣

∣

dxdy

∣

∣

∣

∣

. (5.25)

Now, suppose we have two original random variables x1,x2, and let p(xi,x2) be the probability of obtaining (x1,x2)in [x1,x1 +dx1]× [x2,x2 +dx2]. Then, if

y1 = y1(x1,x2)

y2 = y2(x1,x2) (5.26)

and we have that

p(y1,y2) = p(x1,x2)

∣

∣

∣

∣

∂(x1,x2)

∂(y1,y2)

∣

∣

∣

∣

(5.27)

where the Jacobian of the transformation is defined as

∂(x1,x2)

∂(y1,y2)= det

∣

∣

∣

∣

∣

∂x1∂y1

∂x1∂y2

∂x2∂y1

∂x2∂y2

∣

∣

∣

∣

∣

(5.28)

Recall that the Jacobian of the transformation can be regarded as the scaling factor which transforms dx1 dx2 tody1 dy2, i.e.

dx1 dx2 =

∣

∣

∣

∣

∂(x1,x2)

∂(y1,y2)

∣

∣

∣

∣

dy1 dy2 . (5.29)

Now, suppose that we have x1,x2 uniformly distributed on [0,1]× [0,1], i.e.

p(x1,x2) = U(x1)U(x2) (5.30)

where

U(x) = 1 ; 0 ≤ x ≤ 1= 0 ; otherwise . (5.31)

We denote this distribution as x1 ∼U [0,1] and x2 ∼U [0,1].If p(x1,x2) is given by equation (5.30), then we have from equation (5.27) that

p(y1,y2) =

∣

∣

∣

∣

∂(x1,x2)

∂(y1,y2)

∣

∣

∣

∣

(5.32)

Now, we want to find a transformation y1 = y1(x1,x2),y2 = y2(x1,x2) which results in normal distributions for y1,y2.Consider

y1 =√

−2logx1 cos2πx2

y2 =√

−2logx1 sin2πx2 (5.33)

25

or solving for (x2,x2)

x1 = exp(−1

2 (y21 + y2

2)

)

x2 =1

2πtan−1

[

y2y1

]

. (5.34)

After some tedious algebra, we can see that (using equation (5.34))∣

∣

∣

∣

∂(x1,x2)

∂(y1,y2)

∣

∣

∣

∣

=1√2π

e−y21/2 1√

2πe−y2

2/2 (5.35)

Now, assuming that equation (5.30) holds, then from equations (5.32-5.35) we have

p(y1,y2) =1√2π

e−y21/2 1√

2πe−y2

2/2 (5.36)

so that (y1,y2) are independent, normally distributed random variables, with mean zero and variance one, or

y1 ∼ N(0,1) ; y2 ∼ N(0,1) . (5.37)

This gives the following algorithm for generating normally distributed random numbers (given uniformly dis-tributed numbers):

Box Muller Algorithm

RepeatGenerate u1 ∼U(0,1), u2 ∼U(0,1)θ = 2πu2, ρ =

√−2logu1z1 = ρcosθ; z2 = ρsinθ

End Repeat

(5.38)

This has the effect that Z1 ∼ N(0,1) and Z2 ∼ N(0,1).Note that we generate two draws from a normal distribution on each pass through the loop.

5.3.1 An improved Box Muller

The algorithm (5.38) can be expensive due to the trigonometric function evaluations. We can use the following methodto avoid these evaluations. Let

U1 ∼U [0,1] ; U2 ∼U [0,1]

V1 = 2U1 −1 ; V2 = 2U2 −1 (5.39)

which means that (V1,V2) are uniformly distributed in [−1,1]× [−1,1]. Now, we carry out the following procedure

Rejection Method

RepeatIf ( V 2

1 +V 22 < 1 )

AcceptElse

RejectEndif

26

End Repeat

(5.40)

which means that if we define (V1,V2) as in equation (5.39), and then process the pairs (V1,V2) using algorithm (5.40)we have that (V1,V2) are uniformly distributed on the disk centered at the origin, with radius one, in the (V1,V2) plane.This is denoted by

(V1,V2) ∼ D(0,1) . (5.41)

If (V1,V2) ∼ D(0,1) and R2 = V 21 +V 2

2 , then the probability of finding R in [R,R+dR] is

p(R) dR =2πR dRπ(1)2

= 2R dR . (5.42)

From the fundamental law of transformation of probabilities, we have that

p(R2)d(R2) = p(R)dR= 2R dR (5.43)

so that

p(R2) =2R

d(R2)dR

= 1 (5.44)

so that R2 is uniformly distributed on [0,1], (R2 ∼U [0,1]).As well, if θ = tan−1(V2/V1), i.e. θ is the angle between a line from the origin to the point (V1,V2) and the V1 axis,

then θ ∼U [0,2π]. Note that

cosθ =V1

√

V 21 +V 2

2

sinθ =V2

√

V 21 +V 2

2

. (5.45)

Now in the original Box Muller algorithm (5.38),

ρ =√

−2logU1 ; U1 ∼U [0,1]

θ = 2ΠU2 ; U2 ∼U [0,1] , (5.46)

but θ = tan−1(V2/V1)∼U [0,2π], and R2 = U [0,1]. Therefore, if we let W = R2, then we can replace θ,ρ in algorithm(5.38) by

θ = tan−1(

V2V1

)

ρ =√

−2logW . (5.47)

Now, the last step in the Box Muller algorithm (5.38) is

Z1 = ρcosθZ2 = ρsinθ , (5.48)

27

but since W = R2 = V 21 +V 2

2 , then cosθ = V1/R, sinθ = V2/R, so that

Z1 = ρV1√W

Z2 = ρV2√W

. (5.49)

This leads to the following algorithm

Polar form of Box Muller

RepeatGenerate U1 ∼U [0,1], U2 ∼U [0,1].Let

V1 = 2U1 −1V2 = 2U2 −1W = V 2

1 +V 22

If( W < 1) then

Z1 = V1√

−2logW/WZ2 = V2

√

−2logW/W (5.50)

End IfEnd Repeat

Consequently, (Z1,Z2) are independent (uncorrelated), and Z1 ∼ N(0,1), and Z2 ∼ N(0,1). Because of the rejectionstep (5.40), about (1−π/4) of the random draws in [−1,+1]× [−1,+1] are rejected (about 21%), but this method isstill generally more efficient than brute force Box Muller.

5.4 Speeding up Monte CarloMonte Carlo methods are slow to converge, since the error is given by

Error = O(1√M

)

where M is the number of samples. There are many methods which can be used to try to speed up convergence. Theseare usually termed Variance Reduction techniques.

Perhaps the simplest idea is the Antithetic Variable method. Suppose we compute a random asset path

Si+1 = Siµ∆t +Siσφi√∆t

where φi are N(0,1). We store all the φi, i = 1, ..., for a given path. Call the estimate for the option price from thissample path V +. Then compute a second sample path where (φi)′ = −φi, i = 1, ...,. Call this estimate V−. Thencompute the average

V =V+ +V−

2 ,

and continue sampling in this way. Averaging over all the V , slightly faster convergence is obtained. Intuitively, wecan see that this symmetrizes the random paths.

28

Let X+ be the option values obtained from all the V + simulations, and X− be the estimates obtained from all theV− simulations. Note that Var(X+) = Var(X−) (they have the same distribution). Then

Var(X+ +X−

2 ) =14Var(X+)+

14Var(X−)+

12Cov(X+,X−)

=12Var(X+)+

12Cov(X+,X−) (5.51)

which will be smaller than Var(X+) if Cov(X+,X−) is nonpositive. Warning: this is not always the case. For example,if the payoff is not a monotonic function of S, the results may actually be worse than crude Monte Carlo. For example,if the payoff is a capped call

payoff = min(K2,max(S−K1,0))

K2 > K1

then the antithetic method performs poorly.Note that this method can be used to estimate the mean. In the MC error estimator (5.13), compute the standard

deviation of the estimator as ω =√

Var(X++X−2 ).

However, if we want to estimate the distribution of option prices (i.e. a probability distribution), then we shouldnot average each V + and V−, since this changes the variance of the actual distribution.

If we want to know the actual variance of the distribution (and not just the mean), then to compute the variance ofthe distribution, we should just use the estimates V +, and compute the estimate of the variance in the usual way. Thisshould also be used if we want to plot a histogram of the distribution, or compute the Value at Risk.

5.5 Estimating the mean and varianceAn estimate of the mean x and variance s2

M of M numbers x1,x2, ...,xM is

s2M =

1M−1

M∑i=1

(xi − x)2

x =1M

M∑i=1

xi (5.52)

Alternatively, one can use

s2M =

1M−1

M∑i=1

x2i −

1M

(

M∑i=1

xi

)2

(5.53)

which has the advantage that the estimate of the mean and standard deviation can be computed in one loop.In order to avoid roundoff, the following method is suggested by Seydel (R. Seydel, Tools for Computational

Finance, Springer, 2002). Set

α1 = x1 ; β1 = 0 (5.54)

then compute recursively

αi = αi−1 +xi −αi−1

i

βi = βi−1 +(i−1)(xi−αi−1)2

i (5.55)

so that

x = αM

s2M =

βMM−1 (5.56)

29

5.6 Low Discrepancy SequencesIn a effort to get around the 1√

M , (M = number of samples) behaviour of Monte Carlo methods, quasi-Monte Carlomethods have been devised.

These techniques use a deterministic sequence of numbers (low discrepancy sequences). The idea here is that aMonte Carlo method does not fill the sample space very evenly (after all, its random). A low discrepancy sequencetends to sample the space in a orderly fashion. If d is the dimension of the space, then the worst case error bound foran LDS method is

Error = O(

(logM)d

M

)

(5.57)

where M is the number of samples used. Clearly, if d is small, then this error bound is (at least asymptotically) betterthan Monte Carlo.

LDS methods generate numbers on [0,1]. We cannot use the Box-Muller method in this case to produce normallydistributed numbers, since these numbers are deterministic. We have to invert the cumulative normal distribution inorder to get the numbers distributed with mean zero and standard deviation one on [−∞,+∞]. So, if F(x) is the inversecumulative normal distribution, then

xLDS = uniformly distributed on [0,1]

yLDS = F(xLDS) is N(0,1) . (5.58)

Another problem has to do with the fact that if we are stepping through time, i.e.

Sn+1 = Sn +Sn(r∆t +φσ√

∆t)φ = N(0,1) (5.59)

with, say, N steps in total, then we need to think of this as a problem in N dimensional space. In other words, the k− thtimestep is sampled from the k− th coordinate in this N dimensional space. We are trying to uniformly sample fromthis N dimensional space.

Let x be a vector of LDS numbers on [0,1], in N dimensional space

x =

x1x2|

xN

. (5.60)

So, an LDS algorithm would proceed as follows, for the j′th trial

• Generate x j (the j′th LDS number in an N dimensional space).

• Generate the normally distributed vector y j by inverting the cumulative normal distribution for each component

y j =

F(x j1)

F(x j2)

|F(x j

N)

(5.61)

• Generate a complete sample path k = 0, ...,N −1

Sk+1j = Sk

j +Skj(r∆t + y j

k+1σ√

∆t)(5.62)

• Compute the payoff at S = SNj

30

The option value is the average of these trials.There are a variety of LDS numbers: Halton, Sobol, Niederrieter, etc. Our tests seem to indicate that Sobol is the

best.Note that the worst case error bound for the error is given by equation (5.57). If we use a reasonable number of

timesteps, say 50−100, then, d = 50−100, which gives a very bad error bound. For d large, the numerator in equation(5.57) dominates. The denominator only dominates when

M ' ed (5.63)

which is a very large number of trials for d ' 100. Fortunately, at least for path-dependent options, we have found thatthings are not quite this bad, and LDS seems to work if the number of timesteps is less than 100−200. However, oncethe dimensionality gets above a few hundred, convergence seems to slow down.

5.7 Correlated Random NumbersIn many cases involving multiple assets, we would like to generate correlated, normally distributed random numbers.Suppose we have i = 1, ...,d assets, and each asset follows the simulated path

Sn+1i = Sn

i +Sni (r∆t +φn

i σi√

∆t)(5.64)

where φni is N(0,1) and

E(φni φn

j) = ρi j (5.65)

where ρi j is the correlation between asset i and asset j.Now, it is easy to generate a set of d uncorrelated N(0,1) variables. Call these ε1, ...,εd . So, how do we produce

correlated numbers? Let

[Ψ]i j = ρi j (5.66)

be the matrix of correlation coefficients. Assume that this matrix is SPD (if not, one of the random variables is a linearcombination of the others, hence this is a degenerate case). Assuming Ψ is SPD, we can Cholesky factor Ψ = LLt , sothat

ρi j = ∑k

LikLtk j (5.67)

Let φ be the vector of correlated normally distributed random numbers (i.e. what we want to get), and let ε be thevector of uncorrelated N(0,1) numbers (i.e. what we are given).

φ =

φ1φ2|

φd

; ε =

ε1ε2|

εd

(5.68)

So, given ε, we have

E(εiε j) = δi j

where

δi j = 0 ; if i 6= j= 1 ; if i = j .

31

since the εi are uncorrelated. Now, let

φi = ∑j

Li jε j (5.69)

which gives

φiφk = ∑j∑

lLi jLklεlε j

= ∑j∑

lLi jεlε jLt

lk . (5.70)

Now,

E(φiφk) = E[

∑j∑

lLi jεlε jLt

lk

]

= ∑j∑

lLi jE(εlε j)Lt

lk

= ∑j∑

lLi jδl jLt

lk

= ∑l

LilLtlk

= ρi j (5.71)

So, in order to generate correlated N(0,1) numbers:

• Factor the correlation matrix Ψ = LLt

• Generate uncorrelated N(0,1) numbers εi

• Correlated numbers φi are given from

φ = Lε

5.8 Integration of Stochastic Differential EquationsUp to now, we have been fairly slack about defining what we mean by convergence when we use forward Eulertimestepping (5.2) to integrate

dS = µS dt +σS dZ . (5.72)

The forward Euler algorithm is simply

Si+1 = Si +Si(µh+φi√h) (5.73)

where h = ∆t is the finite timestep. For a good overview of these methods, check out (“An algorithmic introduction tonumerical simulation of stochastic differential equations,” by D. Higham, SIAM Review vol. 43 (2001) 525-546). Thisarticle also has some good tips on solving SDEs using Matlab, in particular, taking full advantage of the vectorizationof Matlab. Note that eliminating as many for loops as possible (i.e. computing all the MC realizations for each timestepin a vector) can reduce computation time by orders of magnitude.

Before we start defining what we mean by convergence, let’s consider the following situation. Recall that

dZ = φ√

dt (5.74)

where φ is a random draw from a normal distribution with mean zero and variance one. Let’s imagine generating a setof Z values at discrete times ti, e.g. Z(ti) = Zi, by

Zi+1 = Zi +φ√

∆t . (5.75)

32

Now, these are all legitimate points along a Brownian motion path, since there is no timestepping error here, in view ofequation (3.53). So, this set of values Z0,Z1, ..., are valid points along a Brownian path. Now, recall that the exactsolution (for a given Brownian path) of equation (5.72) is given by equation (3.57)

S(T ) = S(0)exp[(µ− σ2

2 )t +σ(Z(T )−Z(0))] (5.76)

where T is the stopping time of the simulation.Now if we integrate equation (5.72) using forward Euler, with the discrete timesteps ∆t = ti+1 − ti, using the

realization of the Brownian path Z0,Z1, ...,, we will not get the exact solution (5.76). This is because even thoughthe Brownian path points are exact, time discretization errors are introduced in equation (5.73). So, how can wesystematically study convergence of algorithm (5.73)? We can simply take smaller timesteps. However, we want to dothis by filling in new Z values in the Brownian path, while keeping the old values (since these are perfectly legitimatevalues). Let S(T )h represent the forward Euler solution (5.73) for a fixed timestep h. Let S(T ) be the exact solution(5.76). As h → 0, we would expect S(T )h → S(T ), for a given path.

5.8.1 The Brownian Bridge

So, given a set of valid Zk, how do we refine this path, keeping the existing points along this path? In particular,suppose we have two points Zi,Zk, at (ti, tk), and we would like to generate a point Z j at t j, with ti < t j < tk. Howshould we pick Z j? What density function should we use when generating Z j, given that Zk is known?

Let x,y be two draws from a normal distribution with mean zero and variance one. Suppose we have the pointZ(ti) = Zi and we generate Z(t j) = Z j, Z(tk) = Zk along the Wiener path,

Z j = Zi + x√

t j − ti (5.77)Zk = Z j + y

√

tk − t j (5.78)Zk = Zi + x

√

t j − ti + y√

tk − t j . (5.79)

So, given (x,y), and Zi, we can generate Z j,Zk. Suppose on the other hand, we have Zi, and we generate Zk directlyusing

Zk = Zi + z√

tk − ti , (5.80)

where z is N(0,1). Then how do we generate Z j using equation (5.77)? Since we are now specifying that we know Zk,this means that our method for generating Z j is constrained. For example, given z, we must have that, from equations(5.79) and (5.80)

y =z√

tk − ti− x√t j − ti√tk − t j. (5.81)

Now the probability density of drawing the pair (x,y) given z, denoted by p(x,y|z) is

p(x,y|z) =p(x)p(y)

p(z) (5.82)

where p(..) is a standard normal distribution, and we have used the fact that successive increments of a Brownianprocess are uncorrelated.

From equation (5.81), we can write y = y(x,z), so that p(x,y|z) = p(x,y(x,z)|z)

p(x,y(x,z)|z) =p(x)p(y(x,z))

p(z)

=1√2π

exp[

−12(x2 + y2 − z2)

]

(5.83)

33

or (after some algebra, using equation (5.81))

p(x|z) =1√2π

exp[

−12(x−αz)2/β2

]

α =

√

t j − titk − ti

β =

√

tk − t jtk − ti

(5.84)

so that x is normally distributed with mean αz and variance β2. Since

z =Zk −Zi√

tk − ti(5.85)

we have that x has mean

E(x) =

√t j − titk − ti

(Zk −Zi) (5.86)

and variance

E[(x−E(x))2] =tk − t jtk − ti

(5.87)

Now, let

x =

√t j − titk − ti

(Zk −Zi)+φ√

tk − t jtk − ti

(5.88)

where φ is N(0,1). Clearly, x satisfies equations (5.86) and (5.88). Substituting equation (5.88) into (5.77) gives

Z j =

(

tk − t jtk − ti

)

Zi +

(

t j − titk − ti

)

Zk +φ

√

(t j − ti)(tk − t j)

(tk − ti)(5.89)

where φ is N(0,1). Equation (5.89) is known as the Brownian Bridge.Figure 5.1 shows different Brownian paths constructed for different timestep sizes. An initial coarse path is con-

structed, then the fine timestep path is constructed from the coarse path using a Brownian Bridge. By construction, thefinal timestep path will pass through the coarse timestep nodes.

Figure 5.2 shows the asset paths integrated using the forward Euler algorithm (5.73) fed with the Brownian pathsin Figure 5.1. In this case, note that the fine timestep path does not coincide with the coarse timestep nodes, due to thetimestepping error.

5.8.2 Strong and Weak ConvergenceSince we are dealing with a probabilistic situation here, it is not obvious how to define convergence. Given a numberof points along a Brownian path, we could imagine refining this path (using a Brownian Bridge), and then seeing ifthe solution converged to exact solution. For the model SDE (5.72), we could ask that

E[

|S(T )−Sh(T )|]

≤ Const. hγ (5.90)

where the expectation in equation (5.90) is over many Brownian paths, and h is the timestep size. Note that S(T ) isthe exact solution along a particular Brownian path; the same path used to compute Sh(T ). Criterion (5.90) is calledstrong convergence. A less strict criterion is

|E [S(T )]−E[

Sh(T )]

| ≤ Const. hγ (5.91)

34

Time

Brow

nian

Path

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

Time

Brow

nian

Path

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

FIGURE 5.1: Effect of adding more points to a Brownian path using a Brownian bridge. Note that the small timesteppoints match the coarse timestep points. Left: each coarse timestep is divided into 16 substeps. Right: each coarsetimestep divided into 64 substeps.

Time

Asse

tPric

e

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 180

85

90

95

100

105

110

115

120

Time

Asse

tPric

e

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 180

85

90

95

100

105

110

115

120

FIGURE 5.2: Brownian paths shown in Figure 5.1 used to determine asset price paths using forward Euler timestepping(5.73). In this case, note that the asset paths for fine and coarse timestepping do not agree at the final time (due to thetimestepping error). Eventually, for small enough timesteps, the final asset value will converge to the exact solution tothe SDE. Left: each coarse timestep is divided into 16 substeps. Right: each coarse timestep divided into 64substeps.

35

T .25σ .4µ .06S0 100

TABLE 5.1: Data used in the convergence tests.

Timesteps Strong Error (5.90) Weak Error (5.91)72 .0269 .00194

144 .0190 .00174288 .0135 .00093576 .0095 .00047

TABLE 5.2: Convergence results, 100,000 samples used. Data in Table 5.1.

It can be shown that using forward Euler results in weak convergence with γ = 1, and strong convergence with γ = .5.Table 5.1 shows some test data used to integrate the SDE (5.72) using method (5.73). A series of Brownian paths

was constructed, beginning with a coarse timestep path. These paths were systematically refined using the BrownianBridge construction. Table 5.2 shows results where the strong and weak convergence errors are estimated as

Strong Error =1N

N∑i=1

[

|S(T )i −Sh(T )i|]

(5.92)

Weak Error = | 1N

N∑i=1

[S(T )i]−1N

N∑i=1

[

Sh(T )i]

| , (5.93)

where Sh(T )i is the solution obtained by forward Euler timestepping along the i′th Brownian path, and S(T )i is theexact solution along this same path, and N is the number of samples. Note that for equation (5.72), we have the exactsolution

limN→∞

1N

N∑i=1

[S(T )i] = S0eµT (5.94)

but we do not replace the approximate sampled value of the limit in equation (5.93) by the theoretical limit (5.94). Ifwe use enough Monte Carlo samples, we could replace the approximate expression

limN→∞

1N

N∑i=1

[S(T )i]

by S0eµT , but for normal parameters, the Monte Carlo sampling error is much larger than the timestepping error, so wewould have to use an enormous number of Monte Carlo samples. Estimating the weak error using equation (5.93) willmeasure the timestepping error, as opposed to the Monte Carlo sampling error. However, for normal parameters, evenusing equation (5.93) requires a large number of Monte Carlo samples in order to ensure that the error is dominatedby the timestepping error.

In Table 5.1, we can see that the ratio of the errors is about√

2 for the strong error, and about two for the weakerror. This is consistent with a convergence rate of γ = .5 for strong convergence, and γ = 1.0 for weak convergence.

5.9 Matlab and Monte Carlo SimulationA straightforward implementation of Monte Carlo timestepping for solving the SDE


in Matlab is shown in Algorithm (5.96). This code runs very slowly.

36

Slow.m

randn(’state’,100);T = 1.00; % expiry timesigma = 0.25; % volatilitymu = .10; % P measure driftS_init = 100; % initial valueN_sim = 10000; % number of simulationsN = 100; % number of timestepsdelt = T/N; % timestep

drift = mu*delt;sigma_sqrt_delt = sigma*sqrt(delt);S_new = zeros(N_sim,1);

for m=1:N_simS = S_init;for i=1:N % timestep loop

S = S + S*( drift + sigma_sqrt_delt*randn(1,1) );

S = max(0.0, S );% check to make sure that S_new cannot be < 0

end % timestep loopS_new(m,1) = S;end % simulation loop

n_bin = 200;hist(S_new, n_bin);

stndrd_dev = std(S_new);disp(sprintf(’standard deviation: %.5g\n’,stndrd_dev));

mean_S = mean(S_new);disp(sprintf(’mean: %.5g\n’,stndrd_dev));

(5.96)

Alternatively, we can use Matlab’s vectorization capabilities to interchange the timestep loop and the simulationloop. The innermost simulation loop can be replaced by vector statements, as shown in Algorithm (5.97). This runsmuch faster.

37

Fast.m

randn(’state’,100);%T = 1.00; % expiry timesigma = 0.25; % volatilitymu = .10; % P measure driftS_init = 100; % initial valueN_sim = 10000; % number of simulationsN = 100; % number of timestepsdelt = T/N; % timestep

drift = mu*delt;sigma_sqrt_delt = sigma*sqrt(delt);

S_old = zeros(N_sim,1);S_new = zeros(N_sim,1);

S_old(1:N_sim,1) = S_init;

for i=1:N % timestep loop

% now, for each timestep, generate info for% all simulations

S_new(:,1) = S_old(:,1) +...S_old(:,1).*( drift + sigma_sqrt_delt*randn(N_sim,1) );

S_new(:,1) = max(0.0, S_new(:,1) );% check to make sure that S_new cannot be < 0

S_old(:,1) = S_new(:,1);%% end of generation of all data for all simulations% for this timestep

end % timestep loop

n_bin = 200;hist(S_new, n_bin);

stndrd_dev = std(S_new);disp(sprintf(’standard deviation: %.5g\n’,stndrd_dev));

mean_S = mean(S_new);disp(sprintf(’mean: %.5g\n’,stndrd_dev));

(5.97)

6 The Binomial ModelWe have seen that a problem with the Monte Carlo method is that it is difficult to use for valuing American styleoptions. Recall that the holder of an American option can exercise the option at any time and receive the payoff. In

38

order to determine whether or not it is worthwhile to hold the option, we have to compare the value of continuing tohold the option (the continuation value) with the payoff. If the continuation value is greater than the payoff, then wehold; otherwise, we exercise.

At any point in time, the continuation value depends on what happens in the future. Clearly, if we simulate forwardin time, as in the Monte Carlo approach, we don’t know what happens in the future, and hence we don’t know howto act optimally. This is actually a dynamic programming problem. These sorts of problems are usually solved byproceeding from the end point backwards. We use the same idea here. We have to start from the terminal time andwork backwards.

Recall that we can determine the no-arbitrage value of an option by pretending we live in a risk-neutral world,where risky assets drift at r and are discounted at r. If we let X = logS, then the risk neutral process for X is (fromequation (3.55) )

dX = (r− σ2

2 )dt +σdZ . (6.1)

Now, we can construct a discrete approximation to this random walk using the lattice discussed in Section 3.5. In fact,all we have to do is let let α = r− σ2

2 , so that equation (6.1) is formally identical to equation (3.4). In order to ensurethat in the limit as ∆t → 0, we get the process (6.1), we require that the sizes of the random jumps are ∆X = σ

√∆t and

that the probabilities of up (p) and down (q) moves are

pr =12 [1+

ασ√

∆t]

=12 [1+

( rσ− σ

2

)√∆t]

qr =12 [1− α

σ√

∆t]

=12 [1−

( rσ− σ

2

)√∆t] , (6.2)

where we have denoted the risk neutral probabilities by pr and qr to distinguish them from the real probabilities p,q.Now, we will switch to a more common notation. If we are at node j, timestep n, we will denote this node location

by Xnj . Recall that X = logS, so that in terms of asset price, this is Sn

j = eXnj .

Now, consider that at node ( j,n), the asset can move up with probability pr and down with probability qr. In otherwords

Snj → Sn+1

j+1 ; with probability pr

Snj → Sn+1

j ; with probability qr

(6.3)

Now, since in Section 3.5 we showed that ∆X = σ√

∆t, so that (S = eX )

Sn+1j+1 = Sn

j eσ√

∆t

Sn+1j = Sn

j e−σ√

∆t (6.4)

or

Snj = S0

0e(2 j−n)σ√

∆t ; j = 0, ..,n (6.5)

So, the first step in the process is to construct a tree of stock price values, as shown on Figure 6.Associated with each stock price on the lattice is the option value V n

j . We first set the value of the option at T = N∆tto the payoff. For example, if we are valuing a put option, then

V Nj = max(K −SN

j ,0) ; j = 0, ...,N (6.6)

Then, we can use the risk neutral world idea to determine the no-arbitrage value of the option (it is the expected valuein the risk neutral world). We can do this by working backward through the lattice. The value today is the discountedexpected future value

39

S00

S11

S01

S22

S12

S02

FIGURE 6.1: Lattice of stock price values

European Lattice Algorithm

V nj = e−r∆t

(

prV n+1j+1 +qrV n+1

j

)

n = N −1, ...,0j = 0, ...,n (6.7)

Rolling back through the tree, we obtain the value at S00 today, which is V 0

0 .If the option is an American put, we can determine if it is optimal to hold or exercise, since we know the continu-

ation value. In this case the rollback (6.7) becomes

American Lattice Algorithm

(V nj )c = e−r∆t

(

prV n+1j+1 +qrV n+1

j

)

V nj = max

(

(V nj )c,max(K −Sn

j ,0))

n = N −1, ...,0j = 0, ...,n (6.8)

which is illustrated in Figure 6.The binomial lattice method has the following advantages• It is very easy to code for simple cases.

• It is easy to explain to managers.

• American options are easy to handle.However, the binomial lattice method has the following disadvantages• Except for simple cases, coding becomes complex. For example, if we want to handle simple barrier options,

things become nightmarish.

• This method is algebraically identical to an explicit finite difference solution of the Black-Scholes equation.Consequently, convergence is at an O(∆t) rate.

• The probabilities pr,qr are not real probabilities, they are simply the coefficients in a particular discretization ofa PDE. Regarding them as probabilities leads to much fuzzy thinking, and complex wrong-headed arguments.

If we are going to solve the Black-Scholes PDE, we might as well do it right, and not fool around with lattices.

40

Vjn

Vjn + + 1 1

Vjn+1

p

q

FIGURE 6.2: Backward recursion step.

6.1 A No-arbitrage LatticeWe can also derive the lattice method directly from the discrete lattice model in Section 3.5. Suppose we assume that

dS = µSdt +σSdZ (6.9)

and letting X = logS, we have that

dX = (µ− σ2

2 )dt +σdZ (6.10)

so that α = µ− σ22 in equation (3.19). Now, let’s consider the usual hedging portfolio at t = n∆t,S = Sn

j ,

Pnj = V n

j − (αh)Snj , (6.11)

where V nj is the value of the option at t = n∆t,S = Sn

j . At t = (n+1)∆t,

Snj → Sn+1

j+1 ; with probability p

Snj → Sn+1

j ; with probability q

Sn+1j+1 = Sn

j eσ√

∆t

Sn+1j = Sn

j e−σ√

∆t

so that the value of the hedging portfolio at t = n+1 is

Pn+1j+1 = V n+1

j+1 − (αh)Sn+1j+1 ; with probability p (6.12)

Pn+1j = V n+1

j − (αh)Sn+1j ; with probability q . (6.13)

Now, as in Section 3.3, we can determine (αh) so that the value of the hedging portfolio is independent of p,q. We dothis by requiring that

Pn+1j+1 = Pn+1

j (6.14)

so that

V n+1j+1 − (αh)Sn+1

j+1 = V n+1j − (αh)Sn+1

j

41

which gives

(αh) =V n+1

j+1 −V n+1j

Sn+1j+1 −Sn+1

j. (6.15)

Since this portfolio is risk free, it must earn the risk free rate of return, so that

Pnj = e−r∆t Pn+1

j+1

= e−r∆t Pn+1j . (6.16)

Now, substitute for Pnj from equation (6.11), with Pn+1

j+1 from equation (6.13), and (αh) from equation (6.15) gives

V nj = e−r∆t

(

pr∗V n+1j+1 +qr∗V n+1

j

)

pr∗ =er∆t − e−σ

√∆t

eσ√

∆t − e−σ√

∆t

qr∗ = 1− pr∗ . (6.17)

Note that pr∗,qr∗ do not depend on the real drift rate µ, which is expected. If we expand pr∗,qr∗ in a Taylor Series,and compare with the pr,qr in equations (6.2), we can show that

pr∗ = pr +O((∆t)3/2)

qr∗ = qr +O((∆t)3/2) . (6.18)

After a bit more work, one can show that the value of the option at t = 0, V 00 using either pr∗,qr∗ or pr,qr is the same to

O(∆t), which is not surprising, since these methods can both be regarded as an explicit finite difference approximationto the Black-Scholes equation, having truncation error O(∆t). The definition pr∗,qr∗ is the common definition infinance books, since the tree has no-arbitrage.

What is the meaning of a no-arbitrage tree? If we are sitting at node Snj , and assuming that there are only two

possible future states

Snj → Sn+1

j+1 ; with probability p

Snj → Sn+1

j ; with probability q

then using (αh) from equation (6.15) guarantees that the hedging portfolio has the same value in both future states.But let’s be a bit more sensible here. Suppose we are hedging a portfolio of RIM stock. Let ∆t = one day. Suppose

the price of RIM stocks is $10 today. Do we actually believe that tomorrow there are only two possible prices for Rimstock

Sup = 10eσ√

∆t

Sdown = 10e−σ√

∆t ?(6.19)

Of course not. This is obviously a highly simplified model. The fact that there is no-arbitrage in the context of thesimplified model does not really have a lot of relevance to the real-world situation. The best that can be said is that ifthe Black-Scholes model was perfect, then we have that the portfolio hedging ratios computed using either pr,qr orpr∗,qr∗ are both correct to O(∆t).

7 More on Ito’s LemmaIn Section 3.6.1, we mysteriously made the infamous comment

42

...it can be shown that dZ2 → dt as dt → 0

In this Section, we will give some justification this remark. For a lot more details here, we refer the reader toStochastic Differential Equations, by Bernt Oksendal, Springer, 1998.

We have to go back here, and decide what the statement

dX = αdt + cdZ (7.1)

really means. The only sensible interpretation of this is

X(t)−X(0) =Z t

0α(X(s),s)ds+

Z t

0c(X(s),s)dZ(s) . (7.2)

where we can interpret the integrals as the limit, as ∆t → 0 of a discrete sum. For example,

Z t

0c(X(s),s)dZ(s) = lim

∆t→0

j=N−1

∑j=0

c j∆Z j

c j = c(X(Z j), t j)

Z j = Z(t j)

∆Z j = Z(t j+1)−Z(t j)

∆t = t j+1 − t j

N = t/(∆t) (7.3)

In particular, in order to derive Ito’s Lemma, we have to decide whatZ t

0c(X(s),s) dZ(s)2 (7.4)

means. Replace the integral by a sum,

Z t

0c(X(s),s) dZ(s)2 = lim

∆t→0

j=N−1

∑j=0

c(X j, t j)∆Z2j . (7.5)

Note that we have evaluated the integral using the left hand end point of each subinterval (the no peeking into thefuture principle).

From now on, we will use the notation

∑j

≡j=N−1

∑j=0

. (7.6)

Now, we claim thatZ t

0c(X(s),s)dZ2(s) =

Z t

0c(X(s),s)ds (7.7)

or

lim∆t→0

[

∑j

c j∆Z2j

]

= lim∆t→0∑j

c j∆t (7.8)

which is what we mean by equation (7.7). i.e. we can say that dZ2 → dt as dt → 0.Now, let’s consider a finite ∆t, and consider the expression

E

(

∑j

c j∆Z2j −∑

jc j∆t

)2

(7.9)

43

If equation (7.9) tends to zero as ∆t → 0, then we can say that (in the mean square limit)

lim∆t→0

[

∑j

c j∆Z2j

]

= lim∆t→0∑j

c j∆t

=

Z t

0c(X(s),s) ds (7.10)

so that in this senseZ t

0c(X ,s) dZ2 =

Z t

0c(X ,s) ds (7.11)

and hence we can say that

dZ2 → dt (7.12)

with probability one as ∆t → 0.Now, expanding equation (7.9) gives

E

(

∑j

c j∆Z2j −∑

jc j∆t

)2

= ∑i j

E[

c j(∆Z2j −∆t)ci(∆Z2

i −∆t)]

. (7.13)

Now, we assume that

E[

c j(∆Z2j −∆t)ci(∆Z2

i −∆t)]

= δi jE[

c2i (∆Z2

i −∆t)2] (7.14)

which means that c j(∆Z2j −∆t) and ci(∆Z2

i −∆t) are independent if i 6= j. This follows since

• The increments of Brownian motion are uncorrelated, i.e. Cov [∆Zi ∆Z j] = 0, i 6= j, which means that Cov[

∆Z2i ∆Z2

j

]

=

0, or E[

(∆Z2j −∆t)(∆Z2

i −∆t)]

= 0, i 6= j.

• ci = c(ti,X(Zi)), and ∆Zi are independent.

It also follows from the above properties that

E[c2j(∆Z2

j −∆t)2] = E[c2j ] E[(∆Z2

j −∆t)2] (7.15)

since c j and (∆Z2j −∆t) are independent.

Using equations (7.14-7.15), then equation (7.13) becomes

∑i j

E[

c j(∆Z2j −∆t) ci(∆Z2

i −∆t)]

= ∑i

E[c2i ] E

[

(∆Z2i −∆t)2] . (7.16)

Now,

∑i

E[c2i ] E

[

(∆Z2i −∆t)2] = ∑

iE[c2

i ](

E[

∆Z4i]

−2∆tE[

∆Z2i]

+(∆t)2) . (7.17)

Recall that (∆Z)2 is N(0,∆t) ( normally distributed with mean zero and variance ∆t) so that

E[

(∆Zi)2] = ∆t

E[

(∆Zi)4] = 3(∆t)2 (7.18)


E[

∆Z4i]

−2∆tE[

∆Z2i]

+(∆t)2 = 2(∆t)2 (7.19)

44

and

∑i

E[c2i ] E

[

(∆Z2i −∆t)2] = 2∑

iE[c2

i ](∆t)2

= 2∆t(

∑i

E[c2i ]∆t

)

= O(∆t) (7.20)

so that we have

E

(

∑c j∆Z2j −∑

jc j∆t

)2

= O(∆t) (7.21)

or

lim∆t→0

E[

(

∑c j∆Z2j −

Z t

0c(s,X(s))ds

)2]

= 0 (7.22)

so that in this sense we can write

dZ2 → dt ; dt → 0 . (7.23)

8 Derivative Contracts on non-traded Assets and Real OptionsThe hedging arguments used in previous sections use the underlying asset to construct a hedging portfolio. What if theunderlying asset cannot be bought and sold, or is non-storable? If the underlying variable is an interest rate, we can’tstore this. Or if the underlying asset is bandwidth, we can’t store this either. However, we can get around this usingthe following approach.

8.1 Derivative ContractsLet the underlying variable follow

dS = a(S, t)dt +b(S, t)dZ, (8.1)

and let F = F(S, t), so that from Ito’s Lemma

dF =

[

aFS +b2

2 FSS +Ft

]

dt +bFSdZ, (8.2)

or in shorter form

dF = µdt +σ∗dZ

µ = aFS +b2

2 FSS +Ft

σ∗ = bFS . (8.3)

Now, instead of hedging with the underlying asset, we will hedge one contract with another. Suppose we have twocontracts F1,F2 (they could have different maturities for example). Then

dF1 = µ1dt +σ∗1dZ

dF2 = µ2dt +σ∗2dZ

µi = a(Fi)S +b2

2 (Fi)SS +(Fi)t

σ∗i = b(Fi)S ; i = 1,2 . (8.4)

45

Consider the portfolio Π

Π = n1F1 +n2F2 (8.5)

so that

dΠ = n1 dF1 +n2 dF2

= n1(µ1 dt +σ∗1 dZ)+n2(µ2 dt +σ∗

2 dZ)

= (n1µ1 +n2µ2) dt +(n1σ∗1 +n2σ∗

2) dZ . (8.6)

Now, to eliminate risk, choose

(n1σ∗1 +n2σ∗

2) = 0 (8.7)

which means that Π is riskless, hence

dΠ = rΠ dt , (8.8)

so that, using equations (8.6-8.8), we obtain

(n1µ1 +n2µ2) = r(n1F1 +n2F2). (8.9)

Putting together equations (8.7) and (8.9) gives[

σ∗1 σ∗

2µ1 − rF1 µ2 − rF2

][

n1n2

]

=

[

00

]

. (8.10)

Now, equation (8.10) only has a nonzero solution if the two rows of equation (8.10) are linearly dependent. Inother words, there must be a λS = λS(S, t) (independent of the type of contract) such that

µ1 − rF1 = λSσ∗1

µ2 − rF2 = λSσ∗2 . (8.11)

Dropping the subscripts, we obtain

µ− rFσ∗ = λS (8.12)

Substituting µ,σ∗ from equations (8.3) into equation (8.12) gives

Ft +b2

2 FSS +(a−λSb)FS − rF = 0 . (8.13)

Equation (8.13) is the PDE satisfied by a derivative contract on any asset S. Note that it does not matter if we cannottrade S.

Suppose that F2 = S is a traded asset. Then we can hedge with S, and from equation (8.11) we have

µ2 − rS = λSσ∗2 (8.14)

and from equations (8.1) and (8.3) we have

σ∗2 = b

µ2 = a (8.15)

and so, using equations (8.11) and (8.15) , we have that

λS =a− rS

b (8.16)

46

and equation (8.13) reduces to

Ft +b2

2 FSS + rSFS− rF = 0 . (8.17)

Suppose

µ = Fµ′

σ∗ = Fσ′ (8.18)

so that we can write

dF = Fµ′dt +Fσ′dZ (8.19)

then using equation (8.18) in equation (8.12) gives

µ′ = r +λSσ′ (8.20)

which has the convenient interpretation that the expected return on holding (not hedging) the derivative contract F isthe risk-free rate plus extra compensation due to the riskiness of holding F . The extra return is λSσ′, where λS is themarket price of risk of S (which should be the same for all contracts depending on S) and σ′ is the volatility of F . Notethat the volatility and drift of F are not the volatility and drift of the underlying asset S.

If we believe that the Capital Asset Pricing Model holds, then a simple minded idea is to estimate

λS = ρSMλM (8.21)

where λM is the price of risk of the market portfolio, and ρSM is the correlation of returns between S and the returns ofthe market portfolio.

Another idea is the following. Suppose we can find some companies whose main source of business is based on S.Let qi be the price of this companies stock at t = ti. The return of the stock over ti − ti−1 is

Ri =qi −qi−1

qi−1.

Let RMi be the return of the market portfolio (i.e. a broad index) over the same period. We compute β as the best fit

linear regression to

Ri = α+βRMi

which means that

β =Cov(R,RM)

Var(RM). (8.22)

Now, from CAPM we have that

E(R) = r +β[

E(RM)− r]

(8.23)

where E(...) is the expectation operator. We would like to determine the unlevered β, denoted by βu, which is the βfor an investment made using equity only. In other words, if the firm we used to compute the β above has significantdebt, its riskiness with respect to S is amplified. The unlevered β can be computed by

βu =E

E +(1−Tc)Dβ (8.24)

where

D = long term debtE = Total market capitalizationTc = Corporate Tax rate . (8.25)

47

So, now the expected return from a pure equity investment based on S is

E(Ru) = r+ βu [E(RM)− r]

. (8.26)

If we assume that F in equation (8.19) is the company stock, then

µ′ = E(Ru)

= r +βu [E(RM)− r]

(8.27)

But equation (8.20) says that

µ′ = r +λSσ′ . (8.28)


λSσ′ = βu [E(RM)− r]

. (8.29)

Recall from equations (8.3) and (8.18) that

σ∗ = Fσ′

σ∗ = bFS ,

or

σ′ =bFSF . (8.30)


λS =βu [E(RM)− r

]

bFSF

. (8.31)

In principle, we can now compute λS, since

• The unleveraged βu is computed as described above. This can be done using market data for a specific firm,whose main business is based on S, and the firms balance sheet.

• b(S, t)/S is the volatility rate of S (equation (8.1)).

• [E(RM)− r] can be determined from historical data. For example, the expected return of the market index abovethe risk free rate is about 6% for the past 50 years of Canadian data.

• The risk free rate r is simply the current T-bill rate.

• FS can be estimated by computing a linear regression of the stock price of a firm which invests in S, and S. Now,this may have to be unlevered, to reduce the effect of debt. If we are going to now value the real option for aspecific firm, we will have to make some assumption about how the firm will finance a new investment. If it isgoing to use pure equity, then we are done. If it is a mixture of debt and equity, we should relever the value ofFS. At this point, we need to talk to a Finance Professor to get this right.

8.2 A Forward ContractA forward contract is a special type of derivative contract. The holder of a forward contract agrees to buy or sell theunderlying asset at some delivery price K in the future. K is determined so that the cost of entering into the forwardcontract is zero at its inception.

The payoff of a (long) forward contract expiring at t = T is then

V (S,τ = 0) = S(T )−K . (8.32)

48

Note that there is no optionality in a forward contract.The value of a forward contract is a contingent claim. and its value is given by equation (8.13)

Vt +b2

2 VSS +(a−λSb)VS − rV = 0 . (8.33)

Now we can also use a simple no-arbitrage argument to express the value of a forward contract in terms of theoriginal delivery price K, (which is set at the inception of the contract) and the current forward price f (S,τ). Supposewe are long a forward contract with delivery price K. At some time t > 0, (τ < T ), the forward price is no longer K.Suppose the forward price is f (S,τ), then the payoff of a long forward contract, entered into at (τ) is

Payoff = S(T )− f (S(τ),τ) .

Suppose we are long the forward contract struck at t = 0 with delivery price K. At some time t > 0, we hedge thiscontract by going short a forward with the current delivery price f (S,τ) (which costs us nothing to enter into). Thepayoff of this portfolio is

S−K− (S− f ) = f −K (8.34)

Since f ,K are known with certainty at (S,τ), then the value of this portfolio today is

( f −K)e−rτ . (8.35)

But if we hold a forward contract today, we can always construct the above hedge at no cost. Therefore,

V (S,τ) = ( f −K)e−rτ . (8.36)

Substituting equation (8.36) into equation (8.33), and noting that K is a constant, gives us the following PDE forthe forward price (the delivery price which makes the forward contract worth zero at inception)

fτ =b2

2 fSS +(a−λSb) fS (8.37)

with terminal condition

f (S,τ = 0) = S (8.38)

which can be interpreted as the fact that the forward price must equal the spot price at t = T .Suppose we can estimate a,b in equation (8.37), and there are a set of forward prices available. We can then

estimate λS by solving equation (8.37) and adjusting λS until we obtain a good fit for the observed forward prices.

8.2.1 Convenience YieldWe can also write equation (8.37) as

ft +b2

2 fSS +(r−δ)S fS = 0 (8.39)

where δ is defined as

δ = r− a−λSbS . (8.40)

In this case, we can interpret δ as the convenience yield for holding the asset. For example, there is a convenience toholding supplies of natural gas in reserve.

9 Discrete HedgingIn practice, we cannot hedge at infinitesimal time intervals. In fact, we would like to hedge as infrequently as possible,since in real life, there are transaction costs (something which is ignored in the basic Black-Scholes equation, butwhich can be taken into account and results in a nonlinear PDE).

49

9.1 Delta HedgingRecall that the basic derivation of the Black-Scholes equation used a hedging portfolio where we hold VS shares. Infinance, VS is called the option delta, hence this strategy is called delta hedging.

As an example, consider the hedging portfolio P(t) which is composed of

• A short position in an option −V(t).

• Long α(t)hS(t) shares

• An amount in a risk-free bank account B(t).

Initially, we have

P(0) = 0 = −V(0)+α(0)hS(0)+B(0)

α = VS

B(0) = V (0)−α(0)hS(0)

The hedge is rebalanced at discrete times ti. Defining

αhi = VS(Si, ti)

Vi = V (Si, ti)

then, we have to update the hedge by purchasing αi −αi−1 shares at t = ti, so that updating our share position requires

S(ti)(αhi −αh

i−1)

in cash, which we borrow from the bank if (αhi −αh

i−1) > 0. If (αhi −αh

i−1) < 0, then we sell some shares and depositthe proceeds in the bank account. If ∆t = ti − ti−1, then the bank account balance is updated by

Bi = er∆tBi−1 −Si(αhi −αh

i−1)

At the instant after the rebalancing time ti, the value of the portfolio is

P(ti) = −V(ti)+α(ti)hS(ti)+B(ti)

Since we are hedging at discrete time intervals, the hedge is no longer risk free (it is risk free only in the limit as thehedging interval goes to zero). We can determine the distribution of profit and loss ( P& L) by carrying out a MonteCarlo simulation. Suppose we have precomputed the values of VS for all the likely (S, t) values. Then, we simulate aseries of random paths. For each random path, we determine the discounted relative hedging error

error =e−rT ∗P(T ∗)V (S0, t = 0)

(9.1)

After computing many sample paths, we can plot a histogram of relative hedging error, i.e. fraction of Monte Carlotrials giving a hedging error between E and E + ∆E. We can compute the variance of this distribution, and also thevalue at risk (VAR). VAR is the worst case loss with a given probability. For example, a typical VAR number reportedis the maximum loss that would occur 95% of the time. In other words, find the value of E along the x-axis such thatthe area under the histogram plot to the right of this point is .95× the total area.

As an example, consider the case of an American put option, T = .25, σ = .3, r = .06,K = S0 = 100. At t = 0,S0 = 100. Since there are discrete hedging errors, the results in this case will depend on the stock drift rate, which weset at µ = .08. The initial value of the American put, obtained by solving the Black-Scholes linear complementarityproblem, is $5.34. Figure 9.1 shows the results for no hedging, and hedging once a month. The x-axis in these plotsshows the relative P & L of this portfolio (i.e. P & L divided by the Black-Scholes price), and the y-axis shows therelative frequency.

Relative P& L =Actual P& L

Black-Scholes price (9.2)

50

0

0.5

1

1.5

2

2.5

3

3.5

4

-3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1

Rela

tive

frequ

ency

Relative hedge error

0

0.5

1

1.5

2

2.5

3

3.5

4

-3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1

Rela

tive

frequ

ency


FIGURE 9.1: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: no hedging, right:rebalance hedge once a month. American put, T = .25, σ = .3, r = .06, µ = .08,K = S0 = 100. The relative P&L iscomputed by dividing the actual P&L by the Black-Scholes price.

0

0.5

1

1.5

2

2.5

3

3.5

4

-3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1

Rela

tive

frequ

ency


0

0.5

1

1.5

2

2.5

3

3.5

4

-3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1

Rela

tive

frequ

ency


FIGURE 9.2: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: rebalance hedge once aweek, right: rebalance hedge daily. American put, T = .25, σ = .3, r = .06, µ = .08,K = S0 = 100. The relative P&L iscomputed by dividing the actual P&L by the Black-Scholes price.

Note that the no-hedging strategy actually has a high probability of ending up with a profit (from the option writer’spoint of view) since the drift rate of the stock is positive. In this case, the hedger does nothing, but simply pockets theoption premium. Note the sudden jump in the relative frequency at relative P&L = 1. This is because the maximumthe option writer stands to gain is the option premium, which we assume is the Black-Scholes value. The writer makesthis premium for any path which ends up S > K, which is many paths, hence the sudden jump in probability. However,there is significant probability of a loss as well. Figure 9.1 also shows the relative frequency of the P&L of hedgingonce a month (only three times during the life of the option).

In fact, there is a history of Ponzi-like hedge funds which simply write put options with essentially no hedging.In this case, these funds will perform very well for several years, since markets tend to drift up on average. However,then a sudden market drop occurs, and they will blow up. Blowing up is a technical term for losing all your capitaland being forced to get a real job. However, usually the owners of these hedge funds walk away with large bonuses,and the shareholders take all the losses.

Figure 9.2 shows the results for rebalancing the hedge once a week, and daily. We can see clearly here that themean is zero, and variance is getting smaller as the hedging interval is reduced. In fact, one can show that the standarddeviation of the hedge error should be proportional to

√∆t where ∆t is the hedge rebalance frequency.

51

9.2 Gamma HedgingIn an attempt to account for some the errors in delta hedging at finite hedging intervals, we can try to use secondderivative information. The second derivative of an option value VSS is called the option gamma, hence this strategy istermed delta-gamma hedging.

A gamma hedge consists of• A short option position −V (t).

• Long αhS(t) shares

• Long β another derivative security I.

• An amount in a risk-free bank account B(t).Now, recall that we consider αh,β to be constant over the hedging interval (no peeking into the future), so we can

regard these as constants (for the duration of the hedging interval).The hedge portfolio P(t) is then

P(t) = −V +αhS+βI +B(t)

Assuming that we buy and hold αh shares and β of the secondary instrument at the beginning of each hedging interval,then we require that

∂P∂S = −∂V

∂S +αh +β∂I∂S = 0

∂2P∂S2 = −∂2V

∂S2 +β∂2I∂S2 = 0 (9.3)

Note that

• If β = 0, then we get back the usual delta hedge.

• In order for the gamma hedge to work, we need an instrument which has some gamma (the asset S has secondderivative zero). Hence, traders often speak of being long (positive) or short (negative) gamma, and try tobuy/sell things to get gamma neutral.

So, at t = 0 we have

P(0) = 0 ⇒ B(0) = V (0)−αh0S0 −β0I0

The amounts αhi ,βi are determined by requiring that equation (9.3) hold

−(VS)i +αhi +βi(IS)i = 0

−(VSS)i +βi(ISS)i = 0 (9.4)

The bank account balance is then updated at each hedging time ti by

Bi = er∆t Bi−1 −Si(αhi −αh

i−1)− Ii(βi −βi−1)

We will consider the same example as we used in the delta hedge example. For an additional instrument, we willuse a European put option written on the same underlying with the same strike price and a maturity of T=.5 years.

Figure 9.3 shows the results of gamma hedging, along with a comparison on delta hedging. In principle, gammahedging produces a smaller variance with less frequent hedging. However, we are exposed to more model error in thiscase, since we need to be able to compute the second derivative of the theoretical price.

52

0

5

10

15

20

25

30

35

40

-3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1

Rela

tive

frequ

ency


0

5

10

15

20

25

30

35

40

-3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1

Rela

tive

frequ

ency


FIGURE 9.3: Relative frequency (y-axis) versus relative P&L of gamma hedging strategies. Left: rebalance hedge oncea week, right: rebalance hedge daily. Dotted lines show the delta hedge for comparison. American put, T = .25, σ = .3,r = .06, µ = .08,K = 100,S0 = 100. Secondary instrument: European put option, same strike, T = .5 years. The relativeP&L is computed by dividing the actual P&L by the Black-Scholes price.

9.3 Vega HedgingThe most important parameter in the option pricing is the volatility. What if we are not sure about the value of thevolatility? It is possible to assume that the volatility itself is stochastic, i.e.

dS = µSdt +√

vSdZ1

dv = κ(θ− v)dt +σv√

vdZ2 (9.5)

where µ is the expected growth rate of the stock price,√

v is its instantaneous volatility, κ is a parameter controllinghow fast v reverts to its mean level of θ, σv is the “volatility of volatility” parameter, and Z1,Z2 are Wiener processeswith correlation parameter ρ.

If we use the model in equation (9.5), the this will result in a two factor PDE to solve for the option price and thehedging parameters. Since there are two sources of risk (dZ1,dZ2), we will need to hedge with the underlying assetand another option (Heston, A closed form solution for options with stochastic volatility with applications to bond andcurrency options, Rev. Fin. Studies 6 (1993) 327-343).

Another possibility is to assume that the volatility is uncertain, and to assume that

σmin ≤ σ ≤ σmax,

and to hedge based on a worst case (from the hedger’s point of view). This results in an uncertain volatility model(Avellaneda, Levy, Paris, Pricing and Hedging Derivative Securities in Markets with Uncertain Volatilities, Appl.Math. Fin. 2 (1995) 77-88). This is great if you can get someone to buy this option at this price, because the hedger isalways guaranteed to end up with a non-negative balance in the hedging portfolio. But you may not be able to sell atthis price, since the option price is expensive (after all, the price you get has to cover the worst case scenario).

An alternative, much simpler, approach (and therefore popular in industry), is to construct a vega hedge. Weassume that we know the volatility, and price the option in the usual way. Then, as with a gamma hedge, we constructa portfolio

• A short option position −V (t).

• Long αhS(t) shares

• Long β another derivative security I.

• An amount in a risk-free bank account B(t).The hedge portfolio P(t) is then

P(t) = −V +αhS+βI +B(t)

53

Assuming that we buy and hold αh shares and β of the secondary instrument at the beginning of each hedging interval,then we require that

∂P∂S = −∂V

∂S +αh +β∂I∂S = 0

∂P∂σ

= −∂V∂σ

+β∂I∂σ

= 0 (9.6)

Note that if we assume that σ is constant when pricing the option, yet do not assume σ is constant when we hedge,this is somewhat inconsistent. Nevertheless, we can determine the derivatives in equation (9.6) numerically (solve thepricing equation for several different values of σ, and then finite difference the solutions).

In practice, we would sell the option priced using our best estimate of σ (today). This is usually based on lookingat the prices of traded options, and then backing out the volatility which gives back today’s traded option price (thisis the implied volatility). Then as time goes on, the implied volatility will likely change. We use the current impliedvolatility to determine the current hedge parameters in equation (9.6). Since this implied volatility has likely changedsince we last rebalanced the hedge, there is some error in the hedge. However, taking into account the change in thehedge portfolio through equations (9.6) should make up for this error. This procedure is called delta-vega hedging.

In fact, even if the underlying process is a stochastic volatility, the vega hedge computed using a constant volatilitymodel works surprisingly well (Hull and White, The pricing of options on assets with stochastic volatilities, J. ofFinance, 42 (1987) 281-300).

10 Jump DiffusionRecall that if

dS = µSdt +σS dZ (10.1)

then from Ito’s Lemma we have

d[logS] = [µ− σ2

2 ] dt +σ dZ. (10.2)

Now, suppose that we observe asset prices at discrete times ti, i.e. S(ti) = Si, with ∆t = ti+1 − ti. Then from equation(10.2) we have

logSi+1 − logSi = log(Si+1Si

)

' [µ− σ2

2 ] ∆t +σφ√

∆t (10.3)

where φ is N(0,1). Now, if ∆t is sufficiently small, then ∆t is much smaller than√

∆t, so that equation (10.3) can beapproximated by

log(Si+1 −Si +Si

Si) = log(1+

Si+1 −SiSi

)

' σφ√

∆t. (10.4)

Define the return Ri in the period ti+1 − ti as

Ri =Si+1−Si

Si(10.5)


log(1+Ri) ' Ri = σφ√

∆t.

54

−1.4 −1.2 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.60

5

10

15

20

FIGURE 10.1: Probability density functions for the S&P 500 monthly returns 1982− 2002, scaled to zero mean andunit standard deviation and the standardized Normal distribution.

Consequently, a plot of the discretely observed returns of S should be normally distributed, if the assumption (10.1)is true. In Figure 10.1 we can see a histogram of monthly returns from the S&P500 for the period 1982−2002. Thehistogram has been scaled to zero mean and unit standard deviation. A standard normal distribution is also shown.Note that for real data, there is a higher peak, and fatter tails than the normal distribution. This means that there ishigher probability of zero return, or a large gain or loss compared to a normal distribution.

As ∆t → 0, Geometric Brownian Motion (equation (10.1)) assumes that the probability of a large return also tendsto zero. The amplitude of the return is proportional to

√∆t, so that the tails of the distribution become unimportant.

But, in real life, we can sometimes see very large returns (positive or negative) in small time increments. Ittherefore appears that Geometric Brownian Motion (GBM) is missing something important.

10.1 The Poisson ProcessConsider a process where most of the time nothing happens (contrast this with Brownian motion, where some changesoccur at any time scale we look at), but on rare occasions, a jump occurs. The jump size does not depend on the timeinterval, but the probability of the jump occurring does depend on the interval.

More formally, consider the process dq where, in the interval [t, t +dt],

dq = 1 ; with probability λdt= 0 ; with probability 1−λdt. (10.6)

Note, once again, that size of the Poisson outcome does not depend on dt. Also, the probability of a jump occurringin [t, t + dt] goes to zero as dt → 0, in contrast to Brownian motion, where some movement always takes place (theprobability of movement is constant as dt → 0), but the size of the movement tends to zero as dt → 0. For futurereference, note that

E[dq] = λ dt ·1+(1−λ dt) ·0= λ dt (10.7)

and

Var(dq) = E[(dq−E[dq])2]

= E[(dq−λ dt)2]

= (1−λ dt)2 ·λ dt +(0−λ dt)2 · (1−λ dt)= λ dt +O((dt)2) . (10.8)

55

0 0.05 0.1 0.15 0.2 0.2540

60

80

100

120

140

160

180

Time (years)

Ass

et P

rice

FIGURE 10.2: Some realizations of a jump diffusion process which follows equation (10.10).

Now, suppose we assume that, along with the usual GBM, occasionally the asset jumps, i.e. S → JS, where J isthe size of a (proportional) jump. We will restrict J to be non-negative.

Suppose a jump occurs in [t, t +dt], with probability λdt. Let’s write this jump process as an SDE, i.e.

[dS] jump = (J −1)S dq

since, if a jump occurs

Sa f ter jump = Sbe f ore jump +[dS] jump

= Sbe f ore jump +(J−1)Sbe f ore jump

= JSbe f ore jump (10.9)

which is what we want to model. So, if we have a combination of GBM and a rare jump event, then

dS = µS dt +σS dZ +(J−1)S dq (10.10)

Assume that the jump size has some known probability density g(J), i.e. given that a jump occurs, then the probabilityof a jump in [J,J +dJ] is g(J) dJ, and

Z +∞

−∞g(J) dJ =

Z ∞

0g(J) dJ = 1 (10.11)

since we assume that g(J) = 0 if J < 0. For future reference, if f = f (J), then the expected value of f is

E[ f ] =Z ∞

0f (J)g(J) dJ . (10.12)

The process (10.10) is basically geometric Brownian motion (a continuous process) with rare discontinuous jumps.Some example realizations of jump diffusion paths are shown in Figure 10.2.

Figure 10.3 shows the price followed by a listed drug company. Note the extreme price changes over very smallperiods of time.

10.2 The Jump Diffusion Pricing EquationNow, form the usual hedging portfolio

P = V −αS . (10.13)

56

FIGURE 10.3: Actual price of a drug company stock. Compare with simulation of a jump diffusion in Figure 10.2.

Now, consider

[dP]total = [dP]Brownian +[dP] jump (10.14)

where, from Ito’s Lemma

[dP]Brownian = [Vt +σ2S2

2 VSS]dt +[VS−αS](µS dt +σS dZ) (10.15)

and, noting that the jump is of finite size,

[dP] jump = [V (JS, t)−V(S, t)] dq−α(J−1)S dq . (10.16)

If we hedge the Brownian motion risk, by setting α = VS, then equations (10.14-10.16) give us

dP = [Vt +σ2S2

2 VSS]dt +[V(JS, t)−V(S, t)]dq−VS(J−1)S dq . (10.17)

So, we still have a random component (dq) which we have not hedged away. Let’s take the expected value of thischange in the portfolio, e.g.

E(dP) = [Vt +σ2S2

2 VSS]dt +E[V(JS, t)−V(S, t)]E[dq]−VSSE[J−1]E[dq] (10.18)

where we have assumed that probability of the jump and the probability of the size of the jump are independent.Defining E(J −1) = κ, then we have that equation (10.18) becomes

E(dP) = [Vt +σ2S2

2 VSS]dt +E[V(JS, t)−V(S, t)]λ dt −VSSκλ dt . (10.19)

Now, we make a rather interesting assumption. Assume that an investor holds a diversified portfolio of thesehedging portfolios, for many different stocks. If we make the rather dubious assumption that these jumps for differentstocks are uncorrelated, then the variance of this portfolio of portfolios is small, hence there is little risk in thisportfolio. Hence, the expected return should be

E[dP] = rP dt . (10.20)

Now, equating equations (10.19 and (10.20) gives

Vt +σ2S2

2 VSS +VS[rS−Sκλ]− (r +λ)V +E[V(JS, t)]λ = 0 . (10.21)

Using equation (10.12) in equation (10.21) gives

Vt +σ2S2

2 VSS +VS[rS−Sκλ]− (r +λ)V +λZ ∞

0g(J)V(JS, t) dJ = 0 . (10.22)

57

Equation (10.22) is a Partial Integral Differential Equation (PIDE).A common assumption is to assume that g(J) is log normal,

g(J) =exp(

− (log(J)2−µ)2γ2

)

√2πγJ

. (10.23)

where, some algebra shows thatE(J−1) = κ = exp(µ+ γ2/2)−1 . (10.24)

Now, what about our dubious assumption that jump risk was diversifiable? In practice, we can regard σ, µ,γ,λ asparameters, and fit them to observed option prices. If we do this, (see L. Andersen and J. Andreasen, Jump-Diffusionprocesses: Volatility smile fitting and numerical methods, Review of Derivatives Research (2002), vol 4, pages 231-262), then we find that σ is close to historical volatility, but that the fitted values of λ, µ are at odds with the historicalvalues. The fitted values seem to indicate that investors are pricing in larger more frequent jumps than has beenhistorically observed. In other words, actual prices seem to indicate that investors do require some compensation forjump risk, which makes sense. In other words, these parameters contain a market price of risk.

Consequently, our assumption about jump risk being diversifiable is not really a problem if we fit the jump param-eters from market (as opposed to historical) data, since the market-fit parameters will contain some effect due to riskpreferences of investors.

One can be more rigorous about this if you assume some utility function for investors. See (Alan Lewis, Fear ofjumps, Wilmott Magazine, December, 2002, pages 60-67) or (V. Naik, M. Lee, General equilibrium pricing of optionson the market portfolio with discontinuous returns, The Review of Financial Studies, vol 3 (1990) pages 493-521.)

10.3 An Alternate Derivation of the Pricing Equation for Jump DiffusionWe will give a pure hedging argument in this section, in order to derive the PIDE for jump diffusion. Initially, wesuppose that there is only one possible jump size J, i.e. after a jump, S → JS, where J is a known constant. Suppose

dS = a(S, t)dt +b(S, t)dZ +(J−1)S dq, (10.25)

where dq is the Poisson process. Consider a contract on S, F(S, t), then

dF =

[

aFS +b2

2 FSS +Ft

]

dt +bFSdZ +[F(JS, t)−F(S, t)] dq, (10.26)

or, in more compact notation

dF = µ dt +σ∗ dZ +∆F dq

µ = aFS +b2

2 FSS +Ft

σ∗ = bFS

∆F = [F(JS, t)−F(S, t)] . (10.27)

Now, instead of hedging with the underlying asset, we will hedge one contract with another. Suppose we have threecontracts F1,F2,F3 (they could have different maturities for example).

Consider the portfolio Π

Π = n1F1 +n2F2 +n3F3 (10.28)

so that

dΠ = n1 dF1 +n2 dF2 +n3 dF3

= n1(µ1 dt +σ∗1 dZ +∆F1 dq)

+n2(µ2 dt +σ∗2 dZ +∆F2 dq)

+n3(µ3 dt +σ∗3 dZ +∆F3 dq)

= (n1µ1 +n2µ2 +n3µ3) dt+(n1σ∗

1 +n2σ∗2 +n3σ∗

3) dZ+(n1∆F1 +n2∆F2 +n3∆F3) dq . (10.29)

58

Eliminate the random terms by setting

(n1∆F1 +n2∆F2 +n3∆F3) = 0(n1σ∗

1 +n2σ∗2 +n3σ∗

3) = 0 . (10.30)

This means that the portfolio is riskless, hence

dΠ = rΠ dt , (10.31)

hence (using equations (10.29-10.31))

(n1µ1 +n2µ2 +n3µ3) = (n1F1 +n2F2 +n3F3)r . (10.32)

Putting together equations (10.30) and (10.32), we obtain

σ∗1 σ∗

2 σ∗3

∆F1 ∆F2 ∆F3µ1 − rF1 µ2 − rF2 µ3 − rF3

n1n2n3

=

000

. (10.33)

Equation (10.33) has a nonzero solution only if the rows are linearly dependent. There must be λB(S, t),λJ(S, t) suchthat

(µ1 − rF1) = λBσ∗1 −λJ∆F1

(µ2 − rF2) = λBσ∗2 −λJ∆F2

(µ3 − rF3) = λBσ∗3 −λJ∆F3 . (10.34)

(We will show later on that λJ ≥ 0 to avoid arbitrage). Dropping subscripts, we have

(µ− rF) = λBσ∗−λJ∆F (10.35)

and substituting the definitions of µ,σ∗,∆F , from equations (10.27), we obtain

Ft +b2

2 FSS +(a−λBb)FS − rF +λJ[F(JS, t)−F(S, t)] = 0 . (10.36)

Note that λJ will not be the real world intensity of the Poisson process, but J will be the real world jump size.In the event that, say, F3 = S is a traded asset, we note that in this case

σ∗3 = b

µ3 = a∆F3 = (J −1)S . (10.37)

From equation (10.34) we have that

(µ3 − rF3) = λBσ∗3 −λJ∆F3 , (10.38)

or, using equation (10.37),

a−λBb = rS−λJ(J −1)S . (10.39)

Substituting equation (10.39) into equation (10.36) gives

Ft +b2

2 FSS +[r−λJ(J−1)]SFS− rF +λJ[F(JS, t)−F(S, t)] = 0 . (10.40)

Note that equation (10.36) is valid if the underlying asset cannot be used to hedge, while equation (10.40) is valid onlyif the underlying asset can be used as part of the hedging portfolio.

59

Let τ = T − t, and set a = 0,b = 0,r = 0 in equation (10.36), giving

Fτ = λJ [F(JS, t)−F(S, t)] . (10.41)

Now, suppose that

F(S,τ = 0) = 0 ; if S ≥ K= 1 ; if S < K . (10.42)

Now, consider the asset value S∗ > K, and let J = K/(2 ∗ S∗). Imagine solving equation (10.41) to an infinitesimaltime τ = ε 1. We will obtain the following value for F ,

F(S∗,ε) ' ελJ . (10.43)

Since the payoff is nonnegative, we must have λJ ≥ 0 to avoid arbitrage.Now, suppose that there are a finite number of jump states, i.e. after a jump, the asset may jump to to any state JiS

S → JiS ; i = 1, ...,n . (10.44)

Repeating the above arguments, now using n+2 hedging instruments in the hedging portfolio

Π =i=n+2

∑i=1

niFi (10.45)

so that the diffusion and jumps are hedged perfectly, we obtain the following PDE

Ft +b2

2 FSS +(a−λBb)FS − rF +i=n∑i=1

λiJ [F(JiS, t)−F(S, t)] = 0 . (10.46)

If we can use the underlying to hedge, then we get the analogue of equation (10.40)

Ft +b2

2 FSS +(rS−i=n∑i=1

λiJS(Ji −1))FS− rF +

i=n∑i=1

λiJ [F(JiS, t)−F(S, t)] = 0 . (10.47)

Now, let

p(Ji) =λi

J∑i=n

i=1 λiJ

λ∗ =i=n∑i=1

λiJ (10.48)

then we can write equation (10.46) as

Ft +b2

2 FSS +(a−λBb)FS − rF +λ∗i=n∑i=1

p(Ji)[F(JiS, t)−F(S, t)] = 0 . (10.49)

Note that since λiJ ≥ 0, p(Ji) ≥ 0, and λ∗ ≥ 0.

Taking the limit as the number of jump states tends to infinity, then p(J) tends to a continuous distribution, so thatequation (10.49) becomes

Ft +b2

2 FSS +(a−λBb)FS − rF +λ∗Z ∞

0p(J)[F(JS, t)−F(S, t)] dJ = 0 . (10.50)

In the case where we can hedge with the underlying asset S, we obtain

Ft +b2

2 FSS +(r−λ∗E[J −1])SFS− rF +λ∗Z ∞

0p(J)[F(JiS, t)−F(S, t)] dJ = 0 , (10.51)

where

E[J−1] =

Z ∞

0p(J)(J−1) dJ . (10.52)

Note that λ∗ and p(J) are not the real arrival rate and real jump size distributions, since they are based on hedgingarguments which eliminate the risk. Consequently, λ∗, p(J) must be obtained by calibration to market data.

60

11 Mean Variance Portfolio OptimizationAn introduction to Computational Finance would not be complete without some discussion of Portfolio Optimization.Consider a risky asset which follows Geometric Brownian Motion with drift

dSS = µ dt +σ dZ , (11.1)

where as usual dZ = φ√

dt and φ ∼ N(0,1). Suppose we consider a fixed finite interval ∆t, then we can write equation(11.1) as

R = µ′ +σ′φ

R =∆SS

µ′ = µ∆tσ′ = σ

√∆t , (11.2)

where R is the actual return on the asset in [t, t + ∆t], µ′ is the expected return on the asset in [t, t + ∆t], and σ′ is thestandard deviation of the return on the asset in [t, t +∆t].

Now consider a portfolio of N risky assets. Let Ri be the return on asset i in [t, t +∆t], so that

Ri = µ′i +σ′iφi (11.3)

Suppose that the correlation between asset i and asset j is given by ρi j = E[φiφ j]. Suppose we buy xi of each asset att, to form the portfolio P

P =i=N∑i=1

xiSi . (11.4)

Then, over the interval [t, t +∆t]

P+∆P =i=N∑i=1

xiSi(1+Ri)

∆P =i=N∑i=1

xiSiRi

∆PP =

i=N∑i=1

wiRi

wi =xiSi

∑ j=Nj=1 x jS j

(11.5)

In other words, we divide up our total wealth W = ∑i=Ni=1 xiSi into each asset with weight wi. Note that ∑i=N

i=1 wi = 1.To summarize, given some initial wealth at t, we suppose that an investor allocates a fraction wi of this wealth to

each asset i. We assume that the total wealth is allocated to this risky portfolio P, so that

i=N∑i=1

wi = 1

P =i=N∑i=1

xiSi

Rp =∆PP =

i=N∑i=1

wiRi . (11.6)

61

The expected return on this portfolio Rp in [t, t +∆t] is

Rp =i=N∑i=1

wiµ′i , (11.7)

while the variance of Rp in [t, t +∆t] is

Var(Rp) =i=N∑i=1

j=N

∑j=1

wiw jσ′iσ

′jρi j . (11.8)

11.1 Special CasesSuppose the assets all have zero correlation with one another, i.e. ρi j ≡ 0,∀i 6= j (of course ρii = 1). Then equation(11.8) becomes

Var(Rp) =i=N∑i=1

(σ′i)

2(wi)2 . (11.9)

Now, suppose we equally weight all the assets in the portfolio, i.e. wi = 1/N,∀i. Let maxiσ′i = σ′

max, then

Var(Rp) =1

N2

i=N∑i=1

(σ′i)

2

≤ N(σ′max)

2

N2

= O(

1N

)

, (11.10)

so that in this special case, if we diversify over a large number of assets, the standard deviation of the portfolio tendsto zero as N → ∞.

Consider another case: all assets are perfectly correlated, ρi j = 1,∀i, j. In this case

Var(Rp) =i=N∑i=1

j=N

∑j=1

wiw jσ′iσ

′j

=

[

j=N

∑j=1

w jσ′j

]2

(11.11)

so that if sd(R) =√

Var(R) is the standard deviation of R, then, in this case

sd(Rp) =j=N

∑j=1

w jσ′j , (11.12)

which means that in this case the standard deviation of the portfolio is simply the weighted average of the individualasset standard deviations.

In general, we can expect that 0 < |ρi j| < 1, so that the standard deviation of a portfolio of assets will be smallerthan the weighted average of the individual asset standard deviation, but larger than zero.

This means that diversification will be a good thing (as Martha Stewart would say) in terms of risk versus reward.In fact, a portfolio of as little as 10−20 stocks tends to reap most of the benefits of diversification.

11.2 The Portfolio Allocation ProblemDifferent investors will choose different portfolios depending on how much risk they wish to take. However, allinvestors like to achieve the highest possible expected return for a given amount of risk. We are assuming that risk andstandard deviation of portfolio return are synonymous.

62

Let the covariance matrix C be defined as

[C]i j = Ci j = σ′iσ

′jρi j (11.13)

and define the vectors µ = [µ′1,µ′2, ...,µ′N ]t , w = [w1,w2, ...,wN ]t . In theory, the covariance matrix should be symmetricpositive semi-definite. However, measurement errors may result in C having a negative eigenvalue, which should befixed up somehow.

The expected return on the portfolio is then

Rp = wt µ , (11.14)

and the variance is

Var(Rp) = wtCw . (11.15)

We can think of portfolio allocation problem as the following. Let α represent the degree with which investorswant to maximize return at the expense of assuming more risk. If α → 0, then investors want to avoid as much riskas possible. On the other hand, if α → ∞, then investors seek only to maximize expected return, and don’t care aboutrisk. The portfolio allocation problem is then (for given α) find w which satisfies

minw

wtCw−αwt µ (11.16)

subject to the constraints

∑i

wi = 1 (11.17)

Li ≤ wi ≤Ui ; i = 1, ...,N . (11.18)

Constraint (11.17) is simply equation (11.6), while constraints (11.18) may arise due to the nature of the portfolio.For example, most mutual funds can only hold long positions (wi ≥ 0), and they may also be prohibited from havinga large position in any one asset (e.g. wi ≤ .20). Long-short hedge funds will not have these types of restrictions. Forfixed α, equations (11.16-11.18) constitute a quadratic programming problem.

Let

sd(Rp) = standard deviation of Rp

=√

Var(Rp) (11.19)

We can now trace out a curve on the (sd(Rp),Rp) plane. We pick various values of α, and then solve the quadraticprogramming problem (11.16-11.18). Figure 11.1 shows a typical curve, which is also known as the efficient frontier.The data used for this example is

µ =

.15

.20

.08

; C =

.20 .05 −.01

.05 .30 .015−.01 .015 .1

L =

000

; U =

∞∞∞

(11.20)

We have restricted this portfolio to be long only. For a given value of the standard deviation of the portfolio return(sd(Rp)), then any point below the curve is not efficient, since there is another portfolio with the same risk (standarddeviation) and higher expected return. Only points on the curve are efficient in this manner. In general, a linearcombination of portfolios at two points along the efficient frontier will be feasible, i.e. satisfy the constraints. Thisfeasible region will be convex along the efficient frontier. Another way of saying this is that a straight line joining anytwo points along the curve does not intersect the curve except at the given two points. Why is this the case? If thiswas not true, then the efficient frontier would not really be efficient. (see Portfolio Theory and Capital Markets, W.Sharpe, McGraw Hill, 1970, reprinted in 2000).

63

Standard Deviation

Expe

cted

Ret

urn

0.2 0.3 0.4 0.5 0.60.1

0.125

0.15

0.175

0.2

0.225

0.25

Efficient Frontier

FIGURE 11.1: A typical efficient frontier. This curve shows, for each value of portfolio standard deviation SD(Rp), themaximum possible expected portfolio return Rp. Data in equation (11.20).

Figure 11.2 shows results if we allow the portfolio to hold up to .25 short positions in each asset. In other words,the data is the same as in (11.20) except that

L =

−.25−.25−.25

. (11.21)

In general, long-short portfolios are more efficient than long-only portfolios. This is the advertised advantage oflong-short hedge funds.

Since the feasible region is convex, we can actually proceed in a different manner when constructing the efficientfrontier. First of all, we can determine the maximum possible expected return (α = ∞ in equation (11.16)),

minw

−wt µ

∑i

wi = 1

Li ≤ wi ≤Ui ; i = 1, ...,N (11.22)

which is simply a linear programming problem. If the solution weight vector to this problem is (w)max, then themaximum possible expected return is (Rp)

max = wtmaxµ.

Then determine the portfolio with the smallest possible risk, (α = 0 in equation (11.16) )

minw

wtCw

∑i

wi = 1

Li ≤ wi ≤Ui ; i = 1, ...,N . (11.23)

If the solution weight vector to this quadratic program is given by wmin, then the minimum possible portfolio returnis (Rp)min = wt

minµ. We then divide up the range [(Rp)min,(Rp)max] into a large number of discrete portfolio returns(Rp)k;k = 1, ...,Npts. Let e = [1,1, ...,1]t , and

A =

[

µt

et

]

; Bk =

[

(Rp)k1

]

(11.24)

64

Standard Deviation

Expe

cted

Ret

urn

0.2 0.3 0.4 0.5 0.60.1

0.125

0.15

0.175

0.2

0.225

0.25

Long Only

Long-Short

FIGURE 11.2: Efficient frontier, comparing results for long-only portfolio (11.20) and a long-short portfolio (same dataexcept that lower bound constraint is replaced by equation (11.21).

then, for given (Rp)k we solve the quadratic program

minw

wtCw

Aw = Bk

Li ≤ wi ≤Ui ; i = 1, ...,N , (11.25)

with solution vector (w)k and hence portfolio standard deviation sd((Rp)k) =√

(w)tkC(w)k. This gives us a set of pairs

(sd((Rp)k),(Rp)k),k = 1, ...,Npts.

11.3 Adding a Risk-free assetUp to now, we have assumed that each asset is risky, i.e. σ′

i > 0,∀i. However, what happens if we add a risk free assetto our portfolio? This risk-free asset must earn the risk free rate r′ = r∆t, and its standard deviation is zero. The datafor this case is (the risk-free asset is added to the end of the weight vector, with r′ = .03).

µ =

.15

.20

.08

.03

; C =

.20 .05 −.01 0.0

.05 .30 .015 0.0−.01 .015 .1 0.00.0 0.0 0.0 0.0

L =

000−∞

; U =

∞∞∞∞

(11.26)

where we have assumed that we can borrow any amount at the risk-free rate (a dubious assumption).If we compute the efficient frontier with a portfolio of risky assets and include one risk-free asset, we get the result

labeled capital market line in Figure 11.3. In other words, in this case the efficient frontier is a straight line. Note thatthis straight line is always above the efficient frontier for the portfolio consisting of all risky assets (as in Figure 11.1).In fact, given the efficient frontier from Figure 11.1, we can construct the efficient frontier for a portfolio of the samerisky assets plus a risk free asset in the following way. First of all, we start at the point (0,r ′) in the (sd(Rp),Rp) plane,corresponding to a portfolio which consists entirely of the risk free asset. We then draw a straight line passing through(0,r′), which touches the all-risky-asset efficient frontier at a single point (the straight line is tangent the all-risky-assetefficient frontier). Let the portfolio weights at this single point be denoted by wM . The portfolio corresponding to the

65

Standard Deviation

Expe

cted

Ret

urn

0 0.1 0.2 0.3 0.4 0.5 0.60

0.025

0.05

0.075

0.1

0.125

0.15

0.175

0.2

0.225

0.25

Efficient FrontierAll Risky Assets

Risk FreeReturn

MarketPortfolio

Capital MarketLine

Lending

Borrowing

FIGURE 11.3: The efficient frontier from Figure 11.1 (all risky assets), and the efficient frontier with the same assets asin Figure 11.1, except that we include a risk free asset. In this case, the efficient frontier becomes a straight line, shownas the capital market line.

weights wM is termed the market portfolio. Let (Rp)M = wtM µ be the expected return on this market portfolio, with

corresponding standard deviation sd((Rp)M). Let wr be the fraction invested in the risk free asset. Then, any pointalong the capital market line has

Rp = wrr′ +(1−wr)(Rp)M

sd(Rp) = (1−wr) sd((Rp)M) . (11.27)

If wr ≥ 0, then we are lending at the risk-free rate. If wr < 0, we are borrowing at the risk-free rate.Consequently, given a portfolio of risky assets, and a risk-free asset, then all investors should divide their assets

between the risk-free asset and the market portfolio. Any other choice for the portfolio is not efficient. Note that theactual fraction selected for investment in the market portfolio depends on the risk preferences of the investor.

The capital market line is so important, that the equation of this line is written as Rp = r′ + λM sd((Rp)), whereλM is the market price of risk. In other words, all diversified investors, at any particular point in time, should havediversified portfolios which plot along the capital market line. All portfolios should have the same Sharp ratio

λM =Rp− r′sd(Rp)

. (11.28)

11.4 CriticismIs mean-variance portfolio optimization the solution to all our problems? Not exactly. We have assumed that µ′,σ′ areindependent of time. This is not likely. Even if these parameters are reasonably constant, they are difficult to estimate.In particular, µ′ is hard to determine if the time series of returns is not very long. Remember that for short time series,the noise term (Brownian motion) will dominate. If we have a long time series, we can get a better estimate for µ′,but why do we think µ′ for a particular firm will be constant for long periods? Probably, stock analysts should beestimating µ′ from company balance sheets, sales data, etc. However, for the past few years, analysts have been toobusy hyping stocks and going to lunch to do any real work. So, there will be lots of different estimates of µ′,C, andhence many different optimal portfolios.

66

In fact, some recent studies have suggested that if investors simply use the 1/N rule, whereby initial wealth isallocated equally between N assets, that this does a pretty good job, assuming that there is uncertainty in the estimatesof µ′,C.

We have also assumed that risk is measured by standard deviation of portfolio return. Actually, if I am long anasset, I like it when the asset goes up, and I don’t like it when the asset goes down. In other words, volatility whichmakes the price increase is good. This suggests that perhaps it may be more appropriate to minimize downside riskonly (assuming a long position).

Perhaps one of the most useful ideas that come from mean-variance portfolio optimization is that diversifiedinvestors (at any point in time) expect that any optimal portfolio will produce a return

Rp = r′ +λMσ′p

Rp = Expected portfolio returnr′ = risk-free return in period ∆tλM = market price of riskσ′

p = Portfolio volatility , (11.29)

where different investors will choose portfolios with different σ′p (volatility), depending on their risk preferences, but

λM is the same for all investors. Of course, we also have

RM = r′ +λMσ′M . (11.30)

Note: there is a whole field called Behavioural Finance, whose adherents don’t think much of mean-varianceportfolio optimization.

Another recent approach is to compute the optimal portfolio weights using using many different perturbed inputdata sets. The input data (expected returns, and covariances) are determined by resampling, i.e. assuming that theobserved values have some observational errors. In this way, we can get an some sort of optimal portfolio weightswhich have some effect of data errors incorporated in the result. This gives us an average efficient frontier, which, it isclaimed, is less sensitive to data errors.

11.5 Individual SecuritiesEquation (11.30) refers to an efficient portfolio. What is the relationship between risk and reward for individualsecurities? Consider the following portfolio: divide all wealth between the market portfolio, with weight wM andsecurity i, with weight wi. By definition

wM +wi = 1 , (11.31)

and we define

RM = expected return on the market portfolioRi = expected return on asset i

σ′M = s.d. of return on market portfolioσ′

i = s.d. of return on asset iCi,M = σ′

Mσ′iρi,M

= Covariance between i and M (11.32)

Now, the expected return on this portfolio is

Rp = E[Rp] = wiRi +wMRM

= wiRi +(1−wi)RM (11.33)

and the variance is

Var(Rp) = (σ′p)

2 = w2i (σ

′i)

2 +2wiwMCi,M +w2M(σ′

M)2

= w2i (σ

′i)

2 +2wi(1−wi)Ci,M +(1−wi)2(σ′

M)2 (11.34)

67

For a set of values wi, equations (11.33-11.34) will plot a curve in expected return-standard deviation plane (Rp,σ′p)

(e.g. Figure 11.3). Let’s determine the slope of this curve when wi → 0, i.e. when this curve intersects the capitalmarket line at the market portfolio.

2(σ′p)

∂(σ′p)

∂wi= 2wi(σ′

i)2 +2(1−2wi)Ci,M +2(wi −1)(σ′

M)2

∂Rp∂wi

= Ri −RM . (11.35)

Now,

∂Rp∂(σ′

p)=

∂Rp∂wi

∂(σ′p)

∂wi

=(Ri −RM)(σ′

p)

wi(σ′i)

2 +(1−2wi)Ci,M +(wi −1)(σ′M)2 . (11.36)

Now, let wi → 0 in equation (11.36), then we obtain

∂Rp∂(σ′

p)=

(Ri −RM)(σ′M)

Ci,M − (σ′M)2 (11.37)

But this curve should be tangent to the capital market line, equation (11.30) at the point where the capital market linetouches the efficient frontier. If this curve is not tangent to the capital market line, then this implies that if we choosewi = ±ε, then the curve would be above the capital market line, which should not be possible (the capital market lineis the most efficient possible portfolio). This assumes that positions with wi < 0 in asset i are possible.

Assuming that the slope of the Rp portfolio is tangent to the capital market line gives (from equations (11.30,11.37))

RM − r′(σ′

M)=

(Ri −RM)(σ′M)

Ci,M − (σ′M)2 (11.38)

or

Ri = r′ +βi(RM − r′)

βi =Ci,M

(σ′M)2 . (11.39)

The coefficient βi in equation (11.39) has a nice intuitive definition. Suppose we have a time series of returns

(Ri)k = Return on asset i, in period k(RM)k = Return on market portfolio in period k . (11.40)

Typically, we assume that the market portfolio is a broad index, such as the TSX 300. Now, suppose we try to obtaina least squares fit to the above data, using the equation

Ri ' αi +biRM . (11.41)

Carrying out the usual least squares analysis (e.g. do a linear regression of Ri vs. RM), we find that

bi =Ci,M

(σ′M)2 (11.42)

so that we can write

Ri ' αi +βiRM . (11.43)

68

−0.08 −0.06 −0.04 −0.02 0 0.02 0.04

−0.1

−0.05

0

0.05

0.1

0.15

Return on market

Retu

rn o

n st

ock

Rogers Wireless Communications

FIGURE 11.4: Return on Rogers Wireless Communications versus return on TSE 300. Each point represents pairs ofdaily returns. The vertical axis measures the daily return on the stock and the horizontal axis that of the TSE300.

This means that βi is the slope of the best fit straight line to a ((Ri)k,(RM)k) scatter plot. An example is shown inFigure 11.4. Now, from equation (11.39) we have that

Ri = r′ +βi(RM − r′) (11.44)

which is consistent with equation (11.43) if

Ri = αi +βiRM + εi

E[εi] = 0αi = r′(1−βi)

E[εi,RM ] = 0 , (11.45)

since

E[Ri] = Ri = αi +βiRM . (11.46)

Equation (11.46) has the interpretation that the return on asset i can be decomposed into a drift component, a partwhich is correlated to the market portfolio (the broad index), and a random part uncorrelated with the index. Make thefollowing assumptions

E[εiε j ] = 0 ; i 6= j= e2

i ; i = j (11.47)

e.g. that returns on each each asset are correlated only through their correlation with the index. Consider once again aportfolio where the wealth is divided amongst N assets, each asset receiving a fraction wi of the initial wealth. In thiscase, the return on the portfolio is

Rp =i=N∑i=1

wiRi

Rp =i=N∑i=1

wiαi +RMi=N∑i=1

wiβi (11.48)

69

and

s.d.(Rp) =

√

√

√

√(σ′M)2

i=N∑i=1

j=N

∑j=1

wiw jβiβ j +i=N∑i=1

w2i e2

i

=

√

√

√

√(σ′M)2

(

i=N∑i=1

wiβi

)2

+i=N∑i=1

w2i e2

i . (11.49)

Now, if wi = O(1/N), then

i=N∑i=1

w2i e2

i (11.50)

is O(1/N) as N becomes large, hence equation (11.49) becomes

s.d.(Rp) ' σ′M

∣

∣

∣

∣

∣

i=N∑i=1

wiβi

∣

∣

∣

∣

∣

. (11.51)

Note that if we write

Ri = r′ +λiσ′i (11.52)

then we also have that

Ri = r′ +βi(RM − r′) (11.53)

so that the market price of risk of security i is

λi =βi(RM − r′)

σ′i

(11.54)

which is useful in real options analysis.

12 Stocks for the Long Run?Conventional wisdom states that investment in a diversified portfolio of equities has a low risk for a long term investor.However, in a recent article (”Irrational Optimism,” Fin. Anal. J. E. Simson, P.Marsh, M. Staunton, vol 60 (January,2004) 25-35) an extensive analysis of historical data of equity returns was carried out. Projecting this informationforward, the authors conclude that the probability of a negative real return over a twenty year period, for a investorholding a diversified portfolio, is about 14 per cent. In fact, most individuals in defined contribution pension plans havepoorly diversified portfolios. Making more realistic assumptions for defined contribution pension plans, the authorsfind that the probability of a negative real return over twenty years is about 25 per cent.

Let’s see if we can explain why there is this common misconception about the riskiness of long term equityinvesting. Table 12.1 shows a typical table in a Mutual Fund advertisement. From this table, we are supposed toconclude that

• Long term equity investment is not very risky, with an annualized compound return about 3% higher than thecurrent yield on government bonds.

• If S is the value of the mutual fund, and B is the value of the government bond, then

B(T ) = B(0)erT

r = .03S(T ) ' S(0)eαT

α = .06, (12.1)

70

1 year 2 years 5 years 10 years 20 years 30 years 30 year bondyield

-2% -5% 10% 8% 7% 6% 3%

TABLE 12.1: Historical annualized compound return, XYZ Mutual Equity Funds. Also shown is the current yield on along term government bond.

for T large, which gives

S(T=30)S(0)

B(T=30)B(0)

= e1.8−.9 = e.9 ' 2.46, (12.2)

indicating that you more than double your return by investing in equities compared to bonds (over the longterm).

A convenient way to measure the relative returns on these two investments (bonds and stocks) is to compare thetotal compound return

Compound return: stocks = log[

S(T )

S(0)

]

= αT

Compound return: bonds = log[

B(T )

B(0)

]

= rT , (12.3)

or the annualized compound returns

Annualized compound return: stocks =1T log

[

S(T )

S(0)

]

= α

Annualized compound return: bonds =1T log

[

B(T )

B(0)

]

= r . (12.4)

If we assume that the value of the equity portfolio S follows a Geometric Brownian Motion


then from equation (3.56) we have that

log(

S(T )

S(0)

)

∼ N((µ− σ2

2 )T,σ2T ) , (12.6)

i.e. the compound return in is normally distributed with mean (µ− σ22 )T and variance σ2T , so that the variance of the

total compound return increases as T becomes large.Since var(aX) = a2var(X), it follows that

1T log

(

S(T )

S(0)

)

∼ N((µ− σ2

2 ),σ2/T ) , (12.7)

so that the the variance of the annualized return tends to zero at T becomes large.Of course, what we really care about is the total compound return (that’s how much we actually have at t = T ,

relative to what we invested at t = 0) at the end of the investment horizon. This is why Table 12.1 is misleading. Thereis significent risk in equities, even over the long term (30 years would be long-term for most investors).

Figure 12.1 shows the results of 100,000 simulations of asset prices assuming that the asset follows equation(12.5), with µ = .08,σ = .2. The investment horizon is 5 years. The results are given in terms of histograms of theannualized compound return (equation (12.4)) and the total compound return ((equation (12.3)).

71

−0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.80

0.5

1

1.5

2

2.5

3

x 104 Annualized Returns − 5 years

Ann. Returns ($)−3 −2 −1 0 1 2 3 4 50

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 104 Log Returns − 5 years

Log Returns ($)

FIGURE 12.1: Histogram of distribution of returns T = 5 years. µ = .08,σ = .2, 100,000 simulations. Left: annualizedreturn 1/T log[S(T)/S(0)]. Right: return log[S(T )/S(0)].

Figure 12.2 shows similar results for an investment horizon of 30 years. Note how the variance of the annualizedreturn has decreased, while the variance of the total return has increased (verifying equations (12.6-12.7)).

Assuming long term bonds yield 3%, this gives a total compound return over 30 years of .90, for bonds. Lookingat the right hand panel of Figure 12.2 shows that there are many possible scenarios where the return on equities willbe less than risk free bonds after 30 years. The number of scenarios with return less than risk free bonds is given bythe area to the left of .9 in the histogram.

13 Further Reading13.1 General Interest

• Peter Bernstein, Capital Ideas: the improbable origins of modern Wall street, The Free Press, New York, 1992.

• Peter Bernstein, Against the Gods: the remarkable story of risk, John Wiley, New York, 1998, ISBN 0-471-29563-9.

• Burton Malkeil, A random walk down Wall Street, W.W. Norton, New York, 1999, ISBN 0-393-32040-5.

• N. Taleb, Fooled by Randomness, Texere, 2001, ISBN 1-58799-071-7.

13.2 More Background• A. Dixit and R. Pindyck, Investment under uncertainty, Princeton University Press, 1994.

• John Hull, Options, futures and other derivatives, Prentice-Hall, 1997, ISBN 0-13-186479-3.

• S. Ross, R. Westerfield, J. Jaffe, Corporate Finance, McGraw-Hill Irwin, 2002, ISBN 0-07-283137-5.

• W. Sharpe, Portfolio Theory and Capital Markets, Wiley, 1970, reprinted in 2000, ISBN 0-07-135320-8. (Stilla classic).

• Lenos Triegeorgis, Real Options: Managerial Flexibility and Strategy for Resource Allocation, MIT Press,1996, ISBN 0-262-20102-X.

72

−0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.80

0.5

1

1.5

2

2.5

3

x 104 Annualized Returns − 30 years

Ann. Returns ($)−3 −2 −1 0 1 2 3 4 50

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 104 Log Returns − 30 years

Log Returns ($)

FIGURE 12.2: Histogram of distribution of returns T = 30 years. µ = .08,σ = .2, 100,000 simulations. Left: annualizedreturn 1/T log[S(T)/S(0)]. Right: return log[S(T )/S(0)],

13.3 More Technical• P. Brandimarte, Numerical Methods in Finance: A Matlab Introduction, Wiley, 2002, ISBN 0-471-39686-9.

• Boyle, Broadie, Glassermman, Monte Carlo methods for security pricing, J. Econ. Dyn. Con., 21:1267-1321(1997)

• P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer (2004) ISBN 0-387-00451-3.

• D. Higham, An Introduction to Financial Option Valuation, Cambridge (2004) ISBN 0-521-83884-3.

• P. Jackel, Monte Carlo Methods in Finance, Wiley, 2002, ISBN 0-471-49741-X.

• Y.K. Kwok, Mathematical Models of Finance, Springer Singapore, 1998, ISBN 981-3083-565.

• S. Neftci, An Introduction to the Mathematics of Financial Derivatives, Academic Press (2000) ISBN 0-12-515392-9.

• R. Seydel, Tools for Computational Finance, Springer, 2002, ISBN 3-540-43609-X.

• D. Tavella and C. Randall, Pricing Financial Instruments: the Finite Difference Method, Wiley, 2000, ISBN0-471-19760-2.

• D. Tavella, Quantitative Methods in Derivatives Pricing: An Introduction to Computational Finance, Wiley(2002) ISBN 0-471-39447-5.

• N. Taleb, Dynamic Hedging, Wiley, 1997, ISBN 0-471-15280-3.

• P. Wilmott, S. Howison, J. Dewynne, The mathematics of financial derivatives: A student introduction, Cam-bridge, 1997, ISBN 0-521-49789-2.

• Paul Wilmott, Paul Wilmott on Quantitative Finance, Wiley, 2000, ISBN 0-471-87438-8.

73

14 Basic Finite Difference ApproximationsThis section gives a brief review of finite difference methods, and can be skipped if you are familiar with theseconcepts. Given a function f (x) known at discrete data points f (xi) = fi, define

xi+1 − xi = ∆xi+1/2

xi − xi−1 = ∆xi−1/2. (14.1)

A Taylor series expansion gives

f (xi +∆xi+1/2) = fi +∂ f (xi)

∂x ∆xi+1/2 +∂2 f (xi)

∂x2∆x2

i+1/22 +O(∆x3

i+1/2). (14.2)

Letting

f (xi +∆xi+1/2) = fi+1

f (xi −∆xi−1/2) = fi−1

( fx)i =∂ f (xi)

∂x

( fxx)i =∂2 f (xi)

∂x2

( fxxx)i =∂3 f (xi)

∂x3

( fxxxx)i =∂4 f (xi)

∂x4 , (14.3)

then we have from equation (14.2)

fi+1 = fi +( fx)i∆xi+1/2 +( fxx)i∆x2

i+1/22 +O(∆x3

i+1/2), (14.4)

or( fx)i =

fi+1 − fi∆xi+1/2

+O(∆xi+1/2). (14.5)

Thus we can compute an approximation for the first derivative at x = xi by using

( fx)i 'fi+1 − fi∆xi+1/2

. (14.6)

Of course equation (14.6) is not exact, since there is an error term (or truncation error) which is of size O(∆xi+1/2)as given in equation (14.5). This means that we can expect the error in using equation (14.6) (to approximate the firstderivative) to tend to zero as ∆xi+1/2 → 0 only as the first power of ∆xi+1/2. Consequently, equation (14.6) is called afirst order approximation. Equation (14.5) is called a forward difference, since we use information at points i, i+1 toapproximate the derivative at point i.

Similarly, we can write a Taylor series going backwards, i.e.

fi−1 = fi − ( fx)i∆xi−1/2 +( fxx)i∆x2

i−1/22 +O(∆x3

i−1/2), (14.7)

or( fx)i =

fi − fi−1∆xi−1/2

+O(∆xi−1/2). (14.8)

Note that this backward difference is also only first order, which means that the truncation error term is O(∆x1i−1/2).

74

Now, let’s consider using three points fi, fi−1, fi+1, so we need both Taylor series going forward and backwards

fi+1 = fi +( fx)i∆xi+1/2 +( fxx)i∆x2

i+1/22 +( fxxx)i

∆x3i+1/23! +O(∆x4

i+1/2) (14.9)

fi−1 = fi − ( fx)i∆xi−1/2 +( fxx)i∆x2

i−1/22 − ( fxxx)i

∆x3i−1/23! +O(∆x4

i−1/2). (14.10)

Subtracting equation (14.10) from equation (14.9) gives

fi+1 − fi−1 = ( fx)i(∆xi+1/2 +∆xi−1/2)+( fxx)i∆x2

i+1/2−∆x2i−1/2

2

+ ( fxxx)i∆x3

i−1/2 +∆x3i+1/2

3! +O(∆x4i−1/2,∆x4

i+1/2), (14.11)

or

( fx)i =fi+1 − fi−1

∆xi+1/2 +∆xi−1/2− ( fxx)i

∆xi+1/2 −∆xi−1/22 +O

(

∆x3i−1/2 +∆x3

i+1/2∆xi+1/2 +∆xi−1/2

)

=fi+1 − fi−1

∆xi+1/2 +∆xi−1/2+O(∆xi+1/2−∆xi−1/2)+O

(

∆x3i−1/2 +∆x3

i+1/2∆xi+1/2 +∆xi−1/2

)

. (14.12)

Equation (14.12) is called a central difference, since it uses points symmetrically located about f i. Note that if∆xi+1/2 6= ∆xi−1/2, then equation (14.12) is only first order correct. However, in the special case that ∆xi+1/2 =∆xi−1/2 = ∆x, equation (14.12) becomes

( fx)i =fi+1 − fi−1

2∆x +O(∆x2). (14.13)

This is second order correct since the size of the truncation error in this case is O(∆x2). Clearly, if a central differenceapproximation is used with ∆xi+1/2 = ∆xi−1/2, then this will converge at a faster rate as ∆x → 0 compared to either aforward or backward difference. Note that provided that the grid spacing ∆xi−1/2,∆xi+1/2 does not change too rapidly,a central difference will be more accurate than either a forward or backward difference, even if ∆xi+1/2 6= ∆xi−1/2.

As an aside, we could have been a bit more clever in getting a central approximation for ( fx)i, using equations(14.9) and (14.10). Rearranging equations (14.9) and (14.10) gives

fi+1 − fi∆xi+1/2

= ( fx)i +( fxx)i∆xi+1/2

2 +( fxxx)i∆x2

i+1/23! +O(∆x3

i+1/2) (14.14)

fi−1 − fi∆xi−1/2

= −( fx)i +( fxx)i∆xi−1/2

2 − ( fxxx)i∆x2

i−1/23! +O(∆x3

i−1/2). (14.15)

The cause of the first order error term in equation (14.12) is the term involving ( fxx)i. We can eliminate this term if wemultiply equation (14.14) by ∆xi−1/2 and equation (14.15) by ∆xi+1/2, and then subtract the two resulting equations toget

( fx)i =

∆xi−1/2∆xi+1/2

( fi+1 − fi)−∆xi+1/2∆xi−1/2

( fi−1 − fi)

∆xi+1/2 +∆xi−1/2+O(∆xi+1/2∆xi−1/2). (14.16)

This is always second order correct, even if ∆xi+1/2 6= ∆xi−1/2. (If we refine the grid by reducing all grid spacing by1/2, then the error in equation (14.16) is reduced by 1/4.) This demonstrates that in general, if we use three pointsfi, fi−1, fi+1, we can always get second order accuracy for the first derivative. How would you get a second orderaccurate approximation for ( fx)i−1, using the points fi−1, fi, fi+1?

However, equation (14.16) is rarely used in practice. This is because usually the grid size changes fairly slowly, sothat equation (14.12) suffices.

75

In a similar way, we can obtain an approximation for ( fxx)i. If we add equations (14.14) and (14.15), we obtain

fi+1 − fi∆xi+1/2

+fi−1 − fi∆xi−1/2

= ( fxx)i∆xi−1/2 +∆xi+1/2

2

+ ( fxxx)i∆x2

i+1/2−∆x2i−1/2

3! +O(∆x3i−1/2,∆x3

i+1/2). (14.17)

Solving equation (14.17) for ( fxx)i gives

( fxx)i =

(

fi+1 − fi∆xi+1/2

+fi−1 − fi∆xi−1/2

)

(∆xi+1/2 +∆xi−1/22

) − ( fxxx)i2(∆xi+1/2−∆xi−1/2)

3!

+ O(

∆x3i+1/2

∆xi+1/2 +∆xi−1/2,

∆x3i−1/2

∆xi+1/2 +∆xi−1/2

)

=

(

fi+1 − fi∆xi+1/2

+fi−1 − fi∆xi−1/2

)

(∆xi+1/2 +∆xi−1/22

) +O(∆xi+1/2−∆xi−1/2)

+ O(

∆x3i+1/2

∆xi+1/2 +∆xi−1/2,

∆x3i−1/2

∆xi+1/2 +∆xi−1/2

)

. (14.18)

Equation (14.18) is only first order correct if ∆xi+1/2 6= ∆xi−1/2. If ∆xi+1/2 = ∆xi−1/2 = ∆x, then equation (14.18)becomes

( fxx)i =fi+1 − fi

∆x2 +fi−1 − fi

∆x2 +O(∆x2)

=fi+1 −2 fi + fi−1

∆x2 +O(∆x2), (14.19)

which is second order correct.

15 Overview of Discretizaton Methods for PDEsThere are several methods for discretizing PDEs, i.e. converting a differential equation into a set of discrete algebraicequations. Popular methods include: finite difference, finite element, and finite volume. Finite volume methods arecommon in engineering applications, since it is possible to develop an intuition about what the discrete equationsmean. This is useful, since a physical model helps to make sure our discrete equations do sensible things.

Finite difference methods are more straightforward to derive. We simply replace the differential operators by finitedifference operators. However, sometimes we can use several different finite difference methods, and it is sometimeshard to figure out which one is best.

In many cases, the discrete equations end up being almost the same. It is usually quite easy to construct softwareso that you can switch between finite difference and finite volume methods. We will give two derivations of thediscretization of the Black-Scholes equation, and point out where the discrete equations differ. If you are an Engineer,you will probably prefer the finite volume approach. If you are a Mathematician, you probably like the finite differencemethod.

16 Introduction to Finite Volume MethodsOriginally, the standard method for numerically solving PDEs was the finite difference method, where all derivativeswere simply replaced by finite differences. In the 1970s, the finite element method came to prominence. This method

76

could be used with unstructured grids, and so was extremely useful for practical problems. Moreover, much rigorousanalysis of convergence of finite element methods has been carried out [8, 51, 32].

For example, in one dimension a finite element discretization of the parabolic heat equation

ut = uxx (16.1)

on an unequally spaced grid (i.e. ∆xi+1/2 6= ∆xi−1/2) converges at a rate (h = maxi ∆xi):

Convergence = O(h2). (16.2)

Note that if mass lumping is used, with linear basis functions, then the discrete finite element (FE) equations are iden-tical to the discrete finite difference equations. From finite difference analysis, one would suppose that convergencewould occur only at a first order rate (for an unequally spaced grid). Similarly, it would appear that finite differencemethods are inapplicable if the initial conditions do not satisfy certain smoothness criteria. Again, finite elementanalysis can show that in fact convergence is obtained even if the initial conditions are discontinuous.

All this goes to show that, although finite difference truncation analysis is sometimes indicative of what is goingon, it is not the whole story, and should be taken with a grain of salt.

However, standard finite element methods have problems when:

• The parabolic PDEs degenerate to hyperbolic PDEs, or become drift dominated. This is common in financialproblems near boundaries, for mean-reverting processes, and for continuously observed path dependent options(e.g. continuously observed Asian options).

• The PDE is in non-conservative (non self-adjoint) form. This is the usual situation in finance.

Consequently, we will use a finite volume [57, 62] discretization. In many cases, finite volume, finite element,and finite difference methods all result in the same discrete equations. In this event, what you choose to call yourdiscretization method is simply a matter of philosophy. However, in complex situations, the finite volume approachalways produces the correct result, and hence is to be preferred. In addition, a finite volume method handles degenerateconditions at boundaries in a very natural way.

16.1 Finite Volume DiscretizationWe will begin by discretizing the usual Black-Scholes equation for a single asset:

Vτ =σ2

2 S2VSS + rSVS− rV, where

V = value of optionT = expiry time of optionτ = T − tσ = volatilityS = asset pricer = risk free interest rate. (16.3)

Equation (16.3) is a convection-diffusion or drift-diffusion equation with a decay term. The diffusion term is

σ2

2 S2VSS,

while the drift/convection term is

rSVS,

and the decay/discounting term is

−rV.

77

XX XSi+1SiSi-1

(Si + Si-1)/2 (Si + Si+1)/2

Cell i

Si+1/2Si-1/2

FIGURE 16.1: Finite volume cell.

Define a set of nodes at discrete values of S = Si as shown in Figure 16.1. Around each node, define a cell or finitevolume. Cell boundaries are placed halfway between nodes. Let

Ai = area of finite volume

=

(

Si+1 +Si2

)

−(

Si−1 +Si2

)

Si+1/2 =Si +Si+1

2

Si−1/2 =Si +Si−1

2∆Si+1/2 = Si+1 −Si

∆Si−1/2 = Si −Si−1.

We integrate equation (16.3) over the ith cell to obtainZ

AiVτ ds =

Z

Ai

σ2S2

2 VSS ds+

Z

AirSVS ds−

Z

AirV ds. (16.4)

For the time being, let’s focus on the terms on the right hand side of equation (16.4). Let

V (Si,τn) = V ni

and ignore the time dependence on τn for the terms on the right hand side of equation (16.4). We will focus on theasset price (space-like) dimension.

The diffusion is approximated asZ

Ai

σ2S2

2 VSS ds ' σ2S2i

2

Z

AiVSS ds

=σ2S2

i2(

(VS)i+1/2 − (VS)i−1/2)

. (16.5)

78

XX XVi+1ViVi-1

Cell i

Vi+1/2Vi-1/2

FIGURE 16.2: Location of discrete values.

The locations of Vi±1/2 are shown in Figure 16.2. The term (VS)i+1/2 is approximated by

(VS)i+1/2 ' Vi+1−Vi∆Si+1/2

(VS)i−1/2 ' Vi −Vi−1∆Si−1/2

. (16.6)

Equations (16.5) and (16.6) giveZ

Ai

σ2S2

2 VSS ds ' σ2S2i

2

(

Vi+1 −Vi∆Si+1/2

+Vi−1 −Vi∆Si−1/2

)

. (16.7)

The remaining terms on the right hand side of equation (16.4) are approximated asZ

AirV ds ' rViAi (16.8)

Z

AirSVS ds ' rSi

[

Vi+1/2−Vi−1/2]

. (16.9)

What approximation should we use for Vi+1/2? The obvious thing to do is to use central weighting

Vi+1/2 =Vi +Vi+1

2Vi−1/2 =

Vi +Vi−12 , (16.10)

which is a second order accurate approximation. In this case, using equations (16.9) and (16.10) we obtainZ

AirSVS ds ' rSi

[

Vi+1−Vi−12

]

. (16.11)

79

The left hand side of equation (16.4) is approximated by a forward differenceZ

AiVτ ds ' Ai

[

V n+1i −V n

i∆τ

]

. (16.12)

Now, if we evaluate the right hand side of equation (16.4) at time level n, we have

Ai

[

V n+1i −V n

i∆τ

]

=σ2S2

i2

[

V ni+1−V n

i∆Si+1/2

+V n

i−1 −V ni

∆Si−1/2

]

+rSi2[

V ni+1−V n

i−1]

− rAiV ni . (16.13)

This is an explicit method for solving the PDE (16.3). However, there are sometimes problems with the use of centralweighting for the first order derivative term, as in equation (16.10). We will discuss this effect in detail in the nextsection.

16.2 A Hyperbolic ProblemThe Black-Scholes equation is typically described as a parabolic equation. However, there are many cases in op-tion pricing where the PDE either degenerates to a hyperbolic PDE or behaves numerically as if it were hyperbolic.Examples of this situation include

• volatility surfaces

• stochastic volatility

• mean-reverting processes.

In computational fluid dynamics (CFD) this is known as drift/convection dominated flow, which is more difficultto handle (compared to parabolic situations).

Consider a simple model hyperbolic problem. Suppose we have a chemical with concentration c(x, t). We releasethe the chemical in a duct where the fan blows air at a speed u, as illustrated in Figure 16.3.

We can derive the PDE satisfied by the concentration c(x, t) as follows. Imagine dividing the duct into cells oflength ∆x (see Figure 16.3). Consider discrete values of c(x, t)

c(i∆x,n∆t) = cni .

The mass of chemical in cell i at t = n∆t is

mass in cell i = cni ∆x. (16.14)

The mass in cell i at t = (n+1)∆t ismass in cell i = cn+1

i ∆x. (16.15)

Mass conservation dictates

change in mass = flow in - flow out , (16.16)

or∆x(

cn+1i − cn

i)

= u∆t(

cni−1/2 − cn

i+1/2

)

, (16.17)

implying

cn+1i − cn

i∆t = −u

(ci+1/2 − ci−1/2∆x

)

.

Taking the limit as ∆x,∆t → 0 givesct +ucx = 0. (16.18)

80

c(x,t)

u

X XX

∆x

Ci-1/2 Ci+1/2

Ci-1Ci+1

Ci

FIGURE 16.3: Convection example.

Now, let’s discretize equation (16.18) using an explicit finite volume method. Integrating equation (16.18) overeach cell (as in Figure 16.3), we obtain

Z

cell(ct +ucx) dx ' ∆x(ct)

ni +u(cn

i+1/2− cni−1/2)

'(

cn+1i − cn

i∆t

)

∆x+u(cni+1/2− cn

i−1/2)

= 0, (16.19)

so equation (16.19) becomescn+1

i − cni

∆t = u(

cni−1/2− cn

i+1/2∆x

)

. (16.20)

If we use central weighting for ci±1/2, i.e.

cni±1/2 =

cni + cn

i±12 , (16.21)

we getcn+1

i − cni

∆t = −u(

cni+1 − cn

i−12∆x

)

. (16.22)

Now suppose we have a case where

cni = ε

cni+1 = 0

cni+2 = 0

cni−1 = 0

cni−2 = 0. (16.23)

81

At point t = (n+1)∆t,x = (i+1)∆x, we have

cn+1i+1 − cn

i+1∆t = −u

(

cni+2 − cn

i2∆x

)

,

or

cn+1i+1 = cn

i+1 −u∆t2∆x

(

cni+2 − cn

i)

.

In view of equation (16.23), this becomes

cn+1i+1 = +

εu∆t2∆x . (16.24)

Similarly, at point t = (n+1)∆t,x = i∆x we have

cn+1i = cn

i −u∆t2∆x

(

cni+1 − cn

i−1)

,

which, from equation (16.23) becomes

cn+1i = cn

i . (16.25)

Finally, at point t = (n+1)∆t,x = (i−1)∆x we have

cn+1i−1 = cn

i−1 −u∆t2∆x

(

cni − cn

i−2)

,

which, by equation (16.23) becomes

cn+1i−1 = −εu∆t

2∆x . (16.26)

Assume that u > 0 (i.e. the wind is blowing in the positive x-direction). Suppose that at t = 0 we inject an amount ofchemical at a single point i. An instant later, we have

cn+1i+1 = +

εu∆t2∆x

which means that some of the chemical has blown downstream, as expected. However, at point i we have

cn+1i = cn

i ,

which says that the chemical concentration hasn’t changed at all. Even more ridiculous, at point i − 1, which isupstream of where we injected the chemical, we get a negative concentration

cn+1i−1 = −εu∆t

2∆x ,

which is physically impossible. Clearly, something is wrong with our discretization.

16.3 Upstream WeightingAn alternative to central weighting (16.21), is upstream or upwind weighting. This is defined as

cni+1/2 = cn

i if u > 0= cn

i+1 if u < 0. (16.27)

This has a simple physical interpretation: if the wind is blowing in the positive x-direction, then an estimate of cni+1/2

is obtained using the upstream value only (cni ). Consider the situation shown in Figure 16.4. You can only smell the

pigs if you are downwind of the pigpen. You cannot smell the pigs if you are upwind of the pigpen.

82

Pig Pen

Wind Direction

FIGURE 16.4: A happy farmer and a sad farmer.

Now, if u > 0, using upstream weighting (16.27) in equation (16.20) we obtain

cn+1i − cn

i∆t = u

(

cni−1/2− cn

i+1/2∆x

)

=u

∆x(

cni−1 − ci

)

. (16.28)

Looking at the same initial condition as before

cni = ε

cni+1 = 0

cni+2 = 0

cni−1 = 0

cni−2 = 0,

we obtain:

cn+1i−1 = 0

cn+1i = ε

(

1− u∆t∆x

)

cn+1i+1 =

εu∆t∆x . (16.29)

In this case, if we assume that the timestep ∆t is selected so that

u∆t∆x < 1,

we have the following properties:

• no information is transported upstream

• all concentrations are positive

• information is transported downstream

83

• concentration at the injection point has decreased.

We have obtained a more physically reasonable result, but at a cost. Upstream weighting is only formally firstorder accurate, while central weighting is second order accurate, (although for the pure hyperbolic model problem,central weighting is completely useless!). We will see in later sections how to improve simple upstream weightingusing nonlinear flux limiters.

Observe that if we have the equation

ct = acx (16.30)

with a > 0, this corresponds to the wind blowing in the negative x-direction since we can write this equation as

ct +(−a)cx = 0.

Hence a finite volume discretization of equation (16.30) would be

cn+1i − cn

i∆t = (−a)

(

cni−1/2 − cn

i+1/2∆x

)

, (16.31)

where upstream weighting is

cni+1/2 = cn

i if (−a) > 0= cn

i+1 if (−a) < 0. (16.32)

16.4 Positive CoefficientsNote that the centrally weighted discretization of

ct +ucx = 0

is

cn+1i = cn

i −[

u∆t2∆x

]

cni+1 +

[

u∆t2∆x

]

cni−1, (16.33)

while upstream weighting gives

cn+1i = cn

i

(

1− u∆t∆x

)

+ cni−1

u∆t∆x . (16.34)

All coefficients of cni ,cn

i+1,cni−1 are nonnegative in equation (16.34) (assuming that u∆t/∆x < 1). By contrast, in

equation (16.33) some of the coefficients of cni ,cn

i+1,cni−1 are negative. Having all coefficients positive is the precise

mathematical requirement for a reasonable discretization for a hyperbolic problem.

17 Finite Volume IIGoing back to our original example of the Black-Scholes equation

Vτ =σ2

2 S2VSS + rSVS− rV, (17.1)

using central weighting in a finite volume discretization gives (for an explicit method)

Ai

[

V n+1i −V n

i∆τ

]

=σ2S2

i2

[

Vi+1−Vi∆Si+1/2

+Vi−1 −Vi∆Si−1/2

]n+

rSi2[

V ni+1−V n

i−1]

− rAiV ni . (17.2)

84

Recalling that

Ai =∆Si+1/2 +∆Si−1/2

2 ,

equation (17.2) can be written as

V n+1i = V n

i (1− (αci +βc

i + r)∆τ)+αci ∆τV n

i−1 +βci ∆τV n

i+1, (17.3)

where

αci =

σ2S2i

2Ai∆Si−1/2− rSi

2Ai

βci =

σ2S2i

2Ai∆Si+1/2+

rSi2Ai

. (17.4)

Alternatively, we can go back to the original finite volume discretization, which we write as

Ai

[

V n+1i −V n

i∆τ

]

=σ2S2

i2

[

V ni+1−V n

i∆Si+1/2

+V n

i−1−V ni

∆Si−1/2

]

+ (−rSi)[

V ni−1/2−V n

i+1/2

]

− rAiV ni . (17.5)

Note that in equation (17.5), we have written this in a form where we can see that the information flow in thedrift/convective term is in the negative S direction. Upstream weighting is then defined as

V ni+1/2 = V n

i+1

V ni−1/2 = V n

i . (17.6)

However, equation (17.6) holds only for the Black-Scholes equation. In general, (e.g. for mean-reverting processes)equation (16.32) must be used.

Using equation (17.6) in equation (17.5) gives

Ai

[

V n+1i −V n

i∆τ

]

=σ2S2

i2

[

V ni+1−V n

i∆Si+1/2

+V n

i−1 −V ni

∆Si−1/2

]

+ (−rSi)[

V ni −V n

i+1]

− rAiV ni . (17.7)

We can also write equation (17.7) in shorter form as[

V n+1i −V n

i∆τ

]

= (LV )ni

(LV )ni ≡

1Ai

(

σ2S2i

2

[

V ni+1 −V n

i∆Si+1/2

+V n

i−1 −V ni

∆Si−1/2

]

)

+1Ai

(

(−rSi)[

V ni −V n

i+1]

− rAiV ni)

. (17.8)

After some algebra we obtain

V n+1i = V n

i (1− (αui +βu

i + r)∆τ)+αui ∆τV n

i−1 +βui ∆τV n

i+1, (17.9)

where

αui =

σ2S2i

2Ai∆Si−1/2

βui =

σ2S2i

2Ai∆Si+1/2+

rSiAi

. (17.10)

85

As a result, with either upstream or central weighting for the (rSVS)ni term , we obtain

V n+1i = V n

i (1− (αi +βi + r)∆τ)+αi∆τV ni−1 +βi∆τV n

i+1 (17.11)

where

αi = αui ; βi = βu

i upstream weighting used in cell iαi = αc

i ; βi = βci central weighting used in cell i.

As we shall see shortly, we would like αi and βi to be nonnegative. One way to do this would be simply to use upstreamweighting all the time. However, we would like to use central weighting as much as possible, since it is more accurate.A simple way to do this would be accomplished by the following preprocessing step

For i = 0, . . . ,αi := αc

iβi := βc

iIf ( αi < 0 or βi < 0 ) then

αi := αui

βi := βui

EndIfEndFor (17.12)

Note that either central or upstream weighting can be used within a cell. You cannot mix weightings within a cell.From now on, we will assume that

αi ≥ 0βi ≥ 0.

It is interesting to determine when upstream weighting would be used for the simple Black-Scholes equation. Fromequation (17.4) we can see that βc

i ≥ 0 and that αci is nonnegative if

σ2 ≥r∆Si−1/2

Si. (17.13)

Let’s suppose that ∆Si−1/2 = ∆Si+1/2 = ∆S, and that

Si = i∆S.

Then equation (17.13) becomes

σ2 ≥ ri (17.14)

(we can exclude the point i = 0 since the PDE degenerates to an ODE at S = 0). If r = .05,σ = .4, then equation(17.14) always holds for i > 0. If σ = .2,r = .10, then equation (17.14) is true for i ≥ 3.

Consequently, for reasonable parameter values upstream weighting is only used for a few nodes near S = 0. Thuswe can assume for practical purposes that the spatial discretization is second order accurate. However, in more complexsituations (bond pricing for example) upstream weighting is very important, especially near boundaries.

18 Alternate Derivation: Finite Difference Discretization with Forward Back-ward Differencing

In this section, we will derive the discretized equations using an alternative approach, based directly on finite differ-encing the Black-Scholes PDE.

86

Si-1 Si Si+1∆Si-1/2 ∆Si+1/2

Vi-1n Vi

n Vi+1n

FIGURE 18.1: Finite difference grid.

Given the Black Scholes equation for option pricing

∂V∂τ

=σ2S2

2 VSS + rSVS− rV

V = S ;S → ∞ for a callV = 0 ;S → ∞ for a put (18.1)

where V (S,τ) is the option price, S is the asset price, τ = T − t, T is the expiry time of the option, t is time (in theforward direction), σ is the volatility, and r is the risk free rate of interest. The initial condition (payoff of the option)for equation (18.1) is

V (S,0) = max(S−K,0) for a callV (S,0) = max(K −S,0) for a put (18.2)

where K is the exercise price.Define a grid of points in the (S,τ) plane

S0,S1, ...,Sm 0 ≤ Si ≤ Smax

τn = n∆τ 0 ≤ τn ≤ N∆τ = T(18.3)

then letV (Si,τn) = V n

i (18.4)

Let

∆Si+1/2 = Si+1 −Si

∆Si−1/2 = Si −Si−1 (18.5)

as shown in Figure 18.1.Equation (18.1) can the be approximated by replacing each term in the equation by a discrete approximation, i.e.

(

∂V∂τ

)n

i=

(

σ2S2

2 VSS

)n

i+(rSVS)

ni − (rV )n

i (18.6)

For definiteness here, we are evaluating the rhs of equation (18.6) at τ = τn. This is called an explicit method. We arenow going to approximate the derivatives by finite differences (based on Taylor series expansions).

87

The time derivative in equation (18.6) is approximated by(

∂V∂τ

)n

i' V n+1

i −V ni

∆τ(18.7)

The second order derivative term in equation (18.6) is approximated by

(

σ2S2

2 VSS

)n

i' σ2S2

i2

(

V ni+1 −V n

i∆Si+1/2

)

−(

V ni −V n

i−1∆Si−1/2

)

∆Si+1/2 +∆Si−1/22

(18.8)

The discounting term in equation (18.6) is represented by

(rV )ni = rV n

i (18.9)

The first derivative term on the right hand side of equation (18.6) can be approximated in three ways.A central difference:

(rSVS)ni ' rSi

( V ni+1−V n

i−1∆Si+1/2 +∆Si−1/2

)

(18.10)

A forward difference(rSVS)

ni ' rSi

(

V ni+1−V n

i∆Si+1/2

)

(18.11)

A backward difference(rSVS)

ni ' rSi

(V ni −V n

i−1∆Si−1/2

)

(18.12)

If ∆Si+1/2 = ∆Si−1/2, then equation (18.10) is second order, while equations (18.11-18.12) are only first order.Consequently, we will want to use the central approximation as much as possible.

Substituting equations (18.7, 18.8, 18.9) and one of (18.11), (18.12) or (18.10) into equation (18.6) gives a discreteequation of the form

V n+1i = V n

i (1− (αi +βi + r)∆τ)+V ni−1∆ταi +V n

i+1∆τβi (18.13)

The form of αi and βi depends on the choice of finite difference stencil. Discretizing the first derivative term ofequation (18.6) with central differences leads to

αcentrali =

[

σ2S2i

(Si −Si−1)(Si+1 −Si−1)− rSi

Si+1 −Si−1

]

βcentrali =

[

σ2S2i

(Si+1 −Si)(Si+1 −Si−1)+

rSiSi+1 −Si−1

]

. (18.14)

If αcentrali is negative, oscillations may appear in the solution (βcentral

i is always positive). We will discuss thisin some detail in later Sections. The oscillations can be avoided by using forward differences at the problem nodes,leading to

αforwardi =

σ2S2i

(Si −Si−1)(Si+1 −Si−1)

βforwardi =

[

σ2S2i

(Si+1 −Si)(Si+1 −Si−1)+

rSiSi+1−Si

]

. (18.15)

Another alternative is to use backward differences, leading to

αbackwardi =

[

σ2S2i

(Si −Si−1)(Si+1 −Si−1)− rSi

Si −Si−1

]

βbackwardi =

σ2S2i

(Si+1 −Si)(Si+1 −Si−1)(18.16)

88

Of course if r > 0, then normally we would only choose between central and forward differencing to ensure thatα,β > 0.

Define upstream weighting for finite difference methods as

If (αforwardi ≥ 0 and βforward

i ≥ 0) then

αui = αforward

i

βui = βforward

iElse

αui = αbackward

i

βui = βbackward

iEndIf

(18.17)

Algorithmically, we decide between a central or upstream (i.e. central or forward or backward ) discretization ateach node for equation (18.13) as follows:

For i = 0, . . .

If (αcentrali ≥ 0 and βcentral

i ≥ 0) thenαi = αcentral

i

βi = βcentrali

Elseαi = αu

iβi = βu

iEndIf

EndFor

(18.18)

where αui ,βu

i are given by algorithm (18.17). Algorithm (18.18) uses central differencing as much as possible at eachnode, since central differencing is more accurate. We use upstream differencing only where necessary to ensure thatαi ≥ 0,βi ≥ 0.

18.1 Why use Upstream WeightingWe can get an intuitive idea as to why we need to use forward/backward differencing sometimes if we look at anextreme case. This is often a good idea: stress the discrete equations by putting in extreme values of the parametersand then seeing what happens.

Consider extreme case: set σ = 0 in B-S PDE

Vτ = rSVS− rV

For simplicity, assume

∆S = ∆Si+1/2 = ∆Si−1/2

Si = i∆S . (18.19)

Then, central weighting gives (from equation (18.14))

αcentrali =

−ri2

βcentrali =

ri2 (18.20)

89

Sp-1 Sp Sp+1

ε

FIGURE 18.2: The supershare payoff used to stress the finite difference equations.

and consequently, from equation (18.13)

V n+1i = V n

i (1− r∆τ)

−V ni−1

ir∆τ2 +V n

i+1ir∆τ

2 (18.21)

Now, suppose we have an option which pays off at τ = 0, (t = T )

V 0i = ε ; Si−1/2 ≤ Sp ≤ Si+1/2

= 0 ; otherwise

e.g. the option only pays off if S is near Sp (a supershare option). This payoff is illustrated in Figure 18.2. The initialcondition is then going to be

V 0i = 0 ; i 6= p

= ε ; i = p . (18.22)

With this initial condition, let’s take one timestep of our central differencing method. Substitute (18.22) intoequation (18.21) which gives (at nodes p−1, p, p+1)

V 1p = ε(1− r∆t)

V 1p+1 = −ε

(p+1)r∆τ2

V 1p−1 = +ε

(p−1)r∆τ2 (18.23)

Now, if we choose ∆t small enough

r∆t < 1

then

V 1p ≥ 0

V 1p−1 ≥ 0 .

90

But, no matter how small ∆t,∆S, we have

V 1p+1 = −ε

(p+1)r∆τ2

< 0 .

But this is ridiculous• An option with a non-negative payoff, must always have non-negative value

• Not possible to have a negative option value!Let’s try forward differencing (recall that σ = 0, and Si = i∆S). From equation (18.15)

αforwardi = 0

βforwardi = ri (18.24)

which gives, using equation (18.13)

V n+1i = V n

i (1− (1+ i)r∆τ)+V n

i+1ir∆τ . (18.25)

Assuming the same initial conditions as before

V 0i = 0 ; i 6= p

= ε ; i = p , (18.26)

and substituting (18.26) into (18.25) gives

V 1p = ε(1− (1+ p)r∆τ)

V 1p−1 = ε(p−1)r∆τ

V 1p+1 = 0 (18.27)

Provided that

(1+ i)r∆τ ≤ 1 ∀i (18.28)

then

V 1p−1 ≥ 0 ; V 1

p ≥ 0 ; V 1p+1 ≥ 0 (18.29)

So, we can see that in this case, use of a formally low order method (forward differencing) gives a more sensibleresult, compared to better second order central differencing.

Writing down the these two discretizations again:

• Central weighting

V n+1i = V n

i (1− r∆τ)

−V ni−1

ir∆τ2 +V n

i+1ir∆τ

2 (18.30)

• Forward differencing

V n+1i = V n

i (1− (1+ i)r∆τ)+V n

i+1ir∆τ (18.31)

we can see that if ∆t sufficiently small, then,• all coefficients of (Vi−1,Vi,Vi+1) on the right hand side of equation (18.31) are nonnegative if forward differenc-

ing used,

• some of the coefficients of (Vi−1,Vi,Vi+1) on the right hand side of equation (18.30) are negative if centraldifferencing used.

91

18.2 Finite Difference and Finite Volume: Is There a Difference?Note that in all cases, we end up with a discrete equation of the form (18.13). This looks just the same as the finitevolume discretization we derived in previous sections.

In fact, if the grid is equally spaced (∆Si+1/2 = ∆Si−1/2,∀i) then the two discretization methods give identicalresults. If central differencing is always used for the finite difference approach, and central weighting is always usedfor the finite volume approach, and even if unequal grid spacing is used, then the two methods again give identicaldiscretizations.

However, if the grid is unequally spaced, and there are some nodes which require a forward difference (or upstreamweighting) in order to maintain non-negative αi,βi, then the discretizations are slightly different.

Which one should you use? Engineers like to derive discrete equations using a finite volume approach. Mathe-maticians prefer finite difference. You can see that in both cases, the discrete equations look the same. It is very easyto have an option in software to switch from finite volume to finite difference.

However, in my experience, the two methods are almost the same, except in some unusual situations. Particularly,when estimating VS,VSS (which we need for hedging purposes), you will find that the finite difference approach (usingforward differencing if necessary) gives more reliable results. This is because, for an unequally spaced grid, the finitedifference forward difference is consistent (i.e. Taylor series error goes to zero as ∆S → 0) while the upstream weithtedfinite volume method is not consistent (for an unequally spaced grid). Actually, you can show that the finite volumemethod is still convergent, but this takes a lot of work.

Bottom line: probably best to use finite difference with forward (or backward) differencing if necessary to maintainpositivity of αi,βi.

We’ll show some numerical examples later on in Section 24 showing the differences in using these two methods.

18.3 A Complete Explicit MethodTo summarize, in order to solve the Black-Scholes equation using an explicit finite volume or finite difference method,we would proceed as follows: first, a mesh would be defined, Si; i = 0, . . . ,m. We choose Sm sufficiently large sothat asymptotic boundary conditions can be used (e.g. for typical values of T,σ, if K is the strike, then Sm = 10K issufficient). Note that unequal mesh spacing should generally be used with a fine spacing near K, it being cheap to havea few nodes with very large cell sizes as S → Sm. Also, in general, Smax = K exp[σ

√T ] is a good rule of thumb.

At τ = 0 we have

V 0i = max(Si −K,0) ; call

= max(K −Si,0) ; put. (18.32)

We need boundary conditions at S0,Sm. These are

V nm = Sm ; call

= 0 ; put. (18.33)

At S = 0, (i = 0) the Black-Scholes equation reduces to

Vτ = −rV, (18.34)

which can be discretized asV n+1

0 = V n0 (1− r∆t). (18.35)

For all points i = 1, . . . ,m−1, we have the discrete equations

V n+1i = V n


i+1. (18.36)

Equations (18.36, 18.33, 18.35) constitute a complete specification of the discrete problem.

92

To summarize, the complete algorithm is given in equation (18.37).

V 0i = Payoff(Si)

For n = 0, . . .

V n+10 = V n

0 (1− r∆t)V n+1

m = V nm

For i = 1, . . . ,m−1V n+1

i = V ni (1− (αi +βi + r)∆τ)+αi∆τV n

i−1 +βi∆τV ni+1

EndForEndFor

(18.37)

19 StabilityConsider the explicit method for the Black-Scholes equation

V n+1i = V n


i+1. (19.1)

Generally speaking, the computed solution V ni will not be equal to the exact solution V exact(Si,τn) to the Black-Scholes

equation. This is due to

• truncation errors introduced at each step by the finite volume approximations

• round-off errors introduced since floating point arithmetic is not exact.

In certain cases, such errors can cause the recurrence equation (19.1) to become unstable, i.e. errors grow exponentiallywith n (the number of discrete timesteps). Stability is discussed in many standard texts on the numerical solution ofPDEs, so we will simply proceed in an informal way. A basic test for stability is to require that the discrete value ofthe option is finite for fixed T , but as the number of steps becomes large (we would be very surprised if the value ofthe option was infinite!). So, if we imagine taking the limit as ∆τ → 0, then the total number of steps N = T/∆τ → ∞as ∆τ → 0. Our requirement for stability is simply that for a finite computational domain (S ∈ [0,Smax]), V n

i remainsbounded as n → ∞.

In particular, to avoid some extra algebra due to boundary conditions, let’s assume that we are pricing a Europeanput option, with boundary condition

V nm = 0 ; put

Sm = Smax.

Consider equation (19.1). We will assume that upstream or central weighting is selected according to equation(17.12), so that

αi ≥ 0βi ≥ 0. (19.2)

From equations (19.1) and (19.2) we obtain

|V n+1i | ≤ |V n

i (1− (αi +βi + r)∆τ) |+ |V ni−1|αi∆τ+ |V n

i+1|βi∆τ (19.3)

Now if we assume that1− (αi +βi + r)∆τ ≥ 0 ; ∀i, (19.4)

equation (19.3) becomes

|V n+1i | ≤ |V n

i |(1− (αi +βi + r)∆τ)+ |V ni−1|αi∆τ+ |V n

i+1|βi∆τ. (19.5)

93

Let

‖V n‖ = maxi

|V ni |.

Then equation (19.5) becomes

|V n+1i | ≤ ‖V n‖(1− r∆τ)

≤ ‖V n‖, (19.6)

which is true for any i. In particular, we can choose i = i∗ in equation (19.6) so that

|V n+1i∗ | = max

i|V n+1

i | = ‖V n+1‖,

meaning that equation (19.6) becomes

‖V n+1‖ ≤ ‖V n‖. (19.7)

Now, assuming that ‖V 0‖ is bounded and also that the boundary conditions are bounded ∀n, equation (19.7) says thatthe discrete solution is bounded as n → ∞. In other words, the discretization is stable. The key stability condition isequation (19.4), which can be rewritten as

∆τ ≤ 1αi +βi + r ; ∀i, (19.8)

or

∆τ ≤ mini

(

1αi +βi + r

)

.

Note that the above argument establishes that condition (19.8) is sufficient for stability. It can also be shown thatcondition (19.8) is a necessary condition.

Assume for ease of exposition that

∆Si+1/2 = ∆Si−1/2 = ∆S.

Then condition (19.8) becomes, from equation (17.4),

∆τ ≤ 1σ2S2

i(∆S)2 + r

' (∆S)2

σ2S2i

; ∆S → 0. (19.9)

This timestep limitation can be quite severe, especially if

• local refinement of the mesh is used near barriers, or the strike of a digital option. In this case, the stable timestepbecomes extremely small.

• we are interested in pricing long-term options, or bonds. Since the solution smooths out after a short time, largetimesteps would be desirable, since the truncation error is small. However, if we use an explicit method, thenwe still have to use small timesteps, even though nothing is going on! Note that a binomial or trinomial latticemethod is algebraically identical to an explicit finite difference scheme, with a maximum stable timestep.

94

19.1 Implicit MethodsAn alternative which avoids the timestep restriction due to an explicit discretization is an implicit method. Let’s writethe Black-Scholes equation as

Vτ = LV

LV ≡ σ2S2

2 VSS + rSVS− rV . (19.10)

Suppose we use either finite difference or finite volume method to discretize LV , so that an explicit method can bewritten as

[

V n+1i −V n

i∆τ

]

= (LV )ni . (19.11)

Another possibility is to evaluate the right hand side of equation (19.11) implicitly, i.e.[

V n+1i −V n

i∆τ

]

= (LV )n+1i . (19.12)

Both implicit and explicit methods have global time truncation error of size O(∆τ). We can obtain an higher level ofaccuracy in terms of convergence as ∆τ → 0 by using a weighted average for the right hand side of equation (19.12).This is called Crank-Nicolson timestepping:

[

V n+1i −V n

i∆τ

]

=12(

(LV )n+1i +(LV )n

i)

. (19.13)

After some algebra, we can rewrite equations (19.12) and (19.13) as (for both finite difference and finite volumemethods)

Implicit

V n+1i = V n

i +∆τ[

αi(V n+1i−1 −V n+1

i )+βi(V n+1i+1 −V n+1

i )− rV n+1i]

(19.14)

Crank-Nicolson

V n+1i = V n

i +∆τ2[

αi(V n+1i−1 −V n+1

i )+βi(V n+1i+1 −V n+1

i )− rV n+1i]

+∆τ2[

αi(V ni−1 −V n

i )+βi(V ni+1 −V n

i )− rV ni]

(19.15)

where αi,βi are defined for either central or upstream weighting as in equations (17.4) or (17.10), and we assume thatprocedure (17.12) is used so that αi ≥ 0;βi ≥ 0.

19.2 Stability of Implicit MethodsWe can analyze the stability of the fully implicit method (19.14) using an approach similar to the explicit case. Wecan write equation (19.14) as

V n+1i [1+∆τ(r +αi +βi)] = V n

i +∆ταiV n+1i−1 +∆τβiV n+1

i+1 . (19.16)

It follows from equation (19.16), noting that αi,βi,r are all nonnegative, that

|V n+1i | [1+∆τ(r +αi +βi)] ≤ |V n

i |+ |Vn+1i−1 |∆ταi + |V n+1

i+1 |∆τβi. (19.17)

Letting

‖V n‖ = maxi

|V ni |,

95

equation (19.17) gives|V n+1

i | [1+∆τ(r +αi +βi)] ≤ ‖V n‖+∆τ(αi +βi)‖V n+1‖. (19.18)

Since equation (19.17) is true for all i, we can choose i∗ such that

|V n+1i∗ | = max

i|V n+1

i |,


‖V n+1‖ [1+∆τ(r +αi∗ +βi∗)] ≤ ‖V n‖+∆τ(αi∗ +βi∗)‖V n+1‖

⇒ ‖V n+1‖ ≤ ‖V n‖1+ r∆τ

≤ ‖V n‖. (19.19)

Therefore a fully implicit method is unconditionally stable, i.e. there is no restriction on timestep size. Note that afully implicit method can also be written as

V n+1i =

V ni +∆ταiV n+1

i−1 +∆τβiV n+1i+1

1+∆τ(r +αi +βi). (19.20)

The coefficients of V n+1i−1 ,V n+1

i+1 ,V ni on the right hand side of equation (19.20) are nonnegative, and sum to less than

one. Why is this significant? Let

V maxi = max(V n+1

i−1 ,V n+1i+1 ,V n

i )

V mini = min(V n+1

i−1 ,V n+1i+1 ,V n

i ).

Then it follows from equation (19.20) that

V n+1i ≤ V max

i1+∆τ(αi +βi)

1+∆τ(r +αi +βi)

V n+1i ≥ V min

i1+∆τ(αi +βi)

1+∆τ(r +αi +βi). (19.21)

Now, if V 0i ≥ 0, then, if we use a fully implicit method with αi,βi ≥ 0m, then V n

i ≥ 0. This makes financial sense:nonnegative payoffs should always result in nonnegative option values. In this case, V max

i ≥ 0 and V mini ≥ 0, so that

equation (19.21) implies

V mini

1+ r∆τ≤V n+1

i ≤V maxi . (19.22)

This can be interpreted as follows: the maximum and minimum values of V n+1i are bounded by the maximum and

minimum values of the neighboring grid values (at the new timestep), and the value at node i at the old timestep (apartfrom a discounting effect). This means that spurious oscillations cannot occur in the discrete solution. We will havemore to say about this later.

For the explicit discretization

V n+1i = V n


i+1,

we can see that if the stability condition

1− (αi +βi + r)∆τ ≥ 0

is satisfied (and αi,βi are nonnegative), then this is also a positive coefficient discretization, i.e. the coefficients ofV n

i ,V ni−1,V n

i+1 are nonnegative and sum to less than one (1− r∆τ in this case).

96

19.3 Stability of the Crank-Nicolson MethodFor the Crank-Nicolson method, we must proceed differently. We can write equation (19.15) as

V n+1i + V n+1

i

[

∆τ2 (αi +βi + r)

]

− V n+1i−1 (

∆ταi2 )−V n+1

i+1 (∆τβi

2 )

= V ni −

[

V ni (

∆τ2 )(αi +βi + r)−Vn

i−1(∆ταi

2 )−V ni+1(

∆τβi2 )

]

. (19.23)

Let

V n+1 =

V n+10

V n+11|

V n+1m

; V n =

V n0

V n1|

V nm

, (19.24)

and let M be the tridiagonal matrix with entries

[MV n]i = −∆ταi2 V n

i−1 +∆τ(αi +βi + r)

2 V ni − ∆τβi

2 V ni+1. (19.25)

Note that equation (19.25) is valid for i 6= 0,m, since the boundary conditions will alter the first and last rows of M.In particular, note that the first row of M will satisfy equation (19.25) with αi,βi = 0. If we impose the boundarycondition V n+1

m = V nm, then the last row of M is identically zero).

Combining equations (19.25) and (19.23) we obtain

[I + M]V n+1 = [I − M]V n. (19.26)

If we define M = 2M, then a fully implicit method can be written as

[I +M]V n+1 = V n. (19.27)

We can write fully implicit or Crank-Nicolson timestepping in one equation

[I +(1−θ)M]Vn+1 = [I−θM]V n (19.28)

where θ = 1/2 for Crank-Nicolson, and θ = 0 for fully implicit.Technical Point: Note that equation (19.25) is valid for i 6= 0,m, since the boundary conditions will alter the first

and last rows of M. Since the Black-Scholes equation reduces to Vτ =−rV at S = 0, then the first row of M will satisfyequation (19.25) with αi,βi = 0. If we impose the boundary condition V n+1

m = V nm = Vspeci f ied , then the last row of M

is also identically zero). If we are solving an equation for interest rate derivatives, with mean reverting processes, wehave to be careful about what we do at r = 0 (use forward differencing at this point).

Now, we can show that the Crank-Nicolson method satisfies the necessary conditions for stability. This makes itat least plausible that C-N is unconditionally stable (but it’s not a proof). One can use other methods to show that C-Nsatisfies sufficient conditions for stability as well (at least for simple problems), but that would take us a bit far afieldhere.

We need to review some linear algebra. The matrix M is of size (m+1)× (m+1), where there are m+1 nodes inthe discretization. We assume that M has m+1 linearly independent eigenvectors. (The case where the eigenvectorsare not all linearly independent can also be treated, but the algebra gets more complicated. See any text on numericalsolution of ODEs).

If Xk is the k′th eigenvector of M, then

MXk = λkXk

where λk is the eigenvalue associated with Xk. Now, equation (19.26) can be written

V n+1 = [I + M]−1[I− M]V n

⇒ V n+1 =[

[I + M]−1[I − M]]n+1V 0. (19.29)

97

Assuming M has a complete set of eigenvectors, then

V 0 = ∑k

CkXk (19.30)

for some coefficients Ck. Equations (19.29) and (19.30) then give

V n+1 = ∑k

Ck

[

1−λk1+λk

]n+1X0

k . (19.31)

In order for equation (19.31) to remain bounded at n → ∞, we need∣

∣

∣

∣

1−λk1+λk

∣

∣

∣

∣

≤ 1 ; ∀k (19.32)

which will be true as long as

Re(λk) ≥ 0 ; ∀k. (19.33)

Note that M has the properties

Mii ≥ 0|Mii| ≥ ∑

j 6=i|Mi j |, (19.34)

i.e. M has non-negative diagonal entries and is row diagonally dominant. (Note that the last row of M is identicallyzero.

From the Gershgorin circle theorem, we know that every eigenvalue λ of M satisfies at least one of the inequalities

|Mii −λ| ≤ ∑j 6=i

|Mi j |

≤ |Mii|

since M is diagonally dominant. In the complex plane λ = x+√−1y, this means that every λk lies in some disk with

center x = |Mii| and radius strictly less than |Mii|, and therefore

Re(λk) ≥ 0 ; ∀k.

(Note that there is a degenerate case where λi ≡ 0, corresponding to the last row). Thus Crank-Nicolson timesteppingsatisfies the necessary conditions for unconditional stability. It takes a bit of work to show that Crank-Nicolson alsosatisfies the sufficient conditions as well. But this also follows from the fact that the Gershgorin circles are in the righthalf complex plane [61].

However, note that the Crank-Nicolson discretization can be written as

V n+1i

(

1+∆τ2 (αi +βi + r)

)

= (V ni−1 +V n+1

i−1 )αi∆τ

2 +(V ni+1 +V n+1

i+1 )βi∆τ

2

+ V ni (1− ∆τ

2 (αi +βi + r)). (19.35)

This implies that Crank-Nicolson timestepping will not be a positive coefficient discretization unless

∆τ ≤ 2αi +βi + r ,

which is twice the size of the maximum stable explicit timestep size. So, although C-N is unconditionally stable (i.e.we can take any timestep size and the method will be stable), it is a positive coefficient method only if the timestep isof the order of the stable explicit timestep size.

98

In many cases, this lack of positivity does not matter because there is sufficient diffusion in the problem to smoothout any difficulties. However, in some cases (e.g. barriers, digital payoffs, callable bonds with a notice period), if C-Ntimestepping is used with a timestep is larger than the explicit timestep, then oscillations can be introduced into thesolution. Even though these oscillations may be small if we look at the option value, they become magnified whencomputing the option delta and gamma [101]. Remember, when using a positive coefficient method, these oscillationsare guaranteed not to occur. However, it may the case that if we use a scheme (such as C-N) which is not a positivecoefficient method, the solution might not be oscillatory, but we can’t be sure. We would like, as much as possible, touse a second order method like C-N, since we get faster convergence compared to a first order fully implicit method.

We will have some suggestions about how to suppress oscillations in C-N timestepping in a later section. For now,let’s review some key ideas of positive coefficient discretizations.

19.3.1 More on StabilityActually, the whole idea of stability becomes quite tricky for discretizations which result in nonsymmetric matrices.Recall that the basic equation for C-N timestepping is

[I + M]V n+1 = [I − M]V n. (19.36)

which we can rewrite as

V n+1 = BV n

B = [I + M]−1[I− M] (19.37)

or

V n = BnV 0

. (19.38)

So, a precise definition of stability is

‖V n‖ = bounded (19.39)

independent of n and the dimension of B. The notation ‖.‖, refers to a vector norm (i.e. l2, l∞, etc.) Let the size of Bbe s× s. As we refine the mesh, then s → ∞, and as we reduce ∆τ, then n → ∞. So, we would like equation (19.39)to hold as s,n → ∞. If we didn’t let s → ∞, then the eigenvalue condition discussed in the previous Section would besufficient.

If B is symmetric, then we can decompose V 0 into a linear combination of a set of orthogonal eigenvectors, so anecessary and sufficient condition for stability is that all eigenvalues of B be less than one in magnitude. We couldalso decompose any errors (i.e. roundoff, etc) into a linear combination of eigenvectors as well, so these errors wouldalso remain bounded too.

However, if B is nonsymmetric, the eigenvector argument fails. The eigenvalue condition is necessary, but notsufficient. An alternative method for anlysing stability is the von Neumann method, which uses a discrete FourierTransform to analyse stability for each Fourier frequency (see any text on numerical solution of PDEs), but thismethod only works for constant coefficient problems.

In fact, it is very hard to prove condition (19.39), so the idea of algebraic stability has been introduced in [61, 78,43, 79, 17, 16]. In this case, we seek to show that

‖Bn‖ ≤ Const. nαsβ (19.40)

where the constant is independent of n,s. This is often referred to as bounding the power of a matrix. Note that (19.40)implies that

‖V n‖ ≤ ‖V 0‖Const. nαsβ. (19.41)

In many cases one can show that α,β are less than one, so that we have a very mild growth of errors (i.e. algebraic,not exponential). If equation (19.40) is satisfied, then convergence can be shown, at least in some cases, as ∆τ,∆S → 0[43]. In [43], there is an interesting discussion of the relationship of the idea of algebraic stability, with the morecommon idea of Lax stability.

Using the results in [61], we can show that

99

• Crank-Nicoloson methods satisfy equation (19.40) with β = 0, α = 1/2.

assuming that we impose known value boundary conditions on the solution as S → ∞.

19.4 Crank-Nicolson von Neumann Analysis: Log TransformationIn the previous section, we indicated that Crank-Nicolson timestepping can be shown to be algebraically stable. Inpractice, Crank-Nicolson appears to satisfy stronger stability conditions, even if the coefficients of the BS equation arenot constant.

In this section, we will carry out a Von Neumann stability analysis for Crank-Nicolson timestepping for the specialcase of constant coefficients and an equally spaced grid in logS coordinates.

The Black-Scholes equation can be written as

Vτ =12σ2S2VSS + rSVS− rV (19.42)

Using the change of variablex = log(S), S = exp(x),

and substituting into (19.42), we find

V τ =12σ2V xx +(r− 1

2σ2)V x − rV (19.43)

where V (x,τ) = V (exp(x),τ).A Crank-Nicolson discretization of equation (19.43) is

V n+1i −V n

i∆τ

=

12

[

12σ2

(

V n+1i+1 −2V n+1

i +V n+1i−1

∆x2

)

+ (r− 12σ2)

V n+1i+1 −V n+1

i−12∆x − rV n+1

i

]

+12

[

12σ2

(

V ni+1 −2V n

i +V ni−1

∆x2

)

+ (r− 12σ2)

V ni+1 −V n

i−12∆x − rV n

i

]

(19.44)

Equation (19.44) can be written as

V n+1i

[

1+(α+β+ r)∆ τ2

]

− ∆τ2 βV n+1

i+1 − ∆τ2 αV n+1

i−1

= V ni

[

1− (α+β+ r)∆τ2

]

+∆τ2 βV n

i+1 +∆τ2 αV n

i−1 (19.45)

where

α =12σ2 1

∆x2 − (r− 12σ2)

12∆x ,

β =12σ2 1

∆x2 +(r− 12σ2)

12∆x .

19.4.1 von Neumann Stability Analysis: The Details

Let V n = [V n0,V

n1, ...,V

nM]′, be the discrete solution vector to equation (19.45). Suppose the initial solution vector is

perturbed, i.e.

V 0 = V 0 +E0 (19.46)

100

where En = [En0 , ...,En

M]′ is the perturbation vector. Note that EnM = 0 since Dirichlet boundary conditions are imposed

at this node.Note that V satisfies an equation of the form

V n+1i

[

1+(α+β+ r)∆ τ2

]

− ∆τ2 βV n+1

i+1 − ∆τ2 αV n+1

i−1

= V ni

[

1− (α+β+ r)∆τ2

]

+∆τ2 βV n

i+1 +∆2 ταV n

i−1 (19.47)

Our objective here is to show that as the number of steps becomes large, the effect of any initial perturbation isbounded. This initial perturbation could be due to roundoff or discretization error.

Subtracting equations (19.47) and (19.45), noting that Eni = V n

i −V ni , gives the following equation

En+1i

[

1+(α+β+ r)∆ τ2

]

− ∆τ2 βEn+1

i+1 − ∆τ2 αEn+1

i−1

= Eni

[

1− (α+β+ r)∆τ2

]

+∆τ2 βEn

i+1 +∆τ2 αEn

i−1 (19.48)

In the following we determine the stability of our discretization scheme using the von Neumann approach. Inorder to apply the Fourier transform method, we suppose that the boundary conditions can be replaced by periodicconditions.

We define the inverse discrete Fourier transform (DFT) as follows (we have selected a particular scaling factor)

Eni =

1XN

N2

∑k=− N

2 +1(Ck)

n exp(√−12π

N ik), (19.49)

where Ck corresponds to the discrete Fourier coefficient of E, and XN = xN/2 − x−N/2+1 is the width of the domainalong the x-axis. Note that the notation (Ck)

n is interpreted as Ck to the power n (n is not a superscript). .Using the discrete orthogonality condition

N2

∑j=− N

2 +1exp(

√−12π

N jk)exp(−√−12π

N jl) =

N if l = k0 otherwise, (19.50)

we can deduce that the forward transforms are

(Ck)n =

XNN

N2

∑i=− N

2 +1En

i exp(−√−12π

N ik), (19.51)

Substituting equation (19.49) into (19.48)

1XN

N2

∑k=− N

2 +1(Ck)

n+1 exp(√−12π

N ik)[

1+(α+β+ r)∆τ2

]

− ∆τ2 β

1XN

N2

∑k=− N

2 +1(Ck)

n+1 exp(√−12π

N (i+1)k)− ∆τ2 α

1XN

N2

∑k=− N

2 +1(Ck)

n+1 exp(√−12π

N (i−1)k)

=

1XN

N2

∑k=− N

2 +1(Ck)

n exp(√−12π

N ik)[

1− (α+β+ r)∆τ2

]

+∆τ2 β

1XN

N2

∑k=− N

2 +1(Ck)

n exp(√−12π

N (i+1)k)

+∆τ2 α

1XN

N2

∑k=− N

2 +1(Ck)

n exp(√−12π

N (i−1)k)

(19.52)

101

Note that the ‖‖2 norm of En is given by

‖En‖2 =

N2

∑j=− N

2 +1En

j (Enj )

∗ (19.53)

where (Enj )

∗ denotes the complex conjugate. From (19.49) we have

N2

∑j=− N

2 +1En

j (Enj )

∗ =1

X2N

N2

∑j=− N

2 +1

N2

∑k=− N

2 +1

N2

∑p=− N

2 +1(Ck)

n((Cp)n)∗ exp(

√12π

N jk)exp(−√

12πN jp)

(19.54)

which after noting (19.50), becomes

‖En‖2 =NX2

N

N2

∑k=− N

2 +1(|Ck|2)n (19.55)

Consequently, in order for ‖En‖2 to be bounded, we require that each Fourier coefficient |Ck|2n be bounded as n → ∞.This means that we must have

|Ck| ≤ 1 (19.56)

We can then look at each Fourier component Ck separately. Equation (19.52) becomes

(Ck)n+1 exp(

√−12π

N ik)[

1+(α+β+ r)∆τ2

]

− ∆τ2 β(Ck)

n+1 exp(√−12π

N (i+1)k)− ∆τ2 α(Ck)

n+1 exp(√−12π

N (i−1)k)

=

(Ck)n exp(

√−12π

N ik)[

1− (α+β+ r)∆τ2

]

+∆τ2 β(Ck)

n exp(√−12π

N (i+1)k)

+∆τ2 α(Ck)

n exp(√−12π

N (i−1)k)

(19.57)

Dividing equation (19.57) by (Ck)n exp(

√−1 2π

N ik), we obtain

Ck

[

1+(α+β+ r)∆τ2

]

− ∆τ2 βCk exp(

√−12π

N k)− ∆τ2 αCk exp(−

√−12π

N k)

=[

1− (α+β+ r)∆τ2

]

+∆τ2 βexp(

√−12π

N k)+∆τ2 αexp(−

√−12π

N k)+∆τ2 (19.58)

Factoring the Ck term, equation (19.58) becomes

Ck =

[

1− (α+β+ r)∆τ2]

+ ∆τ2 βexp(

√−1 2π

N k)+ ∆τ2 αexp(−

√−1 2π

N k)[

1+(α+β+ r)∆τ2]

− ∆τ2 βexp(

√−1 2π

N k)− ∆τ2 αexp(−

√−1 2π

N k)(19.59)

Recall that

α =12σ2 1

∆x2 − (r− 12σ2)

12∆x ,

β =12σ2 1

∆x2 +(r− 12σ2)

12∆x .

102

It follows that

α+β+ r =12σ2 1

∆x2 − (r− 12σ2)

12∆x +

12σ2 1

∆x2 +(r− 12σ2)

12∆x + r

= σ2 1∆x2 + r

and

∆τβexp(√−12π

N k) + ∆ταexp(−√−12π

N k) =

∆τ12σ2 1

∆x2

[

exp(√−12π

N k)+ exp(−√−12π

N k)]

+∆τ(r− 12σ2)

12∆x

[

exp(√−12π

N k)− exp(−√−12π

N k)]

= ∆τσ2 1∆x2 cos(2π

N k)+√−1∆τ

1∆x (r− 1

2σ2)sin(2πN k)

Using the above results in equation (19.59), we find

Ck =[

1− ( σ2

∆x2 + r)∆τ2

]

+ 12

[

∆τσ2

∆x2 cos( 2πN k)+

√−1 ∆τ

∆x (r− 12 σ2)sin( 2π

N k)]

[

1+ 12( σ2

∆x2 + r)∆τ]

− 12

[

∆τσ2∆x2 cos( 2π

N k)+√−1 ∆τ

∆x (r− 12 σ2)sin( 2π

N k)] . (19.60)

Equation (19.60) gives

|Ck|2 =[

1− ( σ2

∆x2 + r)∆τ2 + ∆τσ2

2∆x2 cos( 2πN k)

]2+[ ∆τ

2∆x (r− 12 σ2)sin( 2π

N k)]2

[

1+( σ2

∆x2 + r)∆τ2 − ∆τσ2

2∆x2 cos( 2πN k)

]2+[ ∆τ

2∆x (r− 12 σ2)sin( 2π

N k)]2

(19.61)

or

|Ck|2 =[

1− r− ∆τσ2

2∆x2 (1− cos( 2πN k))

]2+[ ∆τ

2∆x (r− 12 σ2)sin( 2π

N k)]2

[

[1+ r + ∆τσ2

2∆x2 (1− cos( 2πN k))

]2+[ ∆τ

2∆x (r− 12 σ2)sin( 2π

N k)]2

(19.62)

It then follows that (∀k ∈ −N/2+1, ...,+N/2)

|1+ r +∆τσ2

2∆x2 (1− cos(2πN k))| ≥ |1− r− ∆τσ2

2∆x2 (1− cos(2πN k))|

(19.63)

and consequently |Ck| < 1,∀k. Hence Crank-Nicolson timestepping is unconditionally stable.

19.5 Positive Coefficient DiscretizationsWe will proceed a bit more formally here. Suppose we have an explicit discretization:

V n+1i = αi

iV ni + ∑

j∈ηi

αijV n

j ,

ηi = i−1, i+1. (19.64)

103

Note that the αi are not necessarily the same αi as we derived for the Black-Scholes equation, they are simply genericcoefficients. Suppose that

αii + ∑

j∈ηi

αij = 1

αii > 0

αij ≥ 0. (19.65)

We will call equation (19.64) with conditions (19.65) a positive coefficient discretization. Two properties of this are:

Property 1

V n+1i ≤ max

j∈ηi[V n

i ,V nj ]

V n+1i ≥ min

j∈ηi[V n

i ,V nj ]. (19.66)

This follows since, if we let

Ui = maxj∈ηi

[V ni ,V n

j ],

then from equations (19.64,19.65) we have

V n+1i ≤ αi

iUi + ∑j∈ηi

αijUi

= Ui

[

αii + ∑

j∈ηi

αij

]

= Ui. (19.67)

Similarly,

V n+1i ≥ Ui

Ui = minj∈ηi

[V ni ,V n

j ].

Property 2V n+1

i −V ni = ∑

j∈ηi

αij(V n

j −V ni ). (19.68)

We can prove this as follows. Since

V n+1i = αi

iV ni + ∑

j∈ηi

αijV n

j

αii = 1− ∑

j∈ηi

αij

⇒ V n+1i = ∑

j∈ηi

αijV n

j +

(

1− ∑j∈ηi

αij

)

V ni , (19.69)

so thatV n+1

i −V ni = ∑

j∈ηi

αij(V n

j −V ni ). (19.70)

Property 1 implies that the value of V n+1i is bounded by the values V n

i−1,V ni ,V n

i+1. Property 2 says that if V ni is a local

maximum at τn, then V n+1i < V n

i , and similarly, if V ni is a local minimum at τn, then V n+1

i > V ni . In other words, local

maxima are not increased, and local minima are not decreased. This is called a Local Extremum Diminishing (LED)property. (This is actually a misnomer; it would be more correct to call this Local Extremum Non-increasing). In anycase, a positive coefficient method will not generate spurious oscillations.

104

More generally, suppose we have a method

V n+1i = αi

iV ni + ∑

j∈ηi

αijV n+1

j + ∑j∈ηi

βijV n

j , (19.71)

where a suitable choice of α,β will generate an explicit, fully implicit, or C-N method. If

αii > 0 ; αi

j ≥ 0 ; βij ≥ 0

αii +∑ j∈ηi αi

j +∑ j∈ηi βij = 1 , (19.72)

then we obtain

V n+1i ≤ max

j∈ηiV n

i ,V n+1j ,V n

j

V n+1i ≥ min

j∈ηiV n

i ,V n+1j ,V n

j

V n+1i −V n

i =1

(

1− ∑j∈ηi

αij

)

[

∑j∈ηi

αij(V n+1

j −V n+1i )+ ∑

j∈ηi

βij(V n

j −V ni )

]

(19.73)

so that this discretization is LED. Note that we could have α,β as functions of V n+1j ,V n

i , etc. In other words, we couldhave nonlinear coefficients, and if α,β were positive, then the scheme would be LED. This is the basis of nonlinearflux limiters.

19.6 Positive Coefficient Methods and the Black-Scholes EquationNote that we do not quite have all the necessary conditions for a positive coefficient method (as we have defined ithere). For example, if we use a fully implicit method (see equation (19.14))

V n+1i =

V ni +∆ταiV n+1

i−1 +∆τβiV n+1i+1

[1+∆τ(r +αi +βi)], (19.74)

the coefficients are positive if we use central and upstream weighting as appropriate, but the sum of coefficients on theright hand side of equation (19.74) is

sum of coeffs =1+∆ταi +∆τβi

[1+∆τ(r +αi +βi)]

≤ 1 if r > 0.

However, we can write equation (19.74) as

V n+1i =

V ni

1+ r∆τ+

∆ταi1+ r∆τ

(V n+1i−1 −V n+1

i )+∆τβi

1+ r∆τ(V n+1

i+1 −V n+1i ).

In this case we can see that a fully implicit finite volume or finite difference discretization of the Black-Scholes equa-tion satisfies a discounted LED property. In other words, local extrema are non-increasing relative to the discountedfuture value (recall that we are working backward in time). This result is financially reasonable.

From now on, we will describe any discretization of the Black-Scholes equation which has the property that

• the coefficients of V ni ,V n+1

j ,V nj are nonnegative and sum to less than one

as a positive coefficient discretization. Strictly speaking, this is not positive coefficient in the usual sense, since weallow the coefficients to sum to less than one. However, in finance, this is simply the effect of the discounting term. Apositive coefficient method as described above will not cause spurious oscillations, which is our main concern.

105

19.7 Technical Point: M -matricesRecall that we can define the vectors

V n+1 =

V n+10

V n+11|

V n+1m

; V n =

V n0

V n1|

V nm

, (19.75)

and let M be the tridiagonal matrix with entries

[MV n]i = −∆ταiV ni−1 +∆τ(αi +βi + r)V n

i −∆τβiV ni+1. (19.76)

then fully implicit timestepping can be written as

[I +M]V n+1 = V n . (19.77)

Before proceeding further, note the following definition

Definition 1 (M matrix) A matrix Q is an M matrix if

[Q]ii > 0 ; ∀i[Q]i j ≤ 0 ; i 6= j (19.78)

and

[Q]ii ≥ −∑j 6=i

[Q]i j ; ∀i , (19.79)

with strict inequality in at least one row.

An M matrix has the following interesting property

Theorem 19.1 (Inverse of an M matrix) If a matrix Q is an M matrix, then

Q−1 ≥ 0diag(Q−1) > 0 . (19.80)

In other words, all entries of the inverse of an M matrix are nonnegative, and the diagonal entries of the inverse of anM matrix are strictly positive. You can find out everything you ever wanted to know about M matrices in [85].

Provided that we use forward/backward differencing as appropriate to ensure a positive coefficient discretization,then we have that [I +M] in equation (19.76) is an M matrix.

Suppose we have two exact solutions of the Black Scholes equation, V (S,τ),W (S,τ), then the following propertymust hold:

Theorem 19.2 (Arbibrage Inequality) If V (S,0)≥W (S,0) then

V (S,τ) ≥W (S,τ) ∀τ (19.81)

This result is in fact model independent. It must hold for all contingent claims (options), otherwise there is anarbitrage opportunity.

Of course, we would like to ensure that the arbitrage inequality holds for our discrete solutions as well. Let V n andW n be the vectors of discrete solutions. Then, we have the following result

Theorem 19.3 (Discrete Arbitrage Inequality) If fully implicit timestepping is used, and the difference scheme en-sures that [I +M] is an M matrix, then if V n ≥W n, then V n+1 ≥W n+1.

106

Proof . Note that W n+1,V n+1 satisfy

[I +M]W n+1 = W n (19.82)

and

[I +M]V n+1 = V n . (19.83)

Subtracting equation (19.82 from equation (19.83 gives

[I +M] (V n+1 −W n+1) = (V n −W n) (19.84)

or

(V n+1 −W n+1) = [I +M]−1 (V n −W n)

≥ 0 . (19.85)

Once again, this illustrates the importance of a positive coefficient discretization. Of course, this discrete arbitrageinequality will not hold (in general) for Crank-Nicolson timestepping or second order BDF.

19.8 Suppressing Oscillations in Crank-Nicolson TimesteppingAs we have seen, C-N timestepping can have problems if the timestep is larger than the explicit timestep size sinceC-N is not a positive coefficient method. Normally, this problem is not severe. However, the following situations cancause difficulties:

• digital payoffs

• barriers

• callable bonds with a notice period

• computation of delta and gamma at the strike (for vanilla payoffs).

In the above cases, we can often observe slow convergence (not at the theoretically expected second order rate), orobvious spurious oscillations. However, finite element analysis comes to our rescue here. In the constant coefficientcase, a finite volume method with central weighting is identical to a finite element discretization using linear basisfunctions and mass lumping. We can then appeal to some results from finite element analysis for our purposes.

In [87, 72], solution of parabolic equations having rough initial conditions is discussed in detail. If we assume thatwe are taking constant timesteps, then the following simple techniques can be used to restore second order convergence(and suppress oscillations):

• the initial conditions must be l2 projected onto the space of basis functions. In our case, this means that theinitial condition should be approximated by continuous, piecewise linear basis functions. This corresponds tothe folklore about smoothing initial conditions. However, note the following:

– if we have a vanilla put/call payoff, which has a discontinuous derivative at the strike, and we have a nodeat the strike, no smoothing is required because we have a piecewise linear representation of the initialcondition, consistent with the implied basis functions used in the finite volume method. This also explainswhy binomial lattice methods have odd convergence behavior (there is no node at the strike sometimes),while trinomial methods are better behaved.

– in the case of a discontinuous initial condition, since this is not in our space of continuous piecewise linearbasis functions, we have to smooth the initial condition. A very simple remedy, if we use an equal gridspacing near the point of discontinuity, is to replace the undefined value at that node by one-half the sumof the limits from the left and right (see Figure 19.1). (This is not the same as a projection, which is moremathematically correct, but works in simple cases).

107

FIGURE 19.1: Dots represent smoothed payoff for digital option.

• after each non-smooth initial state, we take two fully implicit timesteps, and then use C-N thereafter (payoffswith discontinuous derivatives qualify as non-smooth).

The above techniques are not guaranteed to produce oscillation free results, but they are guaranteed to produce sec-ond order convergence. Intuitively, we can see that the initial smoothing, followed by two steps of fully implicit (whichis guaranteed to not produce oscillations) smooths out the solution enough that C-N does not cause any problems. Itis remarkable that only two steps of fully implicit timestepping is required.

In practice, say for a digital payoff, the above method works remarkably well. Using a smoothed initial state,followed by two fully implicit steps, and C-N thereafter, converges at an observable second order rate to the exactsolution, at the strike price.

However, in cases where discontinuities are applied frequently, say with daily discrete barrier observations, theremay not be enough timesteps between observations to smooth out the solution sufficiently. In these cases, we usuallyhave to use a fully implicit method and be satisfied with first order convergence.

19.9 Projection of Initial ConditionsIn order to smooth the payoff if discontinuous, we project the initial conditions onto the space of linear basis functions[72, 87].

Note that a projection operator P applied to some initial data f , i.e. P f has the nice property that P2 f = P f , sothere is no question about when the data is smooth enough. This method is a bit more complicated than the averagingtechnique described in the last section, but can be generalized to more complex (i.e. two and three factor) situationseasily.

Let Ni be the linear basis functions associated with node i.

Ni = 0 ; S < Si−1

=S−Si−1Si −Si−1

; Si−1 ≤ S ≤ Si

=Si+1 −SSi+1−Si

; Si ≤ S ≤ Si+1

= 0 ; S > Si+1

If f (S) is the payoff (or the initial state after the application of a barrier), then the actual initial state used in the

108

numerical algorithm is f = ∑ciNi where

∑j(Ni,N j)c j = ( f ,Ni) (19.86)

where (g,h) =

Z

gh dS (19.87)

Note that discontinuities should occur at the nodes, and that the integrals should be carried out so that they are exactfor quadratics.

Equation (19.86) is a set of linear equations to solve for the unknown c j. This set of equations always has asolution, and for single factor options, this is tridiagonal system, and so is easy to solve. Once the c j are known, thenwe the discrete initial data at node i is simply ci.

Some useful resultsZ Si+1

SiNiNi+1 dS =

(Si+1 −Si)

6Z Si+1

SiNiNi dS =

(Si+1 −Si)

3 (19.88)

Note that care must be taken when integratingZ Si+1

SiNi f dS

if f is discontinuous at at node. If Simpsons rule is used to evaluate the integral, then the correct limits (from the rightor left) must be used when evaluating nodal values of f . Simpson’s rule for evaluating

Z Si+1

Sig dS

isZ Si+1

Sig dS ' (Si+1 −Si)

6(

gi +4gi+1/2 +gi+1)

so thatZ Si+1

Sif Ni dS ' (Si+1 −Si)

6(

fi +2 fi+1/2)

Z Si+1

Sif Ni+1 ' (Si+1 −Si)

6(

2 fi+1/2 + fi+1)

19.10 The Problem With Crank-NicolsonWe can illustrate the basic problem that arises in a Crank-Nicolson method by looking at discretizing the simpleordinary differential equation

dPdt = −λP

λ > 0. (19.89)

Note that the exact solution to this equation is

P = P0e−λt . (19.90)

Also, observe that the exact solution has the following properties:

• P(t) > 0,∀t if P0 > 0.

109

• P(t) decreases to zero as t → ∞.

Suppose we take constant timesteps ∆t and let Pn = P(n∆t). Then if we discretize this equation with an explicitmethod, we obtain

Pn+1−Pn = −λ∆tPn

Pn+1 = (1−λ∆t)Pn

Pn+1 = (1−λ∆t)n+1P0. (19.91)

Therefore the maximum stable timestep requires λ∆t < 2.If we use a fully implicit method (also known as backward Euler) we obtain

Pn+1 −Pn = −λ∆tPn+1

Pn+1 =Pn

1+λ∆t

Pn+1 =1

(1+λ∆t)n+1 P0. (19.92)

Note that in equation (19.92) that sgn(Pn+1) = sgn(Pn), and that Pn+1 → 0 as λ∆t → ∞.Now, let’s discretize this ODE using Crank-Nicolson:

Pn+1−Pn = −λ∆t(

Pn+1

2 +Pn

2

)

Pn+1 =

1− λ∆t2

1+λ∆t2

Pn

Pn+1 =

1− λ∆t2

1+λ∆t2

n+1

P0. (19.93)

We can see that an immediate problem with Crank-Nicolson is that if λ∆t > 2 (larger than twice the explicit timestepsize), then sgn(Pn+1) = −sgn(Pn), which is a bit disturbing. As well, we can see that as λ∆t → ∞, |Pn+1| → P0.

In terms of propagation of errors, this means that a fully implicit method damps errors rapidly for large timesteps,while Crank-Nicolson does not have this property. A timestepping method which damps errors rapidly for largetimesteps (large here means large compared to an explicit step) is called strongly A-stable [77]. Crank Nicolson isnot strongly A-stable, which is a mathematical way of saying that oscillations in the solution can persist for manytimesteps.

Popular current methods for numerically solving ODEs for stiff systems are based on Backward Difference Formu-lae (BDF) methods. (A stiff system is one where the timestep required for accuracy is much larger than that requiredfor stability. Discretized PDEs generally result in stiff systems.) The simplest BDF method is Backward Euler (fullyimplicit). However, Backward Euler is only first order. The next higher order BDF method is a second order method.By using a Taylor series, we can obtain the following second order discretization of equation (19.90)

32Pn+1 −2Pn +

Pn−1

2 = −∆tλPn+1. (19.94)

Note that this second order BDF method is a multistep method, i.e. information is required at time levels n+1,n,n−1. The stability properties of equation (19.94) can be analyzed by substituting

Pn = µn (19.95)

into equation (19.94) to obtain

3µ2

2 −2µ+12 = −λ∆tµ2 (19.96)

110

The roots of a polynomial are a continuous function of the coefficients, so that as λ∆t → ∞, equation (19.96) becomes

µ2 = 0, (19.97)

implying that all roots of equation (19.96) tend to zero for large λ∆t. This means that errors are strongly dampedout for large timesteps. However, note that the second order BDF method is not a positive coefficient method. Thismeans that initial conditions can generate oscillations, but the strong damping property should cause these oscillationsto disappear fairly quickly. It would appear then that a second order BDF method would be a good candidate multisteptechnique for option pricing.

To see how to use a second order BDF method in the context of the Black-Scholes equation, define the discreteoperator (L)n

i as in equation (19.10) and let

∆τn+1 = τn+1 − τn

∆τn = τn − τn−1,

where we are allowing non-constant timesteps. Then a second order BDF method is obtained by using the followingapproximation for the time derivative [15]:

Ai

[

(

1+∆τn+1

∆τn

)

(V n+1i −V n

i )

∆τn+1 −(

∆τn+1

∆τn

)

V n+1i −V n−1

i∆τn+1 +∆τn

]

= (LV )n+1i . (19.98)

Since we need information at timesteps τn+1,τn,τn−1, a small fully implicit step is used to start this method.In our experiments, the BDF method does work well for European options. Unfortunately, it is our experience that

method (19.98) is very poor for complex American type options. An extensive study of BDF methods compared toC-N timestepping for shout options was carried out in [92]. Shout options have complex American-type constraints.The tests in [92] show very poor convergence of the BDF method (19.98) in this case. Intuitively, this is not surprising.The second order BDF method was derived assuming a certain smoothness in the time direction, so that informationfrom previous timesteps can be used to extrapolated information to future timesteps. However, this previous timestepinformation has no knowledge about how close the solution is to being constrained. Hence, at any nodes where anAmerican constraint is being turned on or off during a step, the solution will be very poor. Consequently, we do notrecommend the second order BDF method as a general technique.

19.11 A Simple Timestep SelectorConstant timesteps are usually quite inefficient. For example, small timesteps are usually required after a discretebarrier observation in order to resolve rapid solution gradients. Thereafter, timesteps can be rapidly increased. Longterm options (e.g. pension guarantees), and bond pricing typically require solution over periods of up to 30 years. Inthe case of bonds, for example, we have found that after the initial transients have died out, timesteps of up to one yearcan be taken.

A very simple timestep selector, which we have found to be very effective, is discussed in [51]. This method usesonly information from the current timestep to predict a suitable timestep adjustment for the next timestep. It employsa relative change criterion. Specifically, given an initial timestep ∆τn+1, then a new timestep ∆τn+2 is selected so that

∆τn+2 =

mini

dnorm|V (Si,τn +∆τn+1)−V(Si,τn)|

max(D, |V (Si,τn +∆τn+1)|, |V (Si,τn)|)

∆τn+1 (19.99)

where dnorm is a target relative change (during the timestep) specified by the user. The scale D is selected so thatthe timestep selector does not take an excessive number of timesteps in regions where the value is small (for optionsvalued in dollars, D = 1 is often used).

More sophisticated timestepping methods can be used which estimate the truncation error of a given step. However,these methods typically use higher derivative information (in the time direction). These methods seem to behave poorlyfor American options, due to the lack of solution smoothness as a function of time.

111

A classic alternative is to compute the solution at V n+1i using both a C-N and fully implicit method. The truncation

error is then estimated from the difference between these two results. The step is rejected if the estimated error is toolarge, or increased if the error is less than a tolerance. A successful step then requires twice the work of a standardC-N step. However, this procedure is completely automatic, and deserves some attention.

20 Constraints: American Options and BarriersSo far we have been discussing European options. Suppose we have to price an American-style option, where theholder can exercise at any time and receive a payoff

Payoff = V ∗(S, t).

This pricing problem can be formally stated as a linear complementarity problem

Vτ −(

σ2

2 S2VSS + rSVS− rV)

≥ 0 (20.1)

(V −V∗) ≥ 0, (20.2)

where one of equations (20.1) or (20.2) holds with strict equality.There are several ways to tackle the linear complementarity problem. For example, we could simply enforce the

constraint explicitly. If we use an implicit method for example, we first determine the solution at τn+1 as if there wasno constraint

V n+1i [1+∆τ(r +αi +βi)] = V n

i +∆ταiV n+1i−1 +∆τβiV n+1

i+1 , (20.3)

and then simply apply the constraint explicitly, before proceeding on to the next step:

For i=0,. . . ,mV n+1

i := max(V ∗i ,V n+1

i )EndFor (20.4)

This is similar to the usual approach in lattice methods. However, one problem with this is that the solution is incon-sistent. For example, the delta is not continuous across the early exercise boundary [20]. In addition, in some casesconvergence to the solution is demonstrably slower than with alternative methods [102].

Another approach is to solve the discrete linear complementarity problem by a relaxation method. For ease ofexposition, we will assume that a fully implicit method is being used, and that the relaxation parameter is unity. If(V n+1

i )k is the kth iteration for (V n+1i ), then the algorithm is

For k=1,. . . ,until convergedFor i=1,. . . ,m-1

(V n+1i )k+1 :=

V ni +∆ταi(V n+1

i−1 )k+1 +∆τβi(V n+1i+1 )k

[1+∆τ(r +αi +βi)]

(V n+1i )k+1 := max((V n+1

i )k+1,V ∗i )

EndForIf (converged) quit

EndForThis does converge, but it is very slow. Consider the case when there is no American constraint, and we are simplypricing a European option. In this case, we can factor and solve a tridiagonal matrix of size N in 5N multiply-divides.A lower bound for the complexity of a relaxation method for solving a European option would be to accelerate thismethod with a conjugate-gradient type acceleration. In this case, the condition number of the matrix would be

∆t(∆S)2 = O(

1∆t )

= O(N),

112

assuming that ∆t = O(∆S) = O(1/N) where N is the number of steps. Consequently, the number of iterations requiredto solve the matrix using a PCG acceleration would be O(N1/2). The total complexity for N steps would then beO(N5/2). This can be compared with O(N2) if the matrix is solved directly at each step.

It is possible to use LP type algorithms for solving the discrete linear complementarity problem directly [25],but these methods do not generalize easily to more than one dimension. Newton type methods appear to have goodconvergence and can be generalized to more than one dimension [20].

We will use a method which is closely related to the Newton methods in [20]. We convert the complementarityproblem into a nonlinear algebraic problem, through use of a penalty method [97]. The nonlinear equations are thensolved by Newton iteration (although the Jacobian is trivial to compute in this case).

20.1 The Penalty MethodThe penalty method outlined here has the advantages that

• it is trivially generalized to higher dimensions (multifactor options) [97]

• each iteration generates a well-behaved sparse matrix, which can be solved using standard direct [42] or iterative[24, 23, 50, 34, 75] methods (for multifactor options).

We can write equations (20.1) and (20.2) as a single equation

Vτ =σ2

2 S2VSS + rSVS− rV +Q(V,V ∗), (20.5)

where the penalty term Q(V,V ∗) satisfies

Q(V,V ∗) = 0 ; if V > V ∗

Q(V,V ∗) > 0 ; if V = V ∗. (20.6)

Intuitively, we can see how this might work. If Q(V,V ∗) = 0, then we have

V > V ∗

Vτ =σ2

2 S2VSS + rSVS− rV. (20.7)

On the other hand, if Q(V,V ∗) > 0, then Q(V,V ∗) is defined so that

|V −V∗| < ε ; ε 1

Vτ −(

σ2

2 S2VSS + rSVS− r(t)V)

= Q(V,V ∗) > 0. (20.8)

Consider a C-N timestepping method (with a finite volume or finite difference discretization) of the Black-Scholesequation with a discrete penalty term qn+1

i . Recall the definition of the discrete operator (LV )ni in equation (19.10).

Then, we can write the C-N timestepping as[

V n+1i −V n

i∆τ

]

=12(

(LV )n+1i +(LV )n

i)

+qn+1i , (20.9)

where the penalty term qn+1i is defined as

qn+1i =

1∆τ

(V ∗i −V n+1

i )Large ; if V n+1i < V ∗

i

= 0 ; otherwise, (20.10)

and where Large is a large number (we will define this precisely in terms of the convergence tolerance later).

113

Now, equations (20.9-20.10) represent a set of nonlinear algebraic equations due to the penalty term (20.10). So,how do we go about solving equations (20.9-20.10)? Let’s write the equations in matrix form

V n+1 =

V n+10

V n+11|

V n+1m

; V n =

V n0

V n1|

V nm

; V ∗ =

V ∗0

V ∗1|

V ∗m

, (20.11)

with[

MV n]row i = −∆ταi

2 V ni−1 +

∆τ(αi +βi + r)2 V n

i − ∆τβi2 V n

i+1 (20.12)

where αi,βi are as defined in equation (17.12). Note that the first and last rows of M will have to be modified to takeinto account the boundary conditions. Let the diagonal matrix P be given by

P(V n+1)ii = Large ; if V n+1i < V ∗

i= 0 ; otherwise

P(V n+1)i j = 0 ; if i 6= j. (20.13)

Then we can write equations (20.9-20.10) as[

I + M + P(V n+1)]

V n+1 =[

I− M]

V n +[

P(V n+1)]

V ∗

(V ∗)i = V ∗i . (20.14)

We can develop an iterative solution of equation (20.14) as follows: let (V n+1)k be the kth estimate for V n+1. If

(V n+1)0 = V n (20.15)

then

Penalty American Constraint Iteration

For k = 0, . . . until convergence

[

I + M + P((V n+1)k)]

(V n+1)k+1 =[

I− M]

V n + P((V n+1)k)V ∗

if maxi

|(V n+1i )k+1 − (V n+1

i )k|max(1, |(V n+1

i )k+1|)< tol quit (20.16)

EndFor

Note that if there is no constraint triggered during a timestep, then algorithm (20.16) converges in one iterationwith O(N) work (N being the number of unknowns), while a relaxation method requires O(N3/2) work (assumingoptimal acceleration, as discussed previously). So we have an improvement, at least for this simple case (comparedwith the relaxation algorithm). In practice, we observe that if ∆t = O(∆S), then the number of nonlinear iterations pertimestep (as ∆S → 0) stays roughly constant when using the penalty method. This gives a complexity of O(N 2) for thecomplete penalty method for N steps (assuming ∆t = O(∆S)).

Now, let’s see what this penalty method is doing. Suppose we are using algorithm (20.16) to solve for (V n+1).Suppose the initial estimate is

(V n+1i )0 = V n

i

114

and suppose V ni > V ∗

i , i.e. it is not optimal to exercise early at τ = τn,S = Si. Since (V n+1i )0 = V n

i , Pii = 0. After oneiteration of algorithm (20.16), if (V n+1

i )1 > V ∗i , then it is still not optimal to exercise, so Pii = 0.

On the other hand, if (V n+1i )1 < V ∗

i , this suggests that it is optimal to exercise early, so Pii = Large, which meansthat qn+1

i > 0, so the penalty term adds value to the solution at this node. Eventually, we converge to a solution whereeither

V n+1i > V ∗

i → qi = 0or

V ∗i −V n+1

i > 0 → qi > 0|V ∗

i −V n+1i | < ε ; ε 1.

Therefore, the converged solution either

• satisfies a discrete form of the PDE with V n+1i > V ∗

i , or

• V n+1i = V ∗

i − ε .

Thus we have converged to an approximate solution of the discrete linear complementarity problem. But, how smallis ε? How big should Large be? Let’s go back and look at the discrete equations with the penalty term (20.9). Assumethat at node i, V n+1

i < V ∗i , so that[

(1+Large)Vn+1i∆τ

− V ni

∆τ

]

=12(

(LV )n+1i +(LV )n

i)

+V ∗

i Large∆τ

. (20.17)

In the limit as Large → ∞, assuming that the discrete derivative terms on the right hand side of equation (20.17) arebounded, equation (20.17) becomes approximately

V n+1i ' V ∗

i Large1+Large .

This can be written

V ∗i −V n+1

i =V ∗

i1+Large ,

so that (assuming V ∗i 6= 0)

∣

∣

∣

∣

∣

V ∗i −V n+1

iV ∗

i

∣

∣

∣

∣

∣

' 1Large .

If we want an accuracy tol in algorithm (20.16), we must have∣

∣

∣

∣

∣

V ∗i −V n+1

iV ∗

i

∣

∣

∣

∣

∣

< tol,

implying that

Large ' 1tol .

Observe that it is a bad idea to choose Large 1/tol, since this may cause round-off error problems.For those worried about the convergence of the above method (i.e. does a solution found by the penalty method with

Large → ∞ converge to the solution of the linear complementarity problem?), it is reassuring to note the following: ananalytic penalty method [41] is often used to prove existence and uniqueness of the linear complementarity problem(20.2). The penalty algorithm described above can be viewed as a discrete form of the analytic penalty technique.Hence convergence to the linear complementarity problem is ensured, as Large → ∞, and as ∆S,∆τ → 0.

Details about this method can be found in [38].

115

20.2 Implicit vs. Explicit Handling of the American ConstraintIn many circumstances, the American constraint can be handled explicitly (see algorithm (20.4)). However, in thiscase, we expect that the convergence will be O(∆t) even if C-N is used. This is because there will be an O(∆t) errordue to the explicit constraint. We will explore this idea further, in some numerical tests.

20.3 Numerical ExamplesIn order to carry out a careful convergence study, we need to take into consideration the fact that the payoff functionhas only piecewise smooth derivatives. This can cause problems if Crank-Nicolson timestepping is used. Specifically,oscillatory solutions can be generated [101]. For example, if we consider a simple European put option, then we knowthat the asymptotic solution near the expiry time τ = 0, near the strike K, is [90]

Put Value = O(√

τ) (20.18)

which would suggest that

Vttt = O((τ)−5/2). (20.19)

The local finite difference truncation error for a Crank-Nicolson step (near τ = 0) would then be

Local Error = O(Vttt (∆τ)3) (20.20)

and if we set τ = ∆τ (the first step) then

Local Error = O(√

∆τ) (20.21)

which would result in poor convergence. Fortunately, this analysis is a bit too simplistic. The behavior of the solutionin equation (20.18) is due to the non-smooth payoff near the strike, which causes VSS to behave (near τ = 0,S = K) as[90]

VSS = O(

1√τ

)

. (20.22)

This large value of VSS causes a very rapid smoothing effect, due to the parabolic nature of the Black-Scholes equation.Consequently, if an appropriate timestepping method is used, we can expect the initial errors to be damped veryquickly. However, there is a problem with Crank-Nicolson timestepping. Crank-Nicolson is only A-stable and notstrongly A-stable, which means that some errors are very slowly damped. This is manifested by oscillations in thenumerical solution.

As discussed in a previous section, we can guarantee second order convergence (for European options) if we havea node at the strike, and if we take two steps of fully implicit, and Crank-Nicolson thereafter.

Note that second order convergence does not guarantee that the solution is non-oscillatory, but in practice the abovemethods work well.

We can demonstrate the effectiveness of the simple idea of taking two fully implicit methods at the start, andCrank-Nicolson thereafter (which we will describe as Rannacher smoothing [72] in the remainder of this section), fora European put option with known solution. We will use the rather extreme value of σ = .8 to illustrate these ideas.

From Table 20.1 we can observe that the solution with no smoothing converges erratically as the grid spacingand timestep size are reduced. The smoothed solution shows quadratic convergence. The reason for the poor conver-gence of the non-smoothed runs can be be explained by examining the value, delta (VS) and gamma (VSS) plots, asshown in Figure 20.1. (It is of practical importance to determine delta (VS) and gamma (VSS), since these are hedgingparameters[47]). Note that although the value appears smooth, oscillations appear in delta (near the strike) and aremagnified in gamma. The same problem was run using Rannacher smoothing, and the results are shown on Figure20.2. The oscillations in delta and gamma have disappeared. All subsequent runs will use Rannacher smoothing.

It might appear appropriate to use a timestepping method with better error damping properties, such as a secondorder BDF method [15]. However, as we discussed earlier, this method works fine for European options, but is poor forcomplex American options. We conjecture that this is due to a lack of smoothness in the time direction for Americantype problems, which causes problematic behavior for multistep methods. This effect will be addressed in some detailbelow.

116

Nodes timesteps Value Change RatioNo smoothing

68 25 14.50470135 50 14.41032 .09438269 100 14.43238 .02215 4.3539 200 14.44246 .01008 2.21073 400 14.44726 .00480 2.1

Rannacher smoothing68 25 14.41872135 50 14.44357 .02485269 100 14.44982 .00625 4.0539 200 14.45138 .00156 4.01073 400 14.45177 .00039 4.0

TABLE 20.1: Value of a European Put, σ = .8, T = .25, r = .10, K = 100, S = 100. Exact solution (to seven figures):14.45191. Change is the difference in the solution from the coarser grid. Ratio is the ratio of the changes on successivegrids.

50 60 70 80 90 100 110 120 130 140 1500

5

10

15

20

25

30

35

40

45

50

Asset Price

Opt

ion

Valu

e

European Put Option Value

50 60 70 80 90 100 110 120 130 140 150−1

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

Asset Price

Opt

ion

Delta

European Put Option Delta

50 60 70 80 90 100 110 120 130 140 150−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

0.5

Asset Price

Opt

ion

Gam

ma

European Put Option Gamma

FIGURE 20.1: Value, delta (VS) and gamma (VSS) of a European Put, σ = .8, T = .25, r = .10, K = 100. Crank-Nicolsontimestepping, no smoothing. Grid with 135 nodes used.

50 60 70 80 90 100 110 120 130 140 1500

5

10

15

20

25

30

35

40

45

50

Asset Price

Opt

ion

Valu

e

European Put Option Value

50 60 70 80 90 100 110 120 130 140 150−1

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

Asset Price

Opt

ion

Delta

European Put Option Delta

50 60 70 80 90 100 110 120 130 140 1502

4

6

8

10

12

14x 10−3

Asset Price

Opt

ion

Gam

ma

European Put Option Gamma

FIGURE 20.2: Value, delta (VS) and gamma (VSS) of a European Put, σ = .8, T = .25, r = .10, K = 100. Crank-Nicolsontimestepping, Rannacher smoothing. Grid with 135 nodes used.

117

Nodes timesteps Nonlinear Value Change Ratio Workiterations (flops)

explicit constraint55 25 25 3.04600 8.3×103

109 50 50 3.06049 .01449 3.3×104

217 100 100 3.06598 .00549 2.6 1.3×105

433 200 200 3.06822 .00224 2.5 5.2×105

865 400 400 3.06922 .00100 2.2 2.1×106

implicit constraint55 25 58 3.05607 3.2×104

109 50 117 3.06555 .00948 1.3×105

217 100 234 3.06854 .00299 3.2 5.1×105

433 200 471 3.06953 .00099 3.0 2.0×106

865 400 937 3.06988 .00035 2.8 8.1×106

TABLE 20.2: Value of an American Put, σ = .2, T = .25, r = .10, K = 100, S = 100. Change is the difference in thesolution from the coarser grid. Ratio is the ratio of the changes on successive grids. Constant timesteps. Rannachersmoothing used. Work is measured in terms of number of multiply/divides.

20.4 Numerical tests of implicit and explicit handling of the American constraintWe will now compare an implicit treatment of the American constraint (using the penalty technique) with an explicittreatment, using constant timesteps. In these examples we use a convergence tolerance of tol = 10−6 (see pseudo code(20.16)), and consequently a value of Large = 106.

As an additional check on the accuracy, for all runs we also monitored the quantity

max American error = maxn,i

max(0,(V ∗i −V n

i ))

max(1,V ∗i )

(20.23)

which is a measure of the maximum relative error in enforcing the American constraint using the penalty method. Inall the following examples, the observed value of the maximum American error equation (20.23) was ' 10−9 if thepenalty method was used with tol = 10−6.

Two volatility values were used in these examples: σ = .2, .8. We truncate the computational domain at S = Smax,where the boundary conditions are applied. The grid for σ = .2 used Smax = 200, while the grid for σ = .8, usedSmax = 1000. Both grids were identical for 0 < S < 200. The grid for σ = .8 added additional nodes for 200 < S < 1000.

Table 20.2 compares results for implicit (penalty method) and explicit handling of the American constraint, forσ = .2, with constant timesteps. Taking into account the work per unit accuracy, the implicit method is slightly superiorto the explicit technique. However, note that the implicit method does not appear to be converging quadratically (theratio of changes is about 3 instead of 4, which we would expect for quadratic convergence). The explicit methodappears to be converging at a first order rate (ratio of 2).

Table 20.3 compares explicit and implicit methods for a case with high volatility (σ = .8). Note that for the penaltymethod, the total number of nonlinear iterations for σ = .8 (Table 20.3), and σ = .2 (Table 20.2) at each refinementlevel is roughly the same, which indicates that the number of nonlinear iterations is relatively unaffected by volatility.

Taking into account the total work, it would appear that in this case the explicit method is a slightly superiormethod, compared to the implicit treatment of the American constraint. The implicit method appears to have an errorratio of about 2.9, while the explicit method has convergence rate somewhat lower.

20.5 Analysis of Constant Timestep ExamplesIn terms of approximately solving the discrete LCP problem, the penalty method performs as the analysis predicts.The number of iterations per timestep is typically of the order 2.2−2.4, independent of the volatility, for reasonabletimesteps. Note that the algorithm (20.16) requires at least two iterations per timestep. The a posteriori check (20.23)(a maximum relative error of 10−9, with tol = 10−6 in terms of satisfaction of the discrete LCP constraint) indicates

118



135 50 50 14.65685 .4002 4.1×104

269 100 100 14.67045 .01360 2.9 1.6×105

539 200 200 14.67542 .00497 2.7 6.5×105

1073 400 400 14,67738 .00196 2.5 2.6×106


135 50 112 14.66219 .03511 1.5×105

269 100 226 14.67324 .01105 3.2 6.1×105

537 200 461 14.67686 .00362 3.1 2.5×106

1073 400 940 14.67813 .00127 2.9 1.0×107

TABLE 20.3: Value of an American Put, σ = .8, T = .25, r = .10, K = 100, S = 100. Change is the difference in thesolution from the coarser grid. Ratio is the ratio of the changes on successive grids. Constant timesteps. Rannachersmoothing used. Work measured in terms of number of multiply/divides.

that the error introduced by the penalty method is quite small. This error is a function of tol (pseudo-code (20.16)),and hence can be adjusted to the desired level (which would impact the number of nonlinear iterations of course).

However, in terms of the convergence of the discretization of the LCP problem, as the grid is refined and thenumber of timesteps increased, the results are disappointing. We do not observe quadratic convergence for the implicithandling of the American constraint.

An error ratio of about 2.8 would be consistent with global timestepping convergence at a rate of O((∆τ)3/2). Now,from [74, 89], we know that the value of an American call (with proportional dividends) near the exercise boundary,near τ = 0 behaves like

V = const.+O((τ)3/2) (20.24)

which would give a value for Vttt (near τ = 0, S = exercise boundary) of

Vttt = O((τ)−3/2). (20.25)

It appears that the behavior of the American put, near τ = 0, near the exercise boundary is [58]

V = const.+O((τ logτ)3/2)

' const.+O((τ1−ε)3/2) ; ε > 0,ε 1,τ → 0. (20.26)

In the following we will ignore the ε in equation (20.26) and assume that the behavior of Vttt is given by equation(20.25).

From equation (20.25), we have that the local time truncation error (for Crank-Nicolson timestepping) is

Local Error = O(

(∆τ)3

(τ)3/2

)

. (20.27)

Note that in this case, because of the smooth-pasting condition [90] (i.e. delta (VS) continuous across the exerciseboundary), there is no parabolic smoothing effect, since this diffusion term in the Black-Scholes equation approachesa finite limit at the exercise boundary. Assuming that the global error is of the order of the sum of the local errors, thenfrom equation (20.27) we obtain

Global Error = O(

i=1/∆τ

∑i=1

(∆τ)3

(i∆τ)3/2

)

' O((∆τ)3/2) (20.28)

119

which is consistent with the observed rates of convergence. Now, instead of taking constant timesteps, suppose wetake timesteps which satisfy

maxi

(|V n+1i −V n

i |) ' d (20.29)

where d is a specified constant. In order to take the limit to convergence, at each grid refinement, we will halve thegrid spacing and halve d. Now, we can assume that the maximum change over a timestep (at least near τ = 0) willoccur near the strike. So, from equation (20.18), we can see that

∆V n+1 ' O(

∆τn+1√

τ

)

∆V n+1 = maxi

(|V n+1i −V n

i |) (20.30)

and therefore, from equations (20.29) and (20.30), we have

∆τn+1 = O(d√

τ). (20.31)

Assuming a local error of the form (20.27), this gives a local error with the variable timesteps given by equation(20.29) as (using equations (20.27) and (20.31) )

Local Error = O(

((∆τ)n+1)3

(τ)3/2

)

= O(

d3τ3/2

τ3/2

)

= O(d3). (20.32)

This gives a global error (with O(1/d) timesteps) of

Global Error = O(d2). (20.33)

Consequently, if we take variable timesteps according to the prescription (20.29), then at each refinement stage,where we double the number of grid nodes, and double the number of timesteps (by halving d), we should see quadraticconvergence. Note that we should reduce the initial timestep ∆τ0 by four at each refinement.

20.6 Using A Timestep SelectorIn order to ensure the correct timestepping behavior, so that quadratic convergence is obtained, will use the timestepselector (19.99).

Since V (Si = K,τ ' 0) ' 0, we expect that the denominator of equation (19.99) will take its largest value nearS = K, since V increases rapidly at S ' K. Consequently, the timestep selector (19.99) will approximately enforce thecondition that

∆V n+1 ' D×dnorm (20.34)

and hence we will have

∆τn+1 = O(dnorm√

τ), (20.35)

so that we should see a global error of O((dnorm)2), which follows from equation (20.33).Note that timestep selector (19.99) estimates the change in the solution at the new timestep based on changes

observed over the old timestep. Some adjustments can be made to this simple model if a more precise form for thetime evolution of the solution is assumed, but we prefer (19.99) since it is simple and conservative.

In practice, we select a (∆τ)0 for the coarsest grid, and then (∆τ)0 is cut by four at each grid refinement. Notethat there is not much of a penalty for underestimating a suitable (∆τ)0 since the timestep will increase rapidly if theestimate is too conservative. In practice, we can enforce the condition that the observed (as opposed to predicted)changes in the solution over a timestep meet the required conditions by repeating the timestep with a smaller value ofthe timestep if the observed changes are larger (say by a factor of two) then specified by dnorm. In the following runs,on the coarsest grid we used a value of (∆τ)0 = 10−3 and dnorm = .2. The value of dnorm was reduced by two oneach grid refinement.

120



109 33 33 3.05825 .01326 4.0×104

217 63 63 3.06425 .00600 2.2 1.5×105

433 122 122 3.06717 .00292 2.1 5.8×105

865 239 239 3.06863 .00146 2.0 2.3×106


109 33 85 3.06867 .00464 1.2×105

217 63 164 3.06975 .00108 4.3 4.6×105

433 122 322 3.07002 .00027 4.0 1.8×106

865 239 636 3.07008 .00006 4.5 7.2×106

TABLE 20.4: Value of an American Put, σ = .2, T = .25, r = .10, K = 100, S = 100. Change is the difference in thesolution from the coarser grid. Ratio is the ratio of the changes on successive grids. Variable timesteps. Rannachersmoothing used. Work is measured in terms of number of multiply/divides.

50 60 70 80 90 100 110 120 130 140 1500

5

10

15

20

25

30

35

40

45

50

Asset Price

Opt

ion

Valu

e

American Put Option Value

50 60 70 80 90 100 110 120 130 140 150−1

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

0

Asset Price

Opt

ion

Delta

American Put Option Delta

50 60 70 80 90 100 110 120 130 140 150−0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

Asset Price

Opt

ion

Gam

ma

American Put Option Gamma

FIGURE 20.3: Value, delta (VS) and gamma (VSS) of an American Put, σ = .2, T = .25, r = .10, K = 100. Crank-Nicolson timestepping, Rannacher smoothing, variable timesteps. Grid with 433 nodes used. Constraint imposed ex-plicitly.

20.7 Variable Timestep ExamplesTable 20.4 shows results for σ = .2, this time using the timestep selector (19.99). The timestep selector requires onedivide/node, so we assume that the work required for each iteration of the implicit method is 13 multiply/divides pernode, while the explicit method requires 11 multiply/divides per node per timestep. In this case, the implicit methodappears to be a clear winner in terms of flops per unit accuracy. Use of variable timesteps actually seems to degradethe convergence of the explicit method. This can be explained by looking at the timestep history. The timestep selectoruses small timesteps at the start, and then takes large steps at the end. Note that the average timestep size (total timedivided by number of timesteps) is larger for the variable timestep run compared to the constant timestep run (Table20.2). This clearly negatively impacts the explicit method, which seems to show a first order rate of convergence. Onthe other hand, the implicit method seems to show close to quadratic convergence. Clearly, for this example, use of animplicit method with variable timesteps appears to be the best method in terms of flops (comparing Tables 20.4 and20.2).

Figures 20.3 and 20.4 compare value, delta (VS) and gamma (VSS) for the σ = .2 case, using an explicit and implicittreatment of the American constraint. Although the value and delta appear similar for both cases, there are clearly largeoscillations in the gamma near the early exercise boundary for the explicit method. The implicit method does show

121

50 60 70 80 90 100 110 120 130 140 1500

5

10

15

20

25

30

35

40

45

50

Asset Price

Opt

ion

Valu

e


50 60 70 80 90 100 110 120 130 140 150−1

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

0

Asset PriceO

ptio

n De

lta


50 60 70 80 90 100 110 120 130 140 150−0.01

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

Asset Price

Opt

ion

Gam

ma


FIGURE 20.4: Value, delta (VS) and gamma (VSS) of an American Put, σ = .2, T = .25, r = .10, K = 100. Crank-Nicolson timestepping, Rannacher smoothing, variable timesteps. Grid with 433 nodes used. Constraint imposed im-plicitly.

50 60 70 80 90 100 110 120 130 140 1500

5

10

15

20

25

30

35

40

45

50

Asset Price

Opt

ion

Valu

e


50 60 70 80 90 100 110 120 130 140 150−1

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

0

Asset Price

Opt

ion

Delta


50 60 70 80 90 100 110 120 130 140 150−0.01

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

Asset Price

Opt

ion

Gam

ma


FIGURE 20.5: Value, delta (VS)and gamma (VSS) of an American Put, σ = .2, T = .25, r = .10, K = 100. Fully implicittimestepping, Rannacher smoothing, variable timesteps. Grid with 433 nodes used. Constraint imposed implicitly.

some small oscillations near the exercise boundary. However, this is due to the use of Crank-Nicolson timestepping,as noted in [20]. These oscillations disappear if fully implicit timestepping is used as shown in Figure 20.5.

Table 20.5 shows similar results for the σ = .8 case, using the timestep selector. As for the σ = .2 case, theconvergence rate for the implicit method appears to be close to quadratic. The explicit method appears to be trendingtowards linear convergence. In this case, the implicit penalty method is superior to the explicit method, in terms ofnumber of flops per unit of accuracy. Comparing Tables 20.3 and 20.5, we can see that the implicit method withvariable timesteps is the best performer.

20.8 Comparison with Lattice MethodsIt is interesting to compare the results here with those obtained using the standard binomial lattice method, which iscommonly used in finance [90]. This method is optimized for problems which assume that the underlying process isgeometric Brownian motion, and that the volatility is independent of S. To be precise, let

Snm = u2m−nS0

0 ; m = 0, ...,n (20.36)

be the value of the asset price at timestep tn = n∆t, and lattice point m, where

u = eσ√

∆t

∆t =TN (20.37)

122



135 66 66 14.66856 .02022 9.8×104

269 136 136 14.67472 00616 3.3 4.0×105

537 276 276 14.67703 .00231 2.7 1.6×106

1073 554 554 14.67800 .00097 2.4 6.5×106


135 66 161 14.67417 .01554 2.8×105

269 136 325 14.67778 .00361 4.3 1.1×106

537 276 655 14.67862 .00084 4.3 4.6×106

1073 554 1290 14.67882 .00020 4.2 1.8×107

TABLE 20.5: Value of an American Put, σ = .8, T = .25, r = .10, K = 100, S = 100. Change is the difference in thesolution from the coarser grid. Ratio is the ratio of the changes on successive grids. Variable timesteps. Rannachersmoothing used. Work is measured in terms of number of multiply-adds.

and where T is the expiry time of the option, and N is the number of timesteps. Note that we are considering time tgoing forward in this case, in contrast to τ = T − t (time going backwards) as in the previous sections. This results ina solution algorithm which procedes backwards from t = T to t = 0, so that we proceed from t = tN to t = 0. This isawkward, but corresponds to the notation commonly used in finance.

Let V nm be the value of the option associated with asset price Sn

m, at time t = n∆t, then

V Nm = max(K −SN

m,0) m = 0, ...,N . (20.38)

If the risk-neutral probabilities p are defined so that

p =er∆t − e−σ

√∆t

eσ√

∆t − e−σ√

∆t, (20.39)

then the value of the American put option V 00 (the value at the single point S = S0

0, t = 0) is obtained by the followingalgorithm

Binomial Lattice Algorithm

For n = N −1, ...,0For m = 0, ...,n

V = e−r∆t (pV n+1m+1 +(1− p)Vn+1

m)

(20.40)V n

m = max(K −Snm,V )

EndForEndFor

The above method is usually derived in the financial literature based on probabilistic arguments. In fact, we cansee that this is equivalent to an explicit finite difference method with a particular choice for the timestep.

Consider the Black-Scholes equation for a European option:

Vt +σ2

2 S2VSS + rSVS− rV = 0 (20.41)

Define a new variable X = logS, so that equation (20.41) becomes

Vt +σ2

2 VXX +(r− σ2

2 )VX − rV = 0. (20.42)

123

Let V = ertW , then equation (20.42) becomes

Wt +σ2

2 WXX +(r− σ2

2 )WX = 0. (20.43)

Let

W nm = W (logS0

0 +(2m−n)σ√

∆τ,n∆τ) ; m = 0, ...,n (20.44)

Then, discretizing equation (20.43) using central differencing in the X direction, and an explicit timestepping method,we obtain

W nm =

[

p∗(

W n+1m+1)

+(1− p∗)(

W n+1m)]

+O((∆t)2)

p∗ =12

[

1+√

∆t( r

σ− σ

2

)]

. (20.45)

Writing equation (20.45) in terms of V nm we obtain

V nm = e−r∆t [p∗

(

V n+1m+1)

+(1− p∗)(

V n+1m)]

+O((∆t)2). (20.46)

Expanding p in equation (20.39) in a Taylor series, noting the definition of p∗ in equation (20.45), and assuming thatV n+1

m+1 −V n+1m = O(

√∆τ), we obtain

V nm = e−r∆t [p

(

V n+1m+1)

+(1− p)(

V n+1m)]

+O((∆t)2). (20.47)

Comparing equation (20.47) with algorithm (20.40) we can see that the binomial lattice method is simply an explicitfinite difference discretization of the discrete LCP problem, with the American constraint applied explicitly.

The binomial lattice method requires about 3/2N2 flops (counting only multiplies, and assuming all necessaryfactors are precomputed). Note that we obtain the value of the option at t = 0 only at the single point S0

0, in contrastto the PDE methods which obtain values for all S ∈ [0,Smax]. Consequently, the methods are not directly comparable.Nevertheless, assuming that we are only interested in obtaining the solution at a single point, it is interesting to comparethese two techniques.

If N = O(1/(∆τ)), then the complexity of the standard binomial method is

Complexity binomial lattice = O(N2). (20.48)

The error in the lattice method is O(∆τ) = O(1/N), so that

Error binomial lattice = O(

1(Complexity)1/2

)

. (20.49)

On the other hand, if an implicit finite volume method is used, with Crank-Nicolson timestepping, and the penaltymethod is employed for handling the American constraint, then we have that the complexity is

Complexity implicit finite volume = O(N2), (20.50)

where we have assumed that N = O(1/(∆S)). (This is the case if we carry use the timestep selector (19.99) anddnorm = O(∆S)). It is also assumed that the the number of nonlinear iterations per timestep is constant as ∆S → 0,which is observed as long as dnorm = O(∆S).

Now, assuming that the timesteps are selected using (19.99), we have observed that the convergence is quadratic,i.e.

Error implicit finite volume = O(

1N2

)

= O(

1Complexity

)

, (20.51)

124

timesteps Value Change Ratio Work(flops)

σ = .250 3.06186 3.8×103

100 3.06611 0.00425 1.5×104

200 3.06810 0.00199 2.1 6.0×104

400 3.06913 0.00103 1.9 2.4×105

800 3.06962 0.00049 2.1 9.6×105

1600 3.06987 0.00025 2.0 3.8×106

3200 3.06999 0.00012 2.1 1.5×107

σ = .850 14.62649 3.8×103

100 14.65269 0.02620 1.5×104

200 14.66582 0.01313 2.0 6.0×104

400 14.67238 0.00656 2.0 2.4×105

800 14.67563 0.00325 2.0 9.6×105

1600 14.67726 0.00163 2.0 3.8×106

3200 14.67807 0.00081 2.0 1.5×107

TABLE 20.6: Binomial lattice method. Value of an American Put, T = .25, r = .10, K = 100, S = 100. Change is thedifference in the solution from the coarser grid. Ratio is the ratio of the changes on successive grids. Work is measuredas the number of multiplies.

which implies that the implicit finite volume method is asymptotically superior to the binomial lattice method, even ifthe solution at only one point (at t = 0) is desired.

It is interesting to determine at what levels of accuracy we can expect the implicit PDE method to become moreefficient than the binomial method. Table 20.6 gives the results for a binomial lattice solution (algorithm (20.40)) forthe problems solved in the previous sections using an implicit PDE approach. These tables should be compared toTables 20.4-20.5.

For further points of comparison, we also compute solutions to the problem used in [44]. We have used twoversions of the problem in [44], one with an expiry time of T = 1 and the other with T = 5. Table 20.7 summarizesthe results for the PDE method, and Table 20.8 shows the results for the binomial lattice.

Figure 20.6 summarizes the convergence of both the binomial and lattice methods for all four problems. Theabsolute error is computed by taking the exact solution as obtained by extrapolating the PDE solution down to zerogrid and timestep size, assuming quadratic behavior.

The PDE method becomes more efficient than the binomial lattice method at tolerances between .01− .001 de-pending on the problem parameters. It appears that for short term or low volatility options, the crossover point is closerto .001, while for long term or high volatility options, the crossover is closer to .01. These crossover points occur attolerances which would be used in practice.

Note that in these comparisons, we are putting the best possible light on the binomial lattice method, since weignore the fact that we obtain much more information with the implicit PDE technique.

20.9 BarriersOption contracts with barrier provisions impose constraints which can be handled in either an explicit or an implicitmanner. This is explored in detail in [102]. Basically, the appropriate way to impose the constraint depends on thecontract. For discretely monitored contracts, where the barrier is applied at distinct points in time, the constraint canbe handled explicitly. For continuously monitored contracts, there are two possible cases: i) the barrier(s) are constant,in which case one can simply impose Dirichlet boundary conditions; and ii) the barrier(s) are time-varying, for whichan implicit constraint application is well-suited. See [102] for further details.

125


T = 1.068 36 88 13.27673 7.8×104

135 68 169 13.29133 .0146 3.0×105

269 133 332 13.29467 .00334 4.4 1.2×106

537 262 632 13.29547 .00080 4.2 4.4×106

1073 520 1248 13.29567 .00020 4.0 1.7×107

T = 5.075 60 138 23.04009 1.3×105

149 121 281 23.04986 .00977 5.4×105

297 241 561 23.05272 .00316 3.1 2.2×106

593 482 1123 23.05355 .00083 3.8 8.7×106

1185 963 2236 23.05375 .00020 4.2 3.4×107

TABLE 20.7: Implicit finite volume (PDE) method. Value of an American Put, σ = .4, T = 1.0,5.0, r = .06, K = 100, S =100 [44]. Change is the difference in the solution from the coarser grid. Ratio is the ratio of the changes on successivegrids. Variable timesteps. Rannacher smoothing used. Work is measured in terms of number of multiply/divides.

timesteps Value Change Ratio Work(flops)

T = 1.050 13.25694 3.8×103

100 13.27619 0.01925 1.5×104

200 13.28618 0.00999 1.9 6.0×104

400 13.29097 0.00479 2.1 2.4×105

800 13.29339 0.00242 2.0 9.6×105

1600 13.29456 0.00117 2.1 3.8×106

3200 13.29515 0.00059 2.0 1.5×107

T = 5.050 22.96395 3.8×103

100 23.01268 0.04873 1.5×104

200 23.03502 0.02234 2.2 6.0×104

400 23.04390 0.00888 2.5 2.4×105

800 23.04899 0.00509 1.7 9.6×105

1600 23.05141 0.00242 2.1 3.8×106

3200 23.05263 0.00122 2.0 1.5×107

TABLE 20.8: Binomial lattice method. Value of an American Put, T = 1.0,5.0, r = .06, σ = .4, K = 100, S = 100 [44].Change is the difference in the solution from the coarser grid. Ratio is the ratio of the changes on successive grids. Workis measured as the number of multiplies.

126

103 104 105 106 107 108

Flops10-5

10-4

10-3

10-2

10-1

Abso

lute

Erro

r

PDE

Binomial LatticeT = .25σ = .2r = .1

103 104 105 106 107 108

Flops10-5

10-4

10-3

10-2

10-1

Abso

lute

Erro

r

PDE

Binomial Lattice

T = .25σ = .8r = .1

103 104 105 106 107 108

Flops10-5

10-4

10-3

10-2

10-1

Abso

lute

Erro

r

PDE

Binomial LatticeT = 1.0σ = .4r = .06

103 104 105 106 107 108

Flops10-5

10-4

10-3

10-2

10-1

Abso

lute

Erro

r

PDE

Binomial Lattice

T = 5.0σ = .4r = .06

FIGURE 20.6: Absolute error as a function of number of floating point operations (flops), measured as the number ofmultiplies/divides for all the test problems, at S = 100, t = 0.

127

21 Discrete Dollar DividendsBefore discussing option pricing problems where extra state variables are required, we should review the payment ofdiscrete dividends. This introduces the idea of no-arbitrage and jump conditions.

Suppose that a stock pays D∗ dividends at t = ti. To avoid the unrealistic situation of the dividend being larger thanthe stock price, we’ll assume that the actual realized dividend D is

D = min(D∗,S).

Now, the stock price the instant after the dividend date t+i will be

S(t+i ) = S(t−i )−D

since otherwise there would be an arbitrage opportunity (ignoring frictions such as taxes and transactions costs, whicharen’t in our model anyways). No cash flows accrue to the holder of an option during the interval that the dividend ispaid. No-arbitrage then implies

V (S(t+i ), t+i ) = V (S(t−i ), t−i ). (21.1)

We can write this as

V (S(t−i )−D, t+i ) = V (S(t−i ), t−i ),

or just

V (S−D, t+i ) = V (S, t−i ). (21.2)

Since we are solving the PDE backward in time,

τ+i = T − t−i

τ−i = T − t+i .

Thus, in terms of τ, equation (21.2) becomes

V (S,τ+i ) = V (S−D,τ−i ). (21.3)

If we were solving an American early exercise problem, with payoffV ∗(S), then equation (21.3) would be modifiedto give

V (S,τ+i ) = max(V ∗(S),V (S−D,τ−i )).

Assume that the option is European, so that the jump condition is given by equation (21.3). Then, since S−D willnot generally coincide with a grid node, we have to interpolate to determine V (S−D,τ−i ). This situation is shown inFigure 21.1.

Since there are only a relatively small number of dividend payments, linear interpolation is all that is required.However, we will see that the situation changes if this type of interpolation is carried out at a large number of timesteps.

Note that we generally have to update the boundary conditions using the jump conditions (21.3) as well.

22 Relationship of Finite Difference Methods to Lattice MethodsRecall the binomial lattice pricing model equation (6.7)

European Lattice Algorithm

V nj = e−r∆t

(

p∗V n+1j+1 +(1− p∗)V n+1

j

)

n = N −1, ...,0j = 0, ...,n (22.1)

128

S

τ

x x x x

x

x xxxx

x x xxx x x x

V(S,τi+)

V(S-D,τi-)

τi+

τi-

FIGURE 21.1: Interpolation for a discrete dividend.

where V nj = V (Sn

j , tn), and

Sn+1j+1 = Sn

j eσ√

∆t

Sn+1j = Sn

j e−σ√

∆t

Snj = S0

0e(2 j−n)σ√

∆t

tn = n∆t (22.2)

and

p∗ =er∆t − e−σ

√∆t

eσ√

∆t − e−σ√

∆t. (22.3)

Now, recall the usual Black-Scholes equation

Vτ =σ2S2

2 VSS + rSVS− rV . (22.4)

If we let Z = logS, then equation (22.5) becomes

Vτ =σ2

2 VZZ +(r− σ2

2 )VZ − rV . (22.5)

If we discretize equation (22.5) using central differences on an equally spaced Z grid, we get

V n+1i −V n

i∆τ

=

[(V ni+1−V n

i−12∆Z

)(

r− σ2

2

)]

+σ2

2

[V ni+1−2V n

i +V ni−1

(∆Z)2

]

−rV n+1i +O(∆τ)+O((∆Z)2) (22.6)

where V ni =V (i∆Z,n∆τ), and we have used an explicit method (except for the discounting term). Rearranging equation

(22.6) gives

V n+1i =

11+ r∆τ

[

V ni

(

1− σ2∆τ(∆Z)2

)

+V ni+1

(

∆τσ2

2(∆Z)2 +(r− σ2

2 )∆τ

2∆Z

)

+Vi−1

(

∆τσ2

2(∆Z)2 − (r− σ2

2 )∆τ

2∆Z

)]

. (22.7)

129

Choosing

∆Z = σ√

∆τ (22.8)

and substituting into equation (22.7) gives

V n+1i =

11+ r∆τ

[

pV ni+1 +(1− p)Vn

i−1]

' e−r∆τ [pV ni+1 +(1− p)Vn

i−1]

(22.9)

since

e−r∆τ =1

er∆τ =1

1+ r∆τ+(r∆τ)2 + ...

and we have defined

p =12

[

1+√

∆τ(rσ− σ

2 )]

. (22.10)

Now, in order to compare equation (22.10) with the lattice method (22.1), we have to remember that for the PDEmethod, since we write down the equations in terms of τ = T − t, then we use equation (22.10) recursively in the ordern,n + 1,n + 2, .... But the binomial lattice is written in terms of the forward time t. To make the comparison clearer,let’s write down equation (22.9) in terms of t = n∆t,

V ni = e−r∆τ [pV n+1

i+1 +(1− p)Vn+1i−1]

V ni = V (Zi,n∆t) . (22.11)

Now, for the finite difference method, V ni depends on V n+1

i+1 ,V n+1i−1 , at Z values Zi+1,Zi,Zi−1, where

Zi = logSi

∆Z = Zi+1 −Zi (22.12)

and, from equations (22.8-22.12) , we have that

Si+1 = Sieσ√

∆t

Si−1 = Sie−σ√

∆t . (22.13)

Now, in the binomial tree notation

Sn+1j+1 = Sn

j eσ√

∆t

Sn+1j = Sn

j e−σ√

∆t

V nj = V (Sn

j ,n∆t)V n+1

j = V (Sn+1j ,(n+1)∆t) . (22.14)

while in the finite difference notation

Si+1 = Sieσ√

∆t

Si−1 = Sie−σ√

∆t

V ni = V (Si,n∆t) . (22.15)

Now, we can switch from finite difference notation to tree notation, if we use the rules in Table 22.1So, using the rules in Table 22.1 in equation (22.11), we obtain

V nj = e−r∆t

[

pV n+1j+1 +(1− p)Vn+1

j

]

p =12

[

1+√

∆τ(rσ− σ

2 )]

. (22.16)

130

Finite Difference TreeV n

i V nj

V n+1i+1 V n+1

j+1V n+1

i−1 V n+1j

TABLE 22.1: Rules for changing notation from finite difference to lattice.

Equation (22.16) looks like equation (22.1) except that p∗ appears in equation (22.1), and p appears in equation(22.16). If we expand p∗ as defined in equation (22.3) in a Taylor series we obtain

p∗ = p+O((∆t)3/2)

= p+C1(∆t)3/2 ; for ∆t sufficiently small . (22.17)

If (V nj )FD is the finite difference solution, and (V n

j )Tree is the tree solution, then

(V nj )FD = e−r∆t

[

p(V n+1j+1 )FD +(1− p)(Vn+1

j )FD]

+O((∆t)2) (22.18)

(V nj )Tree = e−r∆t

[

(p+C1(∆t)3/2)(V n+1j+1 )Tree +(1− p−C1(∆t)3/2)(V n+1

j )Tree]

= e−r∆t[

p(V n+1j+1 )Tree +(1− p))(Vn+1

j )Tree]

+e−r∆t[

C1(∆t)3/2(

(V n+1j+1 )Tree − (V n+1

j )Tree)]

. (22.19)

Recall that

V n+1j+1 = V (Zn+1

j+1 , tn+1)

V n+1j = V (Zn+1

j , tn+1)

Zn+1j+1 = log(Sn+1

j+1) = σ√

∆t + logSnj

Zn+1j = log(Sn+1

j ) = −σ√

∆t + logSnj

Zn+1j+1 −Zn+1

j = 2σ√

∆t = 2∆Z . (22.20)

Consequently, equation (22.20) implies that(

(V n+1j+1 −V n+1

j )

2∆Z

)Tree

=

(

∂V∂Z

)Tree+O((∆Z)2)

⇒(

V n+1j+1 −V n+1

j

)Tree= O(

√∆t) (22.21)

(where we have assumed that the solution is sufficiently smooth for equation (22.21) to hold). Combining equations(22.19-22.21) gives

(V nj )Tree = e−r∆t

[

p(V n+1j+1 )Tree +(1− p))(Vn+1

j )Tree]

+O((∆t)2) . (22.22)

Defining Enj = (V n

j )FD)− (V nj )Tree, then from equations (22.18-22.22), we have

Enj ≤ pEn+1

j+1 +(1− p)En+1j +O((∆t)2) . (22.23)

Now, noting that 0 ≤ p ≤ 1, and defining

‖En‖ = maxj

|Enj |

131

we have that

|Enj | ≤ ‖En+1‖+O((∆t)2) (22.24)

so that if N = T/(∆t), then

|E00 | ≤ T

∆t O((∆t)2)+‖EN‖= TO(∆t)= O(∆t)

since ‖EN‖ = 0 (same payoff). Therefore,

|(V 00 )FD − (V 0

0 )Tree| → 0 ; as ∆t → 0 . (22.25)

Thus, the option prices computed using an expicit finite difference method and a lattice method are the same to O(∆t),which is the same as the truncation error in both methods. Note we have assumed that the solution is smooth enoughso that equation (22.21) holds. If we use more sophisticated convergence analysis, we find that this condition is notrequired.

23 Uncertain volatility, transaction cost modelsUp to now, we have been looking at linear problems, i.e. the value of a portfolio of options is equal to the sum of theindividual values.

However, when we deal with nonlinear models, this is not true anymore. As a simple example, consider theuncertain volatility models [6, 7, 89] and the transaction cost models [60, 18, 66].

23.1 Uncertain VolatilityThe basic idea of the uncertain volatility model can be stated as follows: suppose we are uncertain about the futureevolution of volatility, we are only willing to say that

σmin ≤ σ ≤ σmax. (23.1)

In other words, the volatility will be within certain bands, but is otherwise uncertain. We would like to price and hedgethe option under worst case scenarios. If we carry out a delta hedge at infinitesimal intervals, then this will ensure thatwe always have a nonnegative balance in the hedging portfolio, regardless of what happens to σ. assuming equation(23.1) is satisfied.

On can show that the worst case value of an option to an investor with a long position, where σ is given by equation(23.1), is given by the solution of the Black Scholes equation with

σ(Γ)2 =

σ2max if Γ < 0

σ2min if Γ > 0 (23.2)

where Γ = VSS.So, we simply solve

Vτ =

(

σ(Γ)2

2 S2VSS + rSVS− rV)

0 (23.3)

with σ(Γ) given by equation (23.2).Conversely, the best case value for an investor with a long position is determined from the solution to equation

(23.3) with σ(Γ) given by

σ(Γ)2 =

σ2max if Γ > 0

σ2min if Γ < 0 . (23.4)

132

Why would we care about the best case? If we priced and hedged based on the best case, we would go bankruptvery quickly! This is because the worst case value for an investor with a short position (a negative payoff) in theoption is given by the negative of the solution to equations (23.3) and (23.4). Consequently, the worst/best long caseprices would correspond to the bid/ask prices for an option if the buyers/sellers priced the option assuming worst casescenarios.

23.2 Transaction CostsThe Leland [60] transaction cost model was extended for arbitrary payoffs in [46, 89]. In this case, the value of a longposition is given by the solution to

Vτ =σ2

c2 S2 [1−α sgn(Γ)]VSS + rSVS− rV (23.5)

and that of a short position is

Vτ =σ2

c2 S2 [1+α sgn(Γ)]USS + rSUS− rU (23.6)

where σc is the (known and constant) volatility. In [89], the transaction cost parameter α is given by

α =κσc

√

8πδt (23.7)

where δt is the hedge rebalancing interval, and κ is the transaction cost per unit of the asset.Equations (23.5-23.6) can be cast into the form of equation (23.3) if we define

σ(Γ)2 =

σ2c [1−α sgn(Γ)] for a long position

σ2c [1+α sgn(Γ)] for a short position . (23.8)

If we let

σ2min = σ2

c(1−α)

σ2max = σ2

c(1+α) , (23.9)

then the transaction cost model becomes identical to an uncertain volatility model if

σ(Γ)2 =

σ2max if Γ < 0

σ2min if Γ > 0

for a long position

σ(Γ)2 =

σ2max if Γ > 0

σ2min if Γ < 0

for a short position .

(23.10)

Note that the transaction cost model becomes ill-posed, in general, if α > 1 (since this causes a negative diffusion).In this case, an alternative model has been proposed which, in the case of a short position, uses σ = σmax if Γ > 0, andσ = 0 if Γ < 0 with a suitable constraint [66]. In this work, we assume that α < 1.

Consequently, the solution of equation (23.3) can be considered to be the value of best/worst case scenarios for anoption with uncertain volatility, if σ(Γ) is given from equations (23.2-23.4), or the value of long/short positions in anoption with transaction costs given by the model in [46, 89], if σ(Γ) is given from equation (23.8).

23.3 Solution of the nonlinear equationsSkipping the by now familiar details, we can write the discrete nonlinear equations (for a European option, with CrankNicolson weighting) as

[

I + Mn+1]V n+1 =[

I− Mn]V n. (23.11)

Note that in this case the discretized operator Mn+1 is a function of V n+1 through Γn+1 (equations (23.2-23.4))).

133

Now we can solve this equation using Newton iteration, which turns out to be quite simpleAssume

(V n+1)0 = V n (23.12)

then

Uncertain Volatility Iteration

For k = 0, ... until convergence

[

I + M((V n+1)k)]

(V n+1)k+1 =[

I − M((V n)]

V n

if maxi

|(V n+1i )k+1 − (V n+1

i )k|max(1, |(V n+1

i )k+1|)< tol quit (23.13)

EndFor

One can show that this method always converges. There are some interesting questions about whether the solutionobtained is the financially correct one. For nonlinear problems, one has to worry about obtaining the viscosity solution[21, 12]. The one known result is that a monotone discrete scheme converges to the viscosity solution. If we use afully implicit discretization, then the discrete scheme is unconditionally monotone.

What is a monotone scheme? Suppose we write down the nonlinear discrete equations at node i as

gi(V n+1i ,V n+1

i+1 ,V n+1i−1 ,V n

i ,V ni+1,V n

i−1) = 0 ; i = 1, ...M (23.14)

Now, this is a monotone discretization if a positive perturbation of any of V n+1i+1 ,V n+1

i−1 ,V ni ,V n

i+1,V ni−1 produces a

positive perturbation of V n+1i . i.e. Given V n+1,V n which satisfy (23.14), then

gi(V ∗,V n+1i+1 + p1,V n+1

i−1 + p2,V ni + p3,V n

i+1 + p4,V ni−1 + p5) = 0 pk ≥ 0

⇒V ∗ ≥V n+1i . (23.15)

We hope that this condition is sufficient, not necessary, since it is very difficult to construct monotone schemes fortwo factor options with non-zero correlation.

23.4 Uncertain Volatility ExampleNote that if Γ has the same sign for all S, then nothing very interesting happens with the uncertain volatility example.We either get σ = σmax or σ = σmin. This would be the case for vanilla put or call.

However, suppose we use have an butterfly spread option produced by

• Buy a call at price K1.

• Buy a call at price K2.

• Sell two calls at price (K1 +K2)/2.

so the the payoff is

payoff = max(S−K1,0)+max(S−K2)−2max(S− K1 +K22 ,0) (23.16)

134

50 60 70 80 90 100 110 120 130 140 1500

1

2

3

4

5

6

7

8

9

10

Asset Price

Opt

ion

Valu

e

Worst Case European Butterfly Option Value

50 60 70 80 90 100 110 120 130 140 1500

1

2

3

4

5

6

7

8

9

10

Asset Price

Opt

ion

Valu

e

Best Case European Butterfly Option Value

FIGURE 23.1: European butterfly spread, σmax = .25 σmin = .15, r = .1,T = .25. Left: worst case long; right: bestcase long (worst case short). Uncertain volatility model.

or

payoff = 0 ; S < K1

= S−K1 ; K1 ≤ S ≤ K1 +K22

= K2 −S ; K1 +K22 ≤ S ≤ K2

= 0 ; S > K2 (23.17)

The first example is shown in Figure 23.1. In this case, we have

σmax = .25σmin = .15

Expiry time = .25r = .10

(23.18)

with a butterfly payoff.As another example, we can try using the uncertain volatility model with in conjunction with American early

exercise. The results are shown in Figure 23.2. Note that since the basic building blocks of this option are calls, itwould never be optimal to exercise these individually. However, taken as s portfolio, the holder of the long positionshould exercise early, while the holder of the short position will not. Note that we are assuming that all components ofthe butterfly are either held or early exercised together. If we allow each component of the butterfly to be excercised ornot independently, then the problem becomes more complicated in terms of determining a worst/best case price. Wehave to consider a hierarchy of problems with different exercise, no-exercise combinations.

24 Finite Volume Upwinding vs. Forward/Backward DifferencingConsider a general form for option pricing PDEs:

Vτ +UVS = DVSS − rV (24.1)

135

50 60 70 80 90 100 110 120 130 140 1500

1

2

3

4

5

6

7

8

9

10

Asset Price

Opt

ion

Valu

e

Worst Case American Butterfly Option Value

50 60 70 80 90 100 110 120 130 140 1500

1

2

3

4

5

6

7

8

9

10

Asset Price

Opt

ion

Valu

e

Best Case American Butterfly Option Value

FIGURE 23.2: American butterfly spread, σmax = .25 σmin = .15, r = .1,T = .25. Left: worst case long; right: bestcase long (worst case short). Uncertain volatility model.

where r is the risk free rate of interest, and D is the diffusion term, and U is the drift term. Note that for geometricbrownian motion,

D =σ2S2

2U = −rS

where σ is the volatility.Suppose we use a finite volume method for discretizing equation (24.1). To keep things simple, we will use a fully

implicit discretization.

V n+1i [1+(αi +βi + r)∆τ] = V n

i +αi∆τV n+1i−1 +βi∆τV n+1

i+1 (24.2)

αi and βi are defined as follows. If central weighting is used, then

αci =

DiAi∆Si−1/2

+Ui2Ai

βci =

DiAi∆Si+1/2

− Ui2Ai

. (24.3)

where

Ai =

(

Si+1 +Si2

)

−(

Si−1 +Si2

)

Si+1/2 =Si +Si+1

2

Si−1/2 =Si +Si−1

2∆Si+1/2 = Si+1 −Si

∆Si−1/2 = Si −Si−1.

If αci ≥ 0 and βc

i ≥ 0, then

αi = αci (24.4)

βi = βci (24.5)

136

If αci < 0 in equation (24.3) then, we must have Ui < 0, so that the upwind direction is in the positive S direction.

If we use upwinding then we obtain (αci < 0):

αu+i =

DiAi∆Si−1/2

βu+i =

DiAi∆Si+1/2

− UiAi

. (24.6)

and we then set

αi = αu+i

βi = βu+i

On the other hand, if βci < 0, then the upwind direction is in the negative S direction (Ui > 0), and we obtain:

αu−i =

DiAi∆Si−1/2

+UiAi

βu−i =

DiAi∆Si+1/2

(24.7)

and we then set

αi = αu−i

βi = βu−i .

Now, suppose we had simply used a finite difference method to discretize equation (24.1). In the case of centralweighting for the VS term

VS ' Vi+1 −Vi−1∆Si−1/2 +∆Si+1/2

(24.8)

we obtain exactly the same discretization as defined by equation (24.3).However, to ensure a positive coefficient discretization, it is traditional in finite difference circles to use forward,

backward differencing. If αci < 0, then we approximate VS by (in this case Ui < 0)

VS ' Vi+1−Vi∆Si+1/2

(24.9)

which gives

αu+i =

DiAi∆Si−1/2

βu+i =

DiAi∆Si+1/2

− Ui∆Si+1/2

. (24.10)

while if βci < 0, then we use a backward difference (Ui > 0)

VS ' Vi −Vi−1∆Si−1/2

(24.11)

which gives

αu−i =

DiAi∆Si−1/2

+Ui

∆Si−1/2

βu−i =

DiAi∆Si+1/2

(24.12)

137

T 5λ 0.0κ 0.5σ .03θ .04c .5

TABLE 24.1: Data for CIR model, equation (24.13).

Discretization Coarse FineForward/Backward .803970 .803987

Finite Volume .803969 .803987

TABLE 24.2: Value of Bond at r = .05. Coarse grid: 43 nodes, 23 timesteps; fine grid: 169 nodes, 83 timesteps).

Note that equation (24.10) is identical to equation (24.6) only if ∆Si−1/2 = ∆Si+1/2. Similarly, equation (24.12) isidentical to equation (24.7) only if ∆Si−1/2 = ∆Si+1/2.

Interestingly enough, the finite volume upwind approach is formally inconsistent (i.e. has O(1) error) for unequallyspaced grids, in terms of simple finite difference truncation error analysis. However, in the case of a conservative formof the PDE (24.1), convergence has been proven in [33], for unequally spaced finite volume grids. Convergence offirst order finite volume methods for nonconservative PDEs has been proven in [54], also for unequally spaced grids.

24.1 An example from a Bond Pricing ModelIn spite of the fact that the upwind finite volume method is convergent on unequally spaced grids, we might expect thatthere is some effect on the convergence when the grid size changes abruptly, compared to a forward-backward finitedifference approach. Note that it is a trivial matter to implement either method, since only small changes are requiredin the discrete equation coefficients.

Consider a CIR interest rate model:

Vτ =σ2r2c

2 Vrr +(κ(θ− r)−λσrc)Vr − rV (24.13)

where r is the spot interest rate, c,κ,θ are parameters, and λ is the market price of risk.This PDE is of the form (24.1) with

D =σ2r2c

2U = −(κ(θ− r)−λσrc) .

A five year zero coupon bond was priced using both finite volume and forward/backward differencing. Data forthis this example is given in Table 24.1. An abrupt change in the grid spacing was used at r = 100% (the grid spacingwas increased by a factor of 10). Coarse and fine grid runs are presented in Table 24.2 and Figures 24.1 and 24.2.

Fine grid nodes are inserted between coarse grid nodes at each level of refinement, so that the fine grid still has asudden change in grid size at r = 100%. As shown in Figure 24.1, the effect of the sudden change in grid spacing canbe seen as a kink in the solution at r = 100%. As the grid is refined, this kink becomes smaller. Note that the kink ismuch less pronounced when using the backward difference method in Figure 24.2.

However, it is interesting to see that the solution at r = .05, which would be near interest rates of practical impor-tance, is relatively unaffected by the behavior at r = 100%, as shown in Table 24.2.

The advantage of a finite volume method is that it is easy to generalize to handle complex boundaries, and irregular2d meshes. As well, it is straightforward to increase the accuracy of an upwinded finite volume method by using anonlinear flux limiter.

However, if we are only interested in using simple upwind ideas in one dimension, then the forward/backwarddifference method is very easy to implement (a very small change from the finite volume upwinding), and seems

138

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4Interest Rate

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Zero

Cou

pon

Bond

Val

ue

Fine GridCoarse Grid

FIGURE 24.1: Finite volume discretization for CIR model.

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4Interest Rate

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Zero

Cou

pon

Bond

Val

ue

Fine Grid

Coarse Grid

FIGURE 24.2: Forward/backward difference discretization for CIR model.

139

to introduce smaller errors when the grid size changes abruptly. Consequently, forward/backward differencing isprobably the method of choice for these simple cases.

25 Boundary Conditions: a Second LookUp to now, we have been using simple Dirichlet conditions as S → ∞. For example

V ' S ; for a call' 0 ; for a put (25.1)

These conditions are simply handled by fixing the value of V ni at i = imax,S = Smax, where Smax is the largest S

value in the computational domain. This is method we have been using up to now, and we will refer to this as theDirichlet method.

The good news is that if Smax is sufficiently large, it doesn’t really matter what boundary condition is applied atS = Smax. The bad news is that in practice, an incorrect condition may have some pollution effect on the solution inthe area of interest.

For example, in [53], for a European call, with exercise price K, if we choose

V (Smax,τ) = Smax −K (25.2)

then Smax can be selected to give the desired error at S = K.

Smax ≥ K exp[

√

2σ2T log(K/Error)]

(25.3)

Determining the correct boundary condition can be difficult if the payoff is a linear combination of digitals andvanillas. Path dependent options often involve jump conditions at observation dates. Trying to determine the correctasymptotic form for the boundary condition can be become a nightmare to implement in general purpose software.

In many cases (although not all), it is reasonable to assume that the value of the option has the form

V ' a(τ)S+b(τ) (25.4)

as as S → ∞, where a(t),b(t) are unknown, or difficult to determine.We would like to have an automatic procedure to determine a(τ),b(τ) without going to a lot of trouble. One way

to do this is to substitute equation (25.4) into the original PDE. This gives an ODE to solve for a(τ),b(τ). The initialvalues a(τ = 0),b(τ = 0) can be determined from the payoff condtions. This method (which we shall refer to as theODE method in the following), works well for problems with complex payoffs, but if there are jump conditions appliedfrequently, this technique becomes complex to code, and is prone to some instability.

Another procedure which is suggested in the literature [89] is the following:• At i = imax, set VSS = 0.0

• Use a backward difference for the VS term. (This is equivalent to imagining a node Simax+1 where Simax+1 −Simax = Simax − Simax−1, using a discrete form of VSS = 0.0 at i = imax, and then using central weighting for afinite volume discretization.)

To keep things simple, we will use a fully implicit discretization.

V n+1i [1+(αi +βi + r)∆τ] = V n

i +αi∆τV n+1i−1 +βi∆τV n+1

i+1 (25.5)

This means that at i = imax, we set

αi = αci = − rSi

(Si−Si−1)

βi = βci = 0.0 (25.6)

From equation (25.6), it is somewhat disturbing that we get αimax < 0. This would no longer be a positive coeffi-cient discretization if we used a fully implicit method. It is also not obvious that this method is stable! As well, the

140

Boundary SpecificationDirichlet ODE VSS = 0 Analytic

Refinement level Value at S = $100 ($)1 14.47 14.23 14.23 14.232 14.23 14.23 14.23 ↓3 14.23 14.23 14.23

TABLE 25.1: Effect of imposing various artificial boundaries on the numerical solution of a European call optionwith K = $100, T = 1 year, σ = 30%, r = 5%. The numerical computations were performed using Crank-Nicholsontimeweighting with constant timesteps and a asset price grid with variable spacing with endpoints [0,Smax]. The firstrefinement level used: ∆τ = .01 years, Smax = $250 and had 100 asset price nodes. Subsequent refinement levels doubledSmax as well as reducing the timestep size and grid spacing by factors of two.

Boundary SpecificationDirichlet ODE VSS = 0 Analytic

Refinement level Value at S = $100 ($)1 77.62 66.42 66.41 67.322 71.38 67.26 67.25 ↓3 68.63 67.31 67.31

TABLE 25.2: Effect of imposing various artificial boundaries on the numerical solution of long maturity Europeancall option with K = $100, T = 10 years, σ = 50%, r = 5%. The numerical computations were performed on a gridwith constant spacing with endpoints [0,Smax] using a fully implicit time-weighting with constant timesteps. The firstrefinement level used: ∆τ = .01 years Smax = $500 and had 50 asset price nodes. Subsequent refinement levels doubledSmax as well as reducing the timestep size and grid spacing by factors of two.

coefficient matrix loses diagonal dominance. Since the coefficient matrix is no longer an M-matrix, then there is noguarantee that the option price is always positive.

However, a detailed analysis in [91] shows that in fact this method is stable. For a finite volume method, as long as

Si −Si−1 ≥ Si−1 −Si−2 ; Si → Simax

(25.7)

then this method is stable. In other words, for i large, we have make the grid spacing between successive nodesnon-decreasing. The method is always stable if we use forward-backward differencing instead of the finite volumeupstream method (another advantage of using forward-backward differencing).

So, this method of applying the asymptotic boundary condition can be shown to be numerically correct. Aswell, this technique will automatically do the right thing in the case of complex jump conditions. Of course, if ourassumption that VSS → 0 is not right, then this won’t help!

An example of a simple problem is given in Table 25.1. The expiry time is one year, and σ = .30. On the coarsestgrid, the value of Smax = 250 (K = 100). Smax is doubled on each grid refinement. In this case, there is little to choosebetween the various methods, they all give good results.

To see the effect of the different boundary methods in more detail, we will look at more interesting case. Theexpiry time is ten years, and σ = .50. On the coarsest grid, the value of Smax = 500 (K = 100). Smax is doubled on eachgrid refinement. In this case, the solution is more sensitive to the boundary condition method. Table 25.2 shows theresults in this case. Here, we can see that the the Dirichlet method performs poorly compared to the VSS = 0 technique.

141

26 Path-Dependent Options - The Augmented StateSpace Approach

In this section, we will discuss the idea behind pricing path-dependent options (in the PDE context) using an augmentedstate space. We will first look at discretely observed Asian and Parisian options, discuss some convergence issues, andthen go on to briefly describe some more complex options which require additional variables. We will describe ageneral object-oriented approach for developing software for these applications.

26.1 A Note on NotationIn this section, we will take it for granted that we have available a robust method for solving 1-d PDE pricing problems,using one of the methods previously described. We can now simply regard the 1-d PDE solver as a black box, whoseinternal workings are not of interest. To avoid complication, and to conform with the usual notation, we will alwayswrite down the equations in terms of the real time t and not τ = T − t, although we still are solving backwards in time.

26.2 Asian OptionsA discretely observed Asian option has a payoff which is a function of the arithmetic average after n observationintervals t1, t2, . . . , tn

An =

n∑i=1

S(ti)

n . (26.1)

Notice that

An = An−1 +S(tn)−An−1

n . (26.2)

If t+ and t− are the times the instant before and after the nth observation date, then we can write equation (26.2) as

A(S, t+) = A(S, t−)+S−A(S, t−)

n , (26.3)

where S = S(tn), i.e. the current asset price at this sampling time. We simplify the notation of equation (26.3) furtherby writing A+ = A(S, t+) and A− = A(S, t−) so that equation (26.3) can be written

A+ = A− +S−A−

n . (26.4)

At the terminal date t = T , there are many possible average values for a given asset price S. Consequently, wehave to introduce an additional state variable, which is the value of the discretely observed average along a particularrealized path of the asset. Since the average can take on any value, this new state variable is a continuous variable.

Consequently, the value of an Asian option is given by V = V (S,A, t). Now, across each sampling date, we have(by no-arbitrage)

V (S,A+, t+) = V (S,A−, t−), (26.5)

or

V (S,A+S−A

n , t+) = V (S,A, t−). (26.6)

Away from observation dates, if we assume the usual geometric Brownian motion, we have

Vt +σ2S2

2 VSS + rSVS− rV = 0. (26.7)

142

S

A

S=A

1d PDEA = constant

FIGURE 26.1: Asian option grid.

Note that there is no A dependence in equation (26.7).So, we define a 2-d grid with S = S0, . . . ,Sm, and A0, . . . ,Am. For each fixed value of Ak, we discretize the PDE (see

figure 26.1). Each line of A = const. represents a solution for a given value of the observed average, but for variousasset prices.

This gives a set of m+1 PDEs. We discretize each of these PDEs using one of the methods described before. Wehave clearly made an approximation here, since the average A can take on any value, not just one of the discrete valuesA0, . . . ,Am, but as the mesh ∆A,∆S → 0, we should converge to the exact solution.

Note that in certain cases (geometric Brownian motion with a floating strike, or geometric Brownian motion,European-style option), there is a similarity transformation which reduces the 2-d PDE to a single 1-d PDE. However,this transformation will not work if we use an implied volatility surface, i.e. σ = σ(S, t), or if there are discrete dollardividends, and so on. For more details about these special cases, see [13, 5, 98]. We follow the general approach usedin [99, 26].

As a specific example, suppose we want to value a fixed strike Asian call, with payoff

V (S,A, t = T ) = max(A−K,0). (26.8)

Initially, we set the boundary conditions for each 1-d problem (as S → ∞) to the payoff condition (26.8). We then solveeach of the 1-d PDEs backwards in time to the instant after the last observation time t = tn. Denote this time by t+.(This is the instant before the observation in terms of backward time τ.) At this point, we advance the solution to t−,the time the instant before tn, by using the jump condition

V (S,A, t−) = V (S,A+S−A

n , t+).

In general, we will have to interpolate to determine this value (as illustrated in Figure 26.2).One possibility is to use simple linear interpolation, as we did for discrete dividends. However, as we shall see, in

general quadratic interpolation is preferred. To motivate our discussion of this point, consider an Asian option wherethe average is computed continuously

A =

Z t

0S(t ′) dt ′

t . (26.9)

143

S

A

S=A

V-

V+

V-

V+

FIGURE 26.2: Interpolation operation at Asian observation dates.

In this case, the value of an Asian option is V = V (S,A, t) [13] where V satisfies:

Vt +σ2S2

2 VSS + rSVS +1t (S−A)VA− rV = 0. (26.10)

A more standard form for equation (26.10) can be obtained by letting τ = T − t, so that equation (26.10) becomes:

Vτ −1

T − τ(S−A)VA =

σ2S2

2 VSS + rSVS− rV. (26.11)

This equation has the familiar form of a convection-diffusion equation. Note that the drift/convective term has avelocity in the A direction given by

− 1T − τ

(S−A), (26.12)

which becomes infinite as τ → T . The information flow (from this term) is from the diagonal of the grid (S = A)outward in the A direction. We can regard discrete averaging as being the limit as the term (26.12) becomes very large(i.e. a delta function) near the sampling dates. In this case, the continuous problem becomes a drift dominated PDE[98], and special care has to be taken with handling the first order hyperbolic term in equation (26.10).

By analogy with high order limiter methods [4, 35, 98], we will use a limited quadratic upstream biased inter-polation method. For ease of exposition, consider the case of a tensor product grid where the coordinates of node(i, j) are (Si,A j). Again, if we view the discrete averaging problem as the limiting case of a continuous average, thenthe upstream node (at the sampling date) for any point (Si,A) is the node (Si,Aups) closer to the diagonal (A = Si).Suppose that the jump condition (26.6) requires a value of V = V (Si,A′) at point A′ which is not a grid point A j. If

A j ≤ A′ ≤ A j+1, (26.13)

144

then

jup =

j +1 A′ < Sij A′ > Si

jdown = j +( j +1)− jup

j2up =

j +2 A′ < Sij−1 A′ > Si

(26.14)

where jup, jdown, and j2up are the upstream, downstream, and second upstream points respectively [84]. The value ofV (Si,A′) is then estimated using quadratic Lagrange interpolation using data at the points (i, jdown), (i, jup), (i, j2up).The interpolation is limited to suppress oscillations [59]

V (Si,A′) ≤ max[

V(

Si,A jup

)

,V(

Si,A jdown

)]

V (Si,A′) ≥ min[

V(

Si,A jup

)

,V(

Si,A jdown

)]

. (26.15)

This approach is similar to MUSCL methods [4]. Note that if A j2up > Si and A′ < Si or A j2up < Si and A′ > Si, thenlinear interpolation is used. This is because interpolation should not be used across the S = A line (the velocity inequation (26.12) changes sign at S = A). We also assume that there is always a node in the mesh where A j = Si.

26.3 Error Analysis: Asian OptionsThe error in each 1-d PDE solution will be the usual discretization error. If ∆τ = O(∆S), then assuming a second ordermethod is used, the global truncation error is O(∆τ)2). Suppose we are interested in a situation where the number ofobservations becomes very large, so that we take the limit as the observation interval δt tends to zero. In this case

An =

i=n∑i=1

Ai

n →

Z T

0S(t ′)dt ′

T . (26.16)

We can take the limit as ∆t (the timestep) and δt (the observation interval) go to zero simply by setting δt = ∆t.It could be argued that this is an unrealistic case, since all Asian options are discretely observed in practice.

However, we can simply imagine, for the purposes of error analysis, that the number of observations is very large (e.g.daily observations for a year), so that this is a reasonable limit to study for error analysis purposes.

Consequently, we can imagine applying the discrete jump conditions at each timestep. For simplicity, we canassume that ∆S = ∆A = O(∆t). At each step, the error introduced by the interpolation is

Interpolation Error = O((∆t)q)

q = 1 nearest neighbor interpolationq = 2 linear interpolationq = 3 quadratic interpolation. (26.17)

Now, if we make an interpolation error each timestep, we make the error O(1/(∆t)) times. The cumulative worst caseeffect of this interpolation error is then

Cumulative Interpolation Error = O(1∆t (∆t)q)

= O((∆t)q−1). (26.18)

In order to get the effect of the cumulative interpolation error to be the same order as the PDE discretization error, weneed q = 3 (quadratic interpolation).

Note that if nearest neighbor interpolation is used in equation (26.18), then we do not get convergence. In [13],the forward shooting grid method (FSG) is developed. This method is a lattice technique, which uses either nearestneighbor or linear interpolation to handle the jump conditions. In this case, the spacing in the A direction is O(

√∆t),

which means that the cumulative interpolation error is [39]

Cumulative interpolation error: FSG method = O(1∆t (

√∆t)q), (26.19)

145

which means that the nearest neighbor and linear interpolation methods used in [13] are actually non-convergent. Thiswas verified in some numerical tests carried out in [39].

26.3.1 Running Sum FormulationAs an alternative to using the average as the auxiliary variable, the running sum

In =n∑i=1

Si (26.20)

has been suggested in [27, 90, 71]. In this case, the jump conditions at a sampling date are

U(S, I+, t+) = U(S, I−, t−)

I+ = I− +S. (26.21)

Let n be the total number of observation times. Then the payoff conditions for a fixed strike are

U(S, I, t = T ) =

max(I/n−K,0) for a callmax(K − I/n,0) for a put . (26.22)

However, our experience in [99] indicates that the use of the average as an auxiliary variable is better behaved than therunning sum formulation[99].

26.3.2 Convergence to Continuously Observed Limit

If we are actually interested in obtaining a solution in the limit as the observation becomes continuous, the discretelyobserved formulation will only converge at a rate of O(∆t), owing to the discrete approximation of the samplingintervals.

In this case, it is probably better to use the continuously observed full 2-d formulation

Vt +σ2S2

2 VSS + rSVS +1t (S−A)VA− rV = 0. (26.23)

Note that this PDE is a function of (S,A), but there is no diffusion term in the A direction. The PDE is then degenerateparabolic, or drift dominated. It is crucial to use an effective upstream-like method for discretizing the first orderderivative VA. A flux limiter was used to solve this problem in [98].

26.4 Parisian OptionsParisian options have been suggested as a type of barrier option which has a soft barrier. One of the advantages of aParisian option is that it is harder for traders to manipulate prices to cause a barrier to be triggered.

A typical example would be a Parisian up and out option. The option would expire as worthless if the asset isobserved over a barrier Sb for J∗ consecutive days in a row.

We define a new state variable J which counts the number of consecutive observations the asset is over Sb. J isreset to zero if J < J∗ and S < Sb at any observation. This is illustrated in Figure 26.3. Note that J is an integer in therange [0,J∗].

Our rule for updating J at observation dates is

J+ =

min(J∗,J− +1) S ≥ Sb0 S < Sb

, (26.24)

where J+ = J(t+) and J− = J(t−), and t+ (t−) is the time the instant after (before) the observation date. No-arbitrageconsiderations [89] then imply that

V (S,J−, t−) = V (S,J+, t+). (26.25)

146

S

tObservation times

J = 0J = 4

J = 1

J = 0

Barrier

FIGURE 26.3: Value of the counter J at various times. Parisian up and out option.

J = 1

J = 2

J = 3

J = 4

J = J*

S

FIGURE 26.4: Computational domain for Parisian options.

147

Sb

J

S

FIGURE 26.5: Flow of information at each observation date for a Parisian up and out option.

The payoff conditions at t = T are then (for a call)

V (S,J, t = T ) =

V (S,J,T ) = max(S−K,0) J 6= J∗V (S,J,T ) = 0 J = J∗ (26.26)

where K is the strike price. So, proceeding as for the Asian option case, we define a 2-d mesh (S,J), as in Figure 26.4.

We now solve J∗+1 one dimensional PDE problems, one for each value of J = const. Initally, we set the boundaryconditions, for each 1-d problem at (for a call)

V (S → ∞,J,T ) = Smax J 6= J∗

V (S → ∞,J,T ) = 0 J = J∗. (26.27)

We then solve the set of PDEs as usual, backwards in time, until we reach the last observation date. We then apply thejump conditions (26.25). Note that we do not update the solution along the line J = J∗. We do, however, update theboundary conditions at S → ∞,J 6= J∗. In this case, no interpolation is required, assuming that we use J∗+1 values ofJ in the discretization. The flow of information at each observation date is shown in Figure 26.5. Note that the zerovalue at J = J∗ percolates down to the J = 0 line for S > Sb. At t = 0 we are interested in the solution along the lineJ = 0.

26.5 Shout OptionsA shout option gives the holder the right to modify the contract (in this case the strike) during the option lifetime. Shoutoptions can be considered to be a basic building block of segregated fund guarantees, which are offered by Canadianlife insurance companies. Shout options have been described in [82, 19]. It is also worth noting that some energyderivative contracts have recently included a feature called a swing option [49], which is similar in many respects to acomplex shout option. Our focus in this section is to consider shout options in the context of pension fund guarantees.

Consider a put maturing at time T , with initial strike K = K∗. A shout option allows the holder to shout and resetthe strike K to the current value of the underlying asset at any time (up to Umax times/year). To price this option, weneed to define two new path-dependent state variables:

148

S

KU=0

U=1Si

Si

V*

U=Umax

FIGURE 26.6: V ∗, value obtained upon shouting for plane U = 0.

• U = number of shouts used in this period, U = 0,1, . . . ,Umax.

• K = current value of strike at last reset.

The value of shout option is then given by V = V (S,K,U, t), where S is the asset price and t is time. Suppose thatthe holder has a maximum number of shout opportunities per year (Umax). Then, if t− and t+ are the times just beforeand after the shouts used counter U is reset to zero, no-arbitrage implies

V (S,K,U, t−) = V (S,K,0, t+). (26.28)

At any point in the solution space (S,K,U, t), let V ∗ be the value of the contract that the holder receives if theydecide to shout:

V ∗(S,K,U, t) =

V (S,K = S,U +1, t) if U +1 ≤Umax,−∞ otherwise.

(26.29)

Consequently, in order to define V ∗ for a given number of shouts used U , we have to compare the current value withthe value on the next higher plane U +1 along the diagonal K = S. This is illustrated in Figure 26.6.

The value of the shout option V = V (S,K,U, t) is given by an American-type PDE problem

∂V∂t + r(t)S ∂V

∂S +12σ(S, t)2S2 ∂2V

∂S2 − r(t)V ≤ 0 (26.30)

V ∗ ≤V (26.31)

Note that V ∗ is the value of the option obtained by shouting. At each point in the solution domain, one of (26.30),(26.31) must hold with equality. Observe that the PDE (26.30) depends on (K,U) only through V ∗ and the terminalcondition.

We can reformulate equations (26.30) through use of the penalty term Q(V,V ∗):

∂V∂t + r(t)S ∂V

∂S +12σ(S, t)2S2 ∂2V

∂S2 − r(t)V +Q(V,V∗) = 0. (26.32)

It would appear to be a formidable task to solve the three dimensional, time-dependent nonlinear PDE (26.32)which is constrained by the value of other similarly complex equations. However, this PDE can be simplified by

149

exploiting the structure of the dependence on the variables U and K. This will be discussed in detail below, but it isworth noting at this juncture that the problem simplifies under the Black-Scholes assumption that σ(S, t) = const. fora wide variety of contract specifications. In this case, the PDE (26.32) can be seen to admit the similarity solution

V (S,K,U, t) = KV (ζ,U, t) (26.33)

where ζ = S/K, as long as the payoff function can be written as

payoff = Kg(ζ).

In this situation, the penalty function can be written as Q = KQ(V (ζ,U, t),V (ζ∗,U + 1, t)). This class of contractsincludes the important cases of call/put payoffs where the holder is allowed to reset the strike according to K∗ = cSwhere c is some constant. In general, if σ = σ(S, t), it will not be possible to use a similarity reduction.

We are ultimately interested in the value of a shout option with a specific initial strike K = K0 with no shouts usedU = 0 at the time of sale of the security, for specified values of the price of the underlying asset. Observe that equation(26.32) modelling the value, V , for a fixed U and K is not defined until we have determined the constraint

V ∗(S,K,U, t) = V (S,K = S,U +1, t),

the value of the security received upon shouting. This requires that we know the solution V on the plane U +1 with Kranging over all possible settings of the strike under the function K = S. However, once we have determined V ∗ andspecified the terminal condition (max(K −S,0)) the differential equation (26.32) is independent of the variables K andU .

Of course it would be prohibitively expensive to store the complete solution through time for V ∗. Using a timestep-ping numerical algorithm it is only necessary to know the solution on the U + 1 plane during the current timestep.Consequently, if we imagine stepping through time, then at each timestep we should evaluate the solutions recursivelyin the order U = Umax, . . . ,0.

Timestepping Algorithm

For discrete times n = N, . . . ,1

• Determine solution for planes (S,K,U = const) in order

U = Umax,Umax −1, . . . ,0

• Note that for U = Umax, no shouts are left, so there is no American-type constraint.• Once solution is obtained for U = const., then V ∗ (value obtained upon shouting) can be determined for

plane U −1.EndFor (26.34)

Within each plane U = const., we discretize the problem in the K direction. Note that PDE contains no derivativesw.r.t. K. For each line of K = const., we use a standard finite volume method to discretize the PDE in the S direction.To determine V ∗ (value obtained upon shouting) for the next lower plane (one less shout used), we have to evaluatethe solution at specific points:

V ∗(S,K,U −1, tn−1) = V (S,K = S,U, tn−1).

In general, we have to interpolate to determine V ∗.In fact, many numerical tests indicated that it is efficient to use a mesh which is locally refined near S = K, on each

line of K = const. [92]. This mesh is shown in Figure 26.7.

150

! "$# %& $

' (() *+ ,-. /012 34*5/126 76 /189 :4:

FIGURE 26.7: A rectangular computational domain for the numerical solution of a shout option sold with an initialstrike K0 for fixed U = const. The shaded areas represent refined regions in the S-grid.

26.5.1 Boundary Conditions

The original problem is posed on an unbounded domain (S,K) ∈ [0,∞)× [0,∞). However, computationally it isnecessary to approximate this with a truncated finite region (S,K) ∈ [0,Smax]× [0,Kmax] where Smax, Kmax are suitablylarge numbers. We will discuss a methodology for choosing Smax, Kmax appropriately.

In conventional option valuation applications, the boundary conditions which are applied at S → ∞ are constructedunder the asymptotic assumption that S K. For example, the appropriate boundary condition for a standard putoption is given by V (S K) = 0. This is because as S → ∞ it becomes very unlikely that the asset price will fallbelow the (finite, fixed) strike K, rendering the contract worthless. In the context of shout options, if we construct arectangular computational domain then we require Smax Kmax in order to accurately apply these asymptotic boundaryconditions.

Consider the rectangular computational domain shown in Figure 26.7, with Smax Kmax. In general, we are unableto determine the constraint, V ∗, in cases where the strike is reset to K∗ > Kmax upon shouting. For example, if we takethe case of a standard shout floor where the strike is set according to K∗ = S upon shouting, we will not have thenecessary data within the computational domain to determine V ∗ for S > Kmax. An exception occurs when there existsa similarity solution, but as we are interested in valuing a general class of shout options we develop a methodologywhich can be adapted to more complex securities and modelling situations.

To avoid an ad hoc procedure for estimating V ∗ for S > Kmax, we will modify the contract so that there is nouncertainty about computing V ∗. Consider an alternative contract where the holder is not allowed to reset to a higherstrike level than Kmax. In other words, if the original contract specifies

V ∗ = V (S,K = S,U +1, t) (26.35)

for U +1 ≤Umax, then we modify (26.35) so that V ∗ is given by

V ∗ = V (S,K = min(S,Kmax),U +1, t) (26.36)

for U + 1 ≤ Umax. This contract will only require information which is available within our computational domain.Furthermore, this contract becomes equivalent to the original specification as Kmax → ∞.

Given that for any fixed value of K,Smax K, we are now in a position to specify boundary conditions as S → ∞.

151

S

Kj+1

Kj o

o o

o

oo o

o oo

Kj-1oo oo

K=S

Interpolation Point

(a) Axis aligned interpolation.

S

Kj+1

Kj o

o o

o

oo o

o oo

Kj-1oo oo

K=S

Interpolation Point

(b) Diagonal interpolation along the strike setting curveK = S.

FIGURE 26.8: Different interpolation strategies.

Initally, at the terminal time, we specify that

V (Smax,K,U,T ) = 0 (26.37)

which is the usual condition for a put. Thereafter, we set

V (S → ∞,K,U, t) =

V ∗ if it is optimal to shoutVt = 0 otherwise.

(26.38)

The above boundary condition is particularly useful if a similarity transformation is used.

26.5.2 Interpolation

In order to determine the value obtained upon shouting (26.29), we will in general have to interpolate from the knowndiscrete values of the option. At times near the terminal time T , the solution is not smooth as a function of S for fixed K.Since, as for the Asian option case, we need to use quadratic interpolation in order to obtain second order convergence,this may cause problems if we use an axis aligned interpolation strategy, i.e. interpolate along the coordinate axes (seeFigure 26.8). A little consideration reveals that (see Figure 26.9) the solution is smoother along the diagonal K = S,so that diagonal interpolation is superior to axis aligned interpolation. This is confirmed in numerical tests.

26.5.3 An Object-Oriented Implementation

Recall that the value of the contract depends on the four independent variables S, K, U , and t. We are ultimatelyinterested in the value of a shout option for a given level of the underlying asset S0 with a specific initial strike K0 andwith no shouts used (U0 = 0) at the time of sale of the security. This problem is a three dimensional, time-dependentnonlinear differential equation (26.32).

However, this PDE can be simplified by exploiting the structure of the dependence on the variables U and K. Forexample, note that equation (26.32) depends on (K,U) only through the penalty term Q(V,V ∗). If we imagine solvingthe PDE (26.32) by stepping through time, we can see that once the solution has been obtained for a given value of U ,then this determines the value of V ∗ for all discrete values of K for the plane U − 1. Consequently, at each discretetimestep, we solve for each plane of U = const. in the order Umax,Umax −1, . . . ,0. Within each plane U = const., theone dimensional PDEs for different discrete values of K are completely independent, since V ∗ for this plane is knownfrom the solution in plane U +1.

152

;=<=>?A@!BDCE

F

G

HJI KMLNPOQ RTSUVIWNYXZK[L\]O

^_`a$bdc

e fg b hjikl m fg

FIGURE 26.9: Interpolation of a shout put option with payoff g(S,K) = max(K − S,0) using coordinate axis alignedinterpolation. The determination of V ∗ =V (S,K∗ = S,U +1,t) requires us to interpolate the strike-setting curve K∗ = S.This is not approximated well by linear or quadratic interpolants for fixed S due to the discontinuous derivative at thepoint of interest.

153

nporq

npots

npo nvuJwyx

z

|~

FIGURE 26.10: An object-oriented interpretation of a shout option. The line K = K0 represents the contract thatwas initially sold to the investor. We define the object type ShoutPlane to represent planes with U = const. EachShoutPlane will contain a number of Base1d objects which represent a line of K = const. within a plane of U = const.

Recall that the discrete variable U represents the number of shouts currently expended. Therefore, planes ofU = const. represent contracts which have the same number of shouts used (see Figure 26.10). It is convenientto define an object type ShoutPlane which encapsulates planes of U = const. We reiterate here that, within eachtimestep, we compute solutions in the order Umax,Umax −1, . . . ,0. As a result, within the plane U = const., we have aset of independent one dimensional PDEs. This framework of objects which encapsulate planes of U = const. allowsus to internally schedule many aspects of the solution process such as the timestepping algorithm. This will be helpfulin a later section when we discuss the parallelization of the algorithm. This also gives us the ability to define genericoperations such as interpolation and data manipulation easily and safely. For example, often our algorithm will requireinformation not contained exactly within our discrete mesh. Defining these objects in an abstract manner allows us todefine operations such as interpolation,

double ShoutPlane::interpolate( double S, double K );

giving us the ability to externally query the object in a manner naturally defined by the contract. The user of thesefunctions need not be concerned with any of the details of the S and K discretizations. In fact, there may not even bea discretization for one of the state variables. For example, if σ(S, t) = const., then the dimensionality of the problemcan be reduced by defining the similarity variable η = S/K, and consequently we need solve only for a single value ofa reference strike. This is because we can write

V (S,K,U, t) =KK0

V (S K0K ,K0,U, t). (26.39)

(Note that as long as the node S = K0 is in the S direction mesh, then no interpolation is required in the S direction todetermine V ∗.) When using a similarity reduction, the ShoutPlane maintains only a single line of K = const. and theinterpolation member function is defined appropriately. This improves the maintainability of the software by allowingthe same interface to be used for the ShoutPlane regardless of the implementation details chosen.

Within each ShoutPlane, the line K = const. represents a particular setting of the strike price. The solution alongeach of these lines will be encapsulated in a Base1d object. As a result, each of the Base1d objects contains both

154

Base1d

Black-Scholes MeanReverting

VolatiltySurface

advance_solution()

FIGURE 26.11: Base class for one dimensional solver.

the S discretizations and the solution for a particular strike setting. These Base1d objects are then given methods toset minimum constraints, interpolate, and solve the partial differential equation (26.32) using techniques described in[92]. In fact, it can be seen that these objects are merely solving equations modelling a standard American optionwith a time varying constraint. It should be noted that many exotic options can be formulated in a similar fashion,i.e. in terms of a collection of these building block contracts. Some examples are provided in [99, 86]. Therefore, arobust implementation of the Base1d object can allow algorithms for these complex options to be developed efficientlythrough code reuse.

So far we have discussed a structure for maintaining the data which must be available to determine the value of ashout option. We must also be able to control and schedule the communication of this data. For this we define a sched-uler/controller object which is responsible for scheduling the solution of the building block objects and determiningthe minimum value constraint V ∗. Each of the planes of constant U must be solved for all strike prices K and assetprices S on the grid and advanced (backwards) through time using a timestepping algorithm. Should the holder of thesecurity shout, they receive a security defined on the next higher plane U ∗ = U + 1 with strike K∗ = F (S,K,U, t).Consequently, the controller class schedules the ShoutPlane objects in the order U = Umax, . . . ,0.

It is worthwhile at this point to give a brief description of the base and derived classes. First, a generic Base1dclass was defined, which contains all the necessary functions and data required to solve a 1-d option pricing model.Figure 26.11 shows the generic 1-d base class, which is instantiated for specific types of model.

Note that an important member function associated with each Base1d object is the advance solution() func-tion. Given an initial state, this function advances the solution of the specific 1-d model to a specified time (goingbackwards).

Collections of 1-d objects are packaged in the Base2dContainer container class, which contains an array ofBase1d objects. This is shown in Figure 26.12.

In the case of a ShoutPlane object (which is a derived type of the Base2dContainer class), this object containsthe data and functions required for each plane of U = const., as shown in Figure 26.10. Each Base1d object containedin a single ShoutPlane object corresponds to the solution with U = const., K = const. We have also shown thehierarchy for Asian and Parisian objects. In the case of Asian objects, each Base1d object would correspond to

155

Base2dContainer

Array of Base1d Objects

Asian

Parisian

ShoutPlane

advance_solution()

FIGURE 26.12: Two dimensional container class.

the solution for average value = const. Note that the Base2DContainer class has an advance solution() memberfunction, which simply calls the advance solution() member functions of each of the contained Base1d objects.

Finally, in Figure 26.13, we show the class hierarchy for Base3dContainer objects. The Shout object is acontainer class which contains an array of generic 2-d classes. Again, in order to carry out such high level operationsas advance solution(), the user of this class need not know the details of the Base2DContainer implementation.We also show the class hierarchy for some other path-dependent options, including a class for pricing segregated funds.It is clear that software for pricing these other options can be developed using the same generic base classes.

Note that we can also use this framework to use more sophisticated models of volatility. For example, there hasbeen some recent evidence that stochastic volatility models might be more effective for hedging than volatility surfacemodels. We could still use the software framework described here, simply by replacing the Base1d objects withStochVol objects (which would be 2-d PDEs). Some of the lower level functions (e.g. interpolation) would have behave to be re-implemented, but the structure of the framework would be unchanged. Note that in the case of the Hestonmodel [45], a similarity reduction can be used to reduce the dimensionality of the problem [40].

Observe that upon shouting information is collected from the plane one level above the current position accordingto the “look up function” K = S. The value is only reset to V ∗ in the case that V ∗ ≥ V . This effectively sets anAmerican-type floor to the value of the contract which is imposed continuously through time. The scheduler/controllercan be implemented in the following manner:

For each U = 0, . . . ,Umax // initializationShoutPlane[U] = new ShoutPlaneFor each discrete K

ShoutPlane[U].OneD[K] = new Black-ScholesShoutPlane[U].OneD[K].value = payoff(K,U)

End for KEnd for U

156

Base3dContainer

Array of Base2dContainer Objects

Shout

Parisian-Asian

Double Parisian

SegFundCube

advance_solution()

FIGURE 26.13: Three dimensional container class.

t := T // timestep loopWhile t > 0 do

t = max(t −∆t,0)For U = Umax, . . . ,0

For each discrete KShoutPlane[U].OneD[K].constraint = get_constraint(K,U,t)ShoutPlane[U].OneD[K].advance_solution()

End for KEnd for U

End while (26.40)

To illustrate further, consider the case of a standard shout put option where the strike level is reset to the currentasset price upon shouting. In this case for any S the controller determines the constraint by looking at the value of thecontract on the diagonal K∗ = S on the next higher plane U +1 (see Figure 26.14).

26.5.4 A Parallel Version for Multiprocessor Architectures

If we are interested in utilizing a more complicated specification than the standard Black-Scholes setting with constantvolatility, we must solve the full three dimensional problem. Other cases where we cannot use the similarity solutioninclude some types of contract specifications and uncertain parameter models. In these cases, the complexity ofthe numerical solution may become so large that we cannot solve these problems quickly enough on a uniprocessormachine.

Using our object-oriented construction described above, we note that the valuation of the option for fixed U andK is independent of other values of U and K except through the application of the constraint V ∗(S,K,U, t) defined in(26.29). As a result, each of the Base1d objects in a given plane of U = const. can be solved independently once theminimum value constraint V ∗ has been determined by the controller class.

157

Zr

]

]

FIGURE 26.14: The flow of information in a standard shout option with F (S,K,U,t) = S. The controller class deter-mines the minimum value constraint by interpolating the solution on the next higher ShoutPlane. Notice that all thevalues along the line S = Si are compared with the same constraint value V ∗ = V (Si,Si,U + 1,t) to determine whetheror not shouting is optimal.

In this work, we use the OpenMP library. This library is widely available on many multiprocessor machines, andallows for development of portable software. The OpenMP library consists of a set of preprocessor compiler directives(pragmas) which are ignored on single processor architectures. We emphasize here that no changes were made to thebasic algorithms or software for these tests. We simply added the compiler directives in a small number of locations.

Since all of the data has been safely encapsulated within the Base1d objects, we can easily parallelize the codewithout worrying about potential data sharing problems. The OpenMP library allows us to allocate multiple threadsfor parallel regions by simply specifying appropriate pragma calls. Since the one dimensional PDE problems allrequire approximately the same number of floating point operations, we find that static scheduling affords the bestperformance. Further, we allocate chunksizes of 8 iterations through the loop since this improves the performance ofthe caching system. The following is a code fragment which implements the parallelization of the advance_solutionsection of the pseudo code (26.40) given above.

void ShoutPlane::advance_solution()

#pragma omp parallel for schedule(static,8) default(shared)for(int k = 0; k < nk; k++ )

OneD[k].advance_solution(); // solve the OneD problems// for a fixed U in parallel

// end ShoutPlane::advance_solution()(26.41)

Code fragment (26.41) indicates that the addition of multithreading directives to the software is not an onerous task.If we study the serial version of implementation of the software, we find that approximately 81% of the processor

time is spent advancing the solution and 17% of the time is spent interpolating the numerical solution to determinethe constraint V ∗.1 Both of these sections of the software can be parallelized using a few compiler directives similarto fragment (26.41). We did not attempt to expose any other parallelism in this algorithm, being satisfied with 98%.

1The interpolation is expensive since we solve an inverse problem to determine mesh points to use in the diagonal interpolation described in [92]which dramatically improves convergence.

158

Number of Processors2 4 8 16

K discretizations speed up128 nodes 1.94 3.69 7.07 12.08256 nodes 1.95 3.68 6.96 12.65Theoretical 1.96 3.77 7.02 12.31

TABLE 26.1: Timing results for parallel version of algorithm on SGI Origin 2000 cc-NUMA multi-processor server;speed up = time serial

time parallel . The number of nodes in the K discretization gives the number of independent problems oneach plane of U = const. The last row gives the theoretical speed up using (26.42) assuming 98% parallelization. Theshout option has four exercise opportunities per year and a five year maturity. Black-Scholes modelling assumptionswith σBS = .2, r = .1, K0 = $100. These results are for a three dimensional case (although a similarity solution existsfor this particular problem, we did not use it in these computations).

Since both the time advance and interpolation sections can be parallelized, we can estimate the maximum theoreticalspeed up using the formula:

speed up =100

98P +2

(26.42)

where P denotes the number of processors used. Table 26.1 shows typical results for a full three dimensional shoutoption valuation. As can be seen from the results, the observed increase in speed is very close to the theoretical limitin equation (26.42).

27 Convertible Bonds with Credit RiskIn this article, we will investigate various methods for modelling credit risk in convertible bonds. We will see that isnecessary to specify precisely the events that occur when a company goes into default. In particular, the stock pricemay or may not jump to zero on default, and some of the bond value may be recovered.

In some cases, a system of coupled linear complementarity problems must be solved. We will discuss variousnumerical approaches for timestepping so that the problems become decoupled.

Since our main interest in this article is the modelling of default risk, we will restrict attention in the following tomodels where the interest rate is assumed to be a known function of time, and the stock price is stochastic. We caneasily extend the models in this paper to handle the case where the risk-free rate and the hazard rate are stochastic.However, this would detract us from our main goal here, which is to determine how to incorporate the hazard rate intothe basic convertible pricing model. We also note that practitioners often regard a convertible bond primarily as anequity instrument, where the main risky factor is the stock price, and the stochasticity of the risk-free rate is of secondorder importance.

27.1 Convertible Bonds: no credit riskThe standard approach to valuing convertible bonds implicitly assumes that there is no credit risk in the bond compo-nent of the company issuing the convertible bond.

We assume that interest rates are known functions of time, and the stock price is stochastic. We assume that

dS = µSdt +σSdz (27.1)

where S is the asset price, µ is the drift rate, dz is the increment of a Wiener process and σ is the volatility. Followingthe usual arguments, the no-arbitrage value V (S, t) of any contingent claim on S is given by

Vt +

(

σ2

2 S2VSS +(r(t)−q)SVS− r(t)V)

= 0, (27.2)

where r(t) is the known interest rate, and q is the dividend rate.We assume that a convertible bond has the following characteristics

159

• Continuous (time dependent) put provisions, put price Bp.

• Continuous (time dependent) conversion provision. At any time, the bond can be converted to κ shares.

• Continuous (time dependent) call provision. At any time, the issuer can call the bond for price Bc > Bp. How-ever, the holder can convert the bond if it is called.

LetLV ≡−Vt −

(

σ2

2 S2VSS +(r−q)SVS− rV)

(27.3)

We will consider the points in the solution domain where κS ≥ Bc and κS < Bc separately.

• Bc > κS. In this case, we can write the convertible bond pricing problem as a linear complementarity problem

LV = 0(V −max(Bp,κS)) ≥ 0

(V −Bc) ≤ 0

∨

LV ≥ 0(V −max(Bp,κS)) = 0

(V −Bc) < 0

∨

LV ≤ 0(V −max(Bp,κS)) ≥ 0

(V −Bc) = 0

(27.4)

where the notation (x = 0)∨ (y = 0)∨ (z = 0) is to be interpreted as at least one of x = 0, y = 0, z = 0 holds ateach point in the solution domain.

• Bc ≤ κS. In this case, the convertible value is simply

V = κS (27.5)

since the holder would choose to convert immediately.

As far as boundary conditions are concerned, we merely alter the operator LV at S = 0,∞. At S = 0, LV becomes

LV ≡−(Vt − rV ) ; S → 0, (27.6)

while as S → ∞, then we assume that the unconstrained solution is linear in S

LV ≡VSS ; S → ∞ . (27.7)

The terminal condition is given by

V (S, t = T ) = max(F,κS) (27.8)

where F is the face value of the bond.Equations (27.4) has been derived by many authors. However, in practice, corporate bonds are not risk free. To

highlight the modelling issues, we will consider a simplified model of risky corporate debt in the next section.

27.2 A Risky BondTo motivate our discussion of credit risk, consider the valuation of a simple coupon bearing bond, issued by a corpo-ration having a non-zero default risk. We will assume that the risk-free rate is a known deterministic function. Let theprobability of default in the time period t to t +dt be p(S, t)dt, where p(S, t) is a deterministic hazard rate.

Let B(S, t) be the price of a risky corporate bond. Construct the usual hedging portfolio

Π = B−αS . (27.9)

In the absence of default, if we choose α = BS, the usual arguments give

dΠ =

[

Bt +σ2S2

2 BSS

]

dt +o(dt) , (27.10)

where o(dt) denotes terms that go to zero faster than dt. Now, making the assumptions that

160

• The probability of default in t → t +dt is p dt.

• The value of the bond after default is RB where 0 ≤ R ≤ 1 is the recovery factor.

• The stock price S is unchanged on default.

then equation (27.10) becomes

dΠ = (1− p dt)[

Bt +σ2S2

2 BSS

]

dt − p dt(B(1−R))+o(dt)

=

[

Bt +σ2S2

2 BSS

]

dt − p dt(B(1−R))+o(dt) . (27.11)

We now make the assumption that default risk is diversifiable, so that

E(dΠ) = rΠ dt (27.12)

where E is the expectation operator, and r is the risk-free rate. Taking the expectation of equation (27.11) and equatingto equation (27.12) gives

Bt + rSBS +σ2S2

2 BSS − (r +(1−R)p)B = 0 . (27.13)

Note that if p = p(t), then the solution to equation (27.13) for a zero coupon bond with face value F payable at t = Tis

B = F exp[

−Z T

t(r(u)+ p(u)(1−R)) du

]

(27.14)

which corresponds to the intuitive idea of a spread s = p(1−R).We can alter the above assumptions about the stock price in the event of default. If we assume that the stock price

S jumps to zero in the case of default, then equation (27.11) becomes

dΠ = (1− p dt)[

Bt +σ2S2

2 BSS

]

dt − p dt(B(1−R)−αS)+o(dt)

=

[

Bt +σ2S2

2 BSS

]

dt − p dt(B(1−R)−αS)+o(dt) . (27.15)

Following the same steps as above, with α = VS, we obtain

Bt +(r + p)SBS +σ2S2

2 BSS − (r + p)B+ pRB = 0 (27.16)

Note that in this case, p appears in the drift term as well as in the discounting term.

27.3 A Convertible Bond with Credit Risk: A Simplified CaseWe now consider adding credit risk to the convertible bond model described in Section 27.1, using the approachdiscussed in Section 27.2 for incorporating credit risk. This section follows the same line of reasoning described in[9]. Consider a convertible bond with credit risk. Let the price of the convertible bond be denoted by V (S, t). Weassume that the interest rate is a known function of time r = r(t). To avoid complications at this stage, we assume thatthere are no put/call features, and conversion is only allowed at the terminal time, or in the event of default. We willallow a continuum of possibilities, in terms of stock price, when the company defaults on bond payments. Let S+ bethe stock price after default, and S− be the stock price before default. We will assume that

S+ = S−(1−η) , (27.17)

161

where 0 ≤ η ≤ 1. Note that η = 1 corresponds to the total default case (stock price jumps to zero), and η = 0corresponds to the partial default case (company defaults but price of stock is unchanged).

As usual, we construct the hedging portfolio

Π = V −αS (27.18)

In the case where p = 0, then if we choose α = VS, the standard arguments give

dΠ =

[

Vt +σ2S2

2 VSS

]

dt +o(dt) . (27.19)

Now, consider the case where the hazard rate p is nonzero. We make the following assumptions

• On default, the stock changes by equation (27.17).

• On default, the convertible bond holders have the option of receiving

(a) The amount RB, where where 0 ≤ R ≤ 1 is the recovery factor, and B is the bond component of the convert-ible security (we discuss models for B in later sections).

(b) The shares worth κS(1−η).

Under these assumptions, the change in value of the hedging portfolio in t → t +dt is

dΠ = (1− p dt)[

Vt +σ2S2

2 VSS

]

dt − p dt(V −αSη))+ pmax(κS(1−η),RB)+o(dt)

=

[

Vt +σ2S2

2 VSS

]

dt − p dt(V −VSSη)+ pmax(κS(1−η),RB)+o(dt) (27.20)

Assuming the expected return on the portfolio is given by equation (27.12) and equating this with the expectation ofthe equation (27.20), then we obtain

r [V −SVS] dt =

[

Vt +σ2S2

2 VSS

]

dt − p [V −VSSη] dt

+p [max(κS(1−η),RB)] dt +o(dt) . (27.21)

which gives

Vt +(r + pη)SVS +σ2S2

2 VSS − (r + p)V + pmax(κS(1−η),RB) = 0 . (27.22)

Note that r+ pη appears in the drift term, and r+ p appears in the discounting term in equation (27.22). In the casethat R = 0, η = 1, which is the total default case with no recovery, the final result is especially simple: we simply solvethe full convertible bond problem (27.4), with r replaced by r+ p. There is no need to solve an additional equation forthe bond component. This has been noted in [11].

Defining

M V ≡−Vt −(

σ2

2 S2VSS +(r + pη−q)SVS− (r + p)V)

(27.23)

then we can write equation(27.22) for the case where the stock pays a proportional dividend q as

M V − pmax(κS(1−η),RB) = 0 . (27.24)

27.4 Convertible Bonds with Risky Corporate Debt: the General CaseWe are now in a position to consider the complete problem for convertible bonds with risky debt. We can generalizeproblem (27.4), using equation (27.24).

162

• Bc > κS

M V −max(κS(1−η),RB) = 0(V −max(Bp,κS)) ≥ 0

(V −Bc) ≤ 0

∨

M V −max(κS(1−η),RB)≥ 0(V −max(Bp,κS)) = 0

(V −Bc) < 0

∨

M V −max(κS(1−η),RB)≤ 0(V −max(Bp,κS)) ≥ 0

(V −Bc) = 0

(27.25)

• Bc ≤ κS

V = κS . (27.26)

27.5 Previous WorkThere have been several attempts to determine the bond component of a convertible bond, in order to model the creditrisk of the bond component.

An early attempt is described in [76]. In this article, the probability of conversion is estimated, and the discountrate is a weighted average of the risk free rate and the risk free rate plus spread, where the weighting factor is theprobability of conversion.

More recently, the a model described in [83] has become popular. In the following, we will refer to this model asthe T &F model. This model is outlined in the latest edition of [47], and has been adopted by several software vendors.We will discuss this model in some detail.

27.5.1 T&F Model

Although not stated in [83], we deduce that the model described therein is a partial default model (stock price un-changed on default), since the equity part of the convertible is discounted at the risk-free rate. (Although the drift ratein [83] is the growth rate of the stock which is not precisely defined, so this is not entirely clear). In the following, wewill assume that the growth rate of the stock is the risk-free rate (the usual risk-neutral valuation of options).

In [83], the following variational inequality is proposed for the total convertible bond value V

• Bc > κS

LV + p(1−R)B = 0(V −max(Bp,κS)) ≥ 0

(V −Bc) ≤ 0

∨

LV + p(1−R)B≥ 0(V −max(Bp,κS)) = 0

(V −Bc) < 0

∨

LV + p(1−R)B≤ 0(V −max(Bp,κS)) ≥ 0

(V −Bc) = 0

(27.27)

• Bc ≤ κS

V = κS . (27.28)

It is convenient to describe the decomposition of the total convertible price as V = B +C, where B is the bondcomponent, and C is the equity component. In general, we can express the solution for V,B,C in terms of a coupledset of equations. Assuming that equation (27.27-27.28) is also being solved for V , then we can specify B,C. In theT &F [83] model, the following decomposition is suggested

• Bp > κS

LC = 0 ; LB+ p(1−R)B = 0 if V 6= Bp ;V 6= Bc

B = Bp ; C = 0 if V = Bp

B = 0 ; C = Bc if V = Bc (27.29)

163

+

Digital

Assetor nothing

FIGURE 27.1: Illustration of the T&F method [83] for decomposing a convertible bond.

• Bp ≤ κS

LC = 0 ; LB+ p(1−R)B = 0 if V 6= max(κS,Bc)

C = max(κS,Bc) ; B = 0 if V = max(κS,Bc) . (27.30)

It is easy to verify that the sum of equations (27.29-27.30) gives equations (27.27 - 27.28), noting that V = B+C.The terminal conditions for the T &F decomposition are

C(S, t = T ) = H(κS−F)(max(κS−F,0)+F)

B(S, t = T ) = H(F −κS)F (27.31)

where

H(x) = 1 if x ≥ 0= 0 if x < 0. (27.32)

However, the splitting in equations (27.29-27.30) does not seem to be based on specifying precisely what happens inthe case of default. In the original article [83], there is no discussion of the actual events in the case of default, andhow this would affect the hedging portfolio. There is no clear statement as to what happens to the stock price in theevent of default. Figure 27.1 shows the decomposition of the convertible bond using this idea.

27.5.2 T &F splitting: a simple case

In this section, we critique the T &F model, using a simple limiting case for ease of exposition.Consider a case where

• There are no put/call provisions.

• Conversion is only allowed at the terminal time.

164

• κ = 1

• The recovery rate R = 0.

In this case the payoff of the convertible is

V (S,T ) = max(S,F) (27.33)

In this simple case, the equations for the bond and call components become decoupled. From equations (27.30)and (27.31), we have that

LB+ pB = 0B(S,T ) = H(F −S)F , (27.34)

and the value of the asset or nothing call option C is

LC = 0C(S,T ) = H(S−F)S . (27.35)

Consider a case where default occurs for S < F, t = T −ε. In this case, since B ' F, the value of the convertible isalmost nothing (for this S < F). However, if we assume that we can convert at t = T , the convertible should be worth' S, which appears to be inconsistent.

27.5.3 T&F splitting: call just before expiry

Consider the following case

• There are no put provisions.

• The bond can only be called the instant before expiry, at t = T−. The call price Bc = F − ε, ε > 0, ε << 1.

• Conversion is only allowed at the terminal time, or at the call time.

• There are no coupons.

• κ = 1

Suppose that the bond is called at t = T−. From equations (27.30) and (27.31), we conclude that we end up solvingthe original problem with the altered payoff at t = T−

LV + pB = 0V (S,T−) = max(S,F − ε)

LB+ pB = 0B(S,T−) = 0 . (27.36)

Note that the condition on B at t = T−, is due to the boundary condition (27.30). Now, since the solution of equation(27.36) for B, with B = 0 initially, is B ≡ 0 for all t < T−, then the equation for the convertible bond is simply

LV = 0V (S,T−) = max(S,F − ε) . (27.37)

(27.38)

In other words, there is no effect of the hazard rate in this case. This peculiar situation comes about since the T &Fmodel requires that the bond value be zero if V = Bc, even if the effect of the call on the total convertible bond value atthe instant of the call is infinitesimally small. In the philosophy of the T &F model, this boundary condition is requiredfor consistency, since normally V = Bc only near where conversion takes place, and B = 0 at the conversion boundary.

165

27.5.4 T &F splitting: general case

Looking at most general case (equations (27.30) and (27.31)), we can see there are some problems with the boundaryconditions when B = Bp, and when V = κS.

Consider a case where at some time t << T , the bond can be put for Bp, where Bp > V for S < S∗. In this case,from equations (27.30) and (27.31), we have that B = V,C = 0 for S < S∗. In the event of default (with R = 0), thisimplies that the total value of the convertible is zero for S < S∗. However, since the convertible is a contingent claimwhich can be converted into κS shares at t = T , it must always be worth at least κS (assuming that S does not jump tozero on default).

27.6 A New General Decomposition of the Convertible BondAssume that the total convertible bond value is given by equations (27.25-27.26). We will now devise a splitting ofthe convertible bond into two componoents, such that V = B+C, where B is the bond component and C is the equitycomponent. The bond component, in the event the that there are no put/call provisions, should satisfy an equationsimilar to equation (27.16).

In this case, we propose the following splitting, which we will refer to as the V&F splitting in subsequent sections.

M C− pmax(κS(1−η)−RB,0) = 0(C− (max(Bc,κS)−B))≤ 0

(C− (κS−B)≥ 0

∨

M C− pmax(κS(1−η)−RB,0)≤ 0C = max(Bc,κS)−B

∨

M C− pmax(κS(1−η)−RB,0)≥ 0C = κS−B

. (27.39)

M B−RpB = 0B−Bc ≤ 0

B− (Bp−C) ≥ 0

∨

M B−RpB ≤ 0B = Bc

∨

M B−RpB≥ 0B = Bp −C

. (27.40)

It can be easily verified that by adding together equations (27.39-27.40), then V = B +C, satisfies equations (27.25-27.26).

We can write the payoff of the convertible as

V (S,T ) = F +max(κS−F,0) (27.41)

which suggests terminal conditions of

C(S,T ) = max(κS−F,0)

B(S,T ) = F (27.42)

Note that equations (27.39-27.40) do not force B → 0 when conversion is optimal. Consider a case where p =const., B < Bc, Bp = 0. In this case, the solution for B is

B = F exp[

−Z T

t(r(u)+ p(u)(1−R)) du

]

(27.43)

independent of S. It would seem clear that all holders of the convertible bond, if they have not converted, wouldbe entitled to RB, independent of S. Hence B cannot be forced to zero at the conversion boundary, since this wouldpenalize holders of the bond who have not converted, near the exercise boundary. Of course, those who have convertedwould obtain nothing on default, but non-zero B will not affect the price of the convertible once conversion has takenplace (it is forced to be κS).

166

27.6.1 Some special cases

If we assume that

• The stock price does not change on default, the partial default case (η = 0).


• The convertible bond is continuously convertible.

then equations (27.39-27.40) become (in the continuation region)

M V + p(V −κS) = 0 (27.44)

which has the simple intuitive interpretation: the convertible is discounted at the risk-free rate plus spread whenV >> κS and at the risk free rate when V ' κS, with smooth interpolation between these values. Equation (27.44)was suggested in [9]. Note that in this case, we need only solve a single linear complementarity problem for the totalconvertible value V .

Making the assumption that

• The stock price jumps to zero on default (the total default case with η = 1).


then equations (27.39-27.40) reduce to (in the continuation region)

M V = 0 (27.45)

which agrees with [80]. In this case, there is no need to split the convertible bond into equity and bond components.In the case of non-zero recovery, our model is slightly different from that in [80]. In [80], it is assumed that upondefault, the holder recovers RV , compared to our assumption that the holder recovers RB. Consequently, in the case ofnonzero R, our approach requires solution of the coupled set of linear complementarity problems (27.39-27.40), whilethe assumption in [80] requires only solution of a single linear complementarity problem. Since the total convertiblebond value V includes a fixed income component and an option component, it seems more reasonable to us that inthe event of total default (the assumption made in [80]), that the option component is by definition worthless, and thatonly a fraction of the bond component can be recovered. The total default case also appears to be similar to the modelsuggested in [22].

Note that if we assume that the stock price of a company jumps to zero on default, then when pricing vanillaoptions we would replace the risk free rate r by r + p in both the drift term and the discounting term. This wouldappear to be inconsistent with usual practice.

27.7 Numerical MethodDefine τ = T − t, so that the operator LV becomes

LV = Vτ −(

σ2

2 S2VSS +(r−q)SVS− rV)

, (27.46)

andM V = Vτ −

(

σ2

2 S2VSS +(r + pη−q)SVS− (r + p)V)

(27.47)

It is also convenient to define

H V ≡(

σ2

2 S2VSS +(r−q)SVS

)

, (27.48)

and

PV ≡(

σ2

2 S2VSS +(r + pη−q)SVS

)

, (27.49)

167

so that equation (27.46) can be written

LV = Vτ − (H V − rV) , (27.50)

and equation (27.47) becomes

M V = Vτ − (PV − (r + p)V) . (27.51)

The terms H V and PV are discretized using standard methods, see [100, 37, 36]. Let

V ni ≡ V (Si,τn) ,

and denote the discrete form of H V at (Si,τn) by (H V )ni , and the discrete form of M V by (M V )n

i . In the following,for ease of exposition, we will describe the timestepping method for a fully implicit discretization of equation (27.46).In actual practice, we use Crank-Nicolson timestepping with the modification suggested in [72] to handle non-smoothinitial conditions (which generally occur at each coupon payment). The reader should have no difficulty generalizingthe equations to the Crank-Nicolson or BDF [15] case.

27.7.1 T&F Model: Numerical Method

In this section, we describe the method used to solve equations (27.29-27.30). We denote the total value of theconvertible bond computed using explicit constraints by V E . A corrected total convertible value, obtained by applyingestimates for the constraints in implicit fashion is denoted by V I . Given initial values of (V E)n

i ,(V I)ni and Bn

i , thetimestepping proceeds as follows. First, the value of Bn+1

i is estimated, ignoring any constraints. We denote thisestimate by (B∗)n+1

i :

(B∗)n+1i −Bn

i∆τ

= (H B∗)n+1i − (r + p)n+1

i (B∗)n+1i . (27.52)

Then, this value of (B∗)n+1i is used to compute (V E)n+1

i

(V E)n+1i − (V E)n

i∆τ

= (H V E)n+1i − (rV E)n+1

i − (pB∗)n+1i . (27.53)

Then, we check the minimum value constraintsFor i = 1, ...,

Bn+1i = (B∗)n+1

iIf (Bp > κS) then

If ((V E)n+1i < Bp) then

(B)n+1i = Bp ; (V E)n+1

i = BpEndif

ElseIf ((V E)n+1

i < κS) then(B)n+1

i = 0.0 ; (V E)n+1i = κS

EndifEndif

EndforThen, the maximum value constraints are applied:

For i = 1, ...,If ((V E)n+1

i > max(Bc,κS)) then(B)n+1

i = 0 ; (V E)n+1i = max(Bc,κS)

EndifEndfor

In principle, we could simply go on to the next timestep at this point, using Bn+1i ,(V E)n+1

i . However, we havefound that convergence (as the timestep size is reduced) is enhanced and the delta and gamma values are smoother ifwe add the following steps:

168

Let

(Q V )n+1i ≡

(

(V I)n+1i − (V I)n

i∆τ

)

−(

(H V I)n+1i − (rV I)n+1

i − (pB)n+1i)

, (27.54)

then (V I))n+1i is determined by solving the discrete linear complimentary problem if Bc > κS

(Q V I)n+1 = 0((V I)n+1 −max(Bp,κS)) ≥ 0

((V I)n+1 −Bc)) ≤ 0

∨

(Q V I)n+1 ≥ 0((V I)n+1 −max(Bp,κS)) = 0

((V I)n+1 −Bc) < 0

∨

(Q V I)n+1 ≤ 0((V I)n+1 −max(Bp,κS)) ≥ 0

((V I)n+1 −Bc) = 0

(27.55)

and if Bc ≤ κS, then we apply the Dirichlet conditions

(V I)n+1i = κSi . (27.56)

A penalty method [37] is used to solve the discrete complementarity problem (27.55). Finally, we set

(V E)n+1 = (V I)n+1

Bn+1i := min(Bn+1

i ,(V I)n+1). (27.57)

The above algorithm essentially decouples the system of linear complementarity problems for B and V , by applyingthe constraints in a partially explicit fashion. However, we apply the constraints as implicitly as possible, withouthaving to solve the fully coupled linear complementarity problem. Consequently, we can only expect first orderconvergence (in the timestep size ∆τ) even if Crank-Nicolson timestepping is used. However, this approach makes itcomparatively straightforward to experiment with the different convertible bond models. As well, it is unlikely that forpractical convergence tolerances, the overhead of the fully coupled approach will result in lower computational costcompared to the decoupled method above.

27.7.2 V&F Model: Numerical Method

In this section, we desribe the numerical method used to solve discrete forms of (27.25-27.26) and (27.39-27.40).Given initial values of Cn

i and Bni , and the total value V n+1

i , the timestepping proceeds as follows. First, the value ofBn+1

i is estimated, ignoring any constraints. We denote this estimate by (B∗)n+1i :

(B∗)n+1i −Bn

i∆τ

= (P B)n+1i − (r + p)n+1

i Bn+1i +(pRB)n+1

i (27.58)

Then, Cn+1 is estimated, also ignoring constraints. We denote this estimate by (C∗)n+1i :

(C∗)n+1i −Cn

i∆τ

= (PC)n+1i − ((r + p)C∗)n+1

i + pmax(κS(1−η)−R(B∗)n+1i ,0) (27.59)

Then, we check the minimum value constraintsFor i = 1, ...,

Bn+1i = min(Bc,(B∗)n+1

i ) // overriding maximum constraintCn+1

i = (C∗)n+1i

If (Bp > κSi) thenBn+1

i := max(Bn+1i ,Bp −Cn+1

i )Else

169

Cn+1i := max(κSi −Bn+1

i ,Cn+1i )

EndifEndfor

Then, the maximum value constraints are applied:For i = 1, ...,

Cn+1i := min(Cn+1

i ,max(κSi,Bc)−Bn+1i )

EndforIn principle, we could simply continue on to the next timestep, setting V n+1

i = Bn+1i +Cn+1

i . However, as for theT &F splitting, convergence is enhanced if we carry out the following additional steps:

Let

(T V )n+1i ≡

(

V n+1i −V n

i∆t

)

−(

(PV )n+1i − ((r + p)V)n+1

i + pmax(κSi(1−η),(RB)n+1i )

)

, (27.60)

then V n+1i (the total convertible value) is determined by solving the discrete linear complimentary problem if Bc > κS

(T V )n+1 = 0((V )n+1 −max(Bp,κS)) ≥ 0

((V )n+1 −Bc) ≤ 0

∨

(T V I)n+1 ≥ 0(V n+1 −max(Bp,κS)) = 0

(V n+1 −Bc) < 0

∨

(T V I)n+1 ≤ 0(V n+1 −max(Bp,κS)) ≥ 0

(V n+1 −Bc) = 0

(27.61)

and if Bc ≤ κS, then we apply the Dirichlet conditions

V n+1i = κSi . (27.62)

A penalty method [37] is used to solve the discrete complementarity problem (27.55). Finally, we set

Bn+1i := min(Bn+1

i ,V n+1i )

Cn+1i := V n+1

i −Bn+1i (27.63)

which ensures that

V n+1i = Cn+1

i +Bn+1i . (27.64)

27.8 Numerical ExampleIn order to be precise about the way the put and call provisions are handled, we will describe the method used tocompute the effect of accrued interest and the coupon payments in some detail.

The payoff condition for the convertible bond is (at t = T )

V (S,T ) = max(κS,F +Klast) , (27.65)

where Klast is the last coupon payment.Let t be the current time in the forward direction. Let tp the time of the previous coupon payment, and tn be the

time of the next pending coupon payment, i.e. tp ≤ t ≤ tn. Then, we define the accrued interest on the pending couponpayment as

AccI(t) = Knt − tptn − tp

(27.66)

where Kn is the coupon payment at t = tn.The dirty call price Bc and the dirty put price Bp, which are used in equations (27.39-27.40) and equations (27.29-

27.30) are given by

Bc(t) = Bclc (t)+AccI(t)

Bp(t) = Bclp (t)+AccI(t) (27.67)

170

T 5Clean Call Price 110 in years 2−5

0 in years 1−2Clean Put Price 105 at 3 years

r .05p .02σ .20

Conversion ratio 1.0Recovery Factor R 0.0Face Value of Bond 100

Coupon dates .5,1.0,1.5, ...,5.0Coupon Payments 4.0

Total Default η = 1.0Partial Default η = 0.0

TABLE 27.1: Data for numerical example. Partial and total default cases defined by equation (27.17).

Nodes Timesteps This Work This Work T &F(Partial Default) (Total Default)

200 200 124.9158 122.7341 124.0025400 400 124.9175 122.7333 123.9916800 800 124.9178 122.7325 123.9821

1600 1600 124.9178 122.7319 123.97543200 3200 124.9178 122.7316 123.9714

TABLE 27.2: Comparison of this work (partial and total default) and T&F models, value at t = 0,S = 100. Data givenin Table 27.1. For the T&F model, partially implicit application of constraints. Total default (η = 1.0) and partialdefault (η = 0.0), defined in equation (27.17).

where Bclc and Bcl

p are the clean prices.Let t+i be the forward time the instant after a coupon payment, and t−i be the forward time the instant before a

coupon payment. If Ki is the coupon payment at t = ti, then the discrete coupon payments are handled by setting

V (S, t−i ) = V (S, t+i )+Ki

B(S, t−i ) = B(S, t+i )+Ki

C(S, t−i ) = C(S, t+i ) (27.68)

where V is the total convertible value, and B is the bond component. The coupon payments are modelled in the sameway for both T &F and V&F models.

The data used for the numerical examples is given in Table 27.1, which is similar to the data used in [83] (exceptthat some data, such as the volatility of the stock, was not given in [83]). We will confine our numerical examples tothe two common assumptions of total default (η = 1.0) or partial default (η = 0.0) (see equation (27.17).

Table 27.2 demonstrates the convergence of the numerical methods for both models. It is interesting to note thatthe partial and total default models (V&F) appear to give solutions correct to $.01 on coarse grids/timesteps, whileconsiderably finer grids/timesteps are required for the T &F model. This is not surprising, when we recall that thebond component of the T &F model effectively involves a time dependent knock-out barrier, which is difficult to solveaccurately. Note that the V&F partial default model gives a price which is about $1.00 higher than the T&F price. Incontrast, the V&F total default model is about $1.00 less than the T &F price.

The results in Table 27.2 should be contrasted with the results in Table 27.3, where the hazard rate p is set to zero.In this case, the value of the bond is about $2.00 more than the T&F model, and about $1.00 more than the V&Fpartial default model.

Table 27.4 shows the convergence of the T &F model, where the last fully implicit solution of the total bond value

171

Nodes Timesteps Value (p = 0)200 200 125.9500400 400 125.9523800 800 125.9528

1600 1600 125.95293200 3200 125.9529

TABLE 27.3: Value of convertible bond, value at t = 0,S = 100. Data given in Table 27.1, except that the hazard ratep = 0. In this case, both T&F and V &F give the same result.

Nodes Timesteps T &F (Table 27.2) T &F(partially implicit constraints) (explicit constraints)

200 200 124.00249 124.09519400 400 123.99160 124.05384800 800 123.98210 124.02508

1600 1600 123.97538 124.007983200 3200 123.97141 123.994336400 6400 123.97050 123.98531

TABLE 27.4: T&F model, value at t = 0,S = 100. Data given in Table 27.1. Comparison of partially implicit constraints(use equations (27.55-27.57)), and explicit application of constraints (omit equations (27.55-27.57)).

in equations (27.55-27.57) is omitted. In this case, we can compare the results in Table 27.4 to those in Table 27.2,and we observe that the extra implicit solve (equations (27.55-27.57)) does indeed speed up convergence as the grid isrefined and the timestep size is reduced.

It is interesting to see the behavior of the T &F bond component and the T &F total convertible value, an instantbefore t = 3 years. Recall from Table 27.1 the bond is puttable at t = 3, and there is a pending coupon payment aswell. Figure 27.3 shows the discontinuous behavior of the bond component near the put price, for the T &F model.Since V = B+C, then the call component also has a discontinuity.

Figure 27.4 shows results for the V &F total default model, with different recovery factors R (equation (27.25). Wealso show the case with no default risk (p = 0) for comparison. Note the rather curious fact that for R = 100%, thevalue of the convertible bond is above the value with no default risk.

This can be explained with reference to the hedging portfolio (27.18). Since VS ≥ 0, the portfolio is long the bondand short the stock. If there is a default, and the recovery factor is high the hedger obtains a windfall profit, since thereis a gain on the short position, and a very small loss on the bond position.

The previous examples used a constant hazard rate ( Table 27.1). However, it is more realistic to model the hazardrate as increasing as the stock price decreases. A parsimonious model of the hazard rate is given by

p(S) = p0

(

SS0

)α(27.69)

where p0 is the hazard rate observed at S = S0, which can be deduced from straight bonds issued by the same corpo-ration issuing the convertible bond.

In [65], a function of the form (27.69) was observed to be a reasonable fit to bonds rated BB+ and below, in theJapanese market. Typical values for α are in the range α = −1.2 to α = −2.0 [65].

In Figure 27.5 we compare the value of the total default model (V&F), for constant p(S), as well as p(S) given byequation (27.69). Data are as in Table 27.1, except that for the non-constant p(S) cases, we use equation (27.69), withp0 = .02, S0 = 100. Figures 27.6 and 27.7 show the corresponding delta and gamma values.

27.9 ConclusionsEven in the simple case of deterministic interest rates, but stochastic stock prices, there have been several modelsproposed for modelling default risk for convertible bonds. In order to value convertible bonds with credit risk, it is

172

80 90 100 110 120Stock Price

100

110

120

130

140

150

Con

verti

ble

Valu

e

No Default

This Work(Total Default)

T & F

This Work(Partial Default)

FIGURE 27.2: Convertible bond values, t = 0, showing the results for no-spread, the T&F model, and this work (partialand total default). Data as in Table 27.1. Total default (η = 1.0) and partial default (η = 0.0), see equation (27.17).

50 60 70 80 90 100 110 120 130 140 150Stock Price

0102030405060708090

100110120130140150

val

ue

Total Value

Non-equity Value

FIGURE 27.3: T&F model, total and non-equity (bond) component, t = 3 years, just before coupon payment and beforeput provision. Data as in Table 27.1.

173

80 90 100 110 120Stock Price

100

110

120

130

140

150

Con

verti

ble

Valu

e

No Default

Total Default (R = 0)

Total Default (R = 50%)

Total Default (R = 100%)

FIGURE 27.4: Total default model with different recovery rates. Data as in Table 27.1. Total default (η = 1.0) andpartial default (η = 0.0), see equation (27.17).

40 60 80 100 120Stock Price

50

75

100

125

150

Con

verti

ble

Valu

e Constant hazard rate

α = -1.2

α = -2.0

FIGURE 27.5: Total default models (V &F), constant hazard rate, and non-constant hazard rate with different exponentsin equation (27.69). S0 = 100, p0 = .02. Other data as in Table 27.1.

174

40 60 80 100 120Stock Price

-1

-0.75

-0.5

-0.25

0

0.25

0.5

0.75

1

Del

ta Constant hazard rate

α = -1.2

α = -2.0

FIGURE 27.6: Delta, total default models, constant hazard rate, and non-constant hazard rate with different exponentsin equation (27.69). S0 = 100, p0 = .02. Other data as in Table 27.1.

40 60 80 100 120Stock Price

-0.05

0

0.05

0.1

Gam

ma

Constant hazard rate

α = -1.2

α = -2.0

FIGURE 27.7: Gamma, total default models, constant hazard rate, and non-constant hazard rate with different exponentsin equation (27.69). S0 = 100, p0 = .02. Other data as in Table 27.1.

175

necessary to specify precisely what happens to the components of the hedging portfolio in the event of a default.In this work, we consider a continuum of possibilities for the value of the stock price after default. Two special

cases are

• Partial Default: The stock price is unchanged on default. The holder of the convertible bond can elect to

(a) Receive a recovery factor times the bond component value.(b) Convert the bond to shares.

• Total Default: The stock jumps to zero on default. The equity component of the convertible bond is, by defini-tion, zero. A fraction of the bond value of the convertible is recovered.

In the case of total default, and recovery factor zero, our model agrees with that in [80]. In this case, there is noneed to split the convertible bond into equity and bond components. In the case of non-zero recovery, our model isslightly different from that in [80]. This would appear to be due to a different assumption about the meaning of theterm recovery factor.

In the partial default case, the model developed in this work uses a different spitting (i.e. bond and equity compo-nents) than that used in [83]. We have presented several arguments as to why we think the model in [83] is somewhatinconsistent.

It is possible to make other assumptions about the behaviour of the stock price on default. As well, there may belimits on conversion rights on default. The approach described in this paper can handle a variety of these assumptions.

Convertible bond pricing results in a complex coupled system of linear complementarity problems. We have used apartially implicit method to decouple the system of linear complementarity problems at each timestep. The final valueof the convertible bond is computed by solving a full linear complementarity problem, (but with explicitly computedsource terms) which gives good convergence as the mesh and timestep are reduced, and also results in smooth deltaand gamma values.

It is clear that the value of a convertible bond depends on the precise behaviour assumed when the issuer goes intodefault. Given any particular assumption, it is straightforward to model these effects in the framework presented in thispaper. A decision concerning which assumptions are appropriate requires an extensive empirical study, for differentclasses of corporate debt.

28 Jump Diffusion: Numerical MethodsIn this section, we give a brief overview of numerical methods for jump diffusion. More details can be found in[29, 28].

As discussed in Section 10, it well known that the standard Black-Scholes model, which assumes a log-normalstock distribution, a constant interest rate and a constant volatility is flawed. To calibrate derivative prices, traders mustuse different volatilities or implied volatility surfaces for different strikes and maturities. As well, several extensionsto the Black-Scholes model (1973) have been suggested. The stochastic volatility approach assumes that the volatilityfollows a stochastic process which is correlated with the stock price [45, 95, 94, 48]. However, when calibrating thismodel, the correlation between the stock and volatility may be unrealistically high [2].

An alternative approach suggested by Merton (1976) [64] introduces the idea that stock prices have discontinuousreturns and presents the concept of a jump diffusion stochastic process. Empirical evidence supports the fact that theBlack-Scholes model assumption that stock price returns are log-normally distributed is not entirely satisfactory.

Several authors have demonstrated the importance of jumps for options close to maturity [89, 14, 10]. It has beensuggested that jumps could account for the fat tailed distributions observed in real market data. As a consequence,researchers have worked extensively to price options under the jump diffusion process.

Using a jump diffusion model, Andersen and Andreasen [2] fit S&P option prices, and obtain excellent fits withstable parameters. If the same fitting exercise is attempted without the Poisson jumps, then the parameters are muchless stable.

In general, pricing of a contingent claim under a jump diffusion process results in a Partial Integral DifferentialEquation (PIDE). In some cases, it is possible to derive analytical solutions [64, 2] for pricing contingent claims as-suming that the underlying asset follows a jump diffusion process. However, in most cases it is not possible to obtainan analytical result for exotic and/or path dependent options. Very little work has been carried out for numerically

176

pricing exotic options under jump diffusion processes. The method suggested by Amin (1993) [1] is based on multi-nomial trees, which is equivalent to an explicit finite difference method. Explicit methods, of course, have timesteplimitations due to stability considerations, and are generally only first order correct. In [93], a method is devlopedwhich uses an explicit method for the jump integral term, and an implicit method for the PDE term. However, ratherrestrictive stability conditions were required in [93]. More recently, a method based on use of a wavelet transformwas suggested in [63]. The basic idea is to use a wavelet transform to approximate the dense matrix discrete integraloperator by dropping small terms.

Andersen and Andreasen [2] used an operator splitting type approach combined with a Fast Fourier Transform(FFT) evaluation of a convolution integral to price European options with jump diffusion. In this secion, we developa different approach. We prove that the jump diffusion term can be discretized explicitly, and, when coupled with animplicit treatment of the usual PDE, the resulting timestepping method is unconditionally stable. However, in order toachieve second order convergence, an implicit treatment of the jump integral term is preferred. We prove that a simplefixed point iteration scheme can be used to solve the discretized algebraic equations, and that this iteration is globallyconvergent. In fact, for typical values of the timestep size and Poisson arrival intensity, the error is reduced by twoorders of magnitude at each iteration.

We also develop a method for rapidly computing the jump integral term. We make no assumptions about theprobability density for the jump term. This general approach requires the evaluation of correlation type integrals, as in[93], compared to the convolution integral which appears to be common in the literature.

The correlation integral term can be rapidly computed using FFT methods. In contrast with previous work, we donot assume that the grid is equally spaced in S space (S = asset price). We also show how to eliminate the wrap-aroundeffects which often plague FFT methods.

A major advantage of the method developed here is that it is straightforward to add a jump process to existingoption pricing software. In particular, existing software, which uses an implicit approach for valuing American options,can be simply modified to price American options with jump diffusion. Non-linear pricing models (transaction costs,uncertain volatility) can also be easily extended to model jumps, as well as path dependent pricing problems (Asian,Parisian) [99].

28.1 Mathematical modelLet S represent the underlying stock price. The potential stock paths followed by the stock can be modeled by astochastic differential equation given by

dSS = ξdt +σdZ +(η−1)dq (28.1)

where

ξ is the drift rate,r is the risk free interest rate,

dq is the independent Poisson process,

dq =

0 with probability 1−λdt1 with probability λdt,

λ is the mean arrival time of the Poisson process,η−1 is an impulse function producing a jump from S to Sη,

κ is the expected relative change in the stock price, κ = E[η−1],

σ is the volatility,dZ is an increment of the standard Gauss-Wiener process.

In the following, we will assume that r > 0. Under equation (28.1), the stock price S has two sources of uncertainty.The term σdZ corresponds to normal uncertainty (e.g. sales, profits) while the term dq describes exceptional events(e.g. CEO lay-off, accounting practices). If the Poisson event does not occur (λ = 0), then equation (28.1) is equivalentto the usual stochastic process used for the Black and Scholes equation [89]. If, on the other hand, the Poisson event

177

occurs, then equation (28.1) can be written asdSS ' (η−1), (28.2)

where η− 1 is an impulse function producing a jump from S to Sη. Consequently, the resulting sample path forthe stock S (equation (28.1)) will be continuous most of the time with finite negative or positive jumps with variousamplitudes occurring at discrete points in time.

Let V (S, t) be the value of a contract that depends on the underlying stock price S and time t. The followingbackward Partial Integral Differential Equation (PIDE) for the value of V (S,τ) is found to be [64, 89, 2]

Vτ =12σ2S2VSS +(r−λκ)SVS− rV +

(

λZ ∞

0V (Sη)g(η)dη−λV

)

, (28.3)

where

T is the expiry/maturity date,t is the current time,τ = T − t,

g(η) is the probability density function of the jump amplitude η such that,

∀η,g(η) ≥ 0, andZ ∞

0g(η)dη = 1.

A crucial assumption which is made in the derivation of equation (28.3), is that jump risk is diversifiable.For brevity, the details of the derivation of equation (28.3) have been omitted (see [64, 89, 2]). Equation (28.3) can

be rewritten asVτ =

12σ2S2VSS +(r−λκ)SVS− (r +λ)V +λ

Z ∞

0V (Sη)g(η)dη. (28.4)

Note that if we set λ = 0 in (28.4), then the classical Black-Scholes partial differential equation for pricing Euro-pean option contracts is recovered [89].

28.2 Implicit Discretization Methods28.2.1 Discretization of the Integral Term

Straightforward evaluation of equation (28.4) could use standard numerical discretization methods [81, 95, 96] com-bined with numerical integration methods such as Simpon’s rule or Gaussian quadrature [70]. However, this straight-forward approach is computationally expensive [81]. It is computationally efficient to transform the integral in equa-tion (28.4) into a correlation integral. This allows efficient Fast Fourier Transform (FFT) methods to be used toevaluate the integral for all values of S.

LetI(S) =

Z ∞

0V (Sη)g(η)dη. (28.5)

Using the change of variable

y = log(Sη), η =exp(y)

S ,and dη =exp(y)

S dy, (28.6)

and substituting (28.6) into (28.5), we obtain

I =

Z ∞

−∞V (exp(y))g

(

exp(y)S

)

exp(y)S dy. (28.7)

Let f (u) = g(u)u. It follows that (28.7) becomes

I =Z ∞

−∞V (exp(y)) f

(

exp(y)S

)

dy. (28.8)

178

Let x = log(S), V (x,τ) = V (exp(x),τ) = V (S,τ) and f (logy) = f (y). Substituting into (28.8), we have

I =

Z ∞

−∞V (y) f (y− log(S))dy

=

Z ∞

−∞V (y) f (y− x)dy

=

Z ∞

−∞V (x+ y) f (y))dy. (28.9)

Note that f (x) is the probability density of a jump of size x = logη. Equation (28.9) corresponds to the correlationproduct ⊗ of V (y) and f (y) [70]. Equation (28.9) can be written as

I = V ⊗ f . (28.10)

If f is an even function ( f (x) = f (−x)), then (28.10) corresponds to the convolution product [70].As a specific example, consider the probability density function suggested by [81, 64];

g(η) =exp(

− (log(η)2−µ)2γ2

)

√2πγη

. (28.11)

It follows that if f (η) = g(η)η, then

f (x) = g(exp(x))exp(x) (28.12)

=exp(

− (x−µ)2

2γ2

)

√2πγexp(x)

exp(x) (28.13)

=exp(

− (x−µ)2

2γ2

)

√2πγ

. (28.14)

Let E[·] be the expecation operator. Then, E[η] = exp(µ+ γ2/2), which means that E[η−1] = κ = exp(µ+ γ2/2)−1.Returning to the general case, recall that the correlation integral is

I(x) =Z ∞

−∞V (x+ y) f (y))dy, (28.15)

Let

xi = xmin + i∆x

∆x =xmin − xmax

N −1 (28.16)

where xmin = logSmin, and xmax = logSmax. Smax is the largest value of S in our computational grid, while Smin isselected to be near S = 0. Let

Ii = I(xmin + i∆x) ; i = 0, ...,N −1V i = V (xmin + i∆x) ; i = 0, ...,N −1 (28.17)

and

f j =

Z x j+∆x/2

x j−∆x/2f (x) dx

x j = j∆xj = −N/2+1, ...,N/2 (28.18)

179

then the correlation integral can be approximated by (choosing ∆y = ∆x)

Ii =j=N/2

∑j=−N/2+1

V i+ j f j∆y+O((∆y)2), (28.19)

We have assumed in equation (28.19) that N is selected sufficiently large so that f (±N/2∆y) ' 0. We have assumedthat V (logS) = V (S).

Technical Point Let’s write equation (28.19) as

Ii =j=N/2

∑j=0

V i+ j f j +j=−1

∑j=−N/2+1

V i+ j f j . (28.20)

Let k = j +N, so that

j=−1

∑j=−N/2+1

V i+ j f j =k=N−1

∑k=N/2+1

V i+k−N f k−N . (28.21)

Now, define

V j−N = V j+N = V j (28.22)

and

f ′i = f i ; i = 0, ..,N/2f ′i = f i−N ; i = N/2+1, ...,N−1 . (28.23)

Then

k=N−1

∑k=N/2+1

V i+k−N f k−N =j=N−1

∑j=N/2+1

V j+k f ′j (28.24)

or, combining equations (28.21) and (28.24) we have

j=−1

∑j=−N/2+1

V i+ j f j =j=N−1

∑j=N/2+1

V j+k f ′j . (28.25)

Now, using equations(28.20,28.23,28.25) we obtain

Ii =j=N−1

∑j=0

V i+ j f ′j . (28.26)

Equation (28.26) is now in the standard form which can be used as imput to an FFT routine for fastevaluation of I. Note that the FFT should be computed using f ′j, as defined in equation (28.23), not f j.Note as well that we have defined a periodic extension to V as defined in equation (28.22). We have to becareful that this does not cause wrap around pollution.

An important property to note, which will be used in later sections, is that

f j ≥ 0 , ∀ jj=N/2

∑j=−N/2+1

f j∆y ≤ 1, (28.27)

180

since f (y) is a probability density, and f j is defined by equation (28.18).The discrete form of the correlation integral (28.19) uses an equally spaced grid in logS coordinates. While this is

convenient for FFT evaluation of the correlation integral, this is not particularly convenient for discretizing the PDE.We will use an unequally spaced grid in S coordinates for the PDE discretization [S0, ...,Sp]. Let

V ni = V (Si,τn) . (28.28)

Now, V j will not necessarily coincide with any of the discrete values Vk in equation (28.28). Consequently, we willlinearly interpolate (using Lagrange basis functions defined on the S grid) to determine the appropriate values, i.e. if

Sϒ( j) ≤ e j∆x ≤ Sϒ( j)+1, (28.29)

then

V j = ψϒ( j)Vϒ( j) +(1−ψϒ( j))Vϒ( j)+1 +O((∆Sϒ( j)+1/2)2), (28.30)

where ψϒ( j) is an interpolation weight, and

∆Si+1/2 = Si+1 −Si . (28.31)

We are now faced with the problem that the integral Ii is evaluated at a point S = exi which does not coincide with agrid point Sk. We simply linearly interpolate the Ii to get the desired value. If

exΠ(k) ≤ Sk ≤ exΠ(k)+1 , (28.32)

then

I(Sk) = φΠ(k)IΠ(k) +(1−φΠ(k))IΠ(k)+1 +O((exΠ(k) − exΠ(k)+1)2) , (28.33)

where φΠ(k) is an interpolation weight. Note that that

0 ≤ φi ≤ 10 ≤ ψi ≤ 1. (28.34)

Putting equations (28.19), (28.30), and (28.33) together gives

I(Sk) =j=N/2

∑j=−N/2+1

χ(V,k, j) f j∆y, (28.35)

where V = [V0,V1, ...,Vp]T and

χ(V,k, j) =

φΠ(k)[

ψϒ(Π(k)+ j)Vϒ(Π(k)+ j) +(1−ψϒ(Π(k)+ j))Vϒ(Π(k)+ j)+1]

+(1−φΠ(k))[

ψϒ(Π(k)+1+ j)Vϒ(Π(k)+1+ j) +(1−ψϒ(Π(k)+1+ j))Vϒ(Π(k)+1+ j)+1]

.

(28.36)

For future reference, note that χ(V,k, j) is linear in V , and that if 1 = [1,1, ...,1], then it follows from properties(28.34) that

χ(1,k, j) = 1 ∀k, j . (28.37)

181

28.2.2 Discretization of the Full PIDE

Equation (28.4) can now be approximated by replacing derivatives by difference approximations. The integral termis approximated using equation (28.35). We use a fully implicit method for the usual PDE, and then use a weightedtimestepping method for the jump integral term. The discrete equations can then be written

V n+1i [1+(αi +βi + r +λ)∆τ]−∆τβiV n+1

i+1 −∆ταiV n+1i−1

= V ni +(1−θJ)∆τλ

j=N/2

∑j=−N/2+1

χ(V n+1, i, j) f j∆y

+θJ∆τλj=N/2

∑j=−N/2+1

χ(V n, i, j) f j∆y. (28.38)

Discretizing the first derivative term of equation (28.4) with central differences leads to

αni,central =

σ2i S2

i(Si −Si−1)(Si+1 −Si−1)

− (r−λκ)SiSi+1 −Si−1

βni,central =

σ2i S2

i(Si+1 −Si)(Si+1 −Si−1)

+(r−λκ)SiSi+1 −Si−1

. (28.39)

If αi,central or βi,central is negative, oscillations may appear in the solution. The oscillations can be avoided by usingforward or backward differences at the problem nodes, leading to (forward difference)

αni,forward =

σ2i S2

i(Si −Si−1)(Si+1 −Si−1)

βni,forward =

σ2i S2

i(Si+1 −Si)(Si+1 −Si−1)

+(r−λκ)SiSi+1 −Si

. (28.40)

Or, (backward difference)

αni,backward =

σ2i S2

i(Si −Si−1)(Si+1 −Si−1)

− (r−λκ)SiSi −Si−1

βni,backward =

σ2i S2

i(Si+1 −Si)(Si+1 −Si−1)

. (28.41)

Algorithmically, we decide between a central or forward discretization at each node for equation (28.38) as follows:

If [αi,central ≥ 0 and βi,central ≥ 0] thenαi = αi,central

βi = βi,central

ElseIf[

βi,forward ≥ 0]

thenαi = αi,forward

βi = βi,forward

Elseαi = αi,backward

βi = βi,backward

EndIf

(28.42)

Note that the test condition (28.42) guarantees that αi and βi are non-negative. For typical values of σ,r andgrid spacing, forward differencing is rarely required for single factor options. In practice, since this occurs at only a

182

small number of nodes remote from the region of interest, the limited use of a low order scheme does not result inpoor convergence as the mesh is refined. As we shall see, requiring that all αi and βi are non-negative has importanttheoretical ramifications.

As S → 0, equation (28.3) reduces to

Vτ = −rV (28.43)

which is simply incorporated into the discrete equations (28.38) by setting αi,βi,λ = 0 at Si = 0.In practice we truncate the S grid at some large value Sp = Smax, where we impose Dirichlet conditions. This is

imposed replacing equation (28.38) at S = Smax = Sp, by

V n+1p = specified. (28.44)

28.2.3 Stability AnalysisIn this section, we analyze the stability of the discretization (28.38).

Theorem 28.1 (Stability of scheme (28.38)) The discretization method (28.38) is unconditionally stable, for anychoice of θJ ,0 ≤ θJ ≤ 1, provided that

• αi,βi ≥ 0 (see section 28.2.2)

• The discrete probability density f j has the properties (28.27).

• The interpolation weights satisfy (28.34).

• r,λ ≥ 0.

Proof . Let V n = [V n0 ,V n

1 , ...,V np ]′ be the discrete solution vector to equation (28.38). Suppose the initial solution

vector is perturbed, i.e.

V 0 = V 0 +E0, (28.45)


p]′ is the perturbation vector. Note that En

p = 0 since Dirichlet boundary conditions are imposedat this node.

Then, following the usual steps, we obtain the following equation for the propagation of the perturbation (notingthat χ is a linear operator)

En+1i [1+(αi +βi + r +λ)∆τ]−∆τβiEn+1

i+1 −∆ταiEn+1i−1

= Eni +(1−θJ)∆τλ

j=N/2

∑j=−N/2+1

χ(En+1, i, j) f j∆y

+θJ∆τλj=N/2

∑j=−N/2+1

χ(En, i, j) f j∆y. (28.46)

Now, if we define

‖E‖n = maxi

|Ei|n, (28.47)

then, it follows that since αi,βi ≥ 0, and using properties (28.27), (28.34), and (28.37) that

|En+1i | [1+(αi +βi + r +λ)∆τ] ≤ ‖E‖n +(1−θJ)∆τλ‖E‖n+1 +θJ∆τλ‖E‖n (28.48)

+∆τβi|En+1i+1 |+∆ταi|En+1

i−1 |, (28.49)

which then implies

|En+1i | [1+(αi +βi + r +λ)∆τ] ≤ (∆τβi +∆ταi)‖E‖n+1 (28.50)

+ ‖E‖n +(1−θJ)∆τλ‖E‖n+1 +θJ∆τλ‖E‖n. (28.51)

183

Now, equation (28.51) is valid for all i < p. In particular, it is true for node i∗, where

maxi

|En+1i | = |En+1

i∗ |, (28.52)

so that writing equation (28.51) for i = i∗ gives

‖E‖n+1 [1+(r +θJλ)∆τ] = ‖E‖n(1+θJ∆τλ), (28.53)

and so that

‖E‖n+1 ≤ ‖E‖n (1+θJ∆τλ)

(1+(r +θJλ)∆τ)≤ 1. (28.54)

This result is somewhat surprising, since we can discretize the correlation integral term explicitly (θJ = 1), yetscheme (28.38) is unconditionally stable. It appears that this result was known previously [2], but no proof was givenin [2]. In [93], the method with θJ = 1 was derived for the American option pricing problem. However, the proofs in[93] suggest that this method (θJ = 1) is only conditionally stable, in contrast to Theorem 28.1.

28.3 Crank-Nicolson DiscretizationThe discretization method used in the previous section is only first order correct in the time direction. In order toimprove the timestepping error, we can use a Crank-Nicolson method, which results in the following

V n+1i

[

1+(αi +βi + r +λ)∆τ2

]

− ∆τ2 βiV n+1

i+1 − ∆τ2 αiV n+1

i−1

= V ni

[

1− (αi +βi + r +λ)∆τ2

]

+∆τ2 λ

j=N/2

∑j=−N/2+1

χ(V n+1, i, j) f j∆y

+∆τ2 αiV n

i−1 +∆τ2 βiV n

+1 +∆τ2 λ

j=N/2

∑j=−N/2+1

χ(V n, i, j) f j∆y. (28.55)

If we define the matrix M such that

−[MV n]i = V ni (αi +βi + r +λ)

∆τ2

−∆τ2 βiV n

i+1 −∆τ2 αiV n

i−1

−∆τ2 λ

j=N/2

∑j=−N/2+1

χ(V n, i, j) f j∆y, (28.56)

then we can write equation (28.55) as

[I−M]V n+1 = [I +M]V n. (28.57)

Alternatively, we can define

B = [I−M]−1 [I +M] , (28.58)

so that equation (28.57) can be written as

V n = BnV 0. (28.59)

184

Consequently, following the usual steps, an initial perturbation vector E0 will generate a perturbation at the n′th step,En, given by

En = BnE0. (28.60)

The stability of the operator B is defined in terms of the power boundedness of B. If n is the number of timesteps,and p is the number of grid nodes, then given some matrix norm ‖.‖, we say that B is strictly stable if

‖Bn‖ ≤ 1 ,∀n, p . (28.61)

Strong stability is defined as [43]

‖Bn‖ ≤ C ,∀n, p, (28.62)

where C is a constant independent of n, p. Algebraic stability [43] is defined as

‖Bn‖ ≤ Cns pq ,∀n,m. (28.63)

Algebraic stability is obviously a weaker condition than strict or strong stability. Note that the Lax EquivalenceTheorem states that strong stability is a necessary and sufficient condition for convergence for all initial data. Weakeralgebraic stability yields convergence only for certain initial data. For a more detailed discussion of this, see [43].

If µi are the eigenvalues of B, then a necessary condition for strong stability is that

|µi| ≤ 1, (28.64)

and that any |µi| = 1 has multiplicity one.From equation (28.56) and noting properties (28.27), we have that

• The off-diagonals of M are all non-negative,

• The diagonals of M (excluding the last row) are strictly negative,

• Assuming that r > 0,j=p

∑j=0

Mi j < 0 ; i = 0, ..., p−1, (28.65)

• The last row of M is identically zero due to the Dirichlet boundary condition (28.44).

It then follows that all the Gershgorin disks of M are strictly contained in the left half complex plane, with oneeigenvalue identically zero. Hence all the eigenvalues of B are strictly less than one in magnitude, with one eigenvaluehaving modulus one. As a result, B satisfies the necessary conditions for strict stability.

However, since B is non-symmetric, this is not sufficient for power boundedness of B (see [17] for a counterexam-ple). As discussed in [56, 61], we can guarantee algebraic stability by examining the γ numerical range of the matrixM as in [56, 61]. In the case γ = 1, the γ = 1 numerical range of M coincides with the convex hull of the Gershgorindisks of M when the maximum norm is used in equation (28.63). These results can be summarized in the followingTheorem.

Theorem 28.2 (Algebraic Stability of Crank-Nicolson Timestepping) The Crank-Nicolson discretization (28.55)is algebraically stable in the sense that

‖Bn‖∞ ≤ Cn1/2 ;∀n, p (28.66)

where C is independent of n, p.

Proof . Since all the Gershgorin disks of M are in the left half complex plane, this follows from the results in [61].

In fact, we believe that the algebraic stability estimate is overly pessimistic. In the case of constant coefficients, andusing a log-spaced grid, in Section 28.8.2, we show using Von Neumann analysis that Crank-Nicolson timestepping(with the correlation product) is unconditionally strictly stable.

185

28.4 Fixed Point Iteration MethodOf course, when using an implicit discretization, it is computationally infeasible to solve the full linear system, sincethe correlation product term makes the system dense. Consequently, we will use a fixed point iteration to solve thelinear system which results from an implicit discretization of the correlation product term. This idea was suggested in[81], but no convergence analysis was given.

Define the matrix M such that

−[MV n]i = V ni (αi +βi + r +λ)∆τ

−∆τβiV ni+1−∆ταiV n

i−1, (28.67)

and the vector Ω(V n) such that (Ω(V n) is a linear function of V n)

[Ω(V n)]i =j=N/2

∑j=−N/2+1

χ(V n, i, j) f j∆y. (28.68)

Then we can write a fully implicit (θ = 0) or Crank Nicolson (θ = 1/2) discretization as

[I− (1−θ)M]V n+1 = [I +θM]V n +(1−θ)λ∆τΩ(Vn+1)

+θλ∆τΩ(V n). (28.69)

We can then derive the fixed point iteration method

Fixed Point Iteration

Let (V n+1)0 = V n

Let V k = (V n+1)k

For k = 0,1,2, . . . until convergenceSolve[

I− (1−θ)M]

V k+1

=[

I +θM]

V n

(1−θ)λ∆τΩ(V k)+θλ∆τΩ(V n)

If maxi

|V k+1i − V k

i |max(1, |V k+1

i |)< tolerance then quit

EndFor

(28.70)

Let ek = V n+1−V k, then the convergence of the fixed point scheme can be summarized in the following Theorem.

Theorem 28.3 (Convergence of the fixed point iteration) Provided that

• αi,βi ≥ 0 (see section 28.2.2)

• The discrete probability density f j has the properties (28.27).

• The interpolation weights satisfy (28.34).

• r > 0,λ ≥ 0

then the fixed point iteration (28.70) is globally convergent, and the maximum error at each iteration satisfies

‖ek+1‖∞ ≤ ‖ek‖∞(1−θ)λ∆τ

1+(1−θ)(r +λ)∆τ. (28.71)

186

Proof . It is easily seen from iteration (28.70) that ek satisfies[

I − (1−θ)M]

ek+1 = (1−θ)λ∆τΩ(ek) , (28.72)

so that following the same steps as used to prove Theorem 28.1, we obtain

‖ek+1‖∞ ≤ ‖ek‖∞(1−θ)λ∆τ

1+(1−θ)(r +λ)∆τ< 1. (28.73)

Note that typically λ∆τ 1, so that

‖ek+1‖∞ ' ‖ek‖∞(1−θ)λ∆τ, (28.74)

which will result in rapid convergence. It is also interesting to observe that the number of iterations required forconvergence is independent of the number of nodes in the S grid.

28.5 Numerical issues when pricing options under the log-normal processIn this section, we study in detail the different numerical issues that arise when pricing options under the jump diffusionprocess [64].

As we have shown, it is possible to write the PIDE as a PDE and a correlation product (equation (28.10)). Notethat each iteration of the scheme (28.70) requires evaluation of this correlation integral for all points on the PDE grid.

Fast evaluation of this integral using FFT methods necessitates transformation to an equally spaced grid in x =log(S) coordinates. If the original PDE grid is equally spaced in log(S), then there is clearly no difficulty. However,this type of grid spacing is not very efficient for complex payoffs, barriers and other exotic options. We thereforeprefer not to restrict the type of grid used for the original PDE.

Recall that the correlation integral is

I(x) =Z ∞

−∞V (x+ y) f (y)dy, (28.75)

or in discrete form

Ii =j=N/2

∑j=−N/2+1

V i+ j f j∆y+O((∆y)2), (28.76)

where Ii = I(xmin + i∆x), V i = V (xmin + i∆x), i = 0, ..,N−1, f j = f ( j∆y), j =−N/2+1, ...,N/2. Note that ∆y = ∆x,and that V (logS) = V (S).

Now, V j will not necessarily coincide with any of the discrete values Vk in equation (28.38). Consequently, wewill linearly interpolate to determine the appropriate values, as in equation (28.30).

Since equation (28.19) has the form of a discrete correlation, we can use a Fast Fourier transform (FFT) to computethis efficiently. Assuming that f is real, then

FFT(I)k = ( FFT(V ))k( FFT( f ))∗k , (28.77)

where (·)∗ denotes the complex conjugate. Since f (z) is the probability density of z = logη, which is a specifiedfunction, we can simply precompute FFT( f ) on the required equally spaced grid in z coordinates.

We can then carry out an inverse FFT to obtain the values of the correlation integral on the equally spaced x = logSgrid. A further interpolation step is required to obtain the value of the correlation integral on the original S grid(equation (28.33)).

We can summarize the steps required to generate the required values I(Sk),k = 0, ..., p.

187

• Interpolate the discrete values of V onto an equally spaced log(S) grid. This generates therequired values of V j.

• Carry out the FFT on this data.

• Compute the correlation in the frequency domain (with precomputed FFT(

f)

), using equa-tion (28.77).

• Invert the FFT of the correlation.

• Interpolate the discrete values of I(xi) onto the original S grid.

(28.78)Note that as long as linear or higher order interpolation is used, this procedure is second order correct, which

is consistent with the discretization error in the PDE and the midpoint rule used to evaluate the integral (equation(28.19)).

In principal, we can avoid the interpolation steps in the above procedure if we use special techniques for computingthe FFT for unequally spaced data. There are several methods for computing the inverse FFT problem (i.e. givenunequally spaced data, determine the Fourier coefficients) as well as the forward FFT problem (given the Fouriercoefficients, determine the inverse transform values on an unequally spaced grid) [30, 88, 69]. However, it should benoted that we are not particularly interested in obtaining highly accurate estimates of the discrete Fourier coefficients,but we simply need to evaluate the correlation integral correct to second order. Consequently, there is no particularbenefit, in terms of accuracy (for our purposes) in using these methods.

This issue is discussed in Section 28.8.3. We will use the interpolation method (28.78) followed by the standardFFT in subsequent sections.

28.5.1 Wrap around and DFT Evaluation of the CorrelationThe FFT algorithm effectively assumes that the input functions are periodic. This may cause wrap around pollutionunless special care is taken when devising the algorithm.

The integral (28.9) is approximated on the finite domain

I(x) =

Z ymax

yminV (x+ y) f (y)dy. (28.79)

The PDE part of the PIDE (28.4) is computed using the finite computational domain [0,Smax], using the discrete gridS0,S1, ...Sp. Initially, we chose

ymax = log(Smax)

ymin = log(S1) (28.80)

assuming S1 > 0. Note that ymin = log(S1) since normally S0 = 0, so that log(S0) = −∞.Generally f (y), which represents the probability density of a jump of S → ηS, where y = logη is rapidly decaying

for |y| >> 0 (see equation (28.14) for a specific example).However V (y), does not decay to zero near y = ymin,ymax. Typically, V (S) ≤ Const. S as S → ∞, and V (S) '

Const. as S → 0, or in y = logS coordinates

V (y) ≤ Const. ey ; y → ∞ (28.81)' Const. ; y →−∞. (28.82)

This will cause some problems if we use an FFT approach to evaluate the integral (28.79), since the DFT is effectivelyapplied to the periodic extension of the input functions. This will cause undesirable wrap-around effects. To avoidthese problems, we will extend the domain of the integral to the left and right, by a size which reflects the width of theprobability density. In other words, we will actually evaluate

Iext(x) =

Z ymax+∆y+

ymin−∆y−V (x+ y) f (y)dy, (28.83)

188

using FFT methods.In order to determine values in the extended region, we solve the following PDE-PIDE in the region [0,Smaxe∆y+

]

Vτ =

−rV if S = 012 σ2S2VSS +(r−λκ)SVS− (r +λ)V +λ

R ∞0 V (Sη)g(η)dη if 0 < S < Smax

12 σ2S2VSS + rSVS− rV if Smax ≤ S ≤ Smaxe∆y+

. (28.84)

Note that we have assumed that Smax is sufficently large so that it is valid to assume that approximation

VSS ' 0 (28.85)

is valid in Smax ≤ S ≤ Smaxe∆y+ . If VSS = 0, then the PIDE reduces to the PDE with λ = 0. Similarly, The PIDE reducesto the PDE with λ = 0 at S = 0.

The values of V (u) for u ∈ [ymax,ymax +∆y+] are estimated using simple linear interpolation. The values in the leftextension [ymin −∆y−,ymin] can be determined from interpolation on the original S grid.

This extended domain is then used as input to the forward DFT, the correlation computation (in the spectraldomain), and the inverse DFT. The values in the domain extensions are affected by wrap-around and are discarded,since they are not needed. (Recall from equation (28.84) that we solve the PDE at S = 0, and in the right domainextension [Smax,Smaxe∆y+

], so that the values of the correlation integrals are not needed in these regions).We show how select the domain extensions ∆y+,∆y− such that FFT wrap-around effects are less than a user

specified tolerance. To avoid algebraic complication, we derive the results in an informal way. We focus on the errordue to the FFT wrap-around. We assume that any other errors (interpolation, discretization of the integral, etc.) aresecond order in the asset grid spacing, and we ignore such errors in the following.

We make the assumptions

ymin < 0ymax > 0

V (y) ≥ 0

V (y) ≤ max(A2,A1ey)

f (y) ≤ A3e−γ|y|, ∀y, γ > 2

maxV (y) ≤ A4 ; y ∈ [ymin,ymax] (28.86)

where A1,A2,A3,A4 are constants independent of y. We assume that V (y) is only given at discrete points on the intervaly ∈ [ymin,ymax].

Recall that we wish to compute an approximation to

I(x) =

Z ymax

yminV (x+ y) f (y)dy. (28.87)

Considering the case where x = ymax, equation (28.87) becomes

I(ymax) =

Z ymax

yminV (ymax + y) f (y) dy

=

Z 0

yminV (ymax + y) f (y) dy+

Z ymax

0V (ymax + y) f (y) dy. (28.88)

When using an FFT to evaluate the correlation integral, the termZ ymax

0V (ymax + y) f (y) dy

is actually evaluated usingZ ymax

0V (ymin + y) f (y) dy (28.89)

189

due to the wrap-around effect of the discrete FFT. The idea here is to extend the definition of V to the interval y ∈[ymin −∆y−,ymax + ∆y+]. We make the assumption that the values of V can be obtained in the extended regionscorrect to second order, and we ignore these errors in the following. Setting ∆y− = 0 for the time being, equation(28.88) becomes

I(ymax) =

Z ∆y+

yminV (ymax + y) f (y) dy+

Z ymax

∆y+V (ymax + y) f (y) dy. (28.90)

Now, the wrap-around error E(ymax) which will occur using an FFT will be

E(ymax) 'Z ymax

∆y+|V (ymax + y)−V (ymin +y−∆y+)| f (y) dy

≤ max[

Z ymax

∆y+A1e(ymax+y) f (y) dy , A2

Z ymax

∆y+f (y) dy

]

≤ max[

A1eymaxZ ymax

∆y+eyA3e−γy dy , A2

Z ymax

∆y+A3e−γy dy

]

≤ max[

A1eymax e∆y+A3e−γ∆y+

γ−1 , A2A3e−γ∆y+

γ

]

≤ A3e∆y+e−γ∆y+ max [A1eymax ,A2]

≤ A3e−γ∆y+e∆y+A4 (28.91)

So, if we require that the relative error at x = ymax be less than a given tolerance, then we select ∆y+ such that

E(ymax)

A4≤ A3e−γ∆y+e∆y+

< tolR. (28.92)

For practical purposes, we assume that f (∆y+) ' e−γ∆y+ , so that we can approximate equation (28.92) by

E(ymax)

A4' f (∆y+)e∆y+

< tolR. (28.93)

Note that a relative error criteria is a reasonable choice at x = ymax since V (ymax) may be O(eymax).Following the same reasoning at x = ymin, assuming that ∆y+ = 0 for simplicity, we now extend the domain of V

to the left by ∆y−, and we assume that we can determine V in [ymin −∆y−,ymin] correct to second order. The error inI(ymin) due to wrap-around is given by

E(ymin) 'Z −∆y−

ymin|V (ymin + y)−V (ymax + y+∆y−)| f (y) dy

≤ max[

Z −∆y−

yminA1e(ymax+y+∆y−) f (y) dy , A2

Z −∆y−

yminf (y) dy

]

≤ max[

A1e(ymax+∆y−)Z −∆y−

ymineyA3eγy dy , A2A3

e−γ∆y−

γ

]

≤ max[

A1e(ymax+∆y−)A3e−γ∆y−(

e−∆y−

1+ γ

)

, A2A3e−γ∆y−

γ

]

≤ A3e−γ∆y− max [A2,A1eymax ]

≤ A3e−γ∆y−A4 (28.94)

Therefore we can require that the absolute error at x = ymin be less than a specified tolerance if we select ∆y− such that

E(ymin) ≤ A4A3e−γ∆y− < tolL. (28.95)

190

Again, for practical purposes we assume that f (−∆y−) ' e−γ∆y− and so we approximate equation (28.92) to obtain

E(ymin) ≤ A4 f (−∆y−) < tolL. (28.96)

An estimate of A4 can be obtained from

A4 ' max0≤S≤Smax

V (S,τ = 0). (28.97)

Note that an absolute error criteria is appropriate near x = ymin since V is bounded at y = ymin.Typically, we chose tolL = tolR = 10−6. Since the wrap-around errors are largest at x = ymin,x = ymax, selecting the

domain extensions which satisfy equations (28.92) and (28.95) will bound these errors at all other points. The domainextensions are illustrated in Figures 28.1-28.2.

log(Smax)

Domainis expanded

Log stock price

Option Value on the log grid

0

Option Value

Stock price

Strike

Payoff

Smax

FIGURE 28.1: The value of the option is interpolated onto the log-spaced grid. The right hand-side boundary of thelog-spaced grid ymax = log(Smax) is expanded by ∆y+, where ∆y+ is given by equation (28.92).

28.5.2 American OptionsSo far, we have been discussing valuing European options under the jump diffusion process. Suppose, we have toprice an American style option, where the holder can exercise at any time and receive a payoff

Payoff = V ∗(S,τ). (28.98)

The American option pricing problem can then be written as the differential linear complementarity problem

Vτ −(

12σ2S2VSS +(r−λκ)SVS− (r +λ)V +λ

Z ∞

0V (Sη)g(η)dη

)

≥ 0 (28.99)

V −V ∗ ≥ 0, (28.100)

where at least one of equations (28.99) and (28.100) must hold with equality.We can easily combine the fixed point iteration with the penalty method described in [37] to price American

options with jump diffusion.For a detailed analysis of the penalty method, the readers are referred to [37].

191

log(Smax)

Domainis expanded

Log stock price

Option Value on the log grid

0

Option Value

Stock price

SmaxStrike

Payoff

FIGURE 28.2: The value of the option is interpolated onto the log-spaced grid. But, the value of the option V (S,τ) atS = 0 is not used. The left hand boundary grid point is chosen to be logS1 where S1 is the grid point nearest to S = 0.This left boundary is then expanded by ∆y−, given by equation (28.95).

28.6 ResultsIn this section, we present our numerical results for various options with different payoff types, including call, put,butterfly and digital payoffs and several European and exotic options. Unless stated otherwise, we use the Crank-Nicolson discretization scheme (28.55). The discrete system of equations is solved using a fixed point iteration method(28.70) with a convergence tolerance of 10−6.

28.6.1 European Options

Parameter valuesσ .3r .05γ .2µ 0λ .6T .25K 100.

TABLE 28.1: Input data used to price European options under the log-normal jump diffusion process.

We validate our finite difference approach by demonstrating the convergence of prices for European call and putoptions under the log-normal jump diffusion process presented in [64]. For each test, as we double the number of pointswe cut the initial timestep (∆τ = .01 on the coarsest grid) in half. We use the probability density function defined in(28.11), where the parameter µ is set to zero. This approach permits us to compare our numerical results with theanalytical solution presented by [64]. Table 28.1 contains the value of the input data used for both the European calland put.

The convergence ratio presented in this subsection (see table 28.2 for example) is defined in the following way. If

192

0 20 40 60 80 100 120 140 160 180 2000

20

40

60

80

100

120

Stock

Price

(a) Call option price.

0 20 40 60 80 100 120 140 160 180 2000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Stock

Delta

(b) Call option delta.

0 20 40 60 80 100 120 140 160 180 2000

0.01

0.02

0.03

Stock

Gam

ma

(c) Call option gamma.

FIGURE 28.3: Call option values, delta (VS) and gamma (VSS) for the Crank-Nicolson discretization with 128 pointswith initial timestep 0.01. The input data is contained in table 28.1.

we let

∆τ = maxn

τn+1 − τn,

∆S = maxi

Si+1 −Si, (28.101)

and then carry out a convergence study letting h → 0 where

∆S = Const. h,

∆τ = Const. h, (28.102)

then we can assume that the error in the solution (at a given node) is

Vapprox(h) = Vexact +Const. hγ. (28.103)

The convergence ratio is then defined as

R =Vapprox(h/2)−Vapprox(h)

Vapprox(h/4)−Vapprox(h/2), (28.104)

so that if γ = 2, then R = 4, and if γ = 1, R = 2.Recall that interpolation is required to transform data from the clustered PDE grid to the equally spaced logS grid,

and vice versa. In tables 28.2-28.3, we compare linear interpolation (see equations (28.30-28.33), with quadratic andcubic Lagrange interpolation.

193

Number of points N Linear interpolation Quadratic interpolation Cubic interpolation(# timesteps) Value Convergence rate Value Convergence rate Value Convergence rate

128 (25) 5.935804 n.a. 5.920725 n.a. 5.919668 n.a.255 (50) 5.927415 n.a. 5.923948 n.a. 5.923820 n.a.509 (100) 5.925680 4.832 5.924950 3.215 5.924936 3.720

1017 (200) 5.925419 6.664 5.925223 3.667 5.925222 3.9112033 (400) 5.925350 3.756 5.925293 3.912 5.925293 3.9974065 (800) 5.925327 2.990 5.925311 3.945 5.925311 3.979

TABLE 28.2: Value of a European put option at S = 100.0 under jump diffusion using Crank-Nicolson timestepping forlinear, quadratic and cubic interpolation. The interpolation schemes are used to transfer data between the non-uniformS grid and the uniform log-spaced grid. The parameters are: σ = .3, r = .05, γ = .2, µ = 0. λ = .6, T = .25, K = 100.0∆τ = .01, and tol = 10−6. The exact solution is 5.925317. The number of points used for the FFT grid is 2α such that αis the smallest integer p ≤ 2α, where p is the number of nodes in the PDE grid. Convergence ratio is defined in equation(28.104).

Number of points N Linear interpolation Quadratic interpolation Cubic interpolation(# timesteps) Value Convergence rate Value Convergence rate Value Convergence rate

128 (25) 5.920476 n.a. 5.919375 n.a. 5.919343 n.a.255 (50) 5.924059 n.a. 5.923805 n.a. 5.923801 n.a.509 (100) 5.924993 3.835 5.924937 3.915 5.924936 3.927

1017 (200) 5.925236 3.844 5.925222 3.970 5.925222 3.9782033 (400) 5.925297 4.005 5.925293 3.992 5.925293 3.9954065 (800) 5.925312 4.034 5.925311 3.979 5.925311 3.980

TABLE 28.3: Value of a European put option at S = 100.0 under jump diffusion using Crank-Nicolson timestepping forlinear, quadratic and cubic interpolation. The interpolation schemes are used transfer data between the non-uniform Sgrid and the uniform log-spaced grid. The parameters are: σ = .3, r = .05, γ = .2, µ = 0. λ = .6, T = .25, K = 100.0∆τ = .01, and tol = 10−6. The exact solution is 5.925317. The number of points used for the FFT grid is 4×2α suchthat α is the smallest integer p ≤ 2α, where p is the number of nodes in the PDE grid. Convergence ratio is defined inequation (28.104).

We can see from the results in table 28.3 that if the correlation integral is evaluated using an FFT approach, thenquadratic convergence is obtained using linear interpolation, provided the grid used to compute the FFT is oversam-pled, i.e. the number of points used in the FFT grid is ω ∗ 2α where 2α is the smallest integer p ≤ 2α, and p is thenumber of nodes in the PDE grid. Note that if ω = 1, then we obtain erratic convergence behaviour with linear in-terpolation (see table 28.2), but we do obtain quadratic convergence if ω ≥ 4. However, note the interesting fact thatif quadratic or cubic interpolation is used, then we obtain quadratic convergence even if ω = 1. Further experimentswith even finer meshes indicates that the convergence ratio appears to be tending to four (ω = 1), but in a very er-ratic fashion (for linear interpolation). However, these meshes are impractically large. Since the computational costfor each timestep is dominated by the FFT computation, it is desirable to use ω = 1 (see table 28.2). Note that thetheoretical analysis for stability and convergence of the fixed point iteration was based on linear interpolation. Thiswas required, since linear interpolation is the only Lagrange interpolation method which has non-negative weights.However, it is clearly the case that the quadratic interpolant is more efficient than linear (although the convergence rateis theoretically the same as for linear interpolation). Consequently, in all subsequent examples, we will use quadraticinterpolation.

In tables 28.4 and 28.5, we compare the convergence rates between the implicit and the Crank-Nicolson discretiza-tions. As expected, we find that Crank-Nicolson converges quadratically, while the implicit approach converges onlylinearly.

Next, we present the delta(

∂V∂S

)

and gamma(

∂2V∂S2

)

of the solutions of European call and put options in figures28.3 and 28.4. In both figures, we observe that the option values, delta and gamma are smooth.

194

Number of points N S = 95 S = 100 S = 110(# of timesteps) Value Convergence ratio Value Convergence ratio Value Convergence ratio

128 (25) 4.674852 n.a. 7.162920 n.a. 13.901645 n.a.255 (50) 4.677965 n.a. 7.166161 n.a. 13.905415 n.a.509 (100) 4.678878 3.412 7.167168 3.218 13.906624 3.120

1017 (200) 4.679121 3.747 7.167443 3.668 13.906958 3.6182033 (400) 4.679183 3.934 7.167513 3.911 13.907044 3.8964065 (800) 4.679199 3.969 7.167531 3.958 13.907065 3.949

TABLE 28.4: Call option convergence tests for the Crank-Nicolson discretization using the input data contained in table28.1. Exact solution at S = 95 is 4.679204, at S = 100, 7.167537, and at S = 110, 13.907073. The number of pointsused for the FFT grid is 2α such that α is the smallest integer p ≤ 2α, where p is the number of nodes in the PDE grid.Convergence ratio is defined in equation (28.104). Quadratic interpolation is used.

Number of points S = 95 S = 100 S = 110(# timesteps) Value Convergence ratio Value Convergence ratio Value Convergence ratio

128 (25) 4.649303 n.a. 7.131084 n.a. 13.882571 n.a.255 (50) 4.664666 n.a. 7.149643 n.a. 13.895371 n.a.509 (100) 4.672094 2.068 7.158754 2.037 13.901477 2.096

1017 (200) 4.675696 2.063 7.163197 2.051 13.904354 2.1232033 (400) 4.677462 2.039 7.165380 2.035 13.905734 2.0844065 (800) 4.678336 2.021 7.166462 2.019 13.906409 2.046

TABLE 28.5: Call option convergence tests for the implicit discretization using the input data contained in table 28.1.Exact solution at S = 95 is 4.679204, at S = 100, 7.167537, and at S = 110, 13.907073. The number of points used forthe FFT grid is 2α such that α is the smallest integer satisfying p ≤ 2α, where p is the number of nodes in the PDE grid.Convergence ratio is defined in equation (28.104).

28.6.2 Non-Smooth PayoffWe now consider the issues raised by the presence of a discontinuity in the payoff. We have shown that our Crank-Nicolson discretization is algebraically stable. However, this property does not prevent oscillations from occurring[67]. One way to avoid oscillations is to use a fully implicit method for a small number of timesteps after anydiscontinuities arise and Crank-Nicolson thereafter. This technique has been described by Rannacher [72]. As well,the discontinuity must also be l2 projected onto the space of linear Lagrange basis functions. Only in this case is theconvergence quadratic [72]. To illustrate the projection mechanism point, we present in figure 28.5 the payoff of adigital call option in figure 5(a) and its l2 projection in figure 5(b).

Table 28.6 gives a convergence study for the digital put with jumps, using Rannacher timestepping [72] and l2projection (two implicit steps initially). Quadratic convergence is recovered (see table 28.6) and no oscillations areobserved in the delta and gamma of the solution (see figure 28.6).


128 (25) 0.62280095 n.a. 0.49806801 n.a. 0.27500914 n.a.255 (50) 0.62270997 n.a. 0.49815260 n.a. 0.27538780 n.a.

509 (100) 0.62269643 6.720 0.49818676 2.477 0.27550122 3.3391017 (200) 0.62269231 3.284 0.49819487 4.211 0.27552993 3.9512033 (400) 0.62269136 4.330 0.49819701 3.793 0.27553727 3.9104065 (800) 0.62269109 3.619 0.49819752 4.174 0.27553909 4.030

TABLE 28.6: Digital put option convergence tests for the Crank-Nicolson discretization using l2 projection, Rannachertimestepping and the parameters contained in table 28.1. The number of points used for the FFT grid is 2α such that αis the smallest integer satisfying p ≤ 2α, where p is the number of nodes in the PDE grid. Convergence ratio is definedin equation (28.104).

We continue our numerical tests by pricing a butterfly payoff call option. A butterfly payoff call option is the

195

0 20 40 60 80 100 120 140 160 180 2000

10

20

30

40

50

60

70

80

90

100

Stock

Price

(a) Put option price.

0 20 40 60 80 100 120 140 160 180 200−1

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

0

Stock

Delta

(b) Put option delta.

0 20 40 60 80 100 120 140 160 180 2000

0.005

0.01

0.015

0.02

0.025

Stock

Gam

ma

(c) Put option gamma.

FIGURE 28.4: Put option values, delta and gamma for the Crank-Nicolson discretization with 128 points with initialtimestep 0.01. The input data is contained in table 28.1.

0 20 40 60 80 100 120 140 160 180 200−0.5

0

0.5

1

1.5

Stock

Price

(a) Digital call payoff.

80 85 90 95 100 105 110 115 120−0.5

0

0.5

1

1.5

Stock

Price

(b) Digital call projected payoff.

FIGURE 28.5: The digital payoff figure 5(a) is l2 projected onto the space of basis functions 5(b). This approachcombined with Rannacher timestepping restore quadratic convergence [67].

combination of three call options with two different strike prices K1,K2 where K2 > K1. More precisely,

V (S,0) = max(S−K1,0.0)+max(S−K2,0.0)

−2max(

S− K1 +K22 ,0.0

)

. (28.105)

The payoff (28.105) has the peculiarity of having compact support which renders the estimation of the convo-

196

0 20 40 60 80 100 120 140 160 180 2000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Stock

Price

(a) Digital put option price.

0 20 40 60 80 100 120 140 160 180 200−0.03

−0.02

−0.01

0

Stock

Delta

(b) Digital put option delta.

0 20 40 60 80 100 120 140 160 180 200−1.5

−1

−0.5

0

0.5

1x 10−3

Stock

Gam

ma

(c) Digital put option gamma.

FIGURE 28.6: Digital put option values, delta and gamma for the Crank-Nicolson discretization with 128 points withinitial timestep 0.01. The input data is contained in table 28.1. The initial data is l2 projected and two Rannachertimesteps are used to smooth the initial data.


128 (25) 2.279206 n.a. 2.396078 n.a. 2.002380 n.a.255 (50) 2.280140 n.a. 2.396983 n.a. 2.004147 n.a.

509 (100) 2.280352 4.396 2.397166 4.947 2.004494 5.0931017 (200) 2.280385 6.565 2.397182 11.412 2.004557 5.5102033 (400) 2.280394 3.509 2.397188 3.043 2.004573 3.9394065 (800) 2.280396 4.437 2.397189 5.125 2.004576 4.228

TABLE 28.7: Butterfly call option convergence tests for the Crank-Nicolson discretization using Rannacher timestep-ping and the parameters contained in table 28.1. Exact solution at S = 95 is 2.280396, at S = 100, 2.397189, and atS = 110, 2.004577. The number of points used for the FFT grid is 2α such that α is the smallest integer satisfyingp ≤ 2α, where p is the number of nodes in the PDE grid. Convergence ratio is defined in equation (28.104).

lution product easier. Table 28.7 presents our convergence test results. Rannacher timestepping is used. Quadraticconvergence is observed and no oscillations are found in the delta and gamma of the solution (see figure 28.7). Whencomparing the numerical solution and analytical solution of a butterfly call option, we see that for penny accuracy only128 grid points and a timestep of .01 are necessary.

28.6.3 Probability Density Functions

So far we have been pricing jump diffusion options, where the proportional jump size λ is log normally distributed.However, jump returns are generally not symmetrical. In practice, jumps are skewed to the left or the right. A simple

197

0 20 40 60 80 100 120 140 160 180 2000

0.5

1

1.5

2

Stock

Price

(a) Butterfly call option price.

0 20 40 60 80 100 120 140 160 180 200−0.08

−0.06

−0.04

−0.02

0

0.02

0.04

0.06

0.08

0.1

0.12

Stock

Delta

(b) Butterfly call option delta.

0 20 40 60 80 100 120 140 160 180 200

−8

−6

−4

−2

0

2

4

6x 10−3

Stock

Gam

ma

(c) Butterfly call option gamma.

FIGURE 28.7: Butterfly call option values, delta and gamma for the Crank-Nicolson discretization with 128 points withinitial timestep 0.01. The input data is contained in table 28.1. Two Rannacher timesteps are used to smooth the initialdata.

way to incorporate a skew into equation (28.4) is to set µ 6= 0 in (28.11). For the next convergence test, we use thetimestep selector described in [52, 37]; an initial timestep is given and the next time-step is computed according to

τn+2 − τn+1

τn+1 − τn =d

maxi|V n+1

i −V ni |

max(1,|V ni |)

, (28.106)

where d specifies the maximum relative change allowed. Initially we set d = .1, and we divide this value by two ateach grid refinement. An initial timestep of .01 on the coarsest grid is used, and this initial timestep is reduced by fouron each refinement [37].

We price a call option where µ = −1.0 using (28.106). To validate our numerical solutions, we use the analyticalsolutions presented in [2]. Table 28.8 contains our convergence test results.

Another probability density function is the one suggested by Kou [55] where the logarithm of the jump size has adouble exponential distribution such that

f (x) = pη1 exp(−η1x)H(x)+qη2 exp(η2x)H(−x), (28.107)

where η1 > 1,η2, p,q > 0 and p + q = 1. The condition η1 > 1 is used to ensure that the stock price S has finiteexpectation. H(x) is the Heaviside function. It follows [55] that κ = E[Y − 1] = pη1

η1−1 + qη2η2+1 − 1. Figure 28.8

graphically presents the double exponential probability density function superimposed on the Normal probabilitydensity function.

Table 28.9 presents numerical convergence tests for pricing a call option. The number of points used on theuniform-spaced x grid has been oversampled (i.e. number of points in FFT grid is 8×2α, such that α is the smallest

198


128 (53) 7.964412 n.a. 11.411926 n.a. 19.557611 n.a.255 (121) 7.968684 n.a. 11.415705 n.a. 19.560711 n.a.509 (253) 7.969532 5.042 11.416319 6.155 19.561125 7.496

1017 (516) 7.969730 4.281 11.416452 4.635 19.561206 5.1402033 (1037) 7.969778 4.071 11.416484 4.166 19.561224 4.3114065 (2057) 7.969791 4.018 11.416491 4.041 19.561229 4.076

TABLE 28.8: Call option convergence tests for the Crank-Nicolson discretization using Rannacher timestepping. Theparameters are: σ = .3, r = .05, γ = .2, µ = −1. λ = .6, T = .25, K = 100.0. The exact solution at S = 95 is 7.969794,at S = 100, 11.416494, and at S = 110, 19.561230. The number of points used for the FFT grid is 2α such that α isthe smallest integer satisfying p ≤ 2α, where p is the number of nodes in the PDE grid. Convergence ratio is defined inequation (28.104). The timesteps are selected using equation (28.106), with d = .1 on the coarsest grid, and divided bytwo for each grid refinement.

−3 −2 −1 0 1 2 3 40

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

x−axis

Valu

e double exponential pdf

normal pdf

FIGURE 28.8: Overall comparison of the Normal (µ = 0,γ = .6) and double exponential probability functions (η1 =5,η2 = 2.5, p = .3,q = 1− p).

integer 2α ≥ p, where p is the number of points in the PDE grid) to better capture the very sharp peak of the doubleexponential distribution. Rannacher timestepping is used. Quadratic convergence is obtained (see table 28.9). Smoothsolution, delta and gamma values are also observed in figure 28.9.


128 (43) 6.927419 n.a. 9.238044 n.a. 15.555950 n.a.255 (89) 6.930615 n.a. 9.241890 n.a 15.561794 n.a.509 (186) 6.931123 6.298 9.242517 6.137 15.563101 4.469

1017 (379) 6.931233 4.596 9.242655 4.544 15.563420 4.1012033 (764) 6.931258 4.488 9.242686 4.361 15.563498 4.078

4065 (1534) 6.931264 3.920 9.242694 3.987 15.563518 4.008

TABLE 28.9: Call option convergence tests for the Crank-Nicolson discretization using the double exponential proba-bility density function (28.107) and Rannacher timestepping. The parameters are: σ = .3, r = .05, , λ = .6, T = .25,K = 100.0, η1 = 1/.4, η2 = 1/.4 p = .5 and q = 1− p. The number of points used for the FFT grid is 8×2α such that αis the smallest integer satisfying 8p ≤ 2α, where p is the number of nodes in the PDE grid. Convergence ratio is definedin equation (28.104). The timesteps are selected using equation (28.106), with d = .1 on the coarsest grid, and dividedby two for each grid refinement. On the coarsest grid, the initial timestep is .01.

199

0 20 40 60 80 100 120 140 160 180 2000

10

20

30

40

50

60

70

80

90

100

Stock

Price

(a) Call option price using the double expo-nential probability density function (28.107).

0 20 40 60 80 100 120 140 160 180 2000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Stock

Delta

(b) Call option delta using the double expo-nential probability density function (28.107).

0 20 40 60 80 100 120 140 160 180 2000

0.005

0.01

0.015

0.02

0.025

0.03

Stock

Gam

ma

(c) call option gamma using the double expo-nential probability density function (28.107).

FIGURE 28.9: Call option values, delta and gamma for the Crank-Nicolson discretization with 128 points with initialtimestep 0.01 using the double exponential probability density function (28.107). The parameters are: σ = .3, r = .05,λ = .6, T = .25, K = 100.0, η1 = 1/.4, η2 = 1./.4 p = .5 and q = 1− p.

28.6.4 Exotic Options

Very little previous work has been carried out to numerically price exotic options with jump diffusion [2]. In thissection, we show how our numerical framework can be used to these contracts.

We begin by valuing an American put option. As mentioned in section 28.5.2, we combine the fixed point iterationwith the penalty method and use variable timestepping [37].


128 (41) 8.550566 n.a. 5.989161 n.a. 2.681353 n.a.255 (87) 8.556180 n.a. 5.994848 n.a 2.687503 n.a.

509 (181) 8.557167 5.690 5.995859 5.627 2.688838 4.6071017 (370) 8.557365 4.978 5.996063 4.934 2.689144 4.3612033 (747) 8.557409 4.520 5.996109 4.494 2.689216 4.208

4065 (1499) 8.557419 4.234 5.996120 4.224 2.689234 4.097

TABLE 28.10: American Put option convergence tests for the Crank-Nicolson discretization using the parameterscontained in table 28.1 with Rannacher timestepping. The number of points used for the FFT grid is 2α such that α isthe smallest integer satisfying p ≤ 2α, where p is the number of nodes in the PDE grid. Convergence ratio is defined inequation (28.104). The timesteps are selected using equation (28.106), with d = .1 on the coarsest grid, and divided bytwo on each grid refinement. On the coarsest grid, the initial timestep is .01.

200

We observe (see table 28.10) that quadratic convergence is obtained for pricing American put options under jumpdiffusion. As desired, the delta and gamma of the solution are smooth functions (see figure 28.10).

0 20 40 60 80 100 120 140 160 180 2000

10

20

30

40

50

60

70

80

90

100

Stock

Price

(a) American put option price.

0 20 40 60 80 100 120 140 160 180 200−1

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

0

Stock

Delta

(b) American put option delta.

0 20 40 60 80 100 120 140 160 180 200−0.005

0

0.005

0.01

0.015

0.02

0.025

0.03

Stock

Gam

ma

(c) American put option gamma.

FIGURE 28.10: American put option values, delta and gamma for the Crank-Nicolson discretization with 128 pointswith initial timestep 0.01. The input data is contained in table 28.1. The time-step selector defined in (28.106) is used,with d = .1 on the coarsest grid, and d is reduced by two on each refinement. The initial timestep is reduced by four oneach refinement, with an initial timestep on the coarsest grid of .01.

Parameter valuesσ .3r .05γ .2µ .0λ .6T .25K 100.

Barrier 120observation interval 1/250

number of days to knock-out 10

TABLE 28.11: Input data used to price a call knock-out Parisian option under the log-normal jump diffusion processwith discrete daily observation dates.

Next, we price an up and out knock-out call Parisian option [86], with daily discrete observation dates. Recall thata Parisian knock-out ceases to have value if S is continuously above (below) the barrier for the required number ofobservations for an up (down) option. For our test, the barrier is set at S = 120 and the required number of continuous

201

daily observations to knockout is 10. The input data is presented in table 28.11. For a more detailed discussion ofParisian options, readers are referred to [86].

In table 28.12, we present our convergence results. We used constant timestepping (∆tay = .002 on the coarse grid)and the solution is l2 projected after each observation [67]. Rannacher timestepping is used after each observation. Asexpected quadratic convergence is obtained.


101 (125) 2.925892 n.a. 3.898134 n.a 4.636354 n.a201 (250) 2.942779 n.a 3.916424 n.a 4.651038 n.a401 (500) 2.947044 3.959 3.920773 4.205 4.653847 5.228801 (1000) 2.948116 3.977 3.921851 4.037 4.654522 4.162

1601 (2000) 2.948367 4.272 3.922092 4.460 4.654648 5.349

TABLE 28.12: Up and out Parisian call option convergence tests for the Crank-Nicolson discretization using the pa-rameters contained in table 28.1. We used constant Rannacher timestepping ∆τ = .002 (coarse grid) and the initialdata is l2 projected. The number of points used for the FFT grid is 2α such that α is the smallest integer satisfyingp ≤ 2α, where p is the number of points in the PDE grid. Convergence ratio is defined in equation (28.104). After eachobservation, the solution is l2 projected and two implicit timesteps are taken before switching back to Crank-Nicolsontimestepping .

50 100 150 200 250 3000

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Stock

Valu

e

no jump diffusion

jump diffusion

FIGURE 28.11: Parisian call knock-out option with discrete daily observation dates with and without jumps. We usedRannacher timestepping and l2 projection. The parameters are: σ = .3, r = .05, γ = .2, µ = −.5 λ = 1., T = .25,K = 100.0 ∆τ = .01 (coarsest grid), and tol = 10−6 . The barrier is set at S = 120 and the number of continuous dailyobservations to knockout is 10.

In figure 28.11, we compare graphically the solutions of a Parisian put knock-out option with discrete daily ob-servation dates with and without jumps. We observe that the difference in pricing is quite significant, in particular wefind that the presence of jumps reduces the option price. This change in price can be explained by the shift introducedin the probability density function µ = −.5 and consequently, the increase of the drift term in (28.4).

28.6.5 Comparison with Previous Work

When pricing options under the jump diffusion process, the main computational cost is the evaluation of the integralterm of (28.4). The approach presented in [2] is based on a FFT-ADI finite difference method. This method evaluatesthe convolution integral twice at each timestep. Hence this method requires a total of four FFT computations (twoforward FFTs, and two reverse FFTs). Note that the method in [2] is second order accurate. If N is the number oftimesteps, and p the number of nodes in the S grid, then both the method in [2] and the method in this work havecomplexity O(N p log p).

202

Number of points N Timesteps Iters (tol =10−6) Iters (tol =10−8)128 25 77 100255 50 150 200509 100 300 390

1017 200 600 6002033 400 1091 12004065 800 1600 2400

TABLE 28.13: Number of iterations (iters) for a European call option under jump diffusion using Crank-Nicolsontimestepping. The solving parameters are: σ = .3, r = .05, γ = .2, µ = 0. λ = .6, T = .25, K = 100.0 and ∆τ = .01.Convergence tolerance tol defined in equation (28.70).

In table 28.13 we see that the number of iterations required for convergence (at each timestep) depends on theconvergence tolerance. For a typical convergence tolerance of 10−6, at most three iterations per step are required (onaverage). In this case, about six FFT computations are required per timestep.

Consequently, for Vanilla European options (with jumps), the method in [2] may be more efficient than the ap-proach suggested here, especially for large timesteps.

However, when pricing American options, it is not clear how the the approach in [2] could be modified to handlethe American constraint implicitly, unless some form of iteration is used. In contrast, the technique developed in thiswork can handle implicit treatment of the American constraint in a straightforward fashion.

In addition, the method presented in this paper can be easily applied to implicit discretizations of nonlinear prob-lems (uncertain volatility, transaction costs [68]). We will report numerical tests for option pricing with jump diffusionand transaction costs in future work.

The technique developed in [63] which uses a wavelet method for evaluation of the jump integral term has com-plexity O(N p(log p)2), in contrast to the complexity of O(N p log p), (for one dimensional problems) of the methoddeveloped here. In addition, it is not obvious how to generalize the technique in [63] to the nonlinear case.

The method in [93] uses an explicit evaluation of of the correlation integral term, hence is only first order accurate.

28.7 ConclusionIn this paper, we have shown that an explicit evaluation of the correlation integral in the jump diffusion PIDE, coupledwith an implicit discretization of the usual PDE terms, is unconditionally stable. However, since this method is onlyfirst order correct, an implicit method is preferred. We show that Crank-Nicolson timestepping is algebraically stable,and in certain special cases (an equally spaced logS grid), we can prove that Crank-Nicolson timestepping is strictlystable.

If implicit timestepping is used, then the direct evaluation of the correlation integral appearing in the PIDE wouldrequire a dense matrix solve. To avoid this computational complexity, a fixed point iteration method is developed.For typical parameter values, this fixed point iteration converges very quickly (the error is reduced by two orders ofmagnitude on each iteration).

Each fixed point iteration requires evaluation of the correlation integral. We use Lagrange interpolation to transferthe date on the clustered PDE grid to an equally spaced logS grid. An FFT method is then used to evaluate thecorrelation integral, and Lagrange interpolation is used to transfer data back to the PDE grid. We demonstrate howto extend the logS grid to avoid FFT wrap around effects, by taking into account the properties of the jump sizeprobability density.

The methods developed in this paper can be applied to arbitrary jump size probability densities. Furthermore, wehave demonstrated that this method can be used to obtain implicit solutions to American options with jump diffusions.

Since the method used to handle the jump diffusion term implicitly is a simple fixed point iteration, then it is avery simple matter to modify an existing exotic option pricing library to handle the jump diffusion case. All that isrequired is that a function be added to the library which, given the current vector of discrete option prices, returns thevector value of the correlation integral. This vector is then added to the right hand side of the fixed point iteration.

203

28.8 Von Neumann Analysis28.8.1 TransformationIn this Appendix, we will carry out a Von Neumann stability analysis for Crank-Nicolson timestepping for the specialcase of constant coefficients and an equally spaced grid in logS coordinates.

From equations (28.4) and (28.9),

Vτ =12σ2S2VSS +(r−λκ)SVS− (r +λ)V +λ

Z ∞

−∞V (y) f (y− logS) dy. (28.108)

where V (x,τ) = V (exp(x),τ) and f (y) = f (exp(y)). Using the change of variable

x = log(S), S = exp(x),

and substituting into (28.108), we find

V τ =12σ2V xx +(r−λκ− 1

2σ2)V x − (r +λ)V +λZ ∞

−∞V (y) f (y− x)dy. (28.109)

From equation (28.109), it can be observed that the integral part of the PIDE is simply a correlation product. Usingthe correlation operator ⊗, equation (28.109) is now given by

V τ =12σ2V xx +(r−λκ− 1

2σ2)V x − (r +λ)V +λV⊗ f . (28.110)

A Crank-Nicolson discretization of equation (28.110) is

V n+1i −V n

i∆τ

=

12

[

12σ2

(

V n+1i+1 −2V n+1

i +V n+1i−1

∆x2

)

+ (r−λκ− 12σ2)

V n+1i+1 −V n+1

i−12∆x − (r +λ)V n+1

i

]

+12

[

12σ2

(

V ni+1 −2V n

i +V ni−1

∆x2

)

+ (r−λκ− 12σ2)

V ni+1 −V n

i−12∆x − (r +λ)V n

i

]

+λ2

[

(V ⊗ f )ni +(V ⊗ f )n+1

i

]

.

(28.111)

Equation (28.111) can be written as

V n+1i

[

1+(α+β+ r +λ)∆τ2

]

− ∆τ2 βV n+1

i+1 − ∆τ2 αV n+1

i−1

= V ni

[

1− (α+β+ r +λ)∆τ2

]

+∆τ2 βV n

i+1 +θ∆2 ταV n

i−1 +∆τ2 λ[

(V ⊗ f )ni ,+(V ⊗ f )n+1

i

]

, (28.112)

where

α =12σ2 1

∆x2 − (r−λκ− 12σ2)

12∆x ,

β =12σ2 1

∆x2 +(r−λκ− 12σ2)

12∆x .

28.8.2 Stability Analysis

Let V n = [V n0 ,V n

1 , ...,V nM ]′, be the discrete solution vector to equation (28.38). Suppose the initial solution vector is

perturbed, i.e.

V 0 = V 0 +E0, (28.113)

204


M]′ is the perturbation vector. Note that EnM = 0 since Dirichlet boundary conditions are imposed

at this node.Then, following the usual steps, we obtain the following equation for the propagation of the perturbation, from

equation (28.112)

En+1i

[

1+(α+β+ r +λ)∆τ2

]

− ∆τ2 βEn+1

i+1 − ∆τ2 αEn+1

i−1

= Eni

[

1− (α+β+ r +λ)∆τ2

]

+∆τ2 βEn

i+1 +θ∆2 ταEn

i−1 +∆τ2 λ[

(E⊗ f )ni ,+(E⊗ f )n+1

i]

. (28.114)

In the following we determine the stability of our discretization scheme using the von Neumann approach [73]. Inorder to apply the Fourier transform method, we suppose that the boundary conditions can be replaced by periodicityconditions.

We define the inverse discrete Fourier transform (DFT) as follows (we have selected a particular scaling factor)

Eni =

1XN

N2

∑k=− N

2 +1Cn

k exp(√−12π

N ik), (28.115)

fi =1

XN

N2

∑l=− N

2 +1Fl exp(

√−12π

N il), (28.116)

where Ck and Fl corresponds respectively to the discrete Fourier coefficient of E and f , and XN = xN/2 − x−N/2+1 isthe width of the domain along the x-axis. We have assumed that the probability density function is time independent,which is the usual assumption. Note that the notation Cn

k should be interpreted as (Ck)n, i.e. in this case n is a power,

not a superscript.The forward transforms are

Cnk =

XNN

N2

∑i=− N

2 +1En

i exp(−√−12π

N ik), (28.117)

Fl =XNN

N2

∑i=− N

2 +1fi exp(−

√−12π

N il), (28.118)

The discrete correlation is given by

(E⊗ f )ni =

XNN

N2

∑j=− N

2 +1En

j f j−i, (28.119)

which is second order accurate. Substituting equation (28.115) and equation (28.116) into equation (28.119), weobtain

(E⊗ f )ni =

XNN

N2

∑j=− N

2 +1

1XN

N2

∑k=− N

2 +1Cn

k exp(√−12π

N jk) 1XN

N2

∑l=− N

2 +1Fl exp(

√−12π

N ( j− i)l)

=1

XN

1N

N2

∑k=− N

2 +1

N2

∑l=− N

2 +1Cn

k Fl exp(−√−12π

N il)N2

∑j=− N

2 +1exp(

√−12π

N jk)exp(√−12π

N jl).

Using the orthogonality condition:N2

∑j=− N

2 +1exp(

√−12π

N jk)exp(√−12π

N jl) =

N if l = −k0 otherwise, (28.120)

205

we find that,

(E⊗ f )ni =

1XN

N2

∑k=− N

2 +1Cn

k F−k exp(√−12π

N ik). (28.121)

Substituting equation (28.115) and equation (28.121) into (28.114)

1XN

N2

∑k=− N

2 +1(Ck)

n+1 exp(√−12π

N ik)[

1+(α+β+ r +λ)∆τ2

]

− ∆τ2 β

1XN

N2

∑k=− N

2 +1(Ck)

n+1 exp(√−12π

N (i+1)k)− ∆τ2 α

1XN

N2

∑k=− N

2 +1(Ck)

n+1 exp(√−12π

N (i−1)k)

=

1XN

N2

∑k=− N

2 +1(Ck)

n exp(√−12π

N ik)[

1− (α+β+ r +λ)∆τ2

]

+∆τ2 β

1XN

N2

∑k=− N

2 +1(Ck)

n exp(√−12π

N (i+1)k)

+∆τ2 α

1XN

N2

∑k=− N

2 +1(Ck)

n exp(√−12π

N (i−1)k)

+∆τ2 λ

1XN

N2

∑k=− N

2

Cnk F−k exp(

√−12π

N ik)+1

XN

N2

∑k=− N

2

Cn+1k F−k exp(

√−12π

N ik)

. (28.122)

Because of linearity, each Fourier component can be treated separately. Equation (28.122) becomes

(Ck)n+1 exp(

√−12π

N ik)[

1+(α+β+ r +λ)∆τ2

]

− ∆τ2 β(Ck)

n+1 exp(√−12π

N (i+1)k)− ∆τ2 α(Ck)

n+1 exp(√−12π

N (i−1)k)

=

(Ck)n exp(

√−12π

N ik)[

1− (α+β+ r +λ)∆τ2

]

+∆τ2 β(Ck)

n exp(√−12π

N (i+1)k)

+∆τ2 α(Ck)

n exp(√−12π

N (i−1)k)

+∆τ2 λ[

Cnk F−k exp(

√−12π

N ik)+Cn+1k F−k exp(

√−12π

N ik).]

. (28.123)

Dividing equation (28.123) by (Ck)n exp(

√−1 2π

N ik), we obtain

Ck

[

1+(α+β+ r +λ)∆τ2

]

− ∆τ2 βCk exp(

√−12π

N k)− ∆τ2 αCk exp(−

√−12π

N k)−λ∆τF−k

=[

1− (α+β+ r +λ)∆τ2

]

+∆τ2 βexp(

√−12π

N k)+∆τ2 αexp(−

√−12π

N k)+∆τ2 [λF−k] . (28.124)

Factoring the Ck term, equation 28.124 becomes

Ck =

[

1− (α+β+ r +λ)∆τ2]

+ ∆τ2 βexp(

√−1 2π

N k)+ ∆τ2 αexp(−

√−1 2π

N k)+ ∆τ2 λF−k

[

1+(α+β+ r +λ)∆τ2]

− ∆τ2 βexp(

√−1 2π

N k)− ∆τ2 αexp(−

√−1 2π

N k)− ∆τ2 λF−k

. (28.125)

206

Recall that

α =12σ2 1

∆x2 − (r−λκ− 12σ2)

12∆x ,

β =12σ2 1

∆x2 +(r−λκ− 12σ2)

12∆x .

It follows that

α+β+ r +λ =12σ2 1

∆x2 − (r−λκ− 12σ2)

12∆x +

12σ2 1

∆x2 +(r−λκ− 12σ2)

12∆x + r +λ

= σ2 1∆x2 + r +λ,

and

∆τβexp(√−12π

N k) + ∆ταexp(−√−12π

N k) =

∆τ12σ2 1

∆x2

[

exp(√−12π

N k)+ exp(−√−12π

N k)]

+∆τ(r−λκ− 12σ2)

12∆x

[

exp(√−12π

N k)− exp(−√−12π

N k)]

= ∆τσ2 1∆x2 cos(2π

N k)+√−1∆τ

1∆x (r−λκ− 1

2σ2)sin(2πN k).

Using the above results in equation (28.125), we find

Ck =[

1− ( σ2

∆x2 + r +λ)∆τ2

]

+ 12

[

∆τσ2

∆x2 cos( 2πN k)+

√−1 ∆τ

∆x (r−λκ− 12 σ2)sin( 2π

N k)+∆τλF−k)]

[

1+ 12 ( σ2

∆x2 + r +λ)∆τ]

− 12

[

∆τσ2

∆x2 cos( 2πN k)+

√−1 ∆τ

∆x (r−λκ− 12 σ2)sin( 2π

N k)+∆τλF−k)] . (28.126)

Let

FR−k = Re(F−k),

F I−k = Im(F−k),

(28.127)

then equation (28.126) gives

|Ck|2 =[

1− ( σ2

∆x2 + r +λ)∆τ2 + ∆τσ2

2∆x2 cos( 2πN k)+ ∆τ

2 λFR−k)]2

+[ ∆τ

2∆x (r−λκ− 12 σ2)sin( 2π

N k)+λ ∆τ2 F I

−k]2

[

1+( σ2∆x2 + r +λ)∆τ

2 − ∆τσ22∆x2 cos( 2π

N k)− ∆τ2 λFR

−k)]2

+[ ∆τ

2∆x (r−λκ− 12 σ2)sin( 2π

N k)+λ ∆τ2 F I

−k]2

,

(28.128)

or

|Ck|2 =[

1− r− ∆τσ2

2∆x2 (1− cos( 2πN k))− ∆τλ

2 (1−FR−k)]2

+[ ∆τ

2∆x (r−λκ− 12 σ2)sin( 2π

N k)+λ ∆τ2 F I

−k]2

[

[1+ r + ∆τσ2

2∆x2 (1− cos( 2πN k))+ ∆τλ

2 (1−FR−k)]2

+[ ∆τ

2∆x (r−λκ− 12 σ2)sin( 2π

N k)+λ ∆τ2 F I

−k]2

,

(28.129)

207

Now, note that

F−k =XNN

N2

∑j=− N

2 +1f j exp(

√−12π

N k j), (28.130)

then, from equation (28.27), we have

XNN

N2

∑j=− N

2 +1f j ≤ 1, (28.131)

so that

|F−k| ≤ 1, (28.132)

and hence

−1 ≤ FR−k ≤ +1. (28.133)

It then follows that (∀k ∈ −N/2+1, ...,+N/2)

|1+ r +∆τσ2

2∆x2 (1− cos(2πN k))+

∆τλ2 (1−FR

−k)| ≥ |1− r− ∆τσ2

2∆x2 (1− cos(2πN k))− ∆τλ

2 (1−FR−k)|,

(28.134)

and consequently |Ck| < 1,∀k.

28.8.3 Non-uniform FFTRecall that in order to evaluate the correlation integral (28.19), we used an FFT method. This requires that the nodalvalues V i be equally spaced on a logS grid. However, since this is not efficient for solving the PDE, we interpolatefrom the PDE grid to an equally spaced logS grid. After evaluating the correlation integral, we interpolate from theequally spaced logS grid back to the original PDE grid.

However, there have been several recent papers concerning the problem of using Fourier methods on unequallyspaced grids. There are two basic problems

Forward Given discrete values fi, i = −N/2 + 1, ...,+N/2, then determine Fk, k = −N/2 +1, ...,+N/2, such that

fi =1

XN

N2

∑l=− N

2 +1Fl exp(

√−1 2π

XNxil), (28.135)

where XN = xN/2− x−N/2+1.

Reverse Given discrete values Fk, k =−N/2+1, ...,+N/2, determine fi, i =−N/2+1, ...,+N/2,such that

fi =1

XN

N2

∑l=− N

2 +1Fl exp(

√−1 2π

XNxil), (28.136)

Note that if the nodes xi are at locations xi = iXN/N, then both the forward and reverse problems are easily computedusing the discrete orthogonality conditions (28.120). However, if the xi are arbitrary nodes, then a direct evaluation ofthe forward algorithm requires O(N3) operations, and the reverse algorithm requires O(N2) operations.

208

There are several algorithms for carrying out the Reverse problem. In [88], several methods are tested for comput-ing an approximation to the Reverse problem (28.136). In fact, the straightforward method of simply computing

f j =1

XN

N2

∑l=− N

2 +1Fl exp(

√−12π

N jl), (28.137)

at f j = f ( jXN/N) using an FFT, and then using Lagrange interpolation to determine the required values at f (xi), isquite competitive with other methods.

The Forward problem (28.135) can be posed as a linear algebra problem, i.e. if f = [ f−N/2+1, ..., fN/2]′, and

F = [F−N/2+1, ...,FN/2]′, then the Forward problem can be stated as given the right hand side vector f , solve the

system

AF = f (28.138)

for F, where

[A]i j =1

XNexp(

√−1 2π

XNxi j) (28.139)

Several methods have been proposed for solving (28.138). The most efficient method appears to the techniquesuggested in [3] , which uses a preconditioned GMRES approach. However, as pointed out in [3], the preconditionersuggested results in rapid convergence only in cases where the xi are small perturbations from a grid with xi = iXN/N.We have carried out numerical experiments, using the method in [3], and for the clustered grids which are typicallyused in option pricing, the number of GMRES iterations required for convergence is unacceptably large.

An alternative method for the Forward problem is suggested in [31]. This method uses a Fast Multipole method.However, in [69], the authors indicate that this method has problems unless the nodes are also almost equally spaced.

In any case, it should be recalled that the discrete approximation for the integral (28.19) is only second ordercorrect. Consequently, we do not need to evaluate the Fourier coefficients to high accuracy.

209

29 Matlab Exercises29.1 IntroductionThe purpose of these exercises is to get “hands-on” experience in applying the algorithms, methods, and conceptswhich have been discussed in the course. The Matlab programming language was chosen for pedagogical reasons: itis relatively simple to understand, permitting users to easily see the main concepts and ideas without getting boggeddown in low-level details. The downside to this is that Matlab is not a particularly fast language. As a result, someof the examples will not run at blazing speed. Moreover, we have made no effort to optimize the code for speed,preferring instead to strive for legibility and ease of comprehension.

This code is intended for educational purposes only. We make no guarantees that it is free of errors (though if youfind any, please be sure to let us know!). Furthermore, we make no representation with respect to the code, its quality,accuracy, or suitability for a particular purpose. We will assume no liability with respect to any loss or damage causedor allege to have been caused directly or indirectly by the software. If you don’t agree to these terms, don’t use thesoftware!

With that out of the way, we can go on to describing the general nature of these exercises. As you have learnedin the course, many path-dependent options can be viewed as collections of simpler options. These simpler problemsexchange information at discrete observation dates. Since we may have to solve a relatively large number of thesesimpler options, it is important to be able to handle each of these individual options as effectively as we can. Conse-quently, the exercises begin with material related to relatively simple one dimensional option pricing problems. Weexamine issues such as grid choices, convergence rates, enforcement of constraints, time step selection, and handlingdiscontinuous payoffs. It is important to understand all of this in detail before going on to more complex cases suchas Asian options and Parisian style barrier options.

There are a total of ten exercises, contained in the following directories:Vanilla EuropeanVanilla AmericanDiscrete DividendsVariable TimestepDigitalBarrierAsianParisianUncertainJump

Each directory contains a number of Matlab M-files. One of these files is called main , where is a digit ranging from1 to 10. This file contains the input parameters for each exercise, along with some of the required general code. Theremaining files contain functions to implement some of the lower level details. In most, if not all cases, the changesneeded to do the exercises can all be done by editing the appropriate main file. The programs can be run by typingthe name of the appropriate main file in the Matlab window and pressing the return key.

Some commonly used variables in our code include:K strike priceT expiry dater interest rateσ volatilityopttype 0 for a call, otherwise a putgridspacing 0 for constant, otherwise variablerefine steps number of times to refine initial coarse gridcrank 0 for fully explicit, 1 for Crank-Nicolson, otherwise fully implicitdelt default time step sizeplots 1 to produce plots of price, delta, and gamma, otherwise no plots

Some useful Matlab commands to remember are clear all (removes all variables and functions from the workspace),close all (closes all open figure windows), more on and more off to turn paging of output on and off, and, ofcourse, help.

210

Feel free to ask a course instructor any time for assistance. As some final advice, if you are looking at a call optionand you find the code is running too slowly, you may wish to consider a put option instead (the default grid is a bitsmaller for a put, and so things should be a little quicker in that case). Also, note that we will try to leave some timeduring each exercise for you to explore the code to see how things work and for you to make some additional changes(such as putting in discrete dividends, changing to American from European, etc.)

211

29.2 Exercise #1

Objectives: Examine convergence for fully explicit, Crank-Nicolson, and fully implicit discretizations; explore effectsof non-uniform grids.

Instructions: Change to the Vanilla European directory and start Matlab. Open up the main1.m file. Feel free toset K, T , r, and σ to any reasonable values. Set the opttype variable to 0 for a call option or, if you prefer, any othernumber for a put. Begin by setting gridspacing to 0 and crank to 0. Complete the following table by making therequired changes to main1.m and running the program:

Fully Explicit Convergence: Constant GridNumber of Number of Option Value Change From

refine steps Grid Nodes Time Steps At S = K Coarser Grid0 n.a.12

What rate of convergence do you observe?

Now change crank to 1 in main1.m and complete the following table:

Crank-Nicolson Convergence: Constant GridNumber of Number of Option Value Change From


Is the rate of convergence different from the fully explicit case?

Now set crank to 2 and fill out this table:

Fully Implicit Convergence: Constant GridNumber of Number of Option Value Change From


Is the convergence rate different from the earlier cases?

212

Now change gridspacing to 1 in order to use a variable grid and fill out the tables below:

Fully Explicit Convergence: Variable GridNumber of Number of Option Value Change From


Crank-Nicolson Convergence: Variable GridNumber of Number of Option Value Change From


Fully Implicit Convergence: Variable GridNumber of Number of Option Value Change From


Is anything different from the constant grid case?

As a final exercise, try setting r = .15, σ = .01, crank = 1, and refine steps = 2 and ensure that plots = 1. Howdo the plots for delta and gamma look? Does it matter if the grid spacing is variable or constant?

213

29.3 Exercise #2

Objectives: Explore use of penalty method for valuing American style options.

Instructions: Change to the Vanilla American directory and edit the main2.m file. Start with the following:

opttype = 1American = 1gridspacing = 1n iters = 1refine steps = 0crank = 2plots = 1

Again, feel free to change things like K, r, σ, and T to whatever you like. One possible advantage to sticking withthe default values of K = 100, r = .05, σ = .25, and T = .50 is that you can compare your answers with the followingresults obtained using a standard binomial tree when S0 = 100:

Number of Time Steps Option Value100 6.0134200 6.0179400 6.0201

Notice that convergence is approximately linear and an extrapolated answer is 6.0223. In fact, with 6400 time stepswe get 6.0222.

Complete the following two tables for fully implicit and Crank-Nicolson discretizations:

Fully Implicit Convergence: Variable Grid (n iters = 1)Number of Number of Option Value Change From


Crank-Nicolson Convergence: Variable Grid (n iters = 1)Number of Number of Option Value Change From


214

Now change n iters to 20 and fill out these tables:

Fully Implicit Convergence: Variable Grid (n iters = 20)Number of Number of Option Value Change From


Crank-Nicolson Convergence: Variable Grid (n iters = 20)Number of Number of Option Value Change From


In each case, what are the approximate rates of convergence? Does the solution profile for gamma look any differentwhen n iters is 1 vs. 20?

If you want to see how many iterations the penalty method is taking at each time step, change the following section ofcode in main2.m

end % end iteration loopif (error > eps & n_iters > 1)

to

end % end iteration loopdisp(sprintf(’step: %g\tno. of iters: %g’,j,k));if (error > eps & n_iters > 1)

(this is approximately at line 140 in the file).

215

29.4 Exercise #3

Objectives: Incorporate effects of discrete dividends.

Instructions: Change to the Discrete Dividends directory and edit the main3.m file. Specify discrete dividends bysetting the div t array (time of dividend payment going forward) and the D array (amount of dividend payment) tosome suitable values. Note that:

• we are still using constant time steps, so set the dividend times to be an exact multiple of delt and do not use afully explicit method; and

• to prevent the unrealistic phenomenon of a dividend being larger than the stock price, the code changes thedividend to the minimum of the dividend specified by the user and the stock price (of course, it is easy to handlemore complicated relationships between dividends and stock prices).

Start by considering some kind of European option on a coarse grid. For this case, compare your results to pricesgenerated by two quasi-analytical methods:

1. Black-Scholes with S adjusted by subtracting the present value of dividends; and

2. Black-Scholes with S adjusted by subtracting the present value of dividends and σ multiplied by the ratio of Sto S less the present value of dividends (this is in theory the correct volatility to use)

(both of these quasi-analytical estimates will be printed automatically if you specify a European option).

Numerical Option Value Quasi-Analytic (Method 1) Quasi-Analytic (Method 2)

Does your numerical estimate change significantly if you refine the grid?In the main3.m file, the interpolation at the dividend dates is handled by the line

vtemp = interp1(S,vnew,Stemp);

(this is about line 190). Try typing “help interp1” in the Matlab window to see how to change the interpolationmethod from its default choice of linear to various other methods. Change to nearest neighbor interpolation and re-runthe same coarse and fine grid cases that you tried above. What do you observe?

216

29.5 Exercise #4

Objectives: Incorporation of variable time step selector; effects on convergence.

Instructions: Change to the Variable Timestep directory and edit the main4.m file. Consider a case with nodividends (i.e. div t = [ ] and D = [ ]) and:

refine steps = 2crank = 1plots = 1variable timestep = 0rannacher = 0American = 0

Note that the specification of delt in main4.m is a little different than previously and we are also setting refine stepsto 2 throughout. This allows us to concentrate on the effects of time discretization. Complete the following table:

Crank-Nicolson With Constant Time StepsNumber of Number of Option Value

delt Grid Nodes Time Steps At S = KT/12.5T/25T/50

Analytic Value at S = K:

Now set variable timestep = 1 and complete this table:

Crank-Nicolson With Variable Time StepsNumber of Number of Option Value

rannacher Grid Nodes Time Steps At S = K01


Observe the time step sizes plotted on the last graph. Try putting in some dividends and seeing what happens.

217

Next, try turning the dividends back off and running a call option with a maturity of, e.g. T = 5 and values of dt max=.05 and .25:

Crank-Nicolson Call With Variable Time StepsNumber of Number of Option Value

dt max Grid Nodes Time Steps At S = K.05.25


Repeat this for a put option:

Crank-Nicolson Put With Variable Time StepsNumber of Number of Option Value

dt max Grid Nodes Time Steps At S = K.05.25


Finally, try running an American put with T = 1 (and no dividends). Check out the plot of gamma. Try switching to afully implicit discretization and re-run this.

218

29.6 Exercise #5

Objectives: Handling discontinuous payoff functions, effects of smoothing and the Rannacher method on conver-gence.

Instructions: Change to the Digital directory and edit the main5.m file. An accurate answer to this type of problemwill require a fine grid spacing near the discontinuity (so ensure that gridspacing = 1).

Make sure that refine steps = 0 and try a fully explicit method (crank = 0). If you run out of patience, break offthe execution (you won’t get a tremendously accurate answer anyways), having noted how many time steps wouldbe needed. More seriously, pick a European case so that you can compare your answers with an analytic solution,and set variable timestep = 1. Also, be sure that plots = 1 so that you can observe any oscillations and start bysetting rannacher = 0 and smooth payoff = 0. Complete the following tables for Crank-Nicolson and fully implicitdiscretizations (and take note of the quality of the estimates of delta and gamma for these runs):

Crank-Nicolson Convergence: Digital OptionNumber of Number of Option Value Change From


Fully Implicit Convergence: Digital OptionNumber of Number of Option Value Change From


What can you say about the rate of convergence for each of these cases?

219

Now change smooth payoff to 1 and repeat the above runs:

Crank-Nicolson Convergence: Digital Option(Smooth Payoff)

Number of Number of Option Value Change Fromrefine steps Grid Nodes Time Steps At S = K Coarser Grid

0 n.a.12

Fully Implicit Convergence: Digital Option(Smooth Payoff)


0 n.a.12

What can you say about the rate of convergence for each of these cases?

Now change rannacher to 1 and repeat the Crank-Nicolson case:

Crank-Nicolson Convergence: Digital Option(Smooth Payoff, Rannacher Method)


0 n.a.12

Finally, repeat the last set of runs above after changing the line

if istep <= 2

to

if istep <= 4

in the bigstep.m file in the directory (about line 28 of the file). Did you notice anything different?

220

29.7 Exercise #6

Objectives: Handling discontinuities due to barriers.

Instructions: Change to the Barrier directory and edit the main6.m file. An accurate answer to this type of problemwill require a fine grid spacing near the discontinuity (so ensure that gridspacing = 1). Start by setting

T = 0.25American = 0opttype = 0plots = 1refine steps = 0crank = 0lb = 0.9Kub = 1.2Kbarrier interval = 2variable timestep = 0smooth payoff = 0rannacher = 0

Once you’ve satisfied yourself that crank = 0 is a bad idea, switch it to 1 (and change lb back to 0.9K if you changedit from this value). Complete the following two tables:

Crank-Nicolson Convergence: Double Barrier Option(Constant Time Steps)


0 n.a.12

Fully Implicit Convergence: Double Barrier Option(Constant Time Steps)


0 n.a.12

What can you say about the rates of convergence which you observe?

221

Now turn on the time step selector (variable timestep = 1) and repeat the above examples:

Crank-Nicolson Convergence: Double Barrier Option(Variable Time Steps)


0 n.a.12

Fully Implicit Convergence: Double Barrier Option(Variable Time Steps)


0 n.a.12

Now try going back to Crank-Nicolson and setting rannacher = 1. Leave smooth payoff = 0 to complete the firsttable below and change it to 1 to do the second one:

Crank-Nicolson Convergence: Double Barrier Option(Variable Time Steps - Rannacher)


0 n.a.12

Crank-Nicolson Convergence: Double Barrier Option(Variable Time Steps - Rannacher, Smooth Payoff)


0 n.a.12

222

One last exercise involving Crank-Nicolson: go back to the coarsest grid but otherwise leave things as for the lastexample. Force a little extra smoothing by editing the bigstep.m file and changing

if istep <= 2

to

if istep <= 4

(approximately line 28 of the file). Has anything been improved?

Try some further experiments. You can test a down-and-out call option by setting lb = 0.9K and ub = 2K. Alter-natively, you can examine an up-and-out call by setting lb = 0 and ub = 1.2K. Other examples you might considerinclude incorporating dividends, early exercise, switching to daily barrier observation, etc.

223

29.8 Exercise #7

Objectives: Valuing Asian options, effects of interpolation methods.

Instructions: Change to the Asian directory and edit the main7.m file. As we will be solving a fairly large collectionof one dimensional PDEs, be sure to put refine steps = 0. It is also advisable to set T to something fairly low, suchas 0.20. Note that we have simplified the code somewhat here in that there is no provision for dividends or barrierconstraints. Start by looking at a European fixed strike Asian call option using a Crank-Nicolson discretization withconstant time steps and weekly averaging. Complete the following table to show the effects of various interpolationmethods:

Fixed Strike European-Asian Call Option Values(Crank-Nicolson)

Interpolation Method Option Value at S = KNearest neighbor (method = 0)Linear (method = 1)Quadratic upstream-biased (method = 2)

Now try a fully implicit discretization (skipping linear interpolation):

Fixed Strike European-Asian Call Option Values(Fully Implicit)

Interpolation Method Option Value at S = KNearest neighbor (method = 0)Quadratic upstream-biased (method = 2)

To consider further the choice of interpolation method, consider the forward shooting grid algorithm (fsg.m). Set theparameter values to S0 = 100, r = .10, σ = .40, T = .25, and K = 100. Consider a European call option (the correctvalue is ' 5.17) and complete the following table by making appropriate changes to fsg.m and typing fsg in theMatlab window:

Fixed Strike European-Asian Call Option Values(Forward Shooting Grid)

Number of Time Steps/Grid quantization parameter Interpolation Method Option Value at S = Knsteps = 25, ρ = 0.50 Nearest neighbor (q = 0)nsteps = 50, ρ = 0.50 Nearest neighbor (q = 0)nsteps = 25, ρ = 0.50 Linear (q = 1)nsteps = 50, ρ = 0.50 Linear (q = 1)nsteps = 25, ρ = 0.25 Nearest neighbor (q = 0)nsteps = 50, ρ = 0.25 Nearest neighbor (q = 0)

224

29.9 Exercise #8

Objectives: Valuing Parisian knock-out options.

Instructions: Change to the Parisian directory and edit the main8.m file. Make sure that the number of observationsuntil knock-out is relatively low (e.g. ncount = 5), and that T is small (e.g. .25). Set partype = 0 to get an up option.Check convergence for Crank-Nicolson and fully implicit discretizations with constant time steps:

Crank-Nicolson Convergence: Parisian Knock-out Option(Constant Time Steps)

Number of Number of Option Value Change Fromrefine steps S Grid Nodes Time Steps At S = K Coarser Grid

0 n.a.12

Fully Implicit Convergence: Parisian Knock-out Option(Constant Time Steps)

Number of Number of Option Value Change Fromrefine steps S Grid Nodes Time Steps At S = K Coarser Grid

0 n.a.12

Try some more experiments: set refine steps = 0 to get a coarse grid and change back to Crank-Nicolson. Tryturning on the time step selector for two cases: rannacher = 0 and 1. Examine puts, Americans, down options, theeffects of different values of ncount, etc. For these various examples, you’ll probably want to go back to constanttime steps. You might also want to think about the modifications needed to handle knock-in options, or how the codewould have to be changed to deal with CEV processes, or volatility surfaces.

Can you figure out how to change the code to do knock-in instead of knock-out? Its only a few lines of code (lookat the initial condition section).

225

29.10 Exercise #9

Objectives: Examining uncertain volatility models. Effects of Crank-Nicolson and fully implicit discretizations, andsensitivity to the spatial grid.

Instructions: Change to the Uncertain directory and edit the main9.m file. Start with the following settings:

opttype = 2American = 0T = 0.25r = .10K = 100σmax = .25σmin = .15worst flag = 1grid select = 0variable timestep = 0rannacher = 0

Adjust crank appropriately and complete the following two tables:

Fully Implicit Convergence: European Butterfly Payoff Option(first grid)


0 n.a.12

Crank-Nicolson Convergence: European Butterfly Payoff Option(first grid)


0 n.a.12

What can you say about the rates of convergence that you observe?

Now try changing grid select to 1 to get a slightly different initial grid. This new grid has 65 nodes, whereas beforethere were 61. All points are the same as before, but now there are four more near the strike (two on each side). Repeatthe exercise above with this new grid:

Fully Implicit Convergence: European Butterfly Payoff Option(second grid)


0 n.a.12

226

Crank-Nicolson Convergence: European Butterfly Payoff Option(second grid)


0 n.a.12

How have things changed from above?

Now set σmax = σmin = 0.25 and repeat:

Fully Implicit Convergence: European Butterfly Payoff Option(second grid: σmax = σmin)


0 n.a.12

Crank-Nicolson Convergence: European Butterfly Payoff Option(second grid: σmax = σmin)


0 n.a.12

What can you conclude?

Try some additional experiments, such as using variable timesteps, the Rannacher method, or a different payoff func-tion.

227

29.11 Exercise #10

Objectives: Examining jump diffusion models. Effects of Crank-Nicolson and explicit handling of jump diffusionintegral.

Instructions: Change to the Jump directory and edit the main10.m file. Start with the following settings:

opttype = 1American = 1T = 0.25r = .05K = 100sigma = .15variable timestep = 1rannacher = 1crank = 1n iters = 20lambda = .1mu = −.9gamma = .45

First, try Crank-Nicolson.

Crank-Nicolson Convergence: American PutNumber of Number of Option Value Change From


Now, use Crank-Nicolson on the PDE part, but use an explicit method for the jump diffusion integral, and theAmerican constraint. (Actually, this method is not unconditionally stable. The stability condition is λ∆τ/2 < 1.) Dothis by setting n iters = 1. Complete the following table.

Crank-Nicolson and explicit integral term, American constraintNumber of Number of Option Value Change From


What can you say about the rates of convergence that you observe?Now, go back and set n iters = 10, refine = 2 and record the value of the American jump diffusion put at

S = 90,100,110.

Jump Diffusion Prices: American PutS = 90 S = 100 S = 110

228

Now, try computing the price of a constant volatility model (no jump), with the implied volatility which matchesthe price of a European call (jump diffusion) at the money. Determine the implied volatility (σimp) by using theimplied vol.m function. Edit this file, to make sure the jump parameters are the same as in the main10.m file.

Implied Volatility: Prices matched at S = Kσimp

Now, compare the price of a constant volatility American option (no jumps) with the price of the jump-diffusionmodel. Use the implied volatility you just computed for the constant volatility (no jump) model. Edit the file main10.m,and change sigma = σimp, and set lambda = 0.

Constant Volatility Prices: σ = σimpS = 90 S = 100 S = 110

Can you explain this?

229

References[1] K. Amin. Jump diffusion option valuation in discrete time. Journal of Finance, 2:5–38, 1996.

[2] L. Andersen and J. Andreasen. Jump-diffusion processes: Volatility smile fitting and numerical methods foroption pricing. Review of Derivative Research, 4:231–262, 2000.

[3] C. Anderson and M. D. Dahleh. Rapid computation of the discrete Fourier transform. SIAM Journal on ScientificComputation, 17(4):913–919, July 1996.

[4] W.K. Anderson, J.L. Thomas, and B. Van Leer. Comparison of finite volume flux vector splittings for the Eulerequations. AIAA J., 24:1453–1460, 1986.

[5] J. Andreason. The pricing of discretely sampled Asian and lookback options: a change of numerarie approach.J. Comp. Fin., 2(Fall):5–30, 1998.

[6] M. Avellaneda, A. Levy, and A Paras. Pricing and hedging derivative securities in markets with uncertainvolatilities. Appl. Math. Fin., 2:73–88, 1995.

[7] M. Avellaneda and A. Paras. Managing security risk of portfolios of derivative securities: The Lagrangianuncertain volatility model. Appl. Math. Fin., 3:21–52, 1996.

[8] O. Axellson and V. Barker. Finite Element Solution of Boundary Value Problems. Academic Press, 1984.

[9] Elie Ayache. Credit risk and convertibles. 2001. www.ito33.com/html/print/creditrisk.pdf.

[10] G. Bakshi, C. Cao, and C. Zhiwu. Empirical performance of alternativee option pricing models. Journal ofFinance, 52:2003–2049, 1997.

[11] Deutsche Bank. Convertible bond: Risk premium treatment. 2001. www.dbconvertibles.com.

[12] G. Barles. Cambridge University Press, 1997. Convergence of numerical schemes for degenerate parabolicequations arising in Finance theory, in Numerical Methods in Finance, edited by L. Rogers and D. Talay.

[13] J. Barraquand and T. Pudet. Pricing of American path-dependent contingent claims. Math. Finance, 6:17–51,1996.

[14] D. Bates. Jumps and stochastic volatility: Exchange rate process implicit in Deutsche M ark options. Reviewof Financial Studies, pages 69–107, 1996.

[15] J. Becker. A second order backward difference method with variable timesteps for a parabolic problem. BIT,38:644–662, 1998.

[16] N. Borovykh, , D. Drissi, and M.N. Spijker. A bound on the powers of linear operators, with relevance tonumerical stability. Applied Mathematics Letters, 15:47–53, 20002.

[17] N. Borovykh and M.N. Spijker. Resolvent conditions and bounds on the powers of matrices, with relevance tothe numerical stability of initial value problems. Journal of Computational and Applied Mathematics, 125:41–56, 2000.

[18] P.P. Boyle and T. Vorst. Option replication in discrete time with transaction costs. J. Fin., 47:271–293, 1992.

[19] T.H.F. Cheuk and T.C.F. Vorst. Shout floors. Net Exposure, 2, November 1997.

[20] T.F. Coleman, Y.Li, and A. Verma. A Newton method for American option pricing. Cornell Theory CenterTechnical Report CTC9907, also at http://www.fisc-ny.com, 1999.

[21] M.G. Crandall, H. Ishii, and P.L. Lions. Users guide to viscosity solutions of the Hamilton-Jacobi equations.Bull. Amer. Math. Soc., 27:1–67, 1992.

[22] M. Davis and F. Lischka. Convertible bonds with market risk and credit risk. 1999. www.ma.ic.ac.uk/∼mdavis.

230

[23] E.F. D’Azevedo, P.A. Forsyth, and W.P. Tang. Ordering methods for preconditioned conjugate gradient methodsapplied to unstructured grid problems. SIAM J. Matrix Anal. Applic., 13:944–961, 1992.

[24] E.F. D’Azevedo, P.A. Forsyth, and W.P. Tang. Towards a cost effective ILU preconditioner with high level fill.BIT, 32:442–463, 1992.

[25] M.A.H. Dempster and J.P. Hutton. Fast numerical valuation of American, exotic and complex options. Appl.Math. Finance, 4:1–20, 1997.

[26] M.H. Dempster, J.P. Hutton, and D.G. Richards. LP valuation of exotic American options exploiting structure.J. Comp. Fin., 2:61–84, Fall 1998.

[27] J.N. Dewynne and P. Wilmott. A note on average rate options with discrete sampling. SIAM J. Appl. Math.,55:267–277, 1995.

[28] Y. d’Halluin, P.A. Forsyth, and G. Labahn. A penalty method for american options with jump diffusion pro-cesses. Numerische Mathematik, 97:321–352, 2004.

[29] Y. d’Halluin, P.A. Forsyth, and K.R. Vetzal. Robust numerical methods for contingent claims under jumpdiffusion processes. IMA Journal on Numerical Analysis, 25:87–112, 2005.

[30] A. Duijndam and M. Schonewille. Nonuniform fast fourier transform. Geophysics, 64:539–551, 1999.

[31] A. Dutt and V. Rokhlin. Fast Fourier transforms for nonequispaced data. SIAM Journal of Scientific Computa-tion, 14(6):1368–1393, November 1993.

[32] E. Eriksson, D. Estep, P. Hansbro, and C. Johnson. Computational Differential Equations. Cambridge, 1996.

[33] R. Eymard, T. Gallouet, and R. Herbin. Convergence of finite volume schemes for semilinear convectiondiffusion equations. Num. Math., 82:91–116, 1999.

[34] Q. Fan, P.A. Forsyth, J. McMacken, and W.P. Tang. Performance issues for iterative solvers in device simulation.SIAM J. Sci. Comp., 19:100–117, 1996.

[35] P.A. Forsyth and H. Jiang. Nonlinear iteration methods for high speed laminar compressible Navier-Stokesequations. Comp. Fluids, 26:249–268, 1997.

[36] P.A. Forsyth and K.R. Vetzal. Implicit solution of uncertain volatility/transaction cost option pricing modelswith discretely observed barriers. Appl. Num. Math., 36:427–445., 2001.

[37] P.A. Forsyth and K.R. Vetzal. Quadratic convergence for valuing American options using a penalty method.SIAM J. Sci. Comp., 23:2096–2123, 2002.

[38] P.A. Forsyth and K.R. Vetzal. Quadratic covnergence for of a penalty method for valuing American options.SIAM J. Sci. Comp., 2002. in press.

[39] P.A. Forsyth, K.R. Vetzal, and R. Zvan. Convergence of lattice and PDE methods forAsian options. Working paper, Department of Computer Science, University of Waterloo,http://www.scicom.uwaterloo.ca/˜paforsyt, 1999.

[40] P.A. Forsyth, K.R. Vetzal, and R. Zvan. A finite element approach to the pricing of discrete lookbacks withstochastic volatility. Appl. Math. Fin., 6:87–106, 1999.

[41] A. Friedman. Variational Principles and Free-Boundary Problems. Robert E. Krieger, 1988.

[42] A. George and J. W. H. Liu. Computer Solutions of Large Sparse Positive-Definite Systems. Prentice-Hall,Englewood Cliffs, New Jersey, 1981.

[43] M.B. Giles. On the stability and convergence of discretizations of initial value pdes. IMA Journal of NumericalAnalysis, 17:563–576, 1997.

231

[44] S. Heston and G. Zhou. On the rate of convergence of discrete-time contingent claims. Math. Finance, 10:53–75, 2000.

[45] S.L. Heston. A closed form solution for options with stochastic volatility and applications to bond and currencyoptions. Rev. Fin. Studies, 6:327–343, 1993.

[46] T. Hoggard, A.E. Whalley, and P. Wilmott. Hedging option portfolios in the presence of transaction costs. Adv.Fut. Opt. Res., 7:21–35, 1994.

[47] J. Hull. Options, Futures and Other Derivatives. Prentice Hall, 1997.

[48] J. Hull and A. White. The pricing of options with stochastic volatilities. Journal of Finance, 42:281–300, 1987.

[49] P. Jaillet, E.I. Ronn, and S. Tompaidas. Valuation of commodity-based “swing” options. Proceedings of theEighth Annual Derivative Securities Conference, Boston, 1998.

[50] H. Jiang and P.A. Forsyth. Robust linear and nonlinear strategies for solution of the transonic Euler equations.Comp. Fluids, 24:753–770, 1995.

[51] C. Johnson. Numerical Solutions of Partial Differential Equations by the Finite Element Method. CambridgeUniversity Press, 1987.

[52] C. Johnson. Numerical Solutions of Partial Differential Equations By the Finite Element Method. CambridgeUniversity Press, Cambridge, 1987.

[53] R. Kangro and R. Nicolaides. Far field boundary conditions for Black-Scholes equations. SIAM, Journal onNumerical Analysis, 38(4):1357–1368, 2000.

[54] G. Kossioris, Ch. Makridakis, and P.E. Souganidis. Finite volume schemes for hamiltion-jacobi equations.Num. Math., 83:427–442, 1999.

[55] S. G. Kou. A jump diffusion model for option pricing. Management Science @ 2002 Informs, 48:1086–1101,August 2002.

[56] J. F. B. M. Kraaijevanger, H. W. J. Lenferink, and M. N. Spijker. Stepsize restrictions for stability in the numer-ical solution of ordinary and partial differential equations. Journal of Computational and Applied Mathematics,20:67–81, 1987.

[57] D. Kroner. Numerical Schemes for Conservation Laws. Academic Press, 1997.

[58] R. Kuske and J. Keller. Optimal exercise boundary for an American put option. Appl. Math. Fin., 5:107–116,1998.

[59] P. Lax and X.D. Liu. Solution of two dimensional Riemann problems of gas dynamics by positive schemes.SIAM J. Sci. Comp., 19:319–340, 1998.

[60] H.E. Leland. Option pricing and replication with transaction costs. J. Fin., 40:1283–1301, 1985.

[61] H. Lenferink and M. Spijker. On the use of stability regions in numerical analysis of intial value problems.Math. Comp., 57:221–237, 1991.

[62] R. J. LeVeque. Numerical Methods for Conservation Laws. Birkhauser, 1990.

[63] A.M. Matache, T. von Petersdorff, and C. Schwab. Fast deterministic pricing of options on Levy driven assets.2002. RiskLab research rePort, ETH, Zurich.

[64] R. C. Merton. Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics,3:125–144, 1976.

[65] Y. Muromachi. The growing recognition of credit risk in corporate and financial bond markets. 1999. NLIResearch Institute, paper 126.

232

[66] A. Paras and M. Avellaneda. Dynamic hedging with transaction costs: From lattice models to nonlinear volatil-ity and free boundary problems. Preprint, Courant Institute of Mathematical Sciences, New York, 1995.

[67] D. M. Pooley, P. A. Forsyth, and K. R. Vetzal. Convergence remedies for non-smooth payoffs in option pricing.To appear in Journal of Computational Finance, 2002.

[68] D. M. Pooley, P. A. Forsyth, and K. R. Vetzal. Numerical convergence properties of option pricing PDEs withuncertain volatility. To appear in IMA Journal of Numerical Analysis, 2002.

[69] D. Potts, G. Steidl, and M. Tasche. Fast Fourier transforms for nonequispaced data: A tutorial, 2000. in ModernSampling Theory: Mathematics and Application, J. J. Benedetto and P. Ferreira, eds., ch. 12, pp. 253 - 274,Birkhauser.

[70] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes: The Art of ScientificComputing. Cambridge University Press, Cambridge (UK) and New York, 2nd edition, 1992.

[71] C. Randall, E. Kant, and A. Chhabra. Using program synthesis to price derivatives. J. Comp. Finance, 1:97–129,Winter 1998.

[72] R. Rannacher. Finite element solution of diffusion problems with irregular data. Num. Math., 43:309–327,1984.

[73] R. D. Richtmyer and K. W. Morton. Difference Methods for Initial-Value Problems. Interscience Publisher,1967.

[74] I. Rupf, J. Dewynne, S. Howison, and P. Wilmott. Some mathematical results in the pricing of Americanoptions. Euro. J. Appl. Math., 4:381–398, 1993.

[75] Y. Saad. Iterative Methods for Sparse Systems. PWS, 1996.

[76] Goldman Sachs. Valuing convertible bonds as derivatives. 1994. Quantitative Research Notes, Goldman Sachs.

[77] L. Shampine. Numerical Solution of Ordinary Differential Equations. Chapman and Hall, 1994.

[78] M.N. Spijker and F.A.J. Straetemans. Error growth analysis via stability regions for discretizations of initialvalue problems. BIT, 37:442–464, 1997.

[79] F.A.J. Straetmans. Resolvent conditions for discretizations of diffusion-convection reaction equations in severalspace dimensions. Applied Numerical Mathematics, 28:45–67, 1998.

[80] A. Takahashi, T. Kobayashi, and N. Nakagawa. Pricing convertible bonds with default risk. J. Fixed Income,pages 20–29, December 2001.

[81] D. Tavella and C. Randall. Pricing financial instruments: the finite difference method. John Wiley & Sons, Inc,2000.

[82] B. Thomas. Something to shout about. RISK, 6:56–58, May 1993.

[83] K. Tsiveriotis and C. Fernandes. Valuing convertible bonds with credit risk. J. Fixed Income, 8:95–102,September 1998.

[84] A.J.A. Unger, P.A. Forsyth, and E.A. Sudicky. Variable spatial and temporal weighting schemes for use inmulti-phase compositional problems. Adv. Water Res., 19:1–27, 1996.

[85] R.S. Varga. Matrix iterative analysis. Springer, New York, 2000.

[86] K.R. Vetzal and P.A. Forsyth. Discrete Parisian and delayed barrier options: A general numerical approach.Adv. Fut. Opt. Res., 10:1–15, 1999.

[87] L. Walbein. A remark on parabolic smoothing and the finite element method. SIAM Journal on NumericalAnalysis, 17:33–38, 1980.

233

[88] A. F. Ware. Fast approximate Fourier transforms for irregularly spaced data. SIAM Review, 40(4):838–856,1998.

[89] P. Wilmott. Derivatives. Wiley, 1998.

[90] P. Wilmott, J. Dewynne, and S. Howison. Option Pricing. Oxford Financial Press, 1993.

[91] H. Windcliff, P.A. Forsyth, and K.R. Vetzal. Boundary conditions for the black-scholes equations. 2001.submitted to SIAM J. Sci. Comp.

[92] H. Windcliff, P.A. Forsyth, and K.R. Vetzal. Shout options: A framework for pricing contracts which can bemodified by the investor. J. Comp. Appl. Math., 134:213–241, 2001.

[93] X. L. Zhang. Numerical analysis of American option pricing in a jump-diffusion model. Mathematics ofOperations Research, 22:668–690, 1997.

[94] R. Zvan, P. A. Forsyth, and K. R. Vetzal. Penalty methods for American options with stochastic volatility.Journal of Computational and Applied Mathematics, 91:199–218, 1998.

[95] R. Zvan, P. A. Forsyth, and K. R. Vetzal. A finite element approach to the pricing of discrete lookbacks withstochastic v olatility. Appl. Math. Finance, 6:87–106, 1999.

[96] R. Zvan, P. A. Forsyth, and K. R. Vetzal. A finite volume approach for contingent claims valuation. IMA Journalof Numerical Analysis, 21:703–731, 2001.

[97] R. Zvan, P.A. Forsyth, and K.R. Vetzal. A penalty method for American options with stochastic volatility. J.Comp. Appl. Math., 91:199–218, 1998.

[98] R. Zvan, P.A. Forsyth, and K.R. Vetzal. Robust numerical methods for PDE models of Asian options. J. Comp.Fin., 1:39–78, Winter 1998.

[99] R. Zvan, P.A. Forsyth, and K.R. Vetzal. Discrete Asian barrier options. J. Comp. Fin., 3:41–68, Fall 1999.

[100] R. Zvan, P.A. Forsyth, and K.R. Vetzal. A finite volume approach for contingent claims valuation. IMA J. Num.Anal., 21:703–731, 2001.

[101] R. Zvan, K.R. Vetzal, and P.A. Forsyth. Swing low, swing high. RISK, 11:71–75, March 1998.

[102] R. Zvan, K.R. Vetzal, and P.A. Forsyth. PDE methods for barrier options. J. Econ. Dyn. Con., 24:1563–1590,2000.

234

numerical pde methods for pricing path dependent options

Documents