Introduction to Numerical Methods

Lecture notes for MATH 3311

Jeffrey R. Chasnov

The Hong Kong University of Science and Technology


The Hong Kong University of Science and Technology
Department of Mathematics
Clear Water Bay, Kowloon

Hong Kong

Copyright © 2012 by Jeffrey Robert Chasnov

This work is licensed under the Creative Commons Attribution 3.0 Hong Kong License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/hk/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.


Preface

What follows are my lecture notes for Math 3311: Introduction to Numerical Methods, taught at the Hong Kong University of Science and Technology. Math 3311, with two lecture hours per week, is primarily for non-mathematics majors and is required by several engineering departments.

All web surfers are welcome to download these notes at http://www.math.ust.hk/~machas/numerical-methods.pdf and to use the notes freely for teaching and learning. I welcome any comments, suggestions or corrections sent by email to [email protected].


Contents

1 IEEE Arithmetic
  1.1 Definitions
  1.2 Numbers with a decimal or binary point
  1.3 Examples of binary numbers
  1.4 Hex numbers
  1.5 4-bit unsigned integers as hex numbers
  1.6 IEEE single precision format
  1.7 Special numbers
  1.8 Examples of computer numbers
  1.9 Inexact numbers
    1.9.1 Find smallest positive integer that is not exact in single precision
  1.10 Machine epsilon
  1.11 IEEE double precision format
  1.12 Roundoff error example

2 Root Finding
  2.1 Bisection Method
  2.2 Newton's Method
  2.3 Secant Method
    2.3.1 Estimate √2 = 1.41421356 using Newton's Method
    2.3.2 Example of fractals using Newton's Method
  2.4 Order of convergence
    2.4.1 Newton's Method
    2.4.2 Secant Method

3 Systems of equations
  3.1 Gaussian Elimination
  3.2 LU decomposition
  3.3 Partial pivoting
  3.4 Operation counts
  3.5 System of nonlinear equations

4 Least-squares approximation
  4.1 Fitting a straight line
  4.2 Fitting to a linear combination of functions

5 Interpolation
  5.1 Polynomial interpolation
    5.1.1 Vandermonde polynomial
    5.1.2 Lagrange polynomial
    5.1.3 Newton polynomial
  5.2 Piecewise linear interpolation
  5.3 Cubic spline interpolation
  5.4 Multidimensional interpolation

6 Integration
  6.1 Elementary formulas
    6.1.1 Midpoint rule
    6.1.2 Trapezoidal rule
    6.1.3 Simpson's rule
  6.2 Composite rules
    6.2.1 Trapezoidal rule
    6.2.2 Simpson's rule
  6.3 Local versus global error
  6.4 Adaptive integration

7 Ordinary differential equations
  7.1 Examples of analytical solutions
    7.1.1 Initial value problem
    7.1.2 Boundary value problems
    7.1.3 Eigenvalue problem
  7.2 Numerical methods: initial value problem
    7.2.1 Euler method
    7.2.2 Modified Euler method
    7.2.3 Second-order Runge-Kutta methods
    7.2.4 Higher-order Runge-Kutta methods
    7.2.5 Adaptive Runge-Kutta Methods
    7.2.6 System of differential equations
  7.3 Numerical methods: boundary value problem
    7.3.1 Finite difference method
    7.3.2 Shooting method
  7.4 Numerical methods: eigenvalue problem
    7.4.1 Finite difference method
    7.4.2 Shooting method


Chapter 1

IEEE Arithmetic

1.1 Definitions

Bit  = 0 or 1
Byte = 8 bits
Word = Reals: 4 bytes (single precision)
              8 bytes (double precision)
     = Integers: 1, 2, 4, or 8 byte signed
                 1, 2, 4, or 8 byte unsigned

1.2 Numbers with a decimal or binary point

Decimal: 10^3  10^2  10^1  10^0  .  10^-1  10^-2  10^-3  10^-4
Binary:   2^3   2^2   2^1   2^0  .   2^-1   2^-2   2^-3   2^-4

1.3 Examples of binary numbers

Decimal   Binary
1         1
2         10
3         11
4         100
0.5       0.1
1.5       1.1

1.4 Hex numbers

{0, 1, 2, 3, ..., 9, 10, 11, 12, 13, 14, 15} = {0, 1, 2, 3, ..., 9, a, b, c, d, e, f}


1.5 4-bit unsigned integers as hex numbers

Decimal   Binary   Hex
1         0001     1
2         0010     2
3         0011     3
...       ...      ...
10        1010     a
...       ...      ...
15        1111     f

1.6 IEEE single precision format:

π‘ βž ⏟ οΏ½0

π‘’βž ⏟ οΏ½1οΏ½2οΏ½3οΏ½4οΏ½5οΏ½6οΏ½7οΏ½8

π‘“βž ⏟ οΏ½9Β· Β· Β· Β· Β· Β· Β· Β·οΏ½

31

# = (βˆ’1)𝑠 Γ— 2π‘’βˆ’127 Γ— 1.f

where s = signe = biased exponentp=e-127 = exponent1.f = significand (use binary point)


1.7 Special numbers

Smallest exponent:  e = 0000 0000, represents denormal numbers (1.f → 0.f)
Largest exponent:   e = 1111 1111, represents ±∞, if f = 0
                    e = 1111 1111, represents NaN, if f ≠ 0

Number range:  e = 1111 1111 = 2^8 - 1 = 255 is reserved
               e = 0000 0000 = 0 is reserved

so that p = e - 127 satisfies

1 - 127 ≤ p ≤ 254 - 127
-126 ≤ p ≤ 127

Smallest positive normal number
= 1.0000 0000 ····· 0000 × 2^-126
≈ 1.2 × 10^-38
bin: 0000 0000 1000 0000 0000 0000 0000 0000
hex: 00800000
MATLAB: realmin('single')

Largest positive number
= 1.1111 1111 ····· 1111 × 2^127
= (1 + (1 - 2^-23)) × 2^127
≈ 2^128 ≈ 3.4 × 10^38
bin: 0111 1111 0111 1111 1111 1111 1111 1111
hex: 7f7fffff
MATLAB: realmax('single')

Zero
bin: 0000 0000 0000 0000 0000 0000 0000 0000
hex: 00000000

Subnormal numbers
Allow 1.f → 0.f (in software)
Smallest positive number = 0.0000 0000 ····· 0001 × 2^-126
= 2^-23 × 2^-126 ≈ 1.4 × 10^-45

1.8 Examples of computer numbers

What is 1.0, 2.0 & 1/2 in hex?

1.0 = (-1)^0 × 2^(127-127) × 1.0
Therefore, s = 0, e = 0111 1111, f = 0000 0000 0000 0000 0000 000
bin: 0011 1111 1000 0000 0000 0000 0000 0000
hex: 3f80 0000

2.0 = (-1)^0 × 2^(128-127) × 1.0
Therefore, s = 0, e = 1000 0000, f = 0000 0000 0000 0000 0000 000
bin: 0100 0000 0000 0000 0000 0000 0000 0000
hex: 4000 0000

1/2 = (-1)^0 × 2^(126-127) × 1.0
Therefore, s = 0, e = 0111 1110, f = 0000 0000 0000 0000 0000 000
bin: 0011 1111 0000 0000 0000 0000 0000 0000
hex: 3f00 0000
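These bit patterns are easy to check in MATLAB with num2hex, which prints the underlying bits of a floating-point number; for example:

num2hex(single(1))     % returns '3f800000'
num2hex(single(2))     % returns '40000000'
num2hex(single(0.5))   % returns '3f000000'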

1.9 Inexact numbers

Example:

1/3 = (-1)^0 × 1/4 × (1 + 1/3),

so that p = e - 127 = -2 and e = 125 = 128 - 3, or in binary, e = 0111 1101. How is f = 1/3 represented in binary? To compute the binary number, multiply successively by 2 as follows:

0.333...   0.
0.666...   0.0
1.333...   0.01
0.666...   0.010
1.333...   0.0101
etc.

so that 1/3 exactly in binary is 0.010101.... With only 23 bits to represent f, the number is inexact and we have

f = 01010101010101010101011,

where we have rounded to the nearest binary number (here, rounded up). The machine number 1/3 is then represented as

00111110101010101010101010101011

or in hex

3eaaaaab.

1.9.1 Find smallest positive integer that is not exact in single precision

Let N be the smallest positive integer that is not exact. Now, I claim that

N - 2 = 2^23 × 1.11...1,

and

N - 1 = 2^24 × 1.00...0.

The integer N would then require a one-bit in the 2^-24 position, which is not available. Therefore, the smallest positive integer that is not exact is 2^24 + 1 = 16 777 217. In MATLAB, single(2^24) has the same value as single(2^24 + 1). Since single(2^24 + 1) is exactly halfway between the two consecutive machine numbers 2^24 and 2^24 + 2, MATLAB rounds to the number with a final zero-bit in f, which is 2^24.
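This is easy to verify at the MATLAB prompt:

single(2^24) == single(2^24 + 1)     % returns 1 (true): both round to 2^24
single(2^24 - 1) == single(2^24)     % returns 0 (false): integers up to 2^24 are exact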


1.10 Machine epsilon

Machine epsilon (πœ–mach) is the distance between 1 and the next largest number.If 0 ≀ 𝛿 < πœ–mach/2, then 1 + 𝛿 = 1 in computer math. Also since

π‘₯+ 𝑦 = π‘₯(1 + 𝑦/π‘₯),

if 0 ≀ 𝑦/π‘₯ < πœ–mach/2, then π‘₯+ 𝑦 = π‘₯ in computer math.

Find πœ–mach

The number 1 in the IEEE format is written as

1 = 20 Γ— 1.000 . . . 0,

with 23 0’s following the binary point. The number just larger than 1 has a 1in the 23rd position after the decimal point. Therefore,

πœ–mach = 2βˆ’23 β‰ˆ 1.192Γ— 10βˆ’7.

What is the distance between 1 and the number just smaller than 1? Here, the number just smaller than one can be written as

2^-1 × 1.111...1 = 2^-1 (1 + (1 - 2^-23)) = 1 - 2^-24.

Therefore, this distance is 2^-24 = ε_mach/2.

The spacing between numbers is uniform between powers of 2, with logarithmic spacing of the powers of 2. That is, the spacing of numbers between 1 and 2 is 2^-23, between 2 and 4 is 2^-22, between 4 and 8 is 2^-21, etc. This spacing changes for denormal numbers, where the spacing is uniform all the way down to zero.

Find the machine number just greater than 5

A rough estimate would be 5(1 + ε_mach) = 5 + 5ε_mach, but this is not exact. The exact answer can be found by writing

5 = 2^2 (1 + 1/4),

so that the next largest number is

2^2 (1 + 1/4 + 2^-23) = 5 + 2^-21 = 5 + 4ε_mach.
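The MATLAB function eps returns exactly this spacing, so these results can be checked directly; for example:

eps('single')        % 2^-23, the machine epsilon found above
eps(single(2))       % 2^-22: the spacing doubles between 2 and 4
eps(single(5))       % 2^-21: the machine number just greater than 5 is 5 + 2^-21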

1.11 IEEE double precision format

Most computations take place in double precision, where round-off error is reduced, and all of the above calculations in single precision can be repeated for double precision. The format occupies 64 bits: bit 0 is the sign s, bits 1 through 11 are the biased exponent e, and bits 12 through 63 are the fraction f, so that

# = (-1)^s × 2^(e-1023) × 1.f

where
s = sign
e = biased exponent
p = e - 1023 = exponent
1.f = significand (use binary point)

1.12 Roundoff error example

Consider solving the quadratic equation

x^2 + 2bx - 1 = 0,

where b is a parameter. The quadratic formula yields the two solutions

x± = -b ± √(b^2 + 1).

Consider the solution with b > 0 and x > 0 (the x+ solution) given by

x = -b + √(b^2 + 1). (1.1)

As b → ∞,

x = -b + √(b^2 + 1)
  = -b + b√(1 + 1/b^2)
  = b(√(1 + 1/b^2) - 1)
  ≈ b(1 + 1/(2b^2) - 1)
  = 1/(2b).

Now in double precision, realmin ≈ 2.2 × 10^-308 and we would like x to be accurate to this value before it goes to 0 via denormal numbers. Therefore, x should be computed accurately to b ≈ 1/(2 × realmin) ≈ 2 × 10^307. What happens if we compute (1.1) directly? Then x = 0 when b^2 + 1 = b^2, or 1 + 1/b^2 = 1. That is, 1/b^2 = ε_mach/2, or b = √2/√ε_mach ≈ 10^8.

For a subroutine written to compute the solution of a quadratic for a general user, this is not good enough. The way for a software designer to solve this problem is to compute the solution for x as

x = 1/(b + √(b^2 + 1)).

In this form, if b^2 + 1 = b^2, then x = 1/(2b), which is the correct asymptotic form.
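A quick double-precision experiment (the value of b is just an illustrative choice) shows the difference between the two formulas:

b = 1e9;
x_direct = -b + sqrt(b^2 + 1)     % returns 0, since b^2 + 1 rounds to b^2
x_stable = 1/(b + sqrt(b^2 + 1))  % returns 5e-10, the correct value 1/(2b)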


Chapter 2

Root Finding

Solve 𝑓(π‘₯) = 0 for π‘₯, when an explicit analytical solution is impossible.

2.1 Bisection Method

The bisection method is the easiest to implement numerically and almost always works. The main disadvantage is that convergence is slow. If the bisection method results in a computer program that runs too slowly, then other faster methods may be chosen; otherwise it is a good choice of method.

We want to construct a sequence x_0, x_1, x_2, ... that converges to the root x = r that solves f(x) = 0. We choose x_0 and x_1 such that x_0 < r < x_1. We say that x_0 and x_1 bracket the root. With f(r) = 0, we want f(x_0) and f(x_1) to be of opposite sign, so that f(x_0) f(x_1) < 0. We then assign x_2 to be the midpoint of x_0 and x_1, that is x_2 = (x_0 + x_1)/2, or

x_2 = x_0 + (x_1 - x_0)/2.

The sign of f(x_2) can then be determined. The value of x_3 is then chosen as either the midpoint of x_0 and x_2 or as the midpoint of x_2 and x_1, depending on whether x_0 and x_2 bracket the root, or x_2 and x_1 bracket the root. The root, therefore, stays bracketed at all times. The algorithm proceeds in this fashion and is typically stopped when the increment to the left side of the bracket (above, given by (x_1 - x_0)/2) is smaller than some required precision.

2.2 Newton’s Method

This is the fastest method, but requires analytical computation of the derivative of f(x). Also, the method may not always converge to the desired root.

We can derive Newton's Method graphically, or by a Taylor series. We again want to construct a sequence x_0, x_1, x_2, ... that converges to the root x = r. Consider the x_{n+1} member of this sequence, and Taylor series expand f(x_{n+1}) about the point x_n. We have

f(x_{n+1}) = f(x_n) + (x_{n+1} - x_n) f'(x_n) + ....


To determine π‘₯𝑛+1, we drop the higher-order terms in the Taylor series, andassume 𝑓(π‘₯𝑛+1) = 0. Solving for π‘₯𝑛+1, we have

π‘₯𝑛+1 = π‘₯𝑛 βˆ’ 𝑓(π‘₯𝑛)

𝑓 β€²(π‘₯𝑛).

Starting Newton’s Method requires a guess for π‘₯0, hopefully close to the rootπ‘₯ = π‘Ÿ.

2.3 Secant Method

The Secant Method is second best to Newton's Method, and is used when a faster convergence than Bisection is desired, but it is too difficult or impossible to take an analytical derivative of the function f(x). We write in place of f'(x_n),

f'(x_n) ≈ (f(x_n) - f(x_{n-1})) / (x_n - x_{n-1}).

Starting the Secant Method requires a guess for both x_0 and x_1.
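A minimal MATLAB sketch of the resulting iteration (the test function, starting guesses, and stopping rule are only illustrative assumptions):

f = @(x) x.^2 - 2;
x0 = 1; x1 = 2;                     % two starting guesses
for n = 1:20
    x2 = x1 - f(x1)*(x1 - x0)/(f(x1) - f(x0));
    x0 = x1; x1 = x2;
    if abs(x1 - x0) < 1e-12, break, end
end
x1                                  % converges to sqrt(2) = 1.41421356...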

2.3.1 Estimate √2 = 1.41421356 using Newton's Method

The √2 is the zero of the function f(x) = x^2 - 2. To implement Newton's Method, we use f'(x) = 2x. Therefore, Newton's Method is the iteration

x_{n+1} = x_n - (x_n^2 - 2)/(2x_n).

We take as our initial guess x_0 = 1. Then

x_1 = 1 - (-1)/2 = 3/2 = 1.5,

x_2 = 3/2 - (9/4 - 2)/3 = 17/12 = 1.416667,

x_3 = 17/12 - ((17/12)^2 - 2)/(17/6) = 577/408 = 1.41422.
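The same iteration is a three-line loop in MATLAB (the number of iterations is an arbitrary choice):

x = 1;
for n = 1:5
    x = x - (x^2 - 2)/(2*x);   % x takes the values 1.5, 1.416667, 1.414216, ...
end
x                              % agrees with sqrt(2) to machine precision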

2.3.2 Example of fractals using Newton’s Method

Consider the complex roots of the equation f(z) = 0, where

f(z) = z^3 - 1.

These roots are the three cubic roots of unity. With

e^{i 2π n} = 1, n = 0, 1, 2, ...

the three unique cubic roots of unity are given by

1, e^{i 2π/3}, e^{i 4π/3}.

With

e^{iθ} = cos θ + i sin θ,


and cos (2πœ‹/3) = βˆ’1/2, sin (2πœ‹/3) =√3/2, the three cubic roots of unity are

π‘Ÿ1 = 1, π‘Ÿ2 = βˆ’1

2+

√3

2𝑖, π‘Ÿ3 = βˆ’1

2βˆ’

√3

2𝑖.

The interesting idea here is to determine which initial values of z_0 in the complex plane converge to which of the three cubic roots of unity.

Newton's method is

z_{n+1} = z_n - (z_n^3 - 1)/(3 z_n^2).

If the iteration converges to r_1, we color z_0 red; r_2, blue; r_3, green. The result will be shown in lecture.
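A compact MATLAB sketch of this picture (grid size, domain, and iteration count are assumptions, not the code used in lecture):

N = 400;                                     % resolution of the picture
[X, Y] = meshgrid(linspace(-2, 2, N));
Z = X + 1i*Y;                                % grid of starting values z_0
for n = 1:30                                 % fixed number of Newton steps
    Z = Z - (Z.^3 - 1)./(3*Z.^2);
end
r = [1, exp(2i*pi/3), exp(4i*pi/3)];         % the three cubic roots of unity
[~, basin] = min(abs(Z(:) - r), [], 2);      % index of the nearest root
image(reshape(basin, N, N))
colormap([1 0 0; 0 0 1; 0 1 0])              % red, blue, green as in the text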

2.4 Order of convergence

Let π‘Ÿ be the root and π‘₯𝑛 be the 𝑛th approximation to the root. Define the erroras

πœ–π‘› = π‘Ÿ βˆ’ π‘₯𝑛.

If for large 𝑛 we have the approximate relationship

|πœ–π‘›+1| = π‘˜|πœ–π‘›|𝑝,

with π‘˜ a positive constant, then we say the root-finding numerical method is oforder 𝑝. Larger values of 𝑝 correspond to faster convergence to the root. Theorder of convergence of bisection is one: the error is reduced by approximatelya factor of 2 with each iteration so that

|πœ–π‘›+1| =1

2|πœ–π‘›|.

We now find the order of convergence for Newton’s Method and for the SecantMethod.

2.4.1 Newton’s Method

We start with Newton's Method

x_{n+1} = x_n - f(x_n)/f'(x_n).

Subtracting both sides from r, we have

r - x_{n+1} = r - x_n + f(x_n)/f'(x_n),

or

ε_{n+1} = ε_n + f(x_n)/f'(x_n). (2.1)


We use Taylor series to expand the functions f(x_n) and f'(x_n) about the root r, using f(r) = 0. We have

f(x_n) = f(r) + (x_n - r) f'(r) + (1/2)(x_n - r)^2 f''(r) + ...
       = -ε_n f'(r) + (1/2) ε_n^2 f''(r) + ...;

f'(x_n) = f'(r) + (x_n - r) f''(r) + (1/2)(x_n - r)^2 f'''(r) + ...
        = f'(r) - ε_n f''(r) + (1/2) ε_n^2 f'''(r) + .... (2.2)

To make further progress, we will make use of the following standard Taylor series:

1/(1 - ε) = 1 + ε + ε^2 + ..., (2.3)

which converges for |ε| < 1. Substituting (2.2) into (2.1), and using (2.3) yields

ε_{n+1} = ε_n + f(x_n)/f'(x_n)
        = ε_n + (-ε_n f'(r) + (1/2) ε_n^2 f''(r) + ...) / (f'(r) - ε_n f''(r) + (1/2) ε_n^2 f'''(r) + ...)
        = ε_n + (-ε_n + (1/2) ε_n^2 f''(r)/f'(r) + ...) / (1 - ε_n f''(r)/f'(r) + ...)
        = ε_n + (-ε_n + (1/2) ε_n^2 f''(r)/f'(r) + ...)(1 + ε_n f''(r)/f'(r) + ...)
        = ε_n + (-ε_n + ε_n^2 ((1/2) f''(r)/f'(r) - f''(r)/f'(r)) + ...)
        = -(1/2) (f''(r)/f'(r)) ε_n^2 + ...

Therefore, we have shown that

|ε_{n+1}| = k |ε_n|^2

as n → ∞, with

k = (1/2) |f''(r)/f'(r)|,

provided f'(r) ≠ 0. Newton's method is thus of order 2 at simple roots.

2.4.2 Secant Method

Determining the order of the Secant Method proceeds in a similar fashion. We start with

x_{n+1} = x_n - (x_n - x_{n-1}) f(x_n) / (f(x_n) - f(x_{n-1})).

We subtract both sides from r and make use of

x_n - x_{n-1} = (r - x_{n-1}) - (r - x_n)
              = ε_{n-1} - ε_n,


and the Taylor series

f(x_n) = -ε_n f'(r) + (1/2) ε_n^2 f''(r) + ...,
f(x_{n-1}) = -ε_{n-1} f'(r) + (1/2) ε_{n-1}^2 f''(r) + ...,

so that

f(x_n) - f(x_{n-1}) = (ε_{n-1} - ε_n) f'(r) + (1/2)(ε_n^2 - ε_{n-1}^2) f''(r) + ...
                    = (ε_{n-1} - ε_n)(f'(r) - (1/2)(ε_{n-1} + ε_n) f''(r) + ...).

We therefore have

ε_{n+1} = ε_n + (-ε_n f'(r) + (1/2) ε_n^2 f''(r) + ...) / (f'(r) - (1/2)(ε_{n-1} + ε_n) f''(r) + ...)
        = ε_n - ε_n (1 - (1/2) ε_n f''(r)/f'(r) + ...) / (1 - (1/2)(ε_{n-1} + ε_n) f''(r)/f'(r) + ...)
        = ε_n - ε_n (1 - (1/2) ε_n f''(r)/f'(r) + ...)(1 + (1/2)(ε_{n-1} + ε_n) f''(r)/f'(r) + ...)
        = -(1/2) (f''(r)/f'(r)) ε_{n-1} ε_n + ...,

or to leading order

|ε_{n+1}| = (1/2) |f''(r)/f'(r)| |ε_{n-1}| |ε_n|. (2.4)

The order of convergence is not yet obvious from this equation, and to determine the scaling law we look for a solution of the form

|ε_{n+1}| = k |ε_n|^p.

From this ansatz, we also have

|ε_n| = k |ε_{n-1}|^p,

and therefore

|ε_{n+1}| = k^{p+1} |ε_{n-1}|^{p^2}.

Substitution into (2.4) results in

k^{p+1} |ε_{n-1}|^{p^2} = (k/2) |f''(r)/f'(r)| |ε_{n-1}|^{p+1}.

Equating the coefficient and the power of ε_{n-1} results in

k^p = (1/2) |f''(r)/f'(r)|,

and

p^2 = p + 1.


The order of convergence of the Secant Method, given by p, is therefore determined to be the positive root of the quadratic equation p^2 - p - 1 = 0, or

p = (1 + √5)/2 ≈ 1.618,

which coincidentally is a famous irrational number that is called The Golden Ratio, and goes by the symbol Φ. We see that the Secant Method has an order of convergence lying between the Bisection Method and Newton's Method.


Chapter 3

Systems of equations

Consider the system of 𝑛 linear equations and 𝑛 unknowns, given by

π‘Ž11π‘₯1 + π‘Ž12π‘₯2 + Β· Β· Β·+ π‘Ž1𝑛π‘₯𝑛 = 𝑏1,

π‘Ž21π‘₯1 + π‘Ž22π‘₯2 + Β· Β· Β·+ π‘Ž2𝑛π‘₯𝑛 = 𝑏2,

......

π‘Žπ‘›1π‘₯1 + π‘Žπ‘›2π‘₯2 + Β· Β· Β·+ π‘Žπ‘›π‘›π‘₯𝑛 = 𝑏𝑛.

We can write this system as the matrix equation

Ax = b,

with

A = [ a_11  a_12  ···  a_1n
      a_21  a_22  ···  a_2n
      ...   ...   ...  ...
      a_n1  a_n2  ···  a_nn ],   x = [ x_1; x_2; ...; x_n ],   b = [ b_1; b_2; ...; b_n ].

3.1 Gaussian Elimination

The standard numerical algorithm to solve a system of linear equations is called Gaussian Elimination. We can illustrate this algorithm by example.

Consider the system of equations

βˆ’3π‘₯1 + 2π‘₯2 βˆ’ π‘₯3 = βˆ’1,

6π‘₯1 βˆ’ 6π‘₯2 + 7π‘₯3 = βˆ’7,

3π‘₯1 βˆ’ 4π‘₯2 + 4π‘₯3 = βˆ’6.

To perform Gaussian elimination, we form an Augmented Matrix by combiningthe matrix A with the column vector b:βŽ›βŽβˆ’3 2 βˆ’1 βˆ’1

6 βˆ’6 7 βˆ’73 βˆ’4 4 βˆ’6

⎞⎠ .

Row reduction is then performed on this matrix. Allowed operations are (1) multiply any row by a constant, (2) add a multiple of one row to another row, (3) interchange the order of any rows. The goal is to convert the original matrix into an upper-triangular matrix.

We start with the first row of the matrix and work our way down as follows. First we multiply the first row by 2 and add it to the second row, and add the first row to the third row:

[ -3   2  -1  -1
   0  -2   5  -9
   0  -2   3  -7 ].

We then go to the second row. We multiply this row by -1 and add it to the third row:

[ -3   2  -1  -1
   0  -2   5  -9
   0   0  -2   2 ].

The resulting equations can be determined from the matrix and are given by

-3x_1 + 2x_2 - x_3 = -1,
       -2x_2 + 5x_3 = -9,
              -2x_3 = 2.

These equations can be solved by backward substitution, starting from the last equation and working backwards. We have

-2x_3 = 2                      →  x_3 = -1,
-2x_2 = -9 - 5x_3 = -4         →  x_2 = 2,
-3x_1 = -1 - 2x_2 + x_3 = -6   →  x_1 = 2.

Therefore,

[ x_1; x_2; x_3 ] = [ 2; 2; -1 ].
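In MATLAB, this hand calculation can be checked with the backslash operator:

A = [-3 2 -1; 6 -6 7; 3 -4 4];
b = [-1; -7; -6];
x = A \ b          % returns [2; 2; -1], in agreement with the calculation above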

3.2 πΏπ‘ˆ decomposition

The process of Gaussian Elimination also results in the factoring of the matrixA to

A = LU,

where L is a lower triangular matrix and U is an upper triangular matrix.Using the same matrix A as in the last section, we show how this factorizationis realized. We haveβŽ›βŽβˆ’3 2 βˆ’1

6 βˆ’6 73 βˆ’4 4

⎞⎠ β†’

βŽ›βŽβˆ’3 2 βˆ’10 βˆ’2 50 βˆ’2 3

⎞⎠ = M1A,

where

M1A =

βŽ›βŽ1 0 02 1 01 0 1

βŽžβŽ βŽ›βŽβˆ’3 2 βˆ’16 βˆ’6 73 βˆ’4 4

⎞⎠ =

βŽ›βŽβˆ’3 2 βˆ’10 βˆ’2 50 βˆ’2 3

⎞⎠ .


Note that the matrix M_1 performs row elimination on the first column. Two times the first row is added to the second row and one times the first row is added to the third row. The entries of the column of M_1 come from 2 = -(6/-3) and 1 = -(3/-3) as required for row elimination. The number -3 is called the pivot.

The next step is

[ -3   2  -1        [ -3   2  -1
   0  -2   5    →      0  -2   5    = M_2 (M_1 A),
   0  -2   3 ]         0   0  -2 ]

where

M_2 (M_1 A) = [ 1  0  0   [ -3   2  -1     [ -3   2  -1
                0  1  0      0  -2   5   =    0  -2   5
                0 -1  1 ]    0  -2   3 ]      0   0  -2 ].

Here, M_2 multiplies the second row by -1 = -(-2/-2) and adds it to the third row. The pivot is -2.

We now have

M_2 M_1 A = U

or

A = M_1^{-1} M_2^{-1} U.

The inverse matrices are easy to find. The matrix M_1 multiplies the first row by 2 and adds it to the second row, and multiplies the first row by 1 and adds it to the third row. To invert these operations, we need to multiply the first row by -2 and add it to the second row, and multiply the first row by -1 and add it to the third row. To check, with

M_1 M_1^{-1} = I,

we have

[ 1  0  0   [  1  0  0     [ 1  0  0
  2  1  0     -2  1  0   =   0  1  0
  1  0  1 ]   -1  0  1 ]     0  0  1 ].

Similarly,

M_2^{-1} = [ 1  0  0
             0  1  0
             0  1  1 ].

Therefore,

L = M_1^{-1} M_2^{-1}

is given by

L = [  1  0  0   [ 1  0  0     [  1  0  0
      -2  1  0     0  1  0   =   -2  1  0
      -1  0  1 ]   0  1  1 ]     -1  1  1 ],

which is lower triangular. The off-diagonal elements of M_1^{-1} and M_2^{-1} are simply combined to form L. Our LU decomposition is therefore

[ -3   2  -1     [  1  0  0   [ -3   2  -1
   6  -6   7   =   -2  1  0      0  -2   5
   3  -4   4 ]     -1  1  1 ]    0   0  -2 ].


Another nice feature of the LU decomposition is that it can be done by overwriting A, therefore saving memory if the matrix A is very large.

The LU decomposition is useful when one needs to solve Ax = b for x when A is fixed and there are many different b's. First one determines L and U using Gaussian elimination. Then one writes

(LU)x = L(Ux) = b.

We let

y = Ux,

and first solve

Ly = b

for y by forward substitution. We then solve

Ux = y

for x by backward substitution. When we count operations, we will see that solving (LU)x = b is significantly faster once L and U are in hand than solving Ax = b directly by Gaussian elimination.

We now illustrate the solution of LUx = b using our previous example, where

L = [  1  0  0        U = [ -3   2  -1        b = [ -1
      -2  1  0               0  -2   5              -7
      -1  1  1 ],            0   0  -2 ],           -6 ].

With y = Ux, we first solve Ly = b, that is

[  1  0  0   [ y_1     [ -1
  -2  1  0     y_2   =   -7
  -1  1  1 ]   y_3 ]     -6 ].

Using forward substitution,

y_1 = -1,
y_2 = -7 + 2y_1 = -9,
y_3 = -6 + y_1 - y_2 = 2.

We now solve Ux = y, that is

[ -3   2  -1   [ x_1     [ -1
   0  -2   5     x_2   =   -9
   0   0  -2 ]   x_3 ]      2 ].

Using backward substitution,

-2x_3 = 2                      →  x_3 = -1,
-2x_2 = -9 - 5x_3 = -4         →  x_2 = 2,
-3x_1 = -1 - 2x_2 + x_3 = -6   →  x_1 = 2,

and we have once again determined

[ x_1; x_2; x_3 ] = [ 2; 2; -1 ].


3.3 Partial pivoting

When performing Gaussian elimination, the diagonal element that one uses during the elimination procedure is called the pivot. To obtain the correct multiple, one uses the pivot as the divisor to the elements below the pivot. Gaussian elimination in this form will fail if the pivot is zero. In this situation, a row interchange must be performed.

Even if the pivot is not identically zero, a small value can result in big round-off errors. For very large matrices, one can easily lose all accuracy in the solution. To avoid these round-off errors arising from small pivots, row interchanges are made, and this technique is called partial pivoting (partial pivoting is in contrast to complete pivoting, where both rows and columns are interchanged). We will illustrate by example the LU decomposition using partial pivoting.

Consider

A = [ -2   2  -1
       6  -6   7
       3  -8   4 ].

We interchange rows to place the largest element (in absolute value) in the pivot, or a_11, position. That is,

A  →  [  6  -6   7
        -2   2  -1     = P_12 A,
         3  -8   4 ]

where

P_12 = [ 0  1  0
         1  0  0
         0  0  1 ]

is a permutation matrix that when multiplied on the left interchanges the first and second rows of a matrix. Note that P_12^{-1} = P_12. The elimination step is then

P_12 A  →  [ 6  -6    7
             0   0   4/3    = M_1 P_12 A,
             0  -5   1/2 ]

where

M_1 = [    1   0  0
         1/3   1  0
        -1/2   0  1 ].

The final step requires one more row interchange:

M_1 P_12 A  →  [ 6  -6    7
                 0  -5   1/2    = P_23 M_1 P_12 A = U.
                 0   0   4/3 ]

Since the permutation matrices given by P are their own inverses, we can write our result as

(P_23 M_1 P_23) P_23 P_12 A = U.


Multiplication of M on the left by P interchanges rows while multiplication on the right by P interchanges columns. That is,

P_23 [    1   0  0           [    1   0  0               [    1   0  0
        1/3   1  0   P_23  =   -1/2   0  1   P_23  =       -1/2   1  0
       -1/2   0  1 ]            1/3   1  0 ]                1/3   0  1 ].

The net result on M_1 is an interchange of the nondiagonal elements 1/3 and -1/2.

We can then multiply by the inverse of (P_23 M_1 P_23) to obtain

P_23 P_12 A = (P_23 M_1 P_23)^{-1} U,

which we write as

PA = LU.

Instead of L, MATLAB will write this as

A = (P^{-1} L) U.

For convenience, we will just denote (P^{-1} L) by L, but by L here we mean a permuted lower triangular matrix.

For example, in MATLAB, to solve Ax = b for x using Gaussian elimination, one types

x = A \ b;

where \ solves for x using the most efficient algorithm available, depending on the form of A. If A is a general n × n matrix, then first the LU decomposition of A is found using partial pivoting, and then x is determined from permuted forward and backward substitution. If A is upper or lower triangular, then forward or backward substitution (or their permuted version) is used directly.

If there are many different right-hand-sides, one would first directly find the LU decomposition of A using a function call, and then solve using \. That is, one would iterate for different b's the following expressions:

[L, U] = lu(A);
y = L \ b;
x = U \ y;

where the second and third lines can be shortened to

x = U \ (L \ b);

where the parentheses are required. In lecture, I will demonstrate these solutions in MATLAB using the matrix A = [-2, 2, -1; 6, -6, 7; 3, -8, 4]; which is the example in the notes.

3.4 Operation counts

To estimate how much computational time is required for an algorithm, one can count the number of operations required (multiplications, divisions, additions and subtractions). Usually, what is of interest is how the algorithm scales with the size of the problem. For example, suppose one wants to multiply two full n × n matrices. The calculation of each element requires n multiplications and n - 1 additions, or say 2n - 1 operations. There are n^2 elements to compute so that the total operation count is n^2 (2n - 1). If n is large, we might want to know what will happen to the computational time if n is doubled. What matters most is the fastest-growing, leading-order term in the operation count. In this matrix multiplication example, the operation count is n^2 (2n - 1) = 2n^3 - n^2, and the leading-order term is 2n^3. The factor of 2 is unimportant for the scaling, and we say that the algorithm scales like O(n^3), which is read as "big Oh of n cubed." When using the big-Oh notation, we will drop both lower-order terms and constant multipliers.

The big-Oh notation tells us how the computational time of an algorithm scales. For example, suppose that the multiplication of two large n × n matrices took a computational time of T_n seconds. With the known operation count going like O(n^3), we can write

T_n = k n^3

for some unknown constant k. To determine how long multiplication of two 2n × 2n matrices will take, we write

T_{2n} = k (2n)^3
       = 8 k n^3
       = 8 T_n,

so that doubling the size of the matrix is expected to increase the computational time by a factor of 2^3 = 8.

Running MATLAB on my computer, the multiplication of two 2048 × 2048 matrices took about 0.75 sec. The multiplication of two 4096 × 4096 matrices took about 6 sec, which is 8 times longer. Timing of code in MATLAB can be found using the built-in stopwatch functions tic and toc.
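A minimal timing experiment of this kind (the matrix size is an arbitrary choice; times will differ from machine to machine):

n = 2048;
A = rand(n); B = rand(n);
tic; C = A*B; toc          % doubling n should increase this time roughly 8-fold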

What is the operation count and therefore the scaling of Gaussian elimination? Consider an elimination step with the pivot in the ith row and ith column. There are both n - i rows below the pivot and n - i columns to the right of the pivot. To perform elimination of one row, each matrix element to the right of the pivot must be multiplied by a factor and added to the row underneath. This must be done for all the rows. There are therefore (n - i)(n - i) multiplication-additions to be done for this pivot. Since we are interested in only the scaling of the algorithm, I will just count a multiplication-addition as one operation.

To find the total operation count, we need to perform elimination using n - 1 pivots, so that

op. counts = sum_{i=1}^{n-1} (n - i)^2
           = (n - 1)^2 + (n - 2)^2 + ... + (1)^2
           = sum_{i=1}^{n-1} i^2.


Three summation formulas will come in handy. They are

sum_{i=1}^{n} 1 = n,

sum_{i=1}^{n} i = (1/2) n(n + 1),

sum_{i=1}^{n} i^2 = (1/6) n(2n + 1)(n + 1),

which can be proved by mathematical induction, or derived by some tricks.

The operation count for Gaussian elimination is therefore

op. counts = sum_{i=1}^{n-1} i^2
           = (1/6)(n - 1)(2n - 1)(n).

The leading-order term is therefore n^3/3, and we say that Gaussian elimination scales like O(n^3). Since LU decomposition with partial pivoting is essentially Gaussian elimination, that algorithm also scales like O(n^3).

However, once the LU decomposition of a matrix A is known, the solution of Ax = b can proceed by a forward and backward substitution. How does a backward substitution, say, scale? For backward substitution, the matrix equation to be solved is of the form

[ a_{1,1}  a_{1,2}  ···  a_{1,n-1}    a_{1,n}       [ x_1           [ b_1
  0        a_{2,2}  ···  a_{2,n-1}    a_{2,n}         x_2             b_2
  ...      ...      ...  ...          ...             ...             ...
  0        0        ···  a_{n-1,n-1}  a_{n-1,n}       x_{n-1}         b_{n-1}
  0        0        ···  0            a_{n,n}   ]     x_n     ]  =    b_n     ].

The solution for π‘₯𝑖 is found after solving for π‘₯𝑗 with 𝑗 > 𝑖. The explicit solutionfor π‘₯𝑖 is given by

π‘₯𝑖 =1

π‘Žπ‘–,𝑖

βŽ›βŽπ‘π‘– βˆ’π‘›βˆ‘

𝑗=𝑖+1

π‘Žπ‘–,𝑗π‘₯𝑗

⎞⎠ .

The solution for π‘₯𝑖 requires 𝑛 βˆ’ 𝑖 + 1 multiplication-additions, and since thismust be done for 𝑛 such 𝑖′𝑠, we have

op. counts =

π‘›βˆ‘π‘–=1

π‘›βˆ’ 𝑖+ 1

= 𝑛+ (π‘›βˆ’ 1) + Β· Β· Β·+ 1

=

π‘›βˆ‘π‘–=1

𝑖

=1

2𝑛(𝑛+ 1).


The leading-order term is n^2/2 and the scaling of backward substitution is O(n^2). After the LU decomposition of a matrix A is found, only a single forward and backward substitution is required to solve Ax = b, and the scaling of the algorithm to solve this matrix equation is therefore still O(n^2). For large n, one should expect that solving Ax = b by a forward and backward substitution should be substantially faster than a direct solution using Gaussian elimination.
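For reference, the explicit solution above translates directly into a short MATLAB function (a sketch; the file name backsub.m is an assumption) for an upper-triangular matrix U:

function x = backsub(U, b)
    % Solve Ux = b by backward substitution, U upper triangular
    n = length(b);
    x = zeros(n, 1);
    for i = n:-1:1
        x(i) = (b(i) - U(i, i+1:n)*x(i+1:n)) / U(i, i);
    end
end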

3.5 System of nonlinear equations

A system of nonlinear equations can be solved using a version of Newton's Method. We illustrate this method for a system of two equations and two unknowns. Suppose that we want to solve

f(x, y) = 0,   g(x, y) = 0,

for the unknowns x and y. We want to construct two simultaneous sequences x_0, x_1, x_2, ... and y_0, y_1, y_2, ... that converge to the roots. To construct these sequences, we Taylor series expand f(x_{n+1}, y_{n+1}) and g(x_{n+1}, y_{n+1}) about the point (x_n, y_n). Using the partial derivatives f_x = ∂f/∂x, f_y = ∂f/∂y, etc., the two-dimensional Taylor series expansions, displaying only the linear terms, are given by

𝑓(π‘₯𝑛+1, 𝑦𝑛+1) = 𝑓(π‘₯𝑛, 𝑦𝑛) + (π‘₯𝑛+1 βˆ’ π‘₯𝑛)𝑓π‘₯(π‘₯𝑛, 𝑦𝑛)

+ (𝑦𝑛+1 βˆ’ 𝑦𝑛)𝑓𝑦(π‘₯𝑛, 𝑦𝑛) + . . .

𝑔(π‘₯𝑛+1, 𝑦𝑛+1) = 𝑔(π‘₯𝑛, 𝑦𝑛) + (π‘₯𝑛+1 βˆ’ π‘₯𝑛)𝑔π‘₯(π‘₯𝑛, 𝑦𝑛)

+ (𝑦𝑛+1 βˆ’ 𝑦𝑛)𝑔𝑦(π‘₯𝑛, 𝑦𝑛) + . . . .

To obtain Newton's method, we take f(x_{n+1}, y_{n+1}) = 0, g(x_{n+1}, y_{n+1}) = 0 and drop higher-order terms above linear. Although one can then find a system of linear equations for x_{n+1} and y_{n+1}, it is more convenient to define the variables

Δx_n = x_{n+1} - x_n,   Δy_n = y_{n+1} - y_n.

The iteration equations will then be given by

x_{n+1} = x_n + Δx_n,   y_{n+1} = y_n + Δy_n;

and the linear equations to be solved for Δx_n and Δy_n are given by

[ f_x  f_y   [ Δx_n     [ -f
  g_x  g_y ]   Δy_n ] =   -g ],

where f, g, f_x, f_y, g_x, and g_y are all evaluated at the point (x_n, y_n). The two-dimensional case is easily generalized to n dimensions. The matrix of partial derivatives is called the Jacobian Matrix.

We illustrate Newton's Method by finding the steady state solution of the Lorenz equations, given by

σ(y - x) = 0,
rx - y - xz = 0,
xy - bz = 0,


where π‘₯, 𝑦, and 𝑧 are the unknown variables and 𝜎, π‘Ÿ, and 𝑏 are the knownparameters. Here, we have a three-dimensional homogeneous system 𝑓 = 0,𝑔 = 0, and β„Ž = 0, with

𝑓(π‘₯, 𝑦, 𝑧) = 𝜎(𝑦 βˆ’ π‘₯),

𝑔(π‘₯, 𝑦, 𝑧) = π‘Ÿπ‘₯βˆ’ 𝑦 βˆ’ π‘₯𝑧,

β„Ž(π‘₯, 𝑦, 𝑧) = π‘₯𝑦 βˆ’ 𝑏𝑧.

The partial derivatives can be computed to be

𝑓π‘₯ = βˆ’πœŽ, 𝑓𝑦 = 𝜎, 𝑓𝑧 = 0,

𝑔π‘₯ = π‘Ÿ βˆ’ 𝑧, 𝑔𝑦 = βˆ’1, 𝑔𝑧 = βˆ’π‘₯,

β„Žπ‘₯ = 𝑦, β„Žπ‘¦ = π‘₯, β„Žπ‘§ = βˆ’π‘.

The iteration equation is therefore

[ -σ        σ     0       [ Δx_n         [ σ(y_n - x_n)
  r - z_n   -1    -x_n      Δy_n   =  -    r x_n - y_n - x_n z_n
  y_n       x_n   -b   ]    Δz_n ]         x_n y_n - b z_n        ],

with

x_{n+1} = x_n + Δx_n,
y_{n+1} = y_n + Δy_n,
z_{n+1} = z_n + Δz_n.

The MATLAB program that solves this system is contained in newton_system.m.
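A minimal sketch of such a program (the parameter values, starting guess, and iteration count here are assumptions for illustration, not the lecture code):

sigma = 10; r = 28; b = 8/3;
x = 1; y = 1; z = 1;                        % initial guess
for n = 1:20
    J = [ -sigma, sigma,  0;
           r - z,    -1, -x;
               y,     x, -b ];              % Jacobian matrix
    F = [ sigma*(y - x); r*x - y - x*z; x*y - b*z ];
    d = -J \ F;                             % solve J*[dx; dy; dz] = -F
    x = x + d(1); y = y + d(2); z = z + d(3);
end
[x, y, z]   % one of the steady states: (0,0,0) or (±sqrt(b*(r-1)), ±sqrt(b*(r-1)), r-1)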


Chapter 4

Least-squares approximation

The method of least-squares is commonly used to fit a parameterized curve to experimental data. In general, the fitting curve is not expected to pass through the data points, making this problem substantially different from the one of interpolation.

We consider here only the simplest case of the same experimental error for all the data points. Let the data to be fitted be given by (x_i, y_i), with i = 1 to n.

4.1 Fitting a straight line

Suppose the fitting curve is a line. We write for the fitting curve

y(x) = αx + β.

The distance r_i from the data point (x_i, y_i) to the fitting curve is given by

r_i = y_i - y(x_i)
    = y_i - (αx_i + β).

A least-squares fit minimizes the sum of the squares of the r_i's. This minimum can be shown to result in the most probable values of α and β.

We define

ρ = sum_{i=1}^{n} r_i^2
  = sum_{i=1}^{n} (y_i - (αx_i + β))^2.

To minimize ρ with respect to α and β, we solve

∂ρ/∂α = 0,   ∂ρ/∂β = 0.


Taking the partial derivatives, we have

∂ρ/∂α = sum_{i=1}^{n} 2(-x_i)(y_i - (αx_i + β)) = 0,

∂ρ/∂β = sum_{i=1}^{n} 2(-1)(y_i - (αx_i + β)) = 0.

These equations form a system of two linear equations in the two unknowns α and β, which is evident when rewritten in the form

α sum_{i=1}^{n} x_i^2 + β sum_{i=1}^{n} x_i = sum_{i=1}^{n} x_i y_i,

α sum_{i=1}^{n} x_i + β n = sum_{i=1}^{n} y_i.

These equations can be solved either analytically, or numerically in MATLAB, where the matrix form is

[ sum x_i^2   sum x_i   [ α      [ sum x_i y_i
  sum x_i     n       ]   β ]  =   sum y_i     ].

A proper statistical treatment of this problem should also consider an estimate of the errors in α and β as well as an estimate of the goodness-of-fit of the data to the model. We leave these further considerations to a statistics class.
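A minimal MATLAB sketch of this fit (the data here are made up for illustration):

x = (0:9)';
y = 2*x + 1 + 0.5*randn(10, 1);           % noisy data scattered about y = 2x + 1
M = [sum(x.^2), sum(x); sum(x), numel(x)];
rhs = [sum(x.*y); sum(y)];
ab = M \ rhs;                             % ab(1) = alpha (slope), ab(2) = beta (intercept)
% the built-in polyfit(x, y, 1) returns the same two coefficients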

4.2 Fitting to a linear combination of functions

Consider the general fitting function

y(x) = sum_{j=1}^{m} c_j f_j(x),

where we assume m functions f_j(x). For example, if we want to fit a cubic polynomial to the data, then we would have m = 4 and take f_1 = 1, f_2 = x, f_3 = x^2 and f_4 = x^3. Typically, the number of functions f_j is less than the number of data points; that is, m < n, so that a direct attempt to solve for the c_j's such that the fitting function exactly passes through the n data points would result in n equations and m unknowns. This would be an over-determined linear system that in general has no solution.

We now define the vectors

y = [ y_1; y_2; ...; y_n ],   c = [ c_1; c_2; ...; c_m ],

and the n-by-m matrix

A = [ f_1(x_1)  f_2(x_1)  ···  f_m(x_1)
      f_1(x_2)  f_2(x_2)  ···  f_m(x_2)
      ...       ...       ...  ...
      f_1(x_n)  f_2(x_n)  ···  f_m(x_n) ].   (4.1)


With π‘š < 𝑛, the equation Ac = y is over-determined. We let

r = y βˆ’Ac

be the residual vector, and let

𝜌 =

π‘›βˆ‘π‘–=1

π‘Ÿ2𝑖 .

The method of least squares minimizes 𝜌 with respect to the components of c.Now, using 𝑇 to signify the transpose of a matrix, we have

𝜌 = r𝑇 r

= (y βˆ’Ac)𝑇 (y βˆ’Ac)

= y𝑇y βˆ’ c𝑇A𝑇y βˆ’ y𝑇Ac+ c𝑇A𝑇Ac.

Since 𝜌 is a scalar, each term in the above expression must be a scalar, and sincethe transpose of a scalar is equal to the scalar, we have

c𝑇A𝑇y =(c𝑇A𝑇y

)𝑇= y𝑇Ac.

Therefore,

𝜌 = y𝑇y βˆ’ 2y𝑇Ac+ c𝑇A𝑇Ac.

To find the minimum of ρ, we will need to solve ∂ρ/∂c_j = 0 for j = 1, ..., m. To take the derivative of ρ, we switch to a tensor notation, using the Einstein summation convention, where repeated indices are summed over their allowable range. We can write

ρ = y_i y_i - 2 y_i A_{ik} c_k + c_i A^T_{ik} A_{kl} c_l.

Taking the partial derivative, we have

∂ρ/∂c_j = -2 y_i A_{ik} ∂c_k/∂c_j + (∂c_i/∂c_j) A^T_{ik} A_{kl} c_l + c_i A^T_{ik} A_{kl} ∂c_l/∂c_j.

Now,

∂c_i/∂c_j = 1, if i = j;  0, otherwise.

Therefore,

∂ρ/∂c_j = -2 y_i A_{ij} + A^T_{jk} A_{kl} c_l + c_i A^T_{ik} A_{kj}.

Now,

c_i A^T_{ik} A_{kj} = c_i A_{ki} A_{kj}
                    = A_{kj} A_{ki} c_i
                    = A^T_{jk} A_{ki} c_i
                    = A^T_{jk} A_{kl} c_l.


Therefore,πœ•πœŒ

πœ•π‘π‘—= βˆ’2𝑦𝑖A𝑖𝑗 + 2A𝑇

π‘—π‘˜Aπ‘˜π‘™π‘π‘™.

With the partials set equal to zero, we have

Aπ‘‡π‘—π‘˜Aπ‘˜π‘™π‘π‘™ = 𝑦𝑖A𝑖𝑗 ,

orA𝑇

π‘—π‘˜Aπ‘˜π‘™π‘π‘™ = A𝑇𝑗𝑖𝑦𝑖,

In vector notation, we haveA𝑇Ac = A𝑇y. (4.2)

Equation (4.2) is the so-called normal equation, and can be solved for c byGaussian elimination using the MATLAB backslash operator. After construct-ing the matrix A given by (4.1), and the vector y from the data, one can codein MATLAB

𝑐 = (𝐴′𝐴)βˆ–(𝐴′𝑦);

But in fact the MATLAB back slash operator will automatically solve the normalequations when the matrix A is not square, so that the MATLAB code

𝑐 = π΄βˆ–π‘¦;

yields the same result.


Chapter 5

Interpolation

Consider the following problem: Given the values of a known function y = f(x) at a sequence of ordered points x_0, x_1, ..., x_n, find f(x) for arbitrary x. When x_0 ≤ x ≤ x_n, the problem is called interpolation. When x < x_0 or x > x_n the problem is called extrapolation.

With y_i = f(x_i), the problem of interpolation is basically one of drawing a smooth curve through the known points (x_0, y_0), (x_1, y_1), ..., (x_n, y_n). This is not the same problem as drawing a smooth curve that approximates a set of data points that have experimental error. This latter problem is called least-squares approximation.

Here, we will consider three interpolation algorithms: (1) polynomial interpolation; (2) piecewise linear interpolation; and (3) cubic spline interpolation.

5.1 Polynomial interpolation

The n + 1 points (x_0, y_0), (x_1, y_1), ..., (x_n, y_n) can be interpolated by a unique polynomial of degree n. When n = 1, the polynomial is a linear function; when n = 2, the polynomial is a quadratic function. There are three standard algorithms that can be used to construct this unique interpolating polynomial, and we will present all three here, not so much because they are all useful, but because it is interesting to learn how these three algorithms are constructed.

When discussing each algorithm, we define P_n(x) to be the unique nth degree polynomial that passes through the given n + 1 data points.

5.1.1 Vandermonde polynomial

The Vandermonde polynomial is the most straightforward construction of the interpolating polynomial P_n(x). We simply write

P_n(x) = c_0 x^n + c_1 x^{n-1} + ··· + c_n.


Then we can immediately form n + 1 linear equations for the n + 1 unknown coefficients c_0, c_1, ..., c_n using the n + 1 known points:

y_0 = c_0 x_0^n + c_1 x_0^{n-1} + ··· + c_{n-1} x_0 + c_n
y_1 = c_0 x_1^n + c_1 x_1^{n-1} + ··· + c_{n-1} x_1 + c_n
...
y_n = c_0 x_n^n + c_1 x_n^{n-1} + ··· + c_{n-1} x_n + c_n.

The system of equations in matrix form is

[ x_0^n  x_0^{n-1}  ···  x_0  1     [ c_0       [ y_0
  x_1^n  x_1^{n-1}  ···  x_1  1       c_1         y_1
  ...    ...        ...  ...  ...     ...         ...
  x_n^n  x_n^{n-1}  ···  x_n  1 ]     c_n ]  =    y_n ].

The matrix is called the Vandermonde matrix, and can be constructed using the MATLAB function vander.m. The system of linear equations can be solved in MATLAB using the \ operator, and the MATLAB function polyval.m can be used to interpolate using the c coefficients. I will illustrate this in class and place the code on the website.
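A minimal sketch of this procedure (the data points are made up for illustration):

x = [0; 1; 2; 3];
y = [1; 2; 0; 5];
c = vander(x) \ y;            % coefficients c_0, ..., c_n in descending powers
xx = linspace(0, 3);
yy = polyval(c, xx);          % evaluate P_n(x) on a fine grid
plot(x, y, 'o', xx, yy, '-')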

5.1.2 Lagrange polynomial

The Lagrange polynomial is the most clever construction of the interpolating polynomial P_n(x), and leads directly to an analytical formula. The Lagrange polynomial is the sum of n + 1 terms and each term is itself a polynomial of degree n. The full polynomial is therefore of degree n. Counting from 0, the ith term of the Lagrange polynomial is constructed by requiring it to be zero at x_j with j ≠ i, and equal to y_i when j = i. The polynomial can be written as

P_n(x) = (x - x_1)(x - x_2)···(x - x_n) y_0 / [(x_0 - x_1)(x_0 - x_2)···(x_0 - x_n)]
       + (x - x_0)(x - x_2)···(x - x_n) y_1 / [(x_1 - x_0)(x_1 - x_2)···(x_1 - x_n)]
       + ···
       + (x - x_0)(x - x_1)···(x - x_{n-1}) y_n / [(x_n - x_0)(x_n - x_1)···(x_n - x_{n-1})].

It can be clearly seen that the first term is equal to zero when x = x_1, x_2, ..., x_n and equal to y_0 when x = x_0; the second term is equal to zero when x = x_0, x_2, ..., x_n and equal to y_1 when x = x_1; and the last term is equal to zero when x = x_0, x_1, ..., x_{n-1} and equal to y_n when x = x_n. The uniqueness of the interpolating polynomial implies that the Lagrange polynomial must be the interpolating polynomial.

5.1.3 Newton polynomial

The Newton polynomial is somewhat more clever than the Vandermonde polynomial because it results in a system of linear equations that is lower triangular, and therefore can be solved by forward substitution. The interpolating polynomial is written in the form

P_n(x) = c_0 + c_1(x - x_0) + c_2(x - x_0)(x - x_1) + ··· + c_n(x - x_0)···(x - x_{n-1}),


which is clearly a polynomial of degree n. The n + 1 unknown coefficients given by the c's can be found by substituting the points (x_i, y_i) for i = 0, ..., n:

y_0 = c_0,
y_1 = c_0 + c_1(x_1 - x_0),
y_2 = c_0 + c_1(x_2 - x_0) + c_2(x_2 - x_0)(x_2 - x_1),
...
y_n = c_0 + c_1(x_n - x_0) + c_2(x_n - x_0)(x_n - x_1) + ··· + c_n(x_n - x_0)···(x_n - x_{n-1}).

This system of linear equations is lower triangular as can be seen from the matrix form

[ 1  0            0                       ···  0
  1  (x_1 - x_0)  0                       ···  0
  ...             ...                     ...  ...
  1  (x_n - x_0)  (x_n - x_0)(x_n - x_1)  ···  (x_n - x_0)···(x_n - x_{n-1}) ]
  × [ c_0; c_1; ...; c_n ] = [ y_0; y_1; ...; y_n ],

and so theoretically can be solved faster than the Vandermonde polynomial. In practice, however, there is little difference because polynomial interpolation is only useful when the number of points to be interpolated is small.

5.2 Piecewise linear interpolation

Instead of constructing a single global polynomial that goes through all the points, one can construct local polynomials that are then connected together. In the section following this one, we will discuss how this may be done using cubic polynomials. Here, we discuss the simpler case of linear polynomials. This is the default interpolation typically used when plotting data.

Suppose the interpolating function is y = g(x), and as previously, there are n + 1 points to interpolate. We construct the function g(x) out of n local linear polynomials. We write

g(x) = g_i(x), for x_i ≤ x ≤ x_{i+1},

where

g_i(x) = a_i(x - x_i) + b_i,

and i = 0, 1, ..., n - 1.

We now require y = g_i(x) to pass through the endpoints (x_i, y_i) and (x_{i+1}, y_{i+1}). We have

y_i = b_i,
y_{i+1} = a_i(x_{i+1} - x_i) + b_i.


Therefore, the coefficients of g_i(x) are determined to be

a_i = (y_{i+1} - y_i)/(x_{i+1} - x_i),   b_i = y_i.

Although piecewise linear interpolation is widely used, particularly in plotting routines, it suffers from a discontinuity in the derivative at each point. This results in a function which may not look smooth if the points are too widely spaced. We next consider a more challenging algorithm that uses cubic polynomials.
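Piecewise linear interpolation is what MATLAB's interp1 does by default, so a short check (with made-up data) is:

x = 0:5;
y = [0 1 0.5 2 1.5 3];
xx = linspace(0, 5, 101);
yy = interp1(x, y, xx);        % 'linear' is the default method
plot(x, y, 'o', xx, yy, '-')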

5.3 Cubic spline interpolation

The n + 1 points to be interpolated are again

(x_0, y_0), (x_1, y_1), ..., (x_n, y_n).

Here, we use n piecewise cubic polynomials for interpolation,

g_i(x) = a_i(x - x_i)^3 + b_i(x - x_i)^2 + c_i(x - x_i) + d_i,   i = 0, 1, ..., n - 1,

with the global interpolation function written as

g(x) = g_i(x), for x_i ≤ x ≤ x_{i+1}.

To achieve a smooth interpolation we impose that g(x) and its first and second derivatives are continuous. The requirement that g(x) is continuous (and goes through all n + 1 points) results in the two constraints

g_i(x_i) = y_i, i = 0 to n - 1, (5.1)

g_i(x_{i+1}) = y_{i+1}, i = 0 to n - 1. (5.2)

The requirement that g'(x) is continuous results in

g'_i(x_{i+1}) = g'_{i+1}(x_{i+1}), i = 0 to n - 2. (5.3)

And the requirement that g''(x) is continuous results in

g''_i(x_{i+1}) = g''_{i+1}(x_{i+1}), i = 0 to n - 2. (5.4)

There are n cubic polynomials g_i(x) and each cubic polynomial has four free coefficients; there are therefore a total of 4n unknown coefficients. The number of constraining equations from (5.1)-(5.4) is 2n + 2(n - 1) = 4n - 2. With 4n - 2 constraints and 4n unknowns, two more conditions are required for a unique solution. These are usually chosen to be extra conditions on the first g_0(x) and last g_{n-1}(x) polynomials. We will discuss these extra conditions later.

We now proceed to determine equations for the unknown coefficients of the cubic polynomials. The polynomials and their first two derivatives are given by

g_i(x) = a_i(x - x_i)^3 + b_i(x - x_i)^2 + c_i(x - x_i) + d_i, (5.5)

g'_i(x) = 3a_i(x - x_i)^2 + 2b_i(x - x_i) + c_i, (5.6)

g''_i(x) = 6a_i(x - x_i) + 2b_i. (5.7)


We will consider the four conditions (5.1)-(5.4) in turn. From (5.1) and (5.5), we have

d_i = y_i, i = 0 to n - 1, (5.8)

which directly solves for all of the d-coefficients.

To satisfy (5.2), we first define

h_i = x_{i+1} - x_i,

and

f_i = y_{i+1} - y_i.

Now, from (5.2) and (5.5), using (5.8), we obtain the n equations

a_i h_i^3 + b_i h_i^2 + c_i h_i = f_i, i = 0 to n - 1. (5.9)

From (5.3) and (5.6) we obtain the n - 1 equations

3a_i h_i^2 + 2b_i h_i + c_i = c_{i+1}, i = 0 to n - 2. (5.10)

From (5.4) and (5.7) we obtain the n - 1 equations

3a_i h_i + b_i = b_{i+1}, i = 0 to n - 2. (5.11)

It will be useful to include a definition of the coefficient b_n, which is now missing. (The indices of the cubic polynomial coefficients only go up to n - 1.) We simply extend (5.11) up to i = n - 1 and so write

3a_{n-1} h_{n-1} + b_{n-1} = b_n, (5.12)

which can be viewed as a definition of b_n.

We now proceed to eliminate the sets of a- and c-coefficients (with the d-coefficients already eliminated in (5.8)) to find a system of linear equations for the b-coefficients. From (5.11) and (5.12), we can solve for the n a-coefficients to find

a_i = (1/(3h_i))(b_{i+1} - b_i), i = 0 to n - 1. (5.13)

From (5.9), we can solve for the n c-coefficients as follows:

c_i = (1/h_i)(f_i - a_i h_i^3 - b_i h_i^2)
    = (1/h_i)(f_i - (1/(3h_i))(b_{i+1} - b_i) h_i^3 - b_i h_i^2)
    = f_i/h_i - (1/3) h_i (b_{i+1} + 2b_i), i = 0 to n - 1. (5.14)

We can now find an equation for the b-coefficients by substituting (5.8), (5.13) and (5.14) into (5.10):

3((1/(3h_i))(b_{i+1} - b_i)) h_i^2 + 2b_i h_i + (f_i/h_i - (1/3) h_i (b_{i+1} + 2b_i))
    = f_{i+1}/h_{i+1} - (1/3) h_{i+1} (b_{i+2} + 2b_{i+1}),

which simplifies to

(1/3) h_i b_i + (2/3)(h_i + h_{i+1}) b_{i+1} + (1/3) h_{i+1} b_{i+2} = f_{i+1}/h_{i+1} - f_i/h_i, (5.15)

an equation that is valid for i = 0 to n βˆ’ 2. Therefore, (5.15) represents n βˆ’ 1 equations for the n + 1 unknown b-coefficients. Accordingly, we write the matrix equation for the b-coefficients, leaving the first and last row absent, as

\begin{pmatrix}
\cdots & \cdots & \text{missing} & \cdots & \cdots & \cdots & \cdots \\
\frac{1}{3}h_0 & \frac{2}{3}(h_0 + h_1) & \frac{1}{3}h_1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \frac{1}{3}h_{n-2} & \frac{2}{3}(h_{n-2} + h_{n-1}) & \frac{1}{3}h_{n-1} \\
\cdots & \cdots & \text{missing} & \cdots & \cdots & \cdots & \cdots
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{pmatrix}
=
\begin{pmatrix} \text{missing} \\ f_1/h_1 - f_0/h_0 \\ \vdots \\ f_{n-1}/h_{n-1} - f_{n-2}/h_{n-2} \\ \text{missing} \end{pmatrix}.

Once the missing first and last equations are specified, the matrix equation for the b-coefficients can be solved by Gaussian elimination. And once the b-coefficients are determined, the a- and c-coefficients can also be determined from (5.13) and (5.14), with the d-coefficients already known from (5.8). The piecewise cubic polynomials, then, are known and g(x) can be used for interpolation to any value x satisfying x_0 ≀ x ≀ x_n.

The missing first and last equations can be specified in several ways, and here we show the two ways that are allowed by the MATLAB function spline.m. The first way should be used when the derivative g'(x) is known at the endpoints x_0 and x_n; that is, suppose we know the values of Ξ± and Ξ² such that

𝑔′0(π‘₯0) = 𝛼, π‘”β€²π‘›βˆ’1(π‘₯𝑛) = 𝛽.

From the known value of 𝛼, and using (5.6) and (5.14), we have

Ξ± = c_0 = f_0/h_0 βˆ’ (1/3) h_0 (b_1 + 2 b_0).

Therefore, the missing first equation is determined to be

(2/3) h_0 b_0 + (1/3) h_0 b_1 = f_0/h_0 βˆ’ Ξ±.   (5.16)

From the known value of 𝛽, and using (5.6), (5.13), and (5.14), we have

𝛽 = 3π‘Žπ‘›βˆ’1β„Ž2π‘›βˆ’1 + 2π‘π‘›βˆ’1β„Žπ‘›βˆ’1 + π‘π‘›βˆ’1

= 3

(1

3β„Žπ‘›βˆ’1(𝑏𝑛 βˆ’ π‘π‘›βˆ’1)

)β„Ž2π‘›βˆ’1 + 2π‘π‘›βˆ’1β„Žπ‘›βˆ’1 +

(π‘“π‘›βˆ’1

β„Žπ‘›βˆ’1βˆ’ 1

3β„Žπ‘›βˆ’1(𝑏𝑛 + 2π‘π‘›βˆ’1)

),


which simplifies to

(1/3) h_{nβˆ’1} b_{nβˆ’1} + (2/3) h_{nβˆ’1} b_n = Ξ² βˆ’ f_{nβˆ’1}/h_{nβˆ’1},   (5.17)

to be used as the missing last equation.

The second way of specifying the missing first and last equations is called the not-a-knot condition, which assumes that

𝑔0(π‘₯) = 𝑔1(π‘₯), π‘”π‘›βˆ’2(π‘₯) = π‘”π‘›βˆ’1(π‘₯).

Considering the first of these equations, from (5.5) we have

π‘Ž0(π‘₯βˆ’ π‘₯0)3 + 𝑏0(π‘₯βˆ’ π‘₯0)

2 + 𝑐0(π‘₯βˆ’ π‘₯0) + 𝑑0

= π‘Ž1(π‘₯βˆ’ π‘₯1)3 + 𝑏1(π‘₯βˆ’ π‘₯1)

2 + 𝑐1(π‘₯βˆ’ π‘₯1) + 𝑑1.

Now two cubic polynomials can be proven to be identical if at some value of π‘₯,the polynomials and their first three derivatives are identical. Our conditions ofcontinuity at π‘₯ = π‘₯1 already require that at this value of π‘₯ these two polynomialsand their first two derivatives are identical. The polynomials themselves will beidentical, then, if their third derivatives are also identical at π‘₯ = π‘₯1, or if

π‘Ž0 = π‘Ž1.

From (5.13), we have

(1/(3 h_0)) (b_1 βˆ’ b_0) = (1/(3 h_1)) (b_2 βˆ’ b_1),

or after simplification

β„Ž1𝑏0 βˆ’ (β„Ž0 + β„Ž1)𝑏1 + β„Ž0𝑏2 = 0, (5.18)

which provides us our missing first equation. A similar argument at x = x_{nβˆ’1} also provides us with our last equation,

β„Žπ‘›βˆ’1π‘π‘›βˆ’2 βˆ’ (β„Žπ‘›βˆ’2 + β„Žπ‘›βˆ’1)π‘π‘›βˆ’1 + β„Žπ‘›βˆ’2𝑏𝑛 = 0. (5.19)

The MATLAB subroutines spline.m and ppval.m can be used for cubic spline interpolation (see also interp1.m). I will illustrate these routines in class and post sample code on the course web site.
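As a rough illustration of how these routines might be called (the data set below is hypothetical and is not the posted course code), one could write:

% Hypothetical sample data (not the course data set)
x = linspace(0, 2*pi, 13);          % interpolation nodes x_0, ..., x_n
y = sin(x);                         % known values y_i

pp = spline(x, y);                  % piecewise cubic polynomial (not-a-knot ends)
xx = linspace(0, 2*pi, 200);        % fine grid on which to interpolate
yy = ppval(pp, xx);                 % evaluate the spline g(x)

% To prescribe the end slopes alpha and beta instead, pass them as the
% first and last entries of the second argument:
yc = spline(x, [cos(0), y, cos(2*pi)], xx);

plot(x, y, 'o', xx, yy, '-');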

5.4 Multidimensional interpolation

Suppose we are interpolating the value of a function of two variables,

𝑧 = 𝑓(π‘₯, 𝑦).

The known values are given by

𝑧𝑖𝑗 = 𝑓(π‘₯𝑖, 𝑦𝑗),

with i = 0, 1, . . . , n and j = 0, 1, . . . , n. Note that the (x, y) points at which f(x, y) is known lie on a grid in the x-y plane.


Let 𝑧 = 𝑔(π‘₯, 𝑦) be the interpolating function, satisfying 𝑧𝑖𝑗 = 𝑔(π‘₯𝑖, 𝑦𝑗). Atwo-dimensional interpolation to find the value of 𝑔 at the point (π‘₯, 𝑦) may bedone by first performing 𝑛 + 1 one-dimensional interpolations in 𝑦 to find thevalue of 𝑔 at the 𝑛 + 1 points (π‘₯0, 𝑦), (π‘₯1, 𝑦), . . . , (π‘₯𝑛, 𝑦), followed by a singleone-dimensional interpolation in π‘₯ to find the value of 𝑔 at (π‘₯, 𝑦).

In other words, two-dimensional interpolation on a grid of dimension (𝑛 +1) Γ— (𝑛 + 1) is done by first performing 𝑛 + 1 one-dimensional interpolationsto the value 𝑦 followed by a single one-dimensional interpolation to the valueπ‘₯. Two-dimensional interpolation can be generalized to higher dimensions. TheMATLAB functions to perform two- and three-dimensional interpolation areinterp2.m and interp3.m.
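For example, a minimal sketch of a call to interp2.m, using a hypothetical function z = f(x, y) sampled on a grid, might look like:

% Hypothetical gridded data
x = linspace(0, 1, 11);
y = linspace(0, 1, 11);
[X, Y] = meshgrid(x, y);            % grid of points at which f is known
Z = sin(pi*X).*cos(pi*Y);           % known values z_ij

zq = interp2(X, Y, Z, 0.37, 0.62, 'spline');   % interpolate at the point (0.37, 0.62)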


Chapter 6

Integration

We want to construct numerical algorithms that can perform definite integrals of the form

I = \int_a^b f(x) dx.   (6.1)

Calculating these definite integrals numerically is called numerical integration, numerical quadrature, or more simply quadrature.

6.1 Elementary formulas

We first consider integration from 0 to h, with h small, to serve as the building blocks for integration over larger domains. We here define I_h as the following integral:

I_h = \int_0^h f(x) dx.   (6.2)

To perform this integral, we consider a Taylor series expansion of f(x) about the value x = h/2:

f(x) = f(h/2) + (x βˆ’ h/2) f'(h/2) + ((x βˆ’ h/2)^2/2) f''(h/2) + ((x βˆ’ h/2)^3/6) f'''(h/2) + ((x βˆ’ h/2)^4/24) f''''(h/2) + . . .

6.1.1 Midpoint rule

The midpoint rule makes use of only the first term in the Taylor series expansion. Here, we will determine the error in this approximation. Integrating,

I_h = h f(h/2) + \int_0^h ( (x βˆ’ h/2) f'(h/2) + ((x βˆ’ h/2)^2/2) f''(h/2) + ((x βˆ’ h/2)^3/6) f'''(h/2) + ((x βˆ’ h/2)^4/24) f''''(h/2) + . . . ) dx.


Changing variables by letting y = x βˆ’ h/2 and dy = dx, and simplifying the integral depending on whether the integrand is even or odd, we have

I_h = h f(h/2) + \int_{-h/2}^{h/2} ( y f'(h/2) + (y^2/2) f''(h/2) + (y^3/6) f'''(h/2) + (y^4/24) f''''(h/2) + . . . ) dy
    = h f(h/2) + \int_0^{h/2} ( y^2 f''(h/2) + (y^4/12) f''''(h/2) + . . . ) dy.

The integrals that we need here are

\int_0^{h/2} y^2 dy = h^3/24,   \int_0^{h/2} y^4 dy = h^5/160.

Therefore,

I_h = h f(h/2) + (h^3/24) f''(h/2) + (h^5/1920) f''''(h/2) + . . . .   (6.3)

6.1.2 Trapezoidal rule

From the Taylor series expansion of 𝑓(π‘₯) about π‘₯ = β„Ž/2, we have

f(0) = f(h/2) βˆ’ (h/2) f'(h/2) + (h^2/8) f''(h/2) βˆ’ (h^3/48) f'''(h/2) + (h^4/384) f''''(h/2) + . . . ,

and

f(h) = f(h/2) + (h/2) f'(h/2) + (h^2/8) f''(h/2) + (h^3/48) f'''(h/2) + (h^4/384) f''''(h/2) + . . . .

Adding and multiplying by h/2 we obtain

(h/2) ( f(0) + f(h) ) = h f(h/2) + (h^3/8) f''(h/2) + (h^5/384) f''''(h/2) + . . . .

We now substitute for the first term on the right-hand-side using the midpoint rule formula:

(h/2) ( f(0) + f(h) ) = ( I_h βˆ’ (h^3/24) f''(h/2) βˆ’ (h^5/1920) f''''(h/2) ) + (h^3/8) f''(h/2) + (h^5/384) f''''(h/2) + . . . ,

and solving for I_h, we find

I_h = (h/2) ( f(0) + f(h) ) βˆ’ (h^3/12) f''(h/2) βˆ’ (h^5/480) f''''(h/2) + . . . .   (6.4)


6.1.3 Simpson’s rule

To obtain Simpson’s rule, we combine the midpoint and trapezoidal rule to eliminate the error term proportional to h^3. Multiplying (6.3) by two and adding to (6.4), we obtain

3 I_h = h ( 2 f(h/2) + (1/2)(f(0) + f(h)) ) + h^5 ( 2/1920 βˆ’ 1/480 ) f''''(h/2) + . . . ,

or

I_h = (h/6) ( f(0) + 4 f(h/2) + f(h) ) βˆ’ (h^5/2880) f''''(h/2) + . . . .

Usually, Simpson’s rule is written by considering the three consecutive points 0, h and 2h. Substituting h β†’ 2h, we obtain the standard result

I_{2h} = (h/3) ( f(0) + 4 f(h) + f(2h) ) βˆ’ (h^5/90) f''''(h) + . . . .   (6.5)

6.2 Composite rules

We now use our elementary formulas obtained for (6.2) to perform the integral given by (6.1).

6.2.1 Trapezoidal rule

We suppose that the function f(x) is known at the n + 1 points labeled as x_0, x_1, . . . , x_n, with the endpoints given by x_0 = a and x_n = b. Define

𝑓𝑖 = 𝑓(π‘₯𝑖), β„Žπ‘– = π‘₯𝑖+1 βˆ’ π‘₯𝑖.

Then the integral of (6.1) may be decomposed as

\int_a^b f(x) dx = \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} f(x) dx = \sum_{i=0}^{n-1} \int_0^{h_i} f(x_i + s) ds,

where the last equality arises from the change-of-variables s = x βˆ’ x_i. Applying the trapezoidal rule to the integral, we have

\int_a^b f(x) dx = \sum_{i=0}^{n-1} (h_i/2) ( f_i + f_{i+1} ).   (6.6)

If the points are not evenly spaced, say because the data are experimental values, then the h_i may differ for each value of i and (6.6) is to be used directly.

However, if the points are evenly spaced, say because f(x) can be computed, we have h_i = h, independent of i. We can then define

π‘₯𝑖 = π‘Ž+ π‘–β„Ž, 𝑖 = 0, 1, . . . , 𝑛;


and since the end point 𝑏 satisfies 𝑏 = π‘Ž+ π‘›β„Ž, we have

β„Ž =π‘βˆ’ π‘Ž

𝑛.

The composite trapezoidal rule for evenly space points then becomes∫ 𝑏

π‘Ž

𝑓(π‘₯)𝑑π‘₯ =β„Ž

2

π‘›βˆ’1βˆ‘π‘–=0

(𝑓𝑖 + 𝑓𝑖+1)

=β„Ž

2(𝑓0 + 2𝑓1 + Β· Β· Β·+ 2π‘“π‘›βˆ’1 + 𝑓𝑛) . (6.7)

The first and last terms have a multiple of one; all other terms have a multipleof two; and the entire sum is multiplied by β„Ž/2.
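A minimal sketch of the composite trapezoidal rule (6.7), with an integrand chosen only for illustration, is:

f = @(x) exp(-x.^2);                % integrand chosen only for illustration
a = 0; b = 1; n = 100;              % n subintervals
h = (b - a)/n;
fx = f(a + (0:n)*h);                % f_0, f_1, ..., f_n
I_trap = (h/2)*(fx(1) + 2*sum(fx(2:end-1)) + fx(end));

% For unevenly spaced data, MATLAB's trapz(x, fx) applies (6.6) directly.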

6.2.2 Simpson’s rule

We here consider the composite Simpson’s rule for evenly spaced points. We apply Simpson’s rule over intervals of 2h, starting from a and ending at b:

\int_a^b f(x) dx = (h/3) ( f_0 + 4 f_1 + f_2 ) + (h/3) ( f_2 + 4 f_3 + f_4 ) + . . . + (h/3) ( f_{nβˆ’2} + 4 f_{nβˆ’1} + f_n ).

Note that n must be even for this scheme to work. Combining terms, we have

\int_a^b f(x) dx = (h/3) ( f_0 + 4 f_1 + 2 f_2 + 4 f_3 + 2 f_4 + Β· Β· Β· + 4 f_{nβˆ’1} + f_n ).

The first and last terms have a multiple of one; the even indexed terms have a multiple of 2; the odd indexed terms have a multiple of 4; and the entire sum is multiplied by h/3.
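A corresponding sketch of the composite Simpson’s rule, again with an illustrative integrand and with n even, is:

f = @(x) exp(-x.^2);                % integrand chosen only for illustration
a = 0; b = 1; n = 100;              % n subintervals, n even
h = (b - a)/n;
fx = f(a + (0:n)*h);                % f_0, f_1, ..., f_n
I_simp = (h/3)*(fx(1) + 4*sum(fx(2:2:end-1)) + 2*sum(fx(3:2:end-2)) + fx(end));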

6.3 Local versus global error

Consider the elementary formula (6.4) for the trapezoidal rule, written in the form

\int_0^h f(x) dx = (h/2) ( f(0) + f(h) ) βˆ’ (h^3/12) f''(ΞΎ),

where ΞΎ is some value satisfying 0 ≀ ΞΎ ≀ h, and we have used Taylor’s theorem with the mean-value form of the remainder. We can also represent the remainder as

βˆ’(h^3/12) f''(ΞΎ) = O(h^3),

where O(h^3) signifies that when h is small, halving of the grid spacing h decreases the error in the elementary trapezoidal rule by a factor of eight. We call the terms represented by O(h^3) the Local Error.


π‘Ž 𝑑 𝑐 𝑒 𝑏

Figure 6.1: Adaptive Simpson quadrature: Level 1.

More important is the Global Error which is obtained from the composite formula (6.7) for the trapezoidal rule. Putting in the remainder terms, we have

\int_a^b f(x) dx = (h/2) ( f_0 + 2 f_1 + Β· Β· Β· + 2 f_{nβˆ’1} + f_n ) βˆ’ (h^3/12) \sum_{i=0}^{n-1} f''(ΞΎ_i),

where ΞΎ_i are values satisfying x_i ≀ ΞΎ_i ≀ x_{i+1}. The remainder can be rewritten as

βˆ’(h^3/12) \sum_{i=0}^{n-1} f''(ΞΎ_i) = βˆ’(n h^3/12) ⟨f''(ΞΎ_i)⟩,

where ⟨f''(ΞΎ_i)⟩ is the average value of all the f''(ΞΎ_i)’s. Now,

n = (b βˆ’ a)/h,

so that the error term becomes

βˆ’(n h^3/12) ⟨f''(ΞΎ_i)⟩ = βˆ’((b βˆ’ a) h^2/12) ⟨f''(ΞΎ_i)⟩ = O(h^2).

Therefore, the global error is O(h^2). That is, a halving of the grid spacing only decreases the global error by a factor of four.

Similarly, Simpson’s rule has a local error of O(h^5) and a global error of O(h^4).

6.4 Adaptive integration

The useful MATLAB function quad.m performs numerical integration using adaptive Simpson quadrature. The idea is to let the computation itself decide on the grid size required to achieve a certain level of accuracy. Moreover, the grid size need not be the same over the entire region of integration.

We begin the adaptive integration at what is called Level 1. The uniformly spaced points at which the function f(x) is to be evaluated are shown in Fig. 6.1. The distance between the points a and b is taken to be 2h, so that

h = (b βˆ’ a)/2.


Integration using Simpson’s rule (6.5) with grid size β„Ž yields

I = (h/3) ( f(a) + 4 f(c) + f(b) ) βˆ’ (h^5/90) f''''(ΞΎ),

where ΞΎ is some value satisfying a ≀ ΞΎ ≀ b.

Integration using Simpson’s rule twice with grid size h/2 yields

I = (h/6) ( f(a) + 4 f(d) + 2 f(c) + 4 f(e) + f(b) ) βˆ’ ((h/2)^5/90) f''''(ΞΎ_l) βˆ’ ((h/2)^5/90) f''''(ΞΎ_r),

with ΞΎ_l and ΞΎ_r some values satisfying a ≀ ΞΎ_l ≀ c and c ≀ ΞΎ_r ≀ b.

We now define

S_1 = (h/3) ( f(a) + 4 f(c) + f(b) ),
S_2 = (h/6) ( f(a) + 4 f(d) + 2 f(c) + 4 f(e) + f(b) ),
E_1 = βˆ’(h^5/90) f''''(ΞΎ),
E_2 = βˆ’(h^5/(2^5 Β· 90)) ( f''''(ΞΎ_l) + f''''(ΞΎ_r) ).

Now we ask whether S_2 is accurate enough, or must we further refine the calculation and go to Level 2? To answer this question, we make the simplifying approximation that all of the fourth-order derivatives of f(x) in the error terms are equal; that is,

𝑓 β€²β€²β€²β€²(πœ‰) = 𝑓 β€²β€²β€²β€²(πœ‰π‘™) = 𝑓 β€²β€²β€²β€²(πœ‰π‘Ÿ) = 𝐢.

Then

E_1 = βˆ’(h^5/90) C,
E_2 = βˆ’(h^5/(2^4 Β· 90)) C = (1/16) E_1.

Then since

S_1 + E_1 = S_2 + E_2,

and

E_1 = 16 E_2,

we have for our estimate for the error term 𝐸2,

E_2 = (1/15) (S_2 βˆ’ S_1).

Therefore, given some specific value of the tolerance tol, if

| (1/15) (S_2 βˆ’ S_1) | < tol,

then we can accept S_2 as I. If the tolerance is not achieved for I, then we proceed to Level 2.

The computation at Level 2 further divides the integration interval from a to b into the two integration intervals a to c and c to b, and proceeds with the


above procedure independently on both halves. Integration can be stopped on either half provided the tolerance is less than tol/2 (since the sum of both errors must be less than tol). Otherwise, either half can proceed to Level 3, and so on.

As a side note, the two values of I given above (for integration with step size h and h/2) can be combined to give a more accurate value for I given by

I = (16 S_2 βˆ’ S_1)/15 + O(h^7),

where the error terms of O(h^5) approximately cancel. This free lunch, so to speak, is called Richardson’s extrapolation.
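A recursive sketch of this Level-1/Level-2 procedure, including the Richardson extrapolation, might look as follows. This is only an illustration of the idea, not the actual code of quad.m, and the function name adaptsimp is hypothetical:

function I = adaptsimp(f, a, b, tol)
% Adaptive Simpson quadrature with Richardson extrapolation (illustrative sketch).
c = (a + b)/2;  d = (a + c)/2;  e = (c + b)/2;
h = (b - a)/2;
S1 = (h/3)*(f(a) + 4*f(c) + f(b));
S2 = (h/6)*(f(a) + 4*f(d) + 2*f(c) + 4*f(e) + f(b));
if abs(S2 - S1)/15 < tol
    I = (16*S2 - S1)/15;            % accept, with the free-lunch correction
else
    I = adaptsimp(f, a, c, tol/2) + adaptsimp(f, c, b, tol/2);   % go to the next level
end
end

% Example call: I = adaptsimp(@(x) exp(-x.^2), 0, 1, 1e-10);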


Chapter 7

Ordinary differential equations

We now discuss the numerical solution of ordinary differential equations. These include the initial value problem, the boundary value problem, and the eigenvalue problem. Before proceeding to the development of numerical methods, we review the analytical solution of some classic equations.

7.1 Examples of analytical solutions

7.1.1 Initial value problem

One classic initial value problem is the RC circuit. With R the resistor and C the capacitor, the differential equation for the charge q on the capacitor is given by

R dq/dt + q/C = 0.   (7.1)

If we consider the physical problem of a charged capacitor connected in a closed circuit to a resistor, then the initial condition is q(0) = q_0, where q_0 is the initial charge on the capacitor.

The differential equation (7.1) is separable, and separating and integrating from time t = 0 to t yields

\int_{q_0}^{q} dq/q = βˆ’(1/RC) \int_0^t dt,

which can be integrated and solved for π‘ž = π‘ž(𝑑):

π‘ž(𝑑) = π‘ž0π‘’βˆ’π‘‘/𝑅𝐢 .

The classic second-order initial value problem is the RLC circuit, with differential equation

L d^2q/dt^2 + R dq/dt + q/C = 0.

Here, a charged capacitor is connected to a closed circuit, and the initial conditions satisfy

q(0) = q_0,   dq/dt(0) = 0.


The solution is obtained for the second-order equation by the ansatz

π‘ž(𝑑) = π‘’π‘Ÿπ‘‘,

which results in the following so-called characteristic equation for π‘Ÿ:

πΏπ‘Ÿ2 +π‘…π‘Ÿ +1

𝐢= 0.

If the two solutions for π‘Ÿ are distinct and real, then the two found exponentialsolutions can be multiplied by constants and added to form a general solution.The constants can then be determined by requiring the general solution to satisfythe two initial conditions. If the roots of the characteristic equation are complexor degenerate, a general solution to the differential equation can also be found.

7.1.2 Boundary value problems

The dimensionless equation for the temperature y = y(x) along a linear heat-conducting rod of length unity, and with an applied external heat source f(x), is given by the differential equation

βˆ’ d^2y/dx^2 = f(x),   (7.2)

with 0 ≀ x ≀ 1. Boundary conditions are usually prescribed at the end points of the rod, and here we assume that the temperature at both ends is maintained at zero so that

𝑦(0) = 0, 𝑦(1) = 0.

The assignment of boundary conditions at two separate points is called a two-point boundary value problem, in contrast to the initial value problem where the boundary conditions are prescribed at only a single point. Two-point boundary value problems typically require a more sophisticated algorithm for a numerical solution than initial value problems.

Here, the solution of (7.2) can proceed by integration once f(x) is specified. We assume that

f(x) = x(1 βˆ’ x),

so that the maximum of the heat source occurs in the center of the rod, and goes to zero at the ends.

The differential equation can then be written as

d^2y/dx^2 = βˆ’x(1 βˆ’ x).

The first integration results in

dy/dx = \int (x^2 βˆ’ x) dx = x^3/3 βˆ’ x^2/2 + c_1,


where 𝑐1 is the first integration constant. Integrating again,

𝑦(π‘₯) =

∫ (π‘₯3

3βˆ’ π‘₯2

2+ 𝑐1

)𝑑π‘₯

=π‘₯4

12βˆ’ π‘₯3

6+ 𝑐1π‘₯+ 𝑐2,

where 𝑐2 is the second integration constant. The two integration constants aredetermined by the boundary conditions. At π‘₯ = 0, we have

0 = 𝑐2,

and at π‘₯ = 1, we have

0 = 1/12 βˆ’ 1/6 + c_1,

so that c_1 = 1/12. Our solution is therefore

y(x) = x^4/12 βˆ’ x^3/6 + x/12 = (1/12) x (1 βˆ’ x)(1 + x βˆ’ x^2).

The temperature of the rod is maximum at x = 1/2 and goes smoothly to zero at the ends.

7.1.3 Eigenvalue problem

The classic eigenvalue problem obtained by solving the wave equation by separation of variables is given by

d^2y/dx^2 + Ξ»^2 y = 0,

with the two-point boundary conditions y(0) = 0 and y(1) = 0. Notice that y(x) = 0 satisfies both the differential equation and the boundary conditions. Other nonzero solutions for y = y(x) are possible only for certain discrete values of Ξ». These values of Ξ» are called the eigenvalues of the differential equation.

We proceed by first finding the general solution to the differential equation. It is easy to see that this solution is

𝑦(π‘₯) = 𝐴 cosπœ†π‘₯+𝐡 sinπœ†π‘₯.

Imposing the first boundary condition at π‘₯ = 0, we obtain

𝐴 = 0.

The second boundary condition at π‘₯ = 1 results in

𝐡 sinπœ† = 0.

Since we are searching for a solution where y = y(x) is not identically zero, we must have

πœ† = πœ‹, 2πœ‹, 3πœ‹, . . . .


The corresponding negative values of Ξ» are also solutions, but their inclusion only changes the corresponding values of the unknown B constant. A linear superposition of all the solutions results in the general solution

y(x) = \sum_{n=1}^{\infty} B_n \sin nΟ€x.

For each eigenvalue nΟ€, we say there is a corresponding eigenfunction sin nΟ€x. When the differential equation cannot be solved analytically, a numerical method should be able to solve for both the eigenvalues and eigenfunctions.

7.2 Numerical methods: initial value problem

We begin with the simple Euler method, then discuss the more sophisticated Runge-Kutta methods, and conclude with the adaptive Runge-Kutta methods, such as the Dormand-Prince method implemented in the MATLAB function ode45.m. Our differential equations are for x = x(t), where the time t is the independent variable, and we will make use of the notation \dot{x} = dx/dt. This notation is still widely used by physicists and descends directly from the notation originally used by Newton.

7.2.1 Euler method

The Euler method is the most straightforward method to integrate a differential equation. Consider the first-order differential equation

\dot{x} = f(t, x),   (7.3)

with the initial condition x(0) = x_0. Define t_n = nΞ”t and x_n = x(t_n). A Taylor series expansion of x_{n+1} results in

x_{n+1} = x(t_n + Ξ”t) = x(t_n) + Ξ”t \dot{x}(t_n) + O(Ξ”t^2) = x(t_n) + Ξ”t f(t_n, x_n) + O(Ξ”t^2).

The Euler Method is therefore written as

x_{n+1} = x(t_n) + Ξ”t f(t_n, x_n).

We say that the Euler method steps forward in time using a time-step Ξ”t, starting from the initial value x_0 = x(0). The local error of the Euler Method is O(Ξ”t^2). The global error, however, incurred when integrating to a time T, is a factor of 1/Ξ”t larger and is given by O(Ξ”t). It is therefore customary to call the Euler Method a first-order method.
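A minimal sketch of the Euler method, applied to an illustrative test equation with a known solution, is:

f = @(t, x) -x;                     % test equation xdot = -x (illustrative only)
dt = 0.01;  T = 5;  N = round(T/dt);
t = (0:N)*dt;
x = zeros(1, N+1);
x(1) = 1;                           % initial value x(0) = 1
for n = 1:N
    x(n+1) = x(n) + dt*f(t(n), x(n));   % Euler step
end
% The exact solution is exp(-t); the global error behaves like O(dt).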

7.2.2 Modified Euler method

This method is of a type that is called a predictor-corrector method. It is also the first of the Runge-Kutta methods that we will consider. As before, we want to solve (7.3). The idea is to average the value of \dot{x} at the beginning and end of the time step. That is, we would like to modify the Euler method and write

x_{n+1} = x_n + (1/2) Ξ”t ( f(t_n, x_n) + f(t_n + Ξ”t, x_{n+1}) ).


The obvious problem with this formula is that the unknown value x_{n+1} appears on the right-hand-side. We can, however, estimate this value, in what is called the predictor step. For the predictor step, we use the Euler method to find

x^p_{n+1} = x_n + Ξ”t f(t_n, x_n).

The corrector step then becomes

x_{n+1} = x_n + (1/2) Ξ”t ( f(t_n, x_n) + f(t_n + Ξ”t, x^p_{n+1}) ).

The Modified Euler Method can be rewritten in the following form that we will later identify as a Runge-Kutta method:

k_1 = Ξ”t f(t_n, x_n),
k_2 = Ξ”t f(t_n + Ξ”t, x_n + k_1),
x_{n+1} = x_n + (1/2)(k_1 + k_2).   (7.4)

7.2.3 Second-order Runge-Kutta methods

We now derive all second-order Runge-Kutta methods. Higher-order methods can be similarly derived, but require substantially more algebra.

We consider the differential equation given by (7.3). A general second-order Runge-Kutta method may be written in the form

π‘˜1 = Δ𝑑𝑓(𝑑𝑛, π‘₯𝑛),

π‘˜2 = Δ𝑑𝑓(𝑑𝑛 + 𝛼Δ𝑑, π‘₯𝑛 + π›½π‘˜1),

π‘₯𝑛+1 = π‘₯𝑛 + π‘Žπ‘˜1 + π‘π‘˜2,

(7.5)

with Ξ±, Ξ², a and b constants that define the particular second-order Runge-Kutta method. These constants are to be constrained by setting the local error of the second-order Runge-Kutta method to be O(Ξ”t^3). Intuitively, we might guess that two of the constraints will be a + b = 1 and Ξ± = Ξ².

We compute the Taylor series of x_{n+1} directly, and from the Runge-Kutta method, and require them to be the same to order Ξ”t^2. First, we compute the Taylor series of x_{n+1}. We have

π‘₯𝑛+1 = π‘₯(𝑑𝑛 +Δ𝑑)

= π‘₯(𝑑𝑛) + Δ𝑑��(𝑑𝑛) +1

2(Δ𝑑)2οΏ½οΏ½(𝑑𝑛) + O(Δ𝑑3).

Now,οΏ½οΏ½(𝑑𝑛) = 𝑓(𝑑𝑛, π‘₯𝑛).

The second derivative is more complicated and requires partial derivatives. Wehave

οΏ½οΏ½(𝑑𝑛) =𝑑

𝑑𝑑𝑓(𝑑, π‘₯(𝑑))

]𝑑=𝑑𝑛

= 𝑓𝑑(𝑑𝑛, π‘₯𝑛) + οΏ½οΏ½(𝑑𝑛)𝑓π‘₯(𝑑𝑛, π‘₯𝑛)

= 𝑓𝑑(𝑑𝑛, π‘₯𝑛) + 𝑓(𝑑𝑛, π‘₯𝑛)𝑓π‘₯(𝑑𝑛, π‘₯𝑛).


Therefore,

π‘₯𝑛+1 = π‘₯𝑛 +Δ𝑑𝑓(𝑑𝑛, π‘₯𝑛) +1

2(Δ𝑑)

2 (𝑓𝑑(𝑑𝑛, π‘₯𝑛) + 𝑓(𝑑𝑛, π‘₯𝑛)𝑓π‘₯(𝑑𝑛, π‘₯𝑛)

). (7.6)

Second, we compute π‘₯𝑛+1 from the Runge-Kutta method given by (7.5).Substituting in π‘˜1 and π‘˜2, we have

π‘₯𝑛+1 = π‘₯𝑛 + π‘ŽΞ”π‘‘π‘“(𝑑𝑛, π‘₯𝑛) + 𝑏Δ𝑑𝑓(𝑑𝑛 + 𝛼Δ𝑑, π‘₯𝑛 + 𝛽Δ𝑑𝑓(𝑑𝑛, π‘₯𝑛)

).

We Taylor series expand using

f( t_n + Ξ± Ξ”t, x_n + Ξ² Ξ”t f(t_n, x_n) ) = f(t_n, x_n) + Ξ± Ξ”t f_t(t_n, x_n) + Ξ² Ξ”t f(t_n, x_n) f_x(t_n, x_n) + O(Ξ”t^2).

The Runge-Kutta formula is therefore

x_{n+1} = x_n + (a + b) Ξ”t f(t_n, x_n) + (Ξ”t)^2 ( Ξ± b f_t(t_n, x_n) + Ξ² b f(t_n, x_n) f_x(t_n, x_n) ) + O(Ξ”t^3).   (7.7)

Comparing (7.6) and (7.7), we find

π‘Ž+ 𝑏 = 1,

𝛼𝑏 = 1/2,

𝛽𝑏 = 1/2.

There are three equations for four parameters, and there exists a family of second-order Runge-Kutta methods.

The Modified Euler Method given by (7.4) corresponds to Ξ± = Ξ² = 1 and a = b = 1/2. Another second-order Runge-Kutta method, called the Midpoint Method, corresponds to Ξ± = Ξ² = 1/2, a = 0 and b = 1. This method is written as

k_1 = Ξ”t f(t_n, x_n),
k_2 = Ξ”t f( t_n + (1/2) Ξ”t, x_n + (1/2) k_1 ),
x_{n+1} = x_n + k_2.

7.2.4 Higher-order Runge-Kutta methods

The general second-order Runge-Kutta method was given by (7.5). The general form of the third-order method is given by

π‘˜1 = Δ𝑑𝑓(𝑑𝑛, π‘₯𝑛),

π‘˜2 = Δ𝑑𝑓(𝑑𝑛 + 𝛼Δ𝑑, π‘₯𝑛 + π›½π‘˜1),

π‘˜3 = Δ𝑑𝑓(𝑑𝑛 + 𝛾Δ𝑑, π‘₯𝑛 + π›Ώπ‘˜1 + πœ–π‘˜2),

π‘₯𝑛+1 = π‘₯𝑛 + π‘Žπ‘˜1 + π‘π‘˜2 + π‘π‘˜3.

The following constraints on the constants can be guessed: Ξ± = Ξ², Ξ³ = Ξ΄ + Ο΅, and a + b + c = 1. Remaining constraints need to be derived.


The fourth-order method has a k_1, k_2, k_3 and k_4. The fifth-order method requires up to k_6. The table below gives the order of the method and the number of stages required.

order               2   3   4   5   6   7   8
minimum # stages    2   3   4   6   7   9   11

Because of the jump in the number of stages required between the fourth-order and fifth-order method, the fourth-order Runge-Kutta method has some appeal. The general fourth-order method starts with 13 constants, and one then finds 11 constraints. A particularly simple fourth-order method that has been widely used is given by

π‘˜1 = Δ𝑑𝑓(𝑑𝑛, π‘₯𝑛),

π‘˜2 = Δ𝑑𝑓

(𝑑𝑛 +

1

2Δ𝑑, π‘₯𝑛 +

1

2π‘˜1

),

π‘˜3 = Δ𝑑𝑓

(𝑑𝑛 +

1

2Δ𝑑, π‘₯𝑛 +

1

2π‘˜2

),

π‘˜4 = Δ𝑑𝑓 (𝑑𝑛 +Δ𝑑, π‘₯𝑛 + π‘˜3) ,

π‘₯𝑛+1 = π‘₯𝑛 +1

6(π‘˜1 + 2π‘˜2 + 2π‘˜3 + π‘˜4) .

7.2.5 Adaptive Runge-Kutta Methods

As in adaptive integration, it is useful to devise an ode integrator that automatically finds the appropriate Ξ”t. The Dormand-Prince Method, which is implemented in MATLAB’s ode45.m, finds the appropriate step size by comparing the results of a fifth-order and fourth-order method. It requires six function evaluations per time step, and constructs both a fifth-order and a fourth-order method from these function evaluations.

Suppose the fifth-order method finds x_{n+1} with local error O(Ξ”t^6), and the fourth-order method finds x'_{n+1} with local error O(Ξ”t^5). Let Ξ΅ be the desired error tolerance of the method, and let e be the actual error. We can estimate e from the difference between the fifth- and fourth-order methods; that is,

e = | x_{n+1} βˆ’ x'_{n+1} |.

Now e is of O(Ξ”t^5), where Ξ”t is the step size taken. Let Δτ be the estimated step size required to get the desired error Ξ΅. Then we have

e/Ξ΅ = (Ξ”t)^5/(Δτ)^5,

or solving for Δτ,

Δτ = Ξ”t (Ξ΅/e)^{1/5}.

On the one hand, if e < Ξ΅, then we accept x_{n+1} and do the next time step using the larger value of Δτ. On the other hand, if e > Ξ΅, then we reject the integration step and redo the time step using the smaller value of Δτ. In practice, one usually increases the time step slightly less and decreases the time step slightly more to prevent the waste of too many failed time steps.
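A minimal sketch of a call to ode45.m follows; the tolerances and the test equation are illustrative, and ode45 chooses the time steps itself:

f = @(t, x) cos(t) - x;                       % illustrative right-hand side
opts = odeset('RelTol', 1e-6, 'AbsTol', 1e-9);
[t, x] = ode45(f, [0, 10], 0, opts);          % integrate from t = 0 to 10 with x(0) = 0
plot(t, x, '.-');                             % the returned times are not equally spaced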


7.2.6 System of differential equations

Our numerical methods can be easily adapted to solve higher-order differential equations, or equivalently, a system of differential equations. First, we show how a second-order differential equation can be reduced to two first-order equations. Consider

\ddot{x} = f(t, x, \dot{x}).

This second-order equation can be rewritten as two first-order equations by defining u = \dot{x}. We then have the system

\dot{x} = u,
\dot{u} = f(t, x, u).

This trick also works for higher-order equations. For another example, the third-order equation

\dddot{x} = f(t, x, \dot{x}, \ddot{x}),

can be written as

\dot{x} = u,
\dot{u} = v,
\dot{v} = f(t, x, u, v).

Now, we show how to generalize Runge-Kutta methods to a system of differential equations. As an example, consider the following system of two odes,

\dot{x} = f(t, x, y),
\dot{y} = g(t, x, y),

with the initial conditions π‘₯(0) = π‘₯0 and 𝑦(0) = 𝑦0. The generalization of the


commonly used fourth-order Runge-Kutta method would be

π‘˜1 = Δ𝑑𝑓(𝑑𝑛, π‘₯𝑛, 𝑦𝑛),

𝑙1 = Δ𝑑𝑔(𝑑𝑛, π‘₯𝑛, 𝑦𝑛),

π‘˜2 = Δ𝑑𝑓

(𝑑𝑛 +

1

2Δ𝑑, π‘₯𝑛 +

1

2π‘˜1, 𝑦𝑛 +

1

2𝑙1

),

𝑙2 = Δ𝑑𝑔

(𝑑𝑛 +

1

2Δ𝑑, π‘₯𝑛 +

1

2π‘˜1, 𝑦𝑛 +

1

2𝑙1

),

π‘˜3 = Δ𝑑𝑓

(𝑑𝑛 +

1

2Δ𝑑, π‘₯𝑛 +

1

2π‘˜2, 𝑦𝑛 +

1

2𝑙2

),

𝑙3 = Δ𝑑𝑔

(𝑑𝑛 +

1

2Δ𝑑, π‘₯𝑛 +

1

2π‘˜2, 𝑦𝑛 +

1

2𝑙2

),

π‘˜4 = Δ𝑑𝑓 (𝑑𝑛 +Δ𝑑, π‘₯𝑛 + π‘˜3, 𝑦𝑛 + 𝑙3) ,

𝑙4 = Δ𝑑𝑔 (𝑑𝑛 +Δ𝑑, π‘₯𝑛 + π‘˜3, 𝑦𝑛 + 𝑙3) ,

π‘₯𝑛+1 = π‘₯𝑛 +1

6(π‘˜1 + 2π‘˜2 + 2π‘˜3 + π‘˜4) ,

𝑦𝑛+1 = 𝑦𝑛 +1

6(𝑙1 + 2𝑙2 + 2𝑙3 + 𝑙4) .
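Written in vector form, the pairs (k_i, l_i) combine into a single vector k_i and the scalar code carries over directly. A sketch for an illustrative system (a simple harmonic oscillator, chosen here only as an example) is:

F = @(t, u) [u(2); -u(1)];          % u = [x; y] with xdot = y, ydot = -x (illustrative)
dt = 0.01;  N = 1000;
u = zeros(2, N+1);
u(:,1) = [1; 0];                    % initial conditions x(0) = 1, y(0) = 0
for n = 1:N
    t = (n-1)*dt;
    k1 = dt*F(t,        u(:,n));
    k2 = dt*F(t + dt/2, u(:,n) + k1/2);
    k3 = dt*F(t + dt/2, u(:,n) + k2/2);
    k4 = dt*F(t + dt,   u(:,n) + k3);
    u(:,n+1) = u(:,n) + (k1 + 2*k2 + 2*k3 + k4)/6;
end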

7.3 Numerical methods: boundary value problem

7.3.1 Finite difference method

We consider first the differential equation

βˆ’ d^2y/dx^2 = f(x),   0 ≀ x ≀ 1,   (7.8)

with two-point boundary conditions

𝑦(0) = 𝐴, 𝑦(1) = 𝐡.

Equation (7.8) can be solved by quadrature, but here we will demonstrate a numerical solution using a finite difference method.

We begin by discussing how to numerically approximate derivatives. Consider the Taylor series approximation for y(x + h) and y(x βˆ’ h), given by

y(x + h) = y(x) + h y'(x) + (1/2) h^2 y''(x) + (1/6) h^3 y'''(x) + (1/24) h^4 y''''(x) + . . . ,
y(x βˆ’ h) = y(x) βˆ’ h y'(x) + (1/2) h^2 y''(x) βˆ’ (1/6) h^3 y'''(x) + (1/24) h^4 y''''(x) + . . . .


The standard definitions of the derivatives give the first-order approximations

𝑦′(π‘₯) =𝑦(π‘₯+ β„Ž)βˆ’ 𝑦(π‘₯)

β„Ž+O(β„Ž),

𝑦′(π‘₯) =𝑦(π‘₯)βˆ’ 𝑦(π‘₯βˆ’ β„Ž)

β„Ž+O(β„Ž).

The more widely-used second-order approximation is called the central differ-ence approximation and is given by

𝑦′(π‘₯) =𝑦(π‘₯+ β„Ž)βˆ’ 𝑦(π‘₯βˆ’ β„Ž)

2β„Ž+O(β„Ž2).

The finite difference approximation to the second derivative can be found fromconsidering

𝑦(π‘₯+ β„Ž) + 𝑦(π‘₯βˆ’ β„Ž) = 2𝑦(π‘₯) + β„Ž2𝑦′′(π‘₯) +1

12β„Ž4𝑦′′′′(π‘₯) + . . . ,

from which we find

𝑦′′(π‘₯) =𝑦(π‘₯+ β„Ž)βˆ’ 2𝑦(π‘₯) + 𝑦(π‘₯βˆ’ β„Ž)

β„Ž2+O(β„Ž2).

Sometimes a second-order method is required for π‘₯ on the boundaries of thedomain. For a boundary point on the left, a second-order forward differencemethod requires the additional Taylor series

𝑦(π‘₯+ 2β„Ž) = 𝑦(π‘₯) + 2β„Žπ‘¦β€²(π‘₯) + 2β„Ž2𝑦′′(π‘₯) +4

3β„Ž3𝑦′′′(π‘₯) + . . . .

We combine the Taylor series for 𝑦(π‘₯+ β„Ž) and 𝑦(π‘₯+ 2β„Ž) to eliminate the termproportional to β„Ž2:

𝑦(π‘₯+ 2β„Ž)βˆ’ 4𝑦(π‘₯+ β„Ž) = βˆ’3𝑦(π‘₯)βˆ’ 2β„Žπ‘¦β€²(π‘₯) + O(h3).

Therefore,

𝑦′(π‘₯) =βˆ’3𝑦(π‘₯) + 4𝑦(π‘₯+ β„Ž)βˆ’ 𝑦(π‘₯+ 2β„Ž)

2β„Ž+O(β„Ž2).

For a boundary point on the right, we send β„Ž β†’ βˆ’β„Ž to find

𝑦′(π‘₯) =3𝑦(π‘₯)βˆ’ 4𝑦(π‘₯βˆ’ β„Ž) + 𝑦(π‘₯βˆ’ 2β„Ž)

2β„Ž+O(β„Ž2).

We now write a finite difference scheme to solve (7.8). We discretize π‘₯ bydefining π‘₯𝑖 = π‘–β„Ž, 𝑖 = 0, 1, . . . , 𝑛 + 1. Since π‘₯𝑛+1 = 1, we have β„Ž = 1/(𝑛 + 1).The functions 𝑦(π‘₯) and 𝑓(π‘₯) are discretized as 𝑦𝑖 = 𝑦(π‘₯𝑖) and 𝑓𝑖 = 𝑓(π‘₯𝑖). Thesecond-order differential equation (7.8) then becomes for the interior points ofthe domain

βˆ’π‘¦π‘–βˆ’1 + 2𝑦𝑖 βˆ’ 𝑦𝑖+1 = β„Ž2𝑓𝑖, 𝑖 = 1, 2, . . . 𝑛,

with the boundary conditions 𝑦0 = 𝐴 and 𝑦𝑛+1 = 𝐡. We therefore have a linearsystem of equations to solve. The first and 𝑛th equation contain the boundaryconditions and are given by

2𝑦1 βˆ’ 𝑦2 = β„Ž2𝑓1 +𝐴,

βˆ’π‘¦π‘›βˆ’1 + 2𝑦𝑛 = β„Ž2𝑓𝑛 +𝐡.


The second and third equations, etc., are

βˆ’π‘¦1 + 2𝑦2 βˆ’ 𝑦3 = β„Ž2𝑓2,

βˆ’π‘¦2 + 2𝑦3 βˆ’ 𝑦4 = β„Ž2𝑓3,

. . .

In matrix form, we haveβŽ›βŽœβŽœβŽœβŽœβŽœβŽœβŽœβŽ

2 βˆ’1 0 0 . . . 0 0 0βˆ’1 2 βˆ’1 0 . . . 0 0 00 βˆ’1 2 βˆ’1 . . . 0 0 0...

......

.... . .

......

...0 0 0 0 . . . βˆ’1 2 βˆ’10 0 0 0 . . . 0 βˆ’1 2

⎞⎟⎟⎟⎟⎟⎟⎟⎠

βŽ›βŽœβŽœβŽœβŽœβŽœβŽœβŽœβŽ

𝑦1𝑦2𝑦3...

π‘¦π‘›βˆ’1

𝑦𝑛

⎞⎟⎟⎟⎟⎟⎟⎟⎠=

βŽ›βŽœβŽœβŽœβŽœβŽœβŽœβŽœβŽ

β„Ž2𝑓1 +π΄β„Ž2𝑓2β„Ž2𝑓3...

β„Ž2π‘“π‘›βˆ’1

β„Ž2𝑓𝑛 +𝐡

⎞⎟⎟⎟⎟⎟⎟⎟⎠.

The matrix is tridiagonal, and a numerical solution by Guassian eliminationcan be done quickly. The matrix itself is easily constructed using the MATLABfunction diag.m and ones.m. As excerpted from the MATLAB help page, thefunction call ones(m,n) returns an m-by-n matrix of ones, and the function calldiag(v,k), when v is a vector with n components, is a square matrix of ordern+abs(k) with the elements of v on the k-th diagonal: π‘˜ = 0 is the main diag-onal, π‘˜ > 0 is above the main diagonal and π‘˜ < 0 is below the main diagonal.The 𝑛× 𝑛 matrix above can be constructed by the MATLAB code

M=diag(-ones(n-1,1),-1)+diag(2*ones(n,1),0)+diag(-ones(n-1,1),1); .

The right-hand-side, provided f is a given n-by-1 vector, can be constructedby the MATLAB code

b=h^2*f; b(1)=b(1)+A; b(n)=b(n)+B;

and the solution for u is given by the MATLAB code

y=Mβˆ–b;
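Assembling these pieces for the particular source f(x) = x(1 βˆ’ x) with A = B = 0, so that the result can be checked against the analytical solution of section 7.1.2, a sketch of the complete computation is:

n = 50;  h = 1/(n+1);
x = (1:n)'*h;                       % interior grid points
f = x.*(1 - x);                     % source term at the grid points
A = 0;  B = 0;
M = diag(-ones(n-1,1),-1) + diag(2*ones(n,1),0) + diag(-ones(n-1,1),1);
b = h^2*f;  b(1) = b(1) + A;  b(n) = b(n) + B;
y = M\b;                            % numerical solution at the interior points
yexact = x.*(1 - x).*(1 + x - x.^2)/12;
max(abs(y - yexact))                % the error should be small, of order h^2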

7.3.2 Shooting method

The finite difference method can solve linear odes. For a general ode of the form

d^2y/dx^2 = f(x, y, dy/dx),

with y(0) = A and y(1) = B, we use a shooting method. First, we formulate the ode as an initial value problem. We have

dy/dx = z,
dz/dx = f(x, y, z).


The initial condition y(0) = A is known, but the second initial condition z(0) = b is unknown. Our goal is to determine b such that y(1) = B.

In fact, this is a root-finding problem for an appropriately defined function. We define the function F = F(b) such that

F(b) = y(1) βˆ’ B.

In other words, F(b) is the difference between the value of y(1) obtained from integrating the differential equations using the initial condition z(0) = b, and B. Our root-finding routine will want to solve F(b) = 0. (The method is called shooting because the slope of the solution curve for y = y(x) at x = 0 is given by b, and the solution hits the value y(1) at x = 1. This looks like pointing a gun and trying to shoot the target, which is B.)

To determine the value of b that solves F(b) = 0, we iterate using the Secant method, given by

b_{n+1} = b_n βˆ’ F(b_n) (b_n βˆ’ b_{nβˆ’1}) / ( F(b_n) βˆ’ F(b_{nβˆ’1}) ).

We need to start with two initial guesses for b, solving the ode for the two corresponding values of y(1). Then the Secant Method will give us the next value of b to try, and we iterate until |y(1) βˆ’ B| < tol, where tol is some specified tolerance for the error.
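A sketch of this procedure, using ode45.m for the integration and a secant iteration for the root finding, is given below. The particular ode y'' = βˆ’x(1 βˆ’ x) with y(0) = y(1) = 0, and the helper names F and shoot, are hypothetical choices made only for illustration; for this problem the exact initial slope is y'(0) = 1/12.

A = 0;  B = 0;
rhs = @(x, u) [u(2); -x.*(1 - x)];               % u = [y; z]: y' = z, z' = f(x,y,z)
F = @(b) shoot(rhs, A, b) - B;                   % F(b) = y(1) - B

b0 = 0;  b1 = 1;                                 % two initial guesses for b
for k = 1:20
    F0 = F(b0);  F1 = F(b1);
    if abs(F1) < 1e-9 || F1 == F0, break, end
    b2 = b1 - F1*(b1 - b0)/(F1 - F0);            % secant update
    b0 = b1;  b1 = b2;
end
% b1 is now an approximation to the initial slope y'(0) = 1/12.

function yend = shoot(rhs, A, b)
    [~, u] = ode45(rhs, [0, 1], [A; b]);         % integrate the initial value problem
    yend = u(end, 1);                            % value of y at x = 1
end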

7.4 Numerical methods: eigenvalue problem

For illustrative purposes, we develop our numerical methods for what is perhaps the simplest eigenvalue ode. With y = y(x) and 0 ≀ x ≀ 1, this simple ode is given by

y'' + Ξ»^2 y = 0.   (7.9)

To solve (7.9) numerically, we will develop both a finite difference method and a shooting method. Furthermore, we will show how to solve (7.9) with homogeneous boundary conditions on either the function y or its derivative y'.

7.4.1 Finite difference method

We first consider solving (7.9) with the homogeneous boundary conditions y(0) = y(1) = 0. In this case, we have already shown that the eigenvalues of (7.9) are given by Ξ» = Ο€, 2Ο€, 3Ο€, . . . .

With n interior points, we have x_i = ih for i = 0, . . . , n + 1, and h = 1/(n + 1). Using the centered-finite-difference approximation for the second derivative, (7.9) becomes

y_{iβˆ’1} βˆ’ 2 y_i + y_{i+1} = βˆ’h^2 Ξ»^2 y_i.   (7.10)

Applying the boundary conditions y_0 = y_{n+1} = 0, the first equation with i = 1, and the last equation with i = n, are given by

βˆ’2 y_1 + y_2 = βˆ’h^2 Ξ»^2 y_1,
y_{nβˆ’1} βˆ’ 2 y_n = βˆ’h^2 Ξ»^2 y_n.


The remaining π‘›βˆ’ 2 equations are given by (7.10) for 𝑖 = 2, . . . , π‘›βˆ’ 1.It is of interest to see how the solution develops with increasing 𝑛. The

smallest possible value is 𝑛 = 1, corresponding to a single interior point, andsince β„Ž = 1/2 we have

βˆ’2𝑦1 = βˆ’1

4πœ†2𝑦1,

so that πœ†2 = 8, or πœ† = 2√2 = 2.8284. This is to be compared to the first

eigenvalue which is πœ† = πœ‹. When 𝑛 = 2, we have β„Ž = 1/3, and the resultingtwo equations written in matrix form are given by(

βˆ’2 11 βˆ’2

)(𝑦1𝑦2

)= βˆ’1

9πœ†2

(𝑦1𝑦2

).

This is a matrix eigenvalue problem with the eigenvalue given by πœ‡ = βˆ’πœ†2/9.The solution for πœ‡ is arrived at by solving

det \begin{pmatrix} -2 - ΞΌ & 1 \\ 1 & -2 - ΞΌ \end{pmatrix} = 0,

with resulting quadratic equation

(2 + ΞΌ)^2 βˆ’ 1 = 0.

The solutions are ΞΌ = βˆ’1, βˆ’3, and since Ξ» = 3βˆšβˆ’ΞΌ, we have Ξ» = 3, 3√3 = 5.1962. These two eigenvalues serve as rough approximations to the first two eigenvalues Ο€ and 2Ο€.

With A an n-by-n matrix, the MATLAB variable mu=eig(A) is a vector containing the n eigenvalues of the matrix A. The built-in function eig.m can therefore be used to find the eigenvalues. With n grid points, the smaller eigenvalues will converge more rapidly than the larger ones.
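A sketch of this computation with eig.m, for the boundary conditions y(0) = y(1) = 0 (the value of n is an illustrative choice):

n = 100;  h = 1/(n+1);
M = diag(-2*ones(n,1),0) + diag(ones(n-1,1),-1) + diag(ones(n-1,1),1);
mu = eig(M);                        % eigenvalues mu = -h^2*lambda^2
lambda = sort(sqrt(-mu))/h;         % recover lambda, smallest first
lambda(1:3)                         % compare with pi, 2*pi, 3*pi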

We can also consider boundary conditions on the derivative, or mixed boundary conditions. For example, consider the mixed boundary conditions given by y(0) = 0 and y'(1) = 0. The eigenvalues of (7.9) can then be determined analytically to be Ξ»_i = (i βˆ’ 1/2)Ο€, with i a natural number.

The difficulty we now face is how to implement a boundary condition on the derivative. Our computation of y'' uses a second-order method, and we would like the computation of the first derivative to also be second order. The condition y'(1) = 0 occurs on the right-most boundary, and we can make use of the second-order backward-difference approximation to the derivative that we have previously derived. This finite-difference approximation for y'(1) can be written as

y'_{n+1} = ( 3 y_{n+1} βˆ’ 4 y_n + y_{nβˆ’1} )/(2h).   (7.11)

Now, the nth finite-difference equation was given by

y_{nβˆ’1} βˆ’ 2 y_n + y_{n+1} = βˆ’h^2 Ξ»^2 y_n,

and we now replace the value y_{n+1} using (7.11); that is,

y_{n+1} = (1/3) ( 2h y'_{n+1} + 4 y_n βˆ’ y_{nβˆ’1} ).


Implementing the boundary condition y'_{n+1} = 0, we have

y_{n+1} = (4/3) y_n βˆ’ (1/3) y_{nβˆ’1}.

Therefore, the nth finite-difference equation becomes

(2/3) y_{nβˆ’1} βˆ’ (2/3) y_n = βˆ’h^2 Ξ»^2 y_n.

For example, when n = 2, the finite difference equations become

\begin{pmatrix} -2 & 1 \\ 2/3 & -2/3 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = βˆ’(1/9) Ξ»^2 \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.

The eigenvalues of the matrix are now the solution of

(ΞΌ + 2)(ΞΌ + 2/3) βˆ’ 2/3 = 0,

or

3 ΞΌ^2 + 8 ΞΌ + 2 = 0.

Therefore, ΞΌ = (βˆ’4 ± √10)/3, and we find Ξ» = 1.5853, 4.6354, which are approximations to Ο€/2 and 3Ο€/2.

7.4.2 Shooting method

We apply the shooting method to solve (7.9) with boundary conditions y(0) = y(1) = 0. The initial value problem to solve is

y' = z,
z' = βˆ’Ξ»^2 y,

with known boundary condition y(0) = 0 and an unknown boundary condition on y'(0). In fact, any nonzero boundary condition on y'(0) can be chosen: the differential equation is linear and the boundary conditions are homogeneous, so that if y(x) is an eigenfunction then so is A y(x). What we need to find here is the value of Ξ» such that y(1) = 0. In other words, choosing y'(0) = 1, we solve

F(Ξ») = 0,   (7.12)

where F(Ξ») = y(1), obtained by solving the initial value problem. Again, an iteration for the roots of F(Ξ») can be done using the Secant Method. For the eigenvalue problem, there are an infinite number of roots, and the choice of the two initial guesses for Ξ» will then determine to which root the iteration will converge.

For this simple problem, it is possible to write explicitly the equation F(Ξ») = 0. The general solution to (7.9) is given by

y(x) = A cos(Ξ»x) + B sin(Ξ»x).

The initial condition y(0) = 0 yields A = 0. The initial condition y'(0) = 1 yields

𝐡 = 1/πœ†.


Therefore, the solution to the initial value problem is

𝑦(π‘₯) =sin (πœ†π‘₯)

πœ†.

The function 𝐹 (πœ†) = 𝑦(1) is therefore given by

𝐹 (πœ†) =sinπœ†

πœ†,

and the roots occur when πœ† = πœ‹, 2πœ‹, . . . .If the boundary conditions were 𝑦(0) = 0 and 𝑦′(1) = 0, for example, then

we would simply redefine 𝐹 (πœ†) = 𝑦′(1). We would then have

𝐹 (πœ†) =cosπœ†

πœ†,

and the roots occur when πœ† = πœ‹/2, 3πœ‹/2, . . . .