lecture 6: nonlinear equations - boston college

CS227-Scientific Computing

Lecture 6: Nonlinear Equations

A Financial Problem

You invest $100 a month in an interest-bearing account. You make 60 deposits, and one month after the last deposit (5 years after the first deposit) you withdraw the money in the account.

You would like to have $8000.

What must the rate of interest on the account be in order to achieve this?

A Financial Problem

Let’s solve the problem in general with

- r: monthly rate of interest
- d: monthly deposit
- n: number of periods
- M: goal

A Financial Problem

Account balance after 0 months: \(d\).

Account balance after 1 month (before second deposit): \((1 + r)d\).

Account balance after 2 months (before third deposit): \((1 + r)^2 d + (1 + r)d\).

Account balance after n months:

\[
d\left((1 + r)^n + (1 + r)^{n-1} + \cdots + (1 + r)\right)
= d(1 + r)\sum_{j=0}^{n-1} (1 + r)^j
= d(1 + r)\left[\frac{(1 + r)^n - 1}{r}\right]
\]

(geometric series)
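The closed form above can be checked numerically against a month-by-month simulation. The following is a minimal sketch; the values of d, r and n are illustrative assumptions, and the variable names balance and closedForm are introduced here only for the check.

```matlab
% Compare the closed-form balance with a direct simulation.
% d, r, n are illustrative values, not taken from the problem statement.
d = 100;      % monthly deposit
r = 0.01;     % monthly interest rate (1%)
n = 60;       % number of deposits

balance = 0;
for k = 1:n
    % make a deposit, then let the account earn one month of interest
    balance = (balance + d) * (1 + r);
end

closedForm = d * (1 + r) * ((1 + r)^n - 1) / r;

fprintf('simulated:   %.6f\n', balance);
fprintf('closed form: %.6f\n', closedForm);
```

The two printed values should agree up to roundoff, since the loop builds exactly the sum d(1 + r)^n + ... + d(1 + r).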

A Financial Problem

So we have a nonlinear equation:

\[
d(1 + r)\left[\frac{(1 + r)^n - 1}{r}\right] = M.
\]

With our original values for d, n, M (d = 100, n = 60, M = 8000), and after dividing through by d, we have to solve the nonlinear equation

\[
(1 + r)\left[\frac{(1 + r)^{60} - 1}{r}\right] - 80 = 0.
\]

Truncation error and rate of convergence of iterative methods

The solutions to linear equations and linear systems can be expressed exactly as sums, products and quotients of their coefficients. All the error present in calculating solutions this way is due to roundoff error.

In contrast, nonlinear equations usually cannot have their solutions expressed this way. Instead, solution methods typically generate a sequence of approximations that converge to the root.

So in addition to the roundoff error, there is also truncation error: If you cut the iterative process off after s steps, how close are you to the correct answer? How fast does the process converge to the root?

A simple idea: Bisection

Theorem (Intermediate Value Theorem). If f is a continuous function throughout the interval a ≤ x ≤ b, and f(a) · f(b) < 0, then there is some c, with a < c < b, such that f(c) = 0.

Bisection Method

So if we start with an interval [a, b] that brackets a root of f, we can split it in two by computing the midpoint c = (a + b)/2. One of the two subintervals [a, c] or [c, b] will bracket a root, and we can continue subdividing until the width of the bracketing interval is as small as we desire.

In the figure below, we go from [a, b] to [a, b′] to [a, b′′] to [a, b′′′] to [a′′′, b′′′].

Rate of Convergence of Bisection

After k steps, we have the root trapped between two numbers that are \((b - a) \cdot 2^{-k}\) apart, so we get roughly one additional bit of precision for each evaluation of the function f.

As long as we start with a pair of numbers that brackets a root, and as long as f is continuous, this is foolproof.

Note that we have not discussed how to find brackets (sometimes a plot, or careful thinking about the function itself, can help). There is a risk that the initial interval has more than one root of f.

And the rate of convergence is rather slow.
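To make the "one bit per evaluation" estimate concrete, the number of bisection steps needed to shrink a bracket of width b - a below a tolerance tol is the smallest k with (b - a) · 2^(-k) ≤ tol. The sketch below computes it; the bracket [0.001, 0.33] and the tolerance 1e-10 are assumed values chosen for illustration.

```matlab
% Number of bisection steps needed to reach a given bracket width.
% a, b and tol are illustrative assumptions.
a = 0.001;
b = 0.33;
tol = 1e-10;

k = ceil(log2((b - a) / tol));      % smallest k with (b-a)*2^(-k) <= tol
finalWidth = (b - a) * 2^(-k);

fprintf('steps needed: %d, final width: %.3e\n', k, finalWidth);
```

For these values k comes out to 32, so about 32 function evaluations buy roughly ten decimal digits.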

Implementation of Bisection

The bisection function posted on the course website shows a robust implementation in MATLAB.

Note, first of all, the use of function handles as arguments, as well as the flexibility that allows us to apply this to a function of several variables, fixing the values of all but the first variable and solving for the first variable.

Since evaluating the function f is likely to be the most time-consuming step, the code is written to make sure that f is evaluated only once in each pass through the while loop.

Note also that the function returns a lot of information: the final values of the endpoints of the bracketing interval as well as the values of f at these endpoints. You can of course just call it with a single output argument and get an approximation to the root.
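The posted bisection function is not reproduced here. As a rough illustration of the ideas just described (one evaluation of f per pass, endpoint values carried along), here is a minimal script-style sketch. It is not the course's implementation; the test function cos(x) - x, the bracket [0, 1] and the tolerance are assumptions made for the example.

```matlab
% A minimal bisection sketch (NOT the course's posted bisection function).
f = @(x) cos(x) - x;     % assumed example function
a = 0;  b = 1;           % assumed bracket: f(0) = 1 > 0, f(1) < 0
tol = 1e-12;             % assumed target width of the bracket

fa = f(a);
fb = f(b);
assert(fa * fb < 0, 'The interval [a, b] must bracket a root.');

while (b - a) > tol
    c = a + (b - a) / 2;         % midpoint
    fc = f(c);                   % the only evaluation of f in this pass
    if fc == 0
        a = c;  fa = fc;         % landed exactly on a root
        b = c;  fb = fc;
    elseif sign(fc) == sign(fa)
        a = c;  fa = fc;         % root lies in [c, b]
    else
        b = c;  fb = fc;         % root lies in [a, c]
    end
end

root = a + (b - a) / 2;
fprintf('root ~ %.12f, bracket width %.2e\n', root, b - a);
```

Reusing fa and fb instead of recomputing f(a) and f(b) is what keeps the cost at one function evaluation per halving.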

Implementation of Bisection

Here is the function bisection applied to our interest rate problem. The depositor is putting in a total of $6000. If he earned 33% interest in the last month, he would get the desired $8000 in one period, so we can use 0.33 as an upper bound.

We might like to use 0 as a lower bound, but our function is written in a form that is not defined at r = 0, so we need to use a very small positive value, say 0.001 (one tenth of one percent interest per month, which is surely too small).

>> F=@(r,d,n,M)d*(1+r)*((1+r)^n-1)/r-M;
>> [x,y,r,s]=bisection(F,0.001,0.33,100,60,8000)
x = 0.009072994445666
y = 0.009072994445666
r = -1.809894456528127e-10
s = 0.910827217623591e-11

So the answer is about 0.91% monthly interest, which is close to 11% annual interest.
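As a quick sanity check on these numbers, the sketch below evaluates F at the two bracket endpoints and at the reported root, and converts the monthly rate to annual figures. The names rootReported, nominalAnnual and effectiveAnnual are introduced here for illustration only.

```matlab
% Sanity check of the bracket and the reported root.
F = @(r, d, n, M) d*(1 + r)*((1 + r)^n - 1)/r - M;

rootReported = 0.009072994445666;          % value printed by the session above

fLow  = F(0.001, 100, 60, 8000);           % should be negative (rate too small)
fHigh = F(0.33,  100, 60, 8000);           % should be positive (rate too large)
fRoot = F(rootReported, 100, 60, 8000);    % should be close to zero

nominalAnnual   = 12 * rootReported;            % simple 12x monthly rate
effectiveAnnual = (1 + rootReported)^12 - 1;    % with monthly compounding

fprintf('F(0.001) = %.2f, F(0.33) = %.3e, F(root) = %.3e\n', fLow, fHigh, fRoot);
fprintf('nominal annual rate:   %.4f\n', nominalAnnual);
fprintf('effective annual rate: %.4f\n', effectiveAnnual);
```

The opposite signs at the endpoints confirm that [0.001, 0.33] really does bracket a root.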

Newton’s Method

The pretty idea here is to guess a value that is close to the root, then follow the tangent line at that point until it crosses the x-axis. This should be a closer approximation to the root, and we can iterate the procedure.

Newton’s Method

Call the initial guess \(x_0\), and the subsequent approximations \(x_1\), \(x_2\), etc.

The equation of the tangent line to the graph of f at \((x_i, f(x_i))\) is

\[
y = f'(x_i)(x - x_i) + f(x_i),
\]

so we have

\[
0 = f'(x_i)(x_{i+1} - x_i) + f(x_i),
\]

or

\[
x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}.
\]
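The update formula translates directly into a short loop. The following is a minimal sketch under stated assumptions: the test function cos(x) - x, the starting guess, the tolerance, and the iteration cap maxIter are all illustrative choices, and the stopping rule (size of the last step) is one common convention rather than the only option.

```matlab
% A minimal Newton's method sketch with a step-size stopping rule.
f      = @(x) cos(x) - x;     % assumed example function
fprime = @(x) -sin(x) - 1;    % its derivative, supplied by hand

x = 1;                        % assumed initial guess
tol = 1e-12;                  % stop when the Newton step is this small
maxIter = 20;                 % safety cap in case the iteration does not converge

for i = 1:maxIter
    dx = f(x) / fprime(x);    % Newton step: f(x_i) / f'(x_i)
    x = x - dx;               % x_{i+1} = x_i - f(x_i)/f'(x_i)
    if abs(dx) < tol
        break
    end
end

fprintf('root ~ %.15f after %d iterations\n', x, i);
```

The cap on the number of iterations matters because, unlike bisection, nothing guarantees that the iterates converge.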

Example

For instance, let us try to find a solution to the 5th degree polynomial equation

\[
x^5 + 2x - 2 = 0.
\]

We begin with a plot, which suggests a starting guess of \(x_0 = 1\).


Example-continued

The Newton’s method iteration is

\[
x_{i+1} = x_i - \frac{x_i^5 + 2x_i - 2}{5x_i^4 + 2}.
\]

Let’s try this out:

>> G=@(x)x-((x^5+2*x-2)/(5*x^4+2));
>> x=1;
>> x=G(x)
x = 0.857142857142857
>> x=G(x)
x = 0.819484893762504
>> x=G(x)
x = 0.817476251723243
>> x=G(x)
x = 0.817471019036304
>> x=G(x)
x = 0.817471019000967
>> x=G(x)
x = 0.817471019000967

Example-continued

In successive iterations we get 1, 2, 5, 9 correct decimal digits, and the answer stabilizes after the 5th iterate.

So the convergence is very fast.

Rate of convergence of Newton’s Method

How closely does the tangent line to the graph of f at a approximate f(x) for x close to a?

Taylor’s Theorem (for degree 2):

\[
f(x) = f(a) + f'(a)(x - a) + \frac{f''(c)(x - a)^2}{2}
\]

for some c between a and x.

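So if we take \(x^*\) to be a root of f and \(a = x_i\), we get

\[
0 = f(x_i) + f'(x_i)(x^* - x_i) + \frac{f''(c)(x^* - x_i)^2}{2}.
\]

Dividing by \(f'(x_i)\) and using \(x_{i+1} = x_i - f(x_i)/f'(x_i)\) gives \(x_{i+1} - x^* = \frac{f''(c)}{2f'(x_i)}(x^* - x_i)^2\), so

\[
|x_{i+1} - x^*| = \left|\frac{f''(c)}{2f'(x_i)}\right| |x_i - x^*|^2.
\]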

Rate of convergence of Newton’s Method

Roughly speaking, this means that if \(\varepsilon_i\) represents the absolute error at the i-th iteration, then

\[
\varepsilon_{i+1} \approx \left|\frac{f''(x^*)}{2f'(x^*)}\right| \varepsilon_i^2.
\]

So, if you start out close enough to \(x^*\), then the number of correct digits roughly doubles at each iteration. (Quadratic convergence.) This is very fast.
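The quadratic error relation can be observed on the polynomial example x^5 + 2x - 2 = 0. The sketch below recomputes the Newton iterates from x_0 = 1, forms the ratios of each error to the square of the previous one, and compares them with the constant |f''(x*)/(2f'(x*))| suggested by the Taylor estimate. The helper names (fp, fpp, xstar, err, ratios, C) are introduced here for illustration, and xstar is obtained simply by iterating until the value stops changing.

```matlab
% Observe quadratic convergence of Newton's method on x^5 + 2x - 2 = 0.
f   = @(x) x^5 + 2*x - 2;
fp  = @(x) 5*x^4 + 2;          % first derivative
fpp = @(x) 20*x^3;             % second derivative
G   = @(x) x - f(x)/fp(x);     % one Newton step

x = zeros(1, 6);
x(1) = 1;                      % x(1) holds x_0
for i = 1:5
    x(i+1) = G(x(i));          % x(i+1) holds x_i
end

xstar = x(end);                % reference root: iterate further until stable
for i = 1:10
    xstar = G(xstar);
end

err    = abs(x(1:5) - xstar);          % errors of x_0, ..., x_4
ratios = err(2:5) ./ err(1:4).^2;      % e_{i+1} / e_i^2
C      = abs(fpp(xstar) / (2*fp(xstar)));

disp(ratios);
disp(C);
```

The ratios settle near the constant C (about 1.29 for this polynomial), which is the behavior the error estimate predicts once the iterates are close to the root.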

But there are lots of caveats: If you don’t start out close enough to a root, then the iterates may fail to converge altogether. If f′(x∗) is zero, or close to zero, the convergence may be very slow.
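A small example of the slow case: for f(x) = x^2 the root x* = 0 has f′(x*) = 0, and the Newton update reduces to x − x^2/(2x) = x/2, so the error is only halved at each step. The function and starting guess below are assumptions chosen to make this visible.

```matlab
% Newton's method at a root where f'(x*) = 0: convergence drops to linear.
f      = @(x) x^2;       % assumed example with a double root at 0
fprime = @(x) 2*x;

x = 1;                   % assumed starting guess
for i = 1:10
    x = x - f(x)/fprime(x);                  % equals x/2 for this f
    fprintf('i = %2d, x = %.10f\n', i, x);
end
```

After ten iterations the error is 2^(-10), about three decimal digits, which is no better than bisection.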

Furthermore, the method requires you to know the derivative of f. If the values of f are only tabulated, this may be unavailable. Even if it is available, you have to evaluate both f and f′ at each iteration, which means that there is extra work to do.

Rapidly convergent methods that do not require the derivative

Secant method: Start out with two guesses \(x_0\) and \(x_1\) that bracket the root. At each subsequent step, set \(x_{i+1}\) to be the point where the line segment joining the points

\[
(x_{i-1}, f(x_{i-1})), \quad (x_i, f(x_i))
\]

crosses the x-axis.
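Setting the line through the two points to zero gives the update x_{i+1} = x_i − f(x_i)(x_i − x_{i−1})/(f(x_i) − f(x_{i−1})). Here is a minimal sketch of the resulting iteration on x^5 + 2x − 2 = 0; the starting pair 0 and 1, the tolerance, and the iteration cap are assumptions made for the example.

```matlab
% A minimal secant method sketch on x^5 + 2x - 2 = 0.
f = @(x) x^5 + 2*x - 2;

x0 = 0;  x1 = 1;         % assumed starting guesses: f(0) = -2, f(1) = 1
f0 = f(x0);
f1 = f(x1);

for i = 1:50             % safety cap on the number of steps
    % point where the line through (x0,f0) and (x1,f1) crosses the x-axis
    x2 = x1 - f1 * (x1 - x0) / (f1 - f0);
    x0 = x1;  f0 = f1;   % shift: keep the two most recent points
    x1 = x2;  f1 = f(x1);            % one new evaluation of f per step
    if abs(x1 - x0) < 1e-13 || f1 == 0
        break
    end
end

fprintf('root ~ %.15f after %d steps\n', x1, i);
```

Each step costs a single evaluation of f and no derivative, which is the main attraction compared with Newton's method.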

Rapidly convergent methods that do not require the derivative

When things are working right, the secant method converges much faster than linearly, but not as fast as quadratically: its order of convergence is about 1.618 (the golden ratio), compared with 2 for Newton’s method.

There are some of the same issues as with Newton’s method: poor choices of initial values can take you farther and farther from the root.

Rapidly convergent methods that do not require the derivative

The industrial-strength method used in MATLAB combines several strategies.

For most rapid convergence it keeps track of the three previous points \(x_{i-2}, x_{i-1}, x_i\).

It then uses quadratic interpolation to find a parabola through the three points

\[
(x_{i-2}, f(x_{i-2})), \quad (x_{i-1}, f(x_{i-1})), \quad (x_i, f(x_i)).
\]

The catch is, it does this backwards, interchanging the roles of the x- and y-coordinates. So the resulting parabola is oriented with its axis parallel to the x-axis, and thus intersects the x-axis at one point, which is \(x_{i+1}\). This is called inverse quadratic interpolation.

In cases where inverse quadratic interpolation won’t work (e.g., two of the y-coordinates the same), or the method appears to be wandering further from a root, the algorithm will take a step of the secant method or bisection instead.
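A single inverse quadratic interpolation step can be written with the Lagrange form of the parabola x = p(y) through the three points, evaluated at y = 0. The sketch below performs one such step on x^5 + 2x − 2 = 0. It illustrates the interpolation formula only and is not MATLAB's actual implementation; the three starting points and the variable names are assumptions.

```matlab
% One step of inverse quadratic interpolation (illustration only).
f = @(x) x^5 + 2*x - 2;

xa = 0;  xb = 0.5;  xc = 1;          % assumed starting points
fa = f(xa);  fb = f(xb);  fc = f(xc);

% Lagrange interpolation of x as a function of y, evaluated at y = 0.
% Requires fa, fb, fc to be pairwise distinct.
xNew = xa * fb * fc / ((fa - fb) * (fa - fc)) ...
     + xb * fa * fc / ((fb - fa) * (fb - fc)) ...
     + xc * fa * fb / ((fc - fa) * (fc - fb));

fprintf('new estimate: %.6f, f(new) = %.3e\n', xNew, f(xNew));
```

The denominators show why the step breaks down when two of the y-values coincide, which is one of the situations where a fallback to the secant method or bisection is needed.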

fzero

The basic syntax is

x = fzero(function_handle, guess)
x = fzero(function_handle, [left_bracket, right_bracket])

but as usual, there are many options.

For instance, type

x = fzero(function_handle, guess, optimset('Display','iter'))

to get an idea of what fzero is doing.
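As an illustration, fzero can be applied to the interest rate problem from the start of the lecture. In this sketch the extra parameters d, n, M are fixed by wrapping F in a one-variable anonymous function g (a name introduced here); the bracket [0.001, 0.33] is the one used earlier for bisection, and the single starting guess 0.01 is an assumption.

```matlab
% Solving the interest rate problem with fzero.
F = @(r, d, n, M) d*(1 + r)*((1 + r)^n - 1)/r - M;
g = @(r) F(r, 100, 60, 8000);       % fix d, n, M; leave r as the unknown

rBracket = fzero(g, [0.001, 0.33]); % start from a bracketing interval
rGuess   = fzero(g, 0.01);          % start from a single guess

fprintf('from bracket: %.15f\n', rBracket);
fprintf('from guess:   %.15f\n', rGuess);
```

Both calls should agree with the bisection result of about 0.00907 per month.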
