

Lecture 1: Newton’s method

Shlomo Sternberg



1 Square roots
2 Newton’s method
    The guts of the method
    A vector version
    Implementation
    The existence theorem
    Basins of attraction




The Babylonian algorithm for finding a square root.

Perhaps the oldest algorithm in recorded history is the Babylonian algorithm (circa 2000 BCE) for computing square roots: if we want to find the square root of a positive number $a$, we start with some approximation $x_0 > 0$ and then recursively define

$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right). \tag{1}$$

This is a very effective algorithm which converges extremely rapidly.
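A minimal Python sketch of recursion (1). The function name `babylonian_iterates` is our own, not the lecture's; it returns the whole list of iterates so that the behaviour discussed below can be inspected directly.

```python
import math


def babylonian_iterates(a, x0, n):
    """Return [x_0, x_1, ..., x_n] for recursion (1): x_{k+1} = (x_k + a/x_k) / 2."""
    if a <= 0:
        raise ValueError("a must be positive")
    xs = [float(x0)]
    for _ in range(n):
        x = xs[-1]
        xs.append(0.5 * (x + a / x))   # average of x and a/x
    return xs


# The deliberately poor start x0 = 99 for a = 2, as in the example that follows.
for x in babylonian_iterates(2.0, 99.0, 12):
    print(f"{x:.14f}")
print(f"{math.sqrt(2.0):.14f}  (math.sqrt, for comparison)")
```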




Example: the square root of 2.

Here is an illustration. Suppose we want to find the square root of 2 and start with the really stupid approximation $x_0 = 99$. We get:




99.00000000000000
49.51010101010101
24.77524840365297
12.42798706655775
6.29445708659966
3.30609848017316
1.95552056875300
1.48913306969968
1.41609819333465
1.41421481646475
1.41421356237365
1.41421356237309
1.41421356237309




Analyzing the steps.

For the first seven steps we are approximately dividing by two in passing from one step to the next, also (approximately) cutting the error (the deviation from the true value) in half.




After the eighth value the accuracy improves dramatically: the ninth value, 1.416..., is correct to two decimal places. The tenth value is correct to five decimal places, and the eleventh value is correct to eleven decimal places.




Why does this work so well?

To see why this algorithm works so well (for general $a$), first observe that the algorithm is well defined, in that we are steadily taking the average of positive quantities, and hence, by induction, $x_n > 0$ for all $n$. Introduce the relative error in the $n$-th approximation:

$$e_n := \frac{x_n - \sqrt{a}}{\sqrt{a}}, \quad\text{so}\quad x_n = (1 + e_n)\sqrt{a}.$$

As $x_n > 0$, it follows that

$$e_n > -1.$$


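Then

$$x_{n+1} = \sqrt{a}\,\frac{1}{2}\left(1 + e_n + \frac{1}{1 + e_n}\right) = \sqrt{a}\left(1 + \frac{1}{2}\,\frac{e_n^2}{1 + e_n}\right).$$

This gives us a recursion formula for the relative error:

$$e_{n+1} = \frac{e_n^2}{2 + 2e_n}. \tag{2}$$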
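As a numerical sanity check (our own illustration, not part of the lecture), the following Python sketch runs iteration (1) for $a = 2$ from $x_0 = 99$ and compares each observed relative error $e_{n+1}$ with the value predicted by (2). The halving $e_{n+1} < \tfrac12 e_n$ stated next can be read off from the same numbers.

```python
import math

a = 2.0
root = math.sqrt(a)

x = 99.0
pairs = []  # (observed e_{n+1}, predicted e_{n+1} from (2), e_n)
for n in range(6):
    e = (x - root) / root            # relative error e_n
    x = 0.5 * (x + a / x)            # one Babylonian step (1)
    e_next = (x - root) / root       # observed relative error e_{n+1}
    predicted = e * e / (2 + 2 * e)  # recursion (2)
    pairs.append((e_next, predicted, e))

for e_next, predicted, e in pairs:
    print(f"e_n = {e:12.6f}   observed e_(n+1) = {e_next:12.6f}   formula (2) = {predicted:12.6f}")
```

Only the first six steps are used because, once $x_n$ is very close to $\sqrt{a}$, computing $x_n - \sqrt{a}$ in floating point loses most significant digits to cancellation.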


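Since $e_n > -1$, the denominator in (2) is positive, so (2) implies that $e_1 \ge 0$; after the first step we are always overshooting the mark (unless we happened to start exactly at $\sqrt{a}$). Now $2e_n < 2 + 2e_n$, so for $e_n > 0$ (2) implies that

$$e_{n+1} < \frac{1}{2} e_n,$$

so the error is cut in half (at least) at each stage and hence, in particular,

$$x_1 > x_2 > \cdots,$$

the iterates are steadily decreasing.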


Eventually we will reach the stage that

$$e_n < 1.$$

From this point on, we use the inequality $2 + 2e_n > 2$ in (2) and we get the estimate

$$e_{n+1} < \frac{1}{2} e_n^2. \tag{3}$$




Exponential rate of convergence.


So if we renumber our approximations so that $0 \le e_0 < 1$, then (ignoring the $1/2$ factor in (3)) we have

$$0 \le e_n \le e_0^{2^n}, \tag{4}$$

an exponential rate of convergence.
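The bound (4) can be checked numerically by iterating the error recursion (2) directly, which avoids the cancellation that comes from subtracting $\sqrt{a}$. This is our sketch; the starting value $e_0 = 0.5$ is an arbitrary choice in $[0, 1)$, and `error_sequence` is a hypothetical helper name.

```python
def error_sequence(e0, steps):
    """Iterate the relative-error recursion (2): e_{n+1} = e_n^2 / (2 + 2 e_n)."""
    errors = [e0]
    for _ in range(steps):
        e = errors[-1]
        errors.append(e * e / (2 + 2 * e))
    return errors


e0 = 0.5
errs = error_sequence(e0, 6)
for n, e in enumerate(errs):
    # Compare e_n with the bound e_0^(2^n) from (4).
    print(f"n={n}  e_n={e:.3e}  bound e0^(2^n)={e0 ** (2 ** n):.3e}")
```

The sequence is stopped after six steps, well before the errors underflow double precision.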




Starting with a negative initial value.

If we had started with an $x_0 < 0$, then all the iterates would be $< 0$ and we would get exponential convergence to $-\sqrt{a}$. Of course, had we been so foolish as to pick $x_0 = 0$, we could not get the iteration started.
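A short Python check of both remarks (our illustration; `babylonian_step` and `run` are hypothetical helper names): a negative start converges to $-\sqrt{2}$, and $x_0 = 0$ fails at the very first division.

```python
import math


def babylonian_step(a, x):
    """One step of (1); raises ZeroDivisionError when x == 0."""
    return 0.5 * (x + a / x)


def run(a, x0, n=20):
    """Apply n steps of (1) starting from x0."""
    x = x0
    for _ in range(n):
        x = babylonian_step(a, x)
    return x


print(run(2.0, -99.0), -math.sqrt(2.0))   # a negative start goes to the negative root

try:
    run(2.0, 0.0)
except ZeroDivisionError:
    print("x0 = 0: the iteration cannot be started")
```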




Newton’s method.

This is a generalization of the above algorithm to find the zeros of a function $P = P(x)$, which reduces to (1) when $P(x) = x^2 - a$. It is

$$x_{n+1} = x_n - \frac{P(x_n)}{P'(x_n)}. \tag{5}$$

If we take $P(x) = x^2 - a$, then $P'(x) = 2x$, and the expression on the right in (5) is

$$\frac{1}{2}\left(x_n + \frac{a}{x_n}\right),$$

so (5) reduces to (1).
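A minimal Python sketch of iteration (5), assuming the caller supplies both $P$ and $P'$ as functions. The stopping rule (stop when the Newton step is below a tolerance) and the names `newton`, `tol` and `max_iter` are our choices, not the lecture's.

```python
import math


def newton(P, dP, x0, tol=1e-12, max_iter=50):
    """Newton's method (5): x_{n+1} = x_n - P(x_n) / P'(x_n)."""
    x = x0
    for _ in range(max_iter):
        d = dP(x)
        if d == 0:
            raise ZeroDivisionError(f"P'(x) = 0 at x = {x}; cannot take a Newton step")
        step = P(x) / d
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence within max_iter steps")


# With P(x) = x^2 - a, one Newton step coincides with the Babylonian step (1).
a, x = 2.0, 99.0
newton_value = x - (x * x - a) / (2 * x)
babylonian_value = 0.5 * (x + a / x)
print(newton_value, babylonian_value)

print(newton(lambda t: t * t - a, lambda t: 2 * t, 99.0))       # close to sqrt(2)
print(newton(lambda t: t**3 - t, lambda t: 3 * t * t - 1, 2.0))  # close to 1
```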




Here is a graphic illustration of Newton’s method applied to the function $y = x^3 - x$ with the initial point 2. Notice that what we are doing is taking the tangent to the curve at the point $(x, y)$ and then taking as our next point the intersection of this tangent with the $x$-axis. This makes the method easy to remember.

[Figure: Newton’s method for $y = x^3 - x$ with initial point 2, showing the tangent lines and their intersections with the $x$-axis.]



A fixed point of the iteration scheme is a solution to ourproblem.

Also notice that if $x$ is a “fixed point” of this iteration scheme, i.e. if

$$x = x - \frac{P(x)}{P'(x)},$$

then $P(x) = 0$ and we have a solution to our problem. To the extent that $x_{n+1}$ is “close to” $x_n$ we will be close to a solution (the degree of closeness depending on the size of $P(x_n)$).




Caveat.

In the general case we cannot expect that “most” points will converge to a zero of $P$, as was the case in the square root algorithm. After all, $P$ might not have any zeros. Nevertheless, we will show in this lecture that if we are “close enough” to a zero (that is, if $P(x_0)$ is “sufficiently small” in a sense to be made precise), then (5) converges exponentially fast to a zero.




The guts of the method.

Suppose we know that we are close to an actual zero.

Before embarking on the formal proof, let us describe what is going on, on the assumption that we know the existence of a zero (say, by graphically plotting the function). So let $z$ be a zero for the function $f$ of a real variable, and let $x$ be a point in the interval $(z - \mu, z + \mu)$ of radius $\mu$ about $z$. Then

$$-f(x) = f(z) - f(x) = \int_x^z f'(s)\,ds,$$

so

$$-f(x) - (z - x) f'(x) = \int_x^z \bigl(f'(s) - f'(x)\bigr)\,ds.$$


Assuming $f'(x) \neq 0$, we may divide both sides by $f'(x)$ to obtain

$$\left(x - \frac{f(x)}{f'(x)}\right) - z = \frac{1}{f'(x)} \int_x^z \bigl(f'(s) - f'(x)\bigr)\,ds. \tag{6}$$
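Identity (6) is exact, and a small numerical experiment makes this concrete. The sketch below uses an example of our own choosing: $f(x) = x^3 - x$, its zero $z = 1$, and the point $x = 1.2$. It evaluates the left side as one Newton step minus $z$, and the right side with a single panel of Simpson's rule, which is exact here because the integrand $f'(s) - f'(x)$ is a quadratic in $s$.

```python
def f(x):
    return x**3 - x


def fprime(x):
    return 3 * x * x - 1


def simpson(g, lo, hi):
    """Simpson's rule on a single panel; exact for polynomials of degree <= 3."""
    mid = 0.5 * (lo + hi)
    return (hi - lo) / 6.0 * (g(lo) + 4.0 * g(mid) + g(hi))


z, x = 1.0, 1.2
lhs = (x - f(x) / fprime(x)) - z                                   # Newton step minus the zero
rhs = simpson(lambda s: fprime(s) - fprime(x), x, z) / fprime(x)   # oriented integral from x to z
print(lhs, rhs)   # both approximately 0.040964
```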





Unless some such stringent hypotheses are satisfied, there is no guarantee that the process will converge to the nearest root, or converge at all. Furthermore, encoding a computation for $f'(x)$ may be difficult. In practice, one replaces $f'$ by an approximation, and only allows Newton’s method to proceed if in fact it does not take us out of the interval. We will return to these points, but first we rephrase the above argument in terms of a vector variable.




A vector version.

Letting x be a vector variable.

Now let $f$ be a function of a vector variable, with a zero at $z$, and let $x$ be a point in the ball of radius $\mu$ centered at $z$. Let $v_x := z - x$ and consider the function

$$t \mapsto f(x + t v_x),$$

which takes the value $f(z)$ when $t = 1$ and the value $f(x)$ when $t = 0$. Differentiating with respect to $t$ using the chain rule gives $f'(x + t v_x) v_x$, where $f'$ denotes the derivative (the Jacobian matrix) of $f$. Hence

$$-f(x) = f(z) - f(x) = \int_0^1 f'(x + t v_x)\, v_x \,dt.$$


This gives

$$-f(x) - f'(x) v_x = -f(x) - f'(x)(z - x) = \int_0^1 \bigl[f'(x + t v_x) - f'(x)\bigr] v_x \,dt.$$

Applying $[f'(x)]^{-1}$ (which we assume to exist) gives the analogue of (6):

$$\Bigl(x - [f'(x)]^{-1} f(x)\Bigr) - z = [f'(x)]^{-1} \int_0^1 \bigl[f'(x + t v_x) - f'(x)\bigr] v_x \,dt.$$


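Assume that

$$\|[f'(y)]^{-1}\| \le \rho^{-1} \tag{11}$$

$$\|f'(y_1) - f'(y_2)\| \le \delta \|y_1 - y_2\| \tag{12}$$

for all $y, y_1, y_2$ in the ball of radius $\mu$ about $z$, and assume also that $\mu \le \rho/\delta$ holds. Setting $x_{\text{old}} = x$ and

$$x_{\text{new}} := x_{\text{old}} - [f'(x_{\text{old}})]^{-1} f(x_{\text{old}})$$

gives

$$\|x_{\text{new}} - z\| \le \frac{\delta}{\rho} \int_0^1 t\,\|v_x\|\,\|v_x\| \,dt = \frac{\delta}{2\rho} \|x_{\text{old}} - z\|^2.$$

In particular, since $\|x_{\text{old}} - z\| \le \mu \le \rho/\delta$, we get $\|x_{\text{new}} - z\| \le \tfrac12 \|x_{\text{old}} - z\|$, so $x_{\text{new}}$ stays in the ball and the estimate can be repeated. From here on we can argue as in the one dimensional case.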
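A minimal pure-Python sketch of the vector iteration $x_{\text{new}} = x_{\text{old}} - [f'(x_{\text{old}})]^{-1} f(x_{\text{old}})$ in two dimensions. The test system $f(x, y) = (x^2 + y^2 - 1,\; x - y)$, with zero $z = (1/\sqrt2, 1/\sqrt2)$, is our own example, and the $2 \times 2$ linear system is solved by Cramer's rule rather than by forming the inverse matrix. The printed distances to $z$ shrink roughly quadratically, as the estimate above predicts.

```python
import math


def f(v):
    x, y = v
    return (x * x + y * y - 1.0, x - y)


def jacobian(v):
    x, y = v
    return ((2.0 * x, 2.0 * y),
            (1.0, -1.0))


def newton_2d(v, steps=8):
    """Vector Newton iteration; returns the list of iterates."""
    iterates = [v]
    for _ in range(steps):
        (a, b), (c, d) = jacobian(v)
        f1, f2 = f(v)
        det = a * d - b * c
        if det == 0.0:
            raise ZeroDivisionError("singular Jacobian")
        # Solve J @ step = f(v) by Cramer's rule, then x_new = x_old - step.
        s1 = (d * f1 - b * f2) / det
        s2 = (a * f2 - c * f1) / det
        v = (v[0] - s1, v[1] - s2)
        iterates.append(v)
    return iterates


z = (1 / math.sqrt(2), 1 / math.sqrt(2))
for it in newton_2d((1.0, 0.5)):
    print(it, math.dist(it, z))
```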



Implementation.

Problems with implementation of Newton’s method.

We return to the one dimensional case. In numerical practice we have to deal with two problems: it may not be easy to encode the derivative, and we may not be able to tell in advance whether the conditions for Newton’s method to work are indeed fulfilled. In case $f$ is a polynomial, MATLAB has an efficient command “polyder” for computing the derivative of $f$. Otherwise we replace the derivative by the slope of the secant, which requires the input of two initial values, call them $x_-$ and $x_c$, and replaces the derivative in Newton’s method by

$$f'_{\text{app}}(x_c) = \frac{f(x_c) - f(x_-)}{x_c - x_-}.$$


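So at each stage of the Newton iteration we carry along two values of $x$: the “current value”, denoted say by xc, and the “old value”, denoted by $x_-$. We also carry along two values of $f$: the value of $f$ at xc, denoted by fc, and the value of $f$ at $x_-$, denoted by $f_-$. So the Newton iteration will look like

```matlab
% xm and fm stand for x_- and f_- (MATLAB names cannot contain a minus sign)
fpc = (fc - fm)/(xc - xm);          % secant slope, the approximation f'_app(xc)
xnew = xc - fc/fpc;                 % Newton step using the approximate derivative
xm = xc; fm = fc;                   % the current values become the old values
xc = xnew; fc = feval(fname, xc);   % evaluate f at the new point
```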


In the last line, the command feval is MATLAB’s command for evaluating a function: if fname is a string (that is, a name enclosed in single quotes ' ') giving the name of a function, then feval(fname,x) evaluates the function at the point x.
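For readers without MATLAB, here is a rough Python transcription of the same secant-Newton loop. It is our sketch: `secant_newton`, `tol` and `max_iter` are hypothetical names, and the stopping rule is our choice. Python cannot use $x_-$ as a name either, so `x_old` and `f_old` play that role.

```python
import math


def secant_newton(f, x_old, xc, tol=1e-12, max_iter=100):
    """Newton's method with f' replaced by the secant slope through x_old and xc."""
    f_old, fc = f(x_old), f(xc)
    for _ in range(max_iter):
        if fc == 0.0:
            return xc                       # landed exactly on a zero
        fpc = (fc - f_old) / (xc - x_old)   # secant slope f'_app(xc)
        if fpc == 0.0:
            raise ZeroDivisionError("secant slope vanished")
        xnew = xc - fc / fpc                # Newton step with the approximate slope
        x_old, f_old = xc, fc               # current values become the old values
        xc, fc = xnew, f(xnew)
        if abs(xc - x_old) < tol:
            return xc
    raise RuntimeError("no convergence within max_iter steps")


print(secant_newton(lambda x: x * x - 2.0, 1.0, 2.0), math.sqrt(2.0))
```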


The second issue, that of deciding whether Newton’s method should be used at all, is handled as follows. If the zero in question is a critical point, so that $f'(z) = 0$, the analysis above breaks down and we cannot expect Newton’s method to work well. So let us assume that $f'(z) \neq 0$, which means that $f$ changes sign at $z$, a fact that we can verify by looking at the graph of $f$. So assume that we have found an interval $[a, b]$ containing the zero we are looking for, and such that $f$ takes on opposite signs at the end-points:

$$f(a) f(b) < 0.$$


A sure but slow method of narrowing in on a zero of $f$ contained in this interval is the “bisection method”: evaluate $f$ at the midpoint $\frac12(a + b)$. If this value has a sign opposite to that of $f(a)$, replace $b$ by $\frac12(a + b)$. Otherwise replace $a$ by $\frac12(a + b)$. This produces an interval of half the length of $[a, b]$ containing a zero.

The idea now is to check at each stage whether Newton’s method leaves us in the interval, in which case we apply it, or else we apply the bisection method.

We now turn to the more difficult existence problem.
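The safeguarded strategy described above (a Newton step when it stays inside the current bracketing interval, a bisection step otherwise) might be sketched in Python as follows. This is our illustration, not the lecture's code: the name `newton_bisect`, the tolerance, and the rule of shrinking the bracket at every evaluated point are assumptions.

```python
import math


def newton_bisect(f, fprime, a, b, tol=1e-12, max_iter=200):
    """Find a zero of f in [a, b], given f(a) * f(b) < 0.

    Take a Newton step when it lands strictly inside the current bracket,
    otherwise bisect. The bracket is updated at every evaluated point, so it
    always contains a sign change.
    """
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("need f(a) * f(b) < 0")
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx = f(x)
        if fx == 0.0:
            return x
        if fa * fx < 0:          # sign change in [a, x]
            b, fb = x, fx
        else:                    # sign change in [x, b]
            a, fa = x, fx
        d = fprime(x)
        x_new = x - fx / d if d != 0 else None
        if x_new is None or not (a < x_new < b):
            x_new = 0.5 * (a + b)    # Newton would leave the interval: bisect instead
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter steps")


# Plain Newton started at x0 = 4.5 diverges for atan; the safeguarded version does not.
print(newton_bisect(math.atan, lambda x: 1 / (1 + x * x), -1.0, 10.0))
print(newton_bisect(lambda x: math.cos(x) - x, lambda x: -math.sin(x) - 1, 0.0, 1.0))
```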




The existence theorem.

For the purposes of the proof, in order to simplify the notation, let us assume that we have “shifted our coordinates” so as to take $x_0 = 0$. Also let

$$B = \{x : |x| \le 1\}.$$

We need to assume that $P'(x)$ is nowhere zero, and that $P''(x)$ is bounded. In fact, we assume that there is a constant $K$ such that

$$|P'(x)^{-1}| \le K, \quad |P''(x)| \le K, \quad \forall x \in B. \tag{13}$$





Proposition.

Let $\tau = \frac32$ and choose the $K$ in (13) so that $K \ge 2^{3/4}$. Let

$$c = \frac{8}{3} \ln K.$$

Then if

$$|P(0)| \le K^{-5}, \tag{14}$$

the recursion (5) starting with $x_0 = 0$ satisfies

$$x_n \in B \quad \forall n \tag{15}$$

and

$$|x_n - x_{n-1}| \le e^{-c\tau^n}. \tag{16}$$

In particular, the sequence $\{x_n\}$ converges to a zero of $P$.
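To see the proposition in action, here is a Python experiment with a hypothetical polynomial of our own choosing, $P(x) = x + x^2/4 - 0.03$. On $B$ we have $P'(x) = 1 + x/2 \in [\tfrac12, \tfrac32]$ and $P'' = \tfrac12$, so (13) holds with $K = 2 \ge 2^{3/4}$, and $|P(0)| = 0.03 \le K^{-5} = 0.03125$, which is (14). The script checks (15) and (16) for the first six steps only; beyond that the bound $e^{-c\tau^n}$ falls below the size of floating-point rounding errors, so a numerical comparison would no longer be meaningful.

```python
import math

K = 2.0
tau = 1.5
c = (8.0 / 3.0) * math.log(K)      # the constant c of the proposition


def P(x):
    return x + x * x / 4.0 - 0.03  # hypothetical example polynomial


def dP(x):
    return 1.0 + x / 2.0


xs = [0.0]                          # coordinates shifted so that x_0 = 0
for n in range(1, 7):
    xs.append(xs[-1] - P(xs[-1]) / dP(xs[-1]))   # Newton step (5)

for n in range(1, 7):
    step = abs(xs[n] - xs[n - 1])
    bound = math.exp(-c * tau**n)
    print(f"n={n}  |x_n - x_(n-1)| = {step:.3e}   e^(-c tau^n) = {bound:.3e}")
```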




To prove: $x_n \in B$ for all $n$ (15) and $|x_n - x_{n-1}| \le e^{-c\tau^n}$ (16).

In fact, we will prove a somewhat more general result. So we will let $\tau$ be any real number satisfying

$$1 < \tau < 2,$$

and we will choose $c$ in terms of $K$ and $\tau$ to make the proof work. First of all we notice that (15) is a consequence of (16) if $c$ is sufficiently large. In fact,

$$x_j = (x_j - x_{j-1}) + \cdots + (x_1 - x_0),$$

so

$$|x_j| \le |x_j - x_{j-1}| + \cdots + |x_1 - x_0|.$$


Using (16) for each term on the right gives

$$|x_j| \le \sum_{n=1}^{j} e^{-c\tau^n} < \sum_{n=1}^{\infty} e^{-c\tau^n} < \sum_{n=1}^{\infty} e^{-cn(\tau-1)} = \frac{e^{-c(\tau-1)}}{1 - e^{-c(\tau-1)}}.$$

Here the third inequality follows from writing $\tau = 1 + (\tau - 1)$, so that by the binomial formula

$$\tau^n = 1 + n(\tau - 1) + \cdots > n(\tau - 1)$$

since $\tau > 1$. The equality is obtained by summing the geometric series.


So if we choose $c$ sufficiently large that

$$\frac{e^{-c(\tau-1)}}{1 - e^{-c(\tau-1)}} \le 1, \tag{17}$$

then (15) follows from (16). This choice of $c$ is conditioned by our choice of $\tau$. But at least we now know that if we can arrange that (16) holds, then by choosing a possibly larger value of $c$ (so that (16) continues to hold) we can guarantee that the algorithm keeps going.
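Solving (17) for $c$ makes “sufficiently large” explicit. Since $0 < e^{-c(\tau-1)} < 1$ for $c > 0$, the condition $q/(1-q) \le 1$ is the same as $q \le \tfrac12$:

```latex
\frac{e^{-c(\tau-1)}}{1-e^{-c(\tau-1)}} \le 1
\iff e^{-c(\tau-1)} \le \tfrac12
\iff c \ge \frac{\ln 2}{\tau - 1}.
```

For $\tau = \tfrac32$ this reads $c \ge 2\ln 2$, i.e. $e^{c} \ge 4$, which is exactly the condition used at the end of the proof of the proposition.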





Choosing c so that the induction works.

$$|x_{n+1} - x_n| \le K^2 |x_n - x_{n-1}|^2 \le K^2 e^{-2c\tau^n}.$$

So in order to pass from $n$ to $n + 1$ in (16) we must have

$$K^2 e^{-2c\tau^n} \le e^{-c\tau^{n+1}},$$

or

$$K^2 \le e^{c(2-\tau)\tau^n}. \tag{21}$$

Since $1 < \tau < 2$, we can arrange for this last inequality to hold for $n = 1$, and hence for all $n$, if we choose $c$ sufficiently large.
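Spelled out: for $c > 0$ we have $c(2-\tau) > 0$, and $\tau^n \ge \tau$ for $n \ge 1$, so the right-hand side of (21) grows with $n$ and the case $n = 1$ is the binding one. Solving it for $c$ gives

```latex
K^2 \le e^{c(2-\tau)\tau}
\iff c \ge \frac{2\ln K}{(2-\tau)\,\tau}
\qquad\text{(for } \tau = \tfrac32:\ c \ge \tfrac83 \ln K\text{)}.
```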





Getting started.

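To get started, we must verify (16) for $n = 1$. This says

$$|S_0 P(0)| \le e^{-c\tau},$$

which, since $S_0 = P'(0)^{-1}$ satisfies $|S_0| \le K$ by (13), holds provided

$$|P(0)| \le \frac{e^{-c\tau}}{K}. \tag{22}$$

So we have proved: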





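$$|P'(x)^{-1}| \le K, \quad |P''(x)| \le K, \quad \forall x \in B. \tag{13}$$

$$\frac{e^{-c(\tau-1)}}{1 - e^{-c(\tau-1)}} \le 1 \tag{17}$$

$$K^2 \le e^{c(2-\tau)\tau}. \tag{21}$$

$$|P(0)| \le \frac{e^{-c\tau}}{K}. \tag{22}$$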

Theorem

Suppose that (13) holds and we have chosen $K$ and $c$ so that (17) and (21) hold. Then if $P(0)$ satisfies (22), the Newton iteration scheme converges exponentially in the sense that (16) holds:

$$|x_n - x_{n-1}| \le e^{-c\tau^n}. \tag{16}$$





If we choose $\tau = \frac32$ as in the proposition, let $c$ be given by

$$K^2 = e^{3c/4},$$

so that (21) just holds. This is our choice in the proposition. The inequality $K \ge 2^{3/4}$ implies that $e^{3c/4} \ge 4^{3/4}$, or

$$e^c \ge 4.$$

This implies that

$$e^{-c/2} \le \frac12,$$

so (17) holds. Then

$$e^{-c\tau} = e^{-3c/2} = K^{-4},$$

so (22) becomes $|P(0)| \le K^{-5}$, completing the proof of the proposition.
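The arithmetic of this choice can be double-checked numerically. The Python sketch below is ours (`check_constants` is a hypothetical helper): it sets $\tau = \tfrac32$ and $c = \tfrac83 \ln K$, then tests that (21) holds with equality at $n = 1$, that (17) holds, and that the right side of (22) equals $K^{-5}$, for a few values $K \ge 2^{3/4}$. A small relative tolerance absorbs rounding, since at $K = 2^{3/4}$ the inequality (17) is an equality.

```python
import math


def check_constants(K, tol=1e-12):
    """Checks for tau = 3/2 and c = (8/3) ln K (the proposition assumes K >= 2**0.75)."""
    tau = 1.5
    c = (8.0 / 3.0) * math.log(K)
    q = math.exp(-c * (tau - 1))                 # e^{-c(tau - 1)} = e^{-c/2}
    return {
        "(21) with equality": math.isclose(K**2, math.exp(c * (2 - tau) * tau), rel_tol=tol),
        "(17)": q / (1 - q) <= 1 + tol,
        "(22) is K^-5": math.isclose(math.exp(-c * tau) / K, K**-5, rel_tol=tol),
    }


for K in (2**0.75, 2.0, 10.0):
    print(K, check_constants(K))
```

For $K$ below $2^{3/4}$ the check of (17) fails, which is why the proposition imposes that lower bound.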





Review.

We have put in all the gory details, but it is worth reviewing the guts of the argument, and seeing how things differ from the special case of finding the square root. Our algorithm is

$$x_{n+1} = x_n - S_n[P(x_n)], \tag{23}$$

where $S_n$ is chosen as in (18). Taylor’s formula gave (20), and with the choice (18) we get

$$|x_{n+1} - x_n| \le K^2 |x_n - x_{n-1}|^2. \tag{24}$$





In contrast to (4), we do not know that $K \le 1$, so, once we get going, we can’t quite conclude that the error vanishes as

$$r^{\tau^n}$$

with $\tau = 2$. But we can arrange that we eventually have such exponential convergence with any $\tau < 2$.




Basins of attraction.

The more decisive difference has to do with the “basins of attraction” of the solutions. For the square root, starting with any positive number ends us up with the positive square root. This was the effect of the $e_{n+1} < \frac12 e_n$ argument, which eventually gets us to the region where the exponential convergence takes over. Every negative number leads us to the negative square root. So the “basin of attraction” of the positive square root is the entire positive half axis, and the “basin of attraction” of the negative square root is the entire negative half axis. The only “bad” point belonging to no basin of attraction is the point 0.





Even for cubic polynomials the global behavior of Newton’s method is extraordinarily complicated. For example, consider the polynomial

$$P(x) = x^3 - x,$$

with roots at 0 and $\pm 1$. We have

$$x - \frac{P(x)}{P'(x)} = x - \frac{x^3 - x}{3x^2 - 1} = \frac{2x^3}{3x^2 - 1},$$

so Newton’s method in this case says to set

$$x_{n+1} = \frac{2x_n^3}{3x_n^2 - 1}. \tag{25}$$

There are obvious “bad” points where we can’t get started, due to the vanishing of the denominator, $P'(x)$. These are the points $x = \pm\sqrt{1/3}$. These two points are the analogues of the point 0 in the square root algorithm.
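A small Python sketch of the map (25) (our illustration; `newton_map` and `orbit` are hypothetical names). It confirms two facts from the discussion of this example: the map sends $-\tfrac12$ exactly to the root 1, and a start to the right of 1 produces a decreasing sequence that converges to 1. In floating point the denominator will rarely be exactly zero at $\pm\sqrt{1/3}$, so the guard only catches an exact zero; nearby points simply produce very large iterates.

```python
def newton_map(x):
    """Newton step (25) for P(x) = x^3 - x: x -> 2x^3 / (3x^2 - 1)."""
    d = 3.0 * x * x - 1.0
    if d == 0.0:
        raise ZeroDivisionError("P'(x) = 0: cannot take a Newton step")
    return 2.0 * x**3 / d


def orbit(x0, n):
    """Return [x_0, x_1, ..., x_n] under the map (25)."""
    xs = [x0]
    for _ in range(n):
        xs.append(newton_map(xs[-1]))
    return xs


print(newton_map(-0.5))     # lands exactly on the root 1
print(orbit(3.0, 8))        # decreases towards 1 from the right
```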




We know from the general theory that any point sufficiently close to 1 will converge to 1 under Newton’s method, and similarly for the other two roots, 0 and $-1$.





If $x > 1$, then $2x^3 > 3x^2 - 1$, since both sides agree at $x = 1$ and the left side is increasing faster, as its derivative is $6x^2$ while the derivative of the right hand side is only $6x$. This implies that if we start to the right of $x = 1$ we will stay to the right. The same argument shows that

$$2x^3 < 3x^3 - x$$

for $x > 1$. This is the same as

$$\frac{2x^3}{3x^2 - 1} < x,$$

which implies that if we start with $x_0 > 1$ we have $x_0 > x_1 > x_2 > \cdots$, and eventually we will reach the region where the exponential convergence takes over. So every point to the right of $x = 1$ is in the basin of attraction of the root $x = 1$. By symmetry, every point to the left of $x = -1$ will converge to $-1$.





But let us examine what happens in the interval $-1 < x_0 < 1$. For example, suppose we start with $x_0 = -\frac12$. Then one application of Newton’s method gives

$$x_1 = \frac{-.25}{3 \times .25 - 1} = 1.$$

In other words, one application of Newton’s method lands us on the root $x = 1$, right on the nose. Notice that although $-.5$ is halfway between the roots $-1$ and 0, we land on the farther root $x = 1$.





In fact, by continuity, if we start with $x_0$ close to $-.5$, then $x_1$ must be close to 1. So all points $x_0$ sufficiently close to $-.5$ will have $x_1$ in the region where exponential convergence to $x = 1$ takes over. In other words, the basin of attraction of $x = 1$ will include points to the immediate left of $-.5$, even though $-1$ is the closest root.





Here are the results of applying Newton’s method to the three close points 0.4472, 0.4475 and 0.4480 with ten iterations:

 n    x0 = 0.4472   x0 = 0.4475   x0 = 0.4480
 0      0.4472        0.4475        0.4480
 1     −0.4471       −0.4489       −0.4520
 2      0.4467        0.4577        0.4769
 3     −0.4443       −0.5162       −0.6827
 4      0.4301        1.3699       −1.5980
 5     −0.3576        1.1105       −1.2253
 6      0.1483        1.0146       −1.0500
 7     −0.0070        1.0003       −1.0034
 8      0.0000        1.0000       −1.0000
 9     −0.0000        1.0000       −1.0000
10      0.0000        1.0000       −1.0000
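The table can be regenerated with a few lines of Python (our sketch, using the map (25); the print formatting is our choice). Three inputs that agree to three decimal places end up at three different roots.

```python
def newton_map(x):
    """Newton step (25) for P(x) = x^3 - x."""
    return 2.0 * x**3 / (3.0 * x * x - 1.0)


starts = (0.4472, 0.4475, 0.4480)
orbits = []
for x0 in starts:
    xs = [x0]
    for _ in range(10):                 # ten iterations, as in the table
        xs.append(newton_map(xs[-1]))
    orbits.append(xs)

for n in range(11):
    print(f"{n:2d}  " + "  ".join(f"{xs[n]:8.4f}" for xs in orbits))
```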





Suppose we have a point $x$ which satisfies

$$\frac{2x^3}{3x^2 - 1} = -x.$$

So one application of Newton’s method lands us at $-x$, and, since the right-hand side of (25) is an odd function of $x$, a second application lands us back at $x$. The above equation is the same as

$$0 = 5x^3 - x = x(5x^2 - 1),$$

which has roots $x = 0, \pm\sqrt{1/5}$. So the points $\pm\sqrt{1/5}$ form a cycle of order two: Newton’s method cycles between these two points and hence does not converge to any root.
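A numerical check of the two-cycle (our sketch). It also evaluates the derivative of the map (25), $N'(x) = \dfrac{6x^2(x^2 - 1)}{(3x^2 - 1)^2}$, which we computed with the quotient rule; at $\pm 1/\sqrt5$ it equals $-6$, so the product over the cycle is $36 > 1$. The cycle is therefore repelling, which is consistent with the way the three nearby starting points in the table above drift away from $\pm 0.4472$.

```python
import math


def newton_map(x):
    """Newton step (25) for P(x) = x^3 - x."""
    return 2.0 * x**3 / (3.0 * x * x - 1.0)


def newton_map_derivative(x):
    """N'(x) for N(x) = 2x^3 / (3x^2 - 1), by the quotient rule."""
    return 6.0 * x * x * (x * x - 1.0) / (3.0 * x * x - 1.0) ** 2


p = 1.0 / math.sqrt(5.0)
q = newton_map(p)          # should be -p
r = newton_map(q)          # should be p again
multiplier = newton_map_derivative(p) * newton_map_derivative(q)
print(p, q, r, multiplier)   # multiplier is about 36: the 2-cycle repels
```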





In fact, in the interval $(-1, 1)$ there are infinitely many points that don’t converge to any root. We will return to a description of this complicated type of phenomenon later. If we apply Newton’s method to cubic or higher degree polynomials and to complex numbers instead of real numbers, the results are even more spectacular. This phenomenon was first discovered by Cayley, and was published in a short article which appeared in the second issue of the American Journal of Mathematics in 1879.





After describing Newton’s method, Cayley writes, concerning a polynomial with roots $A, B, C, \ldots$ in the complex plane:

“The problem is to determine the regions of the plane such that P, taken at pleasure anywhere within one region, we arrive ultimately at the point A, anywhere within another region we arrive at the point B, and so for the several points representing the root of the equation. The solution is easy and elegant for the case of a quadric equation; but the next succeeding case of a cubic equation appears to present considerable difficulty.”

This paper of Cayley’s was the starting point for many future investigations.





With the advent of computers, we can see how complicated the problem really is. The next slide shows, via color coding, the regions corresponding to the three roots of 1, i.e. the results of applying Newton’s method to the polynomial $x^3 - 1$. The roots themselves are indicated by the + signs.





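[Figure: basins of attraction of the three cube roots of 1 under Newton’s method for $x^3 - 1$, color-coded by root; the roots are marked with +.]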
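A rough way to explore this picture without graphics is to classify starting points $z_0 \in \mathbb{C}$ by the cube root of 1 that their Newton orbit approaches. The Python sketch below is our illustration (the name `basin_index`, the tolerance, the iteration cap and the grid are assumptions). It prints a coarse character map in which `0`, `1`, `2` label the roots $1, \omega, \omega^2$ with $\omega = e^{2\pi i/3}$, and `.` marks points that did not settle within the iteration cap (or hit $z = 0$, where $P'$ vanishes).

```python
import cmath

# The three cube roots of 1: 1, omega, omega^2.
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]


def basin_index(z, max_iter=60, tol=1e-10):
    """Index of the root that Newton's method for z^3 - 1 reaches from z, or None."""
    for _ in range(max_iter):
        if z == 0:
            return None                      # P'(0) = 0: cannot continue
        z = z - (z**3 - 1) / (3 * z**2)      # Newton step (5) for P(z) = z^3 - 1
        for k, root in enumerate(ROOTS):
            if abs(z - root) < tol:
                return k
    return None


# Coarse map of the square -1.5 <= Re z, Im z <= 1.5 (rows run from Im z = 1.5 down).
for row in range(13):
    y = 1.5 - 0.25 * row
    line = ""
    for col in range(25):
        x = -1.5 + 0.125 * col
        k = basin_index(complex(x, y))
        line += "." if k is None else str(k)
    print(line)
```

Refining the grid (and mapping indices to colors) gives pictures like the one described above; the intricate boundaries between the three basins are what Cayley found so difficult.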




Arthur Cayley (August 16, 1821 - January 26, 1895)


