SOLVING EQUATIONS ON A COMPUTER*

by J. F. Traub

Bell Telephone Laboratories, Incorporated Murray Hill, New Jersey

1. Introduction

When a problem is put on a computer, the finite length arithmetic of the machine plays an important role. Hence a substantial portion of this paper will be devoted to the effects of finite-length arithmetic. Certain basic ideas which are important in a far wider context than in the solution of equations will be discussed. Only a fraction of the paper will be devoted to specific methods. For additional material on these, the reader can consult the references.

2. The Effects of Finite Length Arithmetic

The discussion of finite length arithmetic will be illustrated by considering the solution of quadratic equations. Now, this seems a rather trivial topic, for by now the solution of quadratic equations by formula must have seeped down to the kindergarten level. And, indeed, when working with infinite precision arithmetic (the implicit assumption made by mathematicians), there are no problems. When working with

* To appear in The Use of Computers in the Solution of Engineering Problems, edited by G. L. Schrenk, Prentice-Hall. Based on a lecture given at the Towne School of Engineering, University of Pennsylvania.


a finite number of decimal places, some interesting points come up. Indeed, Forsythe [1] has written a paper on just this subject which will be drawn on heavily during this discussion. In particular, Forsythe's examples will be used for illustration.

We will assume that we are dealing with floating point numbers with eight significant decimal digits, normalized so that the magnitudes of the mantissas lie between 1 and 10. The exponents will be assumed to lie between -50 and 49.

The mathematical theorem which covers the solution of quadratic equations is as follows. If a, b, and c are any real numbers, and if a ≠ 0, then the quadratic equation

ax^2 + bx + c = 0

is satisfied by exactly two values of x, namely

x_1 = (-b + √(b^2 - 4ac))/(2a),    x_2 = (-b - √(b^2 - 4ac))/(2a).


Let us apply this theorem as a computer algorithm to some examples, using finite arithmetic.

Example 1: 6x^2 + 5x - 4 = 0. Then x_1 = .50000000, x_2 = -1.3333333.

The value of x_1 is exact while the value of x_2 is the correctly rounded result. Hence, the algorithm has done as good a job as it could possibly do.

Example 2: x^2 - 10^5 x + 1 = 0. The true solution rounded to ten significant figures is

x_1 = 1.000 000 000 x 10^5,    x_2 = 9.999 999 999 x 10^-6.

Furthermore, it can be shown that small perturbations in the data cause only small changes in x_1 and x_2. Yet when we apply the algorithm we find

x_1 = 1.0000000 x 10^5,    x_2 = 0.

Hence, x_2 has a relative error of 100%; not a single digit is correct. If an algorithm is to be used as the basis for a general computer library routine, then it must always work within the domain of its applicability. This example shows the above algorithm to be inadequate, or at least incomplete.
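The effect is easy to reproduce. A sketch in Python, using IEEE double precision instead of the paper's eight-digit decimal arithmetic; the coefficient 10^9 is our own choice, scaled up so that the same total cancellation occurs in binary, and the remedy via c/(a x_1), which the paper takes up later, is included for comparison:

```python
import math

def quadratic_naive(a, b, c):
    """Textbook formula: both roots via (-b ± sqrt(D)) / (2a)."""
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

def quadratic_stable(a, b, c):
    """Avoid cancellation: compute the large root first, then x2 = c/(a*x1)."""
    d = math.sqrt(b * b - 4 * a * c)
    # Choose the sign so that -b and the square root add instead of cancel.
    x1 = (-b - d) / (2 * a) if b > 0 else (-b + d) / (2 * a)
    return x1, c / (a * x1)

# Double-precision analogue of example 2: x^2 - 1e9*x + 1 = 0,
# with roots near 1e9 and 1e-9.
x1n, x2n = quadratic_naive(1.0, -1e9, 1.0)
x1s, x2s = quadratic_stable(1.0, -1e9, 1.0)
print(x2n)  # 0.0: every digit of the small root is lost
print(x2s)  # 1e-09: correctly rounded
```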


Example 3: x^2 - 4.0000000x + 3.9999999 = 0. The correctly rounded solution to ten significant figures is

x_1 = 2.000316228,    x_2 = 1.999683772.

If we apply the algorithm, we find

x_1 = x_2 = 2.0000000.

The computed roots are both in error by about .0003. That is, of 8 computed digits only 4 are correct. Indeed, the computer mistakenly finds a double root instead of two distinct roots.

At first sight, it seems that the accuracy we have obtained is disappointing and we are ready to again blame the algorithm. But before doing so, let's have another look at our example. Clearly our computed solution is the exact solution of the equation x^2 - 4x + 4 = 0. The coefficients of this equation differ by only one unit in the last decimal place from the given equation.


This illustrates an important contemporary idea in numerical mathematics, the idea of backward error analysis. In the backward approach, we say that the computed solution is the exact answer of a perturbed problem. If the perturbations are "small", we are in good shape. An alternative to a backward error analysis is a forward error analysis, in which one analyzes by how much the computed answers differ from the true ones. A backward approach often permits an easier analysis. Furthermore, the backward approach is particularly relevant to the engineer or scientist who has some idea of the number of significant digits in his input data. For if he can be assured that his computed answer is the exact solution to a problem whose input data agrees with his data to the number of correct digits of his data, then he should be satisfied.
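The same phenomenon appears at the representation level in binary floating point. In IEEE double precision, a constant term within half a unit in the last place of 4 is stored as 4.0 exactly, so a quadratic intended to have two distinct roots near 2 becomes, as actually stored, a polynomial with an exact double root. A tiny illustration (the perturbation 1e-16 is chosen for double precision, not taken from the paper):

```python
# Intended equation: x^2 - 4x + (4 - 1e-16) = 0, whose exact roots are the
# two distinct numbers 2 ± 1e-8.  The nearest double to 4 - 1e-16 is 4.0
# itself, so the problem the machine actually stores is
# x^2 - 4x + 4 = (x - 2)^2: an exact double root at 2.
c_intended = 4 - 1e-16
print(c_intended == 4.0)  # True: the perturbed and unperturbed problems coincide
```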

Our discussion indicates that our algorithm is satisfactory for example 3. But we are not happy with its performance on example 2. Let D = b^2 - 4ac. The trouble with example 2 is that 4ac is much smaller than b^2. Hence, when D is formed, the 4ac is completely lost. In the calculation of x_2 we form -b - √D. Since -b is positive, we are subtracting two numbers


which are nearly equal. (In this case they are actually equal.) This always leads to trouble because the significant digits are lost in the subtraction and only "noise" is left. This type of difficulty is easily cured by calculating x_2 = c/x_1. For example 2 we then find x_2 = 1.0000000 x 10^-5, a correctly rounded answer. To construct a quadratic equation solver which gives a good answer in every case is a nontrivial problem. Such a program has been written by Kahan.

3. Condition

A basic concept in numerical mathematics is that of condition. It is instructive to return to examples 2 and 3 of the preceding section. In example 2 we stated that small perturbations in the data caused only small perturbations in the solution. That is, the solution is well-determined by the data, and we say the problem is well-conditioned. Yet the answer provided by the algorithm is poor. The fault must lie with the algorithm. On the other hand, in example 3 the solution is not well determined by the data. Rather widely differing solutions stem from essentially the same data. Hence, we say the problem is ill-conditioned.

It is essential that an algorithm used as the basis of a routine in a computer library handle all problems within its domain of applicability as well as possible in the face of



finite-length arithmetic. It is also desirable that the algorithm perform reasonably well on problems which are on the borderline of the domain for which the routine was designed. These concepts are somewhat vague, but we refer to algorithms having these two properties as being robust.

4. More on Conditioning

To emphasize the point that the problem of example 3 is ill-conditioned, we will try to compute one of its two roots by iteration. We use Newton-Raphson,

x_{i+1} = x_i - P(x_i)/P'(x_i),

where P(x) is the polynomial whose zero is to be calculated. Now, for simple zeros this method is quadratically convergent. That is, the error at the (i+1)th step is proportional to the square of the error at the ith step.

We start with a good approximation to the zero at 2.000316228. We take x_0 = 2.0003. Hence, the initial error is about 10^-5 and we might expect the next approximation to be good to about ten significant figures. Using our eight significant figure floating point arithmetic we find

P(x_0) = -1.0000000 x 10^-7,    P'(x_0) = 6.0000000 x 10^-4,

and hence, x_1 = 2.0004666. We have lost a significant figure!


In our case P(x) is a parabola. Our initial approximation is to the left of the larger zero. Perhaps that's our trouble. If we start to the right of the larger zero, the Newton-Raphson iteration will be monotonically decreasing. So we try x_0 = 2.0004. We calculate x_1 = 2.0004. We haven't moved.

Thus Newton-Raphson is no more successful than the quadratic formula. The difficulty really lies with the problem.

We now show analytically why we cannot expect to get any closer to the root by using Newton's method than in the numerical example above. The difficulty lies in the fact that in the neighborhood of two zeros which lie close together we cannot accurately perform the function evaluation required by Newton's method. Let α be a zero of P(x) and let e be a small number. Then

P(α+e) = P'(α)e + P''(ξ)e^2/2,

where ξ lies in the interval spanned by α and α + e. Assume that P'(α) has order of magnitude e. Hence P(α+e) has order of magnitude e^2. Assume we are evaluating P(α+e) by synthetic division ("Horner's" method). Then at the last step of the evaluation process we add the constant term and by the above calculation end up with a quantity of size e^2. Hence the final step must involve the subtraction of quantities which differ by only e^2. Hence, what we have left is largely "noise".
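This loss is easy to observe. A sketch in double precision (about 16 digits) with a polynomial whose zeros 1 ± 10^-7 are nearly multiple: each Horner value near the zeros carries absolute rounding noise of roughly machine-epsilon size, while the true value there is only of size e^2 ≈ 10^-14, so a visible fraction of what is computed is noise. Exact rational arithmetic on the same stored coefficients supplies the reference values. The polynomial and grid are our own illustrative choices.

```python
from fractions import Fraction

# P(x) = x^2 - 2x + (1 - 1e-14), zeros 1 +/- 1e-7 (nearly multiple).
coeffs = [1.0, -2.0, 1.0 - 1e-14]   # highest power first

def horner(cs, x):
    """Evaluate by synthetic division in ordinary double precision."""
    r = 0.0
    for c in cs:
        r = r * x + c
    return r

def horner_exact(cs, x):
    """Same stored coefficients, evaluated in exact rational arithmetic."""
    r = Fraction(0)
    for c in cs:
        r = r * Fraction(x) + Fraction(c)
    return r

# Sample points spanning the two zeros.
pts = [1.0 + k * 1e-8 for k in range(-15, 16)]
errs = [abs(horner(coeffs, x) - float(horner_exact(coeffs, x))) for x in pts]
true_size = 1e-14            # order of magnitude of P near its zeros
print(max(errs), true_size)  # the noise is a visible fraction of the signal
```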


We emphasize that other algorithms would fare no better with this problem. This is because the two zeros are nearly multiple. Multiple or near-multiple zeros are a prime reason for polynomials being ill-conditioned. A general discussion of the condition of polynomials may be found in Wilkinson [2, pp. 38-41], who gives an example [2, pp. 41-43] of an ill-conditioned polynomial whose zeros are not near-multiple. A very readable discussion is given by Forsythe [3].

5. The Calculation of Zeros of General Functions

We consider methods for calculating zeros of a function f. In this and the following section we discuss the case where f is a function of one variable having a certain number of derivatives. The remainder of the chapter will be devoted to the important special case where f is a polynomial.

We first describe the method of bisection, where we need only assume that f is continuous. Let f be real and let a and b be such that f(a)f(b) < 0. If f((a+b)/2) = 0, we're done. If not, calculate the sign of f((a+b)/2). Depending on the sign we can determine whether the zero lies in (a, (a+b)/2) or ((a+b)/2, b). By repeating this procedure the zero may be calculated to any accuracy desired.
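The procedure transcribes directly. A Python sketch (the tolerance argument and test function are our additions):

```python
def bisect(f, a, b, tol=1e-12):
    """Find a zero of continuous f in [a, b], given a sign change f(a)f(b) < 0."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m            # landed exactly on a zero
        if fa * fm < 0:         # the zero lies in (a, m)
            b, fb = m, fm
        else:                   # the zero lies in (m, b)
            a, fa = m, fm
    return (a + b) / 2

root = bisect(lambda x: x * x - 2, 1.0, 2.0)
print(root)  # ~1.41421356..., i.e. sqrt(2)
```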

For analytic functions we can use a theorem from complex analysis (the argument principle, see Churchill [4, pp. 271-273]) which relates the number of zeros in a region bounded by a curve C to a certain contour integral on C. By successively refining the regions a zero may be


calculated more and more accurately. The integration is done numerically. A careful analysis may be found in Delves and Lyness [5].

The most common method for calculating a zero of an arbitrary function is by iteration. Iterative methods consist of:

A. An initial approximation
B. A procedure for calculating a sequence of approximations
C. A procedure for termination

The calculation of an initial approximation can be quite difficult. A good approximation often depends on specific properties of f.

Assume the sequence of approximations is generated according to a rule of the form

x_{i+1} = φ(x_i).

Then φ is called the iteration function. A general classification of iteration functions may be found in Traub [6] or [7].

We give a few examples of specific iteration functions in the next section.

It is a popular game to cook up iteration functions. The crux of the problem is to cook up good ones. One important property of an iteration function is its rate of convergence.


A quantitative measure of this is supplied by the notion of the order of an iteration function. If there exists a real number p and a nonzero constant C so that

lim_{x→α} |φ(x) - α| / |x - α|^p = C,

then we call p the order and C the asymptotic error constant. Actually the order depends on the function f whose zero is sought as well as on the iteration function. For example, the Newton-Raphson iteration function φ(x) = x - f(x)/f'(x) is second order if f has a continuous nonvanishing second derivative at a simple zero. For additional discussion see Traub [7, p. 173].

Roughly speaking, if a method 'is of order p and if an approximation has q significant figures, then the next approximation has pq significant figures.

We turn finally to the termination procedures. One

common termination procedure is to stop iteration when

|x_{i+1} - x_i| < ε.

A second common criterion is to stop when

|f(x_i)| < ε'.


Here ε and ε' are chosen a priori, depending on the accuracy desired in the answer.

6. Some Iteration Functions

We turn to some specific iteration functions. In this section we give iteration functions which may be used for any function with a sufficient number of derivatives. In the following section we discuss methods designed for polynomials. We use the notation f_i = f(x_i), f_i' = f'(x_i), etc. In discussing order we assume the zero is simple.

Perhaps the best known iteration functions are the Newton iteration function

x_{i+1} = x_i - f_i/f_i'

and the secant iteration function

x_{i+1} = x_i - f_i (x_i - x_{i-1})/(f_i - f_{i-1}).

The secant method has the advantage of not requiring derivative evaluations. Its rate of convergence is somewhat slower than that of Newton (approximately 1.618 as against 2), but if a derivative value costs the same to calculate as a function


value, then it can be shown [Traub (6, Appendix C)] that the secant method is "cheaper". The secant iteration is used by Rice [8] as the basis for a sophisticated program for computing a zero of a real function of one variable.
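Both iterations, with the two termination criteria of the previous section, can be sketched as follows (Python; the test function and tolerances are illustrative choices, not from the paper):

```python
def newton(f, fprime, x0, eps=1e-12, eps_f=1e-12, max_iter=50):
    """Newton iteration: x_{i+1} = x_i - f_i / f_i'."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        # Stop when successive iterates agree or the residual is tiny.
        if abs(x_new - x) < eps or abs(f(x_new)) < eps_f:
            return x_new
        x = x_new
    return x

def secant(f, x0, x1, eps=1e-12, eps_f=1e-12, max_iter=50):
    """Secant iteration: Newton with f' replaced by a difference quotient."""
    for _ in range(max_iter):
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        if abs(x2 - x1) < eps or abs(f(x2)) < eps_f:
            return x2
        x0, x1 = x1, x2
    return x1

f = lambda x: x * x - 2
print(newton(f, lambda x: 2 * x, 1.5))   # ~1.4142135623730951
print(secant(f, 1.0, 2.0))               # the same zero, with no derivative needed
```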

An iteration function which, like Newton-Raphson, requires an evaluation of f and f' at each step but which is of order approximately 2.73 is given by

x_{i+1} = x_i - u_i - u_i V,    u_i = f_i/f_i',

where V is a correction term formed from values computed at x_i and at the previous point x_{i-1}. Observe that u_i V is a correction to Newton. Near the zero, u_i is small and V need not be calculated too accurately.

An interesting iteration is specified by

x_{i+1} = x_i - u_i + u_i f(x_i - u_i) / (2f(x_i - u_i) - f_i).


This iteration function uses two evaluations of f (at x_i and x_i - u_i) and one evaluation of f' (at x_i), but is fourth order! It has the following geometric interpretation. Let z_i = x_i - u_i. Let Q be the point which bisects that segment of the tangent line at [x_i, f_i] which lies between [x_i, f_i] and [z_i, 0]. Then x_{i+1} is the intersection with the x-axis of the line through Q and [z_i, f(z_i)].
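An iteration with this evaluation pattern (f at x_i and at x_i - u_i, f' at x_i, fourth order) matches the two-point scheme known in the literature as the Traub-Ostrowski method; a Python sketch, assuming that identification:

```python
def traub_ostrowski(f, fprime, x0, n_steps=5):
    """Fourth-order iteration using f(x_i), f'(x_i), and f(x_i - u_i)."""
    x = x0
    for _ in range(n_steps):
        fx = f(x)
        u = fx / fprime(x)
        z = x - u                      # an ordinary Newton step
        fz = f(z)
        denom = 2 * fz - fx
        if denom == 0:
            return z                   # degenerate case: accept the Newton point
        x = x - u + u * fz / denom     # fourth-order correction of the Newton step
    return x

root = traub_ostrowski(lambda x: x ** 3 - 2, lambda x: 3 * x * x, 1.2)
print(root)  # ~1.2599210498948732, the cube root of 2
```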


7. The Calculation of Polynomial Zeros

Any of the general iteration functions given in the last section could be used for polynomials. But we would then not be taking full advantage of the fact that we are dealing with a polynomial.

There is considerable current interest in algorithms and programs for calculating polynomial zeros. There was a recent symposium devoted to "Constructive Aspects of the Fundamental Theorem of Algebra". Proceedings of the Symposium have been published [9].

Many of the methods for calculating polynomial zeros are not iterative. It seems worthwhile to survey the field, including both iterative and non-iterative methods. Although this survey will be far from complete, it will include most of the methods commonly used in practice. Although the number of apparently different methods may seem bewildering to the novice, the number of basic ideas which these methods exploit is rather small. In what follows, we take the degree of the polynomial to be n.

In one type of method a zero is isolated in successively smaller portions of the z-plane. An example of such a method is Lehmer's method. (See Ralston [10, pp. 355-359].)

Next we give several examples of root separation methods. In Graeffe's method (Ralston [10, pp. 359-364]) we form a sequence of polynomials whose roots are the squares of


the roots of the previous member of the sequence. In Graeffe's method all the zeros are calculated simultaneously. In Bernoulli's method [11, Chapter 7] we generate an expression which is a linear combination of roots to high powers. Bernoulli's method depends on the fact that the root or roots of largest magnitude will eventually dominate. The QD algorithm [12] is an extension of Bernoulli's method which permits simultaneous approximation of all the zeros. It has been suggested that the QD algorithm be used to obtain initial approximations to all the zeros, with a more rapidly convergent method such as Newton-Raphson being used to finish the calculation.

Another class of methods is illustrated by Bairstow's method. We seek a quadratic factor corresponding to a pair of complex zeros. Let x^2 + px + q be an approximate quadratic factor. Dividing, we have

P(x)/(x^2 + px + q) = Q(x) + R(x)/(x^2 + px + q).

The remainder R(x) is a linear polynomial whose two coefficients depend on p and q. We set these two coefficients equal to zero. This gives us two simultaneous equations in p and q, which we solve by an extension of Newton-Raphson iteration suitable for simultaneous equations. We repeat the process with the new p and q. See Ralston [10, pp. 376-378] for a more detailed description.
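A sketch of this scheme in Python. The remainder coefficients come from synthetic division by x^2 + px + q; for the two-dimensional Newton step we use a finite-difference Jacobian for brevity (Bairstow's classical recurrences for the exact derivatives are given in Ralston [10]). The test polynomial and starting values are illustrative.

```python
def quad_remainder(a, p, q):
    """Synthetic division of P(x) = a[0] x^n + ... + a[n] by x^2 + p x + q.
    Returns (r, s), where the remainder is r*x + s."""
    n = len(a) - 1
    b = [0.0, 0.0]                      # padding for b_{-2}, b_{-1}
    for k in range(n - 1):              # quotient coefficients b_0 .. b_{n-2}
        b.append(a[k] - p * b[-1] - q * b[-2])
    r = a[n - 1] - p * b[-1] - q * b[-2]
    s = a[n] - q * b[-1]
    return r, s

def bairstow_like(a, p, q, tol=1e-10, h=1e-7, max_iter=50):
    """Drive the remainder coefficients (r, s) to zero by Newton's method
    in the two unknowns (p, q), using finite-difference derivatives."""
    for _ in range(max_iter):
        r, s = quad_remainder(a, p, q)
        if abs(r) < tol and abs(s) < tol:
            break
        rp, sp = quad_remainder(a, p + h, q)
        rq, sq = quad_remainder(a, p, q + h)
        j11, j12 = (rp - r) / h, (rq - r) / h
        j21, j22 = (sp - s) / h, (sq - s) / h
        det = j11 * j22 - j12 * j21
        p -= (r * j22 - s * j12) / det   # Cramer's rule for the 2x2 Newton step
        q -= (s * j11 - r * j21) / det
    return p, q

# x^3 - 1 = (x^2 + x + 1)(x - 1): the complex pair lives in x^2 + x + 1.
p, q = bairstow_like([1.0, 0.0, 0.0, -1.0], 0.95, 0.95)
print(p, q)  # both near 1
```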


A very interesting method is Laguerre's method, defined by the iteration function

x_{i+1} = x_i - n f_i / (f_i' ± √H(x_i)),

where

H(x_i) = [(n-1)f_i']^2 - n(n-1) f_i f_i''

and the sign is chosen to make the denominator larger in magnitude. This method is guaranteed to converge if all the zeros are real and simple. A description may be found in Ralston [10, pp. 368-371].

8. Globally Convergent Iteration

A globally convergent method is one which works for all problems, that is, for all distributions of zeros. Traub's method [13], [14] is a globally convergent iterative method which does not require an initial approximation, and which will find the zeros in order of increasing magnitude. We will see the importance of controlling the order in which the zeros are found in the next section. The method is based on the following idea. Let α be a zero of P(x). Let V(x) = P(x)/(x-α). Then

x - P(x)/V(x) = α

for all x. Of course, we don't know V(x). However, we can construct a sequence of polynomials converging to V(x). Let


H^(0)(x) = P'(x),

H^(λ+1)(x) = (1/x) [ H^(λ)(x) - (H^(λ)(0)/P(0)) P(x) ],    λ = 0, 1, ... .

Then all the H^(λ)(x) are polynomials of degree at most n - 1. Let H̄^(λ)(x) be H^(λ)(x) divided by its leading coefficient. Then it can be shown that if P(x) has a smallest zero α,

lim_{λ→∞} H̄^(λ)(x) = P(x)/(x - α).    (1)

Hence, H̄^(λ)(x) can be used as an approximation to V(x). Since Newton-Raphson gives the answer in one step if we apply it to a linear polynomial, we should do very well by applying it to the rational function P(x)/H̄^(λ)(x), which by (1) is as close as we want to a linear polynomial.
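The no-shift recurrence can be carried out directly on coefficient arrays: the combination H^(λ)(x) - (H^(λ)(0)/P(0))P(x) has (up to roundoff) zero constant term by construction, so the division by x just drops that term. Since x - P(x)/H̄^(λ)(x) tends to the smallest zero, evaluating it at x = 0 gives an improving estimate. A sketch under these assumptions (the cubic with zeros 1, 3, 5 is our own test case; coefficients are stored lowest power first):

```python
# P(x) = (x - 1)(x - 3)(x - 5) = x^3 - 9x^2 + 23x - 15, lowest power first.
P = [-15.0, 23.0, -9.0, 1.0]
H = [23.0, -18.0, 3.0]          # H^(0) = P'(x)

for _ in range(40):
    c = H[0] / P[0]             # H^(lam)(0) / P(0)
    G = [h - c * p for h, p in zip(H + [0.0], P)]
    H = G[1:]                   # constant term cancels; dividing by x drops it
    H = [h / H[-1] for h in H]  # normalize to a monic polynomial

# Newton step for P/H-bar taken from x = 0: estimate of the smallest zero.
est = -P[0] / H[0]
print(est)  # ~1.0, the zero of smallest magnitude
```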

Analysis shows that this is indeed a globally convergent method. An appropriate generalization works for equimodular zeros and, in particular, for complex conjugate zeros.

The implementation of this algorithm, with all decisions made automatically by the computer, is discussed in a paper, "An Algorithm for an Automatic General Polynomial Solver," by Jenkins and Traub, which appears in [9].

The method discussed above has two stages. A globally convergent three-stage algorithm which incorporates "shifting" and which has certain advantages over the two-stage algorithm


has been studied by Jenkins and Traub [15]. This algorithm is specified as follows.

Stage One

H^(0)(x) = P'(x),

H^(λ+1)(x) = (1/x) [ H^(λ)(x) - (H^(λ)(0)/P(0)) P(x) ],    λ = 0, ..., M-1.

Stage Two

Let the distinct zeros of P be labeled α_i, i = 1, ..., j, let β be a positive number such that β < min_i |α_i|, and let s be such that |s| = β and such that

|s - α_1| < |s - α_i|,    i = 2, ..., j.

Let

H^(λ+1)(x) = (1/(x - s)) [ H^(λ)(x) - (H^(λ)(s)/P(s)) P(x) ],    λ = M, ..., L-1.

Stage Three

Take

s_{λ+1} = s_λ - P(s_λ)/H̄^(λ+1)(s_λ),    λ = L, L+1, ...,

and let

H^(λ+1)(x) = (1/(x - s_λ)) [ H^(λ)(x) - (H^(λ)(s_λ)/P(s_λ)) P(x) ].

It can be shown that for L sufficiently large, this iteration is globally convergent. See [15] for details. An ALGOL implementation may be found in Jenkins' Stanford thesis [16]. A Fortran implementation will soon be available.

9. Polynomial Deflation

Many root-finding methods calculate the roots one at a time. As an approximate root is found, it is divided out by synthetic division. This process is known as deflation.

The question naturally arises as to whether the successive computed zeros may suffer a progressive loss of accuracy. Wilkinson [2, pp. 55-65] has carried out a detailed analysis of the deflation process. He has shown that if a large zero is deflated first, then there may be a serious loss of accuracy. This is why, in the methods of the previous section, the zeros are calculated in increasing order of magnitude.
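Deflation by synthetic division is a one-line loop. A sketch (the quartic with zeros 1, 2, 3, 4 is our own example); following Wilkinson's analysis, one would deflate the zero of smallest magnitude, here 1, first:

```python
def deflate(a, r):
    """Divide a[0] x^n + ... + a[n] by (x - r) via synthetic division;
    return (quotient coefficients, remainder)."""
    q = [a[0]]
    for c in a[1:]:
        q.append(c + r * q[-1])
    return q[:-1], q[-1]

# P(x) = (x-1)(x-2)(x-3)(x-4) = x^4 - 10x^3 + 35x^2 - 50x + 24.
quotient, rem = deflate([1.0, -10.0, 35.0, -50.0, 24.0], 1.0)
print(quotient, rem)  # [1.0, -9.0, 26.0, -24.0] 0.0, i.e. (x-2)(x-3)(x-4)
```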


Another type of deflation instability can occur. It is possible for a polynomial to be well-conditioned but to contain factors which are ill-conditioned. For example, the zeros of P(z) = z^n - 1 are well-conditioned. However, the polynomial whose zeros are those nth roots of unity lying only on a half-circle is ill-conditioned if n is large.

Consider a polynomial with all its zeros lying on two half-circles of differing radii. If the zero-finder calculates the zeros in increasing order of magnitude, then the zeros on the smaller half-circle will be found first and we will suffer deflation instability.

There does not seem to be any known method for calculating the zeros in an order such that all deflation calculations are as accurate as possible. We know how to find any one zero; the state-of-the-art problem is to calculate all the zeros accurately.

10. A Posteriori Error Bounds

Assume we have calculated a set of numbers which purport to be "good" zeros of a polynomial. How do we decide whether they are as good as they should be? The accuracy we "deserve" depends on the condition of the polynomial, which we do not, of course, know.

Jenkins has written a program which calculates a posteriori error bounds. He supplies a set of disks in which the true zeros are guaranteed to lie. Effects of round-off are taken into account. We cannot give details here, and the interested reader is referred to Jenkins' thesis [16].


REFERENCES

1. Forsythe, G. E., Solving a Quadratic Equation on a Computer. Appears in The Mathematical Sciences, pp. 138-152, edited by COSRIMS and George Boehm, MIT Press, 1969.

2. Wilkinson, J. H., Rounding Errors in Algebraic Processes. Prentice-Hall, 1963.

3. Forsythe, G. E., Singularity and Near Singularity in Numerical Analysis. Amer. Math. Monthly 65 (1958), 229-240.

4. Churchill, R. V., Complex Variables and Applications. McGraw-Hill, 1960.

5. Delves, L. M. and Lyness, J. N., A Numerical Method for Locating the Zeros of an Analytic Function. Math. Comp. 21 (1967), pp. 543-560.

6. Traub, J. F., Iterative Methods for the Solution of Equations. Prentice-Hall, 1964.

7. Traub, J. F., The Solution of Transcendental Equations. Appears in Mathematical Methods for Digital Computers, Volume II, pp. 171-184, edited by A. Ralston and H. S. Wilf, John Wiley, 1967.

8. Rice, J. R., A Polyalgorithm for the Automatic Solution of Nonlinear Equations. Proceedings 1969 ACM Conference, pp. 179-183.

9. Dejon, B. and Henrici, P., editors, Constructive Aspects of the Fundamental Theorem of Algebra. Wiley-Interscience, 1969.

10. Ralston, A., A First Course in Numerical Analysis. McGraw-Hill, 1965.

11. Henrici, P., Elements of Numerical Analysis. John Wiley, 1964.


12. Henrici, P., The Quotient-Difference Algorithm. Appears in Mathematical Methods for Digital Computers, Volume II, pp. 37-6, edited by A. Ralston and H. S. Wilf, John Wiley, 1967.

13. Traub, J. F., A Class of Globally Convergent Iteration Functions for the Solution of Polynomial Equations. Math. Comp. 20 (1966), pp. 113-138.

14. Traub, J. F., The Calculation of Zeros of Polynomials and Analytic Functions. Proceedings of a Symposium on Mathematical Aspects of Computer Science, pp. 138-152, Amer. Math. Soc., 1967.

15. Jenkins, M. A. and Traub, J. F., A Three-Stage Variable-Shift Iteration for Polynomial Zeros and its Relation to Generalized Rayleigh Iteration. Computer Science Report 107, Stanford University, 1968. To appear in Numerische Mathematik.

16. Jenkins, M. A., Three-Stage Variable-Shift Iterations for the Solution of Polynomial Equations with A Posteriori Error Bounds for the Zeros. Stanford Dissertation, 1969.