Augustin A. Dubrulle

A Class of Numerical Methods for the Computation of Pythagorean Sums

Moler and Morrison have described an iterative algorithm for the computation of the Pythagorean sum $(a^2 + b^2)^{1/2}$ of two real numbers $a$ and $b$. This algorithm is immune to unwarranted floating-point overflows, has a cubic rate of convergence, and is easily transportable. This paper, which shows that the algorithm is essentially Halley's method applied to the computation of square roots, provides a generalization to any order of convergence. Formulas of orders 2 through 9 are illustrated with numerical examples. The generalization keeps the number of floating-point divisions constant and should be particularly useful for computation in high-precision floating-point arithmetic.

1. Introduction

As in [1], the Pythagorean sum of two real numbers $a$ and $b$ is defined by

$$h = (a^2 + b^2)^{1/2}.$$

Pythagorean sums are a frequent occurrence in numerical computations, and their evaluation in floating-point arithmetic requires some precautions, often overlooked, to prevent unwarranted overflows or underflows. In their paper, Moler and Morrison [1] describe an elegant iterative algorithm for the computation of Pythagorean sums that is immune to unwarranted overflows, has a cubic rate of convergence, and is easily transportable to any machine.

This paper presents a summary description of the Moler-Morrison algorithm and shows that it is essentially the application of Halley's method [2] to the computation of square roots. A generalization to any convergence order higher than linear is then proposed that preserves the main properties of the Moler-Morrison algorithm. Algorithms for orders 2 through 9 are illustrated with numerical examples. Since the general algorithm is based on a rational iteration, the number of divisions required per iteration is constant (two), while the number of multiplications is proportional to the order of convergence. High-order algorithms should thus be particularly interesting for multiple-precision floating-point computations.

2. The Moler-Morrison algorithm

In a rectangular coordinate system $\{x, y\}$ of origin $O$, we consider a sequence of points $\{A_n;\ n = 0, 1, 2, \ldots\}$, $A_0$ being in the first quadrant, at a distance $h$ from the origin. $A_{n+1}$ is derived from $A_n$ as follows. Let $H_n$ be the projection of $A_n$ on the $x$ axis and $M_n$ the midpoint of $A_nH_n$. $A_{n+1}$ is defined as the reflection of $A_n$ in $OM_n$ (Fig. 1). Elementary geometric considerations show that $A_{n+1}$ and $A_n$ are in the same quadrant, at the same distance from the origin, and that $A_{n+1}$ is closer than $A_n$ to the $x$ axis. Thus, from the definition of $A_0$, the set $\{A_n\}$ is on the quarter circle of radius $h$ in the first quadrant, with $A_{n+1}$ between $A_n$ and $A$, the point of abscissa $h$ on the $x$ axis. The sequence $\{A_n\}$ converges monotonically towards $A$.

Let $x_n$ and $y_n$ be the coordinates of $A_n$. From the above considerations, the sequence $\{x_n\}$ converges monotonically to $h$ from below, with

$$h = (x_n^2 + y_n^2)^{1/2}, \qquad n = 0, 1, 2, \ldots, \tag{1}$$

while the sequence $\{y_n\}$ converges monotonically to zero from above. Thus, $\{x_n\}$ can be considered as a sequence of approximants to the Pythagorean sum of two arbitrary numbers $x_0$ and $y_0$. From the relationship between $A_n$ and $A_{n+1}$, we now derive the iteration formulas providing the pair $(x_{n+1}, y_{n+1})$ from $(x_n, y_n)$.
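The geometric step lends itself to a quick check in exact rational arithmetic. The following is a minimal sketch, not part of the paper: the helper name `reflect_step` and the starting point $(4, 3)$ are assumptions chosen for illustration. It reflects $A_n$ in the line $OM_n$ and confirms that the distance to the origin is preserved while $x_n$ increases and $y_n$ decreases.

```python
from fractions import Fraction

def reflect_step(x, y):
    """Reflect A = (x, y) in the line through the origin O and M = (x, y/2)."""
    mx, my = x, y / 2                       # M: midpoint of A and its projection H = (x, 0)
    t = 2 * (x * mx + y * my) / (mx * mx + my * my)
    return t * mx - x, t * my - y           # reflection = 2 * (projection of A on OM) - A

# Hypothetical starting point A_0 = (4, 3), so that h = 5 exactly.
x, y = Fraction(4), Fraction(3)
for n in range(3):
    x_next, y_next = reflect_step(x, y)
    assert x_next**2 + y_next**2 == x**2 + y**2   # same distance from the origin
    assert x < x_next and 0 < y_next < y          # the point moves toward the x axis
    x, y = x_next, y_next
    print(n + 1, float(x), float(y))
```

Because `Fraction` arithmetic is exact, the equality test on the squared distance holds without any rounding tolerance.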

© Copyright 1983 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

IBM J. RES. DEVELOP. VOL. 27 NO. 6 NOVEMBER 1983

Without loss of generality, we henceforth assume that

$$x_0 \ge y_0 > 0,$$

considering that the Pythagorean addition is commutative and that the case $y_0 = 0$ is trivial. This assumption guarantees that

$$x_n \ge y_n > 0 \tag{2}$$

for any finite value of $n$.

From Eq. (1), we have

$$x_n^2 + y_n^2 = x_{n+1}^2 + y_{n+1}^2. \tag{3}$$

Since $A_{n+1}A_n$ is perpendicular to $OM_n$,

$$y_n(y_{n+1} - y_n) + 2x_n(x_{n+1} - x_n) = 0. \tag{4}$$

Combining Eqs. (3) and (4), we obtain

$$x_{n+1} = x_n\left(1 + \frac{2y_n^2}{4x_n^2 + y_n^2}\right), \qquad y_{n+1} = \frac{y_n^3}{4x_n^2 + y_n^2} \tag{5}$$

under the assumptions in the inequalities (2).

From Eqs. (5), the Moler-Morrison algorithm for the Pythagorean sum of two real numbers $a$ and $b$,

$$h = (a^2 + b^2)^{1/2},$$

can be expressed as follows:

$$\begin{aligned}
x_0 &= \max(|a|, |b|), \\
y_0 &= \min(|a|, |b|), \\
r_n &= (y_n/x_n)^2, \\
s_n &= r_n/(4 + r_n), \\
x_{n+1} &= x_n + 2s_nx_n, \\
y_{n+1} &= s_ny_n.
\end{aligned} \tag{6}$$

Equations (6) represent the formulation in [1]. For computations in floating-point arithmetic, the iterations are stopped when

$$4 = r_n + 4,$$

where $=$ denotes equality of machine floating-point representations. The monotonic convergence of $x_n$ to $h$ from below and the formulation in terms of $r_n$ exclude the possibility of unwarranted overflows. The asymptotic rate of convergence is cubic, and very few iterations are required to obtain results accurate to the working precisions of current computers. The simplicity, compactness, and accuracy of the algorithm make it an attractive choice for inclusion in mathematical software libraries.
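The cubic iteration translates almost line for line into code. The sketch below is illustrative only: the function name `pythag_cubic` and the safety cap `max_iter` are assumptions, and it runs in IEEE double precision rather than on the System/370 arithmetic discussed later in the paper.

```python
def pythag_cubic(a, b, max_iter=20):
    """Pythagorean sum (a^2 + b^2)^(1/2) by the cubic Moler-Morrison iteration."""
    x = max(abs(a), abs(b))
    y = min(abs(a), abs(b))
    if y == 0.0:                 # trivial case, also covers a = b = 0
        return x
    for _ in range(max_iter):    # max_iter is only a safety cap (assumption)
        r = (y / x) ** 2         # r <= 1, so squaring cannot overflow
        if 4.0 + r == 4.0:       # stopping test on machine representations
            break
        s = r / (4.0 + r)
        x = x + 2.0 * s * x      # x increases monotonically toward h
        y = s * y                # y decreases monotonically toward 0
    return x

print(pythag_cubic(119.0, 120.0))
print(pythag_cubic(1e200, 1e200))   # a*a + b*b would overflow here
```

The second call shows the point of the formulation: the naive expression `(a*a + b*b) ** 0.5` overflows for these operands, while the iteration only ever forms the ratio $y_n/x_n$.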

Figure 1 Geometric interpretation of the Moler-Morrison algorithm.

3. Relationship with Halley's method

Let $f(x)$ be a twice-differentiable real function of the real variable $x$. Halley's method (1694) is an iterative scheme for finding an approximation to a root of the equation

$$f(x) = 0$$

given an initial guess $x_0$ of that root. It is based on the iteration formula

$$x_{n+1} = x_n - \frac{2f_nf_n'}{2f_n'^2 - f_nf_n''}, \tag{7}$$

where $f_n$, $f_n'$, and $f_n''$ are the values of $f(x)$ and its first two derivatives at $x_n$. The asymptotic rate of convergence of the method is cubic for simple roots and linear for multiple roots. More details on this iteration can be found in [2].
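As an illustration of the iteration formula, here is a small generic sketch of Halley's method. The function name `halley`, its argument names, and the fixed step count are assumptions made for this example; the test function $x^2 - 2$ is chosen because square roots are the case of interest in what follows.

```python
def halley(f, df, d2f, x0, steps=5):
    """Halley's iteration: x <- x - 2 f f' / (2 f'^2 - f f'')."""
    x = x0
    for _ in range(steps):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        if fx == 0.0:            # already at a root
            break
        x = x - 2.0 * fx * dfx / (2.0 * dfx * dfx - fx * d2fx)
    return x

# Square root of 2 as the positive root of f(x) = x^2 - 2, starting from x0 = 1.
root = halley(lambda x: x * x - 2.0, lambda x: 2.0 * x, lambda x: 2.0, 1.0)
print(root)
```

Starting from 1, the successive iterates are 1.4, then about 1.4142132, after which the cubic convergence exhausts double precision within one or two further steps.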

We now apply Halley's method to the computation of Pythagorean sums. For a given pair $(x_0, y_0)$ such that

$$0 < y_0 \le x_0, \tag{8}$$

the Pythagorean sum

$$h = (x_0^2 + y_0^2)^{1/2} \tag{9}$$

is a root of the equation

$$f(x) = x^2 - h^2 = 0. \tag{10}$$

Replacing $f(x)$ by its expression (10) in formula (7), we obtain

$$x_{n+1} = x_n\left(1 + 2\,\frac{h^2 - x_n^2}{h^2 + 3x_n^2}\right). \tag{11}$$

From Eq. (11), we can derive that

$$0 \le x_n \le h \tag{12}$$

is a sufficient condition for the double inequality

$$x_n \le x_{n+1} \le h \tag{13}$$

to hold true. Since $x_0$ satisfies the inequalities (12) from the definitions (8) and (9), the inequalities (13) guarantee that $y_n$ exists such that

$$y_n^2 = h^2 - x_n^2. \tag{14}$$

The combination of Eqs. (11) and (14) finally yields

$$x_{n+1} = x_n\left(1 + \frac{2y_n^2}{4x_n^2 + y_n^2}\right)$$

and

$$y_{n+1} = \frac{y_n^3}{4x_n^2 + y_n^2},$$

which are precisely the equations (5) derived for the Moler-Morrison algorithm.

4. A generalization

While the Moler-Morrison algorithm is likely to be of optimal efficiency for the working precision of current computers, higher-order algorithms may be desirable (under certain conditions of computational cost) for calculations requiring higher levels of precision. In this section, we propose such algorithms for which the number of divisions is constant and the number of additions and multiplications varies linearly with the rate of convergence (rational iteration formulas).

The cubic rate of convergence of the Moler-Morrison algorithm is revealed by forming

$$h - x_{n+1} = \frac{(h - x_n)^3}{h^2 + 3x_n^2} \tag{15}$$

from Eq. (11).

An obvious generalization of Eq. (15) to an asymptotic convergence rate $k$ is

$$h - x_{n+1} = \frac{(h - x_n)^k}{F_k(x_n)}, \tag{16}$$

where $F_k(x)$ is a polynomial to be defined. The requirements we choose to impose upon $F_k(x)$ relate to computational considerations and the properties of the Moler-Morrison method; they are

• Monotonic convergence to $h$ from below: $0 \le x_n \le h$ is a sufficient condition for $x_n \le x_{n+1} \le h$ to hold true;

• Stability for positive $x$ iterates: the coefficients of $F_k(x)$ are positive (sufficient condition);

• Rational computability of the $x$ iterates: $x_{n+1}$ is a rational function of $h^2$ and $x_n$;

• Compatibility with the Moler-Morrison algorithm: Eqs. (15) and (16) coincide when $k = 3$.

The choice

$$F_k(x) = \frac{(h + x)^k + (h - x)^k}{2h}$$

satisfies the above conditions and defines the following iteration formulas:

$$x_{n+1} = h\,\frac{(h + x_n)^k - (h - x_n)^k}{(h + x_n)^k + (h - x_n)^k}, \qquad y_{n+1} = \frac{2hy_n^k}{(h + x_n)^k + (h - x_n)^k}. \tag{17}$$

An analysis of Eqs. (17) shows that two cases must be considered, based on the parity of $k$:

• $k$ is even; then the numerator of $x_{n+1}$ is an odd polynomial in $x_n$ and an even polynomial in $h$. The denominator is an even polynomial in $x_n$ and $h$. Thus, $x_{n+1}$ is the product of $x_n$ and a rational function of $x_n^2$ and $h^2$; using a similar approach, we find $y_{n+1}$ to be the product of $h$ and a rational function of $x_n^2$ and $h^2$;

• $k$ is odd; then $x_{n+1}$ is again the product of $x_n$ and a rational function of $x_n^2$ and $h^2$, while $y_{n+1}$ is the product of $y_n$ and a rational function of $x_n^2$ and $h^2$.

From these considerations, we see that a rational iteration mapping the pair $(x_n, y_n)$ into the pair $(x_{n+1}, y_{n+1})$ exists only when $k$ is odd, since $h$ is not a quantity available for computation. Similar analyses show that

$$r_{n+1} = (y_{n+1}/x_{n+1})^2$$

always has a rational expression in terms of $r_n$ and $h^2$. When $k$ is even, an iteration can thus be derived that maps the pair $(x_n, r_n)$ into $(x_{n+1}, r_{n+1})$. These two classes of iterations, together with Eqs. (17), constitute the basis for our generalization of the Moler-Morrison algorithm. We derive in the next sections the corresponding iteration formulas. As much as possible, the outline of the general algorithms follows that of the cubic algorithm defined by Eqs. (6).
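The order-$k$ formulas of Eqs. (17) can be checked exactly on a case where $h$ is rational. The sketch below is an assumption-laden illustration: the name `step_order_k` is hypothetical, and $h$ is passed in as a known quantity purely to verify the algebra, which is precisely what a practical algorithm cannot do. It uses the integer triple $119^2 + 120^2 = 169^2$ from Table 3.

```python
from fractions import Fraction

def step_order_k(x, y, h, k):
    """One step of Eqs. (17). h is assumed known here, for verification only."""
    plus = (h + x) ** k
    minus = (h - x) ** k
    denom = plus + minus
    x_next = h * (plus - minus) / denom
    y_next = 2 * h * y ** k / denom
    return x_next, y_next

h, x0, y0 = Fraction(169), Fraction(120), Fraction(119)
for k in range(2, 10):
    x1, y1 = step_order_k(x0, y0, h, k)
    # The new point stays on the circle of radius h.
    assert x1 * x1 + y1 * y1 == h * h
    # The ratio (h - x)/(h + x) is raised exactly to the power k.
    assert (h - x1) / (h + x1) == ((h - x0) / (h + x0)) ** k
    print(k, float(x1))
```

For $k = 2$ and $k = 3$ the printed first iterates agree with the second entries listed for orders 2 and 3 in Table 3.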


6. Formulas of odd order

The analysis of Section 4 showed that when $k$ is odd,

$$k = 2m + 1,$$

we have an iteration on the pair $(x_n, y_n)$ that we formulate below as a straight generalization of the Moler-Morrison algorithm. Using an approach similar to that of the previous section, we derive our iteration formulas from Eqs. (17). Defining first the coefficients of the functions $P_n$ and $S_n$ of $r_n$ (listed for orders 3 through 9 in Table 2), we express as follows the algorithm for the Pythagorean sum $(a^2 + b^2)^{1/2}$:

$$\begin{aligned}
x_0 &= \max(|a|, |b|), \\
y_0 &= \min(|a|, |b|), \\
r_n &= (y_n/x_n)^2, \\
x_{n+1} &= x_n + P_nS_nx_n, \\
y_{n+1} &= r_n^{m-1}S_ny_n.
\end{aligned} \tag{26}$$

The iteration is stopped when the equality of machine representations

$$\mathrm{fl}(1) = \mathrm{fl}(1 + r_n)$$

is satisfied, except for the case of the cubic iteration ($m = 1$), where the criterion

$$\mathrm{fl}(4) = \mathrm{fl}(4 + r_n)$$

is more economical and as effective.

One iteration of this algorithm requires the following amount of floating-point arithmetic:

$(2m + 1)$ additions, $(3m + 1)$ multiplications, and 2 divisions

(with one less addition for the cubic iteration). We can see that the odd-order iterations are more efficient than their even-order counterparts.

The formula of order 1 is of no interest at all. It is a limit case of linear convergence for which

$$x_{n+1} = x_n, \qquad y_{n+1} = y_n$$

from Eqs. (17).

7. An estimate of the maximum number of iterations

Let $\varepsilon_n$ denote the relative distance of the $n$th iterate to the Pythagorean sum for the formula of order $k$:

$$\varepsilon_n = 1 - \frac{x_n}{h}. \tag{27}$$

Defining

$$u_n = \frac{\varepsilon_n}{2 - \varepsilon_n}, \tag{28}$$

we obtain, from Eqs. (17) and (27),

$$u_{n+1} = u_n^k,$$

that is,

$$u_n = u_0^{k^n} \quad\text{and}\quad \frac{\varepsilon_n}{2} = \frac{u_0^{k^n}}{1 + u_0^{k^n}}. \tag{29}$$

We now assume that the algorithm is used with a machine where the floating-point number representation has $t$ digits in base $\beta$, and for which the relative error in number representation is bounded by machine precision,

$$\varepsilon = \gamma\beta^{1-t}, \tag{30}$$

where $\gamma$ is a rounding parameter taking the values 1 (chopped representation) or $\tfrac12$ (rounded representation). Ignoring rounding errors, the iteration of order $k$ ceases to be effective as soon as

$$\varepsilon_n < \varepsilon$$

or, from Eq. (29),

$$\frac{u_0^{k^n}}{1 + u_0^{k^n}} < \frac{\varepsilon}{2}.$$

This condition is satisfied a fortiori when

$$u_0^{k^n} < \frac{\varepsilon}{2}. \tag{31}$$

The use of this simpler criterion finds its justification in the consideration that

$$\mathrm{fl}(1 + u_0^{k^n}) = \mathrm{fl}(1)$$

when

$$u_0^{k^n} < \varepsilon,$$

where $\mathrm{fl}(\cdot)$ denotes number floating-point representation in our machine.
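The simplified stopping condition $u_0^{k^n} < \varepsilon/2$ is easy to evaluate by repeated exponentiation. In the sketch below, the function name `effective_iterations` is an assumption, and the operands 119 and 120 (for which $h = 169$ and $u_0 = (h - x_0)/(h + x_0)$) are borrowed from Table 3; the machine precision is the System/370 long-precision value $16^{-13}$.

```python
def effective_iterations(u0, k, eps):
    """Smallest n such that u0**(k**n) < eps/2; requires 0 < u0 < 1 and k >= 2."""
    n, v = 0, u0
    while v >= eps / 2.0:
        v = v ** k          # u_{n+1} = u_n ** k
        n += 1
    return n

eps_long = 16.0 ** -13                       # gamma = 1, beta = 16, t = 14
u0 = (169.0 - 120.0) / (169.0 + 120.0)       # a = 119, b = 120, h = 169
counts = {k: effective_iterations(u0, k, eps_long) for k in range(2, 10)}
print(counts)   # {2: 5, 3: 3, 4: 3, 5: 2, 6: 2, 7: 2, 8: 2, 9: 2}
```

These counts match the number of iterations needed to reach 169 in each row of Table 3.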

We can now derive the least upper bound $N$ of the number of effective iterations of order $k$.

The maximum value of $\varepsilon_0$, obtained for the Pythagorean sum of two equal numbers,

$$\varepsilon_0 = 1 - \frac{\sqrt2}{2},$$

defines the maximum of $u_0$ as

$$u_0 = \frac{\sqrt2 - 1}{\sqrt2 + 1} \tag{32}$$

from Eq. (28).

Combining Eqs. (30), (31), and (32), $N$ is the smallest value of $n$ for which

$$\left(\frac{\sqrt2 - 1}{\sqrt2 + 1}\right)^{k^n} < \frac{\gamma}{2}\,\beta^{1-t},$$

that is,

$$N = \mathrm{ceiling}\left(\log_k\frac{\ln\left(\frac{\gamma}{2}\beta^{1-t}\right)}{\ln\left(\frac{\sqrt2 - 1}{\sqrt2 + 1}\right)}\right), \tag{33}$$

where $\mathrm{ceiling}(w)$ denotes the smallest integer not less than $w$.

We now give two examples based on IBM System/370 arithmetic, short and long precision.

In short precision,

$$\beta = 16, \quad t = 6, \quad \gamma = 1,$$

and we obtain, from Eq. (33),

$N = 4$ for $k = 2$,

$N = 2$ for $3 \le k \le 8$, and

$N = 1$ for $k = 9$.

In long precision,

$$\beta = 16, \quad t = 14, \quad \gamma = 1,$$

we get

$N = 5$ for $k = 2$,

$N = 3$ for $3 \le k \le 4$, and

$N = 2$ for $5 \le k \le 9$.

These estimates were verified by numerical experiments.

8. Robustness and transportability

While the monotonic convergence of these algorithms guarantees that no unwarranted floating-point overflow can occur in the course of the computation, underflows can have an undesirable effect as illustrated below in the case of the cubic formula.

Assume that the Pythagorean sum

$$h = (u^2 + u^2)^{1/2} = u\sqrt2$$

is computed with the cubic iteration (6), $u$ being the machine underflow threshold, that is, the smallest positive machine floating-point number.

We have

$$r_0 = 1 \quad\text{and}\quad s_0 = 1/5.$$

The computation of the first iterate reduces to

$$x_1 = u + 2s_0u.$$

Since

$$2s_0 < 1,$$

we have

$$\mathrm{fl}(2s_0u) = 0$$

and the first $x$ iterate is $u$. For the same reason, the first $y$ iterate is zero, and the iteration terminates with the incorrect result $u$. The same phenomenon occurs for all our formulas where the iteration takes the form

$$x_{n+1} = x_n + \phi_nx_n \tag{34}$$

because $\phi_n$ is bounded above by $\tfrac12$. This is easily proved by considering the first of Eqs. (17),

$$x_{n+1} = h\,\frac{(h + x_n)^k - (h - x_n)^k}{(h + x_n)^k + (h - x_n)^k},$$

and the identity

$$h = (1 + r_n)^{1/2}x_n,$$

which yield the inequalities

$$0 \le x_{n+1} - x_n \le \left[(1 + r_n)^{1/2} - 1\right]x_n \le \tfrac12 r_nx_n,$$

or

$$0 \le \phi_n \le \tfrac12 r_n \le \tfrac12.$$

One obvious approach to remedy this defect is to perform a systematic scaling of $a$ and $b$ by a power of $\beta$, the base of the machine number representation, to bring $x_0$ into the range $(0, \beta)$, and to apply the corresponding inverse scaling to the result. This requires knowledge of $\beta$ and extra operations in all cases.
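The underflow defect can be reproduced with a toy arithmetic model. Everything in the sketch below is an assumption made for illustration: the helper `flush` imitates a machine that replaces any product smaller than a threshold by zero, and the threshold is set to 1.0 so that the effect is visible without approaching the real limits of the host arithmetic. The last lines apply a systematic scaling by a power of the base (here $2^{52}$) and the inverse scaling of the result.

```python
def flush(value, threshold):
    """Toy underflow model: a result below the threshold is replaced by zero."""
    return 0.0 if abs(value) < threshold else value

def cubic_with_flush(a, b, threshold):
    """Cubic iteration (6) in which the two products are subject to flush()."""
    x, y = max(abs(a), abs(b)), min(abs(a), abs(b))
    while y != 0.0:
        r = (y / x) ** 2
        if 4.0 + r == 4.0:
            break
        s = r / (4.0 + r)
        x = x + flush(2.0 * s * x, threshold)   # correction 2*s*x may underflow
        y = flush(s * y, threshold)             # y may underflow to zero
    return x

u = 1.0                                  # hypothetical underflow threshold
defective = cubic_with_flush(u, u, u)    # 2*s0*u = 0.4*u is flushed to zero
scale = 2.0 ** 52                        # power of the base used for scaling
remedied = cubic_with_flush(u * scale, u * scale, u) / scale
print(defective, remedied)
```

In this model the unscaled call returns $u$ instead of $u\sqrt2$, while the scaled call recovers $\sqrt2$ to working precision.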



Table 1 Formulas of even orders, iteration (25).

| k | P | Q |
|---|---|---|
| 2 | $r$ | $2 + r$ |
| 4 | $4r + 3r^2$ | $8 + 8r + r^2$ |
| 6 | $16r + 20r^2 + 5r^3$ | $32 + 48r + 18r^2 + r^3$ |
| 8 | $64r + 112r^2 + 56r^3 + 7r^4$ | $128 + 256r + 160r^2 + 32r^3 + r^4$ |
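The entries of Table 1 can be exercised with a short sketch of an even-order iteration on the pair $(x_n, r_n)$. Two points are assumptions rather than transcriptions of iteration (25): the update $x_{n+1} = x_n + (P/Q)x_n$ and the recurrence $r_{n+1} = (1 + r_n)r_n^k/(P + Q)^2$ are derived here from Eqs. (17) with $h^2 = (1 + r_n)x_n^2$, and the stopping test $\mathrm{fl}(1) = \mathrm{fl}(1 + r_n)$ is borrowed from the odd-order case. The names `TABLE_1`, `poly`, and `even_order_iterates` are hypothetical.

```python
# Coefficients of P and Q in ascending powers of r, copied from Table 1.
TABLE_1 = {
    2: ([0, 1], [2, 1]),
    4: ([0, 4, 3], [8, 8, 1]),
    6: ([0, 16, 20, 5], [32, 48, 18, 1]),
    8: ([0, 64, 112, 56, 7], [128, 256, 160, 32, 1]),
}

def poly(coeffs, r):
    """Evaluate a polynomial given by ascending coefficients (Horner's rule)."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * r + c
    return value

def even_order_iterates(a, b, k=2, max_iter=60):
    """Return the list of x iterates of the order-k formula, k in {2, 4, 6, 8}."""
    p_coeffs, q_coeffs = TABLE_1[k]
    x, y = max(abs(a), abs(b)), min(abs(a), abs(b))
    iterates = [x]
    if y == 0.0:
        return iterates
    r = (y / x) ** 2
    for _ in range(max_iter):
        if 1.0 + r == 1.0:                      # assumed stopping test
            break
        p, q = poly(p_coeffs, r), poly(q_coeffs, r)
        x = x + (p / q) * x                     # first division
        r = (1.0 + r) * r ** k / (p + q) ** 2   # second division (derived update)
        iterates.append(x)
    return iterates

for value in even_order_iterates(119.0, 120.0, k=2):
    print(f"{value:.13f}")
```

With these assumptions the order-2 run follows the order-2 row of Table 3, and one step of order 4 coincides with two steps of order 2, as the table also shows.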

Table 2 Formulas of odd orders, iteration (26).

| k | P | S |
|---|---|---|
| 3 | $2$ | $r/(4 + r)$ |
| 5 | $8 + 4r$ | $r/(16 + 12r + r^2)$ |
| 7 | $32 + 32r + 6r^2$ | $r/(64 + 80r + 24r^2 + r^3)$ |
| 9 | $128 + 192r + 80r^2 + 8r^3$ | $r/(256 + 448r + 240r^2 + 40r^3 + r^4)$ |
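The rows of Table 2 can be regenerated from Eqs. (17) by substituting $h^2 = (1 + r)x^2$ and expanding in powers of $r$. The closed form used below (sums of products of binomial coefficients) is a derivation made for this sketch and is not claimed to be the paper's own coefficient definition; likewise, the updates $x_{n+1} = x_n + PSx_n$ and $y_{n+1} = r_n^{m-1}Sy_n$ are an assumed reading of iteration (26). The names `odd_order_polys`, `horner`, and `pythag_odd` are hypothetical.

```python
from math import comb

def odd_order_polys(m):
    """Ascending coefficients of P and of the denominator of S for k = 2m + 1."""
    k = 2 * m + 1
    den = [sum(comb(k, 2 * i) * comb(m - i, j) for i in range(m - j + 1))
           for j in range(m + 1)]
    num = [sum(comb(k, 2 * i + 1) * comb(m - i, j) for i in range(m - j + 1))
           for j in range(m + 1)]
    # x_{n+1}/x_n = num/den = 1 + P*S with S = r/den, hence P = (num - den)/r.
    p = [num[j + 1] - den[j + 1] for j in range(m)]
    return p, den

def horner(coeffs, r):
    value = 0.0
    for c in reversed(coeffs):
        value = value * r + c
    return value

def pythag_odd(a, b, m=1, max_iter=20):
    """Pythagorean sum by the odd-order iteration of order 2m + 1."""
    p_coeffs, den_coeffs = odd_order_polys(m)
    x, y = max(abs(a), abs(b)), min(abs(a), abs(b))
    if y == 0.0:
        return x
    for _ in range(max_iter):
        r = (y / x) ** 2
        if 1.0 + r == 1.0:
            break
        s = r / horner(den_coeffs, r)
        x = x + horner(p_coeffs, r) * s * x
        y = r ** (m - 1) * s * y
    return x

for m in range(1, 5):
    print(2 * m + 1, odd_order_polys(m), pythag_odd(119.0, 120.0, m))
```

For $m = 1$ the generated polynomials reduce to $P = 2$ and $S = r/(4 + r)$, that is, to the cubic Moler-Morrison step.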

A preferred alternative, requiring knowledge of the underflow threshold $u$ and machine precision $\varepsilon$, consists of applying some scaling to the data only when warranted.

In Eq. (34), we observe that, in the absence of underflow, $x_{n+1}$ is computed accurately to machine precision if

$$\phi_n \ge \varepsilon.$$

Thus, the computation terminates correctly if

$$\phi_nx_n \ge \varepsilon x_n \ge u,$$

that is, when

$$x_n \ge u\varepsilon^{-1}.$$

Because $x_n$ is an increasing function of $n$, it will be sufficient to satisfy the condition

$$x_0 \ge u\varepsilon^{-1}.$$

These considerations define the following scaling rule:

If $\max(|a|, |b|) < u\varepsilon^{-1}$, multiply $a$ and $b$ by $\varepsilon^{-1}$; accordingly, multiply the result by $\varepsilon$.
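The scaling rule can be sketched for IEEE double precision as follows. The mapping of the paper's quantities onto `sys.float_info` is an assumption: `sys.float_info.min` (the smallest positive normalized number) stands in for the underflow threshold $u$, and `sys.float_info.epsilon` ($2^{-52}$) stands in for the machine precision $\varepsilon$. IEEE arithmetic also has gradual underflow, so the failure mode is milder than on a machine that flushes to zero. The function names are hypothetical.

```python
import sys

def pythag_cubic(a, b, max_iter=20):
    """Cubic iteration (6), repeated here so that the example is self-contained."""
    x, y = max(abs(a), abs(b)), min(abs(a), abs(b))
    if y == 0.0:
        return x
    for _ in range(max_iter):
        r = (y / x) ** 2
        if 4.0 + r == 4.0:
            break
        s = r / (4.0 + r)
        x = x + 2.0 * s * x
        y = s * y
    return x

def pythag_scaled(a, b):
    """Apply the scaling rule only when max(|a|, |b|) < u / eps."""
    eps = sys.float_info.epsilon      # assumed stand-in for machine precision
    u = sys.float_info.min            # assumed stand-in for the underflow threshold
    factor = 1.0 / eps                # 2**52, an exact power of the base
    if max(abs(a), abs(b)) < u * factor:
        return pythag_cubic(a * factor, b * factor) * eps
    return pythag_cubic(a, b)

print(pythag_scaled(3e-320, 4e-320))
print(pythag_scaled(3.0, 4.0))
```

Since the factor is a power of the base, the scaling and the inverse scaling introduce no rounding error of their own beyond the final rounding of the result.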


Table 3 Pythagorean sum of 119 and 120.

| Order | Iterates |
|---|---|
| 2 | 120.0000000000000 |
|   | 159.5549451828402 |
|   | 168.7209057465608 |
|   | 168.9997691646582 |
|   | 168.9999999998423 |
|   | 169.0000000000000 |
| 3 | 120.0000000000000 |
|   | 167.3605440280932 |
|   | 168.9999608618056 |
|   | 169.0000000000000 |
| 4 | 120.0000000000000 |
|   | 168.7209057465608 |
|   | 168.9999999998424 |
|   | 169.0000000000000 |
| 5 | 120.0000000000000 |
|   | 168.9526470501203 |
|   | 169.0000000000000 |
| 6 | 120.0000000000000 |
|   | 168.9919703649560 |
|   | 169.0000000000000 |
| 7 | 120.0000000000000 |
|   | 168.9986385471298 |
|   | 169.0000000000000 |
| 8 | 120.0000000000000 |
|   | 168.9997691646582 |
|   | 169.0000000000000 |
| 9 | 120.0000000000000 |
|   | 168.9999608618056 |
|   | 169.0000000000000 |

For example, long-precision computation with an IBM System/370 defines

$$\varepsilon = 16^{-13} \quad\text{and}\quad u = 16^{-65}$$

and the scaling condition

$$\max(|a|, |b|) < 16^{-52}$$

for a factor

$$\varepsilon^{-1} = 16^{13}.$$

It must be noted that any reasonable integer power of $\beta$ not less than $\varepsilon^{-1}$ can be used as a scaling factor and that values higher than $u\varepsilon^{-1}$ can replace the threshold of applicability with little loss of efficiency in order to achieve portability across a wide range of computers.

9. Examples

The iteration formulas of orders 2 through 9 are represented in Tables 1 and 2 by the functions $P$, $Q$, and $S$ that appear in iterations (25) and (26) (the iteration subscripts and the order superscripts are dropped for simplicity of notation).

Two examples of application of these formulas are displayed in Tables 3 and 4, obtained with an IBM System/370 in long-precision arithmetic.

10. Conclusion

While the efficiency of the Moler-Morrison algorithm may be optimum for most current computers, higher-order formulas can be useful for systems with slow division. They find an obvious use in computations involving multiple-precision software, where division can be particularly expensive. By reducing the number of iterations, they are also advantageous for implementation with interpretive high-level languages (e.g., APL).

11. Acknowledgment

The author is indebted to Cleve Moler for the communication of the cubic algorithm and its geometric derivation.

12. References

1. C. Moler and D. Morrison, Replacing Square Roots by Pythagorean Sums, IBM J. Res. Develop. 27, 577-581 (1983, this issue).

2. J. F. Traub, Iterative Methods for the Solution of Equations, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1964.

Received June 2, 1983

Table 4 Pythagorean sum of 19 and 180.

| Order | Iterates |
|---|---|
| 2 | 180.0000000000000 |
|   | 180.9972222648517 |
|   | 180.9999999786853 |
|   | 181.0000000000000 |
| 3 | 180.0000000000000 |
|   | 180.9999923053839 |
|   | 181.0000000000000 |
| 4 | 180.0000000000000 |
|   | 180.9999999786853 |
|   | 181.0000000000000 |
| 5 | 180.0000000000000 |
|   | 180.9999999999410 |
|   | 181.0000000000000 |
| 6 | 180.0000000000000 |
|   | 180.9999999999998 |
|   | 181.0000000000000 |
| 7 | 180.0000000000000 |
|   | 181.0000000000000 |
| 8 | 180.0000000000000 |
|   | 181.0000000000000 |
| 9 | 180.0000000000000 |
|   | 181.0000000000000 |

Augustin A. Dubrulle IBM Scientific Center, 1530 Page Mill Road, Palo Alto, California 94304. Mr. Dubrulle is currently working on numerical problems of seismic oil exploration. His education includes degrees in mathematics (MS, 1958) and physics (MS, 1959) from the University of Lille, France. Before joining the Palo Alto Scientific Center in 1977, he worked as a numerical analyst for IBM France (1962-1967), in the Data Processing Division (1968-1974), and in the General Products Division (1975-1976). His main interests are in numerical linear algebra, functional approximation, floating point arithmetic, and mathematical software. Mr. Dubrulle is a member of the Association for Computing Machinery, the Society for Industrial and Applied Mathematics, and the Special Interest Group on Numerical Mathematics.
