Ultra-arithmetic I: Function data types



Page 1: Ultra-arithmetic I: Function data types

Mathematics and Computers in Simulation XXIV (1982) 1-18 North-Holland Publishing Company

ULTRA-ARITHMETIC I: FUNCTION DATA TYPES

C. EPSTEIN *, W.L. MIRANKER and T.J. RIVLIN

IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A.

We develop ultra-arithmetic, a calculus for functions which is performable on a digital computer. We proceed with an analogy between the real numbers and their truncated positional number representations on the one hand and functions and their truncated generalized-Fourier series on the other. Thus we lift the digital computer from a setting corresponding to the real numbers to a setting corresponding to function spaces. Digitized function data types are defined along with computer versions of the corresponding ultra-arithmetic operations (addition, subtraction, multiplication, division, differentiation and integration). Error estimates for these operations (the analogues of traditional rounding errors) are given. Explicit examples of the error estimates for the ultra-arithmetic operations are given in the cases of five specific choices of basis functions: Fourier-, Chebyshev-, Legendre-, sine- and cosine-bases. Finally the algorithms of ultra-arithmetic are given in an explicitly implementable form for the cases both of the Fourier basis and the Chebyshev basis.

1. Introduction

A digital computer does not deal with the real numbers but rather with a large but fixed subset of the reals, usually the collection of floating point numbers of a specified range and precision. Moreover, the computer does not perform arithmetic, but rather an approximate arithmetic (which, for example, is not even associative). On the other hand the computer performs this approximate arithmetic on arrays of floating point numbers, so that in fact (subject to its limitations) it performs arithmetic on data types which are the computer versions of real and complex numbers, of vectors and matrices, of intervals, etc., as well as on data types which are combinations of these. In any case the totality of representatives of any such data type which is found in the computer, while large, is also fixed, and the associated arithmetic which is performed is only approximate.

While computation of marginal precision or even no precision at all is frequently performed, the theory and practice of contemporary digital computation shows that meaningful results may be made to flow out of a computer. This practice is so commonplace that the marvel of approximating infinitely many objects, each in general of infinite precision, with finitely many objects, each with a fixed precision, and of replacing the arithmetic operations by approximate operations (which typically deliver results with small errors) is taken completely for granted. Of course the ability to produce sensible computation in this approximate setting finds its basis among notions of approximation which are themselves made applicable by various density and continuity properties of the numbers and the operations.

* Under the auspices of a student interaction agreement with the Courant Institute, New York, NY.

The calculus of which the reals and the arithmetic operations are an intrinsic part contains other constructs and operations which by virtue of density and continuity properties can be approximated analogously. In particular, functions of a real variable and operators such as differentiation and integration have similar finite counterparts. Thus effective digital calculation with them ought to be possible.

To see this consider by way of analogy a real function represented by a Fourier series and contrast it with a real number. The error in truncating the Fourier series and replacing it by its finite partial sum is characterizable, as is well-known. The technique of this error characterization is in principle the same as the one for characterizing the error made by replacing the real number by a floating point number of fixed precision. Both are questions of characterizing the function on the one hand or the reals on the other in

0378-4754/82/0000-0000/$02.75 © 1982 North-Holland

Page 2: Ultra-arithmetic I: Function data types

2 C. Epstein, W.L. Miranker, T.J. Rivlin / Ultra-arithmetic

such a manner that the tails, i.e., the parts to be truncated, are small. This is clear for the function, but the commonplace nature of reals and their floating point approximations obscures this feature and its qualitative equivalence to the case of functions.

For instance, consider the decimal expansion for the real number a:

$$a = \sum_{i=M}^{\infty} a_i e_i ;$$

here M is a fixed integer. We may view the elements $e_i = 10^{-i}$, $i = M, M+1, \ldots$ as a 'basis' for this series expansion. By convention the $a_i$ take values among the digits 0, 1, ..., 9. Let us normalize the 'basis' $\{e_i\}$ so that each 'basis element' has the value unity. Call the 'normalized basis' $\{\tilde e_i\}$. Then the corresponding coefficients, called $\tilde a_i$, take values from the set $\{0, 10^{-i}, 2 \times 10^{-i}, \ldots, 9 \times 10^{-i}\}$. Corresponding to an integer N, the number a is represented by a rounded number called $S_N a$ in the computer, where for example

$$S_N a = \sum_{i=M}^{N} \tilde a_i \tilde e_i .$$

The coefficients of the truncated part have the property

$$0 \le \tilde a_i < 10^{-i+1} , \quad i \ge N+1 .$$

Notice that this bound implies an exponential decay of coefficients for the decimal representation which employs a 'normalized basis'. In a computer $S_N a$ is used as an approximation to each of a set of real numbers $\{b\}$ which has the property

$$|b - S_N a| \le 0.5 \times 10^{-N} . \qquad (1)$$
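The rounding analogy above can be made concrete. A minimal Python sketch (the function name and example values are ours, not the paper's) of the operator $S_N$ acting on decimals, together with the bound (1):

```python
# Minimal sketch of decimal rounding as the operator S_N: keep digits through
# 10**(-n).  Any real b that rounds to S_N a obeys |b - S_N a| <= 0.5 * 10**(-n),
# which is estimate (1) in the text.

def round_to_precision(a: float, n: int) -> float:
    """Return S_N a: the value a rounded to n decimal digits after the point."""
    scale = 10 ** n
    return round(a * scale) / scale

a = 3.14159265
n = 4
s_n_a = round_to_precision(a, n)           # 3.1416
assert abs(a - s_n_a) <= 0.5 * 10 ** (-n)  # a itself satisfies the bound (1)
```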

These features, tacit in the context of arithmetic, are explicit in series representations of functions. For instance, if $f \in L^2$ is represented by a generalized-Fourier series with orthonormalized basis elements $\phi_i$, $i = 0, 1, \ldots$, viz.

$$f = \sum_{i=0}^{\infty} a_i \phi_i , \quad a_i = (f, \phi_i) , \quad i = 0, 1, \ldots$$

and if

$$S_N f = \sum_{i=0}^{N} a_i \phi_i ,$$

then the quality of the error $\|f - S_N f\|$ depends on further characteristics of the function f (compare (1)). Such a characterization may take the form of smoothness requirements on f, and these in turn imply a property of asymptotic decay for the coefficients $a_i = (f, \phi_i)$. Indeed the decay is usually like a power of i with the power depending on the degree of smoothness. (Recall for comparison the property $\tilde a_i < 10^{-i+1}$ for the decimal representation of a real number.) As an illustration consider Fourier series for $L^2(S^1)$. ($S^1$ denotes the circle and we use this notation to specify the periodicity of functions.) If $S_N f = \sum_{j=-N}^{N} a_j e^{ijx}$ is a finite approximation to

an arbitrary function in $L^2(S^1)$, then the error

$$\|f - S_N f\| = \Big( \sum_{|j| > N} |(e^{ijx}, f)|^2 \Big)^{1/2}$$

can be arbitrarily large. (Note that without the requirement that the coefficients $a_i$ of a decimal expansion be taken from among the digits 0, 1, ..., 9, the same is true for numbers.) However, if $f \in C^p(S^1)$, then

$$|(e^{ijx}, f)| \le \frac{\|f^{(p)}\|}{j^p} .$$

Then

$$\|f - S_N f\| = \Big( \sum_{|j| > N} |(e^{ijx}, f)|^2 \Big)^{1/2} \le \frac{K}{N^{p-1/2}} .$$

Thus we may consider the (2N+1)-vector of coefficients of the partial sum of the Fourier series to be the data type in a digital computer which corresponds to a real valued function. Moreover, this choice of data type as a representation of the function is qualitatively the same as representing a real number by a data type corresponding to a decimal expansion of fixed precision.
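The decay of the truncation error with N can be tabulated numerically. In this sketch the grid size, the test function, and the use of the FFT to approximate the coefficients are our choices, not the paper's:

```python
import numpy as np

# Sketch: represent a periodic function by its Fourier coefficients and
# measure the L2 truncation error ||f - S_N f|| via Parseval's identity.
# The coefficients are approximated by the FFT on a fine grid.

M = 2048                              # fine quadrature grid
x = 2 * np.pi * np.arange(M) / M
f = np.exp(np.sin(x))                 # a smooth 2*pi-periodic test function

c = np.fft.fft(f) / M                 # c[j] ~ (1/2pi) * int f(x) e^{-ijx} dx

def truncation_error(N):
    # L2 norm of the discarded tail |j| > N, by Parseval
    tail = c[N + 1 : M - N]
    return np.sqrt(2 * np.pi * np.sum(np.abs(tail) ** 2))

errs = [truncation_error(N) for N in (4, 8, 16)]
assert errs[0] > errs[1] > errs[2]    # the error decays rapidly as N grows
```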

There remains to consider the operations on this new data type. As with the numbers we must define appropriate approximations to the arithmetic operations used with functions which are suitable for their finite data type representatives. The appropriateness will depend on the error estimates, i.e., on the deviation of the result of exact arithmetic from the result of the corresponding approximate arithmetic. Moreover, finite versions of the calculus operators of differentiation and integration must also be introduced, and the appropriateness of these finite versions be established by error estimates.

Page 3: Ultra-arithmetic I: Function data types

There are many possible representations of numbers which may be used for digital computation, although only a few occur commonly in practice (e.g., fixed and floating point versions of binary, octal, decimal and hexadecimal number systems). While the possibilities for representation are surely much greater for functions than they are for numbers, we will deal with a very special class of such representations, namely the generalized-Fourier series. An analogy may be drawn between limiting number representations to positional systems such as decimal representations and limiting function representations to generalized-Fourier series. Indeed we have already drawn this analogy for another purpose. However, unlike the case of the positional representation of numbers, the different bases which may be used for generalized-Fourier series generate marked differences in suitability for computation. Examples show that some bases support good arithmetic but are poor for the calculus operations. The grading of bases with respect to cost per arithmetic or calculus operation is also marked.

We begin in Section 2 with a formalization of the new data types, of a notion of rounding of functions and of the approximate operations. The case of division of functions proceeds via the introduction of an approximate reciprocal by means of an auxiliary minimization problem. The error expressions for all of the operations are also formalized.

In Section 3 we develop an error analysis for the multiplication and reciprocation operations. We show that the multiplication error is a function of the modulus of continuity of the rounding error operator. The error in reciprocation is shown to have a similar dependence. Additional estimates showing the dependence of the errors on function smoothness are also given.

In Section 4 we carry out the error estimates for five particular sets of basis functions corresponding to Fourier series, Legendre series, Chebyshev series, sine series and cosine series, resp. These explicit estimates show the marked grading imposed by the new data types on the collection of bases.

Finally, in Section 5 we give explicit algorithms for the operations in the cases of Fourier series and Chebyshev series. That is we express the operations in terms of mappings on the associated arrays of rounded series coefficients. It is this form of the operations which permits a direct implementation in a digital computer.

Part 2 of this paper, Intervals of Polynomials, deals with ultra-arithmetic in the case of intervals. That is we define data types which are intervals whose elements are the data types discussed here. More particularly, we introduce intervals of polynomials of degree n. Then we define the six operations (addition, subtraction, multiplication, division, differentiation and integration) for the intervals. When such an operation yields a result which is not an interval of polynomials of degree n, we introduce an appropriate isotonal approximation which is such an interval. An error analysis of these approximations is also given.

2. Rounding and the approximate operations

2.1. Rounding

$\mathbb{R}^d$ will denote the d-dimensional Euclidean space and $\Omega \subset \mathbb{R}^d$ a d-dimensional bounded subset. We deal with the spaces $C^\infty(\Omega)$, $L^2(\Omega)$ and $H^k(\Omega)$, the last being the kth Sobolev space with the norm

$$\|f\|_k = \Big( \sum_{|\alpha| \le k} \int_{\Omega} |\partial^\alpha f|^2 \Big)^{1/2} .$$

For convenience we denote the $L^2$-norm of f by $\|f\|$ instead of $\|f\|_0$. We will also drop the symbol $\Omega$ from the integration sign, when no confusion will result.

$\{\phi_i\}_{i=0}^{\infty}$ denotes an orthogonal basis (in $L^2(\Omega)$, for example). $S_N f$ denotes the projection onto the subspace spanned by $\{\phi_i\}_{i=0}^{N}$. $\operatorname{Im} S_N$ will denote the image of this projection operator. We call $S_N f$ the round off (onto $\operatorname{sp}\{\phi_i\}_{i=0}^{N}$). The rounding error is $(I - S_N)f$, so that $I - S_N$ is the rounding error operator. (The case of Fourier series is an exception to this convention since in that case we use $S_N$ to denote the projection onto the subspace spanned by $\{e^{inx}\}_{n=-N}^{N}$.)

2.2. Approximate operations

The computer data type corresponding to the function f is $S_N f$, the round off.

Page 4: Ultra-arithmetic I: Function data types

Let $S_N g$ be the round off of g and let

$$S_N f = \sum_{i=0}^{N} a_i \phi_i \quad \text{and} \quad S_N g = \sum_{i=0}^{N} b_i \phi_i .$$

Addition and subtraction

$S_N f \pm S_N g$ are the sum/difference of $S_N f$ and $S_N g$. In fact

$$S_N f \pm S_N g = \sum_{i=0}^{N} (a_i \pm b_i)\, \phi_i ,$$

and the quantities $a_i \pm b_i$ are not in general determinable without a roundoff error. However, in the context of our formulation, such errors (which are treated elsewhere [1]) are put aside. The errors of (approximate) addition/subtraction are

$$f \pm g - (S_N f \pm S_N g) .$$

For our purposes the operations of addition and subtraction are so straightforward that we will not hereafter return to them.

Multiplication

The product ¹ $S_N f \cdot S_N g$ itself corresponds in general to an infinite series expansion in terms of the basis functions. In the case of Fourier series multiplication leads to a series of double length corresponding stylistically to the double length products and accumulators associated with ordinary floating point multiplication. In analogy to rounding the double length floating-point product to single length, we define the (approximate) product of $S_N f$ and $S_N g$ to be $S_N(S_N f \cdot S_N g)$. The error of (approximate) multiplication is

$$fg - S_N(S_N f \cdot S_N g) . \qquad (2.1)$$

¹ To reduce the use of parentheses we write $S_N f \cdot S_N g$ instead of $(S_N f)(S_N g)$.

Division

We treat division through reciprocation and multiplication. The equation $fg = 1$, which defines g as the reciprocal of f, is converted into a quadratic minimization problem. A finite dimensional approximation to $1/f$ is obtained by seeking the minimum of this problem, denoted $\tilde g_N$, in $\operatorname{Im} S_N$. Indeed $\tilde g_N$, the solution of the following minimization problem:

$$\min_{\tilde g_N \in \operatorname{Im} S_N} \|1 - \tilde g_N S_N f\| , \qquad (2.2)$$

is taken to be the approximate reciprocal of f. The corresponding error is $g - \tilde g_N$. We will also be interested in the residual $1 - \tilde g_N S_N f$.

Differentiation

The derivative of $S_N f$ is taken to be

$$S_N \frac{\mathrm{d}}{\mathrm{d}x} S_N f ,$$

with the error

$$\frac{\mathrm{d}f}{\mathrm{d}x} - S_N \frac{\mathrm{d}\, S_N f}{\mathrm{d}x} .$$

Integration

The integral of $S_N f$ is taken to be

$$S_N \int_0^x S_N f ,$$

with the error

$$\int_0^x f - S_N \int_0^x S_N f .$$

For certain sets of basis functions it will be convenient to replace 0 by $-1$ as the lower limit of integration.

Page 5: Ultra-arithmetic I: Function data types


3. Error analysis for multiplication and reciprocation

In this section we discuss the approximate arithmetic operations which were introduced in the previous section. In particular we develop an error analysis for (the approximate) multiplication and reciprocation of functions in a Hilbert space. First we treat multiplication and then reciprocation. We conclude this section with some specialized and extended estimates, some of which demonstrate the tradeoff between improved estimates and function smoothness.

3.1. Multiplication

A natural setting for multiplication of functions is C. However, our objective is to characterize the relationship between finite dimensional approximations to functions and the arithmetic operations. The appropriate setting for this characterization is the Sobolev space $H^k$, and so we begin by extending multiplication from C to $H^k$.

We define the mapping $M : C^\infty(\Omega) \times C^\infty(\Omega) \to C^\infty(\Omega)$ by $M(f, g) = fg$; simple multiplication. The following theorem shows that M can be extended to $H^k(\Omega) \times H^k(\Omega)$ provided that k is taken large enough compared to $d := \dim \Omega$.

Theorem 1. $M : H^k(\Omega) \times H^k(\Omega) \to H^k(\Omega)$ is a continuous bilinear operation provided

$$d/2 < k - [k/2] .$$

Proof. We first show that $M(f,g) \in H^k(\Omega)$. Using the Leibniz rule, we have

$$\|M(f,g)\|_k^2 = \sum_{|\alpha| \le k} \int \Big| \sum_{|\beta|+|\gamma|=|\alpha|} K^{\alpha}_{\beta\gamma}\, \partial^{\beta} f\, \partial^{\gamma} g \Big|^2 . \qquad (3.1)$$

Here the $K^{\alpha}_{\beta\gamma}$ are constants occurring in the Leibniz rule.

To obtain a majorant of the right member of (3.1), we divide the inner sum which occurs there into two parts:

$$\sum_{|\beta|+|\gamma|=|\alpha|} = \sum_{\substack{|\beta|+|\gamma|=|\alpha| \\ |\beta| \le [k/2]}} + \sum_{\substack{|\beta|+|\gamma|=|\alpha| \\ |\gamma| \le [k/2]}} := \Sigma' + \Sigma'' .$$

The resulting majorant is

$$2 \sum_{|\alpha| \le k} \int \big[ (\Sigma')^2 + (\Sigma'')^2 \big] . \qquad (3.2)$$

By hypothesis $[k/2] < k - d/2$ so that the Sobolev embedding theorem implies $f, g \in C^{[k/2]}(\bar\Omega)$. Then we may extract $\|\partial^{\beta} f\|_\infty$ from the $\Sigma'$ and $\|\partial^{\gamma} g\|_\infty$ from the $\Sigma''$. Thus (3.2) is majorized by

$$K \big( \|f\|_k^2 + \|g\|_k^2 \big) , \qquad (3.3)$$

where K is an appropriate constant which depends on the $K^{\alpha}_{\beta\gamma}$ and on $\|\partial^{\beta} f\|_\infty$ for $|\beta| \le [k/2]$ and on $\|\partial^{\gamma} g\|_\infty$ for $|\gamma| \le [k/2]$. If $f, g \in H^k(\Omega)$, then (3.3) is finite and $fg \in H^k(\Omega)$ as well.

The mapping M is clearly bilinear so that it suffices to demonstrate continuity in each variable separately. We have

$$\|M(f_1, g) - M(f_2, g)\|_k = \|M(f_1 - f_2,\, g)\|_k \le K\, \|g\|_k\, \|f_1 - f_2\|_k .$$

Clearly this expression vanishes as $\|f_1 - f_2\|_k$ tends to zero. □

The following theorem characterizes the error (2.1) of approximate multiplication.

Theorem 2. If $f, g \in H^k(\Omega)$ and if $k - [k/2] > d/2$, then

$$\|M(f,g) - S_N M(S_N f, S_N g)\| \le \|(I - S_N)(fg)\| + \|g\|_\infty \|(I - S_N)f\| + \|S_N f\|_\infty \|(I - S_N)g\| .$$

Page 6: Ultra-arithmetic I: Function data types

Proof. Using the triangle inequality and the fact that $S_N$ is norm reducing, we have

$$\|M(f,g) - S_N M(S_N f, S_N g)\| \le \|M(f,g) - S_N M(f,g)\| + \|S_N [M(f,g) - M(S_N f, S_N g)]\|$$
$$\le \|M(f,g) - S_N M(f,g)\| + \|M(f,g) - M(S_N f, S_N g)\|$$
$$\le \|(I - S_N) M(f,g)\| + \|M(f - S_N f,\, g)\| + \|M(S_N f,\, g - S_N g)\|$$
$$\le \|(I - S_N)(fg)\| + \|g\|_\infty \|(I - S_N)f\| + \|S_N f\|_\infty \|(I - S_N)g\| .$$

The last inequality makes use of the hypothesis $k - [k/2] > d/2$ to invoke the Sobolev imbedding theorem. □

A somewhat more symmetric statement of the conclusion of the theorem is the subject of the following remark.

Remark. If we add the hypothesis to Theorem 2 that $\|(I - S_N)f\|_\infty = o(1)$ as $N \to \infty$, then there exists $\varepsilon_N$ tending to zero with $N^{-1}$ so that

$$\|M(f,g) - S_N M(S_N f, S_N g)\| \le \|(I - S_N)(fg)\| + \|g\|_\infty \|(I - S_N)f\| + (\varepsilon_N + \|f\|_\infty) \|(I - S_N)g\| . \qquad (3.1)$$

The theorem and remark assert that the accuracy of the approximate product $S_N(S_N f \cdot S_N g)$ is a function of the modulus of continuity of the rounding error operator $I - S_N$ on the class of functions containing f, g and fg. Note that it is possible that $\|(I - S_N)f\|$ and $\|(I - S_N)g\|$ are both small while $\|(I - S_N)(fg)\|$ is large. An example of this corresponds to $\Omega = [0, \pi]$ and $\{\phi_m\} = \{\sin mx\}$. Now if $f \in C^p[0, \pi]$ and $f^{(2k)}(0) = f^{(2k)}(\pi) = 0$, $k = 0, \ldots, [(p+1)/2]$, then $(f, \sin mx) \le K/m^p$. Then choosing $f = \sin nx$ and $g = \sin mx$ so that $f, g \in C^\infty$, the jth Fourier coefficient of f and g vanishes faster than any power of j. On the other hand

$$(\sin mx \sin nx,\, \sin jx) = \begin{cases} 0 , & m + n + j \ \text{even} , \\ O(j^{-3}) , & m + n + j \ \text{odd} . \end{cases}$$

3.2. Reciprocation

To deal with reciprocation we suppose in addition to $f \in H^k(\Omega)$ that $f(x) \ne 0$ in $\Omega$ and that $g = 1/f \in H^k(\Omega)$. The following theorem characterizes the results of the minimization process (2.2) used to define the approximate reciprocal $\tilde g_N$ of f.
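Numerically, the minimization (2.2) defining the approximate reciprocal is a linear least-squares problem in the coefficients of $\tilde g_N$. A minimal sketch, in which the Chebyshev basis, the grid, and the test function are our choices rather than the paper's:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Sketch of (2.2): find g_N in Im S_N minimizing ||1 - g_N * (S_N f)||.
# With basis phi_0..phi_N this is linear least squares in the coefficients
# of g_N; here the L2 norm is discretized on a grid in (-1, 1).

N = 8
x = np.cos(np.pi * (np.arange(200) + 0.5) / 200)   # grid points in (-1, 1)

f = lambda t: 2.0 + np.sin(t)                      # f bounded away from zero
snf = C.chebfit(x, f(x), N)                        # coefficients of S_N f
snf_x = C.chebval(x, snf)

# design matrix: column i samples phi_i(x) * (S_N f)(x)
Phi = np.stack(
    [C.chebval(x, np.eye(N + 1)[i]) * snf_x for i in range(N + 1)], axis=1
)
coef, *_ = np.linalg.lstsq(Phi, np.ones_like(x), rcond=None)

# the residual 1 - g_N * S_N f is small, as Theorem 3 below quantifies
residual = np.max(np.abs(1.0 - C.chebval(x, coef) * snf_x))
assert residual < 1e-3
```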

Theorem 3. Let $f \in H^k(\Omega)$ and let $f(x) \ge c$ on $\Omega$ for some positive constant c. We suppose that $\|(I - S_N)f\|_\infty \to 0$ as $N \to \infty$. If k is sufficiently large compared to d, then

I. $\|1 - \tilde g_N S_N f\| \le \|g\|_\infty \|(I - S_N)f\| + \|S_N f\|_\infty \|(I - S_N)g\|$ ,

I'. $\|1 - \tilde g_N S_N f\| \le \|g\|_\infty \|(I - S_N)f\| + (\varepsilon_N + \|f\|_\infty) \|(I - S_N)g\|$ ,

II. $\|g - \tilde g_N\| \le \|g\|_\infty \big[ \|1 - \tilde g_N S_N f\| + \|(I - S_N)f\|_\infty \|\tilde g_N\| \big]$ ,

III. $\|\tilde g_N\| \le \dfrac{\|g\|_\infty \|1 - \tilde g_N S_N f\| + \|g\|}{1 - \|g\|_\infty \|(I - S_N)f\|_\infty}$ ,

III'. $\|\tilde g_N\| \le \dfrac{1}{\min |S_N f|} \big[ 1 + \|g\|_\infty \|(I - S_N)f\| + \|S_N f\|_\infty \|(I - S_N)g\| \big]$ ,

III''. $\|\tilde g_N\| \le \dfrac{1}{\min |S_N f|} \big[ 1 + \|g\|_\infty \|(I - S_N)f\| + (\varepsilon_N + \|f\|_\infty) \|(I - S_N)g\| \big]$ .

In I' and III'', $\varepsilon_N \to 0$ as $N \to \infty$. In III' and III'', min means to take the minimum as the argument varies over $\Omega$.

Before proving Theorem 3, we make the following observations.

Remark. $\tilde g_N$ is an approximate solution of the equation $1 - gf = 0$. I and I' are estimates of the residual and II is an estimate of the error. III, III' and III'' are estimates of the approximation itself.

Remark. The hypothesis $\|(I - S_N)f\|_\infty \to 0$ as $N \to \infty$ as well as the fact that $\|g\|_\infty$ exists is assured by taking k large compared to d.

We now give the proof of Theorem 3.

Page 7: Ultra-arithmetic I: Function data types


Proof. I. Using the definition of g and $\tilde g_N$, we have

$$\|1 - \tilde g_N S_N f\| \le \|1 - S_N g \cdot S_N f\| = \|fg - S_N g \cdot S_N f\|$$
$$\le \|((I - S_N)f)\, g\| + \|S_N f \cdot (I - S_N)g\| \le \|g\|_\infty \|(I - S_N)f\| + \|S_N f\|_\infty \|(I - S_N)g\| .$$

I'. I' follows from I and the hypothesis $\|(I - S_N)f\|_\infty \to 0$ as $N \to \infty$.

II.

$$\|g - \tilde g_N\| = \|g - \tilde g_N f g\| = \|g - \tilde g_N [S_N f + (I - S_N)f]\, g\|$$
$$\le \|g(1 - \tilde g_N S_N f)\| + \|g \tilde g_N (I - S_N)f\| \le \|g\|_\infty \big[ \|1 - \tilde g_N S_N f\| + \|(I - S_N)f\|_\infty \|\tilde g_N\| \big] .$$

III. Using II and the triangle inequality gives

$$\|\tilde g_N\| - \|g\| \le \|g\|_\infty \big[ \|1 - \tilde g_N S_N f\| + \|(I - S_N)f\|_\infty \|\tilde g_N\| \big] .$$

Solving this inequality for $\|\tilde g_N\|$ gives III.

III'. I and the triangle inequality give

$$\|\tilde g_N S_N f\| \le 1 + \|g\|_\infty \|(I - S_N)f\| + \|S_N f\|_\infty \|(I - S_N)g\| .$$

Coupling this with the inequality

$$\min |S_N f| \cdot \|\tilde g_N\| \le \|\tilde g_N S_N f\| ,$$

gives the result III'.

III''. III'' follows from III' and the hypothesis $\|(I - S_N)f\|_\infty \to 0$ as $N \to \infty$. □

The following corollary of Theorem 3 characterizes $\tilde g_N$ as $N \to \infty$ and demonstrates the appropriateness of $\tilde g_N$ as an approximate reciprocal of f.

Corollary 4. Under the hypothesis of Theorem 3,

$$\lim_{N \to \infty} \|g - \tilde g_N\| = 0$$

and

$$\lim_{N \to \infty} \|1 - \tilde g_N S_N f\| = 0 .$$

3.3. Additional estimates

Error estimates for approximate arithmetic different from the ones already derived will be relevant for the computational processes with which we are dealing. They show that finer error estimates are possible by imposing additional smoothness requirements. As examples of the possibilities we will consider estimates in $H^p(\Omega)$ as compared to the $L^2(\Omega)$ estimates just derived. Note that if p is appropriately large compared to d, then such estimates imply $L^\infty$ estimates as well. We will also consider the reciprocation problem set as a minimization problem in $H^p$, and finally we will give error estimates in $H^p$ for functions which lie in $H^q$, $q > p$.

For clarity we set $d = 1$ and we begin with an estimate of $\|(I - S_N)f\|_p$, the rounding error. Using the commutator notation $[A, B] = AB - BA$, we have

$$\|(I - S_N)f\|_p^2 = \sum_{j=0}^{p} \|D^j (I - S_N)f\|^2 \le 2 \sum_{j=0}^{p} \big[ \|(I - S_N) D^j f\|^2 + \|[I - S_N,\, D^j] f\|^2 \big] .$$

Thus

$$\|(I - S_N)f\|_p \le \sqrt{2} \sum_{j=0}^{p} \|(I - S_N) D^j f\| + \sqrt{2} \sum_{j=0}^{p} \|[I - S_N,\, D^j] f\| .$$

When the basis $\{e^{inx}\}$ is used, $[D^j, S_N] = 0$ so that only the first sum in this estimate survives. Thus in some sense the first sum represents the least expected rounding error, while the second sum corresponds to an additional error peculiar to the particular basis used. The commutators which appear in the second sum complicate the estimates of $\|fg - S_N(S_N f \cdot S_N g)\|_p$ and $\|1 - \tilde g_N S_N f\|_p$, and we forego displaying them. We do however give the multiplication error for the Fourier basis in the following proposition wherein we

Page 8: Ultra-arithmetic I: Function data types


specify the periodicity of functions by means of a notation which employs the symbol $S^1$ for the circle.

Proposition 5. If the projection operator $S_N$ corresponds to the basis $\{e^{inx}\}$ on $L^2(S^1)$, then

$$\|(I - S_N)f\|_p \le \sqrt{2} \sum_{j=0}^{p} \|(I - S_N) D^j f\|$$

and

$$\|fg - S_N(S_N f \cdot S_N g)\|_p \le \sum_{i=0}^{p} \|(I - S_N) D^i (fg)\| + K \sum_{i=0}^{p} \sum_{j=0}^{i} \binom{i}{j} \big[ \|((I - S_N) D^j f)\, D^{i-j} g\| + \|(S_N D^j f)(I - S_N) D^{i-j} g\| \big] ,$$

where K is an appropriate constant. The proof is straightforward and omitted.

The corresponding reciprocation errors are given in Proposition 6, where the approximate reciprocal $\tilde g_N$ is defined as the solution of the following minimization problem:

$$\min_{\tilde g_N \in \operatorname{Im} S_N} \|1 - \tilde g_N S_N f\|_p . \qquad (3.4)$$

Proposition 6. Under the hypothesis of Proposition 5 and with $\tilde g_N$ defined by (3.4), we have

I. $\|1 - \tilde g_N S_N f\|_p \le K \sum_{i=0}^{p} \sum_{j=0}^{i} \binom{i}{j} \big[ \|(D^j g)(I - S_N) D^{i-j} f\| + \|(S_N D^j f)(I - S_N) D^{i-j} g\| \big]$ ,

where K is an appropriate constant. Moreover, if $f, g \in H^{p+1}(S^1)$, $p > 1$, then $\lim_{N \to \infty} \|1 - \tilde g_N S_N f\|_p = 0$.

II. $\|g - \tilde g_N\|_p \le \|\tilde g_N\|_p \sum_{i=0}^{p} \|D^i [g (I - S_N) f]\|_\infty + \|1 - \tilde g_N S_N f\|_p \sum_{i=0}^{p} \|D^i g\|_\infty$ .

III. $\|\tilde g_N\|_p \le \dfrac{\|g\|_p + \|1 - \tilde g_N S_N f\|_p \sum_{j=0}^{p} \|D^j g\|_\infty}{1 - \sum_{j=0}^{p} \|D^j [g (I - S_N) f]\|_\infty}$ .

Proof. I. For any $f, g \in H^p(S^1)$,

$$\|1 - \tilde g_N S_N f\|_p \le \|1 - S_N g \cdot S_N f\|_p \le \|g (f - S_N f)\|_p + \|(g - S_N g) S_N f\|_p$$

$$\le K \sum_{i=0}^{p} \Big[ \sum_{j=0}^{i} \binom{i}{j}^2 \int |D^j g\, D^{i-j}(I - S_N) f|^2 \Big]^{1/2} + K \sum_{i=0}^{p} \Big[ \sum_{j=0}^{i} \binom{i}{j}^2 \int |(S_N D^j f)\, D^{i-j}(I - S_N) g|^2 \Big]^{1/2} ,$$

where K is an appropriate constant. Using the commutativity of $D^j$ and $S_N$, this last member here equals the same expression with $D^{i-j}(I - S_N)$ replaced by $(I - S_N) D^{i-j}$, and is in turn majorized by

$$K \sum_{i=0}^{p} \sum_{j=0}^{i} \binom{i}{j} \big[ \|(D^j g)(I - S_N) D^{i-j} f\| + \|(S_N D^j f)(I - S_N) D^{i-j} g\| \big] ,$$

Page 9: Ultra-arithmetic I: Function data types


where the values of the constants K which appear here are different. This concludes the proof of I. Note that if $f, g \in H^{p+1}(S^1)$ for $p > 1$, we may extract suprema in the last member here to conclude that

$$\|1 - \tilde g_N S_N f\|_p \le K \sum_{i=0}^{p} \sum_{j=0}^{i} \binom{i}{j} \big[ \|D^j g\|_\infty \|(I - S_N) D^{i-j} f\| + \|S_N D^j f\|_\infty \|(I - S_N) D^{i-j} g\| \big] ,$$

which vanishes as $N \to \infty$.

II. We proceed as in the proof of I:

$$\|g - \tilde g_N\|_p = \|g - \tilde g_N f g\|_p \le \|g (f - S_N f) \tilde g_N\|_p + \|g (1 - \tilde g_N S_N f)\|_p$$

$$\le \|\tilde g_N\|_p \sum_{i=0}^{p} \|D^i [g (I - S_N) f]\|_\infty + \|1 - \tilde g_N S_N f\|_p \sum_{i=0}^{p} \|D^i g\|_\infty ,$$

which proves II. As before note that if $f, g \in H^{p+1}(S^1)$ for $p > 1$, then we may conclude from the inequality here that $\|g - \tilde g_N\|_p$ vanishes as $N \to \infty$.

III. The assertion of III follows by applying II to the inequality $\|\tilde g_N\|_p \le \|g\|_p + \|g - \tilde g_N\|_p$. □

The estimates of the various errors of approximate arithmetic which we have made may be extended to more stringent norms which in turn would exhibit better quality of the approximations. Such extensions, which are obtained by requiring additional smoothness for the functions in question, are the subject of the following remark.

Remark. If $f, g \in H^q(S^1)$ for $q > p$, then the results of Propositions 5 and 6 may be extended to give sharper estimates.

4. Examples

In this section we carry out the error estimates of the approximate operations (rounding, multiplication, reciprocation, differentiation and integration) for five sets of basis functions. These correspond to

(1) Fourier series on $L^2(S^1)$;
(2) Legendre series on $L^2[-1, 1]$;
(3) Chebyshev series on $L^2[-1, 1]$ in the measure $\mathrm{d}\mu = \mathrm{d}x/\sqrt{1 - x^2}$;
(4) sine series on $L^2[0, \pi]$;
(5) cosine series on $L^2[0, \pi]$.

The nth generalized-Fourier coefficient $a_n$ of $f(x)$ is in each of these five cases:

(1) $a_n = \dfrac{1}{2\pi} \displaystyle\int_0^{2\pi} e^{-inx} f(x)\, \mathrm{d}x$ ;

(2) $a_n = \dfrac{2n+1}{2} \displaystyle\int_{-1}^{1} P_n(x) f(x)\, \mathrm{d}x$ ;

(3) $a_n = \dfrac{2}{\pi} \displaystyle\int_{-1}^{1} \dfrac{T_n(x) f(x)}{\sqrt{1 - x^2}}\, \mathrm{d}x$ (with the factor $1/\pi$ for $n = 0$) ;

Page 10: Ultra-arithmetic I: Function data types


(4) $a_n = \dfrac{2}{\pi} \displaystyle\int_0^{\pi} f(x) \sin nx\, \mathrm{d}x$ ;

(5) $a_n = \dfrac{2}{\pi} \displaystyle\int_0^{\pi} f(x) \cos nx\, \mathrm{d}x$ .

In (2) P,(x) is a Legendre polynomial while in (3) T,(x) is a Chebyshev polynomial both of which will be specified presently.

The estimates are stated first; the details follow in the appendix to this Section 4.

4.1. Fourier series

For $f \in C^p(S^1)$, $|(f, e^{inx})| \le K/n^p$. Thus we have the following estimates. (Recall that $S_N$ corresponds to Fourier series on the basis $\{e^{inx}\}$, $|n| \le N$.)

Rounding

$$\|(I - S_N)f\| \le \frac{K}{N^{p-1/2}} .$$

Multiplication

Combining this estimate for rounding with (3.1), we have

$$\|fg - S_N(S_N f \cdot S_N g)\| \le \frac{\|(fg)^{(p)}\| + \|g\|_\infty \|f^{(p)}\| + (\|f\|_\infty + \varepsilon_N) \|g^{(p)}\|}{N^{p-1/2}} .$$

(Recall that $f \in C^1(S^1)$ implies that $\varepsilon_N = o(1)$ as N tends to infinity.)

Reciprocation

The constant K in the estimate for rounding may be replaced by $\|f^{(p)}\|$. Thus using I, II and III of Theorem 3, resp., we have

I. $\|1 - \tilde g_N S_N f\| \le \Big[ \dfrac{\|f^{(p)}\|}{\min_{S^1} |f|} + \|f\|_\infty \|g^{(p)}\| \Big] \big/ N^{p-1/2}$ ,

II. $\|g - \tilde g_N\| \le \dfrac{1}{\min_{S^1} |f|} \Big[ \dfrac{K}{N^{p-1/2}} + O(1/N^p) \Big]$ ,

III. $\|\tilde g_N\| \le \dfrac{\|g\| + K \|g\|_\infty / N^{p-1/2}}{1 - K \|g\|_\infty / N^{p-1/2}}$ ,

resp.

Differentiation

Using $[D, S_N] = 0$, we have

$$\Big\| \frac{\mathrm{d}f}{\mathrm{d}x} - S_N \frac{\mathrm{d}\, S_N f}{\mathrm{d}x} \Big\| = \Big\| (I - S_N) \frac{\mathrm{d}f}{\mathrm{d}x} \Big\| \le \frac{\|f^{(p)}\|}{N^{p-3/2}} ,$$

the power of N deriving from the property $f' \in C^{p-1}$.
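On the Fourier data type this differentiation rule acts coefficientwise; a minimal sketch (the array layout is our choice):

```python
import numpy as np

# Sketch: the derivative of S_N f = sum a_j e^{ijx} is obtained by the
# coefficientwise map (a_j) -> (i*j*a_j); since [D, S_N] = 0 for this basis,
# no additional rounding step is needed.

def differentiate(a):
    N = (len(a) - 1) // 2
    j = np.arange(-N, N + 1)        # frequency index for each array slot
    return 1j * j * a

N = 3
a = np.zeros(2 * N + 1, dtype=complex)
a[N + 1] = 1.0                      # f = e^{ix}
da = differentiate(a)               # f' = i e^{ix}
assert da[N + 1] == 1j and np.count_nonzero(da) == 1
```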

Integration

$$\Big\| \int_0^x f - S_N \int_0^x S_N f \Big\| \le \Big\| (I - S_N) \int_0^x f \Big\| + \Big\| S_N \int_0^x (I - S_N) f \Big\| \le \frac{\|f^{(p)}\|}{N^{p+1/2}} + \frac{|a_0|}{N^{1/2}} . \qquad (4.1)$$

(See the appendix for details.) Here $a_0 = (1/2\pi) \int_0^{2\pi} f(x)\, \mathrm{d}x$, so that this error estimate improves to $O(1/N^{p+1/2})$ for functions of mean zero.
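The mean-zero case can also be carried out coefficientwise on the Fourier data type; in this sketch (array layout and test function ours) the antiderivative stays in the span, so no rounding loss occurs:

```python
import numpy as np

# Sketch: integration on the Fourier data type for a mean-zero S_N f.
# The antiderivative maps a_j -> a_j/(i*j) for j != 0, and the constant
# term is chosen so that the integral vanishes at x = 0.

def integrate_mean_zero(a):
    N = (len(a) - 1) // 2
    assert abs(a[N]) < 1e-14, "assumes a_0 = 0 (mean-zero f)"
    j = np.arange(-N, N + 1)
    out = np.zeros_like(a, dtype=complex)
    nz = j != 0
    out[nz] = a[nz] / (1j * j[nz])
    out[N] = -np.sum(out[nz])       # constant term: integral is 0 at x = 0
    return out

N = 3
a = np.zeros(2 * N + 1, dtype=complex)
a[N + 1], a[N - 1] = -0.5j, 0.5j    # f = sin x  (mean zero)
F = integrate_mean_zero(a)
# expected: int_0^x sin = 1 - cos x, i.e. -1/2 at j = +-1 and 1 at j = 0
assert np.isclose(F[N], 1.0) and np.isclose(F[N + 1], -0.5)
```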

Using Propositions 5 and 6 we have the following $H^p(S^1)$-estimates for functions $f, g \in C^k(S^1)$ ($p \le k$).

Multiplication

Using Proposition 5,

$$\|fg - S_N(S_N f \cdot S_N g)\|_p \le \big[ \|(fg)^{(k)}\| + K_1 \|f^{(k)}\| + K_2 \|g^{(k)}\| \big] \big/ N^{k-p+1/2} .$$

Here $K_1$ is formed from $\|f^{(j)}\|_\infty$, $j = 0, 1, \ldots, p$ and $K_2$ from $\|g^{(j)}\|_\infty$, $j = 0, \ldots, p$.

Reciprocation

Employing Proposition 6, we deduce the estimate

||1 − ĝ_N S_N f||_p ≤ [K₁ ||f^{(k)}|| + K₂ ||g^{(k)}||]/N^{k−p+1/2} ,

with constants defined analogously as in the previous estimate. We similarly find that ||g − ĝ_N||_p = O(N^{p−k+1/2}) also.

4.2. Legendre series

Let L = (d/dx)(1 − x²)(d/dx) and let P_n, n = 0, 1, … be the nth Legendre polynomial ([L + n(n + 1)]P_n = 0). Let p_n denote the L²-normalized P_n. Then for f ∈ C^p[−1, 1] we have

|(f, p_n)| ≤ K_f [n(n + 1)]^{−p/2} ,

where, in terms of the well-defined integer m, K_f = ||L^m f|| when p = 2m, with the analogous quantity involving one further differentiation when p = 2m + 1.

Thus we have the following estimates:

Rounding

||(I − S_N)f|| ≤ K_f/N^{p−1/2}   (4.2)

(see [4]). The same estimate holds for the norm || ||_∞ if K_f is replaced by an appropriate constant C_f.

Multiplication

||fg − S_N(S_N f · S_N g)|| ≤ [K_{fg} + ||g||_∞ K_f + ||S_N f||_∞ K_g]/N^{p−1/2} .

Reciprocation

I. ||1 − ĝ_N S_N f|| ≤ (1/min|f|)[K_f + ||S_N f||_∞ K_g]/N^{p−1/2} ,

II. ||g − ĝ_N|| ≤ (1/min|f|)[K_f + ||S_N f||_∞ K_g + C_f log N {||g|| + O(N^{−1})}]/N^{p−1/2} ,

III. ||ĝ_N|| ≤ [||g|| + (1/min|f|) O(1/N^{p−1/2})]/[1 − (1/min|f|) O(1/N^{p−1/2})] .

Differentiation

||f′ − S_N (d/dx) S_N f|| ≤ ||(I − S_N)f′|| + ||S_N (d/dx)(I − S_N)f|| ≤ K_{f′}/N^{p−3/2} + M(K + 1)/(N − 1)^{p−5/2} .   (4.3)

The first term of the right member follows from the rounding estimate. (See the appendix for details of the second.)

Integration

||∫_{−1}^x f − S_N ∫_{−1}^x S_N f|| ≤ ||(I − S_N) ∫_{−1}^x f|| + ||S_N ∫_{−1}^x (I − S_N) f|| ≤ K_F/N^{p+1/2} + |a_{N+1}|/√((2N + 1)(2N + 3)) .   (4.4)

Here F = ∫_{−1}^x f, so that the first term of the right member follows from the rounding estimate, while for the second term see the appendix for details. Combining this with the estimate (4.2) for a_{N+1} above, we have

||∫_{−1}^x f − S_N ∫_{−1}^x S_N f|| ≤ K_F/N^{p+1/2} + O(1/N^{p+1}) .

4.3. Chebyshev series

The Chebyshev polynomials are an orthogonal set on [−1, 1] with respect to the measure dμ = dx/√(1 − x²). The estimates in this section will be made with respect to this measure. We append the superscript μ to the norm symbol to indicate this fact.

Let E_N(f) be the sup-norm error for the best polynomial approximation to f on [−1, 1]. Then

||(I − S_N)f||_∞ ≤ 4K(1 + π^{−2} log N) E_N(f) ,

where K is an appropriate constant (see [2, p. 134]). Moreover f ∈ C^p[−1, 1] implies for N > p that

E_N(f) ≤ C_f/N^p ,

where C_f is an appropriate constant (see [3, p. 23]). For the nth Fourier–Chebyshev coefficient a_n we have, by integrating by parts,

a_n = O(1/n^p) .

We now give the estimates.

Rounding

||(I − S_N)f||^μ = O(log N/N^p) .

Multiplication

||fg − S_N(S_N f · S_N g)||^μ = O(log N/N^p) .

Here we have used the fact that the rounding estimates holding for f and g also hold for fg.

Reciprocation

III. ||ĝ_N||^μ ≤ (1/min|f|)[1 + O(log N/N^p)]/[1 − C log N/(min|f| N^p)] .

Differentiation

||f′ − S_N (d/dx) S_N f||^μ ≤ O(log N/N^{p−1}) + O(1/N^{p−5/2}) .

The first estimate in the right member comes from the rounding estimate; see the appendix for a derivation of the second term.

Integration

||∫_{−1}^x f − S_N ∫_{−1}^x S_N f||^μ = O(1/N^p) .

Details concerning the derivation of this estimate are given in the appendix.

4.4. Sine series

For f ∈ C^p[0, π], (f, sin nx) = O(1/n^p) provided that f^{(2k)}(0) = f^{(2k)}(π) = 0, k = 0, 1, …, [(p + 1)/2]. Thus (f, sin nx) = (g, sin nx) = O(1/n^p) does not imply that (fg, sin nx) = O(1/n^p). As we shall see this causes a complete degradation of the multiplication estimate.

Rounding

||(I − S_N)f|| ≤ K/N^{p−1/2} .

Reciprocation

I. ||1 − ĝ_N S_N f|| ≤ ([K + ||f||_∞]/min|f|)/N^{p−1/2} ,

II. ||g − ĝ_N|| ≤ (1/min|f|)([K + ||f||_∞]/min|f|)/N^{p−1/2} ,

III. ||ĝ_N|| ≤ (1/min|f|)(1 + [K + ||f||_∞]/N^{p−1/2}) .

Differentiation

||f′ − S_N (d/dx) S_N f|| ≤ O(1/N^{p−3/2}) + O(1/N^{1/2}) .   (4.5)


The last estimate here results from the fact that d(I − S_N)f/dx is an even function, so that its nth Fourier-cosine coefficient decays only like 1/n. An analogous remark applies to the following estimate.

Integration

||∫₀ˣ f − S_N ∫₀ˣ S_N f|| = O(1/N^{1/2}) .

4.5. Cosine series

For f ∈ C^p[0, π], (f, cos nx) = O(1/n^p) provided that f^{(2k+1)}(0) = f^{(2k+1)}(π) = 0, k = 0, 1, …, [(p + 1)/2]. Contrary to the case of sine series, fg has this property along with f and g. This follows from the Leibniz formula, D^k(fg) = Σ_{j=0}^k (k choose j) D^j f D^{k−j} g, since for odd k either j or k − j is odd. However (corresponding to the case of sine series), while (I − S_N)f is even, (d/dx)(I − S_N)f is odd. Thus some but not all of the estimates corresponding to cosine series are improved relative to their sine series counterparts. The relevant estimates follow.

Rounding

||(I − S_N)f|| ≤ K/N^{p−1/2} .

Multiplication

||fg − S_N(S_N f · S_N g)|| = O(1/N^{p−1/2}) .

Reciprocation

(The estimates parallel those of the sine series case.)

Differentiation

(The estimate is degraded, as in the sine series case.)

Integration

||∫₀ˣ f − S_N ∫₀ˣ S_N f|| = O(1/N^p) .

5. Algorithms

We use the symbol = to denote the correspondence between the data type S_N f and the array of its Fourier coefficients (a_0, a_1, …, a_N), i.e., S_N f = (a_0, a_1, …, a_N). Thus execution of the operations on S_N f (and of S_N g in the case of dyadic operations) occurs in terms of the vector (a_0, a_1, …, a_N) (and of the vector (b_0, b_1, …, b_N) = S_N g). In this section we specify the algorithms for multiplication, reciprocation, differentiation and integration in terms of these arrays. Comments concerning the work required (i.e., number of additions and multiplications) are also given. We begin with the algorithms for the Fourier basis and conclude with those corresponding to the Chebyshev basis.

5.1. Fourier algorithms

Corresponding to the Fourier basis (example (1) in Section 4), let S_N f = a := (a_N, …, a_0, …, a_{−N})^T and S_N g = b := (b_N, …, b_0, …, b_{−N})^T be the vectors of coefficients corresponding to the data types S_N f and S_N g respectively.

Multiplication

Let S_N M(S_N f, S_N g) = c := (c_N, …, c_0, …, c_{−N})^T be the array corresponding to S_N(S_N f · S_N g). Let MF_N be the (2N + 1) × (2N + 1) banded Toeplitz matrix whose (j, n) entry (j, n = N, N − 1, …, −N) is a_{j−n} for |j − n| ≤ N and 0 otherwise:

        [ a_0     a_1   …   a_N    0     …     0   ]
        [ a_{−1}  a_0   …          a_N   …     0   ]
MF_N =  [ …                                        ]
        [ a_{−N}   …        a_0    …          a_N  ]
        [ 0      a_{−N} …                          ]
        [ 0       …    0   a_{−N}  …          a_0  ]

Then, as may be directly verified,

c = MF_N b .

As displayed, this product requires

3N² + 3N − 1 multiplies

and

3N² + N − 2 adds.
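As a sketch of this multiplication algorithm: the action c = MF_N b is simply the convolution of the two coefficient arrays truncated to |j| ≤ N. A minimal Python illustration (the dict-based representation of the index range −N, …, N is ours):

```python
def fourier_multiply(a, b, N):
    """Truncated convolution c_j = sum_n a_n * b_{j-n}, |n|, |j-n|, |j| <= N.

    This mirrors the action of the banded Toeplitz matrix MF_N on b; the
    dicts map index n in -N..N to a (possibly complex) coefficient."""
    c = {}
    for j in range(-N, N + 1):
        c[j] = sum(a.get(n, 0) * b.get(j - n, 0) for n in range(-N, N + 1))
    return c

# (2 + cos x)(cos x) = 2 cos x + (1 + cos 2x)/2; the cos 2x term is cut off at N = 1,
# leaving c_0 = 1/2 and c_{+-1} = 1.
a = {0: 2.0, 1: 0.5, -1: 0.5}   # 2 + cos x
b = {1: 0.5, -1: 0.5}           # cos x
c = fourier_multiply(a, b, N=1)
print(c)  # {-1: 1.0, 0: 0.5, 1: 1.0}
```

The double loop performs the O(N²) work counted above; an economized implementation would exploit the band structure or the FFT.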

Reciprocation

Let 1 = (0, …, 0, 1, 0, …, 0)^T, i.e., 1 denotes the vector in Im S_N which corresponds to the function f = 1. Let RF_N denote the following (4N + 1) × (2N + 1) matrix given by blocks:

        [ A_N  ]
RF_N =  [ MF_N ] .
        [ B_N  ]

Here A_N and B_N are N × (2N + 1) matrices specified as follows:

       [ a_N      0     …          0 ]
A_N =  [ a_{N−1}  a_N   0    …     0 ]
       [ …                          ]
       [ a_1   a_2  …   a_N  0  … 0 ] ,

       [ 0 … 0  a_{−N}  …  a_{−2}  a_{−1} ]
B_N =  [ 0  …  0    a_{−N}   …    a_{−2} ]
       [ …                                ]
       [ 0        …        0      a_{−N} ] .

The reciprocation algorithm will be given in terms of the matrix RF_N. Let ĝ_N = ḡ_N := (b_N, …, b_0, …, b_{−N})^T.

Then ĝ_N S_N f = RF_N ḡ_N. The (k − p)-inner products correspond to the norm ||·||_{k−p}.

To determine ḡ_N, we minimize the following scalar product (in the (k − p)-norm):

(1 − ĝ_N S_N f, 1 − ĝ_N S_N f)_{k−p}
= 1 − (1, ĝ_N S_N f)_{k−p} − (ĝ_N S_N f, 1)_{k−p} + (ĝ_N S_N f, ĝ_N S_N f)_{k−p}
= 1 − (1, RF_N ḡ_N)_{k−p} − (RF_N ḡ_N, 1)_{k−p} + (RF_N ḡ_N, RF_N ḡ_N)_{k−p} .   (5.1)

Let D_M be the (2M + 1) × (2M + 1) diagonal matrix whose jjth entry is

d_j = 1 + j² + ⋯ + j^{2(k−p)} , j = 0, ±1, …, ±M .

Note that D_M becomes the identity matrix when k − p = 0. Then the quantity to be minimized can be replaced by

(D_{2N} RF_N ḡ_N, RF_N ḡ_N) − (1, RF_N ḡ_N) − (RF_N ḡ_N, 1) ,   (5.2)

a quadratic form defined in terms of the customary inner product. Note that D_{2N} does not appear in the last two terms here since with the vector 1 the (k − p)-inner product and the customary inner product are the same.

Note that ∇_{ḡ_N}(RF_N ḡ_N, 1) = a. Then differentiating (5.2), we obtain the following condition for the minimum of the quadratic form:

RF_N^H D_{2N} RF_N ḡ_N = a .

Here RF_N^H is the Hermitian transpose of RF_N.

The matrix RF_N^H D_{2N} RF_N is invertible for N sufficiently large compared with min|f|. This is seen as follows:

(RF_N^H D_{2N} RF_N z, z)^{1/2} = (RF_N z, RF_N z)_{k−p}^{1/2} = ||z S_N f||_{k−p} ≥ ||z S_N f|| ≥ (c + o(1)) ||z|| , c > 0 .

We note that RF_N^H D_{2N} RF_N may be constructed from RF_N and D_{2N} in

2(N + 1)(2N + 1)(4N + 1) multiplies

and

4N(2N + 1)² adds.
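The reciprocation procedure can be sketched end to end for the case k − p = 0, where D_{2N} is the identity: form the (4N + 1) × (2N + 1) product matrix, assemble the normal equations and solve them. The following Python sketch (the function name and the dense Gaussian elimination are ours; a real implementation would economize) recovers an approximate reciprocal of f = 2 + cos x:

```python
def reciprocal_coeffs(a, N):
    """Least-squares reciprocal: minimize ||1 - g * S_N f|| over trig polynomials g
    of degree N.  With k - p = 0 the weight D_{2N} is the identity, and the
    normal equations read RF^T RF b = RF^T e, where RF is the (4N+1) x (2N+1)
    matrix of the full product and e represents the constant function 1."""
    idx = list(range(-N, N + 1))            # column order n = -N..N
    rows = list(range(-2 * N, 2 * N + 1))   # product coefficients j = -2N..2N
    RF = [[a.get(j - n, 0.0) for n in idx] for j in rows]
    e = [1.0 if j == 0 else 0.0 for j in rows]
    m = 2 * N + 1
    A = [[sum(RF[k][i] * RF[k][j] for k in range(len(rows))) for j in range(m)]
         for i in range(m)]
    r = [sum(RF[k][i] * e[k] for k in range(len(rows))) for i in range(m)]
    # Gaussian elimination with partial pivoting on the normal equations
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for i in range(col + 1, m):
            t = A[i][col] / A[col][col]
            for j in range(col, m):
                A[i][j] -= t * A[col][j]
            r[i] -= t * r[col]
    b = [0.0] * m
    for i in reversed(range(m)):
        b[i] = (r[i] - sum(A[i][j] * b[j] for j in range(i + 1, m))) / A[i][i]
    return dict(zip(idx, b))

g = reciprocal_coeffs({0: 2.0, 1: 0.5, -1: 0.5}, N=1)   # f = 2 + cos x
print(g[0])  # 60/107 ~ 0.5607
```

For N = 1 the computed constant term 60/107 ≈ 0.561 is already close to the exact mean of the reciprocal, (1/2π) ∫ dx/(2 + cos x) = 1/√3 ≈ 0.577.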

Differentiation

Let df/dx = g, or d/dx : a → b. Then

b = i[N a_N, (N − 1) a_{N−1}, …, a_1, 0, −a_{−1}, …, −N a_{−N}]^T .


Integration

Let ∫₀ˣ f = g, or ∫₀ˣ : a → b. Then

b_n = (a_n − a_0)/(in) , 0 < |n| ≤ N ,

b_0 = π a_0 + i Σ_{0<|n|≤N} a_n/n ,

the a_0 terms arising from expanding a_0 x through the Fourier coefficients of x.
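Both coefficient maps are diagonal apart from the special treatment of n = 0, and composing them reproduces ∫₀ˣ f′ = f(x) − f(0) in coefficient space. A Python sketch (helper names are ours; the b_0 formula is our reading of the garbled display, with a_0 = (1/2π)∫₀^{2π} f):

```python
import math

def diff_coeffs(a, N):
    # d/dx : a -> b with b_n = i * n * a_n
    return {n: 1j * n * a.get(n, 0) for n in range(-N, N + 1)}

def int_coeffs(a, N):
    # integral_0^x : a -> b with b_n = (a_n - a_0)/(i n) for n != 0, and
    # b_0 = pi*a_0 + i * sum_{n != 0} a_n / n  (the a_0 x term re-expanded
    # through the Fourier coefficients of x, which decay like 1/n)
    b = {n: (a.get(n, 0) - a.get(0, 0)) / (1j * n)
         for n in range(-N, N + 1) if n != 0}
    b[0] = math.pi * a.get(0, 0) + 1j * sum(
        a.get(n, 0) / n for n in range(-N, N + 1) if n != 0)
    return b

# round trip on a mean-zero datum: integral_0^x f' = f(x) - f(0)
a = {1: 0.5, -1: 0.5, 2: -0.25j, -2: 0.25j}   # cos x + 0.5 sin 2x
b = int_coeffs(diff_coeffs(a, 2), 2)
# b_n == a_n for n != 0, while b_0 == -sum_{n != 0} a_n, i.e. a_0 - f(0)
```

Here f(0) = 1 and a_0 = 0, so the round trip returns b_0 = −1, the coefficient representation of f(x) − f(0).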

5.2. Chebyshev algorithms

These algorithms correspond to example (3) of Section 4. Let

f(x) = Σ_{j=0}^∞ a_j T_j(x) ,

so that

S_N f = Σ_{j=0}^N a_j T_j(x) ,

or equivalently

S_N f = a := (a_0, …, a_N)^T .

As in the previous set of algorithms, S_N g = b := (b_0, …, b_N)^T.

Multiplication

Since T_m T_n = (T_{m+n} + T_{|m−n|})/2,

fg = ½ Σ_{j=0}^∞ [ Σ_{n=0}^j a_n b_{j−n} + Σ_{n=j}^∞ (a_{n−j} b_n + a_n b_{n−j}) ] T_j(x) .

Thus, setting

c_j = ½ [ Σ_{n=0}^j a_n b_{j−n} + Σ_{n=j}^N (a_{n−j} b_n + a_n b_{n−j}) ] ,

S_N(S_N f · S_N g) = Σ_{j=0}^N c_j T_j(x) .

Indeed

c = MT_N b = ½(U + V + W) b ,

where V = U^T, U is the lower triangular Toeplitz matrix with (j, n) entry a_{j−n} for n ≤ j, and W is the Hankel matrix with (j, n) entry a_{j+n} for j + n ≤ N and 0 otherwise. MT_N, U, V and W are (N + 1) × (N + 1) matrices.
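A sketch of the Chebyshev product: rather than the summation formula, the double loop below applies the identity T_m T_n = (T_{m+n} + T_{|m−n|})/2 pair by pair, which makes the j = 0 bookkeeping explicit (the list-based representation is ours):

```python
def cheb_multiply(a, b):
    """Product of Chebyshev expansions via T_m T_n = (T_{m+n} + T_{|m-n|})/2.

    a, b are coefficient lists [a_0, ..., a_N]; the result is truncated
    back to degree N, i.e. it represents S_N(S_N f * S_N g)."""
    N = len(a) - 1
    c = [0.0] * (N + 1)
    for m in range(N + 1):
        for n in range(N + 1):
            t = 0.5 * a[m] * b[n]
            if m + n <= N:
                c[m + n] += t
            c[abs(m - n)] += t
    return c

# (1 + T_1)^2 = 3/2 + 2 T_1 + T_2/2
print(cheb_multiply([1.0, 1.0, 0.0], [1.0, 1.0, 0.0]))  # [1.5, 2.0, 0.5]
```

This is the same O(N²) work as applying ½(U + V + W) to b.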

Reciprocation

Let 0 be the N × (N + 1) matrix of zeros, and let P be the N × (N + 1) matrix which continues the Toeplitz pattern of U to the rows N + 1, …, 2N, i.e., with (j, n) entry a_{N+j−n} for n ≥ j:

     [ 0  a_N  a_{N−1}  …  a_1 ]
P =  [ …                       ]
     [ 0    …       0      a_N ] .

We introduce three (2N + 1) × (N + 1) matrices X, Y and Z blockwise as follows:

X = ½ [ U ]    Y = ½ [ V ]    Z = ½ [ W ]
      [ P ] ,        [ 0 ] ,        [ 0 ] .

Now let

RT_N = X + Y + Z ,

and let RT_N^T denote its transpose. Examining the discussion preceding (5.2), we see that the vector ĝ_N which represents the reciprocal of S_N f is determined as the minimum of a quadratic form which is formally obtained from (5.1) by replacing RF_N there by RT_N and by dropping the subscripts k − p on the inner products. In the Chebyshev case at hand, ∇_{ĝ_N}(RT_N ĝ_N, 1) = [3a_0/2, a_1, …, a_N]^T. Thus, the condition for the minimum obtained by differentiation is

RT_N^T E RT_N ĝ_N = [3a_0/2, a_1, …, a_N]^T ,

where E is the diagonal matrix of the squared norms of the basis elements. As in the case of Fourier series, RT_N^T E RT_N is invertible if N is large compared with min|f|.


Differentiation

Let d/dx : a → b. Then using the identities

(1/(2k + 1)) (d/dx) T_{2k+1}(x) = 1 + 2 Σ_{j=1}^k T_{2j}(x)

and

(1/(2(k + 1))) (d/dx) T_{2(k+1)}(x) = 2 Σ_{j=0}^k T_{2j+1}(x) ,

we find

b_k = 2 Σ_{m=1+[k/2]}^{[(N+1)/2]} (2m − 1) a_{2m−1} , k even ,

b_k = 4 Σ_{m=1+[k/2]}^{[N/2]} m a_{2m} , k odd .
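The parity-split sums can be sketched directly; note the halving at k = 0 in our version, which carries the T_0 normalization (the helper name is ours):

```python
def cheb_diff(a):
    """d/dx of sum a_n T_n via T'_n = n*(2 T_{n-1} + 2 T_{n-3} + ...),
    i.e. b_k = 2 * sum of m*a_m over m > k with m - k odd, halved for
    k = 0 (the T_0 term enters the derivative identities with weight 1)."""
    N = len(a) - 1
    b = [0.0] * max(N, 1)
    for k in range(N):
        s = 2.0 * sum(m * a[m] for m in range(k + 1, N + 1) if (m - k) % 2 == 1)
        b[k] = s / 2.0 if k == 0 else s
    return b

# T_3' = 12x^2 - 3 = 3*T_0 + 6*T_2
print(cheb_diff([0.0, 0.0, 0.0, 1.0]))  # [3.0, 0.0, 6.0]
```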

Integration

Let ∫_{−1}^x : a → b. We use the following identities:

∫_{−1}^x T_0 = T_1(x) + 1 , ∫_{−1}^x T_1 = ¼ T_2(x) − ¼ ,

and

∫_{−1}^x T_n = ½ (T_{n+1}(x)/(n + 1) − T_{n−1}(x)/(n − 1)) + c_n , n ≥ 2 ,

where

c_n = (−1)^{n+1}/((n + 1)(n − 1)) .

Then we find

∫_{−1}^x S_N f = b_0 + (a_0 − a_2/2) T_1 + Σ_{n=2}^{N−1} ((a_{n−1} − a_{n+1})/2n) T_n + (a_{N−1}/2N) T_N ,   (5.3)

so that

b_0 = a_0 − a_1/4 + Σ_{n=2}^N a_n c_n ,   (5.4)

b_1 = a_0 − a_2/2 ,

b_n = (a_{n−1} − a_{n+1})/2n , 2 ≤ n ≤ N − 1 ,

b_N = a_{N−1}/2N .
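A sketch of the integration map: the interior coefficients are the displayed b_n, while the constant b_0 is fixed here by requiring the result to vanish at x = −1 (using T_n(−1) = (−1)^n) rather than through the constants c_n; the two are equivalent (the helper name is ours):

```python
def cheb_integrate(a):
    """integral_{-1}^x of sum_{n<=N} a_n T_n, truncated back to degree N:
    b_1 = a_0 - a_2/2, b_n = (a_{n-1} - a_{n+1})/(2n) for 2 <= n <= N
    (coefficients beyond index N treated as 0), and b_0 chosen so that
    the result vanishes at x = -1, where T_n(-1) = (-1)^n."""
    N = len(a) - 1
    get = lambda n: a[n] if 0 <= n <= N else 0.0
    b = [0.0] * (N + 1)
    if N >= 1:
        b[1] = get(0) - get(2) / 2.0
    for n in range(2, N + 1):
        b[n] = (get(n - 1) - get(n + 1)) / (2.0 * n)
    b[0] = -sum(((-1) ** n) * b[n] for n in range(1, N + 1))
    return b

# integral_{-1}^x T_1 = (x^2 - 1)/2 = T_2/4 - 1/4
print(cheb_integrate([0.0, 1.0, 0.0]))  # [-0.25, 0.0, 0.25]
```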

We conclude the presentation of the algorithms with the following critical observation: for convenience we have made no attempt to optimize or even economize the algorithms given here with respect to the work required to execute them. Obviously such improvements can be made and are imperative in a real computer implementation.

Appendix


In this appendix we supply details for some of the estimates developed in Section 4. The section head- ings in this appendix are the equation numbers in Section 4 for which the details in question are being supplied.

A typical estimate which is used is

|| Σ_{n>N} e^{inx}/n^p || = [ Σ_{n>N} 1/n^{2p} ]^{1/2} = O(1/N^{p−1/2}) .
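This tail sum is easy to check numerically: doubling N should shrink the quantity by about 2^{p−1/2} (the helper name is ours):

```python
def tail_norm(N, p, cutoff=200000):
    # [sum_{n>N} n^(-2p)]^(1/2), the typical rounding-error quantity
    return sum(n ** (-2.0 * p) for n in range(N + 1, cutoff)) ** 0.5

# for p = 2 the ratio tail_norm(20)/tail_norm(40) should be about 2^1.5
r = tail_norm(20, 2) / tail_norm(40, 2)
print(r)  # ~ 2.83
```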

Equation 4.1

||(I − S_N) ∫₀ˣ (f − a_0)|| = || Σ_{|n|>N} (a_n/(in)) e^{inx} || = O(1/N^{p+1/2}) ,

since this is the norm of a series whose terms decay like 1/n^{p+1}.

Now let c_n be the Fourier coefficients of x. These c_n are O(1/n). Then

(I − S_N) ∫₀ˣ f = (I − S_N)(a_0 x) + (I − S_N) ∫₀ˣ (f − a_0) = a_0 Σ_{|n|>N} c_n e^{inx} + Σ_{|n|>N} (a_n/(in)) e^{inx} .

Since the a_n are O(1/n^p), the first series is dominant, and we have that || Σ_{|n|>N} c_n e^{inx} || = O(1/N^{1/2}).

Equation 4.3

Let c_{j,k} = √(2(j + 2k) + 3) √(2j + 1) a_{j+1+2k}. If f = Σ_{j=0}^∞ a_j p_j, then

f′ = Σ_{j=0}^∞ ( Σ_{k=0}^∞ c_{j,k} ) p_j .

If S_N f = Σ_{j=0}^N a_j p_j, then

(d/dx)(S_N f) = Σ_{j=0}^{N−1} ( Σ_{k=0}^{[(N−j+1)/2]−1} c_{j,k} ) p_j ,

so that (d/dx)(S_N f) = S_N(d/dx)(S_N f). (Here and elsewhere in this subsection and in clear context, brackets mean taking the integer part.) Then

S_N f′ − S_N (d/dx)(S_N f) = Σ_{j=0}^{N−1} ( Σ_{k=[(N−j+1)/2]}^∞ c_{j,k} ) p_j .

Using (4.2), we deduce that |a_k| ≤ M/(1 + k^p) for an appropriate constant M. Then

|| S_N f′ − S_N (d/dx)(S_N f) ||² = Σ_{j=0}^{N−1} [ Σ_{k=[(N−j+1)/2]}^∞ c_{j,k} ]²
≤ M² Σ_{j=0}^{N−1} [ Σ_{k=[(N−j+1)/2]}^∞ √(2(j + 2k) + 3) √(2j + 1)/(j + 1 + 2k)^p ]² .

Estimating the inner sums (and using k ≥ 2[k/2]) we find, for appropriate constants K′ and K″, that the right member is at most

K″ Σ_{j=0}^{N−1} (2j + 1)/(N − 1)^{2p−3} ≤ K′/(N − 1)^{2p−5} .

Combining, we obtain

|| f′ − S_N (d/dx)(S_N f) || ≤ M(K + 1)/(N − 1)^{p−5/2} .

Equation 4.4

Using the Legendre function identity

P_n(x) = (1/(2n + 1))(P′_{n+1}(x) − P′_{n−1}(x)) ,

and the fact that P_n(−1) = (−1)^n, we have

∫_{−1}^x P_n = (1/(2n + 1))(P_{n+1}(x) − P_{n−1}(x)) .

In terms of the normalized Legendre functions, this relation becomes

∫_{−1}^x p_n = p_{n+1}/√((2n + 1)(2n + 3)) − p_{n−1}/√((2n + 1)(2n − 1)) , n ≥ 1 .

Now if f = Σ_{j=0}^∞ a_j p_j, then

S_N ∫_{−1}^x (I − S_N)f = S_N ∫_{−1}^x Σ_{j=N+1}^∞ a_j p_j
= S_N Σ_{j=N+1}^∞ a_j ( p_{j+1}/√((2j + 1)(2j + 3)) − p_{j−1}/√((2j − 1)(2j + 1)) )
= −S_N ( a_{N+1} p_N/√((2N + 1)(2N + 3)) + a_{N+2} p_{N+1}/√((2N + 3)(2N + 5)) )
= −(a_{N+1}/√((2N + 1)(2N + 3))) p_N .
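The identity ∫_{−1}^x P_n = (P_{n+1}(x) − P_{n−1}(x))/(2n + 1) underlying this computation can be spot-checked numerically; a small sketch (helper names are ours; the quadrature is a crude midpoint rule):

```python
def legendre(n, x):
    # three-term recurrence: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def integral_Pn(n, x, steps=20000):
    # composite midpoint rule for integral_{-1}^x P_n
    h = (x + 1.0) / steps
    return h * sum(legendre(n, -1.0 + (i + 0.5) * h) for i in range(steps))

# identity: integral_{-1}^x P_n = (P_{n+1}(x) - P_{n-1}(x)) / (2n + 1)
for n in (1, 2, 3):
    x = 0.3
    lhs = integral_Pn(n, x)
    rhs = (legendre(n + 1, x) - legendre(n - 1, x)) / (2 * n + 1)
    print(n, abs(lhs - rhs))  # small (quadrature error only)
```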

Equation 4.5

Let T_j be the jth Chebyshev polynomial and let p̃_j be T_j normalized in the ||·||^μ norm. In fact p̃_n = √2 T_n/√π, n > 0, while p̃_0 = T_0/√π. If f = Σ_{n=0}^∞ a_n p̃_n, then f′ = (√2/√π) Σ_{n=1}^∞ a_n T′_n. Then, using the differentiation identities of Section 5.2, we have

f′ = (√2/√π) Σ_{n=1}^∞ n a_n ( [1 − e(n)] + 2e(n) T_1 + 2 Σ_{j=1}^{[(n−1)/2]} T_{2j+e(n)} ) ,

where e(n) = (1 + (−1)^n)/2.

By changing the summation range here from n = 1, …, ∞ into n = N + 1, …, ∞, we obtain an expression for S_N (d/dx)(I − S_N)f. Using this expression, we have

|| S_N (d/dx)(I − S_N)f ||^μ ≤ K/N^{p−5/2} .

Here K is an appropriate constant which depends on the L²(dμ) norms of the first p derivatives of f.

Equation 4.6

Here ∫_{−1}^x f − S_N ∫_{−1}^x S_N f is a series whose coefficients are built from the quantities a_n c_n and (a_{n−1} − a_{n+1})/2n for n > N. Using c_n = O(1/n) and a_n = O(1/n^p), we see that the L²(−1, 1, dμ) norm of this expression is O(1/N^p).

References

[1] U.W. Kulisch and W.L. Miranker, Computer Arithmetic in Theory and Practice (Academic Press, New York, 1981).
[2] T.J. Rivlin, The Chebyshev Polynomials (Wiley, New York, 1974).
[3] T.J. Rivlin, An Introduction to the Approximation of Functions (Dover, New York, 1981).
[4] P.K. Suetin, Representations of continuous and differentiable functions by Fourier series of Legendre polynomials, Dokl. Akad. Nauk SSSR 158 (1964) 1275-1277.
