
Computing 42, 291-307 (1989). © by Springer-Verlag 1989

Parallel Hermite Interpolation: An Algebraic Approach*

Ö. Eğecioğlu 1, Santa Barbara, E. Gallopoulos 2, Urbana, and Ç. K. Koç 3, Houston

Received August 8, 1988; revised February 28, 1989

Abstract

Parallel Hermite Interpolation: An Algebraic Approach. Given $n + 1$ distinct points and arbitrary order derivative information at these points, a parallel algorithm to compute the coefficients of the corresponding Hermite interpolating polynomial in $O(\log n)$ parallel arithmetic operations using $O(n^2)$ processors is presented. The algorithm relies on a novel closed formula that yields the expansion of the generalized divided differences in terms of the given function and derivative values. We show that each one of the coefficients in this expansion and the required linear combinations can be evaluated efficiently.

The particular cases where up to first and second order derivative information is available are treated in detail. The proof of the general case, where arbitrarily high order derivative information is available, involves algebraic arguments that make use of the theory of symmetric functions.

AMS Subject Classifications (1985): 65D05, 65W05, 68Q25.

Key words: Hermite interpolation, parallel algorithm, parallel prefix, symmetric function.


* This article was presented at the First International Conference on Industrial and Applied Mathematics (ICIAM 87), Paris, France, June 29 - July 3, 1987.

1 Supported in part by NSF Grant No. DCR-8603722.

2 Supported in part by the National Science Foundation under Grants Nos. NSF DCR84-10110 and NSF DCR85-09770, the U.S. Department of Energy under Grant No. DOE DE-FG02-85ER25001, the Air Force Office of Scientific Research Grant No. AFOSR-85-0211 and an IBM Donation.

3 Supported by Lawrence Livermore National Laboratory Contract No. LLNL-7526225.


1. Introduction

Given a collection of $n + 1$ pairs $(x_i, f_i) \in F \times F$ ($i = 0, 1, \ldots, n$; $x_i$'s distinct), the interpolation problem over $F$ is to construct a polynomial $p_n(x) \in F[x]$ of degree at most $n$ such that $p_n(x_i) = f_i$ ($i = 0, 1, \ldots, n$). If $f_i = f(x_i)$ are the values of a function $f$ at the points $x_i$, then the polynomial $p_n(x)$ is said to interpolate $f$ at the nodes $x_0, x_1, \ldots, x_n$.

If $p_n(x)$ is expressed in the Newton form

$$p_n(x) = f_0 + f_{01}(x - x_0) + f_{012}(x - x_0)(x - x_1) + f_{0123}(x - x_0)(x - x_1)(x - x_2) + \cdots + f_{012\ldots n}(x - x_0)(x - x_1)(x - x_2)\cdots(x - x_{n-1}),$$

then the coefficients $f_{012\ldots p}$ ($p = 0, 1, \ldots, n$) are called the divided differences (DD's) of $f$, which can be computed using Neville's or Aitken's recursion formulae [12], [5]. As an example, the Neville procedure uses the recursion

$$f_{i,i+1,\ldots,i+q} = \frac{f_{i,i+1,\ldots,i+q-1} - f_{i+1,i+2,\ldots,i+q}}{x_i - x_{i+q}}$$

to calculate the terms in the following triangular table:

$$\begin{array}{llll}
f_0 \\
f_1 & f_{01} \\
f_2 & f_{12} & f_{012} \\
f_3 & f_{23} & f_{123} & f_{0123}
\end{array}$$

The terms on the diagonal are the DD's and hence the coefficients of the Newton polynomial. The Neville and the Aitken procedures require $O(n^2)$ arithmetic operations. Note that the entries in a given column can be calculated independently of one another, and they depend only on the entries in the previous column and the $x_i$'s. This gives a straightforward parallel algorithm for the DD's, particularly suitable for systolic implementation (see [4] and [10]), where each column is computed in $O(1)$ time using as many processors as there are entries in that particular column. Since the maximum length of a column is $n$ and there are $n$ columns to be calculated, this approach requires $O(n)$ parallel arithmetic operations to calculate all the DD's using $O(n)$ processors.
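The column-by-column Neville procedure described above can be sketched as follows. This is a minimal sequential illustration, not code from the paper; the function names `divided_differences` and `newton_eval` are hypothetical, and exact rational arithmetic is assumed so that results can be compared exactly.

```python
from fractions import Fraction

def divided_differences(xs, fs):
    """Return [f_0, f_01, f_012, ...] using Neville's recursion, one column at a time."""
    xs = [Fraction(x) for x in xs]
    col = [Fraction(f) for f in fs]          # column q = 0 holds f_i
    dds = [col[0]]
    for q in range(1, len(xs)):
        # Entries of one column are independent of each other, so in the
        # parallel setting each entry could be assigned to its own processor.
        col = [(col[i] - col[i + 1]) / (xs[i] - xs[i + q])
               for i in range(len(col) - 1)]
        dds.append(col[0])                   # col[0] is f_{0,1,...,q}
    return dds

def newton_eval(xs, dds, x):
    """Evaluate the Newton form with nested multiplication."""
    result = dds[-1]
    for k in range(len(dds) - 2, -1, -1):
        result = result * (x - xs[k]) + dds[k]
    return result
```

For instance, with the nodes 0, 1, 2, 3 and the values of $x^3$, the returned coefficients reproduce $x^3$ when fed to `newton_eval`.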

A parallel algorithm to calculate the DD's in $O(\log n)$ time using $O(n^2)$ processors is reported in [3]. Here we will sketch this parallel Newton interpolation algorithm, as it is relevant to our treatment of Hermite interpolation.

Setting

$$y_{ij} = \frac{1}{x_i - x_j} \tag{1.1}$$

for $i \neq j$, the $p$-th divided difference of $f$ can be expressed as a linear combination of the given function values $f_0, f_1, \ldots, f_n$ with coefficients that are products of the $y_{ij}$'s. This expansion is of the form


$$f_{012\ldots p} = (y_{01} y_{02} \cdots y_{0p}) f_0 + (y_{10} y_{12} \cdots y_{1p}) f_1 + \cdots + (y_{p0} y_{p1} \cdots y_{p,p-1}) f_p, \tag{1.2}$$

where $p$ ranges from 0 to $n$ [1]. It will be useful to denote the coefficient of $f_i$ in the linear expansion of $f_{012\ldots p}$ ($i \le p$) by $f_{012\ldots p}|_{f_i}$. Then the coefficients in the expansion (1.2) can be written concisely as

$$f_{012\ldots p}|_{f_i} = y_{i0} y_{i1} \cdots y_{i,i-1} y_{i,i+1} \cdots y_{ip}. \tag{1.3}$$
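As an illustration of the expansion (1.2) with the coefficients (1.3), the following sketch evaluates a single divided difference directly as a linear combination of the function values. The name `dd_by_expansion` is a hypothetical helper introduced here, and the loops are sequential stand-ins for what the paper treats as parallel products and sums.

```python
from fractions import Fraction

def dd_by_expansion(xs, fs, p):
    """f_{01...p} = sum_i (prod_{j != i} y_ij) * f_i with y_ij = 1 / (x_i - x_j)."""
    total = Fraction(0)
    for i in range(p + 1):
        coeff = Fraction(1)
        for j in range(p + 1):
            if j != i:
                coeff /= Fraction(xs[i]) - Fraction(xs[j])   # multiply by y_ij
        total += coeff * Fraction(fs[i])
    return total
```

With the nodes 0, 1, 2, 3 and the values of $x^3$, the expansion gives the same coefficients 0, 1, 3, 1 as the Neville table.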

If the coefficients $f_{012\ldots p}|_{f_i}$ are known for $0 \le i \le p \le n$, then all of the divided differences $(f_0, f_{01}, f_{012}, \ldots, f_{012\ldots n})$ of $f$ that are required for the interpolating polynomial $p_n(x)$ can be calculated in $O(\log n)$ time using $O(n^2)$ processors. This is because for each $p$, the right hand side of (1.2) can be calculated using $O(\log n)$ parallel arithmetic operations with $O(n)$ processors, and $n$ independent instances of this computation are required for $p$ ranging from 1 to $n$. The coefficients in (1.3) themselves can also be calculated in $O(\log n)$ time using $O(n^2)$ processors. To see this note that

$$\begin{aligned}
f_{01}|_{f_0} &= y_{01} \\
f_{012}|_{f_0} &= y_{01} y_{02} \\
f_{0123}|_{f_0} &= y_{01} y_{02} y_{03} \\
&\;\;\vdots \\
f_{012\ldots n}|_{f_0} &= y_{01} y_{02} y_{03} \cdots y_{0n}
\end{aligned}$$

and therefore the computation of this sequence of coefficients amounts to the calculation of the prefixes of the quantities $(y_{01}, y_{02}, \ldots, y_{0n})$. This can be done by using the parallel prefix algorithm in $\log n$ time using $n$ processors [6], [8]. Since $n + 1$ concurrent instances of a parallel prefix algorithm are needed to compute the prefixes of the terms $(y_{i0}, y_{i1}, \ldots, y_{i,i-1}, y_{i,i+1}, \ldots, y_{in})$ for $i$ ranging from 0 to $n$, the total number of processors required becomes $O(n^2)$. A detailed analysis of this approach for the computation of DD's for the construction of the Newton interpolating polynomial can be found in [3].
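The prefix computation can be simulated sequentially to see the logarithmic number of rounds. The sketch below uses a doubling scan (often called Hillis-Steele style), which is one well-known scheme and is assumed here for illustration; it is not necessarily the variant analysed in [6] or [8]. The name `prefix_scan` is hypothetical, and each list comprehension stands for one parallel step.

```python
from fractions import Fraction
from operator import mul

def prefix_scan(values, op):
    """Inclusive scan by doubling: returns (prefixes, number of parallel rounds)."""
    out = list(values)
    shift, rounds = 1, 0
    while shift < len(out):
        # One round: every position i >= shift combines with position i - shift.
        out = [out[i] if i < shift else op(out[i - shift], out[i])
               for i in range(len(out))]
        shift *= 2
        rounds += 1
    return out, rounds

# Coefficients f_{01}|f_0, f_{012}|f_0, ... for the nodes 0, 1, 2, 4, 7.
nodes = [0, 1, 2, 4, 7]
ys = [1 / (Fraction(nodes[0]) - Fraction(x)) for x in nodes[1:]]
coeffs, rounds = prefix_scan(ys, mul)
```

Here four values are scanned in two rounds, matching the $\log n$ behaviour claimed for the parallel prefix step.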

The parallel Newton interpolation algorithm is numerically better conditioned than the interpolation algorithms that rely on FFT (such as given in [2] and [7]) regardless of the parallelism involved [3]. In particular, the parallel Newton interpolation algorithm is numerically superior to the parallel algorithm proposed in [11], which constructs the Lagrange interpolating polynomial in $O(\log n)$ time with $O(n^2)$ processors by extensively using the FFT.

In this paper we construct a parallel algorithm for Hermite interpolation. The algorithm computes the coefficients of the Hermite interpolating polynomial in $O(\log n)$ parallel arithmetic steps using $O(n^2)$ processors for a fixed number of derivatives by making extensive use of parallel prefix algorithms.

The error analysis of the algorithm we present is similar to the analysis of the parallel Newton interpolation algorithm given in [3] and will not be addressed here.


2. Hermite Interpolation

In the most general case of Hermite interpolation, we are given the derivative information

$$f(x_i),\ f^{(1)}(x_i),\ \ldots,\ f^{(m_i - 1)}(x_i)$$

at $n + 1$ distinct points $x_0, x_1, \ldots, x_n$.

Denote $f(x_i)$ by $f_i$ and $\dfrac{f^{(k-1)}(x_i)}{(k-1)!}$ by $f_{i^k}$. Then the Hermite interpolating polynomial can be expressed in the form

$$\begin{aligned}
P(x) = {} & f_0 + f_{0^2}(x - x_0) + f_{0^3}(x - x_0)^2 + \cdots + f_{0^{m_0}}(x - x_0)^{m_0 - 1} \\
& + f_{0^{m_0}1}(x - x_0)^{m_0} + f_{0^{m_0}1^2}(x - x_0)^{m_0}(x - x_1) + \cdots + f_{0^{m_0}1^{m_1}}(x - x_0)^{m_0}(x - x_1)^{m_1 - 1} \\
& + f_{0^{m_0}1^{m_1}2}(x - x_0)^{m_0}(x - x_1)^{m_1} + f_{0^{m_0}1^{m_1}2^2}(x - x_0)^{m_0}(x - x_1)^{m_1}(x - x_2) + \cdots \\
& + f_{0^{m_0}1^{m_1}2^{m_2}\ldots n^{m_n}}(x - x_0)^{m_0}(x - x_1)^{m_1}(x - x_2)^{m_2}\cdots(x - x_n)^{m_n - 1}.
\end{aligned}$$

The coefficients $f_{0^{a_0}1^{a_1}\ldots n^{a_n}}$ for various $a_i \le m_i$ are called the generalized divided differences (GDD's) of $f$. The GDD's can be calculated in the same way as the DD's by using the Neville or the Aitken recursion formulae. For instance, the Neville recursion in this case takes the form

$$f_{i^r \ldots j^s} = \frac{f_{i^r \ldots j^{s-1}} - f_{i^{r-1} \ldots j^s}}{x_i - x_j} = y_{ij}\,\bigl(f_{i^r \ldots j^{s-1}} - f_{i^{r-1} \ldots j^s}\bigr), \tag{2.1}$$

where the terms of the form $f_{i^k}$ with $k \le m_i$ are to be interpreted as $\dfrac{1}{(k-1)!} f^{(k-1)}(x_i)$.

The correctness of this process can be verified by noting that

$$\frac{d^{k-1}}{dx^{k-1}} P(x_i) = (k-1)!\, f_{i^k} = f^{(k-1)}(x_i), \qquad 1 \le k \le m_i,$$

holds. As an example, given two points $x_0$ and $x_1$ with $m_0 = 3$ and $m_1 = 2$, the Neville procedure computes the entries in the following triangle

$$\begin{array}{lllll}
f_0 \\
f_0 & f_{0^2} \\
f_0 & f_{0^2} & f_{0^3} \\
f_1 & f_{01} & f_{0^2 1} & f_{0^3 1} \\
f_1 & f_{1^2} & f_{01^2} & f_{0^2 1^2} & f_{0^3 1^2}
\end{array}$$

using the recursion in (2.1). As in the case for calculating the DD's, the Neville procedure requires $O(n^2)$ sequential time to calculate all GDD's for fixed values of $m_i$, $0 \le i \le n$.
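The Neville procedure with repeated nodes can be sketched as below. This is a sequential illustration under stated assumptions: the input layout `derivs[i][k]` $= f^{(k)}(x_i)$ and the name `generalized_divided_differences` are hypothetical choices made for this example, and exact rational arithmetic is used.

```python
from fractions import Fraction
from math import factorial

def generalized_divided_differences(xs, derivs):
    """derivs[i][k] = f^(k)(x_i) for k = 0..m_i-1.
    Returns the Newton-Hermite coefficients f_0, f_{0^2}, ..., f_{0^{m_0} 1}, ..."""
    # Repeat node i exactly m_i times: 0, 0, ..., 1, 1, ...
    nodes = [i for i, d in enumerate(derivs) for _ in d]
    col = [Fraction(derivs[i][0]) for i in nodes]
    coeffs = [col[0]]
    for q in range(1, len(nodes)):
        new_col = []
        for s in range(len(nodes) - q):
            i, j = nodes[s], nodes[s + q]
            if i == j:
                # All q + 1 arguments coincide: f_{i^{q+1}} = f^(q)(x_i) / q!
                new_col.append(Fraction(derivs[i][q]) / factorial(q))
            else:
                # Recursion (2.1): drop the last node minus drop the first node.
                new_col.append((col[s] - col[s + 1])
                               / (Fraction(xs[i]) - Fraction(xs[j])))
        col = new_col
        coeffs.append(col[0])
    return coeffs
```

For $f(x) = x^4$ with $x_0 = 0$ ($m_0 = 3$) and $x_1 = 1$ ($m_1 = 2$), the coefficients are 0, 0, 0, 1, 1, which corresponds to $x^4 = x^3 + x^3(x - 1)$.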


Since we are interested in developing a parallel algorithm to calculate all GDD's, we will seek a linear expansion formula of the form (1.2) for $f_{0^{a_0}1^{a_1}\ldots n^{a_n}}$ in terms of $f_{j^i}$, with $0 \le j \le n$ and $1 \le i \le a_j$ for various $a_j \le m_j$. It turns out that the coefficient $f_{0^{a_0}1^{a_1}\ldots n^{a_n}}|_{f_{j^i}}$ can be expressed in a closed form, which reduces to the expressions for the coefficients that appear in the Newton polynomial (1.2) when $a_0 = a_1 = \cdots = a_n = 1$.

In most practical instances where Hermite interpolating polynomials are required for the data

(I) $f(x_i)$ and $f'(x_i)$ for $i = 0, 1, \ldots, n$ are given, or

(II) $f(x_i)$, $f'(x_i)$ and $f''(x_i)$ for $i = 0, 1, \ldots, n$ are given,

the algorithm turns out to have an especially simple structure. In Section 4, we describe parallel Hermite interpolation algorithms for these specific cases in detail.

3. Linear Expansion of GDD's

In this section, for brevity of notation we will denote $y_{0i}$ by $y_i$ and represent the GDD $f_{0^r 1^s}$ simply by the string $0^r 1^s$ whenever necessary. The repeated application of (2.1) with two given points $x_0$ and $x_1$ can be represented as a signed and weighted binary tree, where the weight associated with each node at level $p$ is $y_1^p$. The left son of each node is obtained by dropping a 1, and the right son by dropping a 0. The leaves correspond to strings consisting of 0's or 1's only. All the right branches carry a negative sign and the left branches a positive sign, in accordance with the signs produced by repeated application of (2.1). The sign of a given node is defined to be the product of all the signs on the path from the node to the root. As an example, when $r = 3$ and $s = 2$, we have the following representation of the expansion of $f_{00011}$.

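Fig. 1. Signed, weighted binary tree for the expansion of $f_{00011}$, listed level by level:

Level 0 (weight $1$): 00011
Level 1 (weight $y_1$): 0001, 0011
Level 2 (weight $y_1^2$): 000, 001 (sons of 0001); 001, 011 (sons of 0011)
Level 3 (weight $y_1^3$): 00, 01 (sons of each 001); 01, 11 (sons of 011)
Level 4 (weight $y_1^4$): 0, 1 (sons of each 01)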


From this representation, we see that $f_{00011}|_{f_0}$ is equal to the signed sum of all the weights of the leaves labeled 0. Since each 0 omitted on the path from the root to a given node introduces a negative sign, and exactly two 0's are omitted on the way to a leaf labeled 0, the sign of each leaf labeled 0 is positive. This gives

$$f_{00011}|_{f_0} = +3 y_1^4.$$

In general, the sign of each leaf labeled 0 in the expansion of $f_{0^r 1^s}$ will be $(-1)^{r-1}$. Note that all these leaves are at level $r + s - 1$. Furthermore, in the expansion

$$f_{0^r 1^s}|_{f_0} = (-1)^{r-1} C_0\, y_1^{r+s-1},$$

the coefficient $C_0$ is the number of leaves labeled 0.

Next, we count the number of leaves labeled 0 in such a tree in the general case. It is not difficult to see that $C_0$ is the number of ways of parenthesizing the string $0^r 1^s$ starting from $0^{r-1}(01)1^{s-1}$ in such a way that each new pair of parentheses introduced contains one more symbol than the previous. For example, with this coding, the leftmost leaf in Fig. 1 corresponds to the parenthesization ((0 (0 (01))) 1), and the rightmost one to (0 (0 ((01) 1))).

Let $a_{r,s}$ denote the number of such parenthesizations of the string $0^r 1^s$. Using this interpretation (or proceeding directly from the recursive structure of a binary tree), we obtain the recursion

$$a_{r,s} = a_{r-1,s} + a_{r,s-1} \tag{3.1}$$

with $a_{r,1} = a_{1,s} = 1$. By a simple induction, this gives

$$a_{r,s} = \binom{r+s-2}{s-1}.$$

Thus

$$f_{0^r 1^s}|_{f_0} = (-1)^{r-1} \binom{r+s-2}{s-1} y_1^{r+s-1}. \tag{3.2}$$
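The leaf count behind (3.1) and (3.2) can be checked by walking the tree directly. The sketch below is an illustrative check, not part of the paper; `leaves_labeled_0` is a hypothetical helper that follows the rule "left son drops a 1, right son drops a 0" on the pair $(r, s)$.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def leaves_labeled_0(r, s):
    """Number of leaves labeled 0 in the tree rooted at the string 0^r 1^s."""
    if s == 0:
        # A string of 0's only is a leaf; it is labeled 0 exactly when r == 1.
        return 1 if r == 1 else 0
    if r == 0:
        # A string of 1's only is a leaf that is never labeled 0.
        return 0
    # Left son drops a 1, right son drops a 0.
    return leaves_labeled_0(r, s - 1) + leaves_labeled_0(r - 1, s)
```

For $r = 3$ and $s = 2$ this returns 3, the coefficient found from Fig. 1, and for small $r, s$ it agrees with $\binom{r+s-2}{s-1}$.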

Note that by treating the string 00 as a single symbol the derivation of (3.1) also yields that the sign of each leaf labeled 00 is $(-1)^{r-2}$ and that there are $\binom{r+s-3}{s-1}$ of these. Thus

$$f_{0^r 1^s}|_{f_{00}} = (-1)^{r-2} \binom{r+s-3}{s-1} y_1^{r+s-2}$$

and in general we have for $i \le r$

$$f_{0^r 1^s}|_{f_{0^i}} = (-1)^{r-i} \binom{r+s-i-1}{s-1} y_1^{r+s-i}. \tag{3.3}$$

The computation of the coefficients of $f_1$ and $f_{11}$ can be carried out exactly as above by a symmetry argument, giving

$$f_{00011}|_{f_1} = -3 y_1^4, \qquad f_{00011}|_{f_{11}} = -y_1^3.$$

Combining all these observations, $f_{00011}$ has the expansion

$$f_{00011} = 3 y_1^4 f_0 - 2 y_1^3 f_{00} + y_1^2 f_{000} - 3 y_1^4 f_1 - y_1^3 f_{11}. \tag{3.4}$$

Equivalently, using the notation introduced in (1.1), the expansion in (3.4) takes the form

$$f_{00011} = (3 y_{01}^4) f_0 + (-2 y_{01}^3) f_{00} + (y_{01}^2) f_{000} + (-3 y_{10}^4) f_1 + (y_{10}^3) f_{11}.$$
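The expansion (3.4) can be checked numerically against the recursion (2.1). In the sketch below the node values and the data `F0`, `F1` are arbitrary illustrative numbers (assumptions, not taken from the paper), and `gdd` is a hypothetical helper that evaluates $f_{0^r 1^s}$ recursively.

```python
from fractions import Fraction
from functools import lru_cache

x0, x1 = Fraction(2), Fraction(5)
y1 = 1 / (x0 - x1)                      # y_1 = y_01

# Illustrative data: F0[k] = f_{0^k}, F1[k] = f_{1^k} (already divided by (k-1)!).
F0 = {1: Fraction(7), 2: Fraction(-3), 3: Fraction(4)}
F1 = {1: Fraction(1), 2: Fraction(6)}

@lru_cache(maxsize=None)
def gdd(r, s):
    """f_{0^r 1^s} from recursion (2.1); pure strings are read from the data."""
    if s == 0:
        return F0[r]
    if r == 0:
        return F1[s]
    return y1 * (gdd(r, s - 1) - gdd(r - 1, s))

# Right hand side of (3.4).
closed_form = (3 * y1**4 * F0[1] - 2 * y1**3 * F0[2] + y1**2 * F0[3]
               - 3 * y1**4 * F1[1] - y1**3 * F1[2])
```

Both `gdd(3, 2)` and `closed_form` evaluate to the same rational number for this data.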

Now we turn to the general case of computing the coefficients of $f_{0^i}$ in the expansion of $f_{0^{a_0}1^{a_1}\ldots n^{a_n}}$. Even though the combinatorial treatment of the derivation of (3.3) can be extended to this case, we will proceed by induction. First, we give a closed formula for the coefficient of $f_0$ in the expansion of $f_{0^{a_0}1^{a_1}\ldots n^{a_n}}$. Recall that a composition or an ordered partition of a nonnegative integer $m$ into $n$ parts is a representation of the form

$$m = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$

in which each $\lambda_i$ is a nonnegative integer and the order of the summands is important. For example, there are exactly four ordered partitions of 3 into 2 parts: $0 + 3$, $1 + 2$, $2 + 1$, and $3 + 0$.

Theorem 1:

$$f_{0^{a_0}1^{a_1}\ldots n^{a_n}}|_{f_0} = (-1)^{a_0 - 1} \sum \binom{\lambda_1 + a_1 - 1}{\lambda_1} \binom{\lambda_2 + a_2 - 1}{\lambda_2} \cdots \binom{\lambda_n + a_n - 1}{\lambda_n} y_1^{\lambda_1 + a_1} y_2^{\lambda_2 + a_2} \cdots y_n^{\lambda_n + a_n},$$

where the summation is over all ordered partitions of $a_0 - 1$ into $n$ parts.

Proof:

Note first of all that $a_0 = r$ and $a_1 = s$ gives

$$f_{0^r 1^s}|_{f_0} = (-1)^{r-1} \sum_{\lambda_1 = r-1} \binom{\lambda_1 + s - 1}{\lambda_1} y_1^{\lambda_1 + s} = (-1)^{r-1} \binom{r+s-2}{s-1} y_1^{r+s-1},$$

in agreement with (3.2). Furthermore, in the special case where $a_0 = a_1 = \cdots = a_n = 1$, we obtain the coefficient of the Newton interpolating polynomial

$$f_{012\ldots n}|_{f_0} = y_1 y_2 \cdots y_n,$$

as given in (1.3).

For convenience, set

$$C_{a,m} = \sum_{\lambda_1 + \lambda_2 + \cdots + \lambda_n = m} \binom{\lambda_1 + a_1 - 1}{\lambda_1} \binom{\lambda_2 + a_2 - 1}{\lambda_2} \cdots \binom{\lambda_n + a_n - 1}{\lambda_n} y_1^{\lambda_1 + a_1} y_2^{\lambda_2 + a_2} \cdots y_n^{\lambda_n + a_n}$$

for $a = (a_1, a_2, \ldots, a_n)$ and $m \ge 0$, where the summation is over all ordered partitions of $m$ into $n$ parts. Thus we claim that

$$f_{0^{a_0}1^{a_1}\ldots n^{a_n}}|_{f_0} = (-1)^{a_0 - 1} C_{a, a_0 - 1}.$$


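First, we construct the generating function $F_a$ of the sequence of numbers $C_{a,m}$. Note that by Newton's theorem we have for every $i$ the formal power series expansion

$$\frac{1}{(1 - y_i)^{a_i}} = \sum_{\lambda_i \ge 0} \binom{\lambda_i + a_i - 1}{\lambda_i} y_i^{\lambda_i},$$

so that multiplying these expansions for $i = 1, 2, \ldots, n$ gives

$$\frac{y_1^{a_1} y_2^{a_2} \cdots y_n^{a_n}}{(1 - y_1)^{a_1} (1 - y_2)^{a_2} \cdots (1 - y_n)^{a_n}} = \sum_{\lambda_1, \lambda_2, \ldots, \lambda_n \ge 0} \binom{\lambda_1 + a_1 - 1}{\lambda_1} \binom{\lambda_2 + a_2 - 1}{\lambda_2} \cdots \binom{\lambda_n + a_n - 1}{\lambda_n} y_1^{\lambda_1 + a_1} y_2^{\lambda_2 + a_2} \cdots y_n^{\lambda_n + a_n}. \tag{3.5}$$

It follows that

$$F_a = \sum_{m \ge 0} C_{a,m} t^m = \frac{y_1^{a_1} y_2^{a_2} \cdots y_n^{a_n}}{(1 - t y_1)^{a_1} (1 - t y_2)^{a_2} \cdots (1 - t y_n)^{a_n}}. \tag{3.6}$$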

From the recursive formula for the GDD's given in (2.1), we have

$$f_{0^{a_0}1^{a_1}\ldots n^{a_n}} = y_n \bigl( f_{0^{a_0}1^{a_1}\ldots n^{a_n - 1}} - f_{0^{a_0 - 1}1^{a_1}\ldots n^{a_n}} \bigr).$$

Therefore,

$$f_{0^{a_0}1^{a_1}\ldots n^{a_n}}|_{f_0} = y_n\, f_{0^{a_0}1^{a_1}\ldots n^{a_n - 1}}|_{f_0} - y_n\, f_{0^{a_0 - 1}1^{a_1}\ldots n^{a_n}}|_{f_0}. \tag{3.7}$$

By induction on $\sum_{i=0}^{n} a_i$, we may assume that the two terms on the right hand side of (3.7) are given by

$$y_n (-1)^{a_0 - 1} F_{(a_1, a_2, \ldots, a_n - 1)}|_{t^{a_0 - 1}} \quad\text{and}\quad y_n (-1)^{a_0 - 1} F_{(a_1, a_2, \ldots, a_n)}|_{t^{a_0 - 2}},$$

respectively. Thus, to prove the theorem, it suffices to show that

$$F_{(a_1, a_2, \ldots, a_n)}|_{t^{a_0 - 1}} = y_n F_{(a_1, a_2, \ldots, a_n - 1)}|_{t^{a_0 - 1}} + y_n F_{(a_1, a_2, \ldots, a_n)}|_{t^{a_0 - 2}} = y_n F_{(a_1, a_2, \ldots, a_n - 1)}|_{t^{a_0 - 1}} + y_n\, t\, F_{(a_1, a_2, \ldots, a_n)}|_{t^{a_0 - 1}}.$$

But by (3.6), this reduces to the verification of the functional identity

$$\frac{y_1^{a_1} y_2^{a_2} \cdots y_n^{a_n}}{(1 - t y_1)^{a_1} (1 - t y_2)^{a_2} \cdots (1 - t y_n)^{a_n}} = y_n \frac{y_1^{a_1} y_2^{a_2} \cdots y_n^{a_n - 1}}{(1 - t y_1)^{a_1} (1 - t y_2)^{a_2} \cdots (1 - t y_n)^{a_n - 1}} + y_n\, t\, \frac{y_1^{a_1} y_2^{a_2} \cdots y_n^{a_n}}{(1 - t y_1)^{a_1} (1 - t y_2)^{a_2} \cdots (1 - t y_n)^{a_n}}, \tag{3.8}$$

which is immediate. □

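The general formula for the coefficient $f_{0^{a_0}1^{a_1}\ldots n^{a_n}}|_{f_{0^i}}$ can be stated as follows:

Theorem 2:

$$f_{0^{a_0}1^{a_1}\ldots n^{a_n}}|_{f_{0^i}} = (-1)^{a_0 - i} \sum \binom{\lambda_1 + a_1 - 1}{\lambda_1} \binom{\lambda_2 + a_2 - 1}{\lambda_2} \cdots \binom{\lambda_n + a_n - 1}{\lambda_n} y_1^{\lambda_1 + a_1} y_2^{\lambda_2 + a_2} \cdots y_n^{\lambda_n + a_n}$$

for every $i \le a_0$, where the summation is over all ordered partitions of $a_0 - i$ into $n$ parts.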

Proof:

The proof is similar to the proof of Theorem 1 and will be omitted. []

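Note that by using the notation for the $y_{ij}$'s given in (1.1) the formula for

$$f_{0^{a_0}1^{a_1}\ldots n^{a_n}}|_{f_{j^i}}$$

in the general case takes the form

$$(-1)^{a_j - i} \sum \binom{\lambda_0 + a_0 - 1}{\lambda_0} \cdots \binom{\lambda_{j-1} + a_{j-1} - 1}{\lambda_{j-1}} \binom{\lambda_{j+1} + a_{j+1} - 1}{\lambda_{j+1}} \cdots \binom{\lambda_n + a_n - 1}{\lambda_n}\, y_{j0}^{\lambda_0 + a_0} \cdots y_{j,j-1}^{\lambda_{j-1} + a_{j-1}}\, y_{j,j+1}^{\lambda_{j+1} + a_{j+1}} \cdots y_{jn}^{\lambda_n + a_n},$$

where the summation is carried over all ordered partitions

$$\lambda_0 + \cdots + \lambda_{j-1} + \lambda_{j+1} + \cdots + \lambda_n$$

of $a_j - i$.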
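The general coefficient formula can be tested by brute force on small inputs. The following sketch sums over ordered partitions directly (so it makes no attempt at parallelism) and compares the resulting GDD with the Neville recursion (2.1). All function names and the data layout `derivs[j][k]` $= f^{(k)}(x_j)$ are hypothetical choices for this illustration, and integer inputs are assumed.

```python
from fractions import Fraction
from math import comb, factorial

def compositions(m, parts):
    """Yield all ordered partitions of m into `parts` nonnegative summands."""
    if parts == 0:
        if m == 0:
            yield ()
        return
    for first in range(m + 1):
        for rest in compositions(m - first, parts - 1):
            yield (first,) + rest

def coefficient(xs, a, j, i):
    """Coefficient of f_{j^i} in f_{0^{a_0} ... n^{a_n}} (general form of Theorem 2)."""
    others = [l for l in range(len(xs)) if l != j]
    total = Fraction(0)
    for lam in compositions(a[j] - i, len(others)):
        term = Fraction(1)
        for l, lam_l in zip(others, lam):
            y = 1 / Fraction(xs[j] - xs[l])              # y_{jl}
            term *= comb(lam_l + a[l] - 1, lam_l) * y ** (lam_l + a[l])
        total += term
    return (-1) ** (a[j] - i) * total

def gdd_by_expansion(xs, a, derivs):
    """Sum of coefficient * f_{j^i}, where f_{j^i} = f^(i-1)(x_j) / (i-1)!."""
    return sum(coefficient(xs, a, j, i)
               * Fraction(derivs[j][i - 1], factorial(i - 1))
               for j in range(len(xs)) for i in range(1, a[j] + 1))

def gdd_by_recursion(xs, a, derivs):
    """Reference value from the Neville recursion (2.1) on repeated nodes."""
    nodes = [j for j, m in enumerate(a) for _ in range(m)]
    col = [Fraction(derivs[j][0]) for j in nodes]
    for q in range(1, len(nodes)):
        col = [Fraction(derivs[nodes[s]][q], factorial(q))
               if nodes[s] == nodes[s + q]
               else (col[s] - col[s + 1]) / Fraction(xs[nodes[s]] - xs[nodes[s + q]])
               for s in range(len(nodes) - q)]
    return col[0]
```

As a usage example, with nodes 0, 1, 3 and multiplicities $(2, 3, 1)$ the top GDD of $x^5$ is its leading coefficient 1, and both routines return that value.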

4. Special Hermite Interpolating Polynomials

In this section we will give algorithms for the parallel computation of the GDD's for the following important cases:

CASE (I) $f(x_i)$ and $f'(x_i)$ given for $0 \le i \le n$, and

CASE (II) $f(x_i)$, $f'(x_i)$ and $f''(x_i)$ given for $0 \le i \le n$.

As we remarked in Section 2, if the coefficients of $f_{j^i}$ are known for all $0 \le j \le n$ and $i \le m_j$ then all the necessary GDD's, namely the terms of the form $f_{0^{a_0}1^{a_1}\ldots p^{a_p}}$ where $k \ge a_0 \ge a_1 \ge \cdots \ge a_p$ and $1 \le p \le n$, can be calculated in $O(\log n)$ time using $O(n^2)$ processors for a fixed value of $k$. Hence, the problem reduces to calculating the coefficients

$$f_{0^{a_0}1^{a_1}\ldots p^{a_p}}|_{f_{j^i}}$$

for $1 \le i \le k$ and $0 \le j \le p \le n$. Now we will show that, for $k = 2$ and $k = 3$ corresponding to the cases (I) and (II) above, these coefficients can also be calculated in $O(\log n)$ time with $O(n^2)$ processors.

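CASE (I) ($k = 2$)

From Theorem 1 we have

$$f_{0^2 1^{a_1}\ldots p^{a_p}}|_{f_0} = - \sum_{\lambda_1 + \lambda_2 + \cdots + \lambda_p = 1} \binom{\lambda_1 + a_1 - 1}{\lambda_1} \binom{\lambda_2 + a_2 - 1}{\lambda_2} \cdots \binom{\lambda_p + a_p - 1}{\lambda_p} y_1^{\lambda_1 + a_1} y_2^{\lambda_2 + a_2} \cdots y_p^{\lambda_p + a_p}$$

for $1 \le p \le n$. This equation simplifies to

$$f_{0^2 1^{a_1}\ldots p^{a_p}}|_{f_0} = - y_1^{a_1} y_2^{a_2} \cdots y_p^{a_p} \sum_{i=1}^{p} a_i y_i.$$

Similarly, for the coefficient of $f_{0^2}$, we obtain from Theorem 2 that

$$f_{0^2 1^{a_1}\ldots p^{a_p}}|_{f_{0^2}} = y_1^{a_1} y_2^{a_2} \cdots y_p^{a_p}.$$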

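Since $2 \ge a_1 \ge a_2 \ge \cdots \ge a_p \ge 1$, as $p$ ranges from 1 to $n$ we obtain the following table for the coefficients of $f_{0^2}$:

$$\begin{aligned}
f_{0^2 1}|_{f_{0^2}} &= y_1 \\
f_{0^2 1^2}|_{f_{0^2}} &= y_1^2 \\
f_{0^2 1^2 2}|_{f_{0^2}} &= y_1^2 y_2 \\
f_{0^2 1^2 2^2}|_{f_{0^2}} &= y_1^2 y_2^2 \\
f_{0^2 1^2 2^2 3}|_{f_{0^2}} &= y_1^2 y_2^2 y_3 \\
f_{0^2 1^2 2^2 3^2}|_{f_{0^2}} &= y_1^2 y_2^2 y_3^2 \\
&\;\;\vdots \\
f_{0^2 1^2 2^2 3^2 \ldots (n-1)^2 n}|_{f_{0^2}} &= y_1^2 y_2^2 y_3^2 \cdots y_{n-1}^2 y_n \\
f_{0^2 1^2 2^2 3^2 \ldots n^2}|_{f_{0^2}} &= y_1^2 y_2^2 y_3^2 \cdots y_n^2.
\end{aligned}$$

Clearly these are the prefixes of the quantities $(y_1, y_1, y_2, y_2, y_3, y_3, \ldots, y_n, y_n)$, and hence they can be calculated in $\log 2n$ time using $2n$ processors. Due to symmetry of the GDD's, all of the terms

$$f_{0^2 1^2 2^2 \ldots (p-1)^2 p}|_{f_{j^2}}, \quad f_{0^2 1^2 2^2 \ldots (p-1)^2 p^2}|_{f_{j^2}}, \qquad 0 \le j \le p, \quad 1 \le p \le n$$

can be calculated using $n + 1$ instances of the parallel prefix algorithm. Hence $O(n^2)$ processors suffice to calculate all of them in $O(\log n)$ time.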

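For the terms

$$f_{0^2 1^2 2^2 \ldots (p-1)^2 p}|_{f_0} \quad\text{and}\quad f_{0^2 1^2 2^2 \ldots (p-1)^2 p^2}|_{f_0}$$

we obtain the following formulae:

$$\begin{aligned}
f_{0^2 1}|_{f_0} &= -y_1\,(y_1) \\
f_{0^2 1^2}|_{f_0} &= -y_1^2\,(2 y_1) \\
f_{0^2 1^2 2}|_{f_0} &= -y_1^2 y_2\,(2 y_1 + y_2) \\
f_{0^2 1^2 2^2}|_{f_0} &= -y_1^2 y_2^2\,(2 y_1 + 2 y_2) \\
f_{0^2 1^2 2^2 3}|_{f_0} &= -y_1^2 y_2^2 y_3\,(2 y_1 + 2 y_2 + y_3) \\
f_{0^2 1^2 2^2 3^2}|_{f_0} &= -y_1^2 y_2^2 y_3^2\,(2 y_1 + 2 y_2 + 2 y_3) \\
&\;\;\vdots \\
f_{0^2 1^2 2^2 \ldots (n-1)^2 n}|_{f_0} &= -y_1^2 y_2^2 \cdots y_{n-1}^2 y_n\,(2 y_1 + 2 y_2 + 2 y_3 + \cdots + y_n) \\
f_{0^2 1^2 2^2 \ldots n^2}|_{f_0} &= -y_1^2 y_2^2 y_3^2 \cdots y_n^2\,(2 y_1 + 2 y_2 + 2 y_3 + \cdots + 2 y_n)
\end{aligned}$$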


The terms outside the parentheses are obtained by negating the coefficients of $f_{0^2}$, and hence they need not be calculated again. For the terms in the parentheses notice that by applying the parallel prefix algorithm to the quantities $(2 y_1, 2 y_2, \ldots, 2 y_n)$, where the operation is taken to be addition, we get the terms $2 y_1$, $2 y_1 + 2 y_2$, $2 y_1 + 2 y_2 + 2 y_3$, etc. The other half of the terms can be calculated from these by doing one parallel addition. Hence again $O(n)$ processors suffice to calculate all the coefficients of $f_0$ in the linear expansion of $f_{0^2 1^{a_1}\ldots p^{a_p}}$ for $2 \ge a_1 \ge \cdots \ge a_p \ge 1$ and $1 \le p \le n$. The coefficients of $f_j$ for $j$ ranging from 0 to $n$ are found by $n + 1$ concurrent applications of the procedure explained above. Hence the coefficients of all $f_j$ and $f_{j^2}$ in the expansion of $f_{0^2 1^{a_1} 2^{a_2} \ldots p^{a_p}}$ for $1 \le p \le n$ and $2 \ge a_1 \ge a_2 \ge \cdots \ge a_p \ge 1$ can be found in $O(\log n)$ time using $O(n^2)$ processors.
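The Case (I) procedure for node 0 can be sketched as follows. Here `itertools.accumulate` is used as a sequential stand-in for the parallel prefix algorithm, and the function name `case1_node0_coefficients` is a hypothetical choice for this illustration.

```python
from fractions import Fraction
from itertools import accumulate
from operator import mul

def case1_node0_coefficients(xs):
    """For p = 1..n and a_p = 1, 2 (in that order) return pairs
    (coefficient of f_{0^2}, coefficient of f_0) in f_{0^2 1^2 ... (p-1)^2 p^{a_p}}."""
    ys = [1 / (Fraction(xs[0]) - Fraction(x)) for x in xs[1:]]   # y_1 .. y_n
    doubled = [y for y in ys for _ in range(2)]                  # y1, y1, y2, y2, ...
    prods = list(accumulate(doubled, mul))                       # prefix products
    full = list(accumulate(2 * y for y in ys))                   # 2y_1 + ... + 2y_p
    prev = [Fraction(0)] + full[:-1]                             # 2y_1 + ... + 2y_{p-1}
    # One extra addition per p gives the "other half": prev + y_p.
    sums = [t for y, s0, s in zip(ys, prev, full) for t in (s0 + y, s)]
    return [(c, -c * s) for c, s in zip(prods, sums)]
```

With the nodes 0, -1, -2 (so that $y_1 = 1$ and $y_2 = 1/2$), the coefficients of $f_0$ come out as $-1$, $-2$, $-5/4$, $-3/4$, in agreement with the formulae listed above.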

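CASE (II) ($k = 3$)

In this case we are interested in the coefficients of $f_0$, $f_{0^2}$ and $f_{0^3}$ in the linear expansion of $f_{0^3 1^{a_1} 2^{a_2} \ldots p^{a_p}}$ for $3 \ge a_1 \ge \cdots \ge a_p \ge 1$. We will start with the simplest one

$$f_{0^3 1^{a_1} 2^{a_2} \ldots p^{a_p}}|_{f_{0^3}} = y_1^{a_1} y_2^{a_2} \cdots y_p^{a_p},$$

which yields the following set of coefficients:

$$\begin{aligned}
f_{0^3 1}|_{f_{0^3}} &= y_1 \\
f_{0^3 1^2}|_{f_{0^3}} &= y_1^2 \\
f_{0^3 1^3}|_{f_{0^3}} &= y_1^3 \\
f_{0^3 1^3 2}|_{f_{0^3}} &= y_1^3 y_2 \\
f_{0^3 1^3 2^2}|_{f_{0^3}} &= y_1^3 y_2^2 \\
f_{0^3 1^3 2^3}|_{f_{0^3}} &= y_1^3 y_2^3 \\
&\;\;\vdots \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n}|_{f_{0^3}} &= y_1^3 y_2^3 \cdots y_{n-1}^3 y_n \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n^2}|_{f_{0^3}} &= y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^2 \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n^3}|_{f_{0^3}} &= y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^3
\end{aligned}$$

It is clear that these quantities can be calculated in $\log 3n$ time using $3n$ processors. By applying $n + 1$ concurrent instances of the parallel prefix algorithm we find all $f_{0^3 1^{a_1}\ldots p^{a_p}}|_{f_{j^3}}$ in $O(\log n)$ time using $O(n^2)$ processors for $3 \ge a_1 \ge a_2 \ge \cdots \ge a_p \ge 1$ and $1 \le p \le n$.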

For the coefficient of $f_{0^2}$ in the expansion of $f_{0^3 1^{a_1}\ldots p^{a_p}}$, from Theorem 2 we obtain the following formula,

$$f_{0^3 1^{a_1}\ldots p^{a_p}}|_{f_{0^2}} = - y_1^{a_1} y_2^{a_2} \cdots y_p^{a_p} \sum_{i=1}^{p} a_i y_i,$$

which can be given explicitly as

$$\begin{aligned}
f_{0^3 1}|_{f_{0^2}} &= -y_1\,(y_1) \\
f_{0^3 1^2}|_{f_{0^2}} &= -y_1^2\,(2 y_1) \\
f_{0^3 1^3}|_{f_{0^2}} &= -y_1^3\,(3 y_1) \\
f_{0^3 1^3 2}|_{f_{0^2}} &= -y_1^3 y_2\,(3 y_1 + y_2) \\
f_{0^3 1^3 2^2}|_{f_{0^2}} &= -y_1^3 y_2^2\,(3 y_1 + 2 y_2) \\
f_{0^3 1^3 2^3}|_{f_{0^2}} &= -y_1^3 y_2^3\,(3 y_1 + 3 y_2) \\
&\;\;\vdots \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n}|_{f_{0^2}} &= -y_1^3 y_2^3 \cdots y_{n-1}^3 y_n\,(3 y_1 + 3 y_2 + \cdots + y_n) \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n^2}|_{f_{0^2}} &= -y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^2\,(3 y_1 + 3 y_2 + \cdots + 2 y_n) \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n^3}|_{f_{0^2}} &= -y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^3\,(3 y_1 + 3 y_2 + \cdots + 3 y_n).
\end{aligned}$$

The terms outside the parentheses are simply obtained by negating the coefficients of $f_{0^3}$, as in the $k = 2$ case. Using $n$ processors we can calculate $3 y_1$, $3 y_1 + 3 y_2$, $\ldots$, $3 y_1 + 3 y_2 + \cdots + 3 y_n$ in $\log n$ time because they are the prefixes of the terms $(3 y_1, 3 y_2, 3 y_3, \ldots, 3 y_n)$. The rest of the terms in parentheses are calculated from these in $O(1)$ time by doing parallel additions using $n$ processors. It follows that $O(n^2)$ processors are sufficient to calculate all of the coefficients $f_{0^3 1^{a_1} 2^{a_2} \ldots p^{a_p}}|_{f_{j^2}}$ for $3 \ge a_1 \ge a_2 \ge \cdots \ge a_p \ge 1$, $1 \le p \le n$ and $0 \le j \le n$ in $O(\log n)$ time.

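To compute the coefficients of $f_0$ in the expansion of $f_{0^3 1^{a_1}\ldots p^{a_p}}$ for all $1 \le p \le n$ we again use Theorem 1:

$$f_{0^3 1^{a_1}\ldots p^{a_p}}|_{f_0} = \sum_{\lambda_1 + \lambda_2 + \cdots + \lambda_p = 2} \binom{\lambda_1 + a_1 - 1}{\lambda_1} \binom{\lambda_2 + a_2 - 1}{\lambda_2} \cdots \binom{\lambda_p + a_p - 1}{\lambda_p} y_1^{\lambda_1 + a_1} y_2^{\lambda_2 + a_2} \cdots y_p^{\lambda_p + a_p}.$$

The sum $\lambda_1 + \lambda_2 + \cdots + \lambda_p$ can be equal to 2 only in two ways:

(i) $\lambda_i = 2$ for some $1 \le i \le p$, (ii) $\lambda_i = \lambda_j = 1$ for some $1 \le i, j \le p$ with $i \ne j$.

By separating these two cases in the sum operation we obtain

$$\begin{aligned}
f_{0^3 1^{a_1}\ldots p^{a_p}}|_{f_0}
&= y_1^{a_1} \cdots y_p^{a_p} \Bigl[ \sum_{1 \le i \le p} \binom{a_i + 1}{2} y_i^2 + \sum_{1 \le i < j \le p} \binom{a_i}{1} \binom{a_j}{1} y_i y_j \Bigr] \\
&= y_1^{a_1} \cdots y_p^{a_p} \Bigl[ \sum_{1 \le i \le p} \frac{a_i (a_i + 1)}{2} y_i^2 + \frac{1}{2} \sum_{\substack{1 \le i, j \le p \\ i \ne j}} a_i a_j y_i y_j \Bigr] \\
&= \frac{1}{2}\, y_1^{a_1} \cdots y_p^{a_p} \Bigl[ \sum_{1 \le i \le p} a_i (a_i + 1) y_i^2 + \Bigl( \sum_{1 \le i \le p} a_i y_i \Bigr)^2 - \sum_{1 \le i \le p} a_i^2 y_i^2 \Bigr] \\
&= \frac{1}{2}\, y_1^{a_1} \cdots y_p^{a_p} \Bigl[ \sum_{1 \le i \le p} a_i y_i^2 + \Bigl( \sum_{1 \le i \le p} a_i y_i \Bigr)^2 \Bigr].
\end{aligned}$$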


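These quantities can be arranged in a table as in the previous cases:

$$\begin{aligned}
f_{0^3 1}|_{f_0} &= \tfrac{1}{2} y_1 \bigl[ y_1^2 + (y_1)^2 \bigr] \\
f_{0^3 1^2}|_{f_0} &= \tfrac{1}{2} y_1^2 \bigl[ 2 y_1^2 + (2 y_1)^2 \bigr] \\
f_{0^3 1^3}|_{f_0} &= \tfrac{1}{2} y_1^3 \bigl[ 3 y_1^2 + (3 y_1)^2 \bigr] \\
f_{0^3 1^3 2}|_{f_0} &= \tfrac{1}{2} y_1^3 y_2 \bigl[ 3 y_1^2 + y_2^2 + (3 y_1 + y_2)^2 \bigr] \\
f_{0^3 1^3 2^2}|_{f_0} &= \tfrac{1}{2} y_1^3 y_2^2 \bigl[ 3 y_1^2 + 2 y_2^2 + (3 y_1 + 2 y_2)^2 \bigr] \\
f_{0^3 1^3 2^3}|_{f_0} &= \tfrac{1}{2} y_1^3 y_2^3 \bigl[ 3 y_1^2 + 3 y_2^2 + (3 y_1 + 3 y_2)^2 \bigr] \\
&\;\;\vdots \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n}|_{f_0} &= \tfrac{1}{2} y_1^3 y_2^3 \cdots y_{n-1}^3 y_n \bigl[ 3 y_1^2 + 3 y_2^2 + \cdots + 3 y_{n-1}^2 + y_n^2 + (3 y_1 + 3 y_2 + \cdots + 3 y_{n-1} + y_n)^2 \bigr] \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n^2}|_{f_0} &= \tfrac{1}{2} y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^2 \bigl[ 3 y_1^2 + 3 y_2^2 + \cdots + 3 y_{n-1}^2 + 2 y_n^2 + (3 y_1 + 3 y_2 + \cdots + 3 y_{n-1} + 2 y_n)^2 \bigr] \\
f_{0^3 1^3 2^3 \ldots (n-1)^3 n^3}|_{f_0} &= \tfrac{1}{2} y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^3 \bigl[ 3 y_1^2 + 3 y_2^2 + \cdots + 3 y_{n-1}^2 + 3 y_n^2 + (3 y_1 + 3 y_2 + \cdots + 3 y_{n-1} + 3 y_n)^2 \bigr].
\end{aligned}$$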

It is not difficult to see that all of these coefficients can be calculated in $O(\log n)$ time using only $O(n)$ processors. We conclude that the coefficients of $f_0$, $f_{0^2}$ and $f_{0^3}$ can be computed in $O(\log n)$ time using $O(n^2)$ processors. Hence all GDD's of the form $f_{0^3 1^{a_1}\ldots p^{a_p}}$ can be calculated in $O(\log n)$ time using $O(n^2)$ processors for $3 \ge a_1 \ge a_2 \ge \cdots \ge a_p \ge 1$ and $1 \le p \le n$.
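The Case (II) coefficients of $f_0$ can be sketched with three running scans (products, linear sums, and sums of squares), again with `itertools.accumulate` standing in for the parallel prefix algorithm. The function names are hypothetical, and `theorem1_f0` is a brute-force reference that sums over ordered partitions as in Theorem 1.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb
from operator import mul

def case2_f0_by_scans(ys):
    """Coefficient of f_0 in f_{0^3 1^3 ... (p-1)^3 p^{a_p}} for p = 1..n, a_p = 1, 2, 3."""
    prods = list(accumulate([y for y in ys for _ in range(3)], mul))
    lin = [Fraction(0)] + list(accumulate(3 * y for y in ys))      # 3y_1 + ... + 3y_p
    sq = [Fraction(0)] + list(accumulate(3 * y * y for y in ys))   # 3y_1^2 + ... + 3y_p^2
    out = []
    for p, y in enumerate(ys):            # p is 0-based: lin[p] covers y_1..y_p
        for a in (1, 2, 3):
            s1 = lin[p] + a * y           # sum of a_i * y_i
            s2 = sq[p] + a * y * y        # sum of a_i * y_i^2
            out.append(prods[3 * p + a - 1] * (s2 + s1 * s1) / 2)
    return out

def compositions(m, parts):
    """Yield all ordered partitions of m into `parts` nonnegative summands."""
    if parts == 0:
        if m == 0:
            yield ()
        return
    for first in range(m + 1):
        for rest in compositions(m - first, parts - 1):
            yield (first,) + rest

def theorem1_f0(ys, a, a0):
    """Brute-force Theorem 1 for exponents a = (a_1, ..., a_p) and multiplicity a0."""
    total = Fraction(0)
    for lam in compositions(a0 - 1, len(a)):
        term = Fraction(1)
        for y, ai, li in zip(ys, a, lam):
            term *= comb(li + ai - 1, li) * y ** (li + ai)
        total += term
    return (-1) ** (a0 - 1) * total
```

For $y_1 = 1$ and $y_2 = 1/2$ the scan-based values are $1, 3, 6, 31/8, 39/16, 3/2$, which coincide with the brute-force sums.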

Table 1

Parallel Hermite Interpolation Parallel Neville/Aitken

Processors Time Processors Time

CASE I 2n(n+1) 3 log n + 5 4n

f, f' given

CASE II 3n(n+1) 4 log n + 3 + 4 log 3 9n

f, f', f'' given

3n

3n

Neville/Aitken

Time

(9/2) n(n+1)

(19/2) n(n+1)


Explicit processor and arithmetic operation counts for the above algorithms, as well as the serial and parallel complexities of the classical algorithms (Neville and Aitken) for the two special cases I and II covered in this section appear in Table 1.

5. The General Case

Next, we will show that in the most general case, where up to $k$-th order derivatives are given (i.e. each $a_i \le k$), the computation of the coefficients of the Hermite interpolating polynomial can still be performed in $O(\log n)$ parallel time using $O(n^2)$ processors.

Theorem 3:

The coefficients of an arbitrary order Hermite interpolating polynomial can be computed in $O(\log n)$ parallel time using $O(n^2)$ processors.

Proof: The idea of the proof rests on the theory of symmetric functions, and we give a sketch of the basic ideas involved.

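Given a positive integer $N$ and a set of variables $y_1, y_2, \ldots, y_n$ with $n \ge N$, denote by $\Lambda_N$ the space of symmetric polynomials in these variables homogeneous of total degree $N$ with rational coefficients. The $m$-th homogeneous symmetric function $h_m(y_1, y_2, \ldots, y_n)$ and the $m$-th power symmetric function $\psi_m(y_1, y_2, \ldots, y_n)$ are defined by setting

$$h_m(y_1, y_2, \ldots, y_n) = \sum_{1 \le i_1 \le i_2 \le \cdots \le i_m \le n} y_{i_1} y_{i_2} \cdots y_{i_m}, \tag{5.1}$$

$$\psi_m(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{n} y_i^m, \tag{5.2}$$

with $h_0 = \psi_0 = 1$. Furthermore, for any partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ of $N$ put

$$h_\lambda = h_{\lambda_1} h_{\lambda_2} \cdots h_{\lambda_N}, \qquad \psi_\lambda = \psi_{\lambda_1} \psi_{\lambda_2} \cdots \psi_{\lambda_N}.$$

It is well known [9] that the collections $\{h_\lambda\}$ and $\{\psi_\lambda\}$, where $\lambda$ runs through all partitions of $N$, form rational bases for $\Lambda_N$. Homogeneous and power symmetric functions are related by the determinantal formula

$$m!\, h_m = \det \begin{pmatrix}
\psi_1 & -1 & 0 & \cdots & 0 \\
\psi_2 & \psi_1 & -2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\psi_{m-1} & \psi_{m-2} & \psi_{m-3} & \cdots & -m+1 \\
\psi_m & \psi_{m-1} & \psi_{m-2} & \cdots & \psi_1
\end{pmatrix} \tag{5.3}$$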


which holds independent of the number of variables. Thus, we have an expansion of the form

$$h_\lambda(y_1, y_2, \ldots, y_n) = \sum_{\mu} c_\mu\, \psi_\mu(y_1, y_2, \ldots, y_n),$$

for some rational numbers $c_\mu$, where $\mu$ runs through all partitions of $N$. Note that by (5.3), the coefficients $c_\mu$ are independent of $n$.
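As a worked illustration of (5.3) (added here as a check, not taken from the original text), the cases $m = 2$ and $m = 3$ expand as follows:

```latex
\begin{aligned}
2!\, h_2 &= \det \begin{pmatrix} \psi_1 & -1 \\ \psi_2 & \psi_1 \end{pmatrix}
          = \psi_1^2 + \psi_2, \\[4pt]
3!\, h_3 &= \det \begin{pmatrix} \psi_1 & -1 & 0 \\ \psi_2 & \psi_1 & -2 \\ \psi_3 & \psi_2 & \psi_1 \end{pmatrix}
          = \psi_1 (\psi_1^2 + 2 \psi_2) + (\psi_1 \psi_2 + 2 \psi_3)
          = \psi_1^3 + 3 \psi_1 \psi_2 + 2 \psi_3 .
\end{aligned}
```

In particular $h_2 = \tfrac{1}{2} \psi_1^2 + \tfrac{1}{2} \psi_2$, with the same coefficients for every number of variables $n$.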

From (5.1), it easily follows that the generating function of the $h_m$'s is given by

$$\sum_{m \ge 0} h_m(y_1, y_2, \ldots, y_n)\, t^m = \frac{1}{(1 - t y_1)(1 - t y_2) \cdots (1 - t y_n)}. \tag{5.4}$$

From the generating function in (5.4), we have that, for any positive integer $a$,

$$\frac{1}{(1 - t y_1)^a (1 - t y_2)^a \cdots (1 - t y_n)^a} = \sum_{m \ge 0} C_{a,m}(\psi_1, \psi_2, \ldots, \psi_m)\, t^m \tag{5.5}$$

where $C_{a,m}$ is a homogeneous form of degree $m$.

Now we consider the expansion of the generating function $F_a$ in terms of the power symmetric functions. Note that by the symmetry of the computation of the GDD's we may assume that $a_0 \ge a_1 \ge a_2 \ge \cdots \ge a_n$, and all of these numbers are bounded above by $k$. The expansion we need for the function $F_a$ is best communicated by an example. Suppose $n = 4$ and $a = (4, 4, 3, 2)$. Then

$$F_a = \frac{y_1^4 y_2^4 y_3^3 y_4^2}{(1 - t y_1)^4 (1 - t y_2)^4 (1 - t y_3)^3 (1 - t y_4)^2}$$

can be written in the form

$$(y_1^4 y_2^4 y_3^3 y_4^2) \cdot \frac{1}{(1 - t y_1)^2 (1 - t y_2)^2 (1 - t y_3)^2 (1 - t y_4)^2} \cdot \frac{1}{(1 - t y_1)(1 - t y_2)(1 - t y_3)} \cdot \frac{1}{(1 - t y_1)(1 - t y_2)}.$$

Clearly, in such an expansion, the number of products of the form

$$\frac{1}{(1 - t y_1)^a (1 - t y_2)^a \cdots (1 - t y_i)^a}$$

depends only on $k$.

From the expansion in (5.5), it follows that in general

$$\left. \frac{F_a}{y_1^{a_1} y_2^{a_2} \cdots y_n^{a_n}} \right|_{t^m}$$

is a linear combination of power symmetric functions $\psi_\lambda(y_1, y_2, \ldots, y_i)$ for various $i \le n$, where $\lambda$ is a partition of $m$.

Now in the expansion (5.5), the number of terms of the form $\psi_\lambda$ that appear as summands of the functions $C_{a,m}$ is independent of $n$. This means in turn that the number of terms of the form $\psi_\lambda$ that appear in

$$\left. \frac{F_a}{y_1^{a_1} y_2^{a_2} \cdots y_n^{a_n}} \right|_{t^m}$$

is independent of $n$. Note further that for each power symmetric function $\psi_\lambda$ that is such a summand, $\lambda$ is a partition of $m$. Since we are interested in the coefficients of $t^m$ in $F_a$ for the values $m = 0, 1, \ldots, a_i - 1$ and $a_i \le k$ for every $i$, the number of individual power symmetric functions $\psi_i$ that enter as a factor in $\psi_\lambda$ depends only on $k$ and not on $n$.

It follows that the coefficients of the Hermite interpolating polynomial can be calculated in $O(\log n)$ parallel time using $O(n^2)$ processors, provided that each $\psi_\lambda(y_1, y_2, \ldots, y_i)$ and each fixed product of the form $y_1^{a_1} y_2^{a_2} \cdots y_i^{a_i}$ can be computed in $O(\log n)$ time using $O(n^2)$ processors. But for each such computation, the analogue of the parallel prefix algorithm that was demonstrated in the computation of the cases $k = 2$ and $k = 3$ is applicable. Our claim follows from this observation. □

We remark that even though the number of parallel arithmetic steps required to compute the coefficients of the Hermite interpolating polynomial is guaranteed to be logarithmic in $n$ with $O(n^2)$ processors in the general case, the constants hidden in the big-$O$ notation are necessarily exponential in $k$ if no other shortcuts are taken into account. This is not surprising in view of the formula for the coefficient of $f_0$ given in Theorem 1, since the summation involved is over all ordered partitions, and there are exponentially many ordered partitions of $k$ into $n$ parts in general. As an example, the expansion of

$$f_{0^3 1^3 2^3 \ldots (n-1)^3 n^2}|_{f_0}$$

in terms of the power symmetric functions is obtained by taking the coefficient of $t^2$ in

$$\frac{y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^2}{(1 - t y_1)^3 (1 - t y_2)^3 \cdots (1 - t y_{n-1})^3 (1 - t y_n)^2} = y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^2 \Bigl[ \sum_{m \ge 0} h_m(y_1, y_2, \ldots, y_n)\, t^m \Bigr]^2 \Bigl[ \sum_{m \ge 0} h_m(y_1, y_2, \ldots, y_{n-1})\, t^m \Bigr]$$

and then expressing each homogeneous symmetric function in terms of the power symmetric functions by the determinantal expansion (5.3). For this particular case, we obtain the expression

$$f_{0^3 1^3 2^3 \ldots (n-1)^3 n^2}|_{f_0} = y_1^3 y_2^3 \cdots y_{n-1}^3 y_n^2 \Bigl[ 2 \psi_1^2(y_1, \ldots, y_n) + \psi_2(y_1, \ldots, y_n) + \tfrac{1}{2} \psi_2(y_1, \ldots, y_{n-1}) + \tfrac{1}{2} \psi_1^2(y_1, \ldots, y_{n-1}) + 2 \psi_1(y_1, \ldots, y_n)\, \psi_1(y_1, \ldots, y_{n-1}) \Bigr]$$

as opposed to the much simpler expansion obtained in Section 4.
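The two forms of this coefficient can be compared numerically. The sketch below evaluates the power-sum bracket $2\psi_1^2 + \psi_2$ on $(y_1, \ldots, y_n)$ plus $\tfrac{1}{2}\psi_2 + \tfrac{1}{2}\psi_1^2$ on $(y_1, \ldots, y_{n-1})$ plus the mixed term $2\psi_1\psi_1$, which is what the coefficient of $t^2$ works out to, and compares it with the Section 4 form $\tfrac{1}{2}\bigl[\sum a_i y_i^2 + (\sum a_i y_i)^2\bigr]$. The function names are hypothetical and the test values of $y_i$ are arbitrary.

```python
from fractions import Fraction
from math import prod

def psi(m, ys):
    """Power symmetric function psi_m(ys) = sum of y^m."""
    return sum(y ** m for y in ys)

def via_power_sums(ys):
    """Coefficient of f_0 in f_{0^3 1^3 ... (n-1)^3 n^2} from the power-sum expansion."""
    full, head = ys, ys[:-1]
    bracket = (2 * psi(1, full) ** 2 + psi(2, full)
               + Fraction(1, 2) * psi(2, head)
               + Fraction(1, 2) * psi(1, head) ** 2
               + 2 * psi(1, full) * psi(1, head))
    return prod(y ** 3 for y in head) * ys[-1] ** 2 * bracket

def via_section4(ys):
    """Same coefficient from (1/2) y^a [sum a_i y_i^2 + (sum a_i y_i)^2], a = (3,...,3,2)."""
    a = [3] * (len(ys) - 1) + [2]
    lead = prod(y ** e for y, e in zip(ys, a))
    s1 = sum(e * y for y, e in zip(ys, a))
    s2 = sum(e * y * y for y, e in zip(ys, a))
    return lead * (s2 + s1 * s1) / 2
```

For $y = (1, 1/2)$ both routines give $39/16$, and they agree on other rational inputs as well.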


Thus, for particular cases involving small values of $k$, it seems possible to cut down the constants in question by considering special expansions with a smaller number of terms than the above treatment would produce. Therefore, the algorithm implied by the above proof for arbitrary $k$ has more of an existential flavor, and special instances (e.g. $k = 4, 5$) can be made more efficient by judicious grouping of the terms involved in the expansion of the formula in Theorem 2.

References

[1] Abramowitz, M., Stegun, I. A.: Handbook of Mathematical Functions, pp. 877. Dover 1972.

[2] Aho, A., Hopcroft, J. E., Ullman, J. D.: The Design and Analysis of Computer Algorithms. Addison-Wesley 1974.

[3] Eğecioğlu, Ö., Gallopoulos, E., Koç, Ç. K.: Fast and practical parallel polynomial interpolation. Technical Report No. 646, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, January 1987, submitted to Journal of Parallel and Distributed Computing.

[4] Koç, Ç. K.: Parallel Algorithms for Interpolation and Approximation. PhD. Thesis, Department of ECE, University of California, Santa Barbara, July 1988.

[5] Krogh, F.: Efficient algorithms for polynomial interpolation and divided differences. Mathematics of Computation 24/109, 185-190 (1970).

[6] Kruskal, C. P., Rudolph, L., Snir, M.: The power of parallel prefix. IEEE Transactions on Computers C-34/10, 965-968 (1985).

[7] Kung, H. T.: Fast evaluation and interpolation. Technical Report, Department of Computer Science, Carnegie-Mellon University, January 1973.

[8] Ladner, R., Fischer, M.: Parallel prefix computation. Journal of ACM 27/4, 831-838 (1980).

[9] Macdonald, I. G.: Symmetric Functions and Hall Polynomials. London: Oxford University Press 1979.

[10] McKeown, G. P.: Interpolation using a systolic array. ACM Transactions on Mathematical Software 12/2, 162-170 (1986).

[11] Reif, J.: Logarithmic depth circuits for algebraic functions. SIAM Journal on Computing 15/1, 231-242 (1986).

[12] Tsao, N. K., Prior, R.: On multipoint numerical interpolation. ACM Transactions on Mathematical Software 4/1, 51-56 (1978).

Ömer Eğecioğlu, Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, U.S.A.

E. Gallopoulos, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Urbana, IL 61801, U.S.A.

Çetin K. Koç, Department of Electrical Engineering, University of Houston, Houston, TX 77204, U.S.A.

21"

parallel hermite interpolation: an algebraic...

Documents