
Pivoting and Backward Stability of Fast Algorithms for Solving Cauchy Linear Equations*

Tibor Boros
ArrayComm, Inc.
3141 Zanker Road
San Jose, CA 95134, USA

Thomas Kailath
Information Systems Laboratory
Department of Electrical Engineering
Stanford University, Stanford, CA 94305-4055, USA

and

Vadim Olshevsky†
Department of Mathematics and Statistics
Georgia State University
Atlanta, GA 30303, USA

Abstract

Three fast $O(n^2)$ algorithms for solving Cauchy linear systems of equations are proposed. A rounding error analysis indicates that the backward stability of these new Cauchy solvers is similar to that of Gaussian elimination, suggesting that various pivoting techniques be employed to achieve favorable backward stability. It is shown that the Cauchy structure allows one to achieve in $O(n^2)$ operations the partial pivoting ordering of the rows, and several other judicious orderings, in advance, without actually performing the elimination. The analysis also shows that for the important class of totally positive Cauchy matrices it is advantageous to avoid pivoting, which yields a remarkable backward stability for the suggested algorithms.

It is shown that Vandermonde and Chebyshev-Vandermonde matrices can be efficiently transformed into Cauchy matrices, using Discrete Fourier, Cosine or Sine transforms. This allows us to use the proposed algorithms for Cauchy matrices for the rapid and accurate solution of Vandermonde and Chebyshev-Vandermonde linear systems.

The analytical results are illustrated by computed examples.

AMS subject classification: 65F05, 65L20, 15A09, 15A23

Key words: Displacement structure, Cauchy matrix, Vandermonde matrix, fast algorithms, pivoting, rounding error analysis, backward stability, total positivity.

* This work was supported in part by ARO contract DAAH04-96-1-0176 and NSF contracts CCR-962811 and 9732355.
† web: www.cs.gsu.edu/~matvro, email: [email protected]


1 Introduction

1.1 Related facts

Vandermonde and related matrices. Linear systems with Cauchy and Vandermonde coefficient matrices,

$$C(x_{1:n}; y_{1:n}) = \begin{bmatrix} \frac{1}{x_1-y_1} & \cdots & \frac{1}{x_1-y_n} \\ \vdots & \ddots & \vdots \\ \frac{1}{x_n-y_1} & \cdots & \frac{1}{x_n-y_n} \end{bmatrix}, \qquad V(x_{1:n}) = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix}, \qquad (1.1)$$

are classical. They are encountered in many applied problems related to polynomial and rational function computations. Vandermonde and Cauchy matrices have many similar properties, among them the existence of explicit formulas for their determinants and inverses; see, e.g., [BKO99] and the references therein. Along with many interesting algebraic properties, these matrices have several remarkable numerical properties, often allowing much more accurate computations than those based on general (structure-ignoring) algorithms, say Gaussian elimination with pivoting. At the same time, such favorable numerical properties are much better understood for Vandermonde and related matrices (see, for example, [BP70], [TG81], [CF88], [Hig87], [Hig88], [Hig90], [RO91], [CR92], [CR93], [V93], [Ty94]) than for Cauchy matrices (see [GK90], [GK93]).

The Björck-Pereyra algorithm for Vandermonde systems. In particular, most of the above mentioned papers were devoted to the analysis of the numerical properties, and to extensions, of the now well-known Björck-Pereyra algorithm for solving Vandermonde linear systems [BP70], [GVL89]. This algorithm is based on the decomposition of the inverse of a Vandermonde matrix into a product of bidiagonal factors,

$$V^{-1}(x_{1:n}) = U_1^{-1} \cdots U_{n-1}^{-1}\, L_{n-1}^{-1} \cdots L_1^{-1}, \qquad (1.2)$$

where

$$U_k^{-1} = \begin{bmatrix} \alpha_1^{(k)} & \beta_1^{(k)} & & \\ & \ddots & \ddots & \\ & & \ddots & \beta_{n-1}^{(k)} \\ & & & \alpha_n^{(k)} \end{bmatrix}, \qquad L_k^{-1} = \begin{bmatrix} \gamma_1^{(k)} & & & \\ \delta_2^{(k)} & \ddots & & \\ & \ddots & \ddots & \\ & & \delta_n^{(k)} & \gamma_n^{(k)} \end{bmatrix}. \qquad (1.3)$$

This description allows one to solve the associated linear systems in only $O(n^2)$ operations, which is an order of magnitude less than the complexity $O(n^3)$ of general (structure-ignoring) methods. Moreover, the algorithm requires only $O(n)$ locations of memory.
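For concreteness, the BP solve admits a very short in-place implementation; the following MATLAB sketch (the function name and this compact two-stage form are ours) computes the solution of $V(x_{1:n})a = f$ for the power basis in (1.1), via Newton divided differences followed by conversion to monomial coefficients:

function a = bp_vander(x, f)
% Bjorck-Pereyra-type solve of V(x)a = f, where V(i,j) = x(i)^(j-1)
n = max(size(x)); a = f;
% Stage 1: Newton divided differences of the data f
for k = 1:n-1
    for j = n:-1:k+1
        a(j) = ( a(j) - a(j-1) ) / ( x(j) - x(j-k) );
    end
end
% Stage 2: convert Newton coefficients into monomial coefficients
for k = n-1:-1:1
    for j = k:n-1
        a(j) = a(j) - x(k) * a(j+1);
    end
end
return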

Remarkable accuracy for monotonically ordered nodes. It turns out that along with the dramatic speed-up and savings in storage, the BP algorithm often produces surprisingly high relative accuracy in the computed solution. N. J. Higham analyzed in [Hig87] the numerical performance of the BP algorithm and identified a class of Vandermonde matrices, viz., those for which the nodes can be strictly ordered,

$$0 < x_1 < x_2 < \cdots < x_n, \qquad (1.4)$$

with a favorable forward error bound

$$|a - \hat a| \le 5nu\, |V(x_{1:n})^{-1}|\, |f| + O(u^2) \qquad (1.5)$$

for the solution $\hat a$ computed by the BP algorithm. Here $u$ denotes the unit roundoff in the standard model of floating point arithmetic, and the operations of comparison, and of taking the absolute value, of a matrix are understood in a componentwise sense. It was further shown in [BKO99] that under the condition (1.4) the BP algorithm is not only forward, but also backward stable:

$$|\Delta V| \le 12 n^2 u\, V(x_{1:n}) + O(u^2). \qquad (1.6)$$

Here the computed solution $\hat a$ is the exact solution of a nearby system $(V(x_{1:n}) + \Delta V)\hat a = f$.

The Björck-Pereyra-type algorithm for Cauchy matrices. The above analytic error bounds indicate that the accuracy of the BP algorithm can be much higher than could be expected from the condition number of the coefficient matrix. Such high accuracy motivated several authors to extend the BP algorithm to several other classes of matrices; see, e.g., [TG81], [Hig88], [RO91]. All these generalizations were developed for Vandermonde-related structures. In a recent paper [BKO99] a similar decomposition

$$C^{-1}(x_{1:n}; y_{1:n}) = U_1^{-1} \cdots U_{n-1}^{-1}\, L_{n-1}^{-1} \cdots L_1^{-1} \qquad (1.7)$$

was written down for Cauchy matrices, leading to a Björck-Pereyra-type algorithm which will be referred to as the BP-type algorithm. This algorithm requires $O(n^2)$ operations and $O(n)$ locations of memory. It was further shown in [BKO99] that the following configuration of the nodes,

$$y_n < \cdots < y_1 < x_1 < \cdots < x_n, \qquad (1.8)$$

is an appropriate analog of (1.4) for Cauchy matrices, allowing one to prove that the error bounds associated with the BP-type algorithm are entirely similar to (1.5) and (1.6), viz.,

$$|a - \hat a| \le 5(2n+1)u\, |C(x_{1:n}; y_{1:n})^{-1}|\, |f| + O(u^2), \qquad (1.9)$$

$$|\Delta C| \le 20n(2n-1)u\, C(x_{1:n}; y_{1:n}) + O(u^2). \qquad (1.10)$$

It is an interesting fact that the conditions (1.4) and (1.8) imply the total positivity^1 of $V(x_{1:n})$ and $C(x_{1:n}; y_{1:n})$, resp. Totally positive matrices are usually extremely ill-conditioned, so that the Gaussian elimination procedure often fails to compute even one correct digit of the solution. The bound (1.9) indicates that even in such "difficult" cases the BP-type algorithm can produce, for special right-hand sides, full relative precision; see, e.g., [BKO99] for a discussion and numerical illustrations.

Limitations for non-totally-positive matrices. Of course, reordering of the nodes $\{x_k\}$ and $\{y_k\}$ is equivalent to row and column permutation of $C(x_{1:n}; y_{1:n})$, respectively. Therefore, if the two sets of nodes are separated from each other,

$$y_k < x_j, \qquad 1 \le k, j \le n, \qquad (1.11)$$

the remarkable error bounds (1.9), (1.10) suggest reordering the nodes monotonically as in (1.8) and applying the BP-type algorithm of [BKO99].

However, numerical experiments show that in the generic case, i.e., when (1.11) does not hold, the monotonic ordering does not guarantee satisfactory accuracy, and the corresponding backward error of the fast BP-type algorithm of [BKO99] (and of the use of the explicit inversion formula as well) may be worse than that of the slow Gaussian elimination with pivoting. Employing other orderings of the nodes (for example, the partial pivoting ordering of the rows of $C(x_{1:n}; y_{1:n})$) does not seem to essentially improve the backward stability. A heuristic explanation for this fact can be drawn from the observation that the usual aim of pivoting is to reduce the size of the factors in the LU factorization of a matrix. It therefore improves the stability of the Gaussian elimination procedure, whose backward error bound involves the product $|\hat L| \cdot |\hat U|$ of the computed factors. In contrast, an examination of the error analysis of the BP-type algorithm in [BKO99] indicates that the corresponding backward bound involves the quantity

$$|L_1| \cdots |L_{n-1}| \cdot |U_{n-1}| \cdots |U_1|, \qquad (1.12)$$

which, because of a non-cancellation property (i.e., $|M| \cdot |N| \ge |MN|$), can be much larger than the more attractive quantity $|L| \cdot |U|$. In the totally positive case (1.8), the entries of the $L_k$ and $U_k$ in (1.12) are all nonnegative, thus allowing one to remove the moduli and to replace (1.12) by just $C(x_{1:n}; y_{1:n})$; cf. the favorable bound (1.10). Unfortunately, in the general case the bidiagonal structure of the $L_k$ and $U_k$ in (1.3) does not allow one to remove the moduli and to easily deduce a satisfactory backward error bound. These limitations suggest that in one's attempts to design a backward stable Cauchy solver it can be better to develop new algorithms rather than to look for stabilizing techniques for the Björck-Pereyra-type algorithm.

1.2 Main results

New algorithms. In this paper we develop several alternatives to the BP-type algorithm, all based on factorizations of the form

$$C(x_{1:n}; y_{1:n}) = L_1 \cdots L_{n-1} \cdot U_{n-1} \cdots U_1,$$

^1 Totally positive matrices are those for which the determinant of every submatrix is positive; see the monographs [GK50] and [K72].


where the factors $L_k$, $U_k$ are of the form (diagonal matrices with one nonzero column or row, respectively)

$$L_k = \begin{bmatrix} l_{11}^{(k)} & & & & \\ & \ddots & & & \\ & & l_{kk}^{(k)} & & \\ & & \vdots & \ddots & \\ & & l_{nk}^{(k)} & & l_{nn}^{(k)} \end{bmatrix}, \qquad U_k = \begin{bmatrix} u_{11}^{(k)} & & & & \\ & \ddots & & & \\ & & u_{kk}^{(k)} & \cdots & u_{kn}^{(k)} \\ & & & \ddots & \\ & & & & u_{nn}^{(k)} \end{bmatrix}. \qquad (1.13)$$

Backward error bounds. We present an error analysis for these new methods, obtaining backward and residual bounds that also involve the quantity (1.12), but now with factors of the form (1.13). In contrast to the bidiagonal factors (1.3) of the BP-type algorithm of [BKO99], the sparsity pattern of the factors (1.13) immediately implies the equality

$$|L||U| = |L_1| \cdots |L_{n-1}| \cdot |U_{n-1}| \cdots |U_1|,$$

resulting in pleasing backward bounds of the form

$$|\Delta C| \le d_n u\, |L||U| + O(u^2) \qquad (1.14)$$

for the new algorithms. Here the computed solution $\hat a$ is the exact solution of a nearby system $(C(x_{1:n}; y_{1:n}) + \Delta C)\hat a = f$. These bounds are similar to the well-known bounds for Gaussian elimination,

$$|\Delta R| \le 2\gamma_n |\hat L||\hat U|, \qquad \text{where } \gamma_n = \frac{nu}{1-nu}, \qquad (1.15)$$

see, e.g., p. 175 in [Hig96]. Here $\hat L$ and $\hat U$ denote the computed triangular factors.

Different pivoting techniques. The resemblance between the backward error bounds (1.14) and (1.15) suggests employing different row and column pivoting techniques to reduce the size of $|L|$ and $|U|$, and so to stabilize the new algorithms for Cauchy systems.

There is a variety of possible pivoting techniques that can be incorporated into the new algorithms without increasing their $O(n^2)$ complexity, including partial row and partial column pivoting, Gu's pivoting (a variation of complete pivoting for Cauchy-like matrices [Gu95]), and others.

Total positivity. There are classes of matrices for which it is advantageous not to pivot. For example, for totally positive matrices the exact triangular factors have only positive entries. C. De Boor and A. Pinkus pointed out in [DBP77] that if the entries of the computed factors $\hat L$ and $\hat U$ remain nonnegative, then the backward error of Gaussian elimination without pivoting is pleasantly small,

$$|\Delta R| \le 3\gamma_n R, \qquad (1.16)$$

see also p. 176 in [Hig96]. It turns out that the same recommendation to avoid pivoting can be made for the fast Cauchy solvers proposed here; moreover, because the corresponding error bounds (1.14) involve the exact triangular factors, in the case of total positivity we have $|L| \cdot |U| = C(x_{1:n}; y_{1:n})$, implying that a remarkable backward stability of the same form (1.16) is guaranteed for the new fast algorithms without any additional assumptions on the computed triangular factors.

1.3 Outline of the paper

It is now well known that for structured matrices the Gaussian elimination procedure can be sped up, leading to the generalized Schur algorithms; see, e.g., [KS95] and the references therein. In the next section we exploit the displacement structure of Cauchy matrices to specify two such algorithms. Then in Section 3 we exploit the quasi-Cauchy structure of the Schur complements of $C(x_{1:n}; y_{1:n})$ to derive one more algorithm for solving Cauchy linear equations. In Section 4 we perform a rounding error analysis for these algorithms, obtaining backward and residual bounds similar to those for Gaussian elimination. This analogy allows us in Section 5 to carry over the stabilizing techniques known for Gaussian elimination to the new Cauchy solvers. The numerical properties of the new algorithms are illustrated in Section 6 by a variety of examples. Then in Section 7 we show how Vandermonde and Chebyshev-Vandermonde matrices can be efficiently transformed into Cauchy matrices by using Discrete Fourier, Cosine or Sine transforms, thus allowing us to use the proposed Cauchy solvers for the rapid solution of Vandermonde and Chebyshev-Vandermonde linear systems.


2 Generalized Schur algorithms for Cauchy matrices.

2.1. Displacement structure. By now, the displacement structure approach is well known to be useful in the development of various types of fast algorithms for structured matrices, including Toeplitz, Hankel, Toeplitz-plus-Hankel, Vandermonde, Chebyshev-Vandermonde, and several others. This approach is based on a convenient way to capture each of the above particular structures by specifying suitable displacement operators $\nabla(\cdot): \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$. In this paper we shall use operators of the form

$$\nabla_{\{F,A\}}(R) = FR - RA \qquad (2.1)$$

for various choices of (sparse) matrices $\{F, A\}$. Let $\alpha := \operatorname{rank}\nabla_{\{F,A\}}(R)$; then one can factor

$$\nabla_{\{F,A\}}(R) = FR - RA = GB^T, \qquad (2.2)$$

where both matrices on the right-hand side of (2.2) have only $\alpha$ columns each: $G, B \in \mathbb{C}^{n\times\alpha}$. The number $\alpha = \operatorname{rank}\nabla_{\{F,A\}}(R)$ is called the $\{F,A\}$-displacement rank of $R$, and the pair $\{G, B\}$ is called an $\{F,A\}$-generator of $R$. The displacement rank measures the complexity of $R$, because all its $n^2$ entries are described by the smaller number $2\alpha n$ of entries of its generator $\{G, B\}$. We refer to the recent survey [KS95] for more complete information on displacement and further references; here we restrict ourselves to Cauchy matrices. The next lemma specifies their displacement structure.

Lemma 2.1 Let $\nabla_{\{F,A\}}(\cdot): \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ be defined by (2.1) with

$$F = D_x = \operatorname{diag}\{x_1, \ldots, x_n\}, \qquad A = D_y = \operatorname{diag}\{y_1, \ldots, y_n\}. \qquad (2.3)$$

Then the Cauchy matrix has displacement

$$\nabla_{\{F,A\}}(C(x_{1:n}; y_{1:n})) = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}. \qquad (2.4)$$
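The rank-one displacement (2.4) is easy to check numerically; in the following MATLAB sketch the nodes are arbitrary test values of ours, chosen only to be pairwise distinct (implicit expansion of modern MATLAB is assumed here and in the later sketches):

n = 5;
x = (1:n)' + 0.5; y = (1:n)';      % pairwise distinct nodes, x(i) ~= y(j)
C = 1 ./ (x - y.');                % Cauchy matrix C(x_{1:n}; y_{1:n})
disp(norm(diag(x)*C - C*diag(y) - ones(n), inf))   % ~0: displacement equals ee^T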

2.2. Structure of Schur complements. Let $R = R_1$ in (2.2) be partitioned

$$R_1 = \begin{bmatrix} d_1 & u_1 \\ l_1 & R_{22}^{(1)} \end{bmatrix},$$

and denote by $R_2 = R_{22}^{(1)} - \frac{1}{d_1} l_1 u_1$ its Schur complement. If the matrices $F_1$ and $A_1$ in (2.2) are lower and upper triangular, resp., say

$$F_1 = \begin{bmatrix} f_1 & 0 \\ * & F_2 \end{bmatrix}, \qquad A_1 = \begin{bmatrix} a_1 & * \\ 0 & A_2 \end{bmatrix}, \qquad (2.5)$$

then the $\{F_2, A_2\}$-displacement rank of the Schur complement is less than or equal to $\alpha$, so that we can write

$$F_2 R_2 - R_2 A_2 = G_2 B_2^T \quad \text{with some } G_2, B_2 \in \mathbb{C}^{(n-1)\times\alpha}; \qquad (2.6)$$

see, e.g., [KS95] and the references therein, which also show how to run the generator recursion $\{G_1, B_1\} \to \{G_2, B_2\}$. The next lemma (cf. [GO94a], [GKO95], [KO95b]) provides one particular form of the generator recursion.

Lemma 2.2 Let $R_1 = \begin{bmatrix} d_1 & u_1 \\ l_1 & R_{22}^{(1)} \end{bmatrix}$ satisfy the displacement equation (2.2) with triangular $F_1$, $A_1$ partitioned as in (2.5). If the (1,1) entry $d_1$ of $R_1$ is nonzero, then the Schur complement $R_2 = R_{22}^{(1)} - \frac{1}{d_1} l_1 u_1$ satisfies the displacement equation (2.6) with

$$\begin{bmatrix} 0 \\ G_2 \end{bmatrix} = G_1 - \begin{bmatrix} 1 \\ \frac{1}{d_1} l_1 \end{bmatrix} g_1, \qquad \begin{bmatrix} 0 \\ B_2 \end{bmatrix} = B_1 - \begin{bmatrix} 1 \\ \frac{1}{d_1} u_1^T \end{bmatrix} b_1, \qquad (2.7)$$

where $g_1$ and $b_1$ denote the top rows of $G_1$ and $B_1$, resp.


The standard Gaussian elimination procedure computes the LU factorization in $O(n^3)$ flops. The above lemma allows us to exploit the structure of $R_1$ to compute this factorization in only $O(n^2)$ flops, as described next.

2.3. Speed-up of the Gaussian elimination procedure. The first step of the Gaussian elimination procedure applied to a matrix $R_1$ is described by

$$R_1 = \begin{bmatrix} d_1 & u_1 \\ l_1 & R_{22}^{(1)} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \frac{1}{d_1} l_1 & I \end{bmatrix} \cdot \begin{bmatrix} d_1 & u_1 \\ 0 & R_2 \end{bmatrix}, \qquad (2.8)$$

where $R_2 = R_{22}^{(1)} - \frac{1}{d_1} l_1 u_1$ is the Schur complement of the (1,1) entry $d_1$ in the matrix $R_1$. This step provides the first column $\begin{bmatrix} 1 \\ \frac{1}{d_1} l_1 \end{bmatrix}$ of $L$ and the first row $\begin{bmatrix} d_1 & u_1 \end{bmatrix}$ of $U$ in the LU factorization of $R_1$. Proceeding similarly with $R_2$, after $n-1$ steps one obtains the whole LU factorization.

Algorithms that exploit the displacement structure of a matrix to speed up the Gaussian elimination procedure are called generalized Schur algorithms, because the classical Schur algorithm [S17] was shown (see, e.g., [LAK86], [KS95]) to belong to this class. Instead of computing the $(n-k+1)^2$ entries of the Schur complement $R_k$, one has to compute only $\alpha(n-k+1)$ entries of its $\{F_k, A_k\}$-generator $\{G_k, B_k\}$, which requires much less computation. To run the generator recursion (2.7), as well as to write down the corresponding entries of the $L$ and $U$ factors, one only needs to specify how to recover the first row and column of $R_k$ from its generator $\{G_k, B_k\}$. For a Schur complement $R_k = \big[ r_{ij}^{(k)} \big]_{k \le i,j \le n}$ of a Cauchy matrix this is easy to do:

$$r_{i,j}^{(k)} = \frac{g_i^{(k)}\, b_j^{(k)}}{x_i - y_j}, \qquad (2.9)$$

where $\big\{ \begin{bmatrix} g_k^{(k)} & \cdots & g_n^{(k)} \end{bmatrix}^T, \begin{bmatrix} b_k^{(k)} & \cdots & b_n^{(k)} \end{bmatrix} \big\}$ designates the corresponding generator.

2.4. Triangular factorization for Cauchy matrices. In the rest of this section we formulate two variants of a generalized Schur algorithm for $C(x_{1:n}; y_{1:n})$. The first version is an immediate implementation of (2.7) and (2.9), and its MATLAB code is given next.

Algorithm 2.3 (GS-Cauchy) Complexity: $6n^2 + O(n)$ flops.

function [L,U] = Cauchy_GS(x,y)
% Triangular factorization of the Cauchy matrix C(x_{1:n}; y_{1:n})
n = max(size(x)); L = eye(n,n); U = zeros(n,n);
% Generators
g = ones(n,1); b = ones(n,1);
for k = 1:n-1
    % Computing the k-th column of L and the k-th row of U
    U(k,k) = ( g(k) * b(k) ) / ( x(k) - y(k) );
    for j = k+1:n
        L(j,k) = (( g(j) * b(k) ) / ( x(j) - y(k) )) / U(k,k);
        U(k,j) = ( g(k) * b(j) ) / ( x(k) - y(j) );
    end
    % Computing the new generators, cf. (2.7)
    for j = k+1:n
        g(j) = g(j) - g(k) * L(j,k);
        b(j) = b(j) - b(k) * U(k,j) / U(k,k);
    end
end
U(n,n) = ( g(n) * b(n) ) / ( x(n) - y(n) );
return
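A minimal driver for Algorithm 2.3 might look as follows (the test nodes are arbitrary values of ours, chosen so that all $x_i - y_j \ne 0$):

n = 6;
x = (1:n)' + 0.5; y = (1:n)';
C = 1 ./ (x - y.');                 % C(x_{1:n}; y_{1:n})
[L,U] = Cauchy_GS(x,y);
disp(norm(C - L*U, inf) / norm(C, inf))   % relative factorization residual, O(u) expected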

2.5. Direct generator recursion. In fact, Algorithm 2.3 is valid for the more general class of Cauchy-like matrices; see, e.g., [GKO95], [KO95b] for details and applications. However, for the special case of ordinary Cauchy matrices we can exploit the fact that the corresponding displacement rank equals one to formulate a more specific version of the generalized Schur algorithm, based on the next lemma.

Lemma 2.4 Let $R_k$ in

$$\operatorname{diag}(x_k, \ldots, x_n)\, R_k - R_k\, \operatorname{diag}(y_k, \ldots, y_n) = G_k B_k = \begin{bmatrix} g_k^{(k)} & \cdots & g_n^{(k)} \end{bmatrix}^T \begin{bmatrix} b_k^{(k)} & \cdots & b_n^{(k)} \end{bmatrix}$$

be the successive Schur complements of the Cauchy matrix $C(x_{1:n}; y_{1:n})$. Then the generator recursion (2.7) can be specialized to

$$\begin{bmatrix} g_{k+1}^{(k+1)} \\ \vdots \\ g_n^{(k+1)} \end{bmatrix} = \begin{bmatrix} \frac{x_{k+1}-x_k}{x_{k+1}-y_k}\, g_{k+1}^{(k)} \\ \vdots \\ \frac{x_n-x_k}{x_n-y_k}\, g_n^{(k)} \end{bmatrix}, \qquad \begin{bmatrix} b_{k+1}^{(k+1)} & \cdots & b_n^{(k+1)} \end{bmatrix} = \begin{bmatrix} \frac{y_{k+1}-y_k}{y_{k+1}-x_k}\, b_{k+1}^{(k)} & \cdots & \frac{y_n-y_k}{y_n-x_k}\, b_n^{(k)} \end{bmatrix}. \qquad (2.10)$$

The nonzero entries of the factors in $C(x_{1:n}; y_{1:n}) = LDU$ are given by $d_k = x_k - y_k$ and

$$\begin{bmatrix} l_{k,k} \\ \vdots \\ l_{n,k} \end{bmatrix} = \begin{bmatrix} \frac{1}{x_k-y_k}\, g_k^{(k)} \\ \vdots \\ \frac{1}{x_n-y_k}\, g_n^{(k)} \end{bmatrix}, \qquad \begin{bmatrix} u_{k,k} & \cdots & u_{k,n} \end{bmatrix} = \begin{bmatrix} \frac{1}{x_k-y_k}\, b_k^{(k)} & \cdots & \frac{1}{x_k-y_n}\, b_n^{(k)} \end{bmatrix}. \qquad (2.11)$$

The following MATLAB code implements the algorithm based on Lemma 2.4.

Algorithm 2.5 (GS-direct-Cauchy) Complexity: $6n^2 + O(n)$ flops.

function [L,D,U] = Cauchy_GSd(x,y)
% Triangular factorization of the Cauchy matrix C(x_{1:n}; y_{1:n})
n = max(size(x)); L = eye(n,n); D = zeros(n,n); U = eye(n,n);
% Generators
g = ones(n,1); b = ones(n,1);
for k = 1:n-1
    % Computing the k-th column of L and the k-th row of U, cf. (2.11)
    D(k,k) = x(k) - y(k);
    for j = k:n
        L(j,k) = g(j) / ( x(j) - y(k) );
        U(k,j) = b(j) / ( x(k) - y(j) );
    end
    % Computing the new generators, cf. (2.10)
    for j = k+1:n
        g(j) = g(j) * ( x(j) - x(k) ) / ( x(j) - y(k) );
        b(j) = b(j) * ( y(j) - y(k) ) / ( y(j) - x(k) );
    end
end
% Last step: the scaling 1/(x(n)-y(n)) is folded into D(n,n)
D(n,n) = 1 / ( x(n) - y(n) );
L(n,n) = g(n);
U(n,n) = b(n);
return
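As with Algorithm 2.3, a short sanity check of the computed factorization is possible; the driver below is again only a sketch with test nodes of our choosing:

n = 6;
x = (1:n)' + 0.5; y = (1:n)';
C = 1 ./ (x - y.');
[L,D,U] = Cauchy_GSd(x,y);
disp(norm(C - L*D*U, inf) / norm(C, inf))   % O(u) expected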

The fast algorithms 2.3 and 2.5 require $O(n^2)$ locations of memory to store the triangular factors. An algorithm with $O(n)$ storage is described next.

3 Exploiting quasi-Cauchy structure

The concept of displacement structure was initiated by the paper [KKM79], where it was first applied to study Toeplitz matrices, using a displacement operator of the form $\nabla_{\{Z, Z^T\}}(R) = R - Z \cdot R \cdot Z^T$, where $Z$ is the lower shift matrix. In this section we make a connection with [LAK86], where the fact that Toeplitz matrices belong to the more general class of matrices with $\nabla_{\{Z, Z^T\}}$-displacement rank 2 was used to introduce the name quasi-Toeplitz for such matrices. It was shown there that any quasi-Toeplitz matrix $R$ can be represented as a product of three Toeplitz matrices,

$$R = LTU, \qquad (3.1)$$

where $L$ is a lower and $U$ an upper triangular Toeplitz matrix. Patterning ourselves upon the above definition, and taking Lemma 2.1 as a starting point, we shall refer to matrices with $\{D_x, D_y\}$-displacement rank 1 as quasi-Cauchy matrices. The next simple lemma is an analog of (3.1).


Lemma 3.1 Let $D_x$ and $D_y$ be defined by (2.3). Then the unique solution of the equation

$$D_x \cdot R - R \cdot D_y = \begin{bmatrix} g_1 & \cdots & g_n \end{bmatrix}^T \cdot \begin{bmatrix} b_1 & \cdots & b_n \end{bmatrix} \qquad (3.2)$$

is given by

$$R = \operatorname{diag}(g_1, g_2, \ldots, g_n) \cdot C(x_{1:n}; y_{1:n}) \cdot \operatorname{diag}(b_1, b_2, \ldots, b_n). \qquad (3.3)$$

Lemma 3.1 allows us to obtain an explicit factorization formula for Cauchy matrices. Indeed, by Lemma 2.2 the Schur complement $R_2 = R_{22} - \frac{1}{d_1} l_1 u_1$ in

$$C(x_{1:n}; y_{1:n}) = \begin{bmatrix} d_1 & u_1 \\ l_1 & R_{22} \end{bmatrix} = \begin{bmatrix} d_1 & 0 \\ l_1 & I \end{bmatrix} \begin{bmatrix} \frac{1}{d_1} & 0 \\ 0 & R_2 \end{bmatrix} \begin{bmatrix} d_1 & u_1 \\ 0 & I \end{bmatrix}$$

also has displacement rank 1, so by Lemma 3.1 its Cauchy structure can be recovered by stripping off diagonal factors, as shown next.

Lemma 3.2 The Cauchy matrix and its inverse can be factored as

$$C(x_{1:n}; y_{1:n}) = L_1 \cdots L_{n-1}\, D\, U_{n-1} \cdots U_1, \qquad (3.4)$$

where $D = \operatorname{diag}\big((x_1-y_1), \ldots, (x_{n-1}-y_{n-1}), (x_n-y_n)^{-1}\big)$, and

$$L_k = \operatorname{diag}\Big(I_{k-1},\ \tfrac{1}{x_k-y_k},\ \tfrac{1}{x_{k+1}-y_k},\ \ldots,\ \tfrac{1}{x_n-y_k}\Big) \begin{bmatrix} I_{k-1} & & & & \\ & 1 & & & \\ & 1 & 1 & & \\ & \vdots & & \ddots & \\ & 1 & & & 1 \end{bmatrix} \operatorname{diag}\big(I_{k-1},\ 1,\ x_{k+1}-x_k,\ \ldots,\ x_n-x_k\big),$$

$$U_k = \operatorname{diag}\big(I_{k-1},\ 1,\ y_k-y_{k+1},\ \ldots,\ y_k-y_n\big) \begin{bmatrix} I_{k-1} & & & & \\ & 1 & 1 & \cdots & 1 \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} \operatorname{diag}\Big(I_{k-1},\ \tfrac{1}{x_k-y_k},\ \tfrac{1}{x_k-y_{k+1}},\ \ldots,\ \tfrac{1}{x_k-y_n}\Big).$$

The representation for the inverse matrix $C(x_{1:n}; y_{1:n})^{-1}$ obtained from the above leads to the following algorithm for solving $C(x_{1:n}; y_{1:n}) a = f$.

Algorithm 3.3 (quasi-Cauchy)^2 Complexity: $6n^2 + O(n)$ flops.

function a = Cauchy_quasi(x,y,f)
% Solving a Cauchy linear system C(x_{1:n}; y_{1:n}) a = f
n = max(size(x)); a = f;
% Apply L_{n-1}^{-1} ... L_1^{-1}
for k = 1:n-1
    for j = k:n
        a(j) = a(j) * ( x(j) - y(k) );
    end
    for j = k+1:n
        a(j) = a(j) - a(k);
    end
    for j = k+1:n
        a(j) = a(j) / ( x(j) - x(k) );
    end
end
% Apply D^{-1}
for k = 1:n-1
    a(k) = a(k) / ( x(k) - y(k) );
end
a(n) = a(n) * ( x(n) - y(n) );
% Apply U_1^{-1} ... U_{n-1}^{-1}
for k = n-1:-1:1
    for j = k+1:n
        a(j) = a(j) / ( y(k) - y(j) );
    end
    tmp = 0;
    for j = n:-1:k+1
        tmp = tmp + a(j);
    end
    a(k) = a(k) - tmp;
    for j = k:n
        a(j) = a(j) * ( x(k) - y(j) );
    end
end
return

^2 Algorithm 3.3 has lower complexity and better error bounds than its earlier variant, called Cauchy-2 in [BKO94].

The reader should note that the multiplication of the central factor of $U_k^{-1}$ by a vector is performed by accumulating the inner product from the last entry to the first; this order is dictated by the error analysis in the next section.
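A hypothetical driver for Algorithm 3.3, computing the Wilkinson backward error used later in Section 6 (the test nodes and right-hand side are ours):

n = 6;
x = (1:n)' + 0.5; y = (1:n)';
C = 1 ./ (x - y.');
f = ones(n,1);
a = Cauchy_quasi(x,y,f);
disp(norm(f - C*a) / (norm(C) * norm(a)))   % backward error b_i of Section 6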

4 Rounding error analyses

4.1 Stability of the algorithm 2.3

Algorithm 2.3 is a special case of the more general GKO algorithm [GKO95], which is applicable to the wider class of Cauchy-like matrices. A normwise rounding error analysis for the GKO algorithm appeared in [SB95]. Along with the usual factor $\|L\|\,\|U\|$ (cf. (1.15)), the backward error bound of [SB95] also involves a so-called generator growth factor of the form

$$\Big\| \operatorname{diag}\Big( \frac{|g_k^{(k)}|\, |b_k^{(k)}|}{g_k^{(k)}\, b_k^{(k)}} \Big) \Big\|. \qquad (4.1)$$

In the context of [GKO95], [SB95] the quantities $g_k^{(k)}$ and $b_k^{(k)}$ are vectors of size equal to the displacement rank of $R$; so if the quantity in (4.1) is large, then the backward stability of the GKO algorithm could be less favorable than that of Gaussian elimination. However, the ordinary Cauchy matrices considered in the present paper all have displacement rank 1, so the constant in (4.1) is unity, suggesting that the backward stability of Algorithm 2.3 is comparable to that of Gaussian elimination without pivoting.

4.2 Stability of the algorithm 2.5

Theorem 4.1 Assume that Algorithm 2.5 (the GS-direct-Cauchy algorithm) is carried out in floating point arithmetic with unit roundoff $u$, and that no overflows were encountered during the computation. Then the computed factors $\hat L$, $\hat D$, $\hat U$ of $C(x_{1:n}; y_{1:n}) = LDU$ satisfy $\hat L = L + \Delta L$, $\hat D = D + \Delta D$, $\hat U = U + \Delta U$, where

$$|\Delta L| \le \gamma_{4n-2} |L|, \qquad |\Delta D| \le u|D|, \qquad |\Delta U| \le \gamma_{4n-2} |U| \qquad \Big(\gamma_n = \frac{nu}{1-nu}\Big). \qquad (4.2)$$

Furthermore, if this computed triangular factorization is used to solve the associated linear system $C(x_{1:n}; y_{1:n}) a = f$, then the computed solution $\hat a$ solves a nearby system $(C(x_{1:n}; y_{1:n}) + \Delta C)\hat a = f$ with

$$|\Delta C| \le ((10n-2)u + O(u^2))\, |L|\, |DU|, \qquad (4.3)$$

and

$$|C(x_{1:n}; y_{1:n})\hat a - f| \le ((10n-2)u + O(u^2))\, |L|\, |DU|\, |\hat a|. \qquad (4.4)$$

Proof. Error in triangular factorization. A straightforward error analysis for the direct generator recursion (2.10) and for (2.11) implies that the computed and the exact columns of $L$ are related by

$$\begin{bmatrix} \hat l_{kk} \\ \vdots \\ \hat l_{nk} \end{bmatrix} = \mathrm{fl}\left\{ \begin{bmatrix} \frac{1}{x_k-y_k}\, g_k^{(k)} \\ \vdots \\ \frac{1}{x_n-y_k}\, g_n^{(k)} \end{bmatrix} \right\} = \begin{bmatrix} l_{kk}\, \theta_k^{(k)} \\ \vdots \\ l_{nk}\, \theta_n^{(k)} \end{bmatrix},$$

so that the componentwise error is nicely bounded:

$$\frac{(1-u)^{3k-2}}{(1+u)^k} \le \theta_j^{(k)} \le \frac{(1+u)^{3k-2}}{(1-u)^k}.$$

This and similar arguments for $U$ lead to the favorable bounds (4.2).

Error in the computed solution. Standard error analysis (see, e.g., p. 154 in [Hig96]) for solving a linear system $LDUa = f$ by forward and back substitution yields that the computed solution $\hat a$ satisfies

$$(L + \delta L)(D + \delta D)(U + \delta U)\hat a = f,$$

where

$$|\delta L| \le \gamma_n |L|, \qquad |\delta D| \le u|D|, \qquad |\delta U| \le \gamma_n |U|. \qquad (4.5)$$

The pleasant bound in (4.3) is now deduced from (4.5) and (4.2).

4.3 Stability of the algorithm 3.3

Theorem 4.2 Assume that Algorithm 3.3 (the quasi-Cauchy algorithm) is carried out in floating point arithmetic with unit roundoff $u$, and that no overflows were encountered during the computation. Then the computed solution $\hat a$ solves a nearby system

$$(C(x_{1:n}; y_{1:n}) + \Delta C)\hat a = (L + \Delta L)(D + \Delta D)(U + \Delta U)\hat a = f$$

with

$$|C(x_{1:n}; y_{1:n})\hat a - f| \le ((n^2 + 11n - 10)u + O(u^2))\, |L|\, |DU|\, |\hat a|, \qquad (4.6)$$

$$|\Delta C| \le ((n^2 + 11n - 10)u + O(u^2))\, |L|\, |DU|, \qquad (4.7)$$

and

$$|\Delta L| \le \gamma_{7(n-1)} |L|, \qquad |\Delta D| \le \gamma_2 |D|, \qquad |\Delta U| \le \gamma_{n^2+4n-5} |U| \qquad \Big(\gamma_n = \frac{nu}{1-nu}\Big). \qquad (4.8)$$

Proof. Recall that Algorithm 3.3 solves a Cauchy linear system by computing

$$a = C(x_{1:n}; y_{1:n})^{-1} f = U_1^{-1} \cdots U_{n-1}^{-1} \cdot D^{-1} \cdot L_{n-1}^{-1} \cdots L_1^{-1} f, \qquad (4.9)$$

where the $\{L_i, U_i\}$ are given by (3.4). The proof of (4.7) is obtained in the following steps.

(i) First we apply the standard error analysis to each elementary matrix-vector multiplication in (4.9) to show that the computed solution $\hat a$ satisfies

$$\hat a = (U_1^{-1} * \Theta_{U_1}) \cdots (U_{n-1}^{-1} * \Theta_{U_{n-1}}) \cdot (D^{-1} * \Theta_D) \cdot (L_{n-1}^{-1} * \Theta_{L_{n-1}}) \cdots (L_1^{-1} * \Theta_{L_1}) f, \qquad (4.10)$$

where the asterisk $*$ denotes the Hadamard (or componentwise) product, and the matrices $\Theta_{L_k}, \Theta_D, \Theta_{U_k}$ collect the accumulated relative rounding errors.

(ii) Next, the obtained bounds for $\Theta_{L_k}, \Theta_D, \Theta_{U_k}$ are used to deduce further bounds for $\Delta L_k, \Delta D, \Delta U_k$, defined by

$$(L_k^{-1} * \Theta_{L_k})^{-1} = L_k + \Delta L_k, \qquad (D^{-1} * \Theta_D)^{-1} = D + \Delta D, \qquad (U_k^{-1} * \Theta_{U_k})^{-1} = U_k + \Delta U_k.$$

(iii) Finally, inverting (4.10) we shall obtain

$$(L_1 + \Delta L_1) \cdots (L_{n-1} + \Delta L_{n-1}) \cdot (D + \Delta D) \cdot (U_{n-1} + \Delta U_{n-1}) \cdots (U_1 + \Delta U_1)\, \hat a = f, \qquad (4.11)$$

which leads to the desired bounds in (4.8) and in (4.7).


We start with (i) and (ii).

Lower triangular factors. We first obtain bounds for $\Delta L_k$ and $\Delta L$ in (4.10) and (4.11). The sparse nature of these matrices (see Lemma 3.2) implies the following pleasing bound for the $(i,j)$ entry of $\Theta_{L_k}$:

$$\frac{(1-u)^4}{(1+u)} \le (\Theta_{L_k})_{i,j} \le \frac{(1+u)^4}{(1-u)}. \qquad (4.12)$$

Moreover, an even smaller bound holds for the $(k,k)$ entry:

$$(1-u)^2 \le (\Theta_{L_k})_{k,k} \le (1+u)^2. \qquad (4.13)$$

Since the $L_k$ all have exactly the same sparsity pattern as their inverses, (4.12) and (4.13) imply that

$$|\Delta L_k| \le \Big( \frac{(1+u)^4}{(1-u)^3} - 1 \Big) |L_k|. \qquad (4.14)$$

Diagonal factor. The simple structure of $D$ immediately implies that

$$|\Delta D| \le \Big( \frac{1+u}{1-u} - 1 \Big) |D|. \qquad (4.15)$$

Upper triangular factors. The analysis shows that for $i \ne k$ (i.e., excluding the entries of the $k$-th row) we have

$$\frac{(1-u)^3}{(1+u)} \le (\Theta_{U_k})_{i,j} \le \frac{(1+u)^4}{(1-u)}. \qquad (4.16)$$

If the inner product corresponding to the $k$-th row of $U_k$ is evaluated from the last to the $k$-th entry, then the error in the $(k,j)$ entry is bounded by

$$(1-u)^{j-k+3} \le (\Theta_{U_k})_{k,j} \le (1+u)^{j-k+3}. \qquad (4.17)$$

In particular, the error in the $(k,k)$ entry is bounded by

$$(1-u)^3 \le (\Theta_{U_k})_{k,k} \le (1+u)^3. \qquad (4.18)$$

Again, since the $U_k$ have exactly the same sparsity pattern as their inverses, (4.16), (4.17) and (4.18) imply that

$$|\Delta U_k| \le \Big( \frac{(1+u)^{n+2}}{(1-u)^3} - 1 \Big) |U_k|. \qquad (4.19)$$

We are now ready to turn to (iii). To prove (4.8) we shall use the following easily verified fact (see, e.g., [Hig96], p. 80): if $|\Delta X_k| \le \delta |X_k|$ for $k = 1, 2, \ldots, m$, then

$$\Big| \prod_{k=1}^m (X_k + \Delta X_k) - \prod_{k=1}^m X_k \Big| \le \big( (1+\delta)^m - 1 \big) \prod_{k=1}^m |X_k|. \qquad (4.20)$$

This and (4.14) imply that

$$|\Delta L| = |(L_1 + \Delta L_1) \cdots (L_{n-1} + \Delta L_{n-1}) - L_1 \cdots L_{n-1}| \le \Big( \frac{(1+u)^{4(n-1)}}{(1-u)^{3(n-1)}} - 1 \Big) |L_1| \cdots |L_{n-1}|. \qquad (4.21)$$

The sparsity pattern of the $L_k$ ($k = 1, 2, \ldots, n-1$) allows us to remove the moduli in the product on the right-hand side of (4.21), implying the first bound in (4.8). The third bound in (4.8) is deduced from (4.19) and (4.20) analogously. The second bound in (4.8) follows easily from (4.15). Finally, the bound in (4.7) is deduced from (4.8) (cf. (4.5)).

5 Pivoting

In the previous section we established that the backward stability of all the fast Cauchy solvers suggested in the present paper is related to that of Gaussian elimination. This analogy allows us to carry over the stabilizing techniques of Gaussian elimination to the new Cauchy solvers. First, however, we identify the case when no pivoting is necessary.


5.1 Totally positive matrices

Assume that

$$y_n < \cdots < y_1 < x_1 < \cdots < x_n, \qquad (5.1)$$

i.e., that the matrix $C(x_{1:n}; y_{1:n})$ is totally positive, so that all the entries of the exact factors $L$ and $U$ are positive [GK50]. In this case Theorems 4.1 and 4.2 imply that Algorithms 2.5 and 3.3 produce a favorably small backward error.

Corollary 5.1 Assume that condition (5.1) holds, i.e., that $C(x_{1:n}; y_{1:n})$ is totally positive, and assume that Algorithm 2.5 (the GS-direct algorithm) or Algorithm 3.3 (the quasi-Cauchy algorithm) is performed in floating point arithmetic with unit roundoff $u$, with no overflows encountered during the computation.

If the triangular factorization of the GS-direct algorithm is used to solve the associated linear system, then the computed solution $\hat a$ solves a nearby system

$$(C(x_{1:n}; y_{1:n}) + \Delta C)\hat a = f,$$

with

$$|\Delta C| \le ((10n-3)u + O(u^2))\, C(x_{1:n}; y_{1:n}). \qquad (5.2)$$

The analogous backward bound for the quasi-Cauchy algorithm is

$$|\Delta C| \le ((n^2 + 11n - 10)u + O(u^2))\, C(x_{1:n}; y_{1:n}). \qquad (5.3)$$

The above results show that the backward stability of the fast algorithms 2.5 and 3.3 for totally positive Cauchy matrices is even more favorable than that of the slow Gaussian elimination procedure; see (1.16). Indeed, the difference is that the bound (1.16) is valid only when the entries of the computed factors $\hat L$ and $\hat U$ remain positive (which is usually not the case with ill-conditioned matrices), whereas the favorable bounds in the above corollary hold as long as there are no overflows. For example, for the Hilbert matrix $H = \big[ \frac{1}{i+j-1} \big]$ the condition number $\kappa_2(H)$ grows exponentially with the size, so already for small $n$ we have $\kappa_2(H) > q(n)\frac{1}{u}$, where $q(n)$ is a polynomial of small degree in $n$. Then, in accordance with [W68], the matrix $H$ will likely lose during the elimination not only its total positivity, but also the weaker property of being positive definite. Correspondingly, the single precision LAPACK routine SPOSV for Cholesky factorization, when applied to the Hilbert matrix, exits with an error flag already for $n = 9$, warning that the entries of $\hat L$, $\hat U$ became negative, so the pleasing backward bound (1.16) is no longer valid for Gaussian elimination. In contrast, the favorable bounds (5.2), (5.3) remain valid for larger sizes, as long as there are no overflows.

5.2 General case. Predictive pivoting techniques

Here we assume that the two sets of nodes $\{x_k\}$ and $\{y_k\}$ are not separated from each other. The similarity of the backward bounds (1.15) for Gaussian elimination and (4.2), (4.7) for the new Cauchy solvers suggests using the same pivoting techniques to prevent instability. More precisely, any row or column reordering that reduces the size of $|L||U|$ appearing in the bounds (4.2), (4.7) will stabilize the numerical performance of Algorithms 2.5 and 3.3. Moreover, the normwise error analysis of [SB95] for Algorithm 2.3, reviewed at the beginning of Section 4, also indicates that pivoting will enhance the accuracy of Algorithm 2.3.

Here we should note that the partial pivoting technique can be directly incorporated into the generalized Schur algorithms 2.3 and 2.5; see, e.g., [GKO95]. However, the corresponding ordering of $\{x_k\}$ can also be computed in advance in $O(n^2)$ flops. Indeed, the partial pivoting technique determines a permutation matrix $P$ such that at each elimination step the pivot elements $d_k$ in

$$PR = L \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} U$$

are as large as possible. Clearly, the determinant of the $k \times k$ leading submatrix of $R$ equals $d_1 \cdots d_k$, so the objective of partial pivoting is the successive maximization of the determinants of the leading submatrices. This observation, and the well-known formula [C41] for the determinant of a Cauchy matrix, imply that partial pivoting on $C(x_{1:n}; y_{1:n})$ is equivalent to the successive maximization of the quantities

$$|d_i| = \left| \frac{\prod_{j=1}^{i-1}(x_i - x_j)\, \prod_{j=1}^{i-1}(y_i - y_j)}{(x_i - y_i)\, \prod_{j=1}^{i-1}(x_i - y_j)\, \prod_{j=1}^{i-1}(x_j - y_i)} \right|, \qquad i = 1, \ldots, n. \qquad (5.4)$$

We shall call this procedure predictive partial pivoting (PPP), because it can be rapidly computed in advance by the following algorithm.

Algorithm 5.2 (Predictive Partial Pivoting)^3 Complexity: $2n^2 - 4$ flops.

function x = partial(x,y)
% Reorders the nodes x(1:n) according to predictive partial pivoting (5.4)
n = max(size(x));
dist = 0; m = 1; aux = zeros(1,n);
for i = 1:n
    aux(i) = abs( 1 / ( x(i) - y(1) ) );
    if dist < aux(i), m = i; dist = aux(i); end
end
x = swap(x,1,m); aux(m) = aux(1);
if n <= 2, return; end
for i = 2:(n-1)
    dist = 0; m = i;
    for j = i:n
        aux(j) = aux(j) * abs( ( x(j) - x(i-1) ) / ( x(j) - y(i) ) );
        if dist < aux(j), m = j; dist = aux(j); end
    end
    x = swap(x,i,m); aux(m) = aux(i);
end
return
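The subroutine swap is left unspecified apart from footnote 3 below; a one-line MATLAB sketch of it is

function x = swap(x,i,m)
% Swap the i-th and m-th entries of the vector x (cf. footnote 3)
tmp = x(i); x(i) = x(m); x(m) = tmp;
return

After the call x = partial(x,y), the reordered nodes can be passed directly to any of the solvers of Sections 2 and 3.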

A similar row reordering technique for Vandermonde matrices (and a fast $O(n^2)$ algorithm for achieving it) was proposed in [Hig90], and in [R90] it was called Leja ordering. Therefore, PPP may also be called rational Leja ordering, by analogy with the (polynomial) Leja ordering of [Hig90], [R90].

In a recent paper [Gu95] a variation of complete pivoting was suggested for the more general Cauchy-like matrices. In the context of [Gu95] the corresponding displacement rank is 2 or higher. For the ordinary Cauchy matrices $C(x_{1:n}; y_{1:n})$ (displacement rank 1), Gu's pivoting can be described as follows. At each elimination step one chooses the column of $R_k$ whose generator entry $b_m^{(k)}$ in $B_k$ has maximal magnitude (here we use the notation of Lemma 2.4). One then interchanges this column with the first one and performs the partial pivoting step. The explicit expression (2.10) for the entries of the successive generators $B_k$ readily suggests a modification of Algorithm 5.2 that performs Gu's variant of pivoting in advance, leading to what can be called predictive Gu pivoting.

6 Numerical illustrations

We performed numerous numerical tests for the three algorithms suggested and analyzed in this paper; the results confirm the theoretical ones (as perhaps should be expected). In this section we illustrate with just a few examples the influence of different orderings on the numerical performance of the following algorithms:

(a) GS-Cauchy. (Fast $O(n^2)$ algorithm 2.3, requiring $O(n^2)$ storage.)

(b) GS-direct-Cauchy. (Fast $O(n^2)$ algorithm 2.5, requiring $O(n^2)$ storage.)

(c) quasi-Cauchy. (Fast $O(n^2)$ algorithm 3.3, requiring $O(n)$ storage.)

(d) BKO. (Fast $O(n^2)$ algorithm of [BKO99], requiring $O(n)$ storage.)

^3 The subroutine swap(x,i,m) in Algorithm 5.2 swaps the i-th and m-th elements of the vector x.


(e) INV. The use of the explicit inversion formula

$$C(x_{1:n}; y_{1:n})^{-1} = \left[ \frac{\prod_{k=1}^n (x_k - y_i)}{\prod_{\substack{k=1 \\ k\ne i}}^n (y_k - y_i)} \cdot \frac{1}{x_j - y_i} \cdot \frac{\prod_{k=1}^n (x_j - y_k)}{\prod_{\substack{k=1 \\ k\ne j}}^n (x_j - x_k)} \right]_{1 \le i,j \le n}. \qquad (6.1)$$

(Fast $O(n^2)$ algorithm, requiring $O(n)$ storage.)

(f) GEPP. Gaussian elimination with partial pivoting. (Slow $O(n^3)$ algorithm, requiring $O(n^2)$ storage.)

We refer to [BKO99] for the discussion and computed examples related to the important case of totally positive Cauchy matrices, and restrict ourselves here to the generic case in which the two sets $\{x_k\}$ and $\{y_k\}$ cannot be separated from each other, so that they cannot be reordered to achieve (1.8). We solved various Cauchy linear systems

$$Ca = f \qquad (6.2)$$

(including interlaced $x_1 < y_1 < x_2 < y_2 < \cdots < x_n < y_n$, equidistant, clustered, or randomly distributed nodes, among many other configurations) with different right-hand sides (RHS) $f$. We also solved the so-called Cauchy-Toeplitz linear systems with coefficient matrices of the form

$$C = \left[ \frac{1}{a + (i-j)b} \right] \qquad (6.3)$$

with different choices of the parameters $a$ and $b$. All the experiments were performed on a DEC 5000/133 RISC workstation in single precision (unit roundoff $\approx 1.19 \cdot 10^{-7}$). For GEPP we used the LAPACK routine SGESV, and all the other algorithms were implemented in C. In order to check the accuracy, we also implemented all the above algorithms in double precision (unit roundoff $\approx 2.22 \cdot 10^{-16}$), and in each example we determined the two algorithms providing solutions closest to each other. In all cases these two solutions agreed in more than the 8 significant digits needed to check the accuracy of a solution obtained in single precision, so we regarded one of these double precision solutions $a_d$ as exact, and used it to compute the 2-norm relative error

$$e_i = \frac{\|a_i - a_d\|_2}{\|a_d\|_2}$$

for the solutions $a_i$ computed by each of the above algorithms. In addition we computed the residual errors

$$r_i = \frac{\|f - C a_i\|_2}{\|f\|_2}$$

and the backward errors $b_i = \min\big\{ \frac{\|\Delta A\|_2}{\|A\|_2} : (A + \Delta A)a_i = f \big\}$, using the formula

$$b_i = \frac{\|f - C a_i\|_2}{\|C\|_2\, \|a_i\|_2},$$

a result probably first shown by Wilkinson; see, e.g., [Hig96]. The tables below also display the condition number $\kappa_2(C)$ of the coefficient matrix, the norms of the solution $\|a_d\|_2$ and of the right-hand side $\|f\|_2$, and some other useful information.
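For reference, the three error measures are straightforward to evaluate; a minimal MATLAB sketch (the function name is ours; C, f, a_i, a_d are as defined above):

function [e, r, b] = err_measures(C, f, ai, ad)
% 2-norm forward, residual, and backward errors used in Section 6
e = norm(ai - ad) / norm(ad);                 % forward error e_i
r = norm(f - C*ai) / norm(f);                 % residual error r_i
b = norm(f - C*ai) / (norm(C) * norm(ai));    % backward error b_i (Wilkinson)
return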

6.1 Example 1. Well-conditioned Cauchy-Toeplitz matrices

In this example we solved the linear system (6.2) with the Cauchy-Toeplitz coefficient matrix in (6.3), with $a = 1$, $b = 2$, and with the right-hand side $f = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T$. We used two orderings:

- the nodes ordered by the predictive partial pivoting (PPP) technique of Section 5.2;

- the nodes $\{x_k\}$ sorted in increasing order and the nodes $\{y_k\}$ sorted in decreasing order ("monotonic ordering"); the difference with (1.8) is that here the two sets of nodes $\{x_k\}$, $\{y_k\}$ are not separated from each other.

Forward error.

Table 1. Forward error. Partial pivoting ordering.

  n    kappa_2(C)  ||C||_2  ||a_d||_2  ||f||_2  e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)  e6 (GEPP)
  10   2e+00       1.6e+00  6.1e+00    4.5e+00  7e-08     1e-07     1e-07              6e-08           6e-08           8e-08
  50   3e+00       1.6e+00  1.6e+01    1.0e+01  1e-07     2e-01     9e-08              2e-07           4e-07           1e-07
  100  3e+00       1.6e+00  2.4e+01    1.4e+01  4e-07     1e+10     1e-07              1e-07           5e-07           1e-07

Table 2. Forward error. Monotonic ordering.

  n    kappa_2(C)  ||C||_2  ||a_d||_2  ||f||_2  e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)
  5    1e+00       1.6e+00  4.0e+00    3.2e+00  8e-08     2e-07     3e-06              1e-06           2e-06
  10   2e+00       1.6e+00  6.1e+00    4.5e+00  1e-07     2e-05     1e-03              7e-03           6e-03
  20   3e+00       1.6e+00  9.3e+00    6.3e+00  1e-07     6e-02     9e+04              3e+04           5e+04
  30   3e+00       1.6e+00  1.2e+01    7.7e+00  1e-07     7e+03     3e+16              6e+17           2e+16
  50   3e+00       1.6e+00  1.6e+01    1.0e+01  2e-07     1e+20     NaN                NaN             NaN
  60   3e+00       1.6e+00  1.8e+01    1.1e+01  NaN       4e+26     NaN                NaN             NaN

Backward error.

Table 3. Backward error. Partial pivoting ordering.

  n    max_ij(|L||DU|)  max_ij(C)  b1 (INV)  b2 (BKO)  b3 (quasi-Cauchy)  b4 (GS-Cauchy)  b5 (GS-direct)  b6 (GEPP)
  10   2e+00            1.0e+00    6e-08     1e-07     6e-08              6e-08           4e-08           5e-08
  50   2e+00            1.0e+00    1e-07     2e-01     8e-08              1e-07           2e-07           1e-07
  100  2e+00            1.0e+00    3e-07     1e+00     1e-07              1e-07           2e-07           1e-07

Table 4. Backward error. Monotonic ordering.

  n    max_ij(|L||DU|)  max_ij(C)  b1 (INV)  b2 (BKO)  b3 (quasi-Cauchy)  b4 (GS-Cauchy)  b5 (GS-direct)
  5    6e+02            1.0e+00    6e-08     2e-07     3e-06              1e-06           2e-06
  10   3e+06            1.0e+00    7e-08     2e-05     1e-03              7e-03           6e-03
  20   1e+14            1.0e+00    1e-07     6e-02     1e+00              1e+00           1e+00
  30   4e+21            1.0e+00    1e-07     1e+00     1e+00              1e+00           1e+00
  50   6e+36            1.0e+00    2e-07     1e+00     NaN                NaN             NaN
  60   2e+44            1.0e+00    NaN       1e+00     NaN                NaN             NaN

Residual error.

Table 5. Residual error. Partial pivoting ordering.

  n    max_i(|L||DU||a_d|)  max_ij(C)  r1 (INV)  r2 (BKO)  r3 (quasi-Cauchy)  r4 (GS-Cauchy)  r5 (GS-direct)  r6 (GEPP)
  10   9e+00                1e+00      1e-07     3e-07     1e-07              1e-07           9e-08           1e-07
  50   2e+01                1e+00      4e-07     5e-01     2e-07              3e-07           4e-07           3e-07
  100  3e+01                1e+00      8e-07     3e+10     3e-07              3e-07           5e-07           3e-07

Table 6. Residual error. Monotonic ordering.

  n    max_i(|L||DU||a_d|)  max_ij(C)  r1 (INV)  r2 (BKO)  r3 (quasi-Cauchy)  r4 (GS-Cauchy)  r5 (GS-direct)
  5    9e+02                1e+00      1e-07     3e-07     6e-06              3e-06           4e-06
  10   1e+07                1e+00      2e-07     4e-05     3e-03              2e-02           1e-02
  20   7e+14                1e+00      3e-07     1e-01     2e+05              8e+04           1e+05
  30   3e+22                1e+00      3e-07     2e+04     7e+16              1e+18           4e+16
  50   7e+37                1e+00      4e-07     3e+20     NaN                NaN             NaN
  60   3e+45                1e+00      NaN       9e+26     NaN                NaN             NaN

Comparing the data in Tables 1-6 indicates that the ordering of the nodes has a profound influence on the accuracy of all the algorithms designed in the present paper. Specifically, recall that the quantity $|L|\,|DU|$ appears in the backward error bounds (4.3) and (4.7) for the GS-direct-Cauchy and quasi-Cauchy algorithms, respectively. The second columns of Tables 3 and 4 show that this quantity is huge with monotonic ordering and moderate with PPP ordering. Correspondingly, the backward errors shown in the tables are large with monotonic ordering and pleasantly small with PPP ordering.

Analogously, a comparison of the data in the second columns of Tables 5 and 6 shows that the PPP technique reduces the quantity $|L|\,|DU|\,|a|$ appearing in the residual bounds for the GS-direct-Cauchy and quasi-Cauchy algorithms, resulting in favorably small residual errors for these algorithms.

Further, it is well known that

$$\frac{\|a - \hat a\|}{\|a\|} \le \kappa_2(C)\, \frac{\|\Delta C\|}{\|C\|}.$$

Since the coefficient matrix $C$ in this example is quite well-conditioned (see the data in the second column of Table 1), the PPP technique yields pleasant forward accuracy for the GS-Cauchy, GS-direct-Cauchy and quasi-Cauchy algorithms.

The PPP technique also improves the numerical performance of the BKO (BP-type) algorithm; for this algorithm, however, the results are not as favorable as for the other algorithms (see the introduction for an explanation of this phenomenon, and [BKO99] for a discussion of the extremely high accuracy of this algorithm for totally positive Cauchy matrices).

The use of the explicit inversion formula also yields high accuracy, as predicted by the analysis of [GK93], and apparently this is the only algorithm whose accuracy does not depend upon the ordering of the nodes. At the same time, a comparison of the data in Tables 1 and 2, as well as other examples, indicates that the use of the PPP technique prevents the INV algorithm from overflowing, allowing one to solve larger linear systems.

Since in this example the coefficient matrix is well-conditioned, the $O(n^3)$ GEPP algorithm, while slow, also provides good forward and backward accuracy.

The results of many other computed examples are quite similar to those in Tables 1-6: the GS-Cauchy, GS-direct-Cauchy and quasi-Cauchy algorithms always yield favorably small backward and residual errors, which are often better than those of GEPP and of the use of the inversion formula. As for forward stability, there seems to be no clear winner; the use of the inversion formula often provides a smaller forward error, especially when using unit vectors for the right-hand side (which means that one has to find a column of $C(x_{1:n}; y_{1:n})^{-1}$). At the same time, the new algorithms can provide smaller forward errors in other cases, as illustrated by the next examples.

6.2 Example 2. Ill-conditioned Cauchy-Toeplitz matrices.

The condition number of Cauchy-Toeplitz matrices depends upon the choice of the parameters $a$ and $b$ in (6.3). In this example we chose $a = 1$, $b = -0.3$ to obtain a Cauchy-Toeplitz matrix whose condition number is several orders of magnitude bigger than the reciprocal of the machine precision $u \approx 1.19 \cdot 10^{-7}$. Tables 7-12 present the data on the forward and backward errors for the corresponding linear system with the RHS $f = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T$. Along with the PPP ordering we also consider the original ordering (no pivoting) and the Gu pivoting ordering of $\{x_k\}$, $\{y_k\}$.

Forward error.

Table 7. Forward error. Partial pivoting ordering.

  n    kappa_2(C)  ||C||_2  ||a_d||_2  ||f||_2  e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)  e6 (GEPP)
  60   5e+10       1.2e+01  2.7e+05    7.7e+00  5e-03     4e+23     3e-03              2e-06           2e-04           1e+00
  80   3e+11       1.2e+01  7.1e+05    8.9e+00  2e-02     NaN       5e-03              1e-05           3e-04           1e+00
  100  9e+11       1.2e+01  1.5e+06    1.0e+01  2e-02     NaN       8e-03              3e-06           6e-04           1e+00

Table 8. Forward error. Gu's pivoting.

  n    e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)
  60   2e-03     7e+01     8e-04              1e-04           3e-04
  80   6e-03     4e+05     4e-03              3e-04           2e-03
  100  1e-02     2e+09     5e-03              5e-04           4e-03

Table 9. Forward error. No pivoting.

  n    e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)
  60   5e-03     1e+02     9e-04              2e-04           7e-05
  80   3e-02     2e+05     1e-03              5e-04           1e-04
  100  2e-02     1e+09     7e-04              6e-04           2e-04

Backward error.

Table 10. Backward error. Partial pivoting ordering.

  n    max_ij(|L||DU|)  max_ij(C)  b1 (INV)  b2 (BKO)  b3 (quasi-Cauchy)  b4 (GS-Cauchy)  b5 (GS-direct)  b6 (GEPP)
  60   1e+03            1.0e+01    4e-04     1e+00     2e-07              3e-07           3e-07           4e-07
  80   1e+03            1.0e+01    1e-03     NaN       4e-07              4e-07           4e-07           7e-07
  100  2e+03            1.0e+01    6e-03     NaN       6e-07              6e-07           7e-07           1e-06

Table 11. Backward error. Gu's pivoting.

  n    max_ij(|L||DU|)  b1 (INV)  b2 (BKO)  b3 (quasi-Cauchy)  b4 (GS-Cauchy)  b5 (GS-direct)
  60   1e+04            2e-04     1e+00     9e-05              1e-04           5e-05
  80   3e+04            2e-03     1e+00     2e-04              3e-04           1e-04
  100  4e+04            2e-03     1e+00     5e-04              5e-04           2e-04

Table 12. Backward error. No pivoting.

  n    max_ij(|L||DU|)  b1 (INV)  b2 (BKO)  b3 (quasi-Cauchy)  b4 (GS-Cauchy)  b5 (GS-direct)
  60   1e+04            1e-03     1e+00     1e-04              2e-04           7e-05
  80   3e+04            4e-03     1e+00     4e-04              5e-04           1e-04
  100  5e+04            2e-03     1e+00     4e-04              6e-04           2e-04

Again, comparison of the data in Tables 10-12 confirms the analytical results of Sections 4 and 5, indicating that an appropriate pivoting technique can reduce the size of the backward errors of the new algorithms, making them as favorable as those of GEPP. The coefficient matrix in Examples 2 and 3 is quite ill-conditioned, so the forward accuracy of GEPP is less favorable. However, the GS-Cauchy, GS-direct-Cauchy and quasi-Cauchy algorithms combined with partial pivoting provide smaller forward errors than GEPP (and than the use of the inversion formula), showing that the use of the structure often allows us not only to speed up the computation, but also to achieve more accuracy, as compared to general structure-ignoring methods.

It may seem quite unexpected that for Cauchy-Toeplitz matrices Gu's pivoting technique (combining row and column permutations) can lead to less accurate solutions than the PPP technique (based on row permutations only). To understand this occurrence it is useful to observe that the entries of the diagonals of Cauchy-Toeplitz matrices $\big[ \frac{1}{a+b(i-j)} \big]$ depend hyperbolically on the difference $(i-j)$, producing a peak on the diagonal with $(i-j) \approx -a/b$. We next display the MATLAB graphs of several permuted versions of the matrix in Example 2 for $n = 10$.

[Three mesh plots of the permuted $10 \times 10$ matrix, with panel titles "No pivoting", "Partial pivoting", and "Gu's pivoting", are omitted here.]

Figure 1: Permuted versions of a Cauchy-Toeplitz matrix, corresponding to different pivoting techniques

One sees that in the original matrix $C(x_{1:10}; y_{1:10})$ the maximal magnitude entries (all equal to 10) occupy the 4-th subdiagonal (i.e., they lie in the lower triangular part of the matrix). Applying the partial pivoting technique moves each of the rows 4-10 three positions up, so that the maximal magnitude entries become located on the main diagonal. In the next table we list the condition numbers of the $k \times k$ leading submatrices corresponding to the three pivoting techniques.

Table 13. Conditioning of leading submatrices.

  k    No pivoting  Gu's pivoting  Partial pivoting
  1    1.0          1.0            1.0
  2    4.6e+01      9.8            1.2
  3    7.2e+02      2.0e+01        1.3
  4    1.0e+04      3.2e+01        1.3
  5    3.8e+04      2.6e+01        1.3
  6    1.0e+05      2.3e+01        1.3
  7    2.5e+05      2.3e+01        1.4
  8    5.5e+05      1.0e+02        1.3e+02
  9    1.0e+06      6.8e+03        1.2e+04
  10   1.9e+06      1.9e+06        1.9e+06

We note, however, that the motivation for introducing Gu's pivoting technique was given in [Gu95], where an application of [GKO95] with displacement rank 2 or higher was discussed.

6.3 Example 3. A transposed system

An immediate question, however, is what happens for the transpose of the matrix in Example 2 (clearly, the transpose of a Cauchy matrix is again a Cauchy matrix). Therefore we consider here a Cauchy-Toeplitz matrix with the parameters $a = 1$ and $b = 0.3$. For such a matrix the maximal magnitude entries are located above the main diagonal, and it is therefore reasonable to apply a partial column pivoting technique. As in the above example, we display the permuted versions of the matrix, corresponding to different pivoting techniques.

[Four mesh plots of the permuted $10 \times 10$ matrix, with panel titles "No pivoting", "Partial pivoting", "Gu's pivoting", and "Partial column pivoting", are omitted here.]

Figure 2: Permuted versions of a Cauchy-Toeplitz matrix, corresponding to different pivoting techniques

In Table 14 we list the corresponding condition numbers for all successive leading submatrices.

Table 14. Conditioning of leading submatrices.

  k    No pivoting  Gu's pivoting  Partial column pivoting
  1    1.0          1.0            1.0
  2    4.6e+01      2.0e+01        1.2
  3    7.2e+02      3.6e+01        1.3
  4    1.0e+04      2.9e+01        1.3
  5    3.8e+04      6.3e+01        1.3
  6    1.0e+05      5.4e+01        1.3
  7    2.5e+05      9.2e+01        1.4
  8    5.5e+05      1.1e+02        1.3e+02
  9    1.0e+06      1.2e+04        1.2e+04
  10   1.9e+06      1.9e+06        1.9e+06

We now turn to the numerical results comparing the performance of the algorithms designed in the present paper. We again used the vector $f = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T$ for the right-hand side.

Forward error.

Table 15. Forward error. Partial pivoting ordering.

  n    kappa_2(C)  ||C||_2  ||a_d||_2  ||f||_2  e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)  e6 (GEPP)
  40   5e+09       1.2e+01  7.1e+04    6.3e+00  2e-03     3e-05     3e-05              5e-06           5e-05           2e+00
  60   5e+10       1.2e+01  2.7e+05    7.7e+00  4e-03     1e-04     4e-04              7e-06           6e-04           1e+00
  80   3e+11       1.2e+01  7.1e+05    8.9e+00  9e-03     6e+02     3e-03              6e-06           3e-04           1e+00
  100  9e+11       1.2e+01  1.5e+06    1.0e+01  3e-02     3e+04     2e-03              6e-06           2e-03           1e+00

Table 16. Forward error. Partial column pivoting.

  n    kappa_2(C)  ||C||_2  ||a_d||_2  ||f||_2  e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)
  40   5e+09       1.2e+01  7.1e+04    6.3e+00  2e-03     3e+14     2e-03              2e-06           2e-04
  60   5e+10       1.2e+01  2.7e+05    7.7e+00  5e-03     1e+28     3e-03              2e-06           2e-04
  80   3e+11       1.2e+01  7.1e+05    8.9e+00  6e-03     NaN       9e-03              8e-06           2e-03
  100  9e+11       1.2e+01  1.5e+06    1.0e+01  1e-02     NaN       3e-03              8e-06           6e-04

Table 17. Forward error. Gu's pivoting.

  n    kappa_2(C)  ||C||_2  ||a_d||_2  ||f||_2  e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)
  40   3e+08       2.0e+01  3.3e+03    6.3e+00  1e-03     9e-07     3e-04              9e-06           5e-04
  60   3e+08       1.2e+01  1.1e+03    7.7e+00  8e-03     3e-04     1e-03              1e-04           2e-04
  80   2e+09       1.2e+01  9.9e+02    8.9e+00  4e-02     7e+02     8e-03              2e-05           8e-03
  100  4e+11       1.2e+01  6.6e+05    1.0e+01  3e-02     8e+03     2e-03              6e-05           5e-03

Table 18. Forward error. No pivoting.

  n    kappa_2(C)  ||C||_2  ||a_d||_2  ||f||_2  e1 (INV)  e2 (BKO)  e3 (quasi-Cauchy)  e4 (GS-Cauchy)  e5 (GS-direct)
  40   5e+09       1.2e+01  7.1e+04    6.3e+00  4e-04     1e-07     7e-04              2e-05           3e-04
  60   5e+10       1.2e+01  2.7e+05    7.7e+00  9e-03     1e-04     6e-04              8e-05           1e-03
  80   3e+11       1.2e+01  7.1e+05    8.9e+00  1e-02     2e+01     4e-03              1e-04           6e-05
  100  9e+11       1.2e+01  1.5e+06    1.0e+01  2e-02     3e+03     1e-02              4e-04           2e-03

In this example the forward accuracy of the GS-Cauchy algorithm is better than that of the GS-direct-Cauchy and quasi-Cauchy algorithms. There are many other examples, however, where these algorithms have roughly the same accuracy.

Backward error. It turns out that for many different orderings all the algorithms designed in this paper exhibit a favorable backward stability. Moreover, for n varying from 5 to 100, for partial row pivoting, partial column pivoting, Gu's pivoting, and no pivoting, the algorithms GS-Cauchy, GS-direct-Cauchy and quasi-Cauchy produced backward errors of the order of 10^{-8}, which is comparable to that of GEPP. We found, however, that the monotonic ordering defined in Sec. 6.1 and a randomized ordering produce poor results.

This indicates that the analytical error bounds obtained for the fast algorithms of this paper may in fact lead to a wide variety of different pivoting techniques, each aimed at reducing the quantity |L| · |U|.
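One way to compare orderings numerically is to evaluate || |L||U| || for an elimination run without any further pivoting on the pre-permuted matrix. The sketch below is a dense O(n^3) diagnostic of that kind, written by us under this reading of the bound; it is not one of the fast solvers of the paper, and the Cauchy-Toeplitz form is again our assumption:

```python
import numpy as np
from scipy.linalg import lu

def abs_LU_norm(A):
    """|| |L||U| ||_inf for Gaussian elimination without pivoting on A.
    The ordering under study must already be applied to A; all leading
    minors are assumed nonzero."""
    n = A.shape[0]
    L, U = np.eye(n), np.array(A, dtype=float)
    for k in range(n - 1):
        L[k+1:, k] = U[k+1:, k] / U[k, k]
        U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])
    return np.linalg.norm(np.abs(L) @ np.abs(np.triu(U)), np.inf)

i, j = np.indices((10, 10))
C = 1.0 / (1.0 + (i - j) * 0.3)   # assumed Cauchy-Toeplitz form
P, _, _ = lu(C)                    # GEPP row ordering from a dense LU
print(abs_LU_norm(C))              # no pivoting
print(abs_LU_norm(P.T @ C))        # partial (row) pivoting ordering
```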

7 Transformation of a Polynomial-Vandermonde Matrix into a Cauchy Matrix

In this section we shall show that all the fast Cauchy solvers suggested in the present paper can be used to solve linear systems with polynomial Vandermonde matrices,

$$
V_{\mathcal{P}}(x_{1:n}) \stackrel{\mathrm{def}}{=}
\begin{bmatrix}
P_0(x_1) & P_1(x_1) & \cdots & P_{n-1}(x_1) \\
P_0(x_2) & P_1(x_2) & \cdots & P_{n-1}(x_2) \\
\vdots   & \vdots   &        & \vdots       \\
P_0(x_n) & P_1(x_n) & \cdots & P_{n-1}(x_n)
\end{bmatrix},
\qquad (7.1)
$$

where P = {P_0(x), ..., P_{n-1}(x)} denotes a basis in the linear space C_{n-1}[x] of all complex polynomials whose degree does not exceed n - 1. When P is the power basis, V_P(x_{1:n}) is the ordinary Vandermonde matrix. If P stands for the basis of Chebyshev polynomials (of the first or of the second kind), then V_P(x_{1:n}) is called a Chebyshev-Vandermonde matrix. Fast O(n^2) algorithms for solving Chebyshev-Vandermonde systems were suggested in [Hig88], [HHR89], [CR93], [GO94d]. Here we suggest an alternative, based on the next result.

Proposition 7.1 Let {x_1, ..., x_n, y_1, ..., y_n} be 2n pairwise distinct complex numbers, and let $u(x) = \prod_{i=1}^{n}(x - y_i)$. Then the following formula is valid:

$$
V_{\mathcal{P}}(x_{1:n})
= \operatorname{diag}\{u(x_1), \ldots, u(x_n)\}\;
C(x_{1:n}, y_{1:n})\;
\operatorname{diag}\Bigl\{\frac{1}{u'(y_1)}, \ldots, \frac{1}{u'(y_n)}\Bigr\}\;
V_{\mathcal{P}}(y_{1:n}).
\qquad (7.2)
$$

We shall prove the above proposition at the end of this section.

Observe that formula (7.2) relates V_P(x_{1:n}) and V_P(y_{1:n}); in other words, it allows us to change the nodes from {x_k} to {y_k} while keeping the polynomial basis P = {P_0, ..., P_{n-1}}. Suitable choices of the new points {y_{1:n}} can ensure that V_P(y_{1:n}) has low complexity. In such cases Proposition 7.1 allows us to reduce the problem of solving a linear system with V_P(x_{1:n}) to the analogous problem of solving a linear system with the Cauchy matrix C(x_{1:n}, y_{1:n}).
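As an illustration (our own numerical check, not an algorithm from the paper), formula (7.2) is easy to verify in the power basis with the y-nodes placed at the n-th roots of unity: there u(t) = t^n - 1 gives closed forms for u(x_i) and u'(y_j), and V_P(y_{1:n}) becomes the DFT matrix.

```python
import numpy as np

n = 8
x = np.linspace(2.0, 3.0, n)                # distinct nodes, away from the y's
y = np.exp(2j * np.pi * np.arange(n) / n)   # roots of unity, so u(t) = t**n - 1
C = 1.0 / (x[:, None] - y[None, :])         # Cauchy matrix C(x_{1:n}, y_{1:n})

u_x  = x**n - 1.0                           # u(x_i)
du_y = n * y**(n - 1)                       # u'(y_j)
V_y  = np.vander(y, increasing=True)        # V_P(y_{1:n}): here the DFT matrix
V_x  = np.vander(x, increasing=True)        # V_P(x_{1:n}), left side of (7.2)

rhs = (u_x[:, None] * C * (1.0 / du_y)[None, :]) @ V_y
print(np.max(np.abs(V_x - rhs)) / np.max(np.abs(V_x)))   # tiny: (7.2) holds
```

With this choice, solving V_P(x_{1:n}) a = f amounts to scaling f, solving one Cauchy system with C(x_{1:n}, y_{1:n}) by the algorithms of the preceding sections, and applying an inverse FFT.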

In the next proposition we specify several sets of points {y_{1:n}} for which ordinary Vandermonde and Chebyshev-Vandermonde matrices have low complexities.

Proposition 7.2

1). Let $y_j = \cos\frac{j\pi}{n}$, $j \in \{0, \ldots, n\}$, be the extrema of $T_n(x)$. Then

$$
V_T(y_{0:n}) = \Bigl[\,\cos\tfrac{jk\pi}{n}\,\Bigr]_{j,k=0}^{n}
\qquad (7.3)
$$

is the (scaled) Discrete Cosine Transform I matrix.

2). Let $y_j = \cos\bigl(\tfrac{2j-1}{n}\cdot\tfrac{\pi}{2}\bigr)$, $j \in \{1, \ldots, n\}$, be the zeros of $T_n(x)$. Then

$$
V_T(y_{1:n}) = \Bigl[\,\cos\bigl(\tfrac{(2j-1)k}{n}\cdot\tfrac{\pi}{2}\bigr)\,\Bigr]_{j=1,\,k=0}^{n,\;n-1}
\qquad (7.4)
$$

is the Discrete Cosine Transform II matrix.

3). Let $y_j = \cos\frac{j\pi}{n}$, $j \in \{1, \ldots, n-1\}$, be the zeros of $U_{n-1}(x)$. Then

$$
V_U(y_{1:n-1}) = \Bigl[\,\sin\tfrac{jk\pi}{n}\,\Bigr]_{j,k=1}^{n-1}
\qquad (7.5)
$$

is the (scaled) Discrete Sine Transform I matrix.

4). Let $y_j = \cos\bigl(\tfrac{2j-1}{n}\cdot\tfrac{\pi}{2}\bigr)$, $j \in \{1, \ldots, n\}$, be the zeros of $T_n(x)$. Then

$$
V_U(y_{1:n}) = \Bigl[\,\sin\bigl(\tfrac{(2j-1)k}{n}\cdot\tfrac{\pi}{2}\bigr)\,\Bigr]_{j,k=1}^{n}
\qquad (7.6)
$$

is the (scaled and transposed) Discrete Sine Transform II matrix.

5). Let $y_j = \exp\bigl(\iota\,\tfrac{2\pi j}{n}\bigr)$, $\iota = \sqrt{-1}$, $j \in \{0, 1, \ldots, n-1\}$, denote the roots of unity. Then

$$
V(y_{1:n}) = \Bigl[\,\exp\bigl(\iota\,\tfrac{2\pi jk}{n}\bigr)\,\Bigr]_{j,k=0}^{n-1}
\qquad (7.7)
$$

is the Discrete Fourier Transform matrix.
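Parts 1) and 5) can be spot-checked numerically; the following sanity check (ours) uses numpy's Chebyshev utilities for the basis evaluation.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

n = 6

# Part 1): Chebyshev-Vandermonde at the extrema of T_n vs. the DCT-I matrix.
j = np.arange(n + 1)
y = np.cos(j * np.pi / n)                  # extrema of T_n
V_T  = cheb.chebvander(y, n)               # [T_k(y_j)], k = 0, ..., n
DCT1 = np.cos(np.outer(j, j) * np.pi / n)  # [cos(jk*pi/n)]
print(np.max(np.abs(V_T - DCT1)))          # ~ 1e-15

# Part 5): monomial Vandermonde at the roots of unity vs. the DFT matrix.
k = np.arange(n)
w = np.exp(2j * np.pi * k / n)                 # n-th roots of unity
DFT = np.exp(2j * np.pi * np.outer(k, k) / n)  # [exp(i 2 pi jk / n)]
print(np.max(np.abs(np.vander(w, increasing=True) - DFT)))   # ~ 1e-15
```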

The latter proposition is easily deduced from the definitions of the Chebyshev polynomials. Before proving Proposition 7.1, let us introduce the necessary notation. Let n denote the maximal degree of the two polynomials u(x) and v(x). The bivariate function

$$
B_{u,v}(x, y) = \frac{u(x)\, v(y) - u(y)\, v(x)}{x - y}
$$

is called the Bezoutian of u(x) and v(x). Now, let Q = {Q_0(x), ..., Q_{n-1}(x)} be another basis in the linear space C_{n-1}[x]. The matrix

$$
B_{\{\mathcal{P},\mathcal{Q},u,v\}} = \bigl[\, b_{ij} \,\bigr]_{i,j=0}^{n-1},
$$

whose entries are determined by

$$
B_{u,v}(x, y) = \sum_{i,j=0}^{n-1} b_{ij}\, P_i(x)\, Q_j(y)
= \begin{bmatrix} P_0(x) & P_1(x) & \cdots & P_{n-1}(x) \end{bmatrix}
\, B_{\{\mathcal{P},\mathcal{Q},u,v\}} \,
\begin{bmatrix} Q_0(y) \\ Q_1(y) \\ \vdots \\ Q_{n-1}(y) \end{bmatrix},
\qquad (7.8)
$$

is called the Bezout matrix of u(x) and v(x) with respect to the two sets of polynomials P and Q.

Proof of Proposition 7.1. The proof is based on the following useful property of the Bezout matrix,

$$
V_{\mathcal{P}}(x_{1:n}) \, B_{\{\mathcal{P},\mathcal{Q},u,v\}} \, V_{\mathcal{Q}}(y_{1:n})^T
= \bigl[\, B_{u,v}(x_i, y_j) \,\bigr]_{i,j=1}^{n},
\qquad (7.9)
$$

which follows immediately from (7.8). Since u(y_j) = 0, the matrix on the right-hand side of (7.9) is a quasi-Cauchy matrix:

$$
V_{\mathcal{P}}(x_{1:n}) \, B_{\{\mathcal{P},\mathcal{Q},u,v\}} \, V_{\mathcal{Q}}(y_{1:n})^T
= \Bigl[\, \frac{u(x_i)\, v(y_j)}{x_i - y_j} \,\Bigr]_{i,j=1}^{n}
= \operatorname{diag}\{u(x_1), \ldots, u(x_n)\} \; C(x_{1:n}, y_{1:n}) \; \operatorname{diag}\{v(y_1), \ldots, v(y_n)\}.
\qquad (7.10)
$$

On the other hand, by using (7.9) and the obvious relation

$$
B_{u,v}(y, y) = u'(y)\, v(y) - u(y)\, v'(y),
$$

it is easy to check that

$$
V_{\mathcal{P}}(y_{1:n}) \, B_{\{\mathcal{P},\mathcal{Q},u,v\}} \, V_{\mathcal{Q}}(y_{1:n})^T
= \operatorname{diag}\{u'(y_1)\, v(y_1), \ldots, u'(y_n)\, v(y_n)\}.
$$

Now, solving the last equation for $B_{\{\mathcal{P},\mathcal{Q},u,v\}} V_{\mathcal{Q}}(y_{1:n})^T$ and substituting the result back into (7.10) yields (7.2), since the factors v(y_j) cancel.
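Identity (7.9) can also be tested symbolically for the power basis P = Q = {1, x, ..., x^{n-1}}; the polynomials u, v and the nodes below are arbitrary example choices of ours.

```python
import sympy as sp

t, s = sp.symbols('t s')
u = t**3 - 2*t + 1        # example polynomials, chosen arbitrarily
v = t**2 + 3
n = 3                     # maximal degree of u and v

# Bezoutian B_{u,v}(t, s); the division by (t - s) is exact.
Bez = sp.cancel((u * v.subs(t, s) - u.subs(t, s) * v) / (t - s))

# Bezout matrix w.r.t. the power basis: b_ij = coefficient of t**i * s**j.
poly = sp.Poly(sp.expand(Bez), t, s)
B = sp.Matrix(n, n, lambda i, j: poly.coeff_monomial(t**i * s**j))

xs, ys = [1, 2, 4], [5, 6, 7]                # arbitrary distinct nodes
V = lambda nodes: sp.Matrix(n, n, lambda i, k: nodes[i]**k)
lhs = V(xs) * B * V(ys).T                    # left-hand side of (7.9)
rhs = sp.Matrix(n, n, lambda i, k: Bez.subs({t: xs[i], s: ys[k]}))
print(sp.simplify(lhs - rhs))                # zero matrix, confirming (7.9)
```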



8 Conclusion

In [BKO99] we developed a fast O(n^2) Björck-Pereyra-type Cauchy solver and proved that for the important class of totally positive coefficient matrices it yields pleasantly small forward, backward and residual errors. However, experience shows that in the generic case the numerical performance of the BP-type algorithm can be less favorable. Since the use of the explicit inversion formula for Cauchy matrices can also produce a large backward error, no fast and accurate methods were available for solving Cauchy linear equations. In this paper we designed several alternative fast O(n^2) Cauchy solvers, and the rounding error analysis suggests that their backward stability is similar to that of Gaussian elimination (GE), so that various pivoting techniques (so successful for GE) will stabilize the numerical behavior of these new algorithms as well. It is further shown that the row orderings of partial pivoting and of Gu's pivoting [Gu95] can be achieved in advance, without actually performing elimination, and fast O(n^2) algorithms for these purposes are suggested. We also identified the class of totally positive Cauchy matrices, for which it is advantageous not to pivot when using the new algorithms; this yields a remarkable backward stability. This matches the conclusion of de Boor and Pinkus [DBP77], who suggested avoiding pivoting when performing standard Gaussian elimination on totally positive matrices. Analytical error bounds and the results of numerical experiments indicate that the methods suggested in the present paper enjoy favorable backward stability.

References

[BKO94] T. Boros, T. Kailath and V. Olshevsky, Fast algorithms for solving Cauchy linear systems, Stanford Information Systems Laboratory report, 1994.

[BKO99] T. Boros, T. Kailath and V. Olshevsky, Fast Björck-Pereyra-type algorithm for parallel solution of Cauchy linear equations, Linear Algebra and Its Applications, 302-303 (1999), 265-293.

[BP70] A. Björck and V. Pereyra, Solution of Vandermonde systems of equations, Math. Comp., 24 (1970), 893-903.

[C41] A. L. Cauchy, Mémoires sur les fonctions alternées et sur les sommes alternées, Exercices d'analyse et de phys. math., ii (1841), 151-159.

[CF88] T. Chan and D. Foulser, Effectively well-conditioned linear systems, SIAM J. Sci. Stat. Comput., 9 (1988), 963-969.

[CR92] D. Calvetti and L. Reichel, A Chebyshev-Vandermonde solver, Lin. Alg. Appl., 172 (1992), 219-229.

[CR93] D. Calvetti and L. Reichel, Fast inversion of Vandermonde-like matrices involving orthogonal polynomials, BIT, 1993.

[DBP77] C. de Boor and A. Pinkus, Backward error analysis for totally positive linear systems, Numer. Math., 27 (1977), 485-490.

[GK50] F. R. Gantmacher and M. G. Krein, Oscillatory Matrices and Kernels, and Small Vibrations of Mechanical Systems, second edition (in Russian), GITTL, Moscow, 1950. German translation: Oszillationsmatrizen, Oszillationskerne und kleine Schwingungen mechanischer Systeme, Akademie Verlag, Berlin, 1960.

[GK90] I. Gohberg and I. Koltracht, On the inversion of Cauchy matrices, in Signal Processing, Scattering and Operator Theory, and Numerical Methods, Proc. of the MTNS-89 (M. A. Kaashoek, J. H. van Schuppen and A. C. M. Ran, eds.), 381-392, Birkhäuser, Boston, MA, 1990.

[GK93] I. Gohberg and I. Koltracht, Mixed, componentwise and structured condition numbers, SIAM J. Matrix Anal., 14 (1993), 688-704.

[GKO95] I. Gohberg, T. Kailath and V. Olshevsky, Fast Gaussian elimination with partial pivoting for matrices with displacement structure, Math. of Comp., 64 (1995), 1557-1576.


[GO94a] I. Gohberg and V. Olshevsky, Fast state space algorithms for matrix Nehari and Nehari-Takagi interpolation problems, Integral Equations and Operator Theory, 20, No. 1 (1994), 44-83.

[GO94b] I. Gohberg and V. Olshevsky, Fast algorithms with preprocessing for matrix-vector multiplication problems, Journal of Complexity, 10 (1994), 411-427.

[GO94c] I. Gohberg and V. Olshevsky, Complexity of multiplication with vectors for structured matrices, Linear Algebra Appl., 202 (1994), 163-192.

[GO94d] I. Gohberg and V. Olshevsky, Fast inversion of Chebyshev-Vandermonde matrices, Numer. Math., 67, No. 1 (1994), 71-92.

[GVL89] G. Golub and C. Van Loan, Matrix Computations, second edition, Johns Hopkins U. P., Baltimore, 1989.

[Gu95] Ming Gu, Stable and efficient algorithms for structured linear systems, preprint, 1995.

[Hig87] N. J. Higham, Error analysis of the Björck-Pereyra algorithms for solving Vandermonde systems, Numer. Math., 50 (1987), 613-632.

[Hig88] N. J. Higham, Fast solution of Vandermonde-like systems involving orthogonal polynomials, IMA J. Numer. Anal., 8 (1988), 473-486.

[Hig90] N. J. Higham, Stability analysis of algorithms for solving confluent Vandermonde-like systems, SIAM J. Matrix Anal., 11(1) (1990), 23-41.

[Hig96] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.

[HHR89] G. Heinig, W. Hoppe and K. Rost, Structured matrices in interpolation and approximation problems, Wissenschaftl. Zeitschrift der TU Karl-Marx-Stadt, 31, 2 (1989), 196-202.

[HR84] G. Heinig and K. Rost, Algebraic Methods for Toeplitz-like Matrices and Operators, Operator Theory, vol. 13, Birkhäuser Verlag, Basel, 1984.

[K72] S. Karlin, Total Positivity, vol. 1, Stanford University Press, Stanford, 1972.

[KKM79] T. Kailath, S. Kung and M. Morf, Displacement ranks of matrices and linear equations, J. Math. Anal. and Appl., 68 (1979), 395-407.

[KO95a] T. Kailath and V. Olshevsky, Displacement structure approach to Chebyshev-Vandermonde and related matrices, Integral Equations and Operator Theory, 22 (1995), 65-92.

[KO95b] T. Kailath and V. Olshevsky, Bunch-Kaufman pivoting for partially reconstructable Cauchy-like matrices with application to Toeplitz-like linear equations and to boundary rational matrix interpolation problems, 1995, to appear in Linear Algebra and Appl.

[KS95] T. Kailath and A. H. Sayed, Displacement structure: theory and applications, SIAM Review, 37, No. 3 (1995), 297-386.

[LAK86] H. Lev-Ari and T. Kailath, Triangular factorization of structured Hermitian matrices, in Operator Theory: Advances and Applications (I. Gohberg, ed.), vol. 18, 301-324, Birkhäuser, Boston, 1986.

[R90] L. Reichel, Newton interpolation at Leja points, BIT, 30 (1990), 23-41.

[RO91] L. Reichel and G. Opfer, Chebyshev-Vandermonde systems, Math. of Comp., 57 (1991), 703-721.

[S17] I. Schur, Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind, Journal für die Reine und Angewandte Mathematik, 147 (1917), 205-232. English translation in Operator Theory: Advances and Applications (I. Gohberg, ed.), vol. 18, 31-88, Birkhäuser, Boston, 1986.

[SB80] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, New York, 1980.


[SB95] D. Sweet and R. Brent, Error analysis of a partial pivoting method for structured matrices, in Advanced Signal Processing Algorithms, Proc. of SPIE-1995, vol. 2563, 266-280.

[TG81] W. Tang and G. Golub, The block decomposition of a Vandermonde matrix and its applications, BIT, 21 (1981), 505-517.

[Ty94] E. Tyrtyshnikov, How bad are Hankel matrices?, Numer. Math., 67, No. 2 (1994), 261-269.

[V93] J. M. Varah, Errors and perturbations in Vandermonde systems, IMA J. of Numer. Anal., 13 (1993), 1-12.

[W68] J. Wilkinson, A priori error analysis of algebraic processes, Proc. Intern. Congr. Math. (1966), 629-639, Mir, Moscow, 1968.
