
A NEW ITERATIVE METHOD FOR SOLVING LARGE-SCALE

LYAPUNOV MATRIX EQUATIONS∗

V. SIMONCINI†

Abstract. In this paper we propose a new projection method to solve large-scale continuous-time Lyapunov matrix equations. The new approach projects the problem onto a much smaller approximation space, generated as a combination of Krylov subspaces in A and A^{-1}. The reduced problem is then solved by means of a direct Lyapunov scheme based on matrix factorizations. The reported numerical results show the competitiveness of the new method, compared to a state-of-the-art approach based on the factorized Alternating Direction Implicit (ADI) iteration.

1. Introduction. Given the time-invariant linear system

x′(t) = Ax(t) + Bu(t),   x(0) = x_0,   (1.1)

with A ∈ R^{n×n} and B ∈ R^{n×s} with s ≤ n, the solution of the associated continuous-time Lyapunov equation

AX + XA^T + BB^T = 0   (1.2)

has a fundamental role in many application areas such as signal processing and system and control theory. We assume that s ≪ n, and that A is dissipative, that is, x^T(A + A^T)x < 0 for all nonzero real vectors x. The symmetric solution X, called the reachability Gramian, carries important information on the stability and energy of the linear system (1.1) and on the feasibility of order reduction techniques [1], [4], [9].
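For small n, the solution of (1.2) can be computed directly; the following is a minimal illustration of ours (not from the paper), using Matlab's lyap from the Control System Toolbox, which solves AX + XA^T + Q = 0 for a given Q:

% Minimal illustration: reachability Gramian of a small stable system.
n = 50;
A = -2*eye(n) + diag(ones(n-1,1), 1);   % dissipative: x'(A+A')x < 0
B = ones(n, 1);
X = lyap(A, B*B');                      % solves A*X + X*A' + B*B' = 0
norm(A*X + X*A' + B*B', 'fro')          % residual check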

The numerical solution of the Lyapunov matrix equation has been addressed in a large body of literature, and well established methods exist for small dimensional problems [2], [20]. Research has recently focused mostly on schemes for medium to large-scale Lyapunov equations, for which methods based on matrix factorizations become too demanding, both in terms of computational costs and memory requirements. For dense problems, efforts have been devoted to the development of matrix iterative methods based on the sign function, possibly exploiting the problem structure; see, e.g., [3], [6], and references therein.

In the case of a sparse matrix A, projection-type methods based on Krylov subspaces, either using polynomials or rational functions of A, have been employed; see, e.g., [23], [24], [25], [26], [28], [32], and the general treatment in [1]. In particular, since the work by Ellner and Wachspress in [12], the ADI iteration has found great applicability in the solution of (1.2), and software implementing an efficient variant is now available; see, e.g., [30]. A common feature of all these projection methods is that the approximate solution X is given in factored form, X = ZZ^T with Z of low column rank. The possibility of obtaining a good low rank approximate solution is supported by recent theoretical results showing that a rapid decay of the eigenvalues of the symmetric and positive semidefinite exact solution may be observed under certain conditions on A; see [16], [29], [37]. This property can be fully exploited when dealing with large problems, since storing the whole, in general full, matrix X would require a significant amount of memory space. Other methods have been investigated, such as that in [22], which however do not build a low rank approximation.

∗ Version of February 3, 2007.
† Dipartimento di Matematica, Università di Bologna, Piazza di Porta S. Donato 5, I-40127 Bologna, Italy, and CIRSA, Ravenna, Italy ([email protected]).


In spite of these recent efforts, projection methods that are directly based on Krylov subspace approaches (cf. [32], [23]) may be slow, requiring the generation of a very large approximation space to obtain a sufficiently accurate solution; see the experiments reported in [28]. On the other hand, ADI-type methods need the a-priori evaluation of some parameters which may be hard to tune. In addition, memory and CPU time requirements remain an issue, for these methods perform various large matrix factorizations.

In this paper we propose a new projection-type method for large-scale problems, which is in general faster than approaches based on the ADI iteration, and does not require estimation of parameters. The new method projects the problem onto a small approximation space, generated as a combination of Krylov subspaces in A and A^{-1}. The reduced problem is then solved by means of a standard scheme for small Lyapunov equations. The method is equipped with a natural stopping criterion, which can be computed inexpensively. The performance of the new method is compared on benchmark problems to that of the cyclic implementation of the factorized ADI method available in [30].

The paper is organized as follows. Section 2 describes the general subspace projection framework, which characterizes the new approach and several other reduction methods. Section 3 and its subsections describe the new method and its algorithmic aspects. Section 4 reviews the ADI iteration and its efficient variant, to prepare the notation for the numerical experiments reported in section 5. Finally, in section 6 we provide some concluding remarks.

The following notation is used throughout. The square identity and zero matrices are denoted by I and O, respectively; rectangular portions will be identified by using subscripts. The only exceptions are the (block) columns of the identity, denoted by e_i, the ith column (resp. E_i, the ith block of s columns of the identity matrix). Matlab notation is used whenever possible [27]. ‖ · ‖ indicates the 2-norm for vectors and the induced norm for matrices, while ‖ · ‖_F denotes the Frobenius norm, that is, ‖A‖_F^2 = Σ_{i=1}^n Σ_{j=1}^m a_{i,j}^2 for A ∈ R^{n×m}. Range(V) indicates the space spanned by the columns of V. Finally, exact arithmetic is assumed throughout.

2. Projection methods for the Lyapunov equation. In this section we review a general framework associated with projection methods for the Lyapunov equation. This will allow us to simplify the presentation of the new approach in section 3.

Consider a subspace K of R^n, and let V be a matrix whose orthonormal columns span K, with B = VE for some matrix E, so that the columns of B belong to K. We project the Lyapunov equation onto this subspace K: we set H = V^T A V, and seek an approximate solution in K in the form X = VYV^T. The projected problem may be obtained by imposing that the residual R = AX + XA^T + BB^T be orthogonal to the approximation space K, the so-called Galerkin condition. In matrix terms, this can be written as V^T R V = O; this condition can be derived by first defining the matrix inner product ⟨X, Y⟩ = tr(XY^T) and then imposing ⟨R, P⟩ = 0 for any P = VGV^T [32]. Since V has orthonormal columns, imposing V^T R V = O gives the following projected Lyapunov equation

V^T A V Y + Y V^T A^T V + E E^T = 0,   (2.1)

whose solution Y defines X. Since we assume that A is dissipative, V^T A V is stable and equation (2.1) admits a unique solution. The projection is effective if a good approximation is attained for a small dimensional subspace, so that (2.1) can be cheaply solved by means of procedures based on matrix factorizations; cf. [2], [20]. In turn, this effectiveness is more likely to take place whenever the exact solution can be accurately approximated by low rank matrices. An additional theoretical motivation for using such a technique is given by the exponential form of the analytical solution,

X = ∫_0^∞ e^{tA} B B^T e^{tA^T} dt.   (2.2)

Note that the use of this formulation has led to different approximation procedures, such as that in [17]. The quantity W = e^{tA}B can be approximated in K as W̃ = V e^{tH} E. Therefore, an approximation to X can be obtained as

X̃ = ∫_0^∞ W̃ W̃^T dt = V ( ∫_0^∞ e^{tH} E E^T e^{tH^T} dt ) V^T =: V Y V^T.

Using the fact that H = V^T A V is stable, the following result can be readily established; see, e.g., [9].

Proposition 2.1. The matrix Y = ∫_0^∞ e^{tH} E E^T e^{tH^T} dt is the unique solution to the Lyapunov equation (2.1).

Although the procedure above holds independently of the specific approximation space K, its effectiveness crucially depends on K. The presentation outlined above was first introduced in [32] for B ∈ R^n and K corresponding to the Krylov subspace K_k(A,B) = span{B, AB, ..., A^{k-1}B}. Further developments of Krylov subspace methods for B ∈ R^{n×s}, and using conditions on the residual different from the Galerkin one, were discussed in [23]. Reasons for expecting good accuracy in the approximation for sufficiently large dimension of the Krylov subspace come from the approximation properties of the exponential. Let V_k be such that Range(V_k) = K_k(A,B), and H_k = V_k^T A V_k. If W̃_k = V_k e^{tH_k} E is a good approximation to W = e^{tA}B, we expect the corresponding integral to approximate X. In particular, assuming for the sake of the presentation t = 1, if the numerical range of A is contained in the complex disk |z + ρ| ≤ ρ for some ρ > 0, then a bound of the form ‖exp(A)v − V_k exp(H_k)e_1‖ = O((eρ/k)^k) holds for k ≥ 2ρ; see [21]. More accurate bounds can be obtained in the symmetric case [21], [40], [10].

Assume now that a sequence of approximation subspaces K_k of dimension k is given, with K_k ⊂ K_{k+1}, and satisfying A K_k ⊂ K_{k+1}. Note that these conditions on K_k are reasonably general.

Let the columns of V_k span K_k, let H_k = V_{k+1}^T A V_k, and let X_k = V_k Y_k V_k^T be the approximate solution to (1.2) in K_k, obtained by imposing the Galerkin condition. The associated residual R_k = A X_k + X_k A^T + B B^T satisfies (see, e.g., [32])

R_k = V_{k+1} H_k Y_k V_k^T + V_k Y_k H_k^T V_{k+1}^T + V_k E E^T V_k^T =: V_{k+1} R̄ V_{k+1}^T.   (2.3)

The selection K_k = K_k(A,B) fulfills all conditions above, and in addition it can be shown that ‖R_k‖ = ‖E_{k+1}^T H_k Y_k‖, so that the quality of the approximation X_k can be measured by computing the residual norm, without explicitly evaluating the residual matrix. This setting has been heavily exploited to devise competitive approaches, and to derive theoretical properties; see, e.g., [24], [23], [31].
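The framework of this section can be made concrete with a small sketch of ours (a hypothetical test matrix; lyap is the dense solver also used in section 3.1):

% Galerkin projection onto a standard Krylov subspace K_k(A,B) (sketch).
nx = 22;  n = nx^2;
A = -gallery('poisson', nx);       % symmetric negative definite, dissipative
B = ones(n,1)/sqrt(n);
V = B;                             % orthonormal basis, built column by column
for j = 1:19
    w = A*V(:,end);
    w = w - V*(V'*w);  w = w - V*(V'*w);   % orthogonalize (twice, for safety)
    V = [V, w/norm(w)];
end
H = V'*A*V;  E = V'*B;
Y = lyap(H, E*E');                 % projected equation (2.1)
X = V*Y*V';                        % Galerkin approximate solution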

3. The new iterative procedure. The general description of the previous section suggests that the approximation space K could be enriched to contain more information than that of the regular Krylov subspace K_k(A,B). To this end, starting with the pair {B, A^{-1}B}, we propose to generate a sequence of approximation spaces that contain information on both A and A^{-1}, by adding two vectors at a time, one multiplied by A, and one by A^{-1}, that is

span{B, A^{-1}B, AB, A^{-2}B, A^2B, A^{-3}B, ...},

where the span is intended block-wise; see, e.g., [33]. As discussed in section 2, once the approximation space is generated, the original problem is projected onto this space, and the resulting small dimensional problem is solved. Our approach is inspired by a related acceleration procedure for the approximation of matrix functions developed in [11]; see the discussion at the end of section 3.2.

An outline of the algorithm is given next. Implementation details of the approach are provided in section 3.1.

Given B, A, set 𝒱_1 = V_1 = gram_sh([B, A^{-1}B]), 𝒱_0 = ∅.
For m = 1, 2, ...,
  1. 𝒱_m = [𝒱_{m-1}, V_m]
  2. Set T_m = 𝒱_m^T A 𝒱_m and E = 𝒱_m^T B
  3. Solve T_m Y + Y T_m^T + E E^T = 0 and set Y_m = Y
  4. If converged then X_m = 𝒱_m Y_m 𝒱_m^T and stop
  5. Set V_m^{(1)}: first s columns of V_m; V_m^{(2)}: second s columns of V_m
  6. V′_{m+1} = [A V_m^{(1)}, A^{-1} V_m^{(2)}]
  7. V̂_{m+1} ← orthogonalize V′_{m+1} w.r.t. 𝒱_m
  8. V_{m+1} = gram_sh(V̂_{m+1})

At each iteration of this process, two new blocks of s vectors are added to the space, so that the space dimension increases by 2s. Unless breakdown occurs, at the mth iteration the method has constructed an orthonormal basis of dimension 2sm, given by the columns of the matrix 𝒱_m = [V_1, V_2, ..., V_m], V_i ∈ R^{n×2s}, and K_m := Range(𝒱_m). The orthogonalization is performed first with respect to the previous basis vectors, and then within the new block of 2s vectors (function gram_sh, see section 3.1). At each iteration (cf. step 3), a small dimensional Lyapunov equation with matrices of size 2sm × 2sm is solved. Only at convergence (step 4) is the approximate solution to X constructed, as

X_m = 𝒱_m Y_m 𝒱_m^T,   (3.1)

with Y_m symmetric. This step can be omitted if X_m is not required explicitly. The following important properties are a consequence of the subspace definition.

Proposition 3.1. For any m ≥ 1, the space K_m satisfies K_m = K_{2m}(A, A^{-m}B), where K_{2m} is a block Krylov subspace. In particular, it follows that

A K_m ⊆ K_{m+1}.   (3.2)

We observe that the results of Proposition 3.1 ensure that T_m is block upper Hessenberg, since V_k^T A V_j = O for k > j + 1, j = 1, 2, .... In particular, property (3.2) allows us to employ the framework of section 2. Moreover, since A is dissipative, T_m is stable, so that Y_m is positive (semi-)definite.

To simplify the presentation, we limit the following discussion to the case s = 1; however, the stated relations formally hold unchanged for s ≥ 1, by replacing scalar quantities with s × s matrices. We derive some recursive relations that can be used to significantly reduce the computational cost of the basic algorithm. Let

H̄_m = [H_m; χ_m E_m^T] ∈ R^{2(m+1)×2m}

collect all orthonormalization coefficients, that is,

V̂_{m+1} = [A V_m^{(1)}, A^{-1} V_m^{(2)}] − 𝒱_m H_m [e_{2m-1}, e_{2m}],   (3.3)
V_{m+1} χ_m = V̂_{m+1}.   (3.4)

Here χ_m is an upper triangular 2 × 2 matrix such that V_{m+1} has orthonormal columns; note that H̄_m is block upper Hessenberg with 2 × 2 blocks. Let also T̄_m := 𝒱_{m+1}^T A 𝒱_m. The computation of T_m seems to require additional matrix-vector products with A and extra inner products of long vectors. We next derive a recursion for T̄_m that completely avoids this expensive step.

Proposition 3.2. Let ℓ^{(k)} = (ℓ_{ij}) be the 2 × 2 matrix such that V_k = V̂_k ℓ^{(k)} in (3.3), k = 1, ..., m. Let

T̄_m = (t_{i,j}), i = 1, ..., 2m+2, j = 1, ..., 2m,  and  H̄_m = (h_{i,j}), i = 1, ..., 2m+2, j = 1, ..., 2m.

Then (odd columns)

t_{:,2k-1} = h_{:,2k-1},  k = 1, ..., m,

while (even columns)

(k = 1):  t_{:,2} = h_{:,1} (ℓ_{11}^{(1)})^{-1} ℓ_{12}^{(1)} + e_1 (ℓ_{11}^{(1)})^{-1} ℓ_{22}^{(1)},
          t_{:,4} = (e_2 − T̄_1 h_{1:2,2}) ℓ_{22}^{(2)},   ρ^{(2)} = (ℓ_{11}^{(2)})^{-1} ℓ_{12}^{(2)};

(1 < k ≤ m):  t_{:,2k} = t_{:,2k} + t_{:,2k-1} ρ^{(k)},
              t_{:,2k+2} = (e_{2k} − T̄_k h_{1:2k,2k}) ℓ_{22}^{(k+1)},   ρ^{(k+1)} = (ℓ_{11}^{(k+1)})^{-1} ℓ_{12}^{(k+1)}.

Proof. For k ≥ 1, we write V_k = [V_k^{(1)}, V_k^{(2)}] ∈ R^{n×2}. Using (3.3), we immediately have A V_k^{(1)} = V_{k+1} χ_k e_1 + 𝒱_k H_k e_{2k-1} = 𝒱_{k+1} H̄_k e_{2k-1}, so that the odd columns of T̄_m satisfy

T̄_m e_{2k-1} = 𝒱_{m+1}^T A V_k^{(1)} = [I_{2k+2}; O] H̄_k e_{2k-1} = H̄_m e_{2k-1}.

The even columns need to take into account that the recurrence comes from multiplications by A^{-1}, and thus the term A V_k^{(2)} is not explicitly available. However, from (3.3) we obtain

V_k^{(2)} = A V̂_{k+1}^{(2)} + A 𝒱_k H_k e_{2k},

which gives a recurrence for the term A V̂_{k+1}^{(2)} appearing in the (2k+2)nd column. Hence,

𝒱_{m+1}^T A V̂_{k+1}^{(2)} = 𝒱_{m+1}^T V_k^{(2)} − 𝒱_{m+1}^T A 𝒱_k H_k e_{2k} = e_{2k} − [T̄_k; O_{2(m-k),2k}] H_k e_{2k}.

To complete the derivation, we recall that V_{k+1} = V̂_{k+1} ℓ^{(k+1)}, with ℓ^{(k+1)} = (ℓ_{ij}) upper triangular, so that V_{k+1}^{(2)} = V̂_{k+1}^{(1)} ℓ_{12} + V̂_{k+1}^{(2)} ℓ_{22}, from which A V_{k+1}^{(2)} = A V̂_{k+1}^{(1)} ℓ_{12} + A V̂_{k+1}^{(2)} ℓ_{22} = A V_{k+1}^{(1)} ℓ_{11}^{-1} ℓ_{12} + A V̂_{k+1}^{(2)} ℓ_{22}. Therefore,

𝒱_{m+1}^T A V_{k+1}^{(2)} = 𝒱_{m+1}^T 𝒱_{k+2} H̄_{k+1} e_{2k+1} ℓ_{11}^{-1} ℓ_{12} + ( e_{2k} − [T̄_k; O_{2(m-k),2k}] H_k e_{2k} ) ℓ_{22}.

The recursion thus follows.

The recurrence for T̄_m shows that the lower diagonal 2 × 2 block has zero second row. This fact can be used to simplify the computation of the residual norm.

Proposition 3.3. For m ≥ 1, let K_m = Range(𝒱_m) and let X_m be the approximate solution obtained as in (3.1), with associated residual R_m = A X_m + X_m A^T + B B^T. Then Range(R_m) = Range(𝒱_{m+1}) and ‖R_m‖ = ‖e_{2m+1}^T T̄_m Y_m‖.

Proof. Since A 𝒱_m = 𝒱_{m+1} L for some matrix L, the first result follows from (2.3). Therefore, R_m = 𝒱_{m+1} R̄_m 𝒱_{m+1}^T, with

R̄_m = 𝒱_{m+1}^T R_m 𝒱_{m+1} = [ O, 𝒱_m^T R_m V_{m+1} ; V_{m+1}^T R_m 𝒱_m, O ].

Using V_{m+1}^T R_m 𝒱_m = E_{m+1}^T T̄_m Y_m, it follows that ‖R_m‖ = ‖R̄_m‖ = ‖V_{m+1}^T R_m 𝒱_m‖ = ‖E_{m+1}^T T̄_m Y_m‖ = ‖e_{2m+1}^T T̄_m Y_m‖.

The matrix Y_m in the solution X_m = 𝒱_m Y_m 𝒱_m^T is often numerically positive semi-definite [29], [16], [37]. If this is so, it is possible to generate a lower rank X_m as X_m = Z_m Z_m^T, with Z_m of rank lower than 2m, the dimension of Y_m. More precisely, let Y_m = W D W^T be the eigendecomposition of the 2m × 2m matrix Y_m, with D diagonal having diagonal entries sorted in decreasing order, and let W_k D_k W_k^T be the approximate decomposition obtained by truncating to the first k columns of W, with k ≪ 2m. Then Y_m ≈ W_k D_k W_k^T, and the approximation is determined by using Z_m = 𝒱_m W_k D_k^{1/2}. The number k of columns to retain may be determined by inspecting the diagonal elements of D. In our implementation, we discarded the columns of W corresponding to diagonal elements of D less than 10^{-12}; a more conservative tolerance could also be employed.

Due to the form of the approximate solution, it is also possible to monitor at low cost the amount by which the approximate solution varies as the iterations proceed. Indeed, letting

Y_m = [ Ŷ_m, y_m ; y_m^T, η_m ],

with Ŷ_m of size (2m − 2) × (2m − 2), we have

X_m − X_{m-1} = [𝒱_{m-1}, V_m] [ Ŷ_m − Y_{m-1}, y_m ; y_m^T, η_m ] [𝒱_{m-1}, V_m]^T =: 𝒱_m G_m 𝒱_m^T,

so that

‖X_m − X_{m-1}‖_F / ‖X_{m-1}‖_F = ‖G_m‖_F / ‖Y_{m-1}‖_F,   (3.5)

where both G_m and Y_{m-1} have small dimension. We do not advocate using this quantity as a stopping criterion; however, we shall sometimes refer to it in our experiments for comparison purposes.
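In the notation of the implementation given in section 3.1 (s = 1, with Yold denoting the projected solution of the previous iteration), a sketch of this monitor reads:

% Relative variation (3.5) computed from the small matrices only.
G = Y;
G(1:end-2, 1:end-2) = G(1:end-2, 1:end-2) - Yold;   % G_m of (3.5)
rel_change = norm(G, 'fro') / norm(Yold, 'fro');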

3.1. Implementation details. An implementation of the algorithm associated with the procedure outlined above is given by the function Krylov-Plus-Inverted-Krylov (hereafter K-PIK). For the sake of simplicity, we report here the case s = 1. The general case s ≥ 1 is depicted in the appendix. The two codes only differ in the matrix indexing. Both codes are available from the author upon request.

function Z=K_PIK(A,rhs,m_max,k_max,tol)
n=size(A,1);                   % added: problem size, used for indexing below
nrmb=norm(rhs,'fro')^2;
nrma=norm(A,'fro');
H=zeros(2*m_max+2,2*m_max);    % added: preallocation (cf. the appendix code)
[LA,UA]= lu(A);                % For sym matrices use UA=chol(-A); LA=-UA';
[ibeta,U(:,1:2)] = gram_sh([rhs, UA\(LA\rhs)]);
beta=1/ibeta(1,1);             % added: rhs = U(:,1)*beta (cf. the appendix code)
for j=1:m_max,
   % Expand the approximation space
   jms=(j-1)*2+1; j1s=(j+1)*2; js=j*2; js1=js+1;
   Up(:,1) = A*U(:,jms);  Up(:,2) = UA\(LA\U(:,js));
   % New basis block
   for il=1:2                  % reorthogonalization
      k_min=max(1,j-k_max);
      for kk=k_min:j
         k1=(kk-1)*2+1; k2=kk*2;
         coef = U(1:n,k1:k2)'*Up;
         H(k1:k2,jms:js) = H(k1:k2,jms:js) + coef;
         Up = Up - U(:,k1:k2)*coef;
      end
   end
   if (j<=m_max)
      [hinv,U(1:n,js1:j1s)]=gram_sh(Up); H(js1:j1s,jms:js)=inv(hinv);
   end
   I = speye(js+2);
   % Update the T recurrence
   if (j==1),
      l(1:js+1,j)=[ H(1:3,1), speye(3,1)]*ibeta(1:2,2)/ibeta(1,1);
   else
      l(1:js+2,j)=l(1:js+2,j) + H(1:js+2,js-1)*rho;
   end
   T(1:js+2,1:2:js)=H(1:js+2,1:2:js);  T(1:js+2,2:2:js)=l(1:js+2,1:j);
   l(1:js+2,j+1)=( I(1:js+2,j*2)-T(1:js+2,1:js)*H(1:js,js))*hinv(2,2);
   rho = hinv(1,2)/hinv(1,1);
   % Solve small Lyapunov equation
   Y = lyap(T(1:js,1:js),beta^2*speye(js,1)*speye(js,1)');
   % Compute residual and test for convergence
   cc(1:2,1) = H(js1:j1s,js-1);  cc(1:2,2) = l(js1:j1s,j);
   nrmx = norm(Y,'fro');
   er2(j) = norm(cc*Y(js-1:js,:))/(nrmb+2*nrma*nrmx);  % cf. (3.6); factor 2 added
   if (er2(j)<tol), break, end
end
% Reduced rank solution, truncation tolerance 1e-12
[uY,sY]=eig(Y); [sY,id]=sort(diag(sY));
sY=flipud(sY); uY=uY(:,id(end:-1:1)); is=sum(abs(sY)>1e-12);
Y0 = uY(:,1:is)*diag(sqrt(sY(1:is)));
Z = U(:,1:js)*Y0;
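As a usage sketch (illustrative data and parameter values; gram_sh must be on the path, e.g., the variant sketched further below):

% Usage sketch for the listing above.
nx = 30;  A = -gallery('poisson', nx);   % 900 x 900 stable test matrix
b = ones(nx^2, 1);
Z = K_PIK(A, b, 100, 100, 1e-10);        % X is approximated by Z*Z'
nrm_res = norm(A*(Z*Z') + (Z*Z')*A' + b*b', 'fro');  % check (small n only)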


A few comments are in order. The LU factorization of A is employed for medium-size matrices, and the Cholesky factorization of −A should be used for A symmetric. For large matrices, a preconditioned iterative method could be employed to solve systems with A at each iteration, where the preconditioner could be generated once for all; see section 5.3 for some experiments.

Function gram_sh performs the modified Gram-Schmidt orthogonalization of the new block [14]. The same process is used for the orthogonalization with respect to older vectors in the basis. Note that this reduces to a procedure similar to the block Lanczos algorithm (see [15]) when the problem is symmetric, by setting k_max = 2, so that only orthogonalization with respect to the previous two blocks is enforced. In particular, for A symmetric, the whole matrix 𝒱_m is only needed at convergence to recover the solution factor. Therefore, one can either store the columns that are not needed in the orthogonalization process, or use a "two-pass" procedure, in which the basis vectors are first discarded. At convergence, the short-term iteration is performed once again to recover the basis vectors and update the final solution. Although this procedure is expensive, essentially doubling the computational cost of the generation of the approximation space, the memory requirements become extremely limited, allowing one to tackle very large problems, in the symmetric case. It is well known that if orthogonality is not enforced explicitly, as is the case when using short-term recurrences for A symmetric, loss of orthogonality of the columns of 𝒱_m may affect the computation of the matrix H_m and of the residual norm in Proposition 3.3. If one is willing to keep the whole basis in fast memory, orthogonalization of all basis vectors can be explicitly performed. Problems increase for s > 1. We refer to [19] for a recent presentation of these issues and for safeguard strategies that can be implemented to mitigate the problem in the block setting.
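The routine gram_sh itself is not listed here; one consistent realization (a sketch of ours: block modified Gram-Schmidt returning the orthonormal block and the inverse of its triangular factor, matching the calls H(...) = inv(hinv) in the listing above) is:

function [ir, Q] = gram_sh(W)
% Sketch of gram_sh: modified Gram-Schmidt QR of the block W, W = Q*R,
% returning ir = inv(R) first, consistently with its use in K_PIK.
% No safeguard for rank deficiency (cf. section 3.2).
[nw, p] = size(W);
R = zeros(p);  Q = W;
for k = 1:p
   for i = 1:k-1
      R(i,k) = Q(:,i)'*Q(:,k);
      Q(:,k) = Q(:,k) - Q(:,i)*R(i,k);
   end
   R(k,k) = norm(Q(:,k));
   Q(:,k) = Q(:,k)/R(k,k);
end
ir = inv(R);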

The function lyap computes the numerical solution of the small dimensional Lyapunov equation by means of a Schur-decomposition-type method; in our experiments, we used the algorithm proposed by Sorensen and Zhou [38], which implements the Bartels-Stewart method with only real arithmetic; other alternatives may be considered as, e.g., in the SLICOT package ([5], [36], [41]). Since the matrix T_m is stable, other procedures such as the Hammarling method could also be employed [20]. Note also that the fact that T_m is block upper Hessenberg significantly reduces the computational work of the method. We should remark, however, that as the approximation subspace increases, this step becomes the most computationally demanding task of the algorithm.

The stopping criterion is

‖A X_m + X_m A^T + B B^T‖ / ( 2‖A‖_F ‖Y_m‖_F + ‖B‖_F^2 ) ≤ tol,   (3.6)

where tol is a chosen fixed threshold. The numerator is obtained by using the computationally inexpensive relation of Proposition 3.3. We observe that the output matrix is the memory-conserving factor Z_m of X_m = Z_m Z_m^T.

3.2. Finite termination and convergence. For the sake of simplicity, in this section we mostly limit our discussion to the case s = 1. In exact arithmetic, the procedure just described has finite termination, since for m such that 2m ≥ n the generated vectors span the whole space. However, since two vectors at a time are added to the current basis, loss of rank may occur during the orthogonalization with respect to the older basis vectors, so that the next basis pair V̂_{m+1} cannot be built. We next show that this implies convergence, as expected by the result of Proposition 3.1.

Proposition 3.4. Assume that m − 1 iterations of the K-PIK method have been taken, with B ∈ R^n. At the mth iteration, assume that V̂_{m+1} in (3.3) has rank less than two. Then Range([𝒱_m, V̂_{m+1}]) is an invariant subspace of A with respect to B. Therefore, Range([𝒱_m, V̂_{m+1}]) contains the exact solution.

Proof. We recall that Range(𝒱_m) = K_{2m}(A, A^{-m}B). Therefore, the given assumption requires that there exist α_i ∈ R, i = −m−1, ..., m, not all zero, such that

α_{-m} A^{-m}B + ... + α_{-1} A^{-1}B + α_0 B + α_1 AB + ... + α_{m-1} A^{m-1}B + α_m A^m B + α_{-m-1} A^{-m-1}B = 0.

Assume first α_{-m-1} = 0 and α_m ≠ 0. Then A(A^{m-1}B) = Σ_{j=0}^{2m-1} β_j A^j A^{-m}B, which shows that A𝒱_m = 𝒱_m T_m, and thus Range(𝒱_m) is invariant for A. Using X_m = 𝒱_m Y_m 𝒱_m^T, we obtain

A X_m + X_m A^T + B B^T = 𝒱_m (T_m Y_m + Y_m T_m^T + E E^T) 𝒱_m^T = O.

Next, assume that α_{-m-1} ≠ 0 and α_m ≠ 0. Then

A^{-1} A^{-m}B = Σ_{j=0}^{2m-1} β_j A^j A^{-m}B + β_{2m} A^m B,

or, equivalently, A^{-m}B − Σ_{j=1}^{2m} β_j A^j A^{-m}B = β_{2m} A(A^m B), so that A[𝒱_m, V_{m+1}^{(1)}] = [𝒱_m, V_{m+1}^{(1)}] T̃_m, where T̃_m is the restriction of T_{m+1} to the first (2m + 1) columns and rows. Therefore, Range([𝒱_m, V_{m+1}^{(1)}]) is invariant for A. As in the previous case, writing X′_m = [𝒱_m, V_{m+1}^{(1)}] Y′_m [𝒱_m, V_{m+1}^{(1)}]^T, where Y′_m is the solution to the Lyapunov equation with coefficient matrix T̃_m, we obtain

A X′_m + X′_m A^T + B B^T = [𝒱_m, V_{m+1}^{(1)}] (T̃_m Y′_m + Y′_m T̃_m^T + E E^T) [𝒱_m, V_{m+1}^{(1)}]^T = O,

and the proof is completed.

If B is not a vector, then loss of rank while generating the basis may occur without convergence having taken place. In this case, deflation strategies should be employed, as in standard block Krylov subspace methods [19].

In section 2 we discussed the relation between the approximation to the exponential and the projected Lyapunov equation. The new approach implicitly approximates W = exp(A)B with W̃ = 𝒱_m exp(T_m)E in K_m = K_{2m}(A, A^{-m}B), and results in this context can be used to theoretically motivate our approach. For A symmetric and B a vector, it is shown by Druskin and Knizhnerman in [11] that the choice K_{2m}(A, A^{-m}B) yields an improved approximation to the exponential, over the standard Krylov subspace K_{2m}(A, B). More precisely, they show that the quality of the approximation to the exponential (and in fact to a whole class of functions) in K_{2m}(A, B) corresponds to that in K_{√(2m)}(A, A^{-m}B). Therefore, using K_{√(2m)}(A, A^{-m}B) allows one to achieve the same approximation quality as a regular Krylov space with a much lower subspace dimension.

We explicitly observe that the method proposed in [11] aims at approximating functions of symmetric matrices. In addition, the authors suggest fixing a priori the number k of matrix-vector products with A^{-1}, so as to implicitly build the approximation space span{A^{-k}v, A^{-k+1}v, ..., Av, ..., A^{m-1}v, ...}, where m (but not k) is allowed to increase. Here, we propose to increase the subspace dimension in "both directions" until convergence. This can be easily done thanks to the recurrence in Proposition 3.2 for the matrix T_m.

4. The Cholesky-factorized ADI iteration. The Lyapunov equation was shown to be a "model problem" for the classical ADI iteration; see [12], [42], [13], [26]. More recently, a Cholesky-factorized version has made the method more appealing for large-scale computation, for the approximate solution can be written as the product of two low-rank matrices [25], [28], [30]. We next review various implementation details of this method, to clarify its use in our numerical comparisons in section 5. Finally, we should mention that for the CF-ADI method to be derived, A is only required to be stable, that is, its eigenvalues should lie in the left half-plane, whereas projection-type methods need dissipativity, to ensure that the matrix V^T A V in (2.1) is stable.

Given a set of parameters p_1, p_2, ..., p_ℓ, the basic ADI iteration determines an approximate solution as (cf., e.g., [25])

X_0 = 0,
X_j = −2p_j (A + p_j I)^{-1} B B^T (A + p_j I)^{-T} + (A + p_j I)^{-1} (A − p_j I) X_{j-1} (A − p_j I)^T (A + p_j I)^{-T},   j = 1, ..., ℓ.

It can be shown that X_j is symmetric and positive (semi-)definite, so that it can be represented as X_j = Z_j Z_j^T, which is iteratively built one set of s columns at a time. The actual implementation, hereafter CF-ADI, requires s solves with A + p_j I per iteration; see, e.g., [25, Algorithm 2]. A cyclic procedure is obtained by continuing the iteration above and cyclically using the parameters; this is also referred to as the cyclic low-rank Smith(ℓ) method [28]. The recurrence can be stopped as soon as the residual matrix A Z_j Z_j^T + Z_j Z_j^T A^T + B B^T is sufficiently small in some norm. It was shown in [28] that

‖A Z_j Z_j^T + Z_j Z_j^T A^T + B B^T‖_F = ‖[A Z_j, Z_j, B][Z_j, A Z_j, B]^T‖_F = ‖r I r^T‖_F,   (4.1)

where r is the triangular square matrix of the "economy-size" QR factorization of the matrix [A Z_j, Z_j, B], and I here denotes a permutation matrix. Due to memory limitations, the previous quantities cannot be computed explicitly for large problems; therefore, a criterion based on the current iterate is commonly employed. More precisely, if Z_j = [Z_{j-1}, z_j], then using ‖Z_j Z_j^T − Z_{j-1} Z_{j-1}^T‖_F = ‖z_j‖^2, the iteration is stopped if ‖z_j‖ (absolute) or ‖z_j‖/‖Z_{j-1}‖_F (relative) is sufficiently small, which implies that the recurrence stagnates. Note that neither of these last two stopping criteria ensures that convergence has taken place.

The ADI parameters represent the key ingredient for the success of the method. Optimal parameters are known analytically when A has real spectrum; see, e.g., the discussion in [13]. The situation is significantly more complicated for a complex spectrum. In both cases, however, quite accurate information on the spectral region is required. The computation of quasi-optimal parameters has raised considerable attention; see, e.g., [39], [13], [34] and references therein. In particular, an economical procedure was proposed by Penzl in [28]. In our experiments, we used the function lp_para in the lyapack package ([30]) to compute these suboptimal ADI parameters (the Arnoldi procedures in lp_para were modified to include reorthogonalization, so as to limit the occurrence of unstable shifts).

Another important implementation aspect is related to the solves with A + p_j I, j = 1, ..., ℓ. Each matrix A + p_j I can be factorized once for all, and the Cholesky factor stored and used cyclically when p_j is employed; cf., e.g., [30]. For the sake of comparison, we use this strategy in Example 5.1, where factorizing the matrix requires limited memory resources. Since this procedure is clearly too memory consuming in large-scale applications, we otherwise opted for solving systems with A + p_j I repeatedly at occurrence.

An additional difficulty is given by the use of complex arithmetic when A is nonsymmetric and the ADI parameters are complex conjugate. In the original CF-ADI iteration, conjugate parameters can be coupled [25]. In the cyclic version, coupling does not seem to be doable, and the whole computation is carried out in complex arithmetic, yielding a complex Z_j. Note that it is possible to recover a real factor Z_j at the cost of additional computation [30]; alternatively, Gugercin et al. suggest only dealing with real parameters, even when the matrix has complex spectrum [18]. To fully explore the properties of the method, in our experiments with the cyclic CF-ADI method we decided to use complex parameters, and thus allowed complex arithmetic throughout the iteration. Further considerations regarding the typical behavior of this method are reported in the next section.

5. Numerical experiments. In this section we report on our numerical experience with the new method described in section 3. Comparisons with the cyclic CF-ADI method are also considered. All reported experiments were obtained using Matlab 7.1 (R3) ([27]), on a PC with a 2GHz processor and 2GB of RAM.

Both CPU time (in seconds) and number of iterations (as defined next) are used to measure the cost of the approaches. For the CF-ADI method, the number of iterations (counted as s times the number of cycles times the number of parameters) coincides with the number of shifted systems to be solved, and also with the number of long vectors that need to be stored. For the K-PIK method, each (block) iteration involves s system solutions with A, and 2s new vectors entering the basis. Therefore, for the Krylov approach the final memory requirements for the approximation space are given by at most 2s times the number of iterations.

In all tests with the new method, we also report the rank of the final approximate solution, as discussed in section 3. The cost of this computation is included in the overall cost of the method. For the CF-ADI iteration, we did not attempt to compute the factor rank, to limit the computational costs. Nonetheless, we remark that a procedure has been proposed in [18] to more cheaply determine a lower-rank factor that is updated at each iteration. For the sake of generality, we do not reproduce this implementation, as it requires further conditions and additional computation per iteration, although it may provide significant memory savings when the ADI solution space is large. In general, this procedure does not improve the convergence rate of the method; more details can be found in [18]; see also [34] for a recent thorough analysis of the method.

Unless explicitly stated, the CF-ADI complex parameters were determined with search parameters (ℓ, k, m) = (10, 40, 20), where ℓ is the number of ADI parameters (this number may need to be adjusted to accommodate complex conjugates), k is the number of Arnoldi iterations, and m is the number of inverted-Arnoldi iterations in the procedure lp_para; these are the parameters suggested in [28] for similar problems. We stress that the cost of generating the ADI parameters was not taken into account in the performance evaluation. Indeed, unless some tuning on (ℓ, k, m) is needed (cf. for instance Example 5.2), this pre-processing only employs a low percentage of the whole cost of the algorithm. Our numerical experience showed that the computation of the parameters was also sensitive to the choice of the starting vector used for generating spectral information: ADI parameters determined with the same values (ℓ, k, m) but with a different starting vector for the two Krylov subspaces were significantly different, affecting the performance of the ADI iteration.

Stopping criterion. As already mentioned, a comparison between the new method and the cyclic CF-ADI method faces the problem of the different stopping criteria used in the two cases, namely (3.6) and the one discussed in section 4. Indeed, the fact that the two stopping criteria are satisfied in the two cases does not ensure that the quality of the approximate solutions is comparable. For this reason we decided to take the "user point of view", that is, we measure what the user can really handle during the run of the algorithm. To this end, we fix a certain tolerance, and evaluate the performance of the two methods by monitoring the associated "standard" stopping criterion. More precisely,

K-PIK:   ‖A X_m + X_m A^T + B B^T‖ / ( 2‖A‖_F ‖Y_m‖_F + ‖B‖_F^2 ) ≤ tol,   (5.1)

Cyclic CF-ADI:   ‖z_j‖ / ‖Z_{j-1}‖_F ≤ √tol.   (5.2)

We recall that (5.2) corresponds to ‖X_j − X_{j-1}‖_F / ‖X_{j-1}‖_F ≤ tol. In all examples considered, we used tol = 10^{-10}. In some cases, the final residual norm (4.1) of the CF-ADI iteration was also computed, but this cost was not included in the total computational requirements of the method, because it is in most cases overwhelming. We found the CF-ADI residual norm often to be above the requested residual tolerance.

5.1. The case s = 1. In this section we focus on the case where B is a vector, that is, s = 1, to emphasize some implementation issues.

Example 5.1. This example is taken from [28, Example 6.3] and describes a model of heat flow with convection in the given domain. The associated parabolic equation is given by

x′ = x_{xx} + x_{yy} − 10 x x_x − 1000 y x_y + b(x, y) u(t)

on the unit square, with Dirichlet boundary conditions. The 4900 × 4900 matrix A resulting from the centered finite difference discretization of the differential terms is nonsymmetric with complex eigenvalues, and has 24220 nonzero entries. The vector b was taken to be the vector of all ones; this choice allows one to completely replicate the reported experiments. The results in the following table show the performance of the new method, compared to the cyclic CF-ADI iteration. For the sake of completeness, in this example we also report the performance of the CF-ADI method when all matrix factorizations are carried out at the beginning of the computation and then stored, so that at each ADI iteration only triangular solves are performed (column CF-ADI_m).

                      K-PIK   CF-ADI   CF-ADI_m
CPU time (s)            1.1       14          9
# iterations             19       80         80
dim. Approx. Space       38       80         80
Solution rank            35       80         80

The reported values show the competitiveness of the new method on this medium size problem, with low CPU time and memory requirements. A posteriori, we computed the residual norm of the CF-ADI solution and found that it was above 10^{-5}. Following (3.5), we also computed the relative norm of the variation in the solution of the new method, which for this example was 5.7 · 10^{-8}. All these numbers show that the quality of the approximate solutions significantly differs, depending on the employed measure. In this particular case, we observed in a separate experiment that letting the K-PIK solution difference (3.5) go below the threshold 10^{-10} only took less than one additional second of CPU time.

Example 5.2. This example is a three-dimensional variant of the previous problem, and the differential equation is given by

x′ = x_{xx} + x_{yy} + x_{zz} − 10 x x_x − 1000 y x_y − 10 x_z + b(x, y) u(t)

on the unit cube, with Dirichlet boundary conditions. The nonsymmetric matrix resulting from centered finite difference discretization with 18 nodes in each direction is of size 5832 × 5832 and has 38880 nonzero entries. Once again, b was taken to be the vector of all ones. The results are shown in the first part of the following table.

n = 5832              K-PIK   CF-ADI(10,40,20)   CF-ADI(16,60,40)
CPU time (s)             15                149                112
# iterations             56                110                 85
dim. Approx. Space      112                110                 85
Solution rank            47                110                 85

n = 10648             K-PIK   CF-ADI(16,60,40)   CF-ADI(30,60,60)
CPU time (s)             43                336                236
# iterations             45                 85                 65
dim. Approx. Space       90                 85                 65
Solution rank            45                 85                 65

The reported numbers are consistent with those of the previous experiments. It is worth noticing that at completion, for the new method the difference ‖X_m − X_{m-1}‖/‖X_{m-1}‖ was equal to about 2 · 10^{-8} in both problems (cf. (3.5)).

The behavior for different CF-ADI parameters is also reported. The computed parameters for the two choices are depicted in Figure 5.1, where the symbol '×' denotes the eigenvalues of A, 'o' the ADI parameters for (ℓ, k, m) = (10, 40, 20), and '*' those for (ℓ, k, m) = (16, 60, 40). The last choice is clearly more successful in capturing the relevant spectral boundary, suggesting better performance.

[Figure 5.1. Example 5.2. Spectrum of matrix A ('×'); ADI parameters for (ℓ, k, m) = (10, 40, 20) ('o') and for (ℓ, k, m) = (16, 60, 40) ('*'). Axes: real vs. imaginary part of the eigenvalues.]

In the table we also report the results for a run of the same problem with a slightly finer discretization, leading to a 10648 × 10648 nonsymmetric matrix A with 71632 nonzero entries. The results are consistent with those for the coarser mesh, confirming the difficulty of the CF-ADI pre-processing in finding effective ADI parameters even for different discretizations of the same problem. For ℓ = 30, the CF-ADI stopping criterion was monitored at every iteration.

Example 5.3. With this example we start our tests on symmetric problems. Here we consider the parabolic equation

x′ = x_{xx} + x_{yy} + x_{zz} + b(x, y) u(t)


with homogeneous boundary conditions, and discretize the 3D Laplace operator with centered finite differences in the unit cube. The resulting symmetric matrix for the given discretization mesh has size n = 27000. The vector b is the vector of all ones. Note that both methods can take great advantage of the symmetry of A. In particular, the CF-ADI is all performed in real arithmetic, with ℓ = 10 real parameters. The results are reported below.

                      K-PIK   CF-ADI
CPU time (s)            132      202
# iterations              8       20
dim. Approx. Space       16       20
Solution rank            14       20

The numbers confirm the competitiveness of the new method, which does not require any pre-processing to gain spectral information.

Example 5.4. We next consider a benchmark problem stemming from a semi-discretized Heat Transfer Problem for Optimal Cooling of Steel Profiles [7]. The associated linear time-invariant system is given by

M x′(t) = N x(t) + B_0 u(t),   t > 0,
x(0) = x_0,
y(t) = C_0 x(t),

with M symmetric positive definite, N symmetric negative definite, and B_0 of small column rank. We considered two tests available in the data set in [8], corresponding to different mesh resolutions, with dimensions n = 1357 (file rail_1357_c60) and n = 5177 (rail_5177_c60). In our experiments, we used the first column of B_0 for each data set.

The numerical treatment of this problem requires the discussion of some additional details. For M nonsingular, the problem can be recast as in (1.1). Using M = LL^T, we have A = L^{-1} N L^{-T}, B = L^{-1} B_0, and C = C_0 L^{-T}. Clearly, none of these matrices is computed explicitly; rather, the following operations are performed:

A v = L^{-1}(N(L^{-T} v)),   A^{-1} v = L^T(N^{-1}(L v)).

Thus, matrix-vector products with A require the sequential solution of two triangular systems. To limit the computational cost, the stopping criterion (5.1) was used with ‖A‖_F ≈ ‖N‖_F. In the case of the ADI iteration, the system solution is given as (A + pI)^{-1} v = L^T (N + pM)^{-1} L v, therefore requiring the solution of ℓ = 10 systems with coefficient matrix N + pM at each ADI cycle.
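A minimal sketch of these implicit operations (assuming M is symmetric positive definite, with lower triangular Cholesky factor L, M = LL^T):

% Implicit application of A and A^{-1} for the generalized problem.
L = chol(M)';                        % lower triangular factor, M = L*L'
Av    = @(v) L \ (N * (L' \ v));     % A*v     = L^{-1} N L^{-T} v
Ainvv = @(v) L' * (N \ (L * v));     % A^{-1}v = L^{T} N^{-1} L v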

The results for both problems are summarized in the following table.

n = 1357                 K-PIK            CF-ADI(10,40,20)   CF-ADI(20,40,40)
CPU time (s)              5.67                        4.36               6.96
# iterations                59                          90                140
dim. Approx. Space         118                          90                140
Solution rank               12                          90                140
Final Residual F-norm    < 10^{-10} (†)          2.8 · 10^{-5}      2.8 · 10^{-5}

n = 5177                 K-PIK            CF-ADI(10,40,20)   CF-ADI(20,40,40)
CPU time (s)                50                          62                 32
# iterations                86                         220                120
dim. Approx. Space         172                         220                120
Solution rank               13                         220                120
Final Residual F-norm    < 10^{-10} (†)          9.9 · 10^{-5}      9.9 · 10^{-5}

†: before rank truncation

We observe that the first problem is quite small, and the K-PIK method is penalized, since its auxiliary costs (reorthogonalizations and the solution of a small Lyapunov equation) may be significant.

In this set of experiments, ‖B‖ ≈ 10^{-7} and ‖A‖_F ≈ 10^{-3}, so that ‖X‖ ≈ 10^{-3}. These figures make the two stopping criteria behave very differently, and seem to show that the CF-ADI method performs better than Krylov when the parameters are properly chosen (note, however, that the performance of the parameters is dimension dependent). We investigate this example further by reporting the convergence curves in Figure 5.2. There, we show the behavior of the relative difference ‖X_m − X_{m-1}‖/‖X_{m-1}‖ for both methods, and also the residual norm for the Krylov approach. Clearly, had the K-PIK convergence been measured in terms of the solution difference, the iteration would have been stopped much earlier, with significant savings in both memory and CPU time requirements. In the table we also report the final relative residual norm (cf. (5.1)) of the CF-ADI iteration. According to this measure, the accuracy of the CF-ADI solution is not satisfactory for a tolerance tol = 10^{-10}.

[Figure 5.2. Example 5.4. Convergence history of the two methods (residual and solution difference for K-PIK; solution difference for ADI(10,40,20) and ADI(20,40,40)), versus CPU time. Left: n = 1357. Right: n = 5177.]

5.2. The case s ≥ 1. The algorithm for the general case s ≥ 1 is reported in the appendix. In Table 5.1 we display the performance of the K-PIK and CF-ADI methods for s = 1, 2, 4, 7 on the largest matrices of two of the test problems. To make the comparison easier to interpret, and in light of the discussion related to Example 5.4, the relative change in the iterative solution is used as stopping criterion for both methods. The original data in Example 5.4 included up to 7 columns of B, and these are all considered in the experiment. For Example 5.2 we picked the first s ≤ 7 columns of a matrix B ∈ R^{n×10} with random normal entries (random number generator initialized as randn('state',0)).

Table 5.1. Experiments when B has multiple columns (s ≥ 1).

                         s          K-PIK                            CF-ADI
                              CPU time  # its  dim. space   CPU time  # its  dim. space
Example 5.4              1        5.95     12          24      31.66      6         120
(tol = 10^{-5})          2        8.08     10          40      30.83      5         200
                         4       11.11      7          56      40.20      5         400
                         7       18.12      6          84      54.22      5         700
Example 5.2              1       38.95     34          68     588.68      5         150
(tol = 10^{-8})          2       50.50     33         132     633.41      5         300
                         4       90.69     33         264     722.92      5         600
                         7      204.91     32         448     857.57      5        1050

The K-PIK method clearly outperforms the CF-ADI procedure, keeping both CPU time and memory allocation limited. We recall that for the CF-ADI method, memory requirements may be reduced at some additional cost per iteration [18]. As s increases, the dimension of the "small" Lyapunov equation to be solved in K-PIK may become significant. To save computational time, the numerical solution of this equation may thus be performed periodically. Note that the number of iterations decreases as s increases; this is typical of "block" Krylov subspace methods.

5.3. The solution of the linear system with A. The K-PIK method requires solving s systems with A at each iteration, and this was performed by a direct method in the previous examples. For the sake of generality, in our experiments we have used natural orderings for the entries of A; however, it is well known that reordering techniques can dramatically improve the performance and memory requirements of a sparse direct solver. In practice, an appropriate reordering technique may allow the LU factorization to generate much sparser factors, which in turn implies faster solution with the triangular factors at each iteration. A typical such case is depicted in the following table, where K-PIK is used for the largest matrices in Examples 5.2 and 5.3, with s = 1 and with or without reordering of the matrix entries before the computation starts. The stopping tolerance for the Lyapunov solver is fixed at 10^{-10}, and the stopping criterion is (5.1). The functions symamd and symrcm are the Matlab implementations of the "symmetric minimum degree" and "reverse Cuthill-McKee" permutations, respectively. The former is particularly well suited for symmetric problems.

Example 5.3 (sym. pb)      Natural ordering   132 secs    (8 iterations)
                           symamd ordering     31 secs    (8 iterations)
Example 5.2 (nonsym. pb)   Natural ordering    43 secs   (45 iterations)
                           symrcm ordering     22 secs   (45 iterations)
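A sketch of the reordering step (hypothetical usage; the permutation must also be applied to the right-hand sides):

% Reordering before the sparse LU factorization.
q = symamd(A);              % symmetric problems; symrcm(A) otherwise
[LA, UA] = lu(A(q, q));     % typically much sparser triangular factors
% a solve A*w = v then reads:  w(q) = UA \ (LA \ v(q));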

If direct system solvers become prohibitively expensive, the K-PIK method can be implemented so that preconditioned iterative methods are used to solve the linear systems with A. In Table 5.2 we report the CPU time of K-PIK with s = 1, when the systems with A are solved by the bcgstab method with incomplete LU-type preconditioning (Matlab functions luinc and cholinc). The threshold used in the incomplete factorization allowed us to decrease the memory requirements for the factor by at least one order of magnitude. The stopping tolerance for the Lyapunov solver is again 10^{-10}, whereas various tolerances are tested for the iterative solution with A. All these results significantly improve the performance of the corresponding direct method.

Table 5.2. K-PIK on two test problems, with s = 1 and iterative solution of systems with A. Cholesky or LU preconditioners with fill-in threshold (Matlab functions cholinc and luinc).

                 inner       CPU    # its   avg. #
                 tolerance   time           inner its   comments
Example 5.3      10^{-8}     26.19      8        17     fill-in threshold 10^{-2}
(sym. pb)        10^{-5}     19.39      9        10     fill-in threshold 10^{-2}
                 10^{-3}     14.64     10         6     fill-in threshold 10^{-2}
                 10^{-3}     12.19     10         6     symamd permut., thr. 10^{-2}
Example 5.2      10^{-8}     22.92     45       4.5     fill-in threshold 10^{-2}
(nonsym. pb)     10^{-5}     17.75     45         3     fill-in threshold 10^{-2}
                 10^{-3}     15.06     45         2     fill-in threshold 10^{-2}
                 10^{-3}     10.81     45         2     symrcm permut., thr. 10^{-2}
                 10^{-3}     17.87     45         6     symrcm permut., thr. 10^{-1}

The first lines in Table 5.2 display results with a symmetric matrix. Due to the iterative solution of the system with A (so-called "inexact" solves), this symmetry is lost. Nonetheless, the computation of the Lyapunov solver is still carried out by means of a short-term recurrence, as if the operator were symmetric. This fact should explain the slight change in the total number of iterations as the inner system is solved less accurately, although the significant improvement in computational time more than compensates for the higher number of iterations. We refer to [35] for pointers to similar so-called inner-outer methods in the linear system setting. The second set of experiments is with a nonsymmetric solver, for which full orthogonalization is performed. Note that the method does not seem to be particularly sensitive to a quite inaccurate solution of systems with A, showing that a rough approximation of the action of A^{-1} is sufficient to maintain fast convergence.
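A sketch of such an inexact inner solve (using the current ilu/bicgstab interfaces in place of the luinc call mentioned above; tolerances are illustrative):

% Preconditioned iterative inner solve with A (inexact K-PIK variant).
opts.type = 'ilutp';  opts.droptol = 1e-2;        % fill-in threshold
[Li, Ui] = ilu(A, opts);
solveA = @(v) bicgstab(A, v, 1e-5, 200, Li, Ui);  % replaces UA\(LA\v)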

6. Conclusions. In this paper we have presented a new projection method for determining low-rank approximate solutions to large-scale Lyapunov algebraic equations with dissipative coefficient matrix. Our experiments show that the new algorithm is extremely competitive with the state-of-the-art solver cyclic CF-ADI, and it does not require parameter estimation. The new approach provides a natural, cheaply computable a-posteriori stopping criterion; such a measure is missing in the cyclic CF-ADI iteration. Moreover, the new method provides an economical procedure to further decrease the rank of the final approximate factor. The approach effectively reduces the original problem to one of much smaller dimension, in such a way that information on the problem accumulates as the dimension of the approximation space increases. This viewpoint is completely different from that of the CF-ADI iteration, where spectral information on the problem is gathered beforehand, and then used repeatedly. This difference is reminiscent of that between semi-iterative and fully nonstationary methods in the solution of systems of linear equations [14]. We have tested the new approach on medium size problems, while even greater benefits can be obtained when solving larger problems; in this case, the advantages observed in section 5.3 in using preconditioned iterative system solvers may be even more significant. We also mention that if the dimension of the approximation space becomes large, then the costly step of solving the projected Lyapunov equation can be performed periodically, rather than at every iteration. This small variant may significantly reduce the overall cost of the method.

The general framework described in section 2 suggests that further acceleration techniques could be derived by constructing new approximation spaces that satisfy the given constraints. Moreover, several computational issues deserve deeper investigation, such as the possibility of implementing truncation, restarting, or some other memory-conserving strategy. Finally, a complete round-off error analysis of the procedure would help fully understand the behavior of the method in finite precision arithmetic.

Acknowledgments. We thank Peter Benner and Jens Saak for insightful conversations and for pointing to [8]. We thank the referees for their comments. All reported results were obtained using the computer facilities of the SINCEM Laboratory at CIRSA, Ravenna.

Appendix. In this appendix we describe the Matlab implementation of the K-PIK method when B is a tall n × s matrix. A direct method is used to solve systems with A.
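We note that the convergence test in the listing (the quantity er2) only involves projected, small-dimensional quantities: with Y the solution of the projected equation, the code computes

   er2 = norm(cc*Y(js-s+1:js,:),'fro')/(nrmb + 2*nrma*norm(Y,'fro')),

where nrmb = ||B||_F^2 and nrma = ||A||_F, so that no n-dimensional quantity needs to be assembled to monitor convergence.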

function X=K_PIK(A,rhs,m,tol,k_max,period)
% K-PIK: low-rank approximate solution of A X + X A' + rhs*rhs' = 0.
% The output X is the low-rank factor: the approximate solution is X*X'.
%   m:      maximum number of iterations
%   tol:    stopping tolerance on the scaled residual norm
%   k_max:  orthogonalization window (use k_max >= m for full orthogonalization)
%   period: the projected problem is solved every 'period' iterations
t=cputime;
[n,sh]=size(rhs);
nrmb=norm(rhs,'fro')^2; nrma=norm(A,'fro');
Y=[]; odds=[];
[LA,UA]=lu(A);            % for symmetric A use: UA = chol(-A); LA = -UA';
s=2*sh;
rhs1=UA\(LA\rhs);
% first basis block: orthonormalize [B, A^{-1}B]
[ibeta,U(:,1:s)]=gram_sh([rhs, rhs1]);
VV = U;
H=sparse((m+1)*s,m*s);
L=sparse((m+1)*s,m*s);
beta=inv(ibeta); beta = beta(1:sh,1:sh); beta2=beta*beta';
for j=1:m,
  jms=(j-1)*s+1; j1s=(j+1)*s; js=j*s; js1=js+1; jsh=(j-1)*s+sh;
  % expand the space: A times the "odd" block, A^{-1} times the "even" block
  Up(:,1:sh)   = A*U(:,jms:jsh);
  Up(:,sh+1:s) = UA\(LA\U(:,jsh+1:js));
  % orthogonalize the new basis block (modified Gram-Schmidt, two sweeps)
  for l=1:2
    k_min=max(1,j-k_max);
    for kk=k_min:j
      k1=(kk-1)*s+1; k2=kk*s;
      coef = U(1:n,k1:k2)'*Up;
      H(k1:k2,jms:js) = H(k1:k2,jms:js) + coef;
      Up = Up - U(:,k1:k2)*coef;
    end
  end
  if (j<=m)
    [hinv,U(1:n,js1:j1s)]=gram_sh(Up);
    H(js1:j1s,jms:js)=inv(hinv);
  end
  I=speye(js+s);
  % update the columns of the projected matrix T = VV'*A*VV
  if (j==1),
    L(1:j*s+sh,(j-1)*sh+1:j*sh) = ...
      [H(1:s+sh,1:sh)/ibeta(1:sh,1:sh), speye(s+sh,sh)/ibeta(1:sh,1:sh)]*ibeta(1:s,sh+1:s);
  else
    L(1:j*s+s,(j-1)*sh+1:j*sh) = L(1:j*s+s,(j-1)*sh+1:j*sh) ...
      + H(1:j*s+s,jms:jms-1+sh)*rho;
  end
  odds = [odds, jms:(jms-1+sh)];           % store the odd block columns
  evens = 1:js; evens(odds)=[];
  T(1:js+s,odds)   = H(1:js+s,odds);       % odd columns
  T(1:js+sh,evens) = L(1:js+sh,1:j*sh);    % even columns
  L(1:j*s+s,j*sh+1:(j+1)*sh) = ...
    (I(1:j*s+s,(js-sh+1):js) - T(1:js+s,1:js)*H(1:js,js-sh+1:js))*hinv(sh+1:s,sh+1:s);
  rho = hinv(1:sh,1:sh)\hinv(1:sh,sh+1:s);
  I = speye(j*s);
  VV=U(:,1:j*s);
  k=j;
  if (rem(j,period)==0)
    % solve the projected Lyapunov equation and test the scaled residual
    Y0=Y;
    Y = lyap2solve(full(T(1:js,1:js)),speye(k*s,sh)*beta2*speye(k*s,sh)');
    cc = [H(js1:j1s,js-s+1:js-sh), L(js1:j1s,(j-1)*sh+1:j*sh)];
    nrmx = norm(Y,'fro');
    er2(k) = norm(cc*Y(js-s+1:js,:),'fro')/(nrmb+2*nrma*nrmx);  % scaled residual norm
    if (size(Y0,1)==js-s)   % change in the projected solution: ||Y-[Y0 0;0 0]||/||Y||
      er2_dif(k) = ...
        norm([Y(1:js-s,1:js-s)-Y0, Y(1:js-s,js-s+1:js); Y(js-s+1:js,:)],'fro')/nrmx;
    else
      er2_dif(k) = NaN;     % only defined when period = 1
    end
    tot_time(j)= cputime-t;
    disp([k,er2(k),er2_dif(k)])
    if (er2(k)<tol), break, end
  end
end
% rank truncation of Y and assembly of the low-rank factor X
[uY,sY]=eig(Y); [sY,id]=sort(diag(sY));
sY=flipud(sY); uY=uY(:,id(end:-1:1));
trunc_tol=1e-12;
is=sum(abs(sY)>trunc_tol);
Y0 = uY(:,1:is)*diag(sqrt(sY(1:is)));
X = U(:,1:js)*Y0;
er2(j+1) = er2(j);
tot_time(j+1)= cputime-t;
total_time=cputime-t;
return
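The listing invokes two auxiliary routines that are not reproduced in this paper: gram_sh, which orthonormalizes a block of vectors, and lyap2solve, which solves the small projected Lyapunov equation. The following sketches are plausible stand-ins consistent with their use above, not the original routines:

   function [ibeta,Q]=gram_sh(W)
   % Orthonormalize the columns of W: W = Q*R with Q'*Q = I; return ibeta = inv(R).
   % (An economy-size QR factorization stands in for the block Gram-Schmidt
   % procedure of the original code; only Q and inv(R) are used by K_PIK.)
   [Q,R]=qr(W,0);
   ibeta=inv(R);

   function Y=lyap2solve(T,C)
   % Solve the small dense Lyapunov equation T*Y + Y*T' + C = 0,
   % here simply via the Control System Toolbox function lyap.
   Y=lyap(T,C);

A call of the main routine might then look as follows; matrix and parameter values are illustrative only, and the returned factor X is such that X*X' approximates the Lyapunov solution.

   n = 400; e = ones(n,1);
   A = spdiags([e, -4*e, e], -1:1, n, n);  % sparse, symmetric negative definite
   B = randn(n,1);                         % right-hand side factor, s = 1
   X = K_PIK(A, B, 100, 1e-8, 100, 1);     % k_max = m gives full orthogonalization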

REFERENCES

[1] A. C. Antoulas. Approximation of Large-Scale Dynamical Systems. Advances in Design and Control. SIAM, Philadelphia, 2005.

[2] R. H. Bartels and G. W. Stewart. Algorithm 432: Solution of the Matrix Equation AX+XB=C. Comm. of the ACM, 15(9):820–826, 1972.

[3] U. Baur and P. Benner. Factorized solution of Lyapunov equations based on hierarchical matrix arithmetic. Computing, 78(3):211–234, 2006.

[4] P. Benner. Control Theory. Technical report, http://www.tu-chemnitz.de/ebenner, 2006. To appear in 'Handbook of Linear Algebra'.

[5] P. Benner, D. Kressner, and V. Sima. The SLICOT toolboxes - a survey. Technical report, December 2006.

[6] P. Benner, E. S. Quintana-Ortí, and G. Quintana-Ortí. Solving stable Sylvester equations via rational iterative schemes. Technical Report 04-08, Technische Universität, Chemnitz, D, 2004. To appear in J. Scientific Computing.

[7] P. Benner and J. Saak. Efficient numerical solution of the LQR-problem for the heat equation. PAMM, Proc. Appl. Math. Mech., 4(1):648–649, 2004.

[8] Oberwolfach model reduction benchmark collection, 2003. http://www.imtek.de/simulation/benchmark.

[9] M. J. Corless and A. E. Frazho. Linear systems and control - An operator perspective. Pure and Applied Mathematics. Marcel Dekker, New York - Basel, 2003.

[10] V. Druskin and L. Knizhnerman. Two polynomial methods of calculating functions of symmetric matrices. U.S.S.R. Comput. Math. Math. Phys., 29:112–121, 1989.

[11] V. Druskin and L. Knizhnerman. Extended Krylov subspaces: approximation of the matrix square root and related functions. SIAM J. Matrix Anal. Appl., 19(3):755–771, 1998.

[12] N. S. Ellner and E. L. Wachspress. New ADI model problem applications. In Proceedings of the 1986 ACM Fall joint computer conference, Dallas, Texas, United States, pages 528–534. IEEE Computer Society Press, Los Alamitos, CA, 1986.

[13] N. S. Ellner and E. L. Wachspress. Alternating Direction Implicit iteration for systems with complex spectra. SIAM J. Numer. Anal., 28(3):859–870, 1991.

[14] G. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.

[15] G. H. Golub and R. Underwood. The block Lanczos method for computing eigenvalues. In J. R. Rice, editor, Mathematical Software III, pages 364–377. Academic Press, New York, 1977.

[16] L. Grasedyck. Existence of a low rank or H-matrix approximant to the solution of a Sylvester equation. Numerical Linear Algebra with Appl., 11(4):371–389, 2004.

[17] T. Gudmundsson and A. J. Laub. Approximate solution of large sparse Lyapunov equations. IEEE Trans. Automatic Control, 39(5):1110–1114, 1994.

[18] S. Gugercin, D. C. Sorensen, and A. C. Antoulas. A modified low-rank Smith method for large-scale Lyapunov equations. Numerical Algorithms, 32:27–55, 2003.

[19] M. H. Gutknecht. Block Krylov space methods for linear systems with multiple right-hand sides: an introduction. In Abul Hasan Siddiqi, Iain S. Duff, and Ole Christensen, editors, Modern Mathematical Models, Methods and Algorithms for Real World Systems, New Delhi, 2006. Anamaya Publishers. To appear.

[20] S. J. Hammarling. Numerical solution of the stable, non-negative definite Lyapunov equation. IMA J. Numer. Anal., 2:303–323, 1982.

[21] M. Hochbruck and C. Lubich. On Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal., 34(5):1911–1925, 1997.

[22] M. Hochbruck and G. Starke. Preconditioned Krylov subspace methods for Lyapunov matrix equations. SIAM J. Matrix Anal. Appl., 16(1):156–171, 1995.

[23] I. M. Jaimoukha and E. M. Kasenally. Krylov subspace methods for solving large Lyapunov equations. SIAM J. Numer. Anal., 31(1):227–251, Feb. 1994.

[24] K. Jbilou and A. J. Riquet. Projection methods for large Lyapunov matrix equations. Linear Algebra and its Applications, 415(2-3):344–358, 2006.

[25] J.-R. Li and J. White. Low-rank solutions of Lyapunov equations. SIAM J. Matrix Anal. Appl., 24(1):260–280, 2002.

[26] A. Lu and E. L. Wachspress. Solution of Lyapunov equations by Alternating Direction Implicit iteration. Computers Math. Applic., 21(9):43–58, 1991.

[27] The MathWorks, Inc. MATLAB 7, September 2004.

[28] T. Penzl. A cyclic low-rank Smith method for large sparse Lyapunov equations. SIAM J. Sci. Comput., 21(4):1401–1418, 2000.

[29] T. Penzl. Eigenvalue decay bounds for solutions of Lyapunov equations: the symmetric case. Systems and Control Letters, 40(2):139–144, 2000.

[30] T. Penzl. Lyapack users guide. Technical Report SFB393/00-33, TU Chemnitz, 09107 Chemnitz, D, 2000. Available from http://www.tu-chemnitz.de/sfb393/sfb00pr.html.

[31] M. Robbe and M. Sadkane. A convergence analysis of GMRES and FOM for Sylvester equations. Numerical Algorithms, 30:71–89, 2002.

[32] Y. Saad. Numerical solution of large Lyapunov equations. In M. A. Kaashoek, J. H. van Schuppen, and A. C. Ran, editors, Signal Processing, Scattering, Operator Theory, and Numerical Methods. Proceedings of the international symposium MTNS-89, vol III, pages 503–511, Boston, 1990. Birkhauser.

[33] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Society for Industrial and Applied Mathematics, 2nd edition, 2003.

[34] J. Sabino. Solution of Large-Scale Lyapunov Equations via the Block Modified Smith Method. PhD Thesis, Rice University, June 2006.

[35] V. Simoncini and D. B. Szyld. Recent computational developments in Krylov subspace methods for linear systems. Numerical Linear Algebra with Appl., 2006. To appear.

[36] The Control and Systems Library SLICOT. http://www.slicot.de/, 2006.

[37] D. Sorensen and Y. Zhou. Bounds on eigenvalue decay rates and sensitivity of solutions to Lyapunov equations. Technical Report 02-07, Rice University, Houston, Texas, 2002.

[38] D. C. Sorensen and Y. Zhou. Direct methods for matrix Sylvester and Lyapunov equations. J. Appl. Math., 2003:277–303, 2003.

[39] G. Starke. Fields of values and the ADI method for non-normal matrices. Lin. Alg. Appl., 180:199–218, 1993.

[40] H. Tal-Ezer. Spectral methods in time for parabolic problems. SIAM J. Numer. Anal., 26:1–11, 1989.

[41] A. van den Boom, A. Brown, F. Dumortier, A. Geurts, S. Hammarling, R. Kool, M. Vanbegin, P. Van Dooren, and S. Van Huffel. SLICOT, a subroutine library for control and system theory. In Preprints IFAC Symp. on CADCS, pages 89–94, Swansea, UK, July 1991.

[42] E. L. Wachspress. Iterative solution of the Lyapunov matrix equation. Appl. Math. Lett., 1(1):87–90, 1988.