
Top (2012) 20:791–809
DOI 10.1007/s11750-010-0161-9

ORIGINAL PAPER

An investigation of feasible descent algorithms for estimating the condition number of a matrix

Carmo P. Brás · William W. Hager · Joaquim J. Júdice

Received: 23 October 2009 / Accepted: 12 October 2010 / Published online: 4 November 2010
© Sociedad de Estadística e Investigación Operativa 2010

Abstract Techniques for estimating the condition number of a nonsingular matrix are developed. It is shown that Hager's 1-norm condition number estimator is equivalent to the conditional gradient algorithm applied to the problem of maximizing the 1-norm of a matrix-vector product over the unit sphere in the 1-norm. By changing the constraint in this optimization problem from the unit sphere to the unit simplex, a new formulation is obtained which is the basis for both conditional gradient and projected gradient algorithms. In the test problems, the spectral projected gradient algorithm yields condition number estimates at least as good as those obtained by the previous approach. Moreover, in some cases, the spectral projected gradient algorithm, with a careful choice of the parameters, yields improved condition number estimates.

Keywords Condition number · Numerical linear algebra · Nonlinear programming · Gradient algorithms

Mathematics Subject Classification (2000) 15A12 · 49M37 · 90C30

Research of W.W. Hager is partly supported by National Science Foundation Grant 0620286.

C.P. Brás (✉)
CMA, Departamento de Matemática, Faculdade de Ciências e Tecnologia, FCT, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
e-mail: [email protected]

W.W. Hager
Department of Mathematics, University of Florida, Gainesville, FL 32611-8105, USA
e-mail: [email protected]

J.J. Júdice
Departamento de Matemática, Universidade de Coimbra and Instituto de Telecomunicações, 3001-454 Coimbra, Portugal
e-mail: [email protected]


1 Introduction

The role of the condition number is well established in a variety of numerical analysis problems (see, for example, Demmel 1997; Golub and Loan 1996; Hager 1998). In fact, the condition number is useful for assessing the complexity of a linear system and the error in its solution, since it provides information about the sensitivity of the solution to perturbations in the data. For large matrices, the calculation of the exact value is impracticable, as it involves a large number of operations. Therefore, techniques have been developed to approximate ‖A−1‖1 without computing the inverse of the matrix A. Hager's 1-norm condition number estimator (Hager 1984) is implemented in the MATLAB built-in function CONDEST. This estimator is a gradient method for computing a stationary point of the problem

Maximize ‖Ax‖1
subject to ‖x‖1 = 1.    (1)

The algorithm starts from x = (1/n, 1/n, . . . , 1/n), evaluates a subgradient, and then moves to a vertex ±ej, j = 1, 2, . . . , n, which results in the largest increase in the objective function based on the subgradient. The iteration is repeated until a stationary point is reached, often within two or three iterations. We show that this algorithm is essentially the conditional gradient (CG) algorithm.

Since the solution to (1) is achieved at a vertex of the feasible set, an equivalent formulation is

Maximize ‖Ax‖1
subject to eT x = 1,
           x ≥ 0,    (2)

where e is the vector with every element equal to 1. We propose conditional gradient and projected gradient algorithms for computing a stationary point of (2).

The paper is organized as follows. In Sect. 2, Hager's algorithm is described and its equivalence with a CG algorithm is shown. The simplex formulation of the condition number problem and a CG algorithm for solving this nonlinear program are introduced in Sect. 3. The SPG algorithm is discussed in Sect. 4. Computational experience is reported in Sect. 5 and some conclusions are drawn in the last section.

2 Hager’s algorithm for condition number estimation

Definition 1 The 1-norm of a square matrix A ∈ Rn×n is defined by

‖A‖1 = max{ ‖Ax‖1 / ‖x‖1 : x ∈ Rn \ {0} }    (3)

where ‖x‖1 = ∑j |xj|.


In practice the computation of the 1-norm of a matrix A is not difficult since

‖A‖1 = max1≤j≤n ∑i |aij|.

In this paper we are interested in the computation of the 1-norm of the inverse of a nonsingular matrix A. This norm is important for computing the condition number of a matrix A, a concept that is quite relevant in numerical linear algebra (Gill et al. 1991; Higham 1996). We recall the following definition.

Definition 2 The condition number for the 1-norm of a nonsingular matrix A is given by

cond1(A) = ‖A‖1 ‖A−1‖1

where A−1 is the inverse of matrix A.

In practice, computing the inverse of a matrix involves too much effort, particularly when the order is large. Instead, the condition number of a nonsingular matrix is estimated by

cond1(A) = ‖A‖1 β

where β is an estimate of ‖A−1‖1. To find a β that is close to the true value of ‖A−1‖1, the following optimization problem is considered in (Hager 1984):

Maximize F(x) = ‖Ax‖1 = ∑i |∑j aij xj|
subject to x ∈ K    (4)

where

K = {x ∈ Rn : ‖x‖1 ≤ 1}.

The following properties hold (Higham 1988).

(i) F is differentiable at all x ∈ K such that (Ax)i ≠ 0 for all i.
(ii) If

‖∇F(x)‖∞ ≤ ∇F(x)T x,    (5)

then x is a stationary point of F on K.
(iii) F is convex on K, and the global maximum of F is attained at one of the vertices ej of the convex set K.

The optimal solution of the program (4) is not in general unique, and there may exist an optimal solution which is not one of the unit vectors ei. In fact, the following result holds.


Theorem 1 Let J be a subset of {1, . . . , n} such that ‖A‖1 = ‖A.j‖1 for all j ∈ J and A.j ≥ 0 for all j ∈ J. If J is nonempty, then any convex combination x of the canonical vectors ej, j ∈ J, also satisfies

‖A‖1 = ‖Ax‖1.

Proof Let

x = ∑j∈J xj ej

where xj ≥ 0 for all j ∈ J and ∑j∈J xj = 1. Then

(Ax)i = ∑j∈J aij xj, i = 1, 2, . . . , n.

Since A.j ≥ 0 for all j ∈ J, we have

‖Ax‖1 = ∑i |∑j∈J aij xj| = ∑i ∑j∈J aij xj = ∑j∈J xj (∑i aij)
      = ∑j∈J xj |∑i aij| = (∑j∈J xj) ‖A‖1 = ‖A‖1.    □

For instance, let

A = [ 1  4  0
      2  0 −1
      3  2  4 ].

Hence J = {1, 2}. If x = (2/3)e1 + (1/3)e2 = [2/3 1/3 0]T, then ‖Ax‖1 = ‖A‖1 = 6.

Hager's algorithm (Hager 1984) has been designed to compute a stationary point of the optimization problem (4). The algorithm investigates only vectors of the canonical basis, with the exception of the initial point, and it moves through these vectors using search directions based on the gradient of F. If F is not differentiable at one of these vectors, then a subgradient of F can be easily computed (Hager 1984). The inequality (5) provides a stopping criterion for the algorithm. The steps of the algorithm are presented below, where sign(y) denotes the vector with components


(sign(y))i = { 1 if yi ≥ 0,
              −1 if yi < 0. }

HAGER'S ESTIMATOR FOR ‖A‖1

Step 0: Initialization
Let x = (1/n, 1/n, . . . , 1/n).

Step 1: Determination of the Gradient or Subgradient of F, z
Compute y = Ax, ξ = sign(y), z = AT ξ.

Step 2: Stopping Criterion
If ‖z‖∞ ≤ zT x, stop with γ = ‖y‖1 the estimate for ‖A‖1.

Step 3: Search Direction
Let r be such that ‖z‖∞ = |zr| = max1≤i≤n |zi|. Set x = er and return to Step 1.
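For illustration, a minimal NumPy sketch of these steps follows (the function name and the iteration cap are ours; the paper's implementation is in MATLAB):

```python
import numpy as np

def hager_norm1_estimate(A, max_iter=100):
    """Sketch of Hager's estimator for ||A||_1, following Steps 0-3 above."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)                  # Step 0: start at the barycenter
    for _ in range(max_iter):
        y = A @ x                            # Step 1: gradient/subgradient
        xi = np.where(y >= 0.0, 1.0, -1.0)   # sign(y), with sign(0) = 1
        z = A.T @ xi
        if np.abs(z).max() <= z @ x:         # Step 2: stationarity test (5)
            return np.abs(y).sum()           # gamma = ||y||_1
        x = np.zeros(n)                      # Step 3: move to the vertex e_r
        x[np.abs(z).argmax()] = 1.0
    return np.abs(A @ x).sum()
```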

This algorithm can now be used to find an estimate of ‖B−1‖1 by setting A = B−1. In this case, the vectors y and z are computed by

y = B−1 x ⇔ By = x,
z = (B−1)T ξ ⇔ BT z = ξ.

So, in each iteration of Hager's method for estimating ‖B−1‖1, two linear systems with the matrices B and BT have to be solved. If the LU decomposition of B is known, then each iteration of the algorithm corresponds to the solution of four triangular systems. Since the algorithm only guarantees a stationary point of the function F(x) = ‖Ax‖1 on the convex set K, the procedure is applied from different starting points in order to get a better estimate for cond1(B). The whole procedure is discussed in (Higham 1996) and has been implemented in MATLAB (Moler et al. 2001). Furthermore, the algorithm provides an estimate of cond1(B) that is smaller than or equal to the true value of the condition number of the matrix B in the 1-norm. The purpose of this paper is to investigate Hager's algorithm and to show how it might be improved.
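A sketch of this variant (ours, using SciPy's LU routines; lu_solve with trans=1 solves the transposed system) could look as follows:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def hager_invnorm1_estimate(B, max_iter=100):
    """Estimate ||B^{-1}||_1 by applying the estimator with A = B^{-1}.
    B is factored once; each iteration solves By = x and B^T z = xi."""
    n = B.shape[0]
    lu, piv = lu_factor(B)
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        y = lu_solve((lu, piv), x)               # By = x
        xi = np.where(y >= 0.0, 1.0, -1.0)
        z = lu_solve((lu, piv), xi, trans=1)     # B^T z = xi
        if np.abs(z).max() <= z @ x:
            return np.abs(y).sum()
        x = np.zeros(n)
        x[np.abs(z).argmax()] = 1.0
    return np.abs(lu_solve((lu, piv), x)).sum()

# cond_1(B) is then estimated by ||B||_1 * hager_invnorm1_estimate(B), with
# ||B||_1 = np.abs(B).sum(axis=0).max().
```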

In (Bertsekas 2003), the conditional gradient algorithm is discussed as a way of dealing with nonlinear programs with simple constraints. The algorithm is a descent procedure that possesses global convergence toward a stationary point under reasonable hypotheses (Bertsekas 2003). To describe an iteration of the procedure, consider again the nonlinear program (4) and suppose that F is differentiable.


If x ∈ K, then a search direction is computed as

d = y − x

where y ∈ K is the optimal solution of the convex program

OPT: Maximize ∇F(x)T (y − x)
     subject to y ∈ K.

Two cases may occur:

(i) the optimal value is nonpositive, and then x is a stationary point of F on K;
(ii) the optimal value is positive, and then d is an ascent direction of F at x.

In order to compute this optimal solution y, we introduce the following change of variables:

yi = ui − vi,
ui ≥ 0, vi ≥ 0,
ui vi = 0, for all i = 1, 2, . . . , n.    (6)

Then OPT is equivalent to the following Mathematical Program with Linear Complementarity Constraints:

Maximize ∇F(x)T (u − v − x)
subject to eT u + eT v ≤ 1,
           u ≥ 0, v ≥ 0,
           uT v = 0,

where e is a vector of ones of order n. As discussed in (Murty 1976), the complementarity constraint is redundant and OPT is equivalent to the following linear program:

LP: Maximize ∇F(x)T (u − v − x)
    subject to eT u + eT v ≤ 1,
               u ≥ 0, v ≥ 0.

Suppose that

|∇rF(x)| = max1≤i≤n |∇iF(x)|.

An optimal solution (u, v) of LP is given by

(i) ∇rF(x) > 0 ⇒ uj = 1 if j = r, uj = 0 otherwise, and v = 0;
(ii) ∇rF(x) < 0 ⇒ vj = 1 if j = r, vj = 0 otherwise, and u = 0.


It then follows from (6) that

y = sign(∇rF(x)) er.

After the computation of the search direction, a stepsize can be computed by an exact line-search:

Maximize F(x + αd) = g(α)
subject to α ∈ [0, 1].

Since F is convex on Rn, g is convex on [0, 1], so its maximum is attained at an endpoint; as d is an ascent direction, the maximum is achieved at α = 1 (Bazaraa et al. 1993). Hence the next iterate computed by the algorithm is given by

x = x + (y − x) = y = sign(∇rF(x)) er.

Hence, the conditional gradient algorithm is essentially the same as Hager's gradient method for (4). Except for signs, the same iterates are generated, and both algorithms terminate, within n steps, at the same iteration.

3 A simplex formulation

In this section we introduce a different nonlinear program for the estimation of ‖A‖1:

Maximize F(x) = ‖Ax‖1 = ∑i |∑j aij xj|
subject to eT x = 1,
           x ≥ 0.    (7)

Note that the constraint set of this nonlinear program is the ordinary simplex Δ. Since Δ has the extreme points e1, e2, . . . , en, the maximum of the function F is attained at one of them, which gives the 1-norm of the matrix A. As in Theorem 1, this maximum can also be attained at a point that is not a vertex of the simplex.

As before, the difficulty of computing ‖A‖1 comes from the maximization of the convex function F in program (7). If the matrix has nonnegative elements, the problem can be easily solved. In fact, if A ≥ 0 and x ≥ 0, then

F(x) = ∑i |∑j aij xj| = ∑i ∑j aij xj = ∑j (∑i aij) xj.

Hence

F(x) = dT x

with

d = AT e.


Therefore the program (7) reduces to the linear program

Maximize dT x
subject to eT x = 1,
           x ≥ 0,

which has the optimal solution

x = er, with dr = max{di : i = 1, . . . , n}.

Hence

‖A‖1 = dr.

This observation also appears in (Hager 1984).

Like the previous algorithm, the importance of this procedure is due to the possibility of computing a good matrix condition number estimate. Let A be a Minkowski matrix, i.e., A ∈ P and the off-diagonal elements are all nonpositive. Then A−1 ≥ 0 (Cottle et al. 1992) and

‖A−1‖1 = dr = max{di : i = 1, . . . , n},

with d the vector given by

AT d = e.    (8)

It should be noted that this type of matrix appears very often in the solution of second-order ordinary differential equations and elliptic partial differential equations by finite elements and finite differences (Johnson 1990). So the condition number of a Minkowski matrix can be computed by solving the single system of linear equations (8) and setting

cond1(A) = ‖A‖1 dr

where d is the unique solution of (8).

In the general case, let us assume that F is differentiable. Then the following result holds.
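As a small illustration (our own example, not from the paper's test set), the following snippet computes cond1(A) for a tridiagonal Minkowski matrix by solving the single system (8):

```python
import numpy as np

n = 6
# Tridiagonal Minkowski matrix: 4 on the diagonal, -1 on the off-diagonals.
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

d = np.linalg.solve(A.T, np.ones(n))            # system (8): A^T d = e
cond1 = np.abs(A).sum(axis=0).max() * d.max()   # ||A||_1 * ||A^{-1}||_1
print(cond1, np.linalg.cond(A, 1))              # both give the exact value
```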

Theorem 2 If F is differentiable on an open set containing the simplex Δ and if maxi ∇iF(x) ≤ ∇F(x)T x, then x is a stationary point of F on Δ.

Proof For each y ∈ Δ,

∇F(x)T y = ∑i ∇iF(x) yi ≤ max1≤i≤n ∇iF(x) (∑i yi) = max1≤i≤n ∇iF(x).


Hence, by hypothesis,

∇F(x)T (y − x) ≤ 0, ∀y ∈ Δ,

and x is a stationary point of F on Δ.    □

Next we describe the conditional gradient algorithm for the solution of the nonlinear program (7). As before, if x ∈ Δ, then the search direction d is given by

d = y − x

where y ∈ Δ is the optimal solution of the linear program

LP: Maximize ∇F(x)T (y − x)
    subject to eT y = 1,
               y ≥ 0.

Two cases may occur.

(i) The maximum value is nonpositive, and then x is a stationary point of F on Δ.
(ii) The maximum value is positive, and then d is an ascent direction of F at x.

Furthermore, y is given by

yi = { 1 if i = r,
       0 otherwise }

where r is the index satisfying

∇rF(x) = max1≤i≤n {∇iF(x)}.

An exact line-search along this direction leads to α = 1, and the next iterate is

x = x + (y − x) = y = er.

Hence the conditional gradient algorithm for estimating ‖A‖1 can be described as follows.

CONDITIONAL GRADIENT ESTIMATOR FOR ‖A‖1

Step 0: Initialization
Let x = (1/n, 1/n, . . . , 1/n).

Step 1: Determination of the Gradient or Subgradient of F, z
Compute y = Ax, ξ = sign(y), z = AT ξ.

Step 2: Stopping Criterion
If max1≤i≤n zi ≤ zT x, stop the algorithm with γ = ‖y‖1 the estimate for ‖A‖1.

Step 3: Search Direction
Let r be such that zr = max1≤i≤n zi. Set x = er and return to Step 1.
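In code, the only change with respect to our earlier sketch of Hager's estimator is in Steps 2 and 3, where max zi replaces max |zi| (again a sketch of ours, not the paper's MATLAB implementation):

```python
import numpy as np

def cg_norm1_estimate(A, max_iter=100):
    """Sketch of the conditional gradient estimator on the simplex."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        y = A @ x
        xi = np.where(y >= 0.0, 1.0, -1.0)
        z = A.T @ xi
        if z.max() <= z @ x:        # Step 2: stationarity test of Theorem 2
            return np.abs(y).sum()
        x = np.zeros(n)             # Step 3: move to the vertex e_r with
        x[z.argmax()] = 1.0         # z_r = max_i z_i (no absolute values)
    return np.abs(A @ x).sum()
```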

As discussed in (Bertsekas 2003), this algorithm possesses global convergence toward a stationary point of the function F on the simplex. Furthermore, the following result holds:

Theorem 3 If x is not a stationary point, then F(er) > F(x), for r such that

∇rF(x) = max{∇iF(x) : i = 1, . . . , n}.

Proof Since F is convex,

F(er) ≥ F(x) + ∇F(x)T (er − x) = F(x) + [∇rF(x) − ∇F(x)T x]
      = F(x) + [maxi ∇iF(x) − ∇F(x)T x] > F(x),

where the last inequality follows from Theorem 2, because x is not a stationary point.    □

Since the simplex has n vertices, the algorithm should converge to a stationary point of F in a finite number of iterations. Furthermore, the algorithm is strongly polynomial, as the computational effort per iteration is polynomial in the order of the matrix A.

As mentioned before, the estimation of ‖A−1‖1 can be done by setting A = B−1 in this algorithm. If the LU decomposition of B is available then, as in Hager's method, the algorithm requires the solution of four triangular systems in each iteration. Unfortunately, the asymptotic rate of convergence of the conditional gradient method is not very fast when Δ is a polyhedron (Bertsekas 2003), and gradient projection methods often converge faster. In the next section we discuss the use of the so-called Spectral Projected Gradient algorithm (Birgin et al. 2000; Júdice et al. 2008) to find an estimate of ‖A‖1.

4 A spectral projected gradient algorithm

As discussed in (Birgin et al. 2000, 2001), the Spectral Projected Gradient (SPG) algorithm uses in each iteration k a vector xk ∈ Δ and moves along the projected gradient direction defined by

dk = PΔ(xk + ηk ∇F(xk)) − xk    (9)

where PΔ(w) is the projection of w ∈ Rn onto Δ. In each iteration k, xk is updated by xk+1 = xk + δk dk, where 0 < δk ≤ 1 is computed by a line search technique. The algorithm converges to a stationary point of F (Birgin et al. 2000). Next we discuss the important issues for the application of the SPG algorithm to the simplex formulation, following (Júdice et al. 2008).


(i) Computation of the stepsize: As before, exact line search gives δk = 1 in each iteration k.

(ii) Computation of the relaxation parameter ηk:
Let zk = ∇F(xk), let ηmin and ηmax be a small and a huge positive real number, respectively, and, as before, let PX(x) denote the projection of x onto a set X. Then (Júdice et al. 2008):

(I) For k = 0,

η0 = P[ηmin,ηmax]( 1 / ‖PΔ(x0 + z0) − x0‖∞ ).

(II) For any k > 0, let xk be the corresponding iterate, zk the gradient (subgradient) of F at xk, and

sk−1 = xk − xk−1,
yk−1 = zk−1 − zk.

Then

ηk = { P[ηmin,ηmax]( ⟨sk−1, sk−1⟩ / ⟨sk−1, yk−1⟩ ) if ⟨sk−1, yk−1⟩ > ε,
       ηmax otherwise,    (10)

where ε is a quite small positive number.

(iii) Computation of the projection onto Δ:
The projection onto Δ that is required in each iteration of the algorithm is computed as follows:

(I) Find u = xk + ηk zk.
(II) The vector PΔ(u) is the unique optimal solution of the strictly convex quadratic problem

Minimize (1/2) ‖u − w‖2², w ∈ Rn,
subject to eT w = 1,
           w ≥ 0.

As suggested in (Júdice et al. 2008), a very simple and strongly polynomial block pivotal principal pivoting algorithm is employed to compute this unique global minimum.
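The paper employs the block pivotal principal pivoting method of (Júdice et al. 2008) for this projection. As a stand-in, the classical sort-based projection below (our own sketch) computes the same unique minimizer:

```python
import numpy as np

def project_simplex(u):
    """Euclidean projection of u onto {w : e^T w = 1, w >= 0}."""
    n = len(u)
    s = np.sort(u)[::-1]                        # sort in decreasing order
    css = np.cumsum(s)
    # Largest index rho with s[rho] - (css[rho] - 1)/(rho + 1) > 0.
    rho = np.nonzero(s * np.arange(1, n + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)      # optimal shift
    return np.maximum(u - theta, 0.0)
```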

The steps of the SPG algorithm can be presented as follows.

SPECTRAL PROJECTED GRADIENT ESTIMATOR FOR ‖A−1‖1

Step 0: Initialization
Let x0 = (1/n, 1/n, . . . , 1/n) and k = 0.

Step 1: Determination of the Gradient or Subgradient of F, zk
Compute yk, ξ and zk by

A yk = xk,
ξ = sign(yk),
AT zk = ξ.

Step 2: Stopping Criterion
If max1≤i≤n (zk)i ≤ zkT xk, stop the algorithm with γ = ‖yk‖1 the estimate for ‖A−1‖1.

Step 3: Search Direction
Update xk+1 = PΔ(xk + ηk zk) and go to Step 1 with k ← k + 1.
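Putting the pieces together, a NumPy/SciPy sketch of this estimator might read as follows (it reuses the project_simplex routine above; the default parameter values are our own choices, whereas the paper tunes ηmax per problem):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def spg_invnorm1_estimate(A, eta_min=1e-3, eta_max=1e5,
                          eps=1e-12, max_iter=100):
    """Sketch of the SPG estimator for ||A^{-1}||_1 (Steps 0-3 above)."""
    n = A.shape[0]
    lu, piv = lu_factor(A)
    x = np.full(n, 1.0 / n)                      # Step 0
    x_old = z_old = eta = None
    for _ in range(max_iter):
        y = lu_solve((lu, piv), x)               # Step 1: A y_k = x_k
        xi = np.where(y >= 0.0, 1.0, -1.0)
        z = lu_solve((lu, piv), xi, trans=1)     # A^T z_k = xi
        if z.max() <= z @ x:                     # Step 2
            return np.abs(y).sum()
        if eta is None:                          # eta_0 as in (I)
            den = np.abs(project_simplex(x + z) - x).max()
            eta = np.clip(1.0 / max(den, eps), eta_min, eta_max)
        else:                                    # spectral stepsize (10)
            s, w = x - x_old, z_old - z          # s_{k-1}, y_{k-1}
            eta = (np.clip((s @ s) / (s @ w), eta_min, eta_max)
                   if s @ w > eps else eta_max)
        x_old, z_old = x, z
        x = project_simplex(x + eta * z)         # Step 3
    return np.abs(lu_solve((lu, piv), x)).sum()
```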

5 Computational experience

In this section we report some computational experience with the methods previously described to estimate ‖A−1‖1. All the estimators have been implemented in MATLAB (Moler et al. 2001), version 7.1. The test problems have been taken from well-known sources.

Type I. Minkowski matrices (Johnson 1990), block tridiagonal with 4 × 4 diagonal blocks of the form

[  4 −1  0  0
  −1  4 −1  0
   0 −1  4 −1
   0  0 −1  4 ]

and off-diagonal blocks −I4, where I4 denotes the identity matrix of order 4.

Type II. Matrices (Higham 1988) A = αI + e eT, where e = [1 1 . . . 1]T and the parameter α takes values close to zero.

Type III. Triangular bidiagonal unitary matrices (Higham 1988), where the diagonal and the first lower diagonal are unitary and the remaining elements are zero. Hence these matrices have the form

[ 1
  1 1
  0 1 1
  . . .
  0 0 0 . . . 1 ].

Type IV. Pentadiagonal matrices (Murty 1988) with

mii = 6,
mi,i−1 = mi−1,i = −4,
mi,i−2 = mi−2,i = 1,

and all other elements equal to zero.
Type V. Murty matrices (Murty 1988) with the structure

[ 1
  2 1
  2 2 1
  . . .
  2 2 2 . . . 1 ].

Type VI. Fathy matrices (Murty 1988) of the form

F = MT M,

where M is a Murty matrix.
Type VII. Matrices (Higham 1988) whose inverses can be written as

A−1 = I + θ C,

where C is an arbitrary matrix such that

C e = CT e = 0,

and the parameter θ takes values close to zero.
Type VIII. Matrices from the MatrixMarket collection (Boisvert et al. 1998), which covers all of the Harwell-Boeing matrices. In Table 1 some specifications of the selected matrices are displayed.

Table 2 includes the numerical results of the performance of the three algorithms discussed in this paper for the estimation of the 1-norm condition number of the matrices mentioned before. The notations N, VALUE and IT are used to denote the order of the matrices, the value of the condition number estimate and the number of iterations required by the three algorithms, which is the number of iterates visited by the method. Furthermore, CN represents the true condition number of these matrices.

In general, the SPG algorithm converges to a vertex of the simplex. In Table 2 we use the notation (∗) to indicate the problems where the algorithm converges to a point that is not a vertex of Δ. Finally, the column ηmax indicates the value of ηmax that leads to the best performance of the SPG algorithm.

The results show that the three algorithms are able to obtain similar estimates of the condition number of the matrices. The number of iterations for the three algorithms is always quite small. Furthermore, for the best values of ηmax, the SPG algorithm finds in general the best estimate of the condition number.

The main drawback of the SPG algorithm lies in its dependence on the choice of the interval [ηmin, ηmax]. Our experience has shown that the value ηmin = 10⁻³ is in general a good choice. However, the choice of ηmax has an important effect on the computational effort.


Table 1 Properties of the Type VIII matrices

Name       Original name  Origin                    Order  Number of nonzero elements  Type

MATRIX1 s1rmq4m1 Finite-Elements 5489 143300 SPD

MATRIX2 s2rmq4m1 Finite-Elements 5489 143300 SPD

MATRIX3 s3rmq4m1 Finite-Elements 5489 143300 SPD

MATRIX4 s1rmt3m1 Finite-Elements 5489 112505 SPD

MATRIX5 s2rmt3m1 Finite-Elements 5489 112505 SPD

MATRIX6 s3rmt3m1 Finite-Elements 5489 112505 SPD

MATRIX7 bcsstk14 Structural Engineering 1806 32630 SPD

MATRIX8 bcsstk15 Structural Engineering 3948 60882 SPD

MATRIX9 bcsstk27 Structural Engineering 1224 28675 SPD

MATRIX10 bcsstk28 Structural Engineering 4410 111717 Indefinite

MATRIX11 orani678 Economic Modeling 2529 90158 Unsymmetric

MATRIX12 sherman5 Petroleum Modeling 3312 20793 Unsymmetric

MATRIX13 nos7 Finite Differences 729 2673 SPD

MATRIX14 nos2 Finite Differences 957 2547 SPD

MATRIX15 e20r5000 Dynamics of Fluids 4241 131556 Unsymmetric

MATRIX16 fidapm37 Finite Elements Modeling 9152 765944 Unsymmetric

MATRIX17 fidap037 Finite Elements Modeling 3565 67591 Unsymmetric

The results show that a value of ηmax of 10⁴ or 10⁵ works very well in many cases, but for other problems ηmax should be chosen larger. It is important to note that the choice of ηmax even has an impact on the stationary point obtained by the SPG algorithm, and on the corresponding condition number estimate.

As suggested in (Hager 1984), we decided to apply the SPG algorithm more than once, starting each application from the barycenter of the vertices of the canonical basis not visited in the previous applications. The results are presented in Table 3, where NAP represents the number of applications of the SPG algorithm and NIT the total number of iterations for these experiments. These results show that the algorithm can improve the estimate in some cases where the true condition number had not been obtained by the SPG algorithm.
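A sketch of this restarting strategy (our own reading of it; the helper `estimator` is assumed to return the estimate together with the set of vertex indices visited during the run) is given below:

```python
import numpy as np

def restarted_estimate(estimator, n, max_restarts=5):
    """Rerun the estimator from the barycenter of the vertices not yet
    visited in previous applications, keeping the best estimate found."""
    visited = set()
    best = 0.0
    x0 = np.full(n, 1.0 / n)
    for _ in range(max_restarts):
        value, newly_visited = estimator(x0)
        best = max(best, value)
        visited |= newly_visited
        free = [i for i in range(n) if i not in visited]
        if not free:
            break
        x0 = np.zeros(n)
        x0[free] = 1.0 / len(free)   # barycenter of the unvisited vertices
    return best
```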

6 Conclusions

The formulation of the condition number estimation problem was changed from a maximization over the unit sphere in the 1-norm to a maximization over the unit simplex. For this new constraint set, it is relatively easy to project a vector onto the feasible set. Hence, a projected gradient algorithm can be used to estimate the 1-norm condition number. Numerical experiments indicate that the so-called spectral projected gradient (SPG) algorithm (Birgin et al. 2000) can yield a better condition number estimate than the previous algorithm, while the number of iterations increased by at most one.


Table 2 Condition number estimators

Matrix          Hager's estimator     CG estimator          SPG estimator                    CN
                Value       IT        Value       IT        Value       IT    ηmax
TYPE I (N)
50              2.31E+001   2         2.31E+001   2         2.31E+001   2     1.0E+08       2.31E+001
250             2.40E+001   2         2.40E+001   2         2.40E+001   3*    1.0E+08       2.40E+001
500             2.40E+001   2         2.40E+001   2         2.40E+001   3*    1.0E+08       2.40E+001
1000            2.40E+001   2         2.40E+001   2         2.40E+001   2*    1.0E+08       2.40E+001
2000            2.40E+001   2         2.40E+001   2         2.40E+001   3*    1.0E+04       2.40E+001
4000            2.40E+001   2         2.40E+001   2         2.40E+001   2*    1.0E+04       2.40E+001
TYPE II (α, N = 4000)
0.50            1.60E+004   2         1.60E+004   2         1.60E+004   2     1.0E+16       1.60E+004
0.25            3.20E+004   2         3.20E+004   2         3.20E+004   2     1.0E+16       3.20E+004
0.125           6.40E+004   2         6.40E+004   2         6.40E+004   2     1.0E+16       6.40E+004
0.01            8.00E+005   2         8.00E+005   2         8.00E+005   3     1.0E+12       8.00E+005
1.0E-03         8.00E+006   2         8.00E+006   2         8.00E+006   3     1.0E+12       8.00E+006
1.0E-04         8.00E+007   2         8.00E+007   2         8.00E+007   3*    1.0E+10       8.00E+007
1.0E-05         8.00E+008   2         8.00E+008   2         8.00E+008   3     1.0E+10       8.00E+008
TYPE III (N)
50              9.80E+001   2         9.80E+001   2         9.80E+001   3     1.0E+04       1.00E+002
250             4.98E+002   2         4.98E+002   2         4.98E+002   3     1.0E+04       5.00E+002
500             9.98E+002   2         9.98E+002   2         9.98E+002   3     1.0E+04       1.00E+003
1000            2.00E+003   2         2.00E+003   2         2.00E+003   3     1.0E+04       2.00E+003
2000            4.00E+003   2         4.00E+003   2         4.00E+003   3     1.0E+04       4.00E+003
4000            8.00E+003   2         8.00E+003   2         8.00E+003   3     1.0E+04       8.00E+003
TYPE IV (N)
50              3.04E+005   2         3.04E+005   2         3.04E+005   3     1.0E+11       3.04E+005
250             1.68E+008   2         1.68E+008   2         1.68E+008   3     1.0E+06       1.68E+008
500             2.65E+009   2         2.65E+009   2         2.65E+009   3     1.0E+06       2.65E+009
1000            4.20E+010   2         4.20E+010   2         4.20E+010   2     1.0E+06       4.20E+010
2000            6.69E+011   2         6.69E+011   2         6.69E+011   2     1.0E+06       6.69E+011
4000            1.07E+013   2         1.07E+013   2         1.07E+013   2     1.0E+06       1.07E+013
TYPE V (N)
50              9.80E+003   2         9.80E+003   2         9.80E+003   2     1.0E+04       9.80E+003
250             2.49E+005   2         2.49E+005   2         2.49E+005   2     1.0E+04       2.49E+005
500             9.98E+005   2         9.98E+005   2         9.98E+005   2     1.0E+04       9.98E+005
1000            4.00E+006   2         4.00E+006   2         4.00E+006   2     1.0E+04       4.00E+006
2000            1.60E+007   2         1.60E+007   2         1.60E+007   2     1.0E+04       1.60E+007
4000            6.40E+007   2         6.40E+007   2         6.40E+007   2     1.0E+04       6.40E+007
TYPE VI (N)
50              2.50E+007   2         2.50E+007   2         2.50E+007   2     1.0E+04       2.50E+007
250             1.56E+010   2         1.56E+010   2         1.56E+010   2     1.0E+04       1.56E+010
500             2.50E+011   2         2.50E+011   2         2.50E+011   2     1.0E+04       2.50E+011
1000            4.00E+012   2         4.00E+012   2         4.00E+012   2     1.0E+04       4.00E+012
2000            6.40E+013   2         6.40E+013   2         6.40E+013   2     1.0E+04       6.40E+013
4000            1.02E+015   2         1.02E+015   2         1.02E+015   2     1.0E+04       1.02E+015
TYPE VII (θ, N = 4000)
0.50            5.44E+013   3         5.43E+013   3         8.14E+013   3     1.0E+02       8.14E+013
0.25            2.37E+013   3         2.37E+013   3         2.73E+013   3     1.0E+02       2.75E+013
0.125           1.01E+013   4         1.01E+013   4         1.09E+013   3     1.0E+02       1.10E+013
0.01            1.67E+011   3         1.67E+011   3         2.12E+011   3     1.0E+05       2.14E+011
1.0E-03         5.68E+009   3         5.68E+009   3         6.89E+009   3     1.0E+06       6.99E+009
1.0E-04         1.27E+008   3         1.27E+008   3         2.74E+008   4     1.0E+08       2.90E+008
1.0E-05         7.14E+006   4         7.14E+006   4         8.10E+006   4     1.0E+08       8.15E+006
TYPE VIII (NAME)
MATRIX1         7.47E+004   2         7.47E+004   2         7.47E+004   2     1.0E+05       7.47E+004
MATRIX2         7.66E+005   2         7.66E+005   2         7.66E+005   2     1.0E+05       7.66E+005
MATRIX3         4.54E+007   3         4.45E+007   3         4.45E+007   3     1.0E+05       4.54E+007
MATRIX4         4.49E+004   2         4.49E+004   2         4.49E+004   2     1.0E+05       4.93E+004
MATRIX5         9.22E+005   2         9.22E+005   2         9.22E+005   2     1.0E+05       9.22E+005
MATRIX6         4.39E+007   2         4.39E+007   2         4.39E+007   2     1.0E+05       4.62E+007
MATRIX7         1.11E+010   2         1.11E+010   2         1.11E+010   2*    1.0E+05       1.11E+010
MATRIX8         7.66E+009   2         7.66E+009   2         7.66E+009   2*    1.0E+05       7.66E+009
MATRIX9         1.23E+004   2         1.23E+004   2         1.23E+004   3     1.0E+05       1.23E+004
MATRIX10        4.49E+004   2         4.49E+004   2         4.49E+004   3     1.0E+05       4.49E+004
MATRIX11        1.00E+007   2         1.00E+007   2         1.00E+007   2     1.0E+05       1.00E+007
MATRIX12        3.90E+005   2         3.90E+005   2         3.90E+005   2     1.0E+05       3.90E+005
MATRIX13        3.00E+008   2         3.00E+008   2         3.00E+008   3*    1.0E+08       3.00E+008
MATRIX14        3.71E+005   2         3.71E+005   2         3.71E+005   3*    1.0E+05       3.71E+005
MATRIX15        1.84E+010   2         1.84E+010   2         1.84E+010   2     1.0E+05       1.84E+010
MATRIX16        3.05E+010   2         3.05E+010   2         3.05E+010   2     1.0E+05       3.05E+010
MATRIX17        2.26E+002   2         2.26E+002   2         2.26E+002   2*    1.0E+05       2.26E+002


Table 3 Performance of the SPG algorithm with a different initial point

Matrix          Previous estimate   SPG estimator                  CN
                                    Value       NIT   NAP
TYPE III (N)
50              9.80E+001           1.00E+002   5     2            1.00E+002
250             4.98E+002           5.00E+002   5     2            5.00E+002
500             9.98E+002           1.00E+003   5     2            1.00E+003
TYPE VII (θ, N = 4000)
0.25            2.73E+013           2.73E+013   6     2            2.75E+013
0.125           1.09E+013           1.09E+013   23    7            1.10E+013
0.01            2.12E+011           2.12E+011   6     2            2.14E+011
1.0E-03         6.89E+009           6.89E+009   9     3            6.99E+009
1.0E-04         2.74E+008           2.74E+008   16    4            2.90E+008
1.0E-05         8.10E+006           8.10E+006   12    3            8.15E+006
TYPE VIII (NAME)
MATRIX3         4.45E+007           4.54E+007   5     2            4.54E+007
MATRIX4         4.49E+004           4.93E+004   8     4            4.93E+004
MATRIX6         4.39E+007           4.60E+007   8     3            4.62E+007

References

Bazaraa MS, Sherali HD, Shetty C (1993) Nonlinear programming: theory and algorithms, 2nd edn. Wiley, New York
Bertsekas DP (2003) Nonlinear programming, 2nd edn. Athena Scientific, Belmont
Birgin EG, Martínez JM, Raydan M (2000) Nonmonotone spectral projected gradient methods on convex sets. SIAM J Optim 10:1196–1211
Birgin EG, Martínez JM, Raydan M (2001) Algorithm 813: SPG-software for convex constrained optimization. ACM Trans Math Softw 27:340–349
Boisvert R, Pozo R, Remington K, Miller B, Lipman R (1998) Mathematical and Computational Sciences Division, Information Technology Laboratory, National Institute of Standards and Technology. Available at http://math.nist.gov/MatrixMarket/formats.html
Cottle R, Pang J, Stone R (1992) The linear complementarity problem. Academic Press, New York
Demmel JW (1997) Applied numerical linear algebra. SIAM, Philadelphia
Gill P, Murray W, Wright M (1991) Numerical linear algebra and optimization. Addison-Wesley, New York
Golub GH, Loan CFV (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Hager WW (1984) Condition estimates. SIAM J Sci Stat Comput 5(2):311–316
Hager WW (1998) Applied numerical linear algebra. Prentice Hall, New Jersey
Higham NJ (1988) FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation. ACM Trans Math Softw 14(4):381–396
Higham NJ (1996) Accuracy and stability of numerical algorithms. SIAM, Philadelphia
Johnson C (1990) Numerical solution of partial differential equations by the finite element method. Cambridge University Press, London
Júdice JJ, Raydan M, Rosa SS, Santos SA (2008) On the solution of the symmetric eigenvalue complementarity problem by the spectral projected gradient algorithm. Numer Algorithms 44:391–407
Moler C, Little JN, Bangert S (2001) Matlab user's guide: the language of technical computing. The MathWorks, Sherborn
Murty K (1976) Linear and combinatorial programming. Wiley, New York
Murty K (1988) Linear complementarity, linear and nonlinear programming. Heldermann, Berlin