
Optimization Methods and Software
Vol. 20, No. 1, February 2005, 71–98

Convergence conditions, line search algorithms and trust region implementations for the Polak–Ribière conjugate gradient method

L. GRIPPO* and S. LUCIDI

Dipartimento di Informatica e Sistemistica, Università di Roma 'La Sapienza', Via Buonarroti 12, 00185 Roma, Italy

(Received 8 August 2003; in final form 30 January 2004)

This paper is dedicated to Professor Yury Evtushenko in honor of his 65th birthday

We study globally convergent implementations of the Polak–Ribière (PR) conjugate gradient method for the unconstrained minimization of continuously differentiable functions. More specifically, first we state sufficient convergence conditions, which imply that limit points produced by the PR iteration are stationary points of the objective function, and we prove that these conditions are satisfied, in particular, when the objective function has some generalized convexity property and exact line searches are performed. In the general case, we show that the convergence conditions can be enforced by means of various inexact line search schemes where, in addition to the usual acceptance criteria, further conditions are imposed on the stepsize. Then we define a new trust region implementation, which is compatible with the behavior of the PR method in the quadratic case, and may perform different linesearches in dependence of the norm of the search direction. In this framework, we show also that it is possible to define globally convergent modified PR iterations that permit exact linesearches at every iteration. Finally, we report the results of a numerical experimentation on a set of large problems.

Keywords: Unconstrained optimization; Conjugate gradient method; Polak–Ribière method

1. Introduction

We consider the problem

minimize_{x ∈ R^n} f(x),    (1)

where f: R^n → R is a continuously differentiable function with gradient g: R^n → R^n. For the solution of problem (1), we consider conjugate gradient algorithms of the form

x_{k+1} = x_k + α_k d_k,    (2)

with

d_k = −g_k                      for k = 0,
d_k = −g_k + β_k d_{k−1}        for k ≥ 1,    (3)

*Corresponding author. Tel.: +39 06 48299233; Fax: +39 06 4782516; Email: [email protected]

Optimization Methods and Software ISSN 1055-6788 print/ISSN 1029-4937 online © 2005 Taylor & Francis Group Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/1055678042000208570


where x_0 is a given initial point, α_k is a steplength along d_k, g_k := g(x_k), and β_k is a suitable scalar. When f is a quadratic function with a positive definite Hessian matrix Q, if the stepsize α_k is chosen as the exact one-dimensional minimizer along d_k and the scalar β_k is defined by

β_k = g_k^T Q d_{k−1} / d_{k−1}^T Q d_{k−1},    k ≥ 1,    (4)

the algorithm (2)–(3) is the well-known (linear) conjugate gradient method of Hestenes and Stiefel [1], which determines the minimizer of f in at most n iterations.

The extensions to the general case [see, e.g., refs. 2–12] are based on the adoption of some (possibly inexact) line search technique for the computation of α_k and make use of formulae for the evaluation of β_k that do not contain explicitly the Hessian matrix of f. The best-known formulae for β_k are the Fletcher and Reeves (FR) [4] formula

β_k^{FR} = ‖g_k‖² / ‖g_{k−1}‖²    (5)

and the Polak–Ribière (PR) [9] formula

β_k^{PR} = g_k^T (g_k − g_{k−1}) / ‖g_{k−1}‖²,    (6)

where ‖·‖ denotes the Euclidean norm on R^n. Numerical experience indicates that the PR formula is, in general, the most convenient choice.
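For concreteness, the two formulae translate directly into code; the following Python sketch (ours, not part of the original paper) evaluates both scalars from two successive gradients.

    import numpy as np

    def beta_fr(g_new, g_old):
        # Fletcher-Reeves formula (5): ||g_k||^2 / ||g_{k-1}||^2
        return float(g_new @ g_new) / float(g_old @ g_old)

    def beta_pr(g_new, g_old):
        # Polak-Ribiere formula (6): g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2
        return float(g_new @ (g_new - g_old)) / float(g_old @ g_old)

In the quadratic case with exact linesearches, successive gradients are mutually orthogonal, so the two values coincide.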

The convergence properties of the various conjugate gradient methods in the nonquadratic case have been the subject of many investigations. In particular, global convergence results for the FR method have been obtained by Zoutendijk [13] in the case of exact line searches and by Al-Baali [14] in connection with inexact line searches based on the strong Wolfe conditions. The global convergence of the PR method with exact line searches has been proved in ref. [9] under strong convexity assumptions on f. However, it was shown by Powell [15] that in the general (nonconvex) case the PR method, employing a line search technique that accepts the first local minimizer along d_k, can cycle infinitely without converging towards a stationary point.

An additional (but related) difficulty arises in the implementation of the PR method; in fact, as remarked in ref. [5], there is no known linesearch algorithm that can guarantee, in the general case, both satisfaction of the Wolfe conditions at x_k and the descent condition g_{k+1}^T d_{k+1} < 0 at the next step. These difficulties can be overcome by restricting β_k to nonnegative values, that is, by letting β_k = max{β_k^{PR}, 0}, as suggested by Powell [16]. In connection with this choice, it has been proved in ref. [5] that the property lim inf_{k→∞} ‖g_k‖ = 0 can be enforced either through exact linesearches or also using an implementable inexact algorithm that guarantees satisfaction of the strong Wolfe conditions and a strong descent condition of the form:

g_k^T d_k ≤ −δ‖g_k‖²,    δ ∈ (0, 1).

In ref. [17], it has also been shown that this last requirement can be weakened and that only the descent condition g_k^T d_k < 0 must be imposed.

An alternative approach, which does not require restarting along the steepest descent direction and leaves the PR direction unmodified, has been proposed in ref. [18], where linesearch rules have been defined that guarantee the property lim_{k→∞} ‖g_k‖ = 0. In essence, the technique proposed there is based on an Armijo-type linesearch method such that:

(i) The initial tentative stepsize is suitably restricted.

(ii) The acceptability condition on the stepsize imposes both a sufficient reduction of f satisfying a 'parabolic bound', by requiring that

f(x_k + α_k d_k) ≤ f(x_k) − γ α_k² ‖d_k‖²,

and a descent condition on the next search direction of the form

−δ2‖g_{k+1}‖² ≤ g_{k+1}^T d_{k+1} ≤ −δ1‖g_{k+1}‖².

It has also been proved that the parameters appearing in the acceptance rules can be adaptively updated, so that, asymptotically, the first one-dimensional minimizer along the search direction can eventually be accepted when the algorithm is converging to a point where the Hessian matrix is positive definite.

Globally convergent linesearch methods for nonlinear conjugate gradient methods have also been proposed in ref. [19], where it is shown that the condition lim_{k→∞} ‖g_k‖ = 0 can be enforced in the PR method through an Armijo-type linesearch that also ensures satisfaction of the condition g_{k+1}^T d_{k+1} ≤ −σ‖d_{k+1}‖² for some σ ∈ (0, 1).

In this article, we reconsider the convergence properties of the (unmodified) PR method and extend and improve the results of ref. [18] in several directions. In particular, first we identify sufficient convergence conditions that imply

lim inf_{k→∞} ‖g_k‖ = 0

and we show that this property depends, essentially, on the fact that in addition to some standard acceptance condition on the stepsize we can establish the limit

lim_{k→∞} ‖x_{k+1} − x_k‖ = 0,

which can be enforced through a linesearch technique.

A theoretical consequence of this is that the PR method with exact linesearches has stronger convergence properties than is usually believed. More specifically, if the first stationary point is chosen at each step, it can be shown that the PR method converges if f is hemivariate [20], that is, if it is not constant on any line segment. This in turn implies that the PR method with exact linesearches converges under generalized convexity assumptions on f, so that the strong convexity assumption of ref. [9] can be weakened.

We also consider sufficient conditions that imply the stronger result ‖g_k‖ → 0 and we prove that this property depends on the existence of an asymptotic bound on α_k‖d_k‖², which can also be enforced through a linesearch.

Starting from these results, we can define various inexact linesearch algorithms that are simpler and less demanding than those defined in ref. [18]. However, these techniques do not guarantee, in principle, that in the quadratic case the resulting algorithm can be identified right at the start with the linear conjugate gradient method, unless the parameters are appropriately chosen in relation to the optimal solution. In order to overcome this difficulty, we also introduce a different model, based on a 'trust region' approach, in which a linesearch algorithm (compatible with Wolfe conditions and even with exact linesearches) is employed whenever the norm of d_k does not exceed an adaptive bound, defined on the basis of the behavior of the method in the quadratic case. When this bound is violated, we can adopt any of the convergent Armijo-type linesearch techniques defined in this article. To the authors' knowledge, this trust region version of the PR method (with the additional requirement that the initial tentative stepsize is chosen using quadratic interpolation) is the only globally convergent algorithm proposed so far that employs the unmodified PR iteration and reduces to the linear conjugate gradient method in the quadratic case. On the basis of these results, we can also define a modified globally convergent PR iteration that consists of distinguishing (when required) the stepsize used for defining β_{k+1} from that used for computing x_{k+1}, and of rescaling β_{k+1} each time that an adaptive bound on ‖d_k‖ is violated. This strategy is again compatible with the behavior of the linear conjugate gradient method in the quadratic case and admits the possibility of performing exact linesearches at each step.

The article is organized as follows. In section 2, we describe our notation and state the basic assumptions employed in the sequel. In section 3, we establish sufficient convergence conditions and derive convergence results for the PR method under generalized convexity assumptions. In section 4, we define various basic inexact linesearch schemes that ensure global convergence and include the algorithms proposed in ref. [18] as special cases. In section 5, we define our trust region implementation of the PR method and in section 6, we define a globally convergent algorithm based on a modified PR iteration. In section 7, we report the numerical results obtained for a set of large problems. Some concluding remarks are given in section 8.

2. Notation and assumptions

Given a point x_k produced by an iteration of the form (2) we set f_k := f(x_k) and g_k := g(x_k). We indicate by {x_k} the sequence of points generated by an algorithm. A subsequence of {x_k} will be denoted by {x_k}_K, where K is an infinite index set.

We call forcing function [20] a function σ: R_+ → R_+ such that for every sequence {t_k} such that σ(t_k) → 0 we have t_k → 0. Therefore, a constant function σ(t) ≡ const for all t ∈ R_+ will be a (special) forcing function that satisfies vacuously the preceding condition. We also note that if σ_a and σ_b are two given forcing functions, then the function σ(t) = min{σ_a(t), σ_b(t)} is another forcing function. Finally, we denote by ‖·‖ the Euclidean norm on R^n.

We suppose that the following assumption is satisfied.

ASSUMPTION 1

(i) The level set L = {x ∈ R^n: f(x) ≤ f(x_0)} is compact.
(ii) For every given r > 0, there exists L > 0 such that

‖g(x) − g(y)‖ ≤ L‖x − y‖,    (7)

for all x, y ∈ B_r, where B_r := {x ∈ R^n: ‖x‖ < r}.

In the sequel, we will (tacitly) assume that the radius r is sufficiently large to ensure that all the points of interest remain in B_r, so that we can suppose that equation (7) is valid for any given pair (x, y).

By Assumption 1, there exists a number Γ ≥ 1 such that

‖g(x)‖ ≤ Γ, for all x ∈ L.    (8)

We consider conjugate gradient methods defined by the iterations (2) and (3), where β_k is a scalar that satisfies

|β_k| ≤ C|β_k^{PR}|,    (9)

with C > 0.


3. Convergence conditions for the PR method

In this section, we consider sufficient conditions for the global convergence of the PR method. In particular, in the next theorem we state conditions that ensure the existence of at least one limit point of the sequence {x_k} which is a stationary point of f. In essence, we require that some usual condition on the linesearch is satisfied and, moreover, that the distance ‖x_{k+1} − x_k‖ goes to zero if the sequence {‖g_k‖} is bounded away from zero.

THEOREM 1 Let {x_k} be the sequence generated by Algorithms (2) and (3) and assume that equation (9) holds. Let σ_i: R_+ → R_+, for i = 0, 1, 2, be forcing functions and suppose that the stepsize α_k is computed in a way that the following conditions are satisfied:

(c1) x_k ∈ L for all k;

(c2) lim_{k→∞} σ0(‖g_k‖) σ1(|g_k^T d_k|)/‖d_k‖ = 0;

(c3) lim_{k→∞} σ0(‖g_k‖) σ2(‖α_k d_k‖) = 0.

Then we have:

lim inf_{k→∞} ‖g_k‖ = 0

and, hence, there exists a limit point of {x_k} which is a stationary point of f.

Proof By (c1) we have that x_k ∈ L for all k; as L is compact, the sequence {x_k} is bounded and admits limit points in L. Reasoning by contradiction, suppose there exists a number ε > 0 such that

‖g_k‖ ≥ ε, for all k.    (10)

By definition of forcing function, this implies that there exists a number δ > 0 such that

σ0(‖g_k‖) ≥ δ, for all k.    (11)

Then condition (c3) implies that

lim_{k→∞} α_k‖d_k‖ = lim_{k→∞} ‖x_{k+1} − x_k‖ = 0.    (12)

Now, by (c1), equation (10), and the assumptions made, we can write for all k:

‖d_k‖ ≤ ‖g_k‖ + C|β_k^{PR}| ‖d_{k−1}‖ ≤ ‖g_k‖ + C ‖g_k‖ ‖g_k − g_{k−1}‖/‖g_{k−1}‖² ‖d_{k−1}‖ ≤ Γ + (ΓCL α_{k−1}‖d_{k−1}‖/ε²) ‖d_{k−1}‖.    (13)

Recalling equation (12) and letting q ∈ (0, 1), we have that there exists a sufficiently large index k1 such that:

(ΓCL/ε²) α_{k−1}‖d_{k−1}‖ ≤ q < 1, for all k ≥ k1,    (14)

so that

‖d_k‖ ≤ Γ + q‖d_{k−1}‖, for all k ≥ k1.    (15)

From the preceding inequality [see, e.g., ref. 21, Lemma 1, p. 44], we obtain immediately:

‖d_k‖ ≤ Γ/(1 − q) + (‖d_{k1}‖ − Γ/(1 − q)) q^{k−k1}, for all k ≥ k1,    (16)

and this proves that ‖d_k‖ is bounded for all k. Therefore, from equation (12) we get:

lim_{k→∞} α_k‖d_k‖² = 0;    (17)

moreover, as ‖d_k‖ is bounded, by condition (c2) and equation (11) we obtain:

lim_{k→∞} |g_k^T d_k| = 0.    (18)

Recalling equation (9) and the PR formula, we can write:

‖g_k‖² ≤ |g_k^T d_k| + ‖g_k‖² CL α_{k−1}‖d_{k−1}‖²/‖g_{k−1}‖²,    (19)

so that by equations (10), (17), and (18) and the compactness of L, taking limits we have:

lim_{k→∞} ‖g_k‖ = 0,

which contradicts equation (10). This proves our assertion. □

We note that conditions (c1) and (c2) of the preceding theorem can be satisfied through any standard line search technique that can guarantee a 'sufficient decrease' of f and a 'sufficient displacement' from the current point along d_k. Additional restrictions on α_k must be introduced for ensuring that the search directions are descent directions and for imposing satisfaction of (c3). The acceptability conditions on the stepsize that can be used in the general case for enforcing all these requirements will be discussed later. Here, we show that an immediate consequence of Theorem 1 is that the convergence of the PR method with exact linesearches can be established under weaker convexity conditions than those used in ref. [9], where strong convexity of f was assumed. First, we recall from refs. [20,22] the following definitions.

DEFINITION 1 A function f is hemivariate on a set D ⊆ R^n if it is not constant on any line segment of D, that is, if there exist no distinct points x, y ∈ D such that (1 − t)x + ty ∈ D and f((1 − t)x + ty) = f(x) for all t ∈ [0, 1].

DEFINITION 2 [22] A function f is strongly quasiconvex on a convex set C ⊆ R^n if, for all x, y ∈ C with x ≠ y, we have:

max{f(x), f(y)} > f((1 − t)x + ty),

for all t ∈ (0, 1).

DEFINITION 3 A function f is strictly pseudoconvex on a convex set C ⊆ R^n if, for all x, y ∈ C with x ≠ y, we have that ∇f(x)^T(y − x) ≥ 0 implies f(y) > f(x).

Now we prove that convergence is achieved if the objective function is hemivariate on L and we choose α_k as the first stationary point along d_k.

THEOREM 2 Suppose that f is hemivariate on the level set L. Let {x_k} be the sequence generated by Algorithms (2) and (3), where β_k satisfies equation (9) and α_k is the first stationary point of f(x_k + αd_k) along d_k, that is, α_k is the smallest nonnegative number such that

g(x_k + α_k d_k)^T d_k = 0.    (20)

Then we have:

lim inf_{k→∞} ‖g_k‖ = 0

and, hence, there exists a limit point of {x_k} which is a stationary point of f.


Proof We prove that the conditions of Theorem 1 are satisfied, provided that we choose σ0(t) ≡ 1, σ1(t) ≡ t, and σ2(t) ≡ t. Recalling equation (3) and using equation (20), it is easily seen, by induction, that d_k is a descent direction, so that α_k > 0 and

f(x_k) > f(x_k + αd_k) ≥ f(x_{k+1}), for all α ∈ (0, α_k],    (21)

whence it follows that x_k ∈ L for all k, so that condition (c1) holds. By equation (21), recalling the compactness assumption on L and using result 14.1.3 of ref. [20], we have that:

lim_{k→∞} ‖x_{k+1} − x_k‖ = 0,    (22)

which implies that condition (c3) of Theorem 1 is satisfied.

Finally, by the orthogonality condition (20) and the Lipschitz continuity assumption we can write:

|g_k^T d_k|/‖d_k‖ = |g_{k+1}^T d_k/‖d_k‖ − (g_{k+1} − g_k)^T d_k/‖d_k‖| ≤ L‖x_{k+1} − x_k‖,

and, hence, (c2) follows from equation (22); this completes the proof. □

We note that if we choose α_k as the global minimizer of f along d_k, condition (21) may not be satisfied for a general hemivariate function; however, if we assume that f is also quasiconvex, we have that equation (21) is satisfied even when α_k is the global minimizer, but not necessarily the first stationary point. As quasiconvex functions are hemivariate if and only if they are strongly quasiconvex [20], we get the following result.

THEOREM 3 Suppose that f is strongly quasiconvex on a convex set C ⊇ L. Let {x_k} be the sequence generated by Algorithms (2) and (3), where β_k satisfies equation (9) and α_k is the global minimizer of f(x_k + αd_k) along d_k. Then we have:

lim inf_{k→∞} ‖g_k‖ = 0

and, hence, there exists a limit point of {x_k} which is a stationary point of f.

Proof Noting that equation (21) is satisfied and that strongly quasiconvex functions are hemivariate, we can repeat the proof of Theorem 2 above. □

If we introduce the stronger assumption that f is strictly pseudoconvex, we obtain a convergence result for the PR method with exact linesearches, which can be viewed as an extension of the result given in ref. [9] for strongly convex functions.

THEOREM 4 Suppose that f is strictly pseudoconvex on a convex set C ⊇ L. Let {x_k} be the sequence generated by Algorithms (2) and (3), where β_k satisfies equation (9) and α_k is the global minimizer of f(x_k + αd_k) along d_k. Then the sequence {x_k} converges to the global minimizer of f.

Proof As f is strictly pseudoconvex on C ⊇ L, we have that f is also strongly quasiconvex on C. Therefore, by Theorem 3, there exists a subsequence converging to a stationary point x*, which is the only global minimum point of f on R^n because of the strict pseudoconvexity assumption. On the other hand, as f(x_{k+1}) ≤ f(x_k), the sequence {f(x_k)} converges to the minimum value f(x*) and, hence, there cannot exist limit points distinct from x*; this implies that {x_k} converges to x*. □


In the general (nonconvex) case, in order to establish the result that every limit point of the sequence {x_k} is a stationary point of f, in addition to the requirements considered in Theorem 1, we must impose a condition that bounds the growth of ‖d_k‖.

Before stating this result we recall from ref. [18] the following lemma.

LEMMA 1 Let {x_k} be a sequence of points in L, let ε > 0, and let {x_k}_K be a subsequence such that

‖g_{k−1}‖ ≥ ε, for all k ∈ K,    (23)

and

lim_{k∈K, k→∞} ‖x_k − x_{k−1}‖ = 0.    (24)

Then, there exist a number ε* > 0 and a subsequence {x_k}_{K*}, with K* ⊆ K, such that:

‖g_k‖ ≥ ε*, for all k ∈ K*.    (25)

We can state the following theorem.

THEOREM 5 Let {x_k} be the sequence generated by Algorithms (2) and (3) and assume that equation (9) holds. Let σ_i: R_+ → R_+, for i = 0, 1, 2, be forcing functions and suppose that the stepsize α_k is computed in a way that the following conditions are satisfied:

(c1) x_k ∈ L for all k;

(c2) lim_{k→∞} σ0(‖g_k‖) σ1(|g_k^T d_k|)/‖d_k‖ = 0;

(c3) lim_{k→∞} σ0(‖g_k‖) σ2(‖α_k d_k‖) = 0;

(c4) lim sup_{k→∞} σ0(‖g_k‖) α_k‖d_k‖² < ∞.

Then, we have lim_{k→∞} ‖g_k‖ = 0 and, hence, every limit point of {x_k} is a stationary point of f.

Proof As x_k ∈ L for all k, the sequence {x_k} is bounded and, hence, by the continuity assumption on g, we must only prove that

lim_{k→∞} ‖g(x_k)‖ = 0.

Reasoning by contradiction, suppose there exist a subsequence {x_k}_{K1} and a number ε1 > 0 such that

‖g_{k−2}‖ ≥ ε1, for all k ∈ K1.

By definition of forcing function, this implies that there exists a number δ1 > 0 such that

σ0(‖g_{k−2}‖) ≥ δ1, for all k ∈ K1.    (26)

Then, by (c3) we have:

lim_{k∈K1, k→∞} ‖x_{k−1} − x_{k−2}‖ = 0,

and, hence, by Lemma 1 we can assert that there exist a subsequence {x_k}_{K2}, with K2 ⊆ K1, and a number ε2 > 0 such that

‖g_{k−1}‖ ≥ ε2, for all k ∈ K2.    (27)


Thus we can write, for some δ2 > 0:

σ0(‖g_{k−1}‖) ≥ δ2, for all k ∈ K2,    (28)

and, therefore, using again condition (c3), we get:

lim_{k∈K2, k→∞} ‖x_k − x_{k−1}‖ = 0.    (29)

Starting from equation (27) and reasoning as above, we can find a further subsequence {x_k}_{K3}, with K3 ⊆ K2, and numbers ε3 > 0 and δ3 > 0 such that

‖g_k‖ ≥ ε3, for all k ∈ K3,    (30)

σ0(‖g_k‖) ≥ δ3, for all k ∈ K3.    (31)

Now, by condition (c4) and equations (26) and (28) we can find a number M > 0 such that

α_{k−1}‖d_{k−1}‖² ≤ M/δ2,    α_{k−2}‖d_{k−2}‖² ≤ M/δ1, for all k ∈ K3,    (32)

and, hence, recalling our assumptions and the inequalities established above, we obtain:

‖d_{k−1}‖ ≤ ‖g_{k−1}‖ + C ‖g_{k−1}‖ ‖g_{k−1} − g_{k−2}‖/‖g_{k−2}‖² ‖d_{k−2}‖ ≤ Γ(1 + LCM/(ε1²δ1))    (33)

and

‖d_k‖ ≤ ‖g_k‖ + C ‖g_k‖ ‖g_k − g_{k−1}‖/‖g_{k−1}‖² ‖d_{k−1}‖ ≤ Γ(1 + LCM/(ε2²δ2)),    (34)

so that both ‖d_{k−1}‖ and ‖d_k‖ are bounded for all k ∈ K3. Therefore, from equation (29) we get:

lim_{k∈K3, k→∞} α_{k−1}‖d_{k−1}‖² = 0;    (35)

moreover, by condition (c2), equation (31), and the boundedness of {‖d_k‖}, we obtain:

lim_{k∈K3, k→∞} |g_k^T d_k| = 0,    (36)

and, hence, we can write:

‖g_k‖² ≤ |g_k^T d_k| + ‖g_k‖² LC ‖x_k − x_{k−1}‖ ‖d_{k−1}‖/‖g_{k−1}‖²,    (37)

so that by equations (27), (35), and (36), taking limits for k ∈ K3, we have:

lim_{k∈K3, k→∞} ‖g_k‖ = 0,

which contradicts equation (30). This proves our assertion. □


4. Convergent linesearch algorithms

In this section, we suppose that βk is the PR parameter and we refer to the following scheme.

ALGORITHM POLAK–RIBIÈRE (PR)

Data. x_0 ∈ R^n.
Initialization. Set d_0 = −g_0 and k = 0.
While g_k ≠ 0 do
    Compute α_k using a linesearch procedure and set x_{k+1} = x_k + α_k d_k;
    Compute g_{k+1} and β_{k+1} = g_{k+1}^T (g_{k+1} − g_k)/‖g_k‖²;
    Set d_{k+1} = −g_{k+1} + β_{k+1} d_k and k = k + 1.
End while
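A minimal Python transcription of this scheme may be useful as a reference point; the linesearch is left abstract, the names polak_ribiere and line_search are ours, and the stopping tolerance is for illustration only.

    import numpy as np

    def polak_ribiere(f, grad, x0, line_search, tol=1e-5, max_iter=10000):
        # Algorithm PR: d_0 = -g_0, x_{k+1} = x_k + alpha_k d_k,
        # beta_{k+1} = g_{k+1}^T (g_{k+1} - g_k) / ||g_k||^2,
        # d_{k+1} = -g_{k+1} + beta_{k+1} d_k.
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g
        for _ in range(max_iter):
            if np.linalg.norm(g) <= tol:       # stand-in for the test g_k = 0
                break
            alpha = line_search(f, grad, x, d)  # any procedure from this section
            x = x + alpha * d
            g_new = grad(x)
            beta = float(g_new @ (g_new - g)) / float(g @ g)
            d = -g_new + beta * d
            g = g_new
        return x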

The convergence conditions considered in the preceding section can be enforced by means of an appropriate definition of the linesearch algorithm for the computation of α_k.

We consider first conditions (c1) and (c2) in Theorems 1 and 5. As remarked earlier, these conditions can be satisfied by introducing standard acceptability criteria for the stepsize. In particular, there are several well-known rules (such as Armijo's method, Goldstein conditions, Wolfe conditions) [23], and also derivative-free rules [24,25], that ensure 'sufficiently large' values of the steplength and guarantee a sufficient reduction of the objective function values. Because of the structure of the PR iteration, we also have that a sufficiently large step is obtained each time that at some tentative point z(α) = x_k + αd_k the number

g(z(α))^T ( −g(z(α)) + [g(z(α))^T (g(z(α)) − g_k)/‖g_k‖²] d_k )

is significantly different from −‖g(z(α))‖².

is significantly different from −‖g(z(α))‖2.Taking this into account, in the next Lemma we collect most of the known conditions that

yield useful lower bounds on the stepsize. For later purposes, in order to avoid the repetitionof similar arguments, we enclose also conditions that are not strictly related to the algorithmsintroduced in this section.

We refer, in particular, to a tentative point x_k + η_k d_k with η_k > 0, which can be different, in general, from the accepted point

x_{k+1} = x_k + α_k d_k,    (38)

and we state conditions that imply a lower bound on η_k. The relationships between α_k and η_k will be made more explicit in correspondence to the specific algorithms considered in the sequel.

LEMMA 2 Let {x_k} be the sequence defined by equation (38) and assume that g_k^T d_k < 0 for all k. Suppose also that

f(x_k + α_k d_k) ≤ f(x_k + η_k d_k) < f(x_k), for all k,    (39)

and assume that there exist index sets K1 and K2 (possibly empty) such that:

(1) For all k ∈ K1, at least one of the following conditions (a1), (a2), (a3), (a4) is satisfied:

(a1) η_k ≥ ρ min{φ, |g_k^T d_k|}/‖d_k‖², where ρ > 0 and ∞ ≥ φ > 0;

(a2) f(x_k + λη_k d_k) ≥ f(x_k + η_k d_k), where λ > 1;

(a3) f(x_k + λη_k d_k) ≥ f(x_k) + γ̃1 λη_k g_k^T d_k − γ̃2 (λη_k)² ‖d_k‖², where λ ≥ 1, 1 > γ̃1 ≥ 0, γ̃2 ≥ 0, and γ̃1 + γ̃2 > 0;

(a4) g(x_k + η_k d_k)^T d_k ≥ μ g_k^T d_k, where 1 > μ > 0.

(2) For all k ∈ K2, at least one of the following conditions (a5), (a6) is satisfied:

(a5) g(y_k)^T ( −g(y_k) + [g(y_k)^T (g(y_k) − g_k)/‖g_k‖²] d_k ) ≥ −δ1‖g(y_k)‖²,

(a6) g(y_k)^T ( −g(y_k) + [g(y_k)^T (g(y_k) − g_k)/‖g_k‖²] d_k ) ≤ −δ2‖g(y_k)‖²,

where y_k = x_k + νη_k d_k, g(y_k) ≠ 0, ν > 0, and δ2 > 1 > δ1 ≥ 0.

Then:

(i) We have x_k ∈ L for all k;

(ii) There exist ρ* > 0 and ∞ ≥ φ > 0 such that η_k ≥ ρ* min{φ, |g_k^T d_k|}/‖d_k‖², for all k ∈ K1;

(iii) There exists τ* > 0 such that η_k ≥ τ* ‖g_k‖²/‖d_k‖², for all k ∈ K2.

Proof Assertion (i) follows from equation (39). Now suppose that k ∈ K1; then assertion (ii) is obviously true if (a1) holds; therefore, let us assume first that (a2) is verified, that is:

f(x_k + λη_k d_k) ≥ f(x_k + η_k d_k),    (40)

for some λ > 1. As x_k and x_k + η_k d_k are both in L, we can assume that also x_k + λη_k d_k belongs to the ball B_r introduced in Assumption 1. Using the Theorem of the Mean, we can write

f(x_k + λη_k d_k) = f(x_k + η_k d_k) + (λ − 1)η_k g(z_k)^T d_k − (λ − 1)η_k g_k^T d_k + (λ − 1)η_k g_k^T d_k,

where z_k := x_k + ξ_k(λ − 1)η_k d_k, for some ξ_k ∈ (0, 1). Then, substituting the above expression into equation (40), dividing both members by (λ − 1)η_k > 0, and rearranging, we obtain (g(z_k) − g_k)^T d_k ≥ −g_k^T d_k, whence it follows, using the Lipschitz continuity assumption on g:

η_k ≥ (1/((λ − 1)L)) · |g_k^T d_k|/‖d_k‖².    (41)

Now assume that (a3) is satisfied, that is:

f(x_k + λη_k d_k) ≥ f(x_k) + γ̃1 λη_k g_k^T d_k − γ̃2 (λη_k)² ‖d_k‖²,    (42)

for some λ ≥ 1. Using again the Theorem of the Mean, we can write:

f(x_k + λη_k d_k) = f(x_k) + λη_k g(w_k)^T d_k − λη_k g_k^T d_k + λη_k g_k^T d_k,

where w_k := x_k + ζ_k η_k λ d_k, for some ζ_k ∈ (0, 1). By substituting this expression into equation (42), dividing both members by λη_k > 0, and rearranging, we obtain:

γ̃2 λη_k ‖d_k‖² + (g(w_k) − g_k)^T d_k ≥ (γ̃1 − 1) g_k^T d_k,

whence we get:

η_k ≥ ((1 − γ̃1)/(λ(L + γ̃2))) · |g_k^T d_k|/‖d_k‖².    (43)


Next assume that (a4) holds. In this case, we can write:

g(x_k + η_k d_k)^T d_k ≥ μ g_k^T d_k + g_k^T d_k − g_k^T d_k,

whence it follows: (g(x_k + η_k d_k) − g_k)^T d_k ≥ (1 − μ)|g_k^T d_k|, which implies

η_k ≥ ((1 − μ)/L) · |g_k^T d_k|/‖d_k‖².    (44)

Using (a1) and equations (41), (43), and (44), it can be concluded that if at least one of the conditions (a1), (a2), (a3), (a4) is satisfied there must exist a number

ρ* = min{ρ, 1/((λ − 1)L), (1 − γ̃1)/(λ(L + γ̃2)), (1 − μ)/L} > 0

such that assertion (ii) is valid.

Consider now the case k ∈ K2. Letting y_k = x_k + νη_k d_k, we can suppose that y_k belongs to B_r. Assume first that (a5) holds; in this case we can write:

−(1 − δ1)‖g(y_k)‖² + ‖g(y_k)‖² ‖g(y_k) − g_k‖ ‖d_k‖/‖g_k‖² ≥ 0;

similarly, if (a6) is satisfied we have

−(δ2 − 1)‖g(y_k)‖² + ‖g(y_k)‖² ‖g(y_k) − g_k‖ ‖d_k‖/‖g_k‖² ≥ 0,

whence, recalling the Lipschitz continuity of g, dividing both members of each inequality by ‖g(y_k)‖², and rearranging, we get (iii) with

τ* = min{(1 − δ1)/(νL), (δ2 − 1)/(νL)}. □

Now, let us consider condition (c3) appearing in Theorem 1 and in Theorem 5. We can observe that the key point is that of ensuring that the steplength ‖x_{k+1} − x_k‖ is driven to zero through the linesearch, at least when the gradient norm is bounded away from zero. This can be obtained in two different ways:

(a) By imposing a suitable (adaptive) upper bound on α_k.
(b) By employing a 'parabolic' acceptance rule on the objective function values that also forces the distance ‖x_{k+1} − x_k‖ to zero.

In both cases, the descent condition g_{k+1}^T d_{k+1} < 0 must be imposed during the linesearch and sufficiently large values for the stepsizes must be guaranteed using some of the conditions considered in the preceding lemma.

An upper bound on α_k and, possibly, additional restrictions on g_{k+1}^T d_{k+1} are required in order to satisfy condition (c4) of Theorem 5.

Some of the simplest possibilities for satisfying all these requirements are combined into the single procedure described below, where we admit infinite values for some parameters, with an obvious meaning. In order to simplify our analysis, we refer to a conceptual Armijo-type model; however, alternative schemes (possibly more convenient from a computational point of view) can be adopted, taking into account the results of Lemma 2.


ALGORITHM LSA (modified Armijo linesearch)

Data. ∞ ≥ ρ2 > ρ1 > 0, ∞ ≥ φ > 0, 1 > γ1 ≥ 0, γ2 ≥ 0, γ1 + γ2 > 0, 1 > θ > 0, ∞ ≥ δ2 > 1 > δ1 ≥ 0.
Step 1. Set τ_k = min{φ, |g_k^T d_k|}/‖d_k‖² and choose ∆_k ∈ [ρ1 τ_k, ρ2 τ_k].
Step 2. Compute α_k = max{θ^j ∆_k, j = 0, 1, . . .} such that the vectors x_{k+1} = x_k + α_k d_k and d_{k+1} = −g_{k+1} + β_{k+1} d_k satisfy the conditions:

(i) f_{k+1} ≤ f_k + γ1 α_k g_k^T d_k − γ2 α_k² ‖d_k‖²;

(ii) −δ2‖g_{k+1}‖² ≤ g_{k+1}^T d_{k+1} < −δ1‖g_{k+1}‖² (if g_{k+1} ≠ 0).
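A direct (non-optimized) Python rendering of Step 2 may help fix ideas; it is a sketch under the assumption of finite parameter values, and the iteration cap max_backtracks is a practical safeguard of ours (Proposition 1 below guarantees finite termination when g_k^T d_k < 0).

    import numpy as np

    def lsa_step(f, grad, x, d, g, Delta, gamma1=1e-4, gamma2=0.0,
                 theta=0.5, delta1=0.0, delta2=np.inf, max_backtracks=60):
        # Step 2 of Algorithm LSA: backtrack alpha = theta^j * Delta until
        # (i) the Armijo/parabolic decrease holds and (ii) the next direction
        # d_{k+1} = -g_{k+1} + beta_{k+1} d_k satisfies the descent interval.
        fx, gTd, dTd = f(x), float(g @ d), float(d @ d)
        alpha = Delta
        for _ in range(max_backtracks):
            x_new = x + alpha * d
            g_new = grad(x_new)
            beta = float(g_new @ (g_new - g)) / float(g @ g)
            d_new = -g_new + beta * d
            gn2 = float(g_new @ g_new)
            cond_i = f(x_new) <= fx + gamma1 * alpha * gTd - gamma2 * alpha**2 * dTd
            cond_ii = gn2 == 0.0 or (-delta2 * gn2 <= float(g_new @ d_new) < -delta1 * gn2)
            if cond_i and cond_ii:
                break
            alpha *= theta
        return alpha, x_new, g_new, d_new

Step 1 would supply Delta in [ρ1 τ_k, ρ2 τ_k] with τ_k = min{φ, |g_k^T d_k|}/‖d_k‖² before calling lsa_step.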

We prove first that the preceding algorithm is well defined.

PROPOSITION 1 Suppose that g_k ≠ 0 for all k. Then for every k, there exists a finite value j_k of j such that the stepsize

α_k = θ^{j_k} ∆_k

computed by Algorithm LSA satisfies conditions (i) and (ii) at Step 2.

Proof We start by proving that if g_k^T d_k < 0, then the number α = θ^j ∆_k satisfies conditions (i) and (ii) for all sufficiently large j. By contradiction, assume first that there exists an infinite set J of j-values such that condition (i) is violated, so that for every j ∈ J we have:

(f(y(j)) − f_k)/(θ^j ∆_k) > γ1 g_k^T d_k − γ2 ∆_k θ^j ‖d_k‖²,

where y(j) := x_k + θ^j ∆_k d_k. Then, taking limits for j ∈ J, j → ∞, we have g_k^T d_k ≥ γ1 g_k^T d_k, which contradicts the assumptions g_k^T d_k < 0 and γ1 < 1.

Suppose now that there exists an infinite set, say it again J, such that for j ∈ J condition (ii) is violated and we have

g(y(j))^T ( −g(y(j)) + [g(y(j))^T (g(y(j)) − g_k)/‖g_k‖²] d_k ) ≥ −δ1‖g(y(j))‖².

Then, taking limits for j ∈ J, j → ∞, we obtain that the inequality (1 − δ1)‖g_k‖² ≤ 0 must be valid, but this contradicts the assumptions that g_k ≠ 0 and 1 > δ1. Finally, suppose that δ2 < ∞ and that for all j ∈ J we have:

g(y(j))^T ( −g(y(j)) + [g(y(j))^T (g(y(j)) − g_k)/‖g_k‖²] d_k ) < −δ2‖g(y(j))‖².

In this case, taking limits for j ∈ J, j → ∞, we obtain the inequality (δ2 − 1)‖g_k‖² ≤ 0, and this contradicts the assumptions g_k ≠ 0 and δ2 > 1.

Under the assumption g_k^T d_k < 0, we can conclude that Step 2 is well defined, by taking j_k as the smallest index for which both conditions (i) and (ii) are satisfied, and letting α_k = θ^{j_k} ∆_k. Then the proof can be completed, by induction, noting that as d_0^T g_0 < 0, we will have g_k^T d_k < 0 for all k. □

The convergence of Algorithm PR is proved in the following theorem, for certain admissible choices of the parameters.

THEOREM 6 Suppose that the stepsize is computed by means of Algorithm LSA, with the conditions stated on the parameters, and let x_k, for k = 0, 1, . . ., be the points generated by Algorithm PR; then either there exists an index ν such that g(x_ν) = 0 and the algorithm terminates, or it produces an infinite sequence with the following properties, depending on the choice of the parameters.

(a) Assume that

ρ2 < ∞ if γ2 = 0;    (45)

then we have lim inf_{k→∞} ‖g_k‖ = 0 and hence there exists a limit point of {x_k} which is a stationary point of f.

(b) Assume that

ρ2 < ∞ and either φ < ∞ or δ2 < ∞;    (46)

then we have lim_{k→∞} ‖g_k‖ = 0 and every limit point of {x_k} is a stationary point of f.

Proof If the algorithm does not terminate at a stationary point, it will produce an infinite sequence {x_k} such that g_k^T d_k < 0 for all k.

We prove first that the conditions of Theorem 1 are satisfied if the stepsize is computed through Algorithm LSA and equation (45) holds. As g_k^T d_k < 0, condition (i) at Step 2 implies

f_k − f_{k+1} ≥ γ1 α_k |g_k^T d_k| + γ2 α_k² ‖d_k‖²,    (47)

so that x_k ∈ L for all k and hence (c1) of Theorem 1 is valid. The instructions at Steps 1 and 2 and assumption (45) imply also that:

α_k‖d_k‖² ≤ ∆_k‖d_k‖² ≤ ρ2 min{φ, |g_k^T d_k|} ≤ ρ2 |g_k^T d_k|, where ρ2 < ∞ if γ2 = 0,    (48)

so that by equations (47) and (48) we obtain:

f_k − f_{k+1} ≥ (γ1/ρ2 + γ2) α_k²‖d_k‖²   if ρ2 < ∞, γ2 ≥ 0;
f_k − f_{k+1} ≥ γ2 α_k²‖d_k‖²             if ρ2 = ∞, γ2 > 0.    (49)

As {f_k} is decreasing and bounded below, it admits a limit, and hence by equation (49) we have that

lim_{k→∞} α_k‖d_k‖ = 0,    (50)

and thus condition (c3) of Theorem 1 holds with σ2(t) ≡ t and σ0(t) ≡ 1 for all t.

In order to establish (c2) we shall make use of Lemma 2, where we assume η_k = α_k. First, we observe that condition (i) implies that f(x_{k+1}) < f(x_k). Then we can distinguish the two cases: α_k = ∆_k and α_k < ∆_k, where ∆_k is the number defined at Step 1.

In the first case, we have obviously

α_k‖d_k‖² ≥ ρ1 min{φ, |g_k^T d_k|}    (51)

and thus condition (a1) of Lemma 2 holds with ρ = ρ1. If α_k < ∆_k, then α_k/θ violates one of the conditions of Step 2. If α_k/θ violates condition (i), we have that (a3) of Lemma 2 holds with λ = 1/θ, γ̃1 = γ1, and γ̃2 = γ2; on the other hand, if α_k/θ violates (ii), then at least one of the conditions (a5) or (a6) of Lemma 2 holds with ν = 1/θ. Thus, by Lemma 2 we have either that

α_k ≥ ρ* min{φ, |g_k^T d_k|}/‖d_k‖²

for some ρ* > 0 and ∞ ≥ φ > 0, or that

α_k ≥ τ* ‖g_k‖²/‖d_k‖²

for some τ* > 0. Using equation (50), we have that (c2) of Theorem 1 holds with

σ1(t) ≡ min{1, ρ*φ, ρ*t},    σ0(t) ≡ min{1, τ*t²}.

It can be concluded that the conditions of Theorem 1 are satisfied, and thus assertion (a) follows from Theorem 1.

In order to complete the proof we must show that in case (b) condition (c4) of Theorem 5 is also valid. Recalling equation (46) and the instructions at Steps 1 and 2, we have

α_k‖d_k‖² ≤ ∆_k‖d_k‖² ≤ ρ2 min{φ, |g_k^T d_k|} ≤ ρ2 min{φ, δ2‖g_k‖²} ≤ ρ2 min{φ, δ2Γ²},

where Γ is the bound for ‖g_k‖ on L. By equation (46) we have that either φ < ∞ or δ2 < ∞, so that condition (c4) is satisfied with σ0(t) ≡ 1. Then, recalling the proof of assertion (a), we can conclude that there exist forcing functions such that all conditions of Theorem 5 are satisfied, and this establishes (b). □

The preceding results show that there exist, in principle, various globally convergent implementations of the unmodified PR method which are simpler and less demanding than those defined in ref. [18]. A first simple model can be obtained by specializing Algorithm LSA into the following restricted Armijo-type linesearch algorithm.

ALGORITHM LSA1 (restricted Armijo)

Data. ∞ > ρ2 > ρ1 > 0, 1 > γ > 0, 1 > θ > 0.
Step 1. Set τ_k = |g_k^T d_k|/‖d_k‖² and choose ∆_k ∈ [ρ1 τ_k, ρ2 τ_k].
Step 2. Compute α_k = max{θ^j ∆_k, j = 0, 1, . . .} such that the vectors x_{k+1} = x_k + α_k d_k and d_{k+1} = −g_{k+1} + β_{k+1} d_k satisfy:

(i) f_{k+1} ≤ f_k + γ α_k g_k^T d_k;

(ii) g_{k+1}^T d_{k+1} < 0 (if g_{k+1} ≠ 0).

It is easily seen that Algorithm LSA1 can be viewed as a special case of Algorithm LSA, where we choose

γ1 = γ, γ2 = 0, δ1 = 0, δ2 = ∞.

Then it follows from Theorem 6 that if Algorithm LSA1 is employed and the PR algorithm does not terminate, we have lim inf_{k→∞} ‖g_k‖ = 0. If we want to impose the stronger property ‖g_k‖ → 0, we can replace condition (ii) at Step 2 with the stronger condition

−δ‖g_{k+1}‖² ≤ g_{k+1}^T d_{k+1} < 0, with δ < ∞,    (52)

which essentially imposes a bound on the component of d_{k+1} parallel to g_{k+1}. Alternatively, we can keep condition (ii) at Step 2 unchanged and then set at Step 1:

τ_k = min{φ, |g_k^T d_k|}/‖d_k‖², with ∞ > φ > 0.

A different model can be defined by replacing condition (i) of Step 2 with a 'parabolic' acceptance rule on the objective function values. In this case, we get the following algorithm, which can be viewed as a simplified version of that introduced in ref. [18].

ALGORITHM LSA2 (parabolic search)

Data. ρ > 0, γ > 0, 1 > θ > 0.
Step 1. Set τ_k = |g_k^T d_k|/‖d_k‖² and choose ∆_k ≥ ρτ_k.
Step 2. Compute α_k = max{θ^j ∆_k, j = 0, 1, . . .} such that the vectors x_{k+1} = x_k + α_k d_k and d_{k+1} = −g_{k+1} + β_{k+1} d_k satisfy the conditions:

(i) f_{k+1} ≤ f_k − γ α_k² ‖d_k‖²;

(ii) g_{k+1}^T d_{k+1} < 0 (if g_{k+1} ≠ 0).

Algorithm LSA2 can be obtained from Algorithm LSA by choosing

ρ2 = ∞, ρ1 = ρ, γ2 = γ, γ1 = 0, δ1 = 0, δ2 = ∞.

By Theorem 6, we have that if an infinite sequence is generated using Algorithm LSA2 within the PR algorithm, there holds the limit

lim inf_{k→∞} ‖g_k‖ = 0.

In comparison with Algorithm LSA1, we can note that an upper bound on the initial stepsize is no longer needed. However, if the stronger property

lim_{k→∞} ‖g_k‖ = 0

has to be enforced, we must introduce the same modifications described above in connection with Algorithm LSA1.

We note that the interpolation phase in the two preceding algorithms can be conveniently performed using a safeguarded cubic interpolation, as both the function and the gradient must be evaluated at each tentative step. This would be compatible with the conditions of Lemma 2, provided that the reduction factor θ_k for the stepsize is uniformly bounded, that is, if we impose the safeguards 0 < θ_l ≤ θ_k ≤ θ_u < 1, where θ_l and θ_u are given constant bounds. Another simple modification could be that of replacing the Armijo-type acceptability criterion with Goldstein-type conditions; in this case the lower bound on the initial stepsize ∆_k can be eliminated. The specific implementation used in the computations will be illustrated in more detail in the sequel.
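The safeguard can be realized as follows; this is a generic cubic-step formula (as found in standard optimization texts), not the authors' Fortran code, and it assumes a bracket [a, b] with function values and directional derivatives available at both ends.

    import math

    def safeguarded_cubic_step(a, fa, fpa, b, fb, fpb, theta_l=0.1, theta_u=0.9):
        # Minimizer of the cubic interpolating (a, fa, fpa) and (b, fb, fpb),
        # clipped so that the new trial step lies in [theta_l * b, theta_u * b],
        # i.e. the reduction factor theta_k stays in [theta_l, theta_u].
        d1 = fpa + fpb - 3.0 * (fa - fb) / (a - b)
        rad = d1 * d1 - fpa * fpb
        if rad < 0.0:
            t = 0.5 * (a + b)                  # no real cubic minimizer: bisect
        else:
            d2 = math.copysign(math.sqrt(rad), b - a)
            denom = fpb - fpa + 2.0 * d2
            t = 0.5 * (a + b) if denom == 0.0 else b - (b - a) * (fpb + d2 - d1) / denom
        return min(max(t, theta_l * b), theta_u * b)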

Alternative globally convergent line search techniques that ensure the property lim_{k→∞} ‖g_k‖ = 0 can also be derived from the results given in ref. [19]. In fact, it is shown there that an Armijo-type algorithm, starting from a unit stepsize, can produce a sufficiently small value of α_k that satisfies the conditions:

f_{k+1} ≤ f_k + γ α_k g_k^T d_k,

g_{k+1}^T d_{k+1} ≤ −σ‖d_{k+1}‖²,

and it is proved that these conditions imply that

lim_{k→∞} ‖g_k‖ = 0.

An inherent limitation of the algorithms considered in this section (at least from a theoretical point of view) is that we cannot guarantee that accurate line searches can be performed, since the acceptance rules used in these algorithms may have the effect of rejecting a 'good' stepsize (satisfying, for instance, Wolfe conditions), even when this is not strictly required for enforcing global convergence.

In order to overcome this difficulty, in ref. [18] an adaptive choice of the parameters was introduced and it was shown that the algorithms defined there eventually accept the optimal stepsize in a neighborhood of a stationary point where the Hessian is positive definite. This technique can easily be extended to the algorithms considered here. As an example, letting

ψ_k = min{1, ‖g_k‖}^τ,

for some τ > 0, we can replace the parameters ρ1, ρ2, ρ, and γ in Algorithm LSA1 and Algorithm LSA2 with the numbers ρ1ψ_k, ρ2/ψ_k, ρψ_k, and γψ_k, respectively, in such a way that the acceptance conditions become less demanding as the gradient converges to zero. However, this is still not entirely satisfactory, since in the quadratic case it does not ensure, in principle, that the algorithm can be identified right at the start with the PR algorithm, unless the parameters are appropriately chosen in relation to the optimal solution.

5. A trust region implementation of the PR method

We introduce here a different model, based on a 'trust region' approach, in which the linesearch algorithm is compatible with Wolfe conditions and even with exact linesearches, whenever the norm of d_k does not exceed a suitable adaptive bound b_k.

The theoretical motivation is that of defining an algorithm model with the following properties:

– Global convergence in the general nonquadratic case is guaranteed.
– The PR formula for the computation of the search directions is unmodified.
– The conjugate gradient method of Hestenes and Stiefel is reobtained in the quadratic case.

From a computational point of view, the objective is that of defining a computational scheme in which the acceptance rules defined in the preceding sections can be relaxed, so that line searches of any desired accuracy can be performed, at least when ‖d_k‖ is not too large.

Before describing this new algorithm, we define formally a procedure based on Wolfe conditions, which also ensures satisfaction of a descent condition on d_{k+1} or terminates with an arbitrarily small value of ‖g_{k+1}‖.


ALGORITHM LSW (modified Wolfe conditions)

Data. 1/2 > γ > 0, μ > γ, ε_k > 0, μ* > 0, δ1 > 0.
Step 1. Compute η_k such that
    (i) f(x_k + η_k d_k) ≤ f_k + γ η_k g_k^T d_k,
    (ii) g(x_k + η_k d_k)^T d_k ≥ μ g_k^T d_k (or: |g(x_k + η_k d_k)^T d_k| ≤ μ|g_k^T d_k|).
Step 2. Compute α_k such that
    either (case 1) the vectors x_{k+1} = x_k + α_k d_k and d_{k+1} = −g_{k+1} + β_{k+1} d_k satisfy:
        (a1) f_{k+1} ≤ f(x_k + η_k d_k),
        (a2) g_{k+1}^T d_{k+1} < −δ1‖g_{k+1}‖²,
        (a3) g_{k+1}^T d_k ≥ μ* g_k^T d_k (or: |g_{k+1}^T d_k| ≤ μ*|g_k^T d_k|),
    or (case 2) the vector x_{k+1} = x_k + α_k d_k satisfies ‖g_{k+1}‖ ≤ ε_k.
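As an illustration only: Step 1 can be delegated to any standard Wolfe routine (here scipy.optimize.line_search, our choice, not the authors' implementation), while Step 2 is sketched as a crude refinement loop that checks (a1) and (a2) and stands in for the convergent inner process described below; condition (a3) is omitted for brevity.

    import numpy as np
    from scipy.optimize import line_search

    def lsw_step(f, grad, x, d, eps_k, gamma=1e-4, mu=0.1, delta1=0.8):
        g0 = grad(x)
        # Step 1: a stepsize eta_k satisfying the Wolfe conditions (i)-(ii).
        eta = line_search(f, grad, x, d, gfk=g0, c1=gamma, c2=mu)[0]
        if eta is None:
            raise RuntimeError("Wolfe linesearch failed")
        f_eta = f(x + eta * d)
        alpha = eta
        for _ in range(30):
            x_new = x + alpha * d
            g_new = grad(x_new)
            if np.linalg.norm(g_new) <= eps_k:     # Case 2
                return alpha, None
            beta = float(g_new @ (g_new - g0)) / float(g0 @ g0)
            d_new = -g_new + beta * d
            if f(x_new) <= f_eta and float(g_new @ d_new) < -delta1 * float(g_new @ g_new):
                return alpha, beta                  # Case 1: (a1) and (a2) hold
            alpha *= 0.5                            # naive stand-in refinement
        return alpha, None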

It can be easily shown that the algorithm is well defined, under the assumption that the level set L is compact and x_k ∈ L.

In fact, under this assumption, it is well known that there exists a finite procedure for computing a point η_k where the Wolfe conditions are satisfied. Starting from this point we can define a convergent minimization process that generates a sequence of stepsizes α(j), for j = 0, 1, . . ., with α(0) = η_k and such that for j → ∞ we have

f(x_k + α(j)d_k) < f(x_k + η_k d_k) and g(x_k + α(j)d_k)^T d_k → 0.

Recalling the PR formulas, we have that conditions (a1)–(a3) will be satisfied in a finite number of steps, unless ‖g(x_k + α(j)d_k)‖ converges to zero. On the other hand, in the latter case, the algorithm will terminate because of the test ‖g(x_k + α(j)d_k)‖ ≤ ε_k.

Then we can define the following scheme, in which we admit the possibility of using either Algorithm LSW defined above or the Armijo-type Algorithm LSA described in the preceding section.

We will refer to Algorithm LSW by using the notation LSW(x, d, ε) to indicate that the algorithm computes a stepsize along d starting from the point x, with termination criterion defined by ε in Case 2.

ALGORITHM PRTR (trust-region PR method)

Data. δ ∈ (0, 1), x_0 ∈ R^n and a sequence {ε_k} such that ε_k → 0.
Initialization. Set x̃_0 = x_0, d_0 = −g_0 and k = 0.
While g_k ≠ 0 do
    Step 1. Define a bound b_k on the search direction.
    Step 2. If ‖d_k‖ ≤ b_k then
            Compute α_k and β_{k+1} using Algorithm LSW(x̃_k, d_k, ε_k).
            If termination occurs in Case 1 then
                set x_{k+1} = x̃_k + α_k d_k, x̃_{k+1} = x_{k+1} and d_{k+1} = −g_{k+1} + β_{k+1} d_k
            Else (termination occurs in Case 2)
                set x_{k+1} = x̃_k + α_k d_k, x̃_{k+1} = x̃_k and d_{k+1} = d_k
            End if
        Else
            Compute α_k using Algorithm LSA and set x_{k+1} = x̃_k + α_k d_k, x̃_{k+1} = x_{k+1} and d_{k+1} = −g_{k+1} + β_{k+1} d_k
        End if
    Step 3. Set k = k + 1
End While
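In outline, Step 2 is just a switch on ‖d_k‖ against b_k. The Python sketch below (our naming; lsw_step and lsa_step are the sketches given earlier, and the LSA initial stepsize is simplified) returns the triple (x_{k+1}, x̃_{k+1}, d_{k+1}).

    import numpy as np

    def prtr_step(f, grad, x_t, d, b_k, eps_k, lsw_step, lsa_step):
        # Step 2 of Algorithm PRTR: Wolfe-based search while ||d_k|| <= b_k,
        # Armijo-type search (Algorithm LSA) otherwise.
        if np.linalg.norm(d) <= b_k:
            alpha, beta = lsw_step(f, grad, x_t, d, eps_k)
            x_new = x_t + alpha * d
            if beta is not None:          # Case 1: move and update the direction
                return x_new, x_new, -grad(x_new) + beta * d
            return x_new, x_t, d          # Case 2: keep x_tilde and d unchanged
        g = grad(x_t)
        tau = abs(float(g @ d)) / float(d @ d)    # simplified Step 1 of LSA
        alpha, x_new, g_new, d_new = lsa_step(f, grad, x_t, d, g, Delta=tau)
        return x_new, x_new, d_new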

We note that some of the technical complications in the preceding scheme and, in particular, the introduction of the variables x̃_k are only motivated by the objective of stating a convergence result for an infinite sequence. In practice, we can replace the test on the gradient norm at Step 2 of Algorithm LSW by the termination test used in the code.

The convergence of Algorithm PRTR is proved in the following theorem under suitable assumptions on the bound b_k imposed at each step on the norm ‖d_k‖.

THEOREM 7 Let x_k be the points generated by Algorithm PRTR and suppose that b_k is defined in a way that the following condition holds:

(H) if lim inf_{k→∞} ‖g_k‖ > 0 then there exists B > 0 such that b_k ≤ B for all k.

Suppose that in Algorithm LSA we choose

δ1 > 0 and (ρ2 < ∞ if γ2 = 0).    (53)

Then either there exists an index ν such that g(x_ν) = 0 and the algorithm terminates, or it produces an infinite sequence such that

lim inf_{k→∞} ‖g_k‖ = 0,

and hence there exists a limit point of {x_k} that is a stationary point of f.

Proof If the algorithm does not terminate at a stationary point, it will produce an infinite sequence {x_k} such that g_k ≠ 0 for all k. Reasoning by contradiction, we can assume that ‖g_k‖ ≥ ε for some ε > 0. Because of Assumption (H) we have that

b_k ≤ B, for all k.

As the acceptance rules imply that x_k ∈ L for all k, every subsequence will have limit points in L. Suppose that there exists an infinite subsequence such that Algorithm LSW is used at x_k and termination occurs in Case 2; in this case, as ε_k converges to zero, the test at Step 2 of Algorithm LSW implies that the corresponding subsequence of points x_{k+1} will converge towards a stationary point and the assertion is proved. Therefore, we can assume that for sufficiently large values of k, when Algorithm LSW is used, termination occurs in Case 1.

Under this assumption, because of the instructions of Algorithm LSW and Algorithm LSA, we have that

g_k^T d_k < −δ1‖g_k‖², for all k.    (54)


Suppose first that there exists an infinite subsequence {x_k}_K such that ‖d_k‖ ≤ b_k ≤ B and Algorithm LSW is employed. By the instructions of Algorithm LSW, we have that there exists η_k > 0 such that

f_{k+1} = f(x_k + α_k d_k) ≤ f(x_k + η_k d_k) ≤ f_k + γ η_k g_k^T d_k, k ∈ K,    (55)

and

g(x_k + η_k d_k)^T d_k ≥ μ g_k^T d_k, k ∈ K.    (56)

Using equations (54) and (55), we can write

f_k − f_{k+1} ≥ γ η_k |g_k^T d_k| > γ δ1 η_k ‖g_k‖², k ∈ K.    (57)

Moreover, the assumptions of Lemma 2 hold and equation (56) implies that condition (a4) of this lemma is valid. This implies that for some ρ* > 0, we have

η_k ≥ ρ* |g_k^T d_k|/‖d_k‖², k ∈ K.    (58)

Therefore, by equations (54), (57), and (58) and the bound ‖d_k‖ ≤ B we get:

f_k − f_{k+1} > (γ δ1² ρ*/B²) ‖g_k‖⁴, k ∈ K.    (59)

As {f_k} is converging to a limit, we have lim_{k→∞}(f_k − f_{k+1}) = 0 and hence from equation (59) we obtain

lim_{k∈K, k→∞} ‖g_k‖ = 0,

which yields a contradiction.

Assume now that K is a finite set; in this case we have that Algorithm LSA will be used for all sufficiently large k and hence the assertion follows directly from Theorem 6. This completes the proof. □

In order to complete the description of the algorithm defined above, we must also specify some rule for defining the bound b_k used at each k. In particular, we require that condition (H) of Theorem 7 is satisfied and that exact linesearches are accepted in Algorithm PRTR when f is quadratic. When f is quadratic and exact linesearches are employed, we obviously have that β_k^{(PR)} = β_k^{(FR)} and hence we can write [see, e.g., ref. 14], for each k:

‖d_k‖² = ‖g_k‖⁴ Σ_{j=0}^{k} ‖g_{k−j}‖^{−2}.    (60)

It follows that a reasonable bound for ‖d_k‖ can be defined by assuming

b_k = b ‖g_k‖² ( Σ_{j=0}^{min{k,n}} ‖g_{k−j}‖^{−2} )^{1/2},    (61)

where b ≥ 1 is a given constant. It is easily seen that condition (H) is satisfied under the assumptions of section 2, so that the bound b_k in Algorithm PRTR can be defined through equation (61). Finally, in order to guarantee that the linear conjugate gradient method is reobtained in the quadratic case, we must also require that the initial stepsize in the linesearch performed at Step 1 of Algorithm LSW is the global minimizer of f along d_k. This can be achieved, for instance, by performing two function evaluations along d_k and then using a (safeguarded) quadratic interpolation formula for computing the initial tentative stepsize.
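Since the sum in (61) only involves the last min{k, n} + 1 gradients, the bound can be maintained incrementally; a small Python helper of ours, using, e.g., b = 1.5 as in the experiments of section 7:

    import numpy as np
    from collections import deque

    class DirectionBound:
        # Maintains b_k of equation (61):
        #   b_k = b * ||g_k||^2 * ( sum_{j=0}^{min{k,n}} ||g_{k-j}||^{-2} )^{1/2}.
        def __init__(self, n, b=1.5):           # b >= 1 required
            self.b = b
            self.window = deque(maxlen=n + 1)   # at most min{k, n} + 1 terms
        def update(self, g):
            gn2 = float(g @ g)
            self.window.append(1.0 / gn2)
            return self.b * gn2 * np.sqrt(sum(self.window))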


6. Trust region version of a modified PR method

In the general case, it follows from Powell's counterexamples that the algorithm defined in the preceding section may not permit exact linesearches when the bound on ‖d_k‖ is violated. However, we can remove this restriction by modifying the PR method on the basis of the trust region approach proposed in this section. This yields an alternative technique for enforcing global convergence, which does not require restarting along the steepest descent direction as in ref. [16], but still admits the possibility of performing exact linesearches.

This version of the PR method can be based on the following two modifications of the basic scheme:

(i) A distinction is introduced between the stepsize η_k used for determining β_{k+1} (which is identified with the value that satisfies Wolfe conditions) and the actual stepsize α_k used for computing the new point x_{k+1}.

(ii) A rescaling of β_{k+1} is performed each time that an adaptive bound on the size of the current search direction d_k is violated.

We introduce these new features by modifying Algorithm LSW according to the following scheme, where we assume that a suitable bound b_k is given in input.

ALGORITHM LSW1

Data. 1/2 > γ > 0, μ > γ, ε_k > 0, μ* > 0, δ1 > 0, b_k > 0.
Step 1. Compute η_k such that
    (i) f(x_k + η_k d_k) ≤ f_k + γ η_k g_k^T d_k,
    (ii) g(x_k + η_k d_k)^T d_k ≥ μ g_k^T d_k (or: |g(x_k + η_k d_k)^T d_k| ≤ μ|g_k^T d_k|).
Step 2. Compute

β*_{k+1} = min{1, b_k/‖d_k‖} g(x_k + η_k d_k)^T (g(x_k + η_k d_k) − g_k)/‖g_k‖².    (62)

Step 3. Compute α_k such that
    either (case 1) the vectors x_{k+1} = x_k + α_k d_k and d_{k+1} = −g_{k+1} + β*_{k+1} d_k satisfy:
        (a1) f_{k+1} ≤ f(x_k + η_k d_k),
        (a2) g_{k+1}^T d_{k+1} < −δ1‖g_{k+1}‖²,
        (a3) g_{k+1}^T d_k ≥ μ* g_k^T d_k (or: |g_{k+1}^T d_k| ≤ μ*|g_k^T d_k|),
    or (case 2) the vector x_{k+1} = x_k + α_k d_k satisfies ‖g_{k+1}‖ ≤ ε_k.
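Equation (62) is the only point where LSW1 departs from LSW; a one-line Python rendering (our naming) makes the rescaling explicit.

    import numpy as np

    def beta_star(g_eta, g, d, b_k):
        # Equation (62): PR value at the Wolfe point x_k + eta_k d_k, damped by
        # min{1, b_k/||d_k||} so that |beta*| shrinks whenever ||d_k|| > b_k.
        scale = min(1.0, b_k / np.linalg.norm(d))
        return scale * float(g_eta @ (g_eta - g)) / float(g @ g)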

Then we can define the following algorithm, where we use again the notation LSW1(x, d, ε) to indicate that Algorithm LSW1 computes a stepsize along d starting from the point x, with termination criterion defined by ε in Case 2.

ALGORITHM MPRTR (modified trust-region PR method)

Data. δ ∈ (0, 1), x_0 ∈ R^n and a sequence {ε_k} such that ε_k → 0.
Initialization. Set d_0 = −g_0, x̃_0 = x_0 and k = 0.
While g_k ≠ 0 do
    Step 1. Define a bound b_k on the search direction.
    Step 2. Compute β*_{k+1} and α_k through Algorithm LSW1(x̃_k, d_k, ε_k).
        If termination occurs in Case 1 then set
            x_{k+1} = x̃_k + α_k d_k, x̃_{k+1} = x_{k+1},
            d_{k+1} = −g_{k+1} + β*_{k+1} d_k
        Else (termination occurs in Case 2) set
            x_{k+1} = x̃_k + α_k d_k, x̃_{k+1} = x̃_k, b_{k+1} = b_k and d_{k+1} = d_k
        End if
    Step 3. Set k = k + 1
End While

The convergence of this scheme is established in the following theorem, whose proof can be derived from the proof of the preceding theorem and the proof of Theorem 1.

THEOREM 8 Let x_k be the points generated by Algorithm MPRTR and suppose that b_k is defined in a way that condition (H) of Theorem 7 is satisfied. Then either there exists an index ν such that g(x_ν) = 0 and the algorithm terminates, or it produces an infinite sequence such that

lim inf_{k→∞} ‖g_k‖ = 0,

and hence there exists a limit point of {x_k} that is a stationary point of f.

Proof Reasoning by contradiction, as in the proof of Theorem 7, we can assume that the algorithm produces an infinite sequence of points x_k ∈ L such that, for all k, we have ‖g_k‖ ≥ ε for some ε > 0. By (a2) of Algorithm LSW1 we have

|g_k^T d_k| > δ1‖g_k‖².    (63)

Moreover, recalling the instructions of Algorithm LSW1 and using the same arguments employed in the proof of the preceding theorem, we can establish that there exists η_k > 0 such that, for all k, we have:

f_k − f_{k+1} ≥ γ η_k |g_k^T d_k| > γ δ1 η_k ‖g_k‖²    (64)

and

η_k ≥ ρ* |g_k^T d_k|/‖d_k‖²,    (65)

for some ρ* > 0. By equation (64) and the compactness assumptions on L we have that the sequence {f(x_k)} has a limit, so that

lim_{k→∞} η_k = 0.    (66)

Now, taking into account the expression of d_k, the definition of β*_k, and the assumption on b_k, we can write for all k:

‖d_{k+1}‖ ≤ ‖g_{k+1}‖ + |β*_{k+1}| ‖d_k‖
          ≤ ‖g_{k+1}‖ + min{1, b_k/‖d_k‖} ‖g(x_k + η_k d_k)‖ ‖g(x_k + η_k d_k) − g_k‖/‖g_k‖² ‖d_k‖    (67)
          ≤ Γ + (ΓBL η_k/ε²) ‖d_k‖,

where Γ is a bound on the gradient norm. Recalling equation (66) and letting q ∈ (0, 1), we have that there exists a sufficiently large index k1 such that:

(ΓBL/ε²) η_k ≤ q < 1, for all k ≥ k1,    (68)

so that

‖d_{k+1}‖ ≤ Γ + q‖d_k‖, for all k ≥ k1.    (69)

From equation (69), recalling again a known inequality (ref. 21, Lemma 1, p. 44), we obtain:

‖d_{k+1}‖ ≤ Γ/(1 − q) + (‖d_{k1}‖ − Γ/(1 − q)) q^{k+1−k1}, for all k ≥ k1,

and this proves that d_k is bounded for all k, that is, ‖d_k‖ ≤ M for some M > 0. Therefore, from this bound, recalling equations (63)–(65), we get:

f_k − f_{k+1} > (γ δ1² ρ*/M²) ‖g_k‖⁴.    (70)

As (f_k − f_{k+1}) → 0, equation (70) yields a contradiction. □

7. Numerical results

Some of the algorithms introduced in this article have been tested on a set of large problems, already used in refs. [5,26], where appropriate references can be found. The main motivation of these experiments was that of verifying whether the strategies defined here for enforcing global convergence may have negative effects, from a computational point of view, on some of the best known problems, where the (unmodified) PR method shows a relatively good behavior. More specifically, two different codes have been tested: one (Algorithm PRTR) based on the trust region implementation introduced in section 5 and the other (Algorithm MPRTR) based on the modified PR iteration defined in section 6. In both cases, the condition (61) was employed, in correspondence to different values of b > 1, and in each linesearch the objective function value was computed at least at two different points.

All the experiments were performed using double precision Compaq Fortran 90 codes on a Windows NT workstation, with the termination criterion

‖g_k‖_∞ ≤ 10^{−5}(1 + |f(x_k)|).

Some of the choices that have been made are illustrated below in correspondence to each algorithm.
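In code, this stopping test reads as follows (a sketch; the variable names are ours).

    import numpy as np

    def converged(g, fx, tol=1e-5):
        # Termination criterion used in the experiments:
        #   ||g_k||_inf <= 1e-5 * (1 + |f(x_k)|)
        return float(np.max(np.abs(g))) <= tol * (1.0 + abs(fx))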

Algorithm PRTR: The algorithm has been implemented by associating the modified Wolfe linesearch (Algorithm LSW) with a parabolic linesearch based on Algorithm LSA2, using condition (61) as the switching rule. In Algorithm LSW, we set γ = 10^{−4}, μ = μ* = 0.1, δ1 = 0.8, and the test at Step 2 on the gradient norm was replaced by the termination test defined above. The Wolfe linesearch was performed using a safeguarded cubic interpolation and employing a constant extrapolation factor equal to 2.

Algorithm LSA2 was implemented by employing modified Goldstein-type acceptability conditions (without specifying a lower bound on the initial stepsize, but including a tentative expansion step when all acceptance conditions are satisfied); the interpolation and the extrapolation phases were carried out as in Algorithm LSW.


Table 1. Results with Algorithm PRTR for b = 1.5.

Problem                        n      ni     nf      f             ‖g‖          nmod

Calculus of variations 2     200     686   1381   0.52180D+02   0.20238D−02     0
Calculus of variations 3     200    2523   5053  −0.14720D+00   0.37441D−04     0
Generalized Rosenbrock       500    1045   2100   0.10000D+01   0.29356D−04     0
Calculus of variations 2     500    1900   3817   0.52180D+02   0.20509D−02     0
Calculus of variations 3     500    6582  13,171 −0.14720D+00   0.61570D−04     0
Variably dimensioned         500       0     19   0.19870D−24   0.57633D−08     0
Linear min. surface          961     143    297   0.90000D+03   0.72369D−01     0
Strictly convex 1           1000       2     16   0.10000D+04   0.87888D−01     0
Strictly convex 2           1000      11     32   0.50052D+05   0.52712D+01     0
Oren's power                1000     106    337   0.12487D−07   0.32025D−04    69
Generalized Rosenbrock      1000    2065   4140   0.10000D+01   0.40163D−04     0
Penalty 1                   1000       3     36   0.96771D+00   0.15658D−04     0
Penalty 3                   1000     188    396   0.17690D+02   0.16258D−02     0
Ext. Powell singular        1000      47    132   0.12666D−06   0.70344D−04    19
Tridiagonal 1               1000     263    530   0.62386D+00   0.47621D−04     0
Boundary-value prob.        1000      39     81   0.28904D−08   0.35714D−04     0
Broyden trid. nonlinear     1000      20     50   0.78130D−11   0.27774D−04     0
Ext. Freud. and Roth        1000       7     39   0.12147D+06   0.24113D+01     1
Wrong. extended Wood        1000      42    108   0.39379D+01   0.80084D−04     0
Matrix square root ns = 1   1000    1314   2638   0.49514D−08   0.89296D−04     0
Matrix square root ns = 2   1000    1314   2638   0.49514D−08   0.89296D−04     0
Sp. matrix square root      1000     109    229   0.88914D−09   0.50968D−04     0
Extended Rosenbrock         1000      23     75   0.30738D−17   0.38721D−08     7
Extended Powell             1000      47    132   0.12666D−06   0.70346D−04    19
Tridiagonal 2               1000     273    555   0.49252D−12   0.28351D−04     0
Trigonometric               1000      40     96   0.22754D−06   0.21198D−04     0
Variably dimensioned        1000       0     58   0.23088D−17   0.55525D−04     0
Strictly convex 1         10,000       1     13   0.10000D+05   0.97246D+00     0
Strictly convex 2         10,000       4     18   0.50041D+07   0.11052D+04     0
Oren's power              10,000     417    964   0.37180D−07   0.10137D−03     7
Penalty 1                 10,000       1     27   0.98918D+01   0.85829D−03     0
Ext. Powell singular      10,000      47    132   0.12666D−05   0.22242D−03    19
Tridiagonal 1             10,000     868   1767   0.62386D+00   0.85535D−04     0
Extended ENGVL1           10,000       6     35   0.11099D+05   0.12119D+00     0
Ext. Freud. and Roth      10,000       6     37   0.12165D+07   0.37368D+01     0
Wrong. extended Wood      10,000      58    142   0.39379D+01   0.38725D−04     0
Sp. matrix square root    10,000     172    355   0.54795D−08   0.10657D−03     0
Extended Rosenbrock       10,000      23     75   0.30738D−16   0.12245D−07     7
Extended Powell           10,000      47    132   0.12666D−05   0.22247D−03    19
Tridiagonal 2             10,000     901   1842   0.58199D−12   0.57112D−04     0
Trigonometric             10,000       4     17   0.49172D−06   0.41844D−03     0
Penalty 1 2nd ver.        10,000       1     27   0.98918D+01   0.85829D−03     0

Algorithm MPRTR: We used Algorithm LSW1 with the choices defined above within a slightly modified version of Algorithm MPRTR. In fact, when the bound on $\|d_k\|$ is violated, the tentative value of $\beta_{k+1}$ must be rescaled by the factor $b_k/\|d_k\|$ as indicated; however, if $\eta_k$ is the stepsize satisfying the Wolfe conditions and if $g(x_k + \eta_k d_k)^T d_k \ge 0$, then a local minimizer has been bracketed and an accurate linesearch would produce $\alpha_k \le \eta_k$, so that the convergence proof is unaffected if we compute $\beta_{k+1}$ in correspondence to $\alpha_k$. In this situation, we defined $\beta_{k+1}^*$ during the line search as
\[
\beta_{k+1}^* = \min\left\{1, \frac{b_k}{\|d_k\|}\right\}
\frac{g(x_k + \alpha_k d_k)^T \left( g(x_k + \alpha_k d_k) - g_k \right)}{\|g_k\|^2}.
\]
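In code, the rescaled coefficient and the resulting direction update take roughly the following form; this is a hedged sketch in the paper's notation (NumPy arrays standing for $g_k$, $g_{k+1}$, $d_k$, with `b_k` the bound from condition (61)), not the authors' Fortran implementation:

```python
import numpy as np

def pr_beta_rescaled(g_new, g_old, d_old, b_k):
    # Polak-Ribiere numerator g_{k+1}^T (g_{k+1} - g_k) over ||g_k||^2,
    # rescaled by min{1, b_k/||d_k||} so that the bound on ||d_k|| is
    # enforced; here g_new plays the role of g(x_k + alpha_k d_k).
    scale = min(1.0, b_k / np.linalg.norm(d_old))
    return scale * g_new.dot(g_new - g_old) / g_old.dot(g_old)

def next_direction(g_new, g_old, d_old, b_k):
    # d_{k+1} = -g_{k+1} + beta*_{k+1} d_k, as in the PR iteration.
    return -g_new + pr_beta_rescaled(g_new, g_old, d_old, b_k) * d_old
```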

In each table, we show the number of iterations (ni), the number of function evaluations (nf), the objective function value (f), the gradient norm at the solution (‖g‖), and the number of times (nmod) that the bound on $\|d_k\|$ is violated, so that the standard PR step is modified.


In table 1, we report the results obtained by employing Algorithm PRTR, using weak Wolfe conditions, and letting b = 1.5 in condition (61).

By comparing, whenever possible, the results obtained with those given in the references cited above, we note that the behavior of the algorithm is essentially similar to that of the standard PR method in most cases. Only in a very limited number of cases is nmod greater than zero. This is probably one of the most significant results of our experimentation, since it implies that the bound defined in equation (61) is an effective way of monitoring the behavior of the PR method, in spite of the fact that the test problems are not quadratic. We note also that when nmod > 0, the modification does not deteriorate the behavior of the PR method. If the value of b is reduced, say to b = 1.1, we did not observe significant changes in the results, although nmod may increase considerably on some problems.

In table 2, we report the results obtained by running Algorithm MPRTR on the same test set and for the same value b = 1.5.

Table 2. Results with Algorithm MPRTR for b = 1.5.

Problem                        n      ni     nf      f             ‖g‖          nmod

Calculus of variations 2     200     686   1381   0.52180D+02   0.20238D−02     0
Calculus of variations 3     200    2523   5053  −0.14720D+00   0.37441D−04     0
Generalized Rosenbrock       500    1045   2100   0.10000D+01   0.29356D−04     0
Calculus of variations 2     500    1900   3817   0.52180D+02   0.20509D−02     0
Calculus of variations 3     500    6582  13,171 −0.14720D+00   0.61570D−04     0
Variably dimensioned         500       0     19   0.19870D−24   0.57633D−08     0
Linear min. surface          961     143    297   0.90000D+03   0.72369D−01     0
Strictly convex 1           1000       2     16   0.10000D+04   0.87888D−01     2
Strictly convex 2           1000      11     32   0.50052D+05   0.52712D+01     0
Oren's power                1000     133    398   0.33911D−07   0.47309D−04    15
Generalized Rosenbrock      1000    2065   4140   0.10000D+01   0.40163D−04     0
Penalty 1                   1000       3     36   0.96771D+00   0.15658D−04     0
Penalty 3                   1000     188    396   0.17690D+02   0.16258D−02     0
Ext. Powell singular        1000      65    169   0.12569D−05   0.70345D−04    15
Tridiagonal 1               1000     263    530   0.62386D+00   0.47621D−04     0
Boundary-value prob.        1000      39     81   0.28904D−08   0.35714D−04     0
Broyden trid. nonlinear     1000      20     50   0.78130D−11   0.27774D−04     0
Ext. Freud. and Roth        1000       7     39   0.12147D+06   0.19383D+01     1
Wrong. extended Wood        1000      42    108   0.39379D+01   0.80084D−04     2
Matrix square root ns = 1   1000    1314   2638   0.49514D−08   0.89296D−04     0
Matrix square root ns = 2   1000    1314   2638   0.49514D−08   0.89296D−04     0
Sp. matrix square root      1000     109    229   0.88914D−09   0.50968D−04     0
Extended Rosenbrock         1000      52    141   0.26998D−09   0.21273D−03    39
Extended Powell             1000      65    169   0.12565D−05   0.70320D−04    15
Tridiagonal 2               1000     273    555   0.49252D−12   0.28351D−04     0
Trigonometric               1000      40     96   0.22754D−06   0.21198D−04     0
Variably dimensioned        1000       0     58   0.23088D−17   0.55525D−04     0
Strictly convex 1         10,000       1     13   0.10000D+05   0.97246D+00     1
Strictly convex 2         10,000       4     18   0.50041D+07   0.11052D+04     0
Oren's power              10,000     413    954   0.35532D−07   0.95077D−04     7
Penalty 1                 10,000       1     27   0.98918D+01   0.85829D−03     0
Ext. Powell singular      10,000      66    170   0.12629D−04   0.23486D−03    16
Tridiagonal 1             10,000     868   1767   0.62386D+00   0.85535D−04     0
Extended ENGVL1           10,000       6     35   0.11099D+05   0.12119D+00     0
Ext. Freud. and Roth      10,000       6     37   0.12165D+07   0.37368D+01     0
Wrong. extended Wood      10,000      58    142   0.39379D+01   0.38725D−04     2
Sp. matrix square root    10,000     172    355   0.54795D−08   0.10657D−03     0
Extended Rosenbrock       10,000      52    141   0.26987D−08   0.67256D−03    39
Extended Powell           10,000      66    170   0.12641D−04   0.23521D−03    16
Tridiagonal 2             10,000     901   1842   0.58199D−12   0.57112D−04     0
Trigonometric             10,000       4     17   0.49172D−06   0.41844D−03     0
Penalty 1 2nd ver.        10,000       1     27   0.98918D+01   0.85829D−03     0


We obviously have a set of problems that are unaffected by the test on $\|d_k\|$, and in almost all cases we obtain the same results as with Algorithm PRTR; the only exceptions are the problems Oren's power, Extended Powell, and Extended Rosenbrock, where a (marginal) deterioration is observed. However, if b is changed to the value b = 1.1, the behavior on several problems is appreciably affected (negatively in most cases). This seems to indicate that the modified PR iteration may not be particularly advantageous, but further experimentation may be needed. The complete results are shown in table 3.

Table 3. Results with Algorithm MPRTR for b = 1.1.

Problem                        n      ni     nf      f             ‖g‖          nmod

Calculus of variations 2     200     686   1381   0.52180D+02   0.20238D−02      0
Calculus of variations 3     200    4021   8049  −0.14720D+00   0.40805D−04   3470
Generalized Rosenbrock       500    1045   2100   0.10000D+01   0.29356D−04      0
Calculus of variations 2     500    1900   3817   0.52180D+02   0.20509D−02      0
Calculus of variations 3     500    6231  12,469 −0.14720D+00   0.63911D−04   4899
Variably dimensioned         500       0     19   0.19870D−24   0.57633D−08      0
Linear min. surface          961     143    297   0.90000D+03   0.72369D−01      0
Strictly convex 1           1000       2     16   0.10000D+04   0.87888D−01      2
Strictly convex 2           1000      11     32   0.50052D+05   0.52712D+01      0
Oren's power                1000     147    419   0.29055D−07   0.49253D−04     45
Generalized Rosenbrock      1000    2065   4140   0.10000D+01   0.40163D−04      0
Penalty 1                   1000       3     36   0.96771D+00   0.15658D−04      0
Penalty 3                   1000     188    396   0.17690D+02   0.16258D−02      0
Ext. Powell singular        1000      83    210   0.13861D−05   0.17669D−03     42
Tridiagonal 1               1000     263    530   0.62386D+00   0.47621D−04      0
Boundary-value prob.        1000      39     81   0.28904D−08   0.35714D−04      0
Broyden trid. nonlinear     1000      20     50   0.78130D−11   0.27774D−04      0
Ext. Freud. and Roth        1000       8     41   0.12147D+06   0.79779D+00      2
Wrong. extended Wood        1000      33     90   0.39379D+01   0.32977D−04      5
Matrix square root ns = 1   1000    1314   2638   0.49514D−08   0.89296D−04      0
Matrix square root ns = 2   1000    1314   2638   0.49514D−08   0.89296D−04      0
Sp. matrix square root      1000     109    229   0.88914D−09   0.50968D−04      0
Extended Rosenbrock         1000     127    283   0.67676D−09   0.23199D−03    117
Extended Powell             1000      80    204   0.20376D−05   0.13538D−03     38
Tridiagonal 2               1000     273    555   0.49252D−12   0.28351D−04      0
Trigonometric               1000      40     96   0.22754D−06   0.21198D−04      0
Variably dimensioned        1000       0     58   0.23088D−17   0.55525D−04      0
Strictly convex 1         10,000       1     13   0.10000D+05   0.97246D+00      1
Strictly convex 2         10,000       4     18   0.50041D+07   0.11052D+04      0
Oren's power              10,000     467   1062   0.33815D−07   0.75041D−04     91
Penalty 1                 10,000       1     27   0.98918D+01   0.85829D−03      0
Ext. Powell singular      10,000      79    204   0.34745D−05   0.15067D−03     40
Tridiagonal 1             10,000     868   1767   0.62386D+00   0.85535D−04      0
Extended ENGVL1           10,000       6     35   0.11099D+05   0.12119D+00      0
Ext. Freud. and Roth      10,000       6     37   0.12165D+07   0.35840D+01      1
Wrong. extended Wood      10,000      62    151   0.39379D+01   0.65649D−04      6
Sp. matrix square root    10,000     172    355   0.54795D−08   0.10657D−03      0
Extended Rosenbrock       10,000     127    283   0.67676D−08   0.73363D−03    117
Extended Powell           10,000      78    201   0.74650D−05   0.28024D−03     39
Tridiagonal 2             10,000     901   1842   0.58199D−12   0.57112D−04      0
Trigonometric             10,000       4     17   0.49172D−06   0.41844D−03      0
Penalty 1 2nd ver.        10,000       1     27   0.98918D+01   0.85829D−03      0


8. Concluding remarks

The results obtained in this article show that there are different ways of implementing the PR method so that global convergence is guaranteed and computational efficiency is not deteriorated. Generally speaking, we can follow two basic approaches:

(a) Keep the PR iteration unmodified and employ modified linesearch rules, which guarantee that the sequence {d_k} remains bounded whenever the gradient norm is bounded away from zero.

(b) Modify the PR iteration so that the growth of {d_k} is controlled, while permitting exact linesearches along the (modified) search directions.

Both strategies can be realized in such a way that the linear conjugate gradient method is reobtained in the quadratic case. The method proposed by Powell [16] can be viewed as one (extreme) example of the latter strategy, although it is not easily comparable with the trust region modification introduced here.

Numerical results do not indicate that the convergence results given here improve the behavior of the PR method in 'nonpathological' cases. However, the trust region approach proposed in this article could form the basis of alternative, more sophisticated implementations.

Acknowledgements

The authors are grateful to the referees for their useful comments and suggestions. This work was supported by MIUR, FIRB Research Program Large-Scale Nonlinear Optimization, Rome, Italy.

References

[1] Hestenes, M.R. and Stiefel, E.L., 1952, Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49, 409–436.
[2] Dixon, L.C.W., 1973, Nonlinear optimization: A survey of the state of the art. In: D.J. Evans (Ed.) Software for Numerical Mathematics (New York: Academic Press), pp. 193–216.
[3] Fletcher, R., 1987, Practical Methods of Optimization (New York: John Wiley and Sons).
[4] Fletcher, R. and Reeves, C.M., 1964, Function minimization by conjugate gradients. Computer Journal, 7, 149–154.
[5] Gilbert, J.C. and Nocedal, J., 1992, Global convergence of conjugate gradient methods for optimization. SIAM Journal on Optimization, 2, 21–42.
[6] Hestenes, M.R., 1980, Conjugate Direction Methods in Optimization (New York: Springer-Verlag).
[7] Khoda, K.M., Liu, Y. and Storey, C., 1992, Generalized Polak–Ribière algorithm. Journal of Optimization Theory and Applications, 75, 345–354.
[8] Liu, Y. and Storey, C., 1991, Efficient generalized conjugate gradient algorithms, part 1: Theory. Journal of Optimization Theory and Applications, 69, 129–137.
[9] Polak, E. and Ribière, G., 1969, Note sur la convergence de méthodes de directions conjuguées. Revue Française d'Informatique et de Recherche Opérationnelle, 16, 35–43.
[10] Shanno, D.F., 1985, Globally convergent conjugate gradient algorithms. Mathematical Programming, 33, 61–67.
[11] Shanno, D.F., 1978, Conjugate gradient methods with inexact searches. Mathematics of Operations Research, 3, 244–256.
[12] Shanno, D.F., 1978, On the convergence of a new conjugate gradient algorithm. SIAM Journal on Numerical Analysis, 15, 1247–1257.
[13] Zoutendijk, G., 1970, Nonlinear programming computational methods. In: J. Abadie (Ed.) Integer and Nonlinear Programming (Amsterdam: North-Holland), pp. 37–86.
[14] Al-Baali, M., 1985, Descent property and global convergence of the Fletcher–Reeves method with inexact line search. IMA Journal of Numerical Analysis, 5, 121–124.


[15] Powell, M.J.D., 1984, Nonconvex minimization calculations and the conjugate gradient method. In: Lecture Notes in Mathematics 1066 (Berlin: Springer-Verlag), pp. 122–141.
[16] Powell, M.J.D., 1986, Convergence properties of algorithms for nonlinear optimization. SIAM Review, 28, 487–500.
[17] Dai, Y.H., Han, J.Y., Liu, G.H., Sun, D.F., Yin, H.X. and Yuan, Y., 1999, Convergence properties of nonlinear conjugate gradient methods. SIAM Journal on Optimization, 10, 345–358.
[18] Grippo, L. and Lucidi, S., 1997, A globally convergent version of the Polak–Ribière conjugate gradient method. Mathematical Programming, 78, 375–391.
[19] Dai, Y.H., 2002, Conjugate gradient methods with Armijo-type line searches. Acta Mathematicae Applicatae Sinica (English Series), 18(1), 123–130.
[20] Ortega, J.M. and Rheinboldt, W.C., 1970, Iterative Solution of Nonlinear Equations in Several Variables (New York: Academic Press).
[21] Polyak, B.T., 1987, Introduction to Optimization (New York: Optimization Software Inc.).
[22] Bazaraa, M.S., Sherali, H.D. and Shetty, C.M., 1993, Nonlinear Programming (New York: John Wiley and Sons).
[23] Bertsekas, D.P., 1999, Nonlinear Programming (2nd edn) (Athena Scientific).
[24] De Leone, R., Gaudioso, M. and Grippo, L., 1984, Stopping criteria for linesearch methods without derivatives. Mathematical Programming, 30, 285–300.
[25] Grippo, L., Lampariello, F. and Lucidi, S., 1988, Global convergence and stabilization of unconstrained minimization methods without derivatives. Journal of Optimization Theory and Applications, 56, 385–406.
[26] Raydan, M., 1997, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization, 7, 26–33.