
A Line Search Penalty Method for Nonlinear Optimization

Richard H. Byrd† Gabriel Lopez-Calva‡ Jorge Nocedal§

September 13, 2008

Abstract

Line search algorithms for nonlinear programming must include safeguards to enjoy global convergence properties. This paper describes an exact penalization approach that extends the class of problems that can be solved with line search SQP methods. In the new algorithm, the penalty parameter is adjusted at every iteration to ensure sufficient progress in linear feasibility and to promote acceptance of the step. A trust region is used to assist in the determination of the penalty parameter (but not in the step computation). It is shown that the algorithm enjoys favorable global convergence properties. Numerical experiments illustrate the behavior of the algorithm on various difficult situations.

1 Introduction

Exact penalty methods have proved to be effective techniques for solving difficult nonlinear programs. They overcome the difficulties posed by inconsistent constraint linearizations [13] and are successful in solving certain classes of problems in which standard constraint qualifications are not satisfied [2, 3, 21, 19]. Despite their appeal, it has proved difficult to design penalty methods that perform well over a wide range of problems; the main difficulty lies in choosing appropriate values of the penalty parameter. Various approaches proposed in the literature update the penalty parameter only if convergence to an undesirable point appears to be taking place [10, 18, 17]. This can result in inefficient behavior and requires heuristics to determine when to change the penalty parameter.

Recently, a new strategy for updating the penalty parameter has been proposed in the context of trust region methods [6, 8].

†Department of Computer Science, University of Colorado, Boulder, CO 80309. This author was supported by National Science Foundation grant CMMI 0728190.

‡Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208. This author was supported by Department of Energy grant DE-FG02-87ER25047-A004.

§Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208. This author was supported by National Science Foundation grants CCR-0219438 and CCF-0514772.


In that approach, the penalty parameter is selected at every iteration so that sufficient progress toward feasibility and optimality is guaranteed, to first order. This requires that an auxiliary subproblem (a linear program) be solved in certain cases. A particular implementation of this approach has been incorporated into the active-set method of the knitro [9, 28] package and has proved to be effective in practice. The technique just mentioned relies on the fact that the optimization algorithm is of the trust region kind.

In this paper we describe line search penalty methods for nonlinear programming. These methods attack the general nonlinear programming problem

minimize f(x) (1.1a)

subject to gi(x) ≥ 0, i ∈ I, (1.1b)

hi(x) = 0, i ∈ E (1.1c)

by performing an unconstrained minimization of the exact penalty function

φπ(x) = f(x) + π v(x), (1.2)

where v(x) is a measure of constraint violation and π is the penalty parameter. Unlike trust region methods, which control the quality and length of the steps, line search methods can produce very large and unproductive search directions in the neighborhood of points where standard regularity conditions are not satisfied. The penalty approach proposed in this paper relaxes the constraints and ensures that steady progress toward the solution can be made in certain difficult situations. To achieve this goal, the algorithm updates an auxiliary trust region that helps determine the adequacy of the penalty parameter. The algorithm is nevertheless a pure line search method because the step computation does not depend on the auxiliary trust region; only the choice of the penalty parameter depends on it.

The penalty approach proposed in this paper is applicable to a variety of line search methods; for concreteness we focus our discussion on a sequential quadratic programming (SQP) method. The new algorithm incurs an additional cost compared with classical line search methods. At those iterations in which the penalty parameter must be adjusted, an auxiliary linear program must be solved and the SQP step must be recomputed one or more times using larger values of the penalty parameter. The hope is that this extra work pays off by yielding savings in iterations and by improving the robustness of the iteration. An attractive feature of the new algorithm is that it treats all problems (regular or deficient) equally and does not need to resort to a feasibility restoration phase when progress is not achieved.

In the next section, we present the new line search SQP algorithm, giving particular attention to the dynamic strategy for updating the penalty parameter. The convergence properties of the algorithm are analyzed in Section 3, and numerical experiments are reported in Section 4.


2 A Line Search Penalty Method

We use the 1-norm of the constraint violation as the measure of infeasibility for the nonlinear program (1.1). Thus the penalty function is defined by (1.2) with

v(x) = ∑i∈I [gi(x)]− + ∑i∈E |hi(x)|, (2.1)

where [gi(x)]− = max{−gi(x), 0} is the negative part of gi(x).

It is well known (see for example [20]) that when multipliers exist, stationary points of the nonlinear problem (1.1) are also stationary points of the exact penalty function (1.2) for all sufficiently large values of π. Conversely, and more important from the standpoint of practical penalty methods, any stationary point of the exact penalty function (1.2) that is feasible for problem (1.1) is a stationary point of (1.1).
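For illustration, the following minimal sketch shows how the infeasibility measure (2.1) and the penalty function (1.2) can be evaluated. It is written in Python with NumPy purely for exposition; it is not part of the MATLAB implementation described in Section 4, and the function and variable names are illustrative only.

import numpy as np

def violation(g_vals, h_vals):
    # v(x) = sum_{i in I} [g_i(x)]^- + sum_{i in E} |h_i(x)|, with [y]^- = max(-y, 0)   (2.1)
    return np.maximum(-np.asarray(g_vals, dtype=float), 0.0).sum() \
           + np.abs(np.asarray(h_vals, dtype=float)).sum()

def penalty(f_val, g_vals, h_vals, pi):
    # phi_pi(x) = f(x) + pi * v(x)   (1.2)
    return f_val + pi * violation(g_vals, h_vals)

# Check against iteration 0 of Table 1 (problem (4.1) at x = (-3, 1, 1), pi = 1):
x = np.array([-3.0, 1.0, 1.0])
h_vals = [x[0]**2 + 1.0 - x[1], x[0] - 1.0 - x[2]]    # equality constraints
g_vals = [x[1], x[2]]                                  # bound constraints x2, x3 >= 0
print(violation(g_vals, h_vals))                       # 14.0
print(penalty(x[0], g_vals, h_vals, pi=1.0))           # 11.0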

Exact penalty methods attempt to find stationary points of the nonlinear program (1.1) by minimizing the penalty function (1.2), and use the exogenous penalty parameter π as a control to promote feasibility. Two key questions arise: a) How can we find stationary points of the nonsmooth exact penalty function φπ, for a fixed value of π? b) How should we update the penalty parameter π?

The first of these issues is well understood [13]. We can search for stationary points of the penalty function by taking steps based on a quadratic model of φπ. To define this model we first construct the following piecewise linear model of the measure of constraint violation v at an iterate xk:

mk(d) = ∑i∈I [∇gi(xk)T d + gi(xk)]− + ∑i∈E |∇hi(xk)T d + hi(xk)|. (2.2)

Next, we define a piecewise quadratic model of φπ at xk as

qkπ(d) = f(xk) + ∇f(xk)T d + (1/2)dT Wkd + π mk(d), (2.3)

where Wk is a symmetric positive definite matrix. We compute the search direction dk by solving the problem

minimize_d  qkπ(d). (2.4)

In practice, we recast (2.4) as the smooth quadratic optimization program

minimize_{d,r,s,t}  f(xk) + ∇f(xk)T d + (1/2)dT Wkd + π ∑i∈E (ri + si) + π ∑i∈I ti (2.5a)
subject to  ∇hi(xk)T d + hi(xk) = ri − si, i ∈ E, (2.5b)
            ∇gi(xk)T d + gi(xk) ≥ −ti, i ∈ I, (2.5c)
            r, s, t ≥ 0. (2.5d)

Once the solution dk of problem (2.4) is found, a line search is performed in the direction dk to ensure that sufficient decrease in the exact penalty function (1.2) is achieved at the new iterate.


The positive definiteness assumption on Wk is common in line search methods, where Wk is obtained through a quasi-Newton update or by adding (if necessary) a multiple of the identity to the Hessian of the Lagrangian of problem (1.1). Note that the quadratic subproblem (2.5) is always feasible, and we show in this paper that the introduction of the surplus variables r, s, t, together with the positive definiteness of Wk, provides a regularization effect to the algorithm.
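To make the reformulation concrete, the sketch below sets up problem (2.5) with an off-the-shelf modeling tool. The MATLAB implementation of Section 4 uses quadprog; the version here assumes the Python package cvxpy and is illustrative only.

import numpy as np
import cvxpy as cp

def penalty_qp_step(grad_f, W, Jg, g_vals, Jh, h_vals, pi):
    # Solve the smooth reformulation (2.5) of min_d q_pi^k(d); the constant f(x_k) is omitted.
    # grad_f: gradient of f at x_k;  W: symmetric positive definite model Hessian
    # Jg, g_vals: Jacobian and values of g_i, i in I;  Jh, h_vals: Jacobian and values of h_i, i in E
    # (assumes at least one equality and one inequality constraint)
    n = grad_f.size
    d = cp.Variable(n)
    r = cp.Variable(len(h_vals), nonneg=True)   # positive part of linearized equality residual
    s = cp.Variable(len(h_vals), nonneg=True)   # negative part of linearized equality residual
    t = cp.Variable(len(g_vals), nonneg=True)   # violation of linearized inequalities
    objective = cp.Minimize(grad_f @ d + 0.5 * cp.quad_form(d, W)
                            + pi * (cp.sum(r) + cp.sum(s)) + pi * cp.sum(t))   # (2.5a)
    constraints = [Jh @ d + h_vals == r - s,    # (2.5b)
                   Jg @ d + g_vals >= -t]       # (2.5c); r, s, t >= 0 is (2.5d)
    cp.Problem(objective, constraints).solve()
    m_of_d = float(r.value.sum() + s.value.sum() + t.value.sum())   # equals m_k(d(pi)) at the solution
    return d.value, m_of_d

When the returned value m_of_d is zero, the computed step coincides with a classical SQP step; this is precisely the test made in Step 2 of Algorithm I below.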

The second challenge in penalty methods concerns the selection of the penalty parameter. If π is too small, the penalty function (1.2) may be unbounded below, and the iterates will diverge unless the value of π is corrected in time. If π is too large, the efficiency of the penalty approach may be impaired [8]. The goal of this paper is to propose a dynamic strategy for updating the penalty parameter in a class of line search methods for nonlinear programming.

To describe this strategy, we denote the solution of (2.4) by d(πk) to stress its dependence on the penalty parameter. At iteration k, we first solve (2.4) using the current penalty parameter πk, to obtain d(πk). We are content if the linearized constraints are satisfied, i.e., if mk(d(πk)) = 0. In this case, the penalty parameter is not changed and we define the search direction as dk = d(πk). Thus, no regularization is needed in this case and the search direction dk coincides with that computed by a classical SQP method (in which r, s, t are always zero).

On the other hand, if mk(d(πk)) > 0, we assess the adequacy of the current penalty parameter by computing the lowest possible violation of the linearized constraints in a neighborhood of the current iterate. This is done by solving the problem

minimize_d  mk(d),  subject to ‖d‖∞ ≤ ∆k, (2.6)

where ∆k > 0 is given. This problem is equivalent to the linear program

minimize_{d,r,s,t}  ∑i∈E (ri + si) + ∑i∈I ti (2.7a)
subject to  ∇hi(xk)T d + hi(xk) = ri − si, i ∈ E, (2.7b)
            ∇gi(xk)T d + gi(xk) ≥ −ti, i ∈ I, (2.7c)
            ‖d‖∞ ≤ ∆k, (2.7d)
            r, s, t ≥ 0. (2.7e)

We denote the solution of this problem by dkLP. A new penalty parameter π+ ≥ πk is now determined such that the solution d(π+) of problem (2.4) yields an improvement in linearized feasibility that is commensurate with that obtained by the step dkLP, as measured by the model mk. This strategy is specified more precisely in Algorithm I, which is the new line search penalty method.

It is worth emphasizing that Algorithm I is a line search method. The global convergence results established in the next section are based on the properties of line search methods and the regularization effects of the penalty approach. The trust region constraint in (2.6) plays an indirect role, since it only influences the choice of the penalty parameter.
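For completeness, the companion sketch below sets up the feasibility subproblem (2.7), again with cvxpy rather than the linprog solver used in the MATLAB implementation of Section 4; all names are illustrative.

import numpy as np
import cvxpy as cp

def steering_lp(Jg, g_vals, Jh, h_vals, delta):
    # Solve (2.6)/(2.7): minimize the linearized infeasibility m_k(d) subject to ||d||_inf <= delta.
    n = Jh.shape[1]
    d = cp.Variable(n)
    r = cp.Variable(len(h_vals), nonneg=True)
    s = cp.Variable(len(h_vals), nonneg=True)
    t = cp.Variable(len(g_vals), nonneg=True)
    objective = cp.Minimize(cp.sum(r) + cp.sum(s) + cp.sum(t))   # (2.7a)
    constraints = [Jh @ d + h_vals == r - s,                     # (2.7b)
                   Jg @ d + g_vals >= -t,                        # (2.7c)
                   cp.norm(d, "inf") <= delta]                   # (2.7d); r, s, t >= 0 is (2.7e)
    problem = cp.Problem(objective, constraints)
    problem.solve()
    return d.value, problem.value   # d_LP and m_k(d_LP)

Comparing the optimal value m_k(d_LP) with m_k(0) = v(x_k) and with m_k(d(π_k)) is what drives the penalty update in Steps 3 through 5 of Algorithm I.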


Algorithm I: Line Search Penalty SQP Method

Initial data: x1, π1 > 0, ρ > 0, ε1 ∈ (0, 1], ε2 ∈ (0, ε1), τ ∈ (0, 1), η ∈ (0, 1), and 0 < ∆min ≤ ∆1 ≤ ∆max.

For k = 1, 2, ...

1. Find a search direction d(πk) by solving the subproblem (2.4) with π = πk. If d(πk) = 0 and v(xk) = 0, STOP: xk is a KKT point of problem (1.1).

2. If mk(d(πk)) = 0, set π+ = πk and go to Step 6.

3. Solve the linear programming subproblem (2.6) to get dkLP. If

0 < mk(0) = mk(dkLP), (2.8)

STOP: xk is an infeasible stationary point of the penalty function (1.2).

4. Update the penalty parameter:

(a) If mk(dkLP) = 0, find π+ ≥ πk + ρ and a corresponding vector d(π+) that solves (2.4), such that

mk(d(π+)) = 0. (2.9)

(b) Else, set π+ = πk. If the following inequality does not hold,

mk(0) − mk(d(π+)) ≥ ε1[mk(0) − mk(dkLP)], (2.10)

then find π+ ≥ πk + ρ and a corresponding vector d(π+), such that (2.10) is satisfied.

5. If the following inequality does not hold,

qkπ+(0) − qkπ+(d(π+)) ≥ ε2π+[mk(0) − mk(dkLP)], (2.11)

then increase π+ (by at least ρ) as necessary, until the solution d(π+) of (2.4) satisfies (2.11).

6. Set dk = d(π+), and let 0 < αk ≤ 1 be the first member of the sequence {1, τ, τ², . . .} such that

φπ+(xk) − φπ+(xk + αkdk) ≥ ηαk[qkπ+(0) − qkπ+(dk)]. (2.12)

7. Set ∆k+1 ∈ [∆min, ∆max].

8. Let πk+1 = π+ and xk+1 = xk + αkdk.
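Step 6 is an ordinary backtracking line search, except that the model reduction qkπ+(0) − qkπ+(dk) replaces the directional derivative in the acceptance test (2.12). A minimal Python sketch of this step follows (illustrative only; phi and q_model stand for user-supplied evaluations of φπ+ and qkπ+).

def backtracking_line_search(phi, q_model, x, d, eta=1e-4, tau=0.5, max_backtracks=60):
    # Return the first alpha in {1, tau, tau^2, ...} satisfying (2.12):
    #   phi(x) - phi(x + alpha*d) >= eta * alpha * [q(0) - q(d)]
    model_decrease = q_model(0.0 * d) - q_model(d)   # positive whenever d != 0
    alpha = 1.0
    for _ in range(max_backtracks):
        if phi(x) - phi(x + alpha * d) >= eta * alpha * model_decrease:
            return alpha
        alpha *= tau
    return alpha   # safeguard; Proposition 3.5 (c) guarantees acceptance for small enough alpha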


The overall design of Algorithm I is based on the following three updating guidelines, which are an adaptation of the rules given in [8] to the line search setting:

1. If it is possible to satisfy the linearized constraints in a neighborhood of the current iterate, we compute such a step. This is achieved by enforcing condition (2.9). In other words, if a classical SQP step exists and is not too long, we would like to use it.

2. If the linearized constraints are locally infeasible, we will be content with taking a step that achieves at least a fraction of the best possible local feasibility improvement. We impose this requirement through condition (2.10). Note that (2.10) could be satisfied with the current penalty parameter πk, in which case no extra quadratic subproblems (2.4) need to be solved.

3. Not only feasibility but also the improvement in the penalty function has to be commensurate with the improvement in feasibility obtained with dkLP. This is guaranteed by condition (2.11). We note that it is not necessary to re-solve problem (2.4) when condition (2.11) is violated, because an appropriate value of the penalty parameter is readily computed (a small computational sketch is given after this list). First, if mk(0) − mk(dkLP) = 0, then (2.11) is satisfied for any π because d(π) is the minimizer of qkπ. Otherwise, let π+ be the value of the penalty parameter at the beginning of Step 5 of Algorithm I, let d+ = d(π+), and define

π̄ = [ (1/2)d+T Wkd+ + ∇f(xk)T d+ ] / [ (1 − ε2)mk(0) − mk(d+) + ε2mk(dkLP) ] + ρ. (2.13)

(Note that the denominator is positive: (2.10) holds at this point, since it is enforced in Step 4(b) and holds trivially when Step 4(a) is executed, and ε2 < ε1.) Then, writing d̄ = d(π̄), we have from (2.13) that

ε2π̄[mk(0) − mk(dkLP)] ≤ −∇fkT d+ − (1/2)d+T Wkd+ + π̄[mk(0) − mk(d+)]
  = qkπ̄(0) − qkπ̄(d+)
  ≤ qkπ̄(0) − qkπ̄(d̄),

where the last inequality follows from the fact that d̄ is the minimizer of qkπ̄. Therefore condition (2.11) is satisfied if the penalty parameter is given by (2.13).
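In code, the update (2.13) is a single evaluation; the sketch below (Python/NumPy, illustrative names only) computes it from quantities that are already available after Steps 3 and 4.

import numpy as np

def penalty_from_213(grad_f, W, d_plus, m0, m_dplus, m_dLP, eps2, rho):
    # pi_bar of (2.13); m0 = m_k(0), m_dplus = m_k(d(pi+)), m_dLP = m_k(d_LP^k)
    numerator = 0.5 * d_plus @ W @ d_plus + grad_f @ d_plus
    denominator = (1.0 - eps2) * m0 - m_dplus + eps2 * m_dLP   # positive when (2.10) holds and eps2 < eps1
    return numerator / denominator + rho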

When the penalty parameter is updated, the main cost is in the repeated solution of the quadratic program (2.4) for different values of π; the linear program (2.6) is solved only once. The expectation, however, is that the penalty parameter will be updated only a few times during the solution of a problem, and that the savings in total iterations will outweigh the extra cost incurred in some iterations.

The choice of penalty parameter π+ in Algorithm I ensures that dk is a descent direction (see Proposition 3.5 (c)). Note that in the right-hand side of (2.12) we use the decrease in the piecewise quadratic model qkπ+, instead of the directional derivative of the penalty function. This accounts for possible kinks in the merit function near the current iterate that could be overlooked by a standard Armijo line search and result in jamming.

The constraint ‖d‖∞ ≤ ∆k in problem (2.6) is not a trust region in the usual sense, and the value of ∆k is not critical to the performance of the algorithm. The function mk(d) is always bounded below, and the radius ∆k is used only to ensure that ‖dkLP‖ is of reasonable size in case there is an unbounded ray of minimizers of mk(d). In fact, the radius ∆k could be kept constant and the convergence properties of Algorithm I would not be affected. In practice, however, it may be advantageous to choose ∆k based on local information of the problem, as discussed in Section 4.

3 Convergence analysis

In this section we study the global convergence properties of Algorithm I. We make the following assumptions about the sequence of iterates {xk} and the matrices Wk generated by the algorithm.

Assumptions I.

A1. The functions f, gi, i ∈ I, and hi, i ∈ E, are twice differentiable with bounded derivatives over a bounded convex set that contains the sequence {xk}.

A2. The matrices Wk are uniformly positive definite and bounded above, i.e., there exist values 0 < µmin < µmax such that

µmin‖p‖² ≤ pT Wkp ≤ µmax‖p‖², (3.1)

for all p ∈ Rn.

We will use the following notation and definitions.

Notation. We denote the directional derivative of a function f at x in the direction p by Df(x; p). A point x is said to be a stationary point of the penalty function if Dφπ(x; p) ≥ 0 for all directions p. A point x is called an infeasible stationary point for problem (1.1) if v(x) > 0 and Dv(x; p) ≥ 0 for all p. We say that problem (1.1) is locally infeasible if there is an infeasible stationary point for it.

The first lemma provides useful relationships between the directional derivatives of the functions φπ(x) and v(x) and their local models qπ and m. The model (2.2) defined at a point x∗ will be denoted by m∗, and similarly for qπ.

Lemma 3.1 Given a point x∗, the directional derivatives of φπ and v satisfy

Dv(x∗; p) = Dm∗(0; p) (3.2)

and

Dφπ(x∗; p) = Dq∗π(0; p). (3.3)


Proof. Given x∗ and a vector d ∈ Rn, let us define the sets

G∗−(d) = {i ∈ I : ∇gi(x∗)T d + gi(x∗) < 0}, (3.4)
G∗0(d) = {i ∈ I : ∇gi(x∗)T d + gi(x∗) = 0},
G∗+(d) = {i ∈ I : ∇gi(x∗)T d + gi(x∗) > 0},

which determine a partition of I. Similarly, we can define a partition H∗−(d), H∗0(d) and H∗+(d) of E induced by the value of ∇hi(x∗)T d + hi(x∗).

The directional derivative of m∗(·) at d in the direction p is given by

Dm∗(d; p) = ∑i∈G∗0(d) [∇gi(x∗)T p]− − ∑i∈G∗−(d) ∇gi(x∗)T p + ∑i∈H∗+(d) ∇hi(x∗)T p + ∑i∈H∗0(d) |∇hi(x∗)T p| − ∑i∈H∗−(d) ∇hi(x∗)T p. (3.5)

On the other hand, we have that

Dv(x∗; p) = ∑i∈G∗0(0) [∇gi(x∗)T p]− − ∑i∈G∗−(0) ∇gi(x∗)T p + ∑i∈H∗+(0) ∇hi(x∗)T p + ∑i∈H∗0(0) |∇hi(x∗)T p| − ∑i∈H∗−(0) ∇hi(x∗)T p.

The equality (3.2) follows by comparing this expression with (3.5).

Given a direction p, we have that Dφπ(x∗; p) = ∇f(x∗)T p + π Dv(x∗; p). Also, for any d we have that Dq∗π(d; p) = (∇f(x∗) + W∗d)T p + π Dm∗(d; p). By evaluating Dq∗π(d; p) at d = 0 and using (3.2) we obtain (3.3). □

The next result is well known (see, e.g., [4, 20]), but we prove it for the sake of completeness. Note that if W∗ is positive definite, the piecewise quadratic model (2.3) defined at a point x∗ is strictly convex. Therefore the solution of problem (2.4), which we denote by d(π), is unique. The model m(d) is, by definition, convex (but not strictly convex).

Theorem 3.2 Suppose that the functions f, gi and hi have Lipschitz continuous derivatives in a neighborhood of a point x∗, and let W∗ be a symmetric positive definite matrix. The following three statements are true:

(a) x∗ is a stationary point of the penalty function φπ(x) if and only if d(π) = 0 solves problem (2.4).

(b) If x∗ is a stationary point of φπ(x) and v(x∗) = 0, then x∗ is a KKT point of (1.1).

(c) x∗ is a stationary point of the infeasibility measure v(x) if and only if, for any ∆ > 0, any solution dLP of the linear feasibility problem (2.6) satisfies

m∗(dLP) = m∗(0). (3.6)


Proof. (a) By definition, d(π) = 0 is the minimizer of q∗π(d) if and only if Dq∗π(0; p) ≥ 0 for any direction p. The result follows from (3.3).

(b) If x∗ is stationary for φπ(x), then from part (a) we know that d(π) = 0 solves (2.4). Since v(x∗) = 0, it is easy to see that d∗ = d(π) = 0, r∗ = 0, s∗ = 0, t∗ = 0 is the minimizer of the equivalent problem (2.5), which has only linear constraints. A standard first-order optimality result (see, e.g., [25], Theorem 12.1 and Lemma 12.8 for linear active constraints) guarantees that there exist multipliers λ∗h, λ∗g ≥ 0, λ∗r ≥ 0, λ∗s ≥ 0, λ∗t ≥ 0 that satisfy the KKT conditions for problem (2.5). From here, it is straightforward that x∗ and the same multipliers λ∗h, λ∗g satisfy the KKT conditions for (1.1).

(c) Given ∆ > 0, let dLP be a solution of (2.6). Since d = 0 obviously satisfies ‖d‖∞ ≤ ∆, we have that m∗(dLP) ≤ m∗(0). Also, from (3.2) we have that x∗ is stationary for v(x) if and only if 0 is stationary for m∗(d), which holds (by convexity of m∗) if and only if 0 is an unconstrained global minimizer of m∗(d). Therefore, m∗(0) ≤ m∗(d) for any d, and in particular for d = dLP. We conclude that (3.6) holds. □

This theorem justifies the stopping tests in Algorithm I. If Algorithm I stops at Step 1, Theorem 3.2 (a) and (b) imply that the current iterate xk is a KKT point of the nonlinear program (1.1). If the algorithm stops at Step 3, then Theorem 3.2 (c) and v(xk) > 0 imply that xk is an infeasible stationary point. If neither stopping test is satisfied, we need to show that Algorithm I will generate a new iterate xk+1 and that it is always possible to meet the requirements in Steps 4 and 6. This is done in Proposition 3.5; first we need to establish two auxiliary results.

Lemma 3.3 Suppose that Assumptions A1 and A2 hold. At any given iterate xk, and for all π > 0, the minimizers d(π) of qkπ(d) are contained in a compact ball

Bk = {d : ‖d‖ ≤ rk}, with rk = κ1 + κ2‖d̄k‖, (3.7)

for some global constants κ1 and κ2, where d̄k denotes the minimum norm minimizer of mk(d).

Proof. Let d̄k be the minimum norm minimizer of mk(d); it is well defined because mk(d) is a piecewise linear convex function that is bounded below. If ‖d‖ is large enough that µmin‖d‖ ≥ 8‖∇f(xk)‖ and µmin‖d‖² ≥ 2µmax‖d̄k‖², then

−∇f(xk)T d + ∇f(xk)T d̄k + (1/2)d̄kT Wkd̄k ≤ ‖∇fk‖‖d‖ + ‖∇fk‖‖d̄k‖ + (µmax/2)‖d̄k‖²
  ≤ (µmin/8)‖d‖² + (µmin/8)√(µmin/(2µmax))‖d‖² + (µmin/4)‖d‖²
  < (µmin/2)‖d‖²
  ≤ (1/2)dT Wkd,

and therefore

f(xk) + ∇f(xk)T d + (1/2)dT Wkd > f(xk) + ∇f(xk)T d̄k + (1/2)d̄kT Wkd̄k.

Thus, for all

‖d‖ > max{8‖∇f(xk)‖/µmin, √(2µmax/µmin)‖d̄k‖} (3.8)

and all π ≥ 0 we have

qkπ(d) = f(xk) + ∇f(xk)T d + (1/2)dT Wkd + πmk(d) > f(xk) + ∇f(xk)T d̄k + (1/2)d̄kT Wkd̄k + πmk(d̄k) = qkπ(d̄k),

since mk(d) ≥ mk(d̄k). Therefore, no minimizer d(π) can be larger in norm than the right-hand side of (3.8). To establish (3.7), we let κ̄1 be an upper bound for ‖∇fk‖ (which exists by Assumption A1) and define κ1 = 8κ̄1/µmin and κ2 = √(2µmax/µmin). The radius rk defined in (3.7) then bounds the right-hand side of (3.8). □

The following result shows that, by choosing π sufficiently large, the direction d(π) can attain any achievable level of linear feasibility.

Lemma 3.4 Given an arbitrary vector d̄, for all π sufficiently large the minimizer d(π) of qkπ(d) satisfies

mk(d(π)) ≤ mk(d̄). (3.9)

Proof. Choose an arbitrary vector d̄. We show that any vector d such that mk(d) > mk(d̄) cannot be a minimizer of qkπ(d) for sufficiently large π.

For a given d, any solution of the problem

minimize_p  Dmk(d; p)  subject to ‖p‖ ≤ 1 (3.10)

is called a steepest descent direction of mk at d. Problem (3.10) is a convex program, so any solution is a global minimizer. By (3.5) this problem is equivalent to

minimize_{p,r,s,t}  ∑i∈Gk0(d) ti − ∑i∈Gk−(d) ∇gi(x)T p + ∑i∈Hk+(d) ∇hi(x)T p + ∑i∈Hk0(d) (ri + si) − ∑i∈Hk−(d) ∇hi(x)T p (3.11a)
subject to  ∇gi(x)T p ≥ −ti, i ∈ Gk0(d), (3.11b)
            ∇hi(x)T p = ri − si, i ∈ Hk0(d), (3.11c)
            ‖p‖ ≤ 1, (3.11d)
            r, s, t ≥ 0. (3.11e)

We observe, as before, that (3.5) only depends on d through the partitions Gk−(d), Gk0(d), Gk+(d) of I and Hk−(d), Hk0(d), Hk+(d) of E, defined in (3.4). As we vary d ∈ Rn, there is only a finite number of such partitions, hence only a finite number of different problems (3.10) and a corresponding finite number of global optimal values of the objective function.

Let σ1, . . . , σM be all the different optimal values of problem (3.10) that are possible as we vary d ∈ Rn, and let σk = |max_{j=1,...,M} {σj : σj < 0}|. It follows that, if d is such that there is a descent direction for mk at d and pd denotes a steepest descent direction of mk at d, then

Dmk(d; pd) ≤ −σk. (3.12)

In particular, if d is any vector such that mk(d) > mk(d̄), then by convexity of mk(d), p = d̄ − d is a descent direction for mk at d, and therefore the bound (3.12) holds at d.

By Lemma 3.3, we know that there is a compact ball Bk containing the minimizers of qkπ(d), for all π. Since we are proving a statement about the minimizers of qkπ(d), we now restrict attention to vectors d ∈ Bk. From Lemma 3.3, again, it follows that there is a constant βk such that ‖∇f(xk) + Wkd‖ ≤ βk for all d ∈ Bk, because ‖∇f(xk) + Wkd‖ is a continuous function of d and Bk is compact.

Choose π sufficiently large so that π > βk/σk. Pick any d ∈ Bk such that mk(d) > mk(d̄) and let pd be a steepest descent direction of mk at d. Then,

Dqkπ(d; pd) = (∇f(xk) + Wkd)T pd + π Dmk(d; pd) ≤ βk − πσk < 0.

Hence, for π > βk/σk, if d is such that mk(d) > mk(d̄), then either d ∉ Bk, in which case it cannot be a minimizer of qkπ by Lemma 3.3, or d ∈ Bk, in which case any steepest descent direction pd of mk at d is also a direction of strict descent for qkπ at d, which implies that d cannot be a minimizer of qkπ. Therefore, the minimizer d(π) of qkπ(d) has to satisfy (3.9) for all sufficiently large π. □

We can now prove that Algorithm I is well defined.

Proposition 3.5 Suppose that xk is neither a KKT point of the nonlinear program (1.1) nor a stationary point of the infeasibility measure v(x). Then,

(a) If Algorithm I executes Step 4, it is always possible to find a value π+, and a corresponding vector d(π+), such that condition (2.9) holds if mk(dkLP) = 0, or condition (2.10) holds if mk(dkLP) > 0.

(b) If Algorithm I executes Step 5, it is always possible to find a value π+, and a corresponding vector d(π+), such that condition (2.11) holds.

(c) At Step 6, dk is a descent direction for φπ+(x) at xk. Therefore, there exists αk such that condition (2.12) is satisfied.

Proof. (a) If mk(dkLP) = 0, we apply Lemma 3.4 with d̄ = dkLP and deduce that condition (2.9) is satisfied for sufficiently large π. Next, if 0 < mk(dkLP) and given ε1 ∈ (0, 1], we can apply the same lemma with d̄ = ε1dkLP. Using the convexity of mk(d), we have that for sufficiently large π

mk(d(π)) ≤ mk(ε1dkLP) ≤ (1 − ε1)mk(0) + ε1mk(dkLP),

which is equivalent to (2.10).


(b) We have already shown in Section 2 that (2.11) is satisfied if the penalty parameter is chosen by (2.13).

(c) If xk is neither stationary for φπ+ nor for v(x), then at Step 6 we must have dk ≠ 0. This is a consequence of the logic of Algorithm I and of Theorem 3.2, parts (a) and (c). Recall that dk is defined to be the minimizer of qkπ+ and hence qkπ+(dk) ≤ qkπ+(0). Furthermore, since qkπ+(d) is strictly convex and dk ≠ 0, we have that qkπ+(dk) < qkπ+(0) and thus dk is a descent direction for qkπ+ at 0. By (3.3), we have

Dφπ+(xk; dk) = Dqkπ+(0; dk) < 0,

and therefore dk is also a descent direction for φπ+(x) at xk. Since the constant η is chosen in (0, 1), it follows that φπ+(xk + αdk) < φπ+(xk) + αηDφπ+(xk; dk) for all sufficiently small α > 0, or

φπ+(xk) − φπ+(xk + αdk) > −αηDφπ+(xk; dk).

From (3.3) and the convexity of qkπ+(d), we have that

−αηDφπ+(xk; dk) = −αηDqkπ+(0; dk) ≥ αη[qkπ+(0) − qkπ+(dk)].

We conclude that there always exists a sufficiently small steplength αk that satisfies condition (2.12). □

We now establish the first global convergence result. It gives conditions under which Algorithm I identifies stationary points of the penalty function.

Theorem 3.6 Suppose that Algorithm I generates an infinite sequence of iterates {xk} and that Assumptions A1 and A2 hold. Suppose also that {πk} is bounded, so that πk = π̄ for all k large. Then any accumulation point x∗ of {xk} is a stationary point of the penalty function φπ̄(x).

Proof. Note that, whenever π is increased in Algorithm I, it is increased by at least ρ > 0. Therefore, if {πk} is bounded, there is a value π̄ such that πk = π̄ for all sufficiently large k.

Let x∗ be a limit point of {xk}, which exists because Assumption A1 states that {xk} is bounded. Let K be an infinite subset of indices such that {xk}k∈K → x∗. The sequence of matrices {Wk} is also bounded, by Assumption A2. We restrict K if necessary so that {Wk}k∈K → W∗, where W∗ is a limit point of {Wk}. Then, from the continuity of the functions f, gi, hi and their gradients, and from the definition (2.3), we have that the sequence of models qkπ̄, k ∈ K, converges (pointwise) to a function q∗π̄.

Each of the functions qkπ̄, as well as the limiting function q∗π̄, is strictly convex and has a unique minimizer. We want to prove that the minimizer of q∗π̄(d) is d∗ = 0, for then Theorem 3.2 (a) implies that x∗ is a stationary point of φπ̄.

We proceed by contradiction. Assume that d∗ ≠ 0, or equivalently, that q∗π̄(0) − q∗π̄(d∗) > 0. From the pointwise convergence of the functions qkπ̄, we know that there exists ε > 0 such that

qkπ̄(0) − qkπ̄(d∗) → q∗π̄(0) − q∗π̄(d∗) = 2ε > 0.

Therefore, there is a number k0 such that for all k ≥ k0, with k ∈ K, we have that πk = π̄ and

qkπ̄(0) − qkπ̄(d∗) ≥ ε. (3.13)

Since the sufficient decrease condition (2.12) is always satisfied, we have that for all k ≥ k0,

φπ̄(xk) − φπ̄(xk + αkdk) ≥ ηαk[qkπ̄(0) − qkπ̄(dk)] ≥ ηαk[qkπ̄(0) − qkπ̄(d∗)] ≥ ηαkε, (3.14)

where the second inequality follows from the fact that dk is the minimizer of qkπ̄(d). Thus {φπ̄(xk)}k∈K is a strictly monotone subsequence that is bounded below (by Assumption A1) and thus converges to a value φπ̄(x∗). We then conclude from (3.14) that αk → 0.

Therefore, there is an infinite sequence of iterations at which the steplength is reduced. Let k ≥ k0 be an index in K such that αk = τ^(jk) < 1. This means that the step size τ^(jk−1) was rejected, or

φπ̄(xk) − φπ̄(xk + τ^(jk−1)dk) < ητ^(jk−1)[qkπ̄(0) − qkπ̄(dk)]. (3.15)

It is not difficult to show (see e.g. Lemma 3.4 in [6]) that for any α > 0,

|φπ̄(xk + αdk) − qkπ̄(αdk)| ≤ c1‖αdk‖²,

for some positive constant c1. Also, by convexity of qkπ̄ we have

qkπ̄(0) − qkπ̄(αdk) ≥ α[qkπ̄(0) − qkπ̄(dk)].

Using these two relations, with α = τ^(jk−1), and the fact that φπ̄(xk) = qkπ̄(0), we get

φπ̄(xk + τ^(jk−1)dk) − φπ̄(xk) ≤ φπ̄(xk + τ^(jk−1)dk) − qkπ̄(τ^(jk−1)dk) + [qkπ̄(τ^(jk−1)dk) − qkπ̄(0)]
  ≤ −[qkπ̄(0) − qkπ̄(τ^(jk−1)dk)] + c1(τ^(jk−1))²‖dk‖²
  ≤ −τ^(jk−1)[qkπ̄(0) − qkπ̄(dk)] + c1(τ^(jk−1))²‖dk‖². (3.16)

Combining (3.15) and (3.16) we obtain

(1 − η)[qkπ̄(0) − qkπ̄(dk)] ≤ c1τ^(jk−1)‖dk‖². (3.17)

The assumptions on Wk imply that {dk}k∈K is bounded, so we may bound the right-hand side of (3.17) by τ^(jk−1)c2. Finally, we know that the expression in square brackets on the left-hand side is bounded below by ε, since k ∈ K, k ≥ k0. Thus,

(1 − η)ε/c2 < τ^(jk−1),

and therefore αk ↛ 0, which is a contradiction. We conclude that d∗ = 0 and that x∗ is a stationary point of φπ̄. □


Note that we have not assumed that π is larger than some threshold. Yet, since we have assumed that Algorithm I generates an infinite sequence, we have implicitly assumed (according to the well-posedness of the algorithm) that the sufficient decrease condition (2.12) is always satisfied, which is related to the size of π.

The next result establishes conditions for convergence to stationary points of the nonlinear program (1.1) in the case when the penalty parameter stays bounded.

Theorem 3.7 Suppose that Algorithm I generates an infinite sequence of iterates {xk}, that Assumptions A1 and A2 hold, and that {πk} is bounded. Let x∗ be any accumulation point of {xk}. Then either: (a) v(x∗) = 0 and x∗ is a KKT point of (1.1); or (b) v(x∗) > 0 and x∗ is a stationary point of v(x).

Proof. Let x∗ be a limit point of the sequence {xk}. Since πk is bounded, there is a scalar π̄ such that πk = π̄ for all large k. From Theorem 3.6 we have that x∗ is a stationary point of φπ̄(x).

(a) If v(x∗) = 0, then by Theorem 3.2 (b), x∗ is a KKT point of problem (1.1).

(b) In the case when v(x∗) > 0 we want to show that (3.6) holds for some ∆ > 0. Let K be an infinite subset of indices for which xk → x∗ for k ∈ K. Let W∗ be a limiting matrix of the sequence {Wk}k∈K, and let q∗π̄(d) be the corresponding piecewise quadratic model. Restricting K further, if necessary, we obtain pointwise convergence of the models, i.e., qkπ̄(d) → q∗π̄(d) for k ∈ K. As before, let dk denote the minimizer of qkπ̄(d).

Since x∗ is a stationary point of φπ̄(x), by Theorem 3.2 (a) we have that the minimizer of q∗π̄(d) is d∗ = 0. From the pointwise convergence of the models, it follows that dk → 0, which in turn implies that

qkπ̄(0) − qkπ̄(dk) → 0. (3.18)

This limit together with (2.11) implies that mk(0) − mk(dkLP) → 0 for k ∈ K. We also have pointwise convergence to a limiting piecewise linear model, i.e., mk(d) → m∗(d), and hence

0 = lim_{k→∞, k∈K} [mk(0) − mk(dkLP)]
  = lim_{k→∞, k∈K} [mk(0) − min_{‖d‖∞ ≤ ∆k} mk(d)]
  ≥ lim_{k→∞, k∈K} [mk(0) − min_{‖d‖∞ ≤ ∆min} mk(d)],

where the last inequality follows from the fact that Algorithm I requires that ∆k ≥ ∆min > 0. Therefore, 0 < v(x∗) = m∗(0) = m∗(dLP(∆min)), and by Theorem 3.2 (c) we conclude that x∗ is a stationary point of the infeasibility measure v(x). □

Now we consider the behavior of the algorithm when the penalty parameter increases without bound. For the remainder of the analysis, it is useful to define the model

qkf(d) = ∇fkT d + (1/2)dT Wkd, (3.19)

so that

qkπk(d) = qkf(d) + πk mk(d). (3.20)


The first result focuses on the development of the penalty parameter in a vicinity of an infeasible limit point.

Lemma 3.8 Suppose that Algorithm I generates a sequence {xk} that satisfies Assumptions A1 and A2. Let x∗ be a cluster point of this sequence such that v(x∗) > 0, and suppose that m∗(0) − m∗(dLP) > 0. Then, along any subsequence {xk}k∈K that converges to x∗ the penalty parameter is updated only a finite number of times.

Proof. We show that for any subsequence {xk}k∈K that converges to x∗, each of the conditions (2.9), (2.10) and (2.11) is satisfied for all sufficiently large k. This will prove the result because the penalty parameter is only increased in Steps 4 and 5 of Algorithm I.

Since Wk is positive definite, the lowest value of the quadratic model qkf(d) (defined in (3.19)) is attained at the Newton step −Wk⁻¹∇fk. Thus by (3.1) we have

qkf(d) ≥ −∇fkT Wk⁻¹∇fk + (1/2)∇fkT Wk⁻¹WkWk⁻¹∇fk = −(1/2)∇fkT Wk⁻¹∇fk ≥ −(1/2)‖∇fk‖²/µmin. (3.21)

Also, since the linear program (2.6) is constrained by a trust region whose radius cannot exceed ∆max, we have from (3.19)

qkf(dLP) ≤ ‖∇fk‖∆max + (1/2)µmax∆max². (3.22)

By combining (3.21) and (3.22) and recalling that ‖∇fk‖ is bounded (by Assumption A1), we deduce that there is a constant γ such that, for all xk,

qkf(dLP) − qkf(d(πk)) ≤ γ. (3.23)

Since we assume that

m∗(0) − m∗(dLP) > 0, (3.24)

there exist constants r > 0 and ζ > 0 such that

mk(0) − mk(dkLP) > ζ, for all xk ∈ B∗ := {x : ‖x − x∗‖ < r}. (3.25)

By (2.11), for any such k,

qkπ+(0) − qkπ+(d(π+)) ≥ ε2ε1π+ζ. (3.26)

To see this, note that if Algorithm I executes Step 4(a) we have mk(dkLP) = mk(d(π+)) = 0, and hence by (2.11), (3.25) and the fact that ε1 < 1 we obtain (3.26); on the other hand, if the algorithm executes Step 4(b), then (3.26) follows from (2.11), (2.10) and (3.25).

Now, by the Lipschitz continuity assumptions in A1, one can show [7, Lemma 3.4] that there is a constant c1 such that for any xk and any π

|φπ(xk + αdk) − qkπ(αdk)| ≤ c1(1 + π)‖αdk‖². (3.27)


Let us consider the sufficient decrease condition (2.12). From (3.26), (3.27), the equality φπ+(xk) = qkπ+(0) and the convexity of qkπ+ we have, for xk ∈ B∗,

φπ+(xk) − φπ+(xk + αdk) − ηα[qkπ+(0) − qkπ+(dk)]
  ≥ [qkπ+(0) − qkπ+(αdk)] − [φπ+(xk + αdk) − qkπ+(αdk)] − ηα[qkπ+(0) − qkπ+(dk)]
  ≥ [qkπ+(0) − qkπ+(αdk)] − c1(1 + π+)‖αdk‖² − ηα[qkπ+(0) − qkπ+(dk)]
  ≥ α(1 − η)[qkπ+(0) − qkπ+(dk)] − c1(1 + π+)‖αdk‖²
  ≥ α(1 − η)ε2ε1π+ζ − c1(1 + π+)α²‖dk‖².

The right-hand side is nonnegative if α ≤ (1 − η)ε2ε1π+ζ/(c1(1 + π+)‖dk‖²). Therefore Step 6 of Algorithm I will always choose

αk ≥ min{1, τ(1 − η)ε2ε1π+ζ/(c1(1 + π+)‖dk‖²)}, (3.28)

where τ is the contraction factor used in Step 6 of the algorithm.

Having established these preliminary results, we now study the situations in which the penalty parameter is increased in Algorithm I. This increase can happen in Step 4(a), Step 4(b) or Step 5 of Algorithm I, and we study each case separately.

Suppose that Step 4(a) is executed an infinite number of times along a subsequence {xk}k∈K that converges to x∗. Since mk(dkLP) = 0, we have that dkLP is an unconstrained minimizer of mk and is thus no smaller in norm than the minimum norm minimizer d̄k mentioned in Lemma 3.3. By applying Lemma 3.3, we obtain the bound ‖dk‖ ≤ κ1 + κ2‖dkLP‖ ≤ κ1 + κ2∆max, since ‖dkLP‖ ≤ ∆max. Thus we have, from (3.28),

αk ≥ min{1, τ(1 − η)ε2ε1π+ζ/(c1(1 + π+)(κ1 + κ2∆max)²)} ≥ c2,

for some constant c2 > 0. This bound, together with (2.12) and (3.26), implies that there is a constant c3 > 0 such that for any k ∈ K,

φπk+1(xk+1) ≤ φπk+1(xk) − c3πk+1. (3.29)

Now, consider the function

(1/π)φπ(x) = (1/π)f(x) + v(x),

and note that since {fk} is bounded below and the algorithm is unaffected by adding a constant to f, we may assume without loss of generality that f(xk) ≥ 0 for all k. This assumption and the fact that {πk} is nondecreasing imply that

(1/πk+1)f(xk) + v(xk) ≤ (1/πk)f(xk) + v(xk). (3.30)

By the sufficient decrease condition (2.12) we have that, for all k,

(1/πk+1)φπk+1(xk+1) ≤ (1/πk+1)φπk+1(xk).


By combining this expression with (3.30) we have

(1/πk+1)φπk+1(xk+1) ≤ (1/πk)φπk(xk),

which shows that the sequence {(1/πk)f(xk) + v(xk)} is monotonically decreasing. Let us now consider the iterates in the subsequence {xk}k∈K along which Step 4(a) is executed an infinite number of times. By combining (3.30) with (3.29) we obtain, for k ∈ K,

(1/πk+1)f(xk+1) + v(xk+1) ≤ (1/πk)f(xk) + v(xk) − c3.

Therefore, the sequence {(1/πk)f(xk) + v(xk)} is unbounded below, which contradicts Assumption A1. We conclude that, under the assumptions of the lemma, Step 4(a) cannot be executed an infinite number of times.

Next, consider Step 4(b). By the definition of d(πk) as the minimizer of qkπk, and by (3.19), we have that

qkf(d(πk)) + πk mk(d(πk)) ≤ qkf(dkLP) + πk mk(dkLP).

Combining this relation with (3.23), we obtain

πk[mk(0) − mk(d(πk))] ≥ πk[mk(0) − mk(dkLP)] − qkf(dkLP) + qkf(d(πk))
  ≥ πk[mk(0) − mk(dkLP)] − γ
  ≥ πk[mk(0) − mk(dkLP)](1 − γ/(ζπk)),

because (3.25) implies that −γ ≥ −γ[mk(0) − mk(dkLP)]/ζ. If the penalty parameter is large enough that

πk ≥ γ/(ζ(1 − ε1)), (3.31)

then

mk(0) − mk(d(πk)) ≥ ε1[mk(0) − mk(dkLP)], (3.32)

and condition (2.10) will be satisfied. Therefore, πk cannot be increased infinitely often in Step 4(b) for xk ∈ K.

Finally, consider Step 5 of Algorithm I, which enforces condition (2.11). In what follows we write ∆mk(d) = mk(0) − mk(d). Suppose that π+ is chosen so that

π+ ≥ γ/(ζ(1 − ε1)(1 − ε2/ε1)) (3.33)

(recall that ε2 < ε1). If we let π̂ = π+(1 − ε2/ε1), then the fact that d(π̂) is a minimizer of qkπ̂ implies that

−qkf(d(π̂)) + π+(1 − ε2/ε1)∆mk(d(π̂)) ≥ 0,

and therefore

−qkf(d(π̂)) + π+∆mk(d(π̂)) ≥ (ε2/ε1)π+∆mk(d(π̂)) ≥ ε2π+∆mk(dkLP),

since π̂ satisfies (3.31) and thus (2.10). The fact that π+ satisfies (2.11) follows from noting that

−qkf(d(π̂)) + π+∆mk(d(π̂)) ≤ −qkf(d(π+)) + π+∆mk(d(π+)),

since d(π+) is a minimizer of qkπ+.

Therefore, since any π+ satisfying (3.33) satisfies (2.10) and (2.11), and since π increases by at least ρ whenever it is increased, only a finite number of increases can occur for xk ∈ K. □

We now consider the behavior of the algorithm in the vicinity of a feasible point that satisfies the well-known Mangasarian-Fromovitz constraint qualification (MFCQ) [24].

Lemma 3.9 Let x∗ be a point that satisfies MFCQ and suppose that Assumptions I hold. Then, there is a neighborhood N of x∗ and a constant rF such that, for any iterate xk ∈ N, there is a vector dkF with ‖dkF‖ ≤ rF such that mk(dkF) = 0. In addition, there are constants r and β such that, for xk ∈ N, the minimizer dk of qkπ satisfies

‖dk‖ ≤ r (3.34)

and such that

qkf(d1) − qkf(d2) ≤ β‖d1 − d2‖ (3.35)

for any vectors d1, d2 such that ‖d1‖ ≤ r, ‖d2‖ ≤ r.

Proof. Let h(x) denote the vector with components hi(x), i ∈ E, g(x) the vector with components gi(x), i ∈ I, and let ∇h(x)T and ∇g(x)T denote their Jacobians. Since MFCQ holds at x∗, ∇h(x∗) has full rank and there is a direction dM with ‖dM‖ < 1 such that

∇h(x∗)T dM = −h(x∗) and ∇g(x∗)T dM + g(x∗) > 0. (3.36)

Since ∇h(x∗) has full rank, for any x sufficiently near x∗ the matrix ∇h(x)T∇h(x) is nonsingular and the vector

dF(x) = dM − ∇h(x)(∇h(x)T∇h(x))⁻¹[h(x) + ∇h(x)T dM] (3.37)

satisfies

∇h(x)T dF(x) = −h(x). (3.38)

By continuity of ∇h(x), the vector dF(x) is continuous, and the first condition in (3.36) implies that the term in square brackets in (3.37) is small in norm near x∗. Therefore, from the second condition in (3.36) we have that ∇g(x)T dF(x) + g(x) > 0 for x near x∗. Thus, for xk in some neighborhood N of x∗, we have that mk(dkF) = 0, where dkF = dF(xk). Continuity of dF(x) also implies that there is a constant rF such that ‖dkF‖ < rF for all xk in N. By Lemma 3.3 (and the fact that dkF is no smaller in norm than d̄k), this implies that (3.34) holds for all xk ∈ N with r = κ1 + κ2rF. Since ‖∇qkf(d)‖ ≤ ‖∇f(xk)‖ + ‖Wk‖‖d‖, condition (3.35) follows from the bounds on d1, d2 and the boundedness of Wk stipulated in Assumptions I. □
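The correction (3.37) is a standard least-squares step onto the linearized equality constraints. As an illustration, a small NumPy sketch follows (not part of the paper's implementation; dM is assumed given, and Jh denotes the Jacobian ∇h(x)T, assumed of full row rank).

import numpy as np

def corrected_direction(Jh, h_vals, d_M):
    # d_F(x) = d_M - grad_h(x) (grad_h(x)^T grad_h(x))^{-1} [h(x) + grad_h(x)^T d_M]   (3.37)
    residual = h_vals + Jh @ d_M                          # bracketed term in (3.37)
    d_F = d_M - Jh.T @ np.linalg.solve(Jh @ Jh.T, residual)
    # By construction Jh @ d_F = -h_vals, which is (3.38).
    return d_F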

For the following results, we define A∗ to be the set of active inequality constraints at x∗, i.e., A∗ = {i ∈ I : gi(x∗) = 0}.


Lemma 3.10 Suppose that Algorithm I generates an infinite sequence of iterates {xk} and that Assumptions A1 and A2 hold. Let x∗ be a feasible accumulation point of {xk} with active set A∗, and suppose that at a point xk the step dk satisfies the linearized inequality constraints not in A∗, i.e.,

gi(xk) + ∇gi(xk)T dk ≥ 0, for all i ∉ A∗. (3.39)

Then, there exists a constant γ > 0 such that, for all k large enough such that (3.39) holds and for any direction d, we have

gi(xk) + ∇gi(xk)T(αdk + τd) ≥ 0, for all i ∉ A∗, (3.40)

if τ > 0 and α ∈ (0, 1) satisfy

τ‖d‖ ≤ (1 − α)γ. (3.41)

Proof. Multiplying (3.39) by any α ∈ (0, 1), we obtain

gi(xk) + ∇gi(xk)T(αdk) ≥ (1 − α)gi(xk), for all i ∉ A∗. (3.42)

Let us now consider the composite direction αdk + τd, with τ ≥ 0 and d an arbitrary direction. We have

gi(xk) + ∇gi(xk)T(αdk + τd) ≥ (1 − α)gi(xk) + τ∇gi(xk)T d, for all i ∉ A∗. (3.43)

Note that i ∉ A∗ implies that gi(xk) > 0 for k large. Therefore, if ∇gi(xk)T d ≥ 0, then the right-hand side of (3.43) is nonnegative. If, on the other hand, ∇gi(xk)T d < 0 and τ satisfies

τ ≤ (1 − α)gi(xk)/(‖∇gi(xk)‖‖d‖) ≤ (1 − α)gi(xk)/(−∇gi(xk)T d), (3.44)

then the right-hand side of (3.43) is again nonnegative. The proof is completed by defining γ to be the minimum of gi(xk)/‖∇gi(xk)‖ over all i ∉ A∗ with ∇gi(xk)T d < 0 and all xk sufficiently close to x∗. □

The next result analyzes the behavior of the algorithm in a vicinity of a feasible point that satisfies the Mangasarian-Fromovitz constraint qualification. It states that if the penalty parameter is large enough, the step dk generated by the algorithm satisfies the linearized constraints (i.e., the vectors r, s, t in (2.5) are all zero).

Lemma 3.11 Suppose that Algorithm I generates a sequence {xk} that satisfies Assumptions I. Let x∗ be a cluster point of this sequence such that v(x∗) = 0, and suppose that MFCQ holds at x∗. Then for all xk sufficiently close to x∗ and πk sufficiently large, the minimizer dk of qkπk satisfies mk(dk) = 0.

Proof. As in the proof of Lemma 3.9, we denote by h(x) the vector with components hi(x), i ∈ E, and denote its Jacobian by ∇h(x)T, and similarly for g(x) and ∇g(x)T. Since MFCQ is satisfied at x∗, we have that ∇h(x∗) has full rank and there is a direction dMF with ‖dMF‖ < 1 such that

h(x∗) + ∇h(x∗)T dMF = 0 and ∇g(x∗)T dMF + g(x∗) > 0. (3.45)

Since ∇h(x∗) has full rank, ∇h(x) has full rank for x near x∗ and we may define the vector

dM(x) = dMF − ∇h(x∗)[∇h(x)T∇h(x∗)]⁻¹∇h(x)T dMF, (3.46)

which satisfies

∇h(x)T dM(x) = 0. (3.47)

(Note that the matrix ∇h(x)T∇h(x∗) is nonsingular for x near x∗ because it is close to the nonsingular matrix ∇h(x∗)T∇h(x∗).) Moreover, since ∇h is a continuous function, dM(x) varies continuously with x, and therefore by (3.45) there is a constant σ > 0 such that for all xk near x∗, the vector dkM := dM(xk) satisfies

∇gi(xk)T dkM > σ for all i ∈ A∗, (3.48)

where A∗ denotes the set of active inequality constraints at x∗.

Recall that x∗ is assumed to be feasible, i.e., v(x∗) = 0, and let ḡ = (1/2) min_{j∉A∗} {gj(x∗)}. Then, for xk sufficiently near x∗, we have that v(xk) < ε, where ε > 0 is a constant that may additionally be chosen sufficiently small to satisfy both

gj(xk) ≥ ḡ − ε > 0 for j ∉ A∗, and 2ε < ḡ. (3.49)

We denote by N a neighborhood of x∗ such that, for all xk ∈ N, conditions (3.47), (3.48) and (3.49) hold and v(xk) < ε. Let us write

mk(d) = ∑i∉A∗ [gi(xk) + ∇gi(xk)T d]− + ∑i∈A∗ [gi(xk) + ∇gi(xk)T d]− + ∑i∈E |hi(xk) + ∇hi(xk)T d|. (3.50)

The proof proceeds in three stages; each shows that one of the summations is zero for any xk ∈ N and for sufficiently large πk.

(i) Let dk minimize qkπk, and suppose by way of contradiction that the first summation is nonzero for xk ∈ N. Then,

gj(xk) + ∇gj(xk)T dk < 0 for some j ∉ A∗. (3.51)

By (3.49), we have that gj(xk) > 0 for xk ∈ N and therefore we can choose α ∈ (0, 1) such that gj(xk) + α∇gj(xk)T dk = 0. It follows that

α(gj(xk) + ∇gj(xk)T dk) = −(1 − α)gj(xk),

which together with (3.49) and the condition α < 1 implies

gj(xk) + ∇gj(xk)T dk ≤ −(1 − α)(ḡ − ε). (3.52)

Define the function

akj(dk) := mk(dk) − [gj(xk) + ∇gj(xk)T dk]−, (3.53)

which consists of excluding the jth term from (3.50). For j satisfying (3.51), this function takes the form

akj(dk) = mk(dk) + (gj(xk) + ∇gj(xk)T dk). (3.54)

Clearly, akj is a convex function, which implies that for any dk

akj(αdk) ≤ (1 − α)akj(0) + αakj(dk),

and thus that

akj(αdk) − akj(dk) ≤ (1 − α)(akj(0) − akj(dk)) ≤ (1 − α)ε, (3.55)

since by (3.53) akj(0) ≤ mk(0) = v(xk) < ε and akj(dk) ≥ 0. Now, since gj(xk) + α∇gj(xk)T dk = 0, we have from (3.53), (3.54), (3.55) and (3.52) that

mk(αdk) − mk(dk) = akj(αdk) − akj(dk) + (gj(xk) + ∇gj(xk)T dk)
  ≤ (1 − α)ε − (1 − α)(ḡ − ε)
  = (1 − α)(2ε − ḡ).

Finally, by Lemma 3.9 and since qkπk(dk) = qkf(dk) + πk mk(dk),

qkπk(αdk) − qkπk(dk) ≤ (1 − α)βr + πk(1 − α)(2ε − ḡ). (3.56)

By (3.49), 2ε − ḡ < 0, and if πk > βr/(ḡ − 2ε), we have that qkπk(αdk) < qkπk(dk), which contradicts the fact that dk is the minimizer of qkπk. Therefore, for xk ∈ N and for πk sufficiently large, there cannot exist an index j satisfying (3.51), and thus the first summation in (3.50) is zero.

(ii) Next, suppose that the second sum in (3.50) is not zero for xk ∈ N. Then

gℓ(xk) + ∇gℓ(xk)T dk < 0 for some ℓ ∈ A∗. (3.57)

As above, consider the linearized model of the constraints other than ℓ:

akℓ(d) = mk(d) − [gℓ(xk) + ∇gℓ(xk)T d]−. (3.58)

By (3.47), (3.48) we have that the vector dkM = dM(xk) satisfies ∇hi(xk)T dkM = 0, i ∈ E, and ∇gi(xk)T dkM ≥ σ, i ∈ A∗, and for

τ < (1 − α)γ, (3.59)

we have that (3.40) is satisfied for d = dkM and for i ∉ A∗ (recall that ‖dkM‖ < 1). These observations show that each of the terms in (3.50) is not larger for d = αdk + τdkM than for d = αdk, and the same is true for akℓ, since akℓ consists of all but one of the terms in mk.


Thus,

akℓ(αdk + τdkM) ≤ akℓ(αdk) ≤ (1 − α)akℓ(0) + αakℓ(dk), (3.60)

where the second inequality follows from the convexity of akℓ and the condition α ∈ (0, 1). Since ℓ ∈ A∗, we also have from (3.48) that

gℓ(xk) + ∇gℓ(xk)T(αdk + τdkM) ≥ gℓ(xk) + α∇gℓ(xk)T dk + τσ. (3.61)

If we choose τ > 0 small enough so that

τσ ≤ −(gℓ(xk) + ∇gℓ(xk)T dk), (3.62)

then

[gℓ(xk) + ∇gℓ(xk)T dk + τσ]− = −(gℓ(xk) + ∇gℓ(xk)T dk) − τσ = [gℓ(xk) + ∇gℓ(xk)T dk]− − τσ. (3.63)

By making use of (3.61), the convexity of the [·]− function, the condition α < 1 and (3.63), we have

[gℓ(xk) + ∇gℓ(xk)T(αdk + τdkM)]− ≤ [gℓ(xk) + α∇gℓ(xk)T dk + τσ]−
  ≤ (1 − α)[gℓ(xk) + τσ]− + α[gℓ(xk) + ∇gℓ(xk)T dk + τσ]−
  ≤ (1 − α)[gℓ(xk)]− + α[gℓ(xk) + ∇gℓ(xk)T dk + τσ]−
  ≤ (1 − α)[gℓ(xk)]− + [gℓ(xk) + ∇gℓ(xk)T dk + τσ]−
  ≤ (1 − α)[gℓ(xk)]− + [gℓ(xk) + ∇gℓ(xk)T dk]− − τσ. (3.64)

By (3.58), (3.64), the first inequality in (3.60) and the convexity of akℓ, we obtain

mk(αdk + τdkM) = akℓ(αdk + τdkM) + [gℓ(xk) + ∇gℓ(xk)T(αdk + τdkM)]−
  ≤ akℓ(αdk + τdkM) + (1 − α)[gℓ(xk)]− + [gℓ(xk) + ∇gℓ(xk)T dk]− − τσ
  ≤ akℓ(αdk) + (1 − α)[gℓ(xk)]− + [gℓ(xk) + ∇gℓ(xk)T dk]− − τσ
  ≤ akℓ(dk) + (1 − α)akℓ(0) + (1 − α)[gℓ(xk)]− + [gℓ(xk) + ∇gℓ(xk)T dk]− − τσ
  ≤ mk(dk) + (1 − α)akℓ(0) + (1 − α)[gℓ(xk)]− − τσ
  ≤ mk(dk) + (1 − α)(mk(0) − [gℓ(xk)]−) + (1 − α)[gℓ(xk)]− − τσ
  ≤ mk(dk) + (1 − α)mk(0) − τσ
  ≤ mk(dk) + (1 − α)ε − τσ (3.65)

for any α ∈ [0, 1] and τ > 0 satisfying (3.59) and (3.62). Since, by Lemma 3.9, ‖dk‖ ≤ r, we may also choose τ small enough that ‖αdk + τdkM‖ ≤ 2r. Then, from (3.65) and Lemma 3.9 it follows that

qkπ(αdk + τdkM) − qkπ(dk) ≤ qkf(αdk + τdkM) − qkf(dk) − π(τσ − (1 − α)ε)
  ≤ β‖(α − 1)dk + τdkM‖ − π(τσ − (1 − α)ε)
  ≤ (βr + πε)(1 − α) + (β‖dkM‖ − πσ)τ
  ≤ (β(r/γ + 1) + πε/γ)‖dkM‖τ − πστ,

where we have applied Lemma 3.10 with d = dkM and chosen α ∈ (0, 1) so that τ‖dkM‖ = (1 − α)γ. If the neighborhood N is chosen small enough that ε/γ < σ/2, then for π > 2β(1 + r/γ)/σ the right-hand side is negative, or equivalently, qkπ(αdk + τdkM) < qkπ(dk), which is not possible because dk is the minimizer of qkπ. Therefore the inequality (3.57) cannot hold for xk ∈ N and πk large enough.


where we have applied Lemma 3.10 with the condition that τ is small enough that α ∈ (0, 1).If π is larger than (γ(βr + ε)+1)‖dk

M‖, then the right hand side is negative, or equivalently,qkπ(αdk + τdk

M ) < qk(dk), which is not possible because dk is the minimizer of qkπ. Therefore

the inequality (3.51) cannot hold for xk ∈ N and πk large enough.

(iii) Last, consider a step dk satisfying all the linearized inequalities (so that the first twosummations in (3.50) are zero), but such that h(xk) + ∇h(xk)

T dk 6= 0. Then

m(dk) = ‖h(xk) + ∇h(xk)T dk‖. (3.66)

Consider taking a step from dk in the direction

p = −∇h(xk)[∇h(xk)T∇h(xk)]

−1(h(xk) + ∇h(xk)T dk) + θdk

M , (3.67)

for some θ > 0. We have from (3.47) that, for any α and τ ,

h(xk) + ∇h(xk)T (αdk + τp) = h(xk) + ∇h(xk)

T αdk − τ [h(xk) + ∇h(xk)T dk],

= (α − τ)[h(xk) + ∇h(xk)T dk] + (1 − α)h(xk). (3.68)

Since for xk ∈ N we have ‖h(xk)‖ ≤ ε, and since αk ≤ 1, we obtain

‖h(xk) + ∇h(xk)T (αdk + τp)‖ ≤ (1 − τ)‖h(xk) + ∇h(xk)

T dk‖ + (1 − α)ε. (3.69)

By Assumption A1, the fact that ∇h(x) has full rank near x∗ and (3.48), we have that fori ∈ A∗, there is a constant C such that

∇gi(xk)T p ≥ −∇gi(xk)

T∇h(xk)[

∇h(xk)T∇h(xk)

]−1(h(xk) + ∇h(xk)

T dk) + θσ

≥ −C‖(h(xk) + ∇h(xk)T dk)‖ + θσ

= 23θσ > 0,

provided we choose θ = 3C‖(h(xk) + ∇h(xk)T dk)‖/σ. This bound and the convexity of

[ · ]−1 imply that for all i ∈ A∗

[gi(xk) + ∇gi(xk)T (αdk + τp)]− ≤ [gi(xk) + ∇gi(xk)

T αdk]−

≤ (1 − α)[gi(xk)]− + α[gi(xk) + ∇gi(xk)

T dk]−

≤ (1 − α)[gi(xk)]− + [gi(xk) + ∇gi(xk)

T dk]−.(3.70)

Our choice of θ implies that the length of vector p is bounded as follows,

‖p‖ ≤ C2‖h(xk) + ∇h(xk)T dk‖ + θ‖dk

M‖

= ‖h(xk) + ∇h(xk)T dk‖(C2 + 3C‖dk

M‖/σ)

≤ C3‖h(xk) + ∇h(xk)T dk‖, (3.71)

for suitable constants C2 and C3. If we choose τ and α ∈ (0, 1) to satisfy

τC3‖h(xk) + ∇h(xk)T dk‖ ≤ (1 − α)γ, (3.72)


then by Lemma 3.10 we have that condition (3.40) is satisfied for d = p. This observation, together with (3.69), (3.70) and the convexity of [·]−, yields

mk(αdk + τp) ≤ ‖h(xk) + ∇h(xk)T(αdk + τp)‖ + ∑i∉A∗ [gi(xk) + ∇gi(xk)T(αdk + τp)]− + ∑i∈A∗ [gi(xk) + ∇gi(xk)T(αdk + τp)]−
  ≤ ‖h(xk) + ∇h(xk)T(αdk + τp)‖ + ∑i∈A∗ [gi(xk) + ∇gi(xk)T(αdk + τp)]−
  ≤ (1 − τ)‖h(xk) + ∇h(xk)T dk‖ + (1 − α)ε + (1 − α)∑i∈A∗ [gi(xk)]− + ∑i∈A∗ [gi(xk) + ∇gi(xk)T dk]−
  ≤ mk(dk) − τ‖h(xk) + ∇h(xk)T dk‖ + 2(1 − α)ε,

where the last inequality follows from (3.66), the assumption that the step dk satisfies all the linearized inequalities, and the condition mk(0) < ε. It follows from this inequality, the Lipschitz condition of Lemma 3.9, (3.34) and (3.71) that

qkπ(αdk + τp) − qkπ(dk) ≤ qkf(αdk + τp) − qkf(dk) + π[−τ‖h(xk) + ∇h(xk)T dk‖ + 2(1 − α)ε]
  ≤ β(‖τp‖ + (1 − α)‖dk‖) + π[−τ‖h(xk) + ∇h(xk)T dk‖ + 2(1 − α)ε]
  ≤ (βC3 − π)τ‖h(xk) + ∇h(xk)T dk‖ + (βr + 2πε)(1 − α).

If τ and α satisfy (3.72), we have

qkπ(αdk + τp) − qkπ(dk) ≤ [(βC3 − π) + (βr + 2πε)C3/γ]τ‖h(xk) + ∇h(xk)T dk‖
  = [(βC3 + βrC3/γ) + π(−1 + 2εC3/γ)]τ‖h(xk) + ∇h(xk)T dk‖.

If the neighborhood of x∗ is chosen small enough that 2εC3/γ < 1/2, then for π > 2βC3(1 + r/γ) we have that qkπ(αdk + τp) − qkπ(dk) < 0, which contradicts the fact that dk is the minimizer of qkπk. Therefore, we must have that h(xk) + ∇h(xk)T dk = 0, and this concludes the proof. □

We can now prove the main convergence result for the case when {πk} is unbounded.

Theorem 3.12 Suppose that Algorithm I generates an infinite sequence of iterates {xk}, that Assumptions A1 and A2 hold, and that {πk} → ∞. Then, either:

(a) there exists a limit point x∗ with v(x∗) > 0 that is a stationary point of v; or
(b) there is a limit point x∗ with v(x∗) = 0 such that MFCQ fails at x∗; or
(c) every limit point of the sequence {xk} is a KKT point for problem (1.1).

Proof. Suppose first that all limit points x∗ are infeasible, i.e., v(x∗) > 0. Then, by Lemma 3.8, at least one of these limit points must satisfy m∗(0) − m∗(dLP) = 0, for otherwise the sequence of penalty parameters would not diverge. By Theorem 3.2 (c), such a limit point is an infeasible stationary point of v. This shows that (a) is one of the possible outcomes of the algorithm.

Next, ... □

4 Numerical Experiments

We developed a matlab implementation of Algorithm I and tested its performance on various difficult situations. We present results on five small-dimensional examples that exhibit inconsistent constraint linearizations at some iterate or that fail to satisfy the linear independence constraint qualification at the solution. One of the test problems is infeasible. Important issues related to the efficient sparse implementation of Algorithm I are not addressed here, as they lie outside the scope of this paper.

To solve the subproblems in Algorithm I, we employed the codes provided by the matlab optimization toolbox. The linear program (2.7) was solved using linprog, and the quadratic program (2.5) using quadprog.
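For illustration, the following is a minimal sketch (in Python/SciPy rather than the matlab toolbox, with function and variable names of our own choosing) of how the linear feasibility subproblem can be posed for a linprog-style solver, assuming the standard elastic-variable reformulation of the problem of minimizing the ℓ1 norm of the linearized constraint violation subject to an ℓ∞ trust region:

    # Sketch only: minimize ||h(x_k) + Ah d||_1 + sum_i [g(x_k) + Ag d]_i^-
    # subject to ||d||_inf <= delta, via elastic variables (r, s, t >= 0).
    import numpy as np
    from scipy.optimize import linprog

    def feasibility_lp(h, Ah, g, Ag, delta):
        """Step d that minimizes the l1 violation of the linearized constraints."""
        mE, n = Ah.shape            # equality constraints h(x) = 0
        mI = Ag.shape[0]            # inequality constraints g(x) >= 0
        # variables z = (d, r, s, t):  h + Ah d = r - s,  g + Ag d + t >= 0
        c = np.concatenate([np.zeros(n), np.ones(2 * mE + mI)])
        A_eq = np.hstack([Ah, -np.eye(mE), np.eye(mE), np.zeros((mE, mI))])
        b_eq = -h
        A_ub = np.hstack([-Ag, np.zeros((mI, 2 * mE)), -np.eye(mI)])
        b_ub = g
        bounds = [(-delta, delta)] * n + [(0, None)] * (2 * mE + mI)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        return res.x[:n], res.fun   # d_LP and, under this formulation, m_k(d_LP)

Under this assumed formulation, the optimal value returned by the solver plays the role of mk(dLP) in the stopping and penalty-update tests described below.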

We mentioned in Section 2 that the trust region radius ∆k used in the linear program (2.7) is not crucial; in fact, the convergence properties established in the previous section hold even if this radius is kept constant. In practice, however, it may be advantageous to choose ∆k based on local information about the problem, and in our implementation this choice is based on the most recently accepted step. Given the search direction dk and the step length αk at the end of iteration k of Algorithm I, we compute

akred = φπ+(xk) − φπ+(xk + αkdk),        pkred = qkπ+(0) − qkπ+(αkdk)

and update the LP trust region radius as follows:

Procedure for Updating ∆k.
Initial data: η1 < η2 ∈ (0, 1), ∆min, ∆max > 0.

If akred < η1 pkred
    set ∆k+1 = ½‖αkdk‖;
else if akred > η2 pkred
    set ∆k+1 = 2‖αkdk‖;
else
    set ∆k+1 = ‖αkdk‖;
Set ∆k+1 = mid(∆min, ∆k+1, ∆max).
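In code, the procedure amounts to the following small rule (a sketch with names of our own; the default values are the ones listed in the next paragraph):

    import numpy as np

    def update_lp_radius(ared, pred, step_norm,
                         eta1=0.25, eta2=0.75, delta_min=1e-3, delta_max=1e3):
        """Return Delta_{k+1} from ared_k, pred_k and ||alpha_k d_k||."""
        if ared < eta1 * pred:        # poor agreement between model and function
            delta = 0.5 * step_norm
        elif ared > eta2 * pred:      # very good agreement
            delta = 2.0 * step_norm
        else:                         # acceptable agreement
            delta = step_norm
        # the mid(.,.,.) safeguard: project onto [delta_min, delta_max]
        return float(np.clip(delta, delta_min, delta_max))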

The initial penalty π1 is set to 1 in all tests. Step 3 of Algorithm I is executed only when the search direction d(πk) computed in Step 2 violates the linearized constraints by more than 10⁻¹⁴. Algorithm I stops in Step 1 and reports optimality if the infinity norm of the KKT error is less than 10⁻⁶. Convergence to an infeasible stationary point is reported if Algorithm I executes Step 3 and mk(0) − mk(dLP) < 10⁻¹⁵. We multiply π by 10 whenever it is increased and set ∆min, ∆1, ∆max as 10⁻³, 1, 10³, respectively. The rest of the parameters are chosen as τ = 0.5, η = 10⁻⁴, ε1 = ε2 = 0.1 and η1 = 0.25, η2 = 0.75.


The first example illustrates the behavior of Algorithm I when the linearizations of the constraints are inconsistent. In this situation, some SQP methods trigger a switch and revert either to a feasibility restoration phase in which the objective function is ignored, as in filterSQP [15], or to an elastic mode (ℓ1 minimization) phase, as in snopt [17]. There is no switch in Algorithm I, which always takes steps based on the penalty function φπ. An important difference between Algorithm I and the penalty update strategy in snopt is that the latter follows a traditional approach in which the penalty parameter is held fixed during the course of the minimization, and is only increased when a stationary point of the penalty function is approximated. Algorithm I, on the other hand, follows a dynamic updating approach for π, as described in §2.

Example 1. The problem

minimize   x1                                   (4.1)
subject to x1² + 1 − x2 = 0,
           x1 − 1 − x3 = 0,
           x2 ≥ 0,  x3 ≥ 0

was introduced by Wachter and Biegler [27] to show that a class of line search interior-point methods may converge to a non-stationary point. We use the starting point (−3, 1, 1); the solution is x∗ = (1, 2, 0). The output is summarized in Table 1, which reports the iteration number (it), the value of the penalty parameter πk, the trust region ∆k, the number of quadratic programs (QPs) solved at the current iterate, and the number of linear programs (LPs) solved (0 or 1). The table also prints the values of the first two components of x, the KKT and feasibility errors, as well as the values of the objective function f and the penalty function φ.

Table 1: Output for Example 1. {table:Ex1}

it   πk   ∆k   QPs   LPs   x+1   x+2   KKT(x+)   feas(x+)   f(x+)   φπ+(x+)

0 1 -3 1 9.00E+00 1.40E+01 -3.00E+00 1.10E+01

1 1 1.00E+00 1 1 -1.5405 1.2432 2.54E+00 4.67E+00 -1.54E+00 3.13E+00

2 1 2.92E+00 1 1 -0.9428 1.5316 1.94E+00 2.30E+00 -9.43E-01 1.36E+00

3 1 1.20E+00 1 1 -0.8115 1.6414 1.81E+00 1.83E+00 -8.12E-01 1.02E+00

4 1 2.63E-01 1 1 -0.8043 1.6468 1.80E+00 1.80E+00 -8.04E-01 1.00E+00

5 1 1.45E-02 2 1 -0.2924 0.8234 6.10E+00 1.55E+00 -2.92E-01 1.53E+01

6 10 8.23E-01 1 0 0.3538 0.5766 8.25E+00 1.19E+00 3.54E-01 1.23E+01

7 10 6.46E-01 1 1 1 1.4265 1.23E+01 5.73E-01 1.00E+00 6.73E+00

8 10 1.70E+00 1 0 1 2 5.73E-01 2.69E-16 1.00E+00 1.00E+00

9 10 1.15E+00 1 0 1 2 2.22E-15 2.69E-16 1.00E+00 1.00E+00

Algorithm I detects that the linearized constraints are inconsistent in the first iterations and therefore solves the linear feasibility LP subproblem. Note that progress toward the solution is made during these initial iterations. The penalty parameter is increased only once, at iteration 5, meaning that the initial penalty π = 1 adequately relaxed the constraints at the earlier iterations. At iteration 5, the search direction has to be recomputed and hence two QPs are solved. The optimal values for the primal variables are found at iteration 8, but Algorithm I performs one extra iteration to determine the correct final multipliers.
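For concreteness, a minimal sketch of problem (4.1) and of the exact penalty function (our own code, assuming the ℓ1 measure of constraint violation) reproduces the f, feasibility and φ entries of row 0 in Table 1 at the starting point:

    import numpy as np

    def f(x):                                  # objective of (4.1)
        return x[0]

    def violation(x):                          # l1 violation of the constraints of (4.1)
        h = np.array([x[0]**2 + 1 - x[1],      # x1^2 + 1 - x2 = 0
                      x[0] - 1 - x[2]])        # x1 - 1 - x3 = 0
        bnd = np.minimum(x[1:], 0.0)           # bound violations of x2 >= 0, x3 >= 0
        return np.abs(h).sum() + np.abs(bnd).sum()

    def phi(x, pi=1.0):                        # exact penalty function f + pi * v
        return f(x) + pi * violation(x)

    x0 = np.array([-3.0, 1.0, 1.0])
    print(f(x0), violation(x0), phi(x0))       # -3.0  14.0  11.0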

Example 2. The problem

minimize   (x2 − 1)²                            (4.2)
subject to x1² = 0,
           x1³ = 0

is presented in Fletcher et al. [16] and is also discussed by Chen and Goldfarb [10]. MFCQ is violated at the solution x∗ = (0, 1), and the linearized constraints are inconsistent at every infeasible point.

Fletcher et al. [16] mention that, starting from the infeasible point (1, 0), a feasibility restoration phase is likely to converge to (0, 0), which is not the solution of the problem. We ran the filterSQP solver [14] and observed that it did indeed converge to (0, 0). Algorithm I does not exhibit such behavior. The sequence of iterates moves toward the solution from the very first step, and is not attracted to the origin because the objective function influences the choice of search direction. As shown in Table 2, Algorithm I detects inconsistent constraint linearizations (an LP is solved at every iteration) but finds that the penalty π = 1 is adequate to enforce progress. (No LP is solved in the last iteration, because the optimality stopping test is satisfied at that point.) Thus, although the linear feasibility subproblem of Algorithm I has some of the flavor of a feasibility restoration phase, it is only used to determine the penalty parameter and not to compute iterates, which is beneficial in this example. □

Table 2: Output for Example 2. {table:Ex2}

it   πk   ∆k   QPs   LPs   x+1   x+2   KKT(x+)   feas(x+)   f(x+)   φπ+(x+)

0 1 1 0 2.00E+00 2.00E+00 1.00E+00 3.00E+00

1 1 1.00E+00 1 1 0.6667 0.6667 6.67E-01 7.41E-01 1.11E-01 8.52E-01

2 1 1.33E+00 1 1 0.4444 0.8736 3.95E-01 2.85E-01 1.60E-02 3.01E-01

3 1 4.44E-01 1 1 0.2222 0.952 2.59E-01 6.04E-02 2.30E-03 6.27E-02

4 1 4.44E-01 1 1 0.1111 1 6.48E-02 1.37E-02 0.00E+00 1.37E-02

5 1 2.22E-01 1 1 0.0556 1 1.62E-02 3.26E-03 4.93E-32 3.26E-03

6 1 1.11E-01 1 1 0.0278 1 4.05E-03 7.93E-04 4.93E-32 7.93E-04

7 1 5.56E-02 1 1 0.0139 1 1.01E-03 1.96E-04 1.23E-32 1.96E-04

8 1 2.78E-02 1 1 0.0069 1 2.53E-04 4.86E-05 0.00E+00 4.86E-05

9 1 1.39E-02 1 1 0.0035 1 6.33E-05 1.21E-05 4.93E-32 1.21E-05

10 1 6.94E-03 1 1 0.0017 1 1.58E-05 3.02E-06 4.93E-32 3.02E-06

11 1 3.47E-03 1 1 0.0009 1 3.96E-06 7.54E-07 1.23E-32 7.54E-07

12 1 1.74E-03 1 0 0.0009 1 8.48E-07 7.54E-07 1.23E-32 7.54E-07

Example 3. The following problem belongs to the class of mathematical programs with complementarity constraints (MPCCs). These problems have received much attention in recent years because of their many practical applications [12]; they can be challenging to solve because MFCQ is violated at every feasible point. The problem is given by

minimize   x1 + x2                              (4.3)
subject to x2² − 1 ≥ 0,
           x1x2 ≤ 0,
           x1 ≥ 0,  x2 ≥ 0.

The solution is x∗ = (0, 1) and is a strongly stationary point, which in the context of this paper means that there is a finite value of the penalty parameter π∗ such that x∗ is a stationary point for the penalty function φπ(x), for all π ≥ π∗. Equivalently, there exist multipliers at x∗ that satisfy the KKT conditions for (1.1) (although these multipliers are not unique; in fact, the set of multipliers is unbounded). Fletcher et al. [?] show that the linearization of the constraints of this problem is inconsistent for any point of the form (ε, 1 − δ), with ε, δ > 0.
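A quick check (ours, not taken from [?]) confirms the inconsistency at a point (ε, 1 − δ) with small ε, δ > 0: the linearization of x2² − 1 ≥ 0 gives

(1 − δ)² − 1 + 2(1 − δ)d2 ≥ 0,   hence   d2 > 0,

the linearization of x1 ≥ 0 gives d1 ≥ −ε, while the linearization of x1x2 ≤ 0 gives

ε(1 − δ) + (1 − δ)d1 + ε d2 ≤ 0,   hence   d1 ≤ −ε − ε d2/(1 − δ) < −ε,

so no step d can satisfy all three linearized constraints simultaneously.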

Table 3: Output for Example 3. {table:Ex3}

it   πk   ∆k   QPs   LPs   x+1   x+2   KKT(x+)   feas(x+)   f(x+)   φπ+(x+)

0 1 1.00E-01 9.00E-01 1.70E+00 2.80E-01 1.00E+00 1.28E+00

1 1 1.00E+00 1 1 -1.17E-02 1.01E+00 3.08E-01 1.17E-02 9.94E-01 1.01E+00

2 1 2.23E-01 3 1 -6.42E-17 1.01E+00 1.00E+00 6.42E-17 1.01E+00 1.01E+00

3 100 2.35E-02 1 0 -1.28E-16 1.00E+00 4.82E-01 1.28E-16 1.00E+00 1.00E+00

4 100 1.11E-02 1 0 -2.57E-16 1.00E+00 3.07E-05 2.57E-16 1.00E+00 1.00E+00

5 100 1.00E-03 1 0 -2.57E-16 1.00E+00 2.36E-10 2.57E-16 1.00E+00 1.00E+00

In the results reported in Table 3, the starting point was chosen as (0.1, 0.9). At the first iteration, the linearized constraints are found to be inconsistent; the search direction computed with the initial penalty parameter satisfies condition (2.10), and the penalty parameter is not increased. At the second iteration, the search direction violates the linearized constraints, and the LP subproblem indicates that the linearized constraints can be satisfied. The penalty parameter needs to be increased twice so that the solution of the QP satisfies the linearized constraints. The new iterate is feasible and, from that point on, the iterates converge quadratically to the solution.

Several specialized methods have been developed in recent years that exploit the structure of MPCCs (see, e.g., [2, 3, 11, ?, 21, 22, 23, 26]). In these methods, the complementarity constraints must be singled out and relaxed (or penalized). Algorithm I is, in contrast, a general-purpose nonlinear programming solver that treats MPCCs like any other problem. □

Example 4. MFCQ is also violated in a subset of the feasible region in the class of switch-off problems, also known as problems with vanishing constraints; see Achtziger and Kanzow [1]. An instance of such problems is

minimize   2(x1 + x2)
subject to x1 ≥ 0,                              (4.4)
           x1x2 ≥ 0,
           x2 ≥ −1,


which has a unique solution at x∗ = (0, −1). The feasible region is the union of the first quadrant, where the constraints are regular (except at the origin), and a portion of the negative x2-axis, where MFCQ is violated. The performance of Algorithm I, starting at the origin, is summarized in Table 4.

Table 4: Output for Example 4. {table:Ex4}

it   πk   ∆k   QPs   LPs   x+1   x+2   KKT(x+)   feas(x+)   f(x+)   φπ+(x+)

0 1 0 0 1.00E+00 0.00E+00 0.00E+00 0.00E+00

1 1 1.00E+00 2 1 0 -1 9.00E+00 0.00E+00 -2.00E+00 -2.00E+00

2 10 2.00E+00 1 0 0 -1 2.66E-15 0.00E+00 -2.00E+00 -2.00E+00

At the starting point, the direction obtained by solving subproblem (2.5) is given by d = (−1, −1) and leads away from the feasible region. Algorithm I, however, discards this direction because it does not satisfy the linearization of the constraints, which are satisfiable. The penalty parameter is increased from 1 to 10, and the new search direction d = (0, −1) not only satisfies the linearized constraints but leads straight to the solution of problem (4.4). The second iteration is required simply to compute the optimal multipliers.

This example shows how a prompt identification of an inadequate penalty parameter can save a great deal of computational work. Classical penalty methods would not increase the penalty parameter at the first iteration and would follow the initial direction (−1, −1). The penalty function φπ is unbounded below along this direction, and a classical algorithm would generate a series of iterates with decreasing values of x1 until it detects that the iterates appear to be diverging. Not only would those iterations be wasted, but extra effort would be required to return to the vicinity of the solution.

We also ran Algorithm I on problem (4.4) starting from x0 = (1, 1) and observed that it generates an infinite sequence of iterates that converges to the origin. This point is not a solution of the problem, but is a point where MFCQ is violated. As predicted by the analysis of Section 3, the penalty parameter diverges. It is interesting to describe in detail the behavior of the algorithm in this case.

The initial QP subproblem gives the search direction d1 = (−1, −1). Should Algorithm I accept this direction, the next iterate would be the origin — the starting point for the run discussed above — and would lead to the solution in one more iteration. This search direction is, however, not accepted by the algorithm because d1 violates the linearization of the nonlinear constraint x1x2 ≥ 0. Algorithm I therefore solves the feasibility LP problem, which indicates that the linearized constraints can be satisfied in a neighborhood of the current point, so the penalty parameter is increased. The new search direction, d = (−1/2, −1/2), is the longest multiple of the steepest descent direction of f that satisfies all linearized constraints. The algorithm follows this direction to produce the new iterate x1 = (1/2, 1/2).

This behavior is then repeated over and over. The step computed at x1 is d = (−1/4, −1/4), which is the longest step along the steepest descent direction that satisfies the linearization of the constraint x1x2 ≥ 0. The new iterate is (1/4, 1/4). One can show that Algorithm I generates the iterates xk = (2⁻ᵏ, 2⁻ᵏ) and therefore never reaches the origin. A similar behavior is observed starting from other points near the ray (γ, γ) with γ > 0.
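To verify the last claim (the computation is our own), note that at an iterate of the form xk = (t, t) with t > 0 the linearization of x1x2 ≥ 0 reads

t² + t(d1 + d2) ≥ 0,   i.e.,   d1 + d2 ≥ −t,

while the linearizations of x1 ≥ 0 and x2 ≥ −1 only require d1 ≥ −t and d2 ≥ −1 − t. Along a multiple d = s(−1, −1), s > 0, of the steepest descent direction of f(x) = 2(x1 + x2), the first condition forces s ≤ t/2, so the longest admissible step is d = (−t/2, −t/2) and the next iterate is (t/2, t/2). Starting from x1 = (1/2, 1/2), this gives xk = (2⁻ᵏ, 2⁻ᵏ), as claimed.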


Although, as mentioned above, this behavior is not in contradiction with the theory, it suggests that Algorithm I could perhaps be altered slightly. Note that both the initial point (1, 1) and the origin (0, 0) are feasible for problem (4.4), and that the penalty function φπ=1(x) decreases along the first trial direction d = (−1, −1); nevertheless, Algorithm I rejects a good point simply because it violates the linearization of the constraints. Therefore, in practice one could test the value of the penalty function prior to solving the feasibility problem. Whether such a strategy is desirable will be a topic of investigation when this method is developed into production software. □

Example 5. The following infeasible problem has been studied by Burke and Han [5] and Chen and Goldfarb [10]:²

minimize   x                                    (4.5)
subject to x² + 1 ≤ 0,
           x ≤ 0.

Chen and Goldfarb report that, starting from x1 = 10, their method converges to the infeasible stationary point x∗ = 0 after 50 iterations, with a final penalty parameter of π ≈ 2.1 × 10⁶. Algorithm I converges to that infeasible stationary point in 3 iterations; see Table 5.

It may seem surprising that the final penalty parameter reported in Table 5 is only 10, given that our analysis suggests that the penalty parameter will tend to infinity in the infeasible case. We note, however, that π = 10 is the penalty at the beginning of iteration 3 and that Algorithm I drives the penalty parameter to infinity in Step 4. It does so while staying at the current iterate, and once it detects that the problem is locally infeasible, it stops. □
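The infeasible stationary point itself is easy to verify (our own check, assuming the ℓ1 measure of infeasibility): for problem (4.5),

v(x) = max(0, x² + 1) + max(0, x) = x² + 1 + max(0, x),

so v′(x) = 2x < 0 for x < 0 and v′(x) = 2x + 1 > 0 for x > 0. Hence x∗ = 0 minimizes the infeasibility, with v(0) = 1 > 0, and is therefore an infeasible stationary point of v.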

Table 5: Output for Example 5. {table:Ex5}

it πk ∆k QPs LPs x+ KKT(x+) feas(x+) f(x+) φπ+(x+)

0 1 10 1.01E+02 1.11E+02 1.00E+01 1.21E+02

1 1 1.00E+00 1 1 4.95 2.55E+01 3.05E+01 4.95E+00 3.54E+01

2 1 1.01E+01 2 1 0 4.01E+00 1.00E+00 8.88E-16 1.00E+01

3 10 9.90E+00 1 1 0 1.00E+00 1.00E+00 8.88E-16 1.00E+01

More results obtained with our matlab implementation of Algorithm I are reported in [23]. The test set used in those experiments includes both regular problems and problems that are infeasible or do not satisfy constraint qualifications. The results in [23] indicate that Algorithm I is quite efficient on problems that do not require regularization because, as expected, condition (2.9) guarantees that a pure SQP step is used whenever the linearized constraints can be satisfied in a neighborhood of the current iterate. Therefore, for regular problems, Algorithm I performs very similarly to a classical SQP method, except that a few extra QPs and LPs are solved when the initial penalty parameter is too small. On the other hand, Algorithm I is much more robust and efficient than a classical SQP method on problems (such as those in Examples 1–5) that require regularization.

² GL: I understand that they took the constraints from [5] and picked the objective function themselves, but I have not checked that reference.

References

[1] W. Achtziger and C. Kanzow. Mathematical programs with vanishing constraints: optimality conditions and constraint qualifications. Mathematical Programming, 114(1):69–99, 2008.

[2] M. Anitescu. Global convergence of an elastic mode approach for a class of Mathematical Programs with Complementarity Constraints. SIAM Journal on Optimization, 16(1):120–145, 2005.

[3] H. Y. Benson, A. Sen, D. F. Shanno, and R. J. Vanderbei. Interior-point algorithms, penalty methods and equilibrium problems. Computational Optimization and Applications, 34(2):155–182, 2006.

[4] J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer Verlag, New York, 2000.

[5] J. V. Burke and S. P. Han. A robust sequential quadratic-programming method. Mathematical Programming, 43(3):277–303, 1989.

[6] R. H. Byrd, N. I. M. Gould, J. Nocedal, and R. A. Waltz. An algorithm for nonlinear optimization using linear programming and equality constrained subproblems. Mathematical Programming, Series B, 100(1):27–48, 2004.

[7] R. H. Byrd, N. I. M. Gould, J. Nocedal, and R. A. Waltz. On the convergence of successive linear-quadratic programming algorithms. SIAM Journal on Optimization, 16(2):471–489, 2006.

[8] R. H. Byrd, J. Nocedal, and R. A. Waltz. Steering exact penalty methods. Optimization Methods and Software, 23(2), 2008.

[9] R. H. Byrd, J. Nocedal, and R. A. Waltz. KNITRO: An integrated package for nonlinear optimization. In G. di Pillo and M. Roma, editors, Large-Scale Nonlinear Optimization, pages 35–59. Springer, 2006.

[10] L. Chen and D. Goldfarb. Interior-point ℓ2 penalty methods for nonlinear programming with strong global convergence properties. Mathematical Programming, 108(1):1–36, 2006.

[11] A. V. De Miguel, M. P. Friedlander, F. J. Nogales, and S. Scholtes. A two-sided relaxation scheme for mathematical programs with equilibrium constraints. SIAM Journal on Optimization, 16(1):587–609, 2005.


[12] M. C. Ferris and J. S. Pang. Engineering and economic applications of complementarity problems. SIAM Review, 39(4):669–713, 1997.

[13] R. Fletcher. Practical Methods of Optimization. J. Wiley and Sons, Chichester, England, second edition, 1987.

[14] R. Fletcher and S. Leyffer. User manual for filterSQP. Technical Report NA/181, Dundee, Scotland, 1998.

[15] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming, 91:239–269, 2002.

[16] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM Journal on Optimization, 13(1):44–59, 2002.

[17] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Journal on Optimization, 12:979–1006, 2002.

[18] N. I. M. Gould, D. Orban, and Ph. Toint. An interior-point L1-penalty method for nonlinear optimization. Technical Report RAL-TR-2003-022, Rutherford Appleton Laboratory, Chilton, Oxfordshire, UK, 2003.

[19] N. I. M. Gould and D. P. Robinson. A second derivative SQP method with imposed descent. Technical Report 08/09, Oxford University Computing Laboratory, 2008.

[20] S. P. Han and O. L. Mangasarian. Exact penalty functions in nonlinear programming. Mathematical Programming, 17(3):251–269, 1979.

[21] S. Leyffer, G. Lopez-Calva, and J. Nocedal. Interior methods for mathematical programs with complementarity constraints. SIAM Journal on Optimization, 17(1):52–77, 2006.

[22] X. Liu and J. Sun. Generalized stationary points and an interior-point method for mathematical programs with equilibrium constraints. Mathematical Programming, 101(1):231–261, 2004.

[23] G. Lopez-Calva. Exact-Penalty Methods for Nonlinear Programming. PhD thesis, Industrial Engineering & Management Sciences, Northwestern University, Evanston, IL, USA, 2005.

[24] O. L. Mangasarian and S. Fromovitz. The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. Journal of Mathematical Analysis and Applications, 17:37–47, 1967.

[25] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer, 1999.


[26] A. Raghunathan and L. T. Biegler. An interior point method for mathematical programs with complementarity constraints (MPCCs). SIAM Journal on Optimization, 15(3):720–750, 2005.

[27] A. Wachter and L. T. Biegler. Failure of global convergence for a class of interior point methods for nonlinear programming. Mathematical Programming, 88(3):565–574, 2000.

[28] R. A. Waltz and T. D. Plantenga. Knitro 5.0 User's Manual. Technical report, Ziena Optimization, Inc., Evanston, IL, USA, February 2006.
