

A TRUST REGION METHOD FOR NONLINEAR PROGRAMMING BASED ON PRIMAL INTERIOR-POINT TECHNIQUES∗

TODD PLANTENGA†

SIAM J. SCI. COMPUT. © 1998 Society for Industrial and Applied Mathematics, Vol. 20, No. 1, pp. 282–305

Abstract. This paper describes a new trust region method for solving large-scale optimization problems with nonlinear equality and inequality constraints. The new algorithm employs interior-point techniques from linear programming, adapting them for more general nonlinear problems. A software implementation based entirely on sparse matrix methods is described. The software handles infeasible start points, identifies the active set of constraints at a solution, and can use second derivative information to solve problems. Numerical results are reported for large and small problems, and a comparison is made with other large-scale optimization codes.

Key words. constrained optimization, nonlinear optimization, trust region methods, interior-point algorithms, barrier methods, large-scale optimization

AMS subject classifications. 90C30, 65K05

PII. S1064827595284403

1. Introduction. The success of interior-point methods in the field of linear programming has prompted research into similar methods for nonlinear programming. This paper describes the implementation of a new algorithm for large-scale nonlinear programming that attempts to exploit the strengths of interior-point methods. The algorithm is designed to efficiently find local constrained minimizers for general nonconvex problems. The problems addressed here are large and computationally expensive to evaluate; hence, algorithm efficiency is judged by the ability to reduce the number of points at which function and derivative information must be calculated.

Sequential quadratic programming (SQP) methods are known to be efficient in this sense on medium size problems, especially if second derivative information is used. However, implementing SQP algorithms for large problems is a formidable task. In a previous work [30], an SQP software code called ETR was developed for solving large-scale optimization problems subject to nonlinear equality constraints. The ETR code addresses all linear algebra subproblems with sparse matrix techniques. It makes full use of Hessian information, handling indefiniteness by the trust region method of Byrd [5] and Omojokun [34]. The efficient performance of ETR on a wide range of equality constrained problems [30] makes it an excellent foundation to build on.

The new algorithm in this paper extends ETR to cover problems with inequality constraints. It uses interior-point techniques rather than an active set strategy in the hope that an interior path will more efficiently lead to a solution. The new method, called BECTR (bound and equality constrained optimization using trust regions), is still fundamentally an SQP method, although we will see that it resembles interior-point methods. BECTR is a working code, currently used for algorithm research.

∗Received by the editors April 10, 1995; accepted for publication (in revised form) July 22, 1997; published electronically August 5, 1998. Support for this work was provided by the U.S. Department of Energy under contract DE-AC04-94AL85000 and grant DE-FG02-87ER25047, and by National Science Foundation grant CCR-9400881. The U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so for U.S. Government purposes. Copyright is owned by SIAM to the extent not limited by these rights.

http://www.siam.org/journals/sisc/20-1/28440.html
†Sandia National Laboratories, MS 9214, P.O. Box 969, Livermore, CA 94551-0969 (tdplant@ca.sandia.gov).


Unlike other interior-point methods, it can solve fully general problems and exploit second derivative information.

The optimization problem to be solved is of the form

    min_{x ∈ R^n}  f(x)                                   (1.1)
    subject to  c_i(x) = 0  for i = 1, …, m,              (1.2)
                l_i ≤ x_i ≤ u_i  for i = 1, …, n.          (1.3)

The functions f, c_i : R^n → R are assumed to be twice continuously differentiable. Variables need not have bounds; the only requirement is that l_i ∈ {−∞} ∪ R and u_i ∈ R ∪ {∞} with l_i < u_i. The feasible region defined by the set of constraints (1.2)–(1.3) is assumed nonempty, but may be nonconvex. Problems with more general inequality constraints can always be put in the form (1.1)–(1.3) through the use of nonnegative slack variables (as discussed in [21, p. 146] and [27, section 5.6.1]).

The next section explains the new algorithm by first describing the ETR code for solving equality constrained problems, and then the interior-point extensions for handling inequalities. Convergence properties are discussed, and the BECTR algorithm is compared to other interior-point methods. Section 3 describes computational difficulties encountered as iterates approach active bound constraints, then shows how to overcome these difficulties by switching to an active set method after sufficient progress is made along the interior path. Test results are presented and discussed in section 4, including a comparison with the SNOPT 5.0 [25] and LANCELOT A (7/94) [12] codes.

2. Algorithm description. Let us begin with some notation. The gradient of the objective function f is denoted by g, and A denotes the n × m matrix of constraint gradients; i.e.,

    A(x) = [∇c_1(x) · · · ∇c_m(x)].

It is assumed that A(x) has full column rank at all points. (This restriction is necessary for the current software implementation, but not the Byrd–Omojokun theory [6].) The algorithms described in this paper generate a sequence of iterates {x_0, …, x_k, …} converging to a solution point x∗ of (1.1)–(1.3). Subscripts are used to show the iterate at which a quantity is computed, while superscripts indicate components of a vector; for example, c_k^i is the ith component of the vector c(x_k). The vector of Lagrange multipliers for the m equality constraints (1.2) is denoted by λ. Throughout the paper, ‖·‖ signifies the ℓ_2 norm.

2.1. Solving equality constrained subproblems. The core operation of the new algorithm is the solution of nonlinear subproblems that contain only equality constraints. To fully appreciate the computational issues, it is important to grasp how ETR solves these subproblems. What follows is a condensed description; for more details, see [30] or [35].

Consider the equality constrained problem

    min_{x ∈ R^n}  f(x)  subject to  c(x) = 0             (2.1)

for f : R^n → R and c : R^n → R^m. To provide efficient and robust performance, ETR applies sequential quadratic programming with a trust region. At an iterate x_k,


a trust region radius ∆_k and Lagrange multipliers λ_k are chosen, and a trial step d_k is generated by solving the quadratic trust region subproblem

    min_{d ∈ R^n}  d^T g_k + (1/2) d^T W_k d              (2.2)
    subject to  A_k^T d + c_k = 0,                         (2.3)
                ‖d‖ ≤ ∆_k,                                 (2.4)

where W_k = ∇²_x L(x_k, λ_k) is the Hessian of the Lagrangian function L(x, λ) = f − λ^T c, and ‖·‖ is the Euclidean ℓ_2 norm.

A known difficulty with this subproblem is that the step size restriction (2.4) may

preclude any step from satisfying the linear constraint (2.3). Various modifications have been proposed to make the constraints consistent [8, 9, 37, 40], but ETR uses the method of Byrd [5] and Omojokun [34]. First, a partial step is computed that lies well within the trust region and satisfies the linear constraint (2.3) as much as possible. This is done by defining a relaxation parameter ζ ∈ (0, 1) and finding a step v_k that solves the vertical (or normal) subproblem

    min_{v ∈ R^n}  ‖A_k^T v + c_k‖  subject to  ‖v‖ ≤ ζ∆_k.    (2.5)

In ETR the step v_k is chosen to lie in the range space of A_k, which has the computational advantages of reducing problem (2.5) to size m and of decoupling it from the horizontal subproblem described below.

The full step d need not move any closer to the feasible manifold than v_k does, so (2.2)–(2.4) can be reformulated as

    min_{d ∈ R^n}  d^T g_k + (1/2) d^T W_k d
    subject to  A_k^T d = A_k^T v_k,
                ‖d‖ ≤ ∆_k.                                 (2.6)

This subproblem, unlike (2.2)–(2.4), is well posed (for example, d = v_k is feasible). The ETR code solves for d by using an n × (n − m) matrix Z_k that spans the null space of A_k (so that A_k^T Z_k = 0 and Z_k^T v_k = 0), then defining the full step to be d = v_k + Z_k u. The unknown vector u ∈ R^{n−m} is found by substituting into (2.6) and simplifying, which leaves

    min_{u ∈ R^{n−m}}  (g_k + W_k v_k)^T Z_k u + (1/2) u^T Z_k^T W_k Z_k u    (2.7)
    subject to  ‖Z_k u‖ ≤ sqrt(∆_k² − ‖v_k‖²).             (2.8)

The solution of this horizontal (or tangential) subproblem is denoted by u_k, and the step used by the SQP method is

    d_k = v_k + Z_k u_k.                                   (2.9)

To preserve matrix sparsity, ETR uses an implicit representation of Z_k proposed by Murtagh and Saunders [33]. The constraint Jacobian is partitioned as

    A_k^T = [B_k  N_k],                                    (2.10)


with the m × m basis matrix B_k chosen to be nonsingular. (For simplicity, (2.10) assumes the basis is formed by the first m columns of A_k^T.) Then, by direct elimination [21, p. 234],

    Z_k = [ −B_k^{−1} N_k ]
          [       I       ].                               (2.11)

Since B_k^{−1} could be large and dense, ETR instead computes and stores a sparse LU factorization of B_k. (Harwell subroutine MA28 [18] is used both to choose the basis B_k and to compute its LU factorization.) This arrangement allows the calculation of matrix-vector products involving Z_k and its transpose; thus, (2.7)–(2.8) can be solved approximately by an iterative method. ETR uses a conjugate gradient iteration, with one preconditioning option that reflects the scaling induced by the trust region constraint (2.8).
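The implicit representation (2.11) can be sketched in a small dense setting. Here a toy Gaussian elimination (`solve_dense`) stands in for the sparse LU solve that ETR performs with the MA28 factors; all function names and the 2 × 2 data below are illustrative, not taken from the ETR code.

```python
# Sketch of the implicit null-space representation Z_k = [-B^{-1}N; I]
# from (2.10)-(2.11): matrix-vector products with Z_k require only a
# solve with B, never B^{-1} itself.

def solve_dense(B, w):
    """Solve B y = w by Gaussian elimination with partial pivoting."""
    m = len(B)
    M = [row[:] + [w[i]] for i, row in enumerate(B)]  # augmented copy
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = M[r][m] - sum(M[r][c] * y[c] for c in range(r + 1, m))
        y[r] = s / M[r][r]
    return y

def z_matvec(B, N, u):
    """Compute Z u = [-B^{-1} N u; u] without forming B^{-1}."""
    m = len(B)
    Nu = [sum(N[i][j] * u[j] for j in range(len(u))) for i in range(m)]
    top = [-y for y in solve_dense(B, Nu)]
    return top + list(u)

# Check A^T Z u = 0 for a tiny partition A^T = [B N] with m = 2, n = 3:
B = [[2.0, 0.0], [1.0, 3.0]]
N = [[1.0], [2.0]]
z = z_matvec(B, N, [1.0])
At_z = [sum(B[i][j] * z[j] for j in range(2)) + N[i][0] * z[2]
        for i in range(2)]
print(At_z)
```

As in ETR, only solves with B_k are needed, so a sparse factorization of B_k alone suffices to drive an iterative method on (2.7)–(2.8).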

After computing vertical and horizontal steps, the total step d_k defines a trial iterate x_{k+1} = x_k + d_k. A trust region method evaluates the trial point's ability to solve the original problem (2.1) by using a merit function (described later in section 2.4). If the merit function approves the iterate, then a new subproblem (2.2)–(2.4) is formed at x_{k+1}; otherwise, the same subproblem is resolved with a smaller trust region radius.

The approach of Byrd and Omojokun thus replaces (2.2)–(2.4) by two trust region subproblems of smaller dimension, each with just a single quadratic constraint. The vertical subproblem (2.5) has a spherical trust region and is confined to the m-dimensional range space of A_k, whereas the horizontal subproblem (2.7)–(2.8) has an ellipsoidal trust region in the (n − m)-dimensional manifold that is tangent to the constraints. The strong convergence properties of trust region methods allow the use of approximate solutions to these two subproblems, a property that is exploited in the ETR software to reduce linear algebra costs while maintaining robustness. Also, the trust region constraint (2.8) enables a straightforward handling of indefiniteness in the horizontal subproblem [39]; thus, second derivatives can be incorporated directly into W_k.

Further computational details of the ETR algorithm are available in [30] and [35]. Our task now is to reduce the general problem (1.1)–(1.3) to the equality constrained form (2.1) that ETR can address.

2.2. Interior-point treatment of inequalities. The strategy for attacking (1.1)–(1.3) is to transform the problem into coordinates where the explicit bound constraints (1.3) can be treated as implicit side conditions; thus, the transformed problem is of the form (2.1). A trust region constraint enters into this transformation in a natural way, helping to ensure that the missing (but implicitly present) bound constraints (1.3) are not violated.

The transformation is based on Dikin's [17] affine scaling, which was used in early interior-point methods for solving linear programming problems (see [28] for a historical perspective). Let us say that a point x which satisfies l_i < x_i < u_i is an interior point of problem (1.1)–(1.3). Note that this definition does not require the point to be feasible with respect to the equality constraints (1.2). For every interior point x_k, an n × n diagonal matrix D_k is defined, with its ith component set to the distance between x_k^i and the nearest bound constraint on that variable; i.e.,

    D_k^{ii} = { x_k^i − l_i                      if l_i is finite and u_i is not,
                 u_i − x_k^i                      if u_i is finite and l_i is not,
                 min{x_k^i − l_i, u_i − x_k^i}    if both l_i and u_i are finite,
                 1                                if neither l_i nor u_i is finite.    (2.12)


(The definition of (2.12) assumes l_i and u_i are reasonably defined. As pointed out by a referee, unnecessarily large magnitudes can cause D^{ii} to be large when a value of 1 is preferred.) The affine scaling transformation is the one-to-one mapping

    x̄ = D_k^{−1} x.                                        (2.13)

A key property of this transformation is that x̄_k is at least unit distance from all bounds in the scaled coordinates; i.e., an arbitrary step d̄ to the point x̄_k + d̄ does not violate any bound if ‖d̄‖ < 1. To see this, first observe that

    ‖d̄‖² = ‖D_k^{−1} d‖² = Σ_{i=1}^{n} (d^i / D_k^{ii})² < 1  implies  |d^i| < D_k^{ii}  for i = 1, …, n.

Now suppose the closest bound to x_k^i is l_i, so that D_k^{ii} = x_k^i − l_i. Then |d^i| < x_k^i − l_i, and no matter what the sign of d^i is, the inequality x_k^i + d^i > l_i holds. Since it was assumed that l_i is the closest bound, the step cannot violate the upper bound either. A similar argument applies if the closest bound to x_k^i is u_i.
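As a concrete illustration of (2.12) and the unit-distance property, the following sketch computes the diagonal of D_k and checks that a scaled step of norm less than one stays strictly interior. The data and names are illustrative; infinite bounds are represented by `math.inf`.

```python
import math

def scaling_diagonal(x, l, u):
    """D_k^{ii} per (2.12): distance from x^i to its nearest finite
    bound, or 1 when the variable is free."""
    d = []
    for xi, li, ui in zip(x, l, u):
        lo_fin, up_fin = math.isfinite(li), math.isfinite(ui)
        if lo_fin and not up_fin:
            d.append(xi - li)
        elif up_fin and not lo_fin:
            d.append(ui - xi)
        elif lo_fin and up_fin:
            d.append(min(xi - li, ui - xi))
        else:
            d.append(1.0)
    return d

# Any scaled step d_bar with ||d_bar|| < 1 stays strictly interior:
x = [0.3, 5.0, -2.0]
l = [0.0, -math.inf, -3.0]
u = [1.0, math.inf, 4.0]
D = scaling_diagonal(x, l, u)
d_bar = [0.5, -0.5, 0.5]                       # norm sqrt(0.75) < 1
step = [Di * di for Di, di in zip(D, d_bar)]   # unscale: d = D_k d_bar
trial = [xi + si for xi, si in zip(x, step)]
print(D, trial)
```

The trial point remains strictly inside every finite bound, which is exactly the property the scaled trust region (2.17) exploits when ∆_k < 1.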

The transformation (2.13) is extended for the equality constrained SQP subproblem (2.2)–(2.3) by defining the quantities

    d̄ = D_k^{−1} d,
    ḡ_k = D_k g_k,
    Ā_k^T = A_k^T D_k,
    W̄_k = D_k W_k D_k.                                     (2.14)

Note that this transformation preserves the sparsity of the matrices A_k and W_k, and that its computational cost is proportional to the number of nonzero elements.

Substituting into (2.2)–(2.3) gives an affinely scaled quadratic subproblem at the point x_k. The new algorithm solves a sequence of these subproblems using a trust region constraint to bring about convergence. Thus, the basic affinely scaled subproblem of the new algorithm is

    min_{d̄ ∈ R^n}  d̄^T ḡ_k + (1/2) d̄^T W̄_k d̄             (2.15)
    subject to  Ā_k^T d̄ + c_k = 0,                         (2.16)
                ‖d̄‖ ≤ ∆_k.                                 (2.17)

Trust region theory allows a variety of metrics for the constraint. Here the constraint (2.17) is chosen to be spherical in the scaled coordinates instead of the unscaled coordinates because the scaled metric makes the nearest bound constraints uniformly distant from x_k.

For clarity, let us pause and sketch a simple outline of a possible algorithm for solving (1.1)–(1.3). Start at an interior point and set up the scaled subproblem (2.15)–(2.17). Solve it in scaled coordinates using the Byrd–Omojokun technique of computing vertical and horizontal steps. Obtain d_k by transforming the scaled step d̄_k back to the original coordinates, then compute f and c at the trial point x_k + d_k. Evaluate the trial point using a merit function, either accepting or rejecting it. If the trial point is accepted, set up a new scaled subproblem at x_{k+1} and repeat; otherwise, solve the same scaled subproblem at x_k with a smaller trust region size.


If we insist that ∆_k < 1 at every iteration, then affine scaling will keep all trial points in the interior of the region; thus, our hypothetical algorithm enforces (1.3) implicitly. A major virtue of this approach is that the heavy computational work of solving (2.15)–(2.17) can be done by the existing ETR software. Large-scale nonconvex problems can be solved using function and constraint Hessians, with full exploitation of matrix sparsity. The scaled trust region algorithm also enjoys the strong convergence properties conferred by the Byrd–Omojokun trust region technique (more will be said about convergence in section 2.5). This rudimentary algorithm introduces the idea of trust region scaling, but it is not an efficient algorithm in practice.

2.3. Practical algorithm issues. This section fills in the general outline of the BECTR algorithm by describing three improvements to the scaled trust region method described above. Each improvement is vital to obtaining reasonable algorithm performance.

The first improvement is to allow the trust region size ∆_k to be larger than unity. This permits large steps, and in many situations a large step makes rapid progress towards a solution, for instance, when the solution has no active inequality constraints or when the current iterate is well in the interior. But if the restriction on ∆_k is relaxed, then some action must be taken to prevent a large step from violating the implicit bound constraints (1.3). BECTR uses backtracking to keep iterates in the interior. Specifically, let τ be the distance from x_k along d_k to the first violated bound (so τ < ‖d_k‖). BECTR changes the trial point to be x_k + (0.99τ/‖d_k‖) d_k. In addition, the trust region size is reduced to make the trust region constraint (2.17) pass exactly through the new trial point. This helps prevent future trial steps from violating the constraints. (An exact value for τ is difficult to compute when the violated constraints are nonlinear inequalities. In this case, BECTR makes an approximation based on linearized constraints.)
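For simple bound constraints, the backtracking rule above can be sketched as follows. Here `tau` is measured in multiples of the step d_k, so it corresponds to τ/‖d_k‖ in the text; the function name and the 0.99 safety fraction follow the description, but the rest is an illustrative sketch.

```python
import math

def backtrack(x, d, l, u, frac=0.99):
    """Return x + alpha*d, where alpha = frac * (fraction of the step
    at which the first bound is hit), or the full step if it stays
    interior.  tau here is in multiples of d (i.e. tau/||d|| in the
    paper's notation)."""
    tau = math.inf
    for xi, di, li, ui in zip(x, d, l, u):
        if di < 0 and math.isfinite(li):
            tau = min(tau, (li - xi) / di)   # positive: lower bound hit
        elif di > 0 and math.isfinite(ui):
            tau = min(tau, (ui - xi) / di)   # positive: upper bound hit
    if tau >= 1.0:                           # full step stays interior
        return [xi + di for xi, di in zip(x, d)]
    alpha = frac * tau
    return [xi + alpha * di for xi, di in zip(x, d)]

x = [0.5, 0.5]
d = [1.0, 0.0]      # would hit the upper bound x^1 <= 1 halfway along d
trial = backtrack(x, d, [0.0, 0.0], [1.0, 1.0])
print(trial)
```

The pulled-back point lands strictly inside the bound, and the caller would then shrink ∆_k so that (2.17) passes through it.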

A more profound difficulty with the basic algorithm is ill-conditioning introduced by the Dikin scaling transformation. If a variable x_k^i is close to its bound constraint, then the corresponding component D_k^{ii} is nearly zero, and (2.14) shows that the transformed matrices Ā_k and W̄_k could be ill-conditioned. Affine scaling is important because it limits further progress towards an active bound; however, the ill-conditioning makes (2.15)–(2.17) numerically difficult to solve. When iterates are not close to a solution of (1.1)–(1.3), then computational costs can be reduced by keeping iterates away from the bound constraints. BECTR accomplishes this by employing the same device used in linear programming: the addition of a potential barrier.

Let L_f denote a subset of the indices {1, …, n} corresponding to variables having finite lower bounds, and let U_f denote a similar subset of indices for the upper bounds. The logarithmic potential barrier is defined for interior points as

    φ(x) ≡ Σ_{i ∈ L_f} ln(x^i − l_i) + Σ_{i ∈ U_f} ln(u^i − x^i).    (2.18)

It is weighted by a constant ρ ≥ 0 and combined with the objective (1.1) to give a modified nonlinear problem

    min_{x ∈ R^n}  f(x) − ρφ(x)
    subject to  c_i(x) = 0  for i = 1, …, m,
                l_i ≤ x_i ≤ u_i  for i = 1, …, n.          (2.19)

The effect of the potential barrier term is to push iterates into the interior and keep the scaled subproblems well conditioned. In linear programming the modified


problem (2.19) has a unique solution on the so-called central path, this path being parameterized by the weight ρ (see [28], for instance). The value of ρ_k is changed at each iteration to encourage iterates to follow the central path to a solution. However, for nonlinear problems the central path often does not exist outside the local neighborhood of a solution (consult [41] or [35] for examples). Choosing ρ_k and following a “central path” cannot easily be treated as simple extensions of linear programming methods. These issues are side-stepped by the BECTR algorithm, which does not pursue a central path strategy. BECTR only uses the potential barrier to keep the solution of each quadratic subproblem from getting too close to the bound constraints. The weight ρ_k is chosen at each iteration according to an ad hoc criterion designed specifically for this purpose (formulas for ρ_k are given later in section 2.4). After a finite number of iterations ρ_k is fixed at zero, so that (2.19) reduces to the original problem (1.1)–(1.3).

To incorporate the potential barrier in the scaled quadratic subproblem (2.15)–(2.17), a quadratic model of the potential barrier is used, giving

    min_{d̄ ∈ R^n}  d̄^T ḡ_k + (1/2) d̄^T W̄_k d̄ − ρ_k ( d̄^T ∇φ̄_k + (1/2) d̄^T ∇²φ̄_k d̄ )
    subject to  Ā^T d̄ + c_k = 0,
                ‖d̄‖ ≤ ∆_k,                                 (2.20)

where ∇φ̄_k = D_k ∇φ(x_k) and ∇²φ̄_k = D_k ∇²φ(x_k) D_k. The quadratic model of the potential barrier is inexpensive to compute because the Hessian of φ is diagonal.

A final algorithmic improvement is needed to handle problems with an infeasible initial guess x_0. BECTR borrows from linear programming and employs a “big M” [1] extension to accommodate this class of problems. A modified version of (1.1)–(1.3) is constructed so that x_0 appears to be an interior point, and then BECTR is used to find a point that is truly in the interior. This point serves as an acceptable starting point for the original problem.

The big M modification adds the artificial variable s to every violated bound constraint. For example, if x_0^i < l_i and x_0^j > u_j, then the original two constraints l_i ≤ x_i and x_j ≤ u_j are changed to

    x^i − l_i + s ≥ 0  and  u_j − x^j + s ≥ 0.             (2.21)

After figuring out which constraints to modify, a positive value s_0 is computed that strictly satisfies all the constraints at x_0. This gives an interior starting point for a modified problem in the n + 1 unknowns x and s. Adding a logarithmic barrier function φ(x, s) that uses the new constraints causes iterates of the modified problem to increase the quantities x^i − l_i + s and u_j − x^j + s. Thus, if s_k does not increase, the potential barrier will force x_k to eventually satisfy the original bound constraints. To keep s_k from increasing, a special penalty term with weight M is added to the objective. The modified problem looks like

    min_{x,s}  f(x) − ρφ(x, s) + Ms
    subject to  c_i(x) = 0  for i = 1, …, m,
                l_i ≤ x_i ≤ u_i  for i such that l_i < x_0^i < u_i,
                x^i − l_i + s ≥ 0  for i such that x_0^i ≤ l_i,
                u_i − x^i + s ≥ 0  for i such that x_0^i ≥ u_i.    (2.22)

BECTR computes iterates for this modified problem until x_k satisfies the original bound constraints (s_k need not be zero for this to happen, provided that M_k remains


bounded). Then it reverts to solving (2.20), starting from the newly determined feasible point.

2.4. Summary of BECTR algorithm. The previous discussion presented the fundamentals of the BECTR algorithm with the intent of understanding its operating principles. This section gives a more comprehensive statement of the algorithm, culminating in a pseudocode description. For the sake of brevity, some implementation details are only touched upon, but a full description is available in [35].

An algorithm for constrained optimization needs a merit function to measure its progress towards the twin goals of minimizing the objective and reducing constraint violations. BECTR uses the exact ℓ_2 merit function

    f(x) + µ‖c(x)‖,                                        (2.23)

where µ > 0 is the merit function penalty parameter, and f(x) includes the potential barrier and big M terms. The bound constraints are not part of the merit function because they are always satisfied by interior points. A step d_k gives an actual reduction ared in the merit function, and a predicted reduction pred with respect to the model objective and constraints (2.2)–(2.3):

    ared = f(x_k) − f(x_k + d_k) + µ{‖c(x_k)‖ − ‖c(x_k + d_k)‖},    (2.24)
    pred = −d_k^T g_k − (1/2) d_k^T W_k d_k + µ{‖c_k‖ − ‖A_k^T d_k + c_k‖}.    (2.25)

The parameter µ is chosen for each d_k to make pred be positive. The ℓ_2 merit function does not depend on Lagrange multiplier estimates, but it can suffer from the Maratos effect [32]. Hence, a second-order correction step is sometimes needed, at the cost of computing another vertical step.
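The reductions (2.24)–(2.25) typically drive a standard trust region accept/shrink/expand test on the ratio ared/pred. The sketch below shows one such rule with illustrative thresholds; the paper does not specify BECTR's actual constants, so eta, shrink, grow, and the 0.75 cutoff are assumptions.

```python
def accept_step(ared, pred, delta, eta=1e-4, shrink=0.5, grow=2.0):
    """Return (accepted, new_trust_radius) from the merit reductions
    (2.24)-(2.25).  Thresholds are illustrative, not BECTR's."""
    if pred <= 0:
        return False, shrink * delta      # model predicted no progress
    ratio = ared / pred
    if ratio < eta:
        return False, shrink * delta      # poor agreement: reject, shrink
    if ratio > 0.75:
        return True, grow * delta         # very good agreement: expand
    return True, delta                    # acceptable: keep the radius

print(accept_step(ared=0.9, pred=1.0, delta=1.0))   # accept and expand
print(accept_step(ared=-0.1, pred=1.0, delta=1.0))  # reject and shrink
```

A rejected step leads to resolving the same subproblem with the smaller radius, exactly as described at the end of section 2.1.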

Lagrange multiplier estimates λ_k ∈ R^m for the equality constraints are needed to compute the Hessian of the Lagrangian W_k and to evaluate possible solution points. BECTR uses standard first-order least squares estimates, obtained by solving

    (A_k^T A_k) λ_k = A_k^T g_k.                            (2.26)

In some optimization problems it is convenient to define a variable and then fix it to a known value (for example, the boundary value of a discretized differential equation may be specified in this manner). BECTR treats each fixed variable as a simple equality constraint with no upper or lower bound. This formulation is easy to implement, but it does permit a fixed variable to stray from its fixed value during intermediate algorithm iterations.

As discussed earlier, the potential barrier penalty parameter ρ is chosen large enough during early iterations of the algorithm to force iterates away from the bounds. A first-order estimate of the direction the algorithm might take without the potential barrier is given by the vector −(g_k − A_k λ_k). If this vector is heading towards a nearby bound, then ρ needs to be large enough to offset the anticipated step. Calling a bound “near” to x_k when it is within a constant ε_ρ, upper and lower values for ρ that account for all bounds are estimated as

    ρ_lg ≡ max_i { (g_k − A_k λ_k)^i (x_k^i − l_i)    if x_k^i − l_i ≤ u_i − x_k^i and (g_k − A_k λ_k)^i > 0,
                   −(g_k − A_k λ_k)^i (u_i − x_k^i)   if x_k^i − l_i > u_i − x_k^i and (g_k − A_k λ_k)^i < 0,

    ρ_sm ≡ min_i { (g_k − A_k λ_k)^i (x_k^i − l_i)    if x_k^i − l_i ≤ ε_ρ and (g_k − A_k λ_k)^i > 0,
                   −(g_k − A_k λ_k)^i (u_i − x_k^i)   if u_i − x_k^i ≤ ε_ρ and (g_k − A_k λ_k)^i < 0.


Now ρ_k is calculated according to the logic shown in Algorithm 2.1. The idea of Algorithm 2.1 is to use a large potential barrier for early iterations (ρ-stage = 1), which keeps the calculation of scaled steps for (2.20) well conditioned.

Algorithm 2.1. Heuristic for computing ρ_k
(Start with ρ-stage ← 1 for iterate k = 0)
if ρ-stage = 3
    then ρ_k = 0
if ρ-stage = 2
    then ρ_k = ρ_{k−1}/10
         if ρ_k < 10⁻⁶ then ρ_k = 0, ρ-stage ← 3
if ρ-stage = 1
    then if closest bound to x_k ≥ 10⁻¹ or ‖c_k‖_∞ < 10⁻¹ or ρ_lg = 0
        then ρ-stage ← 2
             ρ_k = ρ_sm/10
             if ρ_k < 10⁻⁶ then ρ_k = 0, ρ-stage ← 3
        else ρ_k = min{100 ρ_lg, 10⁶}

The value ρ_lg gives a first-order barrier term large enough to overcome the anticipated step towards any bound constraint. Of course the problem is not linear, so ρ_k is made bigger than ρ_lg by a factor of 100 to be safe. After iterates have made good progress towards a solution, Algorithm 2.1 switches to a more modest barrier (ρ-stage = 2), and then quickly drives the barrier term to zero (ρ-stage = 3).
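Algorithm 2.1 translates almost line for line into code. The sketch below carries the stage as an explicit argument; the near-bound and feasibility tests (closest bound ≥ 10⁻¹, ‖c_k‖_∞ < 10⁻¹) and the values ρ_lg, ρ_sm are supplied by the caller, and the function name is illustrative.

```python
def update_rho(stage, rho_prev, rho_lg, rho_sm,
               near_bounds, feasible_enough):
    """Three-stage heuristic of Algorithm 2.1.  near_bounds is True when
    some bound is closer than 1e-1 to x_k; feasible_enough is True when
    ||c_k||_inf < 1e-1.  Returns (rho_k, new_stage)."""
    if stage == 3:
        return 0.0, 3
    if stage == 2:
        rho = rho_prev / 10.0
        if rho < 1e-6:
            return 0.0, 3
        return rho, 2
    # stage 1: big barrier unless iterates have made good progress
    if (not near_bounds) or feasible_enough or rho_lg == 0.0:
        rho = rho_sm / 10.0
        if rho < 1e-6:
            return 0.0, 3
        return rho, 2
    return min(100.0 * rho_lg, 1e6), 1

# Early iterate near a bound with a large first-order estimate rho_lg:
print(update_rho(1, 0.0, rho_lg=2.0, rho_sm=0.1,
                 near_bounds=True, feasible_enough=False))
```

The stage only ever advances (1 → 2 → 3), matching the paper's statement that after a finite number of iterations ρ_k is fixed at zero.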

The pseudocode description of BECTR following this section combines everything discussed so far. In general, BECTR computes a solution in three phases: (1) the big M method is used to find an interior point; (2) the potential barrier is employed to keep iterates away from the bounds until the neighborhood of a solution is reached; and then (3) the original problem with no added barrier (ρk = 0) is solved. Every phase solves affinely scaled subproblems that have only equality constraints; thus, the main loop in the pseudocode applies to all three phases. As described in [30], the ETR software at the heart of the algorithm provides both iterative and direct sparse solvers for the large linear systems that must be addressed. The major subproblem solution options are indicated in the pseudocode description. Finally, note that discussion of the first-order stop test for BECTR is deferred until section 3.2.

2.5. Convergence of the algorithm. Every iterate in the BECTR algorithm is generated as the approximate solution to some trust region subproblem having only equality constraints. Even the use of backtracking merely results in a subproblem with a smaller trust region size; it does not alter the trust region nature of the method. Each trust region constraint viewed from unscaled coordinates is of the form

‖D_k^{−1} d‖ ≤ ∆k.    (2.27)

This equation defines an ellipsoidal region whose shape changes every time a trial step is accepted; hence, the BECTR algorithm may be viewed as a trust region method that uses a constantly changing ellipsoidal norm in the trust region constraint. BECTR uses the Byrd–Omojokun technique to solve each trust region subproblem, but in scaled form. This means BECTR shares the convergence properties of the Byrd–Omojokun method, provided that iterates stay in the interior and that the scaling transformation (2.14) is incorporated into the analysis.

Several researchers have developed trust region algorithms based on interior-point ideas and proved global convergence. Theoretical results were first established by


Algorithm BECTR. New interior-point trust region method for solving (1.1)–(1.3)
Constants η, ζ, γ ∈ (0, 1) are given (defaults are η = 0.1, ζ = 0.8, γ = 0.5)
Start with some x0 in the interior (use the big M method first if necessary)
Choose ∆0 > 0 (default is ∆0 = 1)
Compute ρ0 ≥ 0 using Algorithm 2.1
loop, starting with k = 0
    Compute fk, ck, gk and Ak for (1.1)–(1.2)
    if xk is sufficiently close to a solution of (1.1)–(1.3) (see section 3.2)
        then stop BECTR (compute final x∗ as in section 3.1)
    if ρk > 0
        then Include the barrier gradient by setting gk ← gk − ρk∇φk
    Compute scaling matrix Dk and obtain the scaled ḡk and Āk using (2.14)
    Compute null space basis Zk from Ak using MA28 [18]
    Solve vertical subproblem (2.5) in scaled coordinates for v̄k
        Option 1: use the dogleg method [36] with sparse factors of A_k^T A_k
        Option 2: use the dogleg method with conjugate gradient (Craig's method [13])
        Option 3: use Steihaug's conjugate gradient method [39] on the matrix A_k A_k^T
    Compute an unscaled multiplier estimate λk
        Option 1: solve (A_k^T A_k)λk = A_k^T gk with sparse factors of A_k^T A_k
        Option 2: solve (A_k^T A_k)λk = A_k^T gk with conjugate gradient least squares
    Compute Wk using λk by an option described below
    if ρk > 0
        then Include the barrier Hessian by setting Wk ← Wk − ρk∇²φk
    Obtain the scaled W̄k using (2.14), and solve horizontal subproblem (2.7)–(2.8)
    in scaled coordinates for ūk
        Option 1: use the second derivatives ∇²xL(xk, λk) for Wk and
            solve (2.7)–(2.8) by Steihaug's method
        Option 2: use a limited memory Broyden–Fletcher–Goldfarb–Shanno [7]
            approximation for Wk and solve (2.7)–(2.8) by the dogleg method
    Set d̄k = v̄k + Zkūk and compute the unscaled step dk = Dk d̄k
    if xk + dk violates a bound constraint
        then Find τ ∈ (0, 1) such that xk + τdk is in the interior of all
            bound constraints, and set dk ← τdk, ∆k = ‖D_k^{−1} dk‖
    Compute a_red from (2.24) and p_red from (2.25)
    if a_red/p_red < η and ‖v̄k‖ ≤ 0.8ζ∆k and ‖v̄k‖ ≤ 0.1 ‖Zkūk‖
        then Compute the second-order correction v_soc, set d̄k ← d̄k + v_soc,
            recompute dk = Dk d̄k, and recalculate a_red and p_red
    if a_red/p_red ≥ η
        then Set xk+1 = xk + dk, compute ρk+1 using Algorithm 2.1,
            choose ∆k+1 ≥ ∆k, and guess an active set with Algorithm 3.2
        else Set xk+1 = xk, ρk+1 = ρk, and choose ∆k+1 ≤ γ‖dk‖
    continue, after incrementing k


Coleman and Li [11] for problems with no equality constraints. Dennis, Vicente, and Heinkenschloss [15, 16] also explored this area. More recently, Byrd, Gilbert, and Nocedal [6] have proved convergence for a trust region algorithm that works on general optimization problems. Although these methods have significant differences, they claim similar properties of global convergence that are shared by the BECTR algorithm.

Briefly, first-order global convergence is proved by assuming that the scaled vertical and horizontal subproblems are each solved to at least the accuracy of a Cauchy step (that is, a minimizer in the direction of scaled steepest descent). From this it can be proved that the sequence of iterates generated by a trust region algorithm satisfies

lim inf_{k→∞} ( ‖Dk Z_k^T gk‖ + ‖Dk ck‖ ) = 0.    (2.28)

The proof requires a regularity assumption that every Ak be full rank.

The BECTR code is globally convergent because it makes an accurate computation of the Cauchy step when determining trial iterates. It is important that the scaling matrix Dk be defined by (2.12), which chooses the nearest bound to each variable, and not by a criterion that chooses bounds based on the direction of steepest descent. As shown in [11], the latter scaling criterion needs more than Cauchy decrease to attain global convergence. Much more could be said about convergence, but a full analysis is beyond the scope of this paper.

2.6. Comparison with other interior-point methods. In this section the BECTR algorithm is compared with other recently proposed methods for extending interior-point ideas to nonlinear programming.

Coleman and Li [10, 11] derived an affinely scaled trust region algorithm from direct examination of the Karush–Kuhn–Tucker (KKT) conditions. Their method adds a special positive semidefinite diagonal term to the Hessian which, like the log barrier in BECTR, helps keep iterates away from the bounds. Coleman and Li use D_k^{1/2} for affine scaling instead of the Dikin scaling. BECTR could also employ D_k^{1/2} (it is not inconsistent with global convergence), but preliminary experiments showed no strong advantage.

Coleman and Li choose the scaling coefficients D_k^{ii} as the distance to the bound that the direction of steepest descent points towards. BECTR instead uses the distance to the nearest bound (see equation (2.12)). There are two reasons for this. On the practical side, it is not clear how to generalize steepest descent to take into account nonlinear constraints. Steepest descent may point away from a bound, while constraints force the bound to become active. On the theoretical side, Coleman and Li prove that global convergence requires a Cauchy decrease criterion and “constraint compatibility” [11, section 4], which they obtain by using a “reflective line search.” By using equation (2.12), BECTR requires only the Cauchy criterion and a rule that shrinks the trust region when a step is backtracked. (The situation can be illustrated by a simple convex QP with one active bound constraint. Start near the bound but not at the solution, at a point where −∇f points into the interior. Then Newton steps with backtracking decrease f but do not converge.)
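For a problem with only the bounds l ≤ x ≤ u, the nearest-bound choice of scaling coefficient just described amounts to the following minimal sketch (the paper's full definition (2.12) may contain safeguards not reproduced here):

```python
import numpy as np

def scaling_diag(x, l, u):
    # Each diagonal coefficient of D_k is the distance from x_i to its nearest bound.
    return np.minimum(x - l, u - x)
```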

The algorithm of Coleman and Li is highly original and has great intuitive appeal. However, it is not apparent how to extend the theory to deal with nonlinear constraints. They might be put into the objective as penalty terms, but a penalty strategy is unlikely to be competitive with an SQP-based method with regard to reducing function evaluations.


Byrd, Gilbert, and Nocedal [6] have developed an algorithm that has many of the same features as BECTR, in particular, the use of ETR to solve a sequence of equality constrained subproblems. A major difference is that they derive their method from a primal-dual perspective. Their algorithm places greater emphasis on solving barrier subproblems, and the barrier weight ρk is controlled in a much more deliberate fashion than in Algorithm 2.1.

Dennis and Vicente [16] investigated affinely scaled trust region methods for problems with simple bounds, and then with Heinkenschloss [15] applied the idea to optimal control problems subject to nonlinear equality constraints and simple bounds on the control variables. Their approach is SQP-based, but applies affine scaling only to the horizontal subproblem. They do not use a log barrier, but try to accelerate convergence by computing step length with an unscaled trust region (the step direction is still affinely scaled).

Many authors, including Yamashita [44], Jarre and Saunders [29], Forsgren and Gill [22], and El-Bakry et al. [19], have considered interior-point methods that do not employ a trust region. Some [19, 22, 44] are primal-dual formulations that attempt to extend the very successful primal-dual algorithms of linear programming. These algorithms also have a strong resemblance to SQP methods for nonlinear programming [6, 44]. Only [22] does not require the Hessian of the Lagrangian to be positive definite on a subspace, which can be an obstacle to utilizing second derivatives.

The BECTR algorithm is conceived as an SQP method globalized by trust regions; hence, primary emphasis is placed on the use of affine scaling. A log barrier is vital for efficient performance of the algorithm, but it is treated as less fundamental. This formulation contrasts markedly with primal-dual methods such as [44] and [19]. There, the search direction is computed by solving a set of first-order KKT equations that include a “perturbed” complementary slackness condition. The perturbed system is nearly identical to one obtained by adding a log barrier into the objective of the problem. In contrast with BECTR, the log barrier (or its equivalent) is treated as fundamental, and affine scaling does not even seem to be present. But in fact there is an element of affine scaling implicit in the line search. It is standard for primal-dual methods in both the linear [31] and nonlinear [19, 44] programming arenas to utilize a specialized pair of line searches to choose independent step lengths for the primal and dual variables. Let us consider a problem with only simple bound constraints of the form xi ≥ 0. Given a step direction δx, the step length α is computed as [31]

α = 0.99995 τ,    τ = min_i { xi / (−δxi) : δxi < 0 }    (2.29)

(since (2.29) comes from linear programming, it is assumed that at least one component of δx is negative). The step taken is d = α(δx), so the components of d satisfy

|di| ≤ 0.99995 (xi / (−δxi)) |δxi| = 0.99995 xi    for components with δxi < 0.    (2.30)

Following [6], we may express (2.30) as a trust region inequality. Define D to be a diagonal matrix with Dii = xi when δxi < 0, and Dii ≥ τ δxi otherwise. Then (2.30) is equivalent to

‖D^{−1} d‖∞ ≤ 0.99995.    (2.31)

This affinely scaled trust region box constraint is equivalent to (2.29) for purposes of calculating the step length (in [44] the box constraint is quite explicit, though


slightly different). To summarize, we see that in a general sense primal-dual algorithms compute a search direction using a log barrier, then determine step length in that direction from an affinely scaled trust region box. The BECTR algorithm chooses to incorporate the scaled trust region earlier, allowing it to influence both step direction and length.
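The ratio test (2.29) is easily stated in code. The following sketch assumes the bounds x ≥ 0 and NumPy arrays; the function name is illustrative:

```python
import numpy as np

def primal_step_length(x, dx, safety=0.99995):
    """Ratio test (2.29): the largest multiple of dx that keeps x strictly
    positive, scaled back by the customary safety factor."""
    neg = dx < 0
    assert neg.any(), "(2.29) assumes some component of dx is negative"
    tau = np.min(x[neg] / -dx[neg])
    return safety * tau

x = np.array([1.0, 2.0, 0.5])
dx = np.array([-2.0, 1.0, -0.25])
alpha = primal_step_length(x, dx)   # tau = min(1.0/2.0, 0.5/0.25) = 0.5
```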

The fundamental idea of an affinely scaled trust region is certainly not new. Researchers in linear programming [2, 28] have noted the equivalence between taking a unit step in scaled coordinates and imposing an ellipsoidal trust region constraint in unscaled coordinates. Nevertheless, a trust region strategy is not usually pursued in linear programming, in part because the strong global convergence properties confer no great advantage.

3. Finding an exact solution to (1.1)–(1.3). If a solution x∗ has one or more bound constraints that are active, then the scaled subproblem (2.15)–(2.17) becomes progressively more ill conditioned as iterates converge towards x∗. In particular, the affine scaling transformation (2.14) indicates that Wk becomes nearly singular in the components of x∗ that have an active bound constraint. Recall from section 2.1 that the software approximately solves the scaled horizontal subproblem by a conjugate gradient iteration. The conjugate gradient iteration can be carried out using the sparse, implicit representation for the null space basis defined in (2.11), but direct methods of solution are computationally intractable because they involve dense matrices. Unfortunately, one consequence of the implicit form is that a good preconditioner for Z_k^T Wk Zk cannot be readily computed. Hence, as components of xk approach an active bound, the work needed to solve the scaled horizontal subproblem may increase prohibitively. This is indeed the case for large-scale problems.

Consider Table 3.1, which illustrates a typical increase in the number of inner conjugate gradient iterations as Dk becomes more ill conditioned. The data are for test problem OBSTCLAE, a member of the CUTE [4] problem set. This nonlinear problem has 900 unknowns plus 124 other variables that are fixed at one value using simple equality constraints. The local minimizer has 384 active bounds, so scaled subproblems near the solution can be ill conditioned in a large number of the variables. For selected iterations k of BECTR, Table 3.1 lists the distance from the current iterate xk to the nearest bound (closest bound), the number of conjugate gradient inner iterations required to solve the horizontal subproblem (horiz CG iters), and the number of bounds that are within 10^−3 of a component of xk (num bnds < 10^−3).

Table 3.1
Computational cost of approaching bound constraints.

Outer         CG iterations to       Distance to the    Number of bnds with
iteration k   solve horiz subprob    closest bound      distance < 10^−3
     9            34                  4.4 × 10^−5           18
    11            39                  4.3 × 10^−5          110
    13            39                  1.8 × 10^−5          382
    15            92                  1.6 × 10^−5          390
    17           379                  3.3 × 10^−7          390
    19           498                  3.8 × 10^−8          387
    21          1352                  9.9 × 10^−9          387

Smaller problems also show the adverse effects of scaling, but the excessive number of inner conjugate gradient iterations can often be computed in negligible time. On large problems like OBSTCLAE the extra work becomes significant. Furthermore,


as the horizontal subproblem becomes numerically singular, the conjugate gradient method is less successful in computing an accurate truncated Newton step (in the sense of [14]). Hence, close to the bounds the quality of the SQP Newton step computed by BECTR deteriorates, leading to an increase in the number of outer iterations as well.

The deteriorating performance in Table 3.1 highlights an important difference between nonlinear and linear programming. In LP an interior-point method can reach a vertex solution with great accuracy in spite of severe ill-conditioning. This is due in part to the use of direct sparse solvers that are carefully tailored to exploit the special structure of LP problems [23, 42, 43]. These solvers are not applicable to the more complex structure of the null space subproblem (2.7)–(2.8), and conjugate gradient is a viable alternative only until ill-conditioning makes it too expensive.

3.1. A terminal active set method. The last column of Table 3.1 suggests a way around these computational difficulties. It shows that long before scaled subproblems become intractably ill conditioned, the algorithm is already close to all 384 of the active bound constraints (plus a few others). This behavior implies that the interior-point method can successfully guess the active set of bound constraints with a reasonable amount of computation (the guess, described in section 3.3, involves more than just being “close” to a bound). Given an accurate guess, problem (1.1)–(1.3) can be solved by an active set method that does not require scaling.

Let us start from (1.1)–(1.3) and construct a new problem in which the active bound constraint inequalities are converted to equality constraints and the inactive bounds are ignored. This yields a nonlinear problem of the form (2.1), which can be solved by the ordinary Byrd–Omojokun method, that is, without applying scaling or other interior-point techniques. If the set of active bounds is chosen correctly, then the solution of this new problem also solves (1.1)–(1.3); if not, then the active set guess is revised according to standard rules (see [21, section 10.3] or [27, section 5.2] for details).

To summarize, the exact solution to (1.1)–(1.3) is attained by running the BECTR interior-point trust region method until a good guess for the active set can be made, then stopping and switching to an unscaled active set trust region method.

3.2. Deciding when to stop BECTR. The goal is to stop BECTR as soon as an accurate guess of the active set of inequalities can be made. One approach that was investigated calculated the dual variables corresponding to the current active set guess and evaluated algorithm convergence in terms of the full set of first-order KKT conditions. Dual variables were computed as in [27, section 5.6.2.2] and required solving a sparse, well-scaled system of equations. However, numerical testing revealed that the much simpler procedure defined below works just as well. This procedure halts BECTR when the barrier is gone, the constraints are reasonably satisfied, and the relative change in the objective is small.

Algorithm 3.1. Stop test for Algorithm BECTR
Stop tolerance ε > 0 is given
if ρk = 0 and ‖ck‖∞ < ε and (fk − fk−1)/max{fk, fk−1} < ε
    then Stop BECTR and switch to the active set method
    else Continue with the interior-point method
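In code the test is a one-liner. The sketch below adds absolute values and a guard against a zero denominator in the relative objective change; both are assumptions beyond the pseudocode, made explicit here.

```python
def bectr_stop_test(rho_k, c_inf, f_k, f_prev, eps=1e-4):
    """Stop test of Algorithm 3.1: barrier gone, constraints nearly
    satisfied, and relative objective change small."""
    rel_change = abs(f_k - f_prev) / max(abs(f_k), abs(f_prev), 1.0)
    return rho_k == 0.0 and c_inf < eps and rel_change < eps
```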

As an example of the effectiveness of this stop test, refer back to the data for problem OBSTCLAE in Table 3.1. Algorithm 3.1 terminates the interior-point phase


after the 17th iteration, correctly choosing 380 of the 384 active bounds at the solution. The terminal active set method determines the remaining bounds and converges to a solution. It uses only 68 additional inner conjugate gradient iterations to solve unscaled horizontal subproblems, far fewer than the 379 iterations required to solve the last badly scaled subproblem.

3.3. Guessing the active set. BECTR makes an initial guess before turning the final computation over to the unscaled active set algorithm. A number of sophisticated methods for determining active bounds have been devised for linear programming problems (see [20] for a review). Unfortunately, most of them make use of dual variables, which are not always well behaved in nonconvex programming problems (although they should be near a solution). Currently, only the quotient of slack variables has been investigated as an indicator, according to the scheme given below in Algorithm 3.2. Note that a new active set is selected after every iteration of BECTR.

Algorithm 3.2. Rules for choosing an active set of bound constraints within BECTR
for i = 1 to n
    Set ζi = min{xik − li, ui − xik} and θi = min{ (xik − li)/(xik−1 − li), (ui − xik)/(ui − xik−1) }
    if bound i is currently active then
        if ζi > 10^−2 or θi > 1
            then make bound i inactive
            else leave bound i active
    else
        if (ζi ≤ 10^−2 and θi < 0.4) or
           (ζi ≤ 10^−3 and bound i has never previously been active)
            then make bound i active
            else leave bound i inactive
continue
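A direct transcription of these rules follows, assuming the caller maintains the booleans active (current guess) and ever_active (whether bound i has ever been in the active set) and keeps iterates strictly interior so the slack quotients are well defined:

```python
def guess_active_set(x, x_prev, l, u, active, ever_active):
    """One pass of the Algorithm 3.2 rules; returns the new active-set guess."""
    new_active = []
    for i in range(len(x)):
        zeta = min(x[i] - l[i], u[i] - x[i])
        # theta: quotient of current to previous slack (both assumed nonzero)
        theta = min((x[i] - l[i]) / (x_prev[i] - l[i]),
                    (u[i] - x[i]) / (u[i] - x_prev[i]))
        if active[i]:
            new_active.append(not (zeta > 1e-2 or theta > 1.0))
        else:
            new_active.append((zeta <= 1e-2 and theta < 0.4) or
                              (zeta <= 1e-3 and not ever_active[i]))
    return new_active
```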

4. Test results. The BECTR algorithm has been implemented as a mixed FORTRAN/C program. This section reports the results of computational tests on a number of standard problems. The tests indicate that BECTR is reasonably robust and performs acceptably for an algorithm in its current stage of development. The problems were also solved by two existing codes that do not use interior-point techniques. A comparison between BECTR and these algorithms revealed some intriguing differences.

All problems are from the CUTE test set [4]. Some problems have been altered to correct mistakes or to make them more realistic (altered SIF files are available from the author). CUTE problems are written in partially separable form, which the CUTE software exploits to compute derivatives very efficiently. As a consequence, function and gradient evaluations are quite cheap, rarely constituting more than 5% of the total CPU time required to compute a solution. This fact should be kept in mind when examining results.

4.1. Testing conditions. The BECTR interior-point phase was executed on each problem until halted by the stop test in Algorithm 3.1. The stop tolerance was ε = 10^−4. The final iterate and active set guess produced by BECTR were then used to start the simple active set SQP method described in section 3.1. This algorithm


executed until every constraint violation and reduced gradient component was less than 10^−5. The function evaluations and CPU times reported in this section are cumulative totals for BECTR plus the terminal active set method.

BECTR solved each problem using analytic first and second derivatives. The algorithm was executed with both direct and iterative options for solving subproblems that involve the constraint Jacobian Ak. The choice of option affected execution times, and only the fastest result is reported.

The performance of BECTR was compared with two other codes: SNOPT 5.0 [25] and LANCELOT A (7/94) [12] (two other SQP methods suitable for large-scale problems are the codes of Franke and Arnold [24] and Betts and Frank [3]). The SNOPT software of Gill, Murray, and Saunders is a recent code that extends the NPSOL [26] algorithm to large problems. Like BECTR, it is an SQP method; however, SNOPT deals with bounds and inequalities using an active set strategy instead of interior-point techniques (other differences include special treatment of nonlinear terms in a constraint and the use of a line search). The current SNOPT software does not exploit second derivative information; instead, it makes a full or limited memory BFGS approximation to ∇²xL. SNOPT also forms a dense matrix to approximate the reduced Hessian; hence, problems with a large number of degrees of freedom require excessive memory and CPU time. Despite these caveats, SNOPT is an excellent active set SQP method to contrast with BECTR.

SNOPT was run with a stop tolerance of 10^−5 using the control file declarations Major Feasibility and Major Optimality. Both unscaled and scaled options were tried, and the result requiring the fewest function evaluations is reported (a small number of problems were strongly affected by this option). The limited memory BFGS approximation to ∇²xL stored up to 20 updates.

The LANCELOT software of Conn, Gould, and Toint [12] is based on an algorithm quite different from successive quadratic programming. LANCELOT replaces the equality constraints (1.2) by penalty terms in the objective (an augmented Lagrangian function is used), leaving a subproblem with only bound constraints. The active set of bounds is then identified using a projected gradient technique. Like BECTR, the LANCELOT code exploits second derivative information and solves all subproblems using sparse linear algebra. The largest linear systems in LANCELOT are attacked using conjugate gradient, and the user can choose from a suite of general preconditioners.

LANCELOT was run with a stop tolerance of 10^−5 using the control file declarations constraint-accuracy-required and gradient-accuracy-required. The infinity-norm-trust-region option was applied, along with exact-cauchy-point-required. Eleven different preconditioners were tried for each problem, but only the result requiring the fewest function evaluations is reported. This result was not necessarily the fastest preconditioning option in terms of CPU time. As stated at the beginning of this paper, we are primarily concerned with reducing the number of function and gradient evaluations, even if linear algebra costs increase. This point of view is consistent with many engineering design applications.

Test problems were solved on a Silicon Graphics workstation running IRIX 5.2. The workstation used a 150 MHz MIPS R4400 processor and had 32M RAM. Calculations by all the algorithms were made in double precision (machine epsilon of 2.2 × 10^−16).

4.2. Results for small problems. Table 4.1 contains results for the three algorithms on a number of standard test problems. Each problem is described by the


number of variables n, simple bounds mbnd (both upper and lower are counted), equality constraints meq (which include fixed variable constraints), and general inequality constraints minq. Recall that BECTR uses slack variables to convert problems into the form (1.1)–(1.3), so the m in (1.2) equals meq + minq. The column marked |A∗| in Table 4.1 gives the number of inequality and bound constraints active at the local solution.

Table 4.1
Results for small problems (function evals / gradient evals).

Problem    n   mbnd  meq  minq   f∗          |A∗|   BECTR       SNOPT        LANCELOT
HS11       2    0     0    1    -8.49847      1     13 / 12     12 / 12      14 / 15
HS12       2    0     0    1    -30.0000      1     11 / 11     15 / 15      21 / 20
HS14       2    0     1    1     1.39346      1     10 / 10      9 / 9       12 / 13
HS15       2    1     0    2     306.502      2     43 / 31      5 / 5       39 / 38
HS16       2    3     0    2     0.250000     1    240 / 104     6 / 6 **    14 / 14
HS17       2    3     0    2     1.00000      2     failed      26 / 26      15 / 16
HS18       2    4     0    2     5.00000      1     42 / 25     11 / 11      69 / 62
HS19       2    4     0    2    -6961.81      2     failed      15 / 15      30 / 29
HS20       2    2     0    2     40.1987      2     failed       6 / 6       23 / 22
HS21       2    4     0    1    -99.9600      1     16 / 16      6 / 6        1 / 2
HS22       2    0     0    2     1.00000      2     13 / 13      3 / 3        9 / 10
HS23       2    4     0    5     2.00000      2     19 / 19     12 / 12      43 / 42
HS24       2    2     0    3    -1.00000      2      8 / 8       8 / 8        7 / 8
HS25       3    6     0    0     0.00000      1     23 / 18      3 / 3 *     failed
HS29       3    0     0    1    -22.6274      1     12 / 12     16 / 16      19 / 19
HS30       3    6     0    1     1.00000      2     20 / 20     12 / 12       7 / 8
HS31       3    6     0    1     6.00000      1     12 / 12     10 / 10      12 / 11
HS32       3    3     1    1     1.00000      2     13 / 10      6 / 6        5 / 6
HS33       3    4     0    2    -4.58579      3     23 / 22      4 / 4 *     12 / 12 *
HS34       3    6     0    2    -0.834032     3     22 / 20      7 / 7       16 / 17
HS35       3    3     0    1     0.111111     1      8 / 8      10 / 10       4 / 5
HS36       3    6     0    1    -3300.00      3     10 / 10      4 / 4       11 / 12
HS37       3    6     0    2    -3456.00      1     12 / 12      7 / 7       16 / 17
HS97       6   12     0    4     3.13581      6     32 / 25     29 / 29      20 / 20
HS98       6   12     0    4     3.13581      6     35 / 26     29 / 29      20 / 20
HS99       7   14     2    0    -8.311E+8     0      7 / 7      22 / 22      53 / 47 *
HS100      7    0     0    4     680.630      2     19 / 15     21 / 21      21 / 21
HS101      7   14     0    6     1809.76      2     failed     113 / 113     failed
HS102      7   14     0    6     911.811      3     failed     122 / 122     failed
HS103      7   14     0    6     543.668      4     failed      89 / 89      failed
HS104      8   16     0    6     3.95117      4     43 / 25     28 / 28      30 / 27
HS105      8   16     0    1     1044.73      4     44 / 36     56 / 56      16 / 16
HS106      8   16     0    6     7049.25      6     failed      14 / 14      failed
HS107      9    8     6    0     5055.01      2     14 / 14      9 / 9       22 / 23
HS108      9    1     0   13    -0.866025     7     23 / 16 *   13 / 13 *    14 / 14
HS110     50  100     0    0    -9.990E+9    50     18 / 15      4 / 4        1 / 2
HS111     10   20     3    0    -47.7611      0     81 / 69 *   87 / 87      36 / 36
HS112     10   10     3    0    -47.7611      0     11 / 11     38 / 38      37 / 38
HS113     10    0     0    8     24.3062      6     23 / 23     18 / 18      35 / 33
HS114     10   20     3    8    -1768.81      6     24 / 24     25 / 25     255 / 223
HS116     13   26     0   15     97.5910     10     failed      23 / 23      failed
HS117     15   15     0    5     32.3487      9     failed      21 / 21      37 / 33
HS118     15   30     0   29     664.820     15     16 / 16      5 / 5        8 / 9
HS119     16   32     8    0     244.900      5     32 / 32     12 / 12      19 / 20

* A different, suboptimal local minimizer was found.

** Stopped at a first-order KKT point that was not a local minimum.


When successfully solved by an algorithm, Table 4.1 gives the number of function (including constraint) and gradient evaluations that were made. Each algorithm found the same local minimizer, except for a few cases that are marked by an asterisk.

Table 4.1 also reports that some problems could not be solved by a given algorithm. This means every option for that algorithm failed. BECTR failed to locate an interior starting point during the big M phase (section 2.3) for problems HS13, HS17, HS20, and HS117. The other BECTR failures were problems that did not converge after 1000 iterations (an iteration of BECTR makes one or two function evaluations). A LANCELOT failure means that none of the 11 preconditioning options converged after 1000 iterations (an iteration of LANCELOT usually makes one function evaluation).

The results in Table 4.1 show BECTR to be less robust than the other two algorithms. No further discussion of these small problems will be given, since the three algorithms were designed primarily to solve large-scale problems.

4.3. Results for large problems with simple bounds. Table 4.2 presents results for large problems that have only simple bound constraints on the variables. Problems may also have variables fixed at a constant value; they are counted separately in the table under mfix. The column marked |A∗| gives the number of bounds active at the local solution, excluding fixed variables. For example, OBSTCLAE has 900 unfixed variables, and at the solution, 384 of them are constrained. Table 4.2 gives the number of function and gradient evaluations made by each algorithm, and the relative CPU time (the fastest algorithm is arbitrarily assigned the value one and other CPU times are given as multiples of this time). The absolute CPU time for the fastest algorithm was typically less than 10 seconds.

Table 4.2Results for problems with simple bounds (function evals / gradient evals, rel CPU time).

Problem    n     mbnd  mfix  |A∗|   BECTR        SNOPT         LANCELOT
BDEXP      1000  2000    0     0    15/15     2  16/16*   152  10/11*   1
BQPGAUSS   2003  4006    0    94    35/35    73  failed        6/7      1
JNLBRNGA   1024   900  124   304    15/15    16  101/101   82  7/8      1
JNLBRNGB   1024   900  124   392    17/17    19  247/247  247  3/4      1
LINVERSE    199   100    0   ≈50    131/109  52  125/125   25  11/11    1
MCCORMCK   1000  2000    0     1    9/9       2  77/77    254  4/5      1
OBSTCLAE   1024  1800  124   384    23/23     2  80/80     29  2/3      1
OBSTCLAL   1024  1800  124   384    14/14    12  45/45     40  7/8      1
OBSTCLBL   1024  1800  124   351    26/26    28  55/55     49  5/6      1
OBSTCLBM   1024  1800  124   351    16/16    22  40/40     29  3/4      1
OBSTCLBU   1024  1800  124   351    21/21    59  57/57     77  6/7      1
S368        100   200    0    30    34/29*    5  57/57      3  7/7*     1
TORSION1   1024  1800  124   312    15/15    12  31/31     54  9/10     1
TORSION2   1024  1800  124   312    16/16     9  91/91     26  4/5      1
TORSION3   1024  1800  124   624    16/16    11  20/20     13  4/5      1
TORSION4   1024  1800  124   624    17/17    13  69/69    421  4/5      1

* A different, suboptimal local minimizer was found.

The LANCELOT algorithm clearly outperformed BECTR and SNOPT on this class of problem. The superiority of LANCELOT with regard to CPU time was due partly to the fewer iterations it required, and partly to clever tailoring of the algorithm for bound constrained problems. LANCELOT recognized the special structure of the reduced Hessian in this class of problem and applied conjugate gradient preconditioners. By contrast, BECTR and SNOPT assumed the reduced Hessian lies in a more general linear manifold. The linear manifold representation (Z^T W Z in equation (2.7)) improved performance on problems with nonlinear constraints (see section 4.4), but it limited the ability to exploit the Hessian structure of problems in Table 4.2. SNOPT was additionally slower because it used a dense matrix representation of the reduced Hessian. (To be fair we must recognize that SNOPT was specifically designed for large problems with a small number of degrees of freedom [25], such as those in section 4.4.)

More striking was the substantial advantage LANCELOT enjoyed in terms of function evaluations. It appears that the projected gradient method identified the active set of bounds much more quickly than BECTR or SNOPT. Identifying the active set is precisely the task that an interior-point strategy was expected to do best, so the unimpressive performance of BECTR must be considered a disappointment. The outstanding performance of LANCELOT A (7/94) on these test problems might be considered a challenge to any nonlinear interior-point strategy.
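The active set advantage of gradient projection can be illustrated with a small sketch (not LANCELOT's actual code, and the numbers are invented): a single projected step P(x − t g) can land exactly on several bounds at once, while a strictly interior method only approaches each bound asymptotically.

```python
import numpy as np

# Illustrative sketch (not LANCELOT's code): one projected gradient
# step P(x - t*g) onto the box lower <= x <= upper can identify
# several active bounds in a single iteration.

def project(x, lower, upper):
    """Project x onto the box lower <= x <= upper."""
    return np.clip(x, lower, upper)

x = np.array([0.3, 0.2, 0.7])
g = np.array([1.0, 1.0, -1.0])          # pushes x[0], x[1] down and x[2] up
lower, upper = np.zeros(3), np.ones(3)

x_new = project(x - 0.8 * g, lower, upper)
active = (x_new == lower) | (x_new == upper)
print(x_new)              # [0. 0. 1.]
print(int(active.sum()))  # all 3 bounds identified in one step
```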

Problem LINVERSE deserves special mention. The active bounds at a solution of this problem are nearly degenerate in the sense that the magnitude of the corresponding Lagrange multipliers is very small (< 10^-8); i.e., the function in LINVERSE changes very little as the solution point is varied perpendicular to an active bound. This kind of degeneracy was not handled well by BECTR for the following reason. A degenerate bound was approached very slowly because affine scaling restricted the already tiny gradient component in this direction. Slow progress means f did not change much, and then Algorithm 3.1 halted BECTR, forcing the terminal active set method to finish solving the problem. This is not a bad strategy for one or two degenerate bounds, but it leaves too much work for the terminal phase when there are many degenerate constraints.
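The slowdown can be seen in a hedged sketch of Dikin-style primal affine scaling for bounds x ≥ l (the formula d = −D²g with D = diag(x − l) and the numbers below are illustrative, not taken from the BECTR implementation):

```python
import numpy as np

# Illustrative Dikin-style affine scaling for bounds x >= lower (a
# sketch, not BECTR itself): the descent direction is d = -D^2 g with
# D = diag(x - lower).  Near a degenerate bound, the gradient
# component is already tiny (tiny multiplier), and the scaling
# multiplies it by the squared distance to the bound, so progress
# toward that bound all but stops.

def affine_scaled_step(x, lower, grad):
    """Scaled steepest-descent direction d = -(x - lower)^2 * grad."""
    D = x - lower                  # distance to each lower bound
    return -(D ** 2) * grad

x = np.array([1e-2, 1.0])          # x[0] is already close to its bound
lower = np.zeros(2)
grad = np.array([1e-8, 1.0])       # near-degenerate multiplier on x[0]

d = affine_scaled_step(x, lower, grad)
print(d)   # step toward the degenerate bound is of order 1e-12
```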

4.4. Results for optimal control problems. Let us turn now to large problems with nonlinear constraints. An important application of this nature is the optimal control of a dynamic system. Equality constraints arise in these problems from discretization of the ODEs or PDEs that model the system being controlled. Usually, both state and control variables are treated as unknowns, an arrangement that permits general inequality constraints. With this formulation an optimal control problem can have a large number of state variables and constraints, but only a moderate number of degrees of freedom.
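This formulation can be made concrete with a hedged sketch (toy dynamics, not one of the CUTE problems): a forward-Euler discretization of x' = f(x, u) over K time steps turns the dynamics into K blocks of equality constraints, with the states X and the controls U both appearing as unknowns.

```python
import numpy as np

# Hypothetical sketch of the "states and controls as unknowns"
# formulation: forward-Euler discretization of x' = f(x, u) over K
# steps gives equality constraints c_i = 0, one block per step.
# The toy dynamics f below are illustrative only.

def dynamics(x, u):
    # toy linear dynamics: position/velocity driven by a scalar force u
    pos, vel = x
    return np.array([vel, u])

def euler_residuals(X, U, x0, h):
    """Equality residuals c_i = x_{i+1} - x_i - h f(x_i, u_i), plus x_1 = x0."""
    res = [X[0] - x0]                      # initial condition
    for i in range(len(U)):
        res.append(X[i + 1] - X[i] - h * dynamics(X[i], U[i]))
    return np.concatenate(res)

K, h = 4, 0.1
x0 = np.zeros(2)
X = np.zeros((K + 1, 2))                   # state unknowns
U = np.zeros(K)                            # control unknowns
c = euler_residuals(X, U, x0, h)
# With zero states, zero controls, and x0 = 0, every residual vanishes:
print(c.shape, np.allclose(c, 0.0))
```

Note how the number of unknowns and constraints grows with K while the degrees of freedom (the controls U) stay moderate, matching the structure described above.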

Table 4.3
Results for optimal control problems (function evals / gradient evals, rel CPU time).

Problem    n     mbnd  meq   minq  |A∗|   BECTR       SNOPT        LANCELOT
OPTMASS     610    0   408   101   100    138/41   1  373/373   2  failed
OPTMASSb    305  202   204     0   100    11/11    3  13/13     1  failed
CAR2       1199  797   804   200   199    failed      95/95     1  177/148   1
CAR2a      1199  797   804   199   198    61/29   16  80/80     2  146/118   1
CAR2b      1199  399   804     0   198    48/29   31  33/33     1  72/63    52
OPTCDEG2    602  599   403     0   171    34/31   10  8/8       1  103/105   4
OPTCDEG3    602  599   403     0   164    28/28    3  21/21     1  52/53    18

Table 4.3 shows computational results for a set of optimal control problems. The set derives from three CUTE problems (OPTMASS, CAR2, and OPTCNTRL) which were carefully decoded and analyzed. In each case a more intelligent formulation of the original CUTE problem was found, leading to improved algorithm performance. The remainder of this section discusses the improved test problems.


OPTMASS is a simple control problem that models the motion of a particle on a frictionless surface. The goal is to specify a controlling force that moves the particle as far as possible from its starting point while also minimizing the final velocity. The control variables are force vectors (F_x^i, F_y^i) at time samples i = 1, ..., K, and each applied force must satisfy ‖F^i‖² ≤ 1. The system dynamics are simple linear equality constraints.

The optimal solution applies maximum force to accelerate the particle until a certain time, then applies maximum force in the reverse direction to make the final velocity small. This is a “bang-bang” trajectory, characterized by many active inequality constraints.

OPTMASS is written as a two-dimensional problem, but the optimal motion is in a straight line along the x coordinate. Variables along the y coordinate can be deleted without changing the problem at all. The control constraint of the remaining one-dimensional problem can then be simplified from a quadratic inequality to a pair of bounds:

‖F^i‖² = (F_x^i)² ≤ 1   →   −1 ≤ F_x^i ≤ 1.    (4.1)

OPTMASSb incorporates these changes. It models the same problem (with half as many variables) and has the same solution as OPTMASS. The improved problem was significantly easier to optimize (except for LANCELOT, which did not converge in 1000 iterations for any of the eleven preconditioning options). The dramatic performance difference for BECTR was due to quicker identification of the correct active set. On OPTMASS the interior-point phase of the algorithm initially approached the wrong active set, then labored to correct this early mistake.

CAR2 resembles OPTMASS except that the particle’s motion is opposed by a frictional force whose magnitude is proportional to velocity. The goal is to specify a controlling force that moves the particle to a given point in minimum time. Discretization of the problem is similar to OPTMASS, and applied force vectors must again satisfy ‖F^i‖² ≤ 1. Friction makes the system dynamics nonlinear, but the optimal solution is still a “bang-bang” trajectory.

In CAR2 one of the force vector inequalities is inexplicably duplicated. Since this inequality is active at the solution, it makes the constraint Jacobian matrix rank deficient. BECTR failed to solve this problem because the linear algebra used to form a null space basis (see equations (2.10) and (2.11)) does not handle rank deficiency. SNOPT was able to find a full rank subset of the Jacobian and solve the problem. Nevertheless, the duplicated inequality serves no purpose and is removed in CAR2a. This full rank problem was easier to solve for all three algorithms.
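The effect of the duplicated inequality can be reproduced in a small NumPy sketch (illustrative data, not the actual CAR2 Jacobian): duplicating an active row leaves the row rank unchanged while adding a row, so any null-space factorization that assumes full row rank breaks, whereas selecting a full-rank row subset restores the assumption.

```python
import numpy as np

# Illustrative sketch (not the CAR2 data): duplicating one active
# constraint row makes the Jacobian rank deficient, which defeats a
# null-space basis computation that assumes full row rank; choosing a
# full-rank subset of rows (as SNOPT effectively did) recovers it.

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])            # full row rank: 2
A_dup = np.vstack([A, A[1]])               # duplicate the second row

print(np.linalg.matrix_rank(A))            # 2 (full row rank)
print(np.linalg.matrix_rank(A_dup))        # still 2, but 3 rows: deficient

subset = A_dup[:2]                         # a full-rank row subset
print(np.linalg.matrix_rank(subset) == subset.shape[0])
```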

Like OPTMASS, CAR2 is written as a two-dimensional problem even though the optimal motion is along a straight line. In CAR2b (derived from CAR2a) all variables are retained, but each quadratic inequality is simplified to a pair of bounds as in (4.1). The optimal trajectory is unchanged by these modifications, and Table 4.3 shows that CAR2b is easier to solve. Note that BECTR needed fewer gradient evaluations than the other algorithms, but used several more function evaluations. The extra function evaluations arose from second-order correction steps (section 2.4) that were computed to follow the nonlinear equality constraints in this problem.

OPTCDEG2 and OPTCDEG3 are problems derived from OPTCNTRL. This problem (also called the “spring” problem in [25]) contains an error that makes the code badly conditioned and physically meaningless unless exactly 100 discretized sample points are used. Tracing back to the original source of the application [38], two more challenging variations of the problem were discovered. These were coded as OPTCDEG2 and OPTCDEG3. Both problems correct the discretization error in OPTCNTRL, but they optimize a more complex dynamic system. A damped mass-spring system is modeled in which the damping friction is proportional to either the velocity squared (OPTCDEG2) or the velocity cubed (OPTCDEG3). Simple bounds are imposed on the state and control variables.
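The kind of dynamics involved can be sketched as follows (a hedged illustration: the coefficients, step size, and function names are invented, not the CUTE data): a damped mass-spring system whose friction force is proportional to |velocity|^p with p = 2 or p = 3, discretized by forward Euler. Each discretized update supplies one block of nonlinear equality constraints in the optimization problem.

```python
import numpy as np

# Hedged sketch of dynamics like those behind OPTCDEG2/OPTCDEG3 (the
# coefficients k, m, c and step size h are illustrative): a damped
# mass-spring system m*v' = force - k*pos - c*sign(v)*|v|^p, with
# friction proportional to velocity^2 (p=2) or velocity^3 (p=3),
# discretized by one forward-Euler step.

def spring_step(pos, vel, force, h, k=1.0, m=1.0, c=0.1, p=2):
    """One Euler step of m*v' = force - k*pos - c*sign(v)*|v|^p."""
    accel = (force - k * pos - c * np.sign(vel) * abs(vel) ** p) / m
    return pos + h * vel, vel + h * accel

# One step from pos = 1, vel = 2 under a 0.5 control force (p = 2):
pos, vel = spring_step(1.0, 2.0, force=0.5, h=0.1, p=2)
print(pos, vel)   # pos ≈ 1.2, vel ≈ 1.91
```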

Table 4.3 indicates that the SQP-based BECTR and SNOPT algorithms were generally more efficient than the penalty-based LANCELOT algorithm on optimal control problems. These results are the reverse of what was seen in Table 4.2. The most likely explanation for this behavior is the presence of more complicated constraints.

4.5. Other test results. Table 4.4 lists results for a few other CUTE problems that have constraints. Unlike section 4.4, these test problems have not been examined to see if they are formulated correctly; hence, results should be interpreted with caution.

Table 4.4
Results for other constrained problems (function evals / gradient evals, rel CPU time).

Problem    n     mbnd  meq   minq  |A∗|   BECTR        SNOPT        LANCELOT
COSHFUN      61    0     0   20    ≈15    748/220*  5  44/44    1   109/86    2
DIXCHLNV    100  100    50    0      0    16/16     1  44/44    2   41/42     5
HAGER4     1001  500   501    0    251    19/19     1  8/8      1   8/9       2
HIMMELBI    100  200     0   12     82    76/76    16  65/65    1   23/24     3
ORTHREGF    305    2   100    0      0    18/12     1  25/25*  11   29/27     1
READING1    202  402   101    0    101    88/40     3  49/49    1   275/248  45
SVANBERG    250  500     0  250    196    43/43     7  53/53    1   39/40     4
ZIGZAG      304  300   206   50    ≈91    failed       50/50    1   75/76     1

* A different, suboptimal local minimizer was found.

The relative performance of the three algorithms varied. Based on gradient evaluations, BECTR and SNOPT were the best on three problems apiece, while LANCELOT was the best on two problems. Interestingly, BECTR was best on the two problems for which no inequalities were active at the solution. Perhaps this merely reflects that an interior-point method does a better job of finding minima that are in the interior. The poor performance of BECTR on problems COSHFUN and ZIGZAG was due to the same degeneracy of active bounds that was described for problem LINVERSE in section 4.3.

Table 4.5 shows the behavior of the BECTR algorithm as problem size increases. Three previously considered test problems were solved with different numbers of unknowns. All three test problems result from discretizations of infinite-dimensional problems; hence, increasing the number of unknowns does not change the nature of the problem but does give a more accurate solution. We see that BECTR remained efficient as problem size increased.

Table 4.5
Results showing scalability of BECTR.

Problem     n       mbnd    meq     minq  f*          |A∗|  fn evals  grad evals
HAGER4      1001    500     501     0     2.79452      251     19        19
            10,001  5000    5001    0     2.79403     2501     19        19
            20,001  10,000  10,001  0     2.79400     5002     19        19
OBSTCLAL    1024    1800    124     0     1.74827      384     14        14
            5625    10,658  296     0     1.86300     2431     16        16
            10,000  19,208  396     0     1.88646     4475     16        16
            15,625  30,258  496     0     1.90097     7182     19        19
OPTMASSb    305     202     204     0     -0.126219    100     11        11
            1505    1002    1004    0     -0.121526    500     10        10
            3005    2002    2004    0     -0.120943   1000     11        11
            15,005  10,002  10,004  0     -0.120487   5000     11        11

5. Conclusions. This paper has described the new BECTR algorithm for solving large-scale nonlinearly constrained optimization problems. BECTR is unique in being able to exploit second derivative information in an SQP framework for arbitrarily large problems. This allows it to solve problems with a small number of function and gradient evaluations.

BECTR solves a sequence of equality constrained SQP subproblems subject to a trust region constraint. It employs primal interior-point techniques borrowed from the field of linear programming. BECTR has been designed as an SQP-based method that uses interior-point modifications to guide iterates on a path through the interior of the feasible region. The interior-point affine scaling necessarily generates ill-conditioned subproblems near a solution, but BECTR recognizes this situation and switches to an unscaled active set method for final convergence.

Testing showed that BECTR was reasonably robust and fairly successful in its goal of reducing function and gradient evaluations. Comparative testing with LANCELOT A (7/94) and SNOPT 5.0 revealed some differences in performance. LANCELOT is strongly recommended for problems with only bound constraints (assuming the problem can be cast in partially separable form). BECTR might be preferred for problems with nonlinear constraints and a large number of degrees of freedom. BECTR performed poorly on problems with degenerately active inequality constraints.

Interior-point trust region methods for nonlinear programming are an active area of research. We have mentioned primal methods [10, 11, 15, 16, 29] and primal-dual methods [6, 19, 44]. In the future, we can expect this work to become more refined and to yield more efficient algorithms for the solution of large constrained problems.

Acknowledgments. The author thanks Jorge Nocedal for his encouragement in developing the algorithm, Andy Conn for his careful proofreading of an earlier version of this paper, and the referees for their insights. Special thanks go to Philip Gill for graciously supplying SNOPT 5.0 for testing.

REFERENCES

[1] I. Adler, M. G. C. Resende, and G. Veiga, An implementation of Karmarkar's algorithm for linear programming, Math. Programming (Ser. A), 44 (1989), pp. 297–335.

[2] E. R. Barnes, A variation on Karmarkar's algorithm for solving linear programming problems, Math. Programming, 36 (1986), pp. 174–182.

[3] J. T. Betts and P. D. Frank, A sparse nonlinear optimization algorithm, J. Optim. Theory Appl., 82 (1994), pp. 519–541.

[4] I. Bongartz, A. R. Conn, N. I. M. Gould, and Ph. L. Toint, CUTE: Constrained and unconstrained testing environment, ACM Trans. Math. Software, 21 (1995), pp. 123–160.

[5] R. H. Byrd, Robust trust region methods for constrained optimization, in Proc. Third SIAM Conference on Optimization, Houston, TX, May 1987.

[6] R. H. Byrd, J. Ch. Gilbert, and J. Nocedal, A Trust Region Method Based on Interior Point Techniques for Nonlinear Programming, Tech. Report OTC 96/02, Optimization Technology Center, Argonne National Laboratory and Northwestern University, Evanston, IL, 1996.


[7] R. H. Byrd, J. Nocedal, and R. B. Schnabel, Representations of quasi-Newton matrices and their use in limited memory methods, Math. Programming (Ser. A), 63 (1994), pp. 129–156.

[8] R. H. Byrd, R. B. Schnabel, and G. A. Schultz, A trust region algorithm for nonlinearly constrained optimization, SIAM J. Numer. Anal., 24 (1987), pp. 1152–1170.

[9] M. R. Celis, J. E. Dennis, and R. A. Tapia, A trust region strategy for nonlinear equality constrained optimization, in Numerical Optimization 1984, P. T. Boggs, R. H. Byrd, and R. B. Schnabel, eds., Society for Industrial and Applied Mathematics, Philadelphia, 1985, pp. 71–82.

[10] T. F. Coleman and Y. Li, An Interior Trust Region Approach for Nonlinear Minimization Subject to Bounds, Tech. Report 93-1342, Dept. of Computer Science, Cornell University, 1993.

[11] T. F. Coleman and Y. Li, On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds, Math. Programming (Ser. A), 67 (1994), pp. 189–224.

[12] A. R. Conn, N. I. M. Gould, and Ph. L. Toint, LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), Springer Series in Computational Mathematics 17, Springer-Verlag, Berlin, 1992.

[13] E. J. Craig, The N-step iteration procedure, J. Math. Phys., 34 (1955), pp. 65–73.

[14] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400–408.

[15] J. E. Dennis, M. Heinkenschloss, and L. N. Vicente, Trust-Region Interior-Point Algorithms for a Class of Nonlinear Programming Problems, Tech. Report 94-45, Dept. of Mathematical Sciences, Rice University, Houston, TX, 1994.

[16] J. E. Dennis and L. N. Vicente, Trust-Region Interior-Point Algorithms for Minimization Problems with Simple Bounds, Tech. Report 94-42, Dept. of Mathematical Sciences, Rice University, Houston, TX, 1994.

[17] I. I. Dikin, Iterative solution of problems of linear and quadratic programming, Soviet Math. Dokl., 8 (1967), pp. 674–675.

[18] I. S. Duff and J. K. Reid, Some design features of a sparse matrix code, ACM Trans. Math. Software, 5 (1979), pp. 18–35.

[19] A. S. El-Bakry, R. A. Tapia, T. Tsuchiya, and Y. Zhang, On the Formulation and Theory of the Primal-Dual Newton Interior-Point Method for Nonlinear Programming, Tech. Report 92-40, Dept. of Mathematical Sciences, Rice University, Houston, TX, 1992.

[20] A. S. El-Bakry, R. A. Tapia, and Y. Zhang, A study of indicators for identifying zero variables in interior-point methods, SIAM Review, 36 (1994), pp. 45–72.

[21] R. Fletcher, Practical Methods of Optimization, second ed., Wiley, Chichester, UK, 1990.

[22] A. Forsgren and P. E. Gill, Primal-Dual Interior Methods for Nonconvex Nonlinear Programming, Numerical Analysis Report 96-3, Dept. of Mathematics, University of California at San Diego, La Jolla, CA, 1996.

[23] A. Forsgren, P. E. Gill, and J. R. Shinnerl, Stability of symmetric ill-conditioned systems arising in interior methods for constrained optimization, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 187–211.

[24] R. Franke and E. Arnold, On the integration of a large-scale nonlinear optimization tool with open modeling and simulation environments for dynamic systems, in 10th European Simulation Multiconference, Budapest, Hungary, June 1996.

[25] P. E. Gill, W. Murray, and M. A. Saunders, SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization, Numerical Analysis Report 96-2, Dept. of Mathematics, University of California at San Diego, La Jolla, CA, 1996.

[26] P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright, User's Guide for NPSOL (Version 4.0): A Fortran Package for Nonlinear Programming, Department of Operations Research, Stanford University, CA, 1986.

[27] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, Harcourt, London, 1981.

[28] C. C. Gonzaga, Path-following methods for linear programming, SIAM Review, 34 (1992), pp. 167–224.

[29] F. Jarre and M. A. Saunders, A practical interior-point method for convex programming, SIAM J. Optim., 5 (1995), pp. 149–171.

[30] M. Lalee, J. Nocedal, and T. D. Plantenga, On the implementation of an algorithm for large-scale equality constrained optimization, SIAM J. Optim., to appear.

[31] I. J. Lustig, R. E. Marsten, and D. F. Shanno, On implementing Mehrotra's predictor-corrector interior-point method for linear programming, SIAM J. Optim., 3 (1992), pp. 435–449.


[32] N. Maratos, Exact Penalty Function Algorithms for Finite Dimensional and Control Optimization Problems, Ph.D. thesis, University of London, 1978.

[33] B. A. Murtagh and M. A. Saunders, Large-scale linearly constrained optimization, Math. Programming, 14 (1978), pp. 41–72.

[34] E. O. Omojokun, Trust Region Algorithms for Optimization with Nonlinear Equality and Inequality Constraints, Ph.D. thesis, Dept. of Computer Science, University of Colorado, Boulder, CO, 1989.

[35] T. D. Plantenga, Large-Scale Nonlinear Constrained Optimization Using Trust Regions, Ph.D. thesis, Dept. of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, 1994.

[36] M. J. D. Powell, A hybrid method for nonlinear equations, in Numerical Methods for Nonlinear Algebraic Equations, P. Rabinowitz, ed., Gordon and Breach, London, 1970, pp. 87–114.

[37] M. J. D. Powell and Y. Yuan, A trust region algorithm for equality constrained optimization, Math. Programming (Ser. A), 49 (1991), pp. 189–211.

[38] P. S. Ritch, Discrete optimal control with multiple constraints I: Constraint separation and transformation techniques, Automatica, 9 (1973), pp. 415–429.

[39] T. Steihaug, The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20 (1983), pp. 626–637.

[40] A. Vardi, A trust region algorithm for equality constrained minimization: Convergence properties and implementation, SIAM J. Numer. Anal., 22 (1985), pp. 575–591.

[41] M. H. Wright, Interior methods for constrained optimization, in Acta Numerica 1992, A. Iserles, ed., Cambridge University Press, Cambridge, UK, 1992, pp. 341–407.

[42] S. J. Wright, Stability of augmented system factorizations in interior-point methods, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 191–222.

[43] S. J. Wright, Stability of linear equations solvers in interior-point methods, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1287–1307.

[44] H. Yamashita, A Globally Convergent Primal-Dual Interior Point Method for Constrained Optimization, Tech. report, Mathematical Systems Institute Inc., Tokyo, Japan, 1992.
