an algorithm for nonlinear optimization problems with...

32
Comput Optim Appl (2010) 47: 257–288 DOI 10.1007/s10589-008-9218-1 An algorithm for nonlinear optimization problems with binary variables Walter Murray · Kien-Ming Ng Received: 6 April 2007 / Revised: 22 October 2008 / Published online: 2 December 2008 © Springer Science+Business Media, LLC 2008 Abstract One of the challenging optimization problems is determining the mini- mizer of a nonlinear programming problem that has binary variables. A vexing diffi- culty is the rate the work to solve such problems increases as the number of discrete variables increases. Any such problem with bounded discrete variables, especially bi- nary variables, may be transformed to that of finding a global optimum of a problem in continuous variables. However, the transformed problems usually have astronom- ically large numbers of local minimizers, making them harder to solve than typical global optimization problems. Despite this apparent disadvantage, we show that the approach is not futile if we use smoothing techniques. The method we advocate first convexifies the problem and then solves a sequence of subproblems, whose solutions form a trajectory that leads to the solution. To illustrate how well the algorithm per- forms we show the computational results of applying it to problems taken from the literature and new test problems with known optimal solutions. Keywords Nonlinear integer programming · Discrete variables · Smoothing methods The research of W. Murray was supported by Office of Naval Research Grant N00014-08-1-0191 and Army Grant W911NF-07-2-0027-1. W. Murray ( ) Systems Optimization Laboratory, Department of Management Science and Engineering, Stanford University, Stanford, CA 94305-4022, USA e-mail: [email protected] K.-M. Ng Department of Industrial & Systems Engineering, National University of Singapore, Singapore, Singapore e-mail: [email protected]

Upload: others

Post on 21-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

Comput Optim Appl (2010) 47: 257–288DOI 10.1007/s10589-008-9218-1

An algorithm for nonlinear optimization problemswith binary variables

Walter Murray · Kien-Ming Ng

Received: 6 April 2007 / Revised: 22 October 2008 / Published online: 2 December 2008© Springer Science+Business Media, LLC 2008

Abstract One of the challenging optimization problems is determining the mini-mizer of a nonlinear programming problem that has binary variables. A vexing diffi-culty is the rate the work to solve such problems increases as the number of discretevariables increases. Any such problem with bounded discrete variables, especially bi-nary variables, may be transformed to that of finding a global optimum of a problemin continuous variables. However, the transformed problems usually have astronom-ically large numbers of local minimizers, making them harder to solve than typicalglobal optimization problems. Despite this apparent disadvantage, we show that theapproach is not futile if we use smoothing techniques. The method we advocate firstconvexifies the problem and then solves a sequence of subproblems, whose solutionsform a trajectory that leads to the solution. To illustrate how well the algorithm per-forms we show the computational results of applying it to problems taken from theliterature and new test problems with known optimal solutions.

Keywords Nonlinear integer programming · Discrete variables · Smoothingmethods

The research of W. Murray was supported by Office of Naval Research Grant N00014-08-1-0191 andArmy Grant W911NF-07-2-0027-1.

W. Murray (�)Systems Optimization Laboratory, Department of Management Science and Engineering,Stanford University, Stanford, CA 94305-4022, USAe-mail: [email protected]

K.-M. NgDepartment of Industrial & Systems Engineering, National University of Singapore, Singapore,Singaporee-mail: [email protected]

Page 2: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

258 W. Murray, K.-M. Ng

1 Introduction

Nonlinear optimization problems whose variables can only take on integer quantitiesor discrete values abound in the real world, yet there are few successful solution meth-ods and even one of the simplest cases with quadratic objective function and linearconstraints is NP-hard (see, e.g., [6]). We focus our interest on problems with linearequality constraints and binary variables. However, once the proposed algorithm hasbeen described it will be seen that extending the algorithm to more general cases,such as problems with linear inequality constraints, or problems with some continu-ous variables is trivial. Also, the binary variable requirement is not restrictive sinceany problem with bounded discrete variables can easily be transformed to a problemwith binary variables.

The problem of interest can thus be stated as

Minimize f (x)

subject to Ax = b,

x ∈ {0,1}n,(1.1)

where f : Rn → R is a twice continuously differentiable function, A is an m × n

matrix with rank m and b ∈ Rm is a column vector. Assuming differentiability may

seem strange when x is discrete. (Often f (x) cannot be evaluated unless every xj = 0or 1.) Nonetheless for a broad class of real problems f (x) is smooth and when it isnot it is often possible to alter the original function suitably (see [26]). As notedproblems in the form of (1.1) are generally NP-hard. There is no known algorithmthat can be assured of obtaining the optimal solution to NP-hard problems for largeproblems within a reasonable amount of computational effort and time. It may evenbe difficult just to obtain a feasible solution to such problems. The challenge is todiscover algorithms that can generate a good approximate solution to this class ofproblems for an effort that increases only slowly as a function of n.

If the requirement that x ∈ {0,1}n is dropped then typically the effort to find a localminimizer of (1.1) increases only slowly with the size of the problem. It would seemattractive if (1.1) could be replaced by a problem in continuous variables. The equiv-alence of (1.1) and that of finding the global minimizer of a problem with continuousvariables is well known (see [14, 34]). One simple way of enforcing x ∈ {0,1}n isto add the constraints 0 ≤ x ≤ e and xT(e − x) = 0, where e is the vector of unitelements. Clearly for xi to be feasible then xi = 0 or 1. The difficulty is that thereis a world of difference between finding a global minimizer as opposed to a localminimizer. If there are only a few local minimizers then the task is relatively easy butin this case it is possible that every vertex of the feasible region (and there could be2n vertices) is a local minimizer. Typically the minimizer found by an algorithm tofind a local minimizer is entirely determined by the starting point chosen. Traditionalmethods for global optimization (see [25]) perform particularly poorly on problemswith numerous local minimizers.

A number of approaches have been developed to handle mixed-integer nonlinearprogramming problems via a transformation of the discrete problem into a continu-ous problem. These approaches involve geometric, analytic and algebraic techniques,

Page 3: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 259

including the use of global or concave optimization formulations, semidefinite pro-gramming and spectral theory (see e.g., [9, 19, 20, 30, 31]). In particular, [23, 29]give an overview of many of these continuous approaches and interior-point meth-ods. Perhaps the most obvious approach is to simply ignore the discrete requirement,solve the resulting “relaxed” problem and then discretize the solution obtained byusing an intelligent rounding scheme (see [16], Chapter 7 and [22]). Such approacheswork best when the discrete variables are in some sense artificial. For example, whenoptimizing a pipe layout it is known that pipes are available only in certain sizes. Inprinciple pipes could be of any size and the physics of the model is valid for sizesother than those available. However, even in these circumstances there may be diffi-cult issues in rounding since it may not be easy to retain feasibility. It is not difficultto see that this approach can fail badly. Consider minimizing a strictly concave func-tion subject to the variables being binary. The relaxed solution has a discrete solutionand obviously there is no need to round. Rather than being good news it illustratesthe solution found is entirely determined by the choice of the initial point used in thecontinuous optimization algorithm. Given that such methods find local minimizersthe probability of determining the global minimizer is extremely small since theremay be 2n local minimizers. There are other problems in which little information canbe determined from the continuous solution. For example, in determining the opti-mum location of substations in an electrical grid relaxing the discrete requirementsresults in a substation being placed at every node that precisely matches the load. Inthis case this is the global minimizer of the continuous problem but it gives almostno information on which node to place a substation. To be feasible it is necessary toround up to avoid violating voltage constraints. Moreover, for many nodes the loadsare identical so any scheme that rounds up some variables and rounds down others inthe hope of the solution still being feasible has trouble identifying a good choice.

There are even fewer methods that are able to solve general problems with alarge number of variables. Three commercial packages that are available are DI-COPT (see [18]), SBB (see [5]) and BARON (see [32]). DICOPT is a package forsolving mixed-integer nonlinear programming problems based on extensions of theouter-approximation algorithm (see [10]) for the equality relaxation strategy and thensolving a sequence of nonlinear programming and mixed-integer linear programmingproblems. SBB is based on a combination of the standard branch-and-bound methodfor the mixed-integer linear programming problems and standard nonlinear program-ming solvers. BARON is a global optimization package based on the branch-and-reduce method (see [33]). These solver packages can in principle solve large non-linear mixed-integer programming problems. However, it will be seen that the per-formance of these algorithms, especially on large problems, are sometimes poor andthere is a clear need for better algorithms for this class of problems. It may be that asingle algorithm that can find good solutions for most large problems is a step too far,but it helps to add to the arsenal of solution methods new methods whose behaviorand character differ from current methodology.

Throughout this paper, we let ‖ · ‖ denote the 2-norm, i.e., the Euclidean norm, ofa vector, and let e denote the column vector of ones with dimension relevant to thecontext. If x and y are vectors in R

n, we let x ≤ y, x < y to mean respectively thatxi ≤ yi , xi < yi for every i ∈ {1,2, . . . , n}, and Diag(x) to denote the diagonal matrix

Page 4: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

260 W. Murray, K.-M. Ng

with the diagonal elements made up of the respective elements xi , i ∈ {1,2, . . . , n}.Also, the null space of a matrix A is defined as {x ∈ R

n : Ax = 0}. For a real-valuedfunction f , we say that f is a Ck function if it is kth-order continuously differen-tiable.

2 An exact penalty function

Rather than add nonlinear constraints we have chosen to add a penalty term∑

j∈J

xj (1 − xj ), (2.1)

with a penalty parameter γ > 0, where J is the index set of the variables that arejudged to require forcing to a bound. We could have added the penalty xT(e − x), butoften the objective is nearly concave and if there are not many equality constraintssome of the variables will be forced to be 0 or 1 without any penalty term beingrequired. The problem then becomes

Minimize F(x) � f (x) + γ∑

j∈J

xj (1 − xj )

subject to Ax = b,

0 ≤ x ≤ e.

(2.2)

In general, it is possible to show that under suitable assumptions, the penalty func-tion introduced this way is “exact” in the sense that the following two problems havethe same global minimizers for a sufficiently large value of the penalty parameter γ :

Minimize g(x)

subject to Ax = b,

x ∈ {0,1}n(2.3)

and

Minimize g(x) + γ xT(e − x)

subject to Ax = b,

0 ≤ x ≤ e.

(2.4)

An example of such a result (see for example [30]) is:

Theorem 2.1 Let g : [0,1]n → R be a C1 function and consider the two prob-lems (2.3) and (2.4) with the feasible region of (2.3) being non-empty. Then thereexists M > 0 such that for all γ > M , problems (2.3) and (2.4) have the same globalminimizers.

Note that although this is an exact penalty function it is twice continuously differ-entiable and the extreme ill-conditioning arising out of the need for γ → ∞ normallyrequired for smooth penalty functions is avoided. Indeed, a modestly large value ofγ is usually sufficient to indicate whether a variable is converging to 0 or 1.

Page 5: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 261

Since the penalty term is concave we are highly likely to introduce local mini-mizers at almost all the feasible integer points, and just as significantly, saddle pointsat interior points. It is clearly critical that any method used to solve the transformedproblem be one that can be assured of not converging to a saddle point.

The idea of using penalty methods for discrete optimization problems is not new(see, e.g., [3]). However, it is not sufficient to introduce only the penalty terms andhope to obtain the global minimizer by solving the resulting problem, because it ishighly likely that many undesired stationary points and local minimizers are beingintroduced in the process. This flaw also applies to the process of transforming adiscrete optimization problem into a global optimization problem simply by replacingthe discrete requirements of the variables with a nonlinear constraint. The danger ofusing the penalty function alone is illustrated by the following example:

Example Consider the problem min{x2 : x ∈ {0,1}}. It is clear that the globalminimizer is given by x∗ = 0. Suppose the problem has been transformed tomin0≤x≤1{x2 + γ x(1 − x)}, where γ > 0 is the penalty parameter. If γ > 2 thenthe first-order KKT conditions are satisfied at 0,1 and γ

2(γ−1). Suppose in a descent

method the initial point is chosen in the interval [ γ2(γ−1)

,1], then the sequence gener-ated will not converge to 0. Thus for large values of γ the probability of convergingto 0 from a random initial point is close to 0.5. A probability of 0.5 may not seemso bad but if we now consider the n-dimensional problem min{‖x‖2 : x ∈ {0,1}n},then every vertex is a local minimizer of the continuous problem and the probabilityof converging from a random initial point to the correct local minimizer is not muchbetter than 1/2n.

In the following we assume that the feasible space is bounded, which is the casefor the problems of interest.

Definition Let x̄ be a minimizer of (2.2) and S(x̄) denote the set of feasible pointsfor which there exists an arc emanating from x̄ such that the tangent to the arc at x isZT∇xf (x), where the columns of Z are a basis for the null space of the constraintsactive at x. Let the volume of S(x̄) be denoted by Sv(x̄). We may rank the minimizersin terms of the size of Sv(x̄).

If x∗ ∈ Rn is the global minimizer and V is the volume of the feasible space then

we can expect that

limn→∞Sv(x

∗)/V = 0.

It is this fact that makes it unlikely that simply applying a local search method to (2.2)will be successful. Indeed there is little to hope that a good local minimizer would befound. If the minima are uniformly distributed with mean M then the expected valueof f (x̄) for a random starting point is M . There is little reason to suppose M is closeto f (x∗).

There are alternative penalty functions. Moreover, xi(1 −xi) = 0 is a complemen-tarity condition and there is a vast literature on how to impose these conditions. Inaddition to penalty functions there are other means of trying to enforce variables to

Page 6: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

262 W. Murray, K.-M. Ng

take their discrete values. A particularly useful approach is to replace terms such ascTx that often appear in the objective, where ci > 0 (xi = 1 implies a capital cost),by the expression

∑ni=1 ci(1 − e−βxi ) with β = 10 typically. This term favors set-

ting variables to 0 and 1 rather than 0.5 and 0.5. It may be thought that it wouldalways prefer all variables to be 0, but that is typically prevented by constraints. Alsorounding up ensures feasibility but rounding down does not.

3 Smoothing methods

In general, the presence of multiple local minima in an optimization problem is com-mon and often makes the search for the global minimum very difficult. The fewer thelocal minima, the more likely it is that an algorithm that finds a local minimizer willfind the global minimum. Smoothing methods refer to a class of methods that replacethe original problem by either a single or a sequence of problems whose objective is“smoother”. In this context “smoother” may be interpreted as a function whose sec-ond or higher order derivatives are smaller in magnitude, with a straight line beingthe ultimate smooth function.

The concept of smoothing has already been exploited in nonconvex optimizationand discussed in the context of global optimization (see, e.g., [19]). There are a num-ber of ways to smooth a function and which is best depends to a degree on whatcharacteristics of the original function we are trying to suppress. Smoothing methodscan be categorized into two types: local or global smoothing. Local smoothing algo-rithms are particularly suited to problems in which noise arises during the evaluationof a function. Unfortunately, noise may produce many local minimizers, or it mayproduce a minimizer along the search direction in a linesearch method and result ina tiny step. Local smoothing has been suggested to eliminate poor minimizers thatare part of the true problem. Under such circumstances the introduction of smoothingalters the problem and may change the required global minimizer. To overcome this,a sequence of problems is solved in which the degree of smoothing is reduced to zero.While local smoothing may be useful in eliminating tiny local minimizers even whenthere are large numbers of them, they are less useful at removing significant but poorminimizers (see [25]).

We would like to transform the original problem with many local minima intoone that has fewer local minima or even just one local minimum and so obtaining aglobal optimization problem that is easier to solve (see Fig. 1). Obviously we wishto eliminate poor local minima. In the subsequent discussion, we consider modifiedobjective functions of the form

F(x,μ) = f (x) + μg(x),

where f and g are real-valued C2 functions on Rn and μ ≥ 0.

3.1 Global smoothing

The basic idea of global smoothing is to add a strictly convex function to the originalobjective, i.e.,

F (x,μ) = f (x) + μ�(x),

Page 7: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 263

Fig. 1 Effect of smoothing theoriginal optimization problem

where � is a strictly convex function. If � is chosen to have a Hessian that is suffi-ciently positive definite for all x, i.e., the eigenvalues of this Hessian are uniformlybounded away from zero, it implies that for μ large enough, F (x,μ) is strictly con-vex. Similar results or proofs of such an assertion can be found, for example, in[1, Lemma 3.2.1].

Theorem 3.1 Suppose f : [0,1]n → R is a C2 function and � : X → R is a C2

function such that the minimum eigenvalue of ∇2�(x) is greater than or equal toε for all x ∈ X, where X ⊂ [0,1]n and ε > 0. Then there exists M > 0 such that ifμ > M , then f + μ� is a strictly convex function on X.

Consequently, for μ sufficiently large, any local minimizer of F (x,μ) is also theunique global minimizer. Typically the minimizer of �(x) is known or is easy to findand hence minimizing F (x,μ) for large μ is also easy. This is important in smoothingmethods because the basic idea is to solve the problem for a decreasing sequence of μ

starting with a large value and ending with one that may be close to zero. The solutionx(μk) of minx F (x,μk) is used as the starting point of minx F (x,μk+1).

The idea behind global smoothing is similar to that of local smoothing, namely,the hope is that by adding μ�(x) to f (x), poor local minimizers will be eliminated.There are, however, important differences between global and local smoothing. A keyone is that local smoothing does not guarantee that the function is unimodal for suffi-ciently large values of the smoothing parameter. For example, if the algorithm in [24]is applied to the function cos(x), a multiple of cos(x) is obtained. Thus, the numberof minimizers of the smoothed function has not been reduced. It is easy to appreciatethat the global smoothing approach is largely independent of the initial estimate of asolution, since if the initial function is unimodal, the choice of initial point is irrele-vant to the minimizer found. When μ is decreased and the subsequent functions haveseveral minimizers, the old solution is used as the initial point. Consequently, whichminimizer is found is predetermined. Independence of the choice of initial point maybe viewed as both a strength and a weakness. What is happening is that any initialpoint is being replaced by a point close to the minimizer of �(x). An obvious con-cern is that convergence will then be to the minimizer closest to the minimizer of�(x). The key to the success of this approach is to choose �(x) to have a minimizer

Page 8: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

264 W. Murray, K.-M. Ng

that is not close to any of the minimizers of f (x). This may not seem easy, but it ispossible for constrained problems. If it is known that the minimizers are on the edgeof the feasible region (e.g., with concave objective functions), then the “center of thefeasible region” may be viewed as being removed from all of them. An example isthe analytic center and there are certainly many other choices of generating an initialfeasible solution since the feasible region is linearly constrained.

4 Logarithmic smoothing

Consider the following relaxation of (1.1):

Minimize f (x)

subject to Ax = b,

0 ≤ x ≤ e.

(4.1)

A suitable smoothing function that we have used is given by the logarithmic smooth-ing function:

�(x) ≡ −n∑

j=1

lnxj −n∑

j=1

ln(1 − xj ). (4.2)

This function is clearly well-defined when 0 < x < e. If any value of xj is 0 or 1,we have �(x) = ∞, which implies we can dispense with the bounds on x to get thefollowing transformed problem:

Minimize f (x) − μ

n∑

j=1

[lnxj + ln(1 − xj )]

subject to Ax = b,

(4.3)

where μ > 0 is the smoothing parameter. When a linesearch algorithm starts with aninitial point 0 < x0 < e, then all iterates generated by the linesearch also satisfy thisproperty, provided care is taken in the linesearch to ensure that the maximum steptaken is within the bounds 0 < x < e.

4.1 Properties of logarithmic smoothing function

The function �(x) is a logarithmic barrier function and is used with barrier methods(see [11]) to eliminate inequality constraints from a problem. In fact, (4.3) is some-times known as the barrier subproblem for (4.1). Our use of this barrier function isnot to eliminate the constraints but because a barrier function appears to be an idealsmoothing function. Elimination of the inequality constraints is a useful bonus. Italso enables us to draw upon the extensive theoretical and practical results concern-ing barrier methods.

A key property of the barrier term is that �(x) is strictly convex. If μ is largeenough, the function f + μ� will also be strictly convex, as follows from Theo-rem 3.1. By Theorem 8 of [11], under the assumptions already imposed on (4.1),

Page 9: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 265

if x∗(μ) is a solution to (4.3), then there exists a solution x∗ to (4.1) such thatlimμ↘0 x∗(μ) = x∗. The following theorem is then a consequence of the resultsof [11]:

Theorem 4.1 Let x(μ,γ ) be any local minimizer of

Minimize f (x) − μ

n∑

j=1

[lnxj + ln(1 − xj )] + γ∑

j∈J

xj (1 − xj )

subject to Ax = b,

0 ≤ x ≤ e.

(4.4)

Then limγ→∞ limμ↘0 xj (μ,γ ) = 0 or 1 for j ∈ J .

The general procedure of the barrier function method is to solve problem (4.3)approximately for a sequence of decreasing values of μ since [11] has shown thatthe solution to (4.3), x∗(μ), is a continuously differentiable curve. Note that iflimμ↘0 x∗(μ) = x∗ ∈ {0,1}n, then we need not solve (4.3) for μ very small becausethe rounded solution for a modestly small value of μ should be adequate.

Consider now the example given in Sect. 2 to illustrate the possible failure of theuse of a penalty function. The problem is now transformed to

Minimize x2 − μ logx − μ log(1 − x) + γ x(1 − x)

subject to 0 ≤ x ≤ 1,(4.5)

where μ > 0 is the barrier parameter. If x∗(μ,γ ) denotes a stationary point thenx∗(10,10) = 0.3588 and x∗(1,10) = 0.1072. Thus, rounding of x∗(μ,γ ) in thesecases will give the global minimizer. In fact, a trajectory of x∗(μ,10) can be obtainedsuch that x∗(μ,10) → 0 as μ ↘ 0.

We have in effect replaced the hard problem of a nonlinear integer optimizationproblem by what at first appearance is an equally hard problem of finding a globalminimizer for a problem with continuous variables and a large number of local min-ima. The basis for our optimism that this is not the case lies in how we can utilizethe parameters μ and γ and try to obtain a global minimizer, or at least a good localminimizer of the composite objective function. Note that the term xj (1 − xj ) attainsits maximum at xj = 1

2 and that the logarithmic barrier term attains its minimum atthe same point. Consequently, at this point, the gradient is given solely by f (x). Inother words, which vertex looks most attractive from the perspective of the objectiveis the direction we tilt regardless of the value of μ or γ . Starting at a neutral pointand slowly imposing integrality is a key idea in the approach we advocate.

Also, note that provided μ is sufficiently large compared to γ , the problem willhave a unique and hence global solution x∗(μ,γ ), which is a continuous function ofμ and γ . The hope is that the global or at least a good minimizer of (1.1) is the oneconnected by a continuous trajectory to x∗(μ,γ ) for μ large and γ small.

Even if a global minimizer is not identified, we hope to obtain a good local mini-mizer and perhaps combine this approach with traditional methods.

Page 10: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

266 W. Murray, K.-M. Ng

4.2 The parameters μ and γ

The parameter μ is the standard barrier parameter. However, its initial value and howit is adjusted is not how this is typically done when utilizing the normal barrier func-tion approach. For example, when a good estimate of a minimizer is known then inthe normal algorithm one tries to choose a barrier parameter that is compatible withthis point. By that we mean one that is suitably small. This is not required in the ap-proach advocated here. Indeed it would be a poor policy. Since we are trying to findthe global minimizer we do not want the iterates to converge to a local minimizer andit is vital that the iterates move away should the initial estimate be close to a localminimizer. The initial choice of μ is made to ensure it is suitably large. By that wemean it dominates the other terms in the objective. It is not difficult to do that sincethis does not incur a significant cost if it is larger than is necessary. The overall objec-tive is trivial to optimize when μ is large. Moreover, subsequent objectives are easyto optimize when μ is reduced at a modest rate. Consequently, the additional effortof overestimating μ is small. Underestimating μ increases the danger of convergingto a local minimizer. Unlike the regular barrier approach the parameter is reducedat a modest rate and a reasonable estimate is obtained to the solution of the currentsubproblem before the parameter is reduced. The basic idea is to stay close to the“central trajectory”.

Even though there may not be a high cost to overestimating the initial value of μ

it is sensible to make an attempt to estimate a reasonable value especially since thisimpacts the choice of γ . Generally, the initial value of μ can be set as the maximumeigenvalue of ∇2f (x) and it is easy to show that such a choice of μ would be suffi-cient to make the function f + μ� strictly convex. If a poor initial value is chosenthis can be deduced from the difference between the minimizer obtained and that ofthe minimizer of just the barrier function. This latter minimizer is easy to determineand may be used as the initial point for the first minimization. Another issue thatdiffers from the regular barrier function method is that there is no need to drive μ tonear zero before terminating. Consequently, the ill-conditioning that is typical whenusing barrier functions is avoided.

Estimating a suitable initial value for γ is more challenging. What is importantis not just its value but its value relative to μ and ‖∇2F(x)‖. The basic idea is thatwe start with a strongly convex function and as μ is decreased and γ is increasedthe function becomes nonconvex. However, it is important that the nonconvexity ofthe penalty term does not overwhelm the nonconvexity of F(x) in the early stages,and so the initial value of γ should be much smaller than the absolute value of theminimum eigenvalue of ∇2F(x). Typically, an initial value of γ could be as smallas 1% of the absolute value of the minimum eigenvalue of ∇2F(x). It helps that itis not necessary for γ → ∞. Provided a good initial value is chosen the issue ofadjusting it is not hard since it may be increased at a similar rate to that at which μ

is decreased. Usually, θμ, the rate of decrease of μ, is set to be a value lying in therange of [0.1,0.9], while θγ , the rate of increase of γ , is set to be a value lying in therange of [1.1,10] with θμθγ ≈ 1.

It is of interest to note what is the consequence of extreme strategies in the uncon-strained case. If the initial values of μ and γ are small we will get a rounded solution

Page 11: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 267

of the local minimizer that a typical active-set algorithm would converge from theinitial point chosen. If the initial value of μ is large but that of γ is kept small exceptin the later stages, we will obtain the rounded solution of an attempt to find the globalminimizer of F(x). While this is better than finding the rounded solution of an arbi-trary local minimizer it is not the best we can do. However, it is clear that of these twoextremes the former is the worse. What is not known is how small the initial value ofγ needs to be compared to the initial value of μ.

4.3 A simple example

To illustrate the approach, we show how it would work on the following two-variableproblem. Let

f (x1, x2) = −(x1 − 1)2 − (x2 − 1)2 − 0.1(x1 + x2 − 2).

Consider the following problem

Minimize f (x1, x2)

subject to xi ∈ {0,2} for i = 1,2.(4.6)

Since f is separable, solving problem (4.6) is equivalent to solving two identicalproblems with one variable and a global optimal solution of x∗

1 = x∗2 = 2 can easily

be deduced.In our approach, we first relax problem (4.6) to

Minimize f (x1, x2)

subject to 0 ≤ xi ≤ 2 for i = 1,2.(4.7)

We can then transform (4.7) to the following unconstrained optimization problem byintroducing logarithmic barrier terms:

Minimize F(x1, x2) ≡ f (x1, x2) − μ[log(x1) + log(2 − x1)

+ log(x2) + log(2 − x2)], (4.8)

where μ > 0 and we have also omitted the implicit bound constraints on x1 and x2.The contours of F for 4 different values of μ are illustrated in Fig. 2.

Problem (4.8) can be solved for each μ by applying Newton’s method to obtain asolution to the first-order necessary conditions of optimality for (4.8), i.e.,

∇F(x1, x2) =[−2x1 + 1.9 − μ

x1+ μ

2−x1

−2x2 + 1.9 − μx2

+ μ2−x2

]= 0.

Our approach involves solving (4.8) for a fixed μ = μl > 0 to obtain an iterate x(l) =(x

(l)1 , x

(l)2 ), and then using this iterate as the initial iterate to solve (4.8) for μ = μl+1,

where μl > μl+1 > 0. The initial iterate for the first μ parameter, μ0, is set as x(0) =(1,1), which is a point with equal Euclidean distances from the extreme points of theregion X = [0,2] × [0,2]. The resulting sequence of iterates forms a trajectory and

Page 12: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

268 W. Murray, K.-M. Ng

Fig. 2 The contours for theobjective function of (4.8) as theparameter μ varies

Fig. 3 Trajectories taken byiterates for problem (4.8)

converges to one of the extreme points of X. Fig. 3 illustrates the iterates as μ varies,as well as the possible trajectories. With the choice of a large μ0 value of 100, we areable to obtain a trajectory that converges to the optimal solution of (4.6).

Though the above example is only a two-variable problem, our approach can easilybe extended to problems with more variables. A point to note is that despite the localminimizers being quite similar both in terms of their objective function and the sizeof the corresponding sets Sv the transformed function is unimodal until μ is quitesmall and rounding at the solution just prior to the function becoming non-unimodalwould give the global minimizer. Of course this is a simple example and it remainsto be seen whether this feature is present for more complex problems.

Page 13: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 269

5 Description of the algorithm

Since the continuous problems of interest may have many local minimizers and sad-dle points, first-order methods are inadequate as they are only assured of convergingto points satisfying first-order optimality conditions. It is therefore imperative thatsecond-order methods be used in the algorithm. Any second-order method that isassured of converging to a solution of the second-order optimality conditions mustexplicitly or implicitly compute a direction of negative curvature for the reducedHessian matrix. A key feature of our approach is a very efficient second-order methodfor solving the continuous problem.

We may solve (4.4) for a specific choice of μ and γ by starting at a feasible pointand generating a descent direction, if one exists, in the null space of A. Let Z bea matrix with columns that form a basis for the null space of A. Then AZ = 0 andthe rank of Z is n − m. If x0 is any feasible point so that we have Ax0 = b, thefeasible region can be described as {x : x = x0 + Zy, y ∈ R

n−m}. Also, if we letφ be the restriction of F to the feasible region, the problem becomes the followingunconstrained problem:

Minimizey∈Rn−m

φ(y). (5.1)

Since the gradient of φ is ZT∇F(x), it is straightforward to obtain a stationary pointby solving the equation ZT∇F(x) = 0. This gradient is referred to as the reducedgradient. Likewise, the reduced Hessian, i.e., the Hessian of φ, is ZT∇2F(x)Z.

For small or moderately-sized problems, a variety of methods may be used (seee.g., [12, 15]). Here, we investigate the case where the number of variables is large.One approach to solving the problem is to use a linesearch method, such as thetruncated-Newton method (see [8]) we are adopting, in which the descent directionand a direction of negative curvature are computed. Instead of using the index set J

for the definition of F in our discussion, we let γ be a vector of penalty parameterswith zero values for those xi such that i /∈ J , and � = Diag(γ ).

The first-order optimality conditions for (4.4) (ignoring the implicit bounds on x)may be written as

∇f − μXge + �(e − 2x) + ATλ = 0,

Ax = b,(5.2)

where Xg = Diag(xg), (xg)i = 1xi

− 11−xi

, i = 1, . . . , n, and λ corresponds to theLagrange multiplier vector of the constraint Ax = b. Applying Newton’s methoddirectly, we obtain the system

[H AT

A 0

][�x

�λ

]=

[−∇f + μXge − � (e − 2x) − ATλ

b − Ax

], (5.3)

where H = ∇2f + μ Diag(xH ) − 2� and (xH )i = 1x2i

+ 1(1−xi )

2 , i = 1,2, . . . , n.

Assuming that x0 satisfies Ax0 = b, the second equation A�x = 0 implies that�x = Zy for some y. Substituting this into the first equation and premultiplying both

Page 14: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

270 W. Murray, K.-M. Ng

sides by ZT, we obtain

ZTHZy = ZT(−∇f + μXge − � (e − 2x)). (5.4)

To obtain a descent direction in this method, we first attempt to solve (5.4), orfrom the definition of F(x), the equivalent reduced Hessian system

ZT∇2F(xl)Zy = −ZT∇F(xl), (5.5)

by the conjugate gradient method, where xl is the lth iterate. Generally, Z may be alarge matrix, especially if the number of linear constraints is small. Thus, even though∇2F(xl) is likely to be a sparse matrix, ZT∇2F(xl)Z may be a large dense matrix.The virtue of the conjugate gradient method is that the explicit reduced Hessian neednot be formed. There may be specific problems where the structure of ∇2F and Z

does allow the matrix to be formed. Under such circumstances alternative methodssuch as those based on Cholesky factorization may also be applicable. Since we areinterested in developing a method for general application we have pursued the conju-gate gradient approach.

In the process of solving (5.5) with the conjugate gradient algorithm (see, [17]),we may determine that ZT∇2F(xl)Z is indefinite for some l. In such a case, weobtain a negative curvature direction q such that

qTZT∇2F(xl)Zq < 0.

This negative curvature direction is required to ensure that the iterates do not convergeto a saddle point. Also, the objective is decreased along this direction. In practice,the best choice for q is an eigenvector corresponding to the smallest eigenvalue ofZT∇2F(xl)Z. Computing q is usually expensive but fortunately unnecessary. A gooddirection of negative curvature will suffice and efficient ways of computing direc-tions within a modified-Newton algorithm are described in [2]. The descent directionin such modified-Newton algorithms may also be obtained using factorization meth-ods (see, e.g., [13]). In any case, it is essential to compute both a descent directionand a direction of negative curvature (when one exists). One possibility that mayarise is that the conjugate gradient algorithm terminates with a direction q such thatqTZT∇2F(xl)Zq = 0. In that case, we may have to use other iterative methods suchas the Lanczos method to obtain a direction of negative curvature as described in [2].The vector q is a good initial estimate to use in such methods.

If we let p be a suitable linear combination of the negative curvature direction q

with a descent direction, the convergence of the iterates is still ensured with the searchdirection Zp. The next iterate xl+1 is thus defined by xl + αlZp, where αl is beingdetermined using a linesearch. The iterations will be performed until ZT∇F(xl) issufficiently close to 0 and ZT∇2F(xl)Z is positive semidefinite. Also, as the currentiterate xl may still be some distance away from the actual optimal solution we areseeking, and since we do not necessarily use an exact solution of (5.5) to get thesearch direction, we only need to solve (5.5) approximately. An alternative is to usethe two directions to form a differential equation (see [7]), which leads to a searchalong an arc.

A summary of the new algorithm is shown below:

Page 15: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 271

Global Smoothing Algorithm: GSA

Set εF = tolerance for function evaluation,εμ = tolerance for barrier value,M = maximum penalty value,N = iteration limit for applying Newton’s method,θμ = ratio for barrier parameter decrement,θγ = ratio for penalty parameter increment,μ0 = initial barrier parameter,γ0 = initial penalty parameter,r = any feasible starting point.

Set γ = γ0, μ = μ0,while γ < M or μ > εμ.

Set x0 = r

for l = 0,1, . . . ,N ,if ‖ZT∇F(xl)‖ < εF μ,

Set xN = xl , l = N .Check if xN is a direction of negative curvature,

elseApply conjugate gradient algorithm to

[ZT∇2F(xl)Z]y = −ZT∇F(xl).Obtain pl as a combination of directions of descent and

negative curvature.Perform a linesearch to determine αl and set

xl+1 = xl + αlZpl ,end if

end forSet r = xN , μ = θμμ, γ = θγ γ .

end while

6 Computational results

To show the behavior of GSA we give some results on problems in which problemsize can be varied. The purpose of these results is more for illustration than as conclu-sive evidence of the efficacy of the new algorithm. Still the success achieved suggeststhat the algorithm has merit. More results are given in [26] and what these results hereshow is that there are problems for which GSA performs well. Moreover, the rate ofincrease in computational work as problem size increases is modest. All results forGSA were obtained from a MATLAB 6.5 implementation on an Intel Pentium IV2.4 GHz PC with 512 MB RAM running on Windows XP Operating System.

As a preliminary illustration of the performance of GSA, we consider the follow-ing set of test problems from [28]:

Minimize −(n − 1)

n∑

i=1

xi − 1

n

n/2∑

i=1

xi + 2∑

1≤i<j≤n

xixj

subject to xi ∈ {0,1} for i = 1,2, . . . , n.

(6.1)

Page 16: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

272 W. Murray, K.-M. Ng

In the case when n is even, Pardalos [28] shows that the unique global minimumof (6.1) is given by the feasible point

x∗ = (1, . . . ,1︸ ︷︷ ︸n/2

,0, . . . ,0︸ ︷︷ ︸n/2

),

with objective value −n2+24 , and there is an exponential number of discrete local

minima for (6.1). Here, a point x ∈ {0,1}n is a discrete local minimum if and only iff (x) ≤ f (y) for all points y ∈ {0,1}n adjacent to x, when all points are consideredto be vertices of the unit hypercube {0,1}n.

As ∇2f (x) = 2(eeT − I ) for this set of test problems, it is possible to sim-plify all the matrix-vector products involving ∇2f (x) in GSA by taking note that[∇2f (x)]v = 2[(eTv)e − v] for any v ∈ R

n, leading to more efficient computations.The initial μ and γ parameters are set as 100 and 0.1 respectively, with a μ-reductionratio of 0.1 and γ -increment ratio of 10. The analytic center of [0,1]n, i.e., 1

2e, is usedas the initial iterate and the tolerance is set as εμ = 10−3 for the termination criteria.In all the instances tested with the number of variables ranging from 1000 to 10000,GSA was able to obtain convergence to the global optimal solution. In particular, itis possible to show that using the initial iterate of 1

2e, the next iterate obtained byGSA would be of the form (u1, . . . , u1︸ ︷︷ ︸

n/2

, u2, . . . , u2︸ ︷︷ ︸n/2

) where u1 > 12 and u2 < 1

2 , i.e.,

an immediate rounding of this iterate would have given the global optimal solution.A plot of the computation time required versus the number of variables is shown

in Fig. 4. These results indicate that the amount of computational work in GSA doesnot increase drastically with an increase in problem size for this set of test problems.Moreover, it illustrates the ability of GSA to find the optimal solution to large prob-lems with as much as 10000 variables within a reasonable amount of computationtime, as well as the potential to exploit possible problem structure to improve com-putational efficiency in solving problems.

For purposes of evaluating the performance of GSA especially when we do notknow the exact optimal solutions to the test problems, we have used some of thenonlinear mixed-integer programming solvers of GAMS,1 namely DICOPT, SBB andBARON as described earlier. MINOS was used as the common underlying nonlinearprogramming solver required by all the three solvers, while CPLEX was used as theunderlying linear programming solver required by BARON. While DICOPT and SBBsolvers are able to handle nonlinear discrete optimization problems, these solversdo not necessarily attain the global optimal solution for nonconvex problems. AsBARON is a global optimization solver that could potentially return the exact optimalsolution or relevant bounds on the optimal objective value, we have made extensivecomparisons between GSA and BARON in certain computational instances. We haveused GAMS version 22.2 that includes BARON version 7.5.

When using the GAMS solvers, other than the maximum computation time, themaximum number of iterations and the maximum amount of workspace memoryallocated to the solvers, the other options were set to default values. We generally

1http://www.gams.com.

Page 17: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 273

Fig. 4 Graph of computationtime required (CPU seconds) forsolving instances ofproblem (6.1)

set a computation time limit of 10000 CPU seconds and the maximum number ofiterations to 500 000 for the runs. To ensure that the solvers have a reasonable amountof memory for setting up problem instances to solve, we have configured a maximumworkspace memory of 1 GB. It turns out that less than 100 MB were actually neededby all the GAMS solvers to set up each of the problem instances.

All the runs involving the GAMS solvers were mostly performed using the samePC used for GSA, except where noted. It should be pointed out that GSA requiresvery little additional memory other than that needed to define the problem, which isa tiny fraction of that used by the other methods considered here.

6.1 Unconstrained binary quadratic problems

One of the simpler classes of problems in the form of (1.1) with a nonlinear objectivefunction is that of the unconstrained binary quadratic problem, i.e., the objective func-tion is quadratic in the binary variables x, and both A and b are the zero matrix and thezero vector respectively. There are many relaxation and approximation approachesavailable for this class of problems, such as [4] and [21]. Despite its simplicity suchproblems are still hard to solve as the numerical testing reveals. What these problemsgive is another opportunity to examine performance as size increases even thoughsuch a numerical comparison may not be sufficient to measure the efficacy of GSAfor other types of nonlinear optimization problems with binary variables. Moreover,there exists publicly available test problems in this class of optimization problems.

The unconstrained binary quadratic problem (BQP) may be stated as follows:

Minimize xTPx + cTx

subject to x ∈ {0,1}n, (6.2)

where P ∈ Rn×n and c ∈ R

n. By noting that xTMx = xTMTx and cTx = xTCx forany feasible x and any M ∈ R

n×n, where C = Diag(c), it suffices to consider thefollowing “simpler” problem

Minimize xTQx

subject to x ∈ {0,1}n, (6.3)

where Q is a symmetric matrix.

Page 18: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

274 W. Murray, K.-M. Ng

Since any unconstrained quadratic programming problem with purely integer vari-ables from a bounded set can be transformed into (6.2), many classes of problemswould then fall under this category of BQP problems, such as the least-squares prob-lem with bounded integer variables:

Minimizex∈D

‖s − Ax‖,

where s ∈ Rm, A ∈ R

m×n, and D is a bounded subset of Zn.

A test set for BQP problems in the form of (6.3) is given in the OR-Library, whichis maintained by J.E. Beasley.2 The entries of matrix Q are integers uniformly drawnfrom [−100,100], with density 10%. As the test problems were formulated as max-imization problems, for purposes of comparison of objective values, we now maxi-mize the objective function.

The GSA was run on all 60 problems of Beasley’s test set ranging from 50 to2500 variables. The initial μ and γ parameters are set as 100 and 1 respectively, witha μ-reduction ratio of 0.5 and γ -increment ratio of 2. The analytic center of [0,1]n,i.e., 1

2e, is used as the initial iterate and the tolerance is set as εμ = 0.1 for the ter-mination criteria. For the GAMS solvers, we initially notice that DICOPT and SBBalways return the zero vector as the solution to the test problems, which could be dueto the solvers accepting a local solution in the underlying nonlinear programmingsubproblems being solved. To allow these two solvers to return non-zero solutionsfor comparison, we have added the cut eTx ≥ 1 or eTx ≥ 2 when applying DICOPTand SBB on the test problems. The objective values obtained by the GSA and therespective GAMS solvers, as well as the respective computation time and numberof iterations required are given in Tables 1–3. There are preprocessing procedures inBARON that attempt to obtain a good solution before letting the solver find the opti-mal or a better solution and the results obtained by these preprocessing procedures areshown in the tables. Finally, the objective values of the best known solutions basedon RUTCOR’s website3 are given in the last column of the tables.

BARON found the optimal solution for all 20 problems in the test sets involving 50to 100 binary variables, with the objective values listed in Tables 1(a), (b). Using theseoptimal objective values as a basis of comparison, we conclude that GSA successfullyfound 11 of these optima, with another 6 solutions obtained having less than 0.5%deviation from the optimal objective value. The percentage deviation in the remaining3 solutions obtained by the GSA were also close, with the worst one having 1.66%deviation from the optimal objective value. On the other hand, DICOPT obtainedresults ranging from 2.25% to 26.26% deviation from the optimal objective valuefor the 50-variable test problems and from 0.56% to 13.03% for the 100-variabletest problems, while SBB obtained results ranging from 1.9% to 26.26% deviationfrom the optimal objective value for the 50-variable test problems and from 0.56% to13.03% for the 100-variable test problems. For all these instances, GSA was able toobtain better objective values than DICOPT or SBB.

2http://mscmga.ms.ic.ac.uk/info.html.3http://rutcor.rutgers.edu/~pbo/families/beasley.htm.

Page 19: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 275

Tabl

e1

Num

eric

alco

mpa

riso

nof

algo

rith

ms

appl

ied

toB

easl

ey’s

BQ

Pte

stpr

oble

ms

(max

imiz

atio

nob

ject

ive)

with

(a)

50bi

nary

vari

able

s;(b

)10

0bi

nary

vari

able

s

Prob

lem

inst

ance

DIC

OPT

SBB

BA

RO

NG

SAB

est

know

nob

j.va

lue

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Prep

roce

ssin

gPo

stpr

oces

sing

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Bes

tob

j.va

lue

Tim

eto

best

(sec

)

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Obj

.va

lue

Tim

e(s

ec)

Obj

.va

lue

Upp

erbo

und

(a)

50_1

1547

0.03

1815

470.

3818

2030

0.08

2098

2098

0.09

120

980.

281.

0011

920

98

50_2

3360

0.98

2733

750.

1727

3678

0.08

3702

3702

0.12

137

020.

220.

8610

537

02

50_3

4463

0.05

3544

630.

0535

4626

0.08

4626

4626

0.09

146

260.

611.

6111

446

26

50_4

3412

0.36

6234

150.

2830

3544

0.11

3544

3544

0.16

135

440.

341.

0611

135

44

50_5

3796

0.03

3137

960.

0531

4012

0.08

4012

4012

0.11

140

120.

220.

6711

340

12

50_6

3610

0.98

162

3610

0.42

4636

590.

1236

9336

930.

271

3693

0.25

0.61

109

3693

50_7

4379

0.03

3643

790.

0336

4520

0.08

4520

4520

0.09

145

100.

061.

0611

445

20

50_8

4016

0.37

7941

360.

2041

4216

0.07

4216

4216

0.14

142

160.

270.

7211

342

16

50_9

3370

0.05

3533

700.

0435

3780

0.11

3780

3780

0.25

137

320.

360.

8011

537

80

50_1

030

470.

5713

730

470.

2131

3505

0.12

3507

3507

0.34

135

050.

580.

9111

435

07

(b)

100_

172

180.

0463

7218

0.08

6377

920.

1279

7079

7094

9.86

1527

7838

0.84

1.33

128

7970

100_

210

922

2.01

262

1092

20.

3867

1102

60.

1311

036

1103

614

.95

3510

986

0.49

1.17

125

1103

6

100_

312

652

0.92

224

1265

20.

4374

1272

30.

1612

723

1272

310

.38

912

723

0.34

1.13

127

1272

3

100_

410

136

0.08

6510

136

0.13

6510

368

0.10

1036

810

368

46.7

085

1036

80.

221.

0612

810

368

100_

586

640.

0660

8664

0.06

6089

160.

1490

8390

8356

.16

7590

400.

581.

5815

590

83

100_

695

790.

8618

995

860.

3468

1012

60.

1410

210

1021

080

5.09

1787

1010

10.

301.

3014

410

210

100_

796

760.

0663

9676

0.19

6310

035

0.14

1012

510

125

60.5

013

510

094

0.91

1.53

140

1012

5

100_

810

966

0.05

5610

966

0.10

5611

380

0.08

1143

511

435

15.3

041

1143

50.

521.

0012

411

435

100_

999

620.

0663

9962

0.07

6311

455

0.10

1145

511

455

7.94

1111

455

0.22

1.11

131

1145

5

100_

1012

027

0.79

206

1202

70.

2573

1254

70.

1712

565

1256

59.

9225

1254

70.

411.

2512

312

565

Page 20: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

276 W. Murray, K.-M. Ng

Fig. 5 Graph of averagepercentage difference inobjective value of solutionobtained by the variousalgorithms to best knownsolution when solving Beasley’sBQP test problems

For the test sets with more than 100 variables, BARON was not able to completethe runs by the computation time limit of 10000 seconds and so the runs were trun-cated with BARON returning the best found solution. Thus, we do not have optimalobjective values as a reference for these larger test problems, except for the upperbounds obtained by BARON or the objective values obtained from the RUTCORwebsite. We were unable to obtain any solution from BARON for the 2500-variabletest problems within the computation time limit and this is also the case despite try-ing out on another platform, which is an Intel Pentium IV 2.4 GHz server with 2 GBRAM running on Linux Operating System.

From Tables 2 and 3, we can see that GSA again produced better solutions thanDICOPT and SBB in all the instances. For all the test problems with 250 variables,GSA obtained objective values that have less than 2% deviation from the best-foundobjective value, with the worst one being 1.55% deviation. For the test problems with500 variables, GSA obtained objective values that are better than BARON in 7 out of10 instances, with all the instances having less than 0.64% deviation from the best-found objective value. For the test problems with 1000 variables, GSA obtained ob-jective values that are better than BARON in 9 out of 10 instances, with the remaininginstance having 0.62% deviation from the best-found objective value, which is alsothe worst deviation out of the 10 instances. For the test problems with 2500 vari-ables, GSA obtained objective values ranging from 0.09% to 0.43% deviation fromthe best-found objective value. A graph showing the average percentage differencein the objective value of the solution obtained by the GSA and the GAMS solvers tothat of the best known solution for Beasley’s test problems is given in Fig. 5.

Figure 6 gives a plot of the relative average computation time required to solvethe Beasley’s test problems by DICOPT, SBB, BARON and GSA, with respect toproblem size. Since we were unable to determine the computation time needed byBARON to complete its run for test problems of certain sizes, there were only limiteddata points for BARON in the figure. It can be seen that while there is a substantialincrease in computation time required by BARON to solve larger test problems, theincrease in computation time for the other solvers and GSA has been less significant.However, GSA has been able to maintain the quality of solutions obtained whencompared to all the three solvers for problems with increasing size as illustrated inFig. 5, indicating its efficiency and potential in obtaining good solutions to largeproblems without incurring unreasonable computation times.

Page 21: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 277

Tabl

e2

Num

eric

alco

mpa

riso

nof

algo

rith

ms

appl

ied

toB

easl

ey’s

BQ

Pte

stpr

oble

ms

(max

imiz

atio

nob

ject

ive)

with

(a)

250

bina

ryva

riab

les;

(b)

500

bina

ryva

riab

les

Prob

lem

inst

ance

DIC

OPT

SBB

BA

RO

NG

SAB

est

know

nob

j.va

lue

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Prep

roce

ssin

gPo

stpr

oces

sing

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Bes

tob

j.va

lue

Tim

eto

best

(sec

)

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Obj

.va

lue

Tim

e(s

ec)

Obj

.va

lue

Upp

erbo

und

(a)

250_

145

079

1.65

471

4507

90.

5016

445

453

1.81

4557

974

589.

8>

1e4

169

4546

32.

927.

9416

945

607

250_

243

117

0.28

178

4311

70.

3017

844

149

1.64

4443

874

902.

3>

1e4

193

4425

11.

366.

3015

044

810

250_

348

326

0.29

188

4832

60.

2718

848

947

2.09

4903

777

808.

8>

1e4

175

4894

74.

818.

7416

149

037

250_

441

127

0.33

174

4112

70.

2817

440

837

2.03

4121

971

448.

0>

1e4

180

4118

12.

305.

2014

741

274

250_

547

471

0.28

180

4747

10.

2818

047

928

1.81

4795

575

927.

8>

1e4

190

4784

54.

057.

5614

747

961

250_

639

502

1.14

304

3950

20.

4715

940

593

1.64

4077

774

665.

0>

1e4

198

4079

71.

845.

9514

541

014

250_

746

323

0.30

183

4632

30.

2818

346

757

1.75

4675

776

411.

5>

1e4

199

4675

72.

255.

4114

446

757

250_

834

037

1.53

467

3403

70.

6713

734

545

1.65

3565

068

558.

5>

1e4

225

3517

43.

817.

5315

935

726

250_

948

048

1.01

360

4804

80.

8018

148

690

1.70

4881

377

998.

3>

1e4

208

4870

52.

005.

6414

648

916

250_

1039

194

1.52

457

3926

40.

4815

840

140

1.61

4044

272

113.

3>

1e4

221

4019

83.

596.

8015

540

442

(b)

500_

111

4182

1.52

356

1141

821.

5035

611

5141

28.3

911

5141

3084

07.3

>1e

47

1158

4816

.77

26.8

916

611

6586

500_

212

5191

1.56

357

1251

911.

4235

712

8085

28.2

312

8085

3073

10.5

>1e

414

1280

0115

.84

26.5

615

912

8339

500_

312

8512

1.52

358

1285

121.

4835

813

0033

27.7

813

0547

3149

95.3

>1e

410

1308

1211

.80

18.1

714

813

0812

500_

412

5445

2.53

662

1265

361.

9137

812

8962

27.4

612

9298

3136

18.8

>1e

413

1296

4711

.44

31.5

318

413

0097

500_

512

3066

1.40

325

1230

661.

3332

512

4785

26.6

312

4785

3097

63.8

>1e

414

1251

419.

6718

.55

156

1254

87

500_

612

0863

1.54

368

1208

631.

5236

812

0258

26.7

712

1074

3081

83.5

>1e

413

1216

0312

.89

29.0

817

412

1772

500_

711

7619

1.39

326

1176

191.

4132

612

1748

24.8

912

2164

3099

75.5

>1e

410

1218

724.

6614

.80

165

1222

01

500_

811

8755

1.55

378

1187

551.

5937

812

2657

25.4

712

3282

3112

51.3

>1e

413

1233

2914

.06

26.3

917

612

3559

500_

911

5548

1.41

336

1155

481.

3633

611

9518

25.1

311

9921

3096

43.3

>1e

414

1204

5612

.84

23.0

516

212

0798

500_

1012

3266

1.42

342

1232

661.

5034

212

9963

25.1

513

0547

3147

25.0

>1e

414

1298

4914

.81

25.3

616

613

0619

Page 22: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

278 W. Murray, K.-M. Ng

Tabl

e3

Num

eric

alco

mpa

riso

nof

algo

rith

ms

appl

ied

toB

easl

ey’s

BQ

Pte

stpr

oble

ms

(max

imiz

atio

nob

ject

ive)

with

(a)

1000

bina

ryva

riab

les;

(b)

2500

bina

ryva

riab

les

Prob

lem

inst

ance

DIC

OPT

SBB

BA

RO

NG

SAB

est

know

nob

j.va

lue

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Prep

roce

ssin

gPo

stpr

oces

sing

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Bes

tob

j.va

lue

Tim

eto

best

(sec

)

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Obj

.va

lue

Tim

e(s

ec)

Obj

.va

lue

Upp

erbo

und

(a)

1000

_136

1439

13.7

618

3136

1439

19.6

372

537

0209

399.

5237

0513

1257

155.

8>

1e4

137

0926

37.0

884

.78

180

3714

38

1000

_234

6137

16.5

618

4134

6153

12.2

070

635

0619

400.

7735

2816

1252

151.

0>

1e4

135

3621

41.7

284

.44

167

3549

32

1000

_336

4005

14.2

517

7636

4009

11.8

171

636

8936

374.

8736

9864

1264

529.

8>

1e4

136

9972

31.8

683

.73

177

3712

36

1000

_436

9083

13.4

075

136

9083

11.6

375

136

8356

360.

2736

8770

1270

025.

0>

1e4

136

9894

31.8

067

.06

169

3706

75

1000

_534

2971

12.5

270

534

2971

10.8

470

534

8893

369.

6934

9774

1261

032.

5>

1e4

135

2358

42.2

583

.00

202

3527

60

1000

_635

3908

11.5

068

035

3908

10.4

868

035

7013

369.

3935

7610

1257

987.

3>

1e4

135

8578

48.2

073

.42

175

3596

29

1000

_736

3538

11.3

668

636

3538

10.5

868

636

9342

365.

8836

9342

1259

804.

3>

1e4

137

0484

71.8

310

7.25

208

3711

93

1000

_834

5011

11.5

671

634

5011

10.8

871

634

8733

365.

6634

8871

1253

878.

0>

1e4

135

0762

74.9

710

6.09

186

3519

94

1000

_934

3910

11.5

375

334

3910

11.5

675

334

6534

364.

2834

6761

1255

519.

3>

1e4

134

9187

45.5

371

.75

185

3493

37

1000

_10

3433

3712

.34

1823

3433

3711

.72

730

3476

6335

4.94

3497

1712

4115

8.5

>1e

41

3492

2235

.34

90.8

817

635

1415

(b)

2500

_114

9383

322

6.8

1851

1493

833

166.

218

51–

>1e

4–

–>

1e4

–15

1205

357

6.8

788.

9832

515

1594

4

2500

_214

5483

634

7.0

3100

1454

870

170.

518

80–

>1e

4–

–>

1e4

–14

6762

848

1.1

638.

125

114

7139

2

2500

_313

9688

118

5.9

5713

1396

881

178.

019

40–

>1e

4–

–>

1e4

–14

0848

078

6.1

926.

624

314

1419

2

2500

_414

9006

733

2.4

1909

1490

067

171.

219

09–

>1e

4–

–>

1e4

–15

0393

942

5.4

572.

722

315

0770

1

2500

_514

8172

717

8.0

1985

1481

727

177.

519

85–

>1e

4–

–>

1e4

–14

9017

935

7.6

669.

227

414

9181

6

2500

_614

6090

929

3.6

1981

1460

909

176.

719

81–

>1e

4–

–>

1e4

–14

6663

350

3.3

687.

521

614

6916

2

2500

_714

6307

632

2.0

3188

1463

076

183.

620

20–

>1e

4–

–>

1e4

–14

7271

640

0.2

634.

826

614

7904

0

2500

_814

7001

316

9.1

1893

1470

013

170.

018

93–

>1e

4–

–>

1e4

–14

8285

936

5.9

738.

124

214

8419

9

2500

_914

6061

118

5.5

1935

1460

611

173.

719

35–

>1e

4–

–>

1e4

–14

8049

451

6.7

714.

224

114

8241

3

2500

_10

1470

882

175.

519

4814

7088

217

4.7

1948

–>

1e4

––

>1e

4–

1477

279

354.

756

9.6

252

1483

355

Page 23: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 279

Fig. 6 Graph of relativeaverage computation timerequired by the algorithms forsolving Beasley’s BQP testproblems

As the number of local minima for the problems of Beasley’s test set may besmall, we also applied GSA to problems generated from the standardized test problemgenerator (Subroutine Q01MKD) written by P.M. Pardalos in [28], which may havea larger expected number of local minima. In generating the test matrices, we setthe density of the matrices to be 0.1, with all entries of the matrices being integersbounded between −100 and 100. The seed to initialize the random number generatorwas set to 1. Note that unlike Beasley’s test set, the test problems were formulatedin [28] as minimization problems.

The GSA was run on these test problems ranging from 200 to 1000 binary vari-ables. We used the same parameters as in Beasley’s test problems, i.e., initial μ andγ being 100 and 1 respectively with a μ-reduction ratio of 0.5 and γ -increment ratioof 2. The analytic center of [0,1]n, i.e., 1

2e, is also used as the initial iterate and a tol-erance of εμ = 0.1 is used for the termination criteria. The objective values obtainedby the GSA and the three GAMS solvers, as well as the respective computation timeand number of iterations required are given in Table 4. The DICOPT and SBB resultswere obtained after adding the cut eTx ≥ 1 as was done in the Beasley’s test sets.However, we found that while DICOPT and SBB were able to return solutions to allthe test problems, BARON was unable to return any solution for test problems withmore than 700 variables. As such, we have performed all the BARON runs on theLinux platform mentioned earlier and BARON was then able to return the solutionsreported in Table 4 for all the test problems.

From Table 4, we can see that GSA still produced better solutions than DICOPTand SBB in all the 17 instances. GSA also obtained objective values that are betterthan BARON in 13 instances. For the remaining 4 instances, other than one instancein which GSA obtained the same objective value as BARON, the percentage devia-tions from the best found solution with respect to objective values are all less than0.25%. Thus the solutions obtained by GSA for the remaining instances are also com-parable to that of BARON.

6.2 Other types of nonlinear optimization problems

The objective functions considered so far are relatively simple being purely quadraticfunctions and could be solved by specific methods for such problems, such as [27].

Page 24: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

280 W. Murray, K.-M. Ng

Tabl

e4

Num

eric

alco

mpa

riso

nof

algo

rith

ms

for

solv

ing

Pard

alos

’sB

QP

test

prob

lem

s(m

inim

izat

ion

obje

ctiv

e)w

ith20

0to

1000

bina

ryva

riab

les

Prob

lem

size

DIC

OPT

SBB

BA

RO

NG

SA

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Prep

roce

ssin

gPo

stpr

oces

sing

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Bes

tob

j.va

lue

Tim

eto

best

(sec

)

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Obj

.va

lue

Tim

e(s

ec)

Obj

.va

lue

Low

erbo

und

200

−349

133.

670

9−3

4913

2.4

179

−354

042.

1−3

5408

−4.8

e4>

1e4

341

−354

081.

85.

314

1

250

−443

945.

715

18−4

5289

2.5

235

−451

704.

8−4

5475

−7.6

e4>

1e4

118

−455

168.

013

.618

9

300

−545

261.

543

1−5

5448

3.5

241

−562

9210

.7−5

6936

−1.1

e5>

1e4

51−5

6966

5.0

11.9

154

350

−690

995.

114

60−6

9594

3.6

279

−698

9718

.6−7

0225

−1.5

e5>

1e4

26−7

0625

9.7

19.8

166

400

−806

685.

315

84−8

0772

2.6

283

−840

2632

.1−8

4786

−2.0

e5>

1e4

15−8

5448

11.4

23.7

186

450

−111

534

3.2

605

−111

534

3.4

320

−112

087

51.0

−112

340

−2.6

e5>

1e4

7−1

1279

025

.942

.319

0

500

−123

982

5.1

1307

−124

348

5.1

340

−129

212

75.2

−129

444

−3.2

e5>

1e4

5−1

2985

215

.628

.516

0

550

−141

026

6.4

1105

−144

857

7.4

478

−145

008

107.

6−1

4540

5−3

.9e5

>1e

45

−145

565

24.6

42.8

171

600

−150

479

7.4

1209

−151

236

12.0

454

−154

702

151.

4−1

5575

3−4

.6e5

>1e

41

−155

615

38.0

55.5

210

650

−188

059

7.8

907

−190

737

33.7

553

−192

353

206.

9−1

9235

3−5

.4e5

>1e

41

−192

902

22.8

49.1

179

700

−214

125

9.2

964

−214

181

27.7

549

−216

110

270.

6−2

1773

9−6

.2e5

>1e

41

−217

628

41.0

74.0

179

750

−221

304

12.3

2099

−223

495

30.5

596

−227

174

355.

1−2

2791

3−7

.1e5

>1e

41

−228

209

34.4

71.7

171

800

−243

821

14.8

2561

−245

284

32.3

613

−245

886

455.

6−2

4702

3−8

.1e5

>1e

41

−248

328

58.1

92.1

223

850

−266

907

16.7

1143

−267

325

26.3

683

−268

361

577.

2−2

6929

5−9

.2e5

>1e

41

−269

945

51.3

99.8

194

900

−276

661

21.0

1800

−276

718

39.2

650

−283

977

722.

9−2

8721

8−1

.0e6

>1e

41

−288

014

99.3

158.

022

5

950

−318

730

23.7

2527

−318

751

41.3

658

−326

427

891.

4−3

2642

7−1

.2e6

>1e

41

−325

640

72.0

148.

919

7

1000

−331

726

22.4

1292

−331

940

35.4

695

−334

556

1086

.1−3

3745

9−1

.3e6

>1e

41

−338

797

64.7

126.

020

6

Page 25: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 281

It may be seen that when BARON works well the preprocessor in the BARON algo-rithm often obtains good solutions. That is unlikely to be the case for real problemsespecially when continuous variables are present and appear nonlinearly. We now ap-ply GSA to test problems with nonlinear nonquadratic objective functions. The firstone considered is a quartic function generated from the product of two quadratic func-tions. For ease in deriving the optimal solution, we have used the following matrix Q

mentioned in Example 4 of [28] for our construction of the test problems:

Q =

⎢⎢⎢⎢⎣

15 −4 1 0 2−4 −17 2 1 11 2 −25 −8 10 1 −8 30 −52 1 1 −5 −20

⎥⎥⎥⎥⎦.

In that example, we know that the solution to problem (6.3) with this Q matrix isgiven by x∗ = (0,1,1,0,1)T with objective value −54.

We define the following (5n × 5n)-block diagonal matrix and (5n × 1)-vector:

Qn =

⎢⎢⎢⎣

Q 0 . . . 00 Q . . . 0...

.... . .

...

0 0 . . . Q

⎥⎥⎥⎦ ,

x∗n = (

(x∗)T(x∗)T . . . (x∗)T)T.

Then it is easy to see that an optimal solution to problem (6.3) with Q = Qn is givenby x = x∗

n with objective value −54n.The test problems can now be defined in the general form as follows:

Minimize (xTQnx + 54n + 1)(yTQny + 54n + 1)

subject to x, y ∈ {0,1}5n.(6.4)

The rationale for constructing these test problems in the above form, especially inthe addition of a constant to each quadratic term in the product, is evident from thefollowing observation:

Proposition 6.1 An optimal solution to problem (6.4) is given by x = y = x∗n with

objective value 1.

Proof From above, for any x ∈ {0,1}5n, xTQnx ≥ −54n with equality achievedwhen x = x∗

n . Then we have xTQnx +54n+1 ≥ 1 and yTQny +54n+1 ≥ 1 for anyx, y ∈ {0,1}5n with equality achieved when x = x∗

n and y = x∗n respectively. Thus,

(xTQnx + 54n + 1)(yTQny + 54n + 1) ≥ 1 for any x, y ∈ {0,1}5n with equalityachieved when x = y = x∗

n . �

Both GSA and the GAMS solvers were applied to this set of test problems with n

ranging from 1 to 200, i.e., the number of binary variables ranges from 10 to 2000.

Page 26: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

282 W. Murray, K.-M. Ng

More experimentation was being carried out for test problems with smaller problemsizes as we detected a deterioration in the performance of the GAMS solvers for smalltest problems. The initial μ and γ parameters of the GSA are set as 100 and 1 respec-tively, with a μ-reduction ratio of 0.5 and γ -increment ratio of 2. The analytic centerof [0,1]n, i.e., 1

2e, is used as the initial iterate and the tolerance is set as εμ = 0.1 forthe termination criteria. As we found that DICOPT and SBB are again returning thezero vector as the solution, we have also imposed the cut eTx + eTy ≥ 1 for the DI-COPT and SBB runs. The objective values obtained by the GSA and the three GAMSsolvers, as well as the respective computation time and number of iterations requiredare given in Table 5.

GSA found the optimal solution for all the 14 instances and the computation timerequired to reach the optimal solution is less than 30 seconds for each instance, withalmost all the instances requiring less than 10 seconds. BARON was able to findthe optimal solution to the first 3 instances, with a significant increase in computa-tion time even for a small instance of 30 variables. The objective values obtained byBARON for the other 11 instances with size more than 30 variables have not beensatisfactory. The lower bounds found by BARON are also poor for these larger in-stances and are hardly helpful. DICOPT and SBB were unable to find any optimalsolution to all the instances.

In the test problems we have considered so far, the objective function involvespolynomial terms only. To show that GSA is able to handle other types of differ-entiable objective functions, we have constructed another set of test problems thatincludes a sum of exponential functions in the objective function. The objective func-tion constructed here follows closely to that of problem (6.1). Also, the previous testproblems do not show GSA’s performance on test problems with constraints. Thus,we have added a linear constraint in this set of test problems, which also enables aneasier argument for deriving the optimal solution.

The test problems can be defined as follows:

Minimize −(n − 1)

n∑

i=1

xi − 1

n

n/2∑

i=1

xi + 2∑

1≤i<j≤n

exixj

subject ton∑

i=1

xi ≤ n

2,

xi ∈ {0,1} for i = 1,2, . . . , n,

(6.5)

where n is even. By extending the argument in [28], we have the following result forthis set of test problems:

Proposition 6.2 x∗ = (1, . . . ,1︸ ︷︷ ︸p∗

,0, . . . ,0︸ ︷︷ ︸n−p∗

) is an optimal solution to problem (6.5)

with objective value

(e − 1)(p∗)2 +(

2 − n − 1

n− e

)p∗ + n2 − n, where p∗ =

[n + 1

n+ e − 2

2(e − 1)

]

and [x] refers to the integer obtained from rounding x.

Page 27: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 283

Tabl

e5

Num

eric

alco

mpa

riso

nof

algo

rith

ms

for

solv

ing

test

prob

lem

s(6

.4)

(min

imiz

atio

nob

ject

ive)

with

10to

2000

vari

able

s

Prob

lem

size

DIC

OPT

SBB

BA

RO

NG

SA

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Prep

roce

ssin

gPo

stpr

oces

sing

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Bes

tob

j.va

lue

Tim

eto

best

(sec

)

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Obj

.va

lue

Tim

e(s

ec)

Obj

.va

lue

Low

erbo

und

5×2

760

0.14

1220

900.

134

550.

091

10.

4165

12.

8937

.80

124

10×2

2223

0.31

121.

0e4

0.22

427

60.

071

111

.12

959

11.

6936

.72

117

15×2

4408

0.30

82.

4e4

0.16

416

660.

081

146

1.23

2397

61

2.99

35.3

010

8

20×2

380.

5552

4.3e

40.

204

5461

0.08

22−6

808

>1e

422

1691

13.

1326

.64

101

25×2

1.2e

40.

299

6.9e

40.

185

9216

0.04

28−2

.9e4

>1e

474

542

12.

6932

.22

103

30×2

1.6e

40.

288

1.0e

50.

165

1.3e

40.

0863

−6.3

e4>

1e4

4435

51

2.91

22.0

910

0

35×2

380.

3424

1.4e

50.

205

1.8e

40.

0512

9−1

.1e5

>1e

430

105

13.

5033

.75

106

40×2

380.

5624

1.8e

50.

175

2.3e

40.

0946

3−1

.7e5

>1e

420

657

14.

8036

.52

105

45×2

380.

7230

2.3e

50.

255

3.0e

40.

0654

7−2

.3e5

>1e

414

737

15.

3737

.75

115

50×2

4.1e

40.

339

2.8e

50.

165

3.6e

40.

1161

6−3

.1e5

>1e

411

341

15.

0442

.28

103

100×2

380.

3632

1.2e

60.

175

1.5e

50.

2814

11−1

.7e6

>1e

420

491

4.61

30.2

511

3

250×2

9.3e

50.

288

7.2e

60.

255

9.0e

51.

2336

55−1

.2e7

>1e

424

31

4.35

47.5

410

9

500×2

3.7e

60.

298

2.9e

70.

365

3.6e

65.

0678

73−4

.8e7

>1e

446

18.

1980

.26

138

1000

×21.

2e8

0.09

11.

2e8

0.09

11.

4e7

26.0

41.

4e7

−1.9

e8>

1e4

71

24.7

718

2.97

124

Page 28: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

284 W. Murray, K.-M. Ng

Proof For any feasible x, i.e., any x with exactly p ones and (n − p) zeros, where0 ≤ p ≤ n

2 ,

f (x) = −(n − 1)p − p̄

n+ p(p − 1)e(1)(1) + (n(n − 1) − p(p − 1))e0

= (e − 1)p2 + (2 − n − e)p − p̄

n+ n2 − n

for some p̄ where 0 ≤ p̄ ≤ p. By having the p ones occurring in the first n/2 variablesof x, f (x) assumes the minimum value of (e − 1)p2 + (2 − n − 1

n− e)p + n2 − n

for a fixed value of p. Since this convex function in p is minimized when

p = − (2 − n − 1n

− e)

2(e − 1), p∗ =

[− (2 − n − 1

n− e)

2(e − 1)

]

will be an integer that minimizes this quadratic function over integral values of p

lying in [0, n2 ]. �

Both the GSA and the GAMS solvers were applied to this set of test problemswith n ranging from 10 to 1000. We have 3 instances with n ≤ 20 (n = 10,16,20)to allow a better comparison of the performance and the computation time needed bythe GSA and the GAMS solvers for solving the smaller test problems.

When applying GSA to (6.5), we can first convert the linear inequality constraintto an equality constraint by adding a non-negative slack variable s. An appropriateglobal smoothing function is then given by

�(x, s) = −n∑

j=1

lnxj −n∑

j=1

ln(1 − xj ) − ln s, (6.6)

so that we are handling the following relaxed equality-constrained problem:

Minimize f (x) + μ�(x, s)

subject ton∑

i=1

xi + s = n

2,

0 ≤ x ≤ e, s ≥ 0.

(6.7)

The initial μ parameter is set to 100 with a reduction ratio of 0.9. For n ≤ 20,a small initial γ parameter of 0.01 is sufficient for convergence to integral solu-tions. For larger dimensional problems, the initial γ parameter is progressively in-creased to reflect the increase in the objective as n increases. In particular, forn = 50,100,200,500 and 1000, the initial γ parameter is set to 5, 20, 80, 250 and500 respectively. The increment ratio for the γ parameter is set to the reciprocal ofthe μ-reduction ratio of 0.9, i.e., 10

9 and the tolerance is set as εμ = 10−3 for thetermination criteria. We chose the initial iterate as n

2(n+1)e based on uniform distrib-

ution of n2 over the x and s variables. The objective values obtained by the GSA and

the three GAMS solvers, as well as the respective computation time and number of

Page 29: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 285

Fig. 7 Graph of relativeaverage computation timerequired by the algorithms forsolving instances ofproblem (6.5)

iterations required are given in Table 6. The global objective values are also includedas a basis for performance comparisons.

From Table 6, we can see that GSA outperforms or performs as well as all theGAMS solvers in terms of the objective values obtained for all the 8 instances. Inparticular, GSA found the optimal solution to the test problems for 5 of the instances.For all the remaining instances, GSA has obtained solutions with less than 0.025%deviation from the optimal objective values. DICOPT was unable to obtain any opti-mal solution for all the instances and its solutions have percentage deviation from theoptimal objective values ranging from 0.79% to 8.77%. SBB was able to obtain theoptimal solution to 4 instances and for 3 of the remaining instances, SBB’s solutionshave percentage deviation from the optimal objective values ranging from 0.032% to0.056%. However, SBB was unable to obtain any solution for the largest instance with1000 variables. BARON was also able to obtain the optimal solution to 3 instancesand for 3 of the remaining instances, BARON’s solutions have percentage deviationfrom the optimal objective values ranging from 1.18% to 8.73%. However, BARONwas unable to obtain any solution for the instances with 500 and 1000 variables. De-spite running SBB and BARON on the Linux platform mentioned in Sect. 6.1, bothsolvers were still not able to produce any feasible solution for the respective largeinstances of the test set.

Figure 7 gives a plot of the relative average computation times required by GSAand the GAMS solvers for solving test problem (6.5). Some data points for BARONand SBB were not included for reasons discussed above. It can be seen that both GSAand SBB do not show a significant increase in the computation time needed withproblem size, unlike BARON or DICOPT. In addition, GSA also appears to have asmaller rate of increase in computation time with problem size when compared withSBB.

7 Conclusions

We have described a global smoothing algorithm that enables optimization methodsfor continuous problems to be applied to certain types of discrete optimization prob-lems. We were surprised by the quality of the solutions from what is a relatively un-sophisticated implementation and without any need to fiddle with parameters. In thenumerical experiments conducted, the performance of GSA relative to alternativesboth in terms of the quality of solution and time to solution improved dramatically as

Page 30: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

286 W. Murray, K.-M. Ng

Tabl

e6

Num

eric

alco

mpa

riso

nof

algo

rith

ms

for

solv

ing

test

prob

lem

s(6

.5)

(min

imiz

atio

nob

ject

ive)

with

10to

1000

vari

able

s

Prob

lem

DIC

OPT

SBB

BA

RO

NG

SA

Size

Glo

bal

obj.

valu

e

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Bes

tob

j.va

lue

Tota

ltim

e(s

ec)

Tota

litn

.co

unt.

Prep

roce

ssin

gPo

stpr

oces

sing

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Bes

tob

j.va

lue

Tim

eto

best

(sec

)

Tota

ltim

e(s

ec)

Tota

litn

.co

unt

Obj

.va

lue

Tim

e(s

ec)

Obj

.va

lue

Low

erbo

und

1073

790.

430

7334

.717

5879

0.1

7373

6.0

348

737.

111

.990

4

1619

921

60.

439

199

35.4

3985

216

0.2

199

199

498.

014

086

199

11.2

17.3

890

2031

734

40.

444

317

35.8

4203

344

0.2

317

315

>1e

418

9748

317

12.1

16.7

883

5020

7622

565.

184

2076

260.

935

9322

551.

821

0014

26>

1e4

481

2076

11.0

23.9

645

100

8424

9160

6.7

146

8429

108.

937

9491

5922

.791

5949

50>

1e4

484

2519

.050

.757

5

200

3393

836

911

25.1

262

3395

057

7.3

4385

3515

328

1.1

3515

319

900

>1e

41

3393

835

.612

7.7

505

500

2130

2223

1713

289.

862

321

3090

5000

.471

07–

>1e

4–

–>

1e4

–21

3060

78.5

455.

443

6

1000

8532

9786

0074

1250

.582

5–

>1e

4–

–>

1e4

––

>1e

4–

8534

9711

4.5

1460

.239

5

Page 31: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

An algorithm for nonlinear optimization problems 287

the size of the problem increases, reflecting the encouraging feature of the low rateof growth in the work required by the new algorithm to solve problems as the size ofthe problem increases. Moreover, the memory requirements of the algorithm are alsomodest and typically add little to that required to prescribe the problem. Alternativemethods are often memory intensive.

Given the difference in character to that of other algorithms, GSA could be used inconjunction with alternative methods by obtaining good solutions for the alternativemethods, such as by providing a good upper bound in a branch-and-bound method.Certain improvement heuristics could also be applied to the solutions obtained byGSA in a hybrid algorithm framework. These are possible areas of future research, aswell as the possibility of extending GSA to solve more general classes of nonlineardiscrete optimization problems.

Acknowledgements The authors would like to thank the referees whose suggestions and comments ledto improvements to the paper.

References

1. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1995)2. Boman, E.G.: Infeasibility and negative curvature in optimization. Ph.D. thesis, Scientific Computing

and Computational Mathematics Program, Stanford University, Stanford (1999)3. Borchardt, M.: An exact penalty approach for solving a class of minimization problems with boolean

variables. Optimization 19(6), 829–838 (1988)4. Burer, S., Vandenbussche, D.: Globally solving box-constrained nonconvex quadratic programs with

semidefinite-based finite branch-and-bound. Comput. Optim. Appl. (2007). doi: 10.1007/s10589-007-9137-6

5. Bussieck, M.R., Drud, A.S.: SBB: a new solver for mixed integer nonlinear programming, OR 2001presentation. http://www.gams.com/presentations/or01/sbb.pdf (2001)

6. Cela, E.: The Quadratic Assignment Problem: Theory and Algorithms. Kluwer Academic, Dordrecht(1998)

7. Del Gatto, A.: A subspace method based on a differential equation approach to solve unconstrainedoptimization problems. Ph.D. thesis, Management Science and Engineering Department, StanfordUniversity, Stanford (2000)

8. Dembo, R.S., Steihaug, T.: Truncated-Newton algorithms for large-scale unconstrained optimization.Math. Program. 26, 190–212 (1983)

9. Du, D.-Z., Pardalos, P.M.: Global minimax approaches for solving discrete problems. In: Recent Ad-vances in Optimization: Proceedings of the 8th French–German Conference on Optimization, Trier,21–26 July 1996, pp. 34–48. Springer, Berlin (1997)

10. Duran, M.A., Grossmann, I.E.: An outer approximation algorithm for a class of mixed-integer non-linear programs. Math. Program. 36, 307–339 (1986)

11. Fiacco, A.V., McCormick, G.P.: Nonlinear Programming: Sequential Unconstrained MinimizationTechniques. Wiley, New York/Toronto (1968)

12. Forsgren, A., Gill, P.E., Murray, W.: Computing modified Newton directions using a partial Choleskyfactorization. SIAM J. Sci. Comput. 16, 139–150 (1995)

13. Forsgren, A., Murray, W.: Newton methods for large-scale linear equality-constrained minimization.SIAM J. Matrix Anal. Appl. 14(2), 560–587 (1993)

14. Ge, R., Huang, C.: A continuous approach to nonlinear integer programming. Appl. Math. Comput.34, 39–60 (1989)

15. Gill, P.E., Murray, W.: Newton-type methods for unconstrained and linearly constrained optimization.Math. Program. 7, 311–350 (1974)

16. Gill, P.E., Murray, W., Wright, M.: Practical Optimization. Academic Press, London (1981)17. Golub, G.H., Van Loan, C.F.: Matrix Computation. John Hopkins University Press, Baltimore/London

(1996)

Page 32: An algorithm for nonlinear optimization problems with ...web.stanford.edu/class/cme334/docs/2012-10-22... · solving a sequence of nonlinear programming and mixed-integer linear programming

288 W. Murray, K.-M. Ng

18. Grossmann, I.E., Viswanathan, J., Vecchietti, A., Raman, R., Kalvelagen, E.: GAMS/DICOPT: a dis-crete continuous optimization package (2003)

19. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer, Berlin (1996)20. Leyffer, S.: Deterministic methods for mixed integer nonlinear programming. Ph.D. thesis, Depart-

ment of Mathematics & Computer Science, University of Dundee, Dundee (1993)21. Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0-1 optimization. SIAM J. Optim.

1, 166–190 (1991)22. Mawengkang, H., Murtagh, B.A.: Solving nonlinear integer programs with large-scale optimization

software. Ann. Oper. Res. 5, 425–437 (1985)23. Mitchell, J., Pardalos, P.M., Resende, M.G.C.: Interior point methods for combinatorial optimization.

In: Handbook of Combinatorial Optimization, vol. 1, pp. 189–298 (1998)24. Moré, J.J., Wu, Z.: Global continuation for distance geometry problems. SIAM J. Optim. 7, 814–836

(1997)25. Murray, W., Ng, K.-M.: Algorithms for global optimization and discrete problems based on methods

for local optimization. In: Pardalos, P., Romeijn, E. (eds.) Heuristic Approaches. Handbook of GlobalOptimization, vol. 2, pp. 87–114. Kluwer Academic, Boston (2002), Chapter 3

26. Ng, K.-M.: A continuation approach for solving nonlinear optimization problems with discrete vari-ables. Ph.D. thesis, Management Science and Engineering Department, Stanford University, Stanford(2002)

27. Pan, S., Tan, T., Jiang, Y.: A global continuation algorithm for solving binary quadratic programmingproblems. Comput. Optim. Appl. 41(3), 349–362 (2008)

28. Pardalos, P.M.: Construction of test problems in quadratic bivalent programming. ACM Trans. Math.Softw. 17(1), 74–87 (1991)

29. Pardalos, P.M.: Continuous approaches to discrete optimization problems. In: Di, G., Giannesi, F.(eds.) Nonlinear Optimization and Applications, pp. 313–328. Plenum, New York (1996)

30. Pardalos, P.M., Rosen, J.B.: Constrained Global Optimization: Algorithms and Applications. Springer,Berlin (1987)

31. Pardalos, P.M., Wolkowicz, H.: Topics in Semidefinite and Interior-Point Methods. Am. Math. Soc.,Providence (1998)

32. Sahinidis, N.V.: BARON global optimization software user manual. http://archimedes.scs.uiuc.edu/baron/manuse.pdf (2000)

33. Tawarmalani, M., Sahinidis, N.V.: Global optimization of mixed-integer nonlinear programs: A theo-retical and computational study. Math. Program. 99(3), 563–591 (2004)

34. Zhang, L.-S., Gao, F., Zhu, W.-X.: Nonlinear integer programming and global optimization. J. Com-put. Math. 17(2), 179–190 (1999)