Comput Optim Appl (2010) 47: 257–288. DOI 10.1007/s10589-008-9218-1
An algorithm for nonlinear optimization problems with binary variables
Walter Murray · Kien-Ming Ng
Received: 6 April 2007 / Revised: 22 October 2008 / Published online: 2 December 2008. © Springer Science+Business Media, LLC 2008
Abstract One of the challenging optimization problems is determining the minimizer of a nonlinear programming problem that has binary variables. A vexing difficulty is the rate at which the work to solve such problems increases as the number of discrete variables increases. Any such problem with bounded discrete variables, especially binary variables, may be transformed to that of finding a global optimum of a problem in continuous variables. However, the transformed problems usually have astronomically large numbers of local minimizers, making them harder to solve than typical global optimization problems. Despite this apparent disadvantage, we show that the approach is not futile if we use smoothing techniques. The method we advocate first convexifies the problem and then solves a sequence of subproblems, whose solutions form a trajectory that leads to the solution. To illustrate how well the algorithm performs we show the computational results of applying it to problems taken from the literature and new test problems with known optimal solutions.
Keywords Nonlinear integer programming · Discrete variables · Smoothing methods
The research of W. Murray was supported by Office of Naval Research Grant N00014-08-1-0191 and Army Grant W911NF-07-2-0027-1.
W. Murray (✉) Systems Optimization Laboratory, Department of Management Science and Engineering, Stanford University, Stanford, CA 94305-4022, USA. e-mail: [email protected]
K.-M. Ng, Department of Industrial & Systems Engineering, National University of Singapore, Singapore, Singapore. e-mail: [email protected]
1 Introduction
Nonlinear optimization problems whose variables can only take on integer quantities or discrete values abound in the real world, yet there are few successful solution methods, and even one of the simplest cases, with quadratic objective function and linear constraints, is NP-hard (see, e.g., [6]). We focus our interest on problems with linear equality constraints and binary variables. However, once the proposed algorithm has been described it will be seen that extending the algorithm to more general cases, such as problems with linear inequality constraints or problems with some continuous variables, is trivial. Also, the binary variable requirement is not restrictive since any problem with bounded discrete variables can easily be transformed to a problem with binary variables.
The problem of interest can thus be stated as
    Minimize    f(x)
    subject to  Ax = b,
                x ∈ {0,1}^n,     (1.1)
where f : R^n → R is a twice continuously differentiable function, A is an m × n matrix with rank m, and b ∈ R^m is a column vector. Assuming differentiability may seem strange when x is discrete. (Often f(x) cannot be evaluated unless every xj = 0 or 1.) Nonetheless, for a broad class of real problems f(x) is smooth, and when it is not it is often possible to alter the original function suitably (see [26]). As noted, problems in the form of (1.1) are generally NP-hard. There is no known algorithm that can be assured of obtaining the optimal solution to large NP-hard problems within a reasonable amount of computational effort. It may even be difficult just to obtain a feasible solution to such problems. The challenge is to discover algorithms that can generate a good approximate solution to this class of problems for an effort that increases only slowly as a function of n.
If the requirement that x ∈ {0,1}^n is dropped, then typically the effort to find a local minimizer of (1.1) increases only slowly with the size of the problem. It would therefore seem attractive if (1.1) could be replaced by a problem in continuous variables. The equivalence of (1.1) to finding the global minimizer of a problem with continuous variables is well known (see [14, 34]). One simple way of enforcing x ∈ {0,1}^n is to add the constraints 0 ≤ x ≤ e and x^T(e − x) = 0, where e is the vector of unit elements. Clearly, for x to be feasible each xi must be 0 or 1. The difficulty is that there is a world of difference between finding a global minimizer and finding a local minimizer. If there are only a few local minimizers then the task is relatively easy, but here it is possible that every vertex of the feasible region (and there could be 2^n vertices) is a local minimizer. Typically the minimizer found by a local optimization algorithm is entirely determined by the starting point chosen. Traditional methods for global optimization (see [25]) perform particularly poorly on problems with numerous local minimizers.
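The constraint mentioned above can be checked numerically. A small sketch (the helper name `binary_residual` is ours, for illustration only): each term xi(1 − xi) is nonnegative on [0, 1], so the sum x^T(e − x) vanishes exactly at the binary points.

```python
# Sketch: the constraint x^T(e - x) = 0 together with 0 <= x <= e forces x
# to be binary, since each term x_i(1 - x_i) is nonnegative on [0, 1] and
# the sum is zero only when every term is zero, i.e. x_i = 0 or x_i = 1.

def binary_residual(x):
    """Value of x^T(e - x); zero exactly at the binary points of [0, 1]^n."""
    assert all(0.0 <= xi <= 1.0 for xi in x), "x must lie in [0, 1]^n"
    return sum(xi * (1.0 - xi) for xi in x)

print(binary_residual([0.0, 1.0, 1.0]))  # 0.0  -> binary point
print(binary_residual([0.5, 1.0, 0.0]))  # 0.25 -> not binary
```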
A number of approaches have been developed to handle mixed-integer nonlinear programming problems via a transformation of the discrete problem into a continuous problem. These approaches involve geometric, analytic and algebraic techniques,
including the use of global or concave optimization formulations, semidefinite programming and spectral theory (see, e.g., [9, 19, 20, 30, 31]). In particular, [23, 29] give an overview of many of these continuous approaches and interior-point methods. Perhaps the most obvious approach is simply to ignore the discrete requirement, solve the resulting "relaxed" problem, and then discretize the solution obtained by using an intelligent rounding scheme (see [16], Chapter 7 and [22]). Such approaches work best when the discrete variables are in some sense artificial. For example, when optimizing a pipe layout it is known that pipes are available only in certain sizes. In principle pipes could be of any size, and the physics of the model is valid for sizes other than those available. However, even in these circumstances there may be difficult issues in rounding, since it may not be easy to retain feasibility. It is not difficult to see that this approach can fail badly. Consider minimizing a strictly concave function subject to the variables being binary. The relaxed problem has a discrete solution and obviously there is no need to round. Rather than being good news, this illustrates that the solution found is entirely determined by the choice of the initial point used in the continuous optimization algorithm. Given that such methods find local minimizers, the probability of determining the global minimizer is extremely small, since there may be 2^n local minimizers. There are other problems in which little information can be gleaned from the continuous solution. For example, in determining the optimum location of substations in an electrical grid, relaxing the discrete requirements results in a substation being placed at every node, precisely matching the load. This is the global minimizer of the continuous problem, but it gives almost no information on the nodes at which to place substations. To be feasible it is necessary to round up to avoid violating voltage constraints. Moreover, for many nodes the loads are identical, so any scheme that rounds up some variables and rounds down others in the hope of the solution remaining feasible has trouble identifying a good choice.
There are even fewer methods able to solve general problems with a large number of variables. Three available commercial packages are DICOPT (see [18]), SBB (see [5]) and BARON (see [32]). DICOPT is a package for solving mixed-integer nonlinear programming problems based on extensions of the outer-approximation algorithm (see [10]) with an equality relaxation strategy, solving a sequence of nonlinear programming and mixed-integer linear programming problems. SBB combines the standard branch-and-bound method for mixed-integer linear programming problems with standard nonlinear programming solvers. BARON is a global optimization package based on the branch-and-reduce method (see [33]). These solver packages can in principle solve large mixed-integer nonlinear programming problems. However, it will be seen that their performance, especially on large problems, is sometimes poor, and there is a clear need for better algorithms for this class of problems. It may be that a single algorithm that can find good solutions for most large problems is a step too far, but it helps to add to the arsenal of solution methods new methods whose behavior and character differ from current methodology.
Throughout this paper, we let ‖·‖ denote the 2-norm, i.e., the Euclidean norm, of a vector, and let e denote the column vector of ones with dimension relevant to the context. If x and y are vectors in R^n, we let x ≤ y and x < y mean, respectively, that xi ≤ yi and xi < yi for every i ∈ {1, 2, …, n}, and Diag(x) denote the diagonal matrix with diagonal elements made up of the respective elements xi, i ∈ {1, 2, …, n}. Also, the null space of a matrix A is defined as {x ∈ R^n : Ax = 0}. For a real-valued function f, we say that f is a C^k function if it is kth-order continuously differentiable.
2 An exact penalty function
Rather than add nonlinear constraints we have chosen to add a penalty term

    ∑_{j∈J} xj(1 − xj),     (2.1)

with a penalty parameter γ > 0, where J is the index set of the variables that are judged to require forcing to a bound. We could have added the penalty x^T(e − x), but often the objective is nearly concave, and if there are not many equality constraints some of the variables will be forced to 0 or 1 without any penalty term being required. The problem then becomes

    Minimize    F(x) ≡ f(x) + γ ∑_{j∈J} xj(1 − xj)
    subject to  Ax = b,
                0 ≤ x ≤ e.     (2.2)
In general, it is possible to show that under suitable assumptions, the penalty function introduced this way is "exact" in the sense that the following two problems have the same global minimizers for a sufficiently large value of the penalty parameter γ:
    Minimize    g(x)
    subject to  Ax = b,
                x ∈ {0,1}^n     (2.3)

and

    Minimize    g(x) + γ x^T(e − x)
    subject to  Ax = b,
                0 ≤ x ≤ e.     (2.4)
An example of such a result (see for example [30]) is:
Theorem 2.1 Let g : [0,1]^n → R be a C^1 function and consider the two problems (2.3) and (2.4) with the feasible region of (2.3) being non-empty. Then there exists M > 0 such that for all γ > M, problems (2.3) and (2.4) have the same global minimizers.
Note that although this is an exact penalty function it is twice continuously differentiable, and the extreme ill-conditioning arising out of the need for γ → ∞ normally required for smooth penalty functions is avoided. Indeed, a modestly large value of γ is usually sufficient to indicate whether a variable is converging to 0 or 1.
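The exactness claim of Theorem 2.1 can be sanity-checked numerically on a toy instance. The following sketch uses our own illustrative choices of g, γ and grid (and drops the equality constraints for simplicity); it enumerates the binary minimizer and compares it with the grid minimizer of the penalized continuous relaxation.

```python
# Numeric sanity check of the exact-penalty idea (Theorem 2.1) on a toy
# two-variable instance; g, gamma and the grid are our own choices and the
# equality constraints Ax = b are omitted for simplicity.
import itertools

def g(x1, x2):
    return (x1 - 0.2) ** 2 + (x2 - 0.8) ** 2

def penalized(x1, x2, gamma):
    return g(x1, x2) + gamma * (x1 * (1 - x1) + x2 * (1 - x2))

# Global binary minimizer by enumeration of {0,1}^2.
binary_best = min(itertools.product([0, 1], repeat=2), key=lambda v: g(*v))

# Global minimizer of the penalized continuous relaxation on a fine grid.
gamma = 10.0
grid = [i / 20 for i in range(21)]
cont_best = min(itertools.product(grid, repeat=2),
                key=lambda v: penalized(*v, gamma))

print(binary_best, cont_best)  # both are the binary point (0, 1)
```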
Since the penalty term is concave we are highly likely to introduce local minimizers at almost all the feasible integer points and, just as significantly, saddle points at interior points. It is clearly critical that any method used to solve the transformed problem be one that can be assured of not converging to a saddle point.
The idea of using penalty methods for discrete optimization problems is not new (see, e.g., [3]). However, it is not sufficient to introduce only the penalty terms and hope to obtain the global minimizer by solving the resulting problem, because it is highly likely that many undesired stationary points and local minimizers are introduced in the process. This flaw also applies to the process of transforming a discrete optimization problem into a global optimization problem simply by replacing the discrete requirements on the variables with a nonlinear constraint. The danger of using the penalty function alone is illustrated by the following example:
Example Consider the problem min{x² : x ∈ {0,1}}. It is clear that the global minimizer is given by x* = 0. Suppose the problem has been transformed to min_{0≤x≤1} {x² + γx(1 − x)}, where γ > 0 is the penalty parameter. If γ > 2 then the first-order KKT conditions are satisfied at 0, 1 and γ/(2(γ − 1)). Suppose in a descent method the initial point is chosen in the interval [γ/(2(γ − 1)), 1]; then the sequence generated will not converge to 0. Thus for large values of γ the probability of converging to 0 from a random initial point is close to 0.5. A probability of 0.5 may not seem so bad, but if we now consider the n-dimensional problem min{‖x‖² : x ∈ {0,1}^n}, then every vertex is a local minimizer of the continuous problem and the probability of converging from a random initial point to the correct local minimizer is not much better than 1/2^n.
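The stationary points in this example can be verified directly. A short check (γ = 10 is our own illustrative value, satisfying γ > 2): the derivative of p(x) = x² + γx(1 − x) vanishes at the interior point γ/(2(γ − 1)), while its sign at the bounds shows that both x = 0 and x = 1 are local minimizers of the bound-constrained problem.

```python
# Check of the example's stationary points for p(x) = x^2 + gamma*x*(1-x).
# The interior KKT point gamma/(2*(gamma-1)) has zero derivative and is a
# local maximum (p'' = 2 - 2*gamma < 0 for gamma > 1), so a descent method
# started to its right converges to x = 1, not to the global minimizer 0.
gamma = 10.0  # illustrative value with gamma > 2

def p_prime(x):
    return 2 * x + gamma * (1 - 2 * x)

x_hat = gamma / (2 * (gamma - 1))          # ~0.5556 for gamma = 10
print(x_hat, p_prime(x_hat))               # derivative vanishes at x_hat
print(p_prime(0.0) > 0, p_prime(1.0) < 0)  # gradients point away from x_hat
```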
In the following we assume that the feasible space is bounded, which is the case for the problems of interest.
Definition Let x̄ be a minimizer of (2.2) and S(x̄) denote the set of feasible points for which there exists an arc emanating from x̄ such that the tangent to the arc at x is Z^T∇_x f(x), where the columns of Z are a basis for the null space of the constraints active at x. Let the volume of S(x̄) be denoted by Sv(x̄). We may rank the minimizers in terms of the size of Sv(x̄).
If x* ∈ R^n is the global minimizer and V is the volume of the feasible space, then we can expect that

    lim_{n→∞} Sv(x*)/V = 0.
It is this fact that makes it unlikely that simply applying a local search method to (2.2) will be successful. Indeed there is little hope that a good local minimizer would be found. If the minima are uniformly distributed with mean M, then the expected value of f(x̄) for a random starting point is M. There is little reason to suppose M is close to f(x*).
There are alternative penalty functions. Moreover, xi(1 − xi) = 0 is a complementarity condition and there is a vast literature on how to impose such conditions. In addition to penalty functions there are other means of trying to enforce variables to take their discrete values. A particularly useful approach is to replace terms such as c^T x that often appear in the objective, where ci > 0 (xi = 1 implies a capital cost), by the expression ∑_{i=1}^n ci(1 − e^{−βxi}), with β = 10 typically. This term favors setting variables to 0 and 1 rather than 0.5 and 0.5. It may be thought that it would always prefer all variables to be 0, but that is typically prevented by constraints. Also, rounding up ensures feasibility but rounding down does not.
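The preference for binary settings under this surrogate is easy to demonstrate. A minimal sketch (ci = 1 is our own simplification; β = 10 follows the text): with the same total mass, the split 0.5/0.5 gives a larger surrogate cost than 0/1.

```python
# Illustration of the surrogate sum_i c_i*(1 - exp(-beta*x_i)) for a linear
# cost c^T x: with the mass split as 0.5/0.5 the surrogate is larger than
# with the same mass placed as 0/1, so it favors binary settings.
# beta = 10 as in the text; c_i = 1 here for simplicity.
import math

def surrogate(x, beta=10.0):
    return sum(1.0 - math.exp(-beta * xi) for xi in x)

split = surrogate([0.5, 0.5])   # ~1.987
binary = surrogate([0.0, 1.0])  # ~1.000
print(split, binary, split > binary)
```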
3 Smoothing methods
In general, the presence of multiple local minima in an optimization problem is common and often makes the search for the global minimum very difficult. The fewer the local minima, the more likely it is that an algorithm that finds a local minimizer will find the global minimum. Smoothing methods refer to a class of methods that replace the original problem by either a single problem or a sequence of problems whose objective is "smoother". In this context "smoother" may be interpreted as a function whose second or higher order derivatives are smaller in magnitude, with a straight line being the ultimate smooth function.
The concept of smoothing has already been exploited in nonconvex optimization and discussed in the context of global optimization (see, e.g., [19]). There are a number of ways to smooth a function, and which is best depends to a degree on what characteristics of the original function we are trying to suppress. Smoothing methods can be categorized into two types: local or global smoothing. Local smoothing algorithms are particularly suited to problems in which noise arises during the evaluation of a function. Unfortunately, noise may produce many local minimizers, or it may produce a minimizer along the search direction in a linesearch method and result in a tiny step. Local smoothing has been suggested to eliminate poor minimizers that are part of the true problem. Under such circumstances the introduction of smoothing alters the problem and may change the required global minimizer. To overcome this, a sequence of problems is solved in which the degree of smoothing is reduced to zero. While local smoothing may be useful in eliminating tiny local minimizers even when there are large numbers of them, it is less useful at removing significant but poor minimizers (see [25]).
We would like to transform the original problem with many local minima into one that has fewer local minima, or even just one local minimum, and so obtain a global optimization problem that is easier to solve (see Fig. 1). Obviously we wish to eliminate poor local minima. In the subsequent discussion, we consider modified objective functions of the form

    F(x, μ) = f(x) + μ g(x),

where f and g are real-valued C² functions on R^n and μ ≥ 0.
3.1 Global smoothing
The basic idea of global smoothing is to add a strictly convex function to the original objective, i.e.,

    F(x, μ) = f(x) + μ Ψ(x),
Fig. 1 Effect of smoothing the original optimization problem
where Ψ is a strictly convex function. If Ψ is chosen to have a Hessian that is sufficiently positive definite for all x, i.e., the eigenvalues of this Hessian are uniformly bounded away from zero, it implies that for μ large enough, F(x, μ) is strictly convex. Similar results or proofs of such an assertion can be found, for example, in [1, Lemma 3.2.1].
Theorem 3.1 Suppose f : [0,1]^n → R is a C² function and Ψ : X → R is a C² function such that the minimum eigenvalue of ∇²Ψ(x) is greater than or equal to ε for all x ∈ X, where X ⊂ [0,1]^n and ε > 0. Then there exists M > 0 such that if μ > M, then f + μΨ is a strictly convex function on X.
Consequently, for μ sufficiently large, any local minimizer of F(x, μ) is also the unique global minimizer. Typically the minimizer of Ψ(x) is known or is easy to find, and hence minimizing F(x, μ) for large μ is also easy. This is important in smoothing methods because the basic idea is to solve the problem for a decreasing sequence of μ, starting with a large value and ending with one that may be close to zero. The solution x(μk) of min_x F(x, μk) is used as the starting point of min_x F(x, μk+1).
The idea behind global smoothing is similar to that of local smoothing: the hope is that by adding μΨ(x) to f(x), poor local minimizers will be eliminated. There are, however, important differences between global and local smoothing. A key one is that local smoothing does not guarantee that the function is unimodal for sufficiently large values of the smoothing parameter. For example, if the algorithm in [24] is applied to the function cos(x), a multiple of cos(x) is obtained. Thus, the number of minimizers of the smoothed function has not been reduced. It is easy to appreciate that the global smoothing approach is largely independent of the initial estimate of a solution, since if the initial function is unimodal, the choice of initial point is irrelevant to the minimizer found. When μ is decreased and the subsequent functions have several minimizers, the old solution is used as the initial point. Consequently, which minimizer is found is predetermined. Independence of the choice of initial point may be viewed as both a strength and a weakness. What is happening is that any initial point is being replaced by a point close to the minimizer of Ψ(x). An obvious concern is that convergence will then be to the minimizer closest to the minimizer of Ψ(x). The key to the success of this approach is to choose Ψ(x) to have a minimizer that is not close to any of the minimizers of f(x). This may not seem easy, but it is possible for constrained problems. If it is known that the minimizers are on the edge of the feasible region (e.g., with concave objective functions), then the "center of the feasible region" may be viewed as being removed from all of them. An example is the analytic center, and there are certainly many other choices for generating an initial feasible solution since the feasible region is linearly constrained.
4 Logarithmic smoothing
Consider the following relaxation of (1.1):

    Minimize    f(x)
    subject to  Ax = b,
                0 ≤ x ≤ e.     (4.1)
A suitable smoothing function that we have used is given by the logarithmic smoothing function:

    Ψ(x) ≡ −∑_{j=1}^n ln xj − ∑_{j=1}^n ln(1 − xj).     (4.2)
This function is clearly well-defined when 0 < x < e. If any value of xj is 0 or 1, we have Ψ(x) = ∞, which implies we can dispense with the bounds on x to get the following transformed problem:
    Minimize    f(x) − μ ∑_{j=1}^n [ln xj + ln(1 − xj)]
    subject to  Ax = b,     (4.3)
where μ > 0 is the smoothing parameter. When a linesearch algorithm starts with an initial point 0 < x0 < e, then all iterates generated by the linesearch also satisfy this property, provided care is taken in the linesearch to ensure that the maximum step taken is within the bounds 0 < x < e.
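Such a steplength safeguard is standard in interior methods. A minimal sketch (the function name `max_step` and the back-off factor τ are our own choices, not the paper's): the largest α keeping x + αd strictly inside the bounds is the minimum distance-to-bound ratio over the components, scaled back by τ < 1.

```python
# A conservative maximum steplength for a linesearch from x along d that
# keeps the next iterate strictly inside 0 < x < e. The back-off factor
# tau < 1 avoids landing exactly on the boundary; names are illustrative.
def max_step(x, d, tau=0.999):
    alpha = float("inf")
    for xi, di in zip(x, d):
        if di > 0:
            alpha = min(alpha, (1.0 - xi) / di)   # distance to upper bound
        elif di < 0:
            alpha = min(alpha, -xi / di)          # distance to lower bound
    return tau * alpha

x = [0.3, 0.7]
d = [1.0, -2.0]
print(max_step(x, d))  # limited by x2 reaching 0: 0.999 * 0.35
```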
4.1 Properties of the logarithmic smoothing function
The function Ψ(x) is a logarithmic barrier function and is used with barrier methods (see [11]) to eliminate inequality constraints from a problem. In fact, (4.3) is sometimes known as the barrier subproblem for (4.1). Our use of this barrier function is not to eliminate the constraints but because a barrier function appears to be an ideal smoothing function. Elimination of the inequality constraints is a useful bonus. It also enables us to draw upon the extensive theoretical and practical results concerning barrier methods.
A key property of the barrier term is that Ψ(x) is strictly convex. If μ is large enough, the function f + μΨ will also be strictly convex, as follows from Theorem 3.1. By Theorem 8 of [11], under the assumptions already imposed on (4.1), if x*(μ) is a solution to (4.3), then there exists a solution x* to (4.1) such that lim_{μ↘0} x*(μ) = x*. The following theorem is then a consequence of the results of [11]:
Theorem 4.1 Let x(μ, γ) be any local minimizer of

    Minimize    f(x) − μ ∑_{j=1}^n [ln xj + ln(1 − xj)] + γ ∑_{j∈J} xj(1 − xj)
    subject to  Ax = b,
                0 ≤ x ≤ e.     (4.4)

Then lim_{γ→∞} lim_{μ↘0} xj(μ, γ) = 0 or 1 for j ∈ J.
The general procedure of the barrier function method is to solve problem (4.3) approximately for a sequence of decreasing values of μ, since [11] has shown that the solution x*(μ) of (4.3) is a continuously differentiable curve. Note that if lim_{μ↘0} x*(μ) = x* ∈ {0,1}^n, then we need not solve (4.3) for μ very small, because the rounded solution for a modestly small value of μ should be adequate.
Consider now the example given in Sect. 2 to illustrate the possible failure of the use of a penalty function. The problem is now transformed to

    Minimize    x² − μ log x − μ log(1 − x) + γ x(1 − x)
    subject to  0 ≤ x ≤ 1,     (4.5)
where μ > 0 is the barrier parameter. If x*(μ, γ) denotes a stationary point, then x*(10, 10) = 0.3588 and x*(1, 10) = 0.1072. Thus, rounding x*(μ, γ) in these cases will give the global minimizer. In fact, a trajectory of x*(μ, 10) can be obtained such that x*(μ, 10) → 0 as μ ↘ 0.
We have in effect replaced the hard problem of nonlinear integer optimization by what at first appearance is an equally hard problem: finding a global minimizer of a problem with continuous variables and a large number of local minima. The basis for our optimism that this is not the case lies in how we can utilize the parameters μ and γ to try to obtain a global minimizer, or at least a good local minimizer, of the composite objective function. Note that the term xj(1 − xj) attains its maximum at xj = 1/2 and that the logarithmic barrier term attains its minimum at the same point. Consequently, at this point, the gradient is given solely by that of f(x). In other words, which vertex looks most attractive from the perspective of the objective determines the direction in which we tilt, regardless of the value of μ or γ. Starting at a neutral point and slowly imposing integrality is a key idea in the approach we advocate.
Also, note that provided μ is sufficiently large compared to γ, the problem will have a unique and hence global solution x*(μ, γ), which is a continuous function of μ and γ. The hope is that the global, or at least a good, minimizer of (1.1) is the one connected by a continuous trajectory to x*(μ, γ) for μ large and γ small.
Even if a global minimizer is not identified, we hope to obtain a good local minimizer and perhaps combine this approach with traditional methods.
4.2 The parameters μ and γ
The parameter μ is the standard barrier parameter. However, its initial value and the way it is adjusted differ from typical practice with the normal barrier function approach. For example, when a good estimate of a minimizer is known, the normal algorithm tries to choose a barrier parameter compatible with this point, by which we mean one that is suitably small. This is not required in the approach advocated here; indeed it would be a poor policy. Since we are trying to find the global minimizer we do not want the iterates to converge to a local minimizer, and it is vital that the iterates move away should the initial estimate be close to a local minimizer. The initial choice of μ is made to ensure it is suitably large, by which we mean it dominates the other terms in the objective. It is not difficult to do that, since a value larger than necessary does not incur a significant cost. The overall objective is trivial to optimize when μ is large. Moreover, subsequent objectives are easy to optimize when μ is reduced at a modest rate. Consequently, the additional effort of overestimating μ is small, while underestimating μ increases the danger of converging to a local minimizer. Unlike the regular barrier approach, the parameter is reduced at a modest rate, and a reasonable estimate of the solution of the current subproblem is obtained before the parameter is reduced. The basic idea is to stay close to the "central trajectory".
Even though there may not be a high cost to overestimating the initial value of μ, it is sensible to attempt to estimate a reasonable value, especially since this impacts the choice of γ. Generally, the initial value of μ can be set as the maximum eigenvalue of ∇²f(x), and it is easy to show that such a choice of μ would be sufficient to make the function f + μΨ strictly convex. If a poor initial value is chosen, this can be deduced from the difference between the minimizer obtained and the minimizer of the barrier function alone. This latter minimizer is easy to determine and may be used as the initial point for the first minimization. Another issue that differs from the regular barrier function method is that there is no need to drive μ to near zero before terminating. Consequently, the ill-conditioning that is typical when using barrier functions is avoided.
Estimating a suitable initial value for γ is more challenging. What is important is not just its value but its value relative to μ and ‖∇²F(x)‖. The basic idea is that we start with a strongly convex function, and as μ is decreased and γ is increased the function becomes nonconvex. However, it is important that the nonconvexity of the penalty term does not overwhelm the nonconvexity of F(x) in the early stages, and so the initial value of γ should be much smaller than the absolute value of the minimum eigenvalue of ∇²F(x). Typically, an initial value of γ could be as small as 1% of this absolute value. It helps that it is not necessary for γ → ∞. Provided a good initial value is chosen, the issue of adjusting γ is not hard, since it may be increased at a rate similar to that at which μ is decreased. Usually, θμ, the rate of decrease of μ, is set to a value in the range [0.1, 0.9], while θγ, the rate of increase of γ, is set to a value in the range [1.1, 10], with θμθγ ≈ 1.
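The coupled update of the two parameters can be sketched as a simple schedule. The concrete values below (θμ = 0.5, θγ = 2, initial μ and γ) are our own illustrative choices within the stated ranges, chosen so that θμθγ = 1 keeps the product μγ constant.

```python
# Sketch of the mu/gamma update schedule described above: mu shrinks by
# theta_mu, gamma grows by theta_gamma, with theta_mu * theta_gamma ~ 1.
# The concrete values are our own illustrative choices.
def schedule(mu0, gamma0, theta_mu=0.5, theta_gamma=2.0, steps=10):
    mu, gamma = mu0, gamma0
    out = [(mu, gamma)]
    for _ in range(steps):
        mu *= theta_mu
        gamma *= theta_gamma
        out.append((mu, gamma))
    return out

seq = schedule(mu0=100.0, gamma0=0.01)
# mu is monotonically decreasing, gamma increasing, product mu*gamma constant
print(seq[0], seq[-1])
```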
It is of interest to note the consequences of extreme strategies in the unconstrained case. If the initial values of μ and γ are small, we will get the rounded solution of the local minimizer to which a typical active-set algorithm would converge from the chosen initial point. If the initial value of μ is large but that of γ is kept small except in the later stages, we will obtain the rounded solution of an attempt to find the global minimizer of F(x). While this is better than finding the rounded solution of an arbitrary local minimizer, it is not the best we can do. However, it is clear that of these two extremes the former is the worse. What is not known is how small the initial value of γ needs to be compared to the initial value of μ.
4.3 A simple example
To illustrate the approach, we show how it would work on the following two-variable problem. Let

    f(x1, x2) = −(x1 − 1)² − (x2 − 1)² − 0.1(x1 + x2 − 2).
Consider the following problem:

    Minimize    f(x1, x2)
    subject to  xi ∈ {0, 2} for i = 1, 2.     (4.6)
Since f is separable, solving problem (4.6) is equivalent to solving two identical one-variable problems, and the global optimal solution x1* = x2* = 2 can easily be deduced.

In our approach, we first relax problem (4.6) to
    Minimize    f(x1, x2)
    subject to  0 ≤ xi ≤ 2 for i = 1, 2.     (4.7)
We can then transform (4.7) to the following unconstrained optimization problem by introducing logarithmic barrier terms:

    Minimize    F(x1, x2) ≡ f(x1, x2) − μ[log(x1) + log(2 − x1) + log(x2) + log(2 − x2)],     (4.8)

where μ > 0 and we have also omitted the implicit bound constraints on x1 and x2. The contours of F for 4 different values of μ are illustrated in Fig. 2.
Problem (4.8) can be solved for each μ by applying Newton's method to obtain a solution to the first-order necessary conditions of optimality for (4.8), i.e.,

    ∇F(x1, x2) = [ −2x1 + 1.9 − μ/x1 + μ/(2 − x1)
                   −2x2 + 1.9 − μ/x2 + μ/(2 − x2) ] = 0.
Our approach involves solving (4.8) for a fixed μ = μl > 0 to obtain an iterate x^(l) = (x1^(l), x2^(l)), and then using this iterate as the initial iterate to solve (4.8) for μ = μl+1, where μl > μl+1 > 0. The initial iterate for the first μ parameter, μ0, is set as x^(0) = (1, 1), which is a point with equal Euclidean distances from the extreme points of the region X = [0, 2] × [0, 2]. The resulting sequence of iterates forms a trajectory and
Fig. 2 The contours for the objective function of (4.8) as the parameter μ varies
Fig. 3 Trajectories taken by iterates for problem (4.8)
converges to one of the extreme points of X. Figure 3 illustrates the iterates as μ varies, as well as the possible trajectories. With the choice of a large μ0 value of 100, we are able to obtain a trajectory that converges to the optimal solution of (4.6).
Though the above example has only two variables, our approach extends easily to problems with more variables. A point to note is that despite the local minimizers being quite similar, both in their objective values and in the size of the corresponding sets Sv, the transformed function is unimodal until μ is quite small, and rounding the solution just prior to the function becoming non-unimodal would give the global minimizer. Of course this is a simple example, and it remains to be seen whether this feature is present for more complex problems.
5 Description of the algorithm
Since the continuous problems of interest may have many local minimizers and saddle points, first-order methods are inadequate, as they are only assured of converging to points satisfying first-order optimality conditions. It is therefore imperative that second-order methods be used in the algorithm. Any second-order method that is assured of converging to a solution of the second-order optimality conditions must explicitly or implicitly compute a direction of negative curvature for the reduced Hessian matrix. A key feature of our approach is a very efficient second-order method for solving the continuous problem.
We may solve (4.4) for a specific choice of μ and γ by starting at a feasible point and generating a descent direction, if one exists, in the null space of A. Let Z be a matrix with columns that form a basis for the null space of A. Then AZ = 0 and the rank of Z is n − m. If x0 is any feasible point, so that Ax0 = b, the feasible region can be described as {x : x = x0 + Zy, y ∈ R^{n−m}}. Also, if we let φ be the restriction of F to the feasible region, the problem becomes the following unconstrained problem:

    Minimize_{y ∈ R^{n−m}}  φ(y).     (5.1)
Since the gradient of φ is ZT∇F(x), it is straightforward to obtain a stationary point by solving the equation ZT∇F(x) = 0. This gradient is referred to as the reduced gradient. Likewise, the reduced Hessian, i.e., the Hessian of φ, is ZT∇2F(x)Z.
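As an illustration of these definitions, the following pure-Python sketch (not from the paper; the 3-variable constraint and the quadratic objective are made up for the example) builds a null-space basis Z by hand for A = [1 1 1] and evaluates the reduced gradient ZT∇F(x):

```python
# Illustrative sketch: a hand-built null-space basis Z for the single
# constraint x1 + x2 + x3 = 1, and the reduced gradient Z^T grad F(x)
# for the made-up objective F(x) = 0.5*||x||^2 (so grad F(x) = x).

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

A = [[1.0, 1.0, 1.0]]      # one linear equality constraint: m = 1, n = 3
Z = [[1.0, 0.0],           # columns of Z span null(A); rank of Z is n - m = 2
     [0.0, 1.0],
     [-1.0, -1.0]]

AZ = [matvec(A, col) for col in transpose(Z)]
print(AZ)                  # [[0.0], [0.0]]  -> AZ = 0 as required

x = [0.25, 0.25, 0.5]      # feasible: components sum to 1
grad_F = x[:]              # gradient of 0.5*||x||^2 is x itself
reduced_grad = matvec(transpose(Z), grad_F)
print(reduced_grad)        # [-0.25, -0.25]
```

In practice Z would come from, e.g., a QR factorization of AT rather than being written down by hand; the point is only that ZT∇F(x) has dimension n − m.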
For small or moderately-sized problems, a variety of methods may be used (see, e.g., [12, 15]). Here, we investigate the case where the number of variables is large. One approach to solving the problem is to use a linesearch method, such as the truncated-Newton method (see [8]) we are adopting, in which the descent direction and a direction of negative curvature are computed. Instead of using the index set J for the definition of F in our discussion, we let γ be a vector of penalty parameters with zero values for those xi such that i ∉ J, and Γ = Diag(γ).
The first-order optimality conditions for (4.4) (ignoring the implicit bounds on x) may be written as

    ∇f − μXge + Γ(e − 2x) + ATλ = 0,
    Ax = b,    (5.2)

where Xg = Diag(xg), (xg)i = 1/xi − 1/(1 − xi), i = 1, . . . , n, and λ corresponds to the Lagrange multiplier vector of the constraint Ax = b. Applying Newton's method directly, we obtain the system

    [ H  AT ] [ Δx ]   [ −∇f + μXge − Γ(e − 2x) − ATλ ]
    [ A   0 ] [ Δλ ] = [ b − Ax                        ],    (5.3)

where H = ∇2f + μ Diag(xH) − 2Γ and (xH)i = 1/xi^2 + 1/(1 − xi)^2, i = 1, 2, . . . , n.
Assuming that x0 satisfies Ax0 = b, the second equation AΔx = 0 implies that Δx = Zy for some y. Substituting this into the first equation and premultiplying both
sides by ZT, we obtain

    ZTHZy = ZT(−∇f + μXge − Γ(e − 2x)).    (5.4)
To obtain a descent direction in this method, we first attempt to solve (5.4), or, from the definition of F(x), the equivalent reduced Hessian system
ZT∇2F(xl)Zy = −ZT∇F(xl), (5.5)
by the conjugate gradient method, where xl is the lth iterate. Generally, Z may be a large matrix, especially if the number of linear constraints is small. Thus, even though ∇2F(xl) is likely to be a sparse matrix, ZT∇2F(xl)Z may be a large dense matrix. The virtue of the conjugate gradient method is that the explicit reduced Hessian need not be formed. There may be specific problems where the structure of ∇2F and Z does allow the matrix to be formed. Under such circumstances alternative methods such as those based on Cholesky factorization may also be applicable. Since we are interested in developing a method for general application we have pursued the conjugate gradient approach.
In the process of solving (5.5) with the conjugate gradient algorithm (see [17]), we may determine that ZT∇2F(xl)Z is indefinite for some l. In such a case, we obtain a negative curvature direction q such that
qTZT∇2F(xl)Zq < 0.
This negative curvature direction is required to ensure that the iterates do not converge to a saddle point. Also, the objective is decreased along this direction. In practice, the best choice for q is an eigenvector corresponding to the smallest eigenvalue of ZT∇2F(xl)Z. Computing q is usually expensive but fortunately unnecessary. A good direction of negative curvature will suffice, and efficient ways of computing directions within a modified-Newton algorithm are described in [2]. The descent direction in such modified-Newton algorithms may also be obtained using factorization methods (see, e.g., [13]). In any case, it is essential to compute both a descent direction and a direction of negative curvature (when one exists). One possibility that may arise is that the conjugate gradient algorithm terminates with a direction q such that qTZT∇2F(xl)Zq = 0. In that case, we may have to use other iterative methods such as the Lanczos method to obtain a direction of negative curvature as described in [2]. The vector q is a good initial estimate to use in such methods.
If we let p be a suitable linear combination of the negative curvature direction q with a descent direction, the convergence of the iterates is still ensured with the search direction Zp. The next iterate xl+1 is thus defined by xl + αl Zp, where αl is determined by a linesearch. The iterations are performed until ZT∇F(xl) is sufficiently close to 0 and ZT∇2F(xl)Z is positive semidefinite. Also, as the current iterate xl may still be some distance away from the actual optimal solution we are seeking, and since we do not necessarily use an exact solution of (5.5) to get the search direction, we only need to solve (5.5) approximately. An alternative is to use the two directions to form a differential equation (see [7]), which leads to a search along an arc.
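A minimal sketch of such an inner solve, assuming a small dense Hessian and plain conjugate gradients (the paper's implementation is more elaborate): CG is run on the Newton system and stopped as soon as a conjugate direction p with pTHp ≤ 0 is encountered, which is then returned as a direction of negative curvature.

```python
# Sketch of the CG inner solve with negative-curvature detection
# (an illustration, not the authors' implementation): run CG on H y = -g
# and stop as soon as a conjugate direction p has p^T H p <= 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mv(H, v):
    return [dot(row, v) for row in H]

def cg_with_curvature(H, g, tol=1e-12, maxit=50):
    n = len(g)
    y = [0.0] * n
    r = [-gi for gi in g]            # residual of H y = -g at y = 0
    p = r[:]
    for _ in range(maxit):
        Hp = mv(H, p)
        curv = dot(p, Hp)
        if curv <= 0:                # H is indefinite along p
            return y, p              # p is a direction of negative curvature
        alpha = dot(r, r) / curv
        y = [yi + alpha * pi for yi, pi in zip(y, p)]
        r_new = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        if dot(r_new, r_new) < tol:
            return y, None           # converged; no indefiniteness detected
        beta = dot(r_new, r_new) / dot(r, r)
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]
        r = r_new
    return y, None

# Indefinite "reduced Hessian" with eigenvalues 2 and -1:
H = [[2.0, 0.0], [0.0, -1.0]]
g = [1.0, 1.0]
y, q = cg_with_curvature(H, g)
print(q is not None, dot(q, mv(H, q)) < 0)   # True True
```

In the algorithm proper, H and g would be ZT∇2F(xl)Z and ZT∇F(xl), applied matrix-free through products with Z and ∇2F(xl).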
A summary of the new algorithm is shown below:
Global Smoothing Algorithm: GSA
Set εF = tolerance for function evaluation,
    εμ = tolerance for barrier value,
    M  = maximum penalty value,
    N  = iteration limit for applying Newton's method,
    θμ = ratio for barrier parameter decrement,
    θγ = ratio for penalty parameter increment,
    μ0 = initial barrier parameter,
    γ0 = initial penalty parameter,
    r  = any feasible starting point.
Set γ = γ0, μ = μ0.
while γ < M or μ > εμ
    Set x0 = r
    for l = 0, 1, . . . , N
        if ‖ZT∇F(xl)‖ < εF μ
            Set xN = xl, l = N.
            Check whether a direction of negative curvature exists at xN.
        else
            Apply the conjugate gradient algorithm to
                [ZT∇2F(xl)Z] y = −ZT∇F(xl).
            Obtain pl as a combination of the directions of descent and
                negative curvature.
            Perform a linesearch to determine αl and set xl+1 = xl + αl Zpl.
        end if
    end for
    Set r = xN, μ = θμ μ, γ = θγ γ.
end while
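The outer loop above can be sketched in a few lines. The following toy implementation is an assumption-laden illustration, not the authors' code: it treats the unconstrained case Z = I, replaces the truncated-Newton inner solve by a crude clipped gradient descent on F(x) = f(x) − μ Σ[ln xi + ln(1 − xi)] + γ Σ xi(1 − xi), and rounds the final iterate.

```python
# Toy sketch of the GSA outer loop (illustration only): Z = I, and the
# inner second-order solve is replaced by clipped gradient descent on
# F(x) = f(x) - mu*sum(ln x_i + ln(1-x_i)) + gamma*sum(x_i*(1-x_i)).

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def gsa_sketch(grad_f, n, mu0=10.0, gamma0=0.1, theta_mu=0.5,
               theta_gamma=2.0, mu_tol=1e-3, gamma_max=100.0,
               inner=500, step=0.002, move_cap=0.01, eps=1e-3):
    x = [0.5] * n                       # analytic center of [0,1]^n
    mu, gamma = mu0, gamma0
    while gamma < gamma_max or mu > mu_tol:
        for _ in range(inner):          # crude stand-in for the inner solve
            gf = grad_f(x)
            for i in range(n):
                gi = (gf[i]
                      - mu * (1.0 / x[i] - 1.0 / (1.0 - x[i]))  # barrier
                      + gamma * (1.0 - 2.0 * x[i]))             # penalty
                x[i] = clamp(x[i] - clamp(step * gi, -move_cap, move_cap),
                             eps, 1.0 - eps)
        mu *= theta_mu                  # decrease barrier parameter
        gamma *= theta_gamma            # increase penalty parameter
    return [round(xi) for xi in x]

# f(x) = 2*x1 - x2 + x1*x2 on {0,1}^2 has its minimum -1 at (0, 1)
grad = lambda x: [2.0 + x[1], -1.0 + x[0]]
print(gsa_sketch(grad, 2))              # [0, 1]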
6 Computational results
To show the behavior of GSA we give some results on problems in which the problem size can be varied. The purpose of these results is more for illustration than as conclusive evidence of the efficacy of the new algorithm. Still, the success achieved suggests that the algorithm has merit. More results are given in [26]; the results here show that there are problems for which GSA performs well. Moreover, the rate of increase in computational work as problem size increases is modest. All results for GSA were obtained from a MATLAB 6.5 implementation on an Intel Pentium IV 2.4 GHz PC with 512 MB RAM running the Windows XP operating system.

As a preliminary illustration of the performance of GSA, we consider the following set of test problems from [28]:
    Minimize  −(n − 1) Σ_{i=1}^{n} xi − (1/n) Σ_{i=1}^{n/2} xi + 2 Σ_{1≤i<j≤n} xi xj
    subject to  xi ∈ {0,1} for i = 1, 2, . . . , n.    (6.1)
In the case when n is even, Pardalos [28] shows that the unique global minimum of (6.1) is given by the feasible point

    x* = (1, . . . , 1, 0, . . . , 0)    (n/2 ones followed by n/2 zeros),

with objective value −(n^2 + 2)/4, and there is an exponential number of discrete local minima for (6.1). Here, a point x ∈ {0,1}^n is a discrete local minimum if and only if f(x) ≤ f(y) for all points y ∈ {0,1}^n adjacent to x, when all points are considered to be vertices of the unit hypercube {0,1}^n.
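This definition is easy to check by brute force on a small instance. The sketch below (illustrative only) enumerates {0,1}^4 for problem (6.1) with n = 4, confirms that the global minimum is x* = (1,1,0,0) with value −(n^2 + 2)/4 = −4.5, and counts the discrete local minima:

```python
from itertools import product

# Brute-force check of problem (6.1) for n = 4 (illustration only).

def f(x, n):
    s = sum(x)
    return (-(n - 1) * s - sum(x[:n // 2]) / n
            + 2 * sum(x[i] * x[j] for i in range(n) for j in range(i + 1, n)))

n = 4
points = list(product((0, 1), repeat=n))
vals = {x: f(x, n) for x in points}

def neighbors(x):                       # flip one coordinate: hypercube edges
    for i in range(len(x)):
        y = list(x)
        y[i] = 1 - y[i]
        yield tuple(y)

best = min(points, key=vals.get)
local_minima = [x for x in points
                if all(vals[x] <= vals[y] for y in neighbors(x))]
print(best, vals[best])    # (1, 1, 0, 0) -4.5, matching -(n^2 + 2)/4
print(len(local_minima))   # 6 discrete local minima, only one of them global
```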
As ∇2f(x) = 2(eeT − I) for this set of test problems, it is possible to simplify all the matrix-vector products involving ∇2f(x) in GSA by noting that [∇2f(x)]v = 2[(eTv)e − v] for any v ∈ R^n, leading to more efficient computations. The initial μ and γ parameters are set as 100 and 0.1 respectively, with a μ-reduction ratio of 0.1 and γ-increment ratio of 10. The analytic center of [0,1]^n, i.e., (1/2)e, is used as the initial iterate and the tolerance is set as εμ = 10^−3 for the termination criteria. In all the instances tested, with the number of variables ranging from 1000 to 10000, GSA was able to obtain convergence to the global optimal solution. In particular, it is possible to show that using the initial iterate (1/2)e, the next iterate obtained by GSA would be of the form (u1, . . . , u1, u2, . . . , u2), with n/2 copies of each, where u1 > 1/2 and u2 < 1/2; i.e., an immediate rounding of this iterate would have given the global optimal solution.

A plot of the computation time required versus the number of variables is shown in Fig. 4. These results indicate that the amount of computational work in GSA does not increase drastically with an increase in problem size for this set of test problems. Moreover, it illustrates the ability of GSA to find the optimal solution to large problems with as many as 10000 variables within a reasonable amount of computation time, as well as the potential to exploit problem structure to improve computational efficiency.
For purposes of evaluating the performance of GSA, especially when we do not know the exact optimal solutions to the test problems, we have used some of the nonlinear mixed-integer programming solvers of GAMS,1 namely DICOPT, SBB and BARON as described earlier. MINOS was used as the common underlying nonlinear programming solver required by all three solvers, while CPLEX was used as the underlying linear programming solver required by BARON. While the DICOPT and SBB solvers are able to handle nonlinear discrete optimization problems, they do not necessarily attain the global optimal solution for nonconvex problems. As BARON is a global optimization solver that could potentially return the exact optimal solution or relevant bounds on the optimal objective value, we have made extensive comparisons between GSA and BARON in certain computational instances. We used GAMS version 22.2, which includes BARON version 7.5.

When using the GAMS solvers, other than the maximum computation time, the maximum number of iterations and the maximum amount of workspace memory allocated to the solvers, the other options were set to default values. We generally
1http://www.gams.com.
Fig. 4 Graph of computation time required (CPU seconds) for solving instances of problem (6.1)
set a computation time limit of 10000 CPU seconds and the maximum number of iterations to 500 000 for the runs. To ensure that the solvers have a reasonable amount of memory for setting up problem instances, we configured a maximum workspace memory of 1 GB. It turns out that less than 100 MB was actually needed by each of the GAMS solvers to set up each problem instance.

All the runs involving the GAMS solvers were performed on the same PC used for GSA, except where noted. It should be pointed out that GSA requires very little additional memory other than that needed to define the problem, which is a tiny fraction of that used by the other methods considered here.
6.1 Unconstrained binary quadratic problems
One of the simpler classes of problems in the form of (1.1) with a nonlinear objective function is the unconstrained binary quadratic problem, i.e., the objective function is quadratic in the binary variables x, and A and b are the zero matrix and the zero vector respectively. There are many relaxation and approximation approaches available for this class of problems, such as [4] and [21]. Despite its simplicity, such problems are still hard to solve, as the numerical testing reveals. What these problems give is another opportunity to examine performance as size increases, even though such a numerical comparison may not be sufficient to measure the efficacy of GSA for other types of nonlinear optimization problems with binary variables. Moreover, there exist publicly available test problems in this class of optimization problems.
The unconstrained binary quadratic problem (BQP) may be stated as follows:
    Minimize  xTPx + cTx
    subject to  x ∈ {0,1}^n,    (6.2)

where P ∈ R^(n×n) and c ∈ R^n. By noting that xTMx = xTMTx and cTx = xTCx for any feasible x and any M ∈ R^(n×n), where C = Diag(c), it suffices to consider the following "simpler" problem

    Minimize  xTQx
    subject to  x ∈ {0,1}^n,    (6.3)

where Q is a symmetric matrix.
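The reduction from (6.2) to (6.3) can be verified directly: since xi^2 = xi for binary x, cTx = xT Diag(c) x, so Q = (P + PT)/2 + Diag(c) is symmetric and yields the same objective. A small check, with a made-up P and c:

```python
from itertools import product

# Verify that Q = (P + P^T)/2 + Diag(c) reproduces x^T P x + c^T x on {0,1}^n.

def quad(M, x):
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

P = [[1.0, -4.0, 2.0],
     [0.0,  3.0, -1.0],
     [5.0, -2.0, 0.0]]
c = [2.0, -3.0, 1.0]
n = 3
Q = [[(P[i][j] + P[j][i]) / 2.0 + (c[i] if i == j else 0.0)
      for j in range(n)] for i in range(n)]

for x in product((0, 1), repeat=n):
    orig = quad(P, x) + sum(ci * xi for ci, xi in zip(c, x))
    assert abs(quad(Q, x) - orig) < 1e-12
print("objectives agree on all of {0,1}^3")
```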
Since any unconstrained quadratic programming problem with purely integer variables from a bounded set can be transformed into (6.2), many classes of problems fall under this category of BQP problems, such as the least-squares problem with bounded integer variables:

    Minimize_{x ∈ D}  ‖s − Ax‖,

where s ∈ R^m, A ∈ R^(m×n), and D is a bounded subset of Z^n.
A test set for BQP problems in the form of (6.3) is given in the OR-Library, which is maintained by J.E. Beasley.2 The entries of the matrix Q are integers uniformly drawn from [−100, 100], with density 10%. As the test problems were formulated as maximization problems, for purposes of comparison of objective values, we now maximize the objective function.
The GSA was run on all 60 problems of Beasley's test set, ranging from 50 to 2500 variables. The initial μ and γ parameters are set as 100 and 1 respectively, with a μ-reduction ratio of 0.5 and γ-increment ratio of 2. The analytic center of [0,1]^n, i.e., (1/2)e, is used as the initial iterate and the tolerance is set as εμ = 0.1 for the termination criteria. For the GAMS solvers, we initially noticed that DICOPT and SBB always return the zero vector as the solution to the test problems, which could be due to the solvers accepting a local solution in the underlying nonlinear programming subproblems being solved. To allow these two solvers to return non-zero solutions for comparison, we added the cut eTx ≥ 1 or eTx ≥ 2 when applying DICOPT and SBB to the test problems. The objective values obtained by the GSA and the respective GAMS solvers, as well as the respective computation time and number of iterations required, are given in Tables 1–3. There are preprocessing procedures in BARON that attempt to obtain a good solution before letting the solver find the optimal or a better solution, and the results obtained by these preprocessing procedures are shown in the tables. Finally, the objective values of the best known solutions based on RUTCOR's website3 are given in the last column of the tables.
BARON found the optimal solution for all 20 problems in the test sets involving 50 to 100 binary variables, with the objective values listed in Tables 1(a), (b). Using these optimal objective values as a basis of comparison, we conclude that GSA successfully found 11 of these optima, with another 6 solutions obtained having less than 0.5% deviation from the optimal objective value. The percentage deviations of the remaining 3 solutions obtained by the GSA were also close, with the worst one having 1.66% deviation from the optimal objective value. On the other hand, DICOPT obtained results ranging from 2.25% to 26.26% deviation from the optimal objective value for the 50-variable test problems and from 0.56% to 13.03% for the 100-variable test problems, while SBB obtained results ranging from 1.9% to 26.26% deviation for the 50-variable test problems and from 0.56% to 13.03% for the 100-variable test problems. For all these instances, GSA was able to obtain better objective values than DICOPT or SBB.
2 http://mscmga.ms.ic.ac.uk/info.html.
3 http://rutcor.rutgers.edu/~pbo/families/beasley.htm.
Table 1 Numerical comparison of algorithms applied to Beasley's BQP test problems (maximization objective) with (a) 50 binary variables; (b) 100 binary variables. [Tabular data not reproduced: for each problem instance the table reports, for DICOPT and SBB, the best objective value, total time (sec) and total iteration count; for BARON, the preprocessing and postprocessing objective values, total time (sec) and total iteration count; for GSA, the best objective value, time to best (sec), total time (sec) and total iteration count; and the best known objective value with its upper bound.]
Fig. 5 Graph of average percentage difference in objective value of solution obtained by the various algorithms to best known solution when solving Beasley's BQP test problems
For the test sets with more than 100 variables, BARON was not able to complete the runs within the computation time limit of 10000 seconds, and so the runs were truncated with BARON returning the best found solution. Thus, we do not have optimal objective values as a reference for these larger test problems, except for the upper bounds obtained by BARON or the objective values obtained from the RUTCOR website. We were unable to obtain any solution from BARON for the 2500-variable test problems within the computation time limit; this was the case even on another platform, an Intel Pentium IV 2.4 GHz server with 2 GB RAM running the Linux operating system.

From Tables 2 and 3, we can see that GSA again produced better solutions than DICOPT and SBB in all the instances. For all the test problems with 250 variables, GSA obtained objective values that have less than 2% deviation from the best-found objective value, with the worst one being 1.55% deviation. For the test problems with 500 variables, GSA obtained objective values that are better than BARON's in 7 out of 10 instances, with all the instances having less than 0.64% deviation from the best-found objective value. For the test problems with 1000 variables, GSA obtained objective values that are better than BARON's in 9 out of 10 instances, with the remaining instance having 0.62% deviation from the best-found objective value, which is also the worst deviation out of the 10 instances. For the test problems with 2500 variables, GSA obtained objective values ranging from 0.09% to 0.43% deviation from the best-found objective value. A graph showing the average percentage difference in the objective value of the solution obtained by the GSA and the GAMS solvers to that of the best known solution for Beasley's test problems is given in Fig. 5.

Figure 6 gives a plot of the relative average computation time required to solve Beasley's test problems by DICOPT, SBB, BARON and GSA, with respect to problem size. Since we were unable to determine the computation time needed by BARON to complete its runs for test problems of certain sizes, there are only limited data points for BARON in the figure. It can be seen that while there is a substantial increase in the computation time required by BARON to solve larger test problems, the increase in computation time for the other solvers and GSA has been less significant. However, GSA has been able to maintain the quality of its solutions compared with all three solvers as problem size increases, as illustrated in Fig. 5, indicating its efficiency and potential in obtaining good solutions to large problems without incurring unreasonable computation times.
Table 2 Numerical comparison of algorithms applied to Beasley's BQP test problems (maximization objective) with (a) 250 binary variables; (b) 500 binary variables. [Tabular data not reproduced: columns as in Table 1, reporting objective values, times and iteration counts for DICOPT, SBB, BARON and GSA, together with the best known objective value and upper bound.]
Table 3 Numerical comparison of algorithms applied to Beasley's BQP test problems (maximization objective) with (a) 1000 binary variables; (b) 2500 binary variables. [Tabular data not reproduced: columns as in Table 1, reporting objective values, times and iteration counts for DICOPT, SBB, BARON and GSA, together with the best known objective value and upper bound.]
Fig. 6 Graph of relative average computation time required by the algorithms for solving Beasley's BQP test problems
As the number of local minima for the problems of Beasley's test set may be small, we also applied GSA to problems generated from the standardized test problem generator (Subroutine Q01MKD) written by P.M. Pardalos in [28], which may have a larger expected number of local minima. In generating the test matrices, we set the density of the matrices to be 0.1, with all entries of the matrices being integers bounded between −100 and 100. The seed to initialize the random number generator was set to 1. Note that unlike Beasley's test set, the test problems were formulated in [28] as minimization problems.
The GSA was run on these test problems ranging from 200 to 1000 binary variables. We used the same parameters as in Beasley's test problems, i.e., initial μ and γ of 100 and 1 respectively, with a μ-reduction ratio of 0.5 and γ-increment ratio of 2. The analytic center of [0,1]^n, i.e., (1/2)e, is also used as the initial iterate and a tolerance of εμ = 0.1 is used for the termination criteria. The objective values obtained by the GSA and the three GAMS solvers, as well as the respective computation time and number of iterations required, are given in Table 4. The DICOPT and SBB results were obtained after adding the cut eTx ≥ 1, as was done for Beasley's test sets. However, we found that while DICOPT and SBB were able to return solutions for all the test problems, BARON was unable to return any solution for test problems with more than 700 variables. As such, we performed all the BARON runs on the Linux platform mentioned earlier, and BARON was then able to return the solutions reported in Table 4 for all the test problems.
From Table 4, we can see that GSA still produced better solutions than DICOPT and SBB in all 17 instances. GSA also obtained objective values that are better than BARON's in 13 instances. For the remaining 4 instances, other than one instance in which GSA obtained the same objective value as BARON, the percentage deviations from the best found solution are all less than 0.25%. Thus the solutions obtained by GSA for the remaining instances are also comparable to those of BARON.
6.2 Other types of nonlinear optimization problems
The objective functions considered so far are relatively simple, being purely quadratic functions, and could be solved by specific methods for such problems, such as [27].
Table 4 Numerical comparison of algorithms for solving Pardalos's BQP test problems (minimization objective) with 200 to 1000 binary variables. [Tabular data not reproduced: for each problem size the table reports, for DICOPT and SBB, the best objective value, total time (sec) and total iteration count; for BARON, the preprocessing and postprocessing objective values, total time (sec) and total iteration count; for GSA, the best objective value, time to best (sec), total time (sec) and total iteration count; and the lower bound found by BARON.]
It may be seen that when BARON works well, the preprocessor in the BARON algorithm often obtains good solutions. That is unlikely to be the case for real problems, especially when continuous variables are present and appear nonlinearly. We now apply GSA to test problems with nonlinear nonquadratic objective functions. The first one considered is a quartic function generated from the product of two quadratic functions. For ease in deriving the optimal solution, we have used the following matrix Q mentioned in Example 4 of [28] for our construction of the test problems:
    Q = [ 15   −4    1    0    2
          −4  −17    2    1    1
           1    2  −25   −8    1
           0    1   −8   30   −5
           2    1    1   −5  −20 ].
In that example, we know that the solution to problem (6.3) with this Q matrix is given by x* = (0, 1, 1, 0, 1)T with objective value −54.
We define the following (5n × 5n)-block diagonal matrix and (5n × 1)-vector:

    Qn = Diag(Q, Q, . . . , Q)    (n diagonal blocks, each equal to Q),
    x*_n = ((x*)T (x*)T . . . (x*)T)T.

Then it is easy to see that an optimal solution to problem (6.3) with Q = Qn is given by x = x*_n with objective value −54n.

The test problems can now be defined in the general form as follows:
Minimize (xTQnx + 54n + 1)(yTQny + 54n + 1)
subject to x, y ∈ {0,1}^(5n).    (6.4)
The rationale for constructing these test problems in the above form, especially the addition of a constant to each quadratic term in the product, is evident from the following observation:

Proposition 6.1 An optimal solution to problem (6.4) is given by x = y = x*_n with objective value 1.
Proof From above, for any x ∈ {0,1}5n, xTQnx ≥ −54n with equality achievedwhen x = x∗
n . Then we have xTQnx +54n+1 ≥ 1 and yTQny +54n+1 ≥ 1 for anyx, y ∈ {0,1}5n with equality achieved when x = x∗
n and y = x∗n respectively. Thus,
(xTQnx + 54n + 1)(yTQny + 54n + 1) ≥ 1 for any x, y ∈ {0,1}5n with equalityachieved when x = y = x∗
n . �
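The construction of (6.4) and Proposition 6.1 are easy to check numerically. The following sketch (illustrative Python written for this discussion, not part of the paper's implementation) exploits the block-diagonal structure of Q_n and verifies that x = y = x_n^* attains the objective value 1:

```python
# 5x5 matrix Q from Example 4 of [28]; x* is its known binary minimizer.
Q = [[ 15,  -4,   1,   0,   2],
     [ -4, -17,   2,   1,   1],
     [  1,   2, -25,  -8,   1],
     [  0,   1,  -8,  30,  -5],
     [  2,   1,   1,  -5, -20]]
x_star = [0, 1, 1, 0, 1]          # minimizer of x'Qx over {0,1}^5, value -54

def quad_blockdiag(x):
    """x' Q_n x where Q_n = diag(Q, ..., Q) and len(x) = 5n."""
    return sum(x[5 * b + i] * Q[i][j] * x[5 * b + j]
               for b in range(len(x) // 5)
               for i in range(5) for j in range(5))

def objective(x, y, n):
    """Objective of test problem (6.4)."""
    return (quad_blockdiag(x) + 54 * n + 1) * (quad_blockdiag(y) + 54 * n + 1)

n = 3
xn = x_star * n                   # x_n^*: n stacked copies of x*
assert quad_blockdiag(xn) == -54 * n
print(objective(xn, xn, n))       # prints 1, as Proposition 6.1 states
```

Each block contributes −54 at x*, so the shifted quadratic factors both equal 1 at the optimum, and the product cannot fall below 1 at any binary point.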
Both GSA and the GAMS solvers were applied to this set of test problems with n
ranging from 1 to 200, i.e., the number of binary variables ranges from 10 to 2000.
282 W. Murray, K.-M. Ng
More experimentation was carried out on test problems with smaller problem sizes because we detected a deterioration in the performance of the GAMS solvers on small test problems. The initial μ and γ parameters of GSA are set to 100 and 1 respectively, with a μ-reduction ratio of 0.5 and a γ-increment ratio of 2. The analytic center of [0,1]^n, i.e., (1/2)e, is used as the initial iterate, and the tolerance is set as ε_μ = 0.1 for the termination criteria. As we found that DICOPT and SBB again return the zero vector as the solution, we have also imposed the cut e^T x + e^T y ≥ 1 for the DICOPT and SBB runs. The objective values obtained by GSA and the three GAMS solvers, as well as the respective computation time and number of iterations required, are given in Table 5.
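The parameter schedule just described can be sketched as follows (illustrative Python; `solve_subproblem` is a hypothetical placeholder for the local solve at each continuation step, not the paper's implementation):

```python
# Continuation schedule for problems (6.4): the smoothing weight mu shrinks
# geometrically while the penalty weight gamma grows, until mu falls below
# the tolerance eps_mu.
def schedule(mu=100.0, gamma=1.0, mu_ratio=0.5, gamma_ratio=2.0, eps_mu=0.1):
    pairs = []
    while mu >= eps_mu:
        pairs.append((mu, gamma))   # x_k = solve_subproblem(mu, gamma, x_prev)
        mu *= mu_ratio
        gamma *= gamma_ratio
    return pairs

pairs = schedule()
print(len(pairs))                   # 10 subproblems before mu drops below 0.1
```

With these settings the trajectory visits ten (μ, γ) pairs, from (100, 1) down to (100 · 0.5⁹, 2⁹).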
GSA found the optimal solution for all 14 instances, and the computation time required to reach the optimal solution was less than 30 seconds for each instance, with almost all instances requiring less than 10 seconds. BARON was able to find the optimal solution to the first 3 instances, but with a significant increase in computation time even for a small instance of 30 variables. The objective values obtained by BARON for the other 11 instances, with more than 30 variables, were not satisfactory. The lower bounds found by BARON are also poor for these larger instances and are hardly helpful. DICOPT and SBB were unable to find an optimal solution to any of the instances.
In the test problems considered so far, the objective function involves polynomial terms only. To show that GSA is able to handle other types of differentiable objective functions, we constructed another set of test problems whose objective function includes a sum of exponential functions. The objective function constructed here closely follows that of problem (6.1). Also, the previous test problems do not show GSA's performance on problems with constraints. Thus, we have added a linear constraint to this set of test problems, which also enables an easier argument for deriving the optimal solution.
The test problems can be defined as follows:

\[
\begin{aligned}
\text{Minimize}\quad & -(n-1)\sum_{i=1}^{n} x_i \;-\; \frac{1}{n}\sum_{i=1}^{n/2} x_i \;+\; 2\sum_{1 \le i < j \le n} e^{x_i x_j} \\
\text{subject to}\quad & \sum_{i=1}^{n} x_i \le \frac{n}{2}, \\
& x_i \in \{0,1\} \quad \text{for } i = 1, 2, \ldots, n,
\end{aligned}
\tag{6.5}
\]
where n is even. By extending the argument in [28], we have the following result forthis set of test problems:
Proposition 6.2 \(x^* = (\underbrace{1,\ldots,1}_{p^*}, \underbrace{0,\ldots,0}_{n-p^*})\) is an optimal solution to problem (6.5) with objective value

\[
(e-1)(p^*)^2 + \Bigl(2 - n - \frac{1}{n} - e\Bigr)p^* + n^2 - n,
\qquad \text{where } p^* = \biggl[\frac{n + \frac{1}{n} + e - 2}{2(e-1)}\biggr]
\]

and [x] refers to the integer obtained from rounding x.
Table 5 Numerical comparison of algorithms for solving test problems (6.4) (minimization objective) with 10 to 2000 variables

[Table body: for each of the 14 problem sizes (5×2 up to 1000×2 binary variables), the best objective value, total time, and iteration count for DICOPT and SBB; the best objective value, lower bound, preprocessing and postprocessing results, times, and iteration counts for BARON; and the objective value and time for GSA. The column alignment was lost in extraction.]
Proof For any feasible x, i.e., any x with exactly p ones and (n − p) zeros, where 0 ≤ p ≤ n/2,

\[
f(x) = -(n-1)p - \frac{\bar p}{n} + p(p-1)e^{(1)(1)} + \bigl(n(n-1) - p(p-1)\bigr)e^{0}
     = (e-1)p^2 + (2 - n - e)p - \frac{\bar p}{n} + n^2 - n
\]

for some \(\bar p\) with \(0 \le \bar p \le p\). By having the p ones occur in the first n/2 variables of x, f(x) assumes the minimum value \((e-1)p^2 + (2 - n - \frac{1}{n} - e)p + n^2 - n\) for a fixed value of p. Since this convex function of p is minimized when

\[
p = -\frac{2 - n - \frac{1}{n} - e}{2(e-1)},
\qquad
p^* = \biggl[-\frac{2 - n - \frac{1}{n} - e}{2(e-1)}\biggr]
\]

will be an integer that minimizes this quadratic function over integral values of p lying in [0, n/2]. □
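For small even n, Proposition 6.2 can be confirmed by brute force. The sketch below (illustrative Python, not from the paper) enumerates every feasible x of (6.5) for n = 6 and compares the best objective value found with the closed-form value of the proposition:

```python
import math, itertools

def f(x):
    """Objective of test problem (6.5)."""
    n = len(x)
    return (-(n - 1) * sum(x)
            - sum(x[:n // 2]) / n
            + 2 * sum(math.exp(x[i] * x[j])
                      for i in range(n) for j in range(i + 1, n)))

def predicted(n):
    """Optimal value given by Proposition 6.2."""
    e = math.e
    p = round((n + 1 / n + e - 2) / (2 * (e - 1)))   # p*, by rounding
    return (e - 1) * p ** 2 + (2 - n - 1 / n - e) * p + n ** 2 - n

n = 6
best = min(f(x) for x in itertools.product((0, 1), repeat=n)
           if sum(x) <= n // 2)                      # enumerate feasible points
assert abs(best - predicted(n)) < 1e-9               # brute force agrees
```

For n = 6 the formula gives p* = 2; placing both ones in the first n/2 positions picks up the −1/n bonus term, matching the enumerated minimum.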
Both GSA and the GAMS solvers were applied to this set of test problems with n ranging from 10 to 1000. We included 3 instances with n ≤ 20 (n = 10, 16, 20) to allow a better comparison of the performance and computation time of GSA and the GAMS solvers on the smaller test problems.
When applying GSA to (6.5), we first convert the linear inequality constraint to an equality constraint by adding a non-negative slack variable s. An appropriate global smoothing function is then given by

\[
\Phi(x,s) = -\sum_{j=1}^{n} \ln x_j - \sum_{j=1}^{n} \ln(1 - x_j) - \ln s,
\tag{6.6}
\]

so that we are handling the following relaxed equality-constrained problem:

\[
\begin{aligned}
\text{Minimize}\quad & f(x) + \mu \Phi(x,s) \\
\text{subject to}\quad & \sum_{i=1}^{n} x_i + s = \frac{n}{2}, \\
& 0 \le x \le e, \quad s \ge 0.
\end{aligned}
\tag{6.7}
\]
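To make (6.6) concrete, the following sketch (illustrative Python; Φ and the initial point follow the formulas in the text, the rest is our scaffolding) evaluates the smoothing function at the initial iterate, which places n/(2(n+1)) in every component so that the equality constraint holds with a strictly positive slack:

```python
import math

def phi(x, s):
    """Global smoothing (log-barrier) function (6.6)."""
    return (-sum(math.log(xj) for xj in x)
            - sum(math.log(1.0 - xj) for xj in x)
            - math.log(s))

n = 10
x0 = [n / (2 * (n + 1))] * n      # initial iterate (n / (2(n+1))) e
s0 = n / 2 - sum(x0)              # slack chosen so sum(x0) + s0 = n/2

# The slack equals the common component value: n/2 is spread uniformly
# over the n + 1 variables (x_1, ..., x_n, s).
assert abs(s0 - n / (2 * (n + 1))) < 1e-12
assert all(0 < xj < 1 for xj in x0) and s0 > 0   # interior point: phi finite
print(phi(x0, s0))
```

Because the iterate is strictly interior to the bounds, every logarithm in Φ is finite, which is what the barrier requires of the starting point.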
The initial μ parameter is set to 100 with a reduction ratio of 0.9. For n ≤ 20, a small initial γ parameter of 0.01 is sufficient for convergence to integral solutions. For larger-dimensional problems, the initial γ parameter is progressively increased to reflect the increase in the objective as n increases. In particular, for n = 50, 100, 200, 500 and 1000, the initial γ parameter is set to 5, 20, 80, 250 and 500 respectively. The increment ratio for the γ parameter is set to the reciprocal of the μ-reduction ratio of 0.9, i.e., 10/9, and the tolerance is set as ε_μ = 10⁻³ for the termination criteria. We chose the initial iterate as (n/(2(n+1)))e, based on a uniform distribution of n/2 over the x and s variables. The objective values obtained by GSA and the three GAMS solvers, as well as the respective computation time and number of
Fig. 7 Graph of relative average computation time required by the algorithms for solving instances of problem (6.5)
iterations required are given in Table 6. The global objective values are also included as a basis for performance comparisons.

From Table 6, we can see that GSA outperforms or matches all the GAMS solvers in terms of the objective values obtained for all 8 instances. In particular, GSA found the optimal solution for 5 of the instances; for the remaining instances, GSA obtained solutions deviating by less than 0.025% from the optimal objective values. DICOPT was unable to obtain an optimal solution for any instance, and its solutions deviate from the optimal objective values by 0.79% to 8.77%. SBB obtained the optimal solution to 4 instances, and for 3 of the remaining instances its solutions deviate from the optimal objective values by 0.032% to 0.056%; however, SBB was unable to obtain any solution for the largest instance, with 1000 variables. BARON obtained the optimal solution to 3 instances, and for 3 of the remaining instances its solutions deviate from the optimal objective values by 1.18% to 8.73%; however, BARON was unable to obtain any solution for the instances with 500 and 1000 variables. Despite running SBB and BARON on the Linux platform mentioned in Sect. 6.1, both solvers were still unable to produce any feasible solution for their respective large instances of the test set.
Figure 7 gives a plot of the relative average computation times required by GSA and the GAMS solvers for solving test problem (6.5). Some data points for BARON and SBB were not included for the reasons discussed above. It can be seen that neither GSA nor SBB shows a significant increase in computation time with problem size, unlike BARON or DICOPT. In addition, GSA appears to have a smaller rate of increase in computation time with problem size than SBB.
7 Conclusions
We have described a global smoothing algorithm that enables optimization methods for continuous problems to be applied to certain types of discrete optimization problems. We were surprised by the quality of the solutions from what is a relatively unsophisticated implementation, without any need to fiddle with parameters. In the numerical experiments conducted, the performance of GSA relative to alternatives, both in terms of the quality of solution and time to solution, improved dramatically as
Table 6 Numerical comparison of algorithms for solving test problems (6.5) (minimization objective) with 10 to 1000 variables

[Table body: for each of the 8 problem sizes (n = 10 to 1000), the global objective value; the best objective value, total time, and iteration count for DICOPT and SBB; the corresponding results, lower bounds, and preprocessing/postprocessing data for BARON; and the objective value and time for GSA. The column alignment was lost in extraction.]
the size of the problem increases, reflecting the encouraging feature that the work required by the new algorithm grows slowly with problem size. Moreover, the memory requirements of the algorithm are modest and typically add little to what is required to prescribe the problem; alternative methods are often memory intensive.
Given its difference in character from other algorithms, GSA could be used in conjunction with alternative methods by supplying them with good solutions, for example by providing a good upper bound in a branch-and-bound method. Certain improvement heuristics could also be applied to the solutions obtained by GSA in a hybrid algorithm framework. These are possible areas of future research, as is the possibility of extending GSA to solve more general classes of nonlinear discrete optimization problems.
Acknowledgements The authors would like to thank the referees, whose suggestions and comments led to improvements to the paper.
References

1. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1995)
2. Boman, E.G.: Infeasibility and negative curvature in optimization. Ph.D. thesis, Scientific Computing and Computational Mathematics Program, Stanford University, Stanford (1999)
3. Borchardt, M.: An exact penalty approach for solving a class of minimization problems with boolean variables. Optimization 19(6), 829–838 (1988)
4. Burer, S., Vandenbussche, D.: Globally solving box-constrained nonconvex quadratic programs with semidefinite-based finite branch-and-bound. Comput. Optim. Appl. (2007). doi:10.1007/s10589-007-9137-6
5. Bussieck, M.R., Drud, A.S.: SBB: a new solver for mixed integer nonlinear programming, OR 2001 presentation. http://www.gams.com/presentations/or01/sbb.pdf (2001)
6. Cela, E.: The Quadratic Assignment Problem: Theory and Algorithms. Kluwer Academic, Dordrecht (1998)
7. Del Gatto, A.: A subspace method based on a differential equation approach to solve unconstrained optimization problems. Ph.D. thesis, Management Science and Engineering Department, Stanford University, Stanford (2000)
8. Dembo, R.S., Steihaug, T.: Truncated-Newton algorithms for large-scale unconstrained optimization. Math. Program. 26, 190–212 (1983)
9. Du, D.-Z., Pardalos, P.M.: Global minimax approaches for solving discrete problems. In: Recent Advances in Optimization: Proceedings of the 8th French–German Conference on Optimization, Trier, 21–26 July 1996, pp. 34–48. Springer, Berlin (1997)
10. Duran, M.A., Grossmann, I.E.: An outer approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 36, 307–339 (1986)
11. Fiacco, A.V., McCormick, G.P.: Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Wiley, New York/Toronto (1968)
12. Forsgren, A., Gill, P.E., Murray, W.: Computing modified Newton directions using a partial Cholesky factorization. SIAM J. Sci. Comput. 16, 139–150 (1995)
13. Forsgren, A., Murray, W.: Newton methods for large-scale linear equality-constrained minimization. SIAM J. Matrix Anal. Appl. 14(2), 560–587 (1993)
14. Ge, R., Huang, C.: A continuous approach to nonlinear integer programming. Appl. Math. Comput. 34, 39–60 (1989)
15. Gill, P.E., Murray, W.: Newton-type methods for unconstrained and linearly constrained optimization. Math. Program. 7, 311–350 (1974)
16. Gill, P.E., Murray, W., Wright, M.: Practical Optimization. Academic Press, London (1981)
17. Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins University Press, Baltimore/London (1996)
18. Grossmann, I.E., Viswanathan, J., Vecchietti, A., Raman, R., Kalvelagen, E.: GAMS/DICOPT: a discrete continuous optimization package (2003)
19. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer, Berlin (1996)
20. Leyffer, S.: Deterministic methods for mixed integer nonlinear programming. Ph.D. thesis, Department of Mathematics & Computer Science, University of Dundee, Dundee (1993)
21. Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0-1 optimization. SIAM J. Optim. 1, 166–190 (1991)
22. Mawengkang, H., Murtagh, B.A.: Solving nonlinear integer programs with large-scale optimization software. Ann. Oper. Res. 5, 425–437 (1985)
23. Mitchell, J., Pardalos, P.M., Resende, M.G.C.: Interior point methods for combinatorial optimization. In: Handbook of Combinatorial Optimization, vol. 1, pp. 189–298 (1998)
24. Moré, J.J., Wu, Z.: Global continuation for distance geometry problems. SIAM J. Optim. 7, 814–836 (1997)
25. Murray, W., Ng, K.-M.: Algorithms for global optimization and discrete problems based on methods for local optimization. In: Pardalos, P., Romeijn, E. (eds.) Heuristic Approaches. Handbook of Global Optimization, vol. 2, pp. 87–114. Kluwer Academic, Boston (2002), Chapter 3
26. Ng, K.-M.: A continuation approach for solving nonlinear optimization problems with discrete variables. Ph.D. thesis, Management Science and Engineering Department, Stanford University, Stanford (2002)
27. Pan, S., Tan, T., Jiang, Y.: A global continuation algorithm for solving binary quadratic programming problems. Comput. Optim. Appl. 41(3), 349–362 (2008)
28. Pardalos, P.M.: Construction of test problems in quadratic bivalent programming. ACM Trans. Math. Softw. 17(1), 74–87 (1991)
29. Pardalos, P.M.: Continuous approaches to discrete optimization problems. In: Di Pillo, G., Giannessi, F. (eds.) Nonlinear Optimization and Applications, pp. 313–328. Plenum, New York (1996)
30. Pardalos, P.M., Rosen, J.B.: Constrained Global Optimization: Algorithms and Applications. Springer, Berlin (1987)
31. Pardalos, P.M., Wolkowicz, H.: Topics in Semidefinite and Interior-Point Methods. Am. Math. Soc., Providence (1998)
32. Sahinidis, N.V.: BARON global optimization software user manual. http://archimedes.scs.uiuc.edu/baron/manuse.pdf (2000)
33. Tawarmalani, M., Sahinidis, N.V.: Global optimization of mixed-integer nonlinear programs: A theoretical and computational study. Math. Program. 99(3), 563–591 (2004)
34. Zhang, L.-S., Gao, F., Zhu, W.-X.: Nonlinear integer programming and global optimization. J. Comput. Math. 17(2), 179–190 (1999)