
Department of Information Technology

Parallel Optimization in Matlab

Joakim Agnarsson, Mikael Sunde, Inna Ermilova

Project in Computational Science: Report

January 2013

PROJECT REPORT


Contents

1 Introduction
  1.1 Hot Rolling
  1.2 Optimization
  1.3 Goal
2 Theory
  2.1 Gradient based methods
    2.1.1 Gradient based local optimization methods
    2.1.2 Global optimization using fmincon
  2.2 Simulated annealing
  2.3 Genetic algorithms
  2.4 Pattern search
    2.4.1 Pattern search: important options
3 Method of solving the optimization problem
  3.1 Matlab framework
    3.1.1 Matlab toolboxes
    3.1.2 Our implementation of optimization framework
  3.2 Approach to parameter selection
    3.2.1 MultiStart
    3.2.2 GlobalSearch
    3.2.3 Hybrid Simulated Annealing
    3.2.4 Genetic Algorithm
    3.2.5 Pattern Search
4 Results
  4.1 Gradient based methods
  4.2 Hybrid simulated annealing
  4.3 Genetic algorithms
  4.4 Pattern search
    4.4.1 Pattern search on a Windows-cluster
    4.4.2 Pattern search on a Linux-cluster
  4.5 Comparison of methods
5 Discussion
  5.1 Gradient based solvers
  5.2 Hybrid simulated annealing
  5.3 Genetic algorithms
  5.4 Pattern search
  5.5 Improving speedup
6 Conclusions
  6.1 Current state
  6.2 Future work


1 Introduction

In this report we explore different mathematical optimization methods applied to a production process called hot rolling. This is done by comparing accuracy, serial speed and parallel speedup for various optimization methods using Matlab's Optimization Toolbox, Global Optimization Toolbox and Parallel Computing Toolbox.

1.1 Hot Rolling

Hot rolling is a process in metalworking where metal slabs, blocks of metal, are processed into a product of suitable dimensions and material quality. This is done by heating the material in a furnace to make it malleable, passing it through various mills to shape the material, and then cooling it down under controlled conditions. In the hot rolling process the settings for the mills, such as the rolling speed, are referred to as the rolling schedule.

In production processes the goal is always to achieve an end-product with sufficient quality while minimizing the cost of the production. Performing physical tests to find the best rolling schedule is very time consuming and, most importantly, very expensive. Instead, as there are mathematical models that describe how the rolling schedule affects the material, it is possible to use computers to calculate the optimal rolling schedule using optimization.

1.2 Optimization

Optimization is a mathematical technique to find extreme values, without loss of generality a minimum, of a given objective function f(x), subject to some constraints on which coordinates x are acceptable. Such an optimization problem can be defined as in equation (1).

\min_x f(x),
\text{such that } G_i(x) = 0, \quad i = 1, \ldots, m_e,    (1)
G_i(x) \le 0, \quad i = m_e + 1, \ldots, m.

A point with the lowest objective value is called an optimizer and the corresponding objective value is called the optimal value; together they are the optimum. A point is called feasible if it satisfies all the constraints, and the set of all feasible points is called the feasible set or the feasible region.

For the hot rolling process, the optimization objectives, i.e. the objective functions f(x), depend on certain parameters x, such as rolling speed, transfer bar temperature and thickness of the metal slabs. Furthermore, some constraints are superimposed on the process, for instance the minimum or maximum temperature of the slabs entering the rougher mill, dimensions in some particular production machine and physical restrictions on the process.

One major issue with optimization is how to determine the difference between a local and a global optimizer. A local optimizer is a point which has the best objective value in some small region around that optimizer, while a global optimizer is the point which has the lowest objective value of all feasible points.

For discontinuous functions it is in general not possible to detect whether a local optimum is also a global optimum, and the hot rolling objective function may be discontinuous. Thus, in this report a global optimum will refer to the best local optimum found during all optimizations.

1.3 Goal

Our goal is to investigate Matlab's implemented global optimization methods regarding accuracy, serial speed and parallel speedup when applied to the hot rolling schedule provided by ABB. Accuracy is measured by the method's ability to consistently find the global minimum. These studies are important but preparatory to the long-term goal, which is to be able to perform online or real-time optimization, i.e. to optimize the hot rolling schedule during the industrial process.

2 Theory

Below we briefly summarize some of the most often used optimization techniques.

2.1 Gradient based methods

2.1.1 Gradient based local optimization methods

As the name suggests, gradient-based methods use first and second order derivatives, gradients and Hessians, to find local minima. Two important gradient-based methods for solving constrained nonlinear optimization problems as defined in (1) are Sequential Quadratic Programming (SQP) and interior-point methods (IP). They both reduce this quite complicated problem into an easier set of sub-problems and solve them successively until an optimum is found.

Matlab has two gradient-based global optimization solvers: MultiStart and GlobalSearch. These two methods make use of a function in Matlab called fmincon that finds a local minimum. fmincon iterates from a given starting point towards a local minimum using one of four implemented optimization techniques: trust-region-reflective, sqp, active-set and interior-point.
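As an illustration of how one of these techniques is selected, the sketch below shows a bare-bones constrained fmincon call. The objective, constraint handles and data are placeholders, and the option syntax follows the optimset interface of the Matlab release used in this report.

objective = @(x) sum((x - 1).^2);            % placeholder objective f(x)
nonlcon   = @(x) deal(x(1)*x(2) - 1, []);    % placeholder: one inequality, no equalities
x0 = [0.5; 0.5];                             % starting point
lb = [0; 0];   ub = [10; 10];                % bounds

opts = optimset('Algorithm', 'sqp');         % or 'active-set', 'interior-point'

[xmin, fval, exitflag] = fmincon(objective, x0, [], [], [], [], lb, ub, nonlcon, opts);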


We do not consider trust-region-reflective in this report since the user has to supply the solver with a pre-defined gradient for the objective function and the constraints, which we do not have in this case.

SQP methods solve a sequence of Quadratic Programs (QP) to find the descent direction, hence the name sequential quadratic programming. Optimality is reached when the Karush-Kuhn-Tucker (KKT) conditions, (2), are fulfilled.

\nabla f(x^*) + \sum_{i=1}^{m_e} \mu_i \nabla G_i(x^*) + \sum_{i=m_e+1}^{m} \lambda_i \nabla G_i(x^*) = 0,
G_i(x^*) = 0, \quad i = 1, \ldots, m_e,
G_i(x^*) \le 0, \quad i = m_e + 1, \ldots, m,    (2)
\lambda_i \ge 0, \quad i = m_e + 1, \ldots, m,
\lambda_i G_i(x^*) = 0, \quad i = m_e + 1, \ldots, m.

The Karush-Kuhn-Tucker conditions are first order necessary conditions for a point to be an optimum and describe the relation between the gradient of the objective function and the gradients of the active constraints. If the optimum lies in the interior of the feasible space, the first order necessary condition is simply the same as in the unconstrained case, namely that the gradient of the objective function is equal to zero.

Matlab's function fmincon uses two variants of the SQP method, called active-set and sqp. These two algorithms are very similar and make use of a quasi-Newton method to iterate toward a solution that satisfies the Karush-Kuhn-Tucker equations. It is called a quasi-Newton method since the Hessian is not computed exactly but rather approximated using an update scheme, in this case a BFGS update (see appendix). This approximation is made since the computational cost of computing the Hessian directly is often too high. Both active-set and sqp ensure that the Hessian is positive definite by initializing the BFGS method with a positive definite matrix. This property of the Hessian is maintained by the algorithm using different matrix operations during the BFGS updates; for more information see MathWorks Optimization Toolbox User's Guide (2012). The condition that the Hessian be positive definite, together with the first order optimality conditions described by the Karush-Kuhn-Tucker equations, are necessary and sufficient conditions for x to be a local minimum.

At each iteration, the gradient is computed using finite differences and the Hessian is updated. This information is then used to set up a Quadratic Program (QP) that is minimized to find the descent direction; the QP at the point x_k is stated as follows.


\min_{d \in \mathbb{R}^n} \; \tfrac{1}{2} d^T H_k d + \nabla f(x_k)^T d,
\text{s.t. } \nabla G_i(x_k)^T d + G_i(x_k) = 0, \quad i = 1, \ldots, m_e,    (3)
\nabla G_i(x_k)^T d + G_i(x_k) \le 0, \quad i = m_e + 1, \ldots, m,

where H_k = ∇²L(x, λ) is the Hessian of the Lagrangian function. The QP is then solved using an active-set method to find a descent direction d. The active-set method transforms the problem to only work with active constraints and then moves between constraints to find a local minimum; see MathWorks Optimization Toolbox User's Guide (2012) for further reading. The nonlinear constraints are linearized by a first order Taylor expansion around the point x_k. The step length α_k is then determined such that a merit function produces a better value at the next point x_{k+1}; the Lagrange multipliers λ_{i,k+1} are also updated at that point.

In summary, sqp and active-set follow these three steps:

1. Compute gradient and update the Hessian

2. Set up and solve QP to find descent direction

3. Perform a line search with merit function to find a proper step length

As mentioned, active-set and sqp work mostly in the same way; nevertheless, they differ from each other in some important aspects. First of all, sqp only takes steps in the region constrained by the bounds, while active-set can take intermediate infeasible steps outside the bounds. Note that if the objective function is complex or undefined outside the bounds, it is preferable to never go outside the bounds, as any such point would yield an unusable value. Furthermore, sqp uses some look-ahead techniques. If for some step length the objective function returns NaN, Inf or a complex value, then the algorithm reduces the step length and tries again. In addition to these differences, sqp uses more efficient linear algebra routines that both require less memory and are faster than the routines used in the active-set method.

Finally, sqp has another approach when constraints are not satisfied. The approach is to combine the objective function and relaxed constraints into a merit function that is the sum of the objective function and the amount of infeasibility of the constraints. Thus, if the new point reduces the objective function and the infeasibility, the merit value will decrease. Also, if the next point is infeasible with respect to the nonlinear constraints, sqp makes a better approximation of the nonlinear constraints using a second order Taylor expansion around the current point. The downside with these approaches is that the problem size becomes somewhat larger than in the active-set algorithm, which might slow down the solver. See MathWorks Optimization Toolbox User's Guide (2012) for more information.

The interior-point method, also called the barrier method, successively solves a sequence of approximate minimization problems. The approach is to rearrange the original problem, defined in (1), using a barrier function, usually a logarithmic or inverse function, and then solve this new merit function for decreasing µ. fmincon uses a logarithmic barrier function and sets up the following problem:

\min_{x,s} f_\mu(x, s) = \min_{x,s} \Big( f(x) - \mu \sum_i \ln(s_i) \Big),
\text{s.t. } G_i(x) = 0, \quad i = 1, \ldots, m_e,    (4)
G_i(x) + s_i = 0, \quad i = m_e + 1, \ldots, m,

where the s_i are so-called slack variables, used to transform the inequality constraints into equality constraints. The minimum of this approximate problem approaches the minimum of the original problem as µ decreases.

The problem defined in (4) is solved by defining the Lagrange function of f_µ(x, s) and then solving the corresponding Karush-Kuhn-Tucker equations. The fmincon algorithm interior-point first tries to solve these equations directly by Cholesky factorization. There are cases when a direct step might be inappropriate, e.g. when the approximate problem is not locally convex around the current iterate. Equation (4) is in this case solved by a conjugate gradient method. The approach is to minimize a quadratic approximation to the approximate problem in a trust region subject to linearized constraints. Conjugate gradient methods might work better when solving large and sparse systems: factoring these types of matrices is often too time-consuming and results in heavy memory usage. Since no matrix inversions need to be computed and stored, the conjugate gradient method outperforms direct methods for large and sparse systems of equations; see MathWorks Optimization Toolbox User's Guide (2012) for more information.

The interior-point method, in contrast to active-set and sqp, generates a sequence of strictly feasible iterates that converge to a solution from the interior of the feasible region.

Matlab's Optimization Toolbox User's Guide has some recommendations on when to use which algorithm. The first recommendation is to use the interior-point method, which is able to handle large, sparse problems as well as small, dense problems, i.e. it can implement both Large-Scale and Medium-Scale algorithms. It also satisfies bounds at all iterations and handles NaN and Inf results. If the problem is small, a better choice could be to run sqp or active-set, which are faster on smaller problems.

In Matlab the Large-Scale algorithm uses sparse matrix data structures instead of the naïve matrix data structure the Medium-Scale algorithm uses. Choosing the Medium-Scale algorithm could possibly lead to better accuracy but could also result in computations that are limited in speed due to many memory accesses (MathWorks Optimization Toolbox User's Guide 2012).

Gradient-based methods have some limitations. The most obvious issue is that they require the objective function and constraints to be continuous and have continuous first derivatives. If discontinuities exist, a gradient-free optimization method might be used to pass these discontinuities; a gradient-based method could then be used to quickly detect a local minimum. One could also introduce several starting points and then locate a minimum by approaching it from many different directions.

Matlab provides the user with an option to compute the gradient of the objective function and the constraints in parallel. The evaluations of the objective function and the constraints in the finite difference scheme are distributed between the processors; for more information about the parallel implementation see MathWorks Optimization Toolbox User's Guide (2012).
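A minimal sketch of enabling this option is shown below, assuming a local worker pool and placeholder objective/constraint handles and data (R2012-era matlabpool/optimset syntax; newer releases use parpool and optimoptions).

matlabpool('open', 4);                          % start 4 local workers

x0 = zeros(12, 1);                              % placeholder starting point
lb = -ones(12, 1);  ub = ones(12, 1);           % placeholder bounds

opts = optimset('Algorithm', 'active-set', ...
                'UseParallel', 'always');       % finite-difference gradients in parallel

[xmin, fval] = fmincon(@myObjective, x0, [], [], [], [], lb, ub, @myConstraints, opts);

matlabpool('close');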

2.1.2 Global optimization using fmincon

fmincon is only capable of finding and determining whether a point is a local minimum; it is not capable of determining whether a minimum is a global minimum. Matlab tries to locate a global minimum with gradient-based optimization techniques using two algorithms: MultiStart and GlobalSearch. These algorithms use many starting points and then call fmincon from these points to find the corresponding local minima. The global minimum is then chosen to be the point that has the lowest objective value and is feasible.

MultiStart is an easy and straightforward algorithm that initiates a local solver from a set of starting points and then creates a vector containing the found local minima, returning the best of these points as the global minimum. The algorithm goes as follows:

1. Generate starting points

2. Run fmincon from these starting points

3. Create a vector of the results and return the best value as global minimum.

Matlab uses starting points that are by default distributed uniformly within the bounds; MultiStart also handles user-supplied starting points. Note that fmincon is not the only local solver that MultiStart can use; other local solvers are fminunc, lsqcurvefit and lsqnonlin. Nevertheless, we consider only fmincon in this report since it is the only gradient-based local solver that handles constrained optimization problems.
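The sketch below shows how MultiStart is typically combined with fmincon through a problem structure and run over several starting points in parallel. The objective and constraint handles and the data are placeholders, and the syntax is the R2012-era createOptimProblem/matlabpool interface.

matlabpool('open', 4);

fminconOpts = optimset('Algorithm', 'active-set');
problem = createOptimProblem('fmincon', ...
    'objective', @myObjective, ...              % placeholder objective handle
    'x0',        zeros(12, 1), ...              % placeholder starting point
    'lb',        -ones(12, 1), 'ub', ones(12, 1), ...
    'nonlcon',   @myConstraints, ...            % placeholder nonlinear constraints
    'options',   fminconOpts);

ms = MultiStart('UseParallel', 'always');       % distribute starting points over workers
[xbest, fbest, flag, output, allMinima] = run(ms, problem, 16);  % 16 random starting points

matlabpool('close');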

GlobalSearch works in a slightly more complicated way. The starting points are generated by a scatter-search mechanism. GlobalSearch then tries to analyze these starting points and throw away points that are unlikely to generate a better minimum than the best minimum found so far. The algorithm performs as follows:

1. Run fmincon from x0

2. Generate trial points, generate a score function


3. Generate a set of trial points; among these, choose the one with the best score function value to be a "Stage 1" point, and run fmincon from it.

4. Initialize basins, counters and threshold. A basin is a part of the domain for which it is assumed that all points, when used as starting points, will converge to the same minimum. The basins are assumed to be spherical. The threshold is the smallest objective value from x0 or "Stage 1". Two types of counters are initialized: the number of consecutive points that lie within a basin of attraction, and the number of consecutive points that have a score function value larger than the threshold.

5. Main loop, where GlobalSearch examines a remaining trial point p from the list:

• Run fmincon if (i)-(ii) are true:

i. p is not in an existing basin, within some basin radius

ii. p has a lower score than the threshold

iii. p satisfies the bounds and/or the inequality constraints

• If fmincon runs:

i. Reset the counters for the basins and the threshold

ii. If fmincon converges, update the GlobalOptimSolution vector

iii. Set a new threshold to the score value at p and a new radius for this basin

• If fmincon does not run:

i. Increment the counter for every basin containing p and set all others to 0

ii. Increment the counter for the threshold if the score of p is larger than the threshold, and set all others to 0

iii. If the basin counter is too large, decrease the basin radius and reset the counter. If the threshold counter is too large, increase the threshold and set the counter to 0

6. Create the GlobalOptimSolution vector of optimum points.

GlobalSearch tries to determine in advance whether a point will result in an improvement of the best minimum found so far. This analysis is done by checking the current point's score value and also whether the point already lies within an existing basin. The score value is a merit function that punishes constraint violations; thus the score of a feasible point is simply the value of the objective function. The radius of a basin is defined to be the distance between a starting point and the point to which fmincon converged. If no point has a better score value or lies outside a basin, the score threshold is increased and the radius of existing basins decreased until a point is found. GlobalSearch will stop when no points are left to examine or when a certain user-defined maximum time has been reached. See MathWorks Global Optimization Toolbox User's Guide (2012) for more information about how GlobalSearch is implemented.

While MultiStart provides the user with a choice regarding the local solver, GlobalSearch uses only fmincon. The most important difference is that Matlab has implemented a parallel version of MultiStart. MultiStart distributes its starting points to multiple processors that then run fmincon locally and return the result. This parallelism is not implemented in the GlobalSearch algorithm; thus GlobalSearch works best on single-core machines while MultiStart works better on a multi-core machine (MathWorks Global Optimization Toolbox User's Guide 2012). In our case we have tried to implement a naïve parallelization of GlobalSearch by distributing the trial points between different cores and then running GlobalSearch locally. One could also run MultiStart and GlobalSearch using the parallel implementation of fmincon, computing the gradients in parallel.

2.2 Simulated annealing

Simulated annealing, SA, described by Kirkpatrick, Gelatt & Vecchi (1983), is a stochastic optimization method which takes inspiration from the physical process of annealing: heating and cooling a material under controlled circumstances to reduce defects. A general description of the SA algorithm is shown in Fig. 1.

input : Objective function f(x) : R^n → R
        Annealing function fAnn(T) : R → R^n
        Acceptance function fAcc(T, Δf) : R^2 → [0, 1]
        Temperature function fT(k) : N0 → [0, ∞)
        Initial temperature T0 ∈ R
        Initial point x0 ∈ R^n
output: A point x_n ∈ R^n

k = 0
while Stopping criteria are not met do
    // Generate a trial point and test it
    x_t = x_i + fAnn(T)
    if f(x_t) < f(x_i) then
        x_{i+1} = x_t
    else
        x_{i+1} = x_t with probability fAcc(T, f(x_i) − f(x_t))
        x_{i+1} = x_i otherwise
    end
    // Check for reannealing
    if reannealing then
        k = 0
    else
        k = k + 1
    end
    // Update the temperature
    T = fT(k, T0)
end

Figure 1: Algorithm for simulated annealing for some general objective function f, annealing function fAnn, acceptance function fAcc, temperature function fT, initial temperature T0 and initial point x0.

As Fig. 1 illustrates, SA stores a current point x_i and creates a trial point x_t, given by interpreting the stochastic annealing function fAnn as a step from x_i. The trial point is accepted as the next point x_{i+1} with probability 1 if it has a lower objective value than the current point, and with some probability, defined by the acceptance function fAcc, if it has a higher objective value. It is important to note that the step size and the probability to accept worse points both decrease with the temperature, and that the temperature decreases with increasing annealing parameter k.

Simulated annealing may thus be described as one type of direct search method and as such will have a slow asymptotic convergence rate (Tamara G. Kolda, Robert Michael Lewis & Virginia Torczon 2003), but it requires no information about the gradient or higher order differentials of the objective function, meaning it is more robust to non-smooth or discontinuous objectives than gradient-based methods. It should however be noted that there are significant problems for SA when applying penalty methods (Pedamallu & Ozdamar 2008), and as such the SA algorithm implemented in MATLAB is only capable of handling bounds constraints. However, both of these issues might to some degree be alleviated by implementing a hybrid solver and adjusting the stopping criteria. A hybrid solver refers to running a second solver after the primary solver, for instance running fmincon after simulated annealing. Matlab has built-in support for this functionality and it will be referred to as hybrid simulated annealing, HSA.

MATLAB's implementation of simulated annealing is a part of the Global Optimization Toolbox (GADS) and allows only bounds constraints. To satisfy other constraints a constrained hybrid function can be used, such as fmincon. As the optimization problem studied in this report has both linear and nonlinear constraints, we will only consider hybrid simulated annealing (HSA) making use of fmincon. For detailed information on how to set the options and call the functions, refer to MathWorks Global Optimization Toolbox User's Guide (2012).

When adjusting the settings for simulated annealing, it is important to configure the temperature in an efficient way. If the point generated by the annealing function is outside the bounds, each coordinate will be separately adjusted so that the trial point fulfills the bounds. This means that if the temperature is too large, few of the trial points will actually fall in the interior of the domain until the temperature has been decreased. If the domain is very elongated, it may be of interest to transform the domain so that the bounds are normalized.

From a high-performance computing and parallelization viewpoint it is important to note that each execution of simulated annealing in the form given in Fig. 1 cannot be parallelized. A naïve method of using parallel resources is thus to let each computational core run a unique instance of simulated annealing, gather up the results from each core and choose the solution with the lowest objective value.
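A sketch of this naïve parallelization is shown below: each worker runs an independent simulannealbnd instance and the best result is kept. The objective handle, bounds and option values are placeholders, and the syntax is R2012-era.

matlabpool('open', 8);

nRuns = 16;
lb = zeros(12, 1);  ub = ones(12, 1);            % placeholder bounds
xs    = cell(nRuns, 1);
fvals = inf(nRuns, 1);

saopts = saoptimset('InitialTemperature', 0.03); % placeholder temperature setting

parfor i = 1:nRuns
    x0 = lb + rand(size(lb)) .* (ub - lb);       % random start within the bounds
    [xs{i}, fvals(i)] = simulannealbnd(@myObjective, x0, lb, ub, saopts);
end

[fbest, ibest] = min(fvals);                     % keep the best of the independent runs
xbest = xs{ibest};

matlabpool('close');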

2.3 Genetic algorithms

Genetic algorithms, GA, take inspiration from the process of natural selection by interpreting the objective function as a measure of fitness and the coordinates of each point as the genes of an individual. Each iteration of the method is called a generation and consists of a set of individuals. Every new generation is created by randomly selecting individuals, called parents, from the current generation and in some fashion creating new individuals, children, from the genes of the parents. Three common ways to do this are by elitism, mutation and crossover. Elite children are not changed from one generation to the next; mutation children introduce a random change in the genes of a single parent; crossover children randomly combine the genes of two parents according to some algorithm, called crossover or recombination. See Fig. 2 for a description of GA incorporating these three types of children.

input : Fitness function f(x) : R^n → R
        Rank function fR(f(x)) : R → R
        Selection function (possibly stochastic) fS : → R^n
        Mutation function (stochastic) fM : ∅ → R^n
        Recombination function fRec(x_a, x_b) : R^2n → R^n
        Population size n ∈ N, n ≥ 2
        Number of elite children E ∈ N+
        Number of mutation children M ∈ N0
output: A point x_n ∈ R^n

while Stopping criteria not met do
    // Evaluate current generation
    for each individual i do
        evaluate the fitness function
    end
    for each individual i do
        evaluate the rank function
    end
    // Create next generation
    for each elite child do
        from the parents that have not yet been selected:
            select the parent x_a with the best rank
            child = x_a
    end
    for each mutation child do
        select a parent x_b with the selection function
        child = x_b + fM
    end
    for each crossover child do
        select two parents x_c, x_d with the selection function
        child = fRec(x_c, x_d)
    end
    Set the new generation as the current generation
end

Figure 2: Algorithm for GA implementing elitism, mutation and crossover for some general fitness/objective function f.

Elite children are always chosen as the individuals with the best objective value. This means that as long as there is at least one elite child, the best solution found so far will always be kept to the next generation, and the sequence of the best solution in each generation will be a non-increasing sequence of numbers.

When running GA it is important to make sure that the genetic diversity of the population is large. If the recombination function is limited to simple crossover, picking each gene from a random parent, then the only way to introduce entirely new genes into the population is by mutation. This means that if a global minimum is not reachable from the initial set of genes, then the convergence of the method to that global optimum is at best slow, which is consistent with the description of direct search methods given by Tamara G. Kolda et al. (2003).

A simple method of parallelization of GA, the one implemented in MATLAB, is to evaluate the fitness function of the different individuals of a generation in parallel; this produces an increase in performance if the fitness function is sufficiently computationally heavy.

MATLAB's implementation of GA uses penalty functions to handle nonlinear constraints. By setting penalties for violating the constraints and successively increasing them, the solution found will hopefully approach a feasible global minimum. This is done by solving a series of subproblems where each problem has a higher penalty value than the last. As this requires several generations for each subproblem, GA may require a substantially higher number of total generations to converge when using nonlinear constraints. Furthermore, when talking about generations for such problems, one often refers to each subproblem as one generation of the complete problem, even though each subproblem has several generations of GA in it.

The implementation also allows flexibility in choosing the selection, mutation and recombination functions (the latter being called crossover in MATLAB) as well as several extra options, such as multiple populations, migration and multi-objective optimization. For more information, refer to the MathWorks Global Optimization Toolbox User's Guide (2012).
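The sketch below shows how such a ga run is set up with a chosen crossover function, nonlinear constraints and parallel fitness evaluation. The handles, bounds and option values are placeholders, and the syntax is the R2012-era gaoptimset interface.

matlabpool('open', 8);

nvars = 12;                                       % number of genes (coordinates)
lb = zeros(nvars, 1);  ub = ones(nvars, 1);       % placeholder bounds

gaopts = gaoptimset('CrossoverFcn',      @crossoverheuristic, ...
                    'CrossoverFraction', 0.8, ...
                    'EliteCount',        2, ...
                    'Generations',       5, ...
                    'UseParallel',       'always');   % evaluate fitness in parallel

[xbest, fbest] = ga(@myObjective, nvars, [], [], [], [], lb, ub, @myConstraints, gaopts);

matlabpool('close');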

2.4 Pattern search

Pattern search represents the family of direct search algorithms for optimization of different functions. Direct search methods do not use any derivatives, which means that these methods are very useful when the objective function is not differentiable. The disadvantage is that these methods can be computationally very expensive; unlike gradient-based methods they do not know a direction in which to search for lower objective values; instead they test multiple points in the vicinity of the current point, possibly leading to iterations where little or no improvement is seen. Pattern search chooses its direction according to the specified Poll Method. Polling means "questioning" or "picking" the right points according to a chosen method/algorithm. The simplest algorithm for pattern search is shown in Fig. 3.


Figure 3: Algorithm of Pattern Search.

2.4.1 Pattern search: important options

Pattern search has many options which can affect the performance and the result of a single computation. First of all, there are three Poll Methods for pattern search:

- GSS (Generalized Set Search)

- GPS (Generalized Pattern Search)

- MADS (Mesh Adaptive Direct Search)

Every Poll Method has two basis sets: PositiveBasisNp1 and PositiveBasis2N. These basis sets create a pattern for the search. You can find more information about the different basis sets in MathWorks Global Optimization Toolbox User's Guide (2012).

So, in the simplest algorithm the Poll Step refers to one of these Poll Methods. Secondly, pattern search has the following six search methods:

- GSS (Generalized Set Search);

- GPS (Generalized Pattern Search);

- MADS (Mesh Adaptive Direct Search);

- searchlhs (search using Latin Hypercube Algorithm);

- searchga (search using Genetic Algorithm);


- searchneldermead (search with Nelder-Mead algorithm).

The detailed description of these algorithms is given in MathWorks Global Optimization Toolbox User's Guide (2012). The only detail we mention is that it wasn't possible to use searchneldermead, because it cannot handle constraints and our objective function was given with constraints.

If we specify both a poll and a search method, the algorithm looks like the simplest pattern search algorithm shown in Fig. 3, with the addition of a search step before the poll step. The search step, like the poll step, attempts to find a better point than the current point, and if the search method manages to improve the solution, then the poll step is skipped for that iteration. Note that if the same method is used in the search and poll step, the result from both steps would be identical, so the poll step is skipped. As a default setting we have specified the poll method GPSPositiveBasis2N. To do a search one has to specify a search method, because there is no default setting for this.

The third important option is 'Complete Poll'. As a default setting we had 'Complete Poll' disabled ('off'), which means that the algorithm stops polling as soon as it finds an objective value that is less than that of the current point. When this happens the poll is called successful and the found point becomes the start point of the next iteration (MathWorks Global Optimization Toolbox User's Guide 2012). If 'Complete Poll' is enabled ('on'), the chosen algorithm computes the values of the objective function at all mesh points. The method then compares the smallest of these values to the value at the current point. If a mesh point has the smallest value, the poll is called successful (MathWorks Global Optimization Toolbox User's Guide 2012).

The fourth important option is 'TolBind' (binding tolerance), which specifies the tolerance for linear and nonlinear constraints. The default value is 'TolBind'=1e-3. However, with the default binding tolerance patternsearch could find strictly infeasible points with significantly lower objective value than any truly feasible point, leading to results which may be misinterpreted. A large binding tolerance also increases the region which may be searched, possibly increasing the time for finding a solution from inside the domain, as more iterations may be required to find the optimum. Our results in the tables and figures show the difference.

Pattern search takes care of nonlinear constraints by formulating a subproblem that combines the objective function and the functions for the nonlinear constraints, using the Lagrangian and some penalty parameters. It is important to note that the nonlinear constraints are handled separately from the linear constraints and bounds. At every iteration a new solution of the new subproblem is obtained (MathWorks Global Optimization Toolbox User's Guide 2012).

By specifying the use of parallelism inside the pattern search algorithm, the patternsearch function will compute the values of the objective function and the constraint function in parallel. To learn more, read MathWorks Global Optimization Toolbox User's Guide (2012).
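The sketch below collects the options discussed in this section into one patternsearch call. The handles, starting point and bounds are placeholders, and the psoptimset names reflect the R2012-era interface.

x0 = zeros(12, 1);                                 % placeholder starting point
lb = -ones(12, 1);  ub = ones(12, 1);              % placeholder bounds

psopts = psoptimset('PollMethod',   'GPSPositiveBasis2N', ...  % default poll method used here
                    'SearchMethod', @searchlhs, ...            % Latin hypercube search step
                    'CompletePoll', 'off', ...
                    'TolBind',      1e-3, ...                  % binding tolerance
                    'UseParallel',  'always');                 % parallel objective/constraint evaluation

[xbest, fbest] = patternsearch(@myObjective, x0, [], [], [], [], lb, ub, @myConstraints, psopts);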

3 Method of solving the optimization problem

3.1 Matlab framework

3.1.1 Matlab toolboxes

There are a number of different toolboxes available in Matlab; among these are the Global Optimization Toolbox (GADS) and the Parallel Computing Toolbox (PCT).

GADS adds support for a number of different global optimization methods; see Table 1 for a list of the methods used in this report.

Table 1: The Matlab implementations of the used global optimization methods.

Optimization method     Matlab implementation
Multi start             MultiStart, fmincon
Global search           GlobalSearch, fmincon
Simulated annealing     simulannealbnd
Genetic algorithm       ga
Pattern search          patternsearch

PCT lets a user start a local parallel environment in Matlab using a master-worker model. In such a local environment the user is limited to accessing the computational resources on the local computer, with a maximum of 12 workers. MATLAB Distributed Computing Server (MDCS) extends the functionality to allow clusters of any size for the computations.
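For reference, the sketch below shows how such a parallel environment is opened with the R2012-era matlabpool command (replaced by parpool in later releases); the MDCS configuration name is hypothetical.

matlabpool('open', 'local', 8);                % local PCT pool (at most 12 workers)
% matlabpool('open', 'myCluster', 64)          % hypothetical MDCS cluster configuration

% ... run the parallel optimization here ...

matlabpool('close');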

3.1.2 Our implementation of optimization framework

With the different global optimization methods introduced with GADS and the parallel computational possibilities with PCT and MDCS, there is no standardized framework which allows for simple switching between different optimization methods. By implementing such a framework we achieve two things. First, we make it easier for the user to call the different optimization functions in GADS in combination with the parallel environment from PCT and MDCS. Second, we introduce a structure to the program, which means that performance optimization can be done once by a programmer and all users benefit from that optimization, regardless of their knowledge of high performance computing.

The framework has two levels. The topmost level is globalOptimization, the function which the user will call. It defines a number of default parameters for the method-independent settings and supports calling the function either on a specific struct, containing all the data, or by using a standard string-value pair method of input. The second level is the algorithm-specific optimization methods. These are essentially shells for each global optimization method, making sure to call the corresponding GADS function with the correct syntax.

After the GADS optimization functions have finished, the data they provide is stored in a result struct. Using a struct makes the framework output-agnostic, meaning that any output entered into the result struct will be given to the user. This structure also makes the framework method-agnostic, meaning that new methods can be added just by implementing a method-specific optimization function with the correct name, i.e. a name and settings spelled to match the method being requested.
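A hypothetical usage sketch of this entry point is given below. Only the string-value-pair calling style and the struct-valued result are described in the text; the parameter names used here are assumptions for illustration, not the framework's documented interface.

% Hypothetical call to the framework's top-level function; the parameter
% names are illustrative assumptions.
result = globalOptimization('method',    'multistart', ...
                            'objective', @myObjective, ...
                            'nonlcon',   @myConstraints, ...
                            'nWorkers',  8);

% The result struct is output-agnostic: whatever the method-specific shell
% stores in it is returned to the user unchanged.
disp(result)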

3.2 Approach to parameter selection

It should be noted that the feasible point with the lowest objective value found during all our tests was found by all methods except GA and had an objective value of approximately 0.0036. This is taken as the global minimum.

3.2.1 MultiStart

Since both MultiStart and GlobalSearch use fmincon, an investigation was made to determine which of the optimization techniques used by fmincon was the most accurate and fastest for this particular problem. A run was made with MultiStart with nine randomly distributed starting points and a pre-defined starting point, ten starting points altogether. All three algorithms had the same tolerances as convergence criteria.

One clearly sees that both sqp and active-set outperform interior-point in speed. Furthermore, sqp and active-set were able to find the global minimum, which interior-point wasn't able to do. An attempt was made to optimize the interior-point algorithm by comparing the method when using large scale vs. medium scale and LDL factorization vs. the conjugate gradient method, but no improvement was detected. With this in mind we decided to omit the interior-point optimization technique from further studies.

The methods sqp and active-set were then analyzed in more detail to find the optimal setup for each of the methods. First of all, the tolerances were varied to investigate how much the termination tolerance on x, the termination tolerance on the function value and the termination tolerance on the constraint violation influence accuracy and speed. A run was performed that varied these tolerances between 10^-1 and 10^-10. All the runs found the global minimum but varied in time, as expected. Here one should keep in mind how many decimals are present in the given data, i.e. constraints and starting points. The tolerances should at least be of a higher accuracy than this particular data. The maximal number of decimals present in the given data was of the order 10^-3; we therefore chose to set the tolerances to 10^-4 and were able to gain some time while still converging to the global optimum, compare Fig. 4 and Fig. 5.


Figure 4: Comparison with MultiStart using sqp, active-set and interior-point with default settings.

When using sqp one can choose to normalize all constraints and the objective function, which would be appropriate for this particular problem since the feasible set is elongated in one of the dimensions. However, no improvement in accuracy or speed was noted using this feature with the sqp algorithm. Furthermore, we point out that while both active-set and sqp found the same global minimum, active-set was slightly faster.

Matlab also provides the user with a choice of which starting points MultiStart should run fmincon from. One could choose to run from all starting points, only from starting points that satisfy the bounds, or only from starting points that satisfy both the bounds and the inequality constraints. A test was made to investigate how these methods impacted the result. Restricting MultiStart to run fmincon only from starting points that are within bounds is only interesting when the starting points are generated in a different way than MultiStart does by default, namely distributing them uniformly within the bounds. However, when MultiStart was confined to only run fmincon from starting points that satisfy the bounds and inequality constraints, the run was considerably faster but the global minimum wasn't found.

Further studies on MultiStart were made to investigate how it scaled when using multiple processors. This study was made using first and foremost active-set, since it was the fastest optimization technique on this particular problem. Nevertheless, we kept sqp in the study since the results could be used as a comparison with active-set.

Running MultiStart from 16 starting points in parallel on 8 cores with 0 to 8 workers resulted in the speedup presented in Fig. 6. The sqp method seems to be the better of the two methods since it scales better than active-set. On the other hand, active-set is always faster than sqp, as shown in Fig. 7. Thus, active-set with tolerances set to 10^-4 is the fastest fmincon algorithm.

Figure 5: Comparing MultiStart using sqp, active-set and interior-point with tolerances set to 10^-4.

The speed and accuracy in locating a minimum depend heavily on which starting points MultiStart uses. If the starting point is far from the local minimum it will take more time, and if the starting point lies in a basin which doesn't contain the global minimum the accuracy will be worse. Since the starting points are generated in a stochastic process, a statistical study of MultiStart was made. MultiStart was run using 1500 randomly distributed starting points, and the result was then analyzed. The run showed that 10 % didn't converge at all; in Matlab this is indicated by a negative exit flag, in this case exit flag -2, which means that no feasible point was found. Furthermore, 60 % of the points did converge to a minimum but not to the global minimum. Finally, 30 % converged to the global minimum.

More importantly, we noticed that of all the starting points that converged to the global minimum, a majority needed very few iterations, see Fig. 8.

More specifically, if a starting point converges to the global minimum, the probability that it will need 5 iterations or less is approximately 0.6, see Fig. 9.

Given this result we limited the maximum number of iterations that fmincon was allowed to take to 5. This resulted in a decrease in time by a factor of 7 and also in a better speedup, see Figures 23 and 24.
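A sketch of this iteration cap, combined with the tolerance and algorithm choices above, is shown below (R2012-era optimset syntax; the objective and constraint handles and data remain placeholders).

opts = optimset('Algorithm', 'active-set', ...
                'TolX', 1e-4, 'TolFun', 1e-4, 'TolCon', 1e-4, ...
                'MaxIter', 5);                   % stop each local run after 5 iterations

problem = createOptimProblem('fmincon', 'objective', @myObjective, ...
    'x0', zeros(12, 1), 'lb', -ones(12, 1), 'ub', ones(12, 1), ...
    'nonlcon', @myConstraints, 'options', opts);

[xbest, fbest] = run(MultiStart('UseParallel', 'always'), problem, 16);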

A serial run of 4 points and a parallel run with 4 points per worker using 8 workers were profiled with Matlab's profiler. The serial run took 174.2 seconds, of which 172.4 were spent in the MEX-file FinalPassCalc. The parallel run took 199.5 seconds, of which 198.0 were spent either in FinalPassCalc or in a certain Java method (java.util.concurrent.LinkedBlockingQueue) which is used for Matlab's parallelism. From this data it is clear that the majority of time is spent in, and any future attempts to speed up the code should be focused on, FinalPassCalc.

Figure 6: Speedup plot for MultiStart using sqp and active-set on an eight-core machine.

Figure 7: Time taken for MultiStart using sqp and active-set on an eight-core machine.


Figure 8: Number of iterations needed for a starting point to find the global minimum.

Figure 9: Cumulative mass distribution showing the probability that a starting point will need a certain number of iterations or less to find the global minimum.

3.2.2 GlobalSearch

GlobalSearch was investigated using the same choice of parameters for fmincon as in MultiStart. In this case, the default settings weren't able to find the global minimum. We therefore increased the penalty threshold factor and the penalty basin factor until the global minimum was found using 200 starting points and 50 stage 1 points. Tuned to always find the global minimum, the parallel implementation was studied. There is no specific parallelism implemented in the GlobalSearch algorithm. It is however possible to compute the finite differences in parallel when fmincon is used. Trying this approach on a computer with 8 cores using 0 to 8 workers, some speedup was obtained, see Fig. 10.
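The sketch below shows a GlobalSearch configuration along the lines described above. The property names for the increased penalty factors are assumptions based on the GlobalSearch options (PenaltyThresholdFactor and BasinRadiusFactor); the handles, factor values and data are placeholders (R2012-era syntax).

fminconOpts = optimset('Algorithm', 'active-set', ...
                       'UseParallel', 'always');     % finite differences in parallel
problem = createOptimProblem('fmincon', 'objective', @myObjective, ...
    'x0', zeros(12, 1), 'lb', -ones(12, 1), 'ub', ones(12, 1), ...
    'nonlcon', @myConstraints, 'options', fminconOpts);

gs = GlobalSearch('NumTrialPoints',         200, ...
                  'NumStageOnePoints',       50, ...
                  'PenaltyThresholdFactor', 0.4, ...   % placeholder value, increased until the minimum was found
                  'BasinRadiusFactor',      0.4);      % placeholder value
[xbest, fbest] = run(gs, problem);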

The time for GlobalSearch to find the global minimum on 8 cores is by far outrun by the MultiStart method. Due to limitations in time, most of the effort was spent investigating and improving MultiStart since it gave the most promising results in both speed and accuracy.

Figure 10: GlobalSearch shows almost no speedup when using up to 8 workers.

3.2.3 Hybrid Simulated Annealing

First, recall that hybrid simulated annealing (HSA) refers to running two solvers in series: first simulated annealing (SA), then some other solver which can satisfy all necessary constraints. The hybrid solver for constrained problems is Matlab's fmincon and can be used by setting the HybridFcn option for SA in Matlab. It should, however, be noted that the framework implementation did not use this functionality, but instead called fmincon directly on simulated annealing's solution, so as to be able to get more detailed data for analysis. This design choice gives identical solutions and the difference in time taken between the two choices is insignificant.

An important issue that might influence the performance of the methods is the length of the interval each coordinate is bounded in. From Table 2 it is evident that the domain is very elongated in the 11th coordinate.

Table 2: The magnitude of the difference between upper and lower bounds varies by a large amount.

Coordinate   Upper bound   Lower bound   Difference
1            0.036         0.003         0.033
2            0.036         0.003         0.033
3            0.036         0.003         0.033
4            0.036         0.002         0.034
5            0.036         0.002         0.034
6            2.762         2.260         0.502
7            3.020         2.471         0.549
8            3.258         2.666         0.592
9            3.597         2.943         0.654
10           3.850         3.150         0.700
11           15.000        -15.000       30.000
12           0.167         0.137         0.030

This can be dealt with in two ways. The first is to adjust the initial scalar temperature such that the annealing steps taken are small enough compared to the smallest constraint. The second is to normalize the annealing step with respect to the bounds, which Matlab supports by setting a vector-valued temperature. To compare these two methods, a sweep over the normalized initial temperature factor was performed to find the optimal value for the default settings. Fig. 11 - 13 show that a factor of 1.5 was statistically significantly faster than any other setting and found the global minimum as often as all other settings, though without statistical significance.

Comparing this to using a non-normalized initial temperature of 0.03 showed a significantly lower time taken for the non-normalized initial temperature. For the probability of finding the global minimum, no statistical significance could be shown, but using the non-normalized initial temperature did find the minimum more times than using the normalized temperature did. To limit the scope of this project we therefore focus on using non-normalized initial temperatures.

Once the initial temperature is calibrated, the temperature function and the annealing function are the primary remaining parts of SA. Using the initial temperature 0.03 for all combinations of these functions showed that, among the combinations that found the global minimum the most times, the combination of @annealingboltz and @temperatureexp showed a statistically significantly lower time of execution.

Limited to a single combination of annealing and temperature function, it is not prohibitive to perform a more detailed sweep over the initial temperature. Fig. 14 and Fig. 15 show that the already used initial temperature of 0.03 is likely to be the best choice, giving a probability of 0.15 to 0.47 of finding the global minimum.

Lastly, parallelization is implemented and 16 hybrid SA runs are executed in parallel using up to 8 workers. The attained speedup is shown in Figure 16. Measurements of the run statistics are shown in Table 3. The probability of finding the global minimum was 0.76(12) using a 95% confidence interval.

Finally, it is worth mentioning that the settings for the hybrid solver, as is reasonable, strongly affect both the accuracy and the speed of the execution. Using the results of section 3.2.1 as a guideline, active-set was chosen as the method for fmincon. It was also found that limiting the magnitude of the first few steps was highly beneficial. See Table 4 for the hybrid solver settings.


Figure 11: The probability of finding the global minimum with different normalized temperature factors has not been statistically shown to be better for any one choice.

Table 3: Measurements of the run time for hybrid SA using the final settings.

16 hybrid SA runs                 Serial   Parallel (8 workers)
Mean time (s)                     4439     845
Confidence interval (s)           ±64      ±37
Standard deviation of time (s)    232      102

Table 4: Settings used for fmincon as a hybrid solver.

Parameter                 Choice
Algorithm                 active-set
RelLineSrchBnd            0.000001
RelLineSrchBndDuration    3
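Collecting the choices of this section, the sketch below runs SA with Boltzmann annealing, exponential temperature decay and initial scalar temperature 0.03, and then refines the result with an fmincon call using the Table 4 settings (the framework calls fmincon directly rather than through HybridFcn). Handles, bounds and the starting point are placeholders; option syntax is R2012-era.

lb = zeros(12, 1);  ub = ones(12, 1);              % placeholder bounds
x0 = lb + 0.5 .* (ub - lb);                        % placeholder starting point

saopts = saoptimset('AnnealingFcn',       @annealingboltz, ...
                    'TemperatureFcn',     @temperatureexp, ...
                    'InitialTemperature', 0.03);

[xsa, fsa] = simulannealbnd(@myObjective, x0, lb, ub, saopts);   % bounds-only SA stage

hybridopts = optimset('Algorithm', 'active-set', ...
                      'RelLineSrchBnd', 1e-6, ...                % Table 4: relative line-search bound
                      'RelLineSrchBndDuration', 3);              % Table 4: applied for 3 iterations

% Constrained refinement started from SA's solution (linear constraints omitted here).
[xbest, fbest] = fmincon(@myObjective, xsa, [], [], [], [], lb, ub, @myConstraints, hybridopts);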


Figure 12: A closer examination around normalized temperature factor 1.4 indicates that a factor of 1.4 - 1.6 is statistically significantly more likely to find the minimum than a factor of 1.2.

Figure 13: A closer examination around normalized temperature factor 1.4 indicates that a factor of 1.5 is statistically significantly faster than other settings.


Figure 14: It is likely, though not completely statistically proven, that an initial scalar temperature of 0.03 is on average faster than most other choices from 0.005 to 0.08.

Figure 15: An initial scalar temperature of 0.03 or 0.055 shows a higher probability of finding the global minimum than most other choices in the interval 0.005 to 0.08.


Figure 16: When running 8 hybrid SA runs on 1 to 8 workers the speedup is significant but not linear.


3.2.4 Genetic Algorithm

For genetic algorithms the creation of children is of high importance. Choosing how to generate new mutation or crossover children can strongly affect how efficient the algorithm is. Consider for instance an unconstrained 2D objective function which has low, negative values in a circular trench around the origin and quickly approaches 0 as the distance from the trench increases. In such a case, choosing crossover children by picking coordinates from each parent at random would generally not create good children from good parents; a large part of the crossover children would be created in vain, requiring a lot of work.

As the specifics of the objective function and constraints are unknown, we cannot tell a priori how the different crossover functions will work for the given problem. For this reason it is of interest to test the different built-in crossover functions. See Fig. 17 and Fig. 18 for the time taken and the objective value found for 5 generations of GA with nonlinear constraints. The exceptions are the three runs that converged in less than 250 seconds; these attempts converged in 3, 2 and 2 generations respectively, despite stringent convergence criteria.

It is clearly visible that the time taken for each generation does not vary much, while there may be some advantage for the objective value in making an informed selection of the crossover function. Heuristic crossover and intermediate crossover show a high probability of improvement, although the improvement is small and none of the runs came close to finding the global minimum of 0.0036.

Figure 17: The time taken per generation is not strongly influenced by the choice of crossover function.

For details on how these crossover functions work, refer to MathWorks Global Optimization Toolbox User's Guide (2012).

Figure 18: None of the default crossover functions with default settings manages to find the global optimum in 5 generations.

The next phase is to investigate how these results vary with the crossover fraction (the ratio of crossover children to mutation children) and, for heuristic crossover, the ratio, an additional setting that controls how crossover children are generated. See Fig. 19-21.
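For reference, these settings correspond to gaoptimset options roughly as in the sketch below; the numeric values are only examples, not the tuned choices:

    % Sketch only: heuristic crossover with an explicit ratio, together with
    % the crossover fraction and a 5-generation limit. Values are examples.
    gaOpts = gaoptimset('CrossoverFcn', {@crossoverheuristic, 1.2}, ...
                        'CrossoverFraction', 0.8, ...
                        'Generations', 5);
    % [x, fval] = ga(@objective, nvars, [], [], [], [], lb, ub, @nonlcon, gaOpts);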

Again, these figures show the objective value after 5 generations of GA. While the result has improved, it is still far from finding the global minimum even once. Furthermore, runs of up to 15 generations of GA show slow or insignificant improvement for further generations, up until a point where the time required is so much larger than for other reliable methods, such as MultiStart, that any further attempts to improve GA seem futile; a serial MultiStart can complete the optimization in the same time as serial GA can perform 10 generations, which is not enough for finding the global minimum. As a final nail in the coffin, Fig. 22 shows that GA does not even scale well, so for parallel execution GA will be comparatively even worse.

For this reason GA will not be considered in more depth in this report.


Figure 19: Varying the crossover fraction for intermediate crossover improves the solution but still does not find the global optimum.

Figure 20: Varying the crossover fraction for heuristic crossover improves the solution but still does not find the global optimum.


Figure 21: Varying the ratio for heuristic crossover improves the solution but still does not find the global optimum.

Figure 22: There is a measurable speedup for GA, but it is much less than linear.


3.2.5 Pattern Search

Pattern search has many options, which made it difficult to choose the best ones for our tests. Initially our aim was to test all options and find suitable settings for the objective function we received from ABB. After some tests on smaller problems we found that all configurations of the MADS methods were unsuitable for the objective function of interest: they were too slow, so it was not possible to test all settings for MADS as poll and search methods. The same conclusion was reached for pattern search using a genetic algorithm as the search method. Both MADS and searchga showed very poor performance and were not very accurate. The main interest of our experiments was therefore GSS and GPS as poll and search methods, and searchlhs as a search method. The cases were created in this way:

1. Running the default case with default settings, serially and in parallel.

2. Running other cases, choosing the tolerance for changes in x ('TolX'), setting 'CompletePoll' to 'on', specifying search and poll methods, using parallelism when more than 0 workers were specified, and, in some cases, setting the binding tolerance 'TolBind' to check whether the constraints are active.

The reason for this choice was time: there was not enough time to try everything, so for the fastest and most accurate cases we could try some extra settings to improve the result. A sketch of such a configuration is given below.
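As an illustration of how such a non-default case was set up, the following sketch combines the options named above in a psoptimset call; the particular values and method choices are examples rather than a record of every tested configuration:

    % Sketch only: a non-default patternsearch configuration of the kind
    % described above. Values and method choices are illustrative.
    psOpts = psoptimset('TolX', 1e-6, ...
                        'CompletePoll', 'on', ...
                        'PollMethod', 'GPSPositiveBasis2N', ...
                        'SearchMethod', @GPSPositiveBasis2N, ...
                        'TolBind', 1e-6, ...
                        'UseParallel', 'always');  % needs an open worker pool
    % [x, fval] = patternsearch(@objective, x0, [], [], [], [], lb, ub, @nonlcon, psOpts);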

4 Results

4.1 Gradient based methods

The best result in speed, accuracy and parallel speedup was obtained when using MultiStart with the active-set optimization technique when running fmincon. The convergence tolerances were slightly relaxed to 1e-4. Furthermore, the maximum number of allowed iterations was limited to 5. The resulting speedup and time taken are shown in Figures 23 and 24.
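A minimal sketch of this configuration, assuming the objective, nonlinear constraints, bounds and start point are available as objective, nonlcon, lb, ub and x0 (which tolerance parameters the "convergence criteria" refer to is our assumption):

    % Sketch only: restricted MultiStart with active-set fmincon, relaxed
    % tolerances and at most 5 iterations per start point.
    opts = optimset('Algorithm', 'active-set', ...
                    'TolFun', 1e-4, 'TolX', 1e-4, ...
                    'MaxIter', 5);
    problem = createOptimProblem('fmincon', 'objective', @objective, ...
        'x0', x0, 'lb', lb, 'ub', ub, 'nonlcon', @nonlcon, 'options', opts);
    ms = MultiStart('UseParallel', 'always');  % one start point per worker
    % [x, fval] = run(ms, problem, 20);        % e.g. 20 random start points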


Figure 23: Final speedup for MultiStart with active-set and maximum allowed iterations five on an eight core machine. Comparison is also made with the default case when no limit on maximum allowed iterations is imposed.

Figure 24: Final time plot for MultiStart, with active-set and maximum allowed iterations five, on an eight core machine. Comparison is also made with the default case when no limit on maximum allowed iterations is imposed.


4.2 Hybrid simulated annealing

The simulated annealing part of hybrid SA can be very fast compared to the other solvers because it does not use the computationally heavy nonlinear constraints shown in Fig. 34. It can also be moderately accurate, with a probability of 0.76(12) (95% confidence) of finding the global minimum when running 16 instances. The naive implementation of parallelism, letting each worker run one or more instances of hybrid SA, gives close to linear speedup when using two or more instances per worker. Using more instances would make the speedup approach linearity, but would increase the total time taken.
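The parallelization described here is essentially independent instances in a parfor loop, roughly as in this sketch (saOpts is assumed to be configured with fmincon as the hybrid function, as in Table 4):

    % Sketch only: nRuns independent hybrid SA instances, one or more per
    % worker, keeping the best result. Assumes an open worker pool
    % (matlabpool in R2012b) and that objective, x0, lb, ub and saOpts exist.
    nRuns = 16;
    xAll = cell(nRuns, 1);
    fAll = zeros(nRuns, 1);
    parfor k = 1:nRuns
        [xAll{k}, fAll(k)] = simulannealbnd(@objective, x0, lb, ub, saOpts);
    end
    [fBest, idx] = min(fAll);
    xBest = xAll{idx};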

4.3 Genetic algorithms

Genetic algorithms took approximately 60 seconds per generation and did not manage to find the global optimum even once when using 15 or fewer generations. The speedup on up to 8 workers was low, at most a factor of about two.

4.4 Pattern search

Pattern search produced accurate results in the experiments on all tested objective functions. The biggest problem with pattern search was its computational speed. All experiments on all objective functions took a very long time, which influenced the choice of test parameters for the main objective function. For this reason we did not try many options for the MADS search and poll methods, for searchga as a search method, or for searchlhs as a search method.

4.4.1 Pattern search on Windows-cluster

In this section we present the results for the main objective function. Because the MEX-file was built for Windows, we could not run the optimization on a machine with more than 8 cores, which is why most simulations of the main function were done there. After several experiments with different methods on different numbers of cores, the resulting optimal values of the main objective function are shown in Fig. 25.

Figure 25 shows that the best value of the objective function is 0.0026, but this value was produced during an optimization with the default binding tolerance, 'TolBind'=1e-3. To get a correct result it is important to specify the binding tolerance, because it controls when the constraints are treated as active; otherwise the reported point may lie outside the feasible region. By setting the binding tolerance to 'TolBind'=1e-6 or 1e-10 the result changed to 0.0036, which can be seen as the result of sequence number 18. The binding tolerance can also affect the computational time: when it is smaller than the default value the method can converge faster, because it does not compute outside the feasible region.

Figure 25: The smallest values of the objective function for different sequences of runs (a sequence here is several runs of the same method on different numbers of cores).

Computational time is indeed a very important parameter when choosing the right method for finding the correct result. Among the pattern search variants some methods gave the desired values faster than others. GPSPositiveBasis2N is the fastest of the deterministic search methods. We also had some stochastic methods, such as search using a genetic algorithm, MADS and search using the Latin hypercube algorithm; with these, even when using the same starting point, the optimization produces a different result each time the code is run. MADS showed different results when given the same starting point, and it took almost the same time to compute the values regardless of the number of processors used. The results for MADSPositiveBasisNp1 are shown in Fig. 26.

From Figure 26 we can see that MADSPositiveBasisNp1 is an expensive poll method, not recommended if one wants to save time, and not very accurate. One can also see that no speedup is visible (here the blue bars represent 2 workers, green 6 workers, and red 8 workers).

MADSPositiveBasis2N is faster than the previous MADS method, but even with a large cluster no time is gained; as Fig. 27 shows, the time is almost the same for all our computations with this method. Depending on how the algorithm generates its vectors we get different results every time the computation is run. The results are not accurate either.

In this figure blue represents runs on 2 workers, green on 6 workers, and red on 8 workers. It is easy to see that increasing the number of workers gives no gain in time in the second sequence, and very little time difference between 8 and 6 workers in the first sequence.


Figure 26: Computational time for different sequences of runs using MADSPositiveBasisNp1 as a poll method.

Figure 27: Computational time for MADSPositiveBasis2N as a poll method.


The best method of all is GPSPositiveBasis2N: it gives an accurate result quickly, and speedup is one of its characteristics when running on the Windows cluster. When the search method is chosen to be the same as the poll method, the algorithm performs only the search, which saves time. The best speedup among the tested methods was observed for GPSPositiveBasis2N and is shown in Fig. 28.

Figure 28 shows the best speedup observed for any of the methods. One might argue that some speedup can also be seen for the MADS computation in the first sequence, but repeated runs of that method do not reproduce the observation. No pattern search algorithm except GPSPositiveBasis2N shows a stable speedup. Accurate optimal values can of course be found with the other algorithms as well, but considering all the properties desirable for an "online" optimization, GPSPositiveBasis2N is the strongest candidate.

Figure 28: Speedup for GPSPositiveBasis2N as poll and search methods.

Another part of the study was the search method using the Latin hypercube algorithm, which is a stochastic method. Compared to the MADS methods, searchlhs always gave the same correct result and was faster than MADS, but it was not possible to draw any conclusions about its speedup. Fig. 29 shows that the computational time differs between runs with the same initial conditions.

It is important to mention that for running the searchlhs algorithm we chose the fastest and most accurate poll method. If MADS polling had been chosen instead, the results might differ. Here the optimal value was the same after every run, probably because of the chosen poll method.


Figure 29: Computational time for different runs of searchlhs with GPSPositiveBasis2N polling on a single core.

4.4.2 Pattern search on a Linux-cluster

Here the objective function was different from the one in the previous experiments. The goal was to see whether using Matlab's Distributed Computing Server software could improve the results, i.e. whether the accurate result could be obtained faster. The resource had 32 cores available. All the methods showed the same behaviour as in the previous experiments: no speedup or accuracy for the MADS and searchga methods, while GPSPositiveBasis2N showed the best results in terms of accuracy and speedup when the same search and poll methods were chosen, which disabled polling. The best value of the objective function was approximately -400.

The speedup for GPSPositiveBasis2N can be seen in Fig. 30.

As seen at the beginning of the figure, specifying 0 workers gives the result faster than specifying 1 worker, because of the time spent on "unnecessary" communication when sending the job to a single worker. On 32 workers we obtained a speedup of up to 7.1631, which is a good value compared with the optimization on 8 cores, where this value was around 2.


Figure 30: Speedup for GPSPositiveBasis2N specified as search and poll methods.

4.5 Comparison of methods

For serial evaluation patternsearch is on average the fastest method, but MultiStart is only slightly slower and performs much more consistently, as shown in Fig. 31. If any parallel processing is available, MultiStart scales better than patternsearch, as shown in Fig. 32, and at 8 workers it is almost 3 times as fast, as shown in Fig. 33. It should be noted that for this data, all methods consistently found the global minimum. Genetic algorithms are not represented in Fig. 31 and Fig. 33 since they failed to find the global optimum.

Figure 31: For serial evaluation patternsearch provided on average the fastest method, while MultiStart is almost as fast but much more consistent.


Figure 32: MultiStart and hybrid SA show decent speedup, though not linear, while the other methods show much worse scaling.

Figure 33: For parallel evaluation using 8 workers, MultiStart is 3 times as fast as the second fastest method, patternsearch.


5 Discussion

5.1 Gradient based solvers

First, why are active-set and sqp faster than the interior-point method? This is probably because the global minimum lies on the border of the feasible set, i.e. at least one of the constraints is active at that point. This should make the interior-point method slow, since it initially forces the solution to the interior of the feasible region and hence needs to iterate over a longer distance than sqp and active-set. Furthermore, the global minimum may only be reachable when approaching it along the border of the feasible set, or, even worse for interior-point, from the infeasible side of the inequality constraints.

Even though sqp has more efficient linear algebra routines to solve the resulting system of equations, active-set is consistently the fastest method. In this case, the size of the problem is probably too small for the efficiency of the sqp algorithm to be felt in the computations. Also, as shown in Fig. 34, evaluating the nonlinear constraints is very expensive. Therefore, in this particular case, the performance of the algorithms relies heavily on how they treat the nonlinear constraints. As the sqp method, in contrast to active-set, sometimes approximates the nonlinear constraints using a second order Taylor expansion, this should slow down sqp compared to active-set.

Figure 34: The time required to evaluate the nonlinear constraints dominates the time it takes to evaluate the objective function.

Regarding the assumption we made for MultiStart: we said that if a starting point needs more than 5 iterations to find the global minimum, it will probably not find it and is therefore not interesting. This assumption came from the results of our statistical investigation, where most of the points that converged to the global minimum did not need more than 5 iterations. If more iterations were needed, the global minimum would probably not be found, or the run would not converge at all. The computational effort spent on iterations beyond 5 is thus most likely unnecessary.

We also made a statistical investigation of how the restricted MultiStart behaved. By investigating a run with MultiStart using 100000 uniformly distributed starting points we found that 82 % of the starting points did not find any minimum; they either stopped when reaching their 5th iteration or could not find a feasible point. Furthermore, 16.9 % of all the starting points converged to the global minimum. Finally, 1.1 % converged, but not to the global minimum; the result is shown in Fig. 35. Thus, when using MultiStart with maximum allowed iterations 5, at least 17 starting points should be used to find the global minimum with a probability larger than 0.95.
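This number follows directly from the observed 16.9 % single-start success rate, assuming the start points succeed independently:

    % Smallest number of independent start points (success probability
    % p = 0.169 each) giving at least a 0.95 chance of hitting the global
    % minimum: 1 - (1 - p)^n >= 0.95.
    p = 0.169;
    n = ceil(log(1 - 0.95) / log(1 - p));   % n = 17
    pSuccess = 1 - (1 - p)^n;               % approximately 0.957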

Figure 35: Convergence study of MultiStart using active-set and a maximum of 5 iterations from each starting point.

Another result of limiting the allowed iterations is that we obtained better parallel speedup, see Fig. 23. This is because we get better load balance, since the work done on each worker cannot exceed 5 iterations. In the case when no limitation was imposed, the number of iterations could vary a lot, see Fig. 8.

Regarding GlobalSearch, we were able to find the global minimum, at least when the radius of the assumed basins was sufficiently decreased and the threshold value sufficiently increased. Nevertheless, GlobalSearch was not fast enough compared to the restricted variant of MultiStart. GlobalSearch is more complicated in how it generates its starting points and how it selects good starting points from which to run fmincon.

5.2 Hybrid simulated annealing

There are, undeniably, issues with implementing simulated annealing for optimization problems with constraints other than bounds. When using SA on the given objective function and constraints, it tends to quickly leave the feasible region and find infeasible points that are better than any feasible ones. Implementing a hybrid solver, such as fmincon, inevitably leads to the conclusion that for simulated annealing to be of any use, this process must either find better points, or find equally good points faster, than a random selection within the bounds.

For sufficiently low-dimensional problems this might be solved by dividing the bounds-limited domain into separate disjoint regions and using SA limited to these new regions. A hybrid solver would then be applied, but limited to the original constraints. This has not been studied in this report, and it is questionable whether it is relevant for a 12-dimensional problem. Nevertheless, it is a possible path to pursue if one wishes to.
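Purely as an illustration of the idea, a sketch might split the box along one coordinate, run bound-constrained SA in each sub-box, and then polish the candidates with the fully constrained fmincon; none of this is taken from the report's implementation:

    % Sketch only: split [lb, ub] into disjoint sub-boxes along the first
    % coordinate, run simulannealbnd in each, then polish with fmincon under
    % the original constraints. All names and the splitting are illustrative.
    nRegions = 4;
    edges = linspace(lb(1), ub(1), nRegions + 1);
    xCand = zeros(nRegions, numel(lb));
    fCand = inf(nRegions, 1);
    for r = 1:nRegions
        lbR = lb;  ubR = ub;
        lbR(1) = edges(r);  ubR(1) = edges(r + 1);   % restrict one coordinate
        x0R = (lbR + ubR) / 2;
        xSA = simulannealbnd(@objective, x0R, lbR, ubR);
        [xLoc, fCand(r)] = fmincon(@objective, xSA, [], [], [], [], ...
                                   lb, ub, @nonlcon);
        xCand(r, :) = xLoc(:)';              % store as a row
    end
    [fBest, best] = min(fCand);
    xBest = xCand(best, :);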

5.3 Genetic algorithms

The problems with genetic algorithms exist in many aspects of the solver: it is too slow, it is not accurate and it has low speedup. The speedup is not something that can be improved much, as parallel evaluation of the objective function will never be as good as running several solvers in parallel (compare MultiStart using fmincon or hybrid SA), but it is possible that the speed and the accuracy can be improved.

One thing affecting the accuracy is the crossover function, and as evidenced in section 3.2.4 the default ones do not provide very good results. It might be that a custom crossover function must be constructed, exploiting some form of structure in the objective function, to make genetic algorithms a viable choice; a skeleton of such a function is sketched below.
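Should one want to pursue that, a custom crossover function for ga has the signature sketched below; the body here is only a placeholder that averages each parent pair, not a suggestion for what structure to exploit:

    % Placeholder custom crossover for ga: each child is the mean of its two
    % parents. A real implementation would exploit problem structure instead.
    function xoverKids = myCrossover(parents, options, nvars, FitnessFcn, ...
                                     unused, thisPopulation)
    nKids = length(parents) / 2;
    xoverKids = zeros(nKids, nvars);
    for k = 1:nKids
        p1 = thisPopulation(parents(2*k - 1), :);
        p2 = thisPopulation(parents(2*k), :);
        xoverKids(k, :) = (p1 + p2) / 2;
    end
    end
    % Selected via: gaoptimset('CrossoverFcn', @myCrossover)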

A second thing which might improve the speed is to study in detail how many sub-generations are used for each generation, and whether each generation could use fewer sub-generations without adversely affecting the accuracy per generation. Genetic algorithms have a slow convergence rate and tend to stall for multiple generations before fulfilling the convergence criteria, so a stricter convergence criterion for the sub-generations may give a significant improvement in speed.

However, any change which only affects the speed, not the accuracy, would have to reduce the run time by at least a factor of 9 (from roughly 900 seconds for 15 generations to 100 seconds) to make GA as good a choice as MultiStart.

5.4 Pattern search

There was no doubt from the beginning that pattern search would be an expensive optimization method, because it computes the values of the objective function without using any derivatives.

With a sufficiently tight tolerance the result can be made as accurate as desired.

Computations were faster when we specified the binding tolerance, because the solver did not need to "go" outside the feasible region. Some speedup was visible for the non-stochastic methods, as expected. The good speedup and quick results for GPSPositiveBasis2N as search and poll methods were predictable, because patternsearch then only had to perform the search; polling was disabled. There might be a possibility of getting better speedup if one had access to the code for pattern search. Unfortunately, we were dealing with methods that are presented as "black boxes" in Matlab, which makes it difficult to judge what the creators of Matlab's Global Optimization Toolbox might have missed when writing the different methods for patternsearch. Another problem in Matlab is the existing parallelism in the Parallel Computing Toolbox: the fork/join model and the way information is collected from different cores make it almost impossible to implement any other kinds of parallelism that might exist in an algorithm. The best opportunity for getting better speedup is to implement the algorithms in C, C++ or FORTRAN.

5.5 Improving speedup

It is evident from the results that none of the methods tested achieves linear speedup, but also that there is a significant difference between the speedup of MultiStart and hybrid SA on the one hand, and patternsearch, genetic algorithms and GlobalSearch on the other. This reflects the different implementations of parallelism that the five methods use.

MultiStart and hybrid SA use point-wise parallelism; that is, they need to evaluate several points independently of each other, and they parallelize by letting each worker handle a single point at a time. This means that the only parallel overhead comes from communication delays. Note that the variance in the time to reach a solution also matters, and this can be strictly controlled when using MultiStart. It should be possible to achieve linear or close to linear speedup by simply having a master thread tell a number of workers which work needs to be done; the master thread then prepares to receive results from any worker, while the workers send their results once they are done. It is possible that Matlab's Distributed Computing Server, MDCS, improves the performance over the Parallel Computing Toolbox, PCT, but this is not something we have been able to study.
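The point-wise pattern described here is essentially the following parfor structure, sketched for independent fmincon solves from a matrix of start points (all names are placeholders, not the report's framework):

    % Sketch only: point-wise parallelism. Each worker performs complete,
    % independent local solves; only start points and final results are
    % communicated. startPoints is an nStarts-by-nvars matrix; objective,
    % nonlcon, lb, ub and opts are placeholders for the real problem setup.
    nStarts = size(startPoints, 1);
    fvals = inf(nStarts, 1);
    xs = cell(nStarts, 1);
    parfor k = 1:nStarts
        [xk, fk, flag] = fmincon(@objective, startPoints(k, :), ...
            [], [], [], [], lb, ub, @nonlcon, opts);
        if flag <= 0          % discard runs that did not converge
            fk = inf;
        end
        fvals(k) = fk;
        xs{k} = xk;
    end
    [fBest, idx] = min(fvals);
    xBest = xs{idx};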

patternsearch and genetic algorithms use a parallelism which distributes the evaluation of the objective and nonlinear constraint functions to workers when these need to be computed. This means that there will be a much higher communication requirement than with point-wise parallelism. As the methods need to do work which is not run in parallel, they will never achieve linear speedup. Furthermore, the cheaper the objective and nonlinear constraint functions are made, the worse the speedup will become, as the fraction of the time that is parallel overhead increases (note though that the total execution time may still decrease). The simplest way to possibly improve speedup is then to increase the number of points that have to be evaluated in each iteration, as this improves load balance and lets a larger part of the execution be parallelized. However, it is not unlikely that this will in fact increase the work needed in each iteration so much that the solution time actually increases.

GlobalSearch itself has no built-in parallelism, but it is possible to set fmincon to be evaluated in parallel. This is the same type of parallelism as patternsearch and genetic algorithms have, except that a larger part of GlobalSearch is serial compared to genetic algorithms and patternsearch. In essence, GlobalSearch uses this parallelism when estimating gradients and Hessians using finite differences, so there might be an improvement in speedup from using higher order accuracy estimates in these cases. However, it seems highly unlikely that anything close to linear speedup could be achieved.

6 Conclusions

6.1 Current state

Among the five functions studied, MultiStart is the best candidate for further use. With sufficiently many random starting points it can provide a statistically secure probability of finding the global optimum. As it is also common that the method converges in few iterations when it converges to the global optimum, the maximum number of iterations can be limited, indirectly limiting the amount of time for highly parallel optimization (1 point per worker).

Hybrid simulated annealing is used essentially as a MultiStart with another way of generating the initial points. This does not provide any tangible benefit over using MultiStart and as such must be seen as inferior for this problem.

Genetic algorithms are slow and never succeeded in finding the global minimum. It is possible that adjusting the settings could make them more efficient, but it is deemed unnecessary when easier and faster options, such as MultiStart, are available.

patternsearch is a viable choice for serial computations but lacks the speedup to rival MultiStart on parallel systems.

6.2 Future work

For future studies we recommend an investigation of how MultiStart behaves on larger clusters. In our case, we were only able to study the performance on an 8-core Windows cluster. It would also be of interest to implement MultiStart with active-set in a more computationally efficient language such as C or FORTRAN.

Regardless of which method is used, it is clear that the time for evaluating the nonlinear constraints must be reduced. While using more powerful computational cores might improve the situation, the focus should be on reducing the complexity of the nonlinear constraints.


References

Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. (1983). Optimization by simulated annealing, Science 220: 671-680.

MathWorks Global Optimization Toolbox User's Guide (2012). MATLAB Global Optimization Toolbox User's Guide, R2012b.

MathWorks Optimization Toolbox User's Guide (2012). MATLAB Optimization Toolbox User's Guide, R2012b.

Pedamallu, C. S. & Ozdamar, L. (2008). Investigating a hybrid simulated annealing and local search algorithm for constrained optimization, European Journal of Operational Research 185(3): 1230-1245. URL: http://EconPapers.repec.org/RePEc:eee:ejores:v:185:y:2008:i:3:p:1230-1245

Kolda, T. G., Lewis, R. M. & Torczon, V. (2003). Optimization by direct search: New perspectives on some classical and modern methods, SIAM Review 45: 385-482.
