de-vns: self-adaptive differential evolution with crossover neighborhood search for continuous...

DE-VNS: Self-adaptive Differential Evolution with crossoverneighborhood search for continuous global optimization

Darko Kovačević a,n, Nenad Mladenović b, Bratislav Petrović a, Pavle Milošević a

a Faculty of Organizational Sciences, University of Belgrade, Jove Ilića 154, 11000 Belgrade, Serbiab Department of Mathematics, Brunel University, London UB8 3PH, UK

a r t i c l e i n f o

Keywords:Global optimizationHybrid heuristicsDifferential EvolutionVariable Neighborhood SearchSelf-adaptation

a b s t r a c t

In this paper, we suggest DE-VNS heuristic for solving continuous (unconstrained) nonlinear optimiza-tion problems. It combines two well-known metaheuristic approaches: Differential Evolution (DE) andVariable Neighborhood Search (VNS), which have, in the last decade, attracted considerable attention inboth academic circles and among practitioners. The basic idea of our hybrid heuristic is the use of theneighborhood change mechanism in order to estimate the crossover parameter of DE. Moreover, weintroduce a new family of adaptive distributions to control the distances among solutions in the searchspace as well as experimental evidence of finding the best probability distribution function for VNSparameter supported by its statistical estimation. This hybrid heuristic has shown excellent character-istics and it turns out that it is more favorable than the state-of-the-art DE approaches when tested onstandard instances from the literature.

& 2013 Elsevier Ltd. All rights reserved.

1. Introduction

This paper deals with multimodal problems that occur whensearching global optimum in difficult unconstrained nonlinearproblems over continuous spaces. Their general form is givenbelow:

ðminÞf ðxÞ; xAXDRD ð1Þwhere f : RD-R is generally nonlinear, non-convex functiondefined on RD, and X is a feasible set.

If the dimension of the problem is sufficiently large, theproblem (1) becomes almost impossible to solve precisely in areasonable time frame. That explains why so many generalheuristic approaches (metaheuristics) have been proposed so far(see [1–5]). One of the major problems in implementation ofmetaheuristic techniques in solving (1) is estimation of method’sparameters. In literature [6], the parameter adaptation techniquesare divided into three categories: deterministic, adaptive, and self-adaptive control rules. Deterministic rules modify the parametersaccording to certain predetermined rationales without usingany feedback from the search process. Adaptive rules incorporatesome form of feedback from the search procedure to guide the

parameter adaptation, whereas self-adaptive rules directly encodecontrol parameters into the individuals and evolve them togetherwith the encoded solutions.

One of the popular methods used for solving these problems isDifferential Evolution (DE), proposed by Storn and Price [7]. It hasgained much attention with successful applications in differentdomains (e.g. [8–10]). DE algorithm is simple and straightforward;the idea contains three main parts: strategy, crossover and selec-tion. Strategy diversifies the population, alleviating such problemsas the premature convergence. Although there are many strate-gies, only a few may be suitable for solving a particular problem.Crossover increases the potential diversity of the populationdetermining the extent of the parent vector that will survive tothe next generation, whereas selection is based on the solutionquality. Vesterstroem and Thomsen [11] compared the DE algo-rithm with Particle Swarm Optimization (PSO) and EvolutionaryAlgorithms (EAs) on numerical benchmark problems. For themajority of problems, DE outperformed both PSO and EAs in termsof the solution’s quality. Sun et al. [12] proposed a combination ofDE and the Estimation of Distribution Algorithm (EDA), in whichthe search for a promising area uses sampling of new solutionsfollowing the distribution function. Liu and Lampinen [13]reported that the effectiveness, efficiency, and robustness of theDE algorithm are sensitive to the settings of control parameters,namely mutation parameter F and crossover parameter CR. Thebest settings for the control parameters may vary for differentfunctions as well as for the same function with different require-ments. A "fast EP" (FEP), proposed by Yao et al. [14], is using a

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/caor

Computers & Operations Research

0305-0548/$ - see front matter & 2013 Elsevier Ltd. All rights reserved.http://dx.doi.org/10.1016/j.cor.2013.12.009

n Corresponding author. Tel.: þ381 113950852E-mail addresses: [email protected] (D. Kovačević),

[email protected] (N. Mladenović),[email protected] (B. Petrović),[email protected] (P. Milošević).

Please cite this article as: Kovačević D, et al. DE-VNS: Self-adaptive Differential Evolution with crossover neighborhood search forcontinuous global optimization. Computers and Operations Research (2014), http://dx.doi.org/10.1016/j.cor.2013.12.009i

Computers & Operations Research ∎ (∎∎∎∎) ∎∎∎–∎∎∎

www.sciencedirect.com/science/journal/03050548

www.elsevier.com/locate/caor

http://dx.doi.org/10.1016/j.cor.2013.12.009



mailto:[email protected]









Cauchy mutation as the primary search operator (instead ofconventionally used Gaussian). Lee and Yao [15] described afurther generalization of FEP by using a mutation based on theLevy probability distribution. In this way, variations of a singlemutation enable discovering a wider search area than the one gotby Gaussian distributions; whereas large variations of the mutatedoffspring can help to escape from the local optima.

Furthermore, three crucial control parameters involved in DE:population size N, mutation parameter F , and crossover rate CR,may significantly influence the optimization performance. Differ-ent problems usually require different settings for the controlparameters. Self-adaptation allows a DE strategy to adapt itself toany general class of problems by reconfiguring [16–18]. In classicDE, mutation and crossover parameters are fixed (F ¼ Fi, CR¼ CRi),whereas in the most of self-adaptive DE algorithms each indivi-dual i is associated with its own value of these parameters [19].

Mladenovic and Hansen [20] presented Variable NeighborhoodSearch (VNS) approach, a metaheuristic which does not followsearch trajectory. It explores increasingly distant neighborhoods ofthe current best solution. If a better solution is found, VNS jumpsfrom the current solution to the new one. There are a variety ofhybrids, where VNS is combined with other methods. Mladenovicet al. [21] presented the idea of using several geometric neighbor-hood structures and random distributions in the shaking step. Thisidea led to the Glob-VNS, which turned out to be noticeably moreefficient than the variants with fixed geometry and fixed distribu-tion. Instead of using a sequence of neighborhoods and shaking bysampling, Carrizosa et al. [22] defined a sequence of shakingdistribution derived from Gaussian n-variate distribution. Thisheuristic is called Gauss-VNS. Yang et al. [23] suggested hybridiza-tion of DE with neighborhood search. The resulting algorithm,known as NSDE, performs mutation by adding a normally dis-tributed random value to each target vector.

In this paper, we propose an algorithm based on DE, whichincorporates the features of VNS approach. Particular attention isgiven to the mechanism of self-adaptation of crossover parameterCR, based on systematic change of neighborhoods.

The paper is organized as follows. In the next section, a shortreview of conventional DE is provided. Section 3 describes theproposed DE-VNS. In Section 4, computational results are presented,demonstrating better performance of our DE-VNS in comparisonwith the state-of-the-art DE variants. Finally, concluding remarks andguidelines for the future work are summarized in Section 5.

2. Differential Evolution

This section provides a brief summary of the basic DifferentialEvolution (DE) procedure, and introduces the notation and pre-vailingly used terminology.

DE is a parallel direct search method which utilizes D-dimen-sional parameter vectors as a population for each generation. Theinitial population fxð0Þi ¼ ðxð0Þj;i Þji¼ 1;2; :::;N; j¼ 1;2; :::;Dg is chosenrandomly from entire solution space RD. After initialization, DEenters a loop of evolutionary operations: strategy, crossover, andselection.

2.1. Strategy

Strategies in DE enable the algorithm to explore the searchspace while maintaining diversity. At each generation g thisoperation creates mutation vectors v gð Þ

i , i¼ f1;2; :::;Ng based oncurrent parent population. There are two basic strategies used indifferential evolution heuristics:

� The }DE=rand=1=bin} strategy usually shows slow convergencespeed and bears stronger exploration capability. This strategy is

therefore best suited for solving multimodal problems, outperform-ing all other basic strategies. Mutation vector for this strategy isgenerated in the following way:

vðgÞi ¼ xðgÞr0 þFiðxðgÞr1 �xðgÞr2 Þ ð2Þ

where r0; r1; r2 are distinct integers chosen from the set f1;2; :::;Ng.The }DE=best=1=bin} strategy usually has the fast convergence

speed and performs well when solving unimodal problems. How-ever, it is more likely to get trapped in a local optimum andthereby leads to a premature convergence when solving multi-modal problems. This mutation strategy has the following form:

vðgÞi ¼ xðgÞbestþFiðxðgÞr1 �xðgÞr2 Þ ð3Þ

where r1; r2 are distinct integers chosen from the set f1;2; :::;Ng,and x gð Þ

best is the best solution in the generation g.In the general case, all three vectors from the right hand side of

(3) belongs to RD. Therefore, the distance between x gð Þbest and

mutation vectors v gð Þi may be defined in a usual way, e.g., as

Euclidean distance. How close they are depends on the mutationparameter Fi.

2.2. Crossover

Following the application of a chosen strategy, crossover comesas the next phase. The undetermined offspring u gð Þ

i is generated bythe crossover operation on mutation vector v gð Þ

i as

uðgÞj;i ¼

vðgÞj;i ifðrandð0;1ÞrCRðgÞi Þ or ðj¼ jrandÞ

xðgÞj;i otherwise

8<: ð4Þ

where jrand stands for randomly chosen index in order to ensurethat the offspring vector u gð Þ

i does not duplicate v gð Þi , and randð0;1Þ

represents a uniform random value between 0 and 1.In this step it has to be decided whether to take a coordinate of

the mutation vector or coordinate of a parent vector in generatingsolution for the new generation. In that way the dimensionaldistance between any two points x and y, x; yDRD is implicitlydefined as d x; yð Þ ¼ k, if k variables of x and y have the differentvalues.

The dimensional distance d U ; Uð Þ as defined above obviouslysatisfies the metric properties: (i) dðx; yÞZ0; (ii) if d x; yð Þ ¼ 0 thenx¼ y; (iii) dðx; yÞ ¼ dðy; xÞ; (iv) for all x; y; zDRD holds dðx; yÞrdðx; zÞþdðz; yÞ. Therefore, ordered pair ðRD; dÞ is a metric space. Thisproperty we will use in the next section.

In our view, the success of DE based heuristics for solvingcontinuous optimization problems is grounded on the use of twodifferent metric functions in defining every potential member ofthe new generation.

2.3. Selection

The aim of the selection phase is to determine better one fromthe offspring vector u gð Þ

i and the parent vector x gð Þi . The better

vector survives into the next generation, and the worse vector isdiscarded. The selection operation can be expressed as follows:

xðgþ1Þi ¼

uðgÞi if f ðuðgÞ

i Þo f ðxðgÞi ÞxðgÞi otherwise

8<: ð5Þ

where xðgþ1Þi is the parent vector in the next generation gþ1.

These 3 steps are repeated until some specific terminationcriteria are satisfied. Termination criteria can be: reaching thevalue of success threshold, exceeding the number of generations,maximum CPU time used in the search, etc.

D. Kovačević et al. / Computers & Operations Research ∎ (∎∎∎∎) ∎∎∎–∎∎∎2





3. DE-VNS for continuous global optimization

Variable Neighborhood Search is a metaheuristic based on aprinciple of systematic change of neighborhoods within the search[24]; VNS and its extensions are characterized by the simplicity ofthe basic scheme. The search in the different neighborhoods isperformed in a deterministic, stochastic or combined way. The firstof these is a VNS with deterministic rules, called Variable Neigh-borhood Descent (VND), whereas the second is a reduced VNS thatuses stochastic rules. The Basic VNS is the third variant and itcombines a random selection of a point in shaking or perturbationstep followed by a deterministic local search from that point.Inspired by the main idea of VNS, we present a hybrid systembased on the DE algorithms that we extend applying the conceptof a variable neighborhood search. The main ideas of the proposedDE-VNS algorithm are elucidated as follows.

3.1. Strategy

So far numerous approaches have been proposed in order tofind adequate criteria for the selection between }DE=rand=1=bin}and }DE=best=1=bin} strategies. On one hand, it should enableglobal search over the solution space, and on the other it shouldprovide a rapid convergence of the strategy. In this paper, wewill introduce a modification of the }DE=rand=1=bin} strategythat comprises the characteristics of }DE=best=1=bin}, named}DE=order�random=1=bin} (see [25] for details regarding differentDE strategies). The main idea is that within the }DE=rand=1=bin},we try to find a local}DE=best=1=bin} strategy as follows:

vðgÞi ¼ xðgÞr0 þF ðgÞi xðgÞr1 �xðgÞr2

� �ð6Þ

where r0; r1; r2 are distinct integers randomly chosen at each stepfrom the set f1;2; :::;Ng. Vectors x gð Þ

r0 ; xgð Þr1 ; x

gð Þr2 should satisfy the

following condition:

f ðxðgÞr0 Þo f ðxðgÞr1 Þ and f ðxðgÞr0 Þo f ðxðgÞr2 Þ ð7Þ

Therefore, the strategy is in between }DE=rand=1=bin} and}DE=best=1=bin}.

3.2. Estimation of the mutation parameter F

In the conventional DE, the choice of three control parameters(i.e. the population size N, the mutation parameter F , and thecrossover parameter CR) tips the scales in determining the out-come of the optimization. There are a great number of empiricalstudies that try to find their best estimates (e.g. see [26]).However, there is no strict set of parameters related to the broadergroup of practical problems. Higher values of the population size Nallow greater diversification in the solution space and therefore, abetter exploration. On the other hand, large population causesslower convergence, which is reflected in the speed of thealgorithm when solving large scale optimization problems. Notethat as the values of the mutation parameter F grow, the portion ofthe solution space explored grows as well. The problem ofselecting the mutation parameter F is solved by introducingroulette methods [27]. Roulette methods gradually prefer thevalues of mutation parameters that were proven successful inthe previous iterations.

The parameter F is estimated by introducing the competition.Let us assume that there are H different values for the parameter Fand that we choose one among them at random with theprobability ph;h¼ 1;2; :::;H. The probabilities are adjusted accord-ing to the success rate of the preceding steps of search through thesolution space. The hth value of F is successful if it generates anoffspring better than a parent, f ðuðgÞ

i Þo f ðxðgÞi Þ. When nh is the

current number of the f ðuðgÞi Þ setting successes, the probability ph

can be calculated as the relative frequency as

ph ¼nhþn0

∑H

i ¼ 1ðniþn0Þ

¼ nhþn0

n0Hþ ∑H

i ¼ 1ni

ð8Þ

where n040 is a constant. Taking n041 prevents a dramatic changein ph by the use of the hth parameter settings. If any of theprobabilities decreases below a given threshold δ; δA ð0;1Þ, thecurrent value of ph is reset to their starting value ph ¼ 1=H. Thecompetition prefers the parameters related to the success, in such away that the algorithm self-adapts each time when the better solutionis found over probabilities ph within the selected settings.

3.3. Crossover parameter

The degree of similarity between offspring and parent vectordefined earlier varies as it depends on the values of crossoverparameter CR. We define the degree of similarity of these twovectors as follows. The nearest dimensional neighborhood of theparent is associated to the small value of the parameter CR, as isillustrated on the additive problem in Fig. 1.

Higher value of crossover parameter CR allows larger dimen-sional neighborhood, and the offspring vector can change moredimensions at once. Each approach has its own advantages anddisadvantages. If crossover parameter CR is small, the algorithmmay not be able to perform jump that could find a better solutionby changing the small number of dimensions, as shown in Fig. 2.

In the example of the rotated ellipse, Salomon [28] explainswhy CR parameter should not have low values during the wholecourse of the optimization. For large values of CR, the entirepopulation could converge too fast and remain trapped in a localoptimum. This is the reason behind starting our parameter fromthe neighborhoods closer to the parent vector, in order for thepopulation not to converge too fast. However, if our algorithm isfound in the local optima or is on the correct path to the globaloptimum, we strive for higher values of CR for two reasons:

� To provide the algorithm to escape from local optima bychanging values of several variables, allowing an effective jumpto any point over the solution space;

� To speed up the convergence, if the algorithm is located in thevicinity of global optimum.

We rely on the choice of probability distributions from whichwe sample the parameter CR. In the similar line, we do not wish todefine a clear boundary among dimensional distances of a parentvector, in the sense of VND, opting instead to set only the expectedvalue of dimensions. In this way we will use a family of parametricdistributions that would allow us to effectively select expectednumber of dimensions from the parent vector neighborhoods.

3.3.1. Estimation of the crossover parameter CR with VariableNeighborhood Search

Here the method of selecting a crossover parameter CR value isconsidered. Numerous studies show that this parameter is difficultto choose since there is no best value for all test instances.Therefore, we decide to change crossover parameter CR duringthe execution of the method, in the way how the neighborhoodparameter k is changed within VNS. In addition, we introduce afamily of adaptive distributions that depend on crossover neigh-borhood parameter par. The CR parameter is then stochasticallychosen with respect to these distributions. The Variable Neighbor-hood Search included in solving this problem works in thefollowing manner. When the algorithm finds an offspring vectorthat is better than the parent, in the following iterations, a

D. Kovačević et al. / Computers & Operations Research ∎ (∎∎∎∎) ∎∎∎–∎∎∎ 3





neighborhood parameter par is required to reduce its value,ensuring low crossover parameter value; so that the entirepopulation would not converge too quickly i.e. to achieve a bettersearch performance over the solution space. In those cases wherethe algorithm cannot find a satisfactory solution, it then appliesthe stepf actor in order to gradually increase the neighborhoodparameter par, which in turn allows additional dimensionalneighborhoods around the parent vector to be searched. In casea favorable offspring vector is found, the algorithm resets the parvalue based on improvement of the fitness function. This propertyof the parameters allows the algorithm to adapt to the givenproblem, sidestepping the need for parameterization of the algo-rithm on the part of the user. The more detailed description will begiven below.

3.3.2. Determining the distributions from which CR derives valuesThe analysis is opened with the question on the distributions

from which crossover parameter CR derives its values. We firstobserve that nonlinear test functions may be divided into threemain classes:

1. Continuous multimodal (non-convex) problems: Schwefel, Ack-ley, Griewank, Rastrigin, Molecular potential energy (MPE) [29],Michalewicz0s function and Six-hump;

2. Continuous unimodal (convex) problems: Sphere, Rosenbrockand De Jong0s variations;

3. Non-continuous problems: step function, molecule or spherepacking [30].

Each test function in each group was solved by DE 100 times usingthe algorithm }DE=rand=1=bin} with CR and F parameters randomlysampled from the uniform distribution. The values of F and CR wererecorded in the case when the fitness function value improved and thealgorithm achieved the predefined tolerance. Bins are defined in sucha way that they contain the recorded values of the CR parameter fromthe initial iterations, midway iterations, and from the final stages ofthe algorithm, but only in cases when the algorithm succeeds toachieve the global optimum. Empirical probability distribution func-tions of defined classes are presented in Fig. 3.

From Fig. 3 it is evident that our classification of test instancesinto 3 classes has a lot of sense taking into account differentbehavior of empirical distribution functions. The goal of ouranalysis is to make a family of parametric distributions that mimicthe distributions presented in Fig. 3. To find out which parametricdistribution reflects best the empirical distribution, we try to fit anumber of distributions such as: beta, Weilbull, exponential,gamma, Gaussian, inverse Gaussian, Pareto, Nakagami, etc. Movingwindows are defined as 1000 successful iterations i.e. iterations inwhich the better solution is found. Distribution of such movingwindows with best estimated maximum likelihood function isthen recorded. The array of recorded distributions is then dividedby quintiles into the following bins: 0–25%, 25–50%, 50–75%, 75–100%, which effectively describes the path to the global optimum.Based on maximum likelihood estimation, a set of 6 most frequentdistributions and allocation by bins are shown in Fig. 4.

According to this figure, in the final iterations of the algorithm,CR parameter values are taken from the beta distribution in nearly100% of cases, whereas in the first quintiles, in addition to the betadistribution there is also a significant occurrence of the exponen-tial distribution. Since the exponential distribution represents aspecial case of the beta distribution, this leads to the conclusionthat the beta distributions should be considered as the maindistribution of parameters CR throughout the entire life of thealgorithm and will be used as such in the following analysis.

3.3.3. Beta distributionIt should be noted that the structure of the distribution changes

depending on the strategy applied. In this research, the usage ofthe beta distribution is related to }DE=rand=1=bin} strategy. Betadistribution is defined as follows:

f ðx; α; βÞ ¼ xα�1ð1�xÞβ�1R 10 uα�1ð1�uÞβ�1du

¼ 1Bðα; βÞx

α�1ð1�xÞβ�1 ð9Þ

As shown at Fig. 3(a), the algorithm achieved the best resultsstarting from a beta distribution which has the feature favoringsmall values of CR for the continuous multimodal problem. In thisway, detailed search over the solution space is achieved. Thatcapacity, consistent with the multimodal definition of a problem,is archived for well-defined values of α; β:

α¼ 1; β41 ð10ÞIn the final phase, for all classes, empirical probability distribu-

tion converges to uniform, which is a special case of betadistribution defined for values:

α¼ 1; β¼ 1 ð11ÞWithin the unimodal class of problems, the distribution is

nearly uniform during the whole course of the optimization.The best results, for the beta distribution of the non-continuous

class of problems, are obtained by distributions that sample theentire domain of CR. In the final phase, the distribution shiftsslightly to the value of 1 i.e. distribution favors high values of CR.The non-continuous algorithm seeks out the best leap towardsbetter solutions by increasing number of dimensions in orderto skip a discontinuity in the solution space. In this case, the

(g)xi

(g)ui(g)xi

min

max

low CR

low C

R

high CRFig. 2. Rotated 2-dimensional multimodal problem.

(g)xilow CR

min

max

(g)ui

Fig. 1. Additive 2-dimensional multimodal problem.






parameters of the distribution are:

α41; β¼ 1 ð12ÞIt can be seen that the distributions are not completely

symmetrical for the small and large values of CR. Although theprobability distribution function favors large CR values, it still doesnot converge to zero even for small CR values i.e. it preserves theprobability of lower CR values at a certain positive level in allcases.

3.3.4. Two-sided power distributionThe beta distribution faces some difficulties related to its

maximum likelihood estimation for two parameters that do nothave a clear-cut meaning. Johnson [31] and Johnson and Kotz [32]tried with the neglected applications of triangular distributions asan alternative to the beta distribution. Johnson [31] used trian-gular distributions as a proxy for the beta distribution, specificallyin assessing risk and uncertainty, such as the project evaluationand review technique (PERT). The parameters of a triangulardistribution have a one-to-one correspondence with an optimisticestimate a, most likely estimate m and pessimistic estimate b,providing to the triangular distribution its intuitive appeal (see, forexample [33]). Similar to the beta distribution, the triangulardistribution can be positively or negatively skewed, but mustremain unimodal. It should be noted that there is no triangulardistribution which would reasonably approximate uniform,J-shaped or U-shaped distributions.

In this paper we investigate an extension of the three-parameter triangular distribution, so called the two-sided power

(TSP) distribution [34], as a proven alternative to the betadistribution. The four-parameter distribution used herein doesallow uniform, J-shaped and U-shaped forms.

The cumulative TSP a;m; b; parð Þ distribution function is

Fðxja;m; b;parÞ ¼m�ab�a

x�am�a

� �1=par; aoxom

1�b�mb�a

b� xb�m

� �1=par;moxob

8<:

9=; ð13Þ

3.4. Evaluation of Neighborhood search parameter

Based on the estimation of parameter mand applying theprocedure discussed in Appendix A, it may be assumed thatrða; bÞ ¼ 1; mða; bÞ ¼ Xð1Þ ¼ a and

parða; bÞ ¼ � log ½Mfa; b; rða; bÞg�s

ð14Þ

We applied the estimation of neighborhood parameters par to aset of continuous global optimization problems, trying to deter-mine the extent of parameter movement. The maximum like-lihood estimates are obtained by using the same procedure asmentioned in Section 3.3. In Fig. 6 the estimated values of theneighborhood parameters parand the limits of their movement areshown.

For certain problems (Schwefel, Rastrigin, MPE) neighborhoodparameter par starts from low-end values in the initial iterations.This indicates that crossover takes place in closer dimensionalneighborhoods meaning greater exploration of the solution spacein the initial phase. In the final iterations of the algorithm, larger

Continous unimodalContinous multimodal Non-Continous

Fig. 3. Empirical probability distribution functions for: (a) continuous multimodal problems, (b) continuous unimodal problems and (c) non-continuous problems.

0%10%20%30%40%50%60%70%80%90%

100%

first 25% 25%-50% 50%-75% 75%-100% first 25% 25%-50% 50%-75% 75%-100% first 25% 25%-50% 50%-75% 75%-100%

contionous multimodal contionious unimodal non continious

'nakagami' 'exponential' 'generalized pareto' 'weibull' 'beta' 'gamma'

Fig. 4. Allocation of probability distribution functions of the parameter CR by bins.






00.020.040.060.080.10.120.140.16

0.18

0.000.10

0.20

0.31

Parametric TSP distribution

00.050.10.150.20.250.30.350.40.45

0

0.35

0.7

Empirical distribution

Fig. 5. Family of TSP distribution over parameter par.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1 1001 2001 3001 4001 5001

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1 1001 2001 3001 4001

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1 1001 2001 3001 4001 5001

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1 5001 10001 15001 20001

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1 1001 2001 3001 4001 5001 60010

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1 1001 2001 3001 4001 5001 6001 7001 8001

Fig. 6. Estimated neighborhood parameter par for: (a) Schwefel; (b) Rastrigin; (c) MPE; (d) Rosenbrock; (e) Griewank and (f) Ackley.

values of the neighborhood parameter par are preferred. Thisindicates that the algorithm is in the vicinity of the global optimumallowing crossover by several dimensions at once. In some problems(Griewank, Ackley, Rosenbrock), it is evident that the algorithmmoves from the higher values of the parameters par, indicating thatthe used strategy }DE=rand=1=bin} has no problem of local optimatrapping. In all cases, in the final iterations of the algorithm it isevident that values of the neighborhood parameter tend to 0.7.

Parameters a and b are defined by the limits of the crossoverparameter CR (see Eq. (4)), a¼ CRmin ¼ 0; b¼ CRmax ¼ 1. Theanalysis of the empirical distribution shape, shown in Fig. 5, incase of continuous multimodal distributions suggests m¼ 0, redu-cing Eq. (13) to a following form:

FðxjparÞ ¼ 1�ð1�xÞ1=par ; xA ½0;1� ð15ÞIn the continuous multimodal problem, the algorithm achieves

the best results starting from the TSP distribution with preferablesmall values of the neighborhood parameter par, resulting in amore detailed search over the solution space. However, in the finaliterations, neighborhood parameter par tends to 0.7, meaning thatall CR values are practically equally favored (Fig. 6).

3.5. DE-VNS pseudo-code

Finally, we present our DE-VNS pseudo-code in Algorithm 1.

Algorithm 1. DE-VNS pseudo-code

01 Setting initial values of parameters (see Sections 3.2 and 3.4):N, F ¼ ½0:4;0:6;0:8;1�, H ¼ 4, parmin ¼ 0, parmax ¼ 0:7,parð0Þ ¼ parmin, n0 ¼ 2, δ¼ 0:05, stepf actor ¼ 1=ð10 Dlog 2ðDÞÞ

02 Randomly initialize a population of N individuals PG ¼fXð0Þ

1 ; :::;Xð0ÞN g

03 Evaluate the population, find f ðxð0Þi Þ, i¼ 1; :::;N04 Setting initial roulette probability for the mutation parameter F 0ð Þ

05 g’106 WHILE stopping criteria are not met07 FOR i¼ 1 : N08 Calculate roulette probability ph ¼ nhþn0=∑ðnjþn0Þ;

h¼ 1;2; :::;H and obtain FðgÞi09 Sampling CRi

gð Þ from adaptive TSP distribution as

CRiðgÞ’1�ð1�χÞ1=parðgÞ ; χ �Unif ð0;1Þ

10 Applying strategy }DE=Rand�Local�Best=1=bin} withobtained F ðgÞi parameter

11 Applying crossover with obtained parameter CRigð Þ

12 Evaluate offspring vector f ðuðgÞi Þ

13 IF f ðuðgÞi Þo f ðxðgÞi Þ

14 parðgþ1Þ’maxðparmin; parðgÞ �ðf ðuðgÞi Þ� f ðxðgÞi ÞÞÞ

15 xðgþ1Þi ’uðgÞ

i16 ELSE17 parðgþ1Þ’minðparðgÞ þstepf actor; parmaxÞ18 END IF19 END FOR20 g’gþ121 END WHILE

The outer loop is defined by some usual stopping criteria, such asmaximum CPU time, until global optimum is reached, maximumnumber of function evaluations or iterations etc. The inner loop startsafter the steps that are used for initialization and evaluation of thepopulation and for setting initial roulette probability. In each itera-tion, roulette probability of mutation parameter F gð Þ

i is calculated andcrossover parameter CRi

gð Þ is sampled from TSP distribution. The nextsteps are applying strategy, and evaluating offspring vector u gð Þ

i . If the

offspring vector is better than the parent vector, neighborhoodparameter par gð Þ value is reduced based on improvement of fitnessfunction value. If offspring vector is not better, par gð Þ value isincreased by stepf actor value.

4. Computational results

In this section experimental settings for the benchmarks andDE heuristics used for comparison are explained and the numer-ical results are given.

4.1. Test functions

Seven usual benchmark functions listed in Table 1 are used forthe numerical experiments. The problems are selected from[28,29]. We shall focus on multimodal problems, but unimodalproblems will be also included in the analysis, in order todemonstrate the robustness of the proposed approach on a widerclass of problems.

4.2. DE based heuristics used in comparison

In addition to the DE-VNS approach, which is proposed in thispaper, we analyze the problem of global optimization and compareour algorithm with different variations of DE methods. Two commonversions of DE without parameter adaptation are taken into consid-eration: }DE=best=1=bin}(DeBest) and }DE=rand=1=bin}(DeRand).Three self-adapted algorithms with distinguished characteristics arealso chosen: SaDE, JADE, and CoDE.

4.2.1. DeRandSince it is well known that }DE=rand=1=bin} version of the

strategy has good searching possibility of the solution space andtherefore, better performance in solving multimodal global opti-mization problem, we use one variant of this strategy for thecomparison. Used values of control parameters are F ¼ 0:5 andCR¼ 0:3, which proved to be promising strategy according to Qinet al. [35].

4.2.2. DeBestAlthough the main focus of this paper is global multimodal

problems, }DE=best=1=bin}with defined control parameters F ¼ 0:5and CR¼ 0:3 is also taken into consideration because this algo-rithm is well-known and is commonly used for comparison. Thevalues of F and CR are chosen based on good behavior of}DE=rand=1=bin} at these values of control parameters.

4.2.3. SaDESelf-adaptive DE (SaDE) algorithm [35] is one of the state-of-art

algorithms that avoid the expensive computational costs ofsearching for the most appropriate trial vector generation strategy,as well as its associated parameter values by a trial-and-errorprocedure. During the evolution, one strategy will be chosen fromthe candidate pool. The choice is made according to probabilitylearned from the previous experience in generating promisingsolutions. The parameter Fi is approximated by a normal distribu-tion with mean value 0.5 and standard deviation 0.3. ParameterCRi has the normal distribution NðCRm;0:1Þ, where CRm is initi-alized as 0.5. To adapt CRi to the proper values, the authors updateCRm every 25 generations, based on the recorded successful CRi

values since the last CRm update.






4.2.4. JADEZhang and Sanderson [19] have presented their DE algorithm

named JADE. The main contribution of JADE is the implementationof a new mutation strategy }DE=current�to�pbest}. This strategyhas an optional external archive and updating control parametersin an adaptive manner. Crossover probability CRi is randomlytaken from a normal distribution of mean μCR and standarddeviation 0.1, and then truncated to [0, 1]. μCR is initialized to be0.5 and then updated at each generation in accordance withpreviously successful values of crossover parameter. The mutationparameter Fi is randomly taken from a Cauchy distribution withlocation parameter μF and scale parameter 0.1. The locationparameter μF is initialized to be 0.5 and then updated at eachgeneration in accordance with previously successful values ofmutation parameter.

4.2.5. CoDEWang et al. [36] proposed a method based on Differential

Evolution, called composite DE (CoDE). This method uses threetrial vector generation strategies and three control parametersettings randomly combining them to generate trial vectors.The three control parameter settings are: ½F ¼ 1:0;CR¼ 0:1�,½F ¼ 1:0;CR¼ 0:9�, and ½F ¼ 0:8;CR¼ 0:2�. In each generation, eachof these trial vector generation strategies is used to create a newtrial vector with a control parameter setting randomly chosenfrom the parameter candidate pool.

4.2.6. DE-VNSThe following values for the maximum and minimum value of

parwere used – parmin ¼ 0 and parmax ¼ 0:7. We have used anarbitrary step defined as

stepf actor ¼ 110D log 2ðDÞ

ð16Þ

Parameters a, b, and m were defined in Section 3.4 as a¼ 0,b¼ 1, m¼ 0. For competitive settings we use F ¼ ½0:4;0:6;0:8;1�with defined n0 ¼ 2; δ¼ 0:05.

4.2.7. Common parameter valuesFor all DE variants used in this paper, the maximum number of

parameter evaluation eval max and population pop were set as it isshown in Table 2.

Experiments are performed on Pentium dual core computerwith 4 GB of memory, using the MATLAB.

4.3. Comparison

For the purpose of analyzing the behavior of the proposedalgorithm, we focus on the unconstrained continuous multimodalglobal optimization problems. In order to show the robustness ofthe proposed algorithm, we consider a number of dimensions (D)

ranges from 10 to 100 in order to cover lower and mediumdimension problems. A value of 1e�6 is used as the predefinedtolerance around the global optimum. The other stopping condi-tion is the maximum number of function evaluations, that is50,000nD for Rosenbrock’s function and 10,000nD for otherfunctions. Each problem is repeated 25 times in order to obtaincredible data.

In addition to the number of function evaluations FEs, we arealso interested in success rate (SR), or the percentage of success-fully achieved tolerance around the global optimum.

Table 1Benchmark functions used in experimental study.

Name Function Search range Global optimum x*

Schwefel f 1 ¼ ∑ni¼1 −xi sin

ffiffiffiffiffiffiffiffiffiffixijÞ��

q�h −500≤xi ≤500 f ðx⁎Þ ¼ −418:9829⋅D

Ackley f 2 ¼ −20 exp −0:2ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi1n∑

ni¼0x

2i

q� �−exp

ffiffi1n

q∑n

i¼0 cos ð2πxiÞ� �

þ 20 −32≤xi ≤32 x⁎ ¼ ð0; 0; …; 0Þ; f ðx⁎Þ ¼ 0

Griewank f 3 ¼ 14000∑

ni¼1x

2i −∏

ni¼1cos

xiffii

p� �

þ 1 −600≤xi ≤600 x⁎ ¼ ð0; 0; …; 0Þ; f ðx⁎Þ ¼ 0

Rastrigin f 4 ¼ 10 nþ∑ni¼1ðx2i −10 cos ð2πxiÞÞ −5:12≤xi ≤5:12 x⁎ ¼ ð0; 0; …; 0Þ; f ðx⁎Þ ¼ 0

MPEf 5 ¼ ∑n

i¼1 1þ cos ð3xiÞ þ ð−1Þiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi10:60099896−4:141720682ñ cos ðxi Þ

p�

0≤xi ≤5 f ðx⁎Þ ¼ −0:0411183034⋅D

Rosenbrock f 6 ¼ ∑n−1i¼1 100ðxiþ1−x2i Þ2 þ ð1−xiÞ2h i

−5≤xi ≤5 x⁎ ¼ ð1; 1; :::; 1Þ; f ðx⁎Þ ¼ 0

Sphere f 7 ¼ ∑ni¼1x

2i

−1≤xi ≤1 x⁎ ¼ ð0; 0; …; 0Þ; f ðx⁎Þ ¼ 0

Table 2eval max and pop parameters for DE.

Dimensions eval maxa Population

10 1.0Eþ05–5.0Eþ05 3420 2.0Eþ05–1.Eþ06 4430b 1.5Eþ06 5050 5.0Eþ05–2.5Eþ06 80

100 1.0Eþ06 100

a Depending on the applied problem.b Just for the Rosenbrock’s function.

Table 3Results for Schwefel’s function.

Problem Dim Method Eval min Eval avg Eval max SR (%) fmin

f1 10 DeRand 9,001 9,441 9,901 100 1.00E�06DeBest 6,601 53,771 100,000 48 8.29Eþ01JADE 9,001 56,401 100,000 52 7.11Eþ01SaDE 12,001 12,101 13,001 100 1.00E�06CoDE 25,001 25,801 27,001 100 1.00E�06DE-VNS 7,901 8,901 10,101 100 1.00E�06

20 DeRand 23,601 24,601 25,601 100 1.00E�06DeBest 38,601 77,391 200,000 80 2.37Eþ01JADE 31,001 150,401 200,000 32 9.48Eþ01SaDE 25,001 27,601 29,001 100 1.00E�06CoDE 76,001 80,101 85,001 100 1.00E�06DE-VNS 21,201 23,431 27,301 100 1.00E�06

50 DeRand 94,301 101,161 105,301 100 1.00E�06DeBest 500,000 500,000 500,000 0 1.19Eþ06JADE 102,001 263,901 500,000 60 5.92Eþ01SaDE 84,001 87,501 93,001 100 1.00E�06CoDE 338,001 356,601 376,001 100 1.00E�06DE-VNS 70,401 79,081 89,101 100 1.00E�06

100 DeRand 350,501 385,181 408,701 100 1.00E�06DeBest 1,000,000 1,000,000 1,000,000 0 2.10Eþ04JADE 223,001 615,601 1,000,000 52 1.07Eþ02SaDE 220,001 233,901 249,001 100 1.00E�06CoDE 1,000,000 1,000,000 1,000,000 0 3.24Eþ03DE-VNS 172,201 189,791 210,001 100 1.00E�06






In Tables 3–9, the results of the test are shown. Gray areas indicatethat, for a given problem, the tolerance around the global optimum isnot reached in all 100% of cases. Tables also reflect the basic statisticsfor function evaluations: minimum FEs number (eval min), averageFEs number (eval avg) and maximum FEs number (eval max).Another statistic that we are interested in is named fmin and itpresents tolerance, in cases where tolerance around the globaloptimum is achieved, or the difference between the average costfunction and known global optimum in other cases. The best resultsare marked in bold fields, provided that the heuristic achieved SRpercentage of 100%.

Table 10 refers to overall scores (OS) of compared heuristics foreach of selected algorithms. This measure is calculated as follows:

OS¼ log ∑D

eval avgD

fmin

!;

D¼ 10;20;30;50 for RosenbrockD¼ 10;20;50;100 in other cases

(

ð17Þ

The overall score is the logarithm of the sum of elements thatcharacterize optimization for the selected dimensional problem.The average number of function evaluations eval avg is divided bythe number of dimensions, since it is important that both small

Table 4Results for Ackley’s function.

Problem Dim Method Eval min Eval avg Eval max SR % fmin

f2 10 DeRand 11,401 11,771 12,101 100 1.00E�06DeBest 8,701 9,411 10,001 100 1.00E�06JADE 9,001 9,801 10,001 100 1.00E�06SaDE 9,001 9,901 10,001 100 1.00E�06CoDE 30,001 30,501 31,001 100 1.00E�06DE-VNS 9,401 10,441 13,501 100 1.00E�06

20 DeRand 27,501 28,211 29,001 100 1.00E�06DeBest 53,801 55,211 58,001 100 1.00E�06JADE 15,001 16,201 18,001 100 1.00E�06SaDE 19,001 20,601 21,001 100 1.00E�06CoDE 80,001 83,101 86,001 100 1.00E�06DE-VNS 22,701 26,241 30,001 100 1.00E�06

50 DeRand 94,301 96,011 97,501 100 1.00E�06DeBest 500,000 500,000 500,000 0 6.34E�02JADE 30,001 78,001 500,000 88 8.70E�02SaDE 500,000 500,000 500,000 0 1.25Eþ00CoDE 228,001 239,601 248,001 100 1.00E�06DE-VNS 78,101 86,231 98,901 100 1.00E�06

100 DeRand 293,201 297,981 304,001 100 1.00E�06DeBest 1,000,000 1,000,000 1,000,000 0 1.20Eþ01JADE 1,000,000 1,000,000 1,000,000 0 1.53Eþ00SaDE 1,000,000 1,000,000 1,000,000 0 2.55Eþ00CoDE 444,001 457,901 470,001 100 1.00E�06DE-VNS 182,801 204,521 237,601 100 1.00E�06

Table 5Results for Griewank’s function.





100 DeRand 212,601 216,321 218,501 100 1.00E�06DeBest 1,000,000 1,000,000 1,000,000 0 4.11Eþ02JADE 43,001 332,001 1,000,000 72 9.30E�03SaDE 100,001 195,401 1,000,000 88 1.72E�03CoDE 320,001 330,301 339,001 100 1.00E�06DE-VNS 158,801 168,961 180,701 100 1.00E�06

Table 6Results for Rastrigin’s function.



20 DeRand 73,601 78,551 86,601 100 1.00E�06DeBest 200,000 200,000 200,000 0 4.41Eþ01JADE 33,001 34,301 35,001 100 1.00E�06SaDE 34,001 37,201 42,001 100 1.00E�06CoDE 117,001 124,901 132,001 100 1.00E�06DE-VNS 25,001 27,811 31,101 100 1.00E�06

50 DeRand 500,000 500,000 500,000 0 1.35Eþ02DeBest 500,000 500,000 500,000 0 3.59Eþ02JADE 108,001 111,201 114,001 100 1.00E�06SaDE 125,001 206,401 500,000 80 1.99E�01CoDE 500,000 500,000 500,000 0 3.04Eþ01DE-VNS 85,001 89,741 95,601 100 1.00E�06

100 DeRand 1,000,000 1,000,000 1,000,000 0 5.35Eþ02DeBest 1,000,000 1,000,000 1,000,000 0 1.03Eþ03JADE 231,001 247,801 257,001 100 1.00E�06SaDE 1,000,000 1,000,000 1,000,000 0 3.28Eþ00CoDE 1,000,000 1,000,000 1,000,000 0 2.12Eþ02DE-VNS 200,101 220,541 243,701 100 1.00E�06

Table 7Results for MPE function.



20 DeRand 200,000 200,000 200,000 0 4.44E�02DeBest 200,000 200,000 200,000 0 1.56Eþ00JADE 27,001 97,601 200,000 60 8.22E�02SaDE 25,001 43,501 200,000 92 8.17E�03CoDE 69,001 73,101 77,001 100 1.00E�06DE-VNS 16,801 19,121 20,901 100 1.00E�06

50 DeRand 500,000 500,000 500,000 0 9.03Eþ00DeBest 500,000 500,000 500,000 0 1.75Eþ01JADE 89,001 131,901 500,000 80 8.23E�02SaDE 88,001 460,601 500,000 8 8.20E�02CoDE 372,001 390,201 402,001 100 1.00E�06DE-VNS 52,701 59,221 66,701 100 1.00E�06

100 DeRand 1,000,000 1,000,000 1,000,000 0 4.07Eþ01DeBest 1,000,000 1,000,000 1,000,000 0 5.27Eþ01JADE 190,001 520,901 1,000,000 60 3.22E�02SaDE 1,000,000 1,000,000 1,000,000 0 3.04E�01CoDE 1,000,000 1,000,000 1,000,000 0 4.23Eþ00DE-VNS 122,901 143,831 160,701 100 1.00E�06






and large dimensional problems have the same importance. Fail-ure to reach the global minimum is penalized by fmin. The successrate is not included in the formula, so there is no double counting.Optimization is considered better if OS value is lower. Thelogarithm is used for scaling the results.

As previously mentioned, DE-VNS is developed for multimodalproblems wherein it achieves the best results. On hard optimiza-tion problems, such as MPE, Schwefel and Rastrigin, our algorithmachieves the smallest number of FEs for all dimensions. Also,DE-VNS performs excellent on medium scale problems. For maximaldimension (100D), on 6 of 7 test functions (Rosenbrock’s and all ofmultimodal functions) it has proved to be the fastest algorithm.On unimodal functions JADE outperforms all the selected algorithms,except on 50 dimensional Rosenbrock’s function. According to OS,DE-VNS demonstrates the best results on 5 of 7 selected problems.

It should be emphasized that our DE-VNS is the most robustalgorithm of all considered, since it is the only algorithm thatscores SR 100% on all test problems (on all test functions, for alldimensions). CoDE algorithm is the second best in terms ofrobustness, with a 100% success rate achieved in 23 of 28problems.

In Table 11 all algorithms are sorted in accordance to thecorresponding rank. The right end column reflects the arithmeticmean of rankings. For DE-VNS this value is 1.43; JADE performedwell at all test functions, and its arithmetic mean of rankings is2.86. Overall performance results of SaDE, DeRand and CoDE aresimilar. DeBest algorithm converges prematurely at multimodalproblems, and it is ranked last in our comparison.

For each function, we compared the speed of convergence forthe four self-adaptive methods: DE-VNS, JADE, SaDE, and CoDE.For the comparison, we used the convergence path that is closestto the mean number of function evaluations. Convergences formaximal dimension by respective problems are presented in Fig. 7.

Convergence for Schwefel’s function, presented in Fig. 7a,indicates that DE-VNS and SaDE converge fast toward the solution.For Rastrigin, DE-VNS and JADE converge rapidly. In these pro-blems, CoDE algorithm shows slow convergence rate. On the otherhand, CoDE algorithm almost always converged toward the solu-tion but its convergence speed varies from very slow at Schwefeland Rastrigin, to much faster, especially at 100D Greiwank and 50DRosenbrock.

DE-VNS proved to have the greatest convergence rate for allmultimodal problems. JADE algorithm is faster for 50D Griewank,but it does not converge for 100D. In the tests no other algorithmexcept the one we proposed in this paper converges toward theglobal solution for every chosen test function.

5. Conclusion

In this paper, we present a new approach for solving thecontinuous global optimization problem. Our method, called DE-VNS, uses Variable neighborhood search (VNS) idea within Differ-ential Evolution (DE) heuristic. We found that DE crossover

Table 8Results for Rosenbrock’s function.



20 DeRand 1,000,000 1,000,000 1,000,000 0 4.94E�01DeBest 551,001 572,601 598,001 100 1.00E�06JADE 38,001 43,301 52,001 100 1.00E�06SaDE 1,000,000 1,000,000 1,000,000 0 8.00E�01CoDE 193,001 202,301 218,001 100 1.00E�06DE-VNS 183,001 234,301 415,001 100 1.00E�06

30 DeRand 1,500,000 1,500,000 1,500,000 0 4.58Eþ00DeBest 1,500,000 1,500,000 1,500,000 0 7.76Eþ00JADE 70,001 79,401 96,001 100 1.00E�06SaDE 1,500,000 1,500,000 1,500,000 0 2.18Eþ00CoDE 410,001 426,401 446,001 100 1.00E�06DE-VNS 281,001 424,301 516,001 100 1.00E�06

50 DeRand 2,500,000 2,500,000 2,500,000 0 1.54Eþ01DeBest 2,500,000 2,500,000 2,500,000 0 2.16Eþ01JADE 157,001 645,901 2,500,000 80 7.91E�01SaDE 2,500,000 2,500,000 2,500,000 0 9.48Eþ00CoDE 853,001 1,167,601 2,500,000 84 3.99E�01DE-VNS 729,001 1,142,001 1,442,001 100 1.00E�06

Table 9Results for Sphere function.





100 DeRand 224,001 226,901 230,001 100 1.00E�06DeBest 1,000,000 1,000,000 1,000,000 0 4.22Eþ02JADE 44,001 45,401 48,001 100 1.00E�06SaDE 104,001 110,701 117,001 100 1.00E�06CoDE 331,001 338,101 344,001 100 1.00E�06DE-VNS 153,001 165,001 189,001 100 1.00E�06

Table 10Overall scores of compared heuristics.

Method f1 f2 f3 f4 f5 f6 f7

DE-VNS �2.256 �2.213 �2.179 �2.190 �2.370 �1.432 �2.304DERand �2.094 �2.126 �2.185 6.826 5.691 5.738 �2.255JADE 6.319 4.189 1.490 �2.108 2.895 3.708 �2.682SaDE �2.175 4.580 1.902 4.527 3.581 5.535 �2.471CoDE 7.511 �1.781 �1.767 6.385 4.626 3.668 �1.928DEBest 10.084 5.083 6.614 7.156 5.856 5.888 6.625

Table 11Rankings of compared heuristics.

Method f1 f2 f3 f4 f5 f6 f7 Avg. rank

DE-VNS 1 1 2 1 1 1 3 1.43JADE 4 4 4 2 2 3 1 2.86SaDE 2 5 5 3 3 4 2 3.43DERand 3 2 1 5 5 5 4 3.57CoDE 5 3 3 4 4 2 5 3.71DEBest 6 6 6 6 6 6 6 6.00






1.E-06

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

1.E+01

1.E+02

1.E+03

1.E+04

1.E+05

0.E+00

DE-VNS SaDE JaDE CoDE

1.E-06

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

1.E+01

1.E+02


1.E-06

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

1.E+01

1.E+02

1.E+03

1.E+04

0.0E+00


1.E-06

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

1.E+01

1.E+02

1.E+03

1.E+04

0.E+00


1.E-06

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

1.E+01

1.E+02

0.0E+00


1.E-06

1.E-04

1.E-02

1.E+00

1.E+02

1.E+04

1.E+06

1.E+08

1.E+10

0.0E+00


1.E-061.E-051.E-041.E-031.E-021.E-011.E+001.E+011.E+021.E+031.E+041.E+051.E+06

0.E+00


1.E+05 2.E+05 3.E+05 4.E+05 5.E+05 6.E+05 0.E+00 1.E+05 2.E+05 3.E+05 4.E+05 5.E+05 6.E+05

1.0E+05 2.0E+05 3.0E+05 4.0E+05 2.E+05 4.E+05 6.E+05 8.E+05 1.E+06

2.0E+05 4.0E+05 6.0E+05 8.0E+05 1.0E+06 5.0E+05 1.0E+06 1.5E+06 2.0E+06 2.5E+06

1.E+05 2.E+05 3.E+05

Fig. 7. Convergence for: (a) Schwefel 100D; (b) Ackley 100D; (c) Greiwank 100D; (d) Rastrigin 100D; (e) MPE 100D; (f) Rosenbrock 50D and (g) Sphere 100D.






parameter CR depends on the stage of the algorithm. At thebeginning of the search, its value should be different than at theend or in the middle of the search. However, we experimentallyshow that CR can be chosen as a random variable with Betadistribution (and its approximate Two-sided power distribution),having its different parameter values in different stages of thesearch. As a consequence, solutions that potentially form a newgeneration are chosen from the different neighborhoods that aresystematically changed, as in VNS. In one word, in our new DE-VNS heuristic, the change of neighborhoods is used for automaticestimation of the crossover parameter CR. Numerical results showvery good performance of our new heuristic: (i) the success rate onall instances proved to be 100%; (ii) the computational efforts,measured as the number of function evaluations before the globaloptimum is reached, proved to be consistently smaller than effortsobtained by other state-of-the-art methods; (iii) exceptionalperformance of DE-VNS on higher-dimensional problems wereevident. This last property may be an indication of the futureresearch, namely comparing DE-VNS against other heuristics onlarge scale optimization problems. Future work may also includeself-adapting jointly mutation and crossover parameters usingVNS mechanism, i.e., their automatic change during the searchprocess.

Appendix A. Maximum likelihood estimation of TSPdistribution

Let X be a random variable with probability density functiongiven as

f ðxja;m; b; parÞ ¼ð1=parÞb�a

x�am�a

� �ð1�parÞ=par; aoxom

ð1=parÞb�a

b� xb�m

� �ð1�parÞ=par; moxob

8<: ðA:1Þ

where armrb; parZ0.The random variable X is said to follow a TSP distribution.

Its cumulative distribution function can be expressed as follows:

Fðxja;m; b; parÞ ¼m�ab�a

x�am�a

� �1=par; aoxom

1�b�mb�a

b� xb�m

� �1=par;moxob

8<: ðA:2Þ

For a random sample X ¼ ðx1; x2; :::; xsÞ from TSP distribution, letthe order statistics bexð1Þoxð2Þo⋯oxðsÞ. Utilizing expression(A.1), the likelihood for X (by definition) is

LðX; a;m; b; parÞ ¼ 1parðb�aÞ

� s

HðX; a;m; bÞ1=ðpar�1Þ ðA:3Þ

where

H X; a;m; bð Þ ¼∏r

i ¼ 1xðiÞ �a� �

∏s

i ¼ rþ1b�xðiÞ� �

m�að Þr b�mð Þs� r ðA:4Þ

Xð1Þ ¼ a; XðsÞ ¼ b and r is implicitly defined by XðrÞrmr Xðrþ1Þ.From (A.3) it follows that two-parameter MLE procedure max-imizing equation as a function of m and par (with a and b fixed) istwo stage optimization problem, namely we may first determinem at which Eq. (A.4) attains its maximum as a function of m. Nextutilizingm, we may calculate par, maximizing LðX; a;m; b; parÞ as afunction of par. Van Dorp and Kotz [34] proved that Eq. (A.4)attains its maximum, for fixed a and b, at one of the order statisticsðXð1Þ;Xð2Þ; :::;XðsÞÞ. Specifically,mða; bÞ ¼ Xðrða;bÞÞ ðA:5Þ

rða; bÞ ¼ argmaxrA f1;2;:::;sgfMða; b; rÞg ðA:6Þ

M a; b; rð Þ ¼ ∏r�1

i ¼ 1

XðiÞ �aXðrÞ �a

∏s

i ¼ rþ1

b�XðiÞb�XðrÞ

ðA:7Þ

Utilizing HfX; a; mða; bÞ; bg ¼Mfa;b; rða;bÞg, the maximum like-lihood estimator of n, nða; bÞ, maximizing (A.3) together with themaximum likelihood estimator of m, mða; bÞ in (A.5) is

parða; bÞ ¼ � log ½Mfa; b; rða; bÞg�s

ðA:8Þ

References

[1] Chelouah R, Siarry P. A continuous genetic algorithm designed for the globaloptimization of multimodal functions. J Heuristics 2000;6(2):191–213.

[2] Hedar AR, Fukushima M. Tabu search directed by direct search methods fornonlinear global optimization. Eur J Oper Res 2006;170(2):329–49.

[3] Locatelli M. Simulated annealing algorithms for continuous global optimiza-tion: Convergence conditions. J Optim Theory Appl 2000;4(1):121–33.

[4] Mladenović N, Perez M, Vega J. A chain-interchange heuristic method. Yugosl JOper Res 1996;6:41–54.

[5] Bierlaire M, Thémans M, Zufferey NA. Heuristic for nonlinear global optimiza-tion. INFORMS J Comput 2010;22(1):59–70.

[6] Smith JE, Fogarty TC. Operator and parameter adaptation in genetic algo-rithms. Soft Comput 1997;1(2):81–7.

[7] Storn R, Price K. Differential Evolution: a simple and efficient adaptive schemefor global optimization over continuous spaces. J Glob Optim 1997;11:341–59.

[8] Wang L, Pan QP, Suganthan PN, Wang WH, Wang YM. A novel hybrid discretedifferential evolution algorithm for blocking flow shop scheduling problems.Comput Oper Res 2010;37(3):509–20.

[9] Pan QP, Wang L, Qian B. A novel differential evolution algorithm for bi-criteriano-wait flow shop scheduling problems. Comput Oper Res 2009;36(8):2498–511.

[10] Salman A, Ahmad I, Omran MGH, Mohammad MG. Frequency assignmentproblem in satellite communications using differential evolution. ComputOper Res 2010;37(12):2152–63.

[11] Vesterstroem J, Thomsen R. A comparative study of differential evolution,particle swarm optimization, and evolutionary algorithms on numericalbenchmark problems. In: Proceedings of IEEE congress on evolutionarycomputation; 2004. p. 1980–7.

[12] Sun J, Zhang Q, Tsang E. DE/EDA: a new evolutionary algorithm for globaloptimization. Inf Sci 2004;169:249–62.

[13] Liu J, Lampinen J. On setting the control parameter of the differential evolutionmethod. In: Proceedings of the 8th international conference on soft comput-ing; 2002. p. 11–8.

[14] Yao X, Liu Y, Lin G. Evolutionary programming made faster. IEEE Trans EvolComput 1999;3(2):82–102.

[15] Lee CY, Yao X. Evolutionary programming using mutations based on the Levyprobability distribution. IEEE Trans Evol Comput 2004;8(1):1–13.

[16] Pan QP, Suganthan PN, Wang L, Gao L, Mallipeddi R. A differential evolutionalgorithm with self-adapting strategy and control parameters. Comput OperRes 2011;38(1):394–408.

[17] Brest J, Greiner S, Boskovic B, Mernik M, Zumer V. Self-adapting controlparameters in Differential Evolution: a comparative study on numericalbenchmark problems. IEEE Trans Evol Comput 2006;10(6):646–57.

[18] Omran MGH, Salman A, Engelbrecht AP. Self-adaptive Differential Evolution.Comput Intell Secur Lect Notes Comput Sci 2005;37(12):192–9.

[19] Zhang J, Sanderson AC. Adaptive differential evolution with optional externalarchive. IEEE Trans Evol Comput 2009;13(5):945–58.

[20] Mladenovic N, Hansen E. Variable neighborhood search. Comput Oper Res1997;24(1):1097–100.

[21] Mladenovic N, Drazic M, Kovacevic-Vujcic V, Cangalovic M. General variableneighborhoods search for the continuous optimization. Eur J Oper Res2008;191(3):753–70.

[22] Carrizosa E, Drazic M, Drazic Z, Mladenovic N. Gaussian variable neighborhoodsearch for continuous optimization. Comput Oper Res 2012;39:2206–13.

[23] Yang Z., Tang K., Yao X. Self-adaptive differential evolution with neighborhoodsearch. In: Proceedings of IEEE Congress on Evolutionary Computation(CEC-2008); 2008. p. 1110–6.

[24] Hansen P, Mladenović N, Moreno Pérez JA. Variable neighbourhood search:methods and applications. Ann Oper Res 2010;175(1):367–407.

[25] Mezura-Montes E, Velazquez-Reyes J, Coello Coello CA. A comparative study ofdifferential evolution variants for global optimization. In: Proceedings ofthe Genetic and Evolutionary Computation Conference (GECCO ’06); 2006.p. 485–92.

[26] Price K, Storn R, Lampinen J. Differential Evolution: a practical approach toglobal optimization. Berlin: Springer-Verlag; 2005.

[27] Tvrdik J. Differential Evolution with competitive setting of its control para-meters. TASK q 2007;11:169–79.

[28] Salomon R. Re-evaluating genetic algorithm performance under coordinaterotation of benchmark functions: a survey of some theoretical and practicalaspects of genetic algorithms. Biosystems 1996;39(3):263–78.

[29] Lavor C, Maculan N. A function to test methods applied to global minimizationof potential energy of molecules. Num Algorithms 2004;35:287–300.



http://refhub.elsevier.com/S0305-0548(13)00360-2/sbref1




























































[30] Gensane T. Dense packings of equal spheres in a cube. Electron J Comb2004;11(1):1–17.

[31] Johnson D. The triangular distribution as a proxy for the beta distribution inrisk analysis. The Statistician 1997;46(3):387–98.

[32] Johnson NL, Kotz S. Nonsmooth sailing or triangular distributions revisitedafter some 50 years. The Statistician 1999;48(2):179–87.

[33] Williams TM. Practical use of distributions in network analysis. J Oper Res Soc1992;43(3):265–70.

[34] Van Dorp J, Kotz SA. Novel extension of the triangular distribution and itsparameter estimation. The Statistician 2002;51(1):63–79.

[35] Qin AK, Huang VL, Suganthan PN. Differential evolution algorithm withstrategy adaptation for global numerical optimization. IEEE Trans Evol Comput2009;13(2):398–417.

[36] Wang Y, Cai Z, Zhang Q. Differential evolution with composite trial vectorgeneration strategies and control parameters. IEEE Trans Evol Comput 2011;15(1):55–66.






















de-vns: self-adaptive differential evolution with crossover neighborhood search for continuous...

Documents