ORIGINAL ARTICLE

Artif Life Robotics (2012) 16:546–550 © ISAROB 2012
DOI 10.1007/s10015-011-0987-8

Yoel Tenne · Kazuhiro Izui · Shinji Nishiwaki

A model-adaptive evolutionary algorithm for optimization

Abstract Many applications in engineering and science rely on the optimization of computationally expensive functions. A successful approach in such scenarios is to couple an evolutionary algorithm with a mathematical model which replaces the expensive function. However, models introduce several difficulties, such as their inherent inaccuracy and the difficulty of matching a model to a particular problem. To address these issues, this paper proposes a model-based evolutionary algorithm with two main features: (a) it combats model inaccuracy with a tailored trust-region approach which manages the model during the search and ensures convergence to an optimum of the true expensive function, and (b) during the search it continuously selects an optimal model type out of a set of candidate models, resulting in a model-adaptive optimization search. Extensive performance analysis shows the efficacy of the proposed algorithm.

Key words Expensive optimization problems · Evolutionary algorithm · Modeling · Model selection

Received and accepted: August 15, 2011

Y. Tenne (*) · K. Izui · S. Nishiwaki
Department of Mechanical Engineering and Science, Faculty of Engineering, Kyoto University, Kyoto, Japan
e-mail: [email protected]

1 Introduction

Nowadays researchers replace real-world laboratory experiments with computer simulations in order to reduce the time and cost of the engineering design process. In this set-up, the design process is effectively an optimization problem which has two distinct features.

1. Objective values are obtained from the simulation, which is often a legacy code or a commercial software package available only as an executable. As such, the simulation is treated as a black-box function, i.e., a function with no analytic expression.

2. Each simulation run is expensive, i.e., has a lengthy execution time. This implies that only a small number of function evaluations can be made.

Accordingly, such problems are often referred to as expensive optimization problems.1 Owing to these difficulties, classical optimization algorithms may perform poorly, and this has motivated researchers to develop and apply computational intelligence (CI) or nature-inspired optimization algorithms to such problems. A class of such algorithms which has performed well in these challenging settings is that of model-assisted evolutionary algorithms, which combine two powerful approaches:

1. The evolutionary algorithm (EA) is a robust, gradient-free optimizer which can perform well on nonconvex objective landscapes.

2. Models, also referred to as surrogates or meta-models, are computationally cheaper approximations of the true expensive function, and are typically interpolants trained using previously evaluated solutions. During the optimization search, the optimizer receives the objective value predicted by the model at a fraction of the computational cost compared to using the true expensive function. While an EA typically requires many thousands of evaluations, models allow it to run efficiently on a limited computational budget. Examples of models include quadratics, artificial neural networks (ANN), Kriging, and radial basis functions (RBF).2

However, models introduce several new challenges into the optimization search. First, they are inherently inaccurate, as they are trained using small samples due to the expensive evaluations. A severely inaccurate model not only impairs the search, but may even introduce false optima and lead the optimizer to a poor final solution. Accordingly, the optimizer must manage this inaccuracy to ensure convergence to a good true optimum.3 Second, numerous types of models exist, but owing to the limited computational budget, it is impractical to identify the optimal type by experiments. Also, a priori fixing of the model type may result in an unsuitable model, and can thus degrade the search efficacy. To address this issue, this article proposes a model-assisted EA which both manages the model accuracy and selects an optimal model during the search. The remainder of the article is organized as follows: Sect. 2 describes the proposed algorithm, Sect. 3 presents the performance analysis showing the efficacy of the proposed algorithm, and Sect. 4 summarizes the article.

2 Proposed algorithm

To address the difficulties described, we propose an EA which uses a trust-region (TR) framework to manage the model's accuracy and to ensure convergence to an optimum of the true expensive objective function. To further improve performance, the algorithm employs model selection to identify the most suitable model type out of a set of candidate models. The following sections describe the proposed algorithm.

2.1 Modeling and model selection

As discussed above, a priori fixing of the model type may hamper the optimization search, and therefore it is necessary to develop algorithms which use many types of model. Gorissen et al.4 proposed an involved set-up for adaptively selecting models. Their approach used a genetic algorithm (GA) in an island topology, where different models evolved concurrently, and the GA eventually yielded the best performing model for the given problem. Zhou et al.5 also considered multiple models, and used a memetic algorithm with different models, where a trial step was taken with each model in turn. Based on the success of the steps, their algorithm selected an optimal model. This approach relies on performing at least one trial step and a corresponding expensive function evaluation with each model. Meckesheimer et al.6 compared several methods for estimating model accuracy, but did not consider adapting the model during the search.

In contrast to these studies, the algorithm proposed in this study adaptively selects the model type during the search, and uses a procedure which is both mathematically rigorous and computationally efficient. The procedure readily generalizes to any number or types of model, and for the specific implementation in this paper we considered two candidate models.

(1) Kriging. The model combines a global “drift” function with a local correction based on the correlation between the interpolation points. The model exactly interpolates the objective function at the sample points. Using a constant drift function2 gives the Kriging model

m(x) = β + κ(x),    (1)

with the drift function β and the point-wise local correction κ(x). The latter is defined by a stationary Gaussian process with mean zero and covariance

Cov(κ(x), κ(y)) = σ² R(θ, x, y),    (2)

where R is the correlation between the vectors x and y. A common choice for the correlation is the Gaussian function, which is defined as

R(θ, x, y) = ∏ᵢ₌₁ᵈ exp(−θ(xᵢ − yᵢ)²),    (3)

where d is the dimension of the vectors. Combining Eq. 3 with the constant drift function transforms the model from Eq. 1 into

m(x) = β̂ + rᵀ(x)R⁻¹(f − 1β̂),    (4)

where β̂ is the estimated drift coefficient, f is the vector of objective values, and 1 is a vector with all elements equal to 1. rᵀ(x) is the correlation vector between a new vector x and the sample vectors, namely,

rᵀ(x) = [R(θ, x, x₁), . . . , R(θ, x, xₙ)].    (5)

The estimated drift coefficient β̂ and variance σ̂² are obtained by maximizing the model likelihood.
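To make the predictor concrete, the following is a minimal sketch of the Kriging prediction of Eqs. 3–5; it is not the authors' implementation. It assumes the correlation parameter θ is given (in practice it, too, is estimated by maximizing the likelihood), and it uses the closed-form generalized least-squares estimate of β̂ for a constant drift.

```python
import numpy as np

def gauss_corr(theta, X, Y):
    # Eq. 3: the product over coordinates collapses to exp(-theta * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * d2)

def kriging_predict(x, X, f, theta):
    """Kriging prediction m(x) of Eq. 4 with a constant drift function."""
    n = X.shape[0]
    R = gauss_corr(theta, X, X) + 1e-10 * np.eye(n)   # small jitter for stability
    ones = np.ones(n)
    # Closed-form drift estimate: beta = (1' R^-1 f) / (1' R^-1 1)
    beta = (ones @ np.linalg.solve(R, f)) / (ones @ np.linalg.solve(R, ones))
    r = gauss_corr(theta, x[None, :], X).ravel()      # Eq. 5: correlations to samples
    return beta + r @ np.linalg.solve(R, f - beta * ones)
```

Since R depends only on the sample, a practical implementation would factorize R once (e.g., by Cholesky) and reuse the factor across predictions.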

(2) Radial basis functions (RBF). The model approximates the objective function by a superposition of basis functions of the form

φᵢ(x) = φ(‖x − xᵢ‖),    (6)

where xᵢ is an interpolation point, and the full model is then given by

m(x) = Σᵢ αᵢφᵢ(x) + c,    (7)

where the αᵢ and c are coefficients determined from the interpolation conditions

m(xᵢ) = f(xᵢ),    (8a)

Σᵢ αᵢ = 0.    (8b)

In this study, the Gaussian basis function was used, which is defined as

φᵢ(x) = exp(−‖x − xᵢ‖² / σ).    (9)
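The RBF coefficients follow from a single linear solve. Below is a minimal sketch, not the authors' code, of fitting and evaluating the model of Eqs. 6–9; the interface and the default width σ are illustrative choices.

```python
import numpy as np

def rbf_fit(X, f, sigma=1.0):
    """Solve the interpolation conditions (Eqs. 8a-8b) for alpha and c in Eq. 7."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / sigma)                        # Gaussian basis, Eq. 9
    # Augmented system: [Phi 1; 1' 0] [alpha; c] = [f; 0]
    A = np.block([[Phi, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    sol = np.linalg.solve(A, np.append(f, 0.0))
    return sol[:n], sol[n]                           # (alpha, c)

def rbf_predict(x, X, alpha, c, sigma=1.0):
    """Evaluate the RBF model m(x) of Eq. 7 at a new point x."""
    phi = np.exp(-((X - x) ** 2).sum(axis=1) / sigma)
    return alpha @ phi + c
```

The zero-sum side condition (Eq. 8b) makes the augmented system square, so the same solve yields both the weights αᵢ and the constant c.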

To select which model to use, namely, Kriging or RBF, the proposed algorithm relies on statistical model selection theory, which defines an estimator for the model accuracy; the model chosen is the one having the best estimated accuracy value.7 As the estimator, the proposed algorithm uses the mean squared error (MSE) of the model prediction, which is obtained via cross-validation (CV), i.e., the set of evaluated vectors is split into a training set and a testing set. A candidate model is trained using the former and tested on the latter, and the MSE is

MSE = (1/l) Σᵢ₌₁ˡ (m̂(xᵢ) − f(xᵢ))²,    (10)

where the xᵢ, i = 1, . . . , l, are the testing vectors, m̂(x) is the model trained using the training set, and f(xᵢ) is the exact and already known function value at xᵢ. The CV uses the vectors which have already been evaluated with the true expensive function and stored in the memory, and employs a training–testing split ratio of 80%–20%. The proposed algorithm calculates the MSE for the two candidate models and selects the one with the lower score. As mentioned earlier, this approach can be applied to any type or number of models.
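As an illustration, the sketch below wires this selection step to the model sketches above under the stated 80%–20% split; the random split, the fixed θ and σ, and all names here are illustrative choices, not the paper's specification.

```python
import numpy as np

def cv_mse(fit, X, f, test_frac=0.2, seed=0):
    """Eq. 10: MSE of one candidate model on a random 80%-20% train-test split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(f))
    n_test = max(1, int(test_frac * len(f)))
    test, train = idx[:n_test], idx[n_test:]
    model = fit(X[train], f[train])               # train on 80% of the vectors
    pred = np.array([model(x) for x in X[test]])  # test on the remaining 20%
    return np.mean((pred - f[test]) ** 2)

def select_model(candidates, X, f):
    """candidates maps a name to a fit function; return the name with lowest MSE."""
    scores = {name: cv_mse(fit, X, f) for name, fit in candidates.items()}
    return min(scores, key=scores.get)

# Example wiring of the two candidates sketched earlier:
def fit_kriging(X, f, theta=1.0):
    return lambda x: kriging_predict(x, X, f, theta)

def fit_rbf(X, f, sigma=1.0):
    alpha, c = rbf_fit(X, f, sigma)
    return lambda x: rbf_predict(x, X, alpha, c, sigma)

candidates = {"kriging": fit_kriging, "rbf": fit_rbf}
```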

2.2 Workflow

The proposed algorithm begins by generating an initial sample using a Latin hypercube (LH) design, which ensures that the points are space-filling and therefore improves the accuracy of the model.8 The current best solution is taken as the TR center, designated as xc, and the main optimization loop begins, where the algorithm first selects an optimal model type, namely, Kriging or RBF, and trains a corresponding model using all vectors stored in the memory. Next, it performs an optimization trial step by seeking the optimum of the model in the TR. For this it uses the real-coded EA of Chipperfield et al.,9 and since evaluating the model is computationally cheap, i.e., it requires only a fraction of a second, the EA uses a large population size and many generations to improve its search effectiveness. The full EA parameter settings are a population size of 100, a search duration of 100 generations, selection by stochastic universal selection (SUS), intermediate recombination with a probability of 0.7, the breeder genetic algorithm (BGA) mutation with a probability of 0.1, and 10% elitism. The predicted optimum found by the EA, xp, is then evaluated with the true expensive function at a cost of one expensive function evaluation, providing f(xp), i.e., the exact objective value at the predicted optimum. Based on this value, the algorithm updates the TR, as follows.

1. If f(xp) < f(xc), the search was successful, since the predicted optimum is better than the current best solution, namely, the TR center. Accordingly, the TR is centered at the predicted optimum, and the TR radius is doubled.

2. If f(xp) ≥ f(xc) and there are sufficient points inside the TR, the search was unsuccessful, but since there are sufficient points in the TR, the model is considered accurate enough to justify contracting the TR. Accordingly, the TR radius is halved.

3. If f(xp) ≥ f(xc) and there are insufficient points inside the TR, the search was unsuccessful, but this may be due to poor model accuracy in the TR. In this case, the algorithm adds a new point inside the TR to improve the local model accuracy.

From the above description, it follows that the TR steps safeguard the optimization search by restricting the search to the TR, i.e., the region where the model is likely to be more accurate, instead of attempting to optimize the model globally. The TR steps used here build on the classical TR framework,3 but have been adapted to manage models, i.e., to monitor the number of points in the TR so as to avoid contracting the TR too quickly, which can lead to premature convergence. The threshold number of points needed to contract the TR was chosen as 10% of the problem dimension; a code sketch of these update rules is given below. For completeness, Table 1 gives the pseudocode of the proposed algorithm.
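The three update rules reduce to a few comparisons. The following is a minimal sketch under illustrative names; the 10%-of-dimension threshold follows the text, while the returned action string is simply one way to signal rule 3 to the caller.

```python
def update_trust_region(f_p, f_c, n_inside, dim, radius):
    """TR update of Sect. 2.2: f_p = f(xp), f_c = f(xc), n_inside = number of
    previously evaluated points lying inside the current TR."""
    if f_p < f_c:
        # Rule 1: success; recenter the TR at the predicted optimum, double radius
        return 2.0 * radius, "recenter"
    if n_inside >= 0.1 * dim:
        # Rule 2: failure with sufficient points; the model is trusted, so contract
        return 0.5 * radius, "contract"
    # Rule 3: failure with insufficient points; keep the radius and add a new
    # point inside the TR to improve the local model accuracy
    return radius, "add_point"
```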

3 Performance analysis

To evaluate the performance of the proposed algorithm, it was benchmarked against four reference algorithms:

1. P–K. The proposed algorithm which uses only a Kriging model (no model selection).

2. P–R. The proposed algorithm which uses only an RBF model (no model selection).

3. EA with periodic sampling (EA–PS).10 The algorithm safeguards the model accuracy by periodically evaluating a small subset of the population with the true objective function, and it then incorporates the new members into the model.

4. CMA–ES with expected improvement (CMA–ES–EI).11 The algorithm follows the expected improvement (EI) framework of balancing exploration and exploitation. At each optimization iteration, the algorithm trains a local Kriging model and, based on the EI framework, obtains four points, each representing a different trade-off between a global and a local search.

The algorithms were tested using the following established test functions: Ackley, Griewank, Rastrigin, Rosenbrock, Schwefel 2.13, and Weierstrass, in dimensions 10 and 50. To support a valid statistical analysis, 30 runs were performed for each algorithm–objective function combination. Table 2 gives the resultant test statistics for each algorithm–objective function combination. It also gives the statistic α, which indicates the significance level at which the proposed algorithm outperformed the other algorithms (0.01, 0.05, or none, indicated by an empty cell).

The results show that in 17 cases out of 24 (4 reference algorithms × 6 functions) the proposed algorithm had a statistically significant performance advantage over the other algorithms. In most cases, the statistics for the proposed algorithm were either the best or near best. The proposed algorithm had a clear advantage in the higher-dimensional problems, namely, Griewank, Rastrigin, Rosenbrock, and Weierstrass in dimension 50, where its performance was statistically significantly better at the 0.01 level. The contribution of the model selection procedure is evident when comparing the proposed algorithm to its two variants with no model selection, namely, P–K and P–R, where its performance advantage was statistically significant in eight comparisons.

Table 1. The proposed algorithm

Adaptive EA
  Generate an initial LH sample and evaluate the vectors
  Repeat:
    Set the TR center as the best vector in the memory
    Select a model type using CV
    Search for a model optimum in the TR
    Update the TR and model based on the success of the trial step
  Until the maximum number of allotted analyses is reached
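Read as code, Table 1 is a short loop around the pieces sketched earlier. The sketch below ties them together; latin_hypercube and ea_minimize_in_tr are hypothetical stand-ins for the LH design and the real-coded EA search restricted to the TR, and the initial sample size and radius are illustrative, not the paper's settings.

```python
import numpy as np

def adaptive_ea(expensive_f, bounds, budget, candidates):
    """Sketch of Table 1. bounds is a (dim, 2) array of box constraints."""
    dim = bounds.shape[0]
    X = latin_hypercube(bounds, n=2 * dim)        # hypothetical space-filling design
    f = np.array([expensive_f(x) for x in X])
    radius = 0.25 * (bounds[:, 1] - bounds[:, 0]).max()
    while len(f) < budget:
        i = np.argmin(f)
        xc, fc = X[i], f[i]                       # TR center: best vector in memory
        name = select_model(candidates, X, f)     # CV-based model selection
        model = candidates[name](X, f)            # train the chosen model type
        xp = ea_minimize_in_tr(model, xc, radius, bounds)  # hypothetical EA search
        fp = expensive_f(xp)                      # one true expensive evaluation
        X, f = np.vstack([X, xp]), np.append(f, fp)
        n_inside = int((np.abs(X - xc).max(axis=1) <= radius).sum())
        radius, action = update_trust_region(fp, fc, n_inside, dim, radius)
        if action == "add_point":                 # rule 3: improve local accuracy
            xn = np.clip(xc + radius * (2 * np.random.rand(dim) - 1),
                         bounds[:, 0], bounds[:, 1])
            X, f = np.vstack([X, xn]), np.append(f, expensive_f(xn))
    return X[np.argmin(f)], f.min()
```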


Table 2. Test statistics (α: significance level at which the proposed algorithm outperformed the given algorithm; an empty cell indicates no significant difference)

            P           P–K         P–R         EA–PS       CMA–ES–EI

Ackley 10D
  Mean      8.808e+00   1.347e+01   8.463e+00   4.917e+00   1.853e+01
  SD        3.470e+00   4.043e+00   4.529e+00   4.821e−01   3.415e+00
  Median    8.696e+00   1.375e+01   9.595e+00   4.923e+00   1.939e+01
  Min       1.805e+00   2.058e+00   1.152e+00   3.822e+00   1.430e+00
  Max       1.470e+01   1.931e+01   1.509e+01   6.078e+00   2.033e+01
  α                     0.01                                0.01

Griewank 50D
  Mean      9.500e−01   9.790e−01   1.150e+00   1.660e+00   1.766e+01
  SD        4.661e−02   2.784e−02   1.218e−01   6.867e−02   3.802e+00
  Median    9.632e−01   9.887e−01   1.106e+00   1.650e+00   1.739e+01
  Min       8.408e−01   9.008e−01   9.598e−01   1.497e+00   1.102e+01
  Max       1.015e+00   1.018e+00   1.443e+00   1.812e+00   2.504e+01
  α                     0.01        0.01        0.01        0.01

Rastrigin 50D
  Mean      3.498e+02   4.091e+02   3.901e+02   3.891e+02   7.693e+02
  SD        6.498e+01   6.303e+01   4.977e+01   1.946e+01   4.749e+01
  Median    3.332e+02   4.067e+02   4.075e+02   3.858e+02   7.798e+02
  Min       2.519e+02   2.760e+02   2.327e+02   3.465e+02   6.898e+02
  Max       4.811e+02   5.291e+02   4.501e+02   4.248e+02   8.516e+02
  α                     0.01        0.01        0.01        0.01

Rosenbrock 50D
  Mean      1.436e+04   2.609e+04   2.904e+04   8.730e+03   2.544e+06
  SD        7.356e+03   8.235e+03   1.238e+04   2.230e+03   1.022e+06
  Median    1.398e+04   2.491e+04   2.814e+04   8.523e+03   2.304e+06
  Min       3.158e+04   1.020e+04   9.589e+03   4.662e+03   9.415e+05
  Max       3.812e+04   4.681e+04   6.080e+04   1.472e+04   4.983e+06
  α                     0.01        0.01                    0.01

Schwefel 2.13 10D
  Mean      8.986e+03   6.862e+03   8.645e+03   2.345e+04   4.857e+04
  SD        1.181e+04   7.556e+03   8.578e+03   2.316e+04   4.469e+04
  Median    5.032e+03   4.155e+03   5.607e+03   1.543e+04   2.809e+04
  Min       5.565e+01   1.462e+02   1.676e+02   2.831e+03   3.927e+03
  Max       4.113e+04   3.220e+04   3.112e+04   9.364e+04   2.119e+05
  α                                             0.01        0.01

Weierstrass 50D
  Mean      4.405e+01   4.431e+01   4.747e+01   3.050e+01   7.895e+01
  SD        8.313e+00   6.308e+00   7.629e+00   1.159e+00   5.895e+00
  Median    4.481e+01   4.444e+01   4.935e+01   3.064e+01   7.842e+01
  Min       3.097e+01   3.364e+01   2.116e+01   2.799e+01   7.004e+01
  Max       6.328e+01   6.056e+01   5.605e+01   3.258e+01   9.404e+01
  α                                 0.05                    0.01


To further compare the performance of the algorithms, Fig. 1 shows representative convergence plots of all five algorithms for the Schwefel 2.13 function in 10D and for the Rastrigin function in 50D. Each curve corresponds to a single run by one of the algorithms. The plots show that: (a) the proposed algorithm (P) converged as fast as or faster than the reference algorithms, namely, EA–PS and CMA–ES–EI, and (b) the model selection procedure improved performance, since the proposed algorithm outperformed the variants with no model selection, namely, P–K and P–R.

We also studied whether the proposed algorithm used predominantly a single model type, or whether the model type was updated frequently during the search. Figure 2 gives two representative examples, namely, Schwefel 2.13 in 10D and Rastrigin in 50D. The plots show that the model type was updated frequently during the search, with the Kriging model being selected more often than the RBF. These results suggest that there is no single optimal model; instead, the optimal model type may vary during the optimization search. They also indicate that using a fixed model type may not be optimal, and may degrade the search efficacy. Overall, the results show the benefits of continuously adapting the model during the search and the effectiveness of the proposed algorithm.

4 Summary

We have proposed a model-adaptive EA for expensive optimization problems. The algorithm relies on the trust-region framework to ensure convergence to a true optimum of the objective function while using models. To further improve the search, it uses a model selection step to choose between a Kriging and an RBF model at each iteration. Extensive performance analysis showed the effectiveness of the proposed algorithm.

Acknowledgments The first author thanks the Japan Society for the Promotion of Science for its support.


References

1. Tenne Y, Armfield SW (2009) A framework for memetic optimization using variable global and local surrogate models. J Soft Comput 13(8):781–793

2. Simpson TW, Poplinski JD, Koch PN, et al (2001) Metamodels for computer-based engineering design: survey and recommendations. Eng Comput 17:129–150

3. Dennis JE Jr, Torczon V (1997) Managing approximation models in optimization. In: Alexandrov NM, Hussaini MY (eds) Multidisciplinary design optimization: state of the art. SIAM, Philadelphia, pp 330–347

4. Gorissen D, De Tommasi L, Croon J, et al (2008) Automatic model type selection with heterogeneous evolution: an application to RF circuit block modeling. Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2008, Piscataway, USA, pp 989–996

5. Zhou Z, Ong Y-S, Nair PB, et al (2007) Combining global and local surrogate models to accelerate evolutionary optimization. IEEE Trans Syst Man Cybern Part C 37(1):66–76

6. Meckesheimer M, Booker AJ, Barton RR, et al (2002) Computationally inexpensive metamodel assessment strategies. AIAA J 40(10):2053–2060

7. Burnham KP, Anderson DR (2002) Model selection and inference: a practical information-theoretic approach. Springer, New York

8. McKay MD, Beckman RJ, Conover WJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2):239–245

9. Chipperfield A, Fleming P, Pohlheim H, et al (1994) Genetic algorithm toolbox for use with MATLAB, Version 1.2. Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield

10. Ratle A (1999) Optimal sampling strategies for learning a fitness model. 1999 IEEE Congress on Evolutionary Computation, CEC 1999, Piscataway, USA, pp 2078–2085

11. Büche D, Schraudolph NN, Koumoutsakos P (2005) Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Trans Syst Man Cybern Part C 35(2):183–194

Fig. 1. Representative convergence plots for the five algorithms

Fig. 2. Examples of model selection during the optimization search