
2013 Sixth International Conference on Advanced Computational Intelligence October 19-21, 2013, Hangzhou, China

Active Covariance Matrix Adaptation for multi-objective CMA-ES

Christoph Krimpmann, Jan Braun, Frank Hoffmann, and Torsten Bertram

Abstract- This paper proposes a novel approach for a derandomized covariance matrix adaptation for multi-objective optimization. Common derandomized multi-objective algorithms only utilize the information gained from successful mutations. However, in the case of optimization problems with a limited budget for fitness evaluations, inferior mutations provide additional information to adjust the search. The proposed algorithm, called active-(μ+λ)-MO-CMA-ES, extends previous approaches as it reduces the covariance along directions of unsuccessful mutations. In experiments on a set of commonly accepted multi-objective test problems the presented algorithm outperforms other derandomized evolution strategies.

I. INTRODUCTION

Evolution strategies (ES) are commonly applied to the optimization of complex real-valued problems. Beyer and Schwefel [1] give a detailed introduction to evolution strategies. Over the last years derandomized evolution strategies, especially the covariance matrix adaptation (CMA) and its derivatives, have demonstrated to be among the most efficient algorithms to solve single-objective problems f : ℝⁿ → ℝ. However, in the case of multi-objective optimization f : ℝⁿ → ℝᵐ with several conflicting objectives, there is still room for improvement on the principle of covariance matrix adaptation. A real-world example of multi-objective optimization (MOO) is the hardware-in-the-loop (HiL) optimization of technical systems in which the electrical or mechanical components and controllers are optimized by evolutionary algorithms. This concept of automated design has been used for electrical circuits, the design of robots or to optimize the topology of antennas [2]. Another example is the optimization of hydraulic valves [3] in which the parameters of the servo controller are optimized with respect to a sequence of closed-loop step responses. Compared to standard purely numerical optimization problems (e.g. the ZDT test problems by Zitzler et al. [4]), an HiL evaluation is much more time consuming and the number of fitness evaluations constitutes a major bottleneck for the solution quality. In industrial practice, there is a finite time budget for development and optimization of the hardware, which is the limiting factor for HiL optimization. To generate solutions within a feasible time horizon, the evolutionary algorithms are constrained in the number of fitness evaluations and the task becomes to find the best possible solution under these limited budgets, which is the main concern of this paper.
It presents an approach that rests upon a modified covariance matrix adaptation for multi-objective optimization and compares the novel scheme with the well-known state-of-the-art algorithms λ_MO × (1+λ)-MO-CMA-ES by Igel et al. [5]

The authors are with the Institute of Control Theory and Systems Engineering, Technische Universität Dortmund, D-44227 Dortmund, Germany (email: [email protected]).

978-1-4673-6343-3/13/$31.00 ©2013 IEEE

and the non-dominated sorting genetic algorithm II (NSGA-II) by Deb et al. [6]. Since this work focuses on optimization under limited costs, the number of evaluations of the fitness functions is limited to one-tenth of the evaluations typically employed in related publications [5], [7]. In the following section, two well-known covariance matrix adaptation algorithms for single-objective optimization are briefly introduced, since they form the basis for the novel evolution strategy presented in section III. That section also describes the transfer from single-objective CMA-ES to their multi-objective counterparts. Section IV describes the experimental evaluation of the presented algorithms and finally, the results are discussed in section V.

II. DERANDOMIZED EVOLUTION STRATEGIES

This section gives a brief description of the well-known derandomized evolution strategies based on covariance matrix adaptation proposed by Hansen and Ostermeier [8]. In particular the (1+λ)-CMA-ES is explained. Additionally the active-CMA-ES described by Jastrebski et al. [9] is presented. All strategies have in common that they operate on a representation a that at least consists of the 5-tuple a = [x, C, σ, p_c, p_succ], where x ∈ ℝⁿ is its solution vector, C ∈ ℝⁿˣⁿ the covariance matrix, σ ∈ ℝ the step size respectively the mutation strength, p_c ∈ ℝⁿ the evolution path and p_succ ∈ ℝ the average success rate. In addition, the following notation is employed. f : ℝⁿ → ℝ, x → f(x) denotes the objective function to be minimized. λ_succ = |{ i = 1, 2, ..., λ | f(x_i) ≤ f(x_parent) }| is the number of successful offspring solutions. C = BD(BD)^T is the eigenvalue decomposition of the covariance matrix, where B ∈ ℝⁿˣⁿ is the matrix of normalized eigenvectors and D ∈ ℝⁿˣⁿ the diagonal matrix of the square roots of the eigenvalues. z ~ N(0, I) is the mutation vector, drawn from a multivariate normal distribution N with mean 0 ∈ ℝⁿ and the unit matrix I ∈ ℝⁿˣⁿ as covariance. The algorithms adapt the outlined parameters at each iteration.
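As a concrete illustration of the notation above, the following sketch (Python with NumPy; function and variable names are illustrative, not taken from the paper) samples offspring from N(x, σ²C) via the decomposition C = BD(BD)^T:

```python
import numpy as np

def sample_offspring(x_parent, C, sigma, lam, rng=None):
    # Sample lam offspring x_i = x_parent + sigma * B D z_i with
    # z_i ~ N(0, I) and C = B D (B D)^T (eigendecomposition of C).
    rng = np.random.default_rng() if rng is None else rng
    eigvals, B = np.linalg.eigh(C)                   # B: normalized eigenvectors
    D = np.diag(np.sqrt(np.maximum(eigvals, 0.0)))   # D: sqrt of eigenvalues
    Z = rng.standard_normal((lam, x_parent.size))    # rows are z_i ~ N(0, I)
    X = x_parent + sigma * Z @ (B @ D).T             # rows are offspring x_i
    return X, Z

X, Z = sample_offspring(np.zeros(3), np.eye(3), 0.5, lam=4)
```

For C = I the decomposition is trivial and each offspring is simply x_parent + σ z_i; for a general C, the matrix BD rotates and scales the isotropic samples.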

A. The (1+λ)-CMA-ES

The outlined algorithm is a basic approach to realizing a derandomized evolution strategy and is identical with the one presented by Igel et al. [5]. Each iteration g of the algorithm consists of five operations.

1) Compute the eigendecomposition of C, generate mutation vectors z_i with i ∈ {1, 2, ..., λ} and create


offspring individuals:

x_i^{g+1} = x_parent^g + σ_parent^g · BD z_i.    (1)

2) Determine the fitness values f(x_i) and update the average success rate according to the improvements accomplished by the offspring individuals:

p_succ ← (1 − c_p) p_succ + c_p · (λ_succ / λ),    (2)

where c_p = p_succ^target · λ / (2 + p_succ^target · λ).

3) Update the step size according to

σ ← σ · exp( (1/d) · (p_succ − p_succ^target) / (1 − p_succ^target) )    (3)

with d = 1 + n/(2λ) and p_succ^target = 1 / (5 + √(λ/2)).

4) The evolution path p_c and the covariance matrix C are updated with respect to the average success rate p_succ. In case of p_succ < p_thresh the update is given by

p_c ← (1 − c_c) p_c + √(c_c (2 − c_c)) · x_step,    (4)

C ← (1 − c_cov) C + c_cov · p_c p_c^T    (5)

with x_step = (x_parent^{g+1} − x_parent^g) / σ_parent^g.

Otherwise a modified update is used,

p_c ← (1 − c_c) p_c,    (6)

C ← (1 − c_cov) C + c_cov (p_c p_c^T + c_c (2 − c_c) C)    (7)

with c_c = 2/(n + 2), c_cov = 2/(n² + 6) and p_thresh = 0.44.

B. active-CMA-ES

From the first appearance of the CMA-ES in 1996 until today the literature witnesses numerous variations of the original approach. A comparative description and benchmark of the variants is given by Bäck et al. [10]. The benchmark results show that the active-CMA-ES by Jastrebski and Arnold [9] exhibits a superior performance on most benchmark problems. The solution update consists of seven steps:

1) Computation of the normalized eigenvectors B and the diagonal matrix D.

2) Generation of λ offspring individuals.

x_i = x_parent + σ · BD z_i    (8)

3) Determine the objective function values f(x_i) and sort the individuals in ascending order so that k ∈ {1, 2, ..., λ} refers to the k-th best individual. Then calculate the average mutation vector

z_avg = (1/μ) Σ_{k=1}^{μ} z_k    (9)


of the μ best individuals.

4) The parent's solution vector x_parent is updated as follows:

x_parent ← x_parent + σ · BD z_avg.    (10)

5) In contrast to the (1+λ)-CMA-ES the active-CMA-ES uses two evolution paths p_c and p_σ:

p_c ← (1 − c_c) p_c + √(μ c_c (2 − c_c)) · BD z_avg,    (11)

p_σ ← (1 − c_σ) p_σ + √(μ c_σ (2 − c_σ)) · B z_avg,    (12)

where c_c = c_σ = 4/(n + 4).

6) The covariance matrix update not only takes the successful mutations into account but also the mutations that perform worse than their parent:

C ← (1 − c_cov) C + c_cov · p_c p_c^T + β Z    (13)

with c_cov = 2/(n + √2)², β = (4μ − 2) / ((n + 12)² + 4μ) and

Z = BD ( (1/μ) Σ_{k=1}^{μ} z_k z_k^T − (1/μ) Σ_{k=λ−μ+1}^{λ} z_k z_k^T ) (BD)^T.

7) The step size is updated according to

σ ← σ · exp( (‖p_σ‖ − χ_n) / (d · χ_n) ),    (14)

where χ_n ≈ E(‖N(0, I)‖) [11] and d = 1 + 1/c_σ. The presented algorithms form the basis of the active-(μ+λ)-MO-CMA-ES presented in section III-B. See [9] and [10] for a detailed performance analysis of the single-objective CMA-ES.
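A minimal sketch of the active covariance matrix update of equation (13), under the assumption that the mutation vectors are already sorted from best to worst; the names and toy numbers are illustrative, not the authors' implementation:

```python
import numpy as np

def active_cov_update(C, p_c, Z_sorted, mu, c_cov, beta, B, D):
    # Eq. (13): C <- (1 - c_cov) C + c_cov p_c p_c^T + beta * Z, where Z
    # rewards the mu best and penalizes the mu worst mutation vectors
    # (rows of Z_sorted, ordered best to worst).
    lam = Z_sorted.shape[0]
    best, worst = Z_sorted[:mu], Z_sorted[lam - mu:]
    M = (best.T @ best - worst.T @ worst) / mu       # (1/mu) sums of z z^T
    BD = B @ D
    Z = BD @ M @ BD.T
    return (1.0 - c_cov) * C + c_cov * np.outer(p_c, p_c) + beta * Z

C_new = active_cov_update(
    C=np.eye(2), p_c=np.zeros(2),
    Z_sorted=np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]),
    mu=1, c_cov=0.1, beta=0.05, B=np.eye(2), D=np.eye(2))
```

The negative term shrinks the variance along the directions of the worst mutations, which is exactly the extra information a plain rank-one update discards.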

III. MULTI-OBJECTIVE COVARIANCE MATRIX ADAPTATION STRATEGIES

Based on the (1+λ)-CMA-ES and the active-CMA-ES we present a novel CMA-ES for multi-objective optimization. First the multi-objective optimization problem is outlined and then the active-(μ+λ)-MO-CMA-ES is presented. In the following we consider an optimization problem with m objectives f(x) = [f_1(x), f_2(x), ..., f_m(x)] to be minimized. A set of candidate solutions X is sorted according to their relative location in the objective space using the principle of Pareto dominance. An individual of this set x_1 ∈ X dominates another individual x_2 ∈ X if ∀i ∈ {1, 2, ..., m} : f_i(x_1) ≤ f_i(x_2) and ∃i ∈ {1, 2, ..., m} : f_i(x_1) < f_i(x_2), which is denoted x_1 ≺ x_2. The non-dominated solutions form the Pareto front. Without the additional specification of particular preferences it is not possible to further distinguish among the members of the Pareto front. The aim of multi-objective optimization is to generate an accurate approximation of the true Pareto front by a set of diverse individuals uniformly spread across the front. Throughout this paper the non-dominated sorting [12]


is used to sort the population according to their level of dominance. To sort the individuals of the same rank, a secondary criterion is needed. Therefore the crowding distance of the NSGA-II algorithm [6] and the contributing hypervolume, or rather the S-metric proposed by Zitzler and Thiele [13], are used. The crowding distance measures by how much an individual contributes to the diversity of the set of individuals with the same rank. Alternatively, S-metric selection measures the contribution of a single individual to the covered hypervolume. In both cases the individual with the higher contribution is preferred.
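The two selection ingredients described above, Pareto dominance and the NSGA-II crowding distance, can be sketched in a few lines (illustrative Python, not tied to any particular MOO library):

```python
def dominates(f1, f2):
    # x1 dominates x2: no objective is worse and at least one is
    # strictly better (minimization).
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))

def non_dominated(front):
    # Objective vectors not dominated by any other member of the set.
    return [f for f in front if not any(dominates(g, f) for g in front if g is not f)]

def crowding_distance(front):
    # NSGA-II secondary criterion: per objective, accumulate the
    # normalized gap between each point's neighbours; boundary points
    # get infinity so they are always preferred.
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for j in range(m):
        order = sorted(range(n), key=lambda i: front[i][j])
        fmin, fmax = front[order[0]][j], front[order[-1]][j]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if fmax == fmin:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][j] - front[order[k - 1]][j]) / (fmax - fmin)
    return dist
```

For example, on the rank-one set [(0, 3), (1, 2), (2, 1), (3, 0)] the two extreme points receive infinite distance and the interior points an equal finite distance, so selection pressure favours the boundary of the front.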

A. λ_MO × (1+λ)-MO-CMA-ES

The (1+λ)-CMA-ES as presented in section II-A can easily be applied to MOO problems as outlined by Igel et al. [5]. For this, the λ_MO × (1+λ)-MO-CMA-ES comprises λ_MO independent (1+λ)-CMA-ES instances. The update steps are almost equal to section II-A except for the following modifications:

• In each iteration g, the λ_MO parents and λ × λ_MO offspring individuals form the set

Q^g = { a_i^g, a_parent,k^g | 1 ≤ i ≤ λ and 1 ≤ k ≤ λ_MO }.

• The success count λ_succ is the number of offspring solutions that dominate their parent:

λ_succ = |{ i = 1, 2, ..., λ | f(x_i,k) ≺ f(x_parent,k) }|.

• The selection of the λ_MO parents for the next generation is done by selecting the most dominant individuals of the current generation,

a_parent^{g+1} ∈ Q^g_{≺:λ_MO},    (15)

where Q^g_{≺:λ_MO} denotes the λ_MO best individuals of Q^g.

B. active-(μ+λ)-MO-CMA-ES

The active-(μ+λ)-MO-CMA-ES is an evolution strategy that is based on the λ_MO × (1+λ)-MO-CMA-ES and combines its simplicity with the advantages of the active-CMA-ES. To reveal the full power of the active-(μ+λ)-MO-CMA-ES, the number of offspring solutions should significantly exceed the number of parents. That ensures a sufficient number of solutions superior to their parents in addition to the inferior ones. The inferior solutions are utilized to decrease the variance of the covariance matrix in directions of unsuccessful mutations and to accelerate the evolution to an improved approximation of the Pareto-optimal solutions. This property is especially important for problems with a limited budget of fitness evaluations and provides an advantage in comparison to the common MO-CMA-ES. The new algorithm consists of the following steps:

1) Compute the eigendecomposition of the μ parents' covariance matrices C and generate λ offspring by sampling from the parents' normal distributions given by C, so that each parent generates λ/μ offspring individuals.


2) The objective function values f(x_i) are determined and each subpopulation, consisting of a parent x_parent and λ/μ offspring individuals x_i, is sorted according to its level of dominance such that k ∈ {1, 2, ..., λ/μ} refers to the k-th best individual. Then the average mutation vector of each parent individual is calculated according to

z_avg = (1/v) Σ_{k=1}^{v} z_k    (16)

with v = λ/(2μ).

3) The search path p_c is updated according to equation (11).

4) The parents' covariance matrices are updated according to equation (13) with the given modification

Z = BD ( (1/l) Σ_{k=1}^{l} z_k z_k^T − (1/l) Σ_{k=2v−l+1}^{2v} z_k z_k^T ) (BD)^T,    (17)

where l is the number of best, respectively worst, offspring individuals of one parent.

5) The step size adaptation is taken from the λ_MO × (1+λ)-MO-CMA-ES. Empirical tests show that the selected step size adaptation leads to an improved convergence compared with the step size adaptation shown in section II-B. The tests also showed an increased convergence for the selected step size adaptation when replacing p_succ^target with

p_succ^target = √n / (5 + √(v/2))    (18)

for two- and three-dimensional MOO problems.
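The per-parent quantities of steps 2 and 4, i.e. z_avg of equation (16) and the active term Z of equation (17), can be sketched as follows, assuming each parent's λ/μ = 2v mutation vectors are sorted best to worst (illustrative NumPy code, not the authors' implementation):

```python
import numpy as np

def per_parent_terms(Z_sub, v, l, B, D):
    # z_avg over the v best mutation vectors of one parent (eq. 16) and
    # the active term Z built from its l best and l worst offspring
    # (eq. 17). Rows of Z_sub are the parent's lambda/mu = 2v mutation
    # vectors, sorted best to worst by dominance level.
    z_avg = Z_sub[:v].mean(axis=0)
    best, worst = Z_sub[:l], Z_sub[2 * v - l:2 * v]
    M = (best.T @ best - worst.T @ worst) / l
    BD = B @ D
    return z_avg, BD @ M @ BD.T

z_avg, Z = per_parent_terms(
    Z_sub=np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]),
    v=2, l=2, B=np.eye(2), D=np.eye(2))
```

With l = v, as in the experiments below, every offspring of a parent contributes to its covariance update: the best half positively, the worst half negatively.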

IV. EXPERIMENTAL EVALUATION OF THE MO-CMA-ES

This section provides a comparative analysis of the presented MO-CMA-ES on a suite of benchmark problems. First, the test problems are described and the performance indicator is explained. Then, the active-(μ+λ)-MO-CMA-ES and the λ_MO × (1+λ)-MO-CMA-ES are compared with the NSGA-II.

A. Measuring the performance of multi-objective evolutionary algorithms

The multi-objective evolutionary algorithms are tested on a set of standard box-constrained test problems which are commonly used to analyze the performance of multi-objective optimizers. All problems have m = 2 objective functions f_i(x) and a varying number n of parameters x_i. The set consists of the test problem FON (n = 7) introduced by Fonseca and Fleming [14], the KUR (n = 6) problem by Kursawe [15] and the ZDT problems by Zitzler et al. [4]. The ZDT1, ZDT2 and ZDT3 problems have n = 30 parameters, the ZDT4 and ZDT6 problems have n = 10 parameters. The key aspect of this work is to find the best approximation to the Pareto front for a fixed budget of fitness evaluations rather than the final convergence to the Pareto front for a larger number of evaluations. To find the best Pareto-optimal set, all


non-dominated solutions are collected during the evolution process. This so-called elitist set is the output of the ES. Additionally, to reduce the influence of random effects, each test problem is optimized over five runs of each ES. For the performance comparison between the algorithms, the set of non-dominated individuals of the union of the elitist sets, denoted as Ã, is used. The hypervolume indicator I_s is employed as a quality measure. I_s measures the hypervolume between the normalized Pareto-optimal set A_ref and Ã. The lower I_s, the better the approximation of the optimal set. The analysis is comparable with the one used by Igel et al. [7].

I_s = S_a_ref(A_ref) − S_a_ref(Ã)

To compare the approximation of the Pareto-optimal sets between the different test problems a normalization is needed. Therefore the objective values of the union of the Pareto-optimal set A_ref and the solution set à are scaled into the range [1, 2]^m. The reference point a_ref is chosen to have a value of 2.1 in each objective.
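For m = 2 objectives the S-metric reduces to a sweep over the solution set sorted by the first objective; the sketch below (illustrative code, assuming minimization and the normalization described above) computes the covered hypervolume and the indicator I_s:

```python
def hypervolume_2d(points, ref):
    # S-metric of a 2-objective minimization set: area dominated by the
    # set and bounded by the reference point ref.
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(set(points)):       # ascending in f1
        if f2 < prev_f2:                     # non-dominated point adds a strip
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

def indicator_I_s(ref_set, approx_set, a_ref=(2.1, 2.1)):
    # I_s = S(A_ref) - S(A~): hypervolume gap of the approximation,
    # with objectives already scaled into [1, 2]^2 as in the text.
    return hypervolume_2d(ref_set, a_ref) - hypervolume_2d(approx_set, a_ref)
```

Dominated points are skipped by the sweep automatically, so the same routine accepts raw elitist sets without prior non-dominated filtering.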

B. Experimental setup

All tested algorithms operate with a population size of 100 individuals in order to make the results comparable with performance reported in the literature, e.g. [5] or [7]. According to this definition the population parameters of the λ_MO × (1+λ)-MO-CMA-ES are set to λ_MO = 100 and λ = 1. The covariance matrices are initialized with C = I and the step size σ is chosen problem-dependent, such that the optimum solution is covered by the normal distribution N(x, σ²C). This results in σ = (x_1,max − x_1,min)/6. The consequence of this definition is the restriction that this setting is only feasible if the search space has the same scale across all dimensions. In practice this is not always possible. The active-(μ+λ)-MO-CMA-ES also uses an offspring size of λ = 100 individuals but the parents are limited to μ = 25. The ratio λ/μ = 4 leads to a setup where l = v = 2. In other words, all offspring individuals of one parent are used to update its covariance matrix: either the two best individuals to increase the variance in successful mutation directions or the two worst to decrease the variance in other directions. To comply with the restriction of a uniformly scaled search space the step size is set to σ = 1, and to fulfill the requirement of covering the optimal solution the initial covariance matrix is set up as follows.

C = diag( (x_i,max − x_i,min) / 6 )    (19)

This enables the algorithm to cover the optimal solution even on test problems where the search space has unequal ranges. Both algorithms are tested with crowding distance selection, denoted as c-...-ES, and S-metric selection, denoted as s-...-ES. The algorithms are compared with the well-known and commonly used NSGA-II, which is a randomized search strategy. The NSGA-II is set up with an offspring and parent population size of 100 individuals. As outlined in section I,


this paper deals with the optimization under limited costs. Therefore the algorithms are terminated after 5000 fitness evaluations and their performance is analyzed. Notice that at this early stage the algorithms usually do not achieve complete convergence to the Pareto front.

V. RESULTS & DISCUSSION

The resulting performances I_s are reported in table I. Each column of the table contains the hypervolume indicators for one test problem; the best entry is printed in bold, the worst one in italic. On the ZDT4 test problem no algorithm is able to find a good approximation of the Pareto set under the limitation of 5000 fitness evaluations. As shown in figure 1, the solution sets of the CMA-ES are still far away from the Pareto-optimal solution set, which results in a high I_s value. The NSGA-II reaches a comparably low value of I_s = 0.18278, which seemingly indicates a good approximation of the Pareto front. However, as one can see in figure 1, the NSGA-II is not able to generate a diverse Pareto set. The found solution set collapsed to the lower bound of the first objective function f_1(x), but has a comparably good performance for the second objective function f_2(x). Due to the definition of the S-metric, the covered hypervolume of the solution set found by the NSGA-II is much higher than the covered hypervolume of the other solution sets, which show a wider spread.

TABLE I

RESULTING HYPERVOLUME INDICATOR VALUES I_s AFTER 5 RUNS OF 5000 FITNESS EVALUATIONS

                     FON      KUR      ZDT1     ZDT2
c-100×(1+1)-ES       0.2051   0.1227   0.474    0.6613
s-100×(1+1)-ES       0.1619   0.1084   0.4552   0.7199
c-a-(25+100)-ES      0.0123   0.0170   0.2340   0.2676
s-a-(25+100)-ES      0.0066   0.0143   0.1881   0.3739
NSGA-II              0.0085   0.0175   0.0071   0.4282

                     ZDT3     ZDT4     ZDT6
c-100×(1+1)-ES       0.39951  0.82687  0.45464
s-100×(1+1)-ES       0.40773  0.82623  0.22585
c-a-(25+100)-ES      0.25657  0.97266  0.00075
s-a-(25+100)-ES      0.16148  0.83994  0.00298
NSGA-II              0.00542  0.18278  0.49687

Apart from the ZDT4 problem, for all other test problems at least one variant exhibits a good solution set. Taking the example of FON, the performance differences between the s-active-(μ+λ)-MO-CMA-ES and the λ_MO × (1+λ)-MO-CMA-ES are quite noticeable. Figure 2 illustrates these differences. For the sake of clarity the reference Pareto set and the solution sets of the c-active-(μ+λ)-MO-CMA-ES and the NSGA-II are not shown because they nearly coincide with the solution set of the s-active-(μ+λ)-MO-CMA-ES. The algorithm is capable of finding a solution set which completely covers the Pareto-optimal set. In contrast, the λ_MO × (1+λ)-MO-CMA-ES variants are not able to find well-approximating Pareto sets within the limited number of fitness evaluations. This leads to higher I_s values for the λ_MO × (1+λ)-MO-CMA-ES. The NSGA-II also performs quite well and achieves an indicator value of I_s = 0.0085


Fig. 1. Pareto sets generated after 5 runs of 5000 fitness evaluations on ZDT4. (Plotted sets: Pareto set, c-100×(1+1)-ES, s-100×(1+1)-ES, c-a-(25+100)-ES, s-a-(25+100)-ES, NSGA-II.)

Fig. 3. Evolution of the hypervolume indicator values Ĩ_s over the generations on FON. The algorithms stopped after 5000 fitness evaluations. (Plotted curves: c-100×(1+1)-ES, s-100×(1+1)-ES, s-a-(25+100)-ES, NSGA-II.)

Fig. 2. Pareto sets generated after 5 runs of 5000 fitness evaluations on FON. (Plotted sets: c-100×(1+1)-ES, s-100×(1+1)-ES, s-a-(25+100)-ES.)

which is almost as good as the s-active-(μ+λ)-MO-CMA-ES.

Comparing the online performance of the CMA-ES on the example of FON as shown in figure 3, the superior convergence of the s-active-(μ+λ)-MO-CMA-ES is apparent. This plot shows a modified measure Ĩ_s with

Ĩ_s(g) = (1/5) Σ_{r=1}^{5} ( S_a_ref(A_ref) − S_a_ref(E_g) ),

where E_g denotes the elitist set of one run at generation g. This modification of the measure is needed because the unmodified measure I_s cannot be computed online during the evolution process. The modified measure is the mean of the normalized hypervolume between the reference set A_ref and the current elitist set E_g across all repetitions. The rapid reduction of Ĩ_s indicates a fast convergence to the Pareto-optimal set. The new algorithm is capable of achieving the performance of the λ_MO × (1+λ)-MO-CMA-ES within less than half of the generations. It requires approximately 20 generations to achieve the level of quality of the λ_MO-...-ES. Taking the


results of the NSGA-II into account, which performs best in the first generations, the s-active-(μ+λ)-MO-CMA-ES outperforms the NSGA-II after 35 generations.

The three algorithms NSGA-II, c-λ_MO × (1+λ)-MO-CMA-ES and c-active-(μ+λ)-MO-CMA-ES rely on the same selection scheme. That makes it quite fair to compare their performances regarding the hypervolume indicator I_s. In contrast, the s-...-CMA-ES almost directly aims to optimize the covered hypervolume, which seemingly constitutes an unfair advantage over the NSGA-II. However, comparing the results in table I, it becomes apparent that there is no clear advantage of the direct S-metric selection. The s-active-(μ+λ)-MO-CMA-ES outperforms the other algorithms only in two out of six cases, namely the FON and KUR problems. On ZDT2 and ZDT6 the crowding distance selection seems to be superior because the c-active-(μ+λ)-MO-CMA-ES is the most efficient algorithm. Despite the outstanding results of the active-(μ+λ)-MO-CMA-ES, the NSGA-II shows the best results on the ZDT1 and ZDT3 problems. Adapting the initial covariance matrix to a given test problem requires quite a number of fitness evaluations, which correlates with the number of parameters n. This type of secondary adaptation of strategy parameters comes at the cost of inferior performance in the early stage. Figure 3 shows that there is hardly any improvement of the hypervolume Ĩ_s during the first generations. The less complex NSGA-II is already optimizing towards the Pareto-optimal set whereas the CMA-ESs still remain close to the initial solutions. However, once the strategy parameters provide a good local approximation of the fitness landscape, the CMA-ES explores the Pareto front more efficiently.

To compare the long-term performance of the active-(μ+λ)-MO-CMA-ES and to validate the implementation of the λ_MO × (1+λ)-MO-CMA-ES, the algorithms are also tested on FON for 50000 fitness evaluations. As figure 4 shows, all algorithms are capable of approximating the Pareto set with similar quality. The s-active-(μ+λ)-MO-CMA-ES performed


best with an indicator value of I_s = 0.00409, followed by the NSGA-II with I_s = 0.00421. The λ_MO × (1+λ)-MO-CMA-ES performed slightly better than the implementation used by Igel et al. [5]. Using the S-metric selection, the CMA-ES reaches a value of I_s = 0.00448, and I_s = 0.00522 with crowding distance selection. By reaching the performance presented by Igel et al., it is also shown that the active-(μ+λ)-MO-CMA-ES is the superior algorithm even on long-term optimization runs.

Fig. 4. Development of the hypervolume indicator values Ĩ_s over the generations on FON. The algorithms stopped after 50000 fitness evaluations. (Plotted curves: c-100×(1+1)-ES, s-100×(1+1)-ES, s-a-(25+100)-ES, NSGA-II.)

As the reported results demonstrate, the newly developed active-(μ+λ)-MO-CMA-ES is able to fulfill the particular requirements of short-term optimization tasks. It shows improved convergence speed compared to the λ_MO × (1+λ)-MO-CMA-ES on all test problems, as it takes successful and unsuccessful mutations into account to estimate the covariance matrix. The results on FON also show that the NSGA-II performs better under extremely limited costs, for which secondary adaptation of strategy parameters is too expensive. If the costs are limited further, the NSGA-II is expected to be the most efficient algorithm. For test problems with unlimited costs no clear winner emerges, as all algorithms demonstrate a similar performance.

VI. SUMMARY & CONCLUSIONS

This contribution presents a modification of the multi-objective CMA-ES. The scalar-objective (1+λ)-CMA-ES and the active-CMA-ES are briefly explained, as they form the foundation of the λMO×(1+λ)-MO-CMA-ES and the novel multi-objective active-(μ+λ)-CMA-ES. The motivation for this work is to create a multi-objective evolution strategy based on covariance matrix adaptation for optimizations with a limited budget of fitness evaluations that exhibits better convergence than state-of-the-art algorithms. Established algorithms such as the λMO×(1+λ)-MO-CMA-ES only utilize the information of successful mutations to modify the covariance matrix, which is feasible in case of unlimited evaluation budgets. However, for a limited budget the


algorithm should utilize information about inferior offspring as well to adjust the strategy parameters. Therefore the active covariance matrix adaptation, introduced for scalar-objective strategies, is applied to multi-objective optimization. The active CMA also takes unsuccessful mutations into account in order to decrease the variance of the mutation distribution along those directions, while simultaneously increasing the variance along directions of successful mutations.
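The core of this mechanism can be sketched as a signed rank-one update of the covariance matrix. The function below is a simplified illustration of the principle, not the paper's implementation: function name, the single learning rate `c_cov`, and the normalization are our assumptions, and production implementations additionally bound the negative weight so that C stays positive definite for long steps.

```python
import numpy as np

def rank_one_update(C, step, c_cov, success):
    """Illustrative active rank-one covariance update (hypothetical sketch).

    `step` is the normalized mutation step z = (x_offspring - x_parent) / sigma.
    A successful mutation increases the variance of C along z; an
    unsuccessful mutation (active update) decreases it.
    """
    outer = np.outer(step, step)
    if success:
        # reinforce the direction of a successful mutation
        return (1.0 - c_cov) * C + c_cov * outer
    # active update: shrink variance along an unsuccessful direction
    return (1.0 + c_cov) * C - c_cov * outer
```

Starting from C = I, a successful step of length greater than one inflates the variance along its own direction and shrinks it elsewhere; an unsuccessful step of the same length does the opposite, which is exactly the asymmetry the active-(μ+λ)-MO-CMA-ES exploits.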

To compare their performance, the algorithms are tested on a set of commonly used multi-objective test problems with a strongly limited number of fitness evaluations. Additionally, the NSGA-II is tested as an established multi-objective optimization algorithm. The analysis of the optimization results shows that the active-(μ+λ)-MO-CMA-ES clearly outperforms the λMO×(1+λ)-MO-CMA-ES on all test problems by at least a factor of two. Despite its good performance, the novel algorithm is outperformed by the NSGA-II on two of the six test problems. The different selection schemes, either the S-metric selection or the crowding distance selection, do not show a substantial effect on the results.
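Of the two selection schemes compared above, the crowding distance is the simpler one to state concretely. The following sketch follows the NSGA-II definition of Deb et al. [6]; it is our own minimal illustration (function name and list-based representation are assumptions), operating on the objective vectors of a single nondominated front:

```python
def crowding_distance(front):
    """Crowding distance per point of one nondominated front (NSGA-II style).

    `front` is a list of objective-value tuples. Boundary points in each
    objective get an infinite distance; interior points accumulate the
    normalized side lengths of the cuboid spanned by their neighbors.
    """
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        fmin, fmax = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float('inf')
        if fmax == fmin:
            continue  # objective is constant on this front
        for j in range(1, n - 1):
            dist[order[j]] += (front[order[j + 1]][k] - front[order[j - 1]][k]) / (fmax - fmin)
    return dist
```

Selection then prefers individuals with larger crowding distance, which favors sparsely populated regions of the front.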

REFERENCES

[1] H. G. Beyer and H. P. Schwefel, "Evolution strategies - A comprehensive introduction," Natural Computing, vol. 1, pp. 3-52, 2002.

[2] J. D. Lohn and G. S. Hornby, "Evolvable hardware: using evolutionary computation to design and optimize hardware systems," IEEE Computational Intelligence Magazine, vol. 1, pp. 19-27, 2006.

[3] J. Krettek, D. Schauten, F. Hoffmann and T. Bertram, "Evolutionary hardware-in-the-loop optimization of a controller for cascaded hydraulic valves," IEEE International Conference on Advanced Intelligent Mechatronics, pp. 1-6, 2007.

[4] E. Zitzler, K. Deb and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation, pp. 173-195, 2000.

[5] C. Igel, N. Hansen and S. Roth, "Covariance matrix adaptation for multi-objective optimization," Evolutionary Computation, vol. 15, no. 1, pp. 1-28, 2007.

[6] K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197, 2002.

[7] C. Igel, T. Suttorp and N. Hansen, "Steady-state selection and efficient covariance matrix update in the multi-objective CMA-ES," Evolutionary Multi-Criterion Optimization, pp. 171-185, 2007.

[8] N. Hansen and A. Ostermeier, "Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation," Proceedings of IEEE International Conference on Evolutionary Computation, 1996, pp. 312-317.

[9] G. A. Jastrebski and D. V. Arnold, "Improving evolution strategies through active covariance matrix adaptation," IEEE Congress on Evolutionary Computation (CEC), 2006, pp. 2814-2821.

[10] T. Bäck, C. Fousette and P. Krause, "Eine Übersicht moderner Evolutionsstrategien und empirische Analyse ihrer Effizienz" (An overview of modern evolution strategies and an empirical analysis of their efficiency), Proceedings of Workshop Computational Intelligence, 2012, vol. 45, pp. 273-305.

[11] S. Kern, S. D. Müller, N. Hansen, D. Büche, J. Ocenasek and P. Koumoutsakos, "Learning probability distributions in continuous evolutionary algorithms - a comparative review," Natural Computing, vol. 3, no. 1, pp. 77-112, 2004.

[12] K. Deb, Multi-Objective Optimization using Evolutionary Algorithms. Chichester: Wiley-Interscience Series in Systems and Optimization, 2001.

[13] E. Zitzler and L. Thiele, "Multiobjective optimization using evolutionary algorithms - A comparative case study," Parallel Problem Solving from Nature - PPSN V, pp. 292-301, 1998.

[14] C. M. Fonseca and P. J. Fleming, "Multiobjective genetic algorithms made easy: selection sharing and mating restriction," First International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA), pp. 45-52, 1995.

[15] F. Kursawe, "A variant of evolution strategies for vector optimization," Parallel Problem Solving from Nature, pp. 193-197, 1991.