


New Generation Computing, 29 (2011) 129-161. Ohmsha, Ltd. and Springer

A Comparative Study of Four Parallel and Distributed PSO Methods

Leonardo VANNESCHI, Daniele CODECASA and Giancarlo MAURI
Department of Informatics, Systems and Communication (D.I.S.Co.)
University of Milano-Bicocca, 20126 Milan, ITALY
{vanneschi,codecasa,mauri}@disco.unimib.it

Received 30 September 2010. Revised manuscript received 6 January 2011.

Abstract We present four new parallel and distributed particle swarm optimization methods: a genetic algorithm whose individuals are co-evolving swarms; an "island model"-based multi-swarm system, in which swarms are independent and interact by means of particle migrations at regular time steps; and their respective variants, enriched by adding a repulsive component to the particles. We study the proposed methods on a wide set of problems, including theoretically hand-tailored benchmarks and complex real-life applications from the field of drug discovery, with a particular focus on the generalization ability of the obtained solutions. We show that the proposed repulsive multi-swarm system has a better optimization ability than all the other presented methods on all the studied problems. Interestingly, the repulsive multi-swarm system is also the one that returns the most general solutions.

Keywords: Optimization, Swarm Intelligence, Parallel and Distributed Algorithms.

§1 Introduction

The class of complex systems sometimes referred to as swarm systems 7,18) is a source of new computational methods introduced to solve difficult problems or model complex phenomena. When swarms solve problems in nature, their abilities are usually attributed to swarm intelligence. Perhaps the best-known examples are colonies of social insects such as termites, bees, and ants. In recent years, it has proved possible to identify, abstract, and exploit the computational principles underlying some forms of swarm intelligence, and to deploy them for scientific and industrial purposes. One of the best-developed techniques of this type is Particle Swarm Optimization (PSO). 9,15,31) In PSO, which is inspired by flocks of birds and shoals of fish, a number of simple entities, the particles, are placed in the parameter space of some problem or function, and each evaluates the fitness at its current location. Each particle then determines its movement through the parameter space by combining some aspect of the history of its own fitness values with those of one or more members of the swarm. The members of the swarm that a particle can interact with are called its social neighborhood. Together, the social neighborhoods of all particles form a PSO social network. More precisely, in the canonical version of PSO, the movement of each particle depends on two elastic forces, one attracting it with random magnitude to the fittest location so far encountered by the particle, and one attracting it with random magnitude to the best location encountered by any of the particle's social neighbors in the swarm.

PSO is becoming more and more popular and is finding applications in more and more domains, thanks to its effectiveness and extremely easy implementation. As reported in 25), searching the IEEExplore (http://ieeexplore.ieee.org) technical publication database with the keyword "PSO" returns a list of well over 1,000 titles, about one third of which deal with theoretical aspects. Despite this intensive activity, much is still to be learned and discovered about PSO from a theoretical point of view. 17) Most contributions, in fact, deal with modifications of the basic algorithm. On the one hand, the essentiality of the basic equations suggests that further improvements may be introduced by modifying the dynamics of the swarm. On the other hand, because of the natural implicit parallelism present in the basic algorithm (particles and/or social neighborhoods can be seen as independent agents evolving in parallel, with some synchronizations), it is natural to develop new parallel and distributed versions of PSO.

The present paper introduces four new parallel and distributed PSO models. The motivations, contents, contribution and novelty of the paper are discussed below.

1.1 Motivation

Parallel and distributed approaches are natural in swarm intelligence, and they have been used intensively since the early years of this research field. Swarm systems, in fact, have often been described as intrinsically parallel computational methods. The reason for this is that many of the main computational tasks characterizing this family of heuristics are independent of each other; thus it is straightforward to perform them at the same time. This is the case, for instance, of the evaluation of the fitness of the particles in a swarm. Furthermore, by attributing a non-panmictic structure to the population, something that also finds its inspiration in nature, the operations that allow particles to update their positions can also be performed independently of each other (once the coordinates of the swarm's global best position so far have been distributed among the processors), and thus can also potentially be parallelized. These approaches can be useful even when there is no actual parallel or distributed implementation, thanks to the particular information diffusion afforded by the more local structures of the swarms. But of course parallel and distributed approaches are at their best when the structures of the models are reflected in the actual algorithm implementations. In fact, when compared with other heuristics, swarm systems are relatively costly and slow. But parallel and distributed implementations can boost performance and thereby allow practitioners to solve, exactly or approximately, larger and more interesting problem instances thanks to the time savings afforded.

As mentioned above, these advantages have been known and appreciated for several years (some of the previous efforts are summarized in Section 2); so, what justifies a new contribution in the field? The answer is that significant progress has been made in recent years, and these methodologies have become even more important today for at least two reasons. First, engineers and scientists wish to tackle problems of ever-increasing complexity. The resulting increases in computational costs have been compensated for, in part, by the steady improvements in single-CPU performance, but they can obviously be decreased in an even more substantial manner by harnessing the implicit or explicit parallelism present in swarm-based heuristics. Second, while really fast parallel hardware was until recently extremely expensive and difficult to program, this has changed in a radical manner during the last decade. Nowadays, almost every scientist has easy access at least to networks of underexploited workstations, if not to dedicated parallel/distributed hardware in the form of clusters of powerful CPUs. Often these collective computational structures are part of hierarchical, geographically extended grid systems that offer unprecedented computational power at a very low cost. But the fast hardware that is now available would be much less exploitable if it were not accompanied by corresponding advances in terms of new, effective and highly parallelizable models.

Motivated by the need to develop high-quality work on parallel and distributed approaches in PSO, four new highly parallelizable PSO variants were introduced in 38) and 37). Subsequently, further work has been done to improve the design of these parallel models and the related algorithms, and to validate them in terms of optimization ability on a wide set of test problems, as described in the remainder of this paper. Our future work will be oriented to the study of the computational speed and scalability of these algorithms on truly parallel architectures, such as clusters of CPUs.

1.2 Contribution

This paper contains a systematic presentation of the models we defined in 37) and 38), and of the results of a set of experiments carried out to test their respective optimization abilities against each other and against standard PSO. Besides proposing and evaluating new ideas for parallelizing PSO, we believe that the main contributions of the present work consist at least in the following points:


• the study of new ways of integrating Genetic Algorithms (GAs) and PSO, different from those that had been investigated before (some of them discussed in Section 2);

• the study of new methods to integrate attractive and repulsive forces in the PSO algorithm, different from those that had been investigated before (some of them discussed in Section 2);

• the introduction of the original model that we call Multi-swarm Repulsive PSO (MRPSO) in the following, which has shown interesting characteristics and abilities.

We consider the latter point particularly significant because, contrary to some of the parallel and distributed PSO models proposed so far (some of which are discussed in Section 2), MRPSO does not have any computational overhead (due, for instance, to diversity calculations or guided size modifications of the single swarms). Furthermore, it is able to find results of better quality than all the other proposed models, and also than standard PSO, for all the problems studied here. Last but not least, MRPSO generalizes better to out-of-sample data, in all the real-life regression problems studied here (which were not investigated in 37) and 38), as further discussed in Section 1.3), than all the other considered methods. This last contribution is, in our opinion, particularly interesting, given that we are not aware of any study on the generalization ability of PSO models that has appeared so far.

One of the most delicate aspects of experimental studies aimed at comparing different optimization methods is finding a suitable set of test problems on which to perform the simulations. In this paper, we use a large set of problems composed of:

• two new sets of test functions (that we call cosff and wtrap in the following) whose difficulty can be tuned by simply modifying the values of some parameters, and that were introduced for the first time in 38),

• the well-known set of Rastrigin test functions (see for instance 44)),
• the test functions introduced in the CEC 2005 numerical optimization competition, 34) which are nowadays a more or less accepted standard benchmark for testing the performance of real-valued parameter optimization methods, even though a constructive critique of these functions has appeared in 38),

• four complex real-life problems in the field of drug discovery, whose objective is the prediction of four important pharmacokinetic parameters (Human Oral Bioavailability, Median Oral Lethal Dose, Plasma Protein Binding levels and Docking Energy) as a function of a set of molecular descriptors of potential candidate new drugs (the first three of these problems were introduced in 3) and the fourth one in 2)),

• one further complex real-life problem in the field of drug discovery, introduced in 1), whose objective is the prediction of the response of cancer patients to a pharmacologic therapy based on a drug called Fludarabine.


1.3 Novelty

As stated above, the present paper extends and completes two recent contributions, 37) and 38), in which the four parallel and distributed PSO models studied here were first introduced. The focus of 38) was mainly on the CEC 2005 benchmark functions, and the main contribution of that paper consisted in the introduction of the cosff and wtrap sets of functions and in the justification of why and how those functions could represent a suitable complement to the CEC 2005 functions. In 37), the four proposed parallel and distributed models were also studied, for the first time, on the Rastrigin functions. Here, besides giving a systematic presentation of the models and of the results contained in 37) and 38), the following further points are considered:

• the experimental validation of the proposed models on five real-life applications in the field of drug discovery and development, whose importance for the Computational Biology field and whose complexity have been fully discussed in previous recent contributions; 2,3,26)

• the study of the generalization ability of the proposed PSO models on out-of-sample data (while in 37) and 38) the study was limited to the optimization ability on training data), which is important if we want to investigate those problems conveniently and consistently with the previous literature.

Showing that computational models that had obtained promising results on a set of theoretical hand-tailored benchmarks also perform well on such a set of real-life applications is by no means a trivial contribution. In fact, those applications have characteristics of multi-modality and functional complexity that are not found in the previously investigated benchmarks 2,3,26) and that make them much more difficult problems for optimization heuristics to solve.

1.4 Contents

This paper is structured as follows:

• Section 2 contains a discussion of previous and related work, focusing on multi-swarm PSO, on models that integrate Evolutionary Algorithms (EAs) with PSO, and on the attractive/repulsive PSO methods defined so far.
• In Section 3, the new PSO methods we are proposing are discussed.
• In Section 4, we describe the sets of test problems used.
• In Section 5, we present and discuss the experimental results obtained by the proposed PSO methods on these sets of problems.
• Finally, Section 6 concludes the paper and discusses possible future research.

§2 Previous and Related Work

According to 14,24), the numerous contributions that have appeared so far on PSO can be partitioned into two broad areas: modifications/improvements to the traditional PSO algorithm (this algorithm is presented here in Section 3.1) and applications. The present paper clearly belongs to the first of these two classes, even though a set of real-life applications is used to validate the proposed models. For this reason, only this class will be discussed here. The reader interested in a survey of the main applications of PSO is referred to 24).

Several different variants of the basic PSO algorithm have been proposed so far. For instance, several recent interesting studies describe techniques aimed at improving the performance of PSO with different settings, focusing on the optimization of parameters such as the inertia weight and the constriction and acceleration coefficients (see for instance 4,8,35,42)). Even though interesting, these contributions are orthogonal to the present work, where only standard parameter settings are considered, and they could be integrated with the present work in the future. Another interesting variant of the original PSO formulation consists in imposing a given "structure" (or "topology") on the swarm. Among others, Kennedy and coworkers evaluate different kinds of topologies, finding that good performance is achieved using random and Von Neumann neighborhoods. 16) Nevertheless, the authors also indicate that selecting the most efficient neighborhood structure is in general a problem-dependent task. In 10), Oltean and coworkers evolve the structure of an asynchronous version of the PSO algorithm. They use a hybrid technique that combines PSO with a genetic algorithm (GA), in which each GA chromosome is defined as an array that encodes an update strategy for the particles of the whole swarm. Such an approach works at macro and micro levels, which correspond, respectively, to the GA used for structure evolution and to the PSO algorithm that assesses the quality of a GA chromosome at the macro level. This contribution represents one of the first efforts to integrate GAs and PSO, a task which is also tackled in the present paper through the definition of the PSE algorithm (see Section 3.2). The authors of 10) empirically show that the evolved PSO algorithm performs similarly to, and sometimes even better than, standard approaches for several benchmark problems. They also indicate that, in structure evolution, several features, such as particle quality, update frequency, and swarm size, influence the overall performance of PSO (as further confirmed in 11)). After 10), many other improvements based on the combination of Evolutionary Algorithms (EAs) and PSO have been proposed, for example considering self-update mechanisms 33) or the formation of 3D complex patterns in the swarm, 19) to increase convergence speed and performance on the problems under consideration. Recently, a modified genetic PSO has been defined by Jian and colleagues, 45) which takes advantage of the crossover and mutation operators, along with a differential evolution (DE) algorithm that enhances search performance, to solve constrained optimization problems. Other work, aimed at solving global non-linear optimization problems, is presented by Kou and colleagues in 43). They have developed a constraint-handling method in which a double PSO is used, together with an induction-enhanced evolutionary strategy technique. Two populations preserve the particles of the feasible and infeasible regions, respectively. A simple diversity mechanism is added, allowing the particles with good properties in the infeasible region to be selected for the

population that preserves the particles in the feasible region. The authors state that this technique could effectively improve convergence speed with respect to plain PSO.

As explained in Section 3, the PSE algorithm introduced in this work substantially differs from the variants presented in 10,33,43,45), mainly because it consists of a GA in which each evolving individual is a swarm. This idea is new and, to the best of our knowledge, it has never been exploited before. The interested reader is referred, for instance, to 25) for a detailed survey of the different variants of the standard PSO algorithm that have been proposed so far.

Here we are particularly interested in multi-swarm PSO and in repulsive and attractive/repulsive PSO methods. In 5), a multi-swarm PSO method for dynamic optimization environments was proposed. The main idea of that method is to extend the single-swarm PSO, integrating it with another model called Charged Particle Swarm Optimization, by constructing interacting multi-swarms. The main difference between contribution 5) and the present one is that in 5) the goal is clearly to improve the PSO optimization ability and self-adaptability in the presence of dynamic environments, i.e. where the target function changes with time according to some unknown patterns. The computational overhead introduced by 5) for dynamic adaptation clearly slows down the performance of the model in the presence of static problems. In the present work, we take a different viewpoint: our goal is to improve the PSO optimization ability in the case of complex (i.e. characterized by difficult, rugged or deceptive, fitness landscapes) but static (i.e. where the target is fixed and does not change with time) problems.

In 21), an interesting extension of multi-swarm PSO is proposed, in which an independent local optimization is performed in each of the different swarms. When these local optimization processes terminate, all the particles in the system are once again randomly partitioned into several different swarms, and the process is iterated. Even though interesting, the model presented in 21) differs from the models presented here, since we do not introduce any local optimization strategy in the algorithm. In fact, we want to test our models on difficult problem spaces, possibly characterized by the presence of many local optima, and we believe that adding local optimization strategies would favor premature convergence or stagnation. A variant of the algorithm introduced in 21) was presented in 20), where a mechanism to track multiple peaks by preventing overcrowding at a peak is introduced. Instead of introducing local optimization and then defining criteria for avoiding premature convergence, we prefer not to use local optimization here, and to tackle complex problems using different ideas, such as isolating interacting sub-swarms and repulsion.

In 13), PSO is modified to create the Multi-Swarm Accelerating PSO, which is applied to dynamic continuous functions. In contrast to the previously introduced multi-swarm PSOs and local versions of PSO, the swarms are dynamic. The whole population is divided into many small swarms, which are regrouped frequently by using various regrouping schedules, and which exchange information among themselves. Accelerating operators are combined to improve its local

search ability. As previously pointed out for other contributions, in the case of 13) too the focus is on dynamic environments, while in the present work we concentrate on static, yet complex, problems.

In 23), a multi-swarm PSO method inspired by the phenomenon of symbiosis in natural ecosystems is presented. This method is based on a master-slave model, in which a population consists of one master swarm and several slave swarms. The slave swarms execute a single PSO, or variants of it, independently, in order to maintain the diversity of particles, while the master swarm evolves based on its own knowledge and also on the knowledge of the slave swarms. Paper 23) represents one of our main sources of inspiration. Nevertheless, it differs from the present work in that we do not partition the swarms into masters and slaves here, but assign to all of them the same hierarchical importance. This has interesting effects on the allocation of our swarms/processes to computational resources: we do not have to identify the most powerful resources and assign them to master swarms. In other words, we do not need complex allocation algorithms: our system can work on clusters of machines that all have the same computational power, and the allocation can be made arbitrarily.

A domain in which multi-swarm PSO methods find a natural application is multi-objective optimization, in which typically each swarm is used to optimize a different criterion. This is the case, among many other references, of 39). That paper is interesting and should be considered for a future extension of our approach. For the moment, however, we restrict our attention to problems that have only one target/objective.

Multi-swarm PSO methods have also been used in several application domains. Among many others, particularly successful applications are shown in 41), where multiple interacting swarms of adaptive mobile agents are used to solve problems in networks; in 22), where a multi-swarm system is used for risk management modelling for virtual enterprises and in the automatic generation of improvised music; and in 6), where improvised music is automatically generated by an interactive multi-swarm method using attractors and repulsors. Here, we apply the proposed models to real applications in drug discovery and development, aimed at predicting the values of some useful pharmacokinetic parameters and the responses of patients to therapies (these applications are described in Section 4.5). Even though the focus of the present work is on the proposed methods, and not on the specific applications (which are used only as test cases), it is worth mentioning that we are not aware of any previous contribution using PSO for drug discovery applications.

We also point out that, even though clearly related and quite similar, the multi-swarm PSO methods proposed in 5,6,13,20–23,39,41) are different from the four algorithms proposed in this paper and described in Section 3.

The concept of repulsive PSO or attractive/repulsive PSO has been exploited in several references so far. This is the case, for instance, of 27), where a PSO variant is presented in which the attractive or repulsive behavior of the particles is changed as a function of the swarm's diversity. Even though interesting, the algorithm proposed in 27) has the drawback of calculating diversity at each iteration of the algorithm, in order to decide whether particles have to be attractive or repulsive. This implies a computational overhead that we want to avoid in the proposed methods. In fact, in our models we identify some particles as attractive and some others as repulsive once and for all, on the basis of precise principles (that will be discussed in Section 3), and we do not change their characteristics during evolution.

§3 PSO Algorithms

3.1 Algorithm 1: PSO

This is the basic PSO algorithm as introduced, for instance, in 31), where each particle is attracted by one global best position for the whole swarm and one local best position. The basic PSO velocity and position-update equations for a particle are given as follows:

V(t) = w ∗ V(t−1) + C1 ∗ rand() ∗ [Xbest(t−1) − X(t−1)] + C2 ∗ rand() ∗ [Xgbest(t−1) − X(t−1)]   (1)

X(t) = X(t−1) + V(t)

where V is the velocity of the particle, C1 and C2 are two positive constants, w is the inertia weight (constriction factor), X(t) is the position of the particle at time t, Xbest(t−1) is the best-fitness position reached by the particle up to time t−1, and Xgbest(t−1) is the best-fitness point ever found by the whole swarm. We also point out that, in our implementation, when a particle reaches a limit of the admissible range on one dimension, its velocity on that dimension is set to zero.
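As a concrete illustration, equation (1) and the boundary rule just described can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name, the default parameter values, the clipping of out-of-range positions and the placement of velocity clamping are our own illustrative assumptions.

```python
import random

def pso_step(x, v, x_best, x_gbest, w=0.7, c1=2.0, c2=2.0,
             lo=0.0, hi=1.0, v_max=0.5):
    """One canonical PSO update (equation (1)) for a single particle,
    applied dimension by dimension. When the new position leaves the
    admissible range [lo, hi] on a dimension, the position is clipped
    (our assumption) and the velocity on that dimension is zeroed
    (the paper's stated rule)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        vd = (w * v[d]
              + c1 * random.random() * (x_best[d] - x[d])
              + c2 * random.random() * (x_gbest[d] - x[d]))
        vd = max(-v_max, min(v_max, vd))   # clamp to the maximum velocity
        xd = x[d] + vd
        if xd < lo or xd > hi:
            xd = max(lo, min(hi, xd))      # stay inside the admissible range
            vd = 0.0                       # zero velocity on that dimension
        new_x.append(xd)
        new_v.append(vd)
    return new_x, new_v
```

A particle is then iterated by repeatedly calling `pso_step` with its current best and the swarm's global best position.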

In our experiments, we have considered a swarm size equal to 100 particles. This is particularly suitable given that we have to compare the results of standard PSO with those of various versions of distributed and multi-swarm PSO, and we want the total number of particles in each system to be the same (for instance, in the PSE algorithm described below, we use 10 swarms of 10 particles each). The other parameters we have used are: C1 = C2 = 2. The value of w has been progressively decremented during the execution of the algorithm from the initial value of 1 to the final value of 0.001. The maximum particle velocity was set equal to 0.5 for the cosff, wtrap and Rastrigin test functions, while for the CEC 2005 benchmark functions and the real-life applications we have used 1/3 of the size of the admissible range. The motivation for this choice is that while the cosff and wtrap functions are limited to the [0, 1] range and the Rastrigin function to the [−5.12, +5.12] range, the CEC 2005 benchmarks, as well as the real-life applications we have studied, generally work on much larger ranges. The maximum number of fitness evaluations depends on the test function and on the parameters used. For the cosff and wtrap functions it varies between 10^5, 2 × 10^5 and 3 × 10^5. For the CEC 2005 benchmarks, it is equal to 10^5 (as required in 34) for a dimension number equal to 10). For the Rastrigin function and for the real-life applications it varies between 2 × 10^5 and 3 × 10^5 (for more precision, the reader can refer to the horizontal axis of the average best fitness plots in Figs. 2, 3, 4, 5 and 7). Section 4 introduces these test problems.
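The progressive decrement of w from 1 to 0.001 can be sketched as a simple schedule. The paper does not specify the decrement law, so the linear interpolation below, and the function name, are our assumptions.

```python
def inertia_weight(iteration, max_iter, w_start=1.0, w_end=0.001):
    """Inertia weight decremented from w_start to w_end over max_iter
    iterations. Linear interpolation is an assumption: the paper only
    states that w goes progressively from 1 to 0.001 during the run."""
    frac = iteration / float(max_iter)
    return w_start + (w_end - w_start) * frac
```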

3.2 Algorithm 2: Particle Swarm Evolver (PSE)

This algorithm can be imagined as a GA in which each individual in the population is itself an evolving swarm. In synthesis, all the swarms in the population work independently, like the PSO algorithm described above, for a certain number of iterations (say p, which stands for period). After all the swarms in the GA population have performed p iterations independently, we execute one generation of the GA, by selecting the most promising swarms for mating and by evolving them with crossover and mutation as in a usual GA. Each swarm has exactly the same number of particles, and this number stays constant during the whole evolution. As the fitness of a swarm, we consider the fitness of its global best particle. The crossover between two parent swarms is just a random mixing of their particles, in such a way that each offspring swarm contains the same number of particles as the parent swarms, but some of them (at random) belong to one parent and the rest to the other parent. In synthesis, suppose that the number of particles per swarm is n, that the selected parent swarms are s1 and s2, and that the two offspring swarms that we want to create are s3 and s4. We begin by labelling all particles in s1 and s2 as unused. To create s3, we repeat n times the following process: we select one parent swarm between s1 and s2 with uniform probability (coin flip); from the chosen swarm, we take a random particle (with uniform distribution) among the particles in that swarm that are still labelled as unused, we insert it into s3 and we label it as used. Finally, when s3 contains n particles, we take all the still unused particles from s1 and s2 and insert them into s4. The mutation of a swarm is the replacement of a random particle in that swarm (chosen with uniform probability) with a completely random particle (note that this process does not alter the global best of the swarm unless the newly created random particle is the new global best).
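The crossover and mutation just described can be sketched as follows. This is an illustrative Python sketch, not the authors' code; the particle representation and the handling of a parent whose unused particles run out, which the text leaves implicit, are our assumptions.

```python
import random

def swarm_crossover(s1, s2):
    """Produce offspring swarms s3, s4 by randomly mixing the particles of
    parent swarms s1, s2: for each of the n slots of s3, flip a coin to
    choose a parent and draw one of its still-unused particles; s4 then
    collects all remaining unused particles. If the chosen parent has no
    unused particles left, we draw from the other one (our assumption)."""
    n = len(s1)
    unused = [list(s1), list(s2)]
    s3 = []
    for _ in range(n):
        k = random.randint(0, 1)            # coin flip between the parents
        if not unused[k]:
            k = 1 - k                       # that parent is exhausted
        s3.append(unused[k].pop(random.randrange(len(unused[k]))))
    s4 = unused[0] + unused[1]              # the still-unused particles
    return s3, s4

def swarm_mutation(swarm, random_particle_factory):
    """Replace a uniformly chosen particle with a freshly generated one."""
    i = random.randrange(len(swarm))
    swarm[i] = random_particle_factory()
    return swarm
```

Note that, as in the text, both offspring end up with exactly n particles, and together they contain exactly the particles of the two parents.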

The parameters we have used are: number of independent iterations of each swarm in the population before a GA generation: p = 10; number of particles in each swarm: 10; number of swarms in the GA population: 10; crossover probability between swarms: 0.95; probability of mutation of a swarm: 0.01. Swarms have been selected using tournament selection with tournament size equal to 2. All the other parameters are as in the PSO algorithm described in the first part of this section.
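The crossover and mutation operators described above can be sketched as follows. This is an illustrative sketch, not the authors' code: particles are treated as opaque objects, and the function names (`swarm_crossover`, `swarm_mutation`, `random_particle`) are hypothetical.

```python
import random

def swarm_crossover(s1, s2):
    """Crossover between two parent swarms of equal size n (a sketch of
    the PSE operator described in the text). Returns two offspring swarms
    that together contain exactly the particles of the two parents."""
    n = len(s1)
    # Label all particles in both parents as unused.
    unused = {1: list(s1), 2: list(s2)}
    s3 = []
    for _ in range(n):
        # Coin flip between the two parents; fall back to the other parent
        # when the chosen one has no unused particles left.
        choice = random.choice((1, 2))
        if not unused[choice]:
            choice = 3 - choice
        pool = unused[choice]
        # Uniform choice among the still-unused particles, then mark used.
        s3.append(pool.pop(random.randrange(len(pool))))
    # The second offspring gets all still-unused particles.
    s4 = unused[1] + unused[2]
    return s3, s4

def swarm_mutation(swarm, random_particle):
    """Replace a uniformly chosen particle with a completely random one
    produced by the (hypothetical) factory random_particle()."""
    i = random.randrange(len(swarm))
    swarm[i] = random_particle()
    return swarm
```

Note that, as the text remarks, mutation can only change the swarm's global best if the newly created particle happens to be better than it.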

3.3 Algorithm 3: Repulsive PSE (RPSE)

This algorithm works as PSE defined above, except for the fact that each swarm in the GA population also has a repulsive component. In particular, each particle of each swarm is attracted by the global best of its own swarm and by its local best position, and repulsed by the global best of all the other swarms in the GA population (this holds only when the global best of the other swarm is different from the global best of the swarm the particle belongs to). The repulsive factor of each particle is modelled as follows: for each swarm different


from the current one, we have:

V(t) = VPSO(t) + C3 ∗ rand() ∗ f(Xforeign−gbest(t−1), X(t−1), Xgbest(t−1)).

VPSO(t) is the velocity calculated with the canonical PSO formula (see equation (1) at page 137), Xgbest(t−1) is the position of the global best of the current swarm and Xforeign−gbest(t−1) is the position of the global best of the other considered swarm. The function f always ignores Xgbest(t−1) and repulses the particle by pushing it in the opposite direction of Xforeign−gbest(t−1), except in the case where the repulsor is between the particle and the global best of the current swarm. In this case, f accelerates the particle towards Xgbest(t−1). More precisely, function f works as described by the following pseudo-code:

if (X(t−1) < Xforeign−gbest(t−1) and Xforeign−gbest(t−1) < Xgbest(t−1))
   or (Xgbest(t−1) < Xforeign−gbest(t−1) < X(t−1))
      return −φ(X(t−1), Xforeign−gbest(t−1))
else
      return φ(X(t−1), Xforeign−gbest(t−1))

where:

φ(X(t−1), Xforeign−gbest(t−1)) = sig(dis) ∗ (1 − |dis/(U − L)|)

and where dis = X(t−1) − Xforeign−gbest(t−1), sig indicates the sign function, and U and L are the upper and lower bounds of the interval, respectively.

Informally, φ is equal to the sign of the difference of its arguments multiplied by 1 minus the normalized distance of its arguments. In other words, the repulsive factor increases with the proximity to the repulsor, and we obtain a value that does not depend on the space dimension.

In our experiments, we have used C3 = 2/N, where N is the number of repulsors, i.e. the number of swarms minus 1. All the other parameters used are as in the PSE algorithm described above.
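The functions f and φ above can be sketched per coordinate as follows. This is an illustrative sketch under the assumption that the search interval is [L, U] = [0, 1]; the function names (`phi`, `repulsive_component`) are hypothetical.

```python
def phi(x, x_rep, lower=0.0, upper=1.0):
    """Magnitude/sign of the repulsion between a particle coordinate x
    and a repulsor coordinate x_rep: the sign of their difference times
    one minus their normalized distance (the phi function in the text)."""
    dis = x - x_rep
    sign = (dis > 0) - (dis < 0)  # sig(dis)
    return sign * (1.0 - abs(dis / (upper - lower)))

def repulsive_component(x, x_foreign_gbest, x_gbest):
    """The function f: push the particle away from the foreign global
    best, unless the repulsor lies between the particle and its own
    global best, in which case accelerate towards the global best."""
    between = (x < x_foreign_gbest < x_gbest) or (x_gbest < x_foreign_gbest < x)
    if between:
        return -phi(x, x_foreign_gbest)
    return phi(x, x_foreign_gbest)
```

The returned value would be multiplied by C3 and a uniform random number and added to the canonical PSO velocity, as in the velocity update above.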

3.4 Algorithm 4: Multi-swarm PSO (MPSO)

This version of multi-swarm PSO can be considered as an alternative to the PSE algorithm, in the sense that MPSO also uses a set of swarms that run the PSO algorithm independently for a number of iterations and then interact. The interaction, this time, consists in exchanging some particles. In particular, as is often the case in the island model of EAs,12) a pool of the k best particles in the sender swarm is sent to the receiver swarm. The new particles replace the worst k ones in the receiver swarm.

In this work, the number of independent iterations of each swarm before communication takes place has been set to 10 (as in PSE). The number k of migrating particles has been set to 1/5 of the number of particles in each swarm (all the swarms have exactly the same number of particles). The swarms communicate using a ring topology (see for instance 12)). As for the PSE algorithm,

Page 12: A Comparative Study of Four Parallel and Distributed PSO Methods

140 L. Vanneschi, D. Codecasa and G. Mauri

we have used 10 swarms of 10 particles each. All the other parameters are as in the PSO algorithm described above.
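One migration step on the ring can be sketched as below. This is an illustrative sketch, not the authors' code: the function name `migrate_ring` is hypothetical, and lower fitness is assumed to be better (for maximization problems the sort orders would be reversed).

```python
def migrate_ring(swarms, fitness, k):
    """One MPSO-style migration step on a ring topology: each swarm
    sends its k best particles to the next swarm in the ring, where they
    replace the worst k particles. Lower fitness is assumed better."""
    # Select the k best particles of every sender before any replacement,
    # so that all migrations use the pre-migration populations.
    pools = [sorted(s, key=fitness)[:k] for s in swarms]
    for i, pool in enumerate(pools):
        receiver = swarms[(i + 1) % len(swarms)]  # next swarm in the ring
        receiver.sort(key=fitness, reverse=True)  # worst particles first
        receiver[:k] = list(pool)                 # replace the worst k
    return swarms
```

Selecting all migration pools before applying any replacement ensures that a swarm which both sends and receives in the same step forwards its own best particles, not freshly arrived immigrants.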

3.5 Algorithm 5: Multi-swarm Repulsive PSO (MRPSO)

This algorithm works as MPSO defined above, except for the fact that the particles in the swarms with an even index in the ring topology (i.e. only half of the swarms) also have a repulsive component. Each particle of those swarms is attracted by the global best of its own swarm and by the local best position reached so far, and it is repulsed by the global best of the swarm from which it receives individuals at migration time (given that this swarm is in an odd position in the ring topology, it is not repulsive). In this way, the particles that migrate into the even swarms should be as different as possible from the particles already contained in those swarms (given that they have been repulsed by the global best of the sender swarm). This should help maintain a high degree of diversity in the whole system. The repulsive component of each particle is exactly the same as for the RPSE algorithm described above, except for the fact that this time we have used C3 = 0.5, because we have only one repulsor for each particle. All the other parameters are the same as for the MPSO algorithm described above.
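The choice of repulsor implied by the text (the sender swarm is the previous neighbour on the ring) can be made explicit with a small helper. The function name `repulsor_index` is hypothetical; only even-indexed swarms are repulsive.

```python
def repulsor_index(swarm_index, n_swarms):
    """In MRPSO only even-indexed swarms have a repulsive component;
    their repulsor is the global best of the swarm they receive
    migrants from, i.e. the previous swarm on the ring. Returns None
    for odd-indexed (non-repulsive) swarms."""
    if swarm_index % 2 != 0:
        return None
    return (swarm_index - 1) % n_swarms
```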

§4 Test Functions

4.1 Cosff Functions

The first set of test functions we propose in this work is defined as:

cosff(x) = ( ∑i=1..n fi(xi, Mi) ) / n

Fig. 1 Two-dimensional graphical representations of two cosff (plots (a) and (b)) and two wtrap (plots (c) and (d)) functions. Plot (a) shows the cosff function with K = 10, M1 = M2 = 0.3. Plot (b) shows the cosff function with K = 20, M1 = M2 = 0.3. Plot (c) shows the wtrap function with B = 0.3 and R = 0.75 and plot (d) shows the wtrap function with B = 0.7 and R = 0.25. See the text for an explanation of the K, M1 and M2 parameters of the cosff functions and an explanation of the B and R parameters of the wtrap functions.

Page 13: A Comparative Study of Four Parallel and Distributed PSO Methods

A Comparative Study of Four Parallel and Distributed PSO Methods 141

where n is the number of dimensions of the problem, x = (x1, x2, ..., xn) is a point in an n-dimensional space and, for all i = 1, 2, ..., n, given two floating point numbers x and M:

fi(x, M) = cos(K ∗ (x − M)) ∗ (1.0 − (M − x)),  if x ≤ M
fi(x, M) = cos(K ∗ (x − M)) ∗ (1.0 − (x − M)),  otherwise

and where (M1, M2, ..., Mn) are the coordinates of the known maximum value of the function and K is a constant that modifies the ruggedness of the fitness landscape (the higher K, the more complex the fitness landscape).

The two-dimensional graphical representations of function cosff with K = 10 and K = 20 are reported in Fig. 1(a) and (b), respectively. In both plots, we have used (M1, M2) = (0.3, 0.3) as the coordinates of the global maximum. From those plots, it is clear that by increasing the value of K we are able to increase the ruggedness of the fitness landscape and thus its complexity.
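The cosff definition above translates directly into code. This is an illustrative sketch of the benchmark as defined in the text, assuming x and M lie in [0, 1]^n.

```python
import math

def cosff(x, M, K=10.0):
    """The cosff benchmark from the text: x and M are n-dimensional
    points, M is the known global maximum and K tunes the ruggedness
    of the landscape. The global maximum value is 1, reached at x = M."""
    def f(xi, Mi):
        if xi <= Mi:
            return math.cos(K * (xi - Mi)) * (1.0 - (Mi - xi))
        return math.cos(K * (xi - Mi)) * (1.0 - (xi - Mi))
    return sum(f(xi, Mi) for xi, Mi in zip(x, M)) / len(x)
```

At x = M every term is cos(0) ∗ 1.0 = 1, so the average equals 1.0; any other point scores strictly less.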

4.2 WTrap Functions

The second set of test functions we use in this work is called W-trap functions (see for instance 36) for a preliminary and slightly different definition of these functions). It is defined as follows:

wtrap(x) = ( ∑i=1..n g(xi) ) / n

where n is the number of dimensions and, given a floating point number x:

g(x) =
   R1 ∗ (B1 − x)/B1,         if x ≤ B1
   R2 ∗ (x − B1)/(B2 − B1),  if B1 < x ≤ B2
   R2 ∗ (B3 − x)/(B3 − B2),  if B2 < x ≤ B3
   R3 ∗ (x − B3)/(1 − B3),   otherwise

and where B1 < B2 < B3; B1 and B3 are the coordinates of the two minima in the search space, while the global maximum has coordinates in B2.∗1 R1 is the fitness of the first local maximum placed in the origin (all coordinates equal to 0.0), R2 is the fitness of the global maximum and R3 is the fitness of the second local maximum, which has all its coordinates equal to 1.0.

In this paper, for simplicity, we wanted to modify the functions' difficulty by changing the values of only two parameters, instead of the 6 typical parameters of wtrap functions. For this reason, given two parameters B and R, we have used: B1 = 0.4 − B/3, B2 = 0.4, B3 = 0.4 + B ∗ 2/3, R1 = R, R2 = 1.0 and R3 = R, and we have obtained different test problems by modifying B and R.
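The two-parameter wtrap function can be sketched as follows; the six internal parameters are derived from B and R exactly as stated above. This is an illustrative sketch of the benchmark as defined in the text.

```python
def wtrap(x, B, R):
    """The two-parameter wtrap benchmark from the text: x is an
    n-dimensional point in [0, 1]^n; B and R shape the basins of
    attraction and the heights of the two local maxima."""
    # The six wtrap parameters, derived from B and R as in the text.
    B1, B2, B3 = 0.4 - B / 3, 0.4, 0.4 + B * 2 / 3
    R1, R2, R3 = R, 1.0, R

    def g(xi):
        if xi <= B1:
            return R1 * (B1 - xi) / B1
        if xi <= B2:
            return R2 * (xi - B1) / (B2 - B1)
        if xi <= B3:
            return R2 * (B3 - xi) / (B3 - B2)
        return R3 * (xi - B3) / (1 - B3)

    return sum(g(xi) for xi in x) / len(x)
```

As a sanity check: the global maximum (all coordinates equal to B2 = 0.4) scores 1.0, the origin scores R, and a point with all coordinates equal to B1 scores 0.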

∗1 I.e. the first minimum has coordinates (B1, B1, ..., B1), the global maximum has coordinates (B2, B2, ..., B2) and the second minimum has coordinates (B3, B3, ..., B3). In other words, the first minimum has all its n coordinates equal to B1, the global maximum has all its n coordinates equal to B2 and the second minimum has all its n coordinates equal to B3.

The two-dimensional graphical representations of the wtrap functions with B = 0.3 and R = 0.75 and with B = 0.7 and R = 0.25 are reported in Fig. 1(c) and (d), respectively. We can see that, by changing the values of B and R, we can


modify the relative importance of the basins of attraction of the global and local maxima, thus tuning the difficulty of the problem.

4.3 Rastrigin Functions

The Rastrigin functions are a well known and widely used set of test functions for floating point parameter optimization, given their high multimodality, the regular distribution of the minima and the possibility of changing the ruggedness of the induced fitness landscape by modifying one single real-valued parameter. These functions are defined as follows:

Rastrigin(x) = n · A + ∑i=1..n ( xi^2 − A · cos(2π xi) )

where, for all i = 1, 2, ..., n, xi ∈ [−5.12, 5.12], n is the dimension of the function and A is the parameter that determines the steepness of the local optima, and thus the complexity of the induced fitness landscape. For a deeper introduction to these functions, the reader is referred to 44) and, for their graphical representations for various different values of the A parameter, to: http://www.cs.rtu.lv/dssg/en/staff/rastrigin/rastr-function.html.
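The definition above can be sketched directly in code. This is an illustrative sketch of the standard Rastrigin benchmark as written in the text; it is a minimization problem with global minimum 0 at the origin.

```python
import math

def rastrigin(x, A=10.0):
    """The Rastrigin benchmark as defined in the text: x is an
    n-dimensional point with each coordinate in [-5.12, 5.12];
    A tunes the steepness of the local optima."""
    n = len(x)
    return n * A + sum(xi ** 2 - A * math.cos(2 * math.pi * xi) for xi in x)
```

Each cosine term carves a grid of local optima around the parabolic bowl xi^2, which is what makes the landscape rugged while keeping the global structure simple.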

4.4 CEC 2005 Benchmark Suite

In 34), 25 benchmark functions have been presented for the CEC 2005 numerical optimization competition. This benchmark suite contains 5 unimodal and 20 multimodal functions, further divided into basic, expanded and hybrid composition functions. Twenty-two of these functions are non-separable, two are completely separable, and one is separable near the global optimum. This benchmark suite has been accepted as a more or less agreed-upon standard for testing real-parameter optimization algorithms, even though it has been pointed out in 38) that it is composed only of either very easy or very hard functions. The benchmark suite contains the following functions:

• Unimodal Functions: F1: Shifted Sphere Function; F2: Shifted Schwefel's Problem; F3: Shifted Rotated High Conditioned Elliptic Function; F4: Shifted Schwefel's Problem with Noise in Fitness; F5: Schwefel's Problem with Global Optimum on Bounds.

• Multimodal Functions:
  – Basic Functions: F6: Shifted Rosenbrock's Function; F7: Shifted Rotated Griewank's Function without Bounds; F8: Shifted Rotated Ackley's Function with Global Optimum on Bounds; F9: Shifted Rastrigin's Function; F10: Shifted Rotated Rastrigin's Function; F11: Shifted Rotated Weierstrass Function; F12: Schwefel's Problem.
  – Expanded Functions: F13: Expanded Extended Griewank's plus Rosenbrock's Function (F8F2); F14: Shifted Rotated Expanded Scaffer's F6.
  – Hybrid Composition Functions: F15: Hybrid Composition Function;


F16: Rotated Hybrid Composition Function; F17: Rotated Hybrid Composition Function with Noise in Fitness; F18: Rotated Hybrid Composition Function; F19: Rotated Hybrid Composition Function with a Narrow Basin for the Global Optimum; F20: Rotated Hybrid Composition Function with the Global Optimum on the Bounds; F21: Rotated Hybrid Composition Function; F22: Rotated Hybrid Composition Function with High Condition Number Matrix; F23: Non-Continuous Rotated Hybrid Composition Function; F24: Rotated Hybrid Composition Function; F25: Rotated Hybrid Composition Function without Bounds.

The codes in Matlab, C and Java for all these functions can be found at: http://www.ntu.edu.sg/home/EPNSugan/. The mathematical formulas and properties of these functions are discussed in detail in 34).

4.5 Real-life Applications in Drug Discovery

We also consider a set of real-life applications characterized by a large dimensionality of the feature space. Four of them consist in predicting the values of as many important pharmacokinetic parameters, and the fifth consists in predicting the response of a set of cancer patients to treatment with the Fludarabine drug. These problems are briefly discussed in the remainder of this section. The interested reader is referred to the contributions quoted below for a more detailed introduction.

Prediction of Pharmacokinetic Parameters These problems consist in predicting the value of four pharmacokinetic parameters of a set of candidate drug compounds on the basis of their molecular structure. The first pharmacokinetic parameter we consider is human oral bioavailability (indicated with %F from now on), the second one is median oral lethal dose (indicated with LD50 from now on), also informally called toxicity, the third one is plasma protein binding levels (indicated with %PPB from now on) and the fourth one is called docking energy (indicated with DOCK from now on). %F is the parameter that measures the percentage of the initial orally submitted drug dose that effectively reaches the systemic blood circulation after the passage through the liver. LD50 refers to the amount of compound required to kill 50% of the test organisms (cavies). %PPB corresponds to the percentage of the drug's initial dose that reaches blood circulation and binds to the proteins of plasma. DOCK quantifies the amount of target-drug chemical interaction, i.e. the energy that binds the molecules of the candidate drug to those of the target tissue. For a more detailed discussion of these four pharmacokinetic parameters, the reader is referred to 2,3). The datasets we have used are the same as in 2) and 3): the %F (LD50, %PPB and DOCK, respectively) dataset consists in a matrix composed of 260 (234, 234 and 150, respectively) rows (instances) and 242 (627, 627 and 268, respectively) columns (features). Each row is a vector of molecular descriptor values identifying a drug; each column represents a molecular descriptor,


except the last one, which contains the known target values of %F (LD50, %PPB and DOCK, respectively). These datasets can be downloaded from:
http://personal.disco.unimib.it/Vanneschi/bioavailability.txt,
http://personal.disco.unimib.it/Vanneschi/toxicity.txt,
http://personal.disco.unimib.it/Vanneschi/ppb.txt, and
http://personal.disco.unimib.it/Vanneschi/dock.txt.

For all these datasets, training and test sets have been obtained by random splitting: at each different PSO run, 70% of the molecules have been randomly selected with uniform probability and inserted into the training set, while the remaining 30% formed the test set. These problems were solved by PSO by means of a linear regression, where the variables in a PSO candidate solution represent the coefficients of the linear interpolating polynomial. We have imposed these coefficients to take values in the range [−10, 10]. As fitness, we have used the root mean squared error (RMSE) between outputs and targets. For more details on the solving process for the bioavailability dataset via a linear regression by means of PSO, the reader is referred to 8).
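The fitness evaluation described above can be sketched as follows. This is an illustrative sketch: the paper does not specify whether the linear model includes an intercept, so treating the last coefficient as an intercept term is our assumption, and the function name `rmse_fitness` is hypothetical.

```python
import math

def rmse_fitness(coeffs, rows, targets):
    """Fitness of a PSO candidate for the drug-discovery problems:
    the candidate encodes the coefficients of a linear model and
    fitness is the RMSE between model outputs and known targets.
    coeffs holds one coefficient per feature plus a final intercept
    (the intercept is our assumption, not stated in the text)."""
    errors = []
    for row, target in zip(rows, targets):
        output = sum(c * x for c, x in zip(coeffs, row)) + coeffs[-1]
        errors.append((output - target) ** 2)
    return math.sqrt(sum(errors) / len(errors))
```

In the paper's setting, each `row` would be a vector of molecular descriptors and each coefficient would be constrained to [−10, 10] by the PSO.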

Prediction of Response to Fludarabine Treatment This problem consists in predicting anticancer therapeutic response on the basis of the genetic signature of the patients. To build the data, we have used the NCI-60 microarray dataset,26,29,30) looking for a functional relationship between gene expressions and responses to the Fludarabine oncology drug. Fludarabine (indicated by FLU from now on) is a drug for the treatment of chronic lymphocytic leukemia. The dataset we have used can be represented by a matrix with 60 lines (instances) and 1376 columns (features). Each line represents a gene expression profile. Each column represents the expression level of one particular gene, except the last one, which contains the known value of the therapeutic response to the chosen drug (Fludarabine). Thus, as for the previous problems, the last column of the matrix contains the known values of the parameter to estimate. The reader is referred to 26) for a more detailed documentation of all the drugs reported above. The datasets used in our experiments can be downloaded from the web page: http://personal.disco.unimib.it/Vanneschi/gp_nci_datasets.htm.

§5 Experimental Results

In this section, we compare the results obtained by the five PSO methods presented in Section 3 on the test problems presented in Section 4. For this comparison, we have performed 200 independent runs of each considered PSO method for the cosff, wtrap and Rastrigin functions, and 25 independent runs for the CEC 2005 test functions (both because the fitness calculation for those functions is more time consuming than for the other ones, and because 25 is the number of runs requested in 34)) and for the five real-life problems. Furthermore, we have studied three different performance measures. These measures are three of the evaluation criteria requested in 34): the number of successful runs, the success performance and the average best fitness. They are described below:

• The number of successful runs is defined as the number of runs in which


an individual has been found that approximates the global optimum with an error smaller than a given threshold. The thresholds used for the CEC 2005 benchmarks were the same as the ones proposed in the table at pages 40 and 41 of 34); we also used higher thresholds, as reported below. For the cosff, wtrap and Rastrigin test functions the threshold we have used is equal to 10−8. For the real-life applications, it is equal to 10−5. We are aware that these thresholds are arbitrary and the results may qualitatively change if these thresholds are modified. Nevertheless, we accept this arbitrariness because the number of successful runs is not the only criterion that we use to evaluate the algorithms; in other words, we believe that the results deriving from the number of successful runs have to be interpreted as an indication of the optimization ability of the algorithms, and only by taking into account also the other criteria can we have a rather complete picture. Furthermore, we point out that the thresholds defined in 34) are also arbitrary, but they are nowadays accepted, and often used, by the community of researchers.

• The success performance is defined as the mean of the fitness evaluations required by successful runs, multiplied by the total number of runs, divided by the number of successful runs (this measure is introduced in 34), at page 41). To calculate it, for each successful run (i.e. for each run in which a solution with fitness smaller than the prefixed threshold has been found) we calculate how many PSO iterations are necessary to find an optimal solution; then we count how many fitness evaluations we have performed until that iteration; we multiply that number by the total number of runs that we have performed and finally we divide the result by the number of successful runs. By its definition, small values of the success performance are better than large ones, with one exception: we have forced the value of the success performance to be equal to zero when no run has been successful (thus, a success performance equal to zero is the worst possible value and not the best possible one).

• Finally, the average best fitness reports the average of the best fitness in the whole PSO system at each iteration.

For the number of successful runs, we have calculated standard deviations (for verifying the statistical significance of the presented results) following 12), where experimental runs are considered as a series of independent Bernoulli trials having only two possible outcomes: success or failure. In this case, the number of successes (or of failures) is binomially distributed.28) The maximum likelihood estimator p for the mean of a series of Bernoulli trials, and hence for the probability of success, is simply the number of successes divided by the sample size (the number of runs n). With this information at hand, one can calculate the sample standard deviation σ = √(n · p · (1 − p)). The experimental results that we have obtained for the different studied test problems are discussed below.

Figure 2 reports the results obtained by each studied PSO method for
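The two derived statistics above can be sketched as follows; the function names (`success_stats`, `success_performance`) are hypothetical.

```python
import math

def success_stats(n_success, n_runs):
    """Maximum-likelihood estimate p of the success probability and the
    sample standard deviation sigma = sqrt(n * p * (1 - p)) of the
    binomially distributed number of successes, as in the text."""
    p = n_success / n_runs
    sigma = math.sqrt(n_runs * p * (1 - p))
    return p, sigma

def success_performance(evals_per_success, n_runs):
    """Mean fitness evaluations over the successful runs, multiplied by
    the total number of runs and divided by the number of successes;
    zero (the worst value, by the text's convention) when no run
    succeeded."""
    if not evals_per_success:
        return 0.0
    k = len(evals_per_success)
    return (sum(evals_per_success) / k) * n_runs / k
```

For example, 50 successes out of 200 runs give p = 0.25 and σ = √(200 · 0.25 · 0.75) ≈ 6.12.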


[Figure: tables (a)-(d) number of successful runs; histograms (e)-(h) success performance; plots (i)-(n) average best fitness]

Fig. 2 Results obtained by the five studied PSO variants on the cosff functions with 20 dimensions. The value of the K parameter is equal to 10. In table (a) (respectively table (b), (c), (d)), we report the number of successful runs with their standard deviations for M1 = M2 = ... = M20 = 0.1 (respectively 0.2, 0.3, 0.4). In plot (e) (respectively plot (f), (g), (h)), we report the success performance for M1 = M2 = ... = M20 = 0.1 (respectively 0.2, 0.3, 0.4). In plot (i) (respectively plot (l), (m), (n)), we report the average best fitness against fitness evaluations for M1 = M2 = ... = M20 = 0.1 (respectively 0.2, 0.3, 0.4). In plots (e) to (h), we identify PSO with 1, PSE with 2, RPSE with 3, MPSO with 4 and MRPSO with 5.


the cosff test function. The dimensionality of the problem (the number of elements of the vector coding a particle) is equal to 20. The K constant of the cosff function is equal to 10. The numbers of successful runs, with their standard deviations, are reported in tabular form in Fig. 2, tables (a), (b), (c) and (d). The success performance is reported as histograms in plots (e), (f), (g) and (h). The average best fitness curves against fitness evaluations are reported in plots (i), (l), (m) and (n). As explained in the figure's caption, the difference between Fig. 2(a), (e) and (i), Fig. 2(b), (f) and (l), Fig. 2(c), (g) and (m) and Fig. 2(d), (h) and (n) lies in the fact that the M parameter (which represents the coordinates of the optimal solution) is modified. Even though the ruggedness of the fitness landscape is tuned by parameter K, and not by parameter M, changing parameter M is also important because, in this way, we are able to see how the performances of the different algorithms change as the maximum is moved away from the borders of the admissible range. Besides ruggedness, this characteristic also influences the difficulty of the problem for PSO, because by shifting the position of the global optimum we are able to change the shape of the landscape. The experimental results corroborate this intuition, since problems with different values of M have different difficulties for all the studied PSO methods. We have also performed experiments in which we modify the value of the K parameter, and the results that we have obtained (not shown here to save space) allow us to draw the same qualitative conclusions (about the comparison of the different studied PSO methods) as the ones of Fig. 2. In Fig. 2(e) to 2(h), the different studied algorithms are reported on the abscissa, where algorithm 1 is PSO, algorithm 2 is PSE, algorithm 3 is RPSE, algorithm 4 is MPSO and algorithm 5 is MRPSO.

Let us begin the analysis of the results by considering the number of successful runs (Fig. 2, tables (a), (b), (c) and (d)). These tables show that MRPSO performs better than the other methods, and the differences between the results obtained by MRPSO and the other methods are always statistically significant. If we consider the success performance plots (Fig. 2, plots (e), (f), (g), (h)), we can see that also in this case MRPSO outperforms the other methods, while the performances of MPSO are more or less comparable with those of PSO, and the performances of PSE and RPSE are always worse than those of the other methods. Finally, let us consider the average best fitness plots (Fig. 2, plots (i), (l), (m), (n)). Here we can clearly see that in all cases RPSE performs worse than the other methods. Standard deviations (not shown here) confirm that the differences between the results obtained by RPSE and those of the other methods are statistically significant. Among the other methods, even though the differences are small, MRPSO seems to have slightly better performances for all the considered test functions. Nevertheless, the differences between the results obtained by MRPSO and those of the other methods (except RPSE) are not statistically significant, according to the standard deviations (not shown here). Figure 3 reports exactly the same results as Fig. 2 for the cosff function, but for a dimensionality of the problem equal to 10, and it allows us to draw the same qualitative conclusions.


[Figure: tables (a)-(d) number of successful runs; histograms (e)-(h) success performance; plots (i)-(n) average best fitness]

Fig. 3 Results obtained by the five studied PSO variants on the cosff functions with 10 dimensions. All the rest is as in Fig. 2.

Figure 4 reports the results obtained by the different PSO methods on a set of wtrap functions for a dimensionality of the problem equal to 10. Also in this case, it is possible to see that MRPSO achieves a larger number of successful runs than all the other studied methods, and the differences between the results obtained by MRPSO and those obtained by the other methods are statistically significant, as indicated by the standard deviations. Regarding the success performance, we can see that PSE and RPSE are the methods that have returned the worst results, while MRPSO has returned slightly better results than PSO and MPSO. Consistently with the previous results, the average best fitness plots also show that PSE has obtained the worst results among the studied methods and that MRPSO slightly outperforms all the others.

Figure 5 reports the results obtained by the different PSO methods on


[Figure: tables (a)-(d) number of successful runs; histograms (e)-(h) success performance; plots (i)-(n) average best fitness]

Fig. 4 Results obtained by the five studied PSO variants on the wtrap functions with 10 dimensions. In table (a) (respectively table (b), (c), (d)), we report the number of successful runs for B = 0.3 and R = 0.5 (respectively B = 0.5 and R = 0.5, B = 0.7 and R = 0.75, B = 0.9 and R = 0.75). In plot (e) (respectively plot (f), (g), (h)), we report the success performance, and in plot (i) (respectively plot (l), (m), (n)) the average best fitness against fitness evaluations, for the same parameter settings. In plots (e) to (h), we identify PSO with 1, PSE with 2, RPSE with 3, MPSO with 4 and MRPSO with 5.

a set of Rastrigin functions for a dimensionality of the problem equal to 10. It is important to remark that, in the case of the Rastrigin functions, in plots 5(i), (l), (m), (n) (concerning the average best fitness results) the curves are decreasing, because the Rastrigin functions, differently from the cosff and


[Figure: tables (a)-(d) number of successful runs; histograms (e)-(h) success performance; plots (i)-(n) average best fitness]

Fig. 5 Results obtained by the five studied PSO variants on the Rastrigin functions with 10 dimensions. In table (a) (respectively table (b), (c), (d)), we report the number of successful runs for A = 4.0 (respectively A = 6.0, A = 8.0, A = 10.0). In plot (e) (respectively plot (f), (g), (h)), we report the success performance, and in plot (i) (respectively plot (l), (m), (n)) the average best fitness against fitness evaluations, for the same values of A. In plots (e) to (h), we identify PSO with 1, PSE with 2, RPSE with 3, MPSO with 4 and MRPSO with 5.

wtrap ones, are minimization problems (i.e. low fitness values are better than high ones). Once again, the statistic that makes the difference between MRPSO and the other methods most visible is the number of successful runs: MRPSO has consistently performed a larger number of successful runs for all the studied instances, and the differences with the results found by the other methods are statistically significant, as indicated by the standard deviations. Also the other


[Figure: tables (a)-(r) of numbers of successful runs on the CEC 2005 benchmarks]

Fig. 6 Tables from (a) to (h) refer to the thresholds proposed at pages 40 and 41 of 34) for successful runs; the test functions considered are 1, 2, 4, 5, 6, 9, 12 and 15 of 34). For each one of these functions the threshold can be expressed in the form 10^k. Tables from (i) to (r) report the results when a threshold whose exponent is divided by two compared to 34) is used to identify successful runs (i.e. the threshold was 10^(k/2), as discussed in the text); in this case, the test functions considered are 1, 2, 4, 5, 6, 7, 12 and 15 of 34).

measures confirm the suitability of MRPSO, compared to the other methods, even though the differences are less visible.

Figure 6 reports the number-of-successful-runs tables for functions F1, F2, F4, F5, F6, F7, F9, F12 and F15 included in the CEC 2005 benchmark suite.34) We have chosen to report the results on these functions because they are the only functions on which at least one method has scored a number of successful runs


[Figure: plots (a)-(c) results on the training set; plots (d)-(f) results on the test set]

Fig. 7 Average best fitness (RMSE between outputs and targets) against generations for the prediction of bioavailability (%F) (plots (a) and (d)), median oral lethal dose (LD50) (plots (b) and (e)) and plasma protein binding levels (%PPB) (plots (c) and (f)). Plots (a), (b) and (c) report results on the training set and plots (d), (e) and (f) report the average of the RMSE of the best individuals on the training set, calculated on the test set.

larger than zero. For tables from 6(a) to 6(h), we have used the same thresholds as at pages 40 and 41 of 34) to decide whether a run is successful or not. For each one of these functions, the threshold can be expressed in the form 10^k. For tables from 6(i) to 6(r), we have used a threshold equal to 10^(k/2), in order to increase the number of successful runs for all the methods. In all these figures (and also in the case of the wtrap and Rastrigin functions with dimensionality equal to 20, results not shown here to save space) the results are qualitatively analogous to the ones in Fig. 2, and this allows us to conclude that MRPSO seems to have better performance than the other considered methods, at least for all the test functions considered so far.

Figure 7 reports the results for the %F, LD50 and %PPB problems. For all the considered real-life problems, we report only the average best fitness plots. However, contrarily to the theoretical hand-tailored test functions considered until now, for the real-life problems we are also interested in


studying the generalization ability of the proposed models. For this reason, plots (a), (b) and (c) report the average of the best (i.e. minimum) RMSE between outputs and targets (the measure used as fitness) on the training set, while plots (d), (e) and (f) report the average of the RMSE of the best individuals on training, evaluated on the test set (for %F, LD50 and %PPB, respectively). For simplicity, for the real-life problems, we only report results for PSO, MPSO and MRPSO (the methods that have returned the best results on these problems). Figure 7 shows that, also for the considered real-life applications, MRPSO outperforms the other models, both on the training and on the test set. Interestingly, while the differences between the fitness values found by the different methods at termination are not statistically significant on the training set, the differences between the results returned by MRPSO and the other methods are statistically significant on the test set for LD50 and %PPB (Fig. 7, plots (e)

Fig. 8 Average best fitness (RMSE between outputs and targets) against generations for the prediction of the response to Fludarabine (FLU) (plots (a) and (c)) and the prediction of docking energy (DOCK) (plots (b) and (d)). Plots (a) and (b) report results on the training set and plots (c) and (d) report the average of the RMSE of the best individuals on the training set, calculated on the test set.


and (f)). Once again, MRPSO seems the best among the studied methods.

In Fig. 8 we report the average best fitness results obtained for the FLU

and DOCK datasets. As for the three previously considered applications, also in these cases we see that PSO, MPSO and MRPSO obtain very similar fitness values on the training set, but once again MRPSO is the method that generalizes better, given that it obtains the best results on the test set and the differences between the results returned by MRPSO and the ones of the other methods are statistically significant. Even more interestingly, for the FLU dataset, we observe that the fitness on the training set steadily improves for all the three reported methods, including PSO; but for PSO this improvement on the training set corresponds to a visible deterioration of the results on the test set. On the other hand, both MPSO and MRPSO never worsen the RMSE on the test set during the whole evolution. This is a clear indication that PSO overfits the training data for the FLU problem, while MPSO and MRPSO counteract overfitting and have a better generalization ability.
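The fitness used throughout for the real-life problems is the root mean squared error between a model's outputs and the dataset targets, computed on the training set during evolution and on the held-out test set to assess generalization. A minimal sketch:

```python
import math

# RMSE between a model's outputs and the dataset targets: the quantity
# minimized on the training set and reported on the test set in Figs. 7-8.
def rmse(outputs, targets):
    assert len(outputs) == len(targets) and len(outputs) > 0
    sq_errors = ((o - t) ** 2 for o, t in zip(outputs, targets))
    return math.sqrt(sum(sq_errors) / len(outputs))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # ≈ 1.1547
```

Overfitting, as observed for PSO on FLU, is exactly the situation where this quantity keeps decreasing on the training pairs while increasing on the test pairs.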

Among all the studied problems, the only case in which the advantage of using MRPSO is not clear is represented by the CEC 2005 test functions, when the threshold requested in 34) has been used to decide if a run is successful or not. In those cases, however, it is possible to remark that MRPSO gets "close" to a satisfactory solution much faster than PSO, but then actually reaches that solution more slowly than PSO. Our interpretation is that MRPSO is able to find the "peak" of the fitness landscape containing good solutions faster than PSO, but is then slower than PSO in climbing that peak, because of the repulsive component and because of the relatively small number of particles in each swarm. For this reason, we have also included in our simulations the tests with a lower threshold, thus incrementing the number of successful runs for each method. To make MRPSO faster once it has found the right peak of the fitness landscape, we could for instance include in the algorithm a local optimization phase (for instance by means of simple "hill-climbers"), as suggested in 21). This is one of the activities of our current research.
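To make the idea of such a local phase concrete, here is a minimal sketch of a random-restart-free hill-climber that could refine a single particle's position once the right region has been found. The function name, step size and iteration budget are our own illustrative choices, not part of the proposed methods or of 21):

```python
import random

def hill_climb(position, fitness, step=0.01, iters=200):
    """Refine one particle's position by accepting only improving random
    perturbations; `fitness` is minimized. Names and parameters are ours."""
    best = list(position)
    best_fit = fitness(best)
    for _ in range(iters):
        cand = [x + random.uniform(-step, step) for x in best]
        cand_fit = fitness(cand)
        if cand_fit < best_fit:  # keep only improving moves
            best, best_fit = cand, cand_fit
    return best, best_fit

# e.g. refine a point near the optimum of the sphere function
sphere = lambda p: sum(x * x for x in p)
pos, fit = hill_climb([0.1, -0.2], sphere)
```

Since only improving moves are accepted, the refined fitness can never be worse than the starting one, so such a phase could be run on each swarm's best particle without risk of degrading it.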

Comparison with Other Machine Learning Methods

The focus of this paper is on the presentation of new ways of parallelizing PSO and on the experimental comparison between these methods and standard PSO, not on solving the proposed applications in the best possible way. Nevertheless, we find it interesting to have an idea of the difference in optimization ability between the proposed methods and more standard Machine Learning approaches. For this reason, we compare the average of the best RMSE value obtained by our methods with the analogous value (obtained over the same number of runs) by Support Vector Machines (SVMs). The Smola and Schölkopf sequential minimal optimization algorithm32) was adopted for training a Support Vector regression using polynomial kernels. The Weka implementation40) was adopted. The results we have obtained can be summarized by saying that SVMs find solutions with smaller RMSE than all the proposed methods, and that the differences between the results returned by SVMs and the ones returned by the proposed


methods are statistically significant. For instance, for the DOCK dataset the average of the best error found by MRPSO (respectively PSO and MPSO) is equal to 2.499 (respectively 3.745 and 3.468), while the average of the best error found by SVMs is equal to 0.108. Similar results have been obtained on the other datasets and lead us to the same qualitative conclusions. Our motivations for the fact that the proposed methods have a weaker optimization ability compared to SVMs are at least the following:

• The solutions returned by the proposed methods are, by construction, linear regressions of the studied datasets, while SVMs are able to return solutions that are not linear.

• The ranges that we have used for the coefficients of the linear regression (i.e. the acceptable ranges of the different genes forming a genome that represents a solution of the PSO methods) have been chosen in a completely arbitrary way, and they deserve to be finely tuned in further experiments in the future.

• No parameter tuning was done for the basic PSO algorithm. This issue also deserves further investigation in the future.
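The first point can be illustrated with a small sketch (our construction, not the paper's code): a candidate solution is just a vector of linear-model coefficients, so its predictions are linear in the input features by design, and its fitness is the training-set RMSE.

```python
import math

# A candidate solution encodes the coefficients of a linear model:
# coeffs = [w1, ..., wn, bias]. Its fitness is the training-set RMSE,
# which is why the returned solutions are linear by construction.

def predict(coeffs, features):
    return sum(w * x for w, x in zip(coeffs, features)) + coeffs[-1]

def fitness(coeffs, dataset):
    sq_errors = [(predict(coeffs, xs) - y) ** 2 for xs, y in dataset]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

data = [([1.0, 2.0], 5.0), ([0.0, 1.0], 2.0)]
print(fitness([1.0, 2.0, 0.0], data))  # exact linear fit -> 0.0
```

No such restriction applies to an SVM with a polynomial kernel, which is one plausible reason for the RMSE gap reported above.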

Computational Cost of the Proposed Methods

Even though in Section 1.1 we have stated that in this paper we focus on the validation of the proposed methods in terms of optimization ability, and that a study of their computational speed and scalability is the subject of our future work, it makes sense to report some results on the CPU times that were spent to obtain the results discussed so far. Those results have been obtained executing the methods on a single-CPU computer and thus we do not expect the proposed parallel methods to necessarily be faster than standard PSO (given that they integrate computations for synchronization and communication between the swarms). Only in the near future, when we will be able to execute our methods on parallel architectures, will we be able to draw conclusions on the real gain they offer in terms of speed and scalability. Nevertheless, we believe that reporting these results is useful for at least two reasons: first of all, it is useful to quantify the amount of computational resources required by our methods to handle the communications between the swarms (a clear indication of this is given by the difference in speed between them and standard PSO); secondly, this can give an idea of the total computational effort that we have spent in this work.

In Fig. 9, we report the CPU times (expressed in milliseconds) spent by the different methods on the cosff, wtrap, %F, LD50, %PPB, FLU and DOCK problems. It is worth recalling that PSO is a stochastic heuristic, and thus the CPU times required by different runs of the same method on the same problem are not necessarily identical. For this reason, we report the average CPU times over the performed runs (at the beginning of this section, we describe how many runs have been executed for the different problems) along with their standard deviations. All the parameters that we have used are the same as in the experiments discussed so far, with the only exception that the methods have been executed in any case until a prefixed maximum number of


Fig. 9 CPU times (expressed in milliseconds) required by the various studied PSO models for executing 10^5 iterations for the cosff and wtrap functions and 3×10^5 iterations for the real-life applications. In these experiments, the algorithms have not been stopped when an optimal solution was found, but have been executed in any case until the prefixed maximum number of iterations. All other parameter settings are exactly the same as in the previous experiments. Results reported are averages over the performed number of independent runs (the text at the beginning of Section 5 reports the number of runs performed for each problem), along with their standard deviations. All reported results have been approximated to the closest integer number.

iterations (in other words, the algorithms did not terminate when they found an optimal solution). That prefixed value was set to 10^5 for the cosff and wtrap functions and to 3×10^5 for the real-life applications.
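The averages and standard deviations reported in Fig. 9 can be computed as in the following sketch (we use the population standard deviation and round to the closest integer, as stated in the caption; whether the paper used the population or sample formula is our assumption):

```python
import math

# Mean and (population) standard deviation of per-run CPU times in ms,
# rounded to the closest integer as in Fig. 9.
def mean_std(times_ms):
    n = len(times_ms)
    mean = sum(times_ms) / n
    var = sum((t - mean) ** 2 for t in times_ms) / n
    return round(mean), round(math.sqrt(var))

print(mean_std([1040, 980, 1010, 970]))  # -> (1000, 27)
```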

Let us begin by analyzing the CPU times obtained on the cosff and wtrap functions. The most visible result is that RPSE is much slower than the other methods. This is probably due to the fact that the implementation of a GA of swarms, together with the computational overhead for computing repulsions, implies a high number of interactions between the swarms (for instance, for implementing the selection and crossover operators). Considering that PSE and RPSE were the methods that returned the worst results in terms of optimization ability, we believe that it is not worth considering them any further (unless we find a way to improve them). For this reason, PSE and RPSE were not tested on the real-life problems. Another interesting result is that MPSO is slightly faster than PSO (even though the differences between these two methods


are not statistically significant, as indicated by the standard deviations). This indicates that the computational time spent for the communications between the swarms in MPSO is not as high as one might expect. MRPSO is slightly slower than PSO. Given that MRPSO uses exactly the same mechanisms as MPSO for the interaction between swarms, we conclude that the time spent for calculating repulsions should not be underestimated. Nevertheless, the differences between MRPSO and PSO can be considered marginal, if we consider that MRPSO can naturally be executed on a parallel architecture. More or less the same considerations also hold for the real-life problems: MPSO seems to always be slightly faster than PSO and MRPSO seems to always be slightly slower than PSO, but the differences between MRPSO and PSO are statistically relevant only for the %F and LD50 datasets, while they are not statistically significant for FLU and DOCK. We finally point out that the conclusions drawn so far have a noteworthy exception: for the %PPB dataset MRPSO (besides MPSO) seems faster than PSO, even though the differences between the two are not statistically significant.

§6 Conclusions and Future Work

Four parallel and distributed particle swarm optimization (PSO) methods

have been defined in this paper and their performances have been compared on two new sets of test functions, on the well-known Rastrigin set of functions, on some of the test functions contained in the benchmark suite presented in 34) and on five complex real-life applications. The presented PSO methods are variants of multi-swarm and attractive/repulsive PSO. They include a version in which swarms are interpreted as individuals of a genetic algorithm (GA), called particle swarm evolver (PSE); a variant in which a repulsive factor is added to the particles, called repulsive PSE (RPSE); a version in which the PSO system is parallelized at the swarm level as in the multi-island parallel and distributed model of evolutionary algorithms (EAs), called multi-swarm PSO (MPSO); and a variant of MPSO in which particles also contain a repulsive component, called multi-swarm repulsive PSO (MRPSO).

The presented experimental results show that MRPSO outperforms the other considered PSO methods on all the studied problems. This is probably due to the fact that MRPSO is able to maintain a higher degree of diversity in the whole system. In fact, in MRPSO, particles of a given swarm are attracted by the best particles of that swarm and repulsed by the best particles of the previous swarm in the ring topology. Thus, at migration time, given that the particles that migrate are the best ones in the donor swarms, the particles that enter the acceptor swarms are very different from the ones already present there, given that they have been repulsed by those particles until the previous generation.
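The attraction/repulsion scheme just described can be sketched as a velocity update in which the repulsive term points away from the previous swarm's best. The coefficient names (w, c1, c2, c3) and their values are our illustrative assumptions, not the exact formulation used in the experiments:

```python
import random

# Sketch of the MRPSO idea: particles are attracted by their personal and
# swarm bests and repulsed by the best of the previous swarm in the ring.
def update_velocity(v, x, personal_best, swarm_best, prev_swarm_best,
                    w=0.7, c1=1.5, c2=1.5, c3=0.5):
    new_v = []
    for i in range(len(x)):
        r1, r2, r3 = random.random(), random.random(), random.random()
        attract = (c1 * r1 * (personal_best[i] - x[i])
                   + c2 * r2 * (swarm_best[i] - x[i]))
        repulse = -c3 * r3 * (prev_swarm_best[i] - x[i])  # pushed away
        new_v.append(w * v[i] + attract + repulse)
    return new_v
```

Because each swarm repulses the best of its ring predecessor, the particles that later migrate along the ring arrive in regions they were previously pushed away from, which is the diversity mechanism invoked above.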

Interestingly, MRPSO has also obtained better results than the other methods on out-of-sample test data for all the considered real-life applications, showing a noteworthy generalization ability. MPSO has obtained results that are better than, or at least comparable to, those of standard PSO, while PSE and RPSE


are the methods that have obtained the worst results. The poor performances obtained by PSE and RPSE are probably due to the fact that in the GA system the individuals are complicated structures (swarms), and this forces us to use relatively few individuals (10), which limits the exploration ability of the GA. Furthermore, the choice of defining the fitness of a swarm as the fitness of the best particle that belongs to it is questionable, and variants deserve further investigation.

Future work includes the study of other multi-swarm PSO methods on other multi-dimensional test functions, including other real-life applications. Furthermore, we plan to run simulations on truly parallel hardware, such as parallel machines or clusters of PCs, and to investigate the advantage of our parallel and distributed methods in terms of speedup and scalability.

References

1) Archetti, F., Giordani, I. and Vanneschi, L., "Genetic programming for anticancer therapeutic response prediction using the NCI-60 dataset," Computers and Operations Research, 37, 8, pp. 1395–1405, 2010.

2) Archetti, F., Giordani, I. and Vanneschi, L., "Genetic programming for QSAR investigation of docking energy," Applied Soft Computing, 10, 1, pp. 170–182, 2010.

3) Archetti, F., Messina, E., Lanzeni, S. and Vanneschi, L., "Genetic programming for computational pharmacokinetics in drug discovery and development," Genetic Programming and Evolvable Machines, 8, 4, pp. 17–26, 2007.

4) Arumugam, M. S. and Rao, M., "On the improved performances of the particle swarm optimization algorithms with adaptive parameters, cross-over operators and root mean square (rms) variants for computing optimal control of a class of hybrid systems," Journal of Applied Soft Computing, 8, pp. 324–336, 2008.

5) Blackwell, T. and Branke, J., "Multi-swarm optimization in dynamic environments," in EvoWorkshops (Raidl, G. R. et al. eds.), LNCS, Springer, pp. 489–500, 2004.

6) Blackwell, T. M., "Swarm music: improvised music with multi-swarms," in Proc. of the 2003 AISB Symp. on Artificial Intelligence and Creativity in Arts and Science, pp. 41–49, 2003.

7) Bonabeau, E., Dorigo, M. and Theraulaz, G., Swarm Intelligence: From Natural to Artificial Systems (Santa Fe Institute Studies in the Sciences of Complexity), Oxford University Press, New York, NY, 1999.

8) Cagnoni, S., Vanneschi, L., Azzini, A. and Tettamanzi, A., "A critical assessment of some variants of particle swarm optimization," in European Workshop on Bio-inspired Algorithms for Continuous Parameter Optimisation, EvoNUM'08, Springer Verlag, pp. 565–574, 2008.

9) Clerc, M. ed., Particle Swarm Optimization, ISTE, 2006.

10) Diosan, L. and Oltean, M., "Evolving the structure of the particle swarm optimization algorithms," in EvoCOP'06, Springer Verlag, pp. 25–36, 2006.

11) Diosan, L. and Oltean, M., "What else is evolution of PSO telling us?" Journal of Artificial Evolution and Applications, 1, 5, pp. 1–12, 2008.

12) Fernandez, F., Tomassini, M. and Vanneschi, L., "An empirical study of multipopulation genetic programming," Genetic Programming and Evolvable Machines, 4, 1, pp. 21–52, 2003.

13) Jiang, Y., Huang, W. and Chen, L., "Applying multi-swarm accelerating particle swarm optimization to dynamic continuous functions," in 2009 Second International Workshop on Knowledge Discovery and Data Mining, pp. 710–713, 2009.

14) Kameyama, K., "Particle swarm optimization - a survey," IEICE Transactions, 92-D, 7, pp. 1354–1361, 2009.

15) Kennedy, J. and Eberhart, R., "Particle swarm optimization," in Proc. IEEE Int. Conf. on Neural Networks, 4, IEEE Computer Society, pp. 1942–1948, 1995.

16) Kennedy, J. and Mendes, R., "Population structure and particle swarm performance," in IEEE Congress on Evolutionary Computation, CEC'02, IEEE Computer Society, pp. 1671–1676, 2002.

17) Kennedy, J., Poli, R. and Blackwell, T., "Particle swarm optimization: an overview," Swarm Intelligence, 1, 1, pp. 33–57, 2007.

18) Kennedy, J. and Eberhart, R. C., Swarm Intelligence, Morgan Kaufmann Publishers, 2001.

19) Kwong, H. and Jacob, C., "Evolutionary exploration of dynamic swarm behavior," in IEEE Congress on Evolutionary Computation, CEC'03, IEEE Press, pp. 367–374, 2003.

20) Li, C. and Yang, S., "Fast multi-swarm optimization for dynamic optimization problems," in ICNC '08: Proc. of the 2008 Fourth International Conference on Natural Computation, Washington, DC, USA, IEEE Computer Society, pp. 624–628, 2008.

21) Liang, J. J. and Suganthan, P. N., "Dynamic multi-swarm particle swarm optimizer with local search," in 2005 IEEE Congress on Evolutionary Computation, CEC 2005, 1, pp. 522–528, 2005.

22) Lu, F.-Q., Huang, M., Ching, W.-K., Wang, X.-W. and Sun, X.-l., "Multi-swarm particle swarm optimization based risk management model for virtual enterprise," in GEC '09: Proc. of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation, New York, NY, USA, ACM, pp. 387–392, 2009.

23) Niu, B., Zhu, Y., He, X. and Wu, H., "MCPSO: A multi-swarm cooperative particle swarm optimizer," Applied Mathematics and Computation, 2, 185, pp. 1050–1062, 2007.

24) Poli, R., "Analysis of the publications on the applications of particle swarm optimisation," J. Artif. Evol. App., 2008, 1, pp. 1–10, January 2008.

25) Poli, R., "Analysis of the publications on the applications of particle swarm optimization," Journal of Artificial Evolution and Applications, 2009, (in press).

26) N. C. M. Project, National Cancer Institute, Bethesda MD, 2008. See http://genome-www.stanford.edu/nci60/.

27) Riget, J. and Vesterstrøm, J., "A diversity-guided particle swarm optimizer - the ARPSO," Technical report, Dept. of Comput. Sci., Aarhus Univ., Denmark, 2002.

28) Ross, S. M., Introduction to Probability and Statistics for Engineers and Scientists, Academic Press, New York, 2000.

29) Ross, D. T. et al., "Systematic variation in gene expression patterns in human cancer cell lines," Nat Genet, 24, 3, pp. 227–235, 2000.

30) Sherf, U. et al., "A gene expression database for the molecular pharmacology of cancer," Nat Genet, 24, 3, pp. 236–244, 2000.

31) Shi, Y. H. and Eberhart, R., "A modified particle swarm optimizer," in Proc. IEEE Int. Conference on Evolutionary Computation, IEEE Computer Society, pp. 69–73, 1998.

32) Smola, A. J. and Schölkopf, B., "A Tutorial on Support Vector Regression," Technical Report Series - NC2-TR-1998-030, NeuroCOLT2, 1999.

33) Srinivasan, D. and Seow, T. H., "Particle swarm inspired evolutionary algorithm (PS-EA) for multi-objective optimization problem," in IEEE Congress on Evolutionary Computation, CEC'03, IEEE Press, pp. 2292–2297, 2003.

34) Suganthan, P., Hansen, N., Liang, J., Deb, K., Chen, Y., Auger, A. and Tiwari, S., "Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization," Technical Report 2005005, Nanyang Technological University, 2005.

35) Valle, Y. D., Venayagamoorthy, G., Mohagheghi, S., Hernandez, J. and Harley, R., "Particle swarm optimization: Basic concepts, variants and applications in power systems," IEEE Transactions on Evolutionary Computation, 12, 2, pp. 171–195, 2008.

36) Vanneschi, L., "Theory and Practice for Efficient Genetic Programming," Ph.D. thesis, Faculty of Sciences, University of Lausanne, Switzerland, 2004.

37) Vanneschi, L., Codecasa, D. and Mauri, G., "A study of parallel and distributed particle swarm optimization methods," in Proc. of the 2nd Workshop on Bio-inspired Algorithms for Distributed Systems, BADS'10, New York, NY, USA, ACM, pp. 9–16, 2010.

38) Vanneschi, L., Codecasa, D. and Mauri, G., "An empirical comparison of parallel and distributed particle swarm optimization methods," in Proc. of the Genetic and Evolutionary Computation Conference, GECCO 2010 (Branke, J. et al. eds.), ACM Press, 2010.

39) Wang, Y. and Yang, Y., "An interactive multi-swarm PSO for multiobjective optimization problems," Expert Systems with Applications, in press, 2008. Online version available at http://www.sciencedirect.com.

40) Weka, a multi-task machine learning software developed by Waikato University, 2006. See http://www.cs.waikato.ac.nz/ml/weka.

41) White, T. and Pagurek, B., "Towards multi-swarm problem solving in networks," in Proc. of Third International Conference on Multi-Agent Systems (ICMAS'98), IEEE Computer Society, pp. 333–340, 1998.

42) Wu, Z. and Zhou, J., "A self-adaptive particle swarm optimization algorithm with individual coefficients adjustment," in Proc. IEEE International Conference on Computational Intelligence and Security, CIS'07, IEEE Computer Society, pp. 133–136, 2007.

43) You, X., Liu, S. and Zheng, W., "Double-particle swarm optimization with induction enhanced evolutionary strategy to solve constrained optimization problems," in IEEE International Conference on Natural Computing, ICNC'07, IEEE Computer Society, pp. 527–531, 2007.

44) Zhigljavsky, A. and Zilinskas, A., "Stochastic Global Optimization," Springer Optimization and Its Applications, 9, Springer, 2008.

45) Zhiming, L., Cheng, W. and Jian, L., "Solving constrained optimization via a modified genetic particle swarm optimization," in Workshop on Knowledge Discovery and Data Mining, WKDD'08, IEEE Computer Society, pp. 217–220, 2008.

Leonardo Vanneschi: He is an assistant professor of Computer Science at the University of Milano-Bicocca. His research interests include Machine Learning and Complex Systems, and in particular bio-inspired optimization methods such as Evolutionary Computation and Swarm Intelligence, also using paradigms of Parallel and Distributed Computing. He has published around 100 scientific papers in international journals, contributed volumes and conference proceedings.

Daniele Codecasa: He is a Ph.D. student in Computer Science at the University of Milano-Bicocca. His research concerns Data and Text Mining and Optimization. In the former area, he focuses his studies on probabilistic models like BN, CTBN, CRF and decision models; in the latter, he studies bio-inspired methods like evolutionary algorithms and particle swarm optimization.

Giancarlo Mauri: He is a full professor of Computer Science at the University of Milano-Bicocca. His research interests include: bio-inspired computing models and their applications to learning and optimization; bioinformatics; and computational systems biology, in particular stochastic modeling and simulation of biological systems and processes. On these subjects, he has published more than 200 scientific papers in international journals, contributed volumes and conference proceedings.