
The evolutionary origins of modularity
Jeff Clune1∗, Jean-Baptiste Mouret2∗ and Hod Lipson1

1Cornell University, Ithaca, USA, 2Université Pierre et Marie Curie-Paris 6, CNRS UMR 7222, France

∗ These authors contributed equally to this work.

Preprint – November 6, 2018

A central biological question is how natural organisms are so evolvable (capable of quickly adapting to new environments). A key driver of evolvability is the widespread modularity of biological networks – their organization as functional, sparsely connected subunits – but there is no consensus regarding why modularity itself evolved. While most hypotheses assume indirect selection for evolvability, here we demonstrate that the ubiquitous, direct selection pressure to reduce the cost of connections between network nodes causes the emergence of modular networks. Experiments with selection pressures to maximize network performance and minimize connection costs yield networks that are significantly more modular and more evolvable than control experiments that only select for performance. These results will catalyze research in numerous disciplines, including neuroscience, genetics and harnessing evolution for engineering purposes.

A long-standing open question in biology is how populations are capable of rapidly adapting to novel environments, a trait called evolvability36. A major contributor to evolvability is the fact that many biological entities are modular, especially the many biological processes and structures that can be modeled as networks, such as brains, metabolic pathways, gene regulation and protein interactions3, 7, 19, 36, 42. Networks are modular if they contain highly connected clusters of nodes that are sparsely connected to nodes in other clusters30, 39, 42. Despite its importance and decades of research, there is no agreement on why modularity evolves16, 41, 42. Intuitively, modular systems seem more adaptable, a lesson well-known to human engineers40, because it is easier to rewire a modular network with functional subunits than an entangled, monolithic network22, 23. However, because this evolvability only provides a selective advantage over the long term, such selection is at best indirect and may not be strong enough to explain the level of modularity in the natural world41, 42.

Modularity is likely caused by multiple forces acting to various degrees in different contexts42, and a comprehensive understanding of the evolutionary origins of modularity involves identifying those multiple forces and their relative contributions. The leading hypothesis is that modularity mainly emerges due to rapidly changing environments that have common subproblems, but different overall problems22, 23. Computational simulations demonstrate that in such environments (called modularly varying goals: MVG), networks evolve both modularity and evolvability22, 23. In contrast, evolution in unchanging environments produces non-modular networks that are slower to adapt to new environments22, 23. Follow-up studies support the modularity-generating force of MVG in nature: the modularity of bacterial metabolic networks is correlated with the frequency with which their environments change35. It is unknown how much natural modularity MVG can explain, however, because it is unclear how many biological environments change modularly, and whether they change at a high enough frequency for this force to play a significant role16.

Fig. 1. Main hypothesis. Evolving networks with selection for performance alone produces non-modular networks that are slow to adapt to new environments. Adding a selective pressure to minimize connection costs leads to the evolution of modular networks that quickly adapt to new environments.

A related theory that also assumes a constantly changing environment and selection for evolvability is that modularity arises to enable modifying one subcomponent without affecting others16. There are other plausible hypotheses (reviewed in Wagner 2007), including that variation mechanisms, such as gene duplication, create a bias towards the generation of modular structures42 and that modularity evolves due to selection to make phenotypes robust to environmental perturbations41.

We investigate an alternate hypothesis that has been suggested, but heretofore untested, which is that modularity evolves not because it conveys evolvability, but as a byproduct of selection to reduce connection costs in a network (Fig. 1)38, 39. Such costs include manufacturing connections, maintaining them, the energy to transmit along them, and signal delays, all of which increase as a function of connection length and number1, 8, 9, 39. The concept of connection costs is straightforward in networks with physical connections (e.g. neural), but may also apply to other types of networks like metabolic and genetic pathways, as illustrated by the following examples: adding links in a pathway may incur delays; adding binding sites may slow replication and physically make regulation inefficient because polymerases recruited to upstream promoters would have to traverse longer stretches of sequence to reach protein-coding gene sequences; adding protein and metabolic interactions may increase constraints. The strongest evidence that biological networks face direct selection to minimize connection costs comes from the vascular system24 and from nervous systems, including the brain, where multiple studies suggest that the summed length of the wiring diagram has been minimized, either by reducing long connections or by optimizing the placement of neurons1, 8, 10, 25, 37, 39. Founding38 and modern39 neuroscientists have hypothesized that direct selection to minimize connection costs may, as a side effect, cause modularity. This hypothesis has never been tested in the context of evolutionary biology. The most related study was on non-evolving, simulated neural networks with a specific within-life learning algorithm that produced more modularity when minimizing connection length in addition to performance21, although the generality of the result was questioned when it was not replicated with other learning algorithms6. Without during-life learning algorithms, carefully constructed MVG environments, or mutation operators strongly biased toward creating modules, attempts to evolve modularity in neural networks have failed12, 15, 41.


Fig. 2. The addition of connection costs leads to higher-performing, functionally modular networks. (A) Networks evolve to recognize patterns (objects) in an eight-pixel retina. The problem is modularly decomposable because whether an object exists on the left and right sides can be separately determined before combining that information to answer whether objects exist on both sides (denoted by the AND logic function). (B) Networks from an example trial become more modular across evolutionary time (see SI for video) with a pressure to minimize connection costs in addition to performance (P&CC). (C) Median performance (± 95% bootstrapped confidence intervals) per generation of the highest-performing network of each trial, which is perfect only when minimizing connection costs in addition to performance. (D) Network modularity, which is significantly higher in P&CC trials than when selecting for performance alone (PA). (E) The 12 highest-performing PA networks, each from a separate trial. (F) The 12 highest-performing P&CC networks, which are functionally modular in that they have separate modules for the left and right subproblems. Nodes are colored according to membership in separate partitions when making the most modular split of the network (see text). The final networks of all 50 trials are visualized in Fig. S2. (G,H) The cost and modularity of PA and P&CC populations across generations, pooled from all 50 trials. A connection cost pushes populations out of high-cost, low-modularity regions towards low-cost, modular regions. Fig. 3 shows the fitness potential of each map area.


Given the impracticality of observing modularity evolve in biological systems, we follow most research on the subject by conducting experiments in computational systems with evolutionary dynamics16, 22, 42. Specifically, we use a well-studied system from the MVG investigations12, 22, 23: evolving networks to solve pattern-recognition tasks and Boolean logic tasks (Methods). These networks have inputs that sense the environment and produce outputs (e.g. activating genes, muscle commands, etc.), which determine performance on environmental problems. We compare a treatment where the fitness of networks is based on performance alone (PA) to one based on two objectives: maximizing performance and minimizing connection costs (P&CC). A multi-objective evolutionary algorithm is used14 with one (PA) or two (P&CC) objectives.



Fig. 3. The highest-performing networks found for each combination of modularity and cost for the retina problem. Colors indicate the highest-performing network found at that point in the modularity vs. cost space, with yellow representing perfect performance. This map has been generated using the MOLE algorithm (Methods). The best-performing networks at the end of each of the 50 PA and P&CC runs are overlaid on the map. Networks with perfect performance exist throughout the space, which helps explain why modularity does not evolve when there is selection based on performance alone. Below a cost threshold of around 125 there is an inverse correlation between cost and modularity for perfectly performing networks. The lowest-cost networks – those with the shortest summed lengths – that are high-performing are modular.

To reflect that selection is stronger on network performance than connection costs, the P&CC cost objective affects selection probabilistically only 25% of the time, although the results are robust to substantial changes to this value (Methods). Two example connection cost functions are investigated. The default one is the summed squared length of all connections, assuming nodes are optimally located to minimize this sum (Methods), as has been found for animal nervous systems8, 9, 11. A second measure of costs as solely the number of connections yields qualitatively similar results to the default cost function, and may better represent biological networks without connections of different lengths. More fit networks tend to have more offspring (copies that are randomly mutated), and the cycle repeats for a preset number of generations (Fig. 1, Methods). Such computational evolving systems have substantially improved our understanding of natural evolutionary dynamics16, 22, 23, 28, 29, 42.

The main experimental problem involves a network that receives stimuli from eight inputs22. It can be thought of as an eight-pixel retina receiving visual stimuli, although other analogies are valid (Methods), such as a genetic regulatory network exposed to environmental stimuli. Patterns shown on the retina's left and right halves may each contain an 'object', meaning a pattern of interest (Fig. 2a). Networks evolve to answer whether an object is present on both the left and right sides of the retina (the L-AND-R environment) or whether an object is displayed on either side (the L-OR-R environment). Which patterns count as an object on the left and right halves are slightly different (Fig. 2a).

Each network iteratively sees all possible 256 input patterns and answers true (≥ 0) or false (< 0). Its performance is the percentage of correct answers, which depends on which neurons are connected, how strongly, and whether those connections are inhibitory or excitatory (Methods). Networks are randomly generated to start each experiment. Their connections stochastically mutate during replication (Methods). Network modularity is evaluated with an efficient approximation27, 33 of the widely used modularity metric Q, which first optimally divides networks into modules and then measures the difference between the number of edges within each module and the number expected for random networks with the same number of edges27, 33.
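For readers who want to compute a comparable modularity score, the minimal sketch below uses NetworkX community utilities as a stand-in for the approximation of Q used in the paper; the greedy partitioning and the undirected treatment of the network are simplifying assumptions, not the authors' implementation (which uses the Leicht–Newman directed variant).

```python
# Minimal sketch (not the authors' code): approximate the modularity score Q
# of a network using NetworkX. Assumes an undirected view of the network and
# greedy community detection.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

def network_modularity(edges):
    """edges: iterable of (source_node, target_node) pairs."""
    g = nx.Graph()
    g.add_edges_from(edges)
    partition = greedy_modularity_communities(g)  # list of node sets (modules)
    return modularity(g, partition), partition

if __name__ == "__main__":
    # Toy example: two 3-node cliques joined by a single edge -> high Q.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    q, parts = network_modularity(edges)
    print(f"Q = {q:.2f}, modules = {parts}")
```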


Fig. 4. Evolving with connection costs produces networks that are more evolvable. (A) P&CC networks adapt faster to new environments than PA networks. Organisms were first evolved in one environment (e.g. L-AND-R) until they reached perfect performance and then transferred to a second environment (e.g. L-OR-R). Thick lines are medians, boxes extend from 25th to 75th data percentiles, thin lines mark 1.5× IQR (interquartile range), and plus signs represent outliers. Fig. S7 is a zoomed-out version showing all of the data. (B) P&CC networks in an unchanging environment (dotted green line) have similar levels of modularity to the highest levels produced by MVG (solid blue line). Combining MVG with P&CC results in even higher modularity levels (solid green line), showing that the forces combined are stronger than either alone.

Results

After 25000 generations in an unchanging environment (L-AND-R), treatments selected to maximize performance and minimize connection costs (P&CC) produce significantly more modular networks than treatments maximizing performance alone (PA) (Fig. 2d, Q = 0.42, 95% confidence interval [0.25, 0.45] vs. Q = 0.18 [0.16, 0.19], p = 8 × 10⁻⁹ using Matlab's Mann-Whitney-Wilcoxon rank sum test, which is the default statistical test unless otherwise specified). To test whether evolved networks exhibit functional modularity corresponding to the left-right decomposition of the task, we divide networks into two modules by selecting the division that maximizes Q and color nodes in each partition differently. Left-right decomposition is visually apparent in most P&CC trials and absent in PA trials (Fig. 2e,f). Functional modularity can be quantified by identifying whether left and right inputs are in different partitions, which occurs in 56% of P&CC trials and never with PA (Fisher's exact test, p = 4 × 10⁻¹¹). Pairs of perfect sub-solution neurons – whose outputs perfectly answer the left and right subproblems – occur in 39% of P&CC trials and 0% of PA trials (Fisher's exact test, p = 3 × 10⁻⁶, Fig. S2).

Despite the additional constraint, P&CC networks are significantly higher-performing than PA networks (Fig. 2c). The median-performing P&CC network performs perfectly (1.0 [1.0, 1.0]), but the median PA network does not (0.98 [0.97, 0.98], p = 2 × 10⁻⁵). P&CC performance may be higher because its networks have fewer nodes and connections (Fig. S9d,e), meaning fewer parameters to optimize. Modular structures are also easier to adapt since mutational effects are smaller, being confined to subcomponents30. While it is thought that optimal, non-modular solutions usually outperform optimal, modular designs, such 'modularity overhead' only exists when comparing optimal designs, and is not at odds with the finding that adaptation can be faster and ultimately more successful with a bias towards modular solutions30.

To better understand why the presence of a connection cost increases performance and modularity, we searched for the highest-performing networks at all possible combinations of modularity and cost (Methods). For high-performing networks, there is an inverse correlation between cost and modularity, such that the lowest-cost networks are highly modular (Fig. 3). Many runs in the P&CC treatment evolved networks in this region whereas the PA treatments never did. There are also many non-modular, high-cost networks that are high-performing, helping to explain why modularity does not evolve due to performance alone (Fig. 3). Comparing PA vs. P&CC populations across generations reveals that a connection cost pushes populations out of high-cost, low-modularity areas of the search space into low-cost, modular areas (Figs. 2g,h). Without the pressure to leave high-cost, low-modularity regions, many PA networks remain in areas that ultimately do not contain the highest-performing solutions (Fig. 3, dark green squares in the bottom right), further explaining why P&CC treatments have higher performance.

P&CC networks are also more evolvable than PA networks. We ran additional trials until 50 P&CC and 50 PA trials each had a perfectly performing network (Methods) and transferred these networks into the L-OR-R environment, which has the same subproblems in a different combination (Fig. S7). The presence (P&CC) or absence (PA) of a connection cost remained after the environmental change. We also repeated the experiment, except first evolving in L-OR-R and then transferring to L-AND-R. In both experiments, P&CC networks exhibit greater evolvability than PA by requiring fewer generations to adapt to the new environment (Fig. 4a, L-AND-R→L-OR-R: 3.0 [2.0, 5.0] vs. 65 [62, 69], p = 3 × 10⁻⁷⁸; L-OR-R→L-AND-R: 12.0 [7.0, 21.0] vs. 222.5 [175.0, 290.0], p = 9 × 10⁻¹²⁰). Modular networks thus evolve because their sparse connectivity has lower connection costs, but such modularity also aids performance and evolvability because the problem is modular.

Minimizing connection costs can work in conjunction with other forces to increase modularity. Modularity levels are higher when combining P&CC with MVG environments (Fig. 4b: solid vs. dotted green line, p = 3 × 10⁻⁵). Overall, P&CC (with or without MVG) yields similar levels of modularity as MVG at its strongest, and significantly more when rates of environmental change are too slow for the MVG effect to be strong (Fig. 4: green lines vs. blue solid line).

P&CC modularity is also higher than PA even on problems that are non-modular (Fig. 5a, p = 5.4 × 10⁻¹⁸). As expected, such modularity is lower than on modular problems (p = 0.0011, modular retina vs. non-modular retina). This non-modular problem involves answering whether any four pixels were on (black), which is non-modular because it requires information from all retina inputs. As mentioned previously, performance and modularity are also significantly higher with an alternate connection cost function based on the number of connections (P&CC-NC) instead of the length of connections (Fig. 6). We also verified that modularity and performance are not higher simply because a second objective is used (Fig. 6). We further tested whether modularity arises even when the inputs for different modules are not geometrically separated, which is relevant when cost is a function of connection length. Even in experiments with randomized input coordinates (Methods), a connection cost significantly increased performance (1.0 [0.98, 1.0], p = 0.0012) and modularity (Q = 0.35 [0.34, 0.38], p = 1 × 10⁻⁹), and performance and modularity scores were not significantly different than P&CC without randomized coordinates (Fig. S8).

All the results presented so far are qualitatively similar in a different model system: evolving networks to solve Boolean logic tasks. We tested two fully separable problems: one with five "exclusive or" (XOR) logic modules (Fig. 5b), and another with hierarchically nested XOR problems (Fig. 5c). P&CC created separate modules for the decomposed problems in nearly every trial, whereas PA almost never did (Figs. S3, S4). P&CC performance was also significantly higher (Fig. 5b,c), and there was an inverse correlation between cost and modularity (Fig. S10).


Fig. 6. Alternate cost functions. Performance and modularity are significantly higher (p < 0.0001) either with a cost function based on the length (P&CC) or number (P&CC-NC) of connections versus performance alone (PA) or performance and a random objective (P&RO). P&RO assigned a random number to each organism instead of a connection cost score and maximized that random number. Fig. S8 contains visualizations of all P&CC-NC networks.

Confirming the generality of the finding that connection costs improve adaptation rates and that high-performing, low-cost networks are modular is an interesting area for future research.

Discussion and Conclusion

Overall, this paper supports the hypothesis that selection to reduce connection costs causes modularity, even in unchanging environments. The results also open new areas of research into identifying connection costs in networks without physical connections (e.g. genetic regulatory networks) and investigating whether pressures to minimize connection costs may explain modularity in human-created networks (e.g. communication and social networks).

It is tempting to consider any component of modularity that arises due to minimizing connection costs as a 'spandrel', in that it emerges as a byproduct of selection for another trait17. However, because the resultant modularity produces evolvability, minimizing connection costs may serve as a bootstrapping process that creates initial modularity that can then be further elevated by selection for evolvability. Such hypotheses for how modularity initially arises are needed, because selection for evolvability cannot act until enough modularity exists to increase the speed of adaptation42.

Knowing that selection to reduce connection costs produces modular networks will substantially advance fields that harness evolution for engineering, because a longstanding challenge therein has been evolving modular designs12, 30, 31, 41. It will additionally aid attempts to evolve accurate models of biological networks, which catalyze medical and biological research3, 20, 39. The functional modularity generated also makes synthetically evolved networks easier to understand. These results will thus generate immediate benefits in many fields of applied engineering, in addition to furthering our quest to explain one of nature's predominant organizing principles.

Methods

Experimental Parameters Each treatment is repeated 50 times with different stochastic events (i.e. different random number generator seeds). Analyses and visualizations are of the highest-performing network per trial with ties broken randomly. The main experiments (retina, non-modular retina, 5-XOR, and hierarchical XOR) last 25000 generations and have a population size of 1000.

Statistics For each statistic we report the median ± 95% bootstrapped confidence intervals of the median (calculated by resampling the data 5,000 times). In plots, these confidence intervals are smoothed with a median filter (window size = 200) to remove subsampling noise.


[Fig. 5 panels: (A) a non-modular problem ('Exactly 4 black pixels?' asked of the retina), (B) five separable XOR problems, (C) hierarchically nested XOR problems; each panel compares PA and P&CC networks.]

Fig. 5. Results from tests with different environmental problems. (A) Even on a non-modular problem, modularity is higher with P&CC, though it is lower than for modular problems. (B, C) P&CC performs better, is more modular, and has better functional decomposition than PA when evolving networks to solve five separate XOR functions and hierarchically nested XOR functions. The examples are the final, highest-performing networks per treatment. Figs. S3, S4, and S5 show networks from all trials.


Length Cost For P&CC treatments, prior to calculating connection costs, we place nodes in positions optimal for minimizing connection costs given the topology of the network, which is biologically motivated8, 9, 11 and can be solved for mathematically11. Inputs and outputs are at fixed locations (SI Text). Visualizations reflect these node placements.
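For the squared-length cost with fixed input and output coordinates, the cost-minimizing positions of the free (hidden) nodes satisfy a linear condition: each free node sits at the mean of its neighbors' positions. The sketch below reaches that optimum by simple iterative relaxation; it is only an illustrative stand-in for the analytic solution cited above, and the node names and example topology are assumptions.

```python
# Illustrative sketch (not the authors' code): place hidden nodes so that the
# summed squared connection length is minimized, given fixed input and output
# coordinates. For a squared-length cost the optimum puts every free node at
# the average position of its neighbors; we reach it by iterative relaxation.
import numpy as np

def place_nodes(edges, fixed_pos, free_nodes, iters=500):
    pos = {n: np.zeros(2) for n in free_nodes}
    pos.update({n: np.asarray(p, dtype=float) for n, p in fixed_pos.items()})
    neighbors = {n: [] for n in pos}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    for _ in range(iters):
        for n in free_nodes:
            if neighbors[n]:
                pos[n] = np.mean([pos[m] for m in neighbors[n]], axis=0)
    return pos

def squared_length_cost(edges, pos):
    return sum(np.sum((pos[a] - pos[b]) ** 2) for a, b in edges)

# Hypothetical example: one hidden node 'h' wired to two inputs and one output.
fixed = {"in0": (-0.5, 0.0), "in1": (0.5, 0.0), "out": (0.0, 2.0)}
edges = [("in0", "h"), ("in1", "h"), ("h", "out")]
pos = place_nodes(edges, fixed, free_nodes=["h"])
print(pos["h"], squared_length_cost(edges, pos))
```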

Evolvability Experiment The evolvability experiments (Fig. 4A) are described in Fig. S7. To obtain 50 trials that each had a perfectly performing network in L-AND-R and L-OR-R, respectively, took 110 and 116 trials for P&CC and 320 and 364 trials for PA. 1000 clones of each of these networks then evolved in the alternate environment until performance was perfect or 5000 generations passed.

Overview of Network Models This section provides a brief overview of the network models in this paper. A more complete review of network models is provided in Alon 2007 4, Haykin 1998 18, Newman 2003 34, Barabasi 2004 5, and citations therein.

Network models can represent many types of biological processes by representing interactions between components4, 5, 34. Example biological systems that are commonly modeled as networks are neural, genetic, metabolic, and protein interaction networks. All such networks can be represented abstractly as nodes, representing components such as neurons or genes, and the interactions between such components, such as a gene inhibiting another gene. The weight of connections indicates the type and strength of interactions, with positive values indicating activation, negative values indicating inhibition, and the magnitude of the value representing the strength of the interaction.

Multiple nodes can connect to form a network (e.g. Fig. S1b). Typically, information flows into the network via input nodes, passes through hidden nodes, and exits via output nodes. Examples include a gene regulatory network responding to changing levels of environmental chemicals or a neural network responding to visual inputs from the retina and outputting muscle commands.

Our model of a network is a standard, basic one used in machine learning18, systems biology2, and computational neuroscience32. It also has been used in previous, landmark studies on the evolution of modularity16, 22. The networks are feed-forward, meaning that nodes are arranged into layers, such that a node in layer n receives incoming connections only from nodes in layer n−1 and has outgoing connections only to nodes in layer n+1 (Fig. S1b). The maximum number of nodes per hidden layer is 8/4/2 for the three hidden layers in the retina problem, 8 for the single hidden layer in the 5-XOR problem, and 8/4/4 for the three hidden layers in the hierarchical XOR problem. The possible values for connection weights are the integers −2, −1, 1, and 2. The possible values for thresholds (also called biases) are the integers −2, −1, 0, 1, and 2. Information flows through the network in discrete time steps one layer at a time. The output of each node in the network is the following function of its inputs:

y_j = tanh( λ ( Σ_{i∈I_j} w_{ij} y_i + b ) )

where y_j is the output of node j, I_j is the set of nodes connected to j, w_{ij} is the strength of the connection between node i and node j, y_i is the output of node i, and b is a threshold (also called a bias) that determines at which input value the output transitions from negative to positive. The tanh(x) transfer function ensures an output range of [−1, 1]. λ (here, 20) determines the slope of the transition between these inhibitory and excitatory extremes (Fig. S1c). This network model can approximate any function with arbitrary precision provided that it contains enough hidden nodes13.
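The layer-by-layer update above can be written compactly in code; the sketch below is a minimal NumPy rendering of that transfer function with λ = 20, weights in {−2, −1, 1, 2} and biases in {−2, ..., 2}. The example layer sizes, the random weights, and the ±1 input encoding are placeholders for illustration, not the evolved networks from the paper.

```python
# Minimal sketch of the network model described above (not the authors' code).
# Each node outputs tanh(lambda * (sum_i w_ij * y_i + b_j)) with lambda = 20,
# and information flows through the layers one step at a time.
import numpy as np

LAMBDA = 20.0

def forward(inputs, layers):
    """layers: list of (W, b) pairs, W shape (n_prev, n_next), b shape (n_next,)."""
    y = np.asarray(inputs, dtype=float)
    for W, b in layers:
        y = np.tanh(LAMBDA * (y @ W + b))
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical retina-sized topology: 8 inputs -> 8 -> 4 -> 2 -> 1 output.
    sizes = [8, 8, 4, 2, 1]
    layers = [
        (rng.choice([-2, -1, 1, 2], size=(a, b)).astype(float),
         rng.choice([-2, -1, 0, 1, 2], size=b).astype(float))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]
    retina_pattern = rng.choice([-1, 1], size=8)  # assumed pixel off/on encoding
    answer = forward(retina_pattern, layers)[0]
    print("network answers", "true" if answer >= 0 else "false")
```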

Evolutionary Algorithm The evolutionary algorithm is based on research into algorithms inspired by evolution that simultaneously optimize several objectives, called multi-objective algorithms14. These algorithms search for, but are not guaranteed to find, the set of optimal trade-offs: i.e., solutions that cannot be improved with respect to one objective without decreasing their score with respect to another one. Such solutions are said to be on the Pareto front14, described formally below. These algorithms are more general than algorithms that combine multiple objectives into a single, weighted fitness function, because the latter necessarily select one set of weights for each objective, whereas multi-objective algorithms explore all possible tradeoffs between objectives14.

The specific multi-objective algorithm in this paper is the widely used Nondominated Sorting Genetic Algorithm, version II (NSGA-II)14 (Fig. S1a). As with most modern multi-objective evolutionary algorithms, it relies on the concept of Pareto dominance, defined as follows:

An individual x∗ is said to dominate another individual x if both conditions 1 and 2 are true: (1) x∗ is not worse than x with respect to any objective; (2) x∗ is strictly better than x with respect to at least one objective.

However, this definition puts the same emphasis on all objectives. In the present study, we take into account that the first objective (performance) is more important than the second objective (optimizing connection cost). To reflect this, we use a stochastic version of Pareto dominance in which the second objective is only taken into account with a given probability p. Lower values of p cause lower selection pressure on the second objective. Our results are robust to alternate values of p, including up to p = 1.0 for static environments (Fig. S6a-d) and p = 0.95 for environments with modularly varying goals (Fig. S6e).

This stochastic application of the second objective is implemented as follows. Let r denote a random number in [0, 1] and p the probability to take the second objective into account. A solution x∗ is said to stochastically dominate another solution x if one of the two following conditions is true: (1) r > p and x∗ is better than x with respect to the first objective; (2) r ≤ p and x∗ is not worse than x with respect to either objective and x∗ is better than x with respect to at least one objective.

Stochastic Pareto dominance is used in the algorithm twice (Fig. S1a): (1) To select a parent for the next generation, two individuals x1 and x2 are randomly chosen from the current population; if x1 stochastically dominates x2, then x1 is selected; if x2 stochastically dominates x1, then x2 is selected. If neither dominates the other, the individual selected is the one which is in the less crowded part of the objective space14. (2) To rank individuals, the set of stochastically non-dominated solutions is first identified and called the first Pareto layer (rank = 1, e.g. l1 in Fig. S1a); these individuals are then removed and the same operation is performed to identify the subsequent layers (additional ranks corresponding to l2, l3, etc. in Fig. S1a).
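The two definitions above translate almost directly into code. The sketch below is an illustrative rendering, not an excerpt from the authors' NSGA-II implementation: it assumes fitness is represented as a (performance, −cost) pair to maximize, uses p = 0.25 as in the main P&CC treatment, and draws a fresh random number r for each dominance check.

```python
# Sketch of stochastic Pareto dominance and its use in parent selection
# (illustrative assumptions: fitness is a (performance, -cost) tuple to
# maximize; p = 0.25 as in the main P&CC treatment).
import random

P_COST = 0.25  # probability of taking the connection-cost objective into account

def stochastically_dominates(f1, f2, p=P_COST):
    """f1, f2: (performance, -cost) tuples, higher is better for both entries."""
    if random.random() > p:
        # Cost objective ignored: compare on performance alone.
        return f1[0] > f2[0]
    not_worse = all(a >= b for a, b in zip(f1, f2))
    strictly_better = any(a > b for a, b in zip(f1, f2))
    return not_worse and strictly_better

def tournament(fitnesses, crowding):
    """Pick a parent index from two random individuals, as in step (1) above."""
    i, j = random.sample(range(len(fitnesses)), 2)
    if stochastically_dominates(fitnesses[i], fitnesses[j]):
        return i
    if stochastically_dominates(fitnesses[j], fitnesses[i]):
        return j
    # Neither dominates: prefer the less crowded region of objective space.
    return i if crowding[i] > crowding[j] else j
```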

Mutational Effects Mutations operate in essentially the same way as in the study by Kashtan and Alon22. In each generation, every new network is randomly mutated (Fig. S1a). Four kinds of mutation are possible, which are not mutually exclusive: (1) each network has a 20% chance of having a single connection added. Connections are added between two randomly chosen nodes that are not already connected and belong to two consecutive layers (to maintain the feed-forward property described previously); (2) each network has a 20% chance of a single, randomly chosen connection being removed; (3) each node in the network has a 1/24 (≈4.16%) chance of having its threshold (also called its bias) incremented or decremented, with both options equally likely; five values are available (−2, −1, 0, 1, 2); mutations that produce values higher or lower than these five values are ignored; (4) each connection in the network has a separate probability of being incremented or decremented of 2.0/n, where n is the total number of connections of the network. Four values are available (−2, −1, 1, 2); mutations that produce values higher or lower than these four values are ignored.
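A hedged sketch of these four operators follows. The network representation (an explicit connection list plus per-node biases), the candidate_pairs helper, and the weight assigned to a newly added connection are assumptions for illustration, not the authors' data structures; values that would leave the allowed sets are simply ignored, as described above.

```python
# Sketch of the four mutation operators (illustrative assumptions, not the
# authors' code): explicit connection list plus per-node biases.
import random

WEIGHT_VALUES = (-2, -1, 1, 2)
BIAS_VALUES = (-2, -1, 0, 1, 2)

def mutate(net):
    """net has: net.connections (list of [src, dst, weight]),
    net.biases (dict node -> bias), and net.candidate_pairs() yielding
    unconnected node pairs in consecutive layers (all assumed helpers)."""
    if random.random() < 0.20:                      # (1) add one connection
        pairs = list(net.candidate_pairs())
        if pairs:
            src, dst = random.choice(pairs)
            net.connections.append([src, dst, random.choice(WEIGHT_VALUES)])
    if random.random() < 0.20 and net.connections:  # (2) remove one connection
        net.connections.remove(random.choice(net.connections))
    for node in net.biases:                         # (3) nudge each bias w.p. 1/24
        if random.random() < 1 / 24:
            new_bias = net.biases[node] + random.choice((-1, 1))
            if new_bias in BIAS_VALUES:              # out-of-set mutations ignored
                net.biases[node] = new_bias
    n = len(net.connections)                         # (4) nudge weights w.p. 2/n each
    for conn in net.connections:
        if n and random.random() < 2.0 / n:
            new_w = conn[2] + random.choice((-1, 1))
            if new_w in WEIGHT_VALUES:               # out-of-set mutations ignored
                conn[2] = new_w
    return net
```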

The results in this manuscript are robust to varying these parameters. Because having more mutational events that remove connections than add them might also produce sparsely connected, modular networks, we repeated the main experiment with mutation rates biased to varying degrees (Fig. S9a,b). These experiments show that even having remove-connection events be an order of magnitude more likely than add-connection events does not produce modular networks. In each of the experiments with biased mutation rates, the modularity and performance of the P&CC treatment with default mutation values was significantly greater than the PA treatment with biased mutation rate values (Fig. S9a,b).

Our main results are qualitatively the same when weights and biases are real numbers (instead of integers) and mutated via Gaussian perturbation. Nodes are never added nor removed. For clarity, following 22, nodes without any connections are not displayed or included in results.

Multi-Objective Landscape Exploration (MOLE) Algorithm It would be helpful to visualize the constraints, tradeoffs, and other correlations between different phenotypic dimensions that occur in evolving systems (e.g. in this study the performance, modularity, and length for each possible network). Because it is impossible for this study to determine such information via exhaustive or random search (SI Text), we designed an algorithm to find high-performing solutions with different combinations of modularity and length scores. We call this algorithm Multi-Objective Landscape Exploration (MOLE). It is a multi-objective optimization search14 (Fig. S1a) with two objectives. The first objective prioritizes individuals that have high performance. The second objective prioritizes individuals that are far away from other individuals already discovered by the algorithm, where distance is measured in a Cartesian space with connection costs on the x axis and modularity on the y axis. Algorithms of this type have been shown to better explore the search space because they are less susceptible to getting stuck in local optima26. Thus, unlike a traditional evolutionary algorithm that will only be drawn to a type of solution if there is a fitness gradient towards that type of solution, MOLE searches for high-performing solutions for every possible combination of modularity and cost scores. While this algorithm is not guaranteed to find the optimal solution at each point in the space, it provides a focused statistical sampling of how likely it is to discover a high-quality solution in each area of the search space. The MOLE maps in this paper show the highest-performing network at each point in this Cartesian space found in 30 separate runs.
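The distinctive part of MOLE is its second objective: distance, in the (cost, modularity) plane, from solutions already recorded. A hedged sketch of that objective is below; the archive handling, class name, and nearest-point distance are illustrative assumptions rather than the published implementation.

```python
# Sketch of MOLE's second objective (illustrative assumptions, not the
# published implementation): reward candidates that are far, in the
# (connection cost, modularity) plane, from points already discovered.
import math

class MoleArchive:
    def __init__(self):
        self.points = []  # (cost, modularity) of solutions found so far

    def distance_objective(self, cost, modularity):
        """Second MOLE objective: distance to the nearest archived point."""
        if not self.points:
            return float("inf")
        return min(math.hypot(cost - c, modularity - q) for c, q in self.points)

    def record(self, cost, modularity):
        self.points.append((cost, modularity))

# Each individual would then be ranked by NSGA-II on two objectives:
#   (performance, archive.distance_objective(cost, modularity))
```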

Exhaustive and random search fail to discover high-performing networks In this study it would help to know the performance, modularity, and length for each solution in the search space. If the search space is small enough, such values can be determined by exhaustively checking every possible solution. Such an approach is intractable for the problems in this manuscript. For example, for the main problem in the paper, which is the retina problem, the number of possible weights is

NumberOfWeights = (8×8) + (8×4) + (4×2) + (2×1) + 23 = 129    (1)

due to the maximum possible number of nodes in the input, hidden, and output layers, as well as the bias for each of the 23 possible nodes. Each of these weights can be one of four values or a zero if no connection exists, and biases can be one of five values, leading to a search space of

NumberOfSolutions = 5¹²⁹ ≈ 10⁹⁰.    (2)

Given that it takes on average 0.0013 seconds to assess the fitness of a solution across all possible 256 inputs using a modern computer, it would take 4.1 × 10⁷⁹ years of computing time to exhaustively determine the performance, modularity, and cost for each solution in the search space.
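As a quick sanity check on these figures, the snippet below recomputes the search-space size and the exhaustive-evaluation time from the numbers given in the text; depending on how a year is counted the estimate lands in the 10⁷⁹–10⁸⁰ year range, consistent with the order of magnitude quoted above.

```python
# Back-of-the-envelope check of the search-space figures quoted above.
import math

num_weights = (8 * 8) + (8 * 4) + (4 * 2) + (2 * 1) + 23   # = 129
num_solutions = 5 ** num_weights                            # 5 values per parameter
seconds_per_eval = 0.0013
seconds_per_year = 365.25 * 24 * 3600

print(f"parameters: {num_weights}")
print(f"solutions:  ~10^{math.log10(num_solutions):.0f}")
print(f"exhaustive search: ~10^{math.log10(num_solutions * seconds_per_eval / seconds_per_year):.0f} years")
```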

Because it is infeasible to exhaustively search the space, we randomly sampled it to see if we would find high-performing solutions with a variety of cost and modularity scores. Specifically, we randomly generated more than two billion solutions, but every solution had poor performance. The highest-performing solution gave the correct answer for only 62 out of 256 retina patterns (24.2%), which is far below the performance of 93% or greater for solutions routinely discovered by the evolutionary algorithm (Fig. 2c). We thus concluded that randomly sampling the space would not lead to the discovery of high-performing solutions.


Geometric coordinates Nodes exist at two-dimensional (i.e. x, y) Cartesian locations. The geometric coordinates of the inputs and outputs for all problems were fixed throughout evolution, including the treatment where the within-row locations of inputs are randomized at the beginning of each separate trial, and are as follows. The inputs for all problems have y values of 0. For the retina problem, the x values for the inputs are {−3.5, −2.5, ..., 3.5} and the output is at {4, 0}. For the problem with 5 XOR modules, the x values for the inputs are {−4.5, −3.5, ..., 4.5} and the outputs all have y values of 2 with x values of {−4, −2, 0, 2, 4}. For the problem with decomposable, hierarchically nested XOR functions, the x values for the inputs are {−3.5, −2.5, ..., 3.5} and the outputs all have y values of 4 with x values of {−2, 2}. The geometric location of nodes is consequential only when there is a cost for longer connections (i.e. the main P&CC treatment).

Video of networks from each treatment evolving across generations A video is provided to illustrate the change in networks across evolutionary time for both the PA and P&CC treatments. It can be viewed at http://chronos.isir.upmc.fr/~mouret/tmp/modularity.avi. In it, the networks are visualized as described in the text.

Acknowledgments

Funding was provided by an NSF Postdoctoral Research Fellowship in Biology to JC (DBI-1003220), NSF CDI Grant ECCS 0941561, and NSF Creative-IT Grant IIS 0757478. Thanks to Shimon Whiteson, Rafael Sanjuan, Stephane Doncieux, Dusan Misevic, and Charles Ofria for helpful discussions.

1. Yong-Yeol Ahn, Hawoong Jeong, and Beom Jun Kim. Wiring cost in the organization of a biological neuronal network. Physica A: Statistical Mechanics and its Applications, 367:531–537, July 2006.
2. U. Alon. Biological networks: the tinkerer as an engineer. Science, 301(5641):1866, 2003.
3. U. Alon. An introduction to systems biology: design principles of biological circuits. CRC Press, 2007.
4. U. Alon. An introduction to systems biology: design principles of biological circuits, volume 10. CRC Press, 2007.
5. A.L. Barabasi and Z.N. Oltvai. Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 5(2):101–113, 2004.
6. J.A. Bullinaria. Simulating the evolution of modular neural systems. In Proceedings of the twenty-third annual conference of the Cognitive Science Society, pages 146–151. Lawrence Erlbaum, 2001.
7. S.B. Carroll. Chance and necessity: the evolution of morphological complexity and diversity. Nature, 409(6823):1102–1109, 2001.
8. B.L. Chen, D.H. Hall, and D.B. Chklovskii. Wiring optimization can relate neuronal structure and function. Proceedings of the National Academy of Sciences, 103(12):4723, 2006.
9. Christopher Cherniak, Zekeria Mokhtarzada, Raul Rodriguez-Esteban, and Kelly Changizi. Global optimization of cerebral cortex layout. Proceedings of the National Academy of Sciences, 101(4):1081–6, January 2004.
10. D.B. Chklovskii, Thomas Schikorski, and C.F. Stevens. Wiring optimization in cortical circuits. Neuron, 34(3):341–347, 2002.
11. Dmitri B. Chklovskii. Exact solution for the optimal neuronal layout problem. Neural Computation, 16(10):2067–2078, October 2004.
12. J. Clune, B. E. Beckmann, P. K. McKinley, and C. Ofria. Investigating whether HyperNEAT produces modular neural networks. In Proceedings of GECCO'10, pages 635–642. ACM, 2010.
13. G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, 1989.
14. K. Deb. Multi-objective optimization using evolutionary algorithms. Wiley, 2001.
15. A. Di Ferdinando, R. Calabretta, and D. Parisi. Evolving modular architectures for neural networks. In Connectionist models of learning, development and evolution: proceedings of the Sixth Neural Computation and Psychology Workshop, pages 253–262. Springer Verlag, 2001.
16. C. Espinosa-Soto and A. Wagner. Specialization can drive the evolution of modularity. PLoS Computational Biology, 6(3):e1000719, 2010.
17. S.J. Gould and R.C. Lewontin. The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proceedings of the Royal Society of London. Series B, Biological Sciences, 205(1161):581–598, 1979.
18. S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 2nd edition, 1998.
19. Arend Hintze and Christoph Adami. Evolution of complex modular biological networks. PLoS Computational Biology, 4(2):e23, February 2008.
20. R. E. Hoffman and T. H. McGlashan. Neural network models of schizophrenia. The Neuroscientist, 7(5):441–454, 2001.
21. R.A. Jacobs and M.I. Jordan. Computational consequences of a bias toward short connections. Journal of Cognitive Neuroscience, 4(4):323–336, 1992.
22. Nadav Kashtan and Uri Alon. Spontaneous evolution of modularity and network motifs. Proceedings of the National Academy of Sciences, 102(39):13773–13778, September 2005.
23. Nadav Kashtan, Elad Noor, and Uri Alon. Varying environments can speed up evolution. Proceedings of the National Academy of Sciences, 104(34):13711–13716, August 2007.
24. M. LaBarbera. Principles of design of fluid transport systems in zoology. Science, 249(4972):992, 1990.
25. Simon B. Laughlin and Terrence J. Sejnowski. Communication in neuronal networks. Science, 301(5641):1870–4, September 2003.
26. J. Lehman and K.O. Stanley. Abandoning objectives: Evolution through the search for novelty alone. Evolutionary Computation, 19(2):189–223, 2011.
27. E. A. Leicht and M. E. J. Newman. Community structure in directed networks. Physical Review Letters, pages 118703–118707, 2008.
28. R. E. Lenski, C. Ofria, T. C. Collier, and C. Adami. Genome complexity, robustness and genetic interactions in digital organisms. Nature, 400(6745):661–664, 1999.
29. R. E. Lenski, C. Ofria, R. T. Pennock, and C. Adami. The evolutionary origin of complex features. Nature, 423(6936):139–144, 2003.
30. H. Lipson. Principles of modularity, regularity, and hierarchy for scalable systems. Journal of Biological Physics and Chemistry, 7(4):125–128, 2007.
31. H. Lipson and J.B. Pollack. Automatic design and manufacture of robotic lifeforms. Nature, 406(6799):974–978, 2000.
32. W.S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology, 5(4):115–133, 1943.
33. M. E. J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
34. M.E.J. Newman. The structure and function of complex networks. SIAM Review, pages 167–256, 2003.
35. M. Parter, N. Kashtan, and U. Alon. Environmental variability and modularity of bacterial metabolic networks. BMC Evolutionary Biology, 7(1):169, 2007.
36. Massimo Pigliucci. Is evolvability evolvable? Nature Reviews Genetics, 9(1):75–82, January 2008.
37. Ashish Raj and Yu-hsien Chen. The wiring economy principle: connectivity determines anatomy in the human brain. PLoS ONE, 6(9):e14832, January 2011.
38. S. Ramon y Cajal. Textura del sistema nervioso del hombre y de los vertebrados [Texture of the Nervous System of Man and the Vertebrates]. Nicolas Moya, 1899.
39. G.F. Striedter. Principles of brain evolution. Sinauer Associates, Sunderland, MA, 2005.
40. N. P. Suh. The principles of design, volume 226. Oxford University Press, 1990.
41. G.P. Wagner, J. Mezey, and R. Calabretta. Natural selection and the origin of modules. In Modularity: Understanding the development and evolution of complex natural systems. The MIT Press, 2001.
42. Gunter P. Wagner, Mihaela Pavlicev, and James M. Cheverud. The road to modularity. Nature Reviews Genetics, 8(12):921–31, December 2007.


The evolutionary origins of modularity – Supplementary Material

Jeff Clune1∗, Jean-Baptiste Mouret2∗ and Hod Lipson1

1Cornell University, Ithaca, USA, 2Université Pierre et Marie Curie-Paris 6, CNRS UMR 7222, France
∗ These authors contributed equally to this work.

Preprint – November 6, 2018

Fig. S1. Overview of the model. (A) The multi-objective evolutionary algorithm (NSGA-II). Starting with a population of N randomly generated individuals, an offspring population of N new individuals is generated using the best individuals of the current population. The union of the offspring and the current population is then ranked according to the stochastic Pareto dominance (explained in Methods, here represented by having organisms in different ranks connected by lines labeled L1, L2, etc.) and the best N individuals form the next generation. (B) An example network model. Networks can model many different biological processes, such as genetic regulatory networks, neural networks, metabolic networks, and protein interaction networks. Information enters the network when it is sensed as an input pattern. Nodes, which represent components of the network (e.g. neurons or genes), respond to such information and can activate or inhibit other components of the network to varying degrees. The strength of interactions between two nodes is represented by the weight of the connection between them, which is a scalar value, and whether the interaction is excitatory or inhibitory depends on whether the weight is positive or negative. In this figure all non-zero weights are represented by an arrow. The signal entering each node is passed through a transfer function to determine the output for that node. That output then travels through each of the node's outgoing connections and, after being scaled by the weight of that outgoing connection, serves as a component of the incoming signal for the node at the end of that connection. Eventually an output pattern is produced, which for a neural network could be muscle commands or for a genetic regulatory network could be proteins. (C) The transfer function for each node. The sum of the incoming signal (x) for a node is multiplied by 20 before being passed through the transfer function tanh(x). Multiplying by 20 makes the transition steep and similar to a step-like function. The tanh(x) function ensures that the output is in the range [−1, 1].



Fig. S2. Network visualizations and data from the main experiment. The highest-performing network at the end of each trial is visualized here, sorted first by fitness and second by modularity. For each network the performance (left-justified) and modularity score Q (right-justified) are shown. An "L" or "R" indicates that the network has a single neuron that perfectly solves the left or the right subproblem, respectively. (A) Networks from the treatment where the only selection pressure is on the performance of the network. (B) Networks from the treatment where there are two selection pressures: one to maximize performance and one to minimize connection costs.



Fig. S3. Network visualizations and data for the experiment with five decomposable XOR Boolean logic functions. The highest-performing network at the end of each trial, sorted by fitness then modularity, and its performance (left-justified) and modularity score Q (right-justified). (A) The environmental challenge in this experiment. (B and C) Networks from the PA and P&CC treatments, respectively.
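For concreteness, a minimal sketch of a plausible target function for this task is given below. It assumes, as the separable structure in panel A suggests, that each of the five XORs reads its own pair of inputs and that targets lie in the network's [−1, 1] output range; the actual input encoding is the one defined in Methods.

```python
def five_xor_targets(bits):
    """Hypothetical target for the five-XOR task: ten binary inputs, and
    output i is the XOR of input pair (2i, 2i+1), mapped to {-1, 1} to
    match the tanh output range of the network nodes."""
    assert len(bits) == 10
    return [1 if bits[2 * i] != bits[2 * i + 1] else -1 for i in range(5)]

print(five_xor_targets([1, 0, 0, 0, 1, 1, 0, 1, 1, 1]))  # [1, -1, -1, 1, -1]
```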


A Hierarchically nested XOR functions

B Performance Alone (PA) C Performance & Connection Costs (P&CC)

Fig. S4. Network visualizations and data for the experiment with decomposable, hierarchically nested XOR Boolean logic functions. The highest-performing network at the end of each trial is visualized here, sorted first by fitness and second by modularity. For each network the performance (left-justified) and modularity score Q (right-justified) are shown. (A) A depiction of the environmental challenge in this experiment. (B) Networks from the treatment where the only selection pressure is on the performance of the network. (C) Networks from the treatment where there are two selection pressures: one to maximize performance and one to minimize connection costs.
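A brief sketch of what "hierarchically nested" means here: XOR outputs feed into further XORs. The pairing scheme, depth, and input width below are illustrative assumptions; the actual nesting is the one depicted in panel A and defined in Methods.

```python
def xor_tree(bits):
    """Hypothetical hierarchically nested XOR: XOR adjacent pairs of
    inputs, then XOR adjacent pairs of those results, and so on until a
    single answer remains. Assumes the number of inputs is a power of two."""
    level = list(bits)
    assert level and (len(level) & (len(level) - 1)) == 0
    while len(level) > 1:
        level = [a ^ b for a, b in zip(level[0::2], level[1::2])]
    return level[0]

print(xor_tree([1, 0, 0, 1, 1, 1, 0, 1]))  # 1
```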


A Retina: “Exactly 4 black pixels?”

B Performance Alone (PA) C Performance & Connection Costs (P&CC)

Fig. S5. Network visualizations and data from the experiment with a non-modular problem. (A) A non-modular version of the retina problem is created when networks have to answer whether exactly four pixels are on, which is non-modular because it involves information from anywhere in the retina. (B, C) The highest-performing network at the end of each trial is visualized here, sorted first by fitness and second by modularity. For each network the performance (left-justified) and modularity score Q (right-justified) are shown. (B) shows networks from the treatment where the only selection pressure is on the performance of the network. (C) shows networks from the treatment where there are two selection pressures: one to maximize performance and one to minimize connection costs.
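Taking the panel label at face value, the target can be sketched as a simple count over the whole retina; the {−1, 1} output encoding and the function name are assumptions for illustration.

```python
def non_modular_retina_target(pixels):
    """Hypothetical target for the non-modular retina task of Fig. S5A:
    is the number of 'on' pixels in the 8-pixel retina exactly four?
    The answer depends on pixels from anywhere in the retina, so there is
    no left/right decomposition to exploit. Output is in {-1, 1}."""
    assert len(pixels) == 8
    return 1 if sum(pixels) == 4 else -1

print(non_modular_retina_target([1, 1, 0, 0, 1, 0, 1, 0]))  # 1
```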


Fig. S6. Modularity and performance results are robust to variation in the strength of selection to minimize connection costs. (A) Median performance per generation of the highest-performing network, which is improved by adding a selection pressure to minimize connection costs. The value of p indicates the strength of the pressure (Methods), ranging from 0% (no pressure) through 10% (low pressure) to 100% (high pressure, equal to that for the main performance objective). (B) Performance of the highest-performing networks after 25000 generations for different values of p. The differences between PA (p = 0%) and P&CC (p > 0%) are highly significant (four asterisks indicate a p-value < 0.001). (C) Median modularity per generation of the highest-performing network of each trial for different values of p. (D) Modularity of the highest-performing networks after 25000 generations, which is significantly higher when there is a selection pressure to reduce connection costs (i.e. when p > 0%). (E) A changing environment. The x-axis represents the number of generations between a switch from the L-AND-R environment to the L-OR-R environment or vice-versa. This figure is the same as Fig. 4 from the main text, but shows results from additional values of p. The results for P&CC (p > 0) are consistent across different values of p provided that the strength of selection on network connection costs is not equal to that of network performance (i.e. for all p < 100%). The p = 100% line is anomalous because in that case cost is as important as performance, leading to mostly low-performing networks with pathological topologies; this effect is strongest when the environment changes quickly because such changes frequently eliminate the fitness benefits of high performance.
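One simple way to realize a pressure of tunable strength p is to let the connection-cost objective participate in each Pareto-dominance comparison only with probability p, in the spirit of the stochastic Pareto dominance mentioned in Fig. S1. The sketch below is an illustrative assumption about that mechanism rather than the authors' code.

```python
import random

def dominates(a, b, p_cost):
    """Hypothetical stochastic Pareto dominance between two candidate
    networks, each a dict with 'performance' (to maximize) and 'cost'
    (to minimize). With probability p_cost the connection-cost objective
    participates in the comparison; p_cost = 0 reproduces PA, and
    p_cost = 1 weighs cost as heavily as performance (the anomalous
    100% case discussed above)."""
    objectives = [(a['performance'], b['performance'])]    # maximize
    if random.random() < p_cost:
        objectives.append((-a['cost'], -b['cost']))        # minimize cost
    no_worse = all(x >= y for x, y in objectives)
    better = any(x > y for x, y in objectives)
    return no_worse and better

x = {'performance': 0.98, 'cost': 12}
y = {'performance': 0.98, 'cost': 20}
print(dominates(x, y, p_cost=0.25))   # True in roughly 25% of calls
```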


Fig. S7. Environmental changes between two modular problems with shared subproblems. (A) In the first evolutionary phase, networks evolve to answer whether a left and right object are present (the L-AND-R environment). The problem can be modularly decomposed, since the presence of left and right objects can first be separately detected before answering the larger question of whether they are both present. A left object is considered present if one of the eight left object patterns (shown in Fig. 2 in the main text) is exposed to the left side of the retina, and vice-versa for right objects. Networks that could perfectly solve the L-AND-R problem after 5000 generations are then transferred to an environment in which the challenge is to determine if a left or right object is present (the L-OR-R environment), which is a different overall problem that shares the subproblems of identifying left and right objects. Evolution continues until networks can perfectly solve the L-OR-R problem or until 5000 generations elapse. (B) The same experiment is repeated, but first evolving in the L-OR-R environment and then transferring to the L-AND-R environment. (C) A fully zoomed-out plot of Fig. 4 from the main text. Trials last 5000 generations or until a network exhibits perfect performance in the new environment. For each treatment, for each of 50 networks that perform perfectly in the first environment, we conduct 50 trials in the second environment, meaning that each column in this figure (and Fig. 4 from the main text) represents 50 × 50 = 2500 data points.
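The two environments differ only in how the answers to the two subproblems are combined, which the short sketch below makes explicit. The single-element pattern sets used in the example are made-up stand-ins for the eight left and eight right object patterns of Fig. 2, which are not reproduced here.

```python
def retina_target(pixels, mode, left_patterns, right_patterns):
    """Hypothetical target for the changing-environment experiment:
    the 8-pixel retina is split into a left half and a right half, each
    half is checked against its set of object patterns, and the two
    answers are combined with AND or OR depending on the environment."""
    left_on = tuple(pixels[:4]) in left_patterns
    right_on = tuple(pixels[4:]) in right_patterns
    answer = (left_on and right_on) if mode == 'L-AND-R' else (left_on or right_on)
    return 1 if answer else -1

# Made-up, single-pattern object sets purely for illustration.
L = {(1, 0, 0, 1)}
R = {(0, 1, 1, 0)}
print(retina_target([1, 0, 0, 1, 0, 0, 0, 0], 'L-AND-R', L, R))  # -1
print(retina_target([1, 0, 0, 1, 0, 0, 0, 0], 'L-OR-R', L, R))   # 1
```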



Fig. S8. (A) Repeating the main experiment wherein selection is based on performance and connection costs (P&CC), but with connection costs solely measured as the number of connections (P&CC-NC), produces networks that look qualitatively similar to those produced with the default P&CC cost function (compare the networks visualized here to those in Fig. S2). The modularity and performance scores are also qualitatively similar and not significantly different (modularity Q = 0.36 [0.22, 0.4] vs. Q = 0.42 [0.25, 0.45] for the default cost function, p = 0.15; performance of 1.0 [0.99, 1.0] vs. 1.0 [1.0, 1.0] for the default cost function, p = 1.0). Like the default cost function, the alternative cost function produced significantly more modularity (p = 1×10⁻⁸) and higher performance (p = 1×10⁻⁵) than PA. Also qualitatively similar between these cost functions is the percent of runs that have the inputs related to the left and right subproblems in separate modules after splitting the network once to maximize modularity (alternate: 52%, default: 56%, Fisher's exact test: p = 0.84). One minor qualitative difference is the percent of runs that have two neurons that each perfectly solve the left and right subproblems, respectively (alternate: 10%, default: 39%, Fisher's exact test: p = 8×10⁻⁹). (B) Randomizing the geometric location of input coordinates (P&CC-RL) eliminates the left-right geometric decomposition of the L-AND-R retina problem, yet such randomization does not change the result that a connection cost causes significantly higher performance and modularity. (C) Displayed are final, evolved P&CC-RL networks, which are shown both with randomized input coordinates (left) and with the left and right inputs in order to visualize problem decomposition (right). Despite the randomization of the input locations, many of the evolved modules still functionally decompose the left and right subproblems.
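The two cost measures compared in panel A can be sketched directly. The count-based cost is simply the number of connections; the length-based function below assumes nodes sit at fixed two-dimensional coordinates and sums Euclidean connection lengths, and is only an illustration of a geometry-aware cost rather than the precise default cost function defined in Methods.

```python
import numpy as np

def cost_num_connections(weights):
    """P&CC-NC-style cost: the number of non-zero connections."""
    return int(np.count_nonzero(weights))

def cost_summed_length(weights, pre_coords, post_coords):
    """Illustrative geometry-aware cost: the summed Euclidean length of all
    non-zero connections, with weights[j, i] connecting the node at
    pre_coords[i] to the node at post_coords[j]."""
    total = 0.0
    for j, i in zip(*np.nonzero(weights)):
        total += float(np.linalg.norm(post_coords[j] - pre_coords[i]))
    return total

# Toy example: two input nodes and one output node at made-up coordinates.
W = np.array([[0.8, -0.5]])
pre = np.array([[0.0, 0.0], [3.0, 0.0]])
post = np.array([[1.0, 1.0]])
print(cost_num_connections(W), round(cost_summed_length(W, pre, post), 2))  # 2 3.65
```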


Fig. S9. (A, B) Biasing the mutation rate towards having more remove-connection mutations than add-connection mutations does not increase the modularity or performance of PA treatments. Setting the remove-connection mutation rate to be up to an order of magnitude higher than the add-connection mutation rate did not qualitatively change the results. The PA treatment with default values of 0.2 for the remove-connection and add-connection mutation rates did not have statistically different levels of modularity or performance from PA treatments with the same remove-connection mutation rate of 0.2, but a lower add-connection mutation rate of 0.15, 0.10, or 0.02 (p > 0.05). The P&CC treatment had statistically higher performance and modularity scores than every PA treatment, irrespective of its mutation rate bias. (C, D, E) Networks evolved with a connection cost have shorter lengths, fewer nodes, and fewer connections. (C) The summed length of all connections is smaller for networks evolved in the P&CC regime than in the PA regime. The P&CC networks also have fewer nodes (D) and connections (E). Three asterisks indicate a value of p < 0.001 and four indicate a value of p < 0.0001.
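A minimal sketch of the kind of structural mutation operator whose rates are varied here; the matrix representation, weight range, and at-most-one-change-per-call behaviour are illustrative assumptions rather than the authors' implementation.

```python
import random

def mutate_structure(weights, p_add=0.20, p_remove=0.20):
    """Hypothetical structural mutation: with probability p_add, turn one
    currently absent connection on (random weight); with probability
    p_remove, delete one existing connection. Fig. S9 lowers p_add
    relative to p_remove (e.g. 0.15, 0.10 or 0.02 vs. 0.2) to bias PA
    runs toward sparser networks."""
    w = [row[:] for row in weights]                       # copy the weight matrix
    absent = [(j, i) for j, row in enumerate(w) for i, v in enumerate(row) if v == 0]
    present = [(j, i) for j, row in enumerate(w) for i, v in enumerate(row) if v != 0]
    if absent and random.random() < p_add:
        j, i = random.choice(absent)
        w[j][i] = random.uniform(-1.0, 1.0)               # add a connection
    if present and random.random() < p_remove:
        j, i = random.choice(present)
        w[j][i] = 0.0                                     # remove a connection
    return w

print(mutate_structure([[0.0, 0.7], [0.0, 0.0]]))
```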


A Multiple, Separable Problems B Hierarchical, Separable Problems

Fig. S10. The highest-performing networks found for each combination of modularity and length for (A) the problem with five separable XOR functions and (B) the hierarchical XOR problem. To understand the performance potential of networks with different combinations of modularity levels and summed connection lengths we invented the Multi-Objective Landscape Exploration (MOLE) algorithm, which is described in Methods. Colors indicate the highest-performing network found at that point in the modularity vs. cost space, with yellow representing perfect performance. The best-performing networks at the end of the 50 PA and P&CC runs are overlaid on the map. Networks with perfect performance exist throughout the space, which helps explain why modularity does not evolve when there is selection based on performance alone. There is also an inverse correlation between length and modularity for high-performing networks: the lowest-cost, high-performing networks (those with the shortest summed lengths) are modular.
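The map underlying such a plot can be thought of as a grid over (modularity, summed length) in which each cell remembers the best performance seen there. The bookkeeping sketch below is an assumption about how the colors are tallied, not the MOLE search itself, which is described in Methods; the bin counts and maximum length are made up.

```python
from collections import defaultdict

Q_BINS, LENGTH_BINS, MAX_LENGTH = 50, 50, 100.0   # hypothetical grid settings

def record(mole_map, modularity, length, performance):
    """Store the best performance seen at this (modularity, length) cell.
    Coloring each cell by its stored value yields a map like Fig. S10."""
    q_bin = min(int(modularity * Q_BINS), Q_BINS - 1)
    l_bin = min(int(length / MAX_LENGTH * LENGTH_BINS), LENGTH_BINS - 1)
    mole_map[(q_bin, l_bin)] = max(mole_map[(q_bin, l_bin)], performance)

mole_map = defaultdict(float)
record(mole_map, modularity=0.42, length=37.5, performance=1.0)
print(dict(mole_map))   # {(21, 18): 1.0}
```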
