
Modeling the Evolution of Complex Genetic Systems: The Gene Network Family Tree
JANNA L. FIERST* AND PATRICK C. PHILLIPS
Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon

Network thinking is firmly embedded in biology today, but this has not always been true. Starting with the work of Hardy ('08) and Wilhelm Weinberg ('08) that initiated increasingly quantitative approaches to evolutionary genetics, the idea that evolutionary change is generated chiefly by natural selection acting at the genetic level was well accepted by the 1920s. The open question then, as now, was exactly how this happened. To make rigorous progress on this question, scientists needed to identify the appropriate genetic unit for evolution. Evolutionary geneticists focused on the population, where adaptation was thought to be dominated by allelic variants of individual genes. Quantitative geneticists focused on the phenotype, where it was expected that the number of genes contributing to any trait was so large that individual effects would average and could effectively be studied through means and variances (Provine, '71). A third school of thought popularized by Stuart Kauffman ('69, '74, '93) focused on interacting networks of biochemical molecules, each of which is produced by underlying genes. As biological technology developed, genetic and biochemical studies demonstrated that complex interaction networks were in fact the rule, rather than the exception. Despite this, nearly 50 years later our understanding of evolution at a network level lags far behind our population or quantitative genetic understanding. This is primarily due to the complexity of interaction networks and the difficulty in studying them. Biological networks almost always involve complex, non-linear reactions between heterogeneous elements that are dependent on both timing and location within the organism.

In order to analyze and understand networks, biologists have turned to an explicit modeling framework. These models fall into two classes. First, there are models designed to study a specific network or pathway in a single organism (e.g., the Drosophila segment polarity network; von Dassow et al., 2000). These typically address small networks with few genes, use differential equations, and require precise measurements of gene products. Second, there are models that attempt to discover the general principles that "emerge" from networks. These models, which are loosely based on biological data, often use general, abstract representations of genes and gene products, and rely heavily on

ABSTRACT
In 1994 and 1996, Andreas Wagner introduced a novel model in two papers addressing the evolution of genetic regulatory networks. This work, and a suite of papers that followed using similar models, helped integrate network thinking into biology and motivate research focused on the evolution of genetic networks. The Wagner network has its mathematical roots in the Ising model, a statistical physics model describing the activity of atoms on a lattice, and in neural networks. These models have given rise to two branches of applications, one in physics and biology and one in artificial intelligence and machine learning. Here, we review development along these branches, outline similarities and differences between biological models of genetic regulatory circuits and neural circuit models used in machine learning, and identify ways in which these models can provide novel insights into biological systems. J. Exp. Zool. (Mol. Dev. Evol.) 324B:1-12, 2015. © 2014 Wiley Periodicals, Inc.

How to cite this article: Fierst JL, Phillips PC. 2015. Modeling the evolution of complex genetic systems: The gene network family tree. J. Exp. Zool. (Mol. Dev. Evol.) 324B:1-12.

J. Exp. Zool. (Mol. Dev. Evol.) 324B:1-12, 2015

Grant sponsor: National Institutes of Health; grant number: GM096008.
Conflict of interest: None.
*Correspondence to: Janna L. Fierst, Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403-5289.
E-mail: [email protected]
Received 18 February 2014; Revised 24 July 2014; Accepted 26 August 2014
DOI: 10.1002/jez.b.22597
Published online in Wiley Online Library (wileyonlinelibrary.com).

PERSPECTIVE AND HYPOTHESIS

© 2014 WILEY PERIODICALS, INC.


computer simulations. One of the most well known of these abstract models was developed by Andreas Wagner in the mid-1990s. Wagner analyzed genetic duplications with a gene network model, and demonstrated that duplications of single genes or entire networks produced the least developmental disruption (Wagner, '94). Empirical studies have shown that duplications typically occur as single, tandem duplications or entire genome duplications, and Wagner's paper provided an explanation for this. His work pre-dated a slew of papers uncovering duplications in organisms ranging from Arabidopsis thaliana (Arabidopsis Genome Initiative, 2000) to zebrafish (Amores et al., '98). These studies showed that duplications are ubiquitous and play a critical role in evolution, and Wagner's results provide a logical context for these findings.

In 1996, Wagner used a similar model to address a long-standing issue proposed by Conrad Waddington ('42, '57). Organisms must develop stably in the face of environmental and genetic perturbations, but how does this evolve and how is it maintained? Waddington called this process canalization, and Wagner demonstrated that canalization evolved under long periods of stabilizing selection. Siegal and Bergman (2002) brought additional attention to this by using Wagner's model to show that canalization, or robustness to mutation, could also evolve in the absence of phenotypic selection. Their model required that individuals achieve developmental stability (a stable, equilibrium gene expression state) and this, separate from any external selection, was sufficient to result in population-wide increases in canalization. Their article demonstrated that this model could be used to address a variety of evolutionary questions, and since then Wagner's original framework has served as the basis for more than 30 articles published in physics, biology, and computer science.

Despite increasing reliance on this approach, few authors are aware of the roots of the model they are using.
Here, we review the Wagner model's connection to previous work in other disciplines and discuss the mathematics and key concepts of each of these models (Table 1). Elements of these models share a conceptual thread that can be linked together to form a "family tree" of relationships (Fig. 1). The shared history of these models is important for several reasons. First, these connections mean that work in other disciplines is directly related to work ongoing in evolutionary biology, and learning about the model's use in other disciplines may inform evolutionary studies. Second, many characteristics of the model come from its development in other disciplines, which brings up several important questions: Is this the appropriate model to study genetic regulation and network evolution? Is it biologically relevant? For example, are assumptions inherited from studies of magnetism and neurons appropriate for genetic networks? Finally, because the Wagner model has its roots in other disciplines, its conceptual development and context is important for understanding its use in evolutionary biology. New techniques and technologies bring new approaches, and at each step along the network family tree, conceptual and methodological development has resulted in new scientific insights. Throughout this article we highlight in bold the key conceptual breakthroughs that came with each model. Researchers in machine learning and artificial intelligence have successfully expanded this model to address complex, high-dimensional data. Incorporating these techniques into biology has the potential to help us make sense of the ever-growing avalanche of data being generated by the omics revolution. We hope that highlighting the similarities between familiar models and newer machine learning techniques will encourage their expanded use in biological studies.

The Wagner Gene Network
Wagner's model (Wagner, '94, '96) addresses the genetic and developmental evolution of a population of individuals, where each individual is haploid. Here, we introduce the general framework of the model. The mathematics will be discussed later on. At a certain stage in development, gene expression levels are determined by interactions between proteins produced by transcription factor genes. In his 1994 paper, Wagner gives an example system as, "e.g., a set of nuclei in a part of a Drosophila blastoderm expressing a specific subset of gap genes and pair-rule genes." The model envisions a set of regulatory elements, such as promoters, that are in close physical proximity to a gene (Fig. 2a). Genes activate and repress other genes through the protein products they express. Interactions between regulators and targets, and the inputs that determine initial conditions, can be graphically depicted as a network where arrows indicate the regulator-to-target direction of individual interactions (Fig. 2b). Quantifying interactions results in a matrix that is a haploid genotype (Fig. 2c). This genotype can then be implemented in individual-based simulations subject to mutation, selection, and genetic drift. Wagner specified several simplifying assumptions for the model: 1) the expression of each gene is exclusively regulated on the transcriptional level; 2) each gene produces only one type of transcriptional regulator; 3) genes independently regulate other genes; and 4) individual transcription factors "cooperate" to produce activation or repression of target genes.

The Wagner gene network model makes several key assumptions. Genes have activation or repression states, interactions are between pairs of genes (eliminating higher order interactions), a threshold function determines interaction dynamics, and interactions can be asymmetric and recurrent. In the sections that follow, we describe the models that the Wagner gene network builds on and the conceptual links between these models and current implementations.
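The framework just described, a matrix genotype whose entries regulate a vector of expression states through a threshold function, can be sketched in a few lines of Python. This is a minimal illustration rather than Wagner's exact formulation: the three-gene genotype, the +1/-1 expression coding, and the simple sign-threshold update are simplifying assumptions chosen for clarity.

```python
def step(x):
    # Threshold function: net regulatory input above zero -> expressed (+1),
    # otherwise repressed (-1).
    return 1 if x > 0 else -1

def develop(W, s0, max_steps=100):
    """Iterate gene expression until it reaches a fixed point.

    W  -- N x N list-of-lists of regulatory weights (the haploid genotype);
          W[i][j] is the strength with which gene j regulates gene i.
    s0 -- initial expression states, a list of +1/-1 values.
    Returns (final_state, reached_equilibrium).
    """
    s = list(s0)
    n = len(s)
    for _ in range(max_steps):
        s_next = [step(sum(W[i][j] * s[j] for j in range(n))) for i in range(n)]
        if s_next == s:
            return s, True    # developmental stability reached
        s = s_next
    return s, False           # no equilibrium within max_steps

# Hypothetical 3-gene genotype: gene 0 activates gene 1, gene 1 represses
# gene 2, and gene 2 weakly activates itself.
W = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, -1.0, 0.5]]
s_final, stable = develop(W, [1, -1, 1])
```

Simulations built on this skeleton impose selection on the final state and mutate the entries of W; individuals whose expression fails to equilibrate are typically assigned zero fitness.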

History of Multi-state Network Models
The Ising Model. In the early 1900s statistical mechanics could not explain all the properties of liquids and solids, particularly


Table 1. Graphical depictions and descriptions of each of the models.

Model                                           Symmetric   Stochastic   Key Innovations
The Ising Model (Ising, '25)                    yes         yes          Discrete, pair-wise interactions
Artificial Neuron (McCulloch and Pitts, '43)    yes         no           Threshold function determines the output
The Perceptron (Rosenblatt, '57)                no          no           Asymmetric interactions
The Multilayer Perceptron (Rosenblatt, '61)     no          no           Additional layer of interactions
Boolean Gene Net (Kauffman, '69)                no          no           Stability analysis
Spin Glass (Sherrington and Kirkpatrick, '75)   no          yes          Continuous spins and multiple stable states
Hopfield Net (Hopfield, '82)                    yes         no           Recurrent feedback

(Continued)


phase transitions (transformation from one state of matter to another, i.e., liquid to gas). This was particularly problematic for magnetism, because the transition of a magnet could be easily measured but it could not be explained using the standard partition function that serves as the basis for much of statistical mechanics. Because the partition function allows one to derive all of the thermodynamic functions of the system, this question was one of the most pressing scientific problems of the early 20th century (Brush, '67).

Wilhelm Lenz, a physicist working at Hamburg University, devised a model to address the problem (Lenz, '20) and gave it to one of his students, Ernst Ising, for further study. Ising solved the model in one dimension for his 1924 dissertation but the 1-D model did not show phase transitions and Ising incorrectly concluded that the model could not address phase transitions in higher dimensions (Ising, '25). As a German-Jewish scientist, Ising was barred from teaching and research when Hitler came to

Figure 1. The network family tree. A chronology of the models.

Table 1. (Continued)

Model                                           Symmetric   Stochastic   Key Innovations
Boltzmann Machine (Hinton et al., '83)          yes         yes          Stochastic interactions
Restricted Boltzmann Machine (Smolensky, '86)   yes         yes          Constrained topology
Deep Belief Net (Hinton et al., 2006)           yes         yes          Additional layers of interactions
Wagner Gene Network (Wagner, '94)               no          yes          Biological context


power in 1933 and eventually fled Germany for Luxembourg. In 1947 he emigrated to the United States and later learned that his original model had been expanded to address phase transitions in two dimensions by Rudolf Peierls ('36) and Lars Onsager ('44). The Ising model is now one of the fundamental mathematical models in statistical mechanics, and Ising's 1998 obituary in Physics Today estimated that ≈800 papers are published using his model each year (Stutz and Williams, '99).

The Ising model sets forth a simple system; given a set of magnetic particles on a lattice, each has a "spin" or orientation of either -1/2 or +1/2. The spin of each particle is determined by interactions with its nearest neighbors and with the external magnetic field. The configuration, s, is the spin of each particle on the lattice. The energy of the configuration across sites i and j is

H(s) = -Σ_{<ij>} w_ij s_i s_j - Σ_j h_j s_j,    (1)

where w_ij is the interaction between the sites and h_j is the external magnetic field for site j. The probability of a configuration is

P_β(s) = e^{-βH(s)} / Z_β,    (2)

where β = 1/(k_B T) and Z_β = Σ_s e^{-βH(s)}. In this model, T is the temperature, k_B is the Boltzmann constant relating particle energy to system temperature, Z_β is the partition function, and the configuration probabilities are given by the Boltzmann distribution.

Ising and Lenz were able to solve the model by introducing several key assumptions. Particles in physical systems occupy complex, three-dimensional spaces, exist in multiple possible states with varying distances between particles, and interact across short and long distances. The Ising model simplifies this to a discrete 2-D lattice with two possible states per particle. Ising and Lenz assumed that interactions between particles dissipate quickly, such that only the interactions between nearest neighbors need to be accounted for. Later models have relaxed some of these assumptions and expanded Ising's framework to incorporate additional dynamics, but this framework (and much of the mathematics) underlies all of the models that follow.

The Ising model sets up several central themes. Ising and Lenz were addressing transitions, which appear in different forms in later models. First, particles in the Ising model take on different states. Ising simplified these to two discrete spins, but later models expand these to continuous states. In evolutionary studies, the states are gene activation and repression or expression levels. Second, particles in the Ising model interact in pairs with discrete strengths. While there is no a priori reason to represent particles, or genes, in pairwise interactions that are constant over time, eliminating higher order dynamics and variable interactions greatly simplifies the model and analysis.
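Equations (1) and (2) can be made concrete by brute-force enumeration on a very small system. The sketch below is a simplifying assumption in two respects: it uses a 1-D ring of ±1 spins with a uniform coupling J rather than the 2-D lattice of ±1/2 spins described above, and it enumerates all configurations exactly, which is only feasible for a handful of spins.

```python
import math
from itertools import product

def energy(spins, J, h):
    """H(s) = -J * sum of nearest-neighbour products - h * sum of spins,
    for a 1-D ring with uniform coupling J and uniform field h (Eq. 1)."""
    n = len(spins)
    pair_sum = sum(spins[i] * spins[(i + 1) % n] for i in range(n))
    return -J * pair_sum - h * sum(spins)

def boltzmann_probs(n, J=1.0, h=0.0, beta=1.0):
    """Exact Boltzmann distribution over all 2^n spin configurations (Eq. 2)."""
    configs = list(product([-1, 1], repeat=n))
    weights = [math.exp(-beta * energy(s, J, h)) for s in configs]
    Z = sum(weights)                      # the partition function Z_beta
    return {s: w / Z for s, w in zip(configs, weights)}

probs = boltzmann_probs(4)
# With ferromagnetic coupling (J > 0) and no field, the two fully aligned
# configurations minimise H and are therefore the most probable.
```

Larger lattices require sampling methods (e.g., Metropolis Monte Carlo) instead of this exhaustive sum, since the number of configurations grows as 2^n.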

McCulloch and Pitts Artificial Neuron
Contemporaneous with this work by physicists on phase transitions, neuroscientists were addressing logic, learning and memory. Anatomists had demonstrated the structure of the brain and individual neurons, but how did these physical structures achieve neural processing and give rise to complex phenomena like logic? Alan Turing ('36, '38) had proposed his Turing Machine, an early theoretical computer with unlimited memory that operated by reading symbols on a tape, by executing instructions according to those symbols, and by moving backwards and forwards along the tape. In 1943 Warren McCulloch, a neurophysiologist working at the University of Chicago, and Walter Pitts, a graduate student in logic, designed an artificial biological neuron (McCulloch and Pitts, '43). They envisioned a set of inputs I with weights w_i that determine the output y of a single neuron. Under their formulation, inputs can be either excitatory or inhibitory and they feed into the model using a somewhat complicated, heuristic update rule. At each time step the model must check if the inhibitory input is activated and if it is, then the neuron is repressed. If it is not activated the output is determined by summing across excitatory inputs

Figure 2. The Wagner model. The model describes interactions between a set of genes. a) Each gene is regulated by a set of proximal elements, such as promoters. b) Each gene receives an external input (depicted here with short, unattached arrows). Interactions between genes are not symmetric, and self-regulation (depicted here with short, circular arrows) is possible. c) Interactions between target genes and regulator genes are quantified and represented as a matrix. Interactions can be activating (positive values) or repressing (negative values). A '0' indicates no interaction.


u = Σ_{i=1}^{N} I_i w_i

and the state of the neuron (activated or repressed) is given by the step function,

y = 1 if u > θ, 0 otherwise,    (4)

where θ is the excitatory threshold.

The McCulloch and Pitts neuron was revolutionary because it introduced a threshold-gated logical system that could perform complex logical operations by combining multiple neurons. For example, Boolean algebra operates on two values, false and true. Boolean operations like AND, OR, and XOR (exclusive OR) are logic gates that return true and false values (Table 2). By combining two outputs y_1 and y_2, the McCulloch and Pitts model could perform these basic logical operations. The most influential part of the McCulloch and Pitts model for evolutionary studies is the idea of a threshold governing interaction dynamics. Many biochemical reactions appear to follow a Hill function (Hill, '10) with a strong sigmoidal shape. In the limit, this leads to a threshold model. However, there is no a priori reason to model genetic interactions with functions borrowed from biochemistry, and functional forms of gene expression states could also be an important component of network dynamics.
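The threshold logic of Equation (4) is easy to demonstrate. The weights and thresholds below are illustrative choices, not values from McCulloch and Pitts; the point is that a single threshold unit computes AND and OR, while XOR, which is not linearly separable, requires combining units.

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts unit: fire (1) if the weighted input sum exceeds
    the threshold theta, otherwise stay silent (0). Implements Eq. (4)."""
    u = sum(i * w for i, w in zip(inputs, weights))
    return 1 if u > theta else 0

def AND(a, b):
    # Both inputs must be on: threshold between 1 and 2.
    return mp_neuron([a, b], [1, 1], 1.5)

def OR(a, b):
    # Either input suffices: threshold between 0 and 1.
    return mp_neuron([a, b], [1, 1], 0.5)

def XOR(a, b):
    # No single unit computes XOR, but a small circuit does:
    # a XOR b = (a OR b) AND NOT (a AND b), with the NOT encoded
    # as a negative (inhibitory) weight.
    return mp_neuron([OR(a, b), AND(a, b)], [1, -1], 0.5)
```

The XOR construction previews the multilayer perceptron discussed below: the intermediate OR and AND outputs play the role of a hidden layer.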

The Perceptron and the Multilayer Perceptron
Despite the logical capabilities of the McCulloch and Pitts neuron, it still lacked many features. The model required multiple neurons to perform logical operations, and it could not learn to discriminate between different possibilities. Frank Rosenblatt, a computer scientist working at Cornell, developed a modified neuron that was the first artificial neural structure capable of learning by trial and error (Rosenblatt, '57, '58). Rosenblatt's perceptron draws on the McCulloch and Pitts model with two key differences: 1) weights and thresholds for different inputs do not have to be identical; and 2) there is no absolute inhibitory input. These changes create the asymmetry in inputs and interactions that is necessary for learning, and Rosenblatt introduced a learning rule to train the neuron to different target outputs. Learning is framed as a bounded problem: What are the weights that produce a given output from a given input? The learning algorithm works by taking a random neuron and updating the weights w_i until the perceptron associates output with input. This period is called training the network. For biological studies, the perceptron's most influential contribution was that of a learning rule. Biological simulations evolve networks with mutation algorithms that are similar to the learning algorithms developed by Rosenblatt, with phenotype replacing the learning target and evolutionary fitness guiding the input-output association.

The perceptron still could not perform complex operations, and in 1961 Rosenblatt developed the multilayered perceptron (Rosenblatt, '61, '62). The major difference is a layer of "hidden" units between the input and output. The inputs connect to the hidden units with weights w_ij and the hidden units connect to the output with an additional set of weights. As before, learning occurs by updating the weights until the network produces the desired output from a given input. The multilayered perceptron is capable of performing the XOR function and other complex logical operations (Table 2). In modern terms, the multilayered perceptron is often called a "feedforward" neural network because information flows from the inputs through the hidden units to the output and there are no backflows. The original multilayered perceptron structure and algorithms are still used in modern neural network-based machine learning (Rojas, '96).
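The learning rule described above can be sketched in its common error-correction form: nudge each weight in proportion to the prediction error times the input. The learning rate, epoch count, and the OR training task below are illustrative assumptions; any linearly separable task would do.

```python
def train_perceptron(samples, lr=0.1, epochs=50):
    """Perceptron training: for each (input, target) pair, predict with a
    threshold unit and adjust weights by lr * error * input.

    samples -- list of (input_vector, target) pairs with 0/1 targets.
    Returns the learned (weights, bias).
    """
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - y                      # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Train on the (linearly separable) OR function.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
```

The convergence theorem guarantees this procedure finds separating weights for any linearly separable task; for XOR it would cycle forever, which is exactly the limitation the multilayer perceptron removes.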

Kauffman's Gene Network
In the late 1960s, Stuart Kauffman published a model of a set of N genes connected by k inputs. Similar to the McCulloch and Pitts model, genes are binary, with only "on" or "off" states, but in this case the activation or repression of each gene is computed as a Boolean function of the inputs it receives from other genes (Kauffman, '69). Instead of true and false, the gene receives activating ("1") or repressing ("0") signals. For two inputs, there are 16 Boolean functions possible; for three inputs, there are 256 Boolean functions. The network is initialized at some state at time T, and the Boolean functions determine which genes are on or off at time T+1. The system must go through several updates to reach a stable state, and some systems do not reach stability. These networks instead cycle continuously with different genes turning on and off at each time step.

Kauffman focused extensively on the number of updates necessary to reach stability (the cycle length) for different patterns of connections, as well as the relationship between noise perturbations and cycle length. In his 1969 paper Kauffman cited a dissertation that studied cycle length in neural networks, and his model was likely inspired by similar work on updates and cycling in perceptrons and multilayered perceptrons. Although Kauffman did not cite the Ising model in his 1969 work, he did reference thermodynamic studies of gases and incorporated the idea of transitions. Kauffman's gene

Table 2. Logical operations between two Boolean values, A and B.

A        B        A AND B   A OR B   A XOR B
True     True     True      True     False
True     False    False     True     True
False    True     False     True     True
False    False    False     False    False

When A and B are both true the AND operation produces true, OR returns true if either A or B is true, and XOR returns true if either A or B, but not both, is true.


networks go through multiple activation/repression state transitions, with one possible outcome being chaos, an aperiodic sequence of states that exhibits a failure to reach a stable equilibrium.

Kauffman's work was hugely influential for several reasons. In addition to transitions, a key concept is that of the relationship between noise perturbations and stability. Kauffman extensively studied the properties that cause some gene networks to recover quickly from noise perturbations while others are thrown into chaos. In modern research, the idea of robustness to perturbations is well established across scientific fields ranging from computer science studies of the world wide web to molecular and developmental biology (Wagner, 2005; Masel and Siegal, 2009).
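Kauffman's framework translates directly into code: each gene reads k inputs through a random Boolean lookup table, and iterating the synchronous update until a state recurs reveals the cycle length. The network size, wiring, and random seed below are arbitrary illustrative choices.

```python
import random

def random_boolean_network(n, k, seed=0):
    """Build a Kauffman-style Boolean net: each of n genes reads k genes
    through a random Boolean function, stored as a 2^k-entry truth table."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def update(state, inputs, tables):
    """Synchronous update: every gene looks up its next state at once."""
    new = []
    for i in range(len(state)):
        idx = 0
        for j in inputs[i]:            # pack the k input bits into a table index
            idx = (idx << 1) | state[j]
        new.append(tables[i][idx])
    return tuple(new)

def cycle_length(state, inputs, tables, max_steps=1000):
    """Iterate until a previously seen state recurs; the gap between the two
    visits is the attractor's cycle length (1 means a stable fixed point)."""
    seen = {}
    for t in range(max_steps):
        if state in seen:
            return t - seen[state]
        seen[state] = t
        state = update(state, inputs, tables)
    return None

inputs, tables = random_boolean_network(n=5, k=2)
length = cycle_length((0, 1, 0, 1, 1), inputs, tables)
```

Because the state space is finite (2^n states), every deterministic trajectory must eventually enter a cycle; Kauffman's observation was how strongly the typical cycle length depends on k.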

Spin Glasses
Spin glasses are magnetic systems with disordered interactions (as opposed to the ordered interactions of the Ising model). Since spin glasses do not possess symmetries, they are difficult to study and were one of the first mathematically modeled complex systems. Spin glass models represent particles with varying distances between them, and the conflicting ferromagnetic and antiferromagnetic interactions a given spin has with long- and short-ranged neighbors can result in frustration, the inability of a system to meet multiple constraints (Sherrington and Kirkpatrick, '75). The system therefore exists on a landscape with multiple stable states, and both local and global optima, where temperature determines the ruggedness of the landscape. Because of these features, spin glass models have been used to study complex systems in physics, biology, economics, and computer science (Stein and Newman, 2012).

Kauffman's NK model (Kauffman, '93) is a special case of a spin glass where allelic values take the place of magnetic particles and fitness replaces the energy of the system. In both Kauffman's NK model and Wagner's gene network model, the temperature parameter that determines the ruggedness of the spin glass system is the inverse of the population selection coefficient. Spin glasses seek low energy configurations while populations seek high fitness peaks, and temperature determines the ruggedness of the underlying physical landscape while selection determines the ruggedness of the fitness landscape.

The Hopfield Net
In 1982 J.J. Hopfield, a physicist working at the California Institute of Technology, published a new kind of neural network model (Hopfield, '82). He investigated the evolution of neural circuits, specifically how a complex content-addressable memory system, capable of retrieving entire memories given partial information, could result from the collective action of multiple, interacting simple structures. Hopfield proposed that synchronized, feed-forward models, such as those developed by McCulloch and Pitts and Rosenblatt, did not sufficiently capture the complexity inherent in natural systems and that interesting dynamics would likely result from asynchronous networks that include both forward and backward movement of information. The simple addition of feedback into the network produced the first recurrent neural networks.

Hopfield nets are composed of units with activation/repression states of "1" or "0." The state of the unit is given by the step function,

s_i = 1 if Σ_j w_ij s_j > u_i, 0 otherwise,    (5)

where u_i is the threshold for unit i, w_ij is the strength of the connection from unit i to unit j, and s_j is the state of unit j. The model works by randomly picking units and then updating the activation functions. Through training the neural system learns to associate input with output. Memories (outputs) can be retrieved by giving the system pieces of information (inputs). Hopfield's model relies on large simplifying assumptions that have helped to facilitate significant progress on scientific understanding of learning and memory. For example, he distilled complex neural circuitry into a system of multiple binary interactors, as well as defining learning as an input-output association.

Hopfield did not specify a general connection pattern, but he did explore the case where there are no self connections (w_ii = 0) and all connections are symmetric (w_ij = w_ji). If these conditions are met, the network moves steadily towards the configuration that minimizes the energy function

$$
E = -\frac{1}{2} \sum_{i,j} w_{ij} s_i s_j - \sum_i u_i s_i. \qquad (6)
$$

Although Hopfield intended to study complexity, the model can be studied analytically due to its symmetry and deterministic interactions. Relaxing these assumptions produces complex models that can be used as efficient learning structures (i.e., the Boltzmann Machine that follows) and in evolutionary simulations (the Wagner model).
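A minimal Hopfield net fits in a few lines of Python. The sketch below is our own toy implementation, not code from Hopfield's paper: thresholds are set to zero (so the energy above is guaranteed non-increasing under asynchronous updates), and a single pattern is stored with a simple Hebbian rule and then retrieved from a corrupted copy:

```python
def step(w, u, s, i):
    """Eq. 5: unit i turns on when its weighted input exceeds its threshold."""
    s[i] = 1 if sum(w[i][j] * s[j] for j in range(len(s))) > u[i] else 0

def energy(w, u, s):
    """Eq. 6; with zero thresholds, asynchronous updates never increase it."""
    n = len(s)
    return (-0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
            - sum(u[i] * s[i] for i in range(n)))

def hebbian_weights(pattern):
    """Store one 0/1 pattern via a Hebbian outer product; no self-connections."""
    x = [2 * p - 1 for p in pattern]   # map {0, 1} -> {-1, +1}
    n = len(pattern)
    return [[x[i] * x[j] if i != j else 0 for j in range(n)] for i in range(n)]

pattern = [1, 0, 1, 1, 0, 1]
w = hebbian_weights(pattern)           # symmetric (w_ij = w_ji), zero diagonal
u = [0] * len(pattern)                 # zero thresholds
s = [0, 0, 1, 1, 0, 1]                 # corrupted memory: first bit flipped
for _ in range(3):                     # asynchronous sequential update sweeps
    for i in range(len(s)):
        step(w, u, s, i)
print(s == pattern)                    # prints True: the stored memory is retrieved
```

This is content-addressable memory in miniature: the partial/corrupted input falls into the basin of attraction of the stored pattern, and repeated local updates complete it.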

The Boltzmann Machine

Boltzmann Machines are similar in structure to Hopfield Nets, but the authors (Hinton and Sejnowski, '83a,b) made one key modification: in a Boltzmann Machine the units are stochastic and turn on and off according to a probability function. The probability of a unit's state is determined by the difference in network energy that results from unit i being on or off,

$$
\Delta E_i = \sum_j w_{ij} s_j - u_i. \qquad (7)
$$

J. Exp. Zool. (Mol. Dev. Evol.)

THE GENE NETWORK FAMILY TREE 7


The energy of each state has a relative probability given by the Boltzmann distribution, for example $E_{i=\text{off}} = -k_B T \ln(p_{i=\text{off}})$. The probability that the ith unit is on is then given by

$$
p_{i=\text{on}} = \frac{1}{1 + e^{-\Delta E_i / T}}, \qquad (8)
$$

where T is the temperature of the system. The model operates by selecting one unit at random and then calculating its activation according to its connections with all other units. After multiple update steps, the network will converge to the energy minima given in Equation (6), because the model retains the same interaction symmetry. These stochastic, recurrent neural networks have the same mathematical structure as the Wagner gene network.
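Equations (7) and (8) translate directly into code. The sketch below is a toy illustration with hand-picked couplings and thresholds (our own example values), resampling one stochastic unit at a time:

```python
import math
import random

def delta_E(w, u, s, i):
    """Eq. 7: energy gap for unit i between its on and off states."""
    return sum(w[i][j] * s[j] for j in range(len(s))) - u[i]

def p_on(w, u, s, i, T):
    """Eq. 8: logistic probability that unit i is on at temperature T."""
    return 1.0 / (1.0 + math.exp(-delta_E(w, u, s, i) / T))

def gibbs_step(w, u, s, T, rng):
    """Pick one unit at random and resample its state stochastically."""
    i = rng.randrange(len(s))
    s[i] = 1 if rng.random() < p_on(w, u, s, i, T) else 0

w = [[0, 2], [2, 0]]   # symmetric coupling, no self-connections
u = [1, 1]             # thresholds
s = [1, 0]
print(round(p_on(w, u, s, 1, T=1.0), 3))   # delta_E = 2*1 - 1 = 1, so p = 0.731

rng = random.Random(0)
for _ in range(100):                        # repeated stochastic updates
    gibbs_step(w, u, s, T=1.0, rng=rng)
```

At high T the units flip nearly at random; as T decreases, the update rule approaches the deterministic Hopfield step function, which is why the two models share the same energy landscape.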

The Wagner Gene Network

The network is the set of interactions that control the regulation of the genes {G_1, ..., G_N}, where N is the number of genes (Fig. 2). Genes activate or repress other genes through the protein products they express, {P_1, ..., P_N}, where P_i is the expression state of gene G_i normalized to the interval [0, 1]. The genotype is the N × N matrix of regulatory interactions, W, where each element w_ij is the strength with which gene j regulates gene i. Protein levels determine the expression state of each gene, S_i, which ranges from complete repression (S_i = -1) to activation (S_i = 1) according to $\vec{S} = 2\vec{P} - (1, ..., 1)$. The network is a dynamical system that begins with some initial state of protein levels, and subsequent expression states are given by

$$
S_i(t + \tau) = \sigma\!\left[\sum_{j=1}^{N} w_{ij} S_j(t)\right].
$$

Wagner stated that the function $\sigma$ could be represented with a sigmoid, $1/[1 + \exp(-cx)]$, but he used the sign function

$$
\sigma(x) = \begin{cases} -1 & \text{if } x < 0 \\ 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \end{cases} \qquad (9)
$$

to reduce the system's sensitivity to perturbation. Each network may converge to a stable equilibrium where gene expression levels do not change, or it may cycle between different gene expression states. Wagner explicitly studied the class of networks that converge to a stable equilibrium, while Siegal and Bergman (2002) defined this attainment of a stable equilibrium as "developmental stability" and modeled an evolutionary scenario where each individual was required to attain developmental stability for inclusion in the population. This resulted in individuals that rapidly attained developmental stability and had lower sensitivity to mutation. Implementing this model in population simulations has been a popular way to study epistasis (non-linear interactions between the alleles at different loci) and complex genetic interactions, and it has been used to explore the effects of mutation, recombination, genetic drift, and environmental selection in a network context (e.g., Bergman and Siegal, 2003; Masel, 2004; Azevedo et al., 2006; MacCarthy and Bergman, 2007; Borenstein et al., 2008; Draghi and Wagner, 2009; Espinosa-Soto and Wagner, 2010; Espinosa-Soto et al., 2011; Fierst, 2011a,b; Le Cunff and Pakdaman, 2012). Several groups (Huerta-Sanchez and Durrett, 2007; Sevim and Rikvold, 2008; Pinho et al., 2012) have analyzed the model to identify how networks change when they transition from oscillatory dynamics to developmental stability. These analyses have not uncovered descriptive metrics that separate stable from unstable networks. Instead, the behaviors seem to result from complex, multi-way interactions between genes that can be observed dynamically but not measured through connection strengths or patterns.
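These dynamics are easy to simulate. The sketch below is an illustrative implementation (the example matrices are hand-picked by us, not parameters from any published study) that iterates the expression states and classifies a network as developmentally stable or cycling, in the sense used by Siegal and Bergman (2002) and Pinho et al. (2012):

```python
def sign(x):
    """Eq. 9: Wagner's sign function."""
    return -1 if x < 0 else (1 if x > 0 else 0)

def develop(W, s0, max_steps=100):
    """Iterate S(t + tau) = sign(W S(t)); return (state, reached_fixed_point)."""
    s = tuple(s0)
    visited = {s}
    for _ in range(max_steps):
        nxt = tuple(sign(sum(W[i][j] * s[j] for j in range(len(s))))
                    for i in range(len(s)))
        if nxt == s:
            return s, True        # developmental stability: expression no longer changes
        if nxt in visited:
            return nxt, False     # revisited a previous state: the network cycles
        visited.add(nxt)
        s = nxt
    return s, False

stable_W = [[1, -1, 0], [0, 1, 0], [1, 0, 1]]   # hand-picked illustrative genotype
cycling_W = [[0, -1], [1, 0]]                   # a two-gene oscillator
print(develop(stable_W, (1, 1, 1)))             # ((-1, 1, -1), True)
print(develop(cycling_W, (1, 1)))               # ((1, 1), False)
```

The two-gene oscillator shows why cycling matters: each gene activates or represses the other with a quarter-turn lag, so the expression state rotates forever and never settles, exactly the class of network Wagner's scheme discards.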

Restricted Boltzmann Machines and Deep Learning

The Boltzmann Machine, which as we have seen is structurally similar to the Wagner model, is a straightforward extension of previous neural networks, but these alterations significantly change the model's dynamical behavior and practical implementation. For example, the networks must be small because of the significant computing time required to train the model. Smolensky ('86) expanded the Boltzmann Machine to create the Restricted Boltzmann Machine (RBM, also called a "Harmonium") by eliminating intra-layer connections. This reduces the computational time required to train the model and allows the Boltzmann Machine framework to be used for large networks. Once trained, the network can discriminate between inputs and outputs and thereby generate specific patterns. For example, one of the classic examples in artificial intelligence is digit recognition. One may seek to train a network that can recognize each of the 10 digits (0–9) and discriminate between digits in future data, as well as to understand the similarities and differences in the data being used to train the network. In this case, the network from the RBM can be used both to generate patterns and to subsequently study the similarities and differences in those patterns. Stochastic neural networks generated in this fashion facilitate efficient training algorithms and can be used for both supervised and unsupervised learning. In supervised learning one provides the network a labeled set of data and then asks it to learn a function that discriminates between the labeled sets. In unsupervised learning one gives the network an unlabeled set of data and then asks it to both identify the relevant classes and learn to discriminate between them. Unsupervised learning is relevant for biology because biological variables are connected by functions that are too complex to extract from high-dimensional data, or to identify from first principles (Tarca et al., 2007). Given sufficient training data, computer algorithms can learn functions that describe subtle correlations and complex


relationships. Currently, clustering algorithms are the primary unsupervised methods used in biology, but sophisticated techniques like RBMs are better suited for handling large datasets and complex associations. For example, RBMs have been developed into Deep Learning networks by increasing the number of hidden layers. These deeper networks process complex data with greater accuracy and are thought to approximate the structure of the cortex and human brain (Hinton, 2007; Bengio, 2009). Deep learning networks were proposed in the mid-'80s (Smolensky, '86) but proved difficult to implement and train, and were not successfully implemented until 2006 (Hinton et al., 2006; Le Roux and Bengio, 2008). Since then, the increase in accuracy and precision that deep networks provide has led them to become the focus of extensive research efforts (reviewed in Yu and Deng, 2011). Investment by tech companies like Google has also been rapidly pushing this technology forward. Deep learning was used to redesign the Google Android voice recognition system, and it was named one of the "10 Breakthrough Technologies 2013" by the MIT Technology Review (Hof, 2013). Currently, these methods are used on image and voice problems where it is possible to inexpensively generate large amounts of data for training and validation. Bringing these methods into biology may allow us to better discriminate patterns in complex biological datasets. For example, it is now feasible to measure the expression level of every gene in the genome in response to environmental or genetic changes (Rockman and Kruglyak, 2006), yet these measurements by themselves do not allow us to understand the relationships between the genes that result in changes in expression level. Deep learning techniques have the potential to provide insights into these problems, but part of the success of current approaches derives from the fact that they are being developed for very specific applications, for example voice recognition. Biological data are heterogeneous, and work is needed, for example, on using these methods with data with different dispersion parameters and distributions. Developing these specific techniques into general applications is a formidable task, but it will provide deeper biological insights.
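The restriction that gives the RBM its name, no intra-layer connections, means each layer can be sampled in a single pass conditioned on the other. The sketch below uses hand-set weights (our own illustrative values, not a trained model) to show one alternating Gibbs sweep between visible and hidden layers:

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(W, b_h, v, rng):
    """p(h_i = 1 | v): each hidden unit depends only on the visible layer."""
    return [1 if rng.random() < logistic(b_h[i] + sum(W[i][j] * v[j]
                                                      for j in range(len(v))))
            else 0 for i in range(len(b_h))]

def sample_visible(W, b_v, h, rng):
    """p(v_j = 1 | h): each visible unit depends only on the hidden layer."""
    return [1 if rng.random() < logistic(b_v[j] + sum(W[i][j] * h[i]
                                                      for i in range(len(h))))
            else 0 for j in range(len(b_v))]

# Hypothetical hand-set weights: hidden unit 0 encodes the visible
# pattern 1010, hidden unit 1 encodes the complementary pattern 0101.
W = [[6.0, -6.0, 6.0, -6.0],
     [-6.0, 6.0, -6.0, 6.0]]
b_v, b_h = [0.0] * 4, [0.0] * 2
rng = random.Random(1)
v = [1, 0, 1, 0]
h = sample_hidden(W, b_h, v, rng)      # infer hidden features from data
v2 = sample_visible(W, b_v, h, rng)    # reconstruct the visible layer
print(h, v2)                           # with these strong weights: [1, 0] [1, 0, 1, 0]
```

Because the weights are strong, the sweep is nearly deterministic: the hidden unit matching the input switches on, and the reconstruction reproduces the input. Training (e.g., contrastive divergence) amounts to adjusting W so that such reconstructions match the data distribution.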

DISCUSSION

Here, we have reviewed the history of the Wagner gene network model in the context of network-based analytical approaches. Given this broader context, what then are the most important next steps for the field? While biologists should continue to use these models to address open concepts and general theory, these approaches will benefit from adopting features from specific biological systems and by pairing theory with rigorous empirical tests. Further, use of the sophisticated machine learning techniques that share the same core intellectual history as the Wagner model should allow us to more tightly integrate complex empirical datasets into these approaches.

The original network models were developed to address open theoretical problems in physics and neuroscience. Lenz ('20) and Ising ('25) formulated a tractable model by introducing key assumptions and simplifying the system, and they were able to produce a model that was later used to solve one of the critical theoretical challenges of the 20th century. McCulloch and Pitts ('43) and Rosenblatt ('57, '58, '61, '62) did the same for neurons, and thereby solved the problem of logical operations. In contrast, as genetic and cellular technologies have developed over the last 15 years, model building in biology has shifted towards highly specified, mechanistic models that closely approximate empirical systems (i.e., Matias Rodrigues and Wagner, 2009). These models are useful for deepening our understanding of specific organisms and systems but cannot make predictions on evolutionary timescales or formulate general theory regarding network dynamics. Levins ('66) identified three axes for models, realism, precision, and generality, asserting that models could only succeed on two of these three axes. As we have discussed here, in both physics and neuroscience, scientists made progress on crucial problems by moving away from the complicating details of real systems. We may simply not be able to make progress on the big questions in the biology of complex systems without developing new models with the appropriate level of abstraction.

Indeed, the Wagner network has been most successfully used to study general, system-level properties. As discussed above, Siegal and Bergman (2002) published an analysis of canalization and developmental plasticity and followed this with a study of gene knockouts and evolutionary capacitance (Bergman and Siegal, 2003). Later studies have addressed genetic assimilation (Masel, 2004), the maintenance of sexual reproduction (Azevedo et al., 2006; MacCarthy and Bergman, 2007), the evolution of evolvability and adaptive potential (Draghi and Wagner, 2009; Espinosa-Soto et al., 2011; Fierst, 2011a,b; Le Cunff and Pakdaman, 2012), macroevolutionary patterns (Borenstein et al., 2008), reproductive isolation (Palmer and Feldman, 2009), and modularity (Espinosa-Soto and Wagner, 2010). In addition, a body of work has used the model to address the dynamics of genetic and phenotypic evolution in a network context. For example, Ciliberti et al. (2007) investigated the number of evolutionary steps separating phenotypic states and Siegal et al. (2007) studied the relationship between network topology and the functional effects of gene knockouts.

The generality of the model means that many of these findings are applicable to systems across disciplines. For example, Wagner's 1994 paper addressed the somewhat specific question of the impact of gene duplication on the structure of genetic networks. However, there is no reason why this question could not be rephrased so as to address the much more general question of how systems operate when specific components are duplicated. It is likely that, similar to Wagner's findings for biological networks, most systems


function best as entire duplicates, or with the smallest possible disturbance (i.e., duplication of a single component). Indeed, this model was developed as a learning structure by Hopfield ('82), and many of its properties in evolutionary simulations stem from these origins. Similarly, are biological networks essentially learning structures? We can broadly define learning as the process by which a specific input is associated with a specific output, and a learning structure as any system that accomplishes this. A genetic network processes information about the external environment or the state of the cell in the form of gene product inputs, similar to how a neural circuit processes sensory input. For example, the aim of Hopfield's content-addressable memory was a system that could retrieve complete information, i.e., "H. A. Kramers and G. H. Wannier Phys. Rev. 60, 252 ('41)" from a partial input "and Wannier, ('41)" (Hopfield, '82). If we simplify this to binary information, this translates directly to the Wagner gene network and generates a system that can robustly produce a complete activation/repression state from few gene inputs (e.g., "001001" from the partial input "01"). Genetic regulatory and neural circuits are two types of complex systems that share many fundamental characteristics and properties. The challenge is to identify which of these characteristics and properties are shared and which are unique to a particular system.

Ultimately, biological theory must be empirically tested to be valid, and we still lack rigorous tests of many of these models. The biggest roadblock in connecting theoretical and empirical work on biological networks is that we lack biologically or evolutionarily meaningful network measures. Creating an appropriate measurement theory for biological networks is crucial to connect network theory with empirical tests and biological systems, rather than simply parroting network measures developed for other fields (Proulx et al., 2005). Without a coherent link between theory and empirical tests, abstract models and the principles that emerge from them remain difficult to evaluate.

We argue that continuing to use assumptions that are not relevant for and/or compatible with studying gene regulation limits the broader impacts of the Wagner model. For example, the Wagner gene network is usually implemented with a small number of genes interacting simultaneously. Pinho et al. (2012) demonstrated that the number of networks able to converge to a stable equilibrium drops precipitously as network size increases, and therefore that the model can only be used to address small systems. This is a general property of models based on random interaction matrices and, given that most biological networks appear to be fairly large and interconnected, suggests that the linear maps used in these models may not be fully representative of biological networks. For example, May ('72) modeled ecological systems as random matrices of n species interacting with strength a, where the connectance C is the probability that any two species will interact. May showed that ecological systems transition from

stability to instability as (1) the number of species grows, (2) interaction strength increases, and (3) connectance increases. He therefore suggested that large multi-species communities are likely organized into smaller "blocks" of stably interacting species. This is analogous to the developmental genetics concept of modularity, the organization of a larger system into smaller functional units (Wagner and Altenberg, '96). Together, these findings suggest that the temporal and spatial organization of interactions is a crucial aspect of complex systems that has yet to be addressed in current implementations of the Wagner gene network. The emphasis on developmental stability itself may be problematic because many genetic circuits appear to rely on oscillating expression patterns (Moreno-Risueno et al., 2010).

Biologically meaningful alterations of the Wagner gene network are clearly possible. For example, modern molecular biology represents gene and gene product interactions through hierarchical, directed pathways. Although not commonly done, this can be modeled with the asymmetric interactions that are a key feature of the Wagner model. For example, Draghi and Whitlock (2012) altered the Wagner gene network towards a directed model in a study of the evolution of phenotypic plasticity. Their model represents interactions between a set of genes that receive environmental cues and a second set of genes that determine values for two traits. Another fundamental assumption of the Wagner gene network is that the influence of all genes in the network on any one gene is additive. However, extending this framework by explicitly modeling protein levels through synthesis, diffusion, and decay rates has proved successful in modeling the regulatory interactions in the Drosophila melanogaster gap gene network (Jaeger et al., 2004a,b; Perkins et al., 2006).

Lastly, the fields of artificial intelligence and machine learning have successfully expanded similar models by specifically adapting the patterns of connections. The resulting Restricted Boltzmann Machines and Deep Learning networks are highly tuned as learning structures, capable of discerning complex relationships from high-dimensional, unlabeled data. Using these new techniques to analyze biological data will provide new insights into the structure and function of biological systems. These methods are just now starting to appear in the biological literature (i.e., Makin et al., 2013; Wang and Zeng, 2013), and we hope that by highlighting them, more biologists will recognize and utilize them. However, it remains to be seen whether a computational model that is good at predicting complex biological patterns is isomorphic to the functional model that actually generated those patterns. Does the latter cast a strong enough shadow to be revealed in the former, or will predicting networks, while useful from a statistical point of view, provide only the illusion that we understand the underlying functional relationships among genes?

The Wagner gene network draws on a long history of mathematical modeling in physics and neuroscience. The model has been very influential because it connects evolutionary ideas


with empirically observed genetics and the emerging fields of complex systems and network analysis. In the years since it was first published, this model has been used to study evolutionary patterns and processes, evolution in a network context, and properties of genetic networks. The historical review provided here is intended both to enhance understanding of the Wagner model itself and to connect the evolutionary literature to relevant work in other disciplines, with the ultimate goal of deepening the meaning and use of complex network models in biology.

ACKNOWLEDGMENTS

Support provided by an NSF Postdoctoral Fellowship in Biological Informatics to JLF and a grant from the National Institutes of Health (GM096008) to PCP.

LITERATURE CITED
Amores A, Force A, Yan YL, et al. 1998. Zebrafish hox clusters and vertebrate genome evolution. Science 282:1711–1714.
Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815.
Azevedo RBR, Lohaus R, Srinivasan S, Dang KK, Burch CL. 2006. Sexual reproduction selects for robustness and negative epistasis in artificial gene networks. Nature 440:87–90.
Bengio Y. 2009. Learning deep architectures for AI. Found Trends Mach Learn 2:1–127.
Bergman A, Siegal M. 2003. Evolutionary capacitance as a general feature of complex gene networks. Nature 424:549–552.
Borenstein E, Krakauer DC, Bergstrom CT. 2008. An end to endless forms: epistasis, phenotype distribution bias, and nonuniform evolution. PLoS Comput Biol 4:e1000202.
Brush S. 1967. History of the Lenz-Ising model. Rev Mod Phys 39:883–893.
Ciliberti S, Martin OC, Wagner A. 2007. Innovation and robustness in complex regulatory gene networks. Proc Natl Acad Sci 104:13591–13596.
Draghi JA, Wagner GP. 2009. The evolutionary dynamics of evolvability in a gene network model. J Evol Biol 22:599–611.
Draghi JA, Whitlock MC. 2012. Phenotypic plasticity facilitates mutational variance, genetic variance, and evolvability along the major axis of environmental variation. Evolution 66:2891–2902.
Espinosa-Soto C, Martin OC, Wagner A. 2011. Phenotypic plasticity can facilitate adaptive evolution in gene regulatory circuits. BMC Evol Biol 11:1–14.
Espinosa-Soto C, Wagner A. 2010. Specialization can drive the evolution of modularity. PLoS Comput Biol 6:e1000719.
Fierst JL. 2011a. A history of phenotypic plasticity accelerates adaptation to a new environment. J Evol Biol 24:1992–2001.
Fierst JL. 2011b. Sexual dimorphism increases evolvability in a genetic regulatory network. Evol Biol 38:52–67.
Hardy GH. 1908. Mendelian proportions in a mixed population. Science 28:49–50.
Hill AV. 1910. The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves. J Physiol 40:iv–vii.
Hinton GE. 2007. Learning multiple layers of representation. Trends Cogn Sci 11:428–434.
Hinton GE, Osindero S, Teh YW. 2006. A fast learning algorithm for deep belief nets. Neural Comput 18:1527–1554.
Hinton GE, Sejnowski TJ. 1983a. Analyzing cooperative computation. Proceedings of the 5th Annual Congress of the Cognitive Science Society, Rochester, NY.
Hinton GE, Sejnowski TJ. 1983b. Optimal perceptual inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington, DC. p 448–453.
Hof RD. 2013. Deep learning. MIT Technology Review, April 23.
Hopfield J. 1982. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79:2554.
Huerta-Sanchez E, Durrett R. 2007. Wagner's canalization model. Theor Popul Biol 71:121–130.
Ising E. 1925. Beitrag zur Theorie des Ferromagnetismus. Z Phys 31:253–258.
Jaeger J, Blagov M, Kosman D, et al. 2004a. Dynamical analysis of regulatory interactions in the gap gene system of Drosophila melanogaster. Genetics 167:1721–1737.
Jaeger J, Surkova S, Blagov M, et al. 2004b. Dynamic control of positional information in the early Drosophila embryo. Nature 430:368–371.
Kauffman SA. 1969. Metabolic stability and epigenesis in randomly connected nets. J Theor Biol 22:437–467.
Kauffman SA. 1974. The large scale structure and dynamics of gene control circuits: an ensemble approach. J Theor Biol 44:167–190.
Kauffman SA. 1993. The origins of order. Oxford, England: Oxford University Press.
Le Cunff Y, Pakdaman K. 2012. Phenotype-genotype relation in Wagner's canalization model. J Theor Biol 314:69–83.
Le Roux N, Bengio Y. 2008. Representational power of restricted Boltzmann machines and deep belief networks. Neural Comput 20:1631–1649.
Lenz W. 1920. Beiträge zum Verständnis der magnetischen Eigenschaften in festen Körpern. Phys Zeitschr 21:613–615.
Levins R. 1966. The strategy of model building in population biology. Am Sci 54:421–431.
MacCarthy T, Bergman A. 2007. Coevolution of robustness, epistasis, and recombination favors asexual reproduction. Proc Natl Acad Sci 104:12801–12806.
Makin JG, Fellows MR, Sabes PN. 2013. Learning multisensory integration and coordinate transformation via density estimation. PLoS Comput Biol 9:e1003035.
Masel J. 2004. Genetic assimilation can occur in the absence of selection for the assimilating phenotype, suggesting a role for the canalization heuristic. J Evol Biol 17:1106–1110.
Masel J. 2009. Robustness: mechanisms and consequences. Trends Genet 25:395–403.
Matias Rodrigues JF, Wagner A. 2009. Evolutionary plasticity and innovations in complex metabolic reaction networks. PLoS Comput Biol 5:e1000613.
May RM. 1972. Will a large complex system be stable? Nature 238:413–414.
McCulloch W, Pitts W. 1943. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 7:115–133.
Moreno-Risueno MA, Van Norman JM, Moreno A, et al. 2010. Oscillating gene expression determines competence for periodic Arabidopsis root branching. Science 329:1306–1311.
Onsager L. 1944. Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys Rev 65:117–149.
Palmer ME, Feldman MW. 2009. Dynamics of hybrid incompatibility in gene networks in a constant environment. Evolution 63:418–431.
Peierls R. 1936. On Ising's model of ferromagnetism. Proc Cambridge Philosophical Soc 32:477.
Perkins TJ, Jaeger J, Reinitz J, Glass L. 2006. Reverse engineering the gap gene network of Drosophila melanogaster. PLoS Comput Biol 2:e51.
Pinho R, Borenstein E, Feldman MW. 2012. Most networks in Wagner's model are cycling. PLoS ONE 7:e34285.
Proulx SR, Promislow DEL, Phillips PC. 2005. Network thinking in ecology and evolution. Trends Ecol Evol 20:345–353.
Provine WB. 1971. The origins of theoretical population genetics. Chicago: University of Chicago Press.
Rockman MV, Kruglyak L. 2006. Genetics of global gene expression. Nat Rev Genet 7:862–872.
Rojas R. 1996. Neural networks: a systematic introduction. New York: Springer.
Rosenblatt F. 1957. The perceptron: a perceiving and recognizing automaton. Cornell Aeronautical Laboratory Report 85:460–461.
Rosenblatt F. 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386–408.
Rosenblatt F. 1961. Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Technical report. Cornell Aeronautical Laboratory.
Rosenblatt F. 1962. Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.
Sevim V, Rikvold P. 2008. Chaotic gene regulatory networks can be robust against mutations and noise. J Theor Biol 253:323–332.
Sherrington D, Kirkpatrick S. 1975. Solvable model of a spin-glass. Phys Rev Lett 35:1792–1796.
Siegal M, Bergman A. 2002. Waddington's canalization revisited: developmental stability and evolution. Proc Natl Acad Sci 99:10528–10532.
Siegal ML, Promislow DEL, Bergman A. 2007. Functional and evolutionary inference in gene networks: does topology matter? Genetica 129:83–103.
Smolensky P. 1986. Information processing in dynamical systems: foundations of harmony theory. Cambridge, MA: MIT Press.
Stein DL, Newman CM. 2012. Spin glasses: old and new complexity. Complex Syst 20:115–126.
Stutz C, Williams B. 1999. Ernst Ising (obituary). Phys Today 52:106–108.
Tarca AL, Carey VJ, Chen XW, Romero R, Draghici S. 2007. Machine learning and its applications to biology. PLoS Comput Biol 3:e116.
Turing AM. 1936. On computable numbers, with an application to the Entscheidungsproblem. Proc Lond Math Soc 42:230–265.
Turing AM. 1938. On computable numbers, with an application to the Entscheidungsproblem: a correction. Proc Lond Math Soc 43:544–546.
von Dassow G, Meir E, Munro EM, Odell GM. 2000. The segment polarity network is a robust developmental module. Nature 406:188–192.
Waddington CH. 1942. Canalization of development and the inheritance of acquired characters. Nature 150:563–565.
Waddington CH. 1957. The strategy of the genes: a discussion of some aspects of theoretical biology. London: Allen and Unwin.
Wagner A. 1994. Evolution of gene networks by gene duplications: a mathematical model and its implications on genome organization. Proc Natl Acad Sci 91:4387–4391.
Wagner A. 1996. Does evolutionary plasticity evolve? Evolution 50:1008–1023.
Wagner A. 2005. Robustness and evolvability in living systems. Princeton, NJ: Princeton University Press.
Wagner GP, Altenberg L. 1996. Perspective: complex adaptations and the evolution of evolvability. Evolution 50:967–976.
Wang YH, Zeng JY. 2013. Predicting drug-target interactions using restricted Boltzmann machines. Bioinformatics 29:126–134.
Weinberg W. 1908. Über den Nachweis der Vererbung beim Menschen. Jahresh Ver Vaterl Naturkd Württemb 64:368–382.
Yu D, Deng L. 2011. Deep learning and its applications to signal and information processing [exploratory DSP]. IEEE Signal Process Mag 28:145–154.
