Journal of Theoretical Biology 249 (2007) 58–66

www.elsevier.com/locate/yjtbi

Associative learning in biochemical networks

Nikhil Gandhi (a), Gonen Ashkenasy (b,*), Emmanuel Tannenbaum (b,*)

(a) College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
(b) Department of Chemistry, Ben-Gurion University of the Negev, Be’er-Sheva 84105, Israel

(*) Corresponding authors. Tel.: +972 54 599 8278. E-mail addresses: [email protected] (Gonen Ashkenasy), emanuelt@bgu.ac.il (E. Tannenbaum).

Received 4 January 2007; received in revised form 4 July 2007; accepted 5 July 2007

Available online 18 July 2007

0022-5193/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.jtbi.2007.07.004

Abstract

It has been recently suggested that there are likely generic features characterizing the emergence of systems constructed from the self-organization of self-replicating agents acting under one or more selection pressures. Therefore, structures and behaviors at one length scale may be used to infer analogous structures and behaviors at other length scales. Motivated by this suggestion, we seek to characterize various “animate” behaviors in biochemical networks, and the influence that these behaviors have on genomic evolution. Specifically, in this paper, we develop a simple, chemostat-based model illustrating how a process analogous to associative learning can occur in a biochemical network. Associative learning is a form of learning whereby a system “learns” to associate two stimuli with one another. Associative learning, also known as conditioning, is believed to be a powerful learning process at work in the brain (associative learning is essentially “learning by analogy”). In our model, two types of replicating molecules, denoted as A and B, are present in some initial concentration in the chemostat. Molecules A and B are stimulated to replicate by some growth factors, denoted as $G_A$ and $G_B$, respectively. It is also assumed that A and B can covalently link, and that the conjugated molecule can be stimulated by either the $G_A$ or $G_B$ growth factors (and can be degraded). We show that, if the chemostat is stimulated by both growth factors for a certain time, followed by a time gap during which the chemostat is not stimulated at all, and if the chemostat is then stimulated again by only one of the growth factors, then there will be a transient increase in the number of molecules activated by the other growth factor. Therefore, the chemostat bears the imprint of earlier, simultaneous stimulation with both growth factors, which is indicative of associative learning. It is interesting to note that the dynamics of our model is consistent with certain aspects of Pavlov’s original series of conditioning experiments in dogs. We discuss how associative learning can potentially be performed in vitro within RNA, DNA, or peptide networks. We also describe how such a mechanism could be involved in genomic evolution, and suggest relevant bioinformatics studies that could potentially resolve these issues.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Associative memory; Associative learning; Biochemical networks; RNA world

1. Introduction

Emerging evidence suggests that much of the so-called “junk” DNA in complex, multi-cellular eukaryotic organisms in fact codes for a vast, RNA-based, genetic regulatory network (Claverie, 2005; Costa, 2005; Dennis and Omer, 2005; Green and Doudna, 2006; Herbert, 2004; Herbert and Rich, 1999; Laaberki and Repoila, 2003; Mattick, 2005; Mattick and Makunin, 2006; Moulton, 2005; Plasterk, 2006; Wassarman, 2004). In addition to the regulatory roles in encoding protein structure, RNA is involved in other processes such as gene silencing, catalysis of chemical reactions, and sensing self- and non-self analytes. It is believed that this RNA biochemistry is responsible for the variety and complexity of terrestrial life. Indeed, it has been suggested that this RNA biochemistry is essentially a kind of “RNA computer”, whose existence provides the key to correlating genome size with organismal complexity (the so-called “C-value” paradox) (Taft et al., 2007).

Since the RNA biochemistry emerged through a long process of replicative selection, it is likely that there are large subnetworks of RNA interactions that are essentially biochemical implementations of some fairly sophisticated computational algorithms associated with proper cell function. Therefore, a major challenge for systems and evolutionary biologists will be to uncover the structure of these biochemical networks, and understand their evolution and the role they play in the emergence of complex terrestrial life.

In a recent paper (Tannenbaum, 2006), Tannenbaum argued that the hypothesized RNA networks in complex eukaryotes exhibit structures and behaviors that are analogous to structures and behaviors that emerge in agent-built systems such as the brain (it is believed that pathway selection in the brain is driven by a chemically based reward–punishment system). This speculation was driven by the hypothesis that there are likely generic features in systems that are constructed by agents acting under one or more selection pressures (replicative selection, reward–punishment chemicals in the brain, free-market competition, etc.). Therefore, by studying structures and behaviors at one length scale, it may be possible to infer the existence of analogous structures and behaviors at another scale.

Because of the sheer magnitude and complexity of the hypothesized RNA biochemistry in eukaryotic cells, determining the various RNA pathways will be extremely difficult without a priori guesses as to what kinds of structures to look for. Furthermore, even if one constructs a detailed map of the various RNA networks, such a map will provide almost no insight into the logic of the network. That is, a detailed representation of the RNA biochemistry inside the cell will simply appear to be a massively complex, but largely random, web of RNA interactions.

However, if there are indeed generic features associated with agent-built systems acting under various selection pressures, then by observing structures such as the brain, it may be possible to infer the existence of analogous structures inside RNA biochemical networks.

Based on the reasoning presented here, we seek to elucidate the various “animate” behaviors (or computational motifs) implemented by the RNA biochemistry inside living cells. In this paper specifically, we propose the existence of an RNA-based implementation of a computational scheme known as associative learning (Kohonen, 1989; Mackintosh, 1983). Associative learning is believed to be a key aspect of the thought processes at work in the brain (Kohonen, 1989; Mackintosh, 1983; Phattanasri et al., submitted for publication). While these concepts will be discussed in more detail in the paper, associative learning may be briefly described as a form of learning whereby various stimuli become associated with one another.

Based on what is currently known about the RNA biochemistry at work inside eukaryotic cells, we argue that the proposed RNA-based associative learning scheme occurs in eukaryotes and the archaebacteria (and possibly even prokaryotes), and may even lead to genomic evolution. As support for this argument, we develop in this paper a “toy” model that illustrates how associative learning could potentially occur in a biochemical network.

We also discuss future in vitro experiments that could be used to establish the realizability of this computational motif. Finally, we also discuss some possible bioinformatics studies that could be used to search for evidence that associative learning processes have played roles in genomic evolution.

2. A simple biochemical model for associative learning

2.1. Definitions

We begin by defining the various concepts used in this paper, namely, memory, learning, associative memory, and associative learning.

Memory in a physical system refers to the ability of the system to preserve information about the state of the universe (including the system itself) at some previous time. Simple examples include childhood recollections (first day of school, first bike, etc.) (Kohonen, 1989). An important biological example is the immune system: After the immune system fights off an infection, a fraction of the antibody-producing B-cells turn into memory cells, that allow the immune system to rapidly respond to future invasions by a given antigen.

Learning refers to the ability of a system to acquire new functions in response to some external input. Simple examples include learning how to read, cook, or play a new sport (Vapnik, 1998). Learning is similar to the concept of adaptation. Strictly speaking, learning refers to the acquisition of new functions, while adaptation refers to changes in a system that allow it to function in a new environment. However, because adaptation can occur via the acquisition of new functions, then, with an appropriate definition of the system, the concepts of learning and adaptation can be shown to be formally equivalent.

The immune response to a new infectious agent is an important example of learning exhibited at the cellular level. The population of antibody-producing B-cells evolves through a process of clonal selection and somatic hypermutation, until it becomes optimally tailored to fight the infectious agent. Clonal selection and somatic hypermutation are analogous to learning by trial-and-error, whereby the immune system tests various antibodies against a given antigen. Those antibody designs that are most effective within the given antibody population are used as templates for further refinements, so that, after several iterations, the optimal antibody design is found.

It is believed that pathway selection in the brain occurs via processes that are analogous in many respects to the learning processes associated with the immune response (Ashton et al., 2002).

Associative memory is a form of memory whereby the stimulation of one memory triggers the stimulation of another. The two separate memories are essentially components of a larger, compound memory, and so stimulating one component of the compound memory stimulates the whole memory.

Associative learning is a learning process where two or more distinct stimuli become associated with one another. In a sense, then, associative learning refers to the process by which an associative memory is created.

A famous 19th century example of associative learning at work in the brain is the series of experiments by Ivan Pavlov, in which a dog was simultaneously stimulated with the sound of a ringing bell and the sight of food. The sight of food caused the dog to salivate. Eventually, the dog would salivate from the sound of the ringing bell alone. Interestingly, it has been revealed that associative learning can also occur in paramecia, a type of free-living single-celled organism (Hennessey et al., 1979; Byrne, 1987), and in individual neurons (Byrne, 1987; Walters and Byrne, 1983).

In a biochemical network, a signature of memory may be the production of a certain compound as a result of a certain input into a system. A signature of associative memory and/or learning is then the production of distinct compounds, each having a separate external stimulus, as a result of input of the stimulus for only one of the compounds. In such a case, the system behaves as if it was stimulated with several inputs, which are therefore effectively associated with one another.

2.2. Kinetic model

To develop a simple model that can exhibit associative learning, we consider a chemostat of volume $V$, containing two polynucleotide species: (1) Species A, characterized by some base sequence $s_A$, is stimulated to replicate via some growth factor $G_A$. (2) Species B, characterized by some base sequence $s_B$, is stimulated to replicate via some growth factor $G_B$. Pictorially, these two reactions may be represented via

\[
\begin{aligned}
A + G_A &\rightarrow A + A, \\
B + G_B &\rightarrow B + B,
\end{aligned}
\tag{1}
\]

where we assume that the reaction kinetics are second-order, with a species-independent rate constant $k_R$ (in the context of a genome, the growth factors $G_A$ and $G_B$ correspond to transcription factors that trigger RNA production from a given DNA polynucleotide sequence).

We assume that the species A and B can chemically react, to form either the chain $s_A$–$s_B$ or $s_B$–$s_A$, both of which may be termed species A–B. We also assume that A–B can dissociate, so that the forward and back reactions are given by

\[
A + B \rightleftharpoons A\text{–}B,
\tag{2}
\]

where the forward reaction has a second-order rate constant $k_f$, and the back reaction has a first-order rate constant $k_b$.

Finally, we assume that the growth factors $G_A$ and $G_B$ can both stimulate replication of A–B, via the reaction

\[
A\text{–}B + G_{A/B} \rightarrow A\text{–}B + A\text{–}B,
\tag{3}
\]

which also is assumed to proceed with a second-order rate constant $k_R$. As will be seen, the assumption that the replication of A–B can be stimulated by either growth factor is the key assumption in our model, and may be understood as follows: As mentioned previously, $G_A$ and $G_B$ may both be thought of as analogous to transcription factors, that essentially “unlock” the promoter region of a polynucleotide sequence and allow the replicase enzyme to bind and catalyze replication. The molecule A–B has two promoter regions, each of which can be “unlocked” by their respective transcription factors. We are assuming that unlocking the entire molecule (for access by the replicase) is achievable by unlocking the molecule at one site (as a useful analogy, imagine a cylinder that is capped at both ends. Uncapping one of the ends unlocks the whole cylinder).

We assume a volumetric flow rate $F$ through the chemostat, and that the input concentrations of $G_A$ and $G_B$ are given by $c_{A,0}$ and $c_{B,0}$, respectively. If we denote the population numbers of $G_A$, $G_B$, A, B, and A–B by $n_{G_A}$, $n_{G_B}$, $n_A$, $n_B$, $n_{AB}$, respectively, then we have the following system of differential equations governing the chemical reaction kinetics inside the chemostat:

\[
\begin{aligned}
\frac{dn_{G_A}}{dt} &= F c_{A,0} - \frac{k_R}{V}(n_A + n_{AB})\, n_{G_A} - \frac{F}{V} n_{G_A}, \\
\frac{dn_{G_B}}{dt} &= F c_{B,0} - \frac{k_R}{V}(n_B + n_{AB})\, n_{G_B} - \frac{F}{V} n_{G_B}, \\
\frac{dn_A}{dt} &= \frac{k_R}{V} n_A n_{G_A} - \frac{F}{V} n_A - \frac{k_f}{V} n_A n_B + k_b n_{AB}, \\
\frac{dn_B}{dt} &= \frac{k_R}{V} n_B n_{G_B} - \frac{F}{V} n_B - \frac{k_f}{V} n_A n_B + k_b n_{AB}, \\
\frac{dn_{AB}}{dt} &= \frac{k_R}{V} n_{AB}\,(n_{G_A} + n_{G_B}) - \frac{F}{V} n_{AB} + \frac{k_f}{V} n_A n_B - k_b n_{AB}.
\end{aligned}
\tag{4}
\]
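As an illustrative sketch (not part of the original paper), the right-hand side of Eq. (4) can be written as a small Python function. The names `FullParams` and `rhs_full` are hypothetical and introduced here only for illustration; the terms themselves follow Eq. (4) one by one.

```python
from dataclasses import dataclass


@dataclass
class FullParams:
    """Hypothetical container for the constants appearing in Eq. (4)."""
    F: float     # volumetric flow rate
    V: float     # chemostat volume
    k_R: float   # replication rate constant
    k_f: float   # forward (linking) rate constant
    k_b: float   # back (dissociation) rate constant
    c_A0: float  # input concentration of G_A
    c_B0: float  # input concentration of G_B


def rhs_full(state, p):
    """Right-hand side of Eq. (4); state = (n_GA, n_GB, n_A, n_B, n_AB)."""
    n_GA, n_GB, n_A, n_B, n_AB = state
    r = p.k_R / p.V  # replication rate per molecule pair
    d = p.F / p.V    # dilution rate
    # Net flux of the reversible reaction A + B <-> A-B
    bind = (p.k_f / p.V) * n_A * n_B - p.k_b * n_AB
    return (
        p.F * p.c_A0 - r * (n_A + n_AB) * n_GA - d * n_GA,
        p.F * p.c_B0 - r * (n_B + n_AB) * n_GB - d * n_GB,
        r * n_A * n_GA - d * n_A - bind,
        r * n_B * n_GB - d * n_B - bind,
        r * n_AB * (n_GA + n_GB) - d * n_AB + bind,
    )
```

Adding the third and fifth components cancels the binding flux, which is why the totals $n_A + n_{AB}$ and $n_B + n_{AB}$ are convenient variables once the binding reaction is assumed to be fast.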

To simplify these equations, we assume that $k_f$ and $k_b$ are sufficiently large that the reaction $A + B \rightleftharpoons A\text{–}B$ is always in equilibrium. We let $K = k_f/k_b$ denote the equilibrium constant. We also define $\tilde n_A = n_A + n_{AB}$ and $\tilde n_B = n_B + n_{AB}$. Finally, we define $f_A = F c_{A,0}$, $f_B = F c_{B,0}$, and $f = F/V$. Putting everything together, we obtain

\[
\begin{aligned}
\frac{dn_{G_A}}{dt} &= f_A - \frac{k_R}{V} \tilde n_A n_{G_A} - f n_{G_A}, \\
\frac{dn_{G_B}}{dt} &= f_B - \frac{k_R}{V} \tilde n_B n_{G_B} - f n_{G_B}, \\
\frac{d\tilde n_A}{dt} &= \frac{k_R}{V} \tilde n_A n_{G_A} + \frac{k_R}{V} n_{AB}(\tilde n_A, \tilde n_B)\, n_{G_B} - f \tilde n_A, \\
\frac{d\tilde n_B}{dt} &= \frac{k_R}{V} \tilde n_B n_{G_B} + \frac{k_R}{V} n_{AB}(\tilde n_A, \tilde n_B)\, n_{G_A} - f \tilde n_B,
\end{aligned}
\tag{5}
\]

where

\[
n_{AB}(\tilde n_A, \tilde n_B) = \frac{1}{2}\left[ \tilde n_A + \tilde n_B + \frac{1}{K} - \sqrt{\left( \tilde n_A + \tilde n_B + \frac{1}{K} \right)^2 - 4 \tilde n_A \tilde n_B} \right].
\tag{6}
\]
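Eq. (6) is the smaller root of the quadratic obtained by substituting $n_A = \tilde n_A - n_{AB}$ and $n_B = \tilde n_B - n_{AB}$ into the equilibrium criterion $n_{AB} = K n_A n_B$. A minimal Python sketch of this closure is given below; the function name `n_ab_equilibrium` is hypothetical, and the rearranged (conjugate) form of the root is our own choice to limit round-off, not something taken from the paper.

```python
import math


def n_ab_equilibrium(a_tot, b_tot, K):
    """Equilibrium number of A-B conjugates, Eq. (6).

    a_tot and b_tot are the totals n_A + n_AB and n_B + n_AB.
    """
    if K <= 0.0 or a_tot <= 0.0 or b_tot <= 0.0:
        return 0.0  # no binding (K = 0), or nothing available to bind
    s = a_tot + b_tot + 1.0 / K
    root = math.sqrt(s * s - 4.0 * a_tot * b_tot)
    # Algebraically equal to 0.5 * (s - root), but avoids subtracting
    # two nearly equal numbers when K is large.
    return 2.0 * a_tot * b_tot / (s + root)
```

The returned value always lies between 0 and the smaller of the two totals, so the free populations $n_A$ and $n_B$ stay non-negative.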

Fig. 1. Illustration of associative learning in our toy model (Eq. (5)). The system is first stimulated with both $G_A$ and $G_B$ until some time $T_1 = 100$, at which point the system is at steady state. The flow of $G_A$ and $G_B$ is then turned off from $T_1$ to $T_2 = 130$, so that the time gap $T = 30$. At $T_2 = 130$, the system is again stimulated, but only with $G_A$. The value of $\tilde n_A$ rises to a steady-state value, while the value of $\tilde n_B$ has a transient increase, but then begins to slowly decrease to 0. Parameter values are $f = 0.1$, $f_{A,1} = f_{A,2} = f_{B,1} = 1$, $k_R/V = 1$, and $K = 1$.

To see how associative learning emerges from the chemostat dynamics, we consider the following experiment: Starting with some initial seed population of A and B molecules inside the tank, the system is fed with growth factors $G_A$ and $G_B$ at rates $f_{A,1}$ and $f_{B,1}$. This proceeds for some time until the system has reached a steady state. While maintaining a constant volumetric flow rate through the system (so that $f$ remains unchanged), $f_A$ and $f_B$ are rapidly brought to 0, and are maintained at 0 for a certain period of time, denoted $T$.

After this time $T$, $f_A$ is raised to some value $f_{A,2}$, while $f_B$ is maintained at 0. In this second regime, it is possible to show that the steady-state value of $\tilde n_B$ is 0. To see why, note that the steady-state value of $n_{G_B}$, denoted $n_{G_B,ss}$, is 0. Therefore, the differential equation for $n_A$ gives, at steady state,

\[
\frac{k_R}{V} n_{G_A,ss} = f,
\tag{7}
\]

so that the steady-state equation for $\tilde n_B$ becomes

\[
0 = -f\,(\tilde n_{B,ss} - n_{AB,ss}) = -f\, n_{B,ss},
\tag{8}
\]

which gives $n_{B,ss} = 0$. Coupled to the equilibrium criterion, $n_{AB} = K n_A n_B$, we obtain $n_{AB,ss} = 0$, and hence $\tilde n_{B,ss} = 0$.

Therefore, after a sufficiently long amount of time, stimulation of the chemostat by $G_A$ alone will eventually lead to the disappearance of all B molecules from the chemostat.
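As a worked check of this regime (our own arithmetic, assuming the parameter values quoted in the caption of Fig. 1, and not a result stated in the text), Eq. (7) fixes $n_{G_A,ss} = fV/k_R$. Setting $dn_{G_A}/dt = 0$ in Eq. (5) and using Eq. (7) to replace $(k_R/V)\, n_{G_A,ss}$ by $f$ then gives the level to which the total A population settles while the B population decays:

\[
0 = f_{A,2} - f\,\tilde n_{A,ss} - f\, n_{G_A,ss}
\quad\Longrightarrow\quad
\tilde n_{A,ss} = \frac{f_{A,2}}{f} - \frac{fV}{k_R} = \frac{1}{0.1} - \frac{0.1}{1} = 9.9 .
\]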

However, the transient behavior of the dynamics is qualitatively different for the $K = 0$ and $K > 0$ cases. When $K = 0$, then $n_{AB} = 0$, and so the dynamical equation for $\tilde n_B$ is

\[
\frac{d\tilde n_B}{dt} = \tilde n_B \left( \frac{k_R}{V} n_{G_B} - f \right).
\tag{9}
\]

If the time gap $T$ is sufficiently large, then $n_{G_B}$ will be sufficiently small so that $(k_R/V)\, n_{G_B} - f < 0$ and so $\tilde n_B$ will steadily decrease with time.

However, when $K > 0$, then the presence of $n_{AB}$ in the population can lead to an initial, transient increase in the value of $\tilde n_B$ after $G_A$ is again allowed to flow into the system (the larger the value of $K$, the larger and more persistent the increase). This transient increase in the number of B molecules in the solution is exactly indicative of associative learning, because it results from the covalent linkage of A and B molecules previously generated by simultaneously feeding the system with both $G_A$ and $G_B$. The population in the chemostat has essentially “learned” to associate $G_A$ and $G_B$, in the sense that stimulation with only $G_A$ at some future time leads to a transient signature of stimulation with $G_B$ as well. Fig. 1 provides an example clearly illustrating this phenomenon.
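The three-phase experiment can be reproduced numerically. The following is a minimal sketch rather than the authors' own code: it integrates Eq. (5) with a hand-written fourth-order Runge-Kutta step, using the parameter values from the caption of Fig. 1. The seed population, the step size `dt`, the end time `t_end`, and all function names are assumptions made here for illustration.

```python
import math


def n_ab(a, b, K):
    """Eq. (6) in a cancellation-free form; returns 0 when K = 0."""
    if K <= 0.0 or a <= 0.0 or b <= 0.0:
        return 0.0
    s = a + b + 1.0 / K
    return 2.0 * a * b / (s + math.sqrt(s * s - 4.0 * a * b))


def rhs(y, fA, fB, f, r, K):
    """Right-hand side of Eq. (5); y = (n_GA, n_GB, nA_tot, nB_tot), r = k_R/V."""
    gA, gB, a, b = y
    ab = n_ab(a, b, K)
    return (
        fA - r * a * gA - f * gA,
        fB - r * b * gB - f * gB,
        r * a * gA + r * ab * gB - f * a,
        r * b * gB + r * ab * gA - f * b,
    )


def rk4_step(y, dt, *args):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(y, *args)
    k2 = rhs(tuple(v + 0.5 * dt * k for v, k in zip(y, k1)), *args)
    k3 = rhs(tuple(v + 0.5 * dt * k for v, k in zip(y, k2)), *args)
    k4 = rhs(tuple(v + dt * k for v, k in zip(y, k3)), *args)
    return tuple(
        v + dt * (p + 2.0 * q + 2.0 * s + w) / 6.0
        for v, p, q, s, w in zip(y, k1, k2, k3, k4)
    )


def run_protocol(K, T1=100.0, T2=130.0, t_end=300.0, dt=0.02,
                 f=0.1, r=1.0, fA1=1.0, fB1=1.0, fA2=1.0):
    """Run the three feeding phases; return (nB_tot at T2, later peak, final value)."""
    # Assumed seed: no growth factor, one unit each of A and B.
    y = (0.0, 0.0, 1.0, 1.0)
    n1, n2, n3 = round(T1 / dt), round(T2 / dt), round(t_end / dt)
    for _ in range(n1):            # phase 1: both growth factors fed
        y = rk4_step(y, dt, fA1, fB1, f, r, K)
    for _ in range(n2 - n1):       # phase 2: no stimulation
        y = rk4_step(y, dt, 0.0, 0.0, f, r, K)
    b_at_T2 = peak = y[3]
    for _ in range(n3 - n2):       # phase 3: G_A only
        y = rk4_step(y, dt, fA2, 0.0, f, r, K)
        peak = max(peak, y[3])
    return b_at_T2, peak, y[3]


if __name__ == "__main__":
    for K in (0.0, 1.0):
        b0, peak, final = run_protocol(K)
        print(f"K={K}: nB_tot(T2)={b0:.4f}, peak={peak:.4f}, final={final:.4f}")
```

Under these assumptions, the run with $K = 0$ should show the total B population only decaying after $T_2$, while the run with $K = 1$ should show it rising above its value at $T_2$ before decaying, which is the transient signature described above.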

It should be noted that the transient nature of the association means that, if the system is stimulated by $G_A$ alone, then, the system will eventually “forget” the earlier association between $G_A$ and $G_B$. This is consistent with the original experiments by Pavlov: If the bell was rung a sufficient number of times without subsequently presenting the dog with food, the association between the ringing bell and the food was lost, and the dog would stop salivating at the sound of the ringing bell alone.

It should also be noted that the strength of the association is strongly dependent on the dilution rate constant $f$. The larger the value of $f$, the shorter the characteristic time that the chemostat retains memory of a specific stimulation event, and hence the association is correspondingly weaker and of shorter duration as well.
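A rough way to quantify this dependence, offered here as a sketch under a simplifying assumption rather than as a result from the paper: if the growth factors are taken to be fully depleted during the unstimulated gap ($n_{G_A} = n_{G_B} = 0$), Eq. (5) reduces to $d\tilde n/dt = -f \tilde n$ for both totals, so a fraction $e^{-fT}$ of the population survives a gap of length $T$. The function name `retained_fraction` is hypothetical.

```python
import math


def retained_fraction(f, gap):
    """Fraction of the A, B and A-B population left after an unstimulated gap.

    Assumes n_GA = n_GB = 0 throughout the gap, so that Eq. (5) reduces to
    d(n_tot)/dt = -f * n_tot.
    """
    return math.exp(-f * gap)


if __name__ == "__main__":
    for f in (0.05, 0.1, 0.2):
        print(f"f = {f}: fraction surviving a gap T = 30 is {retained_fraction(f, 30.0):.4f}")
```

For the values used in Fig. 1 ($f = 0.1$, $T = 30$), this estimate leaves roughly 5% of the population, and hence of the A–B conjugates that carry the association, at the moment $G_A$ is reintroduced.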

2.3. Possible in vitro experimental tests

The simplest biochemical networks can be constructed from RNA (Kim and Joyce, 2004), DNA (Sievers and von Kiedrowski, 1994; Sievers and von Kiedrowski, 1998) or peptide (Ashkenasy et al., 2004; Lee et al., 1997b; Yao et al., 1998) molecules that exploit non-enzymatic replication as recurring elements (Lee et al., 1997a; Paul and Joyce, 2004). The most relevant systems for the search of associative learning might consist of the ribozymes (RNA ‘enzymes’), which can catalyze various different chemical reactions, including their own replication (Cech, 2002). Since their discovery in the 1980s, natural ribozymes were found to play different roles in regulating cell processes. It should be noted that auto-catalytic RNA molecules have not yet been observed in cells, but their functionality was realized in several in vitro systems (Chen et al., 2007; Paul and Joyce, 2002). This discovery was used to support the postulated role of RNA in the formation and evolution of the pre-cellular RNA world (Chen et al., 2007).

Fig. 2. Associative learning process within a simple biochemical network. The possibility of the suggested system to actually perform the associative learning is high, since all the aspects of this process have been demonstrated with similar chemical entities. Molecules A and B represent either DNA, RNA, or polypeptides. The reversible chemical reactions in a through d show the kinetic steps of the process under study as explained in the text.

It was shown recently that modules of RNA, DNA or peptide networks can perform (simple) in vitro computational algorithms such as reciprocal replication (Ashkenasy et al., 2004; Kim and Joyce, 2004; Sievers and von Kiedrowski, 1998) and Boolean logic functions (Ashkenasy and Ghadiri, 2004).

The molecular setup described below is customized to include the different steps of the associative learning algorithm. As discussed further in the text, the chemostat model, and accordingly its in vitro implementation, is somewhat different from a plausible scenario within the cells’ transcription machinery. Nevertheless, these studies provide valuable insight into the motif’s kinetic characteristics. The likelihood that the suggested system can actually perform associative learning is high, since all the aspects of this process have been demonstrated with similar chemical entities (see references below). Furthermore, we construct this model in such a way that its functionality does not depend on a specific species (e.g. an enzyme) that would have had to emerge, but rather on general biochemical functions that have been observed in different RNA and protein systems. In the near future we intend to prepare the building blocks and characterize such systems experimentally.

In our experimental setup, the self-replication process of A and B molecules requires two steps (steps a and b in Fig. 2; shown only for A molecules). The first is a reversible initiation step by which the replicating molecule is activated by an external trigger. It is shown schematically as a cleavage process, equivalent to enzyme activation through dephosphorylation or photocleavage, but can represent in a more general sense any molecular-activating conformational changes. The actual system will include a photo-sensitive dye attached to the replicating protein (or DNA) (Mayer and Heckel, 2006). This side chain group will be removed upon shining monochromatic light to facilitate protein association and to consequently initiate replication. In the second step, the activated molecules A* and B* serve as templates for association and ligation of their own fragments (e.g. A1 and A2 that produce A) (Paul and Joyce, 2004).

Molecules A and B can then bind to each other to form the long polymer (step c). To account for reversible binding between A and B we consider backbone modification, usually manifested in nucleic acids or peptides through ester or thioester formation (Deechongkit et al., 2004). The replication of the A–B polymer can be triggered through activation of molecule A alone if we consider a scenario in which the template–product complex (A*–A) can serve as a template for the formation of B molecules, a process that will lead to a high local concentration of A and B and, as a result, to the formation of more A–B molecules. Molecular replication processes that rely on high oligomerization states (i.e. trimers or tetramers) have been observed both in the DNA- and peptide-based systems (Ashkenasy et al., 2004; Yao et al., 1998), suggesting that the postulated step is plausible.

The kinetic experiment will be performed by placing together molecules A and B and their precursors A1, A2 and B1, B2, respectively, and initiating replication through introduction of external triggers (shining light). After a time that corresponds to significant formation of additional A and B molecules through self-replication, and probably also through reciprocal replication (step d of Fig. 2), the external triggering will be turned off and the system will be allowed to equilibrate. Since all the processes are reversible, it is expected that after some time the system will reach equilibrium with a smaller, but nonzero, amount of the full length molecules A, B and A–B present. Applying the trigger that activates only the A molecules will start a cascade reaction that will increase the amount of A itself, but more importantly that of the A–B and thus also of the B molecules. As explained above, the requirements for replication of B in the absence of its own trigger, which we have called associative memory or learning, are the existence of some (even small) amount of B molecules in the mixture and, no less important, the ability of A to activate B replication as a consequence of the formation of A–B molecules.

3. Discussion

3.1. How associative memory and learning speed up the rate of adaptation

Associative memory and learning can drastically speed up rates of adaptation by triggering a system to use the solution of one problem to solve a new, but similar problem. To understand this, consider first the example of how someone who knows how to surf might find it easier to learn how to snowboard than someone starting for the first time.

The person who knows how to surf needed to learn how to surf at one time. The process of surfing has several aspects, or properties, each of which may be regarded as separate inputs into the brain: (1) the shape of the board; (2) the body position when standing on the board; (3) the body positions one must assume to maintain stability while surfing a wave; and (4) the speed associated with surfing.

These separate but simultaneous inputs become associated with each other in the brain, so that stimulation of one of the memories generated by these inputs may stimulate the others. So, it is likely that a surfer who starts learning how to snowboard will have the memory of the surfboard triggered by the sight of the snowboard. This in turn will trigger the memories of the various body positions and movements associated with surfing, which may then form the basis for learning the body positions and movements for snowboarding.

3.2. Biochemical implementations of associative learning and implications for genomic evolution

The chemostat model described in this paper relies on autocatalytically replicating molecules. For this kind of model to be directly applicable to the RNA biochemistry inside living cells, we would have to assume that there exist autocatalytically replicating RNA molecules inside living cells. Given that the current paradigm for the emergence of cellular life holds that cells emerged from a pre-cellular “RNA World”, and that autocatalytic RNAs have been studied in vitro (see above), the existence of autocatalytically replicating intra-cellular RNAs cannot be ruled out.

Nevertheless, associative learning in a eukaryotic cell could occur via a slightly different mechanism than the simple chemostat model described in this paper. In a eukaryotic cell, two distinct transcription factors could lead to transcription of two distinct genes, resulting in a steady production of two distinct sets of RNAs (these RNAs could be mRNAs or, in the case of the non-coding regions, these could be various RNAs that are never translated into protein, such as siRNA). If some of these RNAs were to covalently bind, and if these longer RNAs were then reverse transcribed into the DNA genome, the result would be a new DNA sequence that corresponds to the two original genes in succession. If either of the original transcription factors could then activate this new gene set, then the end result would exactly be a DNA/RNA-based implementation of associative learning. This scheme is illustrated in Fig. 3.

The argument presented here of course assumes that reverse transcriptase is active in eukaryotic cells. In support of this claim, it is believed that gene duplication in eukaryotes occurs primarily through a mechanism known as retrotransposition, whereby the gene first goes through an RNA intermediate that is then reverse transcribed into the DNA genome. Recent work and speculation has also suggested that much genomic change in eukaryotes first occurs in the RNA population of the cell, and only then is later reverse transcribed into the genome (Herbert and Rich, 1999). The key role that the RNA population inside eukaryotic cells is hypothesized to play in genomic evolution has led to the term ribotype, as a way of characterizing the distribution of RNA sequence types (Herbert and Rich, 1999).

3.3. Finding evidence of associative learning in actual genomes

The strongest evidence for associative learning in biochemical systems would be its actual discovery in real organisms. Therefore, an essential line of research would be to look for certain genes or genome regions in eukaryotic cells that appear to be linkages of other, smaller genes or genome regions.

One possible example of associative learning processes in genomes is the existence of polycistronic RNA in prokaryotic organisms. Because proteins generally function as part of interconnected biochemical networks, when one protein needs to be produced by a cell, generally several other proteins need to be produced as well.

Polycistronic RNA in bacteria is an mRNA transcript that encodes for several proteins. Basically, when one gene encoding for part of a biochemical network is transcribed, the other genes coding for the remainder of the network are transcribed as well.

Although polycistronic RNA occurs in prokaryotes, and not eukaryotes, which are the focus of this paper, it is believed that prokaryotes and eukaryotes evolved simultaneously from the archaebacteria, whose genome organization is more similar to that of the eukaryotes than that of the prokaryotes. Therefore, it is possible that polycistronic RNA evolved from the linking and then reverse transcription of various mRNA transcripts in archaebacterial cells. These mRNA transcripts were present at the same time because they all encoded for essential pieces of some biochemical network, and so their production was triggered by the presence of several transcription factors. If these mRNA transcripts then became covalently linked, and were then reverse transcribed back into the archaebacterial genome, the result would be a sequence of genes encoding for a polycistronic mRNA transcript.

Fig. 3. Implementation of associative learning in a eukaryotic cell. Two RNA transcripts from two distinct genes become covalently linked. Reverse transcription then results in two distinct genes becoming linked in the genome.

Within this line of research, mathematical modeling could be used to determine whether or not associative learning is necessary to reconcile the observed time scales for the emergence of polycistronic RNA with actual rates of gene transcription and duplication. That is, can natural selection alone select for genomes with linked genes on the observed time scales, or is associative learning necessary to ensure a sufficiently large seed population of organisms with polycistronic genes?

The following simplified example illustrates the kinds of mathematical models we plan to develop: Consider a resource, denoted R, that is metabolized to some final product P in two steps.

In the first step, the resource R is converted to some intermediate I, a process that is catalyzed by some enzyme, denoted E_RI. In the second step, the intermediate I is converted into the final product P, a process that is catalyzed by some enzyme, denoted E_IP.

Now, we assume that the genes encoding for E_RI and E_IP, denoted G_RI and G_IP, respectively, are only transcribed when necessary. Therefore, we assume that R acts as a transcription factor for G_RI, and I acts as a transcription factor for G_IP.

If R enters a cell where G_RI and G_IP are not linked and are transcribed independently, then the mRNA transcript for G_RI, denoted mRNA_RI, will initially be produced. This leads to the production of E_RI, which leads to the conversion of R into I, and an increase in the concentration of I in the cell. This in turn triggers the transcription of G_IP to form mRNA_IP.

If the characteristic lifetime of the mRNAs is sufficiently long, then there will be a period during which both mRNA_RI and mRNA_IP will be simultaneously present in significant concentrations in the cell. Based on the associative learning mechanism discussed in the previous subsection, this can then lead to the formation of a polycistronic gene that encodes for both E_RI and E_IP, and whose transcription can be induced by R alone.

A specific polycistronic gene whose evolution we wish to study, via both mathematical models and bioinformatics studies, is the lac operon in Escherichia coli. The reason for this is that the lac operon is one of the best studied polycistronic genes in biology, and so it is probably the best system to initially study to determine whether associative learning is indeed responsible for the emergence of polycistronic genes.
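As a rough illustration of the two-step R → I → P example above, the following Python sketch integrates a toy version of the pathway and measures how long the two mRNA transcripts are simultaneously present. It is only a sketch under stated assumptions: the rate laws (saturating induction of transcription, first-order decay, mass-action catalysis), all rate constants, the concentration threshold, and the function name simulate_two_step are hypothetical choices made for illustration and do not come from the model developed in this paper.

```python
def simulate_two_step(r0=1.0, t_end=50.0, dt=0.01, threshold=0.1):
    """Forward-Euler integration of a toy R -> I -> P pathway.

    R induces transcription of the first gene and I induces the second.
    All rate laws and constants are hypothetical (arbitrary units).
    """
    k_tx, K = 1.0, 0.1      # max transcription rate, half-saturation of inducer
    d_m, d_e = 0.5, 0.1     # decay rates of mRNA and enzyme
    k_tl, k_cat = 1.0, 1.0  # translation rate, catalytic rate constant

    R, I, P = r0, 0.0, 0.0           # resource, intermediate, product
    m_ri = m_ip = 0.0                # mRNAs for the two enzymes
    e_ri = e_ip = 0.0                # enzymes catalyzing R->I and I->P
    first_ri = first_ip = None       # first times each mRNA exceeds threshold
    overlap = 0.0                    # total time both mRNAs exceed threshold

    for step in range(int(round(t_end / dt))):
        t = step * dt
        # Record threshold crossings and the overlap window for the current state.
        if first_ri is None and m_ri >= threshold:
            first_ri = t
        if first_ip is None and m_ip >= threshold:
            first_ip = t
        if m_ri >= threshold and m_ip >= threshold:
            overlap += dt

        # Reaction fluxes for the two catalyzed steps.
        flux_ri = k_cat * e_ri * R
        flux_ip = k_cat * e_ip * I

        # Time derivatives of mRNA and enzyme concentrations.
        dm_ri = k_tx * R / (K + R) - d_m * m_ri
        dm_ip = k_tx * I / (K + I) - d_m * m_ip
        de_ri = k_tl * m_ri - d_e * e_ri
        de_ip = k_tl * m_ip - d_e * e_ip

        # Euler update; R + I + P is conserved by construction.
        R += -flux_ri * dt
        I += (flux_ri - flux_ip) * dt
        P += flux_ip * dt
        m_ri += dm_ri * dt
        m_ip += dm_ip * dt
        e_ri += de_ri * dt
        e_ip += de_ip * dt

    return {"first_ri": first_ri, "first_ip": first_ip,
            "overlap": overlap, "final": (R, I, P)}


if __name__ == "__main__":
    result = simulate_two_step()
    print("first mRNA above threshold at t =", result["first_ri"])
    print("second mRNA above threshold at t =", result["first_ip"])
    print("time both mRNAs are above threshold:", result["overlap"])
```

With these assumed parameters, the transcript of the first gene appears before that of the second, and the window during which both are present lengthens as the assumed mRNA decay rate d_m is reduced. This is the qualitative condition that the associative mechanism described above would require.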

3.4. How can learning provide a selective advantage at the cellular level?

The existence of associative learning in living cells, and in particular the discovery that associative learning may play a role in genomic evolution, would have a profound impact on our view and understanding of the evolution and biochemistry of living systems. Since natural selection only permits the existence of replicative strategies that confer a survival advantage (almost by definition), it only makes sense to consider learning mechanisms at the intra-cellular level if such mechanisms can confer a significant survival advantage to a cellular organism.

The fact that paramecia and other single-celled organisms (such as individual neurons) can exhibit classical conditioning behavior is clear evidence that associative learning provides a survival advantage already at the cellular level. It also suggests that there exist intra-cellular processes that define biochemical implementations of associative learning. From here, it is only a short step to consider intra-cellular associative learning mechanisms that could also be used to modify organismal genomes, and thereby contribute to organismal evolution (after all, the organismal genome is ultimately a large molecule that may be chemically modified just like any other molecule present in the cell).

We can use the speculations in the previous subsection regarding polycistronic RNA as an example for how associative learning can lead to genomic modifications that can provide a survival advantage to a free-living cell: Polycistronic RNA can provide a selective advantage to a cell by greatly speeding up the rate at which essential proteins are produced. Instead of separately producing the individual components of some biochemical network, polycistronic RNA allows all the components to be produced at the same time. This leads to a much more efficient protein-manufacturing system, which can lead to faster response times to changing environmental conditions, and hence greater adaptability and survivability.

Associative learning, by providing a mechanism for a cell to non-randomly link the various genes responsible for creating an entire biochemical network, can allow a free-living cell to rapidly streamline its genome and thereby increase its chances of survival in a dynamic environment.

In general, we believe that many of the learning motifs exhibited by complex structures in the brain will be found already at the biochemical level inside living cells. Therefore, we believe that a proper understanding of the computational motifs at work in cellular biochemistry will provide a basis for understanding the kinds of computational motifs implemented by neural pathways in the brain.

As an example illustrating why we believe this to be the case, there is now a line of research exploring the possibility that the neuronal action-potential is the medium by which neurons communicate with each other, via a neural language called “the neural code” (Rieke et al., 1997). If correct, then the neuronal action-potential is analogous to the sound vibrations produced by human vocal cords, which provide the medium for inter-human communication via language.

Given that language requires the existence of certain neural pathways that implement the necessary computational motifs associated with language, it is reasonable to assume that the neural code requires the existence of analogous biochemical pathways at work inside individual neurons. If this indeed proves to be the case, then it suggests a broader, deeper principle at work: namely, that certain sets of behaviors require certain kinds of computational motifs to implement them, and systems that share a given set of behaviors will share the corresponding computational motifs. As a result, understanding the underlying motifs at work in one system will provide insight into the kinds of motifs that may be found in the other system.

4. Concluding remarks

This paper presented a simple, “toy” model illustrating how associative learning could occur in a biochemical network. We discussed how this process could be implemented in eukaryotic cells, and how it could lead to genomic evolution.

The model presented in this paper only shows that associative learning is in principle possible inside living cells. Experiments and bioinformatics studies will be needed to confirm (or disprove) the existence of associative processes at work inside living cells.

We should point out that the concept of associative processes in polypeptides or polynucleotides has been considered before. In 1995, Eric B. Baum developed a scheme for using double-stranded DNA to construct a large associative memory (Baum, 1995). However, the associative memory considered by Baum was for the purpose of retrieving information in a database encoded in the DNA chain. The idea is that incomplete information, in the form of short strands of DNA, could attach to corresponding subsequences along the DNA, and thereby retrieve all possibly relevant database items (Baum, 1995).

Furthermore, we should also mention once again that associative learning is known to occur in living cells, such as paramecia and neurons (Hennessey et al., 1979; Byrne, 1987; Walters and Byrne, 1983).

This paper differs from previous work on associative processes in two important ways: First, this paper considers an associative process generated by stimulating a system with two simultaneous inputs. The association of the two inputs is not “hard-wired” as in the model considered by Baum, but rather is learned (a DNA-based memory scheme closer to the model considered in this paper may be found in Chen et al., 2005).

Second, this paper considers associative learning as a possible mechanism for genomic evolution. Previous work on associative learning in single-celled organisms takes the view that associative learning is part of the genetically encoded (i.e. “hard-wired”) collection of behaviors that an organism can engage in. In this view, associative learning is a form of adaptive behavior, and not a mechanism for genomic evolution.

All three of these associative processes may be at work inside eukaryotic cells. Nevertheless, we conjecture that, if associative processes are indeed a major mechanism for macro-evolutionary change, then it is likely via a mechanism closer to the one outlined here. This speculation is derived from the observation that species evolve in response to external selection pressures. Therefore, if associative processes are indeed relevant to macroevolution, then it appears more likely to be a form of associative learning, whereby external inputs drive the associations produced, and whereby the associations produced can then become subsequently encoded into the genome.

If associative processes analogous to the one presented here are found to occur in living cells, then our hypothesis that there are likely generic features in agent-built systems may indeed be correct. Certainly, the existence of associative processes inside living cells would suggest the need for further studies to elucidate other computational structures at work inside living cells (Landweber et al., 2000), and the possible role these computational structures play in genomic evolution.

Acknowledgments

This research was performed while E.T. was an Assistant Professor in the School of Biology at the Georgia Institute of Technology. The authors thank Mark Borodovsky (Georgia Tech) for helpful conversations regarding this work. G.A. thanks the Human Frontier Science Program for a Career Development Award.

References

Ashkenasy, G., Ghadiri, M.R., 2004. Boolean logic functions of a synthetic peptide network. J. Am. Chem. Soc. 126, 11140.

Ashkenasy, G., Jagasia, R., Yadav, M., Ghadiri, M.R., 2004. Design of a directed molecular network. Proc. Natl. Acad. Sci. USA 101, 10872–10877.

Ashton, H., Perry, E.K., Young, A.H., 2002. Neurochemistry of Consciousness. John Benjamins Publishing Company, Philadelphia.

Baum, E.B., 1995. Building an associative memory vastly larger than the brain. Science 268, 583–585.

Byrne, J.H., 1987. Cellular analysis of associative learning. Physiol. Rev. 67, 329–439.

Cech, T.R., 2002. Ribozymes, the first 20 years. Biochem. Soc. Trans. 30, 1162.

Chen, J., Deaton, R., Wang, Y.Z., 2005. A DNA-based memory with in vitro learning and associative recall. Nat. Comput. 4, 83.

Chen, X., Li, N., Ellington, A.D., 2007. Ribozyme catalysis of metabolism in the RNA world. Chem. Biodivers. 4, 633.

Claverie, J.-M., 2005. Fewer genes, more noncoding RNA. Science 309, 1529–1530.

Costa, F.F., 2005. Non-coding RNAs: new players in eukaryotic biology. Gene 357, 83–94.

Deechongkit, S., Nguyen, H., Powers, E.T., Dawson, P.E., Gruebele, M., Kelly, J.W., 2004. Context-dependent contributions of backbone hydrogen bonding to β-sheet folding energetics. Nature (London) 430, 6995.

Dennis, P.P., Omer, A., 2005. Small non-coding RNAs in Archaea. Curr. Opin. Microbiol. 8, 685–694.

Green, R., Doudna, J.A., 2006. RNAs regulate biology. ACS Chem. Biol. 1, 335–338.

Hennessey, T.M., Rucker, W.B., McDiarmid, C.G., 1979. Classical conditioning in paramecia. Anim. Learn. Behav. 7, 417.

Herbert, A., 2004. The four Rs of RNA-directed evolution. Nat. Genet. 36, 19–25.

Herbert, A., Rich, A., 1999. RNA processing and the evolution of eukaryotes. Nat. Genet. 21, 265–269.

Kim, D.-E., Joyce, G.F., 2004. Cross-catalytic replication of an RNA ligase ribozyme. Chem. Biol. 11, 1505–1512.

Kohonen, T., 1989. Self-organization and Associative Memory. Springer, New York, NY.

Laaberki, M.H., Repoila, F., 2003. Non-coding RNAs, another class of regulatory molecules. Recent Res. Dev. Mol. Biol. 1, 119–143.

Landweber, L.F., Kuo, T.C., Curtis, E.A., 2000. Proc. Natl. Acad. Sci. USA 97, 3298.

Lee, D.H., Severin, K., Ghadiri, M.R., 1997a. Autocatalytic networks: the transition from molecular self-replication to molecular ecosystems. Curr. Opin. Chem. Biol. 1, 491–496.

Lee, D.H., Severin, K., Yokobayashi, Y., Ghadiri, M.R., 1997b. Emergence of symbiosis in peptide self-replication through a hypercyclic network. Nature 390, 591–594.

Mackintosh, N.J., 1983. Conditioning and Associative Learning. Oxford University Press, New York, NY.

Mattick, J.S., 2005. The functional genomics of noncoding RNA. Science 309, 1527–1528.

Mattick, J.S., Makunin, I.V., 2006. Non-coding RNA. Hum. Mol. Genet. 15, R17–R29.

Mayer, G., Heckel, A., 2006. Biologically active molecules with a “Light switch”. Angew. Chem. Int. Ed. 45, 4900–4921.

Moulton, V., 2005. Tracking down noncoding RNAs. Proc. Natl. Acad. Sci. USA 102, 2269–2270.

Paul, N., Joyce, G.F., 2002. A self-replicating ligase ribozyme. Proc. Natl. Acad. Sci. USA 99, 12733.

Paul, N., Joyce, G.F., 2004. Minimal self-replicating systems. Curr. Opin. Chem. Biol. 8, 634–639.

Phattanasri, P., Chiel, H.J., Beer, R.D., submitted for publication. The dynamics of associative learning in evolved model circuits.

Plasterk, R.H.A., 2006. Micro RNAs in animal development. Cell 124, 877–881.

Rieke, F., Bialek, W., Warland, D., van Steveninck, R.d.R., 1997. Spikes: Exploring the Neural Code. The MIT Press.

Sievers, D., von Kiedrowski, G., 1994. Self-replication of complementary nucleotide-based oligomers. Nature 369, 221–224.

Sievers, D., von Kiedrowski, G., 1998. Self-replication of hexadeoxynucleotide analogs: autocatalysis versus cross-catalysis. Chem. Eur. J. 4, 629–641.

Taft, R.J., Pheasant, M., Mattick, J.S., 2007. The relationship between non-protein-coding DNA and eukaryotic complexity. Bioessays 29, 288–299.

Tannenbaum, E., 2006. An RNA-centered view of eukaryotic cells. BioSystems 84, 217–224.

Vapnik, V.N., 1998. Statistical Learning Theory. Wiley, New York, NY.

Walters, E.T., Byrne, J.H., 1983. Associative conditioning of single sensory neurons suggests a cellular mechanism for learning. Science 219, 405–408.

Wassarman, K.M., 2004. RNA regulators of transcription. Nat. Struct. Mol. Biol. 11, 803–804.

Yao, S., Ghosh, I., Zutshi, R., Chmielewski, J., 1998. Selective amplification by auto- and cross-catalysis in a replicating peptide system. Nature 396, 447–450.