adaptive behavior - users.sussex.ac.ukusers.sussex.ac.uk/~ezequiel/stdp-robots.pdf · additional...

http://adb.sagepub.comAdaptive Behavior

2002; 10; 243 Adaptive BehaviorEzequiel Di Paolo

Spike-Timing Dependent Plasticity for Evolved Robots

http://adb.sagepub.com/cgi/content/abstract/10/3-4/243 The online version of this article can be found at:

Published by:

http://www.sagepublications.com

On behalf of:

International Society of Adaptive Behavior

can be found at:Adaptive Behavior Additional services and information for

http://adb.sagepub.com/cgi/alerts Email Alerts:

http://adb.sagepub.com/subscriptions Subscriptions:

http://www.sagepub.com/journalsReprints.navReprints:

http://www.sagepub.com/journalsPermissions.navPermissions:

© 2002 International Society of Adaptive Behavior. All rights reserved. Not for commercial use or unauthorized distribution. at University of Sussex Library on March 10, 2008 http://adb.sagepub.comDownloaded from

http://www.isab.org.uk

http://adb.sagepub.com/cgi/alerts

http://adb.sagepub.com/subscriptions

http://www.sagepub.com/journalsReprints.nav

http://www.sagepub.com/journalsPermissions.nav

http://adb.sagepub.com

!"#

$%&'()*&+&,-./(%(,0(,1.23451&6&17.89:.;<93<(0.

=9>915

!"#$%&#'()*(+&(,-.'.

!"#$$%&$'&($)*+,+-.&/*0&($123,+*)&!"+.*".4

,'-/0&1(/2&3&45(4#%6-'(4#07.63/(-6#(/8409#/&"#:(;.6(29.0.0-10&1(6.<.0/(%/&45(#=.'%0&.4-68(0#194&$%#/*

>84-20&1(2'-/0&1&08(-/8??#06&1-''8(:#2#4:/(.4( 09#(26#1&/#( 6#'-0&=#( 0&?&45(<#07##4(26#/84-20&1(-4:

2./0/84-20&1(/2&3#/(-0(09#(?&''&/#1.4:(6-45#(-4:(.4('.45#6@0#6?(-10&=&08@:#2#4:#40(6#5%'-0.68(/1-'@

&45*(A.?2-6-0&=#(/0%:&#/(9-=#(<##4(1-66&#:(.%0(;.6(:&;;#6#40(3&4:/(.;(2'-/0&1(4#%6-'(4#07.63/(7&09('.7

-4:(9&59('#=#'/(.;(4#%6-'(4.&/#*(B4(-''(1-/#/C(09#(#=.'=#:(1.406.''#6/(-6#(9&59'8(6.<%/0(-5-&4/0(&40#64-'

/84-20&1(:#1-8(-4:(.09#6(2#60%6<-0&.4/*(D9#( &?2.60-41#(.;( 09#(26#1&/#( 0&?&45(.;( /2&3#/( &/(:#?.4@

/06-0#:(<8(6-4:.?&"&45(09#(/2&3#(06-&4/*(B4(09#('.7(4#%6-'(4.&/#(/1#4-6&.C(7#&590(=-'%#/(%4:#65.(69809@

?&1( 19-45#/( -0( 09#(?#/./1-'#( :%#( 0.( <%6/0&45C( <%0( :%6&45( 2#6&.:/( .;( 9&59( -10&=&08( 09#8( -6#( ;&4#'8

6#5%'-0#:(-0(09#(?&16./1-'#(<8(/84196.4.%/(.6(#406-&4#:(;&6&45*(>2&3#(06-&4(6-4:.?&"-0&.4(6#/%'0/( &4

'.//(.;(2#6;.6?-41#(&4(09&/(1-/#*(B4(1.406-/0C(&4(09#(9&59(4#%6-'(4.&/#(/1#4-6&.C(6.<.0/(-6#(6.<%/0(0.

'.//(.;(&4;.6?-0&.4(&4(09#(0&?&45(.;(09#(/2&3#(06-&4/C(:#?.4/06-0&45(09#(1.%40#6&40%&0&=#(6#/%'0/(09-0(2'-/@

0&1&08C(79&19(&/(:#2#4:#40(.4(26#1&/#(/2&3#(0&?&45C(1-4(7.63(#=#4(&4(&0/(-</#41#C(26.=&:#:(09#(<#9-=&.@

6-'( /06-0#5&#/(?-3#(%/#(.;( 6.<%/0( '.45#6@0#6?( &4=-6&-40/(.;(/#4/.6&?.0.6( &40#6-10&.4*()(1.?2-6&/.4

7&09(-(6-0#@<-/#:(?.:#'(.;(/84-20&1(2'-/0&1&08(/9.7/(09-0(%4:#6(/&?&'-6'8(4.&/8(1.4:&0&.4/C(-/8??#06&1

/2&3#@0&?&45( :#2#4:#40( 2'-/0&1&08( -19&#=#/( <#00#6( 2#6;.6?-41#( <8( ?#-4/( .;( #;;&1&#40( 6#:%10&.4( &4

7#&590(=-6&-41#(.=#6(0&?#*(,#6;.6?-41#(-'/.(26#/#40/(4#5-0&=#(/#4/&0&=&08(0.(6#:%1#:('#=#'/(.;(4.&/#C

/9.7&45(09-0(6-4:.?(;&6&45(9-/(-(;%410&.4-'(=-'%#*

?(7@9:05 #=.'%0&.4-68(6.<.0&1/(E(/2&3&45(4#%6-'(4#07.63/(E(/2&3#@0&?&45(:#2#4:#40(2'-/0&1&08(E(-10&=@

&08@:#2#4:#40(/84-20&1(/1-'&45(E(4#%6-'(4.&/#(E(6.<%/04#//

A B,1:90C61&9,

!"#$%&$'()*++,-*(%&.)$-)$%&)/&.'0#)-1)*2$-#-3-2.),-4-$.

*'35)*3-#0$)-$%&,) $%'#0.5)*$)+,-6'/'#0)3'#'3*7) '#$&8

0,*$&/)3-/&7.) -1) 4,*'#)3&(%*#'.3.) '#) *#) &34-/'&/

*#/).'$2*$&/)+7*$1-,39):%&,&)'.)*)6*.$)/'.$*#(&)4&$;&&#

$%&.&)3-/&7.)*#/)*($2*7)4,*'#.)*#/)"&$).'3+7&)4'-7-0'8

(*77")'#.+',&/)(-#$,-77&,.)0'6'#0),'.&)$-)*/*+$'6&5)7'1&8

7'<&) ,-4-$) 4&%*6'-,) (*#) $2,#) -2$) $-) 4&) 6&,") 6*72*47&

/&.+'$&5)-,)+&,%*+.)4&(*2.&)-15)$%&',).'3+7'('$"9):%&.&

3-/&7.)3*")(*+$2,&).-3&)'#$&,&.$'#0)*.+&($)-1)4,*'#

-,0*#'=*$'-#)-,5)'#)$%&',)12#($'-#'#05)$%&")3*"),&6&*7

2#2.2*7) -,) 2#&>+&($&/) +,-+&,$'&.) -1) <#-;#) 3&(%*8

#'.3.9)?#)(-#$,*.$5)(-#$&3+-,*,");-,<)'#)(-3+2$*$'-#*7

#&2,-.('&#(&5)'#)*77)'$.).-+%'.$'(*$'-#5)-1$&#)7*(<.).2(%

*);%-7&8*0&#$)/'3&#.'-#9):%&)+,-+&,$'&.)-1).'#07&)#&28

,-#.)'#).+&('1'().24.".$&3.)*,&)$"+'(*77")3-/&7&/)*#/

.$2/'&/)2#/&,)'/&*7'=&/)(-#/'$'-#.)@&9095),*#/-3)'#+2$.5

2#(-,,&7*$&/)-,);'$%)<#-;#)(-,,&7*$'-#.5)&$(9A)*#/)'$)'.

%*,/)$-).&&)%-;)$%&).&#.-,'3-$-,)7--+.)3'0%$)&6&,)4&

B-+",'0%$)C)DEED)?#$&,#*$'-#*7)!-('&$")1-,)F/*+$'6&)G&%*6'-,

@DEEDA5)H-7)IE@JKLAM)DLJKDNJ9

OIEPQ8RIDJ)@DEEDIEA)IEMJKLS)DLJKDNJS)EJJTDDU

($55.42$*0.*".&,$M)V9)F9)W')X*-7-5)!(%--7)-1)B-0#'$'6&)*#/)B-3+2$'#0)

!('&#(&.5)Y#'6&,.'$")-1)!2..&>5)G,'0%$-#)GZI)Q[\5)Y]9)

671/+%M)&=&^2'&7_(-0.9.2.>9*(92<

8/9M)`LL8IDRJ8NRIJDE



244 Adaptive Behavior 10(3–4)

closed, valuable though this work is. Recent studies inevolutionary robotics have aimed at harnessing thepower of automatic evolutionary design to try to crossthe gap between these two modes of research. So far,these studies have been mainly exploratory, drawinginspiration from neuroscience to enrich the buildingblocks used for evolutionary design—but the potentialis there for feeding useful information back to neuro-science. On this issue see a recent review by Ruppin(2002). One example of this kind of research is thework by Husbands and colleagues using gaseous dif-fusion of neuromodulators as part of their evolvedrobot controllers (Husbands, Smith, Jakobi, & O’Shea,1998).

Recent work in evolutionary robotics has began toexplore the use of spike-based neural controllers (Flo-reano & Mattiussi, 2001). Spiking neural networkspossess a number of attractive features. They havecomparatively greater computational power than simi-lar networks of threshold sigmoidal gates (Maass,1997). They can support a variety of functional specif-icity from rate-based codes to structured codes basedon the timing of action potentials (Gerstner, Kreiter,Markram, & Herz, 1997). They can perform novelkinds of computations such as the recognition of tem-poral patterns using transient synchrony (Hopfield &Brody, 2001) and real-time computation without sta-ble states in high-dimensional ‘‘liquids’’ of transientactivity, (Maass, Natschläger, & Markram, 2002). Theircomplexity makes evolutionary robotics an appropri-ate tool of design and exploration.

Here the exploration continues by addressing theevolvability and properties of plastic spiking neural net-works where synaptic plasticity depends on the pre-cise timing of spikes. Parameters regulating plasticityin light-seeking robots are evolved in simulation. It isfound that not only are such controllers evolvable, butthat they also produce a rich variety of behaviors anddesirable properties such as sensorimotor and synapticrobustness. Two series of experiments have been car-ried out, one in which neurons are modeled with lowlevels of noise, and another with significant levels ofneural noise. In both cases, the evolved controllersproduce regular patterns of neural activity. However,despite the precise spike timing needed to activate theplastic rules, only controllers with low neural noiseseem to rely on relative timing information. High neu-ral noise controllers are generally quite robust to jitterand spike train randomization, suggesting counterin-

tuitively that under certain conditions, spike-timingdependent synaptic rules can work very well even whenspike timing is disrupted. However, these controllersretain some unique properties even in the presence ofnoise, as shown by a comparison with rate-basedevolved plastic networks.

There are two kinds of motivations for this work.The first is, as suggested, exploratory. To the best ofour knowledge this is the first attempt to control anintegrated robot using spike-timing dependent plastic-ity in combination with an evolutionary strategy fordesign. It is questionable whether there are any bene-fits in the use of these mechanisms, either in terms ofbetter adaptability or complexity of performance. Wewill try to answer these questions at least partially bythe end of the article. Such motivation runs in parallelwith another, which is less clearly realized in the cur-rent article because of its preliminary nature: to informcomputational neuroscience not with a model that isdetailed in its microstructure but grossly simplified interms of its relevance to whole behaving agents, butthe other way around; that is, a model simple in designwhere sensory stimuli correlate to motor activitythrough environmental coupling in a situated robot.

2 Spike-Timing Dependent Plasticity (STDP)

Experimental neuroscientific evidence suggests thatthe degree and direction of change in the strength of asynapse subjected to repeated pairings of pre- and post-synaptic action potentials depend on their relative tim-ing (Markram, Lubke, Frotscher, & Sakmann, 1997;Bi & Poo, 1998). See Bi and Poo (2001) for a review.Synaptic modification depends on whether the pre-and postsynaptic spikes are separated in time in lessthan a critical window of the order of a few tens of mil-liseconds. In most cases studied, if a presynaptic spikeprecedes the postsynaptic spike the synapse is potenti-ated, whereas the opposite relation leads to depressionof the synapse. This results in a temporally asymmet-ric plasticity rule (Figure 1) that deserves the nameHebbian because of its tendency to strengthen causalcorrelations between spikes. There is empirical evi-dence, however, for non-Hebbian plasticity of thiskind (Abbott & Nelson, 2000; Bi & Poo, 2001). Manytheoretical studies have concerned themselves withthis rule of plasticity and its desirable properties such



Di Paolo Spike-Timing Dependent Plasticity for Evolved Robots 245

as a trend toward inherent stability in weight distribu-tion and neural activity, unlike purely rate-based Heb-bian rules that often require additional constraints(Kempter, Gerstner, & van Hemmen, 1999; Song,Miller, & Abbott, 2000; Rubin, Lee, & Sompolinsky,2001). One possible expression for this rule is

where s = to – ti is the time difference between a post-synaptic and presynaptic spike and A+ and A– are posi-tive constants. Other filters may be used instead of theexponential decay, but this form is particularly suita-ble for implementation in an evolutionary roboticscontext as will be shown in the next section.

One of the key concerns when studying rules forsynaptic plasticity is their regulatory properties. Heb-bian learning on its own leads to runaway processes ofpotentiation and cannot account for the stability of neu-ral function. Additional elements, such as the direc-tional damping of synaptic change (Rubin et al., 2001)or longer-term stabilizing regulation based on postsy-naptic activity (Horn, Levy, & Ruppin, 1998; Turrigiano,1999) may come to the rescue. These can lead tounsaturated distributions of synaptic strengths in thefirst case and to regulated neuronal firing in the sec-ond and will also be investigated in this work.

Although spike-timing dependent plasticity (STDP)is a topic that has drawn much attention recently, most

theoretical studies have concerned themselves with theproperties of the temporally asymmetric plastic rule.There are, however, a few hypotheses about its func-tional role. For instance, Abbott and Blum (1996) showin a general model how firing patterns in a neural array(such as a receptive field), where neurons fire prefera-bly at certain input values in a sequence of inputs, canby means of temporal asymmetry in plasticity lead toprediction of the inputs in a sequence through repeatedpresentation. This is because the synapses betweenneurons that fire in succession are strengthened fromthose that fire first to those that fire later (and aredepressed in the opposite direction). Empirical evi-dence in the experience-dependent change in skew-ness in place fields in the rat hippocampus supportsthe findings of this model (Mehta, Barnes, &McNaughton, 1997; Mehta, Quirk, & Wilson, 2000).Related to this, Yao and Dan (2001) have found thatrepetitive pairing of visual stimuli at two different ori-entations induced a shift in orientation tuning in catcortical neurons depending on the relative timing ofpresentation and compatible with STDP.

Other related functional implications have alsobeen suggested. Rao and Sejnowski (2001) suggestthat STDP could be involved in implementing someform of temporal difference learning (Sutton, 1988)and show this in a model of input spike prediction;and Chechik (2002) has recently compared theoreticalrules of plasticity derived from the principle of infor-mation maximization of relevant input with empiricalrules to conclude that temporal asymmetry can increaseinput information to nearly optimal levels.

This kind of functionality is hard to compare withthe results obtainable from the present work on a sim-ple robotic task, as it is more likely to play a signifi-cant role when sensory surfaces or arrays are includedin the robot model as well as something equivalent toreceptive fields. Because of the constraints inherent inthe number of evolutionary evaluations, such ele-ments are not included in this initial study but will beof central importance in the future.

3 Methods

3.1 Robots and Task

Since we are interested in exploring a novel mecha-nism for robot control, the chosen task is at this stage

!wwmaxA+ s "+⁄–( )exp if s 0>

wmaxA– s "–⁄( )exp– if s 0<#$%$&

=

Figure 1 Time window for spike-timing dependent plas-ticity. The percentage and direction of synaptic change isgiven by time difference between presynaptic (ti) andpostsynaptic (to) spikes.




deliberately simple to facilitate comparisons with alter-native approaches. Simulated robots are evolved toperform phototaxis on a series of light sources. Robotshave circular bodies of radius R0 = 4 with two motorsand two light sensors. The angle between sensors is120° but a small random displacement between –5°and 5° is added at the start of each evaluation. Motorscan drive the robot backward and forward in a two-dimensional unlimited arena.

The neural network consists of six nodes, fullyconnected except for self-connections. Neurons canbe either excitatory or inhibitory and this is set geneti-cally. Trials with larger number of neurons have beencarried out successfully, but not systematically studied.

The whole system is simulated using an Eulerintegration method with a time step of 1 ms (25% ofthe minimum time scale). Robots are run for two inde-pendent evaluations, each consisting on the sequentialpresentation of two distant light sources. Only onesource is presented at a time for a relatively longperiod TS chosen randomly for each source from theinterval [7.5 s, 12.5 s], (each evaluation consists there-fore of an average of 2 ! 104 update cycles). The ini-tial distance between robot and new source israndomly chosen from [60, 80], the angle from [0, 2')and the source intensity from [3,000, 5,000]. Theintensity decays in inverse proportion to the square ofthe distance to the source.

The simulated robots use photoreceptors that areactivated by the light intensity corresponding to theircurrent position if the light source is directly visible(i.e., an angle of acceptance of 180°). This intensity ismultiplied by the sensor gain equal for both sensors(genetically set from range [0.1, 50]) and clipped forvalues beyond a maximum of 20. A spike train is gen-erated using a Poisson process with variable rate (max-imum 200 Hz) by linearly transforming the sensorvalue into the instantaneous firing frequency. The Pois-son spike trains coming from the left and right sensorsare fed into neurons n2 and n3 respectively. Addition-ally, uniform noise is present in the sensors (andmotors) with range 0.2 (before scaling by gains)—thisresults in spikes that fire randomly with very lowprobability when the sensor is not stimulated.

Two motors control the robot wheels. Each motoris controlled by two neurons, one that drives it forwardand the other one backward, using a spike-based leakyintegrator. The left motor is controlled by neurons n0

(forward) and n4 (backward), and the right by neuronsn1 (forward) and n5 (backward).

A population of 30 robots is evolved using a gen-erational genetic algorithm (GA) with truncation selec-tion. In the plastic scenarios described below initialweights are randomly chosen at the start of each eval-uation from the interval [0, wmax ] while the parame-ters for the plasticity windows and scaling constantsare evolved. In the nonplastic scenario, synapticstrengths are encoded genetically. Other geneticallyset parameters include sensor and motor gains, motordecay constant, and whether neurons are inhibitory orexcitatory. All parameters are encoded in a real-valuedgenotype, each gene assuming a value within [0, 1];each parameter is linearly scaled to the correspondingrange of values, except for sensor and motor gains,which are scaled exponentially. Only vector mutation(Beer, 1996) is used with a standard deviation of vec-tor displacement of 0.5 (maximum genotype length is220); genetic boundaries are reflective. Fitness is cal-culated according to

if the current distance to the source d is less than theinitial distance Di , otherwise f = 0. M measures theaverage difference in activity between the motorsdivided by the motor gain:

Near optimal fitness will be obtained by robots approach-ing a source of light rapidly and with minimal inte-grated angular movement.

3.2 Neural Controller

An integrate-and-fire model with reversal is used forthe neural controller. The time evolution of the mem-brane potential V of a neuron is given by

where "m is the membrane time constant (range[10 ms, 40 ms]), Vrest = –70 mV is the rest potential,and the excitatory and inhibitory reversal potentialsare respectively Eex = 0 mV and Ein = –70 mV.

F1 M 2–( )

TS--------------------- f t f;d( 1 d

Di-----–= =

M0.125

TS-------------

ML MR–( )MG

------------------------- td(=

"mdVdt------- Vrest V– gex t( ) Eex V–( ) gin t( ) Ein V–( )+ +=




A noisy threshold value, Vthres , is given by a nor-mal distribution with a genetically set mean value(range [–65 mV, –50 mV]) and a deviation of 1 mV.When the membrane potential reaches this threshold,an action potential is fired and V is reset to Vrest . Anabsolute refractory time of 4 ms prevents the neuronfrom firing another spike within this period.

Every time a spike arrives at neuron j from anexcitatory presynaptic neuron i the excitatory con-ductance of j is increased by the current value of thesynaptic strength (wij (t)): gex(t) ) gex(t) + wij (t). Theinhibitory conductance gin is similarly affected byspikes coming from inhibitory neurons. Conductancesotherwise decay exponentially:

with "ex and "in genetically set for each neuron fromthe range [4 ms, 8 ms].

The current motor value is stored in variablesML,R which is directly translated into the left and rightvelocities respectively.

with "mot genetically set from the range [40 ms,100 ms] and MG from [0.1, 50]. Both motors have thesame value for their gains and decay constants. Thisapproach marks a difference from previous work onthe evolution of spiking controllers that have used aneural rate estimation method for driving the motor(Floreano & Mattiussi, 2001).

3.2.1 STDP The properties of plastic windows (Fig-ure 1) are evolved for each synapse in the neural net-work controller. Following (Song et al., 2000), synapticchange is implemented using two recording functionsper synapse P –(t) and P +(t). Every time a spike arrivesat the synapse the corresponding P +(t) is incrementedby A+, and every time the postsynaptic neuron firesthe corresponding P –(t) is decremented by A–. Other-wise, these functions decay exponentially with timeconstants " – and " +, respectively. P –(t) is used to decreasethe synaptic strength every time the presynaptic neu-ron fires: wij (t) ) wij (t) + wmax P –(t). Analogously,P +(t) is used to increase the synaptic strength every

time the postsynaptic neuron fires: wij (t) ) wij (t) +wmax P +(t). In all cases of the first series of experimentsthe maximum synaptic strength is wmax = 1. This methodfacilitates the computational implementation of STDPby eliminating the need to keep track of spike trains orcalculate other response functions that could be morecostly.

The values for A+ and A–, and " + and " – are genet-ically set per synapse from the ranges [0.0001, 0.05]and [10 ms, 40 ms], respectively. In all the experimentsreported here the plastic windows are Hebbian, that is,spikes arriving before a postsynaptic action potentialalways potentiate a synapse and those arriving afteralways depress it. Experiments relaxing this constraint,that is, allowing anti-Hebbian or purely potentiatingor depressing windows, have also been carried out suc-cessfully but are not reported here.

3.2.2 Activity-Dependent Synaptic Scaling (ADS)Some of the mechanisms used by neurons to regulatetheir firing rate homeostatically are thought to affectall incoming synapses, scaling them up or down inde-pendent of the presynaptic activity (Turrigiano, 1999).If the postsynaptic activity is above a certain target,excitatory synapses are scaled down. Otherwise, theyare scaled up, thus preventing sustained levels of activ-ity that are too high or too low. Following van Rossumet al. (2000) excitatory synapses are modified accord-ing to

where zgoal = 50 Hz and "ads is genetically set from therange [1 s, 10 s]. The firing rate zj of a neuron is esti-mated by a leaky integration of the spike train:

where t( f ) are the times when the neuron emits a spike(the sum runs over all previous spikes) and "z =100 ms.

In real neurons, this is a mechanism that acts overlong time scales (over hundreds to thousands of sec-onds) (Turrigiano, Leslie, Desai, Rutherford, & Nel-son, 1998), but due to computational limitations (thevery long evaluation runs that would be required) thechosen time scale (~ "ads) is faster than this but stillsignificantly slower than the rest of the time scales in

"exdgex

dt---------- gex "in

dgin

dt----------;– gin–= =

"motdML R,

dt--------------- = ML R,–

+ MG * t tforwf( )–( ) * t tback

f( )–( )–+( )

"adsdwij

dt---------- wij zgoal zj–( )=

"z

dzj

dt------- zj– * t t f( )–( )++=




the system. Even though the above mechanism acts onexcitatory synapses, in the current context it has alsobeen applied when the presynaptic neuron is inhibi-tory by multiplying the right-hand side above by –1. Asimilar homeostatic mechanism has been successfullyimplemented in robots capable of adapting to sensori-motor disruptions not previously experienced (DiPaolo, 2000).

3.2.3 Directional Damping Synaptic weights are con-strained within the range [0, 1]. This can be achievedsimply by a stop condition at the boundaries or bymeans of damping factors that vanish as the weightvalue approaches a boundary. The choice can haveimportant consequences. No damping leads to a bimo-dal distribution of weights under random stimulation(Song et al., 2000), where most weights adopt theminimum or maximum values in the range, but fewvalues in between. This also happens with purely posi-tional damping, that is, factors that slow down weightchange near the boundaries but only depend on thecurrent weight value. A biologically plausible alterna-tive is directional damping whereby if a weight valueis near a boundary, changes that push this value towardthe boundary are slowed down, but changes that pushit away from the boundary are not. The equilibriumweight distribution in this case tends to be unimodaland centered around the point where potentiation anddepression equilibrate (Rubin et al., 2001). Directionaldamping is supported empirically by the observationthat spike-driven potentiation is more pronounced thanthe expected linear variation at synapses of relativelylow initial strength in cultured hippocampal cells (Bi& Poo, 1998). It was also observed that the mean frac-tional negative change was constant over a wide rangeof initial weights, corresponding to the linear dampingfactor for absolute depression equal to the currentweight value.

Linear directional, or multiplicative damping issimply implemented by transforming a weight change[as resulting from STDP or activity-dependent synap-tic scaling (ADS) or both]: !wij ) (1 – wij )!wij if !wij

> 0 and !wij ) wij !wij if !wij < 0 for wij , [0,1].

3.2.4 Neural Noise Different sources of neural noisehave been modeled. At any given time, Gaussian noisewith zero mean and 1 mV deviation is applied to the

value of the firing threshold. This is the only source ofneural noise in the first set of experiments. Addition-ally, for the second set, the refractory period is ran-domly set every time step using a uniform distribution([2 ms, 4 ms] for cases of short refractory period, [4 ms,8 ms] for long refractory period). Background noise ismodeled as an incoming Poisson train to every neuronwith a frequency of 10 Hz, and spontaneous firing hasalso been modeled using a baseline 10 Hz Poisson proc-ess for each neuron, but subject to refraction.

3.2.5 Synaptic Decay To test the robustness of theevolved controllers to perturbations in their internalconfiguration, synaptic weights are allowed to decayexponentially to 0 with a time constant that can be asfast as 100 ms. Synaptic decay is not affected by direc-tional damping and is not applied during evolution butonly during behavioral tests.

3.2.6 Poisson Filters and Randomized Delays Totest the reliability of the evolved controllers on theprecise timing of spikes, the simple expedient of fil-tering the output of a neuron with a Poisson processemitting random spikes at the same instantaneous ratehas been used. Information about firing rate is con-served, but precise spike timing is disrupted. Becausethe rate z is estimated using only previous spikes, it isonly possible to approximate the instantaneous firingrate of the neuron in this manner. It is expected, how-ever, that if controllers rely heavily on firing rates, thedisruption in performance should not be too strong. Amore sophisticated method consists of introducing arti-ficial random delays in the firing time of single ormultiple neurons. This is done by keeping a short sub-train corresponding to the last T ms of activity andswapping the current fire state of a neuron with a ran-domly selected state in the sub-train, thus conservingthe number of spikes. This is similar in objective totests in vivo on honeybee odor discrimination demon-strating the role of synchronized neural assemblies(Stopfer, Bhagavan, Smith, & Laurent, 1997). Thesetests are applied only after evolution.

3.2.7 CTRNN Control runs using rate-based continu-ous-time, recurrent neural networks (CTRNN; Beer,1990) have been performed. These are defined by




where Vi represents the membrane potential of neu-ron i, "i the decay constant (range [0.4 s, 4 s]), bi thebias (range [–3, 3]), zi the firing rate, wij the strengthof synaptic connection from node i to node j (range[–8, 8]), and Ii the degree of sensory perturbation forsensory nodes. A plastic version of this controllerhas also been used and is described in more detaillater.

4 Results: Low Neural Noise

Robots using spiking controllers were successfullyevolved for four different scenarios: (1) No plasticity(evolution of fixed weights); (2) STDP no damping(evolution of STDP windows without directionaldamping, random initial weights); (3) STDP (evolu-tion of STDP windows with directional damping, ran-dom initial weights); and (4) STDP+ADS (evolutionof STDP windows and ADS with directional damp-ing, random initial weights).

4.1 Evolvability

Five 100-generation independent runs were made foreach of the four scenarios above. It was qualitativelyfound that, contrary to expectations, the more com-plex case (in terms of the dimensions of the searchspace and the additional features of the mechanism),that is, STDP+ADS, was easier to evolve, particu-larly during the initial generations, than the simplercase of no plasticity. This is observed in Figure 2where the mean population fitness, averaged over thefive runs, is plotted for these cases (error bars indi-cate standard deviation). There is little observable dis-tinction between using STDP with or without damping(not shown) but there is a significant differencebetween STDP+ADS and no plasticity. However, thelong-term fitness of the best individual of the lastgeneration is not significantly different between thecases (Figure 3a). These were obtained by runningfor each case the best individual of five runs for 10independent evaluations. For comparison purposes, aCTRNN controller has been evolved and testedunder the same conditions and is also shown in thefigure.

"i

dVi

dt-------- –Vi wjizj Ii zj;+

j

++1

1 Vj bj+( )–[ ]exp+-----------------------------------------------,= =

Figure 2 Mean population fitness averaged over fiveindependent runs for two of the evolutionary scenarios(no plasticity and STDP+ADS).

Figure 3 Fitness and robustness. (a) Average fitness ofthe best individual in the last generation; (b) robustnessagainst synaptic decay; tdecay indicates the speed withwhich weights decay exponentially to 0.




4.2 Synaptic Decay

Plastic controllers maintain their functionality dynam-ically as a consequence of their own activity. To assesstheir reliability a disruptive decay of synapses was intro-duced as described above. Figure 3b shows results forthe three plastic set-ups (again using five independentruns for each and testing the best individual 10 times).It is apparent that STPD+ADS controllers are able toperform quite reliably even for decay times of up to250 ms, whereas controllers using STDP with or with-out damping are unable to maintain their performance.It is possible to explain this as a consequence of thecompensatory nature of the ADS mechanism, which isable to alter synapses as a consequence of longer termchanges in neural activity in ways that tend to main-tain this activity and therefore the functionality of thecontroller.

4.3 Behavioral Strategies

Evolved robots show a rich variety of behavioral strat-egies. Practically all of the observed strategies are active,involving scanning behavior, which is not surprisinggiven that sensors saturate rather easily. Figure 4a showsa trajectory for an STDP+ADS robot. Unlike what iscommonly observed using rate-based neural control-lers, robots are often able to come to a full stop near asource of light (but not facing it)—this can be observedin the inset showing the distance to the light source.During these periods there is very little neural activity,until a random spike from the sensors triggers a roundof activity and the robot starts scanning again (seearrow in Figure 4b, which shows the network activityfor the same controller). In the absence of light, robotsscan their surroundings using this mechanism.

4.4 Timing Dependence

Applying a Poisson filter test to all the neurons in acontroller, whereby rate information is conserved butnot spike timing, results in total loss of fitness (< 1%).This was tested in different controllers in the four sce-narios with the same result. A more detailed studyinvolves the application of random delays to single neu-rons. As explained above, this method conserves thenumber of spikes but randomizes the timing within asub-train of a given size corresponding to the last T msin the simulation. Figure 5 shows how fitness is affected

for the best individual in one STDP+ADS run whenrandom delays are applied to all the neurons and tosingle neurons (similar results were obtained for theother classes). There is a sharp reduction in fitness inthe first case for a randomization of sub-trains as shortas 2 ms, but applying the test to single neurons showsthat the controller is crucially dependent on the pre-cise timing of only three of the six neurons (corre-sponding in this case to one motor and the two sensorneurons, but to different neurons in other cases).

4.5 Synaptic Dynamics and Internal Regulation

Weight values also exhibit rich dynamical behavior.Figure 6a shows three of the synaptic weights affecting

Figure 4 Evolved robot using STDP+ADS. (a) Trajectory,inset: distance to light source, (b) network activity duringa fraction of the trajectory triggered by a single spike. SL:left sensor; SR: right sensor; n0–n5: neurons 0–5.




neuron 0 for the same run corresponding to Figure 4(all other weights behave similarly). The other neuronsshow similar qualitative dynamics at this time scale,consisting of rhythmic periods of activity (during whichthe robot rotates left and right) punctuated by silentperiods (during which the robot gently comes to a fullstop facing away from the light). While neurons arefiring, STDP drives the weights always to a same areawithin the range; the rest of the time ADS takes controland increases the synaptic efficacies so as to compen-sate for the lack of firing activity in the neuron. Thishas the effect of hypersensitizing the whole network,so that even a single spike arriving from the sensors isoften capable of triggering a new round of activity andthe robot starts moving again.

The dynamics of the microstructure are also inter-esting. Figure 6b shows how a particular synaptic strengthis regulated during a period of activity lasting about 0.6 s.Its value is maintained nearly constant. The insets showthe pattern of firing of the pre- and postsynaptic neuronsthat undergoes a process of synchronization that per-sist while at least one sensor is active and finally desyn-chronizes and stops. The plasticity window shown in thefigure indicates that synchronous firing translates intonet potentiation that is compensated for by ADS, result-ing in an equilibrium (just after activity stops, ADSdepresses the weight due to the inertia of the frequency

Figure 5 Fitness effect of randomizing spike sub-trainsfor an STDP+ADS controller. (a) All neurons; (b) individ-ual neurons. Bars indicate maximum and minimum values.

Figure 6 Example of weight dynamics. (a) Three syn-apses affecting node n0 together with its firing pattern;(b) weight regulation (synapse w20) during a period ofnetwork activity and sensor activations (not to scale).Insets: firing patterns of corresponding neurons at theonset of activity round and during highly ordered period(left), evolved plasticity window for this synapse (right).




estimation before resensitizing the network). This reg-ulatory pattern has been observed in all of the casesstudied for the STDP+ ADS class.

The neurons that drive the left motor (n0 and n4)fire at the same frequency, yet it is possible to observethat their phase relation changes nonrandomly so as toproduce a peak in motor output roughly toward themiddle of the activity round. The right motor doessomething similar just after the left motor peaks,although its driving neurons (n1 and n5) are not clearlyentrained (data not shown).

5 Results: Noisy Neurons

A second series of experiments using noisy neuronswere run for the cases of no plasticity, STDP (withdamping) and STDP+ADS (five 400-generation inde-pendent runs each). In addition to small threshold noise,each neuron included a noisy refractory period andeither 10 Hz Poisson baseline firing or 10 Hz Poissonextra input to each neuron.

One undesirable feature of the first set of experi-ments is the high frequencies of firing utilized, reach-ing up to 200 Hz. Although this is unrealistically high,it was not expected that with a very small network allthe evolved properties would be equally plausiblebiologically. There may be, however, a problem inthat the low-noise networks often seem to fire at themaximum possible frequency, that is, a typical inter-spike interval equal to the absolute refractory period.This feature may indeed be undesirable, so the follow-ing modifications were made. In the case of ADS, zgoal

was lowered from 50 Hz to 40 Hz. The range for fir-ing threshold was reduced from [–65 mV, –50 mV] to[–60 mV, –50 mV], the maximum frequency of the sen-sor input trains was reduced from 200 Hz to 100 Hz,and the range of synaptic strengths was reduced from[0, 1] to [0, 0.5] (synaptic scaling was modified accord-ingly). The inhibitory reversal potential Ein was changedfrom –70 mV to –80 mV. The average refractory period(now uniformly distributed) was 6 ms (instead of4 ms) although successful trials were also performedfor shorter average refractory periods. The averageduration of single light source presentation waschanged from 10 s to 20 s and the range of possiblemotor gains from [0.1, 50] to [1, 20]. These modifica-tions succeeded in producing networks with moreplausible frequencies (range 10 Hz to 80 Hz).

5.1 Robustness

Again test were carried out for robustness against syn-aptic decay. Figure 7 shows that these results are notsignificantly altered by the addition of neural noise.Controllers can also cope with sensorimotor distur-bances such as asymmetric modification of sensor andmotor gains (not shown). Networks seem to be a bitless robust against synaptic decay, but still STDP+ADS controllers cope better than STDP-only control-lers (Figure 7b).

5.2 Neural Noise

Figure 8 shows the response of three independentlyevolved individuals for each set to variations in the

Figure 7 Fitness and robustness. (a) Average fitness ofthe best individual in the last generation for noisy neu-rons; (b) robustness against synaptic decay; tdecay indi-cates the speed with which weights decay exponentiallyto 0.




amount of spontaneous background firing. In all thesecases, controllers were evolved with a Poisson back-ground firing for each neuron of 10 Hz (correspondingto the vertical line). Fitness values have been obtainedfor 20 runs per point and normalized for 10 Hz. Theresponse to increased levels of random firing is, asexpected, a decreased level of performance. Some con-trollers are more robust than others but the trend isclear in all of them. Interestingly, decreasing the levelof neural noise also results in worse controllers (againwith different degrees of sensitivity). This implies thatrandom firing is being used functionally. Similar resultswere observed for controllers evolved with a 10 HzPoisson input.

5.3 Spike Timing

The introduction of different sources of neural noisedisrupts the highly regular firing patterns obtained inthe first series of experiments. Still, it is possible toobserve in some cases that during periods of activity,the noisy controllers still seem to fire with some regu-larity. To observe this more clearly, some paired patternsand their covariograms (cross-correlograms correctedfor time-dependent shifts in frequency) are shown inFigure 9 for two STDP+ADS controllers, one of themexhibiting bursting behavior. Covariograms are roughlyproportional to the probability of the two neurons fir-ing after the corresponding shift in time. Around 40 sof data are used in their calculation (see Appendix). Inboth cases it is possible to observe distinct peaks andvalleys in the covariation of the two spike trains forsome values of the time shift (x-axis). The peaks corre-spond to very short shifts (a few milliseconds) in timeeven though there are no peaks corresponding to zeroshift (synchrony). The smooth lines give an idea of theinterval of significance (±-N) given the variance ofeach spike train (see Appendix). Some of the peaksand valleys clearly cross this interval implying that thetiming regularity in the patterns is significant. Still,the question remains whether such regularity has afunctional value or whether it is epiphenomenal.

5.4 Disruption of Spike Timing

To answer the last question we repeated the experimentson disruption of spike trains done in the first series. Onapplication of a Poisson filter, performance in the firstseries dropped to almost, 0%. Figure 10a shows the

Figure 8 Effect of increased and decreased spontane-ous random neural firing on network performance onthree independently evolved individuals for each condi-tion (each line corresponds to one individual robot). (a)No plasticity; (b) STDP; and (c) STDP+ADS. All networkswere evolved with a background random firing of 10 Hz(vertical lines) and performance in normalized at thispoint. Each point is the average of 20 runs.




effect of applying the same filters to noisy controllersfor each case (five independent runs, 20 evaluations,each). It is clear that even though there is a reductionin performance, the effect is not nearly as dramatic asbefore. The case of STDP+ADS is even quite robust.On observation, it was found that the disrupted robotsdid perform phototaxis, only less efficiently.

These results are supported by the randomizationof spike trains (Figure 10b). In sharp contrast to Fig-ure 5 there is a considerable degree of robustness fordisruption corresponding to randomizing the sub-trains of up to 10 ms, and then a slow decay for longersub-trains. The average result for STDP+ADS meansthat up to 60 ms of spike-timing information can beshuffled without significant loss of fitness. This indi-

cates that noisy controllers are very unlikely to beusing any information contained in the precise timingof spikes. Covariograms and sample trains for the samecontrollers and neurons shown in Figure 9 can be seenin Figure 11 corresponding to spike-train randomiza-tion with a sub-train size of 40 ms. No noticeable peakstands out of the estimated interval of significance.

5.5 Analysis of One Strategy

A series of different behavioral strategies have beenobserved. Most of them involved periods of high neu-ronal activity in the whole network, punctuated byperiods of low activity and random background firing.Figure 12 shows one strategy for an STDP bursting

Figure 9 Covariograms and sample activity for two STDP controllers. The controller at the top shows rather constantactivity while the controller at the bottom shows evidence of bursting. In both cases there is a peak in the covariogramsfor a small negative time shift. Careful inspection of the spike trains shows a tendency of one train to fire just after theother. Smooth bands in covariograms show the estimated interval of significance.




controller. The robot performs an unusual approach tothe light source by traveling ‘‘backward’’ and avoid-ing sensor activation by occluding the light with itsown body. As long as the occlusion persists and thesource is within sensor range, the robot will travel inthe general direction of the source. This situation cor-responds to periods of low and random firing and ismarked by the thin segments in the trajectory. Eventu-ally, one of the sensors comes into contact with thelight—no matter which sensor it is, effectively thesame round of neural activity is triggered by this

event. This causes the robot to start moving at a higherspeed in an arc that lasts until light is occluded onceagain (thick segments in the trajectory). The strategyworks by taking advantage of an invariant geometricalrelation: If the robot travels backward (or alternativelyif the sensors are at the back of an occluding volume)then, as long as the periods of sensor activation areminimized, the robot will eventually reach the sourceof light. The robot approaches the light by keeping itssensors oriented within the cone of shadow caused byits own body.

All neurons in this particular case are excitatory(although typically controllers are composed of acombination of neural types) and similar in their prop-erties except neuron n0, corresponding to the leftmotor forward control, which has a longer membranedecay constant "m. As a result, when the all the neu-rons are firing, this neuron is firing at a significantlylower rate, resulting in faster speed on the left side inthe backward direction, and thus the robot is able todescribe the arc that will eventually produce a shadowon both sensors.

The explanation is supported by a simple experi-ment. If the sensors are positioned diametrically opposedand at 90° to the axis of the motors (so that one sensorwill always be active), the robot fails to perform (lessthan 0.1% of the original fitness). However, if oneremoves the sensor pointing in the direction of move-ment and leaves only the other sensor at equal anglesbetween the motors the robot is still able to performphototaxis, although with lower fitness (around 50%)due to the longer time it takes to reach the position ofthe source.

The strategy also explains why performancedegrades when the level of neural noise is reduced(Figure 8). The robot relies for its approach on the lowlevel of noisy activity that keeps it traveling in thedirection of the source while the sensors remain inac-tive. Given the self-correcting nature of the trajectory,it does not matter whether such random activity causesthe robot to deviate a little, as this will only triggerneural activation and a corrective arc trajectory thatwill last until sensor readings drop once again, eventu-ally resulting in a course regulation. It also seems plau-sible now that even the application of Poisson filtersand jitter in the spike trains will not have a very strongeffect on the performance, as shown in Figure 10.

It must be stressed that this is one of many strate-gies found in the second series of experiments. Other

Figure 10 Disruption of spike timing. (a) Effect on per-formance of applying a Poisson filter to the output of eachneuron, (average of 20 evaluations for five independentruns in each case); (b) effect of spike-train randomizationfor all neurons, also averaged over five runs for eachcase. In contrast to Figure 5 noisy controllers do notdegrade in performance when spike timing is altered,despite the nature of the synaptic rules of plasticity.




strategies resemble more traditional forms of photo-taxis with an active oscillatory (but not cycloidal) ele-ment, particularly in the case of STDP+ADS, whererhythmic bursting is often observed.

5.6 Synaptic Dynamics

Figure 13a shows the weight dynamics for one run inan STDP controller. Except for a few slow-changingweights, most of the synapses settle very rapidly into astable value. The distribution of values covers the wholerange. Figure 13b shows for the same run a detail ofthe change in one particular weight together with thepre- and postsynaptic trains. The synaptic strength isquite stable. Both neurons fire with a very similar fre-quency and there is almost a one-to-one correspond-ence in the number of spikes.

Figure 11 Covariograms. Same as in Figure 9 but with randomized spike trains (40 ms).

Figure 12 Robot trajectory for an STDP controller.Thick segments corresponds to periods of high neuralactivity when light impinges on at least one of the sensors.




As in the case with low neural noise, endogenousbursting is very common for STDP+ADS controllers.The corresponding weight dynamics are similar aswell. This is shown in Figure 14a where the burstingof one neuron and one corresponding incoming synap-tic strength are shown. Notice that sensor activation(not shown to scale) does not drive the bursts of activ-ity, but that these are endogenously generated by theoscillatory dynamics created by the ADS balancing.Interestingly, the pattern is altered when the sensor isup and this produces the overall effect of a retardationin the phase of oscillation. Figure 14b shows a detailwith both the pre- and postsynaptic spike trains for thesame synapse.

6 Comparison with Rate-Based Synaptic Plasticity

It is not surprising that if spike timing is perturbed bythe presence of noise, then the controller will relymore heavily on other network properties or sensorim-otor invariants. However, the rules that determine weightchange remain temporally asymmetric and thereforedependent to some extent on spike timing; this is theparadoxical result. To investigate if there is any addi-tional feature granted by the use of STDP in the noisycase, a comparison is made with a rate-based model ofsynaptic plasticity. A CTRNN controller is used inwhich synapses undergo weight change depending onthe pre- and postsynaptic firing rates.

Figure 13 Synaptic dynamics for an STDP controller.(a) Weight value for all synapses for one single run; (b)detail for one synapse together with pre- (top) and post-synaptic trains (bottom). Dashed line indicates period ofsensor activation.

Figure 14 Synaptic dynamics for a bursting STDP+ADS controller. (a) A typical synapse; (b) detail for thesame synapse together with pre- (top) and postsynaptictrains (bottom), dashed line indicates period of sensoractivation.




Designing a fair comparison is not straightforward.Simple Hebbian rules of the form dwij /dt = .ij zizj havebeen unable by themselves to produce a single con-troller that performed well over a dozen attempts.However, extended rules or a combination of them areknown to do so with some ease (Floreano & Urzelai,2000). A general expression for such rules is

where all parameters (Ak, .ij ) are subject to evolution-ary change. Even when weights are initialized ran-domly, if firing rates (zi) have an initial short-rangerandom distribution around their mean value (0.5), theabove parameters can largely determine the initialdirection of weight change, thus facilitating the task ofevolution by providing an innate bias for the develop-ment of the neural network configuration. This is notwhat happens with STDP controllers, as neurons arerandomly initialized as firing or nonfiring at t = 0. Inother words, all the cases of STDP controllers studiedhere start with random weight initial values and ran-dom initial weight derivatives.

Since firing rates in the CTRNN are initializedaround the middle of the range the above conditioncan be achieved by modifying the plastic rule to

The initial direction of weight change in this case isalso random, and evolution must be able to build abootstrapping process whereby neural properties andenvironmental interaction play a stronger role in theshaping of the controller. These are the conditionsunder which STDP controllers are evolved. Dampingfactors are also applied.

Maximum weight derivatives are made of the sameorder (half the weight range can be covered in 1 s ofsimulated activity) and in some trials even faster thanin the STDP case. This condition results in Ak , [–1, 1]and .ij , [–3/s, 3/s]. Noise is also simulated in therate-based model by perturbing neuron rates with theaddition of a uniformly distributed random variable ofrange 0.1. Poisson input noise was simulated by mod-ifying the input currents: Ii ) Ii + with / a nor-mally distributed random variable with unit stardarddeviation and zero mean (Tuckwell, 1988). A non-

noisy scenario was also studied where these condi-tions were not applied; the differences were not signif-icant.

It was found that rate-based controllers evolvedunder the same conditions as the STDP networks sig-nificantly underperformed in the phototaxis task. Fig-ure 15 shows the average fitness of five independentruns both for noisy and non-noisy scenarios comparedwith STDP controllers with neural noise, both undernormal conditions and with spike-train randomization(20 ms).

Inspection of the synaptic dynamics shows thatthe network is too slow to settle into a stable condition.This is shown in Figure 16 where a comparison ismade for two weights representative of the fastest andslowest convergence, for both one rate-based and oneSTDP noisy controller during 10 independent runs.

STDP controllers are rapidly able to define a direc-tion of weight change depending on the relation betweenthe plastic rules and the neural properties. The ran-domness in weight derivative lasts only of a few tensof milliseconds (it cannot be appreciated in the figure)whereas rate-based plastic controllers take much longerto settle into a given range (if they settle at all). Figure17a shows the average reduction in variance acrosstrials for all weights in the two cases corresponding toFigure 16. Even though this variance takes into accountall the weights (and consequently may overestimatethe variance of those with higher functional signifi-

dwij

dt---------- .ij A0 A1zi A2zj A3zizj+ + +( ),=

dwij

dt---------- = .i j A( 1 zi 0.5–( ) A2 zj 0.5–( )+

+ A3 zi 0.5–( ) zj 0.5–( ) )

Ii/

Figure 15 Fitness for rate-based plastic controllers withand without noise in neurons. The fitness for STDP withnoisy neurons is included for comparison as well as thefitness obtained with randomized spike trains (size ofsubtrain 20 ms).




cance) it is clear that the difference, both in transientand longer-term behavior, is significant.

It is also interesting to compare the weight dynam-ics for an STDP controller with neural noise bothwhen functioning normally and when a Poisson filteris applied (total loss of timing information). A squared-difference matrix Dwij (t) = (wij (t) – wpij (t))

2 is recordedand averaged over 10 trials (wpij (t) correspond to theweights for a Poisson-filtered run). Figure 17b showsthis quantity for all the weights and for the averageacross all weights (thick line). It is clear that the long-term behavior of the weight matrix does not dependstrongly on timing information as weights tend toapproach the same value as the Poisson-filtered network.This is interesting, as it suggests that, under noisy con-ditions, evolution is able to find networks that relymore strongly on the timing properties of the neuronsand plastic rules to determine long-term weight distri-bution. The performance of the rate-based plastic con-trollers improves to levels comparable with STDP

controllers with the presentation of more light sourcesper evaluation (more than five instead of the two pre-sented in the STDP case) and if initial weight deriva-tives are permitted to be set genetically.

7 Discussion

Despite the exploratory nature of this work, it is possi-ble to assert that the richness of behavior in plasticspiking neural networks, not often found in other con-trollers, makes them extremely interesting candidatesfor further testing in adaptive behavior research. It isperhaps inevitable that a synthetic design method shouldbe used to approach their complexity in integrated agentswith closed sensorimotor loops. This article has shownthat it can be done successfully for a simple task. Thesuitability of these mechanisms in other scenarios, andultimately their potential for more complex cognitiveperformance, remains to be seen. In particular, it will

Figure 16 Weight change for one STDP noisy controller (left) and one rate-based noisy controller (right). Each figureshows the change in the same synapse for 10 independent evaluations. A fast and a slow changing weight are shown ineach case (top and bottom, respectively).




be of much interest to explore more closely the func-tionality proposed for STDP (Section 2) by using amore appropriate neural architecture. One of the maindisadvantages of the approach is obviously the extracomputational cost involved in the longer evaluations(roughly 200 times the equivalent of CTRNN control-lers). But this cost has passed from being prohibitive a

few years ago to being acceptable nowadays if thebenefits justify it.

The first series of evolved controllers demonstratethe use of some uncommon mechanisms (in a roboticscontext) such as the precise timing information of spiketrains. Neural networks undergo rhythmic periods ofactivity during which pairs of neurons start uncorre-lated, then reach a highly entrained state, and finallylose their entrainment and become inactive. Such peri-ods can be triggered by a sensory stimulus or if theinactivity has lasted long enough, even by single ran-dom spikes coming from the sensors, owing to the excit-ability that builds up thanks to the ADS mechanismthat acts as an adaptive balancer of synaptic input. Dur-ing these periods synaptic strengths are kept nearlyconstant and despite firing in strongly entrained modemotor neurons maintain enough variety to coordinatetheir relative timing and achieve functionally usefulmovement. Single neuron randomization of spike trainshas revealed that not all neurons are crucial in allow-ing the network to make use of timing information.

The second series introduced a more plausible sce-nario with various sources of neural noise. In sharpcontrast to the first series, robot performance for noisycontrollers degraded little or nothing at all on applica-tion of Poisson filters or randomization of spike trains,indicating that despite the nature of the plastic rulesthat drive the controllers in the STDP and STDP+ADS cases, and despite some evidence of regularity inspike times, these controllers do not need to make useof precise timing information to work properly. Thedependence on precise timing in the low noise serieswould indeed seem to fit with the nature of the plasticrules. However, as most direct evidence for STDPoriginates from cell culture studies (Bi & Poo, 2001),it remains questionable how these mechanisms willoperate on a behaving animal—particularly if theyunderlie learning processes that need to operate at a verydifferent timescale, (Mehta, Lee, & Wilson, 2002). Inthe second series, the same plastic rules are at work andyet precise timing seems not to be essential, even thoughit is naturally present to some degree in the unperturbedrobot. In a sense, regularity in spike timing in this caseis epiphenomenal and STDP can function without it.

It is a well-known principle of evolutionary designthat reliable aspects of the performance evaluation maybe taken advantage of by evolution, and that makingthose aspects unreliable produces solutions that arerobust to their variation. This principle guides the min-

Figure 17 Weight dynamics. (a) Reduction in variancefor all weights in the network averaged over 10 evalua-tions for an STDP noisy controller (solid) and a rate-based noisy controller (dashed); (b) squared difference inweights between normal and Poisson-filtered conditionsfor a STDP controller with neural noise. Each line corre-sponds to one synapse and is obtained as the average of10 runs. The thick line is the average for all weights.




imal simulation design strategy (Jakobi, 1997). In thecurrent context, the presence of neural noise may havemade it easier for evolution to find solutions that relyon neural firing rates while ignoring noisy spike tim-ing. This is supported by the evidence in Figure 17b,where the weights of STDP controllers both with andwithout Poisson filters converge to a same distribution.

Although nonplastic rate-based controllers willwork optimally for this task (Beer & Gallagher, 1992),evolving purely plastic rate-based controllers underthe same conditions results in poorer performance thanSTDP controllers due to lack of efficiency in reachinga stable weight distribution. Temporally asymmetricplasticity is much more stable and efficient, even inthe presence of noise, and is able to “develop” a con-troller faster (which is an implicit fitness requirementin this case).

It is hard to draw general implications for morecomplex scenarios from the nonreliance of noisy con-trollers on spike timing. What this exploration doesshow is that it is possible for STDP rules to work in anintegrated sensorimotor system even in the absence ofprecise timing information (which seems paradoxical)while still retaining advantanges over rate-based mod-els evolved under the same conditions. The importantquestion is in what situations, if any, analogous robust-ness may be found in natural systems. To answer this,behavioral studies are needed, but for the moment auton-omous robotics modeling may provide some initialinsights. The next step will be to approach more com-plex tasks where the evolutionary approach has alreadyproved successful: Tasks requiring different levels ofmemory, orientation to stimuli, and selective attentionsuch as visual shape discrimination (Harvey, Husbands,& Cliff, 1994), delayed response (Jakobi, 1997), orBeer’s minimal cognitive systems (Beer, 1996; Slo-cum, Downey, & Beer, 2000) with the addition of sen-sory arrays and neural fields.

Acknowledgements

Thanks to Jianfeng Feng for comments and advice and for firstsuggesting STDP might be worth looking into. Many thanksalso to Eytan Ruppin’s and Gal Chechick’s comments and goodadvice. The review process for this article was handled by theEditor in Chief. The author wishes to acknowledge the supportof the Nuffield Foundation (grant no. NAL/00274/G).

References

Abbott, L. F., & Blum, K. I. (1996). Functional significance oflong-term potentiation for sequence learning and predic-tion. Cerebral Cortex, 6, 406–416.

Abbott, L. F., & Nelson, S. B. (2000). Synaptic plasticity: Tam-ing the beast. Nature Neuroscience, 3, 1178–1183.

Beer, R. D., 1990. Intelligence as adaptive behavior: An exper-iment in computational neuroscience. San Diego: Aca-demic Press.

Beer, R. D. (1996). Toward the evolution of dynamical neuralnetworks for minimally cognitive behavior, In P. Maes, M.J. Mataric, J,-A. Meyer, J. B. Pollack, & S. W. Wilson(Eds.), From animals to animats 4: Proceedings of theFourth International Conference on Simulation of Adap-tive Behavior (pp. 421–429). Cambridge, MA: MIT Press.

Beer, R. D., & Gallagher, J. C. (1992). Evolving dynamical neu-ral networks for adaptive behavior. Adaptive Behavior, 1,91–122.

Bi, G. Q., & Poo, M. M. (1998). Synaptic modifications in cul-tured hippocampal neurons: Dependence on spike timing,synaptic strength, and postsynaptic cell type. Journal ofNeuroscience, 18, 10464–10472.

Bi, G. Q., & Poo, M. M. (2001). Synaptic modifications by cor-related activity: Hebb’s postulated revisited. AnnualReview of Neuroscience, 24, 139–166.

Brody, C. D. (1999). Correlations without synchrony. NeuralComputation, 11, 1537–1551.

Chechik, G. (2002). Spike-timing dependent plasticity and rele-vant mutual information maximization. Manuscript inpreparation.

Di Paolo, E. A. (2000). Homeostatic adaptation to inversion ofthe visual field and other sensorimotor disruptions. In J.-A. Meyer, A. Berthoz, D. Floreano, H. Roitblat, & S. Wil-son (Eds.), From animals to animats 6: Proceedings of the6th International Conference on the Simulation of Adap-tive Behavior Cambridge MA: MIT Press.

Floreano, D., & Mattiussi, C. (2001). Evolution of spiking neu-ral controllers for autonomous vision-based robots. In T.Gomi (Ed.), Evolutionary robotics IV. Berlin: Springer.

Floreano, D., & Urzelai, J. (2000). Evolutionary robots withon-line self-organization and behavioral fitness. NeuralNetworks, 13, 431–443.

Gerstner, W., Kreiter, A. K., Markram, H., & Herz, A. V. M.(1997). Neural codes: Firing rates and beyond. Proceed-ings of the National Academy of Science, USA, 94, 12740–12741.

Harvey, I., Husbands, P., & Cliff, D. (1994). Seeing the light:Artificial evolution, real vision. In D. Cliff, P. Husbands,J.-A. Meyer, & S. Wilson (Eds.), From animals to animats3: Proceedings of the 3rd International Conference on




Simulation of Adaptive Behavior (pp. 392–401). Cam-bridge, MA: MIT Press.

Hopfield, J. J., & Brody, C. D. (2001). What is a moment?Transient synchrony as a collective mechanism for spatio-temporal integration. Proceedings of the National Acad-emy of Science, USA, 98, 1282–1287.

Horn, D., Levy, N., & Ruppin, E. (1998). Memory maintenancevia neuronal regulation. Neural Computation, 10, 1–18.

Husbands, P., Smith, T., Jakobi, N., & O’Shea, M. (1998). Bet-ter living through chemistry: Evolving GasNets for robotcontrol. Connection Science, 10, 185–210.

Jakobi, N. (1997). Evolutionary robotics and the radical enve-lope-of-noise hypothesis. Adaptive Behavior, 6, 325–368.

Kempter, R., Gerstner, W., & Hemmen, J. L. van (1999). Heb-bian learning and spiking neurons. Physical Review E, 59,4498–4514.

Maass, W. (1997). Networks of spiking neurons: The third gen-eration of neural network models. Neural Networks, 10,1656–1671.

Maass, W., Natschläger, T., & Markram, H. (2002). Real-timecomputing without stable states: A new framework forneural computation based on perturbations. Neural Com-putation, 14(11), 2531–2560.

Markram, H., Lubke, J., Frotscher, M., & Sakmann, B. (1997).Regulation of synaptic efficacy by coincidence of postsyn-aptic APs and EPSPs. Science, 275, 213–215.

Mehta, M., Lee, A. K., & Wilson, M. A. (2002). Role of experi-ence and oscillations in transforming a rate code into atemporal code. Nature, 417, 741–746.

Mehta, M. R., Barnes, C. A., & McNaughton, B. L. (1997).Experience-dependent asymmetric expansion of hippoc-ampal place fields. Proceedings of the National Academyof Science, USA, 94, 8918–8921.

Mehta, M. R., Quirk, M. C., & Wilson, M. A. (2000). Experi-ence-dependent asymmetric shape of hippocampal recep-tive fields. Neuron, 25, 707–715.

Rao, R. P. N., & Sejnowski, T. J. A. (2001). Spike-timing-dependent Hebbian plasticity as temporal difference learn-ing. Neural Computation, 13, 2221–2237.

Rossum, M. C. W. van, Bi, G. Q., & Turrigiano, G. G. (2000).Stable Hebbian learning from spike-timing dependentplasticity. Journal of Neuroscience, 20, 8812–8821.

Rubin, J., Lee, D. D., & Sompolinsky, H. (2001). Equilibriumproperties of temporally asymmetric Hebbian plasticity.Physical Review Letters, 86, 364–367.

Ruppin, E. (2002). Evolutionary autonomous agents: A neuro-science perspective. Nature Reviews Neuroscience, 3,132–141.

Slocum, A., Downey, D., & Beer, R. D. (2000). Further experi-ments in the evolution of minimally cognitive behavior:From perceiving affordances to selective attention. In J.

Meyer, A. Berthoz, D. Floreano, H. Roitblat, & S. Wilson(Eds.), From animals to animats 6: Proceedings of the 6thInternational Conference on Simulation of AdaptiveBehavior (pp. 430–439). Cambridge, MA: MIT Press.

Song, S., Miller, K. D., & Abbott, L. F. (2000). CompetitiveHebbian learning through spike-timing-dependent synap-tic plasticity. Nature Neuroscience, 3, 919–926.

Stopfer, M., Bhagavan, S., Smith, B. H., & Laurent, G. (1997).Impaired odour discrimination on desynchronization ofodour-encoding neural assemblies. Nature, 390, 70–74.

Sutton, R. S. (1988). Learning to predict by the method of tem-poral differences. Machine Learning, 3, 220–224.

Tuckwell, H. C. (1988). Introduction to theoretical neurobiol-ogy (Vol. 2). Cambridge: Cambridge University Press.

Turrigiano, G. G. (1999). Homeostatic plasticity in neuronalnetworks: The more things change, the more they stay thesame. Trends in Neurosciences, 22, 221–227.

Turrigiano, G. G., Leslie, K. R., Desai, N. S., Rutherford, L. C.,& Nelson, S. B. (1998). Activity-dependent scaling ofquantal amplitude in neocortical neurons. Nature, 391,892–896.

Yao, H., & Dan, Y. (2001). Stimulus timing-dependent plastic-ity in cortical processing orientation. Neuron, 32, 315–323.

Appendix: Estimation of - 2N

Here we calculate the variance - 2N for a covariogram

of two independent spike trains S1(t) and S2(t) withrespective time-dependent averages R1(t) and R2(t)and variances - 2

1(t) and - 22(t). We follow Brody’s

(1999) development for many identical recorded trialsby adapting it to a single sufficiently long trial.

First we estimate the R1(t) and R2(t) by perform-ing simple sliding-window averages of the spike trainswith windows of size 2td. The corresponding vari-ances are estimated using the same windows. In ourcase td = 20 ms.

The covariogram is defined as

Independence of the trains is the null hypothesis. Theexpectation E(C(")) in this case is zero. To calculatethe corresponding variance - 2

N , we obtain the varianceof each term in the sum, yielding

C "( ) S1 t( ) R1 t( )–( ) S2 t "+( ) R2 t "+( )–( )t 0–=

0+=




where we have used the definition of a variance of atime series x is E(x2) – E(x)2, and the variance of theproduct of two independent series x and y is thereforeE(x2)E(y2) – E(x)2E(y)2 = (- 2

x + R2x )(- 2

y + R2y ) – R2

x R2y .

The bands plotted in Figures 9 and 11 correspondto the ±-N interval for each ". To facilitate compari-sons, all the covariograms have been normalized by afactor that makes the value of the autocovariogram for" = 0 equal to 1—that is, by dividing each series i by

(Si(t) – Ri(t))2. The same factor has been applied

correspondingly to the terms above in the calculationof - 2

N .

About the Author

Ezequiel A. Di Paolo has studied physics and mathematics at the University of BuenosAires. He graduated with an MSc in nuclear engineering in 1994 from the Insituto Bal-seiro (Argentina) and pursued his Ph.D. on behavioural and evolutionary models of socialcoordination at the University of Sussex. From 1999 to 2000 he worked as a post-docresearcher in evolutionary robotics at the German National Research Center for Informa-tion Processing (GMD), developing models of homeostatic adaptation for robots. He isnow a lecturer in evolutionary and adaptive systems at the University of Sussex and amember of the Centre for Computational Neuroscience and Robotics. His currentresearch interests are focused on using evolutionary methods to synthesize organicallyplausible robot controllers.

- N2 "( ) = - 1

2 t( )- 22 t "+( ) - 1

2 t( )R+ 2 t "+( )t 0–=

0+

+ R1 t( )- 22 t "+( ),

+



adaptive behavior - users.sussex.ac.ukusers.sussex.ac.uk/~ezequiel/stdp-robots.pdf · additional...

Documents