


Neural coding of a natural stimulus ensemble: Uncovering information at sub–millisecond resolution

Ilya Nemenman,a Geoffrey D. Lewen,b William Bialekc and Rob R. de Ruyter van Steveninckd

aComputer, Computational and Statistical Science Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545
bThe Hun School of Princeton, 176 Edgerstoune Road, Princeton, New Jersey 08540
cJoseph Henry Laboratories of Physics and Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544
dDepartment of Physics, Indiana University, Bloomington, Indiana 47405

(Dated: December 29, 2006)

Our knowledge of the sensory world is encoded by neurons in sequences of discrete, identical pulses termed action potentials or spikes. There is persistent controversy about the extent to which the precise timing of these spikes is relevant to the function of the brain. We revisit this issue, using the motion–sensitive neurons of the fly visual system as a test case. New experimental methods allow us to deliver more nearly natural visual stimuli, comparable to those which flies encounter in free, acrobatic flight, and new mathematical methods allow us to draw more reliable conclusions about the information content of neural responses even when the set of possible responses is very large. We find that significant amounts of visual information are represented by details of the spike train at millisecond and sub–millisecond precision, even though the sensory input has a correlation time of ∼ 60 ms; different patterns of spike timing represent distinct motion trajectories, and the absolute timing of spikes points to particular features of these trajectories with high precision. Under these naturalistic conditions, the system continues to transmit more information at higher photon flux, even though individual photoreceptors are counting more than one million photons per second, and removes redundancy in the stimulus to generate a more efficient neural code.

I. INTRODUCTION

Throughout the brain, information is represented by discrete electrical pulses termed action potentials or ‘spikes’ [1]. For decades there has been controversy about the extent to which the precise timing of these spikes is significant: should we think of each spike arrival time as having meaning down to millisecond precision [2–4], or does the brain only keep track of the number of spikes occurring in much larger windows of time? Is precise timing relevant only in response to rapidly varying sensory stimuli, as in the auditory system [5], or can the brain construct specific patterns of spikes with a time resolution much smaller than the time scales of the sensory and motor signals that these patterns represent [3, 6]? Here we address these issues using the motion–sensitive neurons of the fly visual system as a model [7].

We bring together new experimental methods for delivering truly naturalistic visual inputs [8] and new mathematical methods that allow us to draw more reliable inferences about the information content of spike trains [9–11]. We find that as we improve our time resolution for the analysis of spike trains from 2 ms down to 0.2 ms we reveal nearly one–third more information about the trajectory of visual motion. The natural stimuli used in our experiments have essentially no power above 30 Hz, so that the precision of spike timing is not a necessary correlate of the stimulus bandwidth; instead the different patterns of precise spike timing represent subtly different trajectories chosen out of the stimulus ensemble. Further, despite the long correlation times of the sensory stimulus, segments of the neural response separated by ∼ 30 ms provide essentially independent information, suggesting that the neural code in this system achieves decorrelation [12, 13] in the time domain. This decorrelation is not evident in the time dependent spike rate alone, but the time scale for the independence of information does match the time scale on which visual motion signals are used to guide behavior [16].

II. POSING THE PROBLEM

Flies exhibit a wide variety of visually guided behaviors, of which perhaps the best known is the optomotor response, in which visual motion drives a compensating torque, stabilizing straight flight [14]. This system offers many advantages for the exploration of neural coding and computation: there is a small group of identified, wide–field motion–sensitive neurons [7] that provide an obligatory link in the process [15], and it is possible to make very long, stable recordings from these neurons as well as to characterize in detail the signal and noise properties of the photoreceptors that provide the input data for the computation. In free flight, the trajectory of visual motion is determined largely by the fly’s own motion through the world, and there is a large body of data on flight behavior under natural conditions [16–19], offering us the opportunity to generate stimuli that approximate those experienced in nature. But the natural visual world of flies involves not only the enormous angular velocities associated with acrobatic flight; natural light intensities and the dynamic range of their variations also are very large, and the fly’s compound eyes are stimulated over

FIG. 1: Neural responses to a natural stimulus ensemble. At left we show a schematic of the experimental setup (see Methods for details). A fly is immobilized with wax, its body in a plastic tube, with its head protruding. Through a small hole in the back of the head an electrode is inserted to record extracellular potentials from H1, a wide field neuron sensitive to horizontal motion. This signal is amplified, fed through a slip ring system to a second stage amplifier and filter, and recorded by a data acquisition card. In synchrony with its master timer clock, the DAQ card generates a 500 Hz frame clock signal. Every 2 ms, through a bidirectional parallel port, this clock triggers a successive read of a divisor value from a file stored in the stimulus laptop computer. The Intel 8254 Counter/Timer chip uses this divisor value to divide down the pulse frequency of a free running 8 MHz clock. In this way, in each successive 2 ms interval, and in strict synchrony with the data taking clock, a defined and evenly spaced burst of pulses is produced. These pulses drive the stepper motor, generating the angular velocity signal. A brief segment of this motion stimulus is shown in the top right panel, below which we plot a raster of action potentials from H1 in response to 100 repetitions of this stimulus. At bottom we expand the scale to illustrate (at left) that individual spikes following a transition from negative to positive velocity jitter from trial to trial by ∼ 1 ms: the standard deviations of spike times shown here are 0.72 ms for the first spike (·), 0.81 ms for the second spike (◦), and 1.22 ms for the third spike (×). When we align the first spikes in this window, we see (at right) that the jitter of interspike intervals is even smaller, 0.21 ms for the first interval and 0.69 ms for the second interval. Our challenge is to quantify the information content of such precise responses.
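The two jitter measures in the caption are simple to compute. The numpy sketch below runs on synthetic spike rasters (the trial count, latency shift, and jitter amplitudes are invented for illustration, not taken from the data); a latency shift shared by all spikes in a trial makes interspike-interval jitter smaller than absolute-time jitter, as observed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic raster: times (ms) of the first three spikes after a velocity
# reversal, one row per trial. A latency shift shared by all spikes in a
# trial, plus a small independent jitter per spike, mimics the observation
# that intervals are more precise than absolute spike times.
n_trials = 100
base = np.array([5.0, 9.0, 14.0])                   # mean spike times (ms)
shared = 0.7 * rng.standard_normal((n_trials, 1))   # common latency shift
indep = 0.2 * rng.standard_normal((n_trials, 3))    # per-spike jitter
spikes = base + shared + indep

# Jitter of absolute spike times: standard deviation across trials.
abs_jitter = spikes.std(axis=0)

# Jitter of interspike intervals: align each trial on its first spike,
# which removes the shared latency, then take the SD of what remains.
aligned = spikes - spikes[:, :1]
interval_jitter = aligned[:, 1:].std(axis=0)

print(abs_jitter)        # each close to sqrt(0.7^2 + 0.2^2) ≈ 0.73 ms
print(interval_jitter)   # each close to sqrt(2) * 0.2 ≈ 0.28 ms
```

Aligning on the first spike removes the component of variability common to the whole trial, which is why the intervals come out tighter than the absolute times.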

more than 2π steradians; all of these features are difficult to replicate in the laboratory [20]. As an alternative, we have moved our experiments outside [8], so that flies experience the scenes from the region in which they were caught. We record from a single motion–sensitive cell, H1, while we rotate the fly along trajectories that are modeled on the natural flight trajectories (see Methods for details). For other approaches to the delivery of naturalistic stimuli in this system see [21].

A schematic of our experiment, and an example of the data we obtain, are shown in Fig 1. We see qualitatively that the responses to natural stimuli are very reproducible, and we can point to specific features of the stimulus—such as reversals of motion direction—that generate individual spikes and interspike intervals with better than millisecond precision. The challenge is to quantify these observations: do precise and reproducible patterns of spikes occur just at some isolated moments, or does looking at the spike train with higher time resolution generally provide more information about the visual input?

Precise spike timing endows each neuron with a huge “vocabulary” of responses [1, 2], but this potential advantage in coding capacity creates challenges for experimental investigation. If we look with a time resolution of τ = 1 ms, then in each bin of size τ we can see either zero or one spike; across the behaviorally relevant time scale of 30 ms the neural response thus can be described as a 30–bit binary word, and there are 2^30, or roughly one billion such words. Although some of these responses never occur (because of refractoriness) and others are expected to occur only with low probability, it is clear that if precise timing is important then neurons can generate many more meaningfully distinguishable responses than the number that we can sample in realistic experiments.

Can we make progress on assessing the content and meaning of neural responses even when we can’t sample all of them? Some hope is provided by the classical problem of how many people need to be present in a room before there is a reasonable chance that they share

FIG. 2: Systematic errors in entropy estimation. We consider a coin with unknown probability p of coming up heads; from N coin flips we try to estimate the entropy S = −p log2 p − (1−p) log2(1−p); see Methods for details of the calculations. At left, we make Bayesian estimates starting from the prior hypothesis that all values of p are equally likely, P(p) = 1. We show how the best estimate S′ differs from the true value S0 when this deviation is measured in units of the estimated error bar δS (posterior standard deviation). For small numbers of samples, the best estimate is systematically in error by more than two times the size of the error bar, so we would have false confidence in a wrong answer, even at intermediate values of the entropy which are most relevant for real data. At right, we repeat the same procedure but with a prior hypothesis that all possible values of the entropy are equally likely, P(S) = 1. Systematic errors still appear, but they are more nearly compatible with the error bars, even at small N, and especially in the range of entropies which is relevant to our experiments.

a birthday. This number, N ∼ 23, is vastly less than the number of possible birthdays, K = 365. Turning this argument around, if we didn’t know the number of possible birthdays we could estimate it by polling N people and checking the frequency of coincidences. Once N is large enough to generate several coincidences we can get a pretty good estimate of K, and this happens when N ∼ √K ≪ K. Some years ago Ma proposed that this coincidence counting method be used to estimate the entropy of physical systems from molecular dynamics or Monte Carlo simulations [22] (see also Ref [23]). If these arguments could be generalized, it would become feasible to estimate the entropy and information content of neural responses even when experiments provide only a sparse sampling of these responses. The results of Refs [9, 10] provide such a generalization.

To understand how the methods of Ref [9] generate more accurate entropy estimates from small samples, it is useful to think about the simpler problem of flipping a coin under conditions where we don’t know the probability p that it will come up heads. One strategy is to count the number of heads nH that we see after N flips, and identify p = nH/N; if we then use this “frequentist” or maximum likelihood estimate to compute the entropy of the underlying distribution, it is well known that we will underestimate the entropy systematically [24–26]. Alternatively, we could take a Bayesian approach and say that a priori all values of 0 < p < 1 are equally likely; the standard methods of Bayesian estimation then will generate a mean and an error bar for our estimate of the entropy given N observations. As shown in Fig 2, this procedure actually leads to a systematic overestimate of the entropy in cases where the real entropy is not near its maximal value. More seriously, this systematic error is larger than the error bars that emerge from the Bayesian analysis, so we would be falsely confident in the wrong answer.
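The bias illustrated in Fig 2 (left) is easy to reproduce numerically. The sketch below is ours, not the paper’s Methods: it computes the posterior mean and standard deviation of the entropy under the flat prior P(p) = 1 by integration on a grid of p values.

```python
import numpy as np

def entropy_bits(p):
    """Entropy S(p) = -p log2(p) - (1-p) log2(1-p) of a biased coin."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def bayes_entropy_flat_p(n_heads, n_flips, grid=200001):
    """Posterior mean and SD of S under the uniform prior P(p) = 1."""
    p = np.linspace(1e-6, 1 - 1e-6, grid)
    # Posterior over p is proportional to p^nH (1-p)^(N-nH).
    logw = n_heads * np.log(p) + (n_flips - n_heads) * np.log(1 - p)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    S = entropy_bits(p)
    mean = np.sum(w * S)
    sd = np.sqrt(np.sum(w * (S - mean) ** 2))
    return mean, sd

# A fairly biased coin, so the true entropy is well below 1 bit.
S_true = entropy_bits(0.1)                 # ≈ 0.47 bits
S_est, dS = bayes_entropy_flat_p(0, 1)     # one flip, came up tails
print(S_true, S_est, dS)
```

After a single flip the posterior mean entropy is about 0.72 bits no matter what the coin is really doing, so for a coin with S ≈ 0.47 bits the estimate overshoots: the seemingly unbiased flat prior on p is a strongly peaked prior on S.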

Figure 2 also shows us that if we use a Bayesian approach with the a priori hypothesis that all values of the entropy are equally likely, then (and as far as we know, only then) we find estimates such that the systematic errors are comparable to or smaller than the error bars, even when we have seen only one sample. Thus the problem of systematic errors in entropy estimation is not, as one might have thought, the problem of not having seen all the possibilities; the problem rather is that seemingly natural and unbiased prior hypotheses about the nature of the underlying probabilities correspond to highly biased hypotheses about the entropy itself, and this problem gets much worse when we consider distributions over many alternatives. The strategy of Ref [9] thus is to construct, at least approximately, a ‘flat prior’ on the entropy (see Methods for details). The results of Ref [11] demonstrate that this procedure actually works for both simulated and real spike trains, where ‘works’ means that we generate estimates that agree with the true entropy within error bars even when the number of samples is much smaller than the number of possible responses. As expected from the discussion of the birthday problem, what is required for reliable estimation is that the number of coincidences be significantly larger than one [10].
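The coincidence-counting idea itself fits in a few lines. The toy below is our illustration (the real estimators of Refs [9, 10] are far more elaborate): it polls N "people", counts coinciding pairs, and inverts the pair-collision probability 1/K of a uniform distribution over K alternatives.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 365   # true number of alternatives ("birthdays"), unknown in practice
N = 200   # sample size: far too small to observe every alternative

samples = rng.integers(0, K, size=N)

# For a uniform distribution, each of the N(N-1)/2 pairs coincides with
# probability 1/K, so K ≈ (number of pairs) / (number of coincidences).
pairs = N * (N - 1) / 2
coincidences = sum(c * (c - 1) // 2 for c in np.bincount(samples))
K_hat = pairs / coincidences

print(coincidences, K_hat)   # a few dozen coincidences already pin down K
```

For a uniform distribution the entropy then follows as log2(K_hat); Ma’s insight, and its generalization in Refs [9, 10], is that coincidence statistics constrain the entropy long before the distribution itself is well sampled.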

III. WORDS, ENTROPY AND INFORMATION

Armed with tools that allow us to estimate the entropy of neural responses, we first analyze a long experiment in which the fly experiences a continuous trajectory of motion with statistics modeled on those of natural flight trajectories (Fig 3; see Methods for details). As shown in Fig 4a, we examine segments of the response of duration T, and we break these segments into discrete bins with time resolution τ. For sufficiently small τ each bin either

FIG. 3: Constructing a naturalistic stimulus. (a) Digitized version of original video tracking data by Land and Collett [16]. The panel shows traces of a leading fly (blue) and a chasing fly (green). Successive points along the trajectories are recorded at 20 ms intervals. Every tenth point along each trajectory is indicated by a number. From these traces we estimate rotational velocities of the body axis by calculating the angular change in orientation of the trajectory from one point in the sequence to the next, and dividing by 20 ms. The result of this calculation for the leading fly is shown in panel (b). (c) From these data (on both flies) we construct a joint distribution, P(Vk, Vk+1), of successive velocities taken 20 ms apart. (d) Short sample of a trajectory constructed using the distribution in (c) as a Markov process, and then interpolating the velocity trace to 2 ms resolution. (e) Probability densities of angular velocity generated from this Markov process (black dashed line) and scaled down by a factor of two (black line) to avoid destabilizing the experiment; distributions are symmetric and we show only positive velocities. For comparison we show (red line) the distribution of angular velocities recorded for head motion of Calliphora during episodes of saccadic turning [19]. (f) Power spectrum of synthesized velocity signal, demonstrating the absence of power above 30 Hz. (g) As in (e) but for the accelerations. Note that the distribution of our synthesized and scaled signal is surprisingly close to that found for saccadic head motions.
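The construction in panels (c)–(d) can be sketched as follows. The velocity trace below is a synthetic stand-in for the digitized Land–Collett data, and the bin count is an arbitrary choice; only the logic (estimate P(V_{k+1} | V_k) from successive pairs, then run it forward as a Markov chain) follows the caption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for a digitized rotational-velocity trace (deg/s, 20 ms steps);
# in the experiment this comes from the Land & Collett tracking films.
v_data = 1500 * np.sin(np.linspace(0, 40, 2000)) + 300 * rng.standard_normal(2000)

# Estimate P(V_{k+1} | V_k): discretize velocities into bins and count
# transitions between successive samples.
n_bins = 40
edges = np.linspace(v_data.min(), v_data.max(), n_bins + 1)
idx = np.clip(np.digitize(v_data, edges) - 1, 0, n_bins - 1)
T = np.zeros((n_bins, n_bins))
for a, b in zip(idx[:-1], idx[1:]):
    T[a, b] += 1
row = T.sum(axis=1, keepdims=True)
T = np.where(row > 0, T / np.maximum(row, 1), 1.0 / n_bins)  # empty rows: uniform

# Synthesize a new trajectory by running the chain and reading out bin
# centers; the paper additionally interpolates the result to 2 ms steps.
centers = 0.5 * (edges[:-1] + edges[1:])
state, synth = idx[0], []
for _ in range(500):
    state = rng.choice(n_bins, p=T[state])
    synth.append(centers[state])
synth = np.array(synth)
```

Because the chain is built from observed transitions, the synthetic trace inherits the one-step statistics of the original trajectory while being a genuinely new sample from that ensemble.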

has one or zero spikes, and hence the response becomes a binary word with T/τ bits, while in the opposite limit we can let τ = T and then the response is the total number of spikes in a window of size T; for intermediate values of τ the responses are multi–letter words, with a larger than binary alphabet when more than one spike can occur within a single bin. An interesting feature of these words is that they occur with a probability distribution similar to the distribution of words in English (Zipf’s law; Fig 4b). This Zipf–like behavior emerges only for T > 20 ms, and was not observed in experiments with less natural, noisy stimuli [4].

With a fixed value of T, improving our time resolution (smaller τ) means that we distinguish more alternatives, increasing the “vocabulary” of the neuron. Mathematically this means that the entropy S(T, τ) of the neural responses is larger, corresponding to a larger capacity for carrying information. This is shown quantitatively in Fig 4c, where we plot the entropy rate, S(T, τ)/T. The question of whether precise spike timing is important in the neural code is precisely the question of whether this capacity is used by the system to carry information [2, 4].
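The growth of capacity with time resolution can be seen even with the naive plug-in estimator. The sketch below is our illustration on a synthetic Poisson spike train (the rate, duration, and the estimator itself are invented stand-ins; the paper uses the NSB estimator precisely because the plug-in estimate is biased at small sample sizes):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)

# Synthetic stand-in: ~100 spikes/s Poisson spike train over 100 s.
duration = 100.0
spike_times = np.sort(rng.uniform(0.0, duration, 10000))

def plugin_entropy_rate(spike_times, T, tau, duration):
    """Plug-in entropy rate S(T, tau)/T in bits/s, where each word is the
    tuple of spike counts in the T/tau bins of a window of duration T."""
    n_bins = int(round(T / tau))
    edges = np.arange(0.0, duration + tau / 2, tau)
    counts = np.histogram(spike_times, bins=edges)[0]
    n_words = len(counts) // n_bins
    words = [tuple(counts[i * n_bins:(i + 1) * n_bins]) for i in range(n_words)]
    f = np.array(list(Counter(words).values()), float) / n_words
    return -(f * np.log2(f)).sum() / T

# Shrinking tau at fixed T refines the words, enlarging the vocabulary
# and with it the measured entropy rate (capacity).
for tau in (0.025, 0.005, 0.001):
    print(tau, plugin_entropy_rate(spike_times, T=0.025, tau=tau, duration=duration))
```

Refining τ can only split existing words into finer ones, so the estimated entropy is monotonically non-decreasing as τ shrinks; whether the extra capacity carries information is exactly the question posed in the text.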

To estimate the information content of the neural responses, we follow the strategy of Refs [4, 27]. Roughly speaking, the information content of the ‘words’ generated by the neuron is less than the total size of the neural vocabulary because there is some randomness or noise in the association of words with sensory stimuli. To quantify this noise we choose a five second segment of the stimulus, and then repeat this stimulus 100 times. At each moment 0 < t < 5 s in the cycle of the repeated stimulus, we can look across the one hundred trials to sample the different possible responses to the same input, and with the same mathematical methods as before we use these samples to estimate the ‘noise entropy’ Sn(T, τ|t) in this ‘slice’ of responses. The information which the responses carry about the stimulus then is given by I(T, τ) = S(T, τ) − 〈Sn(T, τ|t)〉t, where 〈· · ·〉t denotes an average over time t, which implicitly is an average over stimuli. It is convenient to express this as an information rate Rinfo(T, τ) = I(T, τ)/T, and this is what we show in Fig 4d, with T = 25 ms chosen to reflect the time scale of behavioral decisions [16].
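This direct method can be sketched end to end on surrogate data. Everything below (the modulated firing probability, trial counts, and the use of the plug-in estimator) is invented for illustration, standing in for the repeated-stimulus H1 recordings:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(4)

def plugin_entropy(words):
    """Naive plug-in entropy (bits) of a list of hashable words."""
    f = np.array(list(Counter(words).values()), float)
    f /= f.sum()
    return -(f * np.log2(f)).sum()

# Surrogate experiment: 100 trials of a 5 s "stimulus" in 1 ms bins;
# spikes are drawn with a time-varying probability locked to the stimulus.
n_trials, n_bins, tau = 100, 5000, 0.001
t = np.arange(n_bins) * tau
p_t = 0.02 + 0.05 * (1 + np.sin(2 * np.pi * 5 * t))
spikes = (rng.random((n_trials, n_bins)) < p_t).astype(np.int8)

L = 25                                    # word length: T = 25 ms at tau = 1 ms
starts = range(0, n_bins - L + 1, L)

# Total entropy S(T, tau): pool words over all trials and all times.
S_total = plugin_entropy([tuple(spikes[i, s:s + L])
                          for i in range(n_trials) for s in starts])

# Noise entropy <Sn(T, tau | t)>_t: entropy across trials at each time
# slice, averaged over slices.
S_noise = np.mean([plugin_entropy([tuple(spikes[i, s:s + L])
                                   for i in range(n_trials)]) for s in starts])

info_rate = (S_total - S_noise) / (L * tau)    # I(T, tau)/T in bits/s
print(S_total, S_noise, info_rate)
```

Since the pooled word distribution is the average of the per-slice distributions, concavity of the entropy guarantees S_total ≥ 〈Sn〉t, so the estimated information is non-negative by construction.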

The striking feature of Fig 4d is the growth of information rate with time resolution. We emphasize that this measurement is made under conditions comparable to those which the fly encounters in nature—outdoors, in natural light, moving along trajectories with statistics similar to those observed in free flight. Thus, under these conditions, we can conclude that the fly’s visual system carries information about motion in the timing of spikes down to sub–millisecond resolution. Quantitatively, information rates double as we increase our time resolution from τ = 25 ms to τ = 0.2 ms, and the final ∼ 30% of this increase occurs between τ = 2 ms and τ = 0.2 ms. In the behaviorally relevant time windows [16], this 30% extra information corresponds to almost a full bit from this one cell, which would provide the fly with the ability to distinguish reliably among twice as many different motion trajectories.

FIG. 4: Words, entropy and information in the neural response to natural signals. (a) Schematic showing how we convert the sequence of action potentials into discrete ‘words’ [4, 27]. At the top we show the stimulus and spike arrival times (red dots) in a 64 ms segment of the experiment. We treat this as two successive segments of duration T = 32 ms, and divide these segments into bins of duration τ = 2, 8, or 32 ms. For sufficiently small τ (here, τ = 2 ms), each bin contains either zero or one spike, and so each neural response becomes a binary word with T/τ bits; larger values of τ generate larger alphabets, until at τ = T the response of the neuron is just the spike count in the window of duration T. Note that the words are shown here as non–overlapping; this is just for graphical convenience. (b) The distribution of words with τ = 1 ms, for various values of T; words are plotted in rank order. We see that, for large T (T = 40 or 50 ms) but not for small T (T = 20 ms), the distribution of words has a large segment in which the probability of a word is P ∝ 1/rank^α, corresponding to a straight line on this double logarithmic plot. Similar behavior is observed for words in English, with α = 1, which we show for comparison (solid line); this is sometimes referred to as Zipf’s law [28]. (c) The entropy of a T = 25 ms segment of the spike train, as a function of the time resolution τ with which we record the spikes. We plot this as an entropy rate, S(T, τ)/T, in bits/s; this value of T is chosen because this is the time scale on which visual motion drives motor behavior [16]. For comparison we show the theoretical results (valid at small τ) for a Poisson process [1], and a Poisson process with a refractory period [11], with spike rates and refractory periods matched to the data. Note that the real spike train has significantly less entropy than do these simple models. In Ref [11] we showed that our estimation methods can recover the correct results for these models using data sets comparable in size to the one analyzed here; thus our conclusion that real entropies are smaller cannot be the result of undersampling. Error bars are smaller than the data points. (d) The information content of T = 25 ms words, as a function of time resolution τ; again we plot this as a rate Rinfo(T, τ) = I(T, τ)/T, in bits/s.

IV. WHAT DO THE WORDS MEAN?

The information rate tells us how much we can learn about the sensory inputs by examining the neural response, but it doesn’t tell us what we learn. In particular, we would like to make explicit the nature of the extra information that emerges as we increase our time resolution from τ = 2 ms to τ = 0.2 ms. To do this, we look at particular “words” in a segment of the neural response, as shown in Fig. 5, and then examine the motion trajectories that correspond to these words [29]. For simplicity, we consider all responses that have two spikes in successive 2 ms bins, that is the pattern 11 when seen at τ = 2 ms resolution. When we improve our time resolution to τ = 0.2 ms, some of these responses turn out to be of the form 10000000000000000001, while at the other extreme some of the responses have the two spikes

[Fig. 5 panels: median stimulus velocity v (°/s) versus time t (ms), conditional on each fine-time spike pattern; the right-hand panels are annotated with equivalent Gaussian discriminabilities d′ = 0.10 and d′ = 0.63.]

FIG. 5: Response conditional ensembles [29]. We consider five different neural responses, all of which are identical when viewed at τ = 2 ms resolution, corresponding to the pattern 11, spikes in two successive bins. At left, we consider responses which, at higher time resolution, correspond to different interspike intervals. At right, the interspike interval is fixed but higher time resolution reveals that the absolute spike arrival times differ. In each case we compute the median motion trajectory conditional on the high time resolution response (lines) and we indicate the width of the distribution with bars that range plus and minus one quartile around the median. It is clear that changes in interspike interval produce changes in the distribution of stimulus waveform that are discriminable, since the mid–quartiles do not overlap. Changes in absolute timing are more subtle, and so we estimate the conditional distributions of velocity at each moment in time using the methods of Ref [30], compute the overlap of these distributions, and convert the result into the equivalent signal–to–noise ratio d′ for discrimination against Gaussian noise [31]. Note that we compute this discriminability using single points in time; d′ values based on extended segments of the waveforms would be even higher.

essentially as close as is possible given the refractory period, 00000100000000100000. Remarkably, as we sweep through these subtly different patterns—which all have the same average spike arrival time but different interspike intervals—the average velocity trajectory changes form qualitatively, from a smooth “on” (negative to positive velocity) transition, to a prolonged period of positive velocity, to a more complex waveform with off and on transitions in succession. Examining more closely the distribution of waveforms conditional on the different responses, we see that these differences among mean waveforms are in fact discriminable. Thus, variations in interspike interval on the millisecond or sub–millisecond scale represent significantly different stimulus trajectories.

A second axis along which we can ask about the nature of the extra information at high time resolution concerns the absolute timing of spikes. As an example, responses which at τ = 2 ms resolution are of the form 11 can be unpacked at τ = 0.2 ms resolution to give patterns ranging from 01000000001000000000 to 00000000010000000010, all with the same interspike interval but with different absolute arrival times. As shown in Fig 5, all of these responses code for motion trajectories with two zero crossings, but the times of these zero crossings shift as the spike arrival times shift. Thus, whereas the times between spikes represent the shape of the waveform, the absolute arrival times of the spikes mark, with some latency, the time at which a specific feature of the waveform occurs, in this case a zero crossing. Again we find that millisecond and sub–millisecond scale shifts generate discriminable differences.
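A minimal version of the discriminability computation, on hypothetical conditional velocity samples: the numbers below are invented, and this moment-based d′ only approximates the overlap-based conversion used for Fig 5.

```python
import numpy as np

rng = np.random.default_rng(5)

def d_prime(x, y):
    """Equivalent Gaussian discriminability between two samples:
    d' = |difference of means| / rms of the two standard deviations."""
    return abs(x.mean() - y.mean()) / np.sqrt((x.var() + y.var()) / 2)

# Hypothetical conditional velocity distributions (deg/s) at one moment,
# given two fine-time spike patterns that look identical at 2 ms resolution.
v_pattern_a = 200 + 80 * rng.standard_normal(500)
v_pattern_b = 250 + 80 * rng.standard_normal(500)

print(d_prime(v_pattern_a, v_pattern_b))   # near 50/80 ≈ 0.6 for these choices
```

As in the figure, d′ here uses a single point in time; computing it over extended segments of the conditional waveforms would only increase the discriminability.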

The idea that sub–millisecond timing of action potentials could carry significant information is not new, but the clearest evidence comes from systems in which the dynamics of the stimulus itself has significant sub–millisecond structure, as in hearing and electroreception [5, 32]. Even for H1, experiments demonstrating the importance of spike timing at the ∼ 2 ms level [4, 33] could be criticized on the grounds that the stimuli had unnaturally rapid variations. It thus is important to emphasize that, in these experiments, H1 does not achieve millisecond precision simply because the input has a bandwidth of kilohertz; in fact, the stimulus has a correlation time of ∼ 60 ms (Fig 6), and 99.9% of the stimulus power is contained below 30 Hz (Fig 3f).

V. REDUNDANCY REDUCTION

The long correlation time of these naturalistic stimuli also raises questions about redundancy—while each spike and interspike interval can be highly informative, does the long correlation time of the stimulus inevitably mean that successive spikes carry redundant information about essentially the same value of the instantaneous velocity? Certainly on very short time scales this is true: although R_info(T, τ) actually increases at small T, since larger segments of the response reveal more informative patterns of several spikes [33, 34], it does decrease at larger T, a sign of redundancy. On the other hand, the approach to a constant information rate happens very rapidly: we can measure the redundancy on time scale T by computing Υ_I(T, τ) = 2I(T, τ)/I(2T, τ) − 1, where Υ = 0 means that successive windows of size T provide completely independent information, and Υ = 1 means that they are completely redundant. As shown in Fig


FIG. 6: Redundancy reduction in the time domain. We measure the redundancy Υ_I(T, τ) (points with error bars) between words of length T in the neural response, as explained in the text. To allow exploration of large T we work at a time resolution τ = 3 ms. The redundancy can be compared to correlations in the stimulus Υ_v = 〈v(t+T)v(t)〉/〈v²〉 (dotted line) or correlations in the spike rate Υ_r = 〈δr(t+T)δr(t)〉/〈(δr)²〉 (dashed line). Note that the redundancy decays rapidly—we show an exponential fit with a time constant of 17.3 ms. In contrast, the correlations in the stimulus and the firing rate decay much more slowly—the solid line, for comparison, shows an exponential decay with a time constant of 53.4 ms. Correlations in spike rate are calculated from a separate experiment on the same cell, with 200 repetitions of a 10 s stimulus drawn from the same distribution, which generates more accurate estimates of r(t).

6, Υ_I(T, τ) decays rapidly, on a time scale of less than 20 ms. In contrast, correlations in the stimulus decay much more slowly, on the ∼ 60 ms time scale. Further, we can compute at each moment of time the spike rate r(t), and this has a correlation time comparable to the stimulus itself, suggesting that the decorrelation of information is more subtle than a simple filtering of the stimulus.
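The redundancy measure defined above reduces to a one-line computation once the word informations I(T, τ) are in hand. A minimal sketch, using invented information values rather than the measured entropies:

```python
# Minimal sketch of the redundancy measure from the text,
# Upsilon_I(T) = 2*I(T)/I(2T) - 1, evaluated on invented information
# values (in bits) rather than the measured entropies.

def redundancy(I_T, I_2T):
    """0 -> successive windows of size T are independent;
    1 -> the second window is completely redundant."""
    return 2.0 * I_T / I_2T - 1.0

assert redundancy(30.0, 30.0) == 1.0     # doubling the window adds nothing
assert redundancy(30.0, 60.0) == 0.0     # information is exactly additive
print(round(redundancy(30.0, 50.0), 3))  # partially redundant case
```

The limiting cases make the normalization transparent: information that is exactly additive across adjacent windows gives Υ = 0, while a window whose doubling adds nothing gives Υ = 1.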

VI. BIT RATES AND PHOTON COUNTING RATES

The ability of the fly’s visual system to mark features of the stimulus with millisecond precision, even when the stimulus correlation time is ∼ 60 ms, depends on having access to a representation of visual motion with very high signal–to–noise ratio. Previous work has suggested that this system can estimate motion with a precision close to the limits set by noise in the photoreceptors [35, 36], which is dominated by photon shot noise [37, 38]. The present experiments, however, are done under very different conditions: velocities of motion are much larger, the fly’s eye is stimulated over a much larger area, and light intensities outdoors are much larger than generated by laboratory displays. During the course of our experiments we monitor the light intensity at zenith, using a detector matched to the properties of the fly photoreceptors (see Methods); from these measurements we estimate that the mean light intensity corresponds to 1.56 × 10⁶ photons/s per photoreceptor, which is near the limit of the photoreceptor’s dynamic range for photon counting. Is it possible that photon counting statistics still are relevant even at these high rates?

Because the experiments are done outdoors, there are small fluctuations in light intensity from trial to trial as clouds drift by and obscure the sun. Although the dynamic range of these fluctuations is less than a factor of two, the arrival times of individual spikes (e.g., the “first spike” after t = 1.75 s in Fig 1) have correlation coefficients of up to ρ = −0.42 with the light intensity, with the negative sign indicating that higher light intensities lead to earlier spikes. One might see this effect as a failure of the system to adapt to the overall light intensity, but it also suggests that some of what we have called noise really represents a response to trial–by–trial variations in stimulus conditions. Indeed, a correlation between light intensity and spike time means that the noise entropy S_n(T, τ|t) in windows which contain these spikes is smaller than we have estimated, because some of the variability can be ascribed to stimulus variation.

More subtly, if photon shot noise is relevant, we expect that on trials with higher light intensity the neuron will actually convey more information about the trajectory of motion. We emphasize that this is a delicate question. To begin, the differences in light intensity are small, and we expect (at most) proportionately small effects. Further, as the light intensity increases, the total spike rate increases, and this increases both the total entropy and the noise entropy. To ask if the system uses the more reliable signal at higher light intensities to convey more information we have to determine which of these increases is larger.

To test the effects of light intensity on information transmission (see Methods for details), we divide the trials into halves based on the average light intensity over the trial, and we try to estimate the information rates in both halves; the two groups of trials differ by just 3% in their median light intensities. Since cutting the number of trials in half makes our sampling problems much worse, we focus on short segments of the response (T = 6 ms) at high time resolution (τ = 0.2 ms); note that these are still “words” with 30 letters. For this case we find that for the trials with higher light intensities the information about the motion stimulus is larger by ∆ = 0.0204 ± 0.0108 bits, which is small but significant at the 94% confidence level. We find differences with the same sign for all accessible combinations of T and τ, and the overall significance of the difference thus is much larger. Note that since we are analyzing T = 6 ms windows, this difference corresponds to ∆R ∼ 3 bits/s, 1–2% of the total (cf Fig 4). Thus even at rates of more than one million photons per second per receptor cell, small increases in photon flux produce significant changes in the transmission of information about the visual input.


VII. CONCLUSION

To summarize, we have found that under natural stimulus conditions the fly visual system generates spikes and interspike intervals with extraordinary temporal precision. As a consequence, the neural response carries a substantial amount of information that is available only at sub–millisecond time resolution. At this high resolution, absolute spike timing is informative about the time at which particular stimulus features occur, while different interspike intervals provide a rich representation of distinguishable stimulus features. These results provide a clear demonstration that the visual system uses sub–millisecond timing to provide a richer representation of the natural sensory world, at least in this corner of the fly’s brain. In addition, the data provide support for the idea that the system performs efficiently both in the tasks of estimation and coding, making use of the extra signal–to–noise provided by increased photon flux and reducing the redundancy of the stimulus as it is transformed into spikes. Finally, we note that our ability to reach these conclusions depends not just on new experimental methods that allow us to generate truly naturalistic stimuli [8], but critically on new mathematical methods that allow us to analyze neural responses quantitatively even when it is impossible for us to sample the distribution of responses exhaustively [9, 11]. We expect that these sorts of mathematical tools will become even more critical for neuroscience in the future.

Acknowledgments

We thank J Miller, KD Miller and the participants in the NIPS 2003 Workshop on Entropy estimation for helpful discussions. This work was supported in part by grants from the National Science Foundation (PHY99–07949, ECS–0425850, IIS–0423039), the Department of Energy under contract DE–AC52–06NA25396, and the Swartz Foundation. Early stages of this work were done when all the authors were at the NEC Research Institute. IN thanks the Kavli Institute for Theoretical Physics and Columbia University for their support during this work, and WB thanks the Center for Theoretical Neuroscience at Columbia University for its hospitality.

APPENDIX A: METHODS

Neural recording and stimulus generation. H1 was recorded extracellularly by a short (12 mm shank length) tungsten electrode (FHC). The signal was preamplified by a differential bandpass amplifier based on the INA111. After amplification by a second stage, samples were digitized at 10 kHz by an AD converter (National Instruments DAQCard–AI–16E–4, mounted in a Fieldworks FW5066P ruggedized laptop). In off-line analysis, the analog signal was digitally filtered by a template derived from the average spike waveform. Spikes were then time stamped by interpolating threshold crossing times. The ultimate precision of this procedure is limited by the signal to noise ratio in the recording; for typical conditions this error is estimated to be 50–100 µs. Note that we analyze spike trains down to a precision of τ = 200 µs, so that some saturation of information at this high time resolution may actually result from instrumental limitations. The experiments were performed outside in a wooded environment, with the fly mounted on a stepper motor with vertical axis. The speed of the stepper motor was under computer control, and could be set at 2 ms intervals. The DAQ card generates a clock signal at 500 Hz in synchrony with the master clock which calibrates the neural recording. As explained in the legend to Fig 1, each tick of the clock drives the stepper motor through an amount determined by reading the stimulus file stored on a dedicated computer. The motor (SIG–Positec RDM566/50 stepper motor, 10⁴ pulses per revolution) is driven by a controller (SIG–Positec Divistep D331.1), which in turn receives pulses at a frequency divided down from a free running 8 MHz clock; the stimulus velocity is represented by the divisor for the pulse frequency. In this way, the stepper motor is driven in each 2 ms period, in strict synchrony with the data acquisition clock, by steps that are evenly spaced. This design was chosen to minimize the effects of discrete steps and to maximize the reliability of all timing measurements. To stabilize temperature the setup was enclosed by a transparent plexiglass cylinder (radius 15 cm, height 28 cm), with a transparent plexiglass lid.

Monitoring light intensity and controlling temperature. The air temperature in the experimental enclosure was regulated by a Peltier element fitted with heat vanes and fans on both sides for efficient heat dispersal, driven by a custom built feedback controller. The temperature could be set over a range from approximately five degrees below to fifteen degrees above ambient temperature, and the controller stabilized temperature over this range to within about a degree. In the experiments described here, temperature was 23 ± 1 °C. An overall measure of light intensity was obtained by monitoring the current of a photodiode (Hamamatsu) enclosed in a diffusing ping pong ball. The photodiode signal was amplified by a logarithmic amplifier operating over five decades. The photodiode was located ∼ 50 cm from the fly, and in the experiments the setup was always placed in the shade. The photodiode measurement was intended primarily to get a rough impression of relative light intensity fluctuations. However, to relate these measurements to outside light levels, before the start of each experiment a separate calibration measurement of zenith radiance was taken using a calibrated light intensity meter. To relate this measurement to fly physiology, the radiance reading was converted to an estimated effective fly photoreceptor photon rate. The reading of the photodiode was roughly proportional to the zenith intensity reading, with a proportionality factor determined by the placement of the setup and the time of day. To obtain a practical rule of thumb, the photodiode readings were converted to equivalent zenith photon flux values, using the current to zenith intensity conversion factor established at the beginning of the experiment. During the experiments the photodiode current was sampled at 1 s intervals.

Repeated stimuli. In their now classical experiments, Land and Collett measured the trajectories of flies in free flight [16]; in particular they reported the angular position (orientation) of the fly vs time, from which we can compute the angular velocity v(t). The short segments of individual trajectories shown in the published data have a net drift in angle, so we include both the measured v(t) and −v(t) as parts of the stimulus. We use the trajectories for the two different flies in Fig 4 of Ref [16], and graft all four segments together, with some zero padding to avoid dramatic jumps in velocity, generating a stimulus that is 5 seconds in duration and has zero drift, so that repetition of the angular velocity vs time also repeats the angular position vs time. Since Land and Collett report data every 20 ms, we interpolate to generate a signal that drives the stepper motor at 2 ms resolution; interpolation is done using the MATLAB routine interp, which preserves the bandlimited nature of the original signal and hence does not distort the power spectrum.
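The key property of this interpolation step, that upsampling should not distort the power spectrum of a bandlimited signal, can be illustrated with Fourier-domain zero padding. This numpy sketch is our stand-in for the MATLAB interp call (which uses a polyphase FIR filter instead), and the 5 Hz test signal is synthetic:

```python
# Fourier-domain zero-padding upsampler: exact for periodic, bandlimited
# signals, so it does not distort the power spectrum. A stand-in for the
# MATLAB interp step (not the original code), shown on a synthetic tone.

import numpy as np

def bandlimited_upsample(x, factor):
    n = len(x)
    X = np.fft.rfft(x)
    Xpad = np.zeros(n * factor // 2 + 1, dtype=complex)
    Xpad[: len(X)] = X                  # keep the original band, pad the rest
    return np.fft.irfft(Xpad, n * factor) * factor

t20 = np.arange(50) * 0.020             # 1 s of data at 20 ms resolution
v20 = np.sin(2 * np.pi * 5.0 * t20)     # 5 Hz tone, well below Nyquist
v2 = bandlimited_upsample(v20, 10)      # now at 2 ms resolution

t2 = np.arange(500) * 0.002
print(np.allclose(v2, np.sin(2 * np.pi * 5.0 * t2), atol=1e-9))  # True
```

Because the new spectral bins are exactly zero, the upsampled trace passes through the original samples and adds no power above the original band, mirroring the property claimed for interp in the text.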

Nonrepeated stimulus. To analyze the full entropy of neural responses, it is useful to have a stimulus that is not repeated. We would like such a stimulus to match the statistical properties of the natural stimulus segments described above. To do this, we estimate the probability distribution P[v(t + ∆t)|v(t)] from the published trajectories, where ∆t = 20 ms is the time resolution, and then use this as the transition matrix of a Markov process from which we can generate arbitrarily long samples; our nonrepeated experiment is based on a 990 s trajectory drawn in this way. The resulting velocity trajectories will, in particular, have exactly the same distributions of velocity and acceleration as in the observed free flight trajectories. Although the real trajectories are not exactly Markovian, our Markovian approximation also captures other features of the natural signals, for example generating a similar number of velocity reversals per second. Again we interpolate these trajectories to obtain a stimulus at 2 ms resolution.
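The Markov construction above can be sketched in a few lines. This is our illustration with made-up, discretized velocity data (the paper works from the Land and Collett trajectories, and the add-one smoothing is our choice, not the authors'):

```python
# Sketch of the Markov-process stimulus construction: estimate the
# transition distribution P[v(t+dt) | v(t)] from a discretized velocity
# trajectory, then sample an arbitrarily long synthetic one.
# The "observed" trajectory here is invented for illustration.

import numpy as np

def fit_transition_matrix(states, n_states):
    """Count-based estimate of P[next | current] over velocity bins."""
    counts = np.ones((n_states, n_states))      # add-one smoothing (our choice)
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def sample_chain(P, start, n_steps, rng):
    out = [start]
    for _ in range(n_steps - 1):
        out.append(rng.choice(len(P), p=P[out[-1]]))
    return np.array(out)

rng = np.random.default_rng(1)
observed = rng.integers(0, 5, size=2000)        # hypothetical 5-bin velocity trace
P = fit_transition_matrix(observed, n_states=5)
synthetic = sample_chain(P, start=int(observed[-1]), n_steps=1000, rng=rng)
print(synthetic.shape)
```

By construction the synthetic trace reproduces the one-step statistics of the original, which is exactly the sense in which the paper's nonrepeated stimulus matches the velocity and acceleration distributions of free flight.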

Entropy estimation in a model problem. The problem in Fig 2 is that of a potentially biased coin. Heads appear with probability p, and the probability of observing n heads out of N flips is

P_N(n|p) ∝ p^n (1 − p)^{N−n}.   (A1)

If we observe n and try to infer p, we use Bayes’ rule to construct

P_N(p|n) = P_N(n|p) P(p) / P_N(n) ∝ P(p) p^n (1 − p)^{N−n},   (A2)

where P(p) is our prior and P_N(n) = ∫₀¹ dp P_N(n|p) P(p).

Given this posterior distribution of p we can calculate the distribution of the entropy,

S(p) = −p log₂(p) − (1 − p) log₂(1 − p).   (A3)

We proceed as usual to define a function g(S) that is the inverse of S(p), that is g(S(p)) = p; since p and 1 − p give the same value of S, we choose 0 < g ≤ 0.5 and let g̃(S) = 1 − g(S). Then we have

P_N(S|n) = [P_N(p = g(S)|n) + P_N(p = g̃(S)|n)] |dg(S)/dS|.   (A4)

From this distribution, we can estimate a mean S̄_N(n) and a variance σ²(n, N) in the usual way. What interests us is the difference between S̄_N(n) and the true entropy S(p) associated with the actual value of p characterizing the coin; it makes sense to measure this difference in units of the standard deviation δS(n, N). Thus we compute

〈(S′ − S₀)/δS〉 ≡ Σ_{n=0}^{N} P_N(n|p) [S̄_N(n) − S(p)] / δS(n, N),   (A5)

and this is what is shown in Fig 2. We consider two cases. First, a flat prior on p itself, so that P(p) = 1. Second, a flat prior on the entropy, which corresponds to

P(p) = (1/2) |dS(p)/dp|   (A6)
     = (1/2) |log₂((1 − p)/p)|.   (A7)

Note that this prior is (gently) divergent near the limits p = 0 and p = 1, but all the expectation values that we are interested in are finite.
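The flat-prior branch of this model problem, Eqs (A1)–(A5), is easy to reproduce numerically. A minimal sketch by direct quadrature on a grid (our illustration of the construction, not the code behind Fig 2):

```python
# Numerical sketch of the coin-flip model problem, Eqs (A1)-(A5):
# given n heads in N flips and a flat prior P(p) = 1, compute the
# posterior mean of the entropy S(p) by direct quadrature.

import numpy as np

def entropy_bits(p):
    """S(p) of Eq (A3), elementwise and safe near the endpoints."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def posterior_mean_entropy(n, N, grid=200_001):
    p = np.linspace(0.0, 1.0, grid)[1:-1]    # open interval (0, 1)
    w = p**n * (1 - p)**(N - n)              # flat prior: posterior ~ likelihood
    w /= w.sum()
    return float(np.sum(w * entropy_bits(p)))

# With many flips of a fair coin the estimate approaches S = 1 bit,
# biased slightly low because S(p) is curved downward around p = 1/2:
print(round(posterior_mean_entropy(n=50, N=100), 3))
```

The same routine with the prior of Eqs (A6)–(A7) folded into the weights w would give the flat-on-entropy branch of Fig 2.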

Entropy estimation: General features. Our discussion here follows Refs [9, 11] very closely. Consider a set of possible neural responses labeled by i = 1, 2, · · · , K. The probability distribution of these responses, which we don’t know, is given by p ≡ {p_i}. A well studied family of priors on this distribution is the Dirichlet prior, parameterized by β,

P_β(p) = (1/Z(β; K)) [Π_{i=1}^{K} p_i^{β−1}] δ(Σ_{i=1}^{K} p_i − 1).   (A8)

Maximum likelihood estimation, which identifies probabilities with frequencies of occurrence, is obtained in the limit β → 0, while β → 1 is the natural “uniform” prior. When K becomes large, almost any p chosen out of this distribution has an entropy S = −Σ_i p_i log₂ p_i very close to the mean value,

S̄(β; K) = ψ₀(Kβ + 1) − ψ₀(β + 1),   (A9)

where ψ₀(x) = d log₂ Γ(x)/dx, and Γ(x) is the gamma function. We therefore construct a prior which is approximately flat on the entropy itself by a continuous superposition of Dirichlet priors,

P(p) = ∫ dβ [∂S̄(β; K)/∂β] P_β(p),   (A10)

and we then use this prior to perform standard Bayesian inference. In particular, if we observe each alternative i to occur n_i times in our experiment, then

P({n_i}|p) ∝ Π_{i=1}^{K} p_i^{n_i},   (A11)

and hence by Bayes’ rule

P(p|{n_i}) ∝ [Π_{i=1}^{K} p_i^{n_i}] P(p).   (A12)

Once we normalize this distribution we can integrate over all p to give the mean and the variance of the entropy given our data {n_i}. In fact, all the integrals can be done analytically except for the integral over β. A software implementation of this approach is available from http://nsb-entropy.sourceforge.net/. This basic strategy can be supplemented in cases where we have prior knowledge about the entropies. In particular, when we are trying to estimate entropy in “words” of increasing duration T, we know that S(T, τ) ≤ S(T′, τ) + S(T − T′, τ) for any T′, and thus it makes sense to constrain the priors at T using the results from smaller windows, although this is not critical to our results. We obtain results at all integer values of T/τ for which our estimation procedure is stable (see below) and use cubic splines to interpolate to non–integer values as needed.
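The role of Eq (A9) in this construction is easy to see numerically: each value of β pins the a priori mean entropy, so a prior that is flat on S must superpose many values of β as in Eq (A10). A sketch using scipy's digamma (with ψ₀ converted to bits, as in the text):

```python
# Sketch of Eq (A9): the a priori mean entropy of a Dirichlet prior,
# S_bar(beta; K) = psi0(K*beta + 1) - psi0(beta + 1), with psi0 the
# digamma function expressed in bits. Sweeping beta moves S_bar across
# nearly the whole range [0, log2 K], which is why the flat-on-entropy
# prior of Eq (A10) averages over beta with weight dS_bar/dbeta.

import numpy as np
from scipy.special import digamma

def mean_entropy_bits(beta, K):
    """Mean entropy (bits) of p drawn from a Dirichlet(beta) prior on K bins."""
    return (digamma(K * beta + 1.0) - digamma(beta + 1.0)) / np.log(2.0)

K = 1024
print(round(mean_entropy_bits(1e-6, K), 4))  # tiny beta: near 0 bits
print(round(mean_entropy_bits(1e4, K), 4))   # huge beta: near log2 K = 10 bits
```

A single Dirichlet prior thus concentrates almost all of its weight at one entropy, which is exactly the pathology the superposition over β is designed to remove.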

Entropy estimation: Details for total entropy. There are two critical challenges to estimating the entropy of neural responses to natural signals. First, the overall distribution of (long) words has a Zipf–like structure (Fig 4b), which is troublesome for most estimation strategies and leads to biases dependent on sample size. Second, the long correlation times in the stimulus mean that successive words ‘spoken’ by the neuron are strongly correlated, and hence it is impossible to guarantee that we have independent samples, as assumed implicitly in Eq (A11). We can tame the long tails in the probability distribution by partitioning the space of responses, estimating entropies within each partition, and then using the additivity of the entropy to estimate the total. We investigate a variety of different partitions, including (a) no spikes vs. all other words, (b) no spikes, all words with one spike, all words with two spikes, etc., (c) no spikes, all words with frequencies of over 1000, and all other words. Further, for each partitioning, we follow [4] and evaluate S(T, τ) for data sets of different sizes αN, 0 < α ≤ 1. Note that by choosing fractions of the data in different ways we can separate the problems of correlation and sample size. Thus, to check that our estimates are stable as a function of sample size, we choose contiguous segments of the experiment, while to check for the impact of correlations we can ‘dilute’ our sampling so that there are longer and longer intervals between words. Obviously there are limits to this exploration (one cannot access large, very dilute samples), but as far as we could explore, the impact of correlations on our estimates is negligible once the sample sizes are sufficiently large. For the effects of sample size we look for behavior of the form S(α) = S_∞ + S₁/α + S₂/α² and take S_∞ as our estimate of S(T, τ), as in Ref [4]. For all partitions in which the most common word (silence) is separated from the rest, these extrapolated estimates agree and indicate negligible biases at all combinations of τ and T for which the 1/α² term is negligible compared to the 1/α (that is, τ ≥ 0.5 ms at T ≤ 25 ms). For smaller τ, estimation fails at progressively smaller T, and to obtain an entropy rate for large T we extrapolate to τ/T → 0 using

(1/T) S(T, τ) = S(τ) + A(τ/T) + B(τ/T)²,   (A13)

where S(τ) is our best estimate of the entropy rate at resolution τ. All fits were of high quality, and the resulting error bars on the total entropy are negligible compared to those for the noise entropy. In principle, we could be missing features of the code which appear only when we use high resolution for very long words, but this unlikely scenario is almost impossible to exclude by any means.
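The sample-size extrapolation S(α) = S_∞ + S₁/α + S₂/α² described above is an ordinary linear least-squares fit in the basis (1, 1/α, 1/α²). A sketch with invented coefficients, purely to show that the intercept S_∞ is what survives:

```python
# Sketch of the finite-sample extrapolation: fit
# S(alpha) = S_inf + S1/alpha + S2/alpha**2 to entropy estimates from
# fractions alpha of the data and keep the intercept S_inf.
# The "data" are generated from known, invented coefficients.

import numpy as np

alpha = np.array([0.2, 0.25, 1 / 3, 0.5, 1.0])
S_inf_true, S1, S2 = 100.0, 5.0, 1.0             # invented, for the demo
S_est = S_inf_true + S1 / alpha + S2 / alpha**2  # "measured" entropies, bits

# Linear least squares in the variables (1, 1/alpha, 1/alpha^2):
A = np.column_stack([np.ones_like(alpha), 1 / alpha, 1 / alpha**2])
coef, *_ = np.linalg.lstsq(A, S_est, rcond=None)
print(np.round(coef, 6))  # recovers [100, 5, 1]
```

With real entropy estimates the fit is not exact, and the quality of the fit (and the size of S₂) is what signals whether the bias has been brought under control at a given τ and T.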

Entropy estimation: Details for noise entropy. Putting error bars on the noise entropy averaged over time is more difficult, because these should include a contribution from the fact that our finite sample over time is only an approximation to the true average over the underlying distribution of stimuli. Most seriously, the entropies are very different in epochs that have net positive or negative velocities. Because of the way that we constructed the repeated stimulus, v(t) = −v(t + T₀), with T₀ = 2.5 s; thus if we compute S_n(T, τ|t) + S_n(T, τ|t + T₁) with T₁ ≈ T₀, this fluctuates much less as a function of t than the entropy in an individual slice. Because our stimulus has zero mean, every slice has a partner under this shift, and the small difference between T₀ and T₁ takes account of the difference in latency between responses to positive and negative inputs. A plot of S_n(T, τ|t) + S_n(T, τ|t + T₁) vs time t has clear dips at times corresponding to zero crossings of the stimulus, and we partition the data at these points. We derive error bars on the mean noise entropy 〈S_n(T, τ|t)〉_t by a bootstrap–like method, in which we construct samples by randomly sampling with replacement from among these blocks, jittering the individual entropies S_n(T, τ|t) by the errors that emerge from the Bayesian analysis of individual slices. As with the total entropy we extrapolate to otherwise inaccessible combinations of T and τ, now writing

(1/T) 〈S_n(T, τ|t)〉_t = S_n(τ) + A(τ/T) + B(τ/T)² + C cos(2πT/τ₀)   (A14)

and fitting by weighted regression. Note that results at different T but the same value of τ are strongly correlated, and so the computation of χ² is done using the full (non–diagonal) covariance matrix. The periodic term is important at small τ, where we can see structure as the window size T crosses integer multiples of the average interspike interval, τ₀ = 2.53 ms. Error estimates emerge from the regression in the standard way, and all fits had χ² ∼ 1 per degree of freedom.
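The bootstrap-like error bar described above can be sketched as follows, with invented per-block entropies standing in for the experimental values:

```python
# Sketch of the bootstrap-like error bar on the mean noise entropy:
# resample the per-block entropies with replacement, jitter each draw
# by its own (Bayesian) error estimate, and take the spread of the
# resampled means as the error bar. All numbers are invented.

import numpy as np

def bootstrap_mean_error(block_vals, block_errs, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(block_vals)
    means = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                   # resample blocks
        jitter = rng.standard_normal(n) * block_errs[idx]  # per-slice errors
        means[b] = np.mean(block_vals[idx] + jitter)
    return float(np.std(means))

vals = np.array([4.10, 3.92, 4.31, 4.03, 4.18, 3.85, 4.07, 4.14])  # bits, invented
errs = np.full(8, 0.05)
print(round(bootstrap_mean_error(vals, errs), 3))
```

The jitter term is what folds the per-slice Bayesian uncertainties into the error bar, on top of the block-to-block variability captured by the resampling itself.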

Impact of photon flux on information rates. Since there are no responses to repeated and unrepeated stimuli recorded at exactly the same illuminations, we use the data from the repeated experiment to evaluate both the noise entropy and the total entropy. We expect that we are looking for small differences, so we tighten our analysis by discarding the first two trials, which are significantly different from all the rest (presumably because adaptation is not complete), as well as excluding the epochs in which the stimulus was padded with zeroes. The remaining 98 trials are split into two groups of 49 trials each with the highest and the lowest ambient light levels. We can then estimate the total entropy S^{(h,l)}(T, τ) for the high (h) and low (l) intensity groups of trials, and similarly for the noise entropy in each slice at time t, S_n^{(h,l)}(T, τ|t). As noted above, assigning error bars is clearer once we form quantities that are balanced across positive and negative velocities, and we do this directly for the difference in noise entropies,

∆S_n(T, τ; t) = [S_n^{(h)}(T, τ|t) + S_n^{(h)}(T, τ|t + T₁)] − [S_n^{(l)}(T, τ|t) + S_n^{(l)}(T, τ|t + T′₁)],   (A15)

where we allow for a small difference in latencies (T₁ − T′₁) between the groups of trials at different intensities. We find that ∆S_n(T, τ; t) has a unimodal distribution and a correlation time of ∼ 1.4 ms, which allows for an easy evaluation of the estimation error.

[1] F Rieke, D Warland, R de Ruyter van Steveninck & W Bialek, Spikes: Exploring the Neural Code (MIT Press, Cambridge, 1997).

[2] D MacKay & WS McCulloch, The limiting information capacity of a neuronal link. Bull Math Biophys 14, 127–135 (1952).

[3] M Abeles, Local Cortical Circuits: An Electrophysiological Study (Springer–Verlag, Berlin, 1982).

[4] SP Strong, R Koberle, RR de Ruyter van Steveninck & W Bialek, Entropy and information in neural spike trains. Phys Rev Lett 80, 197–200 (1998).

[5] CE Carr, Processing of temporal information in the brain. Ann Rev Neurosci 16, 223–243 (1993).

[6] JJ Hopfield, Pattern recognition computation using action potential timing for stimulus representation. Nature 376, 33–36 (1995).

[7] K Hausen, The lobular complex of the fly: Structure, function and significance in behavior. In Photoreception and Vision in Invertebrates, M Ali, ed, pp 523–559 (Plenum, New York, 1984).

[8] GD Lewen, W Bialek & RR de Ruyter van Steveninck, Neural coding of naturalistic motion stimuli. Network 12, 317–329 (2001); physics/0103088.

[9] I Nemenman, F Shafee & W Bialek, Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14, TG Dietterich, S Becker & Z Gharamani, eds, pp 471–478 (MIT Press, Cambridge, 2002); physics/0108025.

[10] I Nemenman, Inference of entropies of discrete random variables with unknown cardinalities; physics/0207009 (2002).

[11] I Nemenman, W Bialek & R de Ruyter van Steveninck, Entropy and information in neural spike trains: Progress on the sampling problem. Phys Rev E 69, 056111 (2004); physics/0306063.

[12] HB Barlow, Sensory mechanisms, the reduction of redundancy and intelligence. In Proceedings of the Symposium on the Mechanization of Thought Processes, Vol 2, DV Blake & AM Uttley, eds, pp 537–574 (HM Stationery Office, London, 1959).

[13] HB Barlow, Possible principles underlying the transformation of sensory messages. In Sensory Communication, W Rosenblith, ed, pp 217–234 (MIT Press, Cambridge, 1961).

[14] W Reichardt & T Poggio, Visual control of orientation behavior in the fly. Part I: A quantitative analysis. Q Rev Biophys 9, 311–375 (1976).

[15] K Hausen & C Wehrhahn, Microsurgical lesions of horizontal cells changes optomotor yaw responses in the blowfly Calliphora erythrocephela. Proc R Soc Lond Ser B 219, 211–216 (1983).

[16] MF Land & TS Collett, Chasing behavior of houseflies (Fannia canicularis). A description and analysis. J Comp Physiol 89, 331–357 (1974).

[17] H Wagner, Flight performance and visual control of flight in the free–flying house fly (Musca domestica L.). I–III. Phil Trans R Soc Ser B 312, 527–595 (1986).

[18] C Schilstra & JH van Hateren, Blowfly flight and optic flow. I. Thorax kinematics and flight dynamics. J Exp Biol 202, 1481–1490 (1999).

[19] JH van Hateren & C Schilstra, Blowfly flight and optic flow. II. Head movements during flight. J Exp Biol 202, 1491–1500 (1999).

[20] R de Ruyter van Steveninck, A Borst & W Bialek, Real time encoding of motion: Answerable questions and questionable answers from the fly’s visual system. In Processing Visual Motion in the Real World: A Survey of Computational, Neural and Ecological Constraints, JM Zanker & J Zeil, eds, pp 279–306 (Springer–Verlag, Berlin, 2001); physics/0004060.

[21] JH van Hateren, R Kern, G Schwerdtfeger & M Egelhaaf, Function and coding in the blowfly H1 neuron during naturalistic optic flow. J Neurosci 25, 4343–4352 (2005).

[22] S Ma, Calculation of entropy from data of motion. J Stat Phys 26, 221–240 (1981).

[23] GAF Seber, Estimation of Animal Abundance and Related Parameters (Griffin, London, 1973).

[24] GA Miller, Note on the bias of information estimates. In Information Theory in Psychology: Problems and Methods II–B, H Quastler, ed, pp 95–100 (Free Press, Glencoe IL, 1955).

[25] A Treves & S Panzeri, The upward bias in measures of information derived from limited data samples. Neural Comp 7, 399–407 (1995).

[26] L Paninski, Estimation of entropy and mutual information. Neural Comp 15, 1191–1253 (2003).

[27] RR de Ruyter van Steveninck, GD Lewen, SP Strong, R Koberle & W Bialek, Reproducibility and variability in neural spike trains. Science 275, 1805–1808 (1997).

[28] GK Zipf, Human Behavior and the Principle of Least Effort (Addison–Wesley, Cambridge, 1949).

[29] R de Ruyter van Steveninck & W Bialek, Real–time performance of a movement sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc R Soc London Ser B 234, 379–414 (1988).

[30] I Nemenman & W Bialek, Occam factors and model–independent Bayesian learning of continuous distributions. Phys Rev E 65, 026137 (2002); cond-mat/0009165.

[31] DM Green & JA Swets, Signal Detection Theory and Psychophysics (Wiley, New York, 1966).

[32] CE Carr, W Heiligenberg & GJ Rose, A time–comparison circuit in the electric fish midbrain. I. Behavior and physiology. J Neurosci 10, 3227–3246 (1986).

[33] N Brenner, SP Strong, R Koberle, W Bialek & RR de Ruyter van Steveninck, Synergy in a neural code. Neural Comp 12, 1531–1552 (2000); physics/9902067.

[34] P Reinagel & RC Reid, Temporal coding of visual information in the thalamus. J Neurosci 20, 5392–5400 (2000).

[35] W Bialek, F Rieke, RR de Ruyter van Steveninck & D Warland, Reading a neural code. Science 252, 1854–1857 (1991).

[36] R de Ruyter van Steveninck & W Bialek, Reliability and statistical efficiency of a blowfly movement–sensitive neuron. Phil Trans R Soc Lond Ser B 348, 321–340 (1995).

[37] RR de Ruyter van Steveninck & SB Laughlin, The rate of information transfer at graded–potential synapses. Nature 379, 642–645 (1996).

[38] R de Ruyter van Steveninck & SB Laughlin, Light adaptation and reliability in blowfly photoreceptors. Int J Neural Syst 7, 437–444 (1996).