
Page 1:

Information Theory in Neuroscience

Noise, probability and information theory

MSc Neuroscience

Prof. Jan Schnupp

jan.schnupp@dpag.ox.ac.uk

Page 2:

[Figure: two raster plots of spike responses, time axis 0-800 msec]

Neural Responses are Noisy

Recordings from cat A1 in response to recordings of sheep and frog sounds. Seventeen identical repetitions of a stimulus do not produce seventeen identical spike patterns. How much information does an individual response convey about the stimulus?


Page 3:

Joint and Marginal Probabilities

A plausible hypothetical example

                          Stimulus On   Stimulus Off   (marginal p(r))
Neuron responds               0.35          0.05            0.4
Neuron does not respond       0.15          0.45            0.6
(marginal p(s))               0.5           0.5

Page 4:

Joint Probabilities and Independence

Let s be "stimulus present", r be "neuron responds".
p(s,r) = p(r,s) is the probability that the stimulus is present and the neuron responds (joint probability).
p(s|r) is the probability that the stimulus was present given that the neuron responded (conditional probability).
Note: p(s|r) = p(s,r)/p(r).
If r and s are independent, then p(s,r) = p(s) • p(r).
Therefore, if r and s are independent, p(s|r) = p(s), so knowing that the neuron responded does not change my view on how likely it is that there was a stimulus, i.e. the response carries no information about the stimulus.
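As a minimal sketch (my own code, not from the slides), the joint-probability table on the previous page can be checked numerically; the variable names and the use of numpy are my own choices:

import numpy as np

# Joint probability table from the "plausible hypothetical example":
# rows = response (responds, does not respond), columns = stimulus (on, off)
p_joint = np.array([[0.35, 0.05],
                    [0.15, 0.45]])

p_r = p_joint.sum(axis=1)   # marginal p(r): [0.4, 0.6]
p_s = p_joint.sum(axis=0)   # marginal p(s): [0.5, 0.5]

# Conditional probability that the stimulus was on, given a response:
p_s_given_r = p_joint[0, 0] / p_r[0]            # 0.35 / 0.4 = 0.875
print(p_s_given_r)

# Independence would require p(s,r) = p(s) * p(r) for every cell:
print(np.allclose(p_joint, np.outer(p_r, p_s)))  # False here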

Page 5:

What is Information?

If I tell you something you already know, I give you no (new) information. If I tell you something that you could have easily guessed, I give you only a little information. The less likely a message, the more "surprising" it is: Surprise = 1/p. The information content of a message is the logarithm of its "surprise": I = log2(1/p) = -log2(p). Examples:

"A is the first letter of the alphabet": p = 1, I = -log2(1) = 0

"I flipped a coin, it came up heads": p = 0.5, I = -log2(0.5) = 1

"His phone number is 928 399": p = 1/10^7, I = log2(10^7) = 23.25
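These three values can be reproduced with a couple of lines of Python (my own illustration; the labels are shorthand for the examples above):

from math import log2

for label, p in [("first letter is A", 1.0),
                 ("coin came up heads", 0.5),
                 ("a specific 7-digit phone number", 1e-7)]:
    print(label, -log2(p), "bits")   # 0.0, 1.0 and ~23.25 bits respectively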

Page 6:

"Entropy" S(s) or H(s)

S(s) = - Σ_s p(s) • log2( p(s) )

Measures the "uncertainty" about a message s. Equal to the average information content of messages from a particular source. Note that, to estimate entropy, the statistical properties of the source must be known, i.e. one must know what values s can take and how likely each value is (p(s)).

Entropy of flipping a fair coin: S = -(½ • log2(½) + ½ • log2(½)) = -2 • ½ • -1 = 1

Convention: 0 • log(0) = 0. Entropy of flipping a trick coin with "heads" on both sides: S = -(1 • log2(1) + 0 • log2(0)) = -(0 + 0) = 0

Entropy of rolling a die: S = -6 • 1/6 • log2(1/6) = -1 • log2(1/6) = log2(6) = 2.585
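A short sketch of the entropy calculation (my own code, not from the lecture), reproducing the coin and die values above:

from math import log2

def entropy(probs):
    """Entropy in bits; uses the convention 0 * log2(0) = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1.0
print(entropy([1.0, 0.0]))    # two-headed trick coin: 0.0
print(entropy([1/6] * 6))     # fair die: ~2.585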

Page 7:

If two random processes are statistically independent, their entropies add.

In this example: S(coin1,coin2) = -4 • 1/4 • log2(1/4) = 2 = S(coin1) + S(coin2)

Outcome of 2 coin flips:    HH    HT    TH    TT
Probability:                1/4   1/4   1/4   1/4

S(s) = - Σ_s p(s) • log2( p(s) )

Page 8:

If two processes are not independent, their joint entropy is less than the sum of the individual entropies.

S(s,r) ≤ S(s) + S(r)

Outcome of 2 coin flips:    HH    HT    TH    TT
Probability:                1/2    0     0    1/2

In this example, the two coins are linked so that their outcomes are 100% correlated. S(s) = S(r) = 1, so S(s) + S(r) = 2, but S(s,r) = -2 • 1/2 • log2(1/2) = 1.
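A small sketch (my own code, not from the slides) checking both cases: the joint entropy of two fair coins is 2 bits when they are independent, but only 1 bit when they are perfectly correlated:

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

independent_coins = [0.25, 0.25, 0.25, 0.25]   # p(HH), p(HT), p(TH), p(TT)
linked_coins      = [0.5, 0.0, 0.0, 0.5]       # outcomes always identical

print(entropy(independent_coins))   # 2.0 = S(coin1) + S(coin2)
print(entropy(linked_coins))        # 1.0 < S(coin1) + S(coin2) = 2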

Page 9:

"Mutual Information" I(r,s)

Also sometimes called the "transmitted information" T(r;s). Equal to the difference between the sum of the individual entropies and the joint entropy. Measures how much uncertainty about one random variable is reduced if the value of another random variable is known.

I(r,s) = Σ_{r,s} p(r,s) • log2( p(r,s) / (p(r) • p(s)) )

I(r,s) = S(r) + S(s) - S(r,s)
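As an illustrative sketch (my own code), the entropy form of the formula can be applied to the joint-probability table from the "plausible hypothetical example" above; the ≈0.30 bit result is my own calculation, not a value from the slides:

import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_joint):
    """I(r,s) from a joint probability table (rows = r, columns = s)."""
    p_r = p_joint.sum(axis=1)
    p_s = p_joint.sum(axis=0)
    # I(r,s) = S(r) + S(s) - S(r,s)
    return entropy(p_r) + entropy(p_s) - entropy(p_joint.flatten())

p_joint = np.array([[0.35, 0.05],
                    [0.15, 0.45]])
print(mutual_information(p_joint))   # ~0.30 bits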

Page 10:

Traffic Light Example: Swiss Drivers

Relative freq (estimated prob)    Red    Green
Stop                              1/2      0
Go                                 0      1/2

Here: I(Red,Stop) = ½ • log2(½ / (½ • ½)) + 0 + ½ • log2(½ / (½ • ½)) + 0 = log2(2) = 1

I(r,s) = Σ_{r,s} p(r,s) • log2( p(r,s) / (p(r) • p(s)) )

Page 11:

Traffic Light Example: Egyptian Drivers

Relative freq    Red    Green
Stop             0.2     0.05
Go               0.3     0.45

Here: I(Red,Stop) = 0.2 • log2(0.2 / (0.25 • 0.5)) + 0.3 • log2(0.3 / (0.75 • 0.5)) + 0.05 • log2(0.05 / (0.25 • 0.5)) + 0.45 • log2(0.45 / (0.75 • 0.5)) ≈ 0.0913

Note: In this case p(Stop) = 0.25, hence the entropy of the driver's action is 0.8113 < 1 bit.

I(r,s) = Σ_{r,s} p(r,s) • log2( p(r,s) / (p(r) • p(s)) )
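As a sanity check (my own sketch, using a plug-in mutual information function), the two traffic-light tables give 1 bit for the Swiss drivers and roughly 0.09 bits for the Egyptian drivers:

import numpy as np

def mutual_information(p_joint):
    p_r = p_joint.sum(axis=1, keepdims=True)
    p_s = p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0
    return np.sum(p_joint[nz] * np.log2(p_joint[nz] / (p_r @ p_s)[nz]))

swiss    = np.array([[0.5, 0.0],
                     [0.0, 0.5]])    # rows: Stop/Go, columns: Red/Green
egyptian = np.array([[0.2, 0.05],
                     [0.3, 0.45]])

print(mutual_information(swiss))      # 1.0 bit
print(mutual_information(egyptian))   # ~0.09 bits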

Page 12:

Hypothetical Example

Non-monotonic (quadratic) relationship between stimulus and response. No (linear, first-order) correlation between stimulus and response. Nevertheless, the response is informative about the stimulus: e.g. a large response implies a mid-level stimulus. The correlation is zero, but the mutual information is large.

[Figure: response (0-4) plotted against stimulus intensity (0-2.5); the response peaks at intermediate stimulus intensities]
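A small simulation of this point (my own illustration, not the figure's actual data): with a symmetric, inverted-U tuning curve the linear correlation is near zero, yet a plug-in estimate of the mutual information between binned stimulus and response is clearly positive:

import numpy as np

rng = np.random.default_rng(0)
stim = rng.uniform(-1, 1, 100_000)                          # symmetric stimulus range
resp = 1 - stim**2 + 0.05 * rng.normal(size=stim.size)      # inverted-U tuning + noise

print(np.corrcoef(stim, resp)[0, 1])        # ~0: no linear correlation

# Plug-in mutual information from a 2-D histogram of binned values
joint, _, _ = np.histogram2d(stim, resp, bins=20)
p = joint / joint.sum()
p_x = p.sum(axis=1, keepdims=True)
p_y = p.sum(axis=0, keepdims=True)
nz = p > 0
print(np.sum(p[nz] * np.log2(p[nz] / (p_x @ p_y)[nz])))   # clearly > 0 bits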

Page 13:

Estimating Information in Spike Counts: Example

Data from Mrsic-Flogel et al., Nature Neurosci (2003). Spatial receptive fields of A1 neurons were mapped out using "virtual acoustic space" stimuli. Left panel: the diameter of the dots is proportional to the spike count. Space was carved up into 24 "sectors" (right panel). The question is: what is the mutual information between spike count and sector of space?

[Figure: left panel, spike counts plotted over azimuth (-180 to 180 deg) and elevation (-90 to 90 deg); right panel, the same space divided into 24 sectors]

24 “Sectors”, p(s)=1/24, S(s) = 4.585

Page 14:

Estimating Information in Spike Counts - continued

We use the relative frequencies (how often did we observe 0, 1, 2, … spikes when the stimulus was in sector 1, 2, 3, …) as estimates for p(r,s). p(s) is fixed by the experimenter, and p(r) is estimated from the pooled responses. These values are then plugged into the mutual information formula.

I(s,r) = 0.7019 bits

I(r,s) = Σ_{r,s} p(r,s) • log2( p(r,s) / (p(r) • p(s)) )

[Figure: matrix of estimated joint probabilities p(sector, spike count), with sector on the vertical axis]
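A hedged sketch of this plug-in procedure on simulated data (the tuning model, trial numbers and variable names below are my own assumptions, not the published analysis):

import numpy as np

rng = np.random.default_rng(1)
n_sectors, n_trials = 24, 20

# Simulated spike counts: Poisson rates that depend on the sector (a made-up tuning curve)
rates = 2 + 3 * np.exp(-0.5 * ((np.arange(n_sectors) - 6) / 3.0) ** 2)
counts = rng.poisson(rates[:, None], size=(n_sectors, n_trials))

# Joint histogram of (sector, spike count) -> plug-in probability estimates
joint = np.zeros((n_sectors, counts.max() + 1))
for s in range(n_sectors):
    for c in counts[s]:
        joint[s, c] += 1
p = joint / joint.sum()

p_s = p.sum(axis=1, keepdims=True)   # = 1/24 for every sector, by design
p_r = p.sum(axis=0, keepdims=True)   # pooled spike-count distribution
nz = p > 0
I_plugin = np.sum(p[nz] * np.log2(p[nz] / (p_s @ p_r)[nz]))
print(I_plugin)   # plug-in estimate of I(sector, count), in bits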

Page 15:

Difficulties with Estimating Mutual Information: Bias!

I(s,r) = 0.1281 bits

[Figure: joint probability matrix p(sector, spike count) after random re-assignment, with sector on the vertical axis]

To calculate transmitted information, we use observed frequencies as estimates for the true underlying probabilities. However, to estimate probabilities (particularly of rare events) accurately, one needs a lot of data. Inaccuracies in the estimates of p(s,r) tend to lead to overestimates of the information content. Example: here on the right, responses were randomly re-assigned to stimulus classes. The randomisation should have led to statistical independence and hence zero information. Nevertheless, a value of 0.1281 bits was obtained.
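A brief illustration of this bias (my own simulation, not the data behind the 0.1281 figure): if spike counts are drawn independently of the sector, the true mutual information is zero, yet the plug-in estimate from a finite number of trials stays noticeably above zero:

import numpy as np

def plugin_mi(joint_counts):
    p = joint_counts / joint_counts.sum()
    p_s = p.sum(axis=1, keepdims=True)
    p_r = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / (p_s @ p_r)[nz]))

rng = np.random.default_rng(2)
n_sectors, n_trials = 24, 20
counts = rng.poisson(3.0, size=(n_sectors, n_trials))   # independent of sector

# Build the (sector, spike count) histogram from the independent data
joint = np.zeros((n_sectors, counts.max() + 1))
for s in range(n_sectors):
    for c in counts[s]:
        joint[s, c] += 1

print(plugin_mi(joint))   # true MI is 0, but the estimate is biased upward (> 0)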

Page 16:

Estimating Information in Spike Patterns: The Eskandar, Richmond & Optican (1992) Experiment

Monkeys were trained to perform delayed non-match-to-target tasks with a set of Walsh patterns. Neural responses in area TE of inferotemporal cortex were recorded while the monkeys performed the task.

Page 17:

IT responses

Example of responses recorded by Eskandar et al. Different Walsh patterns produced different response patterns as well as different spike counts.

Page 18:

Principal Component Analysis of Response Patterns

PCA makes it possible to summarize complex response shapes with relatively few numbers (the "coefficients" of the first few principal components).

[Figure panels: IT neuron response patterns, their principal components, and the resulting PCA coefficients]
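A minimal sketch of this idea (my own code; the simulated response shapes are assumptions, not the recorded IT data): each trial's response pattern is re-expressed as a few principal-component coefficients.

import numpy as np

rng = np.random.default_rng(3)
n_trials, n_bins = 200, 64

# Simulated response patterns: two underlying temporal shapes mixed per trial, plus noise
t = np.linspace(0, 1, n_bins)
shape1 = np.exp(-((t - 0.2) / 0.05) ** 2)      # early transient
shape2 = np.exp(-((t - 0.6) / 0.15) ** 2)      # later, broader component
weights = rng.normal(size=(n_trials, 2))
responses = weights @ np.vstack([shape1, shape2]) + 0.1 * rng.normal(size=(n_trials, n_bins))

# PCA via the singular value decomposition of the mean-subtracted data
centred = responses - responses.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
coefficients = centred @ Vt[:3].T   # first 3 PCA coefficients per trial

print(coefficients.shape)           # (200, 3): each response summarised by 3 numbers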

Page 19:

Eskandar et al. Results

Spike count plus the first 3 PCA coefficients (T3, gray bars) transmit 30% more information about stimulus identity ("Pattern") than spike count alone (TS, white bars). Most of the IT response is attributable to stimulus identity (which Walsh pattern?), and only a little to task "context" (sample, match or non-match stimulus).

Page 20:

Rat "Barrel" Cortex

Rat S1 has a large "barrel field" in which the vibrissae (whiskers) are represented.

Page 21:

Spike Latency Coding in Rat Somatosensory Cortex

Panzeri et al. (2001, Neuron 29, 769-777) recorded from the D2 barrel and stimulated the D2 whisker as well as the surrounding whiskers (response PSTHs shown on the right). While spike counts were not very informative about which whisker was stimulated, response latency carried large amounts of information.

Page 22:

Applications of Information Theory in Neuroscience – Some Further Examples

Tovee et al. (J Neurophysiol, 1993) found that the first 50 ms or so of the response of "face cells" in monkey inferotemporal cortex contained most of the information contained in the entire response pattern.
Machens et al. (J Neurosci, 2001) found that grasshopper auditory neurons transmit information about sound stimuli with highest efficiency if the properties of these stimuli match the time scales and amplitude distributions of natural songs.
Mrsic-Flogel et al. (Nature Neurosci, 2003) found that responses of A1 neurons in adult ferrets carry more information about the spatial location of a sound stimulus than do responses of infant neurons.
Li et al. (Nature Neurosci, 2004) found that the mutual information between visual stimuli and V1 responses can depend on the task an animal is performing (attention?).

Page 23:

Information Theory in Neuroscience: a Summary

Transmitted information measures how much the uncertainty about one random variable can be reduced by observing another.
Two random variables are "mutually informative" if they are not statistically independent (p(x,y) ≠ p(x) p(y)).
However, information measures are agnostic about how the information should best be decoded, or indeed about how much (if any) of the information contained in a spike train can be decoded and used by the brain.
Information theory thinks about neurons merely as "transmission channels" and assumes that the receiver (i.e. "higher" brain structures) knows about the possible states and their entropies.
Real neurons have to be encoders and decoders as much as they are transmission channels.
The information content of a spike train is hard to measure accurately, but at least rough (and potentially useful) estimates can sometimes be obtained.

Page 24:

Further Reading

Trappenberg, T. P. (2002). Fundamentals of Computational Neuroscience. Oxford University Press, Oxford.
Rolls, E. T., and Treves, A. (1998). Neural Networks and Brain Function. Oxford University Press, Oxford, Appendix 2.
Rieke, F. (1997). Spikes: Exploring the Neural Code. MIT Press, Cambridge, Mass.; London.
Eskandar, E. N., Richmond, B. J., and Optican, L. M. (1992). Role of inferior temporal neurons in visual memory. I. Temporal encoding of information about visual images, recalled images, and behavioral context. J Neurophysiol 68: 1277-1295.
Furukawa, S., and Middlebrooks, J. C. (2002). Cortical representation of auditory space: information-bearing features of spike patterns. J Neurophysiol 87: 1749-1762.
Panzeri, S., Petersen, R. S., Schultz, S. R., Lebedev, M., and Diamond, M. E. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron 29: 769-777.