TRANSCRIPT
Information Theory in Neuroscience
Noise, probability and information theory
MSc Neuroscience
Prof. Jan Schnupp
Neural Responses are Noisy
[Figure: spike raster plots of responses, time axis 0 to 800 msec]
Recordings in cat A1 in response to recordings of sheep and frog sounds (stimulus file cats\9920\zoo50.src). Seventeen identical repetitions of a stimulus do not produce 17 identical spike patterns. How much information does an individual response convey about the stimulus?
Joint and Marginal Probabilities
A plausible hypothetical example
                          Stimulus On   Stimulus Off   (marginal p(r))
Neuron responds               0.35          0.05            0.4
Neuron does not respond       0.15          0.45            0.6
(marginal p(s))               0.5           0.5
Joint Probabilities and Independence
Let s be "stimulus present", and r be "neuron responds".
p(s,r) = p(r,s) is the probability that the stimulus is present and the neuron responds (joint probability).
p(s|r) is the probability that the stimulus is present given that the neuron responded (conditional probability).
Note: p(s|r) = p(s,r)/p(r)
If r and s are independent, then p(s,r) = p(s) • p(r).
Therefore, if r and s are independent, p(s|r) = p(s): knowing that the neuron responded does not change my view of how likely it is that there was a stimulus, i.e. the response carries no information about the stimulus.
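As a sanity check on these identities, here is a minimal Python sketch using the joint probabilities from the hypothetical table above (the dictionary encoding is an illustrative choice, not from the slides):

```python
# Minimal sketch: the joint probabilities p(s, r) from the table above.
# Keys are (stimulus_on, neuron_responds).
p_joint = {
    (True, True): 0.35,  (False, True): 0.05,
    (True, False): 0.15, (False, False): 0.45,
}

p_r = sum(p for (s, r), p in p_joint.items() if r)  # marginal p(responds) = 0.4
p_s = sum(p for (s, r), p in p_joint.items() if s)  # marginal p(stimulus) = 0.5

p_s_given_r = p_joint[(True, True)] / p_r           # p(s|r) = 0.35 / 0.4 = 0.875

print(p_s_given_r)  # 0.875 != p(s) = 0.5, so s and r are not independent
```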
What is Information?
If I tell you something you already know, I give you no (new) information. If I tell you something that you could easily have guessed, I give you only a little information. The less likely a message, the more "surprising" it is: Surprise = 1/p. The information content of a message is proportional to the order of magnitude of the message's "surprise": I = log2(1/p) = -log2(p). Examples:
"A is the first letter of the alphabet": p = 1, I = -log2(1) = 0
"I flipped a coin, it came up heads": p = 0.5, I = -log2(0.5) = 1
"His phone number is 928 399": p = 1/10^7, I = log2(10^7) ≈ 23.25
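In code, this reduces to a one-line function; a minimal sketch reproducing the three examples (the phone-number case assumes a uniform choice among 10^7 numbers, as on the slide):

```python
import math

def information_bits(p):
    """Information content ("surprise") of a message with probability p, in bits."""
    return -math.log2(p)

print(information_bits(1.0))    # 0.0   : "A is the first letter of the alphabet"
print(information_bits(0.5))    # 1.0   : one fair coin flip
print(information_bits(1e-07))  # 23.25 : one phone number out of 10^7
```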
"Entropy" S(s) or H(s)
S(s) = -Σs p(s) • log2(p(s))
Measures the "uncertainty" about a message s. Equal to the "average" information content of messages from a particular source. Note that, to estimate entropy, the statistical properties of the source must be known, i.e. one must know what values s can take and how likely each value is (p(s)).
Entropy of flipping a fair coin: S = -(½ • log2(½) + ½ • log2(½)) = -2 • ½ • (-1) = 1
Convention: 0 • log(0) = 0.
Entropy of flipping a trick coin with "heads" on both sides: S = -(1 • log2(1) + 0 • log2(0)) = -(0 + 0) = 0
Entropy of rolling a die: S = -6 • 1/6 • log2(1/6) = -1 • log2(1/6) = log2(6) = 2.585
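The three entropy calculations can be reproduced with a short function; a minimal sketch (the `p > 0` filter implements the 0 • log(0) = 0 convention):

```python
import math

def entropy_bits(probs):
    """S = -sum_s p(s) * log2(p(s)) in bits; terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # 1.0    fair coin
print(entropy_bits([1.0, 0.0]))   # 0.0    two-headed trick coin
print(entropy_bits([1/6] * 6))    # 2.585  fair six-sided die
```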
If two random processes are statistically independent, their entropies add
Outcome of 2 coin flips:   HH    HT    TH    TT
Probability:               1/4   1/4   1/4   1/4

In this example: S(coin1, coin2) = -4 • 1/4 • log2(1/4) = 2 = S(coin1) + S(coin2)

S(s) = -Σs p(s) • log2(p(s))
If two processes are not independent, their joint entropy is less than the sum of the individual entropies
S(s,r) ≤ S(s) + S(r)

Outcome of 2 coin flips:   HH    HT    TH    TT
Probability:               1/2   0     0     1/2

In this example, the two coins are linked so that their outcomes are 100% correlated.
S(s) = S(r) = 1  =>  S(s) + S(r) = 2
S(s,r) = -2 • 1/2 • log2(1/2) = 1
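Running the entropy sketch from above on the two joint distributions makes the comparison explicit:

```python
import math

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Independent coins: outcomes HH, HT, TH, TT each with probability 1/4.
print(entropy_bits([0.25] * 4))             # 2.0 = S(coin1) + S(coin2)

# Linked coins: only HH and TT occur, each with probability 1/2.
print(entropy_bits([0.5, 0.0, 0.0, 0.5]))   # 1.0 < S(coin1) + S(coin2) = 2.0
```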
"Mutual Information" I(r,s)
Also sometimes called the "transmitted information" T(r;s). Equal to the difference between the sum of the individual entropies and the joint entropy. Measures how much the uncertainty about one random variable is reduced if the value of another random variable is known.
I(s,r) = Σs,r p(s,r) • log2( p(s,r) / (p(r) • p(s)) )

I(s,r) = S(r) + S(s) - S(s,r)
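Both definitions can be wrapped in a small function that takes a joint probability table; a minimal sketch (the nested-list layout is an illustrative assumption):

```python
import math

def mutual_information_bits(joint):
    """I(s,r) = sum_{s,r} p(s,r) * log2( p(s,r) / (p(s) * p(r)) ), in bits.

    `joint[i][j]` holds the joint probability p(s_i, r_j)."""
    p_s = [sum(row) for row in joint]         # marginal over responses
    p_r = [sum(col) for col in zip(*joint)]   # marginal over stimuli
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                         # 0 * log(0) = 0 convention
                mi += p * math.log2(p / (p_s[i] * p_r[j]))
    return mi
```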
Traffic Light Example: Swiss Drivers
Relative freq (estimated prob)   Red    Green
Stop                             1/2    0
Go                               0      1/2
Here: I(Red,Stop) = ½ • log2(½ / (½ • ½)) + 0 + ½ • log2(½ / (½ • ½)) + 0 = log2(2) = 1
Traffic Light Example: Egyptian Drivers
Relative freq   Red    Green
Stop            0.2    0.05
Go              0.3    0.45
Here: I(Red,Stop) = 0.2 • log2(0.2 / (0.25 • 0.5)) + 0.3 • log2(0.3 / (0.75 • 0.5)) + 0.05 • log2(0.05 / (0.25 • 0.5)) + 0.45 • log2(0.45 / (0.75 • 0.5)) ≈ 0.0913
Note: in this case p(Stop) = 0.25, hence the entropy of the Stop/Go variable is only 0.8113 < 1 bit.
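Feeding both traffic-light tables into the mutual-information sketch above reproduces these numbers:

```python
# mutual_information_bits as sketched earlier; rows = Stop/Go, columns = Red/Green.
swiss    = [[0.5, 0.0],
            [0.0, 0.5]]
egyptian = [[0.2, 0.05],
            [0.3, 0.45]]

print(mutual_information_bits(swiss))     # 1.0    : the light fully determines behaviour
print(mutual_information_bits(egyptian))  # 0.0913 : weak statistical dependence
```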
Hypothetical Example
Non-monotonic (quadratic) relationship between stimulus and response: there is no (linear, first-order) correlation between stimulus and response. Nevertheless, the response is informative about the stimulus, e.g. a large response implies a mid-level stimulus. The correlation is zero, but the mutual information is large.
[Figure: scatter plot of response (0 to 4) against stimulus intensity (0 to 2.5), showing an inverted-U relationship]
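A small simulation of this scenario (a sketch; the quadratic response function, noise-free responses and 8-bin discretisation are illustrative assumptions) shows near-zero correlation alongside substantial mutual information:

```python
import numpy as np

rng = np.random.default_rng(0)
stim = rng.uniform(0.0, 2.5, size=100_000)    # stimulus intensity
resp = 4.0 - 2.5 * (stim - 1.25) ** 2         # inverted-U (quadratic) response

# Linear (first-order) correlation is ~0 by symmetry.
print(np.corrcoef(stim, resp)[0, 1])

# Discretise both variables and estimate I(s,r) from the joint histogram.
joint, _, _ = np.histogram2d(stim, resp, bins=8)
p = joint / joint.sum()
p_s = p.sum(axis=1, keepdims=True)
p_r = p.sum(axis=0, keepdims=True)
nz = p > 0
print(np.sum(p[nz] * np.log2(p[nz] / (p_s @ p_r)[nz])))  # clearly > 0 bits
```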
Estimating Information in Spike Counts: Example
Data from Mrsic-Flogel et al., Nature Neurosci (2003). Spatial receptive fields of A1 neurons were mapped out using "virtual acoustic space" stimuli. Left panel: the diameter of the dots is proportional to the spike count. Space was carved up into 24 "sectors" (right panel). The question is: what is the mutual information between spike count and sector of space?
[Figure: left, spike counts across azimuth (-180° to +180°) and elevation (-90° to +90°); right, the same space divided into 24 sectors]
24 “Sectors”, p(s)=1/24, S(s) = 4.585
Estimating Information in Spike Counts - continued
We use the relative frequencies (how often did we observe 0, 1, 2, … spikes when the stimulus was in sector 1, 2, 3, …) as estimates of p(r,s). p(s) is fixed by the experimenter, and p(r) is estimated from the pooled responses. These values are then plugged into the mutual information formula above (a code sketch follows below).
I(s,r)=0.7019
[Figure: estimated joint probability table p(sector, count), sector vs. spike count]
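The same recipe in code, on made-up data (a sketch: the Poisson spike counts and smooth tuning curve are assumptions for illustration; the real analysis used the recorded responses):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sectors, n_trials = 24, 50

# Hypothetical tuning: mean spike count varies smoothly across the 24 sectors.
mean_count = 1.0 + 3.0 * np.exp(-((np.arange(n_sectors) - 6.0) ** 2) / 8.0)
counts = rng.poisson(mean_count[:, None], size=(n_sectors, n_trials))

# Relative frequencies as estimates of p(sector, count).
p = np.zeros((n_sectors, counts.max() + 1))
for s in range(n_sectors):
    for c in counts[s]:
        p[s, c] += 1
p /= p.sum()

p_s = p.sum(axis=1, keepdims=True)   # = 1/24: fixed by the experimenter
p_r = p.sum(axis=0, keepdims=True)   # estimated from the pooled responses
nz = p > 0
print(np.sum(p[nz] * np.log2(p[nz] / (p_s @ p_r)[nz])))  # I(sector; count) in bits
```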
Difficulties with Estimating Mutual Information: Bias!
I(s,r)=0.1281
[Figure: joint probability table p(sector, count) after random re-assignment of responses to sectors]
To calculate transmitted information, we use observed frequencies as estimates of the true underlying probabilities. However, to estimate probabilities accurately (particularly those of rare events), one needs a lot of data. Inaccuracies in the estimates of p(s,r) tend to lead to overestimates of the information content. Example: here on the right, responses were randomly re-assigned to stimulus classes. The randomisation should have led to statistical independence and hence zero information; nevertheless, a value of 0.1281 bits was obtained.
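The bias is easy to reproduce by continuing the sketch above: shuffle the responses across sectors so that the true mutual information is exactly zero, and the estimate still comes out positive:

```python
# Continuing the previous sketch: destroy the stimulus-response relationship.
shuffled = counts.flatten()
rng.shuffle(shuffled)
shuffled = shuffled.reshape(counts.shape)

p = np.zeros((n_sectors, shuffled.max() + 1))
for s in range(n_sectors):
    for c in shuffled[s]:
        p[s, c] += 1
p /= p.sum()
p_s = p.sum(axis=1, keepdims=True)
p_r = p.sum(axis=0, keepdims=True)
nz = p > 0
print(np.sum(p[nz] * np.log2(p[nz] / (p_s @ p_r)[nz])))  # > 0: pure sampling bias
```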
Estimating Information in Spike Patterns: The Eskandar, Richmond & Optican (1992) Experiment
Monkeys were trained to perform delayed non-match-to-target tasks with a set of Walsh patterns. Neural responses in area TE of inferotemporal cortex were recorded while the monkeys performed the task.
IT responses
Example of responses recorded by Eskandar et al. Different Walsh patterns produced different temporal response patterns as well as different spike counts.
Principal Component Analysis of Response Patterns
PCA makes it possible to summarize complex response shapes with relatively few numbers (the "coefficients" of the first few principal components), as in the sketch below.
[Figure: IT neuron response patterns, their principal components, and the resulting PCA coefficients]
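A minimal sketch of this step on simulated PSTH-like patterns (numpy SVD; the response shapes are assumptions for illustration, not the recorded IT data):

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_bins = 200, 64
t = np.arange(n_bins)

# Hypothetical response patterns: noisy mixtures of two underlying shapes.
shapes = np.stack([np.exp(-(t - 10.0) ** 2 / 20.0),
                   np.exp(-(t - 30.0) ** 2 / 80.0)])
weights = rng.uniform(0.0, 2.0, size=(n_trials, 2))
patterns = weights @ shapes + 0.1 * rng.standard_normal((n_trials, n_bins))

# PCA via SVD of the mean-subtracted data matrix.
centred = patterns - patterns.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)

# Each response pattern is summarised by its first 3 PC coefficients.
coeffs = centred @ vt[:3].T
print(coeffs.shape)   # (200, 3): 64 time bins reduced to 3 numbers per trial
```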
Eskandar et al. Results
Spike count plus the first 3 PCA coefficients (T3, grey bars) transmit 30% more information about stimulus identity ("Pattern") than spike count alone (TS, white bars). Most of the IT response is attributable to stimulus identity (which Walsh pattern?), and only a little to task "context" (sample, match or non-match stimulus).
Rat "Barrel" Cortex
Rat S1 has a large "barrel field" in which the vibrissae are represented.
Spike Latency Coding in Rat Somatosensory Cortex
Panzeri et al. (2001, Neuron 29:769-777) recorded from the D2 barrel while stimulating the D2 whisker as well as the surrounding whiskers; response PSTHs are shown on the right. While spike counts were not very informative about which whisker was stimulated, response latency carried large amounts of information.
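A toy version of the count-versus-latency comparison (all distributions here are assumptions for illustration): two whiskers evoke the same spike-count statistics but different first-spike latencies, so the count carries roughly 0 bits while the latency carries much of the 1-bit whisker identity.

```python
import numpy as np

def mi_bits(joint):
    """Mutual information (bits) from a 2-D joint-count histogram."""
    p = joint / joint.sum()
    p_x = p.sum(axis=1, keepdims=True)
    p_y = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / (p_x @ p_y)[nz]))

rng = np.random.default_rng(3)
n = 20_000
whisker = rng.integers(0, 2, size=n)              # which of two whiskers was deflected
count = rng.poisson(3.0, size=n)                  # same count distribution for both
latency = rng.normal(8.0 + 6.0 * whisker, 2.0)    # one whisker responds ~6 ms later

print(mi_bits(np.histogram2d(whisker, count, bins=[2, 12])[0]))    # ~0 bits
print(mi_bits(np.histogram2d(whisker, latency, bins=[2, 12])[0]))  # ~0.7 bits
```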
Applications of Information Theory in Neuroscience – Some Further Examples
Tovee et al. (J Neurophysiol 1993) found that the first 50 ms or so of the response of "face cells" in monkey inferotemporal cortex contained most of the information carried by the entire response pattern.
Machens et al. (J Neurosci 2001) found that grasshopper auditory neurons transmit information about sound stimuli with the highest efficiency when the properties of those stimuli match the time scales and amplitude distributions of natural songs.
Mrsic-Flogel et al. (Nature Neurosci 2003) found that responses of A1 neurons in adult ferrets carry more information about the spatial location of a sound stimulus than do responses of infant neurons.
Li et al. (Nature Neurosci 2004) found that the mutual information between visual stimuli and V1 responses can depend on the task an animal is performing (attention?).
Information Theory in Neuroscience: a Summary
Transmitted information measures how much the uncertainty about one random variable can be reduced by observing another.
Two random variables are "mutually informative" if they are not statistically independent (p(x,y) ≠ p(x) • p(y)).
However, information measures are agnostic about how the information should best be decoded, or indeed about how much (if any) of the information contained in a spike train can be decoded and used by the brain.
Information theory treats neurons merely as "transmission channels" and assumes that the receiver (i.e. "higher" brain structures) knows the possible states and their entropies.
Real neurons have to be encoders and decoders as much as they are transmission channels.
The information content of a spike train is hard to measure accurately, but at least rough (and potentially useful) estimates can sometimes be obtained.
Further Reading
Trappenberg, T. P. (2002). Fundamentals of Computational Neuroscience. Oxford University Press, Oxford.
Rolls, E. T., and Treves, A. (1998). Neural Networks and Brain Function. Oxford University Press, Oxford. Appendix 2.
Rieke, F. (1997). Spikes: Exploring the Neural Code. MIT Press, Cambridge, MA.
Eskandar, E. N., Richmond, B. J., and Optican, L. M. (1992). Role of inferior temporal neurons in visual memory. I. Temporal encoding of information about visual images, recalled images, and behavioral context. J Neurophysiol 68:1277-1295.
Furukawa, S., and Middlebrooks, J. C. (2002). Cortical representation of auditory space: information-bearing features of spike patterns. J Neurophysiol 87:1749-1762.
Panzeri, S., Petersen, R. S., Schultz, S. R., Lebedev, M., and Diamond, M. E. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron 29:769-777.