
Exercise Sheet 4: Covariance and Correlation, Bayes’ theorem, and Linear discriminant analysis

Younesse Kaddar

1. Covariance and Correlation

Assume that we have recorded two neurons in the two-alternative forced-choice task discussed in class. We denote the firing rate of neuron 1 in trial $i$ as $r_{1,i}$ and the firing rate of neuron 2 as $r_{2,i}$. We furthermore denote the averaged firing rate of neuron 1 as $\bar{r}_1$ and of neuron 2 as $\bar{r}_2$. Let us "center" the data by defining two data vectors:

$$x ≝ \begin{pmatrix} r_{1,1} - \bar{r}_1 \\ \vdots \\ r_{1,N} - \bar{r}_1 \end{pmatrix} \qquad y ≝ \begin{pmatrix} r_{2,1} - \bar{r}_2 \\ \vdots \\ r_{2,N} - \bar{r}_2 \end{pmatrix}$$

(a) Show that the variance of the firing rates of the first neuron is:

$$\operatorname{Var}(r_1) = \frac{\lVert x \rVert^2}{N-1}$$

The actual variance of $r_1$ is not known. So to estimate it, we compute the average of the squared deviations over a sample $(r_{1,1}, \cdots, r_{1,N})$ of the population: the biased sample variance

$$\operatorname{Var}_{\text{biased}}(r_1) = \frac{1}{N} \sum_{i=1}^N \left(r_{1,i} - \bar{r}_1\right)^2 \qquad \text{where } \bar{r}_1 = \frac{1}{N} \sum_{i=1}^N r_{1,i}$$

To what extent is this estimate biased? Denoting by $\mu_1$ (resp. $\operatorname{Var}(r_1)$) the actual mean (resp. variance) of $r_1$:


$$\begin{aligned}
\mathbb{E}\big(\operatorname{Var}_{\text{biased}}(r_1)\big)
&= \mathbb{E}\left(\frac{1}{N}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)^2\right) \\
&= \frac{1}{N}\,\mathbb{E}\left(\sum_{i=1}^N \Big(r_{1,i} - \frac{1}{N}\sum_{j=1}^N r_{1,j}\Big)^{2}\right) \\
&= \frac{1}{N}\,\mathbb{E}\left(\sum_{i=1}^N \Big(r_{1,i}^2 - \frac{2}{N}\, r_{1,i}\sum_{j=1}^N r_{1,j} + \frac{1}{N^2}\Big(\sum_{j=1}^N r_{1,j}\Big)^{2}\Big)\right) \\
&= \frac{1}{N}\,\mathbb{E}\left(\sum_{i=1}^N \Big(r_{1,i}^2 - \frac{2}{N}\, r_{1,i}\sum_{j=1}^N r_{1,j} + \frac{1}{N^2}\sum_{1\le j,k\le N} r_{1,j}\, r_{1,k}\Big)\right) \\
&= \frac{1}{N}\sum_{i=1}^N \left(\mathbb{E}(r_{1,i}^2) - \frac{2}{N}\sum_{j=1}^N \mathbb{E}(r_{1,i}\,r_{1,j}) + \frac{1}{N^2}\sum_{1\le j,k\le N} \mathbb{E}(r_{1,j}\,r_{1,k})\right) \\
&= \frac{1}{N}\left(N\,\mathbb{E}(r_{1,i}^2) - 2\,\mathbb{E}(r_{1,i}^2) - 2\sum_{\substack{j=1\\ j\ne i}}^N \mathbb{E}(r_{1,i})\,\mathbb{E}(r_{1,j}) + \frac{1}{N}\sum_{1\le j\le N} \mathbb{E}(r_{1,j}^2) + \frac{1}{N}\sum_{1\le j\ne k\le N} \mathbb{E}(r_{1,j})\,\mathbb{E}(r_{1,k})\right) \quad \text{(by independence, for } j \ne i\text{)} \\
&= \frac{1}{N}\Big(N\big(\operatorname{Var}(r_1)+\mu_1^2\big) - 2\big(\operatorname{Var}(r_1)+\mu_1^2\big) - 2(N-1)\,\mu_1^2 + \big(\operatorname{Var}(r_1)+\mu_1^2\big) + (N-1)\,\mu_1^2\Big) \quad \text{since } \mathbb{E}(r_{1,i}^2) = \operatorname{Var}(r_1)+\mu_1^2 \text{ and } \mathbb{E}(r_{1,i}) = \mu_1 \\
&= \frac{N-1}{N}\,\operatorname{Var}(r_1)
\end{aligned}$$

Thus we get the following unbiased sample variance:

$$\operatorname{Var}(r_1) = \frac{N}{N-1}\,\operatorname{Var}_{\text{biased}}(r_1) = \frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)^2$$

And we get the expected result:

$$\operatorname{Var}(r_1) = \frac{\lVert x \rVert^2}{N-1} \qquad \text{since } \lVert x \rVert = \sqrt{\sum_{i=1}^N (r_{1,i} - \bar{r}_1)^2}$$
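As a quick numerical sanity check (not part of the original sheet), the following Python sketch verifies that the biased estimator underestimates the true variance by a factor of $(N-1)/N$ on average, while the unbiased one does not; the firing-rate distribution and its parameters are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 10, 100_000          # sample size and Monte Carlo repetitions
true_var = 4.0                   # hypothetical population variance

# firing rates drawn i.i.d. from a made-up Gaussian with variance 4
samples = rng.normal(loc=50.0, scale=np.sqrt(true_var), size=(trials, N))

var_biased = samples.var(axis=1, ddof=0)    # divides by N
var_unbiased = samples.var(axis=1, ddof=1)  # divides by N - 1

print(var_biased.mean())    # ~ (N-1)/N * 4 = 3.6
print(var_unbiased.mean())  # ~ 4.0
```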

(b) Compute the cosine of the angle between $x$ and $y$. What do you get?

$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert\, \lVert y \rVert} = \frac{\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)}{\sqrt{(N-1)\operatorname{Var}(r_1)}\, \sqrt{(N-1)\operatorname{Var}(r_2)}} = \frac{\frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)}{\sqrt{\operatorname{Var}(r_1)\operatorname{Var}(r_2)}}$$

Analogously to what we did in the previous question, we can show that:

$$\operatorname{Cov}(r_1, r_2) = \frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)$$

Therefore:

$$\cos(x, y) = \frac{\operatorname{Cov}(r_1, r_2)}{\sqrt{\operatorname{Var}(r_1)\operatorname{Var}(r_2)}}$$

which is the correlation coefficient.
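This identity is easy to check numerically; here is a sketch with made-up firing-rate data, comparing the cosine of the centered vectors against numpy's built-in correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
r1 = rng.normal(40.0, 5.0, size=N)             # made-up firing rates, neuron 1
r2 = 0.5 * r1 + rng.normal(10.0, 3.0, size=N)  # neuron 2, correlated with neuron 1

x = r1 - r1.mean()   # centered data vectors
y = r2 - r2.mean()

cos_xy = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_xy, np.corrcoef(r1, r2)[0, 1])  # the two values agree
```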

(c) What are the maximum and minimum values that the correlation coefficient between $r_1$ and $r_2$ can take? Why?

By the Cauchy-Schwarz inequality:

$$\lvert \operatorname{Cov}(r_1, r_2) \rvert^2 \le \operatorname{Var}(r_1)\operatorname{Var}(r_2)$$

i.e.

$$\lvert \operatorname{Cov}(r_1, r_2) \rvert \le \sqrt{\operatorname{Var}(r_1)\operatorname{Var}(r_2)}$$

hence

$$\left\lvert \frac{\operatorname{Cov}(r_1, r_2)}{\sqrt{\operatorname{Var}(r_1)\operatorname{Var}(r_2)}} \right\rvert \le 1$$

The maximum (resp. minimum) value of the correlation coefficient between $r_1$ and $r_2$ is $1$ (resp. $-1$).

By Cauchy-Schwarz:

- a correlation coefficient of $1$ corresponds to collinear vectors: there exists $\alpha \ge 0$ such that for all $i$, $r_{1,i} - \bar{r}_1 = \alpha\,(r_{2,i} - \bar{r}_2)$
- a correlation coefficient of $0$ corresponds to uncorrelated vectors
- a correlation coefficient of $-1$ corresponds to anti-collinear vectors: there exists $\alpha \le 0$ such that for all $i$, $r_{1,i} - \bar{r}_1 = \alpha\,(r_{2,i} - \bar{r}_2)$

(d) What do you think the term “centered” refers to?

“Centered” means that the data sample mean is equal to $0$. We can easily check that it is the case here (let’s do it for $x$; it is similar for $y$):

$$\frac{1}{N}\sum_{i=1}^N (r_{1,i} - \bar{r}_1) = \underbrace{\left(\frac{1}{N}\sum_{i=1}^N r_{1,i}\right)}_{=\, \bar{r}_1} - \,\bar{r}_1 = 0$$
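A minimal check of the extreme correlation values of (c) and of the centering property of (d), with made-up data (not part of the original sheet):

```python
import numpy as np

rng = np.random.default_rng(2)
r1 = rng.normal(40.0, 5.0, size=100)           # made-up firing rates

print(np.corrcoef(r1, 2.0 * r1 + 3.0)[0, 1])   # collinear data: +1.0
print(np.corrcoef(r1, -2.0 * r1 + 3.0)[0, 1])  # anti-collinear data: -1.0

x = r1 - r1.mean()                             # centered data vector
print(np.isclose(x.mean(), 0.0))               # True: the sample mean is 0
```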

2. Bayes’ theorem

The theorem of Bayes summarizes all the knowledge we have about the stimulus by observing the responses of a set of neurons, independently of the specific decoding rule. To get a better intuition about this theorem, we will look at the motion discrimination task again and compute the probability that the stimulus moved to the left $(\leftarrow)$ or right $(\rightarrow)$. For a stimulus $s \in \{\leftarrow, \rightarrow\}$ and a firing rate response $r$ of a single neuron, Bayes’ theorem reads:

$$p(s \mid r) = \frac{p(r \mid s)\, p(s)}{p(r)}$$

Here, $p(r \mid s)$ is the probability that the firing rate is $r$ if the stimulus was $s$. The respective distribution can be measured, and we assume that it follows a Gaussian probability density with mean $\mu_s$ and standard deviation $\sigma$:

$$p(r \mid s) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\left(-\frac{(r - \mu_s)^2}{2\sigma^2}\right)$$

The relative frequency with which the stimuli (leftward or rightward motion, $\leftarrow$ or $\rightarrow$) appear is denoted by $p(s)$, often called the prior probability or, for short, the prior. The distribution $p(r)$ denotes the probability of observing a response $r$, independent of any knowledge about the stimulus.

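The quantities above are easy to experiment with numerically. Here is a minimal sketch of the Gaussian likelihood $p(r \mid s)$ in Python; the parameter values are the ones of the example used below in (a), and the function name is ours:

```python
import numpy as np

MU = {"left": 40.0, "right": 60.0}   # tuning means, in Hz (example values from (a))
SIGMA = 12.0                         # shared standard deviation, in Hz

def likelihood(r, s):
    """Gaussian likelihood p(r | s) of firing rate r given stimulus s."""
    return np.exp(-(r - MU[s]) ** 2 / (2 * SIGMA**2)) / np.sqrt(2 * np.pi * SIGMA**2)
```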


(a) How can you calculate $p(r)$? What shape does it have?

As $\{\{\leftarrow\}, \{\rightarrow\}\}$ is a partition of the sample space:

$$p(r) = \sum_s p(r \cap s) = \sum_s p(r \mid s)\, p(s) = p(r \mid \leftarrow)\, p(\leftarrow) + p(r \mid \rightarrow)\, p(\rightarrow)$$

As a result:

$$p(r) = \frac{1}{\sqrt{2\pi\sigma^2}}\left(p(\leftarrow) \exp\left(-\frac{(r - \mu_\leftarrow)^2}{2\sigma^2}\right) + p(\rightarrow) \exp\left(-\frac{(r - \mu_\rightarrow)^2}{2\sigma^2}\right)\right)$$

so $p(r)$ is a mixture (a prior-weighted sum) of two Gaussians, one centered at $\mu_\leftarrow$ and one at $\mu_\rightarrow$. To use the same example as in the last exercise sheet: if the firing rates range from $0$ to $100$ Hz, and the means and standard deviation are set to be:

$$\mu_\rightarrow ≝ 60 \text{ Hz}, \qquad \mu_\leftarrow ≝ 40 \text{ Hz}, \qquad \sigma = 12 \text{ Hz}$$

then we have:

Figure 2.a.1 - Plot of the neuron's response probability for left and right motion

Figure 2.a.2 - Plot of the previous neuron's response probability (in green), if $p(\leftarrow) = 1/3$ and $p(\rightarrow) = 2/3$
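A sketch reproducing figure 2.a.2 (matplotlib assumed; parameter values from the example above):

```python
import numpy as np
import matplotlib.pyplot as plt

mu_left, mu_right, sigma = 40.0, 60.0, 12.0   # Hz, example values above
p_left, p_right = 1 / 3, 2 / 3                # priors of figure 2.a.2

r = np.linspace(0.0, 100.0, 500)              # firing rates, 0 to 100 Hz

def gauss(mu):
    """Gaussian likelihood p(r | s) with mean mu."""
    return np.exp(-(r - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

p_r = p_left * gauss(mu_left) + p_right * gauss(mu_right)  # the mixture p(r)

plt.plot(r, gauss(mu_left), label="p(r | left)")
plt.plot(r, gauss(mu_right), label="p(r | right)")
plt.plot(r, p_r, "g", label="p(r)")
plt.xlabel("firing rate r (Hz)"); plt.ylabel("probability density"); plt.legend()
plt.show()
```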


(b) The distribution $p(s \mid r)$ is often called the posterior probability or, for short, the posterior. Calculate the posterior for $s = \leftarrow$ and sketch it as a function of $r$, assuming a prior $p(\leftarrow) = p(\rightarrow) = 1/2$. Draw the posterior $p(\rightarrow \mid r)$ into the same plot.

By Bayes’ theorem:

$$\begin{aligned}
p(\leftarrow \mid r) &= \frac{p(r \mid \leftarrow)\, p(\leftarrow)}{p(r)} \\
&= \frac{p(r \mid \leftarrow)\, p(\leftarrow)}{p(r \mid \leftarrow)\, p(\leftarrow) + p(r \mid \rightarrow)\, p(\rightarrow)} \\
&= \frac{1}{1 + \frac{p(r \mid \rightarrow)\, p(\rightarrow)}{p(r \mid \leftarrow)\, p(\leftarrow)}} \\
&= \frac{1}{1 + \frac{p(r \mid \rightarrow)}{p(r \mid \leftarrow)}} \qquad \text{since } p(\leftarrow) = p(\rightarrow) \\
&= \frac{1}{1 + \exp\left(-\frac{(r - \mu_\rightarrow)^2 - (r - \mu_\leftarrow)^2}{2\sigma^2}\right)} \\
&= \frac{1}{1 + \exp\left(-\frac{(2r - \mu_\rightarrow - \mu_\leftarrow)(\mu_\leftarrow - \mu_\rightarrow)}{2\sigma^2}\right)}
\end{aligned}$$

So $p(\leftarrow \mid r)$ is a sigmoid of $\frac{(2r - \mu_\rightarrow - \mu_\leftarrow)(\mu_\leftarrow - \mu_\rightarrow)}{2\sigma^2}$.

As for $p(\rightarrow \mid r)$: as $p(\bullet \mid r)$ is a probability distribution,

$$p(\rightarrow \mid r) = 1 - p(\leftarrow \mid r)$$

Figure 2.b.1 - Plot of the previous neuron's response probability (in green), if $p(\leftarrow) = p(\rightarrow) = 1/2$
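A sketch of the posterior plot (figure 2.b.2), reusing the example parameters; the sigmoid shape is apparent:

```python
import numpy as np
import matplotlib.pyplot as plt

mu_left, mu_right, sigma = 40.0, 60.0, 12.0   # Hz, example values
r = np.linspace(0.0, 100.0, 500)

# posterior for equal priors: a sigmoid in r
z = (2 * r - mu_right - mu_left) * (mu_left - mu_right) / (2 * sigma**2)
p_left_given_r = 1.0 / (1.0 + np.exp(-z))

plt.plot(r, p_left_given_r, label="p(left | r)")
plt.plot(r, 1.0 - p_left_given_r, label="p(right | r)")
plt.xlabel("firing rate r (Hz)"); plt.ylabel("posterior"); plt.legend()
plt.show()
```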


Figure 2.b.2 - Plot of the previous neuron's posteriors, if $p(\leftarrow) = p(\rightarrow) = 1/2$

(c) What happens if you change the prior? Investigate how the posterior changes if $p(\leftarrow)$ becomes much larger than $p(\rightarrow)$ and vice versa. Make a sketch similar to (b).

If the prior changes, the ratio $\frac{p(\rightarrow)}{p(\leftarrow)}$ is no longer equal to $1$ and the posterior changes as well, insofar as the general expression of $p(\leftarrow \mid r)$ is:

$$p(\leftarrow \mid r) = \frac{1}{1 + \frac{p(\rightarrow)}{p(\leftarrow)} \exp\left(-\frac{(2r - \mu_\rightarrow - \mu_\leftarrow)(\mu_\leftarrow - \mu_\rightarrow)}{2\sigma^2}\right)} = \frac{1}{1 + \exp\left(-\frac{(2r - \mu_\rightarrow - \mu_\leftarrow)(\mu_\leftarrow - \mu_\rightarrow)}{2\sigma^2} + \ln\left(\frac{p(\rightarrow)}{p(\leftarrow)}\right)\right)} = 1 - p(\rightarrow \mid r)$$

As a matter of fact:

- if $p(\leftarrow) \gg p(\rightarrow)$ (for a fixed $r$):

$$p(\leftarrow \mid r) \simeq 1 - \frac{p(\rightarrow)}{p(\leftarrow)} \exp\left(-\frac{(2r - \mu_\rightarrow - \mu_\leftarrow)(\mu_\leftarrow - \mu_\rightarrow)}{2\sigma^2}\right) \xrightarrow[\frac{p(\rightarrow)}{p(\leftarrow)} \to 0^+]{} 1$$

- if $p(\leftarrow) \ll p(\rightarrow)$ (for a fixed $r$):

$$p(\leftarrow \mid r) \xrightarrow[\frac{p(\rightarrow)}{p(\leftarrow)} \to +\infty]{} 0^+$$
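A sketch of figure 2.c, overlaying the posteriors for a few prior ratios (parameter values as before):

```python
import numpy as np
import matplotlib.pyplot as plt

mu_left, mu_right, sigma = 40.0, 60.0, 12.0   # Hz, example values
r = np.linspace(0.0, 100.0, 500)
z = (2 * r - mu_right - mu_left) * (mu_left - mu_right) / (2 * sigma**2)

for p_left in (0.5, 0.9, 0.1):                # equal, left-heavy, right-heavy priors
    ratio = (1.0 - p_left) / p_left           # p(right) / p(left)
    posterior = 1.0 / (1.0 + ratio * np.exp(-z))
    plt.plot(r, posterior, label=f"p(left | r), p(left) = {p_left}")

plt.xlabel("firing rate r (Hz)"); plt.ylabel("posterior"); plt.legend()
plt.show()
```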


Figure 2.c - Plot of the previous neuron's posteriors, for various values of the prior

(d) Let us assume that we decide for leftward motion whenever $r > \frac{1}{2}(\mu_\leftarrow + \mu_\rightarrow)$. Interpret this decision rule in the plots above. How well does this rule do depending on the prior? What do you lose when you move from the full posterior to a simple decision (decoding) rule?

NB: the decision rule mentioned in the problem statement suggests that $\mu_\leftarrow > \mu_\rightarrow$, i.e. we are now considering a neuron whose tuning is such that it fires more strongly if the motion stimulus moves leftwards. As nothing was specified before, the example neuron we were focusing on so far had the opposite tuning. We will, from now on, switch to a neuron such that $\mu_\leftarrow > \mu_\rightarrow$.

Figure 2.d.1 - Plot of the response probability of the neuron we are focusing on from now on, for left and right motion


Figure 2.d.2 - Plot of the new neuron's posteriors (for various priors) and of the decision rule for leftward motion

The decision rule $r > \frac{1}{2}(\mu_\leftarrow + \mu_\rightarrow)$ corresponds to inferring that the motion is to the left (resp. to the right) in the red (resp. blue) area of figure 2.d.2, which is perfectly suited to the case where $p(\leftarrow) = p(\rightarrow) = 1/2$ (bold dash-dotted curves).

However, it does a poor job at predicting the motion direction in the other cases, since it does not take the changing prior into account.

To infer a more sensible rule, let us reconsider the situation: we want to come up with a threshold rate $r_{\text{th}}$ such that an ideal observer (making the decision that the motion that caused the neuron to fire at $r > r_{\text{th}}$ was to the left, the preferred direction of the neuron) would have a higher chance of being right than wrong. In other words, the decision rule we are looking for is:

$$r > r_{\text{th}} \implies p(\leftarrow \mid r) > p(\rightarrow \mid r)$$

As such, $r_{\text{th}}$ satisfies $p(\leftarrow \mid r_{\text{th}}) = p(\rightarrow \mid r_{\text{th}})$, i.e. $p(\leftarrow \mid r_{\text{th}}) = 1/2$:

$$\exp\left(-\frac{(2 r_{\text{th}} - \mu_\rightarrow - \mu_\leftarrow)(\mu_\leftarrow - \mu_\rightarrow)}{2\sigma^2} + \ln\left(\frac{p(\rightarrow)}{p(\leftarrow)}\right)\right) = 1$$

$$r_{\text{th}} = \frac{\sigma^2}{\mu_\leftarrow - \mu_\rightarrow} \ln\left(\frac{p(\rightarrow)}{p(\leftarrow)}\right) + \frac{\mu_\leftarrow + \mu_\rightarrow}{2}$$

A more sensible decoding rule would then be:

$$r > \frac{\sigma^2}{\mu_\leftarrow - \mu_\rightarrow} \ln\left(\frac{p(\rightarrow)}{p(\leftarrow)}\right) + \frac{\mu_\leftarrow + \mu_\rightarrow}{2}$$

NB: we can check that, in case $p(\rightarrow) = p(\leftarrow)$, we do recover the previous rule.

However, moving from the full posterior to a simple decision rule makes us lose the information about how confident we are in the decision made. Indeed, the full content of the information is contained in the posterior values $p(\leftarrow \mid r)$ and $p(\rightarrow \mid r)$. If $p(\leftarrow \mid r) > p(\rightarrow \mid r)$, then there is a higher chance that the motion that caused the neuron to fire was to the left, but just stating that doesn’t tell you how much higher. You need $p(\leftarrow \mid r)$ and $p(\rightarrow \mid r)$
to do so: as it happens, the chance of leftward motion is $\frac{p(\leftarrow \mid r)}{p(\rightarrow \mid r)}$ times higher.
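A quick numerical illustration of the threshold formula; the tuning values for the leftward-preferring neuron are our own assumption, chosen for the sketch:

```python
import numpy as np

mu_left, mu_right, sigma = 60.0, 40.0, 12.0   # leftward-preferring neuron (assumed values)

def r_threshold(p_left):
    """Posterior-matched threshold r_th for deciding 'left' whenever r > r_th."""
    p_right = 1.0 - p_left
    return sigma**2 / (mu_left - mu_right) * np.log(p_right / p_left) \
           + (mu_left + mu_right) / 2

print(r_threshold(0.5))   # 50.0 Hz: equal priors recover the naive rule
print(r_threshold(0.9))   # ~34.2 Hz: a likely 'left' lowers the bar
print(r_threshold(0.1))   # ~65.8 Hz: an unlikely 'left' raises it
```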

3. Linear discriminant analysis

Let us redo the calculations for the case of $N$ neurons. If we denote by $\mathbf{r}$ the vector of firing rates, Bayes’ theorem reads:

$$p(s \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid s)\, p(s)}{p(\mathbf{r})}$$

We assume that the distribution of firing rates again follows a Gaussian, so that:


$$p(\mathbf{r} \mid s) = \frac{1}{(2\pi)^{N/2} \sqrt{\det C}} \exp\left(-\frac{1}{2} (\mathbf{r} - \mu_s)^T C^{-1} (\mathbf{r} - \mu_s)\right)$$

where

- $\mu_s$ denotes the mean of the density for stimulus $s \in \{\leftarrow, \rightarrow\}$
- $C$ is the covariance matrix, assumed identical for both stimuli

(a) Compute the log-likelihood ratio

$$l(\mathbf{r}) ≝ \ln \frac{p(\mathbf{r} \mid \leftarrow)}{p(\mathbf{r} \mid \rightarrow)}$$

$$\begin{aligned}
l(\mathbf{r}) &= \ln\left(\frac{\frac{1}{(2\pi)^{N/2}\sqrt{\det C}} \exp\left(-\frac{1}{2}(\mathbf{r} - \mu_\leftarrow)^T C^{-1} (\mathbf{r} - \mu_\leftarrow)\right)}{\frac{1}{(2\pi)^{N/2}\sqrt{\det C}} \exp\left(-\frac{1}{2}(\mathbf{r} - \mu_\rightarrow)^T C^{-1} (\mathbf{r} - \mu_\rightarrow)\right)}\right) \\
&= \ln\left(\frac{\exp\left(-\frac{1}{2}(\mathbf{r} - \mu_\leftarrow)^T C^{-1} (\mathbf{r} - \mu_\leftarrow)\right)}{\exp\left(-\frac{1}{2}(\mathbf{r} - \mu_\rightarrow)^T C^{-1} (\mathbf{r} - \mu_\rightarrow)\right)}\right) \\
&= \ln\left(\exp\left(-\frac{1}{2}(\mathbf{r} - \mu_\leftarrow)^T C^{-1} (\mathbf{r} - \mu_\leftarrow)\right)\right) - \ln\left(\exp\left(-\frac{1}{2}(\mathbf{r} - \mu_\rightarrow)^T C^{-1} (\mathbf{r} - \mu_\rightarrow)\right)\right) \\
&= \frac{1}{2}(\mathbf{r} - \mu_\rightarrow)^T C^{-1} (\mathbf{r} - \mu_\rightarrow) - \frac{1}{2}(\mathbf{r} - \mu_\leftarrow)^T C^{-1} (\mathbf{r} - \mu_\leftarrow) \\
&= \frac{1}{2}\left(\mathbf{r}^T C^{-1} \mathbf{r} + \mu_\rightarrow^T C^{-1} \mu_\rightarrow - \mathbf{r}^T C^{-1} \mu_\rightarrow - \mu_\rightarrow^T C^{-1} \mathbf{r} - \mathbf{r}^T C^{-1} \mathbf{r} - \mu_\leftarrow^T C^{-1} \mu_\leftarrow + \mathbf{r}^T C^{-1} \mu_\leftarrow + \mu_\leftarrow^T C^{-1} \mathbf{r}\right) \\
&= \frac{1}{2}\left(\mu_\rightarrow^T C^{-1} \mu_\rightarrow - \mu_\leftarrow^T C^{-1} \mu_\leftarrow - \left(\mathbf{r}^T C^{-1} \mu_\rightarrow + \mu_\rightarrow^T C^{-1} \mathbf{r}\right) + \left(\mathbf{r}^T C^{-1} \mu_\leftarrow + \mu_\leftarrow^T C^{-1} \mathbf{r}\right)\right)
\end{aligned}$$

But:

- $\mathbb{R} \ni \mathbf{r}^T C^{-1} \mu_s = (\mathbf{r}^T C^{-1} \mu_s)^T = \mu_s^T (C^{-1})^T \mathbf{r}$
- since $C$ is a covariance matrix, it is symmetric, hence $C^T (C^{-1})^T = (C^{-1} C)^T = I^T = I$, and $(C^{-1})^T = (C^T)^{-1} = C^{-1}$

Therefore:

$$l(\mathbf{r}) = \frac{1}{2}\left(\mu_\rightarrow^T C^{-1} \mu_\rightarrow - \mu_\leftarrow^T C^{-1} \mu_\leftarrow - 2\, \mathbf{r}^T C^{-1} \mu_\rightarrow + 2\, \mathbf{r}^T C^{-1} \mu_\leftarrow\right)$$

and

$$l(\mathbf{r}) = \underbrace{\frac{1}{2}\left(\mu_\rightarrow^T C^{-1} \mu_\rightarrow - \mu_\leftarrow^T C^{-1} \mu_\leftarrow\right)}_{≝\ \alpha_\leftarrow} + \mathbf{r}^T \underbrace{C^{-1}(\mu_\leftarrow - \mu_\rightarrow)}_{≝\ \beta_\leftarrow}$$

(b) Assume that $l(\mathbf{r}) = 0$ is the decision boundary, so that any firing rate vector $\mathbf{r}$ giving a log-likelihood ratio larger than zero is classified as coming from the stimulus $\leftarrow$. Compute a formula for the decision boundary. What shape does this boundary have?

First, let us show how we could come up with the decision boundary:

$$p(\leftarrow \mid \mathbf{r}) > p(\rightarrow \mid \mathbf{r}) \iff \frac{p(\leftarrow \mid \mathbf{r})}{p(\rightarrow \mid \mathbf{r})} > 1 \iff \frac{p(\mathbf{r} \mid \leftarrow)\, p(\leftarrow)/p(\mathbf{r})}{p(\mathbf{r} \mid \rightarrow)\, p(\rightarrow)/p(\mathbf{r})} > 1 \iff \frac{p(\mathbf{r} \mid \leftarrow)}{p(\mathbf{r} \mid \rightarrow)} > \frac{p(\rightarrow)}{p(\leftarrow)} \iff l(\mathbf{r}) ≝ \ln\left(\frac{p(\mathbf{r} \mid \leftarrow)}{p(\mathbf{r} \mid \rightarrow)}\right) > \ln\left(\frac{p(\rightarrow)}{p(\leftarrow)}\right)$$

So the decision boundary $l(\mathbf{r}) = 0$ is a good one if and only if:
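As a numerical sanity check of (a), not part of the original sheet, the sketch below compares the closed form $\alpha_\leftarrow + \mathbf{r}^T \beta_\leftarrow$ against the log-density ratio computed directly, with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4                                          # number of neurons (made-up example)
mu_left, mu_right = rng.normal(50, 10, N), rng.normal(50, 10, N)
A = rng.normal(size=(N, N))
C = A @ A.T + N * np.eye(N)                    # a symmetric positive-definite covariance
C_inv = np.linalg.inv(C)

def log_gauss(r, mu):
    """Log-density of the multivariate Gaussian, up to the shared normalizer."""
    d = r - mu
    return -0.5 * d @ C_inv @ d

alpha = 0.5 * (mu_right @ C_inv @ mu_right - mu_left @ C_inv @ mu_left)
beta = C_inv @ (mu_left - mu_right)

r = rng.normal(50, 10, N)                      # an arbitrary firing-rate vector
print(log_gauss(r, mu_left) - log_gauss(r, mu_right))  # direct l(r)
print(alpha + r @ beta)                                # closed form: same value
```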


$$\ln\left(\frac{p(\rightarrow)}{p(\leftarrow)}\right) = 0 \iff p(\rightarrow) = p(\leftarrow) = 1/2$$

The formula for this decision boundary is the following one:

$$l(\mathbf{r}) = 0 \iff \underbrace{\frac{1}{2}\left(\mu_\rightarrow^T C^{-1} \mu_\rightarrow - \mu_\leftarrow^T C^{-1} \mu_\leftarrow\right)}_{≝\ \alpha_\leftarrow} + \mathbf{r}^T \underbrace{C^{-1}(\mu_\leftarrow - \mu_\rightarrow)}_{≝\ \beta_\leftarrow} = 0 \iff \mathbf{r}^T \beta_\leftarrow = -\alpha_\leftarrow$$

It’s an affine hyperplane (i.e. a subspace of codimension $1$) in the $n$-dimensional space $\mathbb{R}^n$.

(c) Assume that $p(\leftarrow) = p(\rightarrow) = 1/2$. Assume we are analyzing two neurons with uncorrelated activities, so that the covariance matrix is:

$$C = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

Sketch the decision boundary for this case.

With:

$$\mu_\leftarrow ≝ (60 \text{ Hz}, 40 \text{ Hz}), \qquad \mu_\rightarrow ≝ (40 \text{ Hz}, 60 \text{ Hz}), \qquad C ≝ \begin{pmatrix} 12^2 \text{ Hz}^2 & 0 \\ 0 & 5^2 \text{ Hz}^2 \end{pmatrix}$$

we get, by resorting to the LDA decision rule $l(\mathbf{r}) = \mathbf{r}^T \beta_\leftarrow + \alpha_\leftarrow > 0$:

Figure 3.c - Plot of the LDA decision boundary for the two neurons with uncorrelated activities
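A sketch reproducing figure 3.c from the numbers above; the boundary is the line $\mathbf{r}^T \beta_\leftarrow = -\alpha_\leftarrow$ (matplotlib assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

mu_left = np.array([60.0, 40.0])     # Hz, values from the example above
mu_right = np.array([40.0, 60.0])
C_inv = np.linalg.inv(np.diag([12.0**2, 5.0**2]))

alpha = 0.5 * (mu_right @ C_inv @ mu_right - mu_left @ C_inv @ mu_left)
beta = C_inv @ (mu_left - mu_right)

# boundary r . beta = -alpha, solved for r2 as a function of r1
r1 = np.linspace(0.0, 100.0, 200)
r2 = (-alpha - beta[0] * r1) / beta[1]

plt.plot(r1, r2, "k", label="decision boundary l(r) = 0")
plt.scatter(*mu_left, c="r", label="mu_left")
plt.scatter(*mu_right, c="b", label="mu_right")
plt.xlabel("r1 (Hz)"); plt.ylabel("r2 (Hz)"); plt.legend()
plt.show()
```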