TRANSCRIPT
Population Codes & Inference in Neurons
Richard Zemel
Department of Computer Science, University of Toronto
Basic questions of neural representation
Fundamental issue in computational neuroscience:
How is information represented in the brain?
What are the units of computation?
How is information processed at neural level?
Important part of answer: information not processed by single cells, but by populations
Population Codes
Coding first thought to be localist: neurons as binary units, encoding a unique value
Alternative: more distributed, graded response; neuron’s level of activity conveys information
Population code: group of units tuned to common variable
Good computational strategy: efficient and robust
Population codes all the way down
Examples: visual features; motor commands; other sensory properties; place fields
Outline
1) Information processing in population codes
a) reading the neural code
b) computation in populations
2) Extending the information in population codes
a) representing probability distributions
b) methods for encoding/decoding distributions in neurons
3) Maintaining and updating distributions through time: dynamic distributions
a) optimal analytic formulation
b) network approximation
Reading the Neural Code
Neurophysiologists collect neural recordings: sequences of action potentials (spikes) from one or several cells during a controlled experiment
Task: reconstruct identity or value of parameter(s)
Why play the homunculus?
Assess degree to which that parameter is encoded (establish sufficiency, not necessity)
Limits on reliability and accuracy of neuronal encoding (estimate optimal parameters)
Characterize information processing: nervous system faced with this decoding problem
Rate representation of response
Spikes convey information through timing
Typically converted into scalar rate value, summarized in ri: firing rate of cell i (#spikes in interval/interval size)
Interval size determines amount of information about spike timing lost in firing rate representation
Can also consider firing rate of cell as the probability that the cell will fire within specified time interval
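To make the rate computation concrete, a minimal Python sketch (the spike times and interval sizes are made up for illustration):

```python
import numpy as np

# Hypothetical spike times (seconds) for one cell during a trial.
spike_times = np.array([0.013, 0.054, 0.071, 0.122, 0.180, 0.244])

def firing_rate(spikes, t_start, t_stop):
    """Rate = number of spikes in the interval / interval size."""
    n = np.sum((spikes >= t_start) & (spikes < t_stop))
    return n / (t_stop - t_start)

# A coarse interval discards spike timing; finer intervals keep more of it.
print(firing_rate(spike_times, 0.0, 0.25))  # one rate for the whole trial
print([firing_rate(spike_times, t, t + 0.05) for t in np.arange(0, 0.25, 0.05)])
```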
Example: Reconstructing movement direction
Task: given rates in population of direction-selective cells (r = r1,…,rN), compute arm direction
Cells in motor cortex (M1) tuned to movement angle
Tuning function (curve) fi(x):
fi(x) = A + B cos(x − xi)
A = ½ (ri^max + ri^min), B = ½ (ri^max − ri^min)
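A sketch of this tuning curve in Python, with illustrative ri^max and ri^min values:

```python
import numpy as np

def cosine_tuning(x, x_pref, r_min=5.0, r_max=50.0):
    """f_i(x) = A + B cos(x - x_i), with A, B set by the min/max rates."""
    A = 0.5 * (r_max + r_min)
    B = 0.5 * (r_max - r_min)
    return A + B * np.cos(x - x_pref)

# Rate is highest at the preferred direction, lowest opposite to it.
print(cosine_tuning(np.pi / 2, x_pref=np.pi / 2))   # 50.0
print(cosine_tuning(-np.pi / 2, x_pref=np.pi / 2))  # 5.0
```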
Population vector method
Consider each cell as vector pointing in preferred direction xi
Length of vector represents relative response strength for particular movement direction
Sum of vectors is the estimated movement direction: the population vector
Simple, robust, accurate method if N large, and {xi} randomly, uniformly span the space of directions
Can also view as reconstruction with cosine basis:
x̂ = Σi ri (cos xi, sin xi)
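A minimal population vector decoder, assuming the cosine tuning above plus illustrative Gaussian response noise:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x_pref = rng.uniform(0, 2 * np.pi, N)  # preferred directions span the circle
x_true = 1.2                           # movement direction to decode

# Noisy responses from the cosine tuning curve (A = 27.5, B = 22.5 as above);
# Gaussian noise is an illustrative assumption.
rates = 27.5 + 22.5 * np.cos(x_true - x_pref) + rng.normal(0, 3, N)

# Each cell votes with a vector along its preferred direction, weighted by
# its rate (baseline removed so vectors reflect modulation, not background).
w = rates - rates.mean()
pop_vec = np.array([np.sum(w * np.cos(x_pref)), np.sum(w * np.sin(x_pref))])
x_hat = np.arctan2(pop_vec[1], pop_vec[0])
print(x_hat)  # close to x_true for large N
```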
Bayesian reconstruction
Basis function methods perform well, but another class of methods is in some sense optimal
Set up statistical model: signal x produces response r; need to invert the model to find likely x for given r
Begin with encoding model
ri(x) = fi(x) + η
rate ri(x) is a random variable: response of cell i in population to stimulus x
tuning function fi(x) describes expected rate
noise η typically assumed Poisson or Gaussian
Goal: decode responses to form posterior distribution
P(x|r)= P(r|x) P(x) / P(r)
Standard Bayesian reconstruction
likelihood P(r|x) based on encoding model
assumptions in standard model:
spikes have Poisson distribution (natural if rate defined as spike count, spikes distributed independently, randomly)
noise uncorrelated between different cells: all variability captured in P(ri|x)
intuition: gain precision through multiplying rather than adding basis functions (tuning curves here)
obtain single value estimate through MAP or ML
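A sketch of the full recipe under these assumptions, with illustrative Gaussian tuning curves, independent Poisson spike counts, and a flat prior:

```python
import numpy as np
from scipy.stats import poisson

# Illustrative setup: Gaussian tuning curves on a grid over x, flat prior.
x_grid = np.linspace(-5, 5, 201)
centers = np.linspace(-4, 4, 20)

def f(x, c):
    return 2 + 40 * np.exp(-0.5 * (x - c) ** 2)  # tuning curves (spikes/interval)

rng = np.random.default_rng(1)
x_true = 0.7
counts = rng.poisson(f(x_true, centers))         # independent Poisson spike counts

# log P(r|x) = sum_i log Poisson(r_i; f_i(x)); independence means per-cell
# likelihoods multiply (add in the log domain), which is where the precision
# gain from "multiplying rather than adding" comes from.
log_like = np.array([poisson.logpmf(counts, f(x, centers)).sum() for x in x_grid])
post = np.exp(log_like - log_like.max())
post /= post.sum()                               # flat prior: posterior ∝ likelihood

print(x_grid[np.argmax(post)])                   # MAP (= ML here), near x_true
```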
Application: hippocampal population codes
P(x) based on spatial occupancy; P(ri|x) are place fields
Zhang et al.
ML reconstruction
under simplifying assumptions, ML reconstruction has simple intuitive form
implement ML by maximizing Σi ri log fi(x)
if tuning curves evenly distributed (Σi fi(x) constant)
for Gaussian tuning curves: x_ML = Σi ri xi / Σi ri
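The closed form is a rate-weighted centre of mass; a toy check with made-up rates:

```python
import numpy as np

# x_ML = sum_i r_i x_i / sum_i r_i for evenly spaced Gaussian tuning curves.
x_pref = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
rates  = np.array([ 1.0,  6.0, 14.0, 7.0, 2.0])
x_ml = np.sum(rates * x_pref) / np.sum(rates)
print(x_ml)  # 0.1: pulled slightly toward the cells firing above their mirror images
```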
Computation in population codes
most of computational focus on population codes based on observation that they offer compromise:
localist codes have problems with noise, robustness, number of neurons required
fully distributed codes can make decoding complicated, cannot handle multiple values
other properties of population codes studied recently, key focus (driven partly by biological studies) on recurrent connections between units in population
Line attractor
simple network model, with recurrent connections Tij, governed by dynamic equation:
τ dui/dt = −ui + Σj Tij rj + hi
ui is net input into unit i; rate ri its output; Σj Tij rj is recurrent input; hi its feedforward input
if rate linear above threshold input: ri = [ui]+
recurrent contribution of j on i: Tij rj; feedforward contribution: hi
in general, set of N linear equations in N unknowns has unique solution, but can tune connections so fixed points (attractors) lie along a line
Line attractor model
applied to number of problems:
short-term memory: remembering facing direction after closing eyes, rotating head
noise removal: used to clean up noisy population responses
set up lateral connections so smooth hill centered on any point is stable
transient noisy input: network settles into hill of activity
peak position close approximation to xML: process allows simple decoder (e.g., population vector method) to approximate ML
other recurrent connection schemes produce stimulus selection (nonlinear, WTA); gain modulation (linear, scale responses by background amplitude)
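A toy simulation of these recurrent dynamics (the connection profile and gains are illustrative choices made for stability, not a tuned line attractor):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 60
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)

# Smooth local excitation, row-normalized and scaled below 1 so the
# dynamics contract (illustrative; a real line attractor tunes T carefully).
d = np.angle(np.exp(1j * (theta[:, None] - theta[None, :])))
T = np.exp(-0.5 * (d / 0.4) ** 2)
T = 0.9 * T / T.sum(axis=1, keepdims=True)

# Noisy hill of feedforward input centered at theta = 2.0.
h = np.exp(-0.5 * (np.angle(np.exp(1j * (theta - 2.0))) / 0.4) ** 2)
h += rng.normal(0, 0.2, N)

# du/dt = -u + T r + h, with rate linear above threshold: r = [u]+.
u = np.zeros(N)
for _ in range(500):
    r = np.maximum(u, 0)
    u += 0.1 * (-u + T @ r + h)

# r settles into a smooth hill; its peak is close to the ML estimate, so a
# simple decoder (e.g., population vector) applied to r approximates ML.
r = np.maximum(u, 0)
print(theta[np.argmax(r)])  # near 2.0
```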
Outline
1) Information processing in population codes
a) reading the neural code
b) computation in populations
2) Extending the information in population codes
a) representing probability distributions
b) methods for encoding/decoding distributions in neurons
3) Maintaining and updating distributions through time: dynamic distributions
a) optimal analytic formulation
b) network approximation
Extending information in population codes
Standard model focuses on encoding single value of x in face of noisy r
Alternative: populations represent more than single value; motivated by computational efficiency, also necessity – handle important natural situations
(1). Multiple values
Extending information in population codes
(2). Uncertainty (noise at all levels; inherent in image – insufficient information, e.g., low-contrast images)
Aperture Problem
Adelson & Movshon
Inherent Ambiguity
All possible motion vectors lie along a line in the 2D vx,vy ‘velocity space’
Human behavior: Bayesian judgements
[Figure: Prior × Likelihood = Posterior distributions over velocity]
Weiss, Simoncelli, Adelson
Bayesian cue combination
Ernst & Banks
(A). Gain Encoding
Simple extension of standard population code interpretation: activity is noisy response of units to single underlying value
Encoding: P(ri|θ), for example, bell-shaped tuning fi(θ)
Aim: given unit activities r, tuning curves fi(θ), find P(θ|r)
Decoding: via log P(ri|θ), e.g., assume independent Poisson noise
(A). Gain Encoding (cont).
Gaussian, homogeneous fi(θ), uniform prior: log P(θ|{ri}) → Gaussian
Solve for μ, σ by completing the square
Simple mechanism for encoding uncertainty: change overall population activity (gain); but limited to Gaussian posterior
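A sketch of the gain/uncertainty link: the same Poisson-population decode at low and high gain (all parameter values illustrative):

```python
import numpy as np

theta_grid = np.linspace(-5, 5, 401)
centers = np.linspace(-4, 4, 30)

def posterior_std(gain, theta_true=0.0, seed=3):
    """Decode one Poisson population response; return the posterior spread."""
    rng = np.random.default_rng(seed)
    r = rng.poisson(gain * np.exp(-0.5 * (theta_true - centers) ** 2))
    F = gain * np.exp(-0.5 * (theta_grid[:, None] - centers[None, :]) ** 2)
    # flat prior: log P(theta|r) = sum_i [ r_i log f_i(theta) - f_i(theta) ] + const
    logp = (r * np.log(F + 1e-12)).sum(axis=1) - F.sum(axis=1)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    m = (p * theta_grid).sum()
    return np.sqrt((p * (theta_grid - m) ** 2).sum())

print(posterior_std(gain=5.0))   # low gain: broad posterior, high uncertainty
print(posterior_std(gain=50.0))  # same activity shape, higher gain: narrow posterior
```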
(A). Gain encoding: Transparent motion
decoding convolves responses with unimodal kernels
1. unimodal response pattern produces unimodal distribution
2. surprisingly, also fails on bimodal response patterns: only extracts single motion component from responses to transparent motion
(B). Direct Encoding
Activity corresponds directly to probability
Simple case: binary (A vs. B): probability neuron 1 spikes = P(A), or can wait to compute rates: r1 ∝ P(A)
Note: r1 can also represent log P(A), or log P(A)/P(B)
Shadlen et al; Rao; Deneve; Hoyer & Hyvarinen
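A minimal illustration of one of these variants, where a single rate carries the log odds log P(A)/P(B):

```python
import numpy as np

# If r1 encodes log P(A)/P(B), the logistic function inverts the code.
def prob_A_from_rate(r1):
    return 1.0 / (1.0 + np.exp(-r1))

print(prob_A_from_rate(0.0))  # 0.5: no evidence either way
print(prob_A_from_rate(2.0))  # ~0.88: the rate directly carries the probability
```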
(B). Direct Encoding: Example
Discrete alternatives vi for explaining input s
ri ∝ log P(s|vi) = log likelihood for vi
Standard model for neural motion analysis: motion energy filter
Filter response gi(s) is energy of video s(y,t) convolved with oriented filter, tuned to velocity vi
Probabilistic model predicts ideal video is formed by image s(y) translating at velocity vi
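A rough sketch of the underlying likelihood computation, with a synthetic translating 1-D pattern standing in for the motion-energy machinery (pixel noise level, sizes, and candidate velocities are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
T, Y = 20, 64
v_true = 1.0

# Synthetic 1-D "video": a random spatial pattern s(y) translating at
# v_true pixels/frame, plus Gaussian pixel noise.
pattern = rng.normal(0, 1, Y)
video = np.stack([np.roll(pattern, int(round(v_true * t))) for t in range(T)])
video += rng.normal(0, 0.5, video.shape)

# r_i ∝ log P(s|v_i): log likelihood that the video is its first frame
# translating at v_i, under Gaussian pixel noise.
def log_like(v):
    pred = np.stack([np.roll(video[0], int(round(v * t))) for t in range(T)])
    return -0.5 * np.sum((video - pred) ** 2)

velocities = np.arange(-3, 4)
r = np.array([log_like(v) for v in velocities])
print(velocities[np.argmax(r)])  # peaks at the true velocity
```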
(B). Direct Encoding: Example
[Figure: ideal translating videos and responses in (y, t) space for velocities v = 0, 1, 2]
Weiss & Fleet, 02
(C). Convolution Codes
Characterize population response in terms of P(θ|r); standard model restricted to Gaussian posterior
Convolution codes can represent more general density functions, introduce level of indirection to direct method
Two forms of convolution codes:
1. Decoding kernels
2. Encoding kernels
(C). Convolution Codes
Decoding kernels (bases): P(θ|r) ∝ Σi ri φi(θ)
• bases can be distributions: P(θ|r) normalized
• bases can have simple form: φi(θ) = δ(θ − θi)
• multimodal P(θ|r) if active neurons have different θi
Anderson
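A sketch of kernel decoding with Gaussian bases (the kernel width and which neurons are active are illustrative choices):

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 400)
centers = np.linspace(-np.pi, np.pi, 40, endpoint=False)

# One decoding kernel phi_i per neuron; each row normalized, so each basis
# is itself a distribution over theta.
phi = np.exp(-0.5 * ((theta[None, :] - centers[:, None]) / 0.3) ** 2)
phi /= phi.sum(axis=1, keepdims=True)

# Two groups of active neurons, as in responses to transparent motion.
r = np.zeros(len(centers))
r[5:9] = 1.0
r[25:29] = 1.0

# P(theta|r) ∝ sum_i r_i phi_i(theta): bimodal when the active neurons
# have different preferred values theta_i.
p = r @ phi
p /= p.sum()
```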
(C). Convolution Codes: DPC
if P(θ) = δ(θ − θ*) then ⟨ri⟩ = fi(θ*), so could choose tuning functions fi(θ) as kernels
Zemel, Dayan, & Pouget
Encoding kernels (bases): ⟨ri⟩ = ∫ fi(θ) P(θ) dθ
Decoding:
• deconvolution (cannot recover high freqs.)
• probabilistic approach: nonlinear regression to optimize P(θ|r) under encoding model
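A sketch of the encoding-kernel direction: expected rates as integrals of tuning curves against a bimodal P(θ) (all shapes illustrative); recovering P(θ) from these rates is the regression step noted above:

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 400)
dth = theta[1] - theta[0]
centers = np.linspace(-np.pi, np.pi, 40, endpoint=False)

# Tuning functions f_i(theta) used as encoding kernels (Gaussian bumps).
f = np.exp(-0.5 * ((theta[None, :] - centers[:, None]) / 0.3) ** 2)

# A bimodal distribution over theta to be encoded.
P = np.exp(-0.5 * ((theta + 1.0) / 0.2) ** 2) + np.exp(-0.5 * ((theta - 1.5) / 0.2) ** 2)
P /= P.sum() * dth

# <r_i> = integral of f_i(theta) P(theta) dtheta; for P = delta at theta*,
# this reduces to <r_i> = f_i(theta*).
expected_rates = (f * P[None, :]).sum(axis=1) * dth
# Recovering P from these rates is the hard step: deconvolution loses high
# frequencies, hence the nonlinear-regression decoder above.
```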
Sums or Products?
kernel decoder
kernel encoder
(C). Convolution codes: Transparent motion
Bimodal response patterns: recovers generating distribution
Unimodal patterns fit, until (matches subject’s uncertainty)
(C). Convolution Codes: Extension
handle situation with multiple values and uncertainty
library of functions φ(θ) that describe combinations of values of θ
Sahani & Dayan
Outline
1) Information processing in population codes
a) reading the neural code
b) computation in populations
2) Extending the information in population codes
a) representing probability distributions
b) methods for encoding/decoding distributions in neurons
3) Maintaining and updating distributions through time: dynamic distributions
a) optimal analytic formulation
b) network approximation
Dynamic distributions: motivation
Dynamic cue combination
information constantly changing over time: extend framework to encode/decode dynamic distributions
Kording & Wolpert
Dynamic cue combination
Dynamic Distributions: decoding
Spike train R(t): what is P(X(t)|R(t))?
Markov: dynamics determined by Tij = P(Xi(t)|Xj(t-1))
More general form, continuous time: R(t), X(t) are spikes and position from 0 to t: R(0)…R(t−ε); X(0)…X(t−ε)
GP spikes: Encoding model & prior
instantaneous, independent, inhomogeneous Poisson process:
P(R(t)|X(t)) = ∏_{j=1..N} ∏_{m=0..M} P(Rj(tm)|X(tm)) ∝ ∏_{j,tm} fj(X(tm))
and a Gaussian Process prior:
α defines the smoothness of the prior, and τ defines the speed of movement
Huys, Zemel, Natarajan, Dayan
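A generative sketch of this encoding model: a GP-sampled trajectory driving independent inhomogeneous Poisson spiking (a squared-exponential kernel stands in for the α-parameterized prior family):

```python
import numpy as np

rng = np.random.default_rng(5)
dt = 0.01
t = np.arange(0, 5.0, dt)

# GP prior on the trajectory X(t): a squared-exponential kernel, with tau
# setting smoothness/speed (illustrative stand-in for the alpha family).
tau = 0.5
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / tau) ** 2) + 1e-6 * np.eye(len(t))
X = np.linalg.cholesky(K) @ rng.normal(0, 1, len(t))

# Instantaneous, independent, inhomogeneous Poisson spiking: cell j fires
# at rate f_j(X(t)) given by a Gaussian tuning curve.
centers = np.linspace(-2, 2, 8)
rates = 20 * np.exp(-0.5 * ((X[None, :] - centers[:, None]) / 0.5) ** 2)  # Hz
spikes = rng.random(rates.shape) < rates * dt  # one Bernoulli thinning per bin
```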
GP spikes decoding: Dynamics prior is key
static stimulus prior (α = 0):
dynamic stimulus prior (α > 0): spikes not eternally informative
1st-order Markov (α = 1): OU process
high-order (α = 2): smooth process
m(t) = Σj θj ( Σ_{tm<t} k(tm) Rj(tm) )
Trajectories & kernels
OU (α = 1); Smooth (α = 2)
Optimal Dynamic Distributions
Analytically tractable formulation
Prior important for rapidly changing stimuli – fewer spikes than temporal variations in stimulus
For smooth (natural) stimuli: no recursive formulation, recompute kernel per spike
Decoding: must maintain spike history
Hypothesis: Recoding spikes
Recode input spikes into a new set of spikes to facilitate downstream processing; obviate need to store spike history
Train network to produce new spikes so that simple decoder can approximate optimal decoding of input spikes
Natarajan, Huys, Dayan, Zemel
Log-linear spike decoding
effect of spike on postsynaptic neuron: produces smoothly decaying postsynaptic potential
Hinton & Brown
Dynamic Distributions: recoding network
Aim: map spikes R(t) to S(t), so that simple decoding of S(t) approximates optimal P(X(t)|R(t))
1. Convolution kernel decoder for S(t)
2. Processing dynamics: standard recurrent net
3. Learn weights W, V to minimize mismatch between this simple decoding and the optimal P(X(t)|R(t))
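A rough sketch of the recoding idea, not the actual trained model: a recurrent net maps input spikes to new activity, and a fixed smoothly decaying kernel decodes it; training (omitted here) would fit W and V:

```python
import numpy as np

rng = np.random.default_rng(6)
N_in, N_out, T_steps = 20, 20, 200

# Weights the model would learn; random here since training is omitted.
W = rng.normal(0, 0.1, (N_out, N_in))   # feedforward: input spikes -> units
V = rng.normal(0, 0.1, (N_out, N_out))  # recurrent
decay = 0.9                             # fixed, smoothly decaying decoding kernel

def recode_and_decode(R, theta_pref):
    """Map input spikes R(t) to S(t) with a recurrent net, then decode S(t)
    with a simple convolution (exponential) kernel."""
    S_prev = np.zeros(N_out)
    trace = np.zeros(N_out)
    estimates = []
    for t in range(T_steps):
        S = np.maximum(W @ R[t] + V @ S_prev, 0)  # recurrent-net dynamics
        trace = decay * trace + S                 # kernel-smoothed S(t)
        estimates.append(theta_pref @ trace / (trace.sum() + 1e-9))
        S_prev = S
    return np.array(estimates)

R = (rng.random((T_steps, N_in)) < 0.05).astype(float)  # toy input spike trains
theta_pref = np.linspace(-1, 1, N_out)
estimate = recode_and_decode(R, theta_pref)  # stands in for decoded posterior mean
```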
Recoding network: example
Recoding network: analyzing kernels
Recoding network: results summary
Discussion
Current directions: Apply scheme recursively, hierarchically
Relate model to experimental results, e.g., Kording & Wolpert
Open issues: High-dimensional spaces: curse of dimensionality doubled?
Experimental validation or refutation of proposed distributional schemes?