józsef fiser
DESCRIPTION
Exploring a sampling-based framework for probabilistic representation and computation in the cortex. József Fiser. Brandeis University. - PowerPoint PPT PresentationTRANSCRIPT
Exploring a sampling-based framework for probabilistic representation and computation
in the cortex
József Fiser
Brandeis University
Pietro Berkes Máté Lengyel, Gergő Orbán Brandeis University University of
Cambridge
Outline
Proposal: the sampling hypothesis
Challenges of the standard model of vision in the cortex
Empirical evidence explained
How viable the sampling model is
Challenges
Physiological: High level of structured spontaneous activity in the cortex that questions the feed forward nature of information processing and the role of ongoing activity
Behavioral: In many cases, animals' and humans' perception, action and learning is best captured by models that assume probabilistic representation and computation in the brain
Coding: High trial-to-trial variability of cell responses even in primary sensory cortices that strongly interferes with reliable neural computation. Systematic variations of neural response variance is not explained by current theories.
The sampling hypothesis
The cortex performs a probabilistic computation for inference and learning and to achieve this it uses a sampling-based neural representation in which spontaneous activity represents the prior information used for inference.
Probabilistic computation: Represents and computes with probability distributions in order to handle uncertainty. Sampling-based: Distributions are represented by samples. Spontaneous = prior: Spontaneous activity is not noise.
Logic of sampling-based representationLogic of sampling-based representation
variable 1
varia
ble
2
firing rate
variable 1
varia
ble
2neuron 1
neuron 2
trials / time
firing rateneuron 1
feature intensity1orientation1
orie
ntati
on2
feat
ure
inte
nsity
2
Functional role of spontaneous activity
Brightness coding in the primary visual cortex is factorized (Rossi et al. Science 1996)
Spontaneous activity = samples from the prior
Formalizing the relationship between SA and EA
Activity p(y | x) in absence of input stimuli (x) represent samples from the prior p(y) - and this is the spontaneous activity.
Spontaneous activity Evoked activity
Pθ (y) learning fromPdata (x)
⏐ →⏐ ⏐ ⏐ ⏐ Pθ (y|x)⋅Pdata(x) dx∫
PSA (y) ⏐ →⏐ ⏐ ⏐ ⟨PEA(y|x)⟩Pdata (x)
Natural scenes
Data collection• Awake, head-restrained ferrets
• 4 age groups (P30, P45, P90, P120)
• Multi-electrode recording with 200 µ spacing from layer 2-3 of V1
• Complete dark (spont. activity)
• Natural scene movie
• Drifting grating
• Random noise
Evidence of sampling based probabilistic representation in the awake cortex
Visual
Results I
SA and EA in the Movie condition converge with age
In the adult animal, they do not differ significantly
Results II
The match between SA and EA is specific to natural
scenes
Results III
The match between SA and EA is not specific to the
visual modality
FAQ
FHS
Limited number of samples must lead to bias in
learning
Why sampling just cannot be the answer
It must break down if the input is non-stationary
It is not clear how the neural circuitry can do it
It requires way too much time to collect sufficient number of samples
Well, let’s see…Well, let’s see…
How can sampling be inplemented in neural networks?
Gibbs sampling
Hamiltonian Monte Carlo – Langevin sampling:
Quite naturally
A simple model of cue combination based on sampling
Task: given auditory and visual information where is the target?
Hamiltonian Monte Carlo – Langevin sampling:
How many samples are needed for accurate estimation?
With 1 sample, the variance of the estimate is just twice the optimal (ML) variance
Sampling might be optimized by the cortex via minimizing the autocorrelation of successive samples:
Surprisingly few
What happens with non-static input?
As long as the internal model is correct, the model will have no problem tracking the signal
Why is this happening?
Things are actually getting better not worse
No need for burn-in, plus tighter prior
Can we learn with such a small number of sampels?
Learning tuning curve positions and variances
Results
Yes.....
Limited number of samples must lead to bias in
learning
Sampling can be the answer
It must break down if the input is non-stationary
It is not clear how the neural circuitry can do it
It requires way too much time to collect sufficient number of samples
✔✔✔✔
Experimental results explained
Simulations
Maximum likelihood learning using Expectation Maximization
Trained on 80 000 patches of size 8x8 from the van Hateren image database
u – membrane pot.
ri - Poisson rate ni - spike count
Gain [u-Thres]1.2
Receptive fields
Specific questions
Contrast dependence of variability
Can we reproduce our KL results?
Stimulus specificity of variability
Onset effects of variability
Stimulus specificity of co-variability
Does the model agree on characteristics of variability measured in neural activity?
Sanity check
Sampling reproduces the KL divergence results
Contrast dependent variability of u
Sampling gives a natural link between contrast and variance
u1
u2
P(z
|x)
Blank
High
High Blank
Posterior
Stimulus dependent variability of u and the effect of stimulus onset
Sampling can reproduce both the stimulus independence of variability-change and the quenching effect of stimulus onset
SIMULATIONS
time (ms)
Nonpreferred
Preferred
EXPERIMENT
(Yu & Ferster 2010 Neuron)
Stimulus dependent co-variability of u
Sampling produces a reduced co-variability of the membrane potential at stimulus presentation just as it was found in vivo
SIMULATION
Stimulus dependent co-variability of u
Sampling can reproduce these correlational effects, too
EXPERIMENT
(Yu & Ferster 2010 Neuron)
Similar orientation preference
Dissimilar orientation preference
SIMULATION
Conclusions
• Sampling based probabilistic schemes are viable alternatives for neural representation in the cortex providing a unified framework to investigate learning and instantaneous perception
• Measures based on comparing SA and EA support the idea that the cortex uses sampling based representation and SA has the functional role of providing prior information for probabilistic perceptual inferences
• The sampling based representations corroborate the idea that rather than using mostly first moments, cortical coding is strongly related to signal co-variances and higher order moments of the neural signal
Thank you !
The standard view on vision
The fundamental method of information processing in the visual system can be described as
• deterministic computation with added neural noise• feed-forward • based on mean firings of cells
The function of the primary visual cortex in this framework is
• restricted to represent only visual information • to create a faithful representation of the sensory
stimulus with a neural code that is as sparse and independent as possible
Challenges
Behavioral: In many cases, animals' and humans' perception, action and learning is best captured by models that assume probabilistic representation and computation in the brain
Life is uncertain and ambiguous…
The brain encodes both the value and the uncertainty of the stimulus during perception.
Orientationof stimulus
Intensityof stimulus
s: stimulus 2D to 3D
Proposal
Probabilistic approaches in …
Cue combination (Atkins et al. Vis Res 2001; Ernst & Banks Nature 2002)
Classical conditioning (Courville et al. TICS 2006)
Perceptual processes (Kersten et al. Ann. Rev. Psych. 2006)
Visuo-motor coordination (Kording & Wolper Nature 2004)
High-level cognitive processes (Griffiths & Tenenbaum TICS 2006)
Visual statistical learning (Orban et al. PNAS 2008)
Decision making (Trommershäuser et al. TICS 2008)
Is the neural computation deterministic?Is the neural computation deterministic?
Computation
Evid
ence
Inference Learning
Behavioral
Atkins et al., 2001Ernst & Banks, 2002
Körding & Wolpert, 2004Weiss et al., 2002
Yang & Shadlen, 2007Kepecs et al., 2008
Dayan et al., 2000Tenenbaum et al.,
2006
Neural
Platt & Glimcher, 1999Yang & Shadlen, 2007
Kepecs, et al., 2008 ?
Orbán et al., 2008
Probabilistic computation in the brain
Challenges
Physiological: High level of structured spontaneous activity in the cortex that questions the feed forward nature of information processing and the role of ongoing activity
Behavioral: In many cases, animals' and humans' perception, action and learning is best captured by models that assume probabilistic representation and computation in the brain
Spontaneous activity in the awake brainSpontaneous activity in the awake brain
Is it really noise?Is it really noise?
Churchland & >25; Nat Neuro 2010
Example: Stimulus onset quenches trial-to-trial variability
Why?Why?
The sampling hypothesis
Parametric Parametric Sampling-based Sampling-based
Probabilistic Population Codes (PPCs)
Probabilistic Population Codes (PPCs)
Predecessors: Kernel Dens. Estimation
Distr. Population Codes
Predecessors: Kernel Dens. Estimation
Distr. Population Codes Implicit examples:• Boltzmann machine• Helmholtz machine
Implicit examples:• Boltzmann machine• Helmholtz machine
Special cases:• Olshausen & Field 1996• Karklin & Lewicki 2009
Special cases:• Olshausen & Field 1996• Karklin & Lewicki 2009
Probabilistic representation in the cortexProbabilistic representation in the cortex
Earlier proposals:• Lee & Mumford 2003• Hoyer & Hyvarinen 2003
Earlier proposals:• Lee & Mumford 2003• Hoyer & Hyvarinen 2003
A simple model of the early visual system
Perception is an inference process which is always based on internally available prior information
e.g. Olshausen & Field 1996 e.g. Olshausen & Field 1996 Sparseness and independence Sparseness and independence
probabilistic formulation probabilistic formulation
Looking for supporting evidence
Predictions
Probability of exhibiting a given neural activity pattern should be identical
With accumulating visual experience, the distribution of spontaneous activity should become increasingly similar to the distribution of evoked activity averaged over natural stimuli .
Similarity should be specific to natural scenes but not to other stimulus ensembles
Probability of transitioning between particular patterns should also match
Data analysis
KL pMovie P pSpont( ) = pMovie x( )
X∑ ∗log
pMovie x( )pSpont x( )
Results II
Both SA and EAMovie become increasingly correlated
P~
(y) = P(yi )i=1
16
∏ Spatial correlations are similar between SA and
EAMovie
Results III
Temporal correlations are similar between SA and
EAMovie
State −transitiondistributions : P(yt+τ |yt)Control : P
~
(yt +τ |yt) =P(y)
Data collection• Awake, head-restrained ferrets
• One age group (adults)
• Single-electrode single-unit recording in A1
Evidence of sampling based probabilistic representation in the awake cortex
Auditory
• Complete silence (spont. activity)
• Natural speech
• White noise (0-20 KHz)
Shamma-lab University of Maryland
Stimulus dependence of spike count correlations
Sampling can reproduce these correlational effects, too
Evidence against sparseness and independence
• No support for active sparsification or increase of independence in the cortex
Relation between MP, mean, variance and Fano
MP variance decreases, both SC meand and variance increase, but the mean increases stronger therefore the Fano factor still decreases
Standard models of visual recognition
Neural bases:
(Hubel and Wiesel, 1962) (Ohki et al., 2006)
System:
Example: Cue combination
Atkins, Fiser & Jacobs VisRes 2001; Ernst & Banks Nature 2002
Is the neural computation deterministic?Is the neural computation deterministic?
Example 2: Decision making
Trommershäuser, Maloney & Landy TICS 2008
Probabilistic inference and learning are coupled
A fact
If instantaneous perception is a process of
probabilistic inference, learning also must be
probabilistic.
Probabilistic inference and learning
Probabilistic representation in the cortex: neural
Part 3
Computation
Evid
ence
Inference Learning
Behavioral
Atkins et al., 2001Ernst & Banks, 2002Körding & Wolpert,
2004Weiss et al., 2002
Yang & Shadlen, 2007Kepecs et al., 2008
Dayan et al., 2000Tenenbaum et al.,
2006
Neural
Platt & Glimcher, 1999Yang & Shadlen, 2007
Kepecs, et al., 2008 ?
Orbán et al., 2008
Alternative representational schemes
Parametric Sampling-based
Probabilistic Population Codes (PPCs)
Predecessor: Kernel Dens. Estim.
Implicit examples:• Boltzmann machine• Helmholtz machine
Special cases:• Olshausen & Field 1996• Karklin & Lewicki 2009
Practical issues in probabilisitic representation
Parametric (PPC) Sampling-based
s: stimulus
PPCs sampling-based
neurons correspond to parameters variables
network dynamics required (beyond the first layer)
deterministic stochastic (self consistent)
representable distributions
must correspond to a particular parametric form
can be arbitrary
critical factor in accuracy of encoding a distribution
number of neurons time allowed for sampling
instantaneous representation of uncertainty
complete, the whole distribution is represented at any time
partial, a sequence of samples is required
number of neurons needed for representing multimodal distributions
scales exponentially with the number of dimensions
scales linearly with the number of dimensions
implementation of learning
unknown well-suited
Comparing the two schemes
PPC Sampling-based
PPCPPC Sampling-basedSampling-basedLogic of the two types of representationLogic of the two types of representation
variable 1
varia
ble
2
variable 1
neuron 2
varia
ble
2
neuron 1 neuron 3 ...
firing rate
variable 1
varia
ble
2
neuron 1neuron 2
neuron 3 ...
variable 1
varia
ble
2
neuron 1
neuron 2
trials / time
firing rateneuron 1
feature intensity1orientation1
orie
ntati
on2
feat
ure
inte
nsity
2
variable 1
varia
ble
2
Bottom line
Sampling based probabilisitic representational
schemes are promising alternatives but there is
no neural evidence supporting the hypothesis
that the cortex uses such a scheme
In order to obtain such evidence we take a detour…
• Raichle-lab: humans, entire cortex cortex, fMRI
- Fox & Raichle Nature Rev Neu. 2007; Vincent et al. Nature 2007; Fox et al. Nat. Neu. 2005
• Grinvald-lab: cat, visual cortex, optical imaging
- Kenet et al. Nature 2004; Tsodyks et al. Science 1999; Arieli et al. Science 1996
• Buzsáki-lab: rat, hippocampus, tetrode recording
- Buzsáki & Draghun Science 2004; Harris et al. Nature 2003; Leinekugel et al. Science 2002;
• McCormick-lab: ferret, cat, rat slices
- Shu et al. Nature 2003; Shu et al. J. Neurosci 2003; McCormicki et al. Cereb. Cortex 2003
• Yuste & Ferster labs: in vivo in vitro single cell recording, 2-photon Ca imaging
- Ikegaya et al. Science 2004; Anderson et al. Nat. Neuro 2000; Cossart et al. Nature 2003
The role of spontaneous activity in visual coding
Part 4
Spontaneous activity in the awake brain(Fiser et al. Nature 2004)
• Awake, head-restrained ferretsV
Visual stimulus
• 3 age groups (P30, P45, P90)
• Multi-electrode recording from V1
• Three interleaved trial types:
• complete dark (spontaneous act.)• random noise stimuli
• natural scene movie
Goal: Compare neural activity under the three different conditions
The MATRIX
Spontaneous activity in the awake brain
Results:
(Fiser et al. Nature 2004)
• Minimal difference between spontaneous and evoked activities
• A systematic change in the correlational structure with age
Framework
Not perceptual learning in the classic sense, but represen-tational learning of the structure of the visual world through detecting "suspicious coincidences" a'la Barlow
Not higher cognition or concept learning, but visual perception
– Classical perceptual learning
Supervised learningReinforcement learningSignal-detection problem
– Representational learning
Unsupervised learning
Combinatorial problem
A plausible method for infants and animals, too
Experiments with pairs
Inventory:
Test: 2-AFC, Base-pair vs. Non-base pair
Non-base element in either novel or familiar grid position
A
B
E
F
I J
A
B
Base-pair Non-base pairs
IF
Novel element pos.
I
Familiar element pos.
F
Pair inventory
Subjects automatically extracted both position-dependent and position-independent statistics from the scenes
Novel position Familiar position
Elements of the non-base pairs
** **
Frequency-balanced inventory
Frequent base-pairs Frequency balanced nonbase-pairs
App
eara
nce
freq
.
Base pairs
App
eara
nce
freq
.
Base pairs
Rare Frequent Freq. balanced nonbase-pairs
SCENE
Frequency-balanced results
At the same time, they were also sensitive to single element appearance frequency
Subjects automatically preferred the rare but more predictable pairs
Embedded inventory
Method:
Two base-pairs
Two base-quadruples
Testing embedded and not embedded base-pairs against non-base pairs
Sample scene Six elements per scene
How do we learn hierarchical structures?
Embedded results
Chance
• significant preference for base quads and non-embedded pairs
• no significant preference for embedded pairs
*
1st round
2nd round
Results with balanced appearance of each element
p < 0.007
Experiment 1
Results with frequency-balanced non-base pairs
Pairs: p < 0.012
Singles: p = ns.
Experiment 2
Experiment 3
Model performances I
Basic experiments with pairs
Frequency-counting models fail
Model performances II
Embedded experiment with triplets
All three of the naïve statistical models fail
AB
C
DE
F
GH
I
JK
L
AB
C
GH
I
Sample scene
Summary
Correct prediction without any parameter tweaking
The capacity of Working Memory
Working memory
What is the effect of long-term memory on working memory?
External
world
Visual stimulus
Long-term memory
Part 3
Now we have a tool to investigate this!
Vogel & Luck, Nature (1997) Olsson & Poom, PNAS (2005)
Issues with the capacity of WM
WM is defined as a limited-capacity memory buffer
Capacity of WM is traditionally given as the maximum number of ‘chunks’ stored in WM
There is a controversy around the capacity of WM: 4 or 2 or 1?
Encoding signal
Idea: Long-term memory is used to compress data in WM
aa aa bb dd bb aa cc
bb aa aa
Likely stimulusEncoding signal:
DL
Unlikely stimulusEncoding signal:
Working memory
External
world
Visual stimulu
s
Long-term memoryPredictive
distribution
Learning
LTM dependent capacityDescription
length
Source coding theorem
Experimental paradigm
Learning predictive distribution
Familiarization 2AFC task
?vs.
Assessing learning efficiency
Change detection task
Compression -description length
Key idea: modulating DL by building from different chunks
Overall performance of humans
2AFC task
vs.vs. vs.vs.vs.vs.
True inventory CD task
Naïve All GL BL
Overall performance of humans: Effect of set size
Learning decouples set size and description length
Controlling for set sizeControlling for DL
Performance change of trained subjects relative to naive subjects
Low-DL trialsIntermediate-DL trialsHigh-DL trials
Conditioning eliminates most of set size dependence
Controlling for set sizeControlling for DL
The role of spontaneous activity in visual coding
•Probabilistic representation of data and uncertainty
Part 4
Can we reconcile these issue?
An abstract computational model of representational learning
•Use of a generative model of the world
A partial model of visual coding in the brain with issues
•Ongoing activity, trial-by-trial response variability
•Context dependency, attention, illusions
A Bayesian generative framework for the visual system
Generative modeling hypothesis: the visual cortex implements an approximate inference in a generative model of the visual input.
…
…
pixels (y)
Causes: objects (x)
p(y | x)
p(x) p(x | y)∝ p(y | x)∗p(x)
y
Representation = probability distributions
Recognition = inference
Sampling hypothesis: neural activity represents Monte Carlo samples from the probability distribution over possible causes p(x | y)
Learning
Top-downeffects
Confirmation
Similarity measured by Kullback-Leibler divergence between the distributions of spontaneous and evoked activity
Results:
KL pMovie P pSpont( ) = pMovie x( )
X∑ ∗log
pMovie x( )pSpont x( )
The probabilistic model
Familiarization/learning:
Testing/calculating predictive probabilities
Computational frameworkSigmoid Belief Network
Graphical model:
Likelihood:
Model posterior:
Choice probability:
Computational frameworkAssociative learner
Graphical model:
Likelihood:
Model posterior:
Choice probability:
Embedded pairs
Bayesian model comparison
possible data setspossible data sets
experienced data setexperienced data set
Compare evidences for inventories:
Automatic Occam’s razor
Overall performance of humans: Effect of set size
Fitting 2-AFC:
How about temporal correlations?
• Adults can rapidly learn to segment (and group) sequences of shapes based solely on statistical information (joint and/or transitional probabilities)
(Fiser & Aslin JEP LM&C 2002)
(Fiser, McCrink & Aslin, in preparation)
• Infants can do the same