józsef fiser

Exploring a sampling-based framework for probabilistic representation and computation

in the cortex

József Fiser

Brandeis University

Pietro Berkes Máté Lengyel, Gergő Orbán Brandeis University University of

Cambridge

Outline

Proposal: the sampling hypothesis

Challenges of the standard model of vision in the cortex

Empirical evidence explained

How viable the sampling model is

Challenges

Physiological: High level of structured spontaneous activity in the cortex that questions the feed forward nature of information processing and the role of ongoing activity

Behavioral: In many cases, animals' and humans' perception, action and learning is best captured by models that assume probabilistic representation and computation in the brain

Coding: High trial-to-trial variability of cell responses even in primary sensory cortices that strongly interferes with reliable neural computation. Systematic variations of neural response variance is not explained by current theories.

The sampling hypothesis

The cortex performs a probabilistic computation for inference and learning and to achieve this it uses a sampling-based neural representation in which spontaneous activity represents the prior information used for inference.

Probabilistic computation: Represents and computes with probability distributions in order to handle uncertainty. Sampling-based: Distributions are represented by samples. Spontaneous = prior: Spontaneous activity is not noise.

Logic of sampling-based representationLogic of sampling-based representation

variable 1

varia

ble

2

firing rate

variable 1

varia

ble

2neuron 1

neuron 2

trials / time

firing rateneuron 1

feature intensity1orientation1

orie

ntati

on2

feat

ure

inte

nsity

2

Functional role of spontaneous activity

Brightness coding in the primary visual cortex is factorized (Rossi et al. Science 1996)

Spontaneous activity = samples from the prior

Formalizing the relationship between SA and EA

Activity p(y | x) in absence of input stimuli (x) represent samples from the prior p(y) - and this is the spontaneous activity.

Spontaneous activity Evoked activity

Pθ (y) learning fromPdata (x)

⏐ →⏐ ⏐ ⏐ ⏐ Pθ (y|x)⋅Pdata(x) dx∫

PSA (y) ⏐ →⏐ ⏐ ⏐ ⟨PEA(y|x)⟩Pdata (x)

Natural scenes

Data collection• Awake, head-restrained ferrets

• 4 age groups (P30, P45, P90, P120)

• Multi-electrode recording with 200 µ spacing from layer 2-3 of V1

• Complete dark (spont. activity)

• Natural scene movie

• Drifting grating

• Random noise

Evidence of sampling based probabilistic representation in the awake cortex

Visual

Results I

SA and EA in the Movie condition converge with age

In the adult animal, they do not differ significantly

Results II

The match between SA and EA is specific to natural

scenes

Results III

The match between SA and EA is not specific to the

visual modality

FAQ

FHS

Limited number of samples must lead to bias in

learning

Why sampling just cannot be the answer

It must break down if the input is non-stationary

It is not clear how the neural circuitry can do it

It requires way too much time to collect sufficient number of samples

Well, let’s see…Well, let’s see…

How can sampling be inplemented in neural networks?

Gibbs sampling

Hamiltonian Monte Carlo – Langevin sampling:

Quite naturally

A simple model of cue combination based on sampling

Task: given auditory and visual information where is the target?

Hamiltonian Monte Carlo – Langevin sampling:

How many samples are needed for accurate estimation?

With 1 sample, the variance of the estimate is just twice the optimal (ML) variance

Sampling might be optimized by the cortex via minimizing the autocorrelation of successive samples:

Surprisingly few

What happens with non-static input?

As long as the internal model is correct, the model will have no problem tracking the signal

Why is this happening?

Things are actually getting better not worse

No need for burn-in, plus tighter prior

Can we learn with such a small number of sampels?

Learning tuning curve positions and variances

Results

Yes.....

Limited number of samples must lead to bias in

learning

Sampling can be the answer

It must break down if the input is non-stationary

It is not clear how the neural circuitry can do it

It requires way too much time to collect sufficient number of samples

✔✔✔✔

Experimental results explained

Simulations

Maximum likelihood learning using Expectation Maximization

Trained on 80 000 patches of size 8x8 from the van Hateren image database

u – membrane pot.

ri - Poisson rate ni - spike count

Gain [u-Thres]1.2

Receptive fields

Specific questions

Contrast dependence of variability

Can we reproduce our KL results?

Stimulus specificity of variability

Onset effects of variability

Stimulus specificity of co-variability

Does the model agree on characteristics of variability measured in neural activity?

Sanity check

Sampling reproduces the KL divergence results

Contrast dependent variability of u

Sampling gives a natural link between contrast and variance

u1

u2

P(z

|x)

Blank

High

High Blank

Posterior

Stimulus dependent variability of u and the effect of stimulus onset

Sampling can reproduce both the stimulus independence of variability-change and the quenching effect of stimulus onset

SIMULATIONS

time (ms)

Nonpreferred

Preferred

EXPERIMENT

(Yu & Ferster 2010 Neuron)

Stimulus dependent co-variability of u

Sampling produces a reduced co-variability of the membrane potential at stimulus presentation just as it was found in vivo

SIMULATION

Stimulus dependent co-variability of u

Sampling can reproduce these correlational effects, too

EXPERIMENT

(Yu & Ferster 2010 Neuron)

Similar orientation preference

Dissimilar orientation preference

SIMULATION

Conclusions

• Sampling based probabilistic schemes are viable alternatives for neural representation in the cortex providing a unified framework to investigate learning and instantaneous perception

• Measures based on comparing SA and EA support the idea that the cortex uses sampling based representation and SA has the functional role of providing prior information for probabilistic perceptual inferences

• The sampling based representations corroborate the idea that rather than using mostly first moments, cortical coding is strongly related to signal co-variances and higher order moments of the neural signal

Thank you !

The standard view on vision

The fundamental method of information processing in the visual system can be described as

• deterministic computation with added neural noise• feed-forward • based on mean firings of cells

The function of the primary visual cortex in this framework is

• restricted to represent only visual information • to create a faithful representation of the sensory

stimulus with a neural code that is as sparse and independent as possible

Challenges


Life is uncertain and ambiguous…

The brain encodes both the value and the uncertainty of the stimulus during perception.

Orientationof stimulus

Intensityof stimulus

s: stimulus 2D to 3D

Proposal

Probabilistic approaches in …

Cue combination (Atkins et al. Vis Res 2001; Ernst & Banks Nature 2002)

Classical conditioning (Courville et al. TICS 2006)

Perceptual processes (Kersten et al. Ann. Rev. Psych. 2006)

Visuo-motor coordination (Kording & Wolper Nature 2004)

High-level cognitive processes (Griffiths & Tenenbaum TICS 2006)

Visual statistical learning (Orban et al. PNAS 2008)

Decision making (Trommershäuser et al. TICS 2008)

Is the neural computation deterministic?Is the neural computation deterministic?

Computation

Evid

ence

Inference Learning

Behavioral

Atkins et al., 2001Ernst & Banks, 2002

Körding & Wolpert, 2004Weiss et al., 2002

Yang & Shadlen, 2007Kepecs et al., 2008

Dayan et al., 2000Tenenbaum et al.,

2006

Neural

Platt & Glimcher, 1999Yang & Shadlen, 2007

Kepecs, et al., 2008 ?

Orbán et al., 2008

Probabilistic computation in the brain

Challenges

Physiological: High level of structured spontaneous activity in the cortex that questions the feed forward nature of information processing and the role of ongoing activity


Spontaneous activity in the awake brainSpontaneous activity in the awake brain

Is it really noise?Is it really noise?

Churchland & >25; Nat Neuro 2010

Example: Stimulus onset quenches trial-to-trial variability

Why?Why?

The sampling hypothesis

Parametric Parametric Sampling-based Sampling-based

Probabilistic Population Codes (PPCs)


Predecessors: Kernel Dens. Estimation

Distr. Population Codes

Predecessors: Kernel Dens. Estimation

Distr. Population Codes Implicit examples:• Boltzmann machine• Helmholtz machine

Implicit examples:• Boltzmann machine• Helmholtz machine

Special cases:• Olshausen & Field 1996• Karklin & Lewicki 2009


Probabilistic representation in the cortexProbabilistic representation in the cortex

Earlier proposals:• Lee & Mumford 2003• Hoyer & Hyvarinen 2003

Earlier proposals:• Lee & Mumford 2003• Hoyer & Hyvarinen 2003

A simple model of the early visual system

Perception is an inference process which is always based on internally available prior information

e.g. Olshausen & Field 1996 e.g. Olshausen & Field 1996 Sparseness and independence Sparseness and independence

probabilistic formulation probabilistic formulation

Looking for supporting evidence

Predictions

Probability of exhibiting a given neural activity pattern should be identical

With accumulating visual experience, the distribution of spontaneous activity should become increasingly similar to the distribution of evoked activity averaged over natural stimuli .

Similarity should be specific to natural scenes but not to other stimulus ensembles

Probability of transitioning between particular patterns should also match

Data analysis

KL pMovie P pSpont( ) = pMovie x( )

X∑ ∗log

pMovie x( )pSpont x( )

Results II

Both SA and EAMovie become increasingly correlated

P~

(y) = P(yi )i=1

16

∏ Spatial correlations are similar between SA and

EAMovie

Results III

Temporal correlations are similar between SA and

EAMovie

State −transitiondistributions : P(yt+τ |yt)Control : P

~

(yt +τ |yt) =P(y)

Data collection• Awake, head-restrained ferrets

• One age group (adults)

• Single-electrode single-unit recording in A1

Evidence of sampling based probabilistic representation in the awake cortex

Auditory

• Complete silence (spont. activity)

• Natural speech

• White noise (0-20 KHz)

Shamma-lab University of Maryland

Stimulus dependence of spike count correlations

Sampling can reproduce these correlational effects, too

Evidence against sparseness and independence

• No support for active sparsification or increase of independence in the cortex

Relation between MP, mean, variance and Fano

MP variance decreases, both SC meand and variance increase, but the mean increases stronger therefore the Fano factor still decreases

Standard models of visual recognition

Neural bases:

(Hubel and Wiesel, 1962) (Ohki et al., 2006)

System:

Example: Cue combination

Atkins, Fiser & Jacobs VisRes 2001; Ernst & Banks Nature 2002

Is the neural computation deterministic?Is the neural computation deterministic?

Example 2: Decision making

Trommershäuser, Maloney & Landy TICS 2008

Probabilistic inference and learning are coupled

A fact

If instantaneous perception is a process of

probabilistic inference, learning also must be

probabilistic.

Probabilistic inference and learning

Probabilistic representation in the cortex: neural

Part 3

Computation

Evid

ence

Inference Learning

Behavioral

Atkins et al., 2001Ernst & Banks, 2002Körding & Wolpert,

2004Weiss et al., 2002

Yang & Shadlen, 2007Kepecs et al., 2008

Dayan et al., 2000Tenenbaum et al.,

2006

Neural

Platt & Glimcher, 1999Yang & Shadlen, 2007

Kepecs, et al., 2008 ?

Orbán et al., 2008

Alternative representational schemes

Parametric Sampling-based


Predecessor: Kernel Dens. Estim.

Implicit examples:• Boltzmann machine• Helmholtz machine


Practical issues in probabilisitic representation

Parametric (PPC) Sampling-based

s: stimulus

PPCs sampling-based

neurons correspond to parameters variables

network dynamics required (beyond the first layer)

deterministic stochastic (self consistent)

representable distributions

must correspond to a particular parametric form

can be arbitrary

critical factor in accuracy of encoding a distribution

number of neurons time allowed for sampling

instantaneous representation of uncertainty

complete, the whole distribution is represented at any time

partial, a sequence of samples is required

number of neurons needed for representing multimodal distributions

scales exponentially with the number of dimensions

scales linearly with the number of dimensions

implementation of learning

unknown well-suited

Comparing the two schemes

PPC Sampling-based

PPCPPC Sampling-basedSampling-basedLogic of the two types of representationLogic of the two types of representation

variable 1

varia

ble

2

variable 1

neuron 2

varia

ble

2

neuron 1 neuron 3 ...

firing rate

variable 1

varia

ble

2

neuron 1neuron 2

neuron 3 ...

variable 1

varia

ble

2

neuron 1

neuron 2

trials / time

firing rateneuron 1

feature intensity1orientation1

orie

ntati

on2

feat

ure

inte

nsity

2

variable 1

varia

ble

2

Bottom line

Sampling based probabilisitic representational

schemes are promising alternatives but there is

no neural evidence supporting the hypothesis

that the cortex uses such a scheme

In order to obtain such evidence we take a detour…

• Raichle-lab: humans, entire cortex cortex, fMRI

- Fox & Raichle Nature Rev Neu. 2007; Vincent et al. Nature 2007; Fox et al. Nat. Neu. 2005

• Grinvald-lab: cat, visual cortex, optical imaging

- Kenet et al. Nature 2004; Tsodyks et al. Science 1999; Arieli et al. Science 1996

• Buzsáki-lab: rat, hippocampus, tetrode recording

- Buzsáki & Draghun Science 2004; Harris et al. Nature 2003; Leinekugel et al. Science 2002;

• McCormick-lab: ferret, cat, rat slices

- Shu et al. Nature 2003; Shu et al. J. Neurosci 2003; McCormicki et al. Cereb. Cortex 2003

• Yuste & Ferster labs: in vivo in vitro single cell recording, 2-photon Ca imaging

- Ikegaya et al. Science 2004; Anderson et al. Nat. Neuro 2000; Cossart et al. Nature 2003

The role of spontaneous activity in visual coding

Part 4

Spontaneous activity in the awake brain(Fiser et al. Nature 2004)

• Awake, head-restrained ferretsV

Visual stimulus

• 3 age groups (P30, P45, P90)

• Multi-electrode recording from V1

• Three interleaved trial types:

• complete dark (spontaneous act.)• random noise stimuli

• natural scene movie

Goal: Compare neural activity under the three different conditions

The MATRIX

Spontaneous activity in the awake brain

Results:

(Fiser et al. Nature 2004)

• Minimal difference between spontaneous and evoked activities

• A systematic change in the correlational structure with age

Framework

Not perceptual learning in the classic sense, but represen-tational learning of the structure of the visual world through detecting "suspicious coincidences" a'la Barlow

Not higher cognition or concept learning, but visual perception

– Classical perceptual learning

Supervised learningReinforcement learningSignal-detection problem

– Representational learning

Unsupervised learning

Combinatorial problem

A plausible method for infants and animals, too

Experiments with pairs

Inventory:

Test: 2-AFC, Base-pair vs. Non-base pair

Non-base element in either novel or familiar grid position

A

B

E

F

I J

A

B

Base-pair Non-base pairs

IF

Novel element pos.

I

Familiar element pos.

F

Pair inventory

Subjects automatically extracted both position-dependent and position-independent statistics from the scenes

Novel position Familiar position

Elements of the non-base pairs

** **

Frequency-balanced inventory

Frequent base-pairs Frequency balanced nonbase-pairs

App

eara

nce

freq

.

Base pairs

App

eara

nce

freq

.

Base pairs

Rare Frequent Freq. balanced nonbase-pairs

SCENE

Frequency-balanced results

At the same time, they were also sensitive to single element appearance frequency

Subjects automatically preferred the rare but more predictable pairs

Embedded inventory

Method:

Two base-pairs

Two base-quadruples

Testing embedded and not embedded base-pairs against non-base pairs

Sample scene Six elements per scene

How do we learn hierarchical structures?

Embedded results

Chance

• significant preference for base quads and non-embedded pairs

• no significant preference for embedded pairs

*

1st round

2nd round

Results with balanced appearance of each element

p < 0.007

Experiment 1

Results with frequency-balanced non-base pairs

Pairs: p < 0.012

Singles: p = ns.

Experiment 2

Experiment 3

Model performances I

Basic experiments with pairs

Frequency-counting models fail

Model performances II

Embedded experiment with triplets

All three of the naïve statistical models fail

AB

C

DE

F

GH

I

JK

L

AB

C

GH

I

Sample scene

Summary

Correct prediction without any parameter tweaking

The capacity of Working Memory

Working memory

What is the effect of long-term memory on working memory?

External

world

Visual stimulus

Long-term memory

Part 3

Now we have a tool to investigate this!

Vogel & Luck, Nature (1997) Olsson & Poom, PNAS (2005)

Issues with the capacity of WM

WM is defined as a limited-capacity memory buffer

Capacity of WM is traditionally given as the maximum number of ‘chunks’ stored in WM

There is a controversy around the capacity of WM: 4 or 2 or 1?

Encoding signal

Idea: Long-term memory is used to compress data in WM

aa aa bb dd bb aa cc

bb aa aa

Likely stimulusEncoding signal:

DL

Unlikely stimulusEncoding signal:

Working memory

External

world

Visual stimulu

s

Long-term memoryPredictive

distribution

Learning

LTM dependent capacityDescription

length

Source coding theorem

Experimental paradigm

Learning predictive distribution

Familiarization 2AFC task

?vs.

Assessing learning efficiency

Change detection task

Compression -description length

Key idea: modulating DL by building from different chunks

Overall performance of humans

2AFC task

vs.vs. vs.vs.vs.vs.

True inventory CD task

Naïve All GL BL

Overall performance of humans: Effect of set size

Learning decouples set size and description length

Controlling for set sizeControlling for DL

Performance change of trained subjects relative to naive subjects

Low-DL trialsIntermediate-DL trialsHigh-DL trials

Conditioning eliminates most of set size dependence

Controlling for set sizeControlling for DL

The role of spontaneous activity in visual coding

•Probabilistic representation of data and uncertainty

Part 4

Can we reconcile these issue?

An abstract computational model of representational learning

•Use of a generative model of the world

A partial model of visual coding in the brain with issues

•Ongoing activity, trial-by-trial response variability

•Context dependency, attention, illusions

A Bayesian generative framework for the visual system

Generative modeling hypothesis: the visual cortex implements an approximate inference in a generative model of the visual input.

…

…

pixels (y)

Causes: objects (x)

p(y | x)

p(x) p(x | y)∝ p(y | x)∗p(x)

y

Representation = probability distributions

Recognition = inference

Sampling hypothesis: neural activity represents Monte Carlo samples from the probability distribution over possible causes p(x | y)

Learning

Top-downeffects

Confirmation

Similarity measured by Kullback-Leibler divergence between the distributions of spontaneous and evoked activity

Results:

KL pMovie P pSpont( ) = pMovie x( )

X∑ ∗log

pMovie x( )pSpont x( )

The probabilistic model

Familiarization/learning:

Testing/calculating predictive probabilities

Computational frameworkSigmoid Belief Network

Graphical model:

Likelihood:

Model posterior:

Choice probability:

Computational frameworkAssociative learner

Graphical model:

Likelihood:

Model posterior:

Choice probability:

Embedded pairs

Bayesian model comparison

possible data setspossible data sets

experienced data setexperienced data set

Compare evidences for inventories:

Automatic Occam’s razor

Overall performance of humans: Effect of set size

Fitting 2-AFC:

How about temporal correlations?

• Adults can rapidly learn to segment (and group) sequences of shapes based solely on statistical information (joint and/or transitional probabilities)

(Fiser & Aslin JEP LM&C 2002)

(Fiser, McCrink & Aslin, in preparation)

• Infants can do the same

józsef fiser

Documents