Topics in Brain Computer Interfaces CS295-7
TRANSCRIPT
Professor: MICHAEL BLACK
TA: FRANK WOOD
Spring 2005
Bayesian Inference through Particle Filtering
Homework Review
• Results?
• Questions?
• Causal vs. generative?
Decoding Methods
Direct decoding methods:

$$\vec{x}_k = f(\vec{z}_k, \vec{z}_{k-1}, \ldots)$$

Simple linear regression method:

$$x_k = \vec{f}_1^{\,T} Z_{k-d:k}, \qquad y_k = \vec{f}_2^{\,T} Z_{k-d:k}$$
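As a concrete illustration, here is a minimal sketch of the simple linear regression decoder in Python. The synthetic data, the history length d, and all variable names are assumptions made for illustration; the real course data would be recorded firing rates and hand kinematics.

```python
# A minimal sketch of linear regression decoding: x_k = f^T Z_{k-d:k}.
import numpy as np

rng = np.random.default_rng(0)
T, n_cells, d = 1000, 42, 10          # time steps, cells, history length (assumed)

Z = rng.poisson(5.0, size=(T, n_cells)).astype(float)   # firing rates z_k (synthetic)
X = rng.standard_normal((T, 2))                          # hand position (x_k, y_k) (synthetic)

# Build the lagged design matrix Z_{k-d:k} for each time k.
rows = [Z[k - d:k + 1].ravel() for k in range(d, T)]
Zlag = np.column_stack([np.ones(len(rows)), np.array(rows)])   # add an intercept column

# Least-squares fit of the filters f_1, f_2 (one column per kinematic variable).
F, *_ = np.linalg.lstsq(Zlag, X[d:], rcond=None)

X_hat = Zlag @ F                      # decoded kinematics for each time step
```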
Decoding Methods
Direct decoding methods:

$$\vec{x}_k = f(\vec{z}_k, \vec{z}_{k-1}, \ldots)$$

In contrast to generative encoding models:

$$\vec{z}_k = f(\vec{x}_k)$$

We need a sound way to exploit generative models for decoding.
Today’s Strategy
• More mathematical than the previous classes.
• Group exploration and discovery.
• One topic with a deeper level of understanding:
  – Particle filtering
    • Review and explore recursive Bayesian estimation
    • Introduce the SIS algorithm
    • Explore Monte Carlo integration
    • Examine the SIS algorithm (if time permits)
Accounting for Uncertainty

Every real process has process and measurement noise:

$$\vec{z}_k = f_{\text{observation}}(\vec{x}_{0:k}) + \text{noise}$$
$$\vec{x}_k = f_{\text{process}}(\vec{x}_{0:k-1}) + \text{noise}$$

A probabilistic model accounts for process and measurement noise probabilistically; the noise appears as modeling uncertainty:

$$\vec{z}_k \mid \vec{x}_{0:k} \sim \hat{f}_{\text{observation}}(\vec{x}_{0:k}) + \text{uncertainty}$$
$$\vec{x}_k \mid \vec{x}_{0:k-1} \sim \hat{f}_{\text{process}}(\vec{x}_{0:k-1}) + \text{uncertainty}$$

Example: a missile interceptor system. The missile propulsion system is noisy and the radar observations are noisy. Even if we are given exact process and observation models, our estimate of the missile's position may diverge if we don't account for uncertainty.
Recursive Bayesian Estimation
• Optimally integrates subsequent observations into process and observation models.
• Example
  – Biased coin.
  – Fair Bernoulli prior.

$$\vec{z}_k \mid \vec{x}_{0:k} \sim \hat{f}_{\text{measurement}}(\vec{x}_{0:k}) + \text{uncertainty}$$
$$\vec{x}_k \mid \vec{x}_{0:k-1} \sim \hat{f}_{\text{model}}(\vec{x}_{0:k-1}) + \text{uncertainty}$$
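For the biased-coin example, here is a minimal sketch of the recursive update. A conjugate Beta prior is assumed to stand in for the slide's "fair" prior (Beta(1, 1), i.e. uniform over the bias); the true bias and the number of flips are made up for illustration.

```python
# A minimal sketch of recursive Bayesian updating for a biased coin.
import numpy as np

rng = np.random.default_rng(1)
true_p = 0.7                              # unknown bias of the coin (hypothetical)
a, b = 1.0, 1.0                           # Beta(a, b) prior, uniform over [0, 1]

for k in range(1, 101):
    flip = rng.random() < true_p          # observe one flip
    a, b = a + flip, b + (not flip)       # conjugate posterior update
    if k in (1, 10, 100):
        print(f"after {k:3d} flips: posterior mean = {a / (a + b):.3f}")
```

Each new observation updates the previous posterior, which then serves as the prior for the next flip; this is the recursion in miniature.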
Bayesian Inference

$$p(x \mid z) = \frac{p(z \mid x)\, p(x)}{p(z)}$$

Posterior: the a posteriori probability (after the evidence).
Likelihood: the evidence term $p(z \mid x)$.
Prior: the a priori probability $p(x)$ (before the evidence).
Normalization constant: $p(z)$, independent of the state $x$.
We infer system state from uncertain observations and our prior knowledge (model) of system state.
Notation and BCI Example
Observations:

$$\vec{z}_k = \begin{bmatrix} z_{1,k} \\ z_{2,k} \\ \vdots \\ z_{n,k} \end{bmatrix}$$

e.g. the firing rates of all n cells at time k, and

$$Z_{1:k} \text{ or } Z_k = (\vec{z}_1, \vec{z}_2, \ldots, \vec{z}_k)$$

System state:

$$\vec{x}_k = \begin{bmatrix} x_k \\ y_k \\ v_{x,k} \\ v_{y,k} \\ a_{x,k} \\ a_{y,k} \end{bmatrix}$$

e.g. hand kinematics (position, velocity, acceleration) at time k, and

$$X_{1:k} \text{ or } X_k = (\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k)$$
Generative Model
Observation (encoding) equation:
$$\vec{z}_k = f_{\text{obs}}(\vec{x}_{1:k}) + \vec{q}_k$$

System equation:
$$\vec{x}_k = f_{\text{p}}(\vec{x}_{1:k-1}) + \vec{w}_k$$

Here $\vec{z}_k$ is the neural firing rate, $\vec{x}_k$ is the state (e.g. hand position, velocity, acceleration), and $\vec{q}_k$, $\vec{w}_k$ are noise (e.g. Normal or Poisson).

Questions: Is the encoding linear or non-linear? Are the $f(\cdot)$'s Markov?
Today's Goal

Build a probabilistic model of this real process and use it to estimate the posterior distribution

$$p(\vec{x}_k \mid \vec{z}_{1:k})$$

so that we can infer the most likely state or the expected state:

$$\arg\max_{\vec{x}_k} p(\vec{x}_k \mid \vec{z}_{1:k}) \qquad\text{or}\qquad \int \vec{x}_k\, p(\vec{x}_k \mid \vec{z}_{1:k})\, d\vec{x}_k$$

How? Recursion!

$$p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1}) \Rightarrow p(\vec{x}_k \mid \vec{z}_{1:k})$$

• How can we formulate this recursion?
• How can we compute this recursion?
• What assumptions must we make?
Modeling
• A useful aside: graphical models
Graphical models are a way of systematically diagramming the dependencies amongst groups of random variables. Graphical models can help elucidate assumptions and modeling choices that would otherwise be hard to visualize and understand.
Using a graphical model will help us design our model!
Graphical Model
Generative model: a state node $\vec{x}_k$ with an arrow to an observation node $\vec{z}_k$, representing $p(\vec{z}_k \mid \vec{x}_k)$.
Graphical Model
The model unrolled in time: state nodes $\vec{x}_{k-1}$, $\vec{x}_k$, $\vec{x}_{k+1}$, each with an arrow to its observation node $\vec{z}_{k-1}$, $\vec{z}_k$, $\vec{z}_{k+1}$.
Graphical Model
The graph structure encodes the Markov assumptions:

$$p(\vec{x}_k \mid \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_{k-1}) = p(\vec{x}_k \mid \vec{x}_{k-1})$$
$$p(\vec{z}_k \mid \vec{z}_{1:k-1}, \vec{x}_k) = p(\vec{z}_k \mid \vec{x}_k)$$
$$p(\vec{x}_k \mid \vec{x}_{k-1}, \vec{z}_{1:k-1}) = p(\vec{x}_k \mid \vec{x}_{k-1})$$
Summary
Given these modeling choices, all we have to specify is:

• the likelihood (encoding) model $p(\vec{z}_k \mid \vec{x}_k)$,
• the temporal prior model $p(\vec{x}_k \mid \vec{x}_{k-1})$,
• the initial distributions $p(\vec{x}_0)$ and $p(\vec{z}_0)$,
• and how to compute the posterior (decoding):

$$p(\vec{x}_k \mid \vec{z}_{1:k}) = \kappa\, p(\vec{z}_k \mid \vec{x}_k) \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1}$$
Linear Gaussian Generative Model

Observation equation:

$$\vec{z}_k = H \vec{x}_k + \vec{q}_k, \qquad \vec{q}_k \sim N(0, Q), \qquad k = 0, 1, 2, \ldots$$

where $\vec{z}_k = (z_{1,k}, z_{2,k}, \ldots, z_{42,k})^T$ is the firing rate vector (zero mean, square-rooted), $H$ is a 42 × 6 matrix, and $Q$ is a 42 × 42 covariance matrix.

System equation:

$$\vec{x}_{k+1} = A \vec{x}_k + \vec{w}_k, \qquad \vec{w}_k \sim N(0, W), \qquad k = 0, 1, 2, \ldots$$

where $\vec{x}_k = (x_k, y_k, v_{x,k}, v_{y,k}, a_{x,k}, a_{y,k})^T$ is the system state vector (zero mean), $A$ is a 6 × 6 matrix, and $W$ is a 6 × 6 covariance matrix.
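A minimal sketch of simulating this linear Gaussian generative model is shown below. The particular A, H, W, and Q used here are arbitrary placeholders (a stable diagonal A, a random H); in practice they would be fit to recorded neural and kinematic data.

```python
# A minimal sketch of forward-simulating the linear Gaussian model
# x_{k+1} = A x_k + w_k,  z_k = H x_k + q_k.
import numpy as np

rng = np.random.default_rng(2)
n_state, n_cells, T = 6, 42, 200

A = 0.95 * np.eye(n_state)                     # 6 x 6 system matrix (assumed)
W = 0.01 * np.eye(n_state)                     # 6 x 6 process noise covariance (assumed)
H = rng.standard_normal((n_cells, n_state))    # 42 x 6 observation matrix (assumed)
Q = np.eye(n_cells)                            # 42 x 42 observation noise covariance (assumed)

x = np.zeros(n_state)
states, observations = [], []
for k in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(n_state), W)   # system equation
    z = H @ x + rng.multivariate_normal(np.zeros(n_cells), Q)   # observation equation
    states.append(x)
    observations.append(z)
```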
Gaussian Assumption Clarified
Gaussian distribution:

$$\vec{z}_k \sim N(H\vec{x}_k,\, Q) \quad\Longleftrightarrow\quad \vec{z}_k - H\vec{x}_k = \vec{q}_k \sim N(0,\, Q)$$

Recall the one-dimensional case:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left( -\tfrac{1}{2}\, (x-\mu)^2 / \sigma^2 \right)$$
Graphical Model
$$p(X_M, Z_M) = p(X_M)\, p(Z_M \mid X_M) = \Big[ p(\vec{x}_1) \prod_{k=2}^{M} p(\vec{x}_k \mid \vec{x}_{k-1}) \Big] \Big[ \prod_{k=1}^{M} p(\vec{z}_k \mid \vec{x}_k) \Big]$$

In the linear Gaussian case the state chain is $\vec{x}_k = A\vec{x}_{k-1} + \vec{w}_k$ and each observation is $\vec{z}_k = H\vec{x}_k + \vec{q}_k$.
Break!
• When we come back, quickly arrange yourselves into groups of 4 or 5. Distribute the applied math people and the people who have taken CS143 (Vision) evenly among the groups.
• Instead of a straight-up lecture, we are going to work through some derivations together to improve retention and facilitate understanding.
• If you don’t have pencil and paper please get some.
Next step.
• Now that we have a model, how do we do recursive Bayesian inference?
• I will present several relatively easy problems, which I expect each group to solve in 5-10 minutes. When every group is finished, I will select one group and ask the person in that group who understood the problem the least to explain the solution to the class. The group is responsible for nominating this person and for his or her ability to explain the solution.
Recursive Bayesian Inference

$$p(\text{kinematics}_t \mid \text{firing}_{1:t}) = \frac{p(\text{firing}_t \mid \text{kinematics}_t)\, p(\text{kinematics}_t)}{p(\text{firing}_{1:t})}$$

Posterior: the a posteriori probability (after the evidence).
Likelihood: the evidence term.
Prior: the a priori probability (before the evidence).
Normalization constant: independent of the kinematics.
We sequentially infer hand kinematics from uncertain evidence and our prior knowledge of how hands move.
Recursive Bayesian Estimation
• Update stage
  – From the prediction stage you have a prior distribution over the system state at the current time k. After observing the process at time k, you can update the posterior to reflect that new information.
• Prediction stage
  – Given the posterior from a previous update stage and your system model, you produce the next prior distribution.
Update Stage
$$p(\vec{x}_k \mid \vec{z}_{1:k}) = \frac{p(\vec{x}_k, \vec{z}_{1:k})}{p(\vec{z}_{1:k})} \qquad\text{(Bayes' rule)}$$

$$= \frac{p(\vec{x}_k, \vec{z}_k \mid \vec{z}_{1:k-1})\, p(\vec{z}_{1:k-1})}{p(\vec{z}_k \mid \vec{z}_{1:k-1})\, p(\vec{z}_{1:k-1})} \qquad\text{(Bayes' rule again; the } p(\vec{z}_{1:k-1}) \text{ terms cancel)}$$

$$= \frac{p(\vec{z}_k \mid \vec{x}_k)\, p(\vec{x}_k \mid \vec{z}_{1:k-1})}{p(\vec{z}_k \mid \vec{z}_{1:k-1})} \qquad\text{(independence)}$$

The posterior is the product of the new observation's likelihood $p(\vec{z}_k \mid \vec{x}_k)$ and the prior $p(\vec{x}_k \mid \vec{z}_{1:k-1})$, divided by a normalizing constant.
Prediction Stage
$$p(\vec{x}_k \mid \vec{z}_{1:k-1}) = \int p(\vec{x}_k \mid \vec{x}_{k-1}, \vec{z}_{1:k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1} \qquad\text{(law of total probability)}$$

$$= \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1} \qquad\text{(independence)}$$

Law of total probability: $\Pr(B \mid C) = \int \Pr(B \mid A, C)\, \Pr(A \mid C)\, dA$.

The second factor under the integral is the posterior from the previous time step; the result is the prior for the current time step.
Phew!
• Let’s drill this into our heads and actually run the Bayesian recursion to see how it starts and behaves.
The Bayesian Recursion

Given $P(\vec{x}_0)$ and $P(\vec{z}_0)$ and the model $P(\vec{z}_k \mid \vec{x}_k)$, $P(\vec{x}_k \mid \vec{x}_{k-1})$, run the Bayesian recursion to depth 2:

$$P(\vec{x}_0 \mid \vec{z}_0) = \frac{P(\vec{z}_0 \mid \vec{x}_0)\, P(\vec{x}_0)}{P(\vec{z}_0)}$$

$$P(\vec{x}_1 \mid \vec{z}_0) = \int P(\vec{x}_1 \mid \vec{x}_0)\, P(\vec{x}_0 \mid \vec{z}_0)\, d\vec{x}_0$$

$$P(\vec{x}_1 \mid \vec{z}_1, \vec{z}_0) = \frac{P(\vec{z}_1 \mid \vec{x}_1)\, P(\vec{x}_1 \mid \vec{z}_0)}{P(\vec{z}_1 \mid \vec{z}_0)}$$

$$P(\vec{x}_2 \mid \vec{z}_1, \vec{z}_0) = \int P(\vec{x}_2 \mid \vec{x}_1)\, P(\vec{x}_1 \mid \vec{z}_1, \vec{z}_0)\, d\vec{x}_1$$

$$P(\vec{x}_2 \mid \vec{z}_2, \vec{z}_1, \vec{z}_0) = \frac{P(\vec{z}_2 \mid \vec{x}_2)\, P(\vec{x}_2 \mid \vec{z}_1, \vec{z}_0)}{P(\vec{z}_2 \mid \vec{z}_1, \vec{z}_0)}$$

$$\vdots$$
Highlighting the Assumptions
$$p(\vec{x}_k \mid \vec{z}_{1:k}) = p(\vec{x}_k \mid \vec{z}_k, \vec{z}_{1:k-1})$$

Bayes' rule, $p(a \mid b) = p(b \mid a)\, p(a) / p(b)$:
$$\propto p(\vec{z}_k \mid \vec{x}_k, \vec{z}_{1:k-1})\, p(\vec{x}_k \mid \vec{z}_{1:k-1})$$

Independence assumption, $p(\vec{z}_k \mid \vec{x}_k, \vec{z}_{1:k-1}) = p(\vec{z}_k \mid \vec{x}_k)$:
$$\propto p(\vec{z}_k \mid \vec{x}_k)\, p(\vec{x}_k \mid \vec{z}_{1:k-1})$$

Law of total probability, $p(a \mid c) = \int p(a \mid b, c)\, p(b \mid c)\, db$:
$$\propto p(\vec{z}_k \mid \vec{x}_k) \int p(\vec{x}_k \mid \vec{x}_{k-1}, \vec{z}_{1:k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1}$$

Independence assumption, $p(\vec{x}_k \mid \vec{x}_{k-1}, \vec{z}_{1:k-1}) = p(\vec{x}_k \mid \vec{x}_{k-1})$:
$$\propto p(\vec{z}_k \mid \vec{x}_k) \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1}$$

What's missing?
Bayesian Formulation
$$p(\vec{x}_k \mid \vec{z}_{0:k}) = \kappa\, p(\vec{z}_k \mid \vec{x}_k) \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{0:k-1})\, d\vec{x}_{k-1}$$

$p(\vec{z}_k \mid \vec{x}_k)$: likelihood
$p(\vec{x}_k \mid \vec{x}_{k-1})$: temporal prior
$p(\vec{x}_{k-1} \mid \vec{z}_{0:k-1})$: posterior probability at the previous time step
$\kappa$: normalizing term
General Model
• The posterior $p(\vec{x}_k \mid \vec{z}_{0:k})$ can be an arbitrary, non-Gaussian, multi-modal distribution.
• The recursive equation may have no explicit solution, but it can usually be approximated numerically using Monte Carlo techniques such as particle filtering.
• However, if both the likelihood and the prior are linear Gaussian, then the recursive equation has a closed-form solution. This model, which we'll see next week, is known as the Kalman filter (Kalman, 1960).
Particle Filtering
• “A technique for implementing a recursive Bayesian filter by Monte Carlo simulations” (Arulampalam et al.)
• A set of samples (particles) and weights that represent the posterior distribution (a Random Measure) is maintained throughout the algorithm.
• It boils down to sampling, density representation by samples, and Monte Carlo integration.
Sampling

• Uniform
  – rand(): linear congruential generator
    • x(n) = a * x(n-1) + b mod M
    • e.g. 0.2311 0.6068 0.4860 0.8913 0.7621 0.4565 0.0185
• Normal
  – randn(): Box-Muller
    • x1, x2 ~ U(0,1) -> y1, y2 ~ N(0,1)
    • y1 = sqrt(-2 ln(x1)) cos(2 pi x2)
    • y2 = sqrt(-2 ln(x1)) sin(2 pi x2)
• Binomial(p)
  – if (rand() < p)
• Higher-dimensional distributions
  – Metropolis-Hastings / Gibbs
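A minimal sketch of the Box-Muller transform listed above, mapping pairs of uniform samples to pairs of independent N(0, 1) samples:

```python
# A minimal sketch of the Box-Muller transform: u1, u2 ~ U(0,1) -> y1, y2 ~ N(0,1).
import numpy as np

rng = np.random.default_rng(3)

def box_muller(n):
    """Return n standard normal samples (generated in pairs)."""
    u1 = rng.random((n + 1) // 2)
    u2 = rng.random((n + 1) // 2)
    r = np.sqrt(-2.0 * np.log(u1))
    y1 = r * np.cos(2.0 * np.pi * u2)
    y2 = r * np.sin(2.0 * np.pi * u2)
    return np.concatenate([y1, y2])[:n]

samples = box_muller(10000)
print(samples.mean(), samples.var())      # should be close to 0 and 1
```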
Distribution Representation by Samples
[Figure: the standard normal density $p(x) = (2\pi)^{-1/2} e^{-x^2/2}$ compared with histograms of 10 samples (mean 0.0013, variance 0.8162), 100 samples (mean 0.0012, variance 0.7805), and 1000 samples (mean -0.043, variance 0.9547); the histograms approach the density as the number of samples grows.]
Law of Large Numbers

• If we have n fair samples drawn from a distribution P,

$$x_{1:n} \sim P(X)$$

• then one version of the Law of Large Numbers says that the empirical average of the value of a function over the samples converges to the expected value of the function as the number of samples grows:

$$\frac{1}{n} \sum_{i=1}^{n} f(x_i) \;\xrightarrow[\;n \to \infty\;]{}\; \mathrm{E}_P[f(X)]$$
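A minimal sketch of the law of large numbers at work, using the assumed test function f(x) = x^2 under a standard normal, whose true expectation is 1:

```python
# Empirical averages of f(x) = x**2 over N(0,1) samples converge to E[f(X)] = 1.
import numpy as np

rng = np.random.default_rng(4)
for n in (10, 100, 1000, 100000):
    x = rng.standard_normal(n)
    print(n, np.mean(x ** 2))      # approaches 1.0 as n grows
```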
Why Does This Matter?

• Particle filtering represents distributions by samples.
• We need to either maximize the posterior or take expected values of functions under the posterior, with the posterior represented by samples.
• We need to sample from distributions to simulate trajectories, etc.
Monte Carlo Integration

Given the posterior at the previous time step represented by weighted particles (samples), where $w^i$ is the weight and $x^i$ is the particle,

$$P_N(\vec{x}_k \mid \vec{z}_{0:k})\, d\vec{x}_k = \frac{1}{N} \sum_{i=1}^{N} w_k^i\, \delta_{x_k^i}(\vec{x}_k)\, d\vec{x}_k$$

evaluate the prediction distribution:

$$p(\vec{x}_k \mid \vec{z}_{0:k-1}) = \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{0:k-1})\, d\vec{x}_{k-1}$$

$$= \int p(\vec{x}_k \mid \vec{x}_{k-1})\, \frac{1}{N} \sum_{i=1}^{N} w_{k-1}^i\, \delta_{x_{k-1}^i}(\vec{x}_{k-1})\, d\vec{x}_{k-1}$$

$$= \frac{1}{N} \sum_{i=1}^{N} w_{k-1}^i \int p(\vec{x}_k \mid \vec{x}_{k-1})\, \delta_{x_{k-1}^i}(\vec{x}_{k-1})\, d\vec{x}_{k-1}$$

$$= \frac{1}{N} \sum_{i=1}^{N} w_{k-1}^i\, p(\vec{x}_k \mid x_{k-1}^i)$$
The Particle Filtering Story

• A bit hand-wavy
  – Simplified from Arulampalam et al and DeFreitas et al
  – Largely overlooks how the importance weights are maintained and updated
  – Doesn't touch particle degeneration and replacement
• Posterior representation by samples
• Importance sampling
• Weights in importance sampling
• Sampling from the prediction distribution
• Simple particle filter
Posterior Representation by Samples
We use a set of random samples from the posterior distribution to represent the posterior. Then we can use sample statistics to approximate expectations over the posterior.

Let $\{x^{(i)}\}_{i=1}^{N}$ be a set of fair samples from a distribution $p(x)$; then for functions $f$,

$$\frac{1}{N} \sum_{i=1}^{N} f(x^{(i)}) \approx \mathrm{E}_p[f(x)]$$

Problem: we need to update the samples such that they still accurately represent the posterior after the next observation.
Importance Sampling
Assume we have a weighted sample set

$$S_{k-1} = \{ (x_{k-1}^{(i)}, w_{k-1}^{(i)}) \}, \qquad 1 \le i \le N$$

where the importance weights come from Monte Carlo integration over the posterior at the previous time step. Then the prediction distribution becomes a linear mixture model:

$$p(\vec{x}_k \mid \vec{z}_{0:k-1}) = \frac{1}{N} \sum_{i=1}^{N} w_{k-1}^{i}\, p(\vec{x}_k \mid x_{k-1}^{i})$$

The mixture components $p(\vec{x}_k \mid x_{k-1}^{i})$ are specified as part of the model. We can sample from this mixture model by treating the weights as mixing probabilities:

• sample a component index from the cumulative distribution of the weights,
• and then sample from the model proposal pdf $p(\vec{x}_k \mid x_{k-1}^{i})$.
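A minimal sketch of sampling from this prediction mixture is given below, assuming (for concreteness) a linear Gaussian transition density $p(\vec{x}_k \mid \vec{x}_{k-1}) = N(A\vec{x}_{k-1}, W)$; the function and matrix names are illustrative, not part of the lecture.

```python
# Sample from the mixture (1/N) sum_i w_{k-1}^i p(x_k | x_{k-1}^i):
# pick a particle index via the cumulative weight distribution, then sample
# from that particle's (assumed linear Gaussian) transition density.
import numpy as np

rng = np.random.default_rng(5)

def sample_prediction(particles, weights, A, W, n_samples):
    """particles: (N, d) array; weights: (N,) array summing to 1."""
    cdf = np.cumsum(weights)
    idx = np.searchsorted(cdf, rng.random(n_samples))      # inverse-CDF index sampling
    noise = rng.multivariate_normal(np.zeros(A.shape[0]), W, size=n_samples)
    return particles[idx] @ A.T + noise                     # x_k ~ N(A x_{k-1}^i, W)
```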
Weights in Importance Sampling
Draw samples from a proposal distribution $q(x)$ and attach a weight to each, giving weighted samples $\{(x^{(i)}, w^{(i)})\}$. Find the weights so that the linearly weighted sample statistics approximate expectations under the desired distribution $p(x)$, i.e.

$$\sum_{i=1}^{N} w^{(i)} f(x^{(i)}) \approx \mathrm{E}_p[f(x)], \qquad w^{(i)} \propto \frac{p(x^{(i)})}{q(x^{(i)})}$$
Updating the Weights

The samples from the prediction distribution need to be re-weighted such that they still represent the posterior distribution well after a new observation:

$$p(\vec{x}_k \mid \vec{z}_{0:k}) = \frac{p(\vec{z}_k \mid \vec{x}_k)\, p(\vec{x}_k \mid \vec{z}_{0:k-1})}{p(\vec{z}_k \mid \vec{z}_{0:k-1})}$$

Here $p(\vec{z}_k \mid \vec{x}_k)$ is our model likelihood, $p(\vec{x}_k \mid \vec{z}_{0:k-1})$ is the prediction distribution for which we have a sample representation, and the denominator goes away after re-normalization. Substituting the sample representation (major hand waving here! inaccuracies abound!):

$$p(\vec{x}_k \mid \vec{z}_{0:k}) \approx l(\vec{z}_k \mid \vec{x}_k)\, \frac{1}{N} \sum_{j=1}^{N} w_{k-1}^{j}\, p(\vec{x}_k \mid x_{k-1}^{j})$$

$$\Rightarrow \quad w_k^{j} = l(\vec{z}_k \mid x_k^{j})\, w_{k-1}^{j}$$
An algorithmic run-through
Simple particle filter:

• draw samples from the prediction distribution;
• weights are proportional to the ratio of the posterior and prediction distributions, i.e. the normalized likelihood.

[Diagram: posterior → sample → temporal dynamics → sample → likelihood → normalize → posterior]

[Gordon et al. '93; Isard & Blake '98; Liu & Chen '98, ...]
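A minimal sketch of one step of this simple particle filter, applied to the linear Gaussian model from the earlier slides (A, H, W, Q as before); this is an illustration under those assumptions, not the course's reference implementation.

```python
# One resample / propagate / weight / normalize step of a simple particle filter.
import numpy as np

rng = np.random.default_rng(6)

def particle_filter_step(particles, weights, z, A, W, H, Q):
    N, d = particles.shape
    # 1. Sample: resample particle indices according to the current weights.
    idx = np.searchsorted(np.cumsum(weights), rng.random(N))
    # 2. Temporal dynamics: propagate each particle through p(x_k | x_{k-1}).
    particles = particles[idx] @ A.T + rng.multivariate_normal(np.zeros(d), W, size=N)
    # 3. Likelihood: weight each particle by p(z_k | x_k) = N(z_k; H x_k, Q).
    resid = z - particles @ H.T
    Qinv = np.linalg.inv(Q)
    logw = -0.5 * np.einsum('ij,jk,ik->i', resid, Qinv, resid)
    # 4. Normalize the weights.
    w = np.exp(logw - logw.max())
    return particles, w / w.sum()
```

A point estimate at each step can then be taken as, e.g., the weighted mean of the particles.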
Particle Filter

Isard & Blake '96

[Diagram, built up over several slides: start from the posterior $p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})$; sample from it; apply the temporal dynamics $p(\vec{x}_k \mid \vec{x}_{k-1})$ and sample again; weight the samples by the likelihood $p(\vec{z}_k \mid \vec{x}_k)$; normalize to obtain the new posterior $p(\vec{x}_k \mid \vec{z}_{1:k})$.]
Sampling
• Many sampling steps in particle filtering:
  – Sampling from a Gaussian.
  – Sampling from a multinomial.
  – Sampling from a weighted mixture model.
• More general sampling techniques that we may get to later:
  – Metropolis-Hastings
  – Gibbs
Sampling from a Gaussian
Given a Gaussian distribution $N(\mu, Q)$, a Cholesky decomposition of the covariance matrix,

$$L L^T = Q,$$

and a random vector $\vec{\tau} = [\tau_0, \tau_1, \tau_2, \ldots, \tau_n]^T$ where each element is normal with zero mean and unit variance, show that

$$\vec{\mu} + L\vec{\tau} \sim N(\mu, Q)$$

and explain how this fact can be used to sample from a Gaussian.

Sketch:

$$\mathrm{E}[L\vec{\tau}] = L\, \mathrm{E}[\vec{\tau}] = 0$$
$$\mathrm{Var}[L\vec{\tau}] = \mathrm{E}[L\vec{\tau}\vec{\tau}^T L^T] = L\, \mathrm{E}[\vec{\tau}\vec{\tau}^T]\, L^T = L L^T = Q$$
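A minimal sketch of the exercise's conclusion: sampling from N(mu, Q) by drawing a unit-variance vector and transforming it with the Cholesky factor. The example mu and Q here are arbitrary.

```python
# Sample from N(mu, Q) as x = mu + L tau, with L L^T = Q and tau ~ N(0, I).
import numpy as np

rng = np.random.default_rng(7)

def sample_gaussian(mu, Q, n_samples):
    L = np.linalg.cholesky(Q)                        # L @ L.T == Q
    tau = rng.standard_normal((n_samples, len(mu)))  # zero-mean, unit-variance elements
    return mu + tau @ L.T                            # each row ~ N(mu, Q)

mu = np.array([1.0, -2.0])
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
x = sample_gaussian(mu, Q, 100000)
print(x.mean(axis=0))        # close to mu
print(np.cov(x.T))           # close to Q
```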
Sampling from a Multinomial
Given a weighted sample set

$$S = \{ (x^{(i)}, w^{(i)});\; i = 1 \ldots N \}$$

sample an index by inverting the cumulative distribution of the weights.

[Figure: the cumulative distribution of the weights rising from 0 to 1 over the indices 1 ... N; a uniform draw on [0, 1] selects an index.]
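A minimal sketch of multinomial sampling via the cumulative distribution of the weights; the example weights are arbitrary.

```python
# Map uniform draws on [0, 1] through the inverse of the weight CDF to indices.
import numpy as np

rng = np.random.default_rng(8)

def sample_multinomial(weights, n_draws):
    """weights: nonnegative values summing to 1. Returns n_draws sampled indices."""
    cdf = np.cumsum(weights)
    u = rng.random(n_draws)
    return np.searchsorted(cdf, u)

w = np.array([0.1, 0.2, 0.4, 0.3])
counts = np.bincount(sample_multinomial(w, 10000), minlength=len(w))
print(counts / 10000)        # close to the weights
```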
Bayesian Inference

Infer (decode) behavior from firing: p(behavior at k | firing up to k) =

$$p(\vec{x}_k \mid Z_k) = \kappa\, p(\vec{z}_k \mid \vec{x}_k)\, p(\vec{x}_k \mid Z_{k-1})$$

Likelihood (observation model):
$$\vec{z}_k \sim N(H\vec{x}_k, Q)$$

Prior:
$$p(\vec{x}_k \mid Z_{k-1}) = \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid Z_{k-1})\, d\vec{x}_{k-1}$$

System model (with the previous posterior $N(\hat{x}_{k-1}, P_{k-1})$):
$$\vec{x}_k \sim N(A\vec{x}_{k-1}, W)$$

This is the Kalman filter.
Kalman Filter

Likelihood (observation model): $\vec{z}_t \sim N(H\vec{x}_t, Q)$, i.e. $p(\vec{z}_t \mid \vec{x}_t)$

Temporal prior (system model): $\vec{x}_t \sim N(A\vec{x}_{t-1}, W)$, i.e. $p(\vec{x}_t \mid \vec{x}_{t-1})$

$$p(\vec{x}_t \mid Z_t) = \kappa\, p(\vec{z}_t \mid \vec{x}_t) \int p(\vec{x}_t \mid \vec{x}_{t-1})\, p(\vec{x}_{t-1} \mid Z_{t-1})\, d\vec{x}_{t-1}$$

The posterior is also Gaussian: this is the Kalman filter. Real-time, recursive decoding.
Kalman Filter Algorithm
(Welch and Bishop 2002)

Initial estimates of $\hat{x}_{k-1}$ and $P_{k-1}$.

Time update (prediction):

$$\hat{x}_k^- = A \hat{x}_{k-1} \qquad\text{(prior estimate)}$$
$$P_k^- = A P_{k-1} A^T + W \qquad\text{(error covariance)}$$

Measurement update (correction):

$$K_k = P_k^- H^T (H P_k^- H^T + Q)^{-1} \qquad\text{(Kalman gain)}$$
$$\hat{x}_k = \hat{x}_k^- + K_k (\vec{z}_k - H \hat{x}_k^-) \qquad\text{(posterior estimate)}$$
$$P_k = (I - K_k H) P_k^- \qquad\text{(error covariance)}$$
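A minimal sketch of one Kalman filter iteration implementing the time and measurement updates above; the matrix names follow the slides' linear Gaussian model.

```python
# One Kalman filter step: prediction (time update) then correction (measurement update).
import numpy as np

def kalman_step(x_hat, P, z, A, W, H, Q):
    # Time update (prediction).
    x_pred = A @ x_hat                        # prior state estimate
    P_pred = A @ P @ A.T + W                  # prior error covariance
    # Measurement update (correction).
    S = H @ P_pred @ H.T + Q                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)     # posterior state estimate
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred   # posterior error covariance
    return x_new, P_new
```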
Factored Sampling
Weighted samples

$$S = \{ (x^{(i)}, w^{(i)});\; i = 1 \ldots N \}$$

Normalized likelihood:

$$w_t^{(n)} = \frac{p(I_t \mid x_t^{(n)})}{\sum_{i=1}^{N} p(I_t \mid x_t^{(i)})}$$

(here $I_t$ denotes the observation at time t).