Topics in Brain Computer Interfaces CS295-7
TRANSCRIPT
Professor: MICHAEL BLACK
TA: FRANK WOOD
Spring 2005
Bayesian Inference through Particle Filtering
Homework Review
• Results?
• Questions?
• Causal vs. generative?
Decoding Methods
Direct decoding methods:

$$\vec{x}_k = f(\vec{z}_k, \vec{z}_{k-1}, \ldots)$$

Simple linear regression method:

$$x_k = \vec{f}_1^{\,T} Z_{k-d:k}, \qquad y_k = \vec{f}_2^{\,T} Z_{k-d:k}$$
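As a concrete illustration, here is a minimal sketch of the simple linear regression decoder in Python. The synthetic data, the history length d, and all variable names are assumptions made for illustration; the real course data would be recorded firing rates and hand kinematics.

```python
# A minimal sketch of linear regression decoding: x_k = f^T Z_{k-d:k}.
import numpy as np

rng = np.random.default_rng(0)
T, n_cells, d = 1000, 42, 10          # time steps, cells, history length (assumed)

Z = rng.poisson(5.0, size=(T, n_cells)).astype(float)   # firing rates z_k (synthetic)
X = rng.standard_normal((T, 2))                          # hand position (x_k, y_k) (synthetic)

# Build the lagged design matrix Z_{k-d:k} for each time k.
rows = [Z[k - d:k + 1].ravel() for k in range(d, T)]
Zlag = np.column_stack([np.ones(len(rows)), np.array(rows)])   # add an intercept column

# Least-squares fit of the filters f_1, f_2 (one column per kinematic variable).
F, *_ = np.linalg.lstsq(Zlag, X[d:], rcond=None)

X_hat = Zlag @ F                      # decoded kinematics for each time step
```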
Decoding Methods
Direct decoding methods:

$$\vec{x}_k = f(\vec{z}_k, \vec{z}_{k-1}, \ldots)$$

In contrast to generative encoding models:

$$\vec{z}_k = f(\vec{x}_k)$$

We need a sound way to exploit generative models for decoding.
Today’s Strategy
• More mathematical than the previous classes.
• Group exploration and discovery.
• One topic with a deeper level of understanding:
  – Particle filtering
    • Review and explore recursive Bayesian estimation
    • Introduce the SIS algorithm
    • Explore Monte Carlo integration
    • Examine the SIS algorithm (if time permits)
Accounting for Uncertainty

Every real process has process and measurement noise:

$$\vec{z}_k = f_{\text{observation}}(\vec{x}_{0:k}) + \text{noise}$$
$$\vec{x}_k = f_{\text{process}}(\vec{x}_{0:k-1}) + \text{noise}$$

A probabilistic model accounts for process and measurement noise probabilistically; the noise appears as modeling uncertainty:

$$\vec{z}_k \mid \vec{x}_{0:k} \sim \hat{f}_{\text{observation}}(\vec{x}_{0:k}) + \text{uncertainty}$$
$$\vec{x}_k \mid \vec{x}_{0:k-1} \sim \hat{f}_{\text{process}}(\vec{x}_{0:k-1}) + \text{uncertainty}$$

Example: a missile interceptor system. The missile propulsion system is noisy and the radar observations are noisy. Even if we are given exact process and observation models, our estimate of the missile's position may diverge if we don't account for uncertainty.
Recursive Bayesian Estimation
• Optimally integrates subsequent observations into process and observation models.
• Example
  – Biased coin.
  – Fair Bernoulli prior.

$$\vec{z}_k \mid \vec{x}_{0:k} \sim \hat{f}_{\text{measurement}}(\vec{x}_{0:k}) + \text{uncertainty}$$
$$\vec{x}_k \mid \vec{x}_{0:k-1} \sim \hat{f}_{\text{model}}(\vec{x}_{0:k-1}) + \text{uncertainty}$$
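For the biased-coin example, here is a minimal sketch of the recursive update. A conjugate Beta prior is assumed to stand in for the slide's "fair" prior (Beta(1, 1), i.e. uniform over the bias); the true bias and the number of flips are made up for illustration.

```python
# A minimal sketch of recursive Bayesian updating for a biased coin.
import numpy as np

rng = np.random.default_rng(1)
true_p = 0.7                              # unknown bias of the coin (hypothetical)
a, b = 1.0, 1.0                           # Beta(a, b) prior, uniform over [0, 1]

for k in range(1, 101):
    flip = rng.random() < true_p          # observe one flip
    a, b = a + flip, b + (not flip)       # conjugate posterior update
    if k in (1, 10, 100):
        print(f"after {k:3d} flips: posterior mean = {a / (a + b):.3f}")
```

Each new observation updates the previous posterior, which then serves as the prior for the next flip; this is the recursion in miniature.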
Bayesian Inference

$$p(x \mid z) = \frac{p(z \mid x)\, p(x)}{p(z)}$$

Posterior: the a posteriori probability (after the evidence).
Likelihood: the evidence term $p(z \mid x)$.
Prior: the a priori probability $p(x)$ (before the evidence).
Normalization constant: $p(z)$, independent of the state $x$.
We infer system state from uncertain observations and our prior knowledge (model) of system state.
Notation and BCI Example
Observations:

$$\vec{z}_k = \begin{bmatrix} z_{1,k} \\ z_{2,k} \\ \vdots \\ z_{n,k} \end{bmatrix}$$

e.g. the firing rates of all n cells at time k, and

$$Z_{1:k} \text{ or } Z_k = (\vec{z}_1, \vec{z}_2, \ldots, \vec{z}_k)$$

System state:

$$\vec{x}_k = \begin{bmatrix} x_k \\ y_k \\ v_{x,k} \\ v_{y,k} \\ a_{x,k} \\ a_{y,k} \end{bmatrix}$$

e.g. hand kinematics (position, velocity, acceleration) at time k, and

$$X_{1:k} \text{ or } X_k = (\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k)$$
Generative Model
Observation (encoding) equation:
$$\vec{z}_k = f_{\text{obs}}(\vec{x}_{1:k}) + \vec{q}_k$$

System equation:
$$\vec{x}_k = f_{\text{p}}(\vec{x}_{1:k-1}) + \vec{w}_k$$

Here $\vec{z}_k$ is the neural firing rate, $\vec{x}_k$ is the state (e.g. hand position, velocity, acceleration), and $\vec{q}_k$, $\vec{w}_k$ are noise (e.g. Normal or Poisson).

Questions: Is the encoding linear or non-linear? Are the $f(\cdot)$'s Markov?
Today's Goal

Build a probabilistic model of this real process and use it to estimate the posterior distribution

$$p(\vec{x}_k \mid \vec{z}_{1:k})$$

so that we can infer the most likely state or the expected state:

$$\arg\max_{\vec{x}_k} p(\vec{x}_k \mid \vec{z}_{1:k}) \qquad\text{or}\qquad \int \vec{x}_k\, p(\vec{x}_k \mid \vec{z}_{1:k})\, d\vec{x}_k$$

How? Recursion!

$$p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1}) \Rightarrow p(\vec{x}_k \mid \vec{z}_{1:k})$$

• How can we formulate this recursion?
• How can we compute this recursion?
• What assumptions must we make?
Modeling
• A useful aside: graphical models
Graphical models are a way of systematically diagramming the dependencies amongst groups of random variables. Graphical models can help elucidate assumptions and modeling choices that would otherwise be hard to visualize and understand.
Using a graphical model will help us design our model!
Graphical Model
Generative model: a state node $\vec{x}_k$ with an arrow to an observation node $\vec{z}_k$, representing $p(\vec{z}_k \mid \vec{x}_k)$.
Graphical Model
The model unrolled in time: state nodes $\vec{x}_{k-1}$, $\vec{x}_k$, $\vec{x}_{k+1}$, each with an arrow to its observation node $\vec{z}_{k-1}$, $\vec{z}_k$, $\vec{z}_{k+1}$.
Graphical Model
The graph structure encodes the Markov assumptions:

$$p(\vec{x}_k \mid \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_{k-1}) = p(\vec{x}_k \mid \vec{x}_{k-1})$$
$$p(\vec{z}_k \mid \vec{z}_{1:k-1}, \vec{x}_k) = p(\vec{z}_k \mid \vec{x}_k)$$
$$p(\vec{x}_k \mid \vec{x}_{k-1}, \vec{z}_{1:k-1}) = p(\vec{x}_k \mid \vec{x}_{k-1})$$
Summary
Given these modeling choices, all we have to specify is:

• the likelihood (encoding) model $p(\vec{z}_k \mid \vec{x}_k)$,
• the temporal prior model $p(\vec{x}_k \mid \vec{x}_{k-1})$,
• the initial distributions $p(\vec{x}_0)$ and $p(\vec{z}_0)$,
• and how to compute the posterior (decoding):

$$p(\vec{x}_k \mid \vec{z}_{1:k}) = \kappa\, p(\vec{z}_k \mid \vec{x}_k) \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1}$$
Linear Gaussian Generative Model

Observation equation:

$$\vec{z}_k = H \vec{x}_k + \vec{q}_k, \qquad \vec{q}_k \sim N(0, Q), \qquad k = 0, 1, 2, \ldots$$

where $\vec{z}_k = (z_{1,k}, z_{2,k}, \ldots, z_{42,k})^T$ is the firing rate vector (zero mean, square-rooted), $H$ is a 42 × 6 matrix, and $Q$ is a 42 × 42 covariance matrix.

System equation:

$$\vec{x}_{k+1} = A \vec{x}_k + \vec{w}_k, \qquad \vec{w}_k \sim N(0, W), \qquad k = 0, 1, 2, \ldots$$

where $\vec{x}_k = (x_k, y_k, v_{x,k}, v_{y,k}, a_{x,k}, a_{y,k})^T$ is the system state vector (zero mean), $A$ is a 6 × 6 matrix, and $W$ is a 6 × 6 covariance matrix.
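A minimal sketch of simulating this linear Gaussian generative model is shown below. The particular A, H, W, and Q used here are arbitrary placeholders (a stable diagonal A, a random H); in practice they would be fit to recorded neural and kinematic data.

```python
# A minimal sketch of forward-simulating the linear Gaussian model
# x_{k+1} = A x_k + w_k,  z_k = H x_k + q_k.
import numpy as np

rng = np.random.default_rng(2)
n_state, n_cells, T = 6, 42, 200

A = 0.95 * np.eye(n_state)                     # 6 x 6 system matrix (assumed)
W = 0.01 * np.eye(n_state)                     # 6 x 6 process noise covariance (assumed)
H = rng.standard_normal((n_cells, n_state))    # 42 x 6 observation matrix (assumed)
Q = np.eye(n_cells)                            # 42 x 42 observation noise covariance (assumed)

x = np.zeros(n_state)
states, observations = [], []
for k in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(n_state), W)   # system equation
    z = H @ x + rng.multivariate_normal(np.zeros(n_cells), Q)   # observation equation
    states.append(x)
    observations.append(z)
```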
Gaussian Assumption Clarified
Gaussian distribution:

$$\vec{z}_k \sim N(H\vec{x}_k,\, Q) \quad\Longleftrightarrow\quad \vec{z}_k - H\vec{x}_k = \vec{q}_k \sim N(0,\, Q)$$

Recall the one-dimensional case:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left( -\tfrac{1}{2}\, (x-\mu)^2 / \sigma^2 \right)$$
Graphical Model
$$p(X_M, Z_M) = p(X_M)\, p(Z_M \mid X_M) = \Big[ p(\vec{x}_1) \prod_{k=2}^{M} p(\vec{x}_k \mid \vec{x}_{k-1}) \Big] \Big[ \prod_{k=1}^{M} p(\vec{z}_k \mid \vec{x}_k) \Big]$$

In the linear Gaussian case the state chain is $\vec{x}_k = A\vec{x}_{k-1} + \vec{w}_k$ and each observation is $\vec{z}_k = H\vec{x}_k + \vec{q}_k$.
Break!
• When we come back, quickly arrange yourselves into groups of 4 or 5. Distribute the applied math people and the people who have taken CS143 (Vision) evenly among the groups.
• Instead of a straight-up lecture, we are going to work through some derivations together to improve retention and facilitate understanding.
• If you don’t have pencil and paper please get some.
Next step.
• Now that we have a model, how do we do recursive Bayesian inference?
• I will present several relatively easy problems, which I expect each group to solve in 5-10 minutes. When every group is finished, I will select one group and ask the person in that group who understood the problem the least to explain the solution to the class. The group is responsible for nominating this person and for his or her ability to explain the solution.
Recursive Bayesian Inference

$$p(\text{kinematics}_t \mid \text{firing}_{1:t}) = \frac{p(\text{firing}_t \mid \text{kinematics}_t)\, p(\text{kinematics}_t)}{p(\text{firing}_{1:t})}$$

Posterior: the a posteriori probability (after the evidence).
Likelihood: the evidence term.
Prior: the a priori probability (before the evidence).
Normalization constant: independent of the kinematics.
We sequentially infer hand kinematics from uncertain evidence and our prior knowledge of how hands move.
Recursive Bayesian Estimation
• Update stage
  – From the prediction stage you have a prior distribution over the system state at the current time k. After observing the process at time k, you can update the posterior to reflect that new information.
• Prediction stage
  – Given the posterior from a previous update stage and your system model, you produce the next prior distribution.
Update Stage
$$p(\vec{x}_k \mid \vec{z}_{1:k}) = \frac{p(\vec{x}_k, \vec{z}_{1:k})}{p(\vec{z}_{1:k})} \qquad\text{(Bayes' rule)}$$

$$= \frac{p(\vec{x}_k, \vec{z}_k \mid \vec{z}_{1:k-1})\, p(\vec{z}_{1:k-1})}{p(\vec{z}_k \mid \vec{z}_{1:k-1})\, p(\vec{z}_{1:k-1})} \qquad\text{(Bayes' rule again; the } p(\vec{z}_{1:k-1}) \text{ terms cancel)}$$

$$= \frac{p(\vec{z}_k \mid \vec{x}_k)\, p(\vec{x}_k \mid \vec{z}_{1:k-1})}{p(\vec{z}_k \mid \vec{z}_{1:k-1})} \qquad\text{(independence)}$$

The posterior is the product of the new observation's likelihood $p(\vec{z}_k \mid \vec{x}_k)$ and the prior $p(\vec{x}_k \mid \vec{z}_{1:k-1})$, divided by a normalizing constant.
Prediction Stage
$$p(\vec{x}_k \mid \vec{z}_{1:k-1}) = \int p(\vec{x}_k \mid \vec{x}_{k-1}, \vec{z}_{1:k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1} \qquad\text{(law of total probability)}$$

$$= \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1} \qquad\text{(independence)}$$

Law of total probability: $\Pr(B \mid C) = \int \Pr(B \mid A, C)\, \Pr(A \mid C)\, dA$.

The second factor under the integral is the posterior from the previous time step; the result is the prior for the current time step.
Phew!
• Let’s drill this into our heads and actually run the Bayesian recursion to see how it starts and behaves.
The Bayesian Recursion

Given $P(\vec{x}_0)$ and $P(\vec{z}_0)$ and the model $P(\vec{z}_k \mid \vec{x}_k)$, $P(\vec{x}_k \mid \vec{x}_{k-1})$, run the Bayesian recursion to depth 2:

$$P(\vec{x}_0 \mid \vec{z}_0) = \frac{P(\vec{z}_0 \mid \vec{x}_0)\, P(\vec{x}_0)}{P(\vec{z}_0)}$$

$$P(\vec{x}_1 \mid \vec{z}_0) = \int P(\vec{x}_1 \mid \vec{x}_0)\, P(\vec{x}_0 \mid \vec{z}_0)\, d\vec{x}_0$$

$$P(\vec{x}_1 \mid \vec{z}_1, \vec{z}_0) = \frac{P(\vec{z}_1 \mid \vec{x}_1)\, P(\vec{x}_1 \mid \vec{z}_0)}{P(\vec{z}_1 \mid \vec{z}_0)}$$

$$P(\vec{x}_2 \mid \vec{z}_1, \vec{z}_0) = \int P(\vec{x}_2 \mid \vec{x}_1)\, P(\vec{x}_1 \mid \vec{z}_1, \vec{z}_0)\, d\vec{x}_1$$

$$P(\vec{x}_2 \mid \vec{z}_2, \vec{z}_1, \vec{z}_0) = \frac{P(\vec{z}_2 \mid \vec{x}_2)\, P(\vec{x}_2 \mid \vec{z}_1, \vec{z}_0)}{P(\vec{z}_2 \mid \vec{z}_1, \vec{z}_0)}$$

$$\vdots$$
Highlighting the Assumptions
$$p(\vec{x}_k \mid \vec{z}_{1:k}) = p(\vec{x}_k \mid \vec{z}_k, \vec{z}_{1:k-1})$$

Bayes' rule, $p(a \mid b) = p(b \mid a)\, p(a) / p(b)$:
$$\propto p(\vec{z}_k \mid \vec{x}_k, \vec{z}_{1:k-1})\, p(\vec{x}_k \mid \vec{z}_{1:k-1})$$

Independence assumption, $p(\vec{z}_k \mid \vec{x}_k, \vec{z}_{1:k-1}) = p(\vec{z}_k \mid \vec{x}_k)$:
$$\propto p(\vec{z}_k \mid \vec{x}_k)\, p(\vec{x}_k \mid \vec{z}_{1:k-1})$$

Law of total probability, $p(a \mid c) = \int p(a \mid b, c)\, p(b \mid c)\, db$:
$$\propto p(\vec{z}_k \mid \vec{x}_k) \int p(\vec{x}_k \mid \vec{x}_{k-1}, \vec{z}_{1:k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1}$$

Independence assumption, $p(\vec{x}_k \mid \vec{x}_{k-1}, \vec{z}_{1:k-1}) = p(\vec{x}_k \mid \vec{x}_{k-1})$:
$$\propto p(\vec{z}_k \mid \vec{x}_k) \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})\, d\vec{x}_{k-1}$$

What's missing?
Bayesian Formulation
$$p(\vec{x}_k \mid \vec{z}_{0:k}) = \kappa\, p(\vec{z}_k \mid \vec{x}_k) \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{0:k-1})\, d\vec{x}_{k-1}$$

$p(\vec{z}_k \mid \vec{x}_k)$: likelihood
$p(\vec{x}_k \mid \vec{x}_{k-1})$: temporal prior
$p(\vec{x}_{k-1} \mid \vec{z}_{0:k-1})$: posterior probability at the previous time step
$\kappa$: normalizing term
General Model
• The posterior $p(\vec{x}_k \mid \vec{z}_{0:k})$ can be an arbitrary, non-Gaussian, multi-modal distribution.
• The recursive equation may have no explicit solution, but it can usually be approximated numerically using Monte Carlo techniques such as particle filtering.
• However, if both the likelihood and the prior are linear Gaussian, then the recursive equation has a closed-form solution. This model, which we'll see next week, is known as the Kalman filter (Kalman, 1960).
Particle Filtering
• “A technique for implementing a recursive Bayesian filter by Monte Carlo simulations” (Arulampalam et al.)
• A set of samples (particles) and weights that represent the posterior distribution (a Random Measure) is maintained throughout the algorithm.
• It boils down to sampling, density representation by samples, and Monte Carlo integration.
Sampling

• Uniform
  – rand(): linear congruential generator
    • x(n) = a * x(n-1) + b mod M
    • e.g. 0.2311 0.6068 0.4860 0.8913 0.7621 0.4565 0.0185
• Normal
  – randn(): Box-Muller
    • x1, x2 ~ U(0,1) -> y1, y2 ~ N(0,1)
    • y1 = sqrt(-2 ln(x1)) cos(2 pi x2)
    • y2 = sqrt(-2 ln(x1)) sin(2 pi x2)
• Binomial(p)
  – if (rand() < p)
• Higher-dimensional distributions
  – Metropolis-Hastings / Gibbs
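A minimal sketch of the Box-Muller transform listed above, mapping pairs of uniform samples to pairs of independent N(0, 1) samples:

```python
# A minimal sketch of the Box-Muller transform: u1, u2 ~ U(0,1) -> y1, y2 ~ N(0,1).
import numpy as np

rng = np.random.default_rng(3)

def box_muller(n):
    """Return n standard normal samples (generated in pairs)."""
    u1 = rng.random((n + 1) // 2)
    u2 = rng.random((n + 1) // 2)
    r = np.sqrt(-2.0 * np.log(u1))
    y1 = r * np.cos(2.0 * np.pi * u2)
    y2 = r * np.sin(2.0 * np.pi * u2)
    return np.concatenate([y1, y2])[:n]

samples = box_muller(10000)
print(samples.mean(), samples.var())      # should be close to 0 and 1
```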
Distribution Representation by Samples
[Figure: the standard normal density $p(x) = (2\pi)^{-1/2} e^{-x^2/2}$ compared with histograms of 10 samples (mean 0.0013, variance 0.8162), 100 samples (mean 0.0012, variance 0.7805), and 1000 samples (mean -0.043, variance 0.9547); the histograms approach the density as the number of samples grows.]
Law of Large Numbers

• If we have n fair samples drawn from a distribution P,

$$x_{1:n} \sim P(X)$$

• then one version of the Law of Large Numbers says that the empirical average of the value of a function over the samples converges to the expected value of the function as the number of samples grows:

$$\frac{1}{n} \sum_{i=1}^{n} f(x_i) \;\xrightarrow[\;n \to \infty\;]{}\; \mathrm{E}_P[f(X)]$$
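A minimal sketch of the law of large numbers at work, using the assumed test function f(x) = x^2 under a standard normal, whose true expectation is 1:

```python
# Empirical averages of f(x) = x**2 over N(0,1) samples converge to E[f(X)] = 1.
import numpy as np

rng = np.random.default_rng(4)
for n in (10, 100, 1000, 100000):
    x = rng.standard_normal(n)
    print(n, np.mean(x ** 2))      # approaches 1.0 as n grows
```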
Why Does This Matter?

• Particle filtering represents distributions by samples.
• We need to either maximize the posterior or take expected values of functions under the posterior, with the posterior represented by samples.
• We need to sample from distributions to simulate trajectories, etc.
Monte Carlo Integration

Given the posterior at the previous time step represented by weighted particles (samples), where $w^i$ is the weight and $x^i$ is the particle,

$$P_N(\vec{x}_k \mid \vec{z}_{0:k})\, d\vec{x}_k = \frac{1}{N} \sum_{i=1}^{N} w_k^i\, \delta_{x_k^i}(\vec{x}_k)\, d\vec{x}_k$$

evaluate the prediction distribution:

$$p(\vec{x}_k \mid \vec{z}_{0:k-1}) = \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid \vec{z}_{0:k-1})\, d\vec{x}_{k-1}$$

$$= \int p(\vec{x}_k \mid \vec{x}_{k-1})\, \frac{1}{N} \sum_{i=1}^{N} w_{k-1}^i\, \delta_{x_{k-1}^i}(\vec{x}_{k-1})\, d\vec{x}_{k-1}$$

$$= \frac{1}{N} \sum_{i=1}^{N} w_{k-1}^i \int p(\vec{x}_k \mid \vec{x}_{k-1})\, \delta_{x_{k-1}^i}(\vec{x}_{k-1})\, d\vec{x}_{k-1}$$

$$= \frac{1}{N} \sum_{i=1}^{N} w_{k-1}^i\, p(\vec{x}_k \mid x_{k-1}^i)$$
The Particle Filtering Story

• A bit hand-wavy
  – Simplified from Arulampalam et al and DeFreitas et al
  – Largely overlooks how the importance weights are maintained and updated
  – Doesn't touch particle degeneration and replacement
• Posterior representation by samples
• Importance sampling
• Weights in importance sampling
• Sampling from the prediction distribution
• Simple particle filter
Posterior Representation by Samples
We use a set of random samples from the posterior distribution to represent the posterior. Then we can use sample statistics to approximate expectations over the posterior.

Let $\{x^{(i)}\}_{i=1}^{N}$ be a set of fair samples from a distribution $p(x)$; then for functions $f$,

$$\frac{1}{N} \sum_{i=1}^{N} f(x^{(i)}) \approx \mathrm{E}_p[f(x)]$$

Problem: we need to update the samples such that they still accurately represent the posterior after the next observation.
Importance Sampling
Assume we have a weighted sample set

$$S_{k-1} = \{ (x_{k-1}^{(i)}, w_{k-1}^{(i)}) \}, \qquad 1 \le i \le N$$

where the importance weights come from Monte Carlo integration over the posterior at the previous time step. Then the prediction distribution becomes a linear mixture model:

$$p(\vec{x}_k \mid \vec{z}_{0:k-1}) = \frac{1}{N} \sum_{i=1}^{N} w_{k-1}^{i}\, p(\vec{x}_k \mid x_{k-1}^{i})$$

The mixture components $p(\vec{x}_k \mid x_{k-1}^{i})$ are specified as part of the model. We can sample from this mixture model by treating the weights as mixing probabilities:

• sample a component index from the cumulative distribution of the weights,
• and then sample from the model proposal pdf $p(\vec{x}_k \mid x_{k-1}^{i})$.
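A minimal sketch of sampling from this prediction mixture is given below, assuming (for concreteness) a linear Gaussian transition density $p(\vec{x}_k \mid \vec{x}_{k-1}) = N(A\vec{x}_{k-1}, W)$; the function and matrix names are illustrative, not part of the lecture.

```python
# Sample from the mixture (1/N) sum_i w_{k-1}^i p(x_k | x_{k-1}^i):
# pick a particle index via the cumulative weight distribution, then sample
# from that particle's (assumed linear Gaussian) transition density.
import numpy as np

rng = np.random.default_rng(5)

def sample_prediction(particles, weights, A, W, n_samples):
    """particles: (N, d) array; weights: (N,) array summing to 1."""
    cdf = np.cumsum(weights)
    idx = np.searchsorted(cdf, rng.random(n_samples))      # inverse-CDF index sampling
    noise = rng.multivariate_normal(np.zeros(A.shape[0]), W, size=n_samples)
    return particles[idx] @ A.T + noise                     # x_k ~ N(A x_{k-1}^i, W)
```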
Weights in Importance Sampling
Draw samples from a proposal distribution $q(x)$ and attach a weight to each, giving weighted samples $\{(x^{(i)}, w^{(i)})\}$. Find the weights so that the linearly weighted sample statistics approximate expectations under the desired distribution $p(x)$, i.e.

$$\sum_{i=1}^{N} w^{(i)} f(x^{(i)}) \approx \mathrm{E}_p[f(x)], \qquad w^{(i)} \propto \frac{p(x^{(i)})}{q(x^{(i)})}$$
Updating the Weights

The samples from the prediction distribution need to be re-weighted such that they still represent the posterior distribution well after a new observation:

$$p(\vec{x}_k \mid \vec{z}_{0:k}) = \frac{p(\vec{z}_k \mid \vec{x}_k)\, p(\vec{x}_k \mid \vec{z}_{0:k-1})}{p(\vec{z}_k \mid \vec{z}_{0:k-1})}$$

Here $p(\vec{z}_k \mid \vec{x}_k)$ is our model likelihood, $p(\vec{x}_k \mid \vec{z}_{0:k-1})$ is the prediction distribution for which we have a sample representation, and the denominator goes away after re-normalization. Substituting the sample representation (major hand waving here! inaccuracies abound!):

$$p(\vec{x}_k \mid \vec{z}_{0:k}) \approx l(\vec{z}_k \mid \vec{x}_k)\, \frac{1}{N} \sum_{j=1}^{N} w_{k-1}^{j}\, p(\vec{x}_k \mid x_{k-1}^{j})$$

$$\Rightarrow \quad w_k^{j} = l(\vec{z}_k \mid x_k^{j})\, w_{k-1}^{j}$$
An algorithmic run-through
Simple particle filter:

• draw samples from the prediction distribution;
• weights are proportional to the ratio of the posterior and prediction distributions, i.e. the normalized likelihood.

[Diagram: posterior → sample → temporal dynamics → sample → likelihood → normalize → posterior]

[Gordon et al. '93; Isard & Blake '98; Liu & Chen '98, ...]
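A minimal sketch of one step of this simple particle filter, applied to the linear Gaussian model from the earlier slides (A, H, W, Q as before); this is an illustration under those assumptions, not the course's reference implementation.

```python
# One resample / propagate / weight / normalize step of a simple particle filter.
import numpy as np

rng = np.random.default_rng(6)

def particle_filter_step(particles, weights, z, A, W, H, Q):
    N, d = particles.shape
    # 1. Sample: resample particle indices according to the current weights.
    idx = np.searchsorted(np.cumsum(weights), rng.random(N))
    # 2. Temporal dynamics: propagate each particle through p(x_k | x_{k-1}).
    particles = particles[idx] @ A.T + rng.multivariate_normal(np.zeros(d), W, size=N)
    # 3. Likelihood: weight each particle by p(z_k | x_k) = N(z_k; H x_k, Q).
    resid = z - particles @ H.T
    Qinv = np.linalg.inv(Q)
    logw = -0.5 * np.einsum('ij,jk,ik->i', resid, Qinv, resid)
    # 4. Normalize the weights.
    w = np.exp(logw - logw.max())
    return particles, w / w.sum()
```

A point estimate at each step can then be taken as, e.g., the weighted mean of the particles.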
Particle Filter

Isard & Blake '96

[Diagram, built up over several slides: start from the posterior $p(\vec{x}_{k-1} \mid \vec{z}_{1:k-1})$; sample from it; apply the temporal dynamics $p(\vec{x}_k \mid \vec{x}_{k-1})$ and sample again; weight the samples by the likelihood $p(\vec{z}_k \mid \vec{x}_k)$; normalize to obtain the new posterior $p(\vec{x}_k \mid \vec{z}_{1:k})$.]
Sampling
• Many sampling steps in particle filtering:
  – Sampling from a Gaussian.
  – Sampling from a multinomial.
  – Sampling from a weighted mixture model.
• More general sampling techniques that we may get to later:
  – Metropolis-Hastings
  – Gibbs
Sampling from a Gaussian
Given a Gaussian distribution $N(\mu, Q)$, a Cholesky decomposition of the covariance matrix,

$$L L^T = Q,$$

and a random vector $\vec{\tau} = [\tau_0, \tau_1, \tau_2, \ldots, \tau_n]^T$ where each element is normal with zero mean and unit variance, show that

$$\vec{\mu} + L\vec{\tau} \sim N(\mu, Q)$$

and explain how this fact can be used to sample from a Gaussian.

Sketch:

$$\mathrm{E}[L\vec{\tau}] = L\, \mathrm{E}[\vec{\tau}] = 0$$
$$\mathrm{Var}[L\vec{\tau}] = \mathrm{E}[L\vec{\tau}\vec{\tau}^T L^T] = L\, \mathrm{E}[\vec{\tau}\vec{\tau}^T]\, L^T = L L^T = Q$$
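A minimal sketch of the exercise's conclusion: sampling from N(mu, Q) by drawing a unit-variance vector and transforming it with the Cholesky factor. The example mu and Q here are arbitrary.

```python
# Sample from N(mu, Q) as x = mu + L tau, with L L^T = Q and tau ~ N(0, I).
import numpy as np

rng = np.random.default_rng(7)

def sample_gaussian(mu, Q, n_samples):
    L = np.linalg.cholesky(Q)                        # L @ L.T == Q
    tau = rng.standard_normal((n_samples, len(mu)))  # zero-mean, unit-variance elements
    return mu + tau @ L.T                            # each row ~ N(mu, Q)

mu = np.array([1.0, -2.0])
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
x = sample_gaussian(mu, Q, 100000)
print(x.mean(axis=0))        # close to mu
print(np.cov(x.T))           # close to Q
```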
Sampling from a Multinomial
Given a weighted sample set

$$S = \{ (x^{(i)}, w^{(i)});\; i = 1 \ldots N \}$$

sample an index by inverting the cumulative distribution of the weights.

[Figure: the cumulative distribution of the weights rising from 0 to 1 over the indices 1 ... N; a uniform draw on [0, 1] selects an index.]
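A minimal sketch of multinomial sampling via the cumulative distribution of the weights; the example weights are arbitrary.

```python
# Map uniform draws on [0, 1] through the inverse of the weight CDF to indices.
import numpy as np

rng = np.random.default_rng(8)

def sample_multinomial(weights, n_draws):
    """weights: nonnegative values summing to 1. Returns n_draws sampled indices."""
    cdf = np.cumsum(weights)
    u = rng.random(n_draws)
    return np.searchsorted(cdf, u)

w = np.array([0.1, 0.2, 0.4, 0.3])
counts = np.bincount(sample_multinomial(w, 10000), minlength=len(w))
print(counts / 10000)        # close to the weights
```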
Bayesian Inference

Infer (decode) behavior from firing: p(behavior at k | firing up to k) =

$$p(\vec{x}_k \mid Z_k) = \kappa\, p(\vec{z}_k \mid \vec{x}_k)\, p(\vec{x}_k \mid Z_{k-1})$$

Likelihood (observation model):
$$\vec{z}_k \sim N(H\vec{x}_k, Q)$$

Prior:
$$p(\vec{x}_k \mid Z_{k-1}) = \int p(\vec{x}_k \mid \vec{x}_{k-1})\, p(\vec{x}_{k-1} \mid Z_{k-1})\, d\vec{x}_{k-1}$$

System model (with the previous posterior $N(\hat{x}_{k-1}, P_{k-1})$):
$$\vec{x}_k \sim N(A\vec{x}_{k-1}, W)$$

This is the Kalman filter.
Kalman Filter

Likelihood (observation model): $\vec{z}_t \sim N(H\vec{x}_t, Q)$, i.e. $p(\vec{z}_t \mid \vec{x}_t)$

Temporal prior (system model): $\vec{x}_t \sim N(A\vec{x}_{t-1}, W)$, i.e. $p(\vec{x}_t \mid \vec{x}_{t-1})$

$$p(\vec{x}_t \mid Z_t) = \kappa\, p(\vec{z}_t \mid \vec{x}_t) \int p(\vec{x}_t \mid \vec{x}_{t-1})\, p(\vec{x}_{t-1} \mid Z_{t-1})\, d\vec{x}_{t-1}$$

The posterior is also Gaussian: this is the Kalman filter. Real-time, recursive decoding.
Kalman Filter Algorithm
(Welch and Bishop 2002)

Initial estimates of $\hat{x}_{k-1}$ and $P_{k-1}$.

Time update (prediction):

$$\hat{x}_k^- = A \hat{x}_{k-1} \qquad\text{(prior estimate)}$$
$$P_k^- = A P_{k-1} A^T + W \qquad\text{(error covariance)}$$

Measurement update (correction):

$$K_k = P_k^- H^T (H P_k^- H^T + Q)^{-1} \qquad\text{(Kalman gain)}$$
$$\hat{x}_k = \hat{x}_k^- + K_k (\vec{z}_k - H \hat{x}_k^-) \qquad\text{(posterior estimate)}$$
$$P_k = (I - K_k H) P_k^- \qquad\text{(error covariance)}$$
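A minimal sketch of one Kalman filter iteration implementing the time and measurement updates above; the matrix names follow the slides' linear Gaussian model.

```python
# One Kalman filter step: prediction (time update) then correction (measurement update).
import numpy as np

def kalman_step(x_hat, P, z, A, W, H, Q):
    # Time update (prediction).
    x_pred = A @ x_hat                        # prior state estimate
    P_pred = A @ P @ A.T + W                  # prior error covariance
    # Measurement update (correction).
    S = H @ P_pred @ H.T + Q                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)     # posterior state estimate
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred   # posterior error covariance
    return x_new, P_new
```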
Factored Sampling
Weighted samples

$$S = \{ (x^{(i)}, w^{(i)});\; i = 1 \ldots N \}$$

Normalized likelihood:

$$w_t^{(n)} = \frac{p(I_t \mid x_t^{(n)})}{\sum_{i=1}^{N} p(I_t \mid x_t^{(i)})}$$

(here $I_t$ denotes the observation at time t).