
  • Particle Filtering and Smoothing Methods

    Arnaud Doucet Department of Statistics, Oxford University

    University College London

    3rd October 2012


  • State-Space Models

    Let {Xt}t≥1 be a latent/hidden X-valued Markov process with

    X1 ∼ µ(·) and Xt | (Xt−1 = x) ∼ f(·| x).

    Let {Yt}t≥1 be a Y-valued observation process such that the observations are conditionally independent given {Xt}t≥1 and

    Yt | (Xt = x) ∼ g(·| x).

    This is a general class of time series models, aka Hidden Markov Models (HMM), including

    Xt = Ψ(Xt−1, Vt),   Yt = Φ(Xt, Wt)

    where {Vt} and {Wt} are two sequences of i.i.d. random variables.

    Aim: Infer {Xt} given observations {Yt} on-line or off-line.

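    For concreteness, here is a minimal simulation sketch of a model in the Xt = Ψ(Xt−1, Vt), Yt = Φ(Xt, Wt) form; the specific Ψ, Φ, noise laws and parameter values below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def simulate_ssm(T, psi, phi, sample_x1, rng):
    """Simulate X_{1:T}, Y_{1:T} from X_t = Psi(X_{t-1}, V_t), Y_t = Phi(X_t, W_t)."""
    x = np.empty(T)
    y = np.empty(T)
    x[0] = sample_x1(rng)                              # X_1 ~ mu
    y[0] = phi(x[0], rng.standard_normal())            # Y_1 | X_1 ~ g
    for t in range(1, T):
        x[t] = psi(x[t - 1], rng.standard_normal())    # X_t | X_{t-1} ~ f
        y[t] = phi(x[t], rng.standard_normal())        # Y_t | X_t ~ g
    return x, y

# Illustrative choice (an assumption, not from the slides): scalar linear-Gaussian dynamics.
rng = np.random.default_rng(0)
x, y = simulate_ssm(
    T=100,
    psi=lambda x_prev, v: 0.9 * x_prev + 0.5 * v,
    phi=lambda x_t, w: x_t + 0.3 * w,
    sample_x1=lambda r: r.standard_normal(),
    rng=rng,
)
```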

  • State-Space Models

    State-space models are ubiquitous in control, data mining, econometrics, geosciences, systems biology, etc. Since Jan. 2012, more than 13,500 papers have already appeared (source: Google Scholar).

    Finite state-space HMM: X is a finite space, i.e. {Xt} is a finite Markov chain, and

    Yt | (Xt = x) ∼ g(·| x).

    Linear Gaussian state-space model

    Xt = A Xt−1 + B Vt,   Vt i.i.d. ∼ N(0, I)

    Yt = C Xt + D Wt,   Wt i.i.d. ∼ N(0, I)


  • State-Space Models

    Stochastic Volatility model

    Xt = φ Xt−1 + σ Vt,   Vt i.i.d. ∼ N(0, 1)

    Yt = β exp(Xt/2) Wt,   Wt i.i.d. ∼ N(0, 1)

    Biochemical Network model

    Pr( X^1_{t+dt} = x^1_t + 1, X^2_{t+dt} = x^2_t | x^1_t, x^2_t ) = α x^1_t dt + o(dt),
    Pr( X^1_{t+dt} = x^1_t − 1, X^2_{t+dt} = x^2_t + 1 | x^1_t, x^2_t ) = β x^1_t x^2_t dt + o(dt),
    Pr( X^1_{t+dt} = x^1_t, X^2_{t+dt} = x^2_t − 1 | x^1_t, x^2_t ) = γ x^2_t dt + o(dt),

    with Yk = X^1_{k∆T} + Wk, where Wk i.i.d. ∼ N(0, σ²).

    Nonlinear Diffusion model

    dXt = α (Xt ) dt + β (Xt ) dVt , Vt Brownian motion

    Yk = γ(X_{k∆T}) + Wk,   Wk i.i.d. ∼ N(0, σ²).

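    As one concrete instance, here is a possible simulation sketch of the stochastic volatility model above; the parameter values and the use of the stationary law for X1 are illustrative assumptions.

```python
import numpy as np

def simulate_sv(T, phi=0.91, sigma=1.0, beta=0.5, seed=0):
    """Simulate X_t = phi*X_{t-1} + sigma*V_t, Y_t = beta*exp(X_t/2)*W_t with V_t, W_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    y = np.empty(T)
    # Assumption: X_1 is drawn from the stationary law N(0, sigma^2 / (1 - phi^2)).
    x[0] = sigma / np.sqrt(1.0 - phi**2) * rng.standard_normal()
    y[0] = beta * np.exp(x[0] / 2.0) * rng.standard_normal()
    for t in range(1, T):
        x[t] = phi * x[t - 1] + sigma * rng.standard_normal()
        y[t] = beta * np.exp(x[t] / 2.0) * rng.standard_normal()
    return x, y

x_true, y_obs = simulate_sv(T=500)
```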

  • Inference in State-Space Models

    Given observations y1:t := (y1, y2, . . . , yt ), inference about X1:t := (X1, ...,Xt ) relies on the posterior

    p(x1:t | y1:t) = p(x1:t, y1:t) / p(y1:t)

    where

    p(x1:t, y1:t) = µ(x1) ∏_{k=2}^{t} f(xk | xk−1) · ∏_{k=1}^{t} g(yk | xk),

    the first two factors being the prior p(x1:t) and the last product the likelihood p(y1:t | x1:t), and

    p(y1:t) = ∫ ··· ∫ p(x1:t, y1:t) dx1:t.

    When X is a finite space and for linear Gaussian models, {p(xt | y1:t)}t≥1 can be computed exactly. For non-linear models, approximations are required: EKF, UKF, Gaussian sum filters, etc. Approximations of {p(xt | y1:t)}_{t=1}^{T} provide an approximation of p(x1:T | y1:T).

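    The factorization of p(x1:t, y1:t) translates directly into a sum of log-densities. Below is a sketch evaluating log p(x1:t, y1:t) for the stochastic volatility model; the model choice, the parameter values, and the use of the stationary law as µ are assumptions made here for illustration.

```python
import numpy as np
from scipy.stats import norm

def log_joint_sv(x, y, phi=0.91, sigma=1.0, beta=0.5):
    """log p(x_{1:t}, y_{1:t}) = log mu(x_1) + sum_{k>=2} log f(x_k | x_{k-1}) + sum_k log g(y_k | x_k)."""
    sd1 = sigma / np.sqrt(1.0 - phi**2)                                 # assumed stationary initial law
    lp = norm.logpdf(x[0], loc=0.0, scale=sd1)                          # log mu(x_1)
    lp += norm.logpdf(x[1:], loc=phi * x[:-1], scale=sigma).sum()       # sum of log f(x_k | x_{k-1})
    lp += norm.logpdf(y, loc=0.0, scale=beta * np.exp(x / 2.0)).sum()   # sum of log g(y_k | x_k)
    return lp
```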

  • Monte Carlo Methods Basics

    Assume you can generate X^{(i)}_{1:t} ∼ p(x1:t | y1:t) for i = 1, ..., N; then the MC approximation is

    p̂(x1:t | y1:t) = (1/N) ∑_{i=1}^{N} δ_{X^{(i)}_{1:t}}(x1:t).

    Integration is straightforward:

    ∫ ϕt(x1:t) p(x1:t | y1:t) dx1:t ≈ ∫ ϕt(x1:t) p̂(x1:t | y1:t) dx1:t = (1/N) ∑_{i=1}^{N} ϕt(X^{(i)}_{1:t}).

    Marginalization is straightforward:

    p̂(xk | y1:t) = ∫ p̂(x1:t | y1:t) dx_{1:k−1} dx_{k+1:t} = (1/N) ∑_{i=1}^{N} δ_{X^{(i)}_k}(xk).

    Basic and "key" property:

    V[ (1/N) ∑_{i=1}^{N} ϕt(X^{(i)}_{1:t}) ] = C(t, dim(X)) / N,

    i.e. the 1/N rate of convergence to zero does not depend on t or dim(X).
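    As a toy illustration of these two operations (the posterior samples here are just placeholder Gaussian draws, an assumption made purely to keep the example runnable):

```python
import numpy as np

# Placeholder draws standing in for N exact samples X^{(i)}_{1:t} ~ p(x_{1:t} | y_{1:t}).
N, t = 1000, 50
samples = np.random.default_rng(1).standard_normal((N, t))

# Integration: estimate E[phi_t(X_{1:t}) | y_{1:t}] by an empirical average,
# here with phi_t(x_{1:t}) = (sum_k x_k)^2 as an example test function.
phi_hat = np.mean(samples.sum(axis=1) ** 2)

# Marginalization: the k-th coordinate of each sample is (approximately) a draw from p(x_k | y_{1:t}).
k = 10
xk_samples = samples[:, k]
```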

  • Monte Carlo Methods

    Problem 1: We typically cannot generate exact samples from p(x1:t | y1:t) for non-linear non-Gaussian models.

    Problem 2: Even if we could, algorithms generating samples from p(x1:t | y1:t) would have complexity at least O(t).

    Particle methods partially solve Problems 1 & 2 by breaking the problem of sampling from p(x1:t | y1:t) into a collection of simpler subproblems: first approximate p(x1 | y1) and p(y1) at time 1, then p(x1:2 | y1:2) and p(y1:2) at time 2, and so on.


  • Bayesian Recursion on Path Space

    We have

    p(x1:t | y1:t) = p(x1:t, y1:t) / p(y1:t)

                   = g(yt | xt) f(xt | xt−1) p(x1:t−1, y1:t−1) / [ p(yt | y1:t−1) p(y1:t−1) ]

                   = g(yt | xt) [ f(xt | xt−1) p(x1:t−1 | y1:t−1) ] / p(yt | y1:t−1),

    where f(xt | xt−1) p(x1:t−1 | y1:t−1) is the predictive density p(x1:t | y1:t−1) and

    p(yt | y1:t−1) = ∫ g(yt | xt) p(x1:t | y1:t−1) dx1:t.

    Prediction-Update formulation:

    p(x1:t | y1:t−1) = f(xt | xt−1) p(x1:t−1 | y1:t−1),

    p(x1:t | y1:t) = g(yt | xt) p(x1:t | y1:t−1) / p(yt | y1:t−1).


  • Monte Carlo Implementation of Prediction Step

    Assume you have at time t − 1

    p̂(x1:t−1 | y1:t−1) = (1/N) ∑_{i=1}^{N} δ_{X^{(i)}_{1:t−1}}(x1:t−1).

    By sampling X^{(i)}_t ∼ f(xt | X^{(i)}_{t−1}) and setting X^{(i)}_{1:t} = (X^{(i)}_{1:t−1}, X^{(i)}_t), we obtain

    p̂(x1:t | y1:t−1) = (1/N) ∑_{i=1}^{N} δ_{X^{(i)}_{1:t}}(x1:t).

    Sampling from f (xt | xt−1) is usually straightforward and is feasible even if f (xt | xt−1) does not admit any analytical expression; e.g. biochemical network models.

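    A sketch of this prediction step for the stochastic volatility model used earlier (model and parameters are the same illustrative assumptions); sampling from f(xt | xt−1) is just one AR(1) move per particle.

```python
import numpy as np

def predict_step(particles_prev, phi=0.91, sigma=1.0, rng=None):
    """Given equally weighted X^{(i)}_{t-1}, sample X^{(i)}_t ~ f(. | X^{(i)}_{t-1}) for each particle."""
    rng = rng if rng is not None else np.random.default_rng()
    return phi * particles_prev + sigma * rng.standard_normal(particles_prev.shape)
```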

  • Importance Sampling Implementation of Updating Step

    Our target at time t is

    p(x1:t | y1:t) = g(yt | xt) p(x1:t | y1:t−1) / p(yt | y1:t−1),

    so by substituting p̂(x1:t | y1:t−1) for p(x1:t | y1:t−1) we obtain

    p̂(yt | y1:t−1) = ∫ g(yt | xt) p̂(x1:t | y1:t−1) dx1:t = (1/N) ∑_{i=1}^{N} g(yt | X^{(i)}_t).

    We now have

    p̄(x1:t | y1:t) = g(yt | xt) p̂(x1:t | y1:t−1) / p̂(yt | y1:t−1) = ∑_{i=1}^{N} W^{(i)}_t δ_{X^{(i)}_{1:t}}(x1:t),

    with W^{(i)}_t ∝ g(yt | X^{(i)}_t) and ∑_{i=1}^{N} W^{(i)}_t = 1.

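    The corresponding importance-sampling update for the same stochastic volatility model, where g(yt | xt) = N(yt; 0, β² exp(xt)); it returns the normalized weights W^{(i)}_t and the estimate p̂(yt | y1:t−1). This is a sketch under the same illustrative assumptions as before.

```python
import numpy as np
from scipy.stats import norm

def update_step(particles_t, y_t, beta=0.5):
    """Weight each X^{(i)}_t by g(y_t | X^{(i)}_t); return (W^{(i)}_t, p_hat(y_t | y_{1:t-1}))."""
    g = norm.pdf(y_t, loc=0.0, scale=beta * np.exp(particles_t / 2.0))  # g(y_t | X^{(i)}_t)
    lik_t = g.mean()                  # p_hat(y_t | y_{1:t-1}) = (1/N) * sum_i g(y_t | X^{(i)}_t)
    weights = g / g.sum()             # normalized weights, summing to 1
    return weights, lik_t
```

    In practice the weights are usually computed on the log scale and normalized with a log-sum-exp step to avoid numerical underflow.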

  • Multinomial Resampling

    We have a "weighted" approximation p̄(x1:t | y1:t) of p(x1:t | y1:t):

    p̄(x1:t | y1:t) = ∑_{i=1}^{N} W^{(i)}_t δ_{X^{(i)}_{1:t}}(x1:t).

    To obtain N samples approximately distributed according to p(x1:t | y1:t), resample N times with replacement

    X̄^{(i)}_{1:t} ∼ p̄(x1:t | y1:t)

    to obtain

    p̂(x1:t | y1:t) = (1/N) ∑_{i=1}^{N} δ_{X̄^{(i)}_{1:t}}(x1:t) = ∑_{i=1}^{N} (N^{(i)}_t / N) δ_{X^{(i)}_{1:t}}(x1:t),

    where X̄^{(i)}_{1:t} denote the resampled paths and {N^{(i)}_t} follows a multinomial distribution with parameters N and {W^{(i)}_t}.

    This can be achieved in O(N) operations.
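    A sketch of multinomial resampling: drawing the counts (N^{(1)}_t, ..., N^{(N)}_t) in a single multinomial call and repeating each particle accordingly keeps the cost linear in N.

```python
import numpy as np

def multinomial_resample(particles, weights, rng=None):
    """Resample N particles with replacement according to weights; returns an equally weighted set."""
    rng = rng if rng is not None else np.random.default_rng()
    counts = rng.multinomial(len(particles), weights)   # (N^{(1)}_t, ..., N^{(N)}_t)
    return np.repeat(particles, counts, axis=0)         # each particle i copied N^{(i)}_t times
```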

  • Vanilla Particle Filter

    At time t = 1

    Sample X^{(i)}_1 ∼ µ(x1), then

    p̄(x1 | y1) = ∑_{i=1}^{N} W^{(i)}_1 δ_{X^{(i)}_1}(x1),   W^{(i)}_1 ∝ g(y1 | X^{(i)}_1).

    Resample X^{(i)}_1 ∼ p̄(x1 | y1) to obtain p̂(x1 | y1) = (1/N) ∑_{i=1}^{N} δ_{X^{(i)}_1}(x1).

    At time t ≥ 2

    Sample X^{(i)}_t ∼ f(xt | X^{(i)}_{t−1}), set X^{(i)}_{1:t} = (X^{(i)}_{1:t−1}, X^{(i)}_t) and

    p̄(x1:t | y1:t) = ∑_{i=1}^{N} W^{(i)}_t δ_{X^{(i)}_{1:t}}(x1:t),   W^{(i)}_t ∝ g(yt | X^{(i)}_t).

    Resample X^{(i)}_{1:t} ∼ p̄(x1:t | y1:t) to obtain

    p̂(x1:t | y1:t) = (1/N) ∑_{i=1}^{N} δ_{X^{(i)}_{1:t}}(x1:t).

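    Assembling the steps above, here is a sketch of this vanilla (bootstrap) filter for the stochastic volatility model, storing only the current particles and the filtering mean at each step; the model, parameter values, and the stationary initial law are the same illustrative assumptions as before.

```python
import numpy as np
from scipy.stats import norm

def bootstrap_filter_sv(y, N=1000, phi=0.91, sigma=1.0, beta=0.5, seed=0):
    """Vanilla particle filter for the SV model; returns filtering means and log p_hat(y_{1:T})."""
    rng = np.random.default_rng(seed)
    T = len(y)
    filt_mean = np.empty(T)
    log_lik = 0.0
    x = sigma / np.sqrt(1.0 - phi**2) * rng.standard_normal(N)     # X^{(i)}_1 ~ mu (assumed stationary)
    for t in range(T):
        if t > 0:
            x = phi * x + sigma * rng.standard_normal(N)           # sample X^{(i)}_t ~ f(.|X^{(i)}_{t-1})
        g = norm.pdf(y[t], loc=0.0, scale=beta * np.exp(x / 2.0))  # g(y_t | X^{(i)}_t)
        log_lik += np.log(g.mean())                                # accumulate log p_hat(y_t | y_{1:t-1})
        w = g / g.sum()                                            # normalized weights W^{(i)}_t
        filt_mean[t] = np.sum(w * x)                               # estimate of E[X_t | y_{1:t}]
        x = rng.choice(x, size=N, replace=True, p=w)               # multinomial resampling
    return filt_mean, log_lik

# Usage with data from the earlier simulation sketch:
# filt_mean, log_lik = bootstrap_filter_sv(y_obs)
```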

  • Particle Estimates

    At time t, we get

    p̂(x1:t | y1:t) = (1/N) ∑_{i=1}^{N} δ_{X^{(i)}_{1:t}}(x1:t).

    The marginal likelihood estimate is given by

    p̂(y1:t) = ∏_{k=1}^{t} p̂(yk | y1:k−1) = ∏_{k=1}^{t} [ (1/N) ∑_{i=1}^{N} g(yk | X^{(i)}_k) ].

    Computational complexity is O(N) at each time step and memory requirements are O(tN). If we are only interested in p(xt | y1:t) or in p(st(x1:t) | y1:t), where st(x1:t) = Ψt(xt, st−1(x1:t−1)) is of fixed dimension (e.g. st(x1:t) = ∑_{k=1}^{t} x_k^2), then the memory requirements are only O(N).

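    In the filter sketch above, the marginal likelihood estimate is accumulated as a sum of log terms rather than the displayed product, which avoids numerical underflow for large t; a standalone version of that computation is sketched below (the input layout is an assumption made for illustration).

```python
import numpy as np

def log_marginal_likelihood(per_step_g):
    """log p_hat(y_{1:t}) = sum_{k=1}^{t} log( (1/N) sum_i g(y_k | X^{(i)}_k) ).

    `per_step_g` is a sequence of length t; its k-th entry holds the N values g(y_k | X^{(i)}_k)."""
    return float(sum(np.log(np.mean(g_k)) for g_k in per_step_g))
```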

  • Some Convergence Results

    Numerous convergence results are available; see (Del Moral, 2004; Del Moral, Doucet & Singh, 2013). Let ϕt : X^t → R and consider

    ϕ̄t = ∫ ϕt(x1:t) p(x1:t | y1:t) dx1:t,

    ϕ̂t = ∫ ϕt(x1:t) p̂(x1:t | y1:t) dx1:t = (1/N) ∑_{i=1}^{N} ϕt(X^{(i)}_{1:t}).

    We can prove that for any bounded function ϕt and any p ≥ 1,

    E[ |ϕ̂t − ϕ̄t|^p ]^{1/p} ≤ B(t) c(p) ‖ϕt‖∞ / √N,

    √N (ϕ̂t − ϕ̄t) ⇒ N(0, σ²_t) as N → ∞.

    Very weak results: for a path-dependent ϕt(x1:t), B(t) and σ²_t typically increase with t.
