Using particle filters to fit state-space models of
wildlife population dynamics
Len Thomas, Stephen Buckland, Ken Newman, John Harwood
University of St Andrews
16th September 2004
I always wanted to be a model
Outline
1. Introduction / background / recap
– State space models
– Motivating example – grey seal model
– Methods of fitting the models
– Types of inference from time series data
2. Tutorial: nonlinear particle filtering / sequential importance sampling
– Importance sampling
– Sequential importance sampling (SIS)
– Tricks
3. Results and discussion
– Potential of particle filtering in ecology and epidemiology
References
This talk: ftp://bat.mcs.st-and.ac.uk/pub/len/emstalk.zip
See also talk by David Elston, tomorrow
Books:
– Doucet, A., N. de Freitas and N. Gordon (eds). 2001. Sequential Monte Carlo Methods in Practice. Springer-Verlag.
– Liu, J.S. 2001. Monte Carlo Strategies in Scientific Computing. Springer-Verlag.
Papers from St. Andrews group – see abstract
1. Introduction
State-space models
Describe the evolution of two parallel time series, t = 1, ..., T
– n_t – vector of true states (unobserved)
– y_t – vector of observations on the states
State process density g_t(n_t | n_{t-1}; Θ)
Observation process density f_t(y_t | n_t; Θ)
Initial state density g_0(n_0; Θ)
Time steps don’t need to be equally spaced
Can be at the level of the individual
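A state-space model in this form is straightforward to simulate forward. Below is a minimal Python sketch of a toy scalar model with Poisson state dynamics and Gaussian observation noise; the growth rate and observation CV are illustrative values, not the seal model:

```python
import numpy as np

def simulate_ssm(T, n0=14, growth=1.05, obs_cv=0.2, seed=1):
    """Simulate a toy state-space model (illustrative, not the seal model):
    state:       n_t ~ Poisson(growth * n_{t-1})   (the g_t density)
    observation: y_t ~ Normal(n_t, (obs_cv * n_t)^2)   (the f_t density)
    Returns arrays of the hidden states and the observations."""
    rng = np.random.default_rng(seed)
    n = n0
    states, ys = [], []
    for _ in range(T):
        n = rng.poisson(growth * n)        # state process step
        y = rng.normal(n, obs_cv * n)      # noisy observation of the state
        states.append(n)
        ys.append(y)
    return np.array(states), np.array(ys)
```

Only the observations `ys` would be available to the analyst; the fitting methods below all try to recover the hidden `states` (and parameters) from them.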
Hidden process models
An extension of state-space models:
– the state process can be more than 1st-order Markov
– the observation process can depend on previous states and observations
Example: British Grey Seal
British grey seal breeding colonies
Surveying seals
Hard to survey outside of the breeding season: 80% of time at sea, 90% of this time underwater
Aerial surveys of breeding colonies since 1960s used to estimate pup production
(Other data: intensive studies, radio tracking, genetic, photo-ID, counts at haul-outs)
~6% per year overall increase in pup production
Estimated pup production
[Figure: estimated pup production (pup count vs. year, 1960–2000) for the four regions: Orkney, Outer Hebrides, Inner Hebrides, North Sea]
Orkney example colonies
[Figure: pup counts, 1960–2000, for eight example Orkney colonies: Faraholm, Faray, Copinsay, Calf.of.Eday, Muckle.Greenhol, Little.Linga, Wartholm, Point.of.Spurne]
Questions
What is the current population size?
What is the future population trajectory?
Biological interest in population processes
– e.g., movement of recruiting females
(What sort of data should be collected to most efficiently answer the above questions?)
Grey seal model: states
7 age classes
– pups (n_0)
– age 1 – age 5 females (n_1–n_5)
– age 6+ females (n_6+) = breeders
48 colonies – aggregated into 4 regions
[Map: British grey seal breeding colonies]
State process
Time step: 1 year, starting just after the breeding season
4 sub-processes:
– survival
– age incrementation and sexing
– movement of recruiting females
– breeding
n_{a,r,t-1} →(survival)→ u_{s,a,r,t} →(age)→ u_{i,a,r,t} →(movement)→ u_{m,a,r,t} →(breeding)→ n_{a,r,t}
Survival
Density-independent adult survival:
u_{s,a,r,t} ~ Bin(n_{a,r,t-1}, φ_a), a = 1, ..., 6+
Density-dependent pup survival:
u_{s,0,r,t} ~ Bin(n_{0,r,t-1}, φ_{p,r,t})
where
φ_{p,r,t} = φ_{pmax} / (1 + β_r n_{0,r,t-1})
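The survival sub-process for a single region can be sketched directly from the binomial densities above; the parameter values in the defaults are illustrative placeholders, not fitted estimates:

```python
import numpy as np

def survival_step(n_prev, phi_a=0.95, phi_pmax=0.73, beta=0.0009, rng=None):
    """One survival sub-process step for one region.
    n_prev[0] = pups; n_prev[1:] = female age classes 1..5 and 6+.
    Adults survive with density-independent phi_a; pups with
    density-dependent phi_p = phi_pmax / (1 + beta * n_pups)."""
    rng = rng or np.random.default_rng(0)
    phi_p = phi_pmax / (1.0 + beta * n_prev[0])  # density-dependent pup survival
    u = np.empty_like(n_prev)
    u[0] = rng.binomial(n_prev[0], phi_p)        # surviving pups
    u[1:] = rng.binomial(n_prev[1:], phi_a)      # surviving adults, elementwise
    return u
```

Because each survivor count is a binomial draw, the simulated survivors can never exceed the starting numbers, and pup survival falls as pup density rises.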
Age incrementation and sexing
u_{i,1,r,t} ~ Bin(u_{s,0,r,t}, 0.5)
u_{i,a+1,r,t} = u_{s,a,r,t}, a = 1–4
u_{i,6+,r,t} = u_{s,5,r,t} + u_{s,6+,r,t}
Movement of recruiting females
Only age 5 females move; movement is fitness dependent
– females move if expected offspring survival is higher elsewhere
Expected proportion moving proportional to:
– difference in juvenile survival rates
– inverse of distance between colonies
– inverse of site faithfulness
(u_{m,5,1,t}, ..., u_{m,5,4,t}) ~ Mult(u_{i,5,r,t}; p_{1,r,t}, ..., p_{4,r,t})
where the movement probability p_{i,r,t} combines a density-dependence term γ_dd max(φ_{p,i,t} − φ_{p,r,t}, 0), an inverse-distance term γ_dist / dist_{r,i}, and an inverse site-faithfulness term γ_sf
Breeding
Density independent:
u_{b,0,r,t} ~ Bin(u_{m,6+,r,t}, α)
Matrix representation
E(n_t | n_{t-1}, Θ) ≈ B M_t A S_t n_{t-1} = L_t n_{t-1}
Life cycle graph
[Figure: life cycle graph, pup → 1 → 2 → 3 → 4 → 5 → 6+, with survival rates on the arcs and density-dependent pup survival; repeated for the four regions (North Sea, Inner Hebrides, Outer Hebrides, Orkneys), with density-dependent movement arcs between regions]
Movement depends on: distance, density dependence, site faithfulness
Observation process
Pup production estimates normally distributed, with constant CV ψ (standard deviation proportional to expectation):
y_{0,r,t} ~ N(n_{0,r,t}, ψ² n_{0,r,t}²)
Grey seal model: summary
Parameters:
– survival: φ_a, φ_pmax, β_1, ..., β_4
– breeding: α
– movement: γ_dd, γ_dist, γ_sf
– observation CV: ψ
– total 7 + 4 = 11
States:
– 7 age classes per region per year
– total 7 × 4 = 28 per year
Fitting state-space models
Analytic approaches
– Kalman filter (Gaussian linear model) [Takis Besbeas]
– Extended Kalman filter (Gaussian nonlinear model – approximate)
– Numerical maximization of the likelihood
Monte Carlo approximations
– Likelihood-based (Geyer 1996)
– Bayesian [Carmen Fernández]
– Rejection sampling [Damien Clancy]
– Markov chain Monte Carlo (MCMC) [Philip O’Neill]
– Monte Carlo particle filtering [Me!]
Inference tasks for time series data
Observe data y_{1:t} = (y_1, ..., y_t)
We wish to infer the unobserved states n_{1:t} = (n_1, ..., n_t) and parameters Θ
Fundamental inference tasks:
– Smoothing p(n_{1:t}, Θ | y_{1:t})
– Prediction p(n_{t+x} | y_{1:t})
– Filtering p(n_t, Θ_t | y_{1:t})
Filtering inference can be fast!
Suppose we have p(n_t | y_{1:t})
A new observation y_{t+1} arrives. We want to update to p(n_{t+1} | y_{1:t+1}).
Can use the filtering recursion:
p(n_{t+1} | y_{1:t+1}) = f_{t+1}(y_{t+1} | n_{t+1}) p(n_{t+1} | y_{1:t}) / p(y_{t+1} | y_{1:t})
where
p(n_{t+1} | y_{1:t}) = ∫ g_{t+1}(n_{t+1} | n_t) p(n_t | y_{1:t}) dn_t
Monte Carlo particle filters: online inference for evolving datasets
Particle filtering is used when fast online methods are required to produce updated (filtered) estimates as new data arrive:
– Tracking applications in radar, sonar, etc.
– Finance: stock prices, exchange rates arrive sequentially; online update of portfolios
– Medical monitoring: online monitoring of ECG data for sick patients
– Digital communications
– Speech recognition and processing
2. Monte Carlo Particle Filtering
Variants/synonyms:
– Sequential Importance Sampling (SIS)
– Sampling Importance Sampling Resampling (SISR)
– Bootstrap filter
– Interacting particle filter
– Auxiliary particle filter
Importance sampling
Want to make inferences about some function p(·), but cannot evaluate it directly
Solution:
– Sample from another function q(·) (the importance function), which has the same support as p(·)
– Correct using importance weights w = p(·)/q(·)
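Importance sampling in miniature: estimate E[X] under a target density p by drawing from a different proposal q and correcting with the weights p/q. The particular target N(1, 1) and proposal N(0, 2²) are illustrative choices, not from the talk:

```python
import math
import numpy as np

def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2), vectorized over x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def importance_mean(K=100_000, seed=0):
    """Estimate E[X] for target p = N(1, 1) using draws from the
    importance function q = N(0, 2^2), corrected by weights w = p/q."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 2.0, size=K)                     # draws from q()
    w = normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # importance weights
    w /= w.sum()                                         # normalize
    return float(np.sum(w * x))                          # weighted estimate of E[X] = 1
```

Because q is wider than p, the weights stay bounded and the self-normalized estimate converges quickly to the true mean of 1.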
Importance sampling algorithm
Want to make inferences about p(n_{t+1} | y_{1:t+1})
Prediction step: make K random draws from the importance function
n_{t+1}^{(i)} ~ q(·), i = 1, ..., K
Correction step: calculate
w_{t+1}^{(i)} = p(n_{t+1}^{(i)} | y_{1:t+1}) / q(n_{t+1}^{(i)})
Normalize weights so that Σ_{i=1}^K w_{t+1}^{(i)} = 1
Approximate the target function:
p(n_{t+1} | y_{1:t+1}) ≈ Σ_{i=1}^K w_{t+1}^{(i)} δ(n_{t+1} − n_{t+1}^{(i)})
Sequential importance sampling
SIS is just repeated application of importance sampling at each time step
Basic sequential importance sampling:
– Proposal distribution q(·) = g(n_{t+1} | n_t)
– Leads to weights w_{t+1}^{(i)} = w_t^{(i)} f(y_{t+1} | n_{t+1}^{(i)})
To do basic SIS, need to be able to:
– Simulate forward from the state process
– Evaluate the observation process density (the likelihood)
Basic SIS algorithm
Generate K “particles” from the prior on {n_0, Θ}, each with weight 1/K:
{n_0^{(i)}, Θ^{(i)}, w_0^{(i)} = 1/K}, i = 1, ..., K
For each time period t = 1, ..., T
– For each particle i = 1, ..., K
Prediction step: n_t^{(i)} ~ g_t(n_t | n_{t-1}^{(i)})
Correction step: w_t^{(i)} = w_{t-1}^{(i)} f_t(y_t | n_t^{(i)})
Example of basic SIS
State-space model of exponential population growth
– State model: n_{t+1} ~ Pois(Θ n_t)
– Observation model: y_{t+1} ~ N(n_{t+1}, (0.15 n_{t+1})²)
– Priors: n_0 ~ Pois(14), Θ ~ N(1.08, 0.1²)
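The basic SIS algorithm can be run on this toy model in a few lines. A sketch assuming the model as reconstructed here: state n_{t+1} ~ Pois(Θ n_t), observation y ~ N(n, (0.15 n)²), priors n_0 ~ Pois(14) and Θ ~ N(1.08, 0.1²):

```python
import numpy as np

def sis(ys, K=10_000, seed=0):
    """Basic sequential importance sampling (no resampling) for
    n_{t+1} ~ Pois(theta * n_t), y ~ N(n, (0.15 n)^2)."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(14, size=K).astype(float)      # prior draws for n0
    theta = rng.normal(1.08, 0.1, size=K)          # prior draws for theta
    w = np.full(K, 1.0 / K)                        # equal initial weights
    for y in ys:
        lam = np.maximum(theta * n, 0.0)           # guard against negative rates
        n = rng.poisson(lam).astype(float)         # predict: simulate state process
        sd = np.maximum(0.15 * n, 1e-9)            # guard against n = 0
        lik = np.exp(-0.5 * ((y - n) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        w *= lik                                   # correct: multiply by f(y | n)
        w /= w.sum()                               # normalize
    return n, theta, w

# filtered posterior after observing y = 12, 14 (as in the worked example)
n, theta, w = sis([12, 14])
post_mean = float(np.sum(w * n))
```

With many more particles than the 10 shown on the slides, the weighted particles approximate the filtered posterior p(n_2, Θ | y_{1:2}).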
Example of basic SIS, t=1 (Obs: 12)
Sample from prior:
n_0: 11, 12, 14, 13, 16, 16, 20, 14, 9, 16
Θ_0: 1.055, 1.107, 1.195, 0.974, 0.936, 1.029, 1.081, 1.201, 1.000, 0.958
w_0: 0.1 each
Predict (prior at t=1):
n_1: 17, 18, 11, 15, 20, 17, 17, 7, 6, 22
Correct (observation gives f(·)):
f: 0.028, 0.012, 0.201, 0.073, 0.038, 0.029, 0.029, 0.000, 0.000, 0.012
Posterior at t=1:
w_1: 0.063, 0.034, 0.558, 0.202, 0.010, 0.063, 0.063, 0.000, 0.000, 0.003
Example of basic SIS, t=2 (Obs: 14)
Posterior at t=1:
n_1: 17, 18, 11, 15, 20, 17, 17, 7, 6, 22
Θ_1: 1.055, 1.107, 1.195, 0.974, 0.936, 1.029, 1.081, 1.201, 1.000, 0.958
w_1: 0.063, 0.034, 0.558, 0.202, 0.010, 0.063, 0.063, 0.000, 0.000, 0.003
Predict (prior at t=2):
n_2: 15, 14, 12, 10, 11, 15, 21, 9, 11, 20
Correct (observation gives f(·)):
f: 0.160, 0.190, 0.112, 0.008, 0.046, 0.160, 0.011, 0.000, 0.046, 0.007
Posterior at t=2:
w_2: 0.105, 0.068, 0.691, 0.015, 0.005, 0.105, 0.007, 0.000, 0.000, 0.000
Problem: particle depletion
Variance of weights increases with time, until few particles have almost all the weight
Results in large Monte Carlo error in the approximation
p(n_{t+1} | y_{1:t+1}) ≈ Σ_{i=1}^K w_{t+1}^{(i)} δ(n_{t+1} − n_{t+1}^{(i)})
Can quantify:
effective sample size = K / (1 + CV(w_t^{(i)})²)
From the previous example:
Time 0 1 2
ESS 10.0 2.5 1.8
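The effective sample size can be computed directly from the weights; a small sketch using the equivalent identity ESS = 1 / Σ(normalized w)², which equals K / (1 + CV(w)²):

```python
import numpy as np

def ess(w):
    """Effective sample size K / (1 + CV(w)^2) of a set of particle
    weights, computed as 1 / sum(normalized_w^2). Equal weights give
    ESS = K; one dominant particle gives ESS close to 1."""
    w = np.asarray(w, dtype=float)
    wn = w / w.sum()          # normalize so the weights sum to 1
    return 1.0 / np.sum(wn ** 2)
```

Applied to the rounded t=1 weights shown in the worked example this gives roughly 2.7; the slide's 2.5 was presumably computed from the unrounded weights.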
Problem: particle depletion
Worse when:
– Observation error is small
– Lots of data at any one time point
– State process has little stochasticity
– Priors are diffuse or not congruent with observations
– State process model incorrect (e.g., time varying)
– Outliers in the data
Particle depletion: solutions
Pruning: throw out “bad” particles (rejection)
Enrichment: boost “good” particles (resampling)
– Directed enrichment (auxiliary particle filter)
– Mutation (kernel smoothing)
Other stuff
– Better proposals
– Better resampling schemes
– …
Rejection control
Idea: throw out particles with low weights
Basic algorithm, at time t:
– Have a pre-determined threshold c_t, where 0 < c_t ≤ 1
– For i = 1, ..., K, accept particle i with probability r^{(i)} = min(1, w_t^{(i)} / c_t)
– If particle is accepted, update its weight to w_t^{*(i)} = max(w_t^{(i)}, c_t)
– Now we have fewer than K samples. Can make up samples by sampling from the priors, projecting forward to the current time point and repeating the rejection control
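The accept/reweight step of rejection control is a one-liner per particle; a minimal sketch (the top-up from the priors is omitted):

```python
import numpy as np

def rejection_control(particles, w, c, rng=None):
    """Rejection control at one checkpoint: accept particle i with
    probability min(1, w_i / c); accepted particles get the adjusted
    weight max(w_i, c). Returns the surviving particles and weights."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w, dtype=float)
    accept = rng.random(len(w)) < np.minimum(1.0, w / c)   # stochastic pruning
    survivors = [p for p, a in zip(particles, accept) if a]
    return survivors, np.maximum(w, c)[accept]
```

Particles with weight at or above the threshold always survive unchanged; low-weight particles mostly die, and the few that survive have their weight boosted to c so the approximation stays unbiased.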
Rejection control - discussion
Particularly useful at t=1 with diffuse priors
Can have a sequence of control points (not necessarily every year)
Check points don’t need to be fixed – can trigger when the variance of the weights gets too high
Thresholds c_t don’t need to be set in advance but can be set adaptively (e.g., mean of the weights)
Instead of restarting at time t=0, can restart by sampling from particles at the previous check point (= partial rejection control)
Resampling: pruning and enrichment
Idea: allow “good” particles to amplify themselves while killing off “bad” particles
Algorithm. Before and/or after each time step (not necessarily every time step):
– For j = 1, ..., K
Sample {n_t^{(j)}, Θ_t^{(j)}} independently from the set of particles {n_t^{(i)}, Θ_t^{(i)}, w_t^{(i)}}, i = 1, ..., K, according to the probabilities a_t^{(1)}, ..., a_t^{(K)}
Assign new weights w_t^{*(j)} = w_t^{(i)} / a_t^{(i)}
Reduces particle depletion of states, as “children” particles with the same “parent” now evolve independently
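With the bootstrap-filter choice a_i = w_i, the weight update w_i / a_i is constant, so resampled weights reset to 1/K. A minimal multinomial-resampling sketch:

```python
import numpy as np

def resample(particles, w, rng=None):
    """Multinomial resampling with probabilities a_i proportional to
    w_i (the bootstrap-filter choice). Because a_i = w_i, the new
    weights w_i / a_i are constant and reset to 1/K."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w, dtype=float)
    idx = rng.choice(len(w), size=len(w), p=w / w.sum())  # draw K parents
    return [particles[i] for i in idx], np.full(len(w), 1.0 / len(w))
```

High-weight particles are duplicated (amplified) and low-weight ones tend to disappear; the returned set again has K particles with equal weights.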
Resample probabilities
Should be related to the weights:
– a_t^{(i)} ∝ w_t^{(i)} (as in the bootstrap filter)
– a_t^{(i)} ∝ (w_t^{(i)})^α, where 0 ≤ α ≤ 1
α could vary according to the variance of the weights
α = ½ has been suggested
– a_t^{(i)} related to “future trend” – as in the auxiliary particle filter
Directed resampling: auxiliary particle filter
Idea: pre-select particles likely to have high weights at the next time step
Example algorithm:
– For j = 1, ..., K
Sample {n_t^{(j)}, Θ_t^{(j)}} independently from the set of particles {n_t^{(i)}, Θ_t^{(i)}, w_t^{(i)}}, i = 1, ..., K, according to the probabilities a_t^{(i)} ∝ w_t^{(i)} f(y_{t+1} | E(n_{t+1} | n_t^{(i)}))
Predict: n_{t+1}^{(j)} ~ g(n_{t+1} | n_t^{(j)})
Correct: w_{t+1}^{(j)} = f(y_{t+1} | n_{t+1}^{(j)}) / a_t^{(j)}
If “future” observations are available, can extend to look more than 1 time step ahead – e.g., protein folding application
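One auxiliary-particle-filter step for the toy exponential-growth model, as a sketch: the pre-selection probabilities evaluate the observation density at the expected next state E(n_{t+1} | n_t) = Θ n_t, then the selected parents are propagated and corrected:

```python
import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def apf_step(n, theta, w, y, rng=None):
    """One auxiliary-particle-filter step for the toy model
    n_{t+1} ~ Pois(theta * n_t), y ~ N(n, (0.15 n)^2)."""
    rng = rng or np.random.default_rng(0)
    mu = np.maximum(theta * n, 1e-9)              # E(n_{t+1} | n_t) = theta * n_t
    a = w * normal_pdf(y, mu, 0.15 * mu)          # pre-selection probabilities
    a /= a.sum()
    idx = rng.choice(len(n), size=len(n), p=a)    # pre-select promising parents
    n1 = rng.poisson(mu[idx]).astype(float)       # predict from selected parents
    sd = np.maximum(0.15 * n1, 1e-9)
    w1 = normal_pdf(y, n1, sd) / a[idx]           # correct: f(y | n) / a
    return n1, theta[idx], w1 / w1.sum()
```

Dividing by a^{(j)} in the correction step undoes the bias introduced by pre-selecting particles that the next observation favours.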
Kernel smoothing: enrichment of parameters through mutation
Idea: introduce small “mutations” into parameter values when resampling
Algorithm:
– Given particles {n_t^{(i)}, Θ_t^{(i)}, w_t^{(i)}}, i = 1, ..., K
– Let V_t be the variance matrix of the Θ_t^{(i)}s
– For i = 1, ..., K, sample Θ_t^{*(i)} ~ N(Θ_t^{(i)}, h²V_t), where h controls the size of the perturbations
– Variance of the parameters is now (1 + h²)V_t, so need shrinkage to preserve the first 2 moments
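A scalar sketch of kernel smoothing with the shrinkage correction (shrinking each particle toward the mean by a = √(1 − h²) before adding N(0, h²V) noise restores the original variance); this uses the unweighted mean and variance for simplicity:

```python
import numpy as np

def kernel_smooth(theta, h=0.1, rng=None):
    """Kernel-smooth scalar parameter particles with shrinkage:
    theta* ~ N(m_i, h^2 * V), with m_i = a*theta_i + (1-a)*mean(theta)
    and a = sqrt(1 - h^2). Without shrinkage the perturbed particles
    would have inflated variance (1 + h^2) * V; with it, the first
    two moments are preserved."""
    rng = rng or np.random.default_rng(0)
    theta = np.asarray(theta, dtype=float)
    v = theta.var()
    a = np.sqrt(1.0 - h * h)                  # shrinkage factor
    m = a * theta + (1.0 - a) * theta.mean()  # shrink toward the mean
    return rng.normal(m, h * np.sqrt(v))      # add the kernel perturbation
```

The perturbations break ties between duplicated parameter particles, so resampling no longer collapses the parameter cloud onto a few values.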
Kernel smoothing - discussion
Previous algorithm does not preserve the relationship between
parameters and states
– Leads to poor smoothing inference
– Possibly unreliable filtered inference?
– Pragmatically – use as small a value of h as possible
Extensions:
– Kernel smooth states as well as parameters
– Local kernel smoothing
Other “tricks”
Better proposals:
– Rao-Blackwellization – the Gibbs sampling equivalent
– Importance sampling (e.g., from priors)
Better resampling:
– Residual resampling
– Stratified resampling
Alternative “mutation” algorithms:
– MCMC within SIS
Gradual focusing on the posterior:
– Tempering/annealing
– ...
3. Results and discussion
“I’ll make you fit into my model!!!”
Example: Grey seal model
Used pup production estimates for 1984-2002
Informative priors on parameters; 1984 data used with sampled parameters to provide priors for states
Algorithm:
– Auxiliary particle filter with kernel smoothing (h=0.9) and
rejection control (c=99th %ile) at the first time step
– 400,000 particles
– Took ~6hrs to run on a PC (in SPlus)
Posterior parameter estimates
[Figure: marginal posterior histograms with point estimates: phi.adult 0.966, phi.juv.max 0.734, alpha 0.973, psi 0.07, gamma.dd 3.32, gamma.dist 0.792, gamma.sf 0.355, beta.ns 0.000906, beta.ih 0.00127, beta.oh 0.000304, beta.ork 0.000183]
Coplots
[Figure: pairwise coplots of posterior samples for the parameters Sa, Sj.Max, a, psi, move.1.dd, move.2.dist, move.3.site, b.ns, b.ih, b.oh, b.ork]
Smoothed pup production
[Figure: smoothed pup production estimates (pups vs. year, 1985–2000) for the four regions: North Sea, Inner Hebrides, Outer Hebrides, Orkneys]
Predicted adults
[Figure: predicted adult numbers (adults vs. year, 2004–2012) for the four regions: North Sea, Inner Hebrides, Outer Hebrides, Orkneys]
Model selection and multi-model inference
Can put priors on alternative models, and then sample from the models to initialize the particles
Proportion of particles with each model gives posterior model probabilities
Can also penalize for more parameters
Model                 LnL      AIC      Akaike weight
Outer Hebrides = Western Isles
  Production, 1δ      -625.0   1258.0   0.02
  Production, 3δs     -624.1   1260.2   0.01
  Staff, 1δ           -624.5   1257.1   0.03
  Staff, 3δs          -623.8   1259.5   0.01
Outer Hebrides = Western Isles + Northwest
  Production, 1δ      -621.5   1250.9   0.67
  Production, 3δs     -621.9   1255.9   0.06
  Staff, 1δ           -622.9   1253.8   0.16
  Staff, 3δs          -622.9   1257.9   0.02
Discussion: application in ecology and epidemiology
An alternative to MCMC?
– Debatable when there is plenty of time for fitting and main emphasis is on smoothing inference.
– Best suited to situations where fast filtered estimates are required – e.g.: foot and mouth outbreak? N. American West coast salmon harvest openings?
Disadvantages:
– Methods less well developed than for MCMC?
– No general software (no WinBUPF)
Current / future research
Efficient general algorithms (and software)
Comparison with MCMC and Kalman filter
Incorporating other types of data (e.g., mark-recapture)
Parallelization
Multi-model inference
Diagnostics
Other applications!
Just another particle