Using particle filters to fit state-space models of
wildlife population dynamics
Len Thomas, Stephen Buckland, Ken Newman, John Harwood
University of St Andrews
16th September 2004
I always wanted to be a model
Outline
1. Introduction / background / recap
– State space models
– Motivating example – grey seal model
– Methods of fitting the models
– Types of inference from time series data
2. Tutorial: nonlinear particle filtering / sequential importance sampling
– Importance sampling
– Sequential importance sampling (SIS)
– Tricks
3. Results and discussion
– Potential of particle filtering in ecology and epidemiology
References
This talk: ftp://bat.mcs.st-and.ac.uk/pub/len/emstalk.zip
See also talk by David Elston, tomorrow
Books:
– Doucet, A., N. de Freitas and N. Gordon (eds). 2001. Sequential Monte Carlo Methods in Practice. Springer-Verlag.
– Liu, J.S. 2001. Monte Carlo Strategies in Scientific Computing. Springer-Verlag.
Papers from St. Andrews group – see abstract
1. Introduction
State-space models
Describe the evolution of two parallel time series, t = 1, ..., T
– n_t – vector of true states (unobserved)
– y_t – vector of observations on the states
State process density g_t(n_t | n_{t-1}; Θ)
Observation process density f_t(y_t | n_t; Θ)
Initial state density g_0(n_0; Θ)
Time steps don’t need to be equally spaced
Can be at the level of the individual
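A state-space model in this form is straightforward to simulate forward. Below is a minimal Python sketch of a toy scalar model with Poisson state dynamics and Gaussian observation noise; the growth rate and observation CV are illustrative values, not the seal model:

```python
import numpy as np

def simulate_ssm(T, n0=14, growth=1.05, obs_cv=0.2, seed=1):
    """Simulate a toy state-space model (illustrative, not the seal model):
    state:       n_t ~ Poisson(growth * n_{t-1})   (the g_t density)
    observation: y_t ~ Normal(n_t, (obs_cv * n_t)^2)   (the f_t density)
    Returns arrays of the hidden states and the observations."""
    rng = np.random.default_rng(seed)
    n = n0
    states, ys = [], []
    for _ in range(T):
        n = rng.poisson(growth * n)        # state process step
        y = rng.normal(n, obs_cv * n)      # noisy observation of the state
        states.append(n)
        ys.append(y)
    return np.array(states), np.array(ys)
```

Only the observations `ys` would be available to the analyst; the fitting methods below all try to recover the hidden `states` (and parameters) from them.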
Hidden process models
An extension of state-space models:
– the state process can be more than 1st-order Markov
– the observation process can depend on previous states and observations
Example: British Grey Seal
British grey seal breeding colonies
Surveying seals
Hard to survey outside of the breeding season: 80% of time at sea, 90% of this time underwater
Aerial surveys of breeding colonies since 1960s used to estimate pup production
(Other data: intensive studies, radio tracking, genetic, photo-ID, counts at haul-outs)
~6% per year overall increase in pup production
Estimated pup production
[Figure: estimated pup production (pup count vs. year, 1960–2000) for the four regions: Orkney, Outer Hebrides, Inner Hebrides, North Sea]
Orkney example colonies
[Figure: pup counts, 1960–2000, for eight example Orkney colonies: Faraholm, Faray, Copinsay, Calf.of.Eday, Muckle.Greenhol, Little.Linga, Wartholm, Point.of.Spurne]
Questions
What is the current population size?
What is the future population trajectory?
Biological interest in population processes
– e.g., movement of recruiting females
(What sort of data should be collected to most efficiently answer the above questions?)
Grey seal model: states
7 age classes
– pups (n_0)
– age 1 – age 5 females (n_1–n_5)
– age 6+ females (n_6+) = breeders
48 colonies – aggregated into 4 regions
[Map: British grey seal breeding colonies]
State process
Time step: 1 year, starting just after the breeding season
4 sub-processes:
– survival
– age incrementation and sexing
– movement of recruiting females
– breeding
n_{a,r,t-1} →(survival)→ u_{s,a,r,t} →(age)→ u_{i,a,r,t} →(movement)→ u_{m,a,r,t} →(breeding)→ n_{a,r,t}
Survival
Density-independent adult survival:
u_{s,a,r,t} ~ Bin(n_{a,r,t-1}, φ_a), a = 1, ..., 6+
Density-dependent pup survival:
u_{s,0,r,t} ~ Bin(n_{0,r,t-1}, φ_{p,r,t})
where
φ_{p,r,t} = φ_{pmax} / (1 + β_r n_{0,r,t-1})
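The survival sub-process for a single region can be sketched directly from the binomial densities above; the parameter values in the defaults are illustrative placeholders, not fitted estimates:

```python
import numpy as np

def survival_step(n_prev, phi_a=0.95, phi_pmax=0.73, beta=0.0009, rng=None):
    """One survival sub-process step for one region.
    n_prev[0] = pups; n_prev[1:] = female age classes 1..5 and 6+.
    Adults survive with density-independent phi_a; pups with
    density-dependent phi_p = phi_pmax / (1 + beta * n_pups)."""
    rng = rng or np.random.default_rng(0)
    phi_p = phi_pmax / (1.0 + beta * n_prev[0])  # density-dependent pup survival
    u = np.empty_like(n_prev)
    u[0] = rng.binomial(n_prev[0], phi_p)        # surviving pups
    u[1:] = rng.binomial(n_prev[1:], phi_a)      # surviving adults, elementwise
    return u
```

Because each survivor count is a binomial draw, the simulated survivors can never exceed the starting numbers, and pup survival falls as pup density rises.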
Age incrementation and sexing
u_{i,1,r,t} ~ Bin(u_{s,0,r,t}, 0.5)
u_{i,a+1,r,t} = u_{s,a,r,t}, a = 1–4
u_{i,6+,r,t} = u_{s,5,r,t} + u_{s,6+,r,t}
Movement of recruiting females
Only age 5 females move; movement is fitness dependent
– females move if expected offspring survival is higher elsewhere
Expected proportion moving proportional to:
– difference in juvenile survival rates
– inverse of distance between colonies
– inverse of site faithfulness
(u_{m,5,1,t}, ..., u_{m,5,4,t}) ~ Mult(u_{i,5,r,t}; p_{1,r,t}, ..., p_{4,r,t})
where the movement probability p_{i,r,t} combines a density-dependence term γ_dd max(φ_{p,i,t} − φ_{p,r,t}, 0), an inverse-distance term γ_dist / dist_{r,i}, and an inverse site-faithfulness term γ_sf
Breeding
Density independent:
u_{b,0,r,t} ~ Bin(u_{m,6+,r,t}, α)
Matrix representation
E(n_t | n_{t-1}, Θ) ≈ B M_t A S_t n_{t-1} = L_t n_{t-1}
Life cycle graph
[Figure: life cycle graph, pup → 1 → 2 → 3 → 4 → 5 → 6+, with survival rates on the arcs and density-dependent pup survival; repeated for the four regions (North Sea, Inner Hebrides, Outer Hebrides, Orkneys), with density-dependent movement arcs between regions]
Movement depends on: distance, density dependence, site faithfulness
Observation process
Pup production estimates normally distributed, with constant CV ψ (standard deviation proportional to expectation):
y_{0,r,t} ~ N(n_{0,r,t}, ψ² n_{0,r,t}²)
Grey seal model: summary
Parameters:
– survival: φ_a, φ_pmax, β_1, ..., β_4
– breeding: α
– movement: γ_dd, γ_dist, γ_sf
– observation CV: ψ
– total 7 + 4 = 11
States:
– 7 age classes per region per year
– total 7 × 4 = 28 per year
Fitting state-space models
Analytic approaches
– Kalman filter (Gaussian linear model) [Takis Besbeas]
– Extended Kalman filter (Gaussian nonlinear model – approximate)
– Numerical maximization of the likelihood
Monte Carlo approximations
– Likelihood-based (Geyer 1996)
– Bayesian [Carmen Fernández]
– Rejection sampling [Damien Clancy]
– Markov chain Monte Carlo (MCMC) [Philip O’Neill]
– Monte Carlo particle filtering [Me!]
Inference tasks for time series data
Observe data y_{1:t} = (y_1, ..., y_t)
We wish to infer the unobserved states n_{1:t} = (n_1, ..., n_t) and parameters Θ
Fundamental inference tasks:
– Smoothing p(n_{1:t}, Θ | y_{1:t})
– Prediction p(n_{t+x} | y_{1:t})
– Filtering p(n_t, Θ_t | y_{1:t})
Filtering inference can be fast!
Suppose we have p(n_t | y_{1:t})
A new observation y_{t+1} arrives. We want to update to p(n_{t+1} | y_{1:t+1}).
Can use the filtering recursion:
p(n_{t+1} | y_{1:t+1}) = f_{t+1}(y_{t+1} | n_{t+1}) p(n_{t+1} | y_{1:t}) / p(y_{t+1} | y_{1:t})
where
p(n_{t+1} | y_{1:t}) = ∫ g_{t+1}(n_{t+1} | n_t) p(n_t | y_{1:t}) dn_t
Monte Carlo particle filters: online inference for evolving datasets
Particle filtering is used when fast online methods are required to produce updated (filtered) estimates as new data arrive:
– Tracking applications in radar, sonar, etc.
– Finance: stock prices, exchange rates arrive sequentially; online update of portfolios
– Medical monitoring: online monitoring of ECG data for sick patients
– Digital communications
– Speech recognition and processing
2. Monte Carlo Particle Filtering
Variants/synonyms:
– Sequential Importance Sampling (SIS)
– Sampling Importance Sampling Resampling (SISR)
– Bootstrap filter
– Interacting particle filter
– Auxiliary particle filter
Importance sampling
Want to make inferences about some function p(·), but cannot evaluate it directly
Solution:
– Sample from another function q(·) (the importance function), which has the same support as p(·)
– Correct using importance weights w = p(·)/q(·)
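Importance sampling in miniature: estimate E[X] under a target density p by drawing from a different proposal q and correcting with the weights p/q. The particular target N(1, 1) and proposal N(0, 2²) are illustrative choices, not from the talk:

```python
import math
import numpy as np

def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2), vectorized over x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def importance_mean(K=100_000, seed=0):
    """Estimate E[X] for target p = N(1, 1) using draws from the
    importance function q = N(0, 2^2), corrected by weights w = p/q."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 2.0, size=K)                     # draws from q()
    w = normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # importance weights
    w /= w.sum()                                         # normalize
    return float(np.sum(w * x))                          # weighted estimate of E[X] = 1
```

Because q is wider than p, the weights stay bounded and the self-normalized estimate converges quickly to the true mean of 1.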
Importance sampling algorithm
Want to make inferences about p(n_{t+1} | y_{1:t+1})
Prediction step: make K random draws from the importance function
n_{t+1}^{(i)} ~ q(·), i = 1, ..., K
Correction step: calculate
w_{t+1}^{(i)} = p(n_{t+1}^{(i)} | y_{1:t+1}) / q(n_{t+1}^{(i)})
Normalize weights so that Σ_{i=1}^K w_{t+1}^{(i)} = 1
Approximate the target function:
p(n_{t+1} | y_{1:t+1}) ≈ Σ_{i=1}^K w_{t+1}^{(i)} δ(n_{t+1} − n_{t+1}^{(i)})
Sequential importance sampling
SIS is just repeated application of importance sampling at each time step
Basic sequential importance sampling:
– Proposal distribution q(·) = g(n_{t+1} | n_t)
– Leads to weights w_{t+1}^{(i)} = w_t^{(i)} f(y_{t+1} | n_{t+1}^{(i)})
To do basic SIS, need to be able to:
– Simulate forward from the state process
– Evaluate the observation process density (the likelihood)
Basic SIS algorithm
Generate K “particles” from the prior on {n_0, Θ}, each with weight 1/K:
{n_0^{(i)}, Θ^{(i)}, w_0^{(i)} = 1/K}, i = 1, ..., K
For each time period t = 1, ..., T
– For each particle i = 1, ..., K
Prediction step: n_t^{(i)} ~ g_t(n_t | n_{t-1}^{(i)})
Correction step: w_t^{(i)} = w_{t-1}^{(i)} f_t(y_t | n_t^{(i)})
Example of basic SIS
State-space model of exponential population growth
– State model: n_{t+1} ~ Pois(Θ n_t)
– Observation model: y_{t+1} ~ N(n_{t+1}, (0.15 n_{t+1})²)
– Priors: n_0 ~ Pois(14), Θ ~ N(1.08, 0.1²)
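The basic SIS algorithm can be run on this toy model in a few lines. A sketch assuming the model as reconstructed here: state n_{t+1} ~ Pois(Θ n_t), observation y ~ N(n, (0.15 n)²), priors n_0 ~ Pois(14) and Θ ~ N(1.08, 0.1²):

```python
import numpy as np

def sis(ys, K=10_000, seed=0):
    """Basic sequential importance sampling (no resampling) for
    n_{t+1} ~ Pois(theta * n_t), y ~ N(n, (0.15 n)^2)."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(14, size=K).astype(float)      # prior draws for n0
    theta = rng.normal(1.08, 0.1, size=K)          # prior draws for theta
    w = np.full(K, 1.0 / K)                        # equal initial weights
    for y in ys:
        lam = np.maximum(theta * n, 0.0)           # guard against negative rates
        n = rng.poisson(lam).astype(float)         # predict: simulate state process
        sd = np.maximum(0.15 * n, 1e-9)            # guard against n = 0
        lik = np.exp(-0.5 * ((y - n) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        w *= lik                                   # correct: multiply by f(y | n)
        w /= w.sum()                               # normalize
    return n, theta, w

# filtered posterior after observing y = 12, 14 (as in the worked example)
n, theta, w = sis([12, 14])
post_mean = float(np.sum(w * n))
```

With many more particles than the 10 shown on the slides, the weighted particles approximate the filtered posterior p(n_2, Θ | y_{1:2}).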
Example of basic SIS, t=1 (Obs: 12)
Sample from prior:
n_0: 11, 12, 14, 13, 16, 16, 20, 14, 9, 16
Θ_0: 1.055, 1.107, 1.195, 0.974, 0.936, 1.029, 1.081, 1.201, 1.000, 0.958
w_0: 0.1 each
Predict (prior at t=1):
n_1: 17, 18, 11, 15, 20, 17, 17, 7, 6, 22
Correct (observation gives f(·)):
f: 0.028, 0.012, 0.201, 0.073, 0.038, 0.029, 0.029, 0.000, 0.000, 0.012
Posterior at t=1:
w_1: 0.063, 0.034, 0.558, 0.202, 0.010, 0.063, 0.063, 0.000, 0.000, 0.003
Example of basic SIS, t=2 (Obs: 14)
Posterior at t=1:
n_1: 17, 18, 11, 15, 20, 17, 17, 7, 6, 22
Θ_1: 1.055, 1.107, 1.195, 0.974, 0.936, 1.029, 1.081, 1.201, 1.000, 0.958
w_1: 0.063, 0.034, 0.558, 0.202, 0.010, 0.063, 0.063, 0.000, 0.000, 0.003
Predict (prior at t=2):
n_2: 15, 14, 12, 10, 11, 15, 21, 9, 11, 20
Correct (observation gives f(·)):
f: 0.160, 0.190, 0.112, 0.008, 0.046, 0.160, 0.011, 0.000, 0.046, 0.007
Posterior at t=2:
w_2: 0.105, 0.068, 0.691, 0.015, 0.005, 0.105, 0.007, 0.000, 0.000, 0.000
Problem: particle depletion
Variance of weights increases with time, until few particles have almost all the weight
Results in large Monte Carlo error in the approximation
p(n_{t+1} | y_{1:t+1}) ≈ Σ_{i=1}^K w_{t+1}^{(i)} δ(n_{t+1} − n_{t+1}^{(i)})
Can quantify:
effective sample size = K / (1 + CV(w_t^{(i)})²)
From the previous example:
Time 0 1 2
ESS 10.0 2.5 1.8
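The effective sample size can be computed directly from the weights; a small sketch using the equivalent identity ESS = 1 / Σ(normalized w)², which equals K / (1 + CV(w)²):

```python
import numpy as np

def ess(w):
    """Effective sample size K / (1 + CV(w)^2) of a set of particle
    weights, computed as 1 / sum(normalized_w^2). Equal weights give
    ESS = K; one dominant particle gives ESS close to 1."""
    w = np.asarray(w, dtype=float)
    wn = w / w.sum()          # normalize so the weights sum to 1
    return 1.0 / np.sum(wn ** 2)
```

Applied to the rounded t=1 weights shown in the worked example this gives roughly 2.7; the slide's 2.5 was presumably computed from the unrounded weights.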
Problem: particle depletion
Worse when:
– Observation error is small
– Lots of data at any one time point
– State process has little stochasticity
– Priors are diffuse or not congruent with observations
– State process model incorrect (e.g., time varying)
– Outliers in the data
Particle depletion: solutions
Pruning: throw out “bad” particles (rejection)
Enrichment: boost “good” particles (resampling)
– Directed enrichment (auxiliary particle filter)
– Mutation (kernel smoothing)
Other stuff
– Better proposals
– Better resampling schemes
– …
Rejection control
Idea: throw out particles with low weights
Basic algorithm, at time t:
– Have a pre-determined threshold c_t, where 0 < c_t ≤ 1
– For i = 1, ..., K, accept particle i with probability r^{(i)} = min(1, w_t^{(i)} / c_t)
– If particle is accepted, update its weight to w_t^{*(i)} = max(w_t^{(i)}, c_t)
– Now we have fewer than K samples. Can make up samples by sampling from the priors, projecting forward to the current time point and repeating the rejection control
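The accept/reweight step of rejection control is a one-liner per particle; a minimal sketch (the top-up from the priors is omitted):

```python
import numpy as np

def rejection_control(particles, w, c, rng=None):
    """Rejection control at one checkpoint: accept particle i with
    probability min(1, w_i / c); accepted particles get the adjusted
    weight max(w_i, c). Returns the surviving particles and weights."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w, dtype=float)
    accept = rng.random(len(w)) < np.minimum(1.0, w / c)   # stochastic pruning
    survivors = [p for p, a in zip(particles, accept) if a]
    return survivors, np.maximum(w, c)[accept]
```

Particles with weight at or above the threshold always survive unchanged; low-weight particles mostly die, and the few that survive have their weight boosted to c so the approximation stays unbiased.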
Rejection control - discussion
Particularly useful at t=1 with diffuse priors
Can have a sequence of control points (not necessarily every year)
Check points don’t need to be fixed – can trigger when the variance of the weights gets too high
Thresholds c_t don’t need to be set in advance but can be set adaptively (e.g., mean of the weights)
Instead of restarting at time t=0, can restart by sampling from particles at the previous check point (= partial rejection control)
Resampling: pruning and enrichment
Idea: allow “good” particles to amplify themselves while killing off “bad” particles
Algorithm. Before and/or after each time step (not necessarily every time step):
– For j = 1, ..., K
Sample {n_t^{(j)}, Θ_t^{(j)}} independently from the set of particles {n_t^{(i)}, Θ_t^{(i)}, w_t^{(i)}}, i = 1, ..., K, according to the probabilities a_t^{(1)}, ..., a_t^{(K)}
Assign new weights w_t^{*(j)} = w_t^{(i)} / a_t^{(i)}
Reduces particle depletion of states, as “children” particles with the same “parent” now evolve independently
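With the bootstrap-filter choice a_i = w_i, the weight update w_i / a_i is constant, so resampled weights reset to 1/K. A minimal multinomial-resampling sketch:

```python
import numpy as np

def resample(particles, w, rng=None):
    """Multinomial resampling with probabilities a_i proportional to
    w_i (the bootstrap-filter choice). Because a_i = w_i, the new
    weights w_i / a_i are constant and reset to 1/K."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w, dtype=float)
    idx = rng.choice(len(w), size=len(w), p=w / w.sum())  # draw K parents
    return [particles[i] for i in idx], np.full(len(w), 1.0 / len(w))
```

High-weight particles are duplicated (amplified) and low-weight ones tend to disappear; the returned set again has K particles with equal weights.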
Resample probabilities
Should be related to the weights:
– a_t^{(i)} ∝ w_t^{(i)} (as in the bootstrap filter)
– a_t^{(i)} ∝ (w_t^{(i)})^α, where 0 ≤ α ≤ 1
α could vary according to the variance of the weights
α = ½ has been suggested
– a_t^{(i)} related to “future trend” – as in the auxiliary particle filter
Directed resampling: auxiliary particle filter
Idea: pre-select particles likely to have high weights at the next time step
Example algorithm:
– For j = 1, ..., K
Sample {n_t^{(j)}, Θ_t^{(j)}} independently from the set of particles {n_t^{(i)}, Θ_t^{(i)}, w_t^{(i)}}, i = 1, ..., K, according to the probabilities a_t^{(i)} ∝ w_t^{(i)} f(y_{t+1} | E(n_{t+1} | n_t^{(i)}))
Predict: n_{t+1}^{(j)} ~ g(n_{t+1} | n_t^{(j)})
Correct: w_{t+1}^{(j)} = f(y_{t+1} | n_{t+1}^{(j)}) / a_t^{(j)}
If “future” observations are available, can extend to look more than 1 time step ahead – e.g., protein folding application
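One auxiliary-particle-filter step for the toy exponential-growth model, as a sketch: the pre-selection probabilities evaluate the observation density at the expected next state E(n_{t+1} | n_t) = Θ n_t, then the selected parents are propagated and corrected:

```python
import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def apf_step(n, theta, w, y, rng=None):
    """One auxiliary-particle-filter step for the toy model
    n_{t+1} ~ Pois(theta * n_t), y ~ N(n, (0.15 n)^2)."""
    rng = rng or np.random.default_rng(0)
    mu = np.maximum(theta * n, 1e-9)              # E(n_{t+1} | n_t) = theta * n_t
    a = w * normal_pdf(y, mu, 0.15 * mu)          # pre-selection probabilities
    a /= a.sum()
    idx = rng.choice(len(n), size=len(n), p=a)    # pre-select promising parents
    n1 = rng.poisson(mu[idx]).astype(float)       # predict from selected parents
    sd = np.maximum(0.15 * n1, 1e-9)
    w1 = normal_pdf(y, n1, sd) / a[idx]           # correct: f(y | n) / a
    return n1, theta[idx], w1 / w1.sum()
```

Dividing by a^{(j)} in the correction step undoes the bias introduced by pre-selecting particles that the next observation favours.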
Kernel smoothing: enrichment of parameters through mutation
Idea: introduce small “mutations” into parameter values when resampling
Algorithm:
– Given particles {n_t^{(i)}, Θ_t^{(i)}, w_t^{(i)}}, i = 1, ..., K
– Let V_t be the variance matrix of the Θ_t^{(i)}s
– For i = 1, ..., K, sample Θ_t^{*(i)} ~ N(Θ_t^{(i)}, h²V_t), where h controls the size of the perturbations
– Variance of the parameters is now (1 + h²)V_t, so need shrinkage to preserve the first 2 moments
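A scalar sketch of kernel smoothing with the shrinkage correction (shrinking each particle toward the mean by a = √(1 − h²) before adding N(0, h²V) noise restores the original variance); this uses the unweighted mean and variance for simplicity:

```python
import numpy as np

def kernel_smooth(theta, h=0.1, rng=None):
    """Kernel-smooth scalar parameter particles with shrinkage:
    theta* ~ N(m_i, h^2 * V), with m_i = a*theta_i + (1-a)*mean(theta)
    and a = sqrt(1 - h^2). Without shrinkage the perturbed particles
    would have inflated variance (1 + h^2) * V; with it, the first
    two moments are preserved."""
    rng = rng or np.random.default_rng(0)
    theta = np.asarray(theta, dtype=float)
    v = theta.var()
    a = np.sqrt(1.0 - h * h)                  # shrinkage factor
    m = a * theta + (1.0 - a) * theta.mean()  # shrink toward the mean
    return rng.normal(m, h * np.sqrt(v))      # add the kernel perturbation
```

The perturbations break ties between duplicated parameter particles, so resampling no longer collapses the parameter cloud onto a few values.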
Kernel smoothing - discussion
Previous algorithm does not preserve the relationship between
parameters and states
– Leads to poor smoothing inference
– Possibly unreliable filtered inference?
– Pragmatically – use as small a value of h as possible
Extensions:
– Kernel smooth states as well as parameters
– Local kernel smoothing
Other “tricks”
Better proposals:
– Rao-Blackwellization – the Gibbs sampling equivalent
– Importance sampling (e.g., from priors)
Better resampling:
– Residual resampling
– Stratified resampling
Alternative “mutation” algorithms:
– MCMC within SIS
Gradual focusing on the posterior:
– Tempering/annealing
– ...
3. Results and discussion
“I’ll make you fit into my model!!!”
Example: Grey seal model
Used pup production estimates for 1984-2002
Informative priors on parameters; 1984 data used with sampled parameters to provide priors for states
Algorithm:
– Auxiliary particle filter with kernel smoothing (h=0.9) and
rejection control (c=99th %ile) at the first time step
– 400,000 particles
– Took ~6hrs to run on a PC (in SPlus)
Posterior parameter estimates
[Figure: marginal posterior histograms with point estimates: phi.adult 0.966, phi.juv.max 0.734, alpha 0.973, psi 0.07, gamma.dd 3.32, gamma.dist 0.792, gamma.sf 0.355, beta.ns 0.000906, beta.ih 0.00127, beta.oh 0.000304, beta.ork 0.000183]
Coplots
[Figure: pairwise coplots of posterior samples for the parameters Sa, Sj.Max, a, psi, move.1.dd, move.2.dist, move.3.site, b.ns, b.ih, b.oh, b.ork]
Smoothed pup production
[Figure: smoothed pup production estimates (pups vs. year, 1985–2000) for the four regions: North Sea, Inner Hebrides, Outer Hebrides, Orkneys]
Predicted adults
[Figure: predicted adult numbers (adults vs. year, 2004–2012) for the four regions: North Sea, Inner Hebrides, Outer Hebrides, Orkneys]
Model selection and multi-model inference
Can put priors on alternative models, and then sample from the models to initialize the particles
Proportion of particles with each model gives posterior model probabilities
Can also penalize for more parameters
Model                 LnL      AIC      Akaike weight
Outer Hebrides = Western Isles
  Production, 1δ      -625.0   1258.0   0.02
  Production, 3δs     -624.1   1260.2   0.01
  Staff, 1δ           -624.5   1257.1   0.03
  Staff, 3δs          -623.8   1259.5   0.01
Outer Hebrides = Western Isles + Northwest
  Production, 1δ      -621.5   1250.9   0.67
  Production, 3δs     -621.9   1255.9   0.06
  Staff, 1δ           -622.9   1253.8   0.16
  Staff, 3δs          -622.9   1257.9   0.02
Discussion: application in ecology and epidemiology
An alternative to MCMC?
– Debatable when there is plenty of time for fitting and main emphasis is on smoothing inference.
– Best suited to situations where fast filtered estimates are required – e.g.: foot and mouth outbreak? N. American West coast salmon harvest openings?
Disadvantages:
– Methods less well developed than for MCMC?
– No general software (no WinBUPF)
Current / future research
Efficient general algorithms (and software)
Comparison with MCMC and Kalman filter
Incorporating other types of data (e.g., mark-recapture)
Parallelization
Multi-model inference
Diagnostics
Other applications!
Just another particle