Rachel Fewster Department of Statistics, University of Auckland
Variance estimation for systematic designs
in spatial surveys
Line transect sampling
• Method of estimating density of objects in a survey region.
Density, $D = \dfrac{\text{\# detections per unit area}}{p}$
Estimate the variance of the ratio by the Delta method: “squared CVs add”
ENCOUNTER RATE: the numerator of D. Variance of p: easy
ENCOUNTER RATE VARIANCE: largest and most difficult component, usually >70% of total variance
Encounter Rate and its variance
• Encounter rate estimates mean detections per unit line length
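The “squared CVs add” rule can be illustrated with a minimal sketch. It assumes the encounter rate and p are estimated independently, and the two input CVs are hypothetical values, not figures from the talk.

```python
import math

def combined_cv(cv_encounter_rate, cv_detection_prob):
    """Delta-method CV of a ratio of two independent estimates.

    Under independence, the squared CVs of numerator and denominator add.
    """
    return math.sqrt(cv_encounter_rate ** 2 + cv_detection_prob ** 2)

# Hypothetical component CVs (assumptions for illustration only).
cv_encounter_rate = 0.20
cv_detection_prob = 0.10

cv_density = combined_cv(cv_encounter_rate, cv_detection_prob)

# Share of the total squared CV that comes from the encounter rate.
encounter_rate_share = cv_encounter_rate ** 2 / cv_density ** 2

print(round(cv_density, 4), round(encounter_rate_share, 2))
```

With these inputs the encounter rate contributes 80% of the squared CV, which is the kind of dominance the ">70%" remark describes.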
Inferential framework: which Var(n/L)?
• Variance is defined over conceptual survey repeats
First survey:
1. Animals from spatial p.d.f.
2. Select lines
3. Detect animals
4. Find n/L
Gained value of n/L from first survey
Second survey:
1. Same animals, new positions
2. Select new lines
3. Detect new animals
4. Find new n/L
Gained value of n/L from second survey
Overall, gives var(n/L) across the repeated surveys
This is our ENCOUNTER RATE VARIANCE.
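The idea of conceptual survey repeats can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the unit-square region, the trended p.d.f., random (not systematic) vertical lines, and perfect detection inside a narrow strip.

```python
import random
import statistics

def one_survey(rng, n_animals=200, n_lines=10, half_width=0.01):
    """Run one conceptual survey and return its encounter rate n/L."""
    # Animals take new positions: x has p.d.f. 2x on [0, 1) (hypothetical trend).
    animal_x = [rng.random() ** 0.5 for _ in range(n_animals)]
    # Select new lines: vertical lines of length 1 at random x-positions.
    line_x = [rng.random() for _ in range(n_lines)]
    # Detect animals: anything within half_width of some line is detected.
    n = sum(any(abs(x - c) <= half_width for c in line_x) for x in animal_x)
    total_line_length = n_lines * 1.0
    return n / total_line_length

rng = random.Random(1)
rates = [one_survey(rng) for _ in range(1000)]

# The encounter rate variance is the variance of n/L across the repeats.
encounter_rate_variance = statistics.variance(rates)
print(round(statistics.mean(rates), 2), round(encounter_rate_variance, 3))
```

In a real survey only one of these repeats is ever observed, which is why the variance has to be estimated rather than computed this way.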
How to estimate Var(n/L)?
• To estimate a variance, use repeated observations with the same variance
Random-line estimator:
• makes no assumptions about the unknown distribution of objects;
• random variables $n_i - l_i\,E(n)/E(l)$ are IID with respect to the design.
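A minimal sketch of a random-line estimator follows. The formula is an assumption, since the slides do not show it: it is the commonly used form var(n/L) = K / (L²(K−1)) · Σ (n_i − l_i·n/L)², which treats the per-line quantities as IID with E(n)/E(l) replaced by n/L.

```python
def random_line_variance(counts, lengths):
    """Estimate var(n/L) treating the K lines as independent random lines.

    counts[i] is n_i (detections on line i); lengths[i] is l_i.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two lines to estimate a variance")
    total_length = sum(lengths)
    rate = sum(counts) / total_length  # overall encounter rate n/L
    # Squared deviations of n_i from what line i would see at the overall rate.
    sum_sq = sum((n - l * rate) ** 2 for n, l in zip(counts, lengths))
    return k / (total_length ** 2 * (k - 1)) * sum_sq

print(random_line_variance([1, 3], [1, 1]))
print(random_line_variance([2, 4, 6], [1, 2, 3]))
```

The second call returns zero because every line has the same encounter rate, so there is no between-line variation to measure.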
Systematic Survey Designs
Surveys usually use SYSTEMATIC transect lines, instead of random lines.
Grid has random start-point
Systematic lines give LOWER VARIANCE than random lines in trended populations
But the variance is HARD TO ESTIMATE
A systematic sample has NO REPETITION: it is a sample of size 1!
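The claim that systematic lines give lower variance in trended populations can be checked with a small sketch. The linear density trend, the one-dimensional region [0, 1) and the estimator (mean density at the line positions) are all assumptions chosen to keep the example short.

```python
import random
import statistics

def density(x):
    """Hypothetical object density at position x in [0, 1): a linear trend."""
    return 1.0 + 4.0 * x

def random_design(rng, k):
    """K line positions placed independently at random."""
    return [rng.random() for _ in range(k)]

def systematic_design(rng, k):
    """K equally spaced line positions with a single random start-point."""
    start = rng.random() / k
    return [start + i / k for i in range(k)]

def estimate(positions):
    """Mean density seen by the lines of one survey."""
    return statistics.fmean([density(x) for x in positions])

rng = random.Random(42)
k, reps = 10, 5000
var_random = statistics.variance(
    [estimate(random_design(rng, k)) for _ in range(reps)])
var_systematic = statistics.variance(
    [estimate(systematic_design(rng, k)) for _ in range(reps)])

print(round(var_random, 4), round(var_systematic, 4))
```

The systematic variance here is computed across many simulated grids. A real survey has only one grid, which is the "sample of size 1" problem.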
Variance for systematic designs
• There is no general design-unbiased variance estimator for data from a single systematic sample
• Approaches to systematic variance estimation are:
1. Ignore the problem and use estimators for random lines (approach used to date)
2. Use some form of post-stratification (approach in Fewster et al, Biometrics, 2009)
3. Model the autocorrelation in the systematic sample
Stratified variance estimators: results
But the stratified estimators are still biased sometimes, e.g. high sampling fraction, or population clustering
Can we do better…?
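Approach 2, post-stratification, can be sketched as follows. This is an assumed simple variant, not necessarily the estimator of Fewster et al: adjacent lines are grouped into non-overlapping strata, a random-line variance is computed within each stratum, and the stratum variances are combined weighted by squared stratum line length.

```python
def line_variance(counts, lengths):
    """Random-line estimate of var(n/L) for one set of lines (assumed form)."""
    k = len(counts)
    total_length = sum(lengths)
    rate = sum(counts) / total_length
    sum_sq = sum((n - l * rate) ** 2 for n, l in zip(counts, lengths))
    return k / (total_length ** 2 * (k - 1)) * sum_sq

def poststratified_variance(counts, lengths, stratum_size=2):
    """Group adjacent lines into strata and combine within-stratum variances."""
    total_length = sum(lengths)
    weighted = 0.0
    for start in range(0, len(counts), stratum_size):
        n_h = counts[start:start + stratum_size]
        l_h = lengths[start:start + stratum_size]
        if len(n_h) < 2:
            raise ValueError("each stratum needs at least two lines")
        # Stratum contribution: (stratum length)^2 * var of stratum rate.
        weighted += sum(l_h) ** 2 * line_variance(n_h, l_h)
    return weighted / total_length ** 2

# Hypothetical trended counts on four equal-length lines, in spatial order.
counts, lengths = [1, 2, 3, 4], [1, 1, 1, 1]
naive = line_variance(counts, lengths)
stratified = poststratified_variance(counts, lengths)
print(round(naive, 4), round(stratified, 4))
```

Because neighbouring lines are compared only with each other, the trend across the region no longer inflates the estimate as much as it does in the random-line calculation.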
Historical Note
• Many estimators for systematic designs originated in social statistics
– discrete surveys
Correlation will clearly exist in responses of neighbours, but modelling the correlation is hard!
But space is continuous!
As a strip changes position very slightly...
... it still covers many of the same objects.
Idea:
1. Divide the region into hundreds of tiny ‘striplets’
2. Allow the number of objects available in each striplet to be random variables X1, X2, …, XJ
3. The number of objects available in any full strip is the sum of the objects in the constituent striplets
[Figure: # objects available per striplet (0 to 4) against striplet position. Expected number of objects per striplet; random number of objects per striplet, X1, X2, …, XJ ~ Multinomial]
[Figure: a full strip sliding across the striplets. Full strip at this position: 10 objects. Full strip at next position: 7 objects. Full strip at next position: 8 objects. ... etc]
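The striplet bookkeeping can be sketched with a sliding window. The striplet counts below are hypothetical, chosen only so that a strip four striplets wide reproduces the totals 10, 7 and 8 quoted on the slide.

```python
def full_strip_totals(striplet_counts, striplets_per_strip):
    """Objects available to a full strip at each possible starting striplet."""
    last_start = len(striplet_counts) - striplets_per_strip
    return [sum(striplet_counts[s:s + striplets_per_strip])
            for s in range(last_start + 1)]

# Hypothetical counts X1..X6; each full strip spans 4 adjacent striplets.
striplet_counts = [3, 2, 4, 1, 0, 3]
totals = full_strip_totals(striplet_counts, 4)
print(totals)  # [10, 7, 8]
```

Neighbouring strip positions share three of their four striplets, which is how the construction captures the correlation between strips that are only slightly displaced.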
Recap:
We want the variance in the encounter rate, n/L, over:
1. Moving grid;
2. Moving objects;
3. Detections
Account for:
1. Large-scale trends
2. Small-scale noise
1. Trends in object density across the region
[Figure: observed # detections / unit area against x-coordinate. Points correspond to observed transects]
Fit a GAM to give a fitted object density for any search strip in the region
For any striplet j, we now have an expected number of objects available
[Figure: expected # objects available per striplet (0 to 4) against striplet position]
Account for: 1. Large-scale trends
[Figure: # objects available per striplet (0 to 4) against striplet position]
Account for: 2. Small-scale noise
Random number of objects per striplet, X1, X2, …, XJ ~ Multinomial(N, (expected # objects in striplet j) / N)
Striplet idea means we correctly model the autocorrelation between systematic grids
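The small-scale noise step can be sketched by drawing striplet counts from a multinomial. The expected counts are hypothetical stand-ins for GAM output, the name `expected` is introduced here for illustration, and N is taken to be their (integer) sum so that the cell probabilities add to one.

```python
import random

def sample_striplet_counts(expected, rng):
    """Draw X1..XJ ~ Multinomial(N, expected_j / N), with N = sum(expected)."""
    total_objects = round(sum(expected))  # N, assumed to be a whole number
    # random.choices normalises the weights, so expected_j acts as expected_j / N.
    draws = rng.choices(range(len(expected)), weights=expected, k=total_objects)
    counts = [0] * len(expected)
    for j in draws:
        counts[j] += 1
    return counts

# Hypothetical fitted expected counts per striplet (a rising trend).
expected = [0.5, 1.0, 1.5, 2.0, 2.5, 2.5]
rng = random.Random(7)
counts = sample_striplet_counts(expected, rng)
print(counts, sum(counts))
```

Each draw keeps the total fixed at N while letting individual striplets fluctuate around their fitted expectation.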
Recap:
We want the variance in the encounter rate, n/L, over:
1. Moving grid;
2. Moving objects;
3. Detections
Variance in number of objects available is taken care of (1 & 2)
Variance in detections is Binomial given #objects available (1 & 2)
Law of Total Variance:
b is the grid placement: mean and variance of #detections, n, given grid placement, is all that’s needed.
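For reference, the standard Law of Total Variance, written with the grid placement b as the conditioning variable, is shown below. The final step assumes the total line length L is the same for every grid placement.

```latex
\operatorname{Var}(n)
  = \mathrm{E}_b\!\left[\operatorname{Var}(n \mid b)\right]
  + \operatorname{Var}_b\!\left(\mathrm{E}[n \mid b]\right),
\qquad
\operatorname{Var}\!\left(\frac{n}{L}\right) = \frac{\operatorname{Var}(n)}{L^{2}}
```

The first term averages the within-grid variance over placements; the second measures how much the expected count changes as the grid moves.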
Striplet variance estimator:
Simulation Results:
• 3 habitat types but no clustering
• Clustering included
• Red lines give correct answers
• Ignoring the systematic design: appalling performance!
• Post-stratification: improvement but still clear bias
• Striplet method: huge improvement!
Spotted Hyena in the Serengeti
Short grass plains: prey herds congregate in wet season
Long grass plains: unattractive in wet season
Wet season: non-territorial ‘commuters’ (n=186)
Dry season: territorial residents (n=53)
Wet season: highly clustered.
cv(n/L) is:
- 17% ignoring systematic design
- 14% using poststratification
- 7% using striplets!
Overall cv(D) is:
- 20% ignoring systematic design
- 17% using poststratification
- 11% using striplets
The estimator matters!
Dry season: not clustered; small n
cv(n/L) is:
- 15% ignoring systematic design
- 12% using poststratification
- 13% using striplets
Overall cv(D) is:
- 23% ignoring systematic design
- 20% using poststratification
- 21% using striplets
Not much difference
Conclusions
1. For a systematic design, variance estimators based on random lines are not adequate for trended or clustered populations: variance can be highly overestimated
2. Post-stratification improves estimation for trended populations, but is far from perfect
3. New ‘striplet’ method is a huge improvement in all line/strip situations trialled to date
In revision, Biometrics
Striplet variance estimator:
B is the number of possible grids, in discrete approximation
j is fitted #objects in striplet j
gj(b) is fitted P(detection) in striplet j
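The ingredients listed above can be combined into a sketch via the Law of Total Variance. The formula itself is not reproduced in the transcript, so this is a reconstruction under stated assumptions rather than the estimator as published: objects fall in striplets multinomially, detections are independent, so n given grid b is Binomial(N, p_b). The names `fitted_objects` and `detect_prob` are hypothetical.

```python
def striplet_variance(fitted_objects, detect_prob):
    """Sketch of Var(n) over grid placements, objects and detections.

    fitted_objects[j]: fitted # objects in striplet j.
    detect_prob[b][j]: fitted P(detection) in striplet j under grid b
                       (zero for striplets the grid does not cover).
    """
    total_objects = sum(fitted_objects)  # N
    n_grids = len(detect_prob)           # B
    means, variances = [], []
    for g_b in detect_prob:
        # P(a given object is detected | grid b).
        p_b = sum(m * g for m, g in zip(fitted_objects, g_b)) / total_objects
        means.append(total_objects * p_b)                    # E[n | b]
        variances.append(total_objects * p_b * (1.0 - p_b))  # Var(n | b)
    grand_mean = sum(means) / n_grids
    within = sum(variances) / n_grids
    between = sum((m - grand_mean) ** 2 for m in means) / n_grids
    return within + between

# Hypothetical example: 4 striplets, 2 possible grids, perfect detection
# in the covered striplets.
grids = [[1, 0, 1, 0], [0, 1, 0, 1]]
var_flat = striplet_variance([1, 1, 1, 1], grids)
var_trend = striplet_variance([1, 3, 1, 3], grids)
print(var_flat, var_trend)
```

Dividing the result by L² gives a variance for n/L when the total line length does not change with grid placement.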
Application: British Columbia multi-species marine survey (Williams & Thomas, JCRM 2008)
Select species with greatest and least trends in encounter rate for illustration
Greatest trend: Dall’s Porpoise
Highest encounter rates on short lines: worst case!
Least trend: floating plastic garbage
No trend in encounter rate with line length
Results
Dall’s Porpoise: previous reported CV=31%; stratified methods: reported CV=19%
• Estimated CV=31% using Poisson-based estimator with no adjustment for systematic lines
• Estimated CV=19% using design-based estimator with post-stratification and overlapping strata
Floating garbage: previous reported CV=15%; stratified methods: reported CV=14%
For untrended population, there is little difference in the different estimators
1. Trends in object density across the region
[Figure: # detections / unit area against x-coordinate]
Fit a GAM to give a fitted object density for any search strip in the region
For any new grid placement, we now have an expected number of objects available for that grid