standard errors in systematic sampling chris … · random 8900 8900 143 35000 we could pretend...
TRANSCRIPT
STANDARD ERRORS IN SYSTEMATIC SAMPLING
Chris Glasbey & Stijn Bierman
Biomathematics & Statistics Scotland
How many acorns are under an oak tree?
2
Standard estimator of total is
T = Mn∑
i=1yi where n = 144, M = 9
(NB We are only interested in the total, not in interpolating y’s)
But what is its variance?
Two frameworks for inference:
1) Design based: Assume unobserved array of y’s is fixed, and only stochas-ticity is due to sampling
2) Model based: Assume y’s are a single realisation from a super-populationof possible arrays
But, for systematic sampling, there is no design-based, unbiased estimatorof var(T − T )
3
1. Design-based inference
If instead we had sampled at random:
var(T − T ) =
1 −1
M
M2nσ2, where σ2 =1
n − 1
n∑
i=1(yi − y)2,
is an unbiased estimator with (n − 1) d.f., where σ2 is variance of y array
4
From randomisation, var(T − T ) = yTQy, with y arranged as vector
For random sampling
Q =
8, −ε, −ε, −ε, −ε, . . .−ε, 8, −ε, −ε, −ε, . . .−ε, −ε, 8, −ε, −ε, . . .−ε, −ε, −ε, 8, −ε, . . .
... ... ... . . .
ε = 0.0062
Whereas for systematic sampling
Q =
8, −1, −1, 8, −1, . . .−1, 8, −1, −1, 8, . . .−1, −1, 8, −1, −1, . . .
8, −1, −1, 8, −1, . . .... ... ... . . .
In the absence of trend, and if “covariances” decay with distance, systematicsampling is most efficient (Bellman, 1977)
5
For this problem we know the truth, a complete census:
T = 85306 acorns
(Approximated from Aubry and Debouzie, Ecology, 2000.)
So we can conduct a simulation study
6
sampling design rmsesystematic 2000random 8900
7
sampling design rmse√
var d.f. 95% c.widthsystematic 2000random 8900 8900 143 35000
8
Two other types of design with unbiased estimators of var(T − T ):
replicated systematic stratified
d.f. = 3 d.f. ≤ n2 (Satterthwaite, 1946)
9
sampling design rmse√
var d.f. 95% c.widthsystematic 20004-rep systematic 2600 2600 3 150002-sample strata 3600 3600 12 16000random 8900 8900 143 35000
10
sampling design rmse√
var d.f. 95% c.widthsystematic 20004-rep systematic 2600 2600 3 150002-sample strata 3600 3600 12 16000random 8900 8900 143 35000
We could pretend systematic design was one of other three, to obtain varif we believe the systematic design is the most efficient (Bellman, 1977)
But if there are trends it may not be!
11
For example, if the data were elevations, such as:
sampling design rmse√
var d.f. 95% c.widthsystematic 3704-rep systematic 380 380 3 23002-sample strata 110 110 10 500random 410 410 143 1640
12
2. Model-based inference
Model for trend:
fij = β1 exp
−1
2
log rij − β4
β5
6
where rij =√
√
√
√(i − β2)2 + (j − β3)2
0 5 10 15 20 25
050
100
150
200
250
r
y
Fit by maximising Poisson quasi-likelihood, ∑(y log f − f ), and Tf = ∑ fij
13
sampling design rmse T rmse(∑ f)systematic 2000 20004-rep systematic 2600 20002-sample strata 3600 2800random 8900 3600
14
Isotropic exponential model for autocorrelation of errors, estimated from
standardised residuals (e = (y − f )/√
f)
0 2 4 6 8 10
−0.
50.
00.
51.
0
distance
corr
elat
ion
Obtain y, using best linear predictor of unobserved e’s, and Tc = ∑ yij
15
sampling design rmse T rmse(∑ f) rmse(∑ y)systematic 2000 2000 21004-rep systematic 2600 2000 20002-sample strata 3600 2800 2700random 8900 3600 3400
Estimate var(Tc − T ) by simulation, using a parametric bootstrap
16
3. Summary
For design-based inference with acorn data:
• Systematic sampling is unknowably most precise. In absence of trend,other designs give conservative estimates
• 4-rep systematic and 2-sample stratified designs produce shortest confi-dence intervals
Model-based inference reduces design effects, but at the cost of subjectiveassumptions
17