standard errors in systematic sampling chris … · random 8900 8900 143 35000 we could pretend...

STANDARD ERRORS IN SYSTEMATIC SAMPLING

Chris Glasbey & Stijn Bierman

Biomathematics & Statistics Scotland

How many acorns are under an oak tree?

2

Standard estimator of total is

T = Mn∑

i=1yi where n = 144, M = 9

(NB We are only interested in the total, not in interpolating y’s)

But what is its variance?

Two frameworks for inference:

1) Design based: Assume unobserved array of y’s is fixed, and only stochas-ticity is due to sampling

2) Model based: Assume y’s are a single realisation from a super-populationof possible arrays

But, for systematic sampling, there is no design-based, unbiased estimatorof var(T − T )

3

1. Design-based inference

If instead we had sampled at random:

var(T − T ) =

1 −1

M

M2nσ2, where σ2 =1

n − 1

n∑

i=1(yi − y)2,

is an unbiased estimator with (n − 1) d.f., where σ2 is variance of y array

4

From randomisation, var(T − T ) = yTQy, with y arranged as vector

For random sampling

Q =

8, −ε, −ε, −ε, −ε, . . .−ε, 8, −ε, −ε, −ε, . . .−ε, −ε, 8, −ε, −ε, . . .−ε, −ε, −ε, 8, −ε, . . .

... ... ... . . .

ε = 0.0062

Whereas for systematic sampling

Q =

8, −1, −1, 8, −1, . . .−1, 8, −1, −1, 8, . . .−1, −1, 8, −1, −1, . . .

8, −1, −1, 8, −1, . . .... ... ... . . .

In the absence of trend, and if “covariances” decay with distance, systematicsampling is most efficient (Bellman, 1977)

5

For this problem we know the truth, a complete census:

T = 85306 acorns

(Approximated from Aubry and Debouzie, Ecology, 2000.)

So we can conduct a simulation study

6

sampling design rmsesystematic 2000random 8900

7

sampling design rmse√

var d.f. 95% c.widthsystematic 2000random 8900 8900 143 35000

8

Two other types of design with unbiased estimators of var(T − T ):

replicated systematic stratified

d.f. = 3 d.f. ≤ n2 (Satterthwaite, 1946)

9


var d.f. 95% c.widthsystematic 20004-rep systematic 2600 2600 3 150002-sample strata 3600 3600 12 16000random 8900 8900 143 35000

10



We could pretend systematic design was one of other three, to obtain varif we believe the systematic design is the most efficient (Bellman, 1977)

But if there are trends it may not be!

11

For example, if the data were elevations, such as:



12

2. Model-based inference

Model for trend:

fij = β1 exp

−1

2

log rij − β4

β5

6

where rij =√

√

√

√(i − β2)2 + (j − β3)2

0 5 10 15 20 25

050

100

150

200

250

r

y

Fit by maximising Poisson quasi-likelihood, ∑(y log f − f ), and Tf = ∑ fij

13

sampling design rmse T rmse(∑ f)systematic 2000 20004-rep systematic 2600 20002-sample strata 3600 2800random 8900 3600

14

Isotropic exponential model for autocorrelation of errors, estimated from

standardised residuals (e = (y − f )/√

f)

0 2 4 6 8 10

−0.

50.

00.

51.

0

distance

corr

elat

ion

Obtain y, using best linear predictor of unobserved e’s, and Tc = ∑ yij

15

sampling design rmse T rmse(∑ f) rmse(∑ y)systematic 2000 2000 21004-rep systematic 2600 2000 20002-sample strata 3600 2800 2700random 8900 3600 3400

Estimate var(Tc − T ) by simulation, using a parametric bootstrap

16

3. Summary

For design-based inference with acorn data:

• Systematic sampling is unknowably most precise. In absence of trend,other designs give conservative estimates

• 4-rep systematic and 2-sample stratified designs produce shortest confi-dence intervals

Model-based inference reduces design effects, but at the cost of subjectiveassumptions

17

standard errors in systematic sampling chris … · random 8900 8900 143 35000 we could pretend...

Documents