chapter 6: cluster sampling design 1 - iowa state...
TRANSCRIPT
Chapter 6: Cluster sampling design 1
Jae-Kwang Kim
Iowa State University
Fall, 2014
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 1 / 26
Introduction
1 Introduction
2 Single-Stage Cluster Sampling : Equal size case
3 Single-stege cluster sampling: General case
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 2 / 26
Introduction
Setup:1 Frame list clusters (disjoint groups of population elements)2 Select clusters by a probability sampling.
Why cluster sampling?1 Frame inadequacy: No way to get a list of elements (very expensive)
but relatively easy or cheap to list clusters2 Convenience: By grouping elements into “close” subgroups, you can
save time or money
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 3 / 26
Introduction
Single stage cluster sampling: Sample clusters and observe allelements in each selected cluster.
Two-stage sampling:
[Stage 1] Population is divided into clusters, called Primary SamplingUnits (PSU’s). A probability sample of PSU’s is drawn.[Stage 2] Each selected PSU is divided into clusters or elements, calledSecondary Sampling Units (SSU’s). A probability sample of SSU’s isdrawn in each selected PSU.
If SSU=cluster, it is called two-stage cluster sampling.If SSU=element, it is called two-stage element sampling
Multi-stage sampling: PSU, SSU, ..., USU (Ultimate Sampling Unit)If USU=cluster, it is called multi-stage cluster sampling.If USU=element, it is called multi-stage element sampling.Probably, don’t know N. So, estimation of yU is more difficult.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 4 / 26
Introduction
Notation (Population level)
UI = {1, · · · ,NI}: index set of clusters in the population
Ui : the set of elements in the i-th cluster of size Mi ,(i = 1, 2, · · · ,NI )
yij : measurement of item y at the j-th element (j = 1, 2, · · · ,Mi ) incluster i , i = 1, 2, · · · ,NI .
Population total: Y =∑NI
i=1
∑Mij=1 yij =
∑NIi=1 Yi =
∑NIi=1Mi Yi where
Yi =∑Mi
j=1 yij = Mi Yi
Population size: N =∑NI
i=1Mi
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 5 / 26
Introduction
Notation (Sample level)
AI : index set of clusters in the sample
nI = |AI |: the number of sampled clusters
A = ∪i∈AIUi : index set of elements in the sample
nA = |A| =∑
i∈AIMi : the number of sampled elements
Usually, nA is not fixed even if nI is fixed.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 6 / 26
Single-Stage Cluster Sampling : Equal size case
1 Introduction
2 Single-Stage Cluster Sampling : Equal size case
3 Single-stege cluster sampling: General case
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 7 / 26
Single-Stage Cluster Sampling : Equal size case
Single-Stage Cluster Sampling : Equal size case
Mi are all equal (denoted by M = Mi ).
Sampling design: Simple random cluster sampling1 Simple random sampling of nI clusters from NI clusters.2 Observe all the elements in the selected clusters.
Estimation of mean:
ˆYU =YHT
NIM=
1
nI
1
M
∑i∈AI
M∑j=1
yij =1
nI
∑i∈AI
Yi
Variance
Var( ˆYU) =1
nI
(1− nI
NI
)1
NI − 1
NI∑i=1
(Yi − YU
)2where Yi =
∑Mj=1 yij/M and YU = Y /(NIM) =
∑NIi=1 Yi/NI .
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 8 / 26
Single-Stage Cluster Sampling : Equal size case
ANOVA
Source D.F. Sum of Squares Mean S.S.Between cluster NI − 1 SSB S2
b
Within cluster NI (M − 1) SSW S2w
Total NIM − 1 SST S2
where
SSB =
NI∑i=1
M(Yi − YU
)2SSW =
NI∑i=1
M∑j=1
(yij − Yi
)2SST =
NI∑i=1
M∑j=1
(yij − YU
)2and MSS=SS/(d.f.).
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 9 / 26
Single-Stage Cluster Sampling : Equal size case
Note that
S2 =(NI − 1)S2
b + NI (M − 1)S2w
NIM − 1∼=
S2b + (M − 1) S2
w
M.
The variance is
Var(
ˆYU
)=
1
nIM
(1− nI
NI
)S2b .
Note that, under simple random sampling,
VarSRS
(ˆYU
)=
1
nIM
(1− nIM
NIM
)S2,
as n = nIM is the size of the sampled elements.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 10 / 26
Single-Stage Cluster Sampling : Equal size case
Design effect
Want to compare the current sampling design p (·) with SRS of equalsample size
Kish (1965) introduced design effect
deff(p, YHT
)=
Vp
(YHT
)VSRS
(YHT
)Usage 1: Compare designs
If deff > 1, then p (·) is less efficient than SRS.If deff < 1, then p (·) is more efficient than SRS.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 11 / 26
Single-Stage Cluster Sampling : Equal size case
Design effect (Cont’d)
Usage 2: Determine sample size:1 Have some desired variance V ∗
2 Under SRS, you can easily find required sample size n∗
3 Choose n∗p = deff · n∗.Then,
Vp
(YHT | n∗p
)= V ∗.
n∗ is often called effective sample size. It is the sample size requiredfor the given V ∗ if the sample design is SRS.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 12 / 26
Single-Stage Cluster Sampling : Equal size case
Intracluster correlation coefficient
A measure of within cluster homogeneity
Assume Mi = M
Intracluster correlation coefficient
ρ =Cov [yij , yik |j 6= k]√
V (yij)√
V (yik)
=
∑NIi=1
∑j 6=k
∑(yij − Y )(yik − Y )/
∑NIi=1M(M − 1)∑NI
i=1
∑Mj=1(yij − Y )2/
∑NIi=1M
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 13 / 26
Single-Stage Cluster Sampling : Equal size case
Intracluster correlation coefficient (Cont’d)
Properties1
ρ = 1− M
M − 1
SSW
SST.
= 1− S2w
S2
2 Since 0 ≤ SSW ≤ SST ,
− 1
M − 1≤ ρ ≤ 1.
For SSW = 0, ρ = 1: perfect homogeneity in cluster.For SSW = SST , ρ = −1/(M − 1): perfect heterogeneity in cluster.Each cluster is like the whole population in terms of variability.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 14 / 26
Single-Stage Cluster Sampling : Equal size case
Intracluster correlation coefficient (Cont’d)
Variance under simple random cluster (SRC) sampling satisfies
VSRC ( ˆY ).
= VSRS( ˆY )[1 + (M − 1)ρ]
Thus,deff = 1 + (M − 1) ρ.
Effective sample size
n∗ =nA
1 + (M − 1)ρ.
For example, if ρ = 0.1 and M = 11, deff = 2. Thus, even if ρ is low,design effect can be large if M is large.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 15 / 26
Single-Stage Cluster Sampling : Equal size case
Estimation of intracluster correlation coefficient
Table: ANOVA table in the sample level (under SRC sampling)
Source D.F. Sum of Squares Mean SSBetween nI − 1 Sample SSB (SSSB) s2b = SSSB/(nI − 1)Within nI (M − 1) Sample SSW (SSSW) s2w = SSSW /{nI (M − 1)}Total nIM − 1 Sample SST (SSST)
Sum of Squares: Sample SST = Sample SSB + Sample SSW
Sample SSB =∑i∈AI
M
Yi − (n−1I
∑k∈AI
Yk)
2
Sample SSW =∑i∈AI
M∑j=1
(yij − Yi )2
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 16 / 26
Single-Stage Cluster Sampling : Equal size case
Results
E (s2b) =M
NI − 1
NI∑i=1
(Yi − Y
)2= S2
b
E (s2w ) =1
NI (M − 1)
NI∑i=1
M∑j=1
(yij − Yi
)2= S2
w
Estimated intracluster correlation
ρ = 1− s2wσ2y
where
σ2y =(NI − 1)s2b + NI (M − 1)s2w
NIM
.=
1
Ms2b +
(1− 1
M
)s2w
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 17 / 26
Single-stege cluster sampling: General case
1 Introduction
2 Single-Stage Cluster Sampling : Equal size case
3 Single-stege cluster sampling: General case
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 18 / 26
Single-stege cluster sampling: General case
Single-stage cluster sampling: General case
Meaning1 Draw a probability sample AI from UI via pI (·).2 Observe every elements in each selected clusters.
Cluster inclusion probability
πIi = Pr (i ∈ AI ) =∑
AI ;i∈AI
pI (AI )
πIij = Pr (i , j ∈ AI ) =∑
AI ;i ,j∈AI
pI (AI )
Element inclusion probability
πik = Pr {(ik) ∈ A} = Pr (i ∈ AI ) = πIi where k ∈ Ui
πik,jl = Pr {(ik) ∈ A & (jl) ∈ A} =
{Pr (i ∈ AI ) = πIi if i = jPr (i , j ∈ AI ) = πIij if i 6= j
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 19 / 26
Single-stege cluster sampling: General case
Single-stage cluster sampling
HT estimation1 Point estimator:
YHT =∑i∈AI
Yi
πIi
2 Variance
Var(YHT
)=∑i∈UI
∑j∈UI
∆IijYi
πIi
Yj
πIj
where ∆Iij = πIij − πIiπIj .3 Variance estimation
V(YHT
)=∑i∈AI
∑j∈AI
Yi
πIi
Yj
πIj
∆Iij
πIij
provided πIij > 0.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 20 / 26
Single-stege cluster sampling: General case
Remark
For fixed size design (nAI= nI ),
Var(YHT
)= −1
2
∑i∈UI
∑j∈UI
∆Iij
(Yi
πIi−
Yj
πIj
)2
1 If πIi ∝ Yi , then YHT = Y
2 If πIi ∝ Mi and if Yi is constant, then YHT = Y .
3 Equal probability sampling design is generally inefficient.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 21 / 26
Single-stege cluster sampling: General case
Optimal design
Assume the following model
yij = µ+ ai + eij
where µ is an unknown parameter, ai ∼ (0, σ2a), and eij ∼ (0, σ2e ).
Total variance of YHT :
V (YHT ) = V (∑i∈AI
π−1Ii Miµ) + E{∑i∈AI
π−2Ii γi}
= V (∑i∈AI
π−1Ii Miµ) + E{∑i∈UI
π−1Ii γi}
where γi = V (Yi ) = M2i σ
2a + Miσ
2e .
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 22 / 26
Single-stege cluster sampling: General case
Optimal design (Cont’d)
The second term of the total variance is minimized when
πIi ∝ γ1/2i = Miσa
(1 +
σ2eσ2a· 1
Mi
)1/2
.
If σ2e/σ
2a is small, then πIi ∝ Mi lead to an optimal sampling design.
If σ2e/σ
2a is large, then πIi ∝ M
1/2i can be a better design.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 23 / 26
Single-stege cluster sampling: General case
Mean estimation
Population size N is generally unknown.
If the parameter of interest is population mean, we may have twodifferent concepts:
µ1 =1
NI
NI∑i=1
Yi
Mi=
1
NI
NI∑i=1
Yi
µ2 =1
N
NI∑i=1
Yi =
∑NIi=1
∑Mij=1 yij∑NI
i=1Mi
If Mi = M, then µ1 = µ2. Otherwise, the two parameters havedifferent meanings.
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 24 / 26
Single-stege cluster sampling: General case
Mean estimation
Suppose that we are interested in estimating YU = µ2.
From each sampled cluster i , we observe(Mi , Yi
).
Ratio estimation
ˆYR =
∑i∈AI
Yi/πIi∑i∈AI
Mi/πIi:=
YHT
NHT
Variance
V(
ˆYR
).
=1
N2
{V (YHT )− 2µCov(YHT , NHT ) + µ2V (NHT )
}
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 25 / 26
Single-stege cluster sampling: General case
Mean estimation
Under SRC sampling, for example, the variance is
V(
ˆYR
).
=N2I
nIN2
(1− nI
NI
)1
NI − 1
NI∑i=1
(Yi −Mi YU
)2=
1
nI M2
(1− nI
NI
)1
NI − 1
NI∑i=1
M2i
(Yi − YU
)2with M = N−1I
∑NIi=1Mi = N/NI .
Kim (ISU) Ch. 6: Cluster sampling design 1 Fall, 2014 26 / 26