STAT 521: Survey Sampling
Jae Kwang Kim
Iowa State University
Contents

1 Introduction
  1.1 Introduction
  1.2 Example
  1.3 Probability Sampling
    1.3.1 Definition & Notation
    1.3.2 Probability sampling
  1.4 Basic Procedures
  1.5 Errors

2 Horvitz-Thompson estimation
  2.1 Introduction
  2.2 Basic setup
  2.3 Simple random sampling
  2.4 Domain estimation

3 Element sampling design
  3.1 Introduction
  3.2 Poisson sampling
  3.3 PPS sampling
  3.4 πps sampling
  3.5 Systematic sampling
  3.6 Stratified sampling
  3.7 Systematic πps

4 Cluster sampling
  4.1 Introduction
  4.2 Single-Stage Cluster Sampling
  4.3 Two-stage Sampling

5 Estimation
  5.1 Introduction
  5.2 Large sample theory
  5.3 Ratio estimation
  5.4 Regression estimation
  5.5 GREG estimation
  5.6 Optimal Estimation

6 Variance estimation
  6.1 Introduction
  6.2 Taylor series linearization
  6.3 Replication method
    6.3.1 Independent Random Group Method
    6.3.2 Non-independent Random Groups
    6.3.3 Jackknife method for variance estimation
    6.3.4 Balanced repeated replication

7 Two-phase sampling
  7.1 Introduction
  7.2 Two-phase sampling for stratification
  7.3 Two-phase regression estimator
  7.4 Repeated survey

8 Nonresponse
  8.1 Introduction
  8.2 Call-backs
  8.3 Nonresponse weighting adjustment
    8.3.1 Weighting class NWA method
    8.3.2 Estimators that use weighting as well as auxiliary variables
  8.4 Imputation
    8.4.1 Introduction
    8.4.2 Deterministic imputation
    8.4.3 Stochastic imputation
    8.4.4 Variance estimation after imputation

9 Small Area Estimation
  9.1 Introduction
  9.2 Area level estimation
  9.3 Extensions
Chapter 1
Introduction
1.1 Introduction
• Population
– Finite population vs. infinite population
– Target population vs. survey population
• Sample: subset of (finite) population
• Sampling: the process of selecting a sample from a (finite) population.
• Why sampling ?
1. To reduce the cost
2. To save the time
3. Sometimes, to get more accurate information about the population.
4. Sometimes, it is the only way of getting information about the
target population.
• Sampling error: An error due to the fact that only a subset of finite
population is selected for observation.
• Two types of sampling
– Probability sampling
– Non-probability sampling
• Roughly speaking, a probability rule is assigned to obtain a sample in
probability sampling.
1.2 Example
Example 1.1. Consider the following artificial population of farms (of size
N = 4)
ID Farm size Farm yield (y)
1 4 1
2 6 3
3 6 5
4 20 15
Instead of selecting the N = 4 farms, suppose that we want to select only
n = 2 sample farms and observe yi in the sample. The parameter of interest
is the average farm yield ((1 + 3 + 5 + 15)/4 = 6). Assume that the farm
size is known for each farm in the population. How to select the samples?
• In this example, there are six possible ways of selecting the sample farms.
Case Sample ID Sample Mean Sampling Error
1 1, 2 2 -4
2 1, 3 3 -3
3 1, 4 8 2
4 2, 3 4 -2
5 2, 4 9 3
6 3, 4 10 4
• We select only one sample from the six possible samples.
• In any situation, sampling error exists.
• How to select one sample from the six possible samples ?
– Non-probability sampling
– Probability sampling
• Simple random sampling: Equal probability of selection for each pos-
sible sample.
Case Sample ID Sample mean (y) Selection Probability
1 1, 2 2 1/6
2 1, 3 3 1/6
3 1, 4 8 1/6
4 2, 3 4 1/6
5 2, 4 9 1/6
6 3, 4 10 1/6
• We can derive the probability distribution of the sample mean from the probability distribution of the sample.
1. What is the probability distribution of the estimator ȳ?
2. Show that the sample mean ȳ is unbiased for the population mean. What is the meaning of the expectation in this case?
3. What is the variance of ȳ in this case?
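As a check on these three questions, here is a minimal Python sketch (not part of the original notes; it simply enumerates the six samples of Example 1.1):

from itertools import combinations

y = {1: 1, 2: 3, 3: 5, 4: 15}              # farm yields from Example 1.1
samples = list(combinations(y, 2))         # all 6 possible samples of size n = 2
p = 1 / len(samples)                       # SRS: each sample has probability 1/6

means = [sum(y[i] for i in s) / 2 for s in samples]
e_mean = sum(p * m for m in means)         # expectation over the 6 samples
var = sum(p * (m - e_mean) ** 2 for m in means)
print(e_mean)                              # 6.0 = population mean, so ybar is unbiased
print(var)                                 # 58/6 ~ 9.67, the design variance of ybar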
Remark
1. No model assumption about yi in the example: totally different frame-
work !
2. Design-based approach: the reference distribution is the sampling dis-
tribution generated by the repeated application of the given sampling
mechanism.
3. Why design-based approach ?
(a) Requires weaker assumptions: robust.
(b) Finding a good model is not easy: Useful for general purpose
estimation.
(c) Get consistent results from different users: Useful for official
statistics.
Example 1.2. Another sample design
• Unequal probability sampling
Sample ID y value Mean Estimator Selection probability
1, 4 1, 15 4.5 1/3
2, 4 3, 15 6 1/3
3, 4 5, 15 7.5 1/3
• What is the probability distribution of the mean estimator ?
• What is the expected value of the sampling error ?
• Compute the variance. Compare it with that of SRS.
1.3 Probability sampling
1.3.1 Definition & Notation
Notation
• U = {1, 2, · · · , N}: index set of the finite population
• A: subset of U, the index set of the sample
• 𝒜 = {A : A ⊂ U, P(A) > 0}: the set of samples under consideration, the sample support
• θ̂ = θ̂(y_i; i ∈ A): statistic (a real-valued function which can be calculated once A is selected)
Definition
1. Probability distribution of samples, or sample distribution: a probability mass function P(·) defined on 𝒜. That is, P(·) satisfies
   (a) P(A) ∈ [0, 1], ∀A ∈ 𝒜
   (b) ∑_{A∈𝒜} P(A) = 1.
   It is also called the sampling design.
2. (Induced) probability distribution of a statistic
   1. Expectation: E(θ̂) = ∑_{A∈𝒜} P(A) θ̂(A)
   2. Variance: Var(θ̂) = ∑_{A∈𝒜} P(A) [θ̂(A) − E(θ̂)]²
   3. Mean squared error:
      MSE(θ̂) = ∑_{A∈𝒜} P(A) [θ̂(A) − θ]² = Var(θ̂) + [E(θ̂) − θ]²
Note:
(a) The sampling design induces the probability distribution of the statistics.
(b) The sampling design P tells us everything we need for design-based inference.
1.3.2 Probability sampling
• Definition
1. Sample distribution is known.
2. Pr (i ∈ A) > 0 for all i ∈ U
• Why probability sampling ?
1. No subjective choice of sample
2. Can remove the sampling bias.
• What is the sampling bias? (θ: true population value, θ̂: an estimator of θ)
• Sampling error of θ̂:
  θ̂ − θ = {θ̂ − E(θ̂)} + {E(θ̂) − θ} = Variation + Bias
• How to find an unbiased estimator of θ = ∑_{i=1}^N y_i ?
  – Probability sampling
  – Horvitz-Thompson estimator
• In non-probability sampling, "Variation = 0" but the bias is not zero.
• In probability sampling, "Bias = 0" but the variation is not zero. The variation is small if the sample size is large.
Additional advantages of probability sampling
1. Many statistical theories are established under probability sampling.
   • Large sample theory under probability sampling:
     (a) Law of large numbers
     (b) Central limit theorem
   • Additional advantages of probability sampling under a large sample size:
     (a) Can reduce the variance of the estimator.
     (b) Can compute confidence intervals.
1.4 Basic procedures for survey sampling
1. Planning
(a) Statement of objectives
(b) Selection of a sampling frame
2. Design and development
(a) Sample design
(b) Questionnaire design
3. Implementation
(a) Data collection
(b) Data capture and coding
(c) Editing and Imputation
(d) Estimation
(e) Data analysis
(f) Data dissemination
4. Evaluation - Documentation
1.5 Errors
1. Errors of nonobservation
• Coverage error (Population ≠ Frame): some elements are not listed.
• Sampling error (Frame ≠ Sample): some listed elements are not sampled.
• Response error (Sample ≠ Respondents): some sampled elements do not respond.
2. Errors of observation
• Measurement error
(a) Interviewer: skill, sex, age.
(b) Respondent: lie, forget, change behavior
(c) Instrument: questionnaire, measuring device
(d) Mode: mail, phone, personal interview
• Processing error: clerical error
Remark
1. If n = N (census), there is no sampling error but we still have non-
sampling error.
2. We can decrease sampling error by increasing n.
3. Because of nonsampling error, a sample is often more accurate than a CENSUS. For example, in labor force surveys, the questionnaire is more detailed and the interviewers are better trained than in the census. Thus, in this case the information about the labor market is more accurate than that of a census (unless the census had the same questionnaire and equally well-trained interviewers, which is almost impossible due to operational costs).
4. Furthermore, sampling is faster, cheaper, and broader in scope.
Chapter 2
Horvitz-Thompson
estimation
2.1 Introduction
• Sampling frame
– list frame
– area frame
• Unit
– Sampling unit
– Reporting unit = Observational unit = Element
• Two types of sampling
– Element sampling
– Cluster sampling
• Parameter of interest: Y = ∑_{i=1}^N y_i (population total of y)
2.2 Basic setup
Definition 2.1.
1. First-order inclusion probability:
   π_i = Pr(i ∈ A) = ∑_{A: i∈A} P(A)
2. Second-order inclusion probability:
   π_ij = Pr(i, j ∈ A) = ∑_{A: i,j∈A} P(A)
3. Probability sampling design: π_i > 0, ∀i ∈ U.
4. Measurable sampling design: π_ij > 0, ∀i, j ∈ U.
5. I_k: indicator random variable with
   I_k = I_k(A) = 1 if k ∈ A, and 0 if k ∉ A.
   Note that
   E(I_k) = π_k
   E(I_k I_l) = π_kl
   V(I_k) = π_k(1 − π_k)
   C(I_k, I_l) = π_kl − π_k π_l ≡ Δ_kl
6. n_A = ∑_{k=1}^N I_k(A): (realized) sample size. If n_A does not depend on A, then the design is of fixed size, in the sense that V(n_A) = 0.
Example 2.1. (Bernoulli sampling)
• Each unit is selected or not selected according to the outcome of a Bernoulli trial with inclusion probability π_k = π.
• Let ϵ_1, ϵ_2, · · · , ϵ_N ~ i.i.d. Uniform(0, 1). If ϵ_k ≤ π, then accept unit k into the sample. If ϵ_k > π, then do not accept unit k.
• Sample size n_A: a binomial random variable,
  Pr(n_A = x) = (N choose x) π^x (1 − π)^{N−x}, x = 0, 1, 2, · · · , N.
  Thus,
  E(n_A) = Nπ
  V(n_A) = Nπ(1 − π)
• Sampling design:
  P(A) = π^{n_A} (1 − π)^{N−n_A}
  where n_A = |A|.
• Inclusion probabilities: π_k = π and, by the independence of the trials, π_kl = π² for k ≠ l.
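A minimal simulation sketch in Python (parameters are illustrative, not from the notes) that checks the moments of n_A and the inclusion probability:

import numpy as np

rng = np.random.default_rng(0)
N, pi, reps = 100, 0.3, 10000
sizes = np.empty(reps)
unit1_selected = 0
for r in range(reps):
    eps = rng.uniform(size=N)      # epsilon_k ~ iid Uniform(0,1)
    A = eps <= pi                  # accept unit k iff epsilon_k <= pi
    sizes[r] = A.sum()
    unit1_selected += A[0]
print(sizes.mean(), N * pi)                  # E(n_A) = N pi
print(sizes.var(), N * pi * (1 - pi))        # V(n_A) = N pi (1 - pi)
print(unit1_selected / reps, pi)             # pi_k = pi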
Lemma 2.1. (Properties of the inclusion probabilities)
1. π_ii = π_i and π_ij = π_ji.
2. For a sampling design with (expected) sample size n,
   ∑_{i=1}^N π_i = n.
3. For a fixed sample size design (n_A ≡ n),
   ∑_{i=1}^N π_ij = n π_j
   and, for Δ_ij = π_ij − π_i π_j,
   ∑_{j=1}^N Δ_ij = 0.
Definition 2.2. Horvitz-Thompson estimator of Y = ∑_{i=1}^N y_i:
Ŷ_HT = ∑_{i∈A} y_i/π_i
It is also called the π-estimator.
Theorem 2.1. (Properties of the HT estimator)
1. Unbiased:
   E(Ŷ_HT) = Y
2. Variance:
   Var(Ŷ_HT) = ∑_{i=1}^N ∑_{j=1}^N (π_ij − π_i π_j)(y_i/π_i)(y_j/π_j)
3. For a fixed sample size design (V(n_A) = 0),
   Var(Ŷ_HT) = −(1/2) ∑_{i=1}^N ∑_{j=1}^N (π_ij − π_i π_j)(y_i/π_i − y_j/π_j)²
   This formula is called the Sen-Yates-Grundy (SYG) variance formula.
Example 2.2. Consider the following sampling design from a finite population U = {1, 2, 3}.

Sample (A)     Pr(A)   HT estimator
A_1 = {1, 2}   0.5
A_2 = {1, 3}   0.25
A_3 = {2, 3}   0.25
1. Compute the first-order inclusion probability of each element in the
population.
2. Find the HT estimator for each sample.
3. Check that the HT estimator is unbiased.
Variance estimation
• Unbiased variance estimator: want to find a statistic V̂ such that E(V̂) = Var(Ŷ_HT).
• Idea:
  If Q = ∑_{i=1}^N ∑_{j=1}^N Ω_ij y_i y_j, then Q̂ = ∑_{i∈A} ∑_{j∈A} π_ij^{−1} Ω_ij y_i y_j is an unbiased estimator of Q.
• HT variance estimator:
  V̂ = ∑_{i∈A} ∑_{j∈A} {(π_ij − π_i π_j)/π_ij}(y_i/π_i)(y_j/π_j)
• SYG variance estimator (for fixed-size designs):
  V̂_SYG = −(1/2) ∑_{i∈A} ∑_{j∈A} {(π_ij − π_i π_j)/π_ij}(y_i/π_i − y_j/π_j)²
• Under what condition does an unbiased variance estimator of the HT estimator exist?
Example 2.2 - Continued
Consider the following sampling design from a finite population U = {1, 2, 3} with y_1 = 16, y_2 = 21, y_3 = 18.

Sample (A)     Pr(A)   HT estimate   HT var. est.   SYG var. est.
A_1 = {1, 2}   0.5
A_2 = {1, 3}   0.25
A_3 = {2, 3}   0.25

Check the unbiasedness of the variance estimates.
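A Python sketch (not from the original notes) that works this example end to end, verifying the unbiasedness of both the point estimator and the SYG variance estimator:

design = {frozenset({1, 2}): 0.5, frozenset({1, 3}): 0.25, frozenset({2, 3}): 0.25}
y = {1: 16, 2: 21, 3: 18}
U = [1, 2, 3]
Y = sum(y.values())                                        # true total = 55

pi = {k: sum(p for A, p in design.items() if k in A) for k in U}
pij = {(k, l): sum(p for A, p in design.items() if k in A and l in A)
       for k in U for l in U}

ht = {A: sum(y[k] / pi[k] for k in A) for A in design}     # HT estimate per sample
print(sum(design[A] * ht[A] for A in design), Y)           # both 55: unbiased

def syg(A):
    """SYG variance estimate; valid here since the design has fixed size n = 2."""
    return -0.5 * sum((pij[k, l] - pi[k] * pi[l]) / pij[k, l]
                      * (y[k] / pi[k] - y[l] / pi[l]) ** 2
                      for k in A for l in A)

true_var = sum(design[A] * (ht[A] - Y) ** 2 for A in design)
print(sum(design[A] * syg(A) for A in design), true_var)   # both ~ 37.67: unbiased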
Remark
1. For unbiased estimation, the HT estimator can be used whenever the sampling design is a probability sampling design.
2. When is the HT estimator efficient (i.e., when does it have small variance)?
   (a) Note
       Var(Ŷ_HT) = −(1/2) ∑_{i=1}^N ∑_{j=1}^N (π_ij − π_i π_j)(y_i/π_i − y_j/π_j)²
       When is the variance small? (That is, how should we choose π_k to make the variance small?)
   (b) We don't know y_k in advance.
   (c) Hence, if x_k is correlated with y_k, then we can choose π_k ∝ x_k.
   (d) On the other hand, the HT estimator works poorly if π_k is not correlated with y_k. (Basu's elephant example)
3. The HT estimator is not location invariant.
   Definition 2.3. Write θ̂ = θ̂(A) = θ̂(y_i; i ∈ A). θ̂ is location invariant if
   θ̂(a + y_i; i ∈ A) = a + θ̂(y_i; i ∈ A)
   for all a.
   Thus, when changing from "Celsius" to "Fahrenheit", the HT estimate is not transformed in the corresponding way.
Basu's elephant example
• A circus with N = 50 elephants. We want to estimate the total weight of the elephants using a sample of size n = 1.
• About three years ago, every elephant was weighed, and "Sambo" was in the middle in terms of weight (and "Jumbo" was the largest one).
• Circus owner's idea: measure Sambo's weight and multiply it by 50.
• Statistician: No! That is not probability sampling.
• Circus owner: Well, what is your sampling scheme?
• Statistician: Let's select Sambo with high probability. Say, select Sambo with probability 99/100, and select each of the other 49 elephants with probability (1/49)(1/100).
• Circus owner: OK. Let's select one with this scheme. (Sambo is selected.) OK. Let's multiply Sambo's weight by 50.
• Statistician: No! You should multiply by the inverse of the first-order inclusion probability. So, you should multiply by 100/99, not by 50.
• Circus owner: ????? What if Jumbo was selected? What number should we multiply by?
• Statistician: Well, it is 4,900.
• Circus owner: What??? You are fired!
2.3 Simple random sampling
• Motivation: choose n units from N units without replacement.
  1. Each subset of n distinct units is equally likely to be selected.
  2. There are (N choose n) samples of size n from N.
  3. Give equal probability of selection to each subset of n units.
Definition 2.4. Sampling design for SRS:
P(A) = 1/(N choose n) if |A| = n, and 0 otherwise.
Lemma 2.2. Under SRS, the inclusion probabilities are
π_i = n/N
π_ij = n(n − 1)/{N(N − 1)} for i ≠ j.
Theorem 2.2. Under the SRS design, the HT estimator
Ŷ_HT = (N/n) ∑_{i∈A} y_i = N ȳ
is unbiased for Y and has variance of the form
V(Ŷ_HT) = (N²/n)(1 − n/N) S²
where
S² = (1/2)(1/N){1/(N − 1)} ∑_{i=1}^N ∑_{j=1}^N (y_i − y_j)² = {1/(N − 1)} ∑_{i=1}^N (y_i − Ȳ)².
Also, the SYG variance estimator is
V̂(Ŷ_HT) = (N²/n)(1 − n/N) s²
where
s² = {1/(n − 1)} ∑_{i∈A} (y_i − ȳ)².
Remark (under SRS)
• 1 − n/N is often called the finite population correction (FPC) term. The FPC term can be ignored (FPC ≈ 1) if the sampling rate n/N is small (≤ 0.05) or for conservative inference.
• For n = 1, the variance of the sample mean is
  (1/n)(1 − n/N) S² = (1/N) ∑_{i=1}^N (y_i − Ȳ)² ≡ σ²_Y
• Central limit theorem: under some conditions,
  (ȳ − Ȳ)/√{(1/n)(1 − n/N) S²} → N(0, 1).
• Sample size determination:
  1. Choose the target variance V* of V(ȳ).
  2. Choose n as the smallest integer satisfying
     (1/n)(1 − n/N) S² ≤ V*.
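The inequality above can be solved in closed form: n ≥ S²/(V* + S²/N). A small Python sketch (illustrative numbers, not from the notes):

import math

def srs_sample_size(S2, N, V_star):
    """Smallest n with (1/n)(1 - n/N) S2 <= V*, i.e. n >= S2 / (V* + S2/N)."""
    return min(N, math.ceil(S2 / (V_star + S2 / N)))

# e.g. a guessed S2 = 4.0, N = 10000, target variance V* = 0.01 for the mean
print(srs_sample_size(4.0, 10000, 0.01))   # 385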
2.4 Domain estimation
Basic setup
• Estimation for domains (subpopulations): we usually want to make inference about subpopulations as well as the whole population.
• Often, we cannot plan the design for every subpopulation of interest => the sample size within a subpopulation is random.
• Denote domain d by U_d ⊂ U. The parameters are
  – N_d = |U_d|: number of elements in U_d
  – P_d = N_d/N: proportion of elements in U_d. Often, N is known but N_d is unknown.
  – t_d = ∑_{i∈U_d} y_i: total of y in domain d
  – Ȳ_d = t_d/N_d: mean of y in domain d
Estimation
• Methods
  1. Direct method: use HT estimation, t̂_d,HT = ∑_{i∈U_d} y_i I_i/π_i.
  2. Model-based method: use a prediction ŷ_i to create a synthetic value t̂_d,syn = ∑_{i∈U_d} ŷ_i.
  3. Small area estimation: a compromise between the two.
Direct estimation
• For k = 1, 2, · · · , N, define
  z_kd = 1 if k ∈ U_d, and 0 if k ∉ U_d.
  Note that z_kd is not a random variable (i.e., it does not depend on the sampling scheme).
• Properties of z_kd:
  1. ∑_{k∈U} z_kd = N_d
  2. Z̄_d = ∑_{k∈U} z_kd/N = N_d/N = P_d
  3. S²_zd = {1/(N − 1)}(∑_{k∈U} z²_kd − N Z̄²_d) = {N/(N − 1)} P_d (1 − P_d)
• HT estimation of N_d:
  N̂_d = ∑_{k∈U} z_kd I_k/π_k
  Under SRS,
  N̂_d = ∑_{k∈U} z_kd I_k/(n/N) = N n_d/n = N p_d
  and
  Var(N̂_d) = (N²/n)(1 − n/N) S²_zd = (N²/n){1 − (n − 1)/(N − 1)} P_d (1 − P_d)
  V̂(N̂_d) = (N²/n)(1 − n/N) s²_zd = N²(1 − n/N) p_d (1 − p_d)/(n − 1).
• HT estimation of t_d = ∑_{k∈U_d} y_k = ∑_{k∈U} y_k z_kd:
  t̂_d = ∑_{k∈U} y_k z_kd I_k/π_k = ∑_{k∈A} y_k z_kd/π_k.
  It is unbiased for t_d.
• HT-type estimator of Ȳ_d = t_d/N_d:
  Ȳ̂_d = t̂_d/N̂_d
  It is probably not unbiased, because it is a nonlinear function of unbiased estimators.
• Generally, we express population parameters as functions of population totals and then apply HT estimation to each total.
• The statistical properties of Ȳ̂_d can be derived from the following approximation:
  Ȳ̂_d = t̂_d/N̂_d = f(N̂_d, t̂_d)
       ≈ f(N_d, t_d) + {∂f(N_d, t_d)/∂t_d}(t̂_d − t_d) + {∂f(N_d, t_d)/∂N_d}(N̂_d − N_d)
       = t_d/N_d + (1/N_d)(t̂_d − t_d) + (−t_d/N²_d)(N̂_d − N_d)
  Thus,
  Var(Ȳ̂_d) ≈ Var{(1/N_d)(t̂_d − Ȳ_d N̂_d)}.
  Under SRS,
  Var(Ȳ̂_d) ≈ {1/E(n_d) − 1/N_d}{1/(N_d − 1)} ∑_{i∈U_d} (y_i − Ȳ_d)².
Chapter 3
Element sampling design
3.1 Introduction
• Element sampling vs. cluster sampling
• Taxonomy
Equal probability sampling Unequal probability sampling
SRS (without replacement) πps sampling
SRS with replacement PPS sampling
Bernoulli sampling Poisson sampling
Systematic sampling Systematic πps sampling
• Why consider unequal probability sampling ?
Example: a population of N = 4 farms
Farm Size y (yield)
A 100 11
B 200 20
C 300 24
D 1,000 245
Select n = 1 unit by
– With equal probability
– With probability proportional to size :
Compare the variances.
3.2 Poisson sampling
• Definition:
  I_i ~ independent Bernoulli(π_i), i = 1, 2, · · · , N.
• If π_i ≡ π, it is called Bernoulli sampling.
• Estimation:
  Ŷ_HT = ∑_{i=1}^N I_i y_i/π_i
• Variance:
  Var(Ŷ_HT) = ∑_{i=1}^N (1/π_i − 1) y²_i
• Optimal design: minimizing Var(Ŷ_HT) subject to ∑_{i=1}^N π_i = n gives
  π_i ∝ y_i.
  To prove this, use the Cauchy-Schwarz inequality
  (∑_{i=1}^n a²_i)(∑_{j=1}^n b²_j) ≥ (∑_{i=1}^n a_i b_i)²
  with equality if and only if a_i ∝ b_i for all i = 1, · · · , n.
• Disadvantage: sample size is random and it can decrease the efficiency
of the HT estimator.
Example 3.2.1
N = 600 students took a test at a university. We want to estimate the passing rate on the test. Use Bernoulli sampling with π = 1/6. A sample of size n_s = 90 is realized. Among the 90 sampled students, 60 are found to have passed. What is a reasonable estimator of the total number of students who passed the test?
• Remedy
  1. Use the alternative estimator
     Ŷ = N (∑_{i=1}^N I_i y_i/π_i)/(∑_{i=1}^N I_i/π_i).
     It is often called the Hajek estimator. Its variance is
     Var(Ŷ) ≈ ∑_{i=1}^N (1/π_i − 1)(y_i − Ȳ)².
  2. Use rejective sampling:
     "the conditional distribution of the Bernoulli sampling distribution given n = n_0 ⟺ simple random sampling without replacement of size n_0"
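Applying both estimators to Example 3.2.1 (a sketch, using the numbers given in the example):

N, pi = 600, 1 / 6          # Example 3.2.1
n_s, passed = 90, 60

Y_ht = passed / pi          # HT estimate: 360; ignores the realized sample size
N_hat = n_s / pi            # HT estimate of N: 540
Y_hajek = N * Y_ht / N_hat  # Hajek estimate: 600 * (60/90) = 400
print(Y_ht, Y_hajek)

The Hajek estimator conditions on the realized sample size (90 instead of the expected 100), which is why it is the more reasonable answer here.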
3.3 PPS sampling
• Basic Setup
  1. Let x_1, · · · , x_N be known characteristics of the population elements such that x_k > 0 for all k. This x_k is called the measure of size (MOS). Examples of MOS include the size of a farm, the number of employees in a company, and the acreage of a county.
  2. We wish to select a sample with selection probability proportional to x_k.
  3. If the sample size is equal to one, then it is easy to select a sample with probability proportional to x_k.
  4. Probability proportional to size (PPS) sampling idea: use m independent selections of a sample of size one with probability proportional to x_k. Thus, it is with-replacement sampling in the sense that, once drawn, an element is replaced into the population so that all N elements participate in each draw.
• With-replacement sampling:
  – Pro: easy to implement and to investigate its properties. It may be a good approximation to without-replacement sampling if n/N is negligible.
  – Con: sample elements can be duplicated. Inefficient.
Example: Simple random sampling with replacement
• Make m independent selections.
• Each element has probability 1/N of selection on each draw.
• On the i-th draw (i = 1, · · · , m), unit k_i is selected and replaced.
• Probability that unit k is drawn r times:
  (m choose r)(1/N)^r (1 − 1/N)^{m−r}
• First-order inclusion probability:
  π_k = 1 − (1 − 1/N)^m
• Second-order inclusion probability:
  π_kl = 1 − 2(1 − 1/N)^m + (1 − 2/N)^m, k ≠ l
• The HT estimator is not useful because the actual sample size (excluding duplicates) is random.
How to select a PPS sample with n = 1?
1. Cumulative total method
   [Step 1] Set T_0 = 0 and compute T_k = T_{k−1} + x_k, k = 1, 2, · · · , N.
   [Step 2] Draw ϵ ~ Unif(0, 1). If ϵ ∈ (T_{k−1}/T_N, T_k/T_N ], element k is selected.
   Very popular. It needs a list of all x_k in the population.
2. Lahiri's method
   [Step 0] Choose M ≥ max{x_1, x_2, · · · , x_N}. Set r = 1.
   [Step 1] Draw k_r by SRS from {1, 2, · · · , N}.
   [Step 2] Draw ϵ_r ~ Unif(0, 1).
   [Step 3] If ϵ_r ≤ x_{k_r}/M, then select element k_r and stop. Otherwise, reject k_r, set r = r + 1, and go to Step 1.
   The basic idea is the rejection algorithm due to von Neumann.
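Both methods in a minimal Python sketch (zero-based indices; a sketch, not a production implementation):

import random

def cumulative_total(x, rng=random):
    """Select one unit with probability proportional to x (cumulative total method)."""
    u = rng.random() * sum(x)          # epsilon * T_N
    T = 0.0
    for k, xk in enumerate(x):
        T += xk                        # T_k = T_{k-1} + x_k
        if u <= T:
            return k

def lahiri(x, rng=random):
    """Lahiri's rejection method: needs only a bound M >= max(x), not the total."""
    M = max(x)
    while True:
        k = rng.randrange(len(x))      # [Step 1] SRS draw from {0, ..., N-1}
        if rng.random() <= x[k] / M:   # [Step 3] accept with probability x_k / M
            return k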
In with-replacement sampling, the order of the sample selection matters.
• Ordered sample:
  OS = (k_1, k_2, · · · , k_m)
  where k_i is the index of the element obtained in the i-th with-replacement draw.
• Sample: S = {k : k = k_i for some i, i = 1, 2, · · · , m}
• Unequal probability sampling with replacement: consider p_1, p_2, · · · , p_N > 0 such that ∑_{i=1}^N p_i = 1. We can construct p_k from x_k (MOS) by p_k = x_k/∑_{i=1}^N x_i. On the i-th draw, label k is selected with probability p_k. That is, Pr(k_i = k) = p_k. Note that
  π_k = Pr(k ∈ S) = 1 − Pr(k ∉ S) = 1 − (1 − p_k)^m
  1. For m = 1, π_k = p_k.
  2. For m > 1 and small p_k's, π_k ≈ m p_k.
• Estimation of t = ∑_{i=1}^N y_i:
  1. First, define
     Z_i = y_{k_i}/p_{k_i} = ∑_{k=1}^N (y_k/p_k) I(k_i = k).
     Note that Z_1, · · · , Z_m are independent random variables since the m draws are independent.
  2. Z_1, · · · , Z_m are identically distributed since the same probabilities are used at each draw, with E(Z_i) = t and V(Z_i) = ∑_{k=1}^N (y_k/p_k − t)² p_k ≡ V_1.
  3. Thus, Z_1, · · · , Z_m are IID with mean t and variance V_1. Use z̄ = ∑_{k=1}^m Z_k/m to estimate t.
  4. Hansen-Hurwitz estimator:
     t̂_pwr ≡ ∑_{k=1}^m Z_k/m
     where Z_i = y_k/p_k if k_i = k.
  5. Properties
     (a) Unbiased estimator of t.
     (b) V(t̂_pwr) = V_1/m by the standard IID result.
     (c) An unbiased estimator of V(t̂_pwr) is
         V̂(t̂_pwr) = {1/m}{1/(m − 1)} ∑_{i=1}^m (z_i − z̄)²
     (d) For large m, t̂_pwr is AN(t, V_1/m).
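A compact Python sketch of the Hansen-Hurwitz estimator and its variance estimator (illustrative, zero-based indices):

import random

def hansen_hurwitz(y, p, m, rng=random):
    """PPS with-replacement sample of m draws; returns (t_pwr, v_hat)."""
    z = []
    for _ in range(m):
        k = rng.choices(range(len(y)), weights=p)[0]   # Pr(k_i = k) = p_k
        z.append(y[k] / p[k])                           # Z_i = y_k / p_k
    zbar = sum(z) / m                                   # unbiased for t
    vhat = sum((zi - zbar) ** 2 for zi in z) / (m * (m - 1))
    return zbar, vhat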
3.4 πps sampling
• Ideally,
  1. The actual selection of the sample is relatively simple.
  2. The first-order inclusion probabilities π_k are strictly proportional to x_k.
  3. The second-order inclusion probabilities satisfy π_kl > 0 for all k ≠ l (measurable sampling design).
  4. The π_kl can be computed without very heavy calculations.
  5. Δ_kl = π_kl − π_k π_l < 0 for all k ≠ l, to guarantee that the SYG variance estimator is always nonnegative.
• Motivation:
  – PPS sampling satisfies the above conditions but it can have duplicated sample elements → inefficient.
  – We want a fixed-size sampling design with π_k ∝ x_k, where x_k > 0 and known. It is called a πps design.
• Remark: for a fixed-size design, π_k ∝ x_k and ∑_{i=1}^N π_i = n lead to
  π_k = n x_k/∑_{i=1}^N x_i,
  which can be contradictory to the requirement that π_k → 1 as n → N. Also, π_k can be greater than 1 if x_k is extremely large.
πps sampling with n = 2:
• Notation
  – θ_i: the probability of selecting i in the first draw.
  – θ_{j|i}: the probability of selecting j in the second draw given that i was selected in the first draw.
• Inclusion probabilities
  – Second-order inclusion probability (for i ≠ j):
    π_ij = θ_i θ_{j|i} + θ_j θ_{i|j}
  – First-order inclusion probability: since ∑_{j≠i} π_ij = π_i,
    π_i = θ_i + ∑_{j≠i} θ_j θ_{i|j}
  – Restriction on θ_i and θ_{j|i}:
    θ_i + ∑_{j≠i} θ_j θ_{i|j} = 2 p_i
    where p_i = x_i/∑_{k=1}^N x_k.
πps sampling with n = 2:
• Brewer (1963) method: use
  θ_i ∝ p_i(1 − p_i)/(1 − 2p_i)
  and
  θ_{j|i} ∝ p_j
• Durbin (1967) method: use
  θ_i ∝ p_i
  and
  θ_{j|i} ∝ p_j {1/(1 − 2p_i) + 1/(1 − 2p_j)}
• The two methods produce the same inclusion probabilities:
  π_ij = {2 p_i p_j/(1 + K)}{1/(1 − 2p_i) + 1/(1 − 2p_j)}
  where K = ∑_{i=1}^N p_i/(1 − 2p_i). Thus,
  π_i = ∑_{j≠i} π_ij = 2 p_i.
3.5 Systematic sampling
• Setup:
  1. Have N elements in a list.
  2. Choose a positive integer a, called the sampling interval. Let n = [N/a]. That is, N = na + c, where c is an integer with 0 ≤ c < a.
  3. Select a random start r from {1, 2, · · · , a} with equal probability.
  4. The final sample is
     S = {r, r + a, r + 2a, · · · , r + (n − 1)a} if c < r ≤ a
     S = {r, r + a, r + 2a, · · · , r + na} if 1 ≤ r ≤ c.
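The selection itself is a one-liner; a Python sketch (labels 1..N):

import random

def systematic_sample(N, a, rng=random):
    """Systematic sample with interval a; returns the selected labels 1..N."""
    r = rng.randint(1, a)              # random start r in {1, ..., a}
    return list(range(r, N + 1, a))    # r, r+a, r+2a, ...; size is n or n+1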
• The sample size can be random:
  n_S = n if c < r ≤ a, and n + 1 if r ≤ c.
• Inclusion probabilities:
  π_k = 1/a for every k ∈ U
  π_kl = 1/a if k and l belong to the same systematic sample, and 0 otherwise.
Remark
• This is very easy to do.
• This is a probability sampling design.
• This is not a measurable sampling design: there is no design-unbiased estimator of the variance (because there is only one random draw).
• Pick one set of elements (which always go together) and measure each one: later, we will call this cluster sampling.
• Divide the population into non-overlapping groups and choose an element in each group: closely related to stratification.
Estimation
• Partition the population into a groups:
  U = S_1 ∪ S_2 ∪ · · · ∪ S_a
  where the S_r are disjoint.
• Population total:
  t = ∑_{i∈U} y_i = ∑_{r=1}^a ∑_{k∈S_r} y_k = ∑_{r=1}^a t_{S_r}
• Think of a finite population with a elements with measurements t_{S_1}, · · · , t_{S_a}.
• HT estimator:
  t̂_HT = t_{S_r}/(1/a) = a t_{S_r}
• Variance: note that we are doing SRS of size 1 from the population of a elements {t_{S_1}, · · · , t_{S_a}}:
  Var(t̂_HT) = (a²/1)(1 − 1/a) S²_t
  where
  S²_t = {1/(a − 1)} ∑_{r=1}^a (t_{S_r} − t̄)²
  and t̄ = ∑_{r=1}^a t_{S_r}/a = t/a.
• When is the variance small?
Estimation - Continued
• Now, assuming N = na,
  V(t̂_HT) = a(a − 1) S²_t = n² a ∑_{r=1}^a (ȳ_{S_r} − ȳ_U)²
  where ȳ_{S_r} = t_{S_r}/n and ȳ_U = t̄/n.
• ANOVA:
  SST = ∑_{k∈U} (y_k − ȳ_U)²
      = ∑_{r=1}^a ∑_{k∈S_r} (y_k − ȳ_{S_r})² + n ∑_{r=1}^a (ȳ_{S_r} − ȳ_U)²
      = SSW + SSB
  Then,
  V(t̂_HT) = na · SSB = N · SSB = N(SST − SSW).
• If SSB is small, then the ȳ_{S_r} are more alike and V(t̂_HT) is small.
• If SSW is small, then V(t̂_HT) is large.
• The intraclass correlation coefficient ρ measures the homogeneity of the clusters:
  ρ = 1 − {n/(n − 1)}(SSW/SST)
  More details about ρ will be covered under cluster sampling.
Comparison between systematic sampling (SY) and SRS
• How does SY compare to SRS when the population is sorted in the following ways?
  1. Random ordering: intuitively they should be the same.
  2. Linear ordering: SY should be better than SRS.
  3. Periodic ordering: if the period = a, SY can be terrible.
  4. Autocorrelated order: successive y_k's tend to lie on the same side of ȳ_U. Thus, SY should be better than SRS.
• How to quantify this?
  V_SRS(t̂_HT) = (N²/n)(1 − n/N){1/(N − 1)} ∑_{k=1}^N (y_k − Ȳ_N)²
  V_SY(t̂_HT) = n² a ∑_{r=1}^a (ȳ_{S_r} − ȳ_U)²
  Cochran (1946) introduced the superpopulation model to deal with this problem (treat y_k as a random variable).
• Example: superpopulation model for a population in random order. Denote the model by ζ: y_k ~ iid (µ, σ²). Then
  E_ζ{V_SRS(t̂_HT)} = (N²/n)(1 − n/N) σ²
  E_ζ{V_SY(t̂_HT)} = (N²/n)(1 − n/N) σ²
  Thus, the model expectations of the design variances are the same under the IID model.
3.6 Stratified sampling
• Stratified sampling:
  1. The finite population is stratified into H subpopulations:
     U = U_1 ∪ · · · ∪ U_H
  2. Within each subpopulation (or stratum), samples are drawn independently across the strata:
     Pr(i ∈ S_h, j ∈ S_g) = Pr(i ∈ S_h) Pr(j ∈ S_g) for h ≠ g,
     where S_h is the index set of the sample in stratum h, h = 1, 2, · · · , H.
• Why stratification?
  1. Control for domains of study
  2. Flexibility in design and estimation
  3. Convenience
  4. Efficiency
• HT estimation for t = ∑_{h=1}^H t_h, where t_h = ∑_{i∈U_h} y_i:
  1. HT estimator:
     t̂_HT = ∑_{h=1}^H t̂_{h,HT}
     where t̂_{h,HT} is unbiased for t_h.
  2. Variance:
     Var(t̂_HT) = ∑_{h=1}^H Var(t̂_{h,HT}) by independence.
  3. Variance estimation:
     V̂(t̂_HT) = ∑_{h=1}^H V̂_h(t̂_{h,HT})
     where V̂_h(t̂_{h,HT}) is unbiased for Var(t̂_{h,HT}).
• Example: stratified SRS
  1. HT estimator:
     t̂_HT = ∑_{h=1}^H N_h ȳ_h
     where ȳ_h = n_h^{−1} ∑_{i∈S_h} y_i.
  2. Variance:
     Var(t̂_HT) = ∑_{h=1}^H (N²_h/n_h)(1 − n_h/N_h) S²_h
     where S²_h = (N_h − 1)^{−1} ∑_{i∈U_h} (y_i − Ȳ_h)².
  3. Variance estimation:
     V̂(t̂_HT) = ∑_{h=1}^H (N²_h/n_h)(1 − n_h/N_h) s²_h
     where s²_h = (n_h − 1)^{−1} ∑_{i∈S_h} (y_i − ȳ_h)².
• Sample allocation: given n = ∑_{h=1}^H n_h, how should we choose the n_h? (A numerical sketch follows this list.)
  1. Proportional allocation: choose n_h ∝ N_h.
  2. Optimal allocation: choose n_h to
     minimize Var(t̂_HT) subject to ∑_{h=1}^H c_h n_h = C,
     where c_h is the cost of observing an element in stratum h and C is a given total cost. The solution (Neyman, 1934) is
     n_h ∝ N_h S_h/√c_h
  3. Properties
     – Under proportional allocation, the weights are all equal.
     – In general,
       V_opt(t̂_HT) ≤ V_prop(t̂_HT) ≤ V_SRS(t̂_HT)
       where V_opt(t̂_HT) is the variance of the stratified estimator under optimal allocation, V_prop(t̂_HT) is the variance of the stratified estimator under proportional allocation, and V_SRS(t̂_HT) is the variance of the SRS estimator.
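The promised sketch of Neyman allocation in Python (illustrative strata, equal costs by default):

def neyman_allocation(n, N, S, c=None):
    """n_h proportional to N_h * S_h / sqrt(c_h); equal costs by default."""
    c = c or [1.0] * len(N)
    w = [Nh * Sh / ch ** 0.5 for Nh, Sh, ch in zip(N, S, c)]
    alloc = [n * wh / sum(w) for wh in w]
    return [round(a) for a in alloc]   # rounding may change the total slightly

print(neyman_allocation(100, N=[500, 300, 200], S=[2.0, 6.0, 10.0]))
# [21, 38, 42]: the small but highly variable stratum gets a large share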
Method of collapsed strata
• If n_h ≡ 1, there is no unbiased estimator of Var(t̂_HT) under stratified sampling.
• Form pairs of strata:
  t̂_1, · · · , t̂_H → (t̂_{j1}, t̂_{j2}), j = 1, 2, · · · , H/2
  where H is even.
• Variance estimator:
  V̂_coll = ∑_{j=1}^{H/2} (t̂_{j1} − t̂_{j2})²
• Property:
  E(V̂_coll) = E[∑_{j=1}^{H/2} {(t̂_{j1} − t_{j1}) − (t̂_{j2} − t_{j2}) + (t_{j1} − t_{j2})}²]
             = ∑_{j=1}^{H/2} {Var(t̂_{j1}) + Var(t̂_{j2}) + (t_{j1} − t_{j2})²}
             = ∑_{h=1}^H Var(t̂_h) + ∑_{j=1}^{H/2} (t_{j1} − t_{j2})²
             ≥ Var(t̂_HT)
  Thus, it is a conservative variance estimator.
3.7 Systematic πps
• Let I = ∑_{i=1}^N x_i/n. Assume x_k < I for all k ∈ U.
• Systematic πps sampling:
  1. Choose R ~ Unif(0, I].
  2. Unit k is selected iff
     ∑_{j=1}^{k−1} x_j < R + l · I ≤ ∑_{j=1}^k x_j
     for some l = 0, 1, · · · , n − 1.
• Inclusion probability:
  Pr(k ∈ S) = Pr(∑_{j=1}^{k−1} x_j < R + l · I ≤ ∑_{j=1}^k x_j for some l) = n x_k/∑_{k∈U} x_k.
• Advantages:
  – Easy to implement.
  – If x_k < I for all k, you get πps.
  – Automatic stratification, as in systematic sampling.
• Disadvantages:
  – No design-unbiased variance estimator.
  – If x_k ≥ I for some k, it is not strictly πps. => Select these units with probability 1.
Chapter 4
Cluster sampling
4.1 Introduction
• Setup:
1. Frame list clusters (disjoint groups of population elements)
2. Select clusters by a probability sampling.
• Why cluster sampling?
1. Frame inadequacy: No way to get a list of elements (very expen-
sive) but relatively easy or cheap to list clusters
2. Convenience: By grouping elements into “close” subgroups, you
can save time or money
• Single stage cluster sampling: Sample clusters and observe all elements
in each selected cluster.
• Two-stage sampling:
[Stage 1] Population is divided into clusters, called Primary Sampling
Units (PSU’s). A probability sample of PSU’s is drawn.
[Stage 2] Each selected PSU is divided into clusters or elements,
called Secondary Sampling Units (SSU’s). A probability sample
of SSU’s is drawn in each selected PSU.
If SSU=cluster, it is called two-stage cluster sampling.
If SSU=element, it is called two-stage element sampling
• Multi-stage sampling: PSU, SSU, ..., USU (Ultimate Sampling Unit)
If USU=cluster, it is called multi-stage cluster sampling.
If USU=element, it is called multi-stage element sampling.
In multi-stage sampling we typically do not know N, so estimation of the population mean ȳ_U is more difficult.
4.2 Single-Stage Cluster Sampling
• Notation (population)
  – U_I = {1, · · · , N_I}: index set of clusters in the population
  – U_i: the set of elements in the i-th cluster, of size M_i (i = 1, 2, · · · , N_I)
  – y_ij: measurement of item y for the j-th element (j = 1, 2, · · · , M_i) in cluster i, i = 1, 2, · · · , N_I
  – Population total: t = ∑_{i=1}^{N_I} ∑_{j=1}^{M_i} y_ij = ∑_{i=1}^{N_I} t_i = ∑_{i=1}^{N_I} M_i Ȳ_i, where t_i = ∑_{j=1}^{M_i} y_ij = M_i Ȳ_i
  – Population size: N = ∑_{i=1}^{N_I} M_i
• Notation (sample)
  – S_I: index set of clusters in the sample
  – n_I = |S_I|: the number of sampled clusters
  – S = ∪_{i∈S_I} U_i: index set of elements in the sample
  – n_S = |S| = ∑_{i∈S_I} M_i: the number of sampled elements
  Usually, n_S is not fixed even if n_I is fixed.
• Single-stage cluster sampling:
  1. Draw a probability sample S_I from U_I via p_I(·).
  2. Observe every element in each selected cluster.
• Cluster inclusion probabilities:
  π_Ii = Pr(i ∈ S_I) = ∑_{S_I: i∈S_I} p_I(S_I)
  π_Iij = Pr(i, j ∈ S_I) = ∑_{S_I: i,j∈S_I} p_I(S_I)
• Element inclusion probabilities:
  π_k = Pr(k ∈ S) = Pr(i ∈ S_I) = π_Ii, where k ∈ U_i
  π_kl = Pr(k, l ∈ S) = π_Ii if k, l ∈ U_i; π_Iij if k ∈ U_i, l ∈ U_j (i ≠ j)
• HT estimation
  1. Point estimator:
     t̂_HT = ∑_{i∈S_I} t_i/π_Ii = ∑_{i∈U_I} t_i I_Ii/π_Ii
  2. Variance:
     Var(t̂_HT) = ∑_{i∈U_I} ∑_{j∈U_I} (t_i/π_Ii)(t_j/π_Ij)(π_Iij − π_Ii π_Ij)
  3. Variance estimation:
     V̂(t̂_HT) = ∑_{i∈S_I} ∑_{j∈S_I} (t_i/π_Ii)(t_j/π_Ij)(Δ_Iij/π_Iij)
     provided π_Iij > 0, where Δ_Iij = π_Iij − π_Ii π_Ij.
Remark
For a fixed-size design (n_{S_I} = n_I),
Var(t̂_HT) = −(1/2) ∑_{i∈U_I} ∑_{j∈U_I} Δ_Iij (t_i/π_Ii − t_j/π_Ij)²
1. If π_Ii ∝ t_i, then t̂_HT = t.
2. If π_Ii ∝ M_i and the cluster means ȳ_{U_i} are constant, then t̂_HT = t.
3. An equal-probability sampling design is generally inefficient (unless M_i ∝ Ȳ_i^{−1}, i.e., the cluster totals are roughly constant).
Example: Simple random cluster sampling (SIC)
• p_I(·): SRS of n_I clusters from N_I
• HT estimation:
  t̂_HT = ∑_{i∈S_I} t_i/π_Ii = (N_I/n_I) ∑_{i∈S_I} t_i = N_I t̄_{S_I}
• Variance:
  Var(t̂_HT) = (N²_I/n_I)(1 − n_I/N_I) S²_t
  where
  S²_t = {1/(N_I − 1)} ∑_{i∈U_I} (t_i − t̄_{U_I})²
Alternative estimation under unequal M_i
• From each sampled cluster i, we observe (M_i, Ȳ_i).
• If we know N = ∑_{i∈U_I} M_i, we can use this information to improve t̂_HT = ∑_{i∈S_I} t_i/π_Ii (the ratio estimation idea):
  t̂_R = N (∑_{i∈S_I} t_i/π_Ii)/(∑_{i∈S_I} M_i/π_Ii)
• Under SIC, for example, the variances are
  V(t̂_HT) = (N²_I/n_I)(1 − n_I/N_I){1/(N_I − 1)} ∑_{i=1}^{N_I} (t_i − t̄_U)²
  and
  V(t̂_R) ≈ (N²_I/n_I)(1 − n_I/N_I){1/(N_I − 1)} ∑_{i=1}^{N_I} (t_i − M_i Ȳ_U)²
Special case of M_i = M under SIC
• Estimation of the mean:
  Ȳ̂_U = t̂_HT/(N_I M) = (1/n_I)(1/M) ∑_{i∈S_I} ∑_{j=1}^M y_ij = (1/n_I) ∑_{i∈S_I} Ȳ_i
• Variance:
  Var(Ȳ̂_U) = (1/n_I)(1 − n_I/N_I){1/(N_I − 1)} ∑_{i=1}^{N_I} (Ȳ_i − Ȳ_U)²
  where Ȳ_i = ∑_{j=1}^M y_ij/M and Ȳ_U = t/(N_I M) = ∑_{i=1}^{N_I} Ȳ_i/N_I.
• ANOVA:

  Source            D.F.         Sum of Squares   Mean S.S.
  Between clusters  N_I − 1      SSB              S²_b
  Within clusters   N_I(M − 1)   SSW              S²_w
  Total             N_I M − 1    SST              S²

  where
  SSB = ∑_{i=1}^{N_I} M (Ȳ_i − Ȳ_U)²
  SSW = ∑_{i=1}^{N_I} ∑_{j=1}^M (y_ij − Ȳ_i)²
  SST = ∑_{i=1}^{N_I} ∑_{j=1}^M (y_ij − Ȳ_U)²
  and MSS = SS/(d.f.). Note that
  S² = {(N_I − 1) S²_b + N_I(M − 1) S²_w}/(N_I M − 1) ≅ {S²_b + (M − 1) S²_w}/M.
  The variance is
  Var(Ȳ̂_U) = {1/(n_I M)}(1 − n_I/N_I) S²_b
Design effect
• We want to compare the current sampling design p(·) with SRS of equal sample size.
• Kish (1965) introduced the design effect:
  deff(p, t̂_HT) = V_p(t̂_HT)/V_SRS(t̂_HT)
• Two uses:
  1. Compare designs:
     – If deff > 1, then p(·) is less efficient than SRS.
     – If deff < 1, then p(·) is more efficient than SRS.
  2. Determine the sample size:
     (a) Have some desired variance V*.
     (b) Under SRS, you can easily find the required sample size n*.
     (c) Choose n*_p = deff · n*.
     Then,
     V_p(t̂_HT | n*_p) = V*.
• n* is often called the effective sample size. It is the sample size required for the given V* if the sample design is SRS.
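A two-line sketch of the sample size use, with an illustrative cluster design whose deff is 1 + (M − 1)ρ (see the next page):

import math

def sample_size_for_design(deff, n_star):
    """n*_p = deff * n*, where n* is the effective (SRS) sample size for V*."""
    return math.ceil(deff * n_star)

# e.g. a cluster design with deff = 1 + (M - 1) rho = 1 + 9 * 0.05 = 1.45
print(sample_size_for_design(1.45, n_star=400))   # 580 elements needed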
Intracluster correlation coefficient
• A measure of within-cluster homogeneity.
• Assume M_i = M.
• Intracluster correlation coefficient:
  ρ = Cov(y_ij, y_ik | j ≠ k)/{√V(y_ij) √V(y_ik)}
    = {∑_{i=1}^{N_I} ∑_{j≠k} (y_ij − Ȳ)(y_ik − Ȳ)/(N_I M(M − 1))} / {∑_{i=1}^{N_I} ∑_{j=1}^M (y_ij − Ȳ)²/(N_I M)}
• Properties:
  1. ρ = 1 − {M/(M − 1)}(SSW/SST)
  2. Since 0 ≤ SSW ≤ SST,
     −1/(M − 1) ≤ ρ ≤ 1.
     For SSW = 0, ρ = 1: perfect homogeneity within clusters.
     For SSW = SST, ρ = −1/(M − 1): perfect heterogeneity within clusters. Each cluster is like the whole population in terms of variability.
• Variance:
  V_SIC(Ȳ̂) = V_SRS(Ȳ̂){1 + (M − 1)ρ}
  Thus,
  deff = 1 + (M − 1)ρ
Homogeneity coefficient
• A measure of within-cluster homogeneity:
  δ = 1 − {∑_{i∈U_I} ∑_{j∈U_i} (y_ij − Ȳ_i)²/(N − N_I)} / {∑_{i∈U_I} ∑_{j∈U_i} (y_ij − Ȳ_U)²/(N − 1)}
    = 1 − {SSW/(N − N_I)}/{SST/(N − 1)}
• For SSW = 0, δ = 1: perfect homogeneity within clusters. For SSW = SST, δ = −(N_I − 1)/(N − N_I): perfect heterogeneity within clusters.
• In practice, δ tends to be positive.
• Using δ, we can compute the deff for SIC.
  – Let M̄ = ∑_{i=1}^{N_I} M_i/N_I = N/N_I and let
    Cov = {1/(N_I − 1)} ∑_{i=1}^{N_I} (M_i − M̄) M_i Ȳ²_i
    be the finite population covariance between M_i and M_i Ȳ²_i. Note that, if M_i ≡ M̄, then Cov = 0. Also, if Ȳ²_i is constant, then Cov reduces to Ȳ² times the finite population variance of the cluster sizes M_i.
  – Some algebra shows that
    V_SIC(t̂_HT) = (N²_I/n_I)(1 − n_I/N_I) S²_t = {1 + (N − N_I)/(N_I − 1) δ} M̄ S²_y K_I + Cov · K_I
    where K_I = (N²_I/n_I)(1 − n_I/N_I).
• To make a fair comparison with SRS, we need the expected number of elements (not clusters) under SIC:
  E_SIC(n_S) = E_SIC(∑_{i∈S_I} M_i) = n_I M̄
• Using n = n_I M̄,
  V_SRS(t̂_HT) = {N²/(n_I M̄)}(1 − n_I M̄/N) S²_y = (N/N_I) S²_y (N²_I/n_I)(1 − n_I/N_I) = M̄ S²_y K_I
• Thus,
  deff = V_SIC(t̂_HT)/V_SRS(t̂_HT) = {1 + (N − N_I)/(N_I − 1) δ} + Cov/(M̄ S²_y)
• Two sources leading to deff > 1:
  1. δ > 0 (homogeneity) reduces efficiency. N may be much bigger than N_I, so even a small positive δ can have a big impact.
  2. Cov > 0 reduces efficiency. (Variability of the cluster sizes reduces efficiency.)
4.3 Two-stage Sampling
• Setup:
  1. Stage 1: Draw S_I ⊂ U_I via p_I(·).
  2. Stage 2: For every i ∈ S_I, draw S_i ⊂ U_i via p_i(· | S_I).
  Sample of elements: S = ∪_{i∈S_I} S_i
• Some simplifying assumptions:
  1. Invariance of the second-stage design: p_i(· | S_I) = p_i(·) for every i ∈ U_I and for every S_I such that i ∈ S_I.
  2. Independence of the second-stage design:
     P(S = ∪_{i∈S_I} s_i | S_I) = ∏_{i∈S_I} Pr(S_i = s_i | S_I)
• Remark: a non-invariant design is a two-phase sampling design.
  1. Phase 1: Select a sample and observe x_i.
  2. Phase 2: Based on the observed values of x_i, the second-phase sampling design is determined. The second-phase sample is selected by the second-phase sampling design.
• Notation: sample sizes
  – n_{S_I}: number of PSUs in the sample. If the first-stage sampling is a fixed-size sampling design, then n_{S_I} = n_I.
  – m_{S_i}: number of sampled elements in S_i. If the second-stage sampling is a fixed-size sampling design, then m_{S_i} = m_i.
  – ∑_{i∈S_I} m_{S_i} = |S|: the number of sampled elements.
• Notation: inclusion probabilities
  – Cluster inclusion probabilities: π_Ii and π_Iij (same as in single-stage cluster sampling).
  – Conditional inclusion probabilities:
    π_{k|i} = Pr[k ∈ S_i | i ∈ S_I]
    π_{kl|i} = Pr[k, l ∈ S_i | i ∈ S_I]
    Δ_{kl|i} = π_{kl|i} − π_{k|i} π_{l|i}
    In general, π_{k|i} is a random variable (in the sense that it is a function of S_I). Under invariance, it is fixed.
  – Element inclusion probabilities:
    ∗ First-order inclusion probability:
      π_k = Pr[k ∈ S] = Pr(k ∈ S_i | i ∈ S_I) Pr(i ∈ S_I) = π_{k|i} π_Ii if k ∈ U_i.
    ∗ Second-order inclusion probability:
      π_kl = π_Ii π_{k|i} if k = l ∈ U_i
      π_kl = π_Ii π_{kl|i} if k, l ∈ U_i, k ≠ l
      π_kl = π_Iij π_{k|i} π_{l|j} if k ∈ U_i, l ∈ U_j (i ≠ j)
• HT estimation for t = ∑_{i∈U_I} t_i = ∑_{i∈U_I} ∑_{k∈U_i} y_k:
  t̂_HT = ∑_{i∈S_I} t̂_i/π_Ii = ∑_{i∈S_I} ∑_{k∈S_i} y_k/(π_{k|i} π_Ii)
• Properties of t̂_HT:
  1. Unbiased.
  2. Variance:
     V(t̂_HT) = V_PSU + V_SSU
     where
     V_PSU = ∑_{i∈U_I} ∑_{j∈U_I} Δ_Iij (t_i/π_Ii)(t_j/π_Ij)
     V_SSU = ∑_{i∈U_I} V_i/π_Ii
     with
     V_i = V(t̂_i | S_I) = ∑_{k∈U_i} ∑_{l∈U_i} Δ_{kl|i} (y_k/π_{k|i})(y_l/π_{l|i}).
• Remark:
  1. If S_I = U_I, then the design is stratified sampling. Note that π_Ii = 1, π_Iij = 1, and Δ_Iij = 0 for all i, j. Thus, V(t̂_HT) = ∑_{i∈U_I} V_i.
  2. If S_i = U_i for every i ∈ S_I, then the design is single-stage cluster sampling and V(t̂_HT) = V_PSU.
• Variance estimation:
  V̂(t̂_HT) = V̂_PSU + V̂_SSU = ∑_{i∈S_I} ∑_{j∈S_I} (Δ_Iij/π_Iij)(t̂_i/π_Ii)(t̂_j/π_Ij) + ∑_{i∈S_I} V̂_i/π_Ii,
  where
  V̂_PSU = ∑_{i∈S_I} ∑_{j∈S_I} (Δ_Iij/π_Iij)(t̂_i/π_Ii)(t̂_j/π_Ij) − ∑_{i∈S_I} (1/π_Ii)(1/π_Ii − 1) V̂_i
  V̂_SSU = ∑_{i∈S_I} V̂_i/π²_Ii
  and V̂_i satisfies E(V̂_i | S_I) = V(t̂_i | S_I). Here, we used the fact that
  E(t̂_i t̂_j | S_I) = t_i t_j if i ≠ j, and V_i + t²_i if i = j,
  by the independence of the second-stage sampling across the clusters.
• Often, ∑_{i∈S_I} V̂_i/π_Ii is ignored (if n_I/N_I ≈ 0).
Example 1: Two-stage SRS cluster sampling with equal sizes M_i = M
• Sampling design:
  1. Stage 1: Select an SRS of n_I clusters from the population of N_I clusters.
  2. Stage 2: Select an SRS of m elements from the M_i = M elements of each selected cluster.
• HT estimator of Ȳ = ∑_{i=1}^{N_I} ∑_{j=1}^M y_ij/(N_I M):
  Ȳ̂ = {1/(n_I m)} ∑_{i∈S_I} ∑_{j∈S_i} y_ij
• Variance:
  V(Ȳ̂) = (1 − n_I/N_I) S²_b/(n_I M) + (1 − m/M) S²_w/(n_I m)
  Or,
  V(Ȳ̂) = (1 − n_I/N_I) S²_1/n_I + (1 − m/M) S²_2/(n_I m)
  where
  S²_1 = {1/(N_I − 1)} ∑_{i=1}^{N_I} (Ȳ_i − Ȳ)² = S²_b/M
  and
  S²_2 = S²_w = {1/(N_I(M − 1))} ∑_{i=1}^{N_I} ∑_{j=1}^M (y_ij − Ȳ_i)².
• Variance components:
  S²_b = S²{1 + (M − 1)ρ}
  S²_w = S²(1 − ρ)
• Ignoring f_1 = n_I/N_I,
  V(Ȳ̂) ≈ {S²/(n_I m)}{1 + (m − 1)ρ}
  Thus, the design effect = 1 + (m − 1)ρ.
• Sample size determination: minimize
  V(Ȳ̂) = (1/n_I){(S²_1 − S²_2/M) + S²_2/m} + constant
  subject to C = c_1 n_I + c_2 n_I m:
  m*_opt = √(c_1/c_2) · S_2/√(S²_1 − S²_2/M)
• Variance estimation:
  V̂(Ȳ̂) = (1 − n_I/N_I) s²_1/n_I + (n_I/N_I)(1 − m/M) s²_2/(n_I m)
  where
  s²_1 = (n_I − 1)^{−1} ∑_{i∈S_I} (ȳ_i − Ȳ̂)²
  s²_2 = n_I^{−1}(m − 1)^{−1} ∑_{i∈S_I} ∑_{j∈S_i} (y_ij − ȳ_i)²
  and ȳ_i = ∑_{j∈S_i} y_ij/m.
• If n_I/N_I ≈ 0, then V̂(Ȳ̂) = s²_1/n_I.
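The variance estimator above, as a Python sketch (equal-size two-stage SRS; data layout is an assumption for illustration):

def two_stage_var_est(samples, N_I, M):
    """samples: {cluster i: list of its m sampled y-values}; equal M_i = M."""
    n_I = len(samples)
    m = len(next(iter(samples.values())))
    means = {i: sum(v) / m for i, v in samples.items()}
    ybar = sum(means.values()) / n_I                    # the estimator of the mean
    s1 = sum((b - ybar) ** 2 for b in means.values()) / (n_I - 1)
    s2 = sum((yij - means[i]) ** 2
             for i, v in samples.items() for yij in v) / (n_I * (m - 1))
    return (1 - n_I / N_I) * s1 / n_I + (n_I / N_I) * (1 - m / M) * s2 / (n_I * m)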
Example 2: Two-stage PPS sampling
• Sampling design:
  1. Stage 1: PPS sampling of n_I clusters with MOS = M_i.
  2. Stage 2: SRS sampling of m elements within each selected cluster.
• Estimation of the mean:
  Ȳ̂_PPS = {1/(n_I m)} ∑_{i∈S_I} ∑_{j∈S_i} y_ij
  This is a self-weighting design: equal weights.
• Variance estimation:
  V̂(Ȳ̂_PPS) = s²_z/n_I
  where
  s²_z = {1/(n_I − 1)} ∑_{k=1}^{n_I} (z_k − z̄_n)²
  and z_k = t̂_i/M_i (the sample mean of the m elements) if cluster i is selected at the k-th PPS draw.
Chapter 5
Estimation
5.1 Introduction
• So far, we have discussed various sampling designs and their unbiased estimators.
• The HT estimator is used for each sampling design (except for PPS sampling). No claim of optimality was made.
Definition
For a parameter θ(y), y = (y_1, y_2, · · · , y_N)′, an estimator θ*(S) is UMVUE (uniformly minimum variance unbiased estimator) if
1. Unbiased: E_y{θ*(S)} = θ(y) for all y.
2. Minimum variance: V_y{θ*(S)} ≤ V_y{θ̂(S)} for every unbiased estimator θ̂(S) and for all y.
Remark
Uniformity is important: suppose that my estimator is θ̂ ≡ 12. If θ = 12, then θ̂ is unbiased and V(θ̂) = 0. That is, it is MVUE at θ = 12. But it is not UMVUE.
Proposition
For a noncensus design with π_k > 0 (k = 1, 2, · · · , N), no UMVUE of t = ∑_{i=1}^N y_i exists.
Proof
Suppose that there exists Q̂ which is UMVUE of t. Fix any y* = (y*_1, · · · , y*_N)′ ∈ R^N. Now, consider
Q*(S) = ∑_{k∈S} (y_k − y*_k)/π_k + ∑_{k=1}^N y*_k.
The new estimator Q*(S) satisfies:
1. It is unbiased.
2. The variance of Q*(S) is zero at y = y*.
Because Q̂ is UMVUE, V_y(Q̂) ≤ V_y(Q*). Since V_y(Q*) = 0 for y = y*, we have V_y(Q̂) = 0 for y = y*. Since y* can be arbitrary, we have V_y(Q̂) = 0 for all y, which means that Q̂ = t for all y; this is impossible for any noncensus design. Therefore, a UMVUE cannot exist.
Remark
1. In the proof of the proposition, Q*(S) is called the difference estimator. The variance of the difference estimator is
   V{Q*(S)} = ∑_{k∈U} ∑_{l∈U} Δ_kl {(y_k − y*_k)/π_k}{(y_l − y*_l)/π_l}.
   The variance is small if y_k ≅ y*_k.
2. The class of (design) unbiased estimators is too big. We cannot find the best one in this class.
3. If we define the class of linear estimators as
   t̂ = ∑_{k∈S} w_k y_k,
   where the w_k are constants fixed in advance, the HT estimator is the only unbiased estimator in this class.
4. We have the following alternative definition of a linear estimator:
   t̂ = ∑_{k∈S} w_k(S) y_k = ∑_{k∈S} w_kS y_k
   where the w_k(S) = w_kS are constants that depend on the realized sample. That is, the w_k(S) = w_kS are random variables.
5. One advantage of a linear estimator is that it is internally consistent. An estimator is internally consistent if
   t̂(y_1 + y_2) = t̂(y_1) + t̂(y_2),
   where t̂(y) is the estimator of the total of item y.
5.2 Large sample theory
Basic Setup
1. Define a sequence of finite populations:
   U_k = {1, 2, · · · , N_k}, k = 1, 2, · · ·
   where N_1 < N_2 < · · · and y_ki is the y-value of the i-th unit in the k-th population.
2. From each finite population U_k, select a sample S_k ⊂ U_k of size n_k. Assume n_k → ∞ and f_k = N_k^{−1} n_k → f as k → ∞.
Definition
1. θ̂_n is (design) consistent for the finite population parameter θ_N if, for every ϵ > 0,
   lim_{k→∞} Pr{|θ̂_{n_k} − θ_{N_k}| > ϵ} = 0.
   Or, more simply, we write
   lim_{n→∞} Pr{|θ̂_n − θ_N| > ϵ} = 0,
   where the distribution is the sampling distribution generated by repeated sampling of size n from the finite population.
2. X_n is bounded in probability by g_n (write X_n = O_p(g_n)) if, for every ϵ > 0, there exists a positive real number M_ϵ such that
   Pr{|X_n| > g_n M_ϵ} < ϵ
   for all n.
3. X_n is of smaller order in probability than g_n (write X_n = o_p(g_n)) if, for every ϵ > 0,
   lim_{n→∞} Pr{g_n^{−1}|X_n| > ϵ} = 0.
Taylor series linearization
• Taylor's theorem:
  Let X_n be a sequence of random variables such that
  X_n = a + O_p(r_n)
  where r_n → 0 as n → ∞. If g(x) is a function with s continuous derivatives at x = a, then
  g(X_n) = g(a) + ∑_{k=1}^{s−1} (1/k!) g^{(k)}(a)(X_n − a)^k + O_p(r^s_n)
  where g^{(k)}(a) is the k-th derivative of g(x) evaluated at x = a.
• For p-dimensional ȳ, if ȳ = Ȳ + O_p(n^{−1/2}), then
  g(ȳ) = g(Ȳ) + ∑_{j=1}^p {∂g(Ȳ)/∂y_j}(ȳ_jn − Ȳ_j) + O_p(n^{−1}).
5.3 Ratio estimation
• Basic setup:
  – Observe x (auxiliary variable) and y (study variable) in the sample.
  – We know X = ∑_{i=1}^N x_i or X̄ = N^{−1} ∑_{i=1}^N x_i in advance.
  – X̂_HT = ∑_{i∈S} π_i^{−1} x_i can be different from X.
• Ratio estimator:
  Ŷ_r = X (Ŷ_HT/X̂_HT) = X R̂
  Ȳ̂_r = X̄ (Ŷ_HT/X̂_HT) = X̄ R̂
• Algebraic properties:
  – Linear in y (thus it is internally consistent).
  – If X̂_HT < X, then Ŷ_HT < Ŷ_r.
  – If X̂_HT > X, then Ŷ_HT > Ŷ_r.
  – If y_i = x_i, then the ratio estimator equals X; i.e., ∑_{i∈S} w_i x_i = X for Ŷ_r = ∑_{i∈S} w_i y_i.
• Statistical properties - Bias
  – It is biased because E(R̂) ≠ R.
  – The bias of R̂ = Ŷ_HT/X̂_HT is called the ratio bias. That is, B(R̂) = E(R̂) − R is called the ratio bias.
  – Definition: the bias of θ̂ is negligible
    ⟺ R.B.(θ̂) = Bias(θ̂)/√{Var(θ̂)} → 0 as n → ∞.
    Note: if the bias of θ̂ is negligible, then
    (θ̂ − θ)/√{Var(θ̂)} = {θ̂ − E(θ̂)}/√{Var(θ̂)} + Bias(θ̂)/√{Var(θ̂)} → N(0, 1)
    by the CLT, and
    MSE(θ̂) = V(θ̂) + {Bias(θ̂)}² = V(θ̂)[1 + {R.B.(θ̂)}²] ≈ V(θ̂).
  – The ratio bias is negligible:
    Cov(R̂, X̂_HT)/X = E(R̂ X̂_HT)/X − E(R̂)E(X̂_HT)/X = Y/X − E(R̂) = −Bias(R̂).
    Thus,
    {R.B.(R̂)}² ≤ V(X̂_HT)/X² = {CV(X̂_HT)}² → 0
• Statistical properties - Variance
  – Taylor expansion:
    Ȳ̂_r = Ȳ + (Ȳ̂_HT − Ȳ) − R(X̄̂_HT − X̄) − X̄^{−1}{(X̄̂_HT − X̄)(Ȳ̂_HT − Ȳ) − R(X̄̂_HT − X̄)²} + o_p(n^{−1})
    where R = X̄^{−1}Ȳ.
  – Variance:
    V(Ŷ_r) ≈ V(∑_{i∈S} E_i/π_i)
    where E_i = y_i − R x_i.
• Variance estimation: use Ê_i = y_i − R̂ x_i in the HT (or SYG) variance estimator.
• Example: SRS
  V(Ŷ_r) ≈ (N²/n)(1 − n/N){1/(N − 1)} ∑_{i=1}^N (y_i − R x_i)²
  For variance estimation, use
  V̂(Ŷ_r) = (N²/n)(1 − n/N){1/(n − 1)} ∑_{i∈S} (y_i − R̂ x_i)².
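The SRS case above in a short Python sketch (a sketch under the formulas just stated, not a general-design implementation):

def ratio_estimate_srs(y, x, N, X):
    """Ratio estimator of the total of y and its variance estimate under SRS."""
    n = len(y)
    R_hat = sum(y) / sum(x)                               # estimated ratio
    Y_r = X * R_hat                                       # point estimate
    s2 = sum((yi - R_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    v = (N ** 2 / n) * (1 - n / N) * s2                   # residual-based variance
    return Y_r, v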
Applications of the ratio estimator
• Hajek estimator: the ratio estimator of the mean using x_i ≡ 1.
• Domain estimation: the parameter of interest can take the form of a ratio,
  Ȳ_d = ∑_{i=1}^N δ_i y_i/∑_{i=1}^N δ_i
  where δ_i = 1 if i ∈ D and δ_i = 0 if i ∉ D. Thus,
  Ȳ̂_d = ∑_{i∈S} π_i^{−1} δ_i y_i/∑_{i∈S} π_i^{−1} δ_i
  is an (approximately) unbiased estimator of Ȳ_d.
5.4 Regression estimation
• Basic setup:
  – Observe x_k = (x_{1k}, · · · , x_{Jk})′ (auxiliary variables) and y_k (study variable) in the sample.
  – We know X = ∑_{i=1}^N x_i or X̄ = N^{−1} ∑_{i=1}^N x_i in advance.
  – Interested in estimating t_y = ∑_{i=1}^N y_i.
• Motivation: use auxiliary information at the estimation stage.
• Use a regression approach:
  1. Suppose we have
     y°_k = ∑_{j=1}^J b_j x_{jk} = b′x_k, k = 1, 2, · · · , N,
     for some known J-dimensional vector b. The y°_k is a proxy for y_k.
  2. Difference estimator:
     t̂_{y,diff} = ∑_{i=1}^N y°_i + ∑_{i∈S} (y_i − y°_i)/π_i
     – Unbiased (regardless of the choice of y°_k).
     – The variance is small if y°_k ≅ y_k.
  3. How to choose y°_k = b′x_k? Let's estimate b from the sample.
  4. Regression estimator:
     t̂_{y,reg} = ∑_{i=1}^N ŷ_i + ∑_{i∈S} (y_i − ŷ_i)/π_i,
     where ŷ_i = b̂′x_i and b̂ is estimated from the sample using the (linear) regression model
     E_ζ(y_i) = x′_i b
     V_ζ(y_i) = σ².
     Note that b and σ² are superpopulation parameters.
  5. How to estimate b?
     (a) Note that, under a census, b could be estimated by solving
         U(b) ≡ ∑_{i=1}^N (y_i − b′x_i) x′_i = 0′.
     (b) Consider an unbiased estimator of U(b):
         Û(b) = ∑_{i∈S} (1/π_i)(y_i − b′x_i) x′_i
     (c) Obtain the solution b̂ by solving Û(b) = 0 for b. The solution is
         b̂ = (∑_{i∈S} x_i x′_i/π_i)^{−1} ∑_{i∈S} x_i y_i/π_i.
• Regression estimator:
  Ŷ_reg = Ŷ_HT + (X − X̂_HT)′b̂
  Ȳ̂_reg = Ȳ̂_HT + (X̄ − X̄̂_HT)′b̂
  where
  b̂ = (∑_{i∈S} π_i^{−1} x_i x′_i)^{−1} ∑_{i∈S} π_i^{−1} x_i y_i,
  and
  (X̄̂′_HT, Ȳ̂_HT) = (1/N)(X̂′_HT, Ŷ_HT) = (1/N) ∑_{i∈S} (x′_i, y_i)/π_i.
• Note that, if 1 is in the column space of x_i, we can write
  Ŷ_reg = ∑_{i=1}^N ŷ_i
  and
  Ȳ̂_reg = (1/N) ∑_{i=1}^N ŷ_i
  where ŷ_i = x′_i b̂.
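The estimator above in a numpy sketch (array shapes are an assumption for illustration):

import numpy as np

def reg_estimator(y, x, pi, X_total):
    """Regression estimator of the total of y.

    y: (n,) responses; x: (n, J) auxiliaries; pi: (n,) inclusion probabilities;
    X_total: (J,) known population totals of x."""
    d = 1.0 / pi                                    # design weights
    b = np.linalg.solve(x.T @ (d[:, None] * x),     # sum of d_i x_i x_i'
                        x.T @ (d * y))              # sum of d_i x_i y_i
    Y_ht = d @ y
    X_ht = d @ x
    return Y_ht + (X_total - X_ht) @ b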
Algebraic properties
• Linear in y:
  Ŷ_reg = ∑_{i∈S} (1/π_i) g_iS y_i
  where
  g_iS = 1 + (X − X̂_HT)′(∑_{i∈S} π_i^{−1} x_i x′_i)^{−1} x_i.
  Also,
  Ȳ̂_reg = (1/N) ∑_{i∈S} (1/π_i) g_iS y_i.
• Calibration property:
  ∑_{i∈S} (1/π_i) g_iS x_i = X. (5.1)
  The property (5.1) is also called the benchmarking property.
• If x_i = (1, x′_{i1})′, then
  Ȳ̂_reg = Ȳ̂_π + (X̄_1 − X̄̂_π1)′b̂_1
  and
  Ŷ_reg = N{Ȳ̂_π + (X̄_1 − X̄̂_π1)′b̂_1},
  where Ȳ̂_π and X̄̂_π1 are the Hajek estimators of the form
  (X̄̂′_π1, Ȳ̂_π) = (∑_{i∈S} π_i^{−1})^{−1} ∑_{i∈S} π_i^{−1}(x′_{i1}, y_i),
  b̂_1 = {∑_{i∈S} π_i^{−1}(x_{i1} − X̄̂_π1)(x_{i1} − X̄̂_π1)′}^{−1} ∑_{i∈S} π_i^{−1}(x_{i1} − X̄̂_π1) y_i,
  and X̄_1 = N^{−1} ∑_{i=1}^N x_{i1}.
• The weights in Ŷ_reg = ∑_{i∈S} w_i y_i can be derived by minimizing
  Q(w) = ∑_{i∈S} π_i (w_i − 1/π_i)²
  subject to (5.1).
Statistical properties
• Taylor expansion: define
  Ĉ = ∑_{k∈S} π_k^{−1} x_k x′_k, d̂ = ∑_{k∈S} π_k^{−1} x_k y_k
  and
  C = ∑_{k=1}^N x_k x′_k, d = ∑_{k=1}^N x_k y_k.
  Using
  b̂ = Ĉ^{−1} d̂ ≈ b + C^{−1}(d̂ − Ĉ b), (5.2)
  we have
  Ŷ_reg ≈ Ŷ_HT + (X − X̂_HT)′b + (X − X̂_HT)′(∑_{i=1}^N x_i x′_i)^{−1} ∑_{i∈S} π_i^{−1} x_i (y_i − x′_i b)
        ≈ Ŷ_HT + (X − X̂_HT)′b
  where b = (∑_{i=1}^N x_i x′_i)^{−1} ∑_{i=1}^N x_i y_i.
• Alternative expression:
  Ŷ_reg ≈ X′b + ∑_{i∈S} π_i^{−1}(y_i − x′_i b)
• Bias: negligible.
• Variance:
  Var(Ŷ_reg) ≈ Var{∑_{i∈S} π_i^{−1}(y_i − x′_i b)}
• Variance estimation:
  V̂(Ŷ_reg) = ∑_{i∈S} ∑_{j∈S} (Δ_ij/π_ij)(Ê_i/π_i)(Ê_j/π_j)
  where Ê_i = y_i − x′_i b̂.
Remark
• The regression estimator is derived using a regression model.
• The validity (i.e. asymptotic unbiasedness) of the regression estimator
does not depend on whether the regression model holds or not.
• However, the variance of the regression estimator is small if the regres-
sion model is good.
• That is, it is model-assisted, not model-dependent.
5.5 GREG estimation
• Recall: the difference estimator.
  – Suppose that y° = (y°_1, y°_2, · · · , y°_N)′ is a guess about y = (y_1, y_2, · · · , y_N)′.
  – Difference estimator of Y = ∑_{i=1}^N y_i:
    Ŷ_diff = ∑_{i=1}^N y°_i + ∑_{i∈S} (1/π_i)(y_i − y°_i)
• Properties of the difference estimator:
  – Unbiased (regardless of y°).
  – Efficient if y° is a good guess about y.
• How to choose y°?
• Superpopulation model:
  – A prior belief about the relationship between y and x.
  – Regard y_1, . . . , y_N as a random sample from an infinite population ζ.
  – An assumption about the distribution of y given x.
• Generalized regression (GREG) model:
  E_ζ(y_i) = x′_i β
  Cov_ζ(y_i, y_j) = c_i σ² if i = j, and 0 if i ≠ j,
  where c_i = c(x_i) and c(x) is a known function of x.
Examples: GREG models
1. Ratio model:
   E_ζ(y_i) = x_i β
   V_ζ(y_i) = x_i σ²
2. Regression model:
   E_ζ(y_i) = β_0 + x_i β_1
   V_ζ(y_i) = σ²
3. Group mean model (or ANOVA model):
   E_ζ(y_i) = µ_g
   V_ζ(y_i) = σ²_g
   for i ∈ U_g and U = U_1 ∪ U_2 ∪ · · · ∪ U_G
• GREG estimator:
  Ŷ_GREG = ∑_{i=1}^N ŷ_i + ∑_{i∈S} (1/π_i)(y_i − ŷ_i)
  where ŷ_i = x′_i β̂ with
  β̂ = (∑_{i∈S} x_i x′_i/(π_i c_i))^{−1} ∑_{i∈S} x_i y_i/(π_i c_i).
• Alternative representation:
  Ŷ_GREG = Ŷ_HT + (X − X̂_HT)′β̂
Examples: GREG estimators
1. Ratio model
YGREG = Yratio =
(N∑i=1
xi
) ∑i∈S
1πiyi∑
i∈S1πixi
5.5. GREG ESTIMATION 81
2. Regression model
YGREG = Yreg =∑i∈S
1
πiyi +
(N∑i=1
xi −∑i∈S
1
πixi
)β
where
β =
∑i∈S
1πi
(xi − xπ) (yi − yπ)∑i∈S
1πi
(xi − xπ)2 .
3. Group mean model (or ANOVA model)
YGREG =
G∑g=1
NgYg
Ng
where Ng =∑N
i=1 xig, Ng =∑
i∈S xig/πi, Yg =∑
i∈S xigyi/πi, and
xig = 1 if i ∈ Ug and xig = 0 otherwise.
82 CHAPTER 5. ESTIMATION
• Algebraic properties:
  – Linear in y:
    Ŷ_GREG = ∑_{i∈S} (1/π_i) g_i(S) y_i
    where
    g_i(S) = 1 + (X − X̂_HT)′(∑_{k∈S} x_k x′_k/(π_k c_k))^{−1}(x_i/c_i).
    Thus, "final weight = design weight × g-factor".
  – Calibration property:
    ∑_{i∈S} (1/π_i) g_i(S) x_i = ∑_{i=1}^N x_i.
    In fact, the final weights are chosen to minimize ∑_{i∈S} c_i π_i (w_i − 1/π_i)² subject to ∑_{i∈S} w_i x_i = ∑_{i=1}^N x_i.
  – Read Deville and Sarndal (1992, JASA).
  – (Result 6.5.1 on p. 231) If c_i = λ′x_i, then ∑_{i∈S} (1/π_i) y_i = ∑_{i∈S} (1/π_i) ŷ_i. Thus,
    Ŷ_GREG = ∑_{i=1}^N x′_i β̂
• Statistical properties:
  – Design consistent.
  – Asymptotic variance:
    Var(Ŷ_GREG) ≈ Var{∑_{i∈S} π_i^{−1}(y_i − x′_i B)}
    where
    B = (∑_{i=1}^N x_i x′_i/c_i)^{−1} ∑_{i=1}^N x_i y_i/c_i.
  – Variance estimation: use Ê_i = y_i − x′_i β̂ instead of E_i = y_i − x′_i B in the HT variance estimator.
Conclusion: three approaches
• Design-based approach: use the HT estimator.
• Model-based approach: use the BLUP estimator.
• Model-assisted approach: use the GREG estimator.
  – Design consistent regardless of whether the model holds or not.
  – The variance is small if the model is true.
Example: Group ratio model
E_ζ(y_i) = β_g x_i
V_ζ(y_i) = σ²_g x_i
for i ∈ U_g and U = U_1 ∪ U_2 ∪ · · · ∪ U_G. The x_i are observed throughout the population. Note that if x_i ≡ 1 then it reduces to the group mean model.
• Let x′_i = (x_{1i}, x_{2i}, · · · , x_{Gi}) where
  x_{gi} = x_i if i ∈ U_g, and 0 otherwise,
  and β = (β_1, β_2, · · · , β_G)′. Then E_ζ(y_i) = x′_i β and
  B = (∑_{i∈U} x_i x′_i/σ²_i)^{−1} ∑_{i∈U} x_i y_i/σ²_i = (∑_{i∈U_1} y_i/∑_{i∈U_1} x_i, · · · , ∑_{i∈U_G} y_i/∑_{i∈U_G} x_i)′
  and
  B̂ = (∑_{i∈S} x_i x′_i/(π_i σ²_i))^{−1} ∑_{i∈S} x_i y_i/(π_i σ²_i) = (∑_{i∈S_1} y_i/π_i / ∑_{i∈S_1} x_i/π_i, · · · , ∑_{i∈S_G} y_i/π_i / ∑_{i∈S_G} x_i/π_i)′
• Since V_ζ(y_i) = λ′x_i for some λ, we have
  Ŷ_GREG = ∑_{i∈U} x′_i B̂ = ∑_{g=1}^G (∑_{i∈U_g} x_i)(∑_{i∈S_g} y_i/π_i)/(∑_{i∈S_g} x_i/π_i).
  This is called the separate ratio estimator. If the groups differ in their ratios but are homogeneous within groups, the separate ratio estimator can work well.
• If x_i ≡ 1, there are two possibilities:
  1. Groups = strata: stratification
  2. Groups ≠ strata: poststratification
Example: Poststratification
• Poststratification estimator:
  Ŷ_post = ∑_{g=1}^G N_g Ŷ_g/N̂_g
• Under SRS, it is
  Ŷ_post = ∑_{g=1}^G N_g (∑_{i∈S_g} y_i/n_g)
  where n_g is the number of sampled elements in group g.
• Asymptotic variance (under SRS):
  V(Ŷ_post) = ∑_{i∈U} ∑_{j∈U} Δ_ij (E_i/π_i)(E_j/π_j) ≈ (N/n)(1 − n/N) ∑_{g=1}^G ∑_{i∈U_g} (y_i − Ȳ_g)².
  Thus, it is essentially equal to the variance under stratified sampling with proportional allocation.
Example: Two-way ANOVA (additive, no interaction)
• Model:
  E_ζ(y_k) = α_i + β_j
  V_ζ(y_k) = σ²
• Setup: we have I × J groups or cells. The cell counts N_ij are not known. The marginal counts N_{i·} = ∑_{j=1}^J N_ij and N_{·j} = ∑_{i=1}^I N_ij are known.
• Example: i = gender, j = age group (I = 2, J = 3).
• Auxiliary variables: let
  δ_ijk = 1 if k ∈ U_ij, and 0 otherwise.
  Unfortunately, we do not observe δ_ijk in the population. Instead, we observe
  x_k = (δ_{1·k}, δ_{2·k}, · · · , δ_{I·k}, δ_{·1k}, δ_{·2k}, · · · , δ_{·Jk})
  throughout the population. Thus, we know
  ∑_{k=1}^N x_k = (N_{1·}, N_{2·}, · · · , N_{I·}, N_{·1}, N_{·2}, · · · , N_{·J})
• GREG estimator:
  Ŷ_GREG = ∑_{i∈S} (1/π_i) g_i(S) y_i
  where
  g_i(S) = 1 + (∑_{k=1}^N x_k − ∑_{k∈S} x_k/π_k)′(∑_{k=1}^N x_k x′_k)^{−1}(x_i/σ²_i).
  Unfortunately, we cannot compute the inverse of ∑_{k=1}^N x_k x′_k: the matrix is singular, because the I row indicators and the J column indicators each sum to 1 for every k.
• Alternative method: raking ratio estimation.
– We want to find $g_{ks} = g_k(S)$ such that
$$\sum_{k\in S}\frac{g_{ks}}{\pi_k}\delta_{i\cdot k} = \sum_{k=1}^{N}\delta_{i\cdot k},\quad i=1,2,\cdots,I \tag{5.3}$$
$$\sum_{k\in S}\frac{g_{ks}}{\pi_k}\delta_{\cdot jk} = \sum_{k=1}^{N}\delta_{\cdot jk},\quad j=1,2,\cdots,J. \tag{5.4}$$
– Do this iteratively (iterative proportional fitting):
1. Start with $g_{ks}^{(0)} = 1$.
2. For $\delta_{i\cdot k} = 1$,
$$g_{ks}^{(t+1)} = g_{ks}^{(t)}\,\frac{\sum_{k=1}^{N}\delta_{i\cdot k}}{\sum_{k\in S}g_{ks}^{(t)}\delta_{i\cdot k}/\pi_k}.$$
It satisfies (5.3), but not necessarily (5.4).
3. For $\delta_{\cdot jk} = 1$,
$$g_{ks}^{(t+2)} = g_{ks}^{(t+1)}\,\frac{\sum_{k=1}^{N}\delta_{\cdot jk}}{\sum_{k\in S}g_{ks}^{(t+1)}\delta_{\cdot jk}/\pi_k}.$$
It satisfies (5.4), but not necessarily (5.3).
4. Set $t\leftarrow t+2$ and go to Step 2. Continue until convergence.
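The iterative proportional fitting steps translate directly into code. The following is a minimal Python sketch on simulated data; the population marginals, tolerance, and iteration cap are illustrative choices, not prescriptions from the notes.

```python
import numpy as np

# Minimal sketch of raking ratio estimation (iterative proportional
# fitting) for an I x J table with known marginals only; data simulated.
rng = np.random.default_rng(1)
I, J, n, N = 2, 3, 200, 10000
row = rng.integers(0, I, n)            # row category of each sampled unit
col = rng.integers(0, J, n)            # column category of each sampled unit
pi = np.full(n, n / N)                 # inclusion probabilities (SRS)
N_row = np.array([6000.0, 4000.0])     # known marginals N_{i.}
N_col = np.array([3000.0, 3000.0, 4000.0])  # known marginals N_{.j}

g = np.ones(n)                         # g-factors, g_ks^(0) = 1
for t in range(100):
    for i in range(I):                 # Step 2: enforce row margins (5.3)
        m = row == i
        g[m] *= N_row[i] / np.sum(g[m] / pi[m])
    for j in range(J):                 # Step 3: enforce column margins (5.4)
        m = col == j
        g[m] *= N_col[j] / np.sum(g[m] / pi[m])
    row_hat = np.array([np.sum(g[row == i] / pi[row == i]) for i in range(I)])
    if np.max(np.abs(row_hat - N_row)) < 1e-6:   # stop when (5.3) also holds
        break

y = rng.normal(10, 2, n)               # study variable (simulated)
Y_rake = np.sum(g * y / pi)            # raking ratio estimator of the total
```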
5.6 Optimal Estimation
• Optimal design & estimation: find a pair of design and estimator $\{p(\cdot),\hat\theta\}$ that minimizes the variance, or MSE, for a given cost (cost ≐ sample size) among a suitable class of estimators.
• Class of linear, design-unbiased estimators: unique solution (HT estimator).
• Non-existence of the UMVUE: Let any noncensus design with $\pi_k > 0$ $(k=1,2,\cdots,N)$ be given. Then no uniformly minimum variance estimator exists in the class of all unbiased estimators of $Y = \sum_{i=1}^{N}y_i$.
• Remedy: change the optimality criterion.
– We don't know the variance before sampling.
– Use an assumption about y: a superpopulation model ζ.
– Anticipated variance: $AV(\hat\theta) = E_\zeta E_p(\hat\theta - \theta_N)^2$.
– Thus, find a pair of design and estimator $\{p(\cdot),\hat\theta\}$ that minimizes the $AV(\hat\theta)$ for a given cost.
Result 1
• If $\hat\theta$ is design-unbiased for $\theta_N$, then
$$AV(\hat\theta) = E_pV_\zeta(\hat\theta) + V_pE_\zeta(\hat\theta) - V_\zeta(\theta_N).$$
• (Godambe and Joshi, 1965) Consider a model ζ with the $y_i$ independent and $V_\zeta(y_i) = \sigma_i^2$. If $p(\cdot)$ is a probability sampling design ($\pi_i > 0$, $i\in U$) and $\hat Y$ is any design-unbiased estimator of the total of y, then
$$AV(\hat Y) \ge \sum_{i\in U}\left(\frac{1}{\pi_i}-1\right)\sigma_i^2.$$
The right side is called the Godambe-Joshi Lower Bound (GJLB).
Result 2
• For fixed-size probability sampling designs, the GJLB is further minimized if and only if
$$\pi_i \propto V_\zeta(y_i)^{1/2}.$$
• (Isaki and Fuller, 1982) Suppose that $p(\cdot)$ is a fixed-size probability sampling design and ζ is a superpopulation model with the $y_i$ independent, $E_\zeta(y_i) = x_i'\beta$, and $V_\zeta(y_i) = c_i\sigma^2$. Then the GREG estimator asymptotically attains the GJLB if $c_i = \lambda'x_i$ for some λ.
Proof of Result 1
Write $\hat Y = \hat Y_{HT} + R$. Since we assume that $\hat Y$ is design unbiased, $E_p(R) = 0$. Thus, for any fixed $j\in U$,
$$0 = E_p(R) = \sum_{S\in\mathcal S}p(S)R(S) = \sum_{S\in\mathcal S;\,j\in S}p(S)R(S) + \sum_{S\in\mathcal S;\,j\notin S}p(S)R(S).$$
Now,
$$V_\zeta(\hat Y) = V_\zeta(\hat Y_{HT}) + V_\zeta(R) + 2\,Cov_\zeta(\hat Y_{HT},R).$$
Thus,
$$E_p\left\{Cov_\zeta(\hat Y_{HT},R)\right\} = E_p\left[E_\zeta\left\{\left(\hat Y_{HT}-E_\zeta(\hat Y_{HT})\right)R\right\}\right] = E_p\left[E_\zeta\left\{\sum_{j\in U}\left(y_j-E_\zeta(y_j)\right)\frac{I_j}{\pi_j}R\right\}\right]$$
$$= \sum_{j\in U}\frac{1}{\pi_j}E_\zeta\left[\left(y_j-E_\zeta(y_j)\right)\sum_{S\in\mathcal S;\,j\in S}R(S)p(S)\right] = -\sum_{j\in U}\frac{1}{\pi_j}E_\zeta\left[\left(y_j-E_\zeta(y_j)\right)\sum_{S\in\mathcal S;\,j\notin S}R(S)p(S)\right] = 0,$$
where the last equality holds because $\sum_{S\in\mathcal S;\,j\notin S}R(S)p(S)$ does not involve $y_j$, so the ζ-expectation factors and $E_\zeta\{y_j - E_\zeta(y_j)\} = 0$. Therefore,
$$E_p\left\{V_\zeta(\hat Y)\right\} = E_p\left\{V_\zeta(\hat Y_{HT})\right\} + E_p\left\{V_\zeta(R)\right\} \ge E_p\left\{V_\zeta(\hat Y_{HT})\right\} = E_p\left\{V_\zeta\left(\sum_{i=1}^{N}\frac{y_iI_i}{\pi_i}\right)\right\} = E_p\left(\sum_{i=1}^{N}\frac{\sigma_i^2I_i}{\pi_i^2}\right) = \sum_{i=1}^{N}\frac{\sigma_i^2}{\pi_i}$$
and
$$AV(\hat Y) = E_pV_\zeta(\hat Y) + V_pE_\zeta(\hat Y) - V_\zeta(Y) \ge E_pV_\zeta(\hat Y) - V_\zeta(Y) \ge \sum_{i=1}^{N}\frac{\sigma_i^2}{\pi_i} - \sum_{i\in U}\sigma_i^2 = \sum_{i\in U}\left(\frac{1}{\pi_i}-1\right)\sigma_i^2.$$
Chapter 6
Variance estimation
6.1 Introduction
• Use of variance estimates in sampling
– Inferential purpose: constructing confidence intervals, hypothesis testing
– Descriptive purpose: evaluation of survey estimates, future survey planning
• What is a good variance estimator?
– Unbiased, or nearly unbiased (a positive bias is conservative)
– Stable: the variance of the variance estimator is low
– Nonnegative
– Simple to calculate
• HT variance estimator (or SYG variance estimator): some problems
1. Can take negative values
2. Requires the joint inclusion probabilities $\pi_{ij}$, which can be cumbersome for large samples
Variance of variance estimator
• Parameter of interest: $V(\hat\theta)$.
• Let $\hat V$ be an (unbiased) estimator of $V(\hat\theta)$.
• We may assume that
$$\frac{d\,\hat V}{V(\hat\theta)} \sim \chi^2(d)$$
for some d (the degrees of freedom of $\hat V$).
• By the properties of the $\chi^2$ distribution,
$$E(\hat V) = V(\hat\theta) \quad\text{and}\quad V(\hat V) = \frac{2\{V(\hat\theta)\}^2}{d}.$$
Thus,
$$CV(\hat V) = \frac{\sqrt{V(\hat V)}}{E(\hat V)} = \sqrt{\frac{2}{d}}.$$
• How to compute d?
1. Method of moments: requires an estimate of $V(\hat V)$.
2. Rule of thumb: use $d = n_{PSU} - H$, where $n_{PSU}$ is the number of sampled PSUs and H is the number of strata.
Alternative to HT variance estimation
• Simplified variance estimator: motivation
1. Consider the variance estimator for PPS sampling:
$$\hat V_0 = \frac{1}{n(n-1)}\sum_{i\in S}\left(\frac{y_i}{p_i}-\frac{1}{n}\sum_{j\in S}\frac{y_j}{p_j}\right)^2,$$
which is always nonnegative and simple to compute.
2. What if we use $\hat V_0$ as an estimator of the variance of $\hat Y_{HT} = \sum_{i\in S}y_i/\pi_i$, by treating $\hat Y_{HT} \cong \hat Y_{PPS} = \frac{1}{n}\sum_{i\in S}y_i/p_i$?
3. Simplified variance estimator: use the PPS sampling variance estimator $\hat V_0$ to estimate the variance of $\hat Y_{HT}$.
Theorem
$$E(\hat V_0) - Var(\hat Y_{HT}) = \frac{n}{n-1}\left\{Var(\hat Y_{PPS}) - Var(\hat Y_{HT})\right\}$$
where $Var(\hat Y_{PPS})$ is the variance of $\hat Y_{PPS}$ using $p_k = \pi_k/n$ as the selection probability of unit k for each PPS draw, and
$$Var(\hat Y_{PPS}) = \frac{1}{n}\sum_{i=1}^{N}p_i\left(\frac{y_i}{p_i}-Y\right)^2.$$
Remark
1. In most cases, the bias is positive (thus, the estimation is conservative).
2. Under SRS, the relative bias of the simplified variance estimator is
$$\frac{E(\hat V_0) - Var(\hat Y_{HT})}{Var(\hat Y_{HT})} = \frac{n}{N-n},$$
and it is negligible if n/N is negligible.
3. Application to multi-stage sampling: express
$$\hat Y_{HT} = \sum_{i\in S_I}\frac{\hat Y_i}{\pi_{Ii}}.$$
The resulting simplified variance estimator can be written
$$\hat V_0 = \frac{1}{n(n-1)}\sum_{i\in S_I}\left(\frac{\hat Y_i}{p_i}-\hat Y_{HT}\right)^2 = \frac{n}{n-1}\sum_{i\in S_I}\left(\frac{\hat Y_i}{\pi_{Ii}}-\frac{1}{n}\hat Y_{HT}\right)^2$$
where $p_i = \pi_{Ii}/n$ and n is the number of sampled PSUs. The bias is negligible if the primary sampling rate is negligible. If the sampling design is also a stratified (multi-stage) sampling such that
$$\hat Y_{HT} = \sum_{h=1}^{H}\sum_{i\in S_{Ih}}w_{hi}\hat Y_{hi},$$
the simplified variance estimator can be written
$$\hat V_0 = \sum_{h=1}^{H}\frac{n_h}{n_h-1}\sum_{i=1}^{n_h}\left(w_{hi}\hat Y_{hi}-\frac{1}{n_h}\sum_{j=1}^{n_h}w_{hj}\hat Y_{hj}\right)^2.$$
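Computationally, $\hat V_0$ for the stratified case needs only the weighted PSU totals $w_{hi}\hat Y_{hi}$ by stratum. A minimal Python sketch, with simulated totals standing in for real PSU data:

```python
import numpy as np

# Minimal sketch of the simplified (with-replacement) variance estimator
# V0 for a stratified multi-stage design.
def simplified_var(weighted_psu_totals):
    """weighted_psu_totals: list of 1-d arrays, one per stratum,
    holding w_{hi} * Yhat_{hi} for the n_h sampled PSUs."""
    v0 = 0.0
    for z in weighted_psu_totals:
        nh = len(z)
        v0 += nh / (nh - 1) * np.sum((z - z.mean()) ** 2)
    return v0

rng = np.random.default_rng(0)
z = [rng.normal(100, 10, 5), rng.normal(200, 25, 8)]  # two strata (simulated)
print(simplified_var(z))
```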
6.2 Taylor series linearization
• Estimate the variance of a nonlinear estimator by approximating the estimator by a linear function.
• First-order Taylor linearization: for p-dimensional $\bar{\mathbf y}$, if $\bar{\mathbf y}_n = \bar{\mathbf Y}_N + O_p(n^{-1/2})$, then
$$g(\bar{\mathbf y}_n) = g(\bar{\mathbf Y}) + \sum_{j=1}^{p}\frac{\partial g(\bar{\mathbf Y})}{\partial y_j}\left(\bar y_{jn}-\bar Y_j\right) + O_p(n^{-1}).$$
• Linearized variance:
$$V\{g(\bar{\mathbf y}_n)\} \doteq \sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial g(\bar{\mathbf Y})}{\partial y_i}\frac{\partial g(\bar{\mathbf Y})}{\partial y_j}\,Cov(\bar y_{in},\bar y_{jn}).$$
• Two methods of obtaining a linearized variance estimator:
1. Direct method: use
$$\hat V\{g(\bar{\mathbf y}_n)\} \doteq \sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial g(\bar{\mathbf y}_n)}{\partial y_i}\frac{\partial g(\bar{\mathbf y}_n)}{\partial y_j}\,\hat C(\bar y_{in},\bar y_{jn}).$$
2. Residual technique:
[Step 1] Obtain a first-order Taylor expansion to get
$$g(\bar{\mathbf y}_n) \doteq g(\bar{\mathbf Y}) + \frac{1}{N}\sum_{i\in S}\frac{1}{\pi_i}e_i$$
for some $e_i$.
[Step 2] The variance of $g(\bar{\mathbf y}_n)$ is then approximated by the variance of $N^{-1}\sum_{i\in S}\pi_i^{-1}e_i$. Obtain a variance estimator of $N^{-1}\sum_{i\in S}\pi_i^{-1}e_i$ and replace $e_i$ by $\hat e_i$.
Example: Ratio
$$\hat R = \frac{\bar y}{\bar x}, \qquad R = \frac{\bar Y}{\bar X}$$
• Taylor expansion:
$$\hat R = R + \bar X^{-1}(\bar y - R\bar x) + O_p(n^{-1})$$
• Method 1:
$$\hat V(\hat R) \doteq \bar x^{-2}\hat V(\bar y) + \bar x^{-2}\hat R^2\hat V(\bar x) - 2\bar x^{-2}\hat R\,\hat C(\bar x,\bar y)$$
• Method 2:
$$\hat V(\hat R) \doteq \frac{1}{N^2}\sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{\hat e_i}{\pi_i}\frac{\hat e_j}{\pi_j}$$
where $\hat e_i = \bar x^{-1}(y_i - \hat Rx_i)$.
• Ratio estimator $\hat{\bar Y}_r = \bar X\hat R$:
$$\hat V_1 \doteq \frac{1}{N^2}\sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_i-\hat Rx_i}{\pi_i}\frac{y_j-\hat Rx_j}{\pi_j}$$
$$\hat V_2 \doteq \frac{1}{N^2}\left(\frac{\bar X}{\bar x}\right)^2\sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_i-\hat Rx_i}{\pi_i}\frac{y_j-\hat Rx_j}{\pi_j}$$
• Which one to use?
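A small simulation makes the comparison concrete. The sketch below computes $\hat V_1$ and $\hat V_2$ for the ratio estimator of the mean under SRS, where the double sums reduce to the familiar $(1-n/N)s_e^2/n$ form; the population is simulated for illustration only.

```python
import numpy as np

# Minimal sketch: the two linearized variance estimators for the ratio
# estimator of the mean under SRS.
rng = np.random.default_rng(2)
N, n = 5000, 100
x = rng.gamma(4.0, 2.0, N)
y = 3.0 * x + rng.normal(0, 2.0, N)
Xbar = x.mean()                              # known population mean of x

s = rng.choice(N, n, replace=False)
xs, ys = x[s], y[s]
R_hat = ys.mean() / xs.mean()
e = ys - R_hat * xs                          # residuals y_i - R_hat * x_i

v_e = (1 - n / N) / n * np.var(e, ddof=1)    # SRS variance of the mean of e
V1 = v_e                                     # treats Xbar / xbar as 1
V2 = (Xbar / xs.mean()) ** 2 * v_e           # conditional (ratio-adjusted)
print(V1, V2)
```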
Variance estimation for the GREG estimator
• For simplicity, assume that $c_i = \lambda'x_i$, so that
$$\hat Y_{GREG} = \sum_{i\in S}\frac{1}{\pi_i}g_iy_i$$
where
$$g_i = X'\left(\sum_{k\in S}\frac{1}{\pi_kc_k}x_kx_k'\right)^{-1}\frac{1}{c_i}x_i.$$
• Two types of variance estimators:
$$\hat V_1 = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{\hat e_i}{\pi_i}\frac{\hat e_j}{\pi_j}, \qquad \hat V_2 = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{g_i\hat e_i}{\pi_i}\frac{g_j\hat e_j}{\pi_j}.$$
They are asymptotically equivalent because $g_i \doteq 1$.
• $\hat V_2$ has good conditional properties.
Variance estimation for the poststratified estimator
• Poststratified estimator:
$$\hat Y_{post} = \sum_{g=1}^{G}\frac{N_g}{\hat N_g}\hat Y_g$$
• Unconditional variance estimator:
$$\hat V_1 = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{\hat e_i}{\pi_i}\frac{\hat e_j}{\pi_j}$$
where $\hat e_i = y_i - \bar y_g$ for $x_{ig} = 1$.
• Conditional variance estimator:
$$\hat V_2 = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{g_i\hat e_i}{\pi_i}\frac{g_j\hat e_j}{\pi_j}$$
where $\hat e_i = y_i - \bar y_g$ and $g_i = N_g/\hat N_g$ for $x_{ig} = 1$.
• Under SRS:
$$\hat V_1 = \frac{N^2}{n}\left(1-\frac{n}{N}\right)\sum_{g=1}^{G}\frac{n_g-1}{n-1}s_g^2$$
$$\hat V_2 = \left(1-\frac{n}{N}\right)\frac{n}{n-1}\sum_{g=1}^{G}\frac{N_g^2}{n_g}\frac{n_g-1}{n_g}s_g^2$$
where $s_g^2 = \sum_{i\in S_g}(y_i-\bar y_g)^2/(n_g-1)$.
6.3 Replication method
• Replication method - Idea
1. Interested in estimating the variance of $\hat\theta$.
2. From the original sample S, generate G resamples $S^{(1)},S^{(2)},\cdots,S^{(G)}$.
3. Based on the observations in resample $S^{(g)}$ $(g=1,2,\cdots,G)$, compute the replicate $\hat\theta^{(g)}$ of $\hat\theta$.
4. The replicate variance estimator of $\hat\theta$ is computed as
$$\hat V = K_G\sum_{g=1}^{G}\left(\hat\theta^{(g)}-\hat\theta^{(\cdot)}\right)^2$$
for some suitable $K_G$, where $\hat\theta^{(\cdot)} = G^{-1}\sum_{g=1}^{G}\hat\theta^{(g)}$.
• Reference: Wolter (2007): Introduction to variance estimation.
Replication method for variance estimation
• Random group method
– Independent random group method: Mahalanobis (1939, 1946), Deming (1946)
– Non-independent random group method
• Balanced repeated replication: Plackett and Burman (1946), McCarthy (1966)
• Jackknife: Quenouille (1949), Tukey (1958)
• Bootstrap: Efron (1979)
6.3.1 Independent Random Group Method
• Procedure
[Step 1] A sample $s_1$ is drawn from the finite population according to the design p. Compute $\hat\theta^{(1)}$ from the observations in $s_1$.
[Step 2] Sample $s_1$ is replaced into the population and a second sample $s_2$ is drawn according to the same sampling design p. Compute $\hat\theta^{(2)}$ from the observations in $s_2$.
[Step 3] This process is repeated $G\ (\ge 2)$ times.
[Step 4] Use
$$\hat\theta_{RG} = \frac{1}{G}\sum_{k=1}^{G}\hat\theta^{(k)} \tag{6.1}$$
as an estimator of θ, and use
$$\hat V(\hat\theta_{RG}) = \frac{1}{G}\frac{1}{G-1}\sum_{k=1}^{G}\left(\hat\theta^{(k)}-\hat\theta_{RG}\right)^2 \tag{6.2}$$
as a variance estimator for $\hat\theta_{RG}$.
• Property: let $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ be uncorrelated random variables with common expectation $E(\hat\theta^{(1)}) = \theta$. Then,
1. $\hat\theta_{RG}$ in (6.1) is unbiased for θ.
2. $\hat V(\hat\theta_{RG})$ in (6.2) is unbiased for $V(\hat\theta_{RG})$.
• Example : Suppose that a sample of households is to be drawn using
a multistage sampling design. Two random groups are desired. An
areal frame exists, and the target population is divided into two strata
(defined, say, on the basis of geography). Stratum 1 contains N1 PSUs
and stratum 2 consists of one PSU that is to be selected with certainty.
G = 2 independent random groups are to be used. Each sample is
selected independently according to the following plan:
– Stratum 1: Two PSUs are selected using some πps sampling de-
sign. From each selected PSU, an equal probability systematic
sample of m1 households is selected.
– Stratum 2: The certainty PSU is divided into city blocks, with
the block size varying between 10 and 15 households. An unequal
probability systematic sample of m2 blocks is selected with prob-
ability proportional to the block sizes. All households in selected
blocks are enumerated.
For point estimation, use
$$\hat\theta = \left(\hat\theta^{(1)}+\hat\theta^{(2)}\right)/2.$$
For variance estimation, use
$$\hat V(\hat\theta) = \frac{1}{2(1)}\sum_{g=1}^{2}\left(\hat\theta^{(g)}-\hat\theta\right)^2 = \left(\hat\theta^{(1)}-\hat\theta^{(2)}\right)^2/4.$$
• The method is conceptually easy and the variance estimator is unbiased, but it is unstable (here d.f. = 1) and is not often used in practice.
6.3.2 Non-independent Random Groups
• Idea
1. Given the sample S, use a random mechanism to divide S into $S = \cup_{g=1}^{G}S^{(g)}$, where $S^{(1)},\cdots,S^{(G)}$ are disjoint.
2. Calculate $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ and treat them as independent.
3. Use
$$\hat V = \frac{1}{G}\frac{1}{G-1}\sum_{k=1}^{G}\left(\hat\theta^{(k)}-\hat\theta_{RG}\right)^2$$
as a variance estimator for $\hat\theta$.
• Requirement: each $S^{(g)}$ should have the same design as S.
• Impractical in some cases; unstable.
• Property: let $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ be random variables with common expectation $E(\hat\theta^{(i)}) = \theta$. Then,
$$E\left\{\hat V(\hat\theta_{RG})\right\} - V(\hat\theta_{RG}) = -\frac{1}{G(G-1)}\mathop{\sum\sum}_{i\neq j}Cov\left(\hat\theta^{(i)},\hat\theta^{(j)}\right)$$
– If $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ are independent, then the RHS is 0.
– If $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ are identically distributed, then the RHS equals $-Cov(\hat\theta^{(1)},\hat\theta^{(2)})$.
Example: Use of the non-independent random group method under simple random sampling
• Interested in variance estimation for $\hat\theta = \bar y$ under simple random sampling.
• Partition the sample into G groups of dependent samples $S = \cup_{g=1}^{G}S^{(g)}$, where $S^{(g)}$ is a simple random sample of size $b = n/G$.
• Compute $\hat\theta^{(g)} = \bar y^{(g)}$ from $S^{(g)}$. Note that
$$\hat\theta = \frac{1}{G}\sum_{g=1}^{G}\bar y^{(g)}.$$
• How large is the bias of $\hat V(\hat\theta_{RG})$?
$$Bias(\hat V) = -Cov\left(\bar y^{(1)},\bar y^{(2)}\right) = \frac{1}{N}S^2$$
6.3.3 Jackknife method for variance estimation
• Motivation
– Basic setup: let $(x_i,y_i)$ be IID from a bivariate distribution with mean $(\mu_x,\mu_y)$, and let $\theta = \mu_y/\mu_x$. The standard ratio estimator of θ, $\hat\theta = \bar x^{-1}\bar y$, has bias of order $O(n^{-1})$.
– Quenouille (1949) idea: propose a bias-reduced estimator of θ,
$$\hat\theta_{(\cdot)} = \frac{1}{n}\sum_{k=1}^{n}\hat\theta_{(k)},$$
where $\hat\theta_{(k)} = n\hat\theta - (n-1)\hat\theta^{(-k)}$ and
$$\hat\theta^{(-k)} = \left(\sum_{i\neq k}x_i\right)^{-1}\sum_{i\neq k}y_i.$$
– Tukey (1958): treat $\hat\theta_{(1)},\cdots,\hat\theta_{(n)}$ as an independent random group of size n to get
$$\hat V_{JK}(\hat\theta) \doteq \frac{1}{n}\frac{1}{n-1}\sum_{k=1}^{n}\left(\hat\theta_{(k)}-\hat\theta_{(\cdot)}\right)^2 = \frac{n-1}{n}\sum_{k=1}^{n}\left(\hat\theta^{(-k)}-\hat\theta^{(-\cdot)}\right)^2,$$
where $\hat\theta^{(-\cdot)} = n^{-1}\sum_{k=1}^{n}\hat\theta^{(-k)}$.
Taylor theorem 2:
Let $X_n, W_n$ be sequences of random variables such that
$$X_n = W_n + O_p(r_n)$$
where $r_n \to 0$ as $n\to\infty$. If $g(x)$ is a function with s-th continuous derivatives on the line segment joining $X_n$ and $W_n$, and the s-th order partial derivatives are bounded, then
$$g(X_n) = g(W_n) + \sum_{k=1}^{s-1}\frac{1}{k!}g^{(k)}(W_n)(X_n-W_n)^k + O_p(r_n^s)$$
where $g^{(k)}(a)$ is the k-th derivative of $g(x)$ evaluated at $x = a$.
Properties
• If $\hat\theta_n = \bar y$, then
$$\hat V_{JK} = \frac{1}{n}\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar y)^2 = \frac{1}{n}s_y^2.$$
• Under regularity conditions, $\bar y^{(-k)} - \bar y = O_p(n^{-1})$.
• For $\hat\theta = f(\bar x,\bar y)$, we have
$$\hat\theta^{(-k)} - \hat\theta = \frac{\partial f}{\partial x}(\bar x,\bar y)\left(\bar x^{(-k)}-\bar x\right) + \frac{\partial f}{\partial y}(\bar x,\bar y)\left(\bar y^{(-k)}-\bar y\right) + o_p(n^{-1}).$$
• The jackknife variance estimator defined by
$$\hat V_{JK} = \frac{n-1}{n}\sum_{k=1}^{n}\left(\hat\theta^{(-k)}-\hat\theta\right)^2$$
is asymptotically equivalent to the linearized variance estimator.
Example: Ratio
$$\hat R = \frac{\bar y}{\bar x}$$
• Jackknife replicates for $\hat R$:
$$\hat R^{(-k)} = \frac{\bar y^{(-k)}}{\bar x^{(-k)}} = \frac{n\bar y - y_k}{n\bar x - x_k}, \quad k=1,2,\cdots,n$$
• Taylor expansion (by Taylor theorem 2):
$$\hat R^{(-k)} = \hat R + (\bar x)^{-1}\left(\bar y^{(-k)} - \hat R\bar x^{(-k)}\right) + O_p(n^{-2})$$
• Jackknife variance estimator:
$$\hat V_{JK} = \frac{n-1}{n}\sum_{k=1}^{n}\left(\hat R^{(-k)}-\hat R\right)^2 \doteq \frac{n-1}{n}\sum_{k=1}^{n}(\bar x)^{-2}\left(\bar y^{(-k)}-\hat R\bar x^{(-k)}\right)^2 = \frac{1}{n(n-1)}\frac{1}{\bar x^2}\sum_{k=1}^{n}\left(y_k-\hat Rx_k\right)^2$$
• The jackknife variance estimator for the ratio estimator $\hat{\bar Y}_r = \bar X\hat R$ is asymptotically equivalent to
$$\hat V_{JK} = \left(\frac{\bar X}{\bar x}\right)^2\frac{1}{n(n-1)}\sum_{k=1}^{n}\left(y_k-\hat Rx_k\right)^2,$$
which is often called the conditional variance estimator.
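A minimal Python sketch of the delete-one jackknife for $\hat R$, checked against the closed form derived above; the data are simulated for illustration.

```python
import numpy as np

# Minimal sketch: delete-one jackknife for R_hat = ybar / xbar.
rng = np.random.default_rng(3)
n = 50
x = rng.gamma(4.0, 2.0, n)
y = 3.0 * x + rng.normal(0, 2.0, n)

R_hat = y.mean() / x.mean()
# delete-one replicates R^(-k) = (n*ybar - y_k) / (n*xbar - x_k)
R_del = (y.sum() - y) / (x.sum() - x)
V_jk = (n - 1) / n * np.sum((R_del - R_hat) ** 2)

# closed form: sum (y_k - R_hat x_k)^2 / (n (n-1) xbar^2)
V_lin = np.sum((y - R_hat * x) ** 2) / (n * (n - 1) * x.mean() ** 2)
print(V_jk, V_lin)   # close for moderate n
```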
Example: Post-stratification (under SRS)
• Point estimator:
$$\hat Y_{post} = \sum_{g=1}^{G}N_g\bar y_g = \sum_{g=1}^{G}\frac{N_g}{n_g}\sum_{i\in S_g}y_i$$
• k-th jackknife replicate of $\hat Y_{post}$: for $k\in S_g$,
$$\hat Y_{post}^{(-k)} - \hat Y_{post} = N_g\left(\bar y_g^{(-k)}-\bar y_g\right) = N_g(n_g-1)^{-1}(\bar y_g-y_k)$$
• Jackknife variance estimator:
$$\hat V_{JK}(\hat Y_{post}) = \frac{n-1}{n}\sum_{k=1}^{n}\left(\hat Y_{post}^{(-k)}-\hat Y_{post}\right)^2 \doteq \frac{n-1}{n}\sum_{g=1}^{G}N_g^2(n_g-1)^{-1}s_g^2$$
• Asymptotically equivalent to the conditional variance estimator.
Extension to complex sampling
• Stratified multi-stage cluster sampling design:
$$\hat Y_{HT} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}w_{hi}\hat Y_{hi}$$
• Jackknife replicates: delete the j-th cluster in the g-th stratum,
$$\hat Y_{HT}^{(-gj)} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}w_{hi}^{(-gj)}\hat Y_{hi}$$
where
$$w_{hi}^{(-gj)} = \begin{cases}0 & \text{if } h=g \text{ and } i=j\\ (n_h-1)^{-1}n_hw_{hi} & \text{if } h=g \text{ and } i\neq j\\ w_{hi} & \text{otherwise.}\end{cases}$$
• Jackknife variance estimator:
$$\hat V_{JK}(\hat Y_{HT}) = \sum_{h=1}^{H}\frac{n_h-1}{n_h}\sum_{i=1}^{n_h}\left(\hat Y_{HT}^{(-hi)}-\frac{1}{n_h}\sum_{j=1}^{n_h}\hat Y_{HT}^{(-hj)}\right)^2$$
• Property:
$$\hat V_{JK}(\hat Y_{HT}) = \sum_{h=1}^{H}\frac{n_h}{n_h-1}\sum_{i=1}^{n_h}\left(w_{hi}\hat Y_{hi}-\frac{1}{n_h}\sum_{j=1}^{n_h}w_{hj}\hat Y_{hj}\right)^2 \equiv \hat V_0$$
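A minimal Python sketch of the delete-a-cluster jackknife using the replicate-weight rule above; the per-stratum weights and estimated PSU totals are simulated placeholders.

```python
import numpy as np

# Minimal sketch: stratified delete-a-cluster jackknife.
def jackknife_var(w, ytot):
    """w, ytot: lists of 1-d arrays (one per stratum) holding the
    weights w_{hi} and the estimated PSU totals Yhat_{hi}."""
    theta = sum(np.sum(wh * yh) for wh, yh in zip(w, ytot))
    v = 0.0
    for wg, yg in zip(w, ytot):
        ng = len(wg)
        reps = np.empty(ng)
        for j in range(ng):
            wj = wg * ng / (ng - 1)   # reweight remaining PSUs in stratum
            wj[j] = 0.0               # delete PSU j
            reps[j] = theta - np.sum(wg * yg) + np.sum(wj * yg)
        v += (ng - 1) / ng * np.sum((reps - reps.mean()) ** 2)
    return v

rng = np.random.default_rng(4)
w = [np.full(5, 30.0), np.full(8, 12.0)]
ytot = [rng.normal(100, 10, 5), rng.normal(200, 25, 8)]
print(jackknife_var(w, ytot))
```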
6.3.4 Balanced repeated replication
• Basic set-up: stratified sampling with $n_h = 2$ and $\hat\theta = \sum_{h=1}^{H}W_h\bar y_h$.
• Half-sample replication:
$$\hat\theta^{(v)} = \sum_{h=1}^{H}W_h\left[\delta_h^{(v)}y_{h1} + (1-\delta_h^{(v)})y_{h2}\right], \quad v=1,\cdots,2^H$$
where
$$\delta_h^{(v)} = \begin{cases}1 & \text{if } y_{h1}\in s_v\\ 0 & \text{if } y_{h2}\in s_v.\end{cases}$$
• Balanced condition: there always exist $s_1,\cdots,s_G$ with $H < G \le H+4$ (G a multiple of 4) such that
$$\sum_{v=1}^{G}(2\delta_h^{(v)}-1)(2\delta_{h'}^{(v)}-1) = 0 \quad\text{if } h\neq h'.$$
• Variance estimator:
$$\hat V_{BRR} = \frac{1}{G}\sum_{v=1}^{G}\left(\hat\theta^{(v)}-\hat\theta\right)^2$$
• Property: if the $\delta_h^{(v)}$ satisfy the balanced condition, then
$$\hat V_{BRR} = \sum_h W_h^2(y_{h1}-y_{h2})^2/4,$$
which is equal to the variance estimator of the stratified sampling estimator with $n_h = 2$.
Example: BRR for H = 3
• $M_G$: Hadamard matrix of order G:
– a $G\times G$ matrix of ±1,
– $M_G'M_G = GI_G$,
– if $M_G$ satisfies $M_G'M_G = GI_G$, then
$$M_{2G} = \begin{pmatrix}M_G & M_G\\ M_G & -M_G\end{pmatrix},$$
– for G = 4,
$$M_4 = \begin{pmatrix}1&1&1&1\\ 1&-1&1&-1\\ 1&-1&-1&1\\ 1&1&-1&-1\end{pmatrix}.$$
• The columns of $M_G$ are mutually orthogonal, so they satisfy the balanced condition.
• G = 4 BRR replicates can be constructed as follows:
$$\hat\theta^{(1)} = W_1y_{11} + W_2y_{21} + W_3y_{31}$$
$$\hat\theta^{(2)} = W_1y_{12} + W_2y_{21} + W_3y_{32}$$
$$\hat\theta^{(3)} = W_1y_{12} + W_2y_{22} + W_3y_{31}$$
$$\hat\theta^{(4)} = W_1y_{11} + W_2y_{22} + W_3y_{32}$$
• One can check that
$$\frac{1}{G}\sum_{v=1}^{G}\left(\hat\theta^{(v)}-\hat\theta\right)^2 = \sum_h W_h^2(y_{h1}-y_{h2})^2/4.$$
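A minimal Python sketch that builds balanced replicates from columns of $M_4$ and verifies the closed form; the stratum weights and data are illustrative.

```python
import numpy as np

# Minimal sketch: BRR for H = 3 strata with n_h = 2 via a 4x4 Hadamard
# matrix; verifies V_BRR = sum_h W_h^2 (y_h1 - y_h2)^2 / 4.
W = np.array([0.5, 0.3, 0.2])                        # stratum weights
y = np.array([[3.1, 2.7], [5.0, 4.2], [1.9, 2.4]])   # y_h1, y_h2

M4 = np.array([[ 1,  1,  1,  1],
               [ 1, -1,  1, -1],
               [ 1, -1, -1,  1],
               [ 1,  1, -1, -1]])
H, G = 3, 4
delta = (M4[:, :H] + 1) // 2          # delta_h^(v): 1 -> y_h1, 0 -> y_h2
theta = np.sum(W * y.mean(axis=1))    # full-sample estimator (n_h = 2)
reps = np.array([np.sum(W * np.where(delta[v] == 1, y[:, 0], y[:, 1]))
                 for v in range(G)])
V_brr = np.mean((reps - theta) ** 2)
V_closed = np.sum(W ** 2 * (y[:, 0] - y[:, 1]) ** 2) / 4
print(V_brr, V_closed)                # equal because the columns are balanced
```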
Chapter 7
Two-phase sampling
7.1 Introduction
Motivation
• Use of auxiliary variable x
1. Design stage : stratification, PPS (or πps) sampling
2. Estimation stage : ratio estimation, regression estimation
• Need to observe xi in the population (design stage), or need to know
the total of xi (estimation stage)
• What if x is not available? Use sampling to observe x first.
• Basic structure
[Phase 1] Select a (simple) random sample $S_1$. Observe $x_i$, $i\in S_1$.
[Phase 2] Treat $S_1$ as if it were the population, and choose a suitable sampling design to select a sample $S_2$ from $S_1$ using the information $x_i$ observed in $S_1$. Observe $y_i$, $i\in S_2$.
• Notation
– $\pi_i^{(1)} = Pr(i\in S_1)$: (first-order) inclusion probability under the first-phase sampling
– $\pi_{i|S_1}^{(2)} = Pr(i\in S_2\mid S_1)$: (first-order) conditional inclusion probability under the second-phase sampling, given the first-phase sample
– $\pi_i = Pr(i\in S_2)$: (first-order) inclusion probability under the two-phase sampling,
$$\pi_i = \sum_{S_1;\,i\in S_1}\pi_{i|S_1}^{(2)}P_1(S_1) = E_1\left\{\pi_{i|S_1}^{(2)}I(i\in S_1)\right\}$$
• Features
– If $\pi_{i|S_1}^{(2)} = \pi_i^{(2)}$ (invariance), then $\pi_i = \pi_i^{(1)}\pi_i^{(2)}$.
– If the invariance does not hold, then we cannot compute the first-order inclusion probability $\pi_i$,
– and we cannot use the HT estimator.
• Example:
1. Phase one: SRS of size n.
2. Phase two: πps sampling of size r with $\pi_i\propto x_i$.
Note that
$$\pi_{i|S_1}^{(2)} = \frac{rx_i}{\sum_{k\in S_1}x_k}$$
and $\pi_i$ cannot be computed from one realization of $S_1$.
Remedy
• Use the π*-estimator:
$$\hat Y^* = \sum_{i\in S_2}\frac{y_i}{\pi_i^{(1)}\pi_{i|S_1}^{(2)}} \equiv \sum_{i\in S_2}\frac{y_i}{\pi_i^*}$$
• Properties
– Unbiased for $Y = \sum_{i=1}^{N}y_i$.
– Variance:
$$V(\hat Y^*) = V\left(\sum_{i\in S_1}\frac{y_i}{\pi_i^{(1)}}\right) + E\left\{\sum_{i\in S_1}\sum_{j\in S_1}\left(\pi_{ij|S_1}^{(2)}-\pi_{i|S_1}^{(2)}\pi_{j|S_1}^{(2)}\right)\frac{y_i}{\pi_i^*}\frac{y_j}{\pi_j^*}\right\}$$
7.2 Two-phase sampling for stratification
• Basic setup: we wish to perform stratified sampling, but the stratum indicator variables $x_i = (x_{i1},\cdots,x_{iH})$ are not available in the population frame.
• Two-phase sampling for stratification
1. Perform an SRS of size n from the finite population and obtain $\sum_{i\in S_1}x_i = (n_1,n_2,\cdots,n_H)$, where $n = \sum_{h=1}^{H}n_h$.
2. Among the $n_h$ elements, select $r_h$ elements by SRS, independently across the strata.
• Point estimation:
$$\hat{\bar Y}_{tp} = \sum_{h=1}^{H}w_h\bar y_{h2}$$
where $w_h = n_h/n$ and $\bar y_{h2} = r_h^{-1}\sum_{i\in S_2}x_{ih}y_i$.
• Variance:
$$V(\hat{\bar Y}_{tp}) = \left(\frac{1}{n}-\frac{1}{N}\right)S^2 + E\left\{\sum_{h=1}^{H}\left(\frac{n_h}{n}\right)^2\left(\frac{1}{r_h}-\frac{1}{n_h}\right)s_{h1}^2\right\} \doteq E\left\{n^{-1}\sum_{h=1}^{H}w_h(\bar y_{h1}-\bar y_1)^2 + \sum_{h=1}^{H}r_h^{-1}w_h^2s_{h1}^2\right\}$$
where
$$s_{h1}^2 = \frac{1}{n_h-1}\sum_{i\in S_1}x_{ih}(y_i-\bar y_{h1})^2.$$
• Variance estimation:
$$\hat V(\hat{\bar Y}_{tp}) = n^{-1}\sum_{h=1}^{H}w_h\left(\bar y_{h2}-\hat{\bar Y}_{tp}\right)^2 + \sum_{h=1}^{H}r_h^{-1}w_h^2s_{h2}^2$$
• Variance comparison:
$$V(\hat{\bar Y}_{SRS}) - V(\hat{\bar Y}_{tp}) = E\left\{\left(\frac{1}{r}-\frac{1}{n}\right)\sum_{h=1}^{H}w_h(\bar y_{h1}-\bar y_1)^2 + \sum_{h=1}^{H}\left(\frac{1}{r}-\frac{w_h}{r_h}\right)w_hs_{h1}^2\right\}$$
• Two sources for the gain in efficiency:
1. All n elements are used for the between-stratum variances.
2. An optimal choice of $r_h$ can improve the efficiency.
• Optimal allocation: with $\nu_h = r_h/n_h$, minimize
$$V = \frac{1}{n}\left(S^2-\sum_{h=1}^{H}W_hS_h^2\right) + \frac{1}{n}\sum_{h=1}^{H}W_hS_h^2\frac{1}{\nu_h}$$
subject to $C = n\left(c_1 + \sum_{h=1}^{H}c_{2h}W_h\nu_h\right)$.
• Solution:
$$\frac{r_h^*}{n^*} = W_h\left(\frac{c_1}{c_{2h}}\right)^{1/2}\left(\frac{S_h^2}{S^2-\sum_{h=1}^{H}W_hS_h^2}\right)^{1/2}.$$
If $c_{2h} = c_2$, $S_h = S_w$, and $\varphi = S^2/S_w^2$, then the optimal solution is
$$\frac{r^*}{n^*} = \left(\frac{c_1}{c_2}\right)^{1/2}\left(\frac{1}{\varphi-1}\right)^{1/2}.$$
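A minimal Python sketch of the whole procedure (phase 1 observes the stratum label only; phase 2 subsamples within each observed stratum and observes y), with the point estimator and variance estimator above; the population and the phase 2 rates are simulated, illustrative choices.

```python
import numpy as np

# Minimal sketch: two-phase sampling for stratification under SRS.
rng = np.random.default_rng(4)
N, n, H = 20000, 1000, 3
stratum = rng.integers(0, H, N)
y = 10.0 + 3.0 * stratum + rng.normal(0, 1, N)

s1 = rng.choice(N, n, replace=False)           # phase 1: stratum label only
ytp, vtp, wlist, y2list = 0.0, 0.0, [], []
for h in range(H):
    idx = s1[stratum[s1] == h]
    nh = len(idx)
    rh = max(2, nh // 4)                       # phase 2 rate (chosen here)
    s2 = rng.choice(idx, rh, replace=False)    # phase 2: observe y
    wh = nh / n
    wlist.append(wh); y2list.append(y[s2].mean())
    vtp += wh ** 2 * y[s2].var(ddof=1) / rh    # within-stratum component
ytp = sum(w * m for w, m in zip(wlist, y2list))
vtp += sum(w * (m - ytp) ** 2 for w, m in zip(wlist, y2list)) / n
print(ytp, vtp)
```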
7.3 Two-phase regression estimator
• Basic setup:
1. Phase 1: simple random sampling of size n to get $S_1$; observe $x_i$, $i\in S_1$.
2. Phase 2: from $S_1$, simple random sampling of size r to get $S_2$; observe $(x_i,y_i)$, $i\in S_2$.
• Regression estimator:
$$\bar y_{reg,tp} = \bar y_2 + (\bar x_1-\bar x_2)'\hat B$$
• Properties
1. Taylor linearization:
$$\bar y_{reg,tp} = \bar y_2 + (\bar x_1-\bar x_2)'B + O_p(r^{-1})$$
2. Approximately unbiased.
3. Variance:
$$V(\bar y_{reg,tp}) \doteq \left(\frac{1}{n}-\frac{1}{N}\right)B'S_{xx}B + \left(\frac{1}{r}-\frac{1}{N}\right)S_{ee},$$
where $S_{xx}$ is the population covariance matrix of x and $S_{ee}$ is the population variance of the regression residuals.
7.4 Repeated survey
• Motivation: sampling for the same population over time.
• Several parameters of interest:
1. Difference of the means between occasions: $\bar Y_2 - \bar Y_1$
2. Overall mean: $(\bar Y_1+\bar Y_2)/2$
3. Most recent mean: $\bar Y_2$
• Best sampling design:
1. For $\theta_1 = \bar Y_2 - \bar Y_1$: full overlap sampling
2. For $\theta_2 = (\bar Y_1+\bar Y_2)/2$: no overlap sampling
3. For $\theta_3 = \bar Y_2$: partial replacement sampling
• Partial overlap sampling: let $\bar Y_2$ be the parameter of interest.
1. At time t = 1: select an SRS of size n. Let $S_1$ be the realized sample at t = 1.
2. At time t = 2: partition the population U into two strata, $S_1$ and $U\setminus S_1$. From $S_1$, select an SRS of size $n_m$ to get $S_{2m}$. From $U\setminus S_1$, select an SRS of size $n_u = n - n_m$ to get $S_{2u}$. The final sample at t = 2 is $S_2 = S_{2m}\cup S_{2u}$.

Stratum      Pop'n size   Sample size   Estimator for $\bar Y_2$
Matched      $n$          $n_m$         $\hat{\bar Y}_m$
Unmatched    $N-n$        $n_u$         $\hat{\bar Y}_u$
Total        $N$          $n$           $\alpha\hat{\bar Y}_u + (1-\alpha)\hat{\bar Y}_m$
• Composite estimator:
$$\hat{\bar Y}_\alpha = \alpha\hat{\bar Y}_u + (1-\alpha)\hat{\bar Y}_m$$
for some constant α.
– If $\hat{\bar Y}_u$ and $\hat{\bar Y}_m$ are unbiased, then $\hat{\bar Y}_\alpha$ is unbiased.
– The variance of $\hat{\bar Y}_\alpha$ is minimized at
$$\alpha^* = \frac{V(\hat{\bar Y}_m)-Cov(\hat{\bar Y}_u,\hat{\bar Y}_m)}{V(\hat{\bar Y}_u)+V(\hat{\bar Y}_m)-2Cov(\hat{\bar Y}_u,\hat{\bar Y}_m)}$$
– The minimum variance is
$$V(\hat{\bar Y}_{\alpha^*}) = \frac{V(\hat{\bar Y}_m)V(\hat{\bar Y}_u)-\left\{Cov(\hat{\bar Y}_u,\hat{\bar Y}_m)\right\}^2}{V(\hat{\bar Y}_u)+V(\hat{\bar Y}_m)-2Cov(\hat{\bar Y}_u,\hat{\bar Y}_m)}$$
• How should $\hat{\bar Y}_u$ and $\hat{\bar Y}_m$ be chosen?
• Two-phase sampling approach
– x: observation at t = 1; y: observation at t = 2.
– Estimators:
$$\hat{\bar Y}_u = \frac{1}{n_u}\sum_{i\in S_{2u}}y_i \equiv \bar y_u$$
$$\hat{\bar Y}_m = \bar y_m + (\bar x_1-\bar x_m)\,\hat b$$
– Variances and covariance:
$$V(\hat{\bar Y}_u) = n_u^{-1}S^2$$
$$V(\hat{\bar Y}_m) = n_m^{-1}(1-\rho^2)S^2 + n^{-1}\rho^2S^2$$
$$Cov(\hat{\bar Y}_u,\hat{\bar Y}_m) = 0$$
– Optimal composite estimation:
$$\alpha^* = \frac{nn_u-n_u^2\rho^2}{n^2-n_u^2\rho^2}$$
– Variance of the optimal estimator:
$$V(\hat{\bar Y}_{\alpha^*}) = \frac{n-n_u\rho^2}{n^2-n_u^2\rho^2}S^2$$
– Optimal allocation:
$$\frac{n_u}{n} = \frac{1}{1+\sqrt{1-\rho^2}}, \qquad \frac{n_m}{n} = \frac{\sqrt{1-\rho^2}}{1+\sqrt{1-\rho^2}}$$
– Variance under optimal allocation:
$$V(\hat{\bar Y}_{\alpha^*}) = \frac{S^2}{2n}\left(1+\sqrt{1-\rho^2}\right)$$
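The optimal design quantities are simple closed forms. A minimal Python sketch, taking n, ρ, and $S^2$ as given inputs:

```python
import numpy as np

# Minimal sketch: optimal composite weight and matched/unmatched
# allocation for a two-occasion repeated survey, from the formulas above.
def repeated_survey_design(n, rho, S2=1.0):
    nu = n / (1.0 + np.sqrt(1.0 - rho ** 2))     # optimal unmatched size
    nm = n - nu                                  # matched size
    alpha = (n * nu - nu ** 2 * rho ** 2) / (n ** 2 - nu ** 2 * rho ** 2)
    var_opt = S2 / (2 * n) * (1.0 + np.sqrt(1.0 - rho ** 2))
    return nu, nm, alpha, var_opt

print(repeated_survey_design(n=1000, rho=0.8))
```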
Chapter 8
Nonresponse
8.1 Introduction
• Types of nonresponse
– Unit nonresponse: no information is collected from a sample unit. May be caused by a refusal, a not-at-home, or the inability of the unit to cooperate.
– Item nonresponse: the unit cooperates in the survey but fails to provide answers to some of the questions. May be caused by "don't knows," refusal to answer a question that is too sensitive, or answers that are inconsistent with other answers.
• Approaches to nonresponse
– Unit nonresponse: call-backs, nonresponse weighting adjustment
– Item nonresponse: imputation
• How can we reduce nonresponse?
– Use more incentives: works for refusals
– Call-backs: work for not-at-homes
– What if we use some power to get answers (e.g., a penalty by law)?
• Basic Setup

Stratum             Pop. Size   Pop. Mean     Sample Size   Sample Mean
Respondents         $N_R$       $\bar Y_R$    $n_R$         $\bar y_R$
Nonrespondents      $N_M$       $\bar Y_M$    $n_M$         $\bar y_M$
Entire population   $N$         $\bar Y$      $n$           $\bar y$

SRS from the entire population, but we observe y only on the respondents. Use $\bar y_R$ (the sample mean of the respondents) to estimate the population mean $\bar Y$:
$$Bias(\bar y_R) \doteq \frac{N_M}{N}\left(\bar Y_R-\bar Y_M\right), \qquad Var(\bar y_R) \doteq \frac{1}{n_R}S_R^2$$
• Two problems:
– Biased: $\bar Y_R \neq \bar Y_M$ in general
– Large variance, since $n_R < n$
8.2 Call-backs
• Basic setup: among the $n_M$ nonrespondents, contact $\nu n_M$ $(0<\nu<1)$ of them to get their responses.

Stratum     Pop. Size   Original Sample Size   Final Sample Size   Sample Mean
Resp.       $N_R$       $n_R$                  $r_1 = n_R$         $\bar y_1$
Nonresp.    $N_M$       $n_M$                  $r_2 = \nu n_M$     $\bar y_2$
Total       $N$         $n$                    $r$

– Point estimation:
$$\hat{\bar Y} = \frac{n_R}{n}\bar y_1 + \frac{n_M}{n}\bar y_2$$
– Variance:
$$Var(\hat{\bar Y}) = \frac{1}{n}\left(1-\frac{n}{N}\right)S^2 + \frac{W_2S_2^2}{n}\left(\frac{1}{\nu}-1\right)$$
where $W_2 = N^{-1}N_M$.
– Cost function:
$$C = c_0n + c_1W_1n + c_2W_2\nu n$$
– Optimal sample size n and call-back rate ν: minimize the variance for the given cost,
$$\nu^* = \left\{\frac{c_0+c_1W_1}{c_2}\times\frac{S_2^2}{S^2-W_2S_2^2}\right\}^{1/2}.$$
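A minimal Python sketch of the optimal call-back rate formula; the cost and variance inputs are illustrative placeholders.

```python
import numpy as np

# Minimal sketch: optimal call-back rate nu* from the cost/variance
# expressions above; all inputs are assumed design quantities.
def optimal_callback_rate(c0, c1, c2, W1, W2, S2, S2_2):
    return np.sqrt((c0 + c1 * W1) / c2 * S2_2 / (S2 - W2 * S2_2))

# e.g. unit costs 1/2/8, 70% respondents, within-stratum variance 80
print(optimal_callback_rate(c0=1, c1=2, c2=8, W1=0.7, W2=0.3,
                            S2=100.0, S2_2=80.0))
```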
8.3 Nonresponse weighting adjustment
• Under no missing data:
$$\hat Y_{HT} = \sum_{i\in S}\frac{1}{\pi_i}y_i$$
• Response indicator function:
$$R_i = \begin{cases}1 & \text{if unit } i \text{ responds,}\\ 0 & \text{if unit } i \text{ does not respond.}\end{cases}$$
• Idea: use a two-phase sampling approach:
$$\text{Population } (U) \overset{\text{Phase 1}}{\longrightarrow} \text{Sample } (S) \overset{\text{Phase 2}}{\longrightarrow} \text{Respondents } (S_R)$$
• Estimation: let $\pi_{i|S} = Pr(R_i=1\mid S)$. If $\pi_{i|S}$ were known, then
$$\hat Y_\varphi = \sum_{i\in S}\frac{1}{\pi_i}\frac{1}{\pi_{i|S}}R_iy_i$$
would be conditionally unbiased.
• In practice, we use an estimator $\hat\pi_{i|S}$ of $\pi_{i|S}$. The NWA estimator is
$$\hat Y_{NWA} = \sum_{i\in S}\frac{1}{\pi_i}\frac{1}{\hat\pi_{i|S}}R_iy_i$$
8.3.1 Weighting class NWA method
• A special case of the NWA method; commonly used.
• Partition the sample into G groups: $S = S_1\cup S_2\cup\cdots\cup S_G$.
• Assumption: the response homogeneity group (RHG) model,
$$Pr(R_i=1\mid S) = \pi_{i|S} = \theta_{gs} \quad\text{for all } i\in S_g,$$
$$Pr(R_i=1,R_j=1\mid S) = Pr(R_i=1\mid S)\,Pr(R_j=1\mid S) \quad\text{for all } i\neq j\in S.$$
• Under the RHG model, use
$$\hat\pi_{i|S} = \frac{\sum_{i\in S_g}\pi_i^{-1}R_i}{\sum_{i\in S_g}\pi_i^{-1}} \quad\text{for } i\in S_g.$$
• Weighting class NWA estimator:
$$\hat Y_{NWA} = \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\hat\pi_{i|S}^{-1}R_iy_i = \sum_{g=1}^{G}\hat N_g\bar y_{Rg}$$
where $\hat N_g = \sum_{i\in S_g}\pi_i^{-1}$ and
$$\bar y_{Rg} = \frac{\sum_{i\in S_g}\pi_i^{-1}R_iy_i}{\sum_{i\in S_g}\pi_i^{-1}R_i}.$$
• Define $\hat Y_{Rg} = \sum_{i\in S_g}\pi_i^{-1}\hat\pi_{i|S}^{-1}R_iy_i$ and $\hat N_{Rg} = \sum_{i\in S_g}\pi_i^{-1}\hat\pi_{i|S}^{-1}R_i$. The weighting class NWA estimator can be written
$$\hat Y_{NWA} = \sum_{g=1}^{G}\hat N_g\frac{\hat Y_{Rg}}{\hat N_{Rg}}.$$
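A minimal Python sketch of the weighting class NWA estimator under SRS with a simulated response-homogeneity-group mechanism; class labels, response rates, and data are illustrative only.

```python
import numpy as np

# Minimal sketch: weighting class NWA estimator of the total.
rng = np.random.default_rng(5)
n, G, N = 600, 3, 30000
gidx = rng.integers(0, G, n)               # weighting class of each unit
pi = np.full(n, n / N)                     # SRS inclusion probabilities
y = 5.0 + 2.0 * gidx + rng.normal(0, 1, n)
R = rng.random(n) < np.array([0.9, 0.7, 0.5])[gidx]   # RHG response model

Y_nwa = 0.0
for g in range(G):
    m = gidx == g
    pihat = np.sum(R[m] / pi[m]) / np.sum(1 / pi[m])  # estimated resp. prob.
    Y_nwa += np.sum(R[m] * y[m] / (pi[m] * pihat))
print(Y_nwa, np.sum(y / pi))   # vs. full-sample HT (no nonresponse)
```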
126 CHAPTER 8. NONRESPONSE
• Linearization:
$$\hat Y_{NWA} = \sum_{g=1}^{G}\hat N_g\frac{\hat Y_{Rg}}{\hat N_{Rg}} \doteq \sum_{g=1}^{G}\hat N_g\left(\frac{\hat Y_g}{\hat N_g} + \frac{\hat Y_{Rg}-\bar y_g\hat N_{Rg}}{\hat N_g}\right)$$
where $\bar y_g = \hat Y_g/\hat N_g$ and $\hat Y_g = \sum_{i\in S_g}\pi_i^{-1}y_i$. Thus,
$$\hat Y_{NWA} \doteq \hat Y_{HT} + \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(y_i-\bar y_g).$$
• Approximately unbiased.
• Asymptotic variance:
$$V(\hat Y_{NWA}) \doteq V(\hat Y_{HT}) + V\left\{\sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(y_i-\bar y_g)\right\}$$
$$= V(\hat Y_{HT}) + E\left\{\sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-2}\left(\pi_{i|S}^{-1}-1\right)(y_i-\bar y_g)^2\right\} = V_1 + V_2$$
• Variance estimation:
$$\hat V = \hat V_1 + \hat V_2$$
where $(\hat V_1,\hat V_2)$ is an unbiased estimator of $(V_1,V_2)$. Since
$$V_1 = E\left\{\sum_{i\in S}(1-\pi_i)\frac{y_i^2}{\pi_i^2} + \mathop{\sum\sum}_{i\neq j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_i}{\pi_i}\frac{y_j}{\pi_j}\right\},$$
we can use
$$\hat V_1 = \sum_{i\in S_R}\frac{1-\pi_i}{\hat\pi_{i|S}}\frac{y_i^2}{\pi_i^2} + \mathop{\sum\sum}_{i\neq j\in S_R}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_i}{\pi_i\hat\pi_{i|S}}\frac{y_j}{\pi_j\hat\pi_{j|S}}$$
and
$$\hat V_2 = \sum_{g=1}^{G}\sum_{i\in S_{Rg}}\pi_i^{-2}\hat\pi_{i|S}^{-1}\left(\hat\pi_{i|S}^{-1}-1\right)(y_i-\bar y_{Rg})^2.$$
8.3.2 Estimators that use weighting as well as auxiliary variables
• Motivation
– The NWA estimator uses a model for $R_i$.
– In addition to the model for $R_i$, we want to use a model for $y_i$ given $x_i$, where $x_i$ is always observed.
• NWA regression estimator:
$$\hat Y_{reg} = \sum_{i\in S}\frac{1}{\pi_i}\hat y_i + \sum_{i\in S_R}\frac{1}{\pi_i}\frac{1}{\hat\pi_{i|S}}(y_i-\hat y_i)$$
where $\hat y_i = x_i'\hat B_R$ and
$$\hat B_R = \left(\sum_{i\in S_R}\pi_i^{-1}\hat\pi_{i|S}^{-1}x_ix_i'\right)^{-1}\sum_{i\in S_R}\pi_i^{-1}\hat\pi_{i|S}^{-1}x_iy_i.$$
• Properties:
$$\hat Y_{reg} = \hat Y_{NWA} + \left(\hat X_{HT}-\hat X_{NWA}\right)'\hat B_R \doteq \hat Y_{NWA} + \left(\hat X_{HT}-\hat X_{NWA}\right)'B$$
where B is the probability limit of $\hat B_R$ and
$$\hat X_{NWA} = \sum_{i\in S_R}\frac{1}{\pi_i}\frac{1}{\hat\pi_{i|S}}x_i.$$
Furthermore, under the weighting class NWA method, we have
$$\hat Y_{NWA} \doteq \hat Y_{HT} + \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(y_i-\bar y_g)$$
and
$$\hat X_{NWA} \doteq \hat X_{HT} + \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(x_i-\bar x_g).$$
Thus,
$$\hat Y_{reg} \doteq \hat Y_{HT} + \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(e_i-\bar e_g)$$
where $e_i = y_i - x_i'B$ and $\bar e_g = \left(\sum_{i\in S_g}\pi_i^{-1}\right)^{-1}\sum_{i\in S_g}\pi_i^{-1}e_i$.
• Approximately unbiased.
• Asymptotic variance:
$$V(\hat Y_{reg}) = V(\hat Y_{HT}) + E\left\{\sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-2}\left(\pi_{i|S}^{-1}-1\right)(e_i-\bar e_g)^2\right\}.$$
• Variance estimation: same $\hat V_1$ as for the NWA estimator. For $V_2$, use
$$\hat V_2 = \sum_{g=1}^{G}\sum_{i\in S_{Rg}}\pi_i^{-2}\hat\pi_{i|S}^{-1}\left(\hat\pi_{i|S}^{-1}-1\right)(\hat e_i-\bar e_{Rg})^2.$$
8.4 Imputation
8.4.1 Introduction
• Meaning: fill in each missing value with a plausible value (or a set of plausible values).
• Why imputation?
– It provides a complete data file, so the standard complete-data methods can be applied.
– By filling in missing values, the analyses from different users can be made consistent.
– By a proper choice of the imputation model, we may reduce the nonresponse bias.
– We do not want to delete records with partial information: imputation makes full use of the observed information (i.e., reduces the variance).
• Basic setup
– $y_i$: study variable, subject to missingness.
– $x_i$: auxiliary variable, always observed.
– $I_i$: sampling indicator function for unit i.
– $R_i$: response indicator function for $y_i$.
• Lemma 1: Let $\hat Y_n = \sum_{i\in S}\pi_i^{-1}y_i$ be an unbiased estimator of $Y = \sum_{i=1}^{N}y_i$ under complete response. If $y_i$ is not observed when $R_i = 0$ and if we can find $y_i^*$ that satisfies
$$E(y_i^*\mid I_i=1,R_i=0) = E(y_i\mid I_i=1,R_i=0) \tag{8.1}$$
then the imputed estimator of the form
$$\hat Y_I = \sum_{i\in S}\frac{1}{\pi_i}\left\{R_iy_i + (1-R_i)y_i^*\right\} \tag{8.2}$$
is unbiased for Y in the sense that $E(\hat Y_I - Y) = 0$.
Proof: Note that
$$E\{\hat Y_n\mid I,R\} = E\left[\sum_{i\in S}\frac{1}{\pi_i}\{R_iy_i+(1-R_i)y_i\}\ \Big|\ I,R\right] = \sum_{i\in S}\frac{1}{\pi_i}\left\{R_iE(y_i\mid I_i=1,R_i=1) + (1-R_i)E(y_i\mid I_i=1,R_i=0)\right\}.$$
Similarly,
$$E\{\hat Y_I\mid I,R\} = \sum_{i\in S}\frac{1}{\pi_i}\left\{R_iE(y_i\mid I_i=1,R_i=1) + (1-R_i)E(y_i^*\mid I_i=1,R_i=0)\right\},$$
and so, by (8.1),
$$E\{\hat Y_n-\hat Y_I\mid I,R\} = 0,$$
which implies $E(\hat Y_I - Y) = 0$.
• How to get $y_i^*$ satisfying (8.1)?
– Deterministic imputation: use an estimator of $E(y_i\mid I_i=1,R_i=0)$.
– Stochastic imputation: generate $y_i^*$ from $f(y_i\mid I_i=1,R_i=0)$.
• Approaches to specifying the conditional distribution $f(y_i\mid I_i=1,R_i=0)$:
– Assume Missing Completely At Random (MCAR):
$$f(y_i\mid I_i=1,R_i=0) = f(y_i\mid I_i=1,R_i=1). \tag{8.3}$$
Under MCAR, we can estimate the parameters using the set of respondents. However, MCAR may not be realistic.
– Assume that there exists an auxiliary vector $x_i$ such that
$$f(y_i\mid x_i,I_i=1,R_i=0) = f(y_i\mid x_i,I_i=1,R_i=1). \tag{8.4}$$
Condition (8.4) is called Missing At Random (MAR). Under MAR, we have
$$E(y_i\mid I_i=1,R_i=0) = E\{E(y_i\mid x_i,I_i=1,R_i=0)\mid I_i=1,R_i=0\} = E\{E(y_i\mid x_i,I_i=1,R_i=1)\mid I_i=1,R_i=0\}.$$
Thus, we have only to generate $y_i^*$ from $f(y_i\mid x_i,I_i=1,R_i=1)$.
• Lemma 2: Let $y_i^*$ be the imputed value of $y_i$. If
$$E(y_i^*\mid x_i,I_i=1,R_i=1) = E(y_i\mid x_i,I_i=1,R_i=1) \tag{8.5}$$
and the MAR condition holds, then the imputed estimator $\hat Y_I$ in (8.2) is unbiased.
Proof: Note that
$$E\{\hat Y_n-\hat Y_I\mid X,I,R\} = \sum_{i\in S}\frac{1}{\pi_i}(1-R_i)\left\{E(y_i\mid x_i,I_i=1,R_i=0) - E(y_i^*\mid x_i,I_i=1,R_i=0)\right\}$$
$$= \sum_{i\in S}\frac{1}{\pi_i}(1-R_i)\left\{E(y_i\mid x_i,I_i=1,R_i=1) - E(y_i^*\mid x_i,I_i=1,R_i=1)\right\},$$
where the second equality follows from the MAR condition. Thus, by (8.5),
$$E\{\hat Y_n-\hat Y_I\mid X,I,R\} = 0,$$
which implies $E(\hat Y_I - Y) = 0$.
• When does the MAR condition hold? If the response mechanism satisfies
$$Pr(R_i=1\mid y_i,x_i,I_i=1) = Pr(R_i=1\mid x_i,I_i=1),$$
then (8.4) holds.
• Commonly used imputation methods
1. Business surveys: ratio, regression, and nearest neighbor imputation.
2. Socio-economic surveys: random donor (within classes), stochastic ratio or regression imputation, fractional imputation, multiple imputation.
8.4.2 Deterministic imputation
• Assumptions
– $E(y_i\mid x_i,I_i=1) = x_i'\beta$ with unknown β.
– $V(y_i\mid x_i,I_i=1) = c_i\sigma^2$ with known $c_i = c(x_i)$ and unknown $\sigma^2$.
– Missing at random.
• Imputed estimator of Y:
$$\hat Y_I = \sum_{i\in S}\frac{1}{\pi_i}\left\{R_iy_i + (1-R_i)x_i'\hat\beta\right\}$$
where $\hat\beta = \left(\sum_{i\in S}c_i^{-1}\pi_i^{-1}R_ix_ix_i'\right)^{-1}\sum_{i\in S}c_i^{-1}\pi_i^{-1}R_ix_iy_i$.
• Examples:
1. Cell mean imputation: $x_i$ is a vector of cell indicator functions. If $x_i = 1$, it reduces to mean imputation.
2. Ratio imputation.
3. Regression imputation.
• Property
1. $E(\hat\beta\mid x,I,R) = \beta$.
2. Let $y_i^* = x_i'\hat\beta$ be the imputed value of $y_i$. Then condition (8.5) holds and the imputed estimator is unbiased by Lemma 2.
3. To discuss the variance and its estimation, we need a Taylor linearization.
• Taylor linearization: writing $\hat Y_I$ as a function of $\hat\beta$ and using
$$\hat Y_I(\hat\beta) \doteq \hat Y_I(\beta) + \frac{\partial\hat Y_I(\beta)}{\partial\beta'}\left(\hat\beta-\beta\right),$$
we have
$$\hat Y_I(\hat\beta) \doteq \sum_{i\in S}\frac{1}{\pi_i}\left\{x_i'\beta + R_i(1+k_i)\left(y_i-x_i'\beta\right)\right\}$$
where
$$k_i = \left\{\sum_{i\in S}\pi_i^{-1}(1-R_i)x_i'\right\}\left\{\sum_{i\in S}c_i^{-1}\pi_i^{-1}R_ix_ix_i'\right\}^{-1}c_i^{-1}x_i.$$
• Asymptotic variance:
$$V(\hat Y_I) \doteq V\left(\sum_{i\in S}\frac{1}{\pi_i}x_i'\beta\right) + E\left\{\sum_{i\in S}\frac{1}{\pi_i^2}R_i(1+k_i)^2c_i\sigma^2\right\}.$$
Note that
$$V(\hat Y_n) \doteq V\left(\sum_{i\in S}\frac{1}{\pi_i}x_i'\beta\right) + E\left(\sum_{i\in S}\frac{1}{\pi_i^2}c_i\sigma^2\right).$$
• Variance estimation:
$$\hat V(\hat Y_I) = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{\hat\eta_i}{\pi_i}\frac{\hat\eta_j}{\pi_j}$$
where $\hat\eta_i = x_i'\hat\beta + R_i(1+k_i)\left(y_i-x_i'\hat\beta\right)$.
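A minimal Python sketch of deterministic regression imputation under SRS with a simulated MAR response mechanism and $c_i = 1$; the data-generating choices are illustrative only.

```python
import numpy as np

# Minimal sketch: regression imputation of missing y, then the imputed
# estimator of the population total under SRS.
rng = np.random.default_rng(6)
N, n = 50000, 400
xpop = rng.gamma(3.0, 1.5, N)
s = rng.choice(N, n, replace=False)
x = xpop[s]
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)
R = rng.random(n) < 1 / (1 + np.exp(-(x - 4)))   # response depends on x only

X = np.column_stack([np.ones(n), x])             # design matrix (c_i = 1)
beta = np.linalg.lstsq(X[R], y[R], rcond=None)[0]  # fit on respondents
y_imp = np.where(R, y, X @ beta)                 # fill in x_i' beta_hat
Y_I = N / n * y_imp.sum()                        # imputed estimator (SRS)
print(Y_I)
```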
8.4.3 Stochastic imputation
• Motivation: deterministic imputation may provide an efficient estimator for the mean, but may provide biased estimates for other parameters.
• Example
1. Mean imputation provides a biased estimate of the proportion $Pr(a < Y < b)$.
2. Regression imputation underestimates the population variance.
• Remedy: add more variability to the imputed values.
– Use $y_i^* = \hat y_i + e_i^*$, where $e_i^*$ is randomly selected from $\{\hat e_i = y_i - \hat y_i;\ i\in S_R\}$.
– Generate $y_i^*$ from $\hat f(y\mid x_i,I_i=1)$ so that $E_I(y_i^*) = \hat y_i$, where $E_I$ denotes the expectation over the imputation mechanism.
• Hot deck imputation
– Partition the sample into G groups: $S = S_1\cup S_2\cup\cdots\cup S_G$.
– In group g, we have $n_g$ elements, $r_g$ respondents, and $m_g = n_g - r_g$ nonrespondents.
– For each group $S_g$, select $m_g$ imputed values from the $r_g$ respondents with replacement (or without replacement).
– This hot deck imputation method is often justified under an IID model within groups and is commonly used in household surveys.
Example 1: Hot deck imputation under SRS
• $S_g = S_{Rg}\cup S_{Mg}$ with $S_{Rg} = \{i\in S_g; R_i=1\}$ and $S_{Mg} = \{i\in S_g; R_i=0\}$.
• Imputation mechanism: $y_j^* \overset{i.i.d.}{\sim} \text{Uniform}\{y_i;\ i\in S_{Rg}\}$. That is, $y_j^* = y_i$ with probability $1/r_g$ for $i\in S_{Rg}$ and $j\in S_{Mg}$.
• Imputed estimator of $\bar Y$:
$$\hat{\bar Y}_I = \frac{1}{n}\sum_{i\in S}\left\{R_iy_i + (1-R_i)y_i^*\right\}$$
• Variance:
$$V(\hat{\bar Y}_I) = V\left\{E_I(\hat{\bar Y}_I)\right\} + E\left\{V_I(\hat{\bar Y}_I)\right\} = V\left(n^{-1}\sum_{g=1}^{G}n_g\bar y_{Rg}\right) + E\left\{n^{-2}\sum_{g=1}^{G}m_g\left(1-r_g^{-1}\right)S_{Rg}^2\right\}$$
where $\bar y_{Rg} = r_g^{-1}\sum_{i\in S_{Rg}}y_i$ and $S_{Rg}^2 = (r_g-1)^{-1}\sum_{i\in S_{Rg}}(y_i-\bar y_{Rg})^2$.
Under the model
$$y_i\mid(i\in S_g) \overset{i.i.d.}{\sim} \left(\mu_g,\sigma_g^2\right),$$
the variance can be written
$$V(\hat{\bar Y}_I) = V\left(n^{-1}\sum_{g=1}^{G}n_g\mu_g\right) + E\left\{n^{-2}\sum_{g=1}^{G}\left(n_g + 2m_g + \frac{m_g(m_g-1)}{r_g}\right)\sigma_g^2\right\}.$$
Note that
$$V(\hat{\bar Y}_n) = V\left(n^{-1}\sum_{g=1}^{G}n_g\mu_g\right) + E\left(n^{-2}\sum_{g=1}^{G}n_g\sigma_g^2\right).$$
Thus, the variance is increased.
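A minimal Python sketch of within-group hot deck imputation (donors drawn with replacement); the groups, response rates, and data are simulated placeholders.

```python
import numpy as np

# Minimal sketch: hot deck imputation within groups under SRS.
rng = np.random.default_rng(7)
n, G = 300, 3
g = rng.integers(0, G, n)
y = 4.0 + 2.0 * g + rng.normal(0, 1, n)
R = rng.random(n) < 0.7                       # MCAR within groups (assumed)

y_imp = y.copy()
for k in range(G):
    donors = y[(g == k) & R]                  # respondents in group k
    miss = np.where((g == k) & ~R)[0]         # nonrespondents in group k
    y_imp[miss] = rng.choice(donors, size=len(miss), replace=True)
ybar_I = y_imp.mean()                         # imputed estimator of the mean
print(ybar_I)
```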
8.4.4 Variance estimation after imputation
• The variance is increased after imputation, from two sources:
1. Reduced sample size.
2. Randomness due to the imputation mechanism in stochastic imputation.
• Naive approach: treat the imputed values as if observed and apply the standard variance estimation formula to the imputed data.
• The naive approach underestimates the true variance!
• Example 1 (continued): The naive variance estimator $\hat V_I = n^{-1}S_I^2$ has expectation
$$E\{\hat V_I\} = V\left(n^{-1}\sum_{g=1}^{G}n_g\mu_g\right) + \frac{1}{n(n-1)}E\left\{\sum_{g=1}^{G}\left(n_g - \frac{n_g}{n} - \frac{2m_g}{n} - \frac{m_g(m_g-1)}{nr_g}\right)\sigma_g^2\right\} \doteq V(\hat{\bar Y}_n)$$
• Approach: write
$$V(\hat{\bar Y}_I) = V(\hat{\bar Y}_n) + E\left(\sum_{g=1}^{G}c_g\sigma_g^2\right)$$
for some $c_g$. The (approximately) bias-corrected variance estimator is
$$\hat V = \hat V_I + \sum_{g=1}^{G}\hat c_gS_{Rg}^2$$
• Other approaches:
– Multiple imputation: Rubin (1987)
– Adjusted jackknife: Rao and Shao (1992)
– Fractional imputation: Kim and Fuller (2004)
– Linearization: Shao and Steel (1999), Kim and Rao (2009)
• Multiple imputation method
– Idea: instead of creating a single imputed data set, create $M(>1)$ imputed data sets independently, obtaining M point estimators $\hat\theta_{I(1)},\cdots,\hat\theta_{I(M)}$ and M naive variance estimators $\hat V_{I(1)},\cdots,\hat V_{I(M)}$. The final point estimator is
$$\hat\theta_{MI} = \frac{1}{M}\sum_{k=1}^{M}\hat\theta_{I(k)}$$
and its variance estimator is
$$\hat V_{MI} = \frac{1}{M}\sum_{k=1}^{M}\hat V_{I(k)} + \left(1+\frac{1}{M}\right)\frac{1}{M-1}\sum_{k=1}^{M}\left(\hat\theta_{I(k)}-\hat\theta_{MI}\right)^2.$$
– Multiple imputation is justified under the following condition:
$$V(\hat\theta_I) = V(\hat\theta_n) + V(\hat\theta_I-\hat\theta_n)$$
– Approximate Bayesian bootstrap imputation:
[Step 1] First select $y_j^*\overset{i.i.d.}{\sim}\text{Uniform}\{y_i;\ i\in S_{Rg}\}$, $j\in S_{Rg}$.
[Step 2] The final imputed values are selected as $y_j^{**}\overset{i.i.d.}{\sim}\text{Uniform}\{y_i^*;\ i\in S_{Rg}\}$.
• Adjusted jackknife
– Idea: for the imputed value $y_i^* = \hat y_i + e_i^*$, create the k-th replicate of $y_i^*$ as $y_i^{*(k)} = \hat y_i^{(k)} + e_i^*$, where $\hat y_i^{(k)} = x_i'\hat\beta^{(k)}$ and $\hat\beta^{(k)}$ is computed from the formula for $\hat\beta$ after deleting $(x_k,y_k)$.
– The variance estimation method is justified because
$$V(\hat\theta_I) = V(\hat\theta_{Id}) + V(\hat\theta_I-\hat\theta_{Id})$$
where $\hat\theta_{Id}$ is the imputed estimator under deterministic imputation. The first term is computed using either a jackknife method (Rao and Shao, 1992) or a linearization method (Kim and Rao, 2009).
• Fractional imputation
– Idea: split each record with a missing item into M imputed values $y_{i(k)}^* = \hat y_i + \hat e_{i(k)}^*$ with fractional weights $w_{i(k)}^*$ such that
$$\sum_{k=1}^{M}w_{i(k)}^*\left(1,\hat e_{i(k)}^*\right) = (1,0)$$
holds. The fractional imputation estimator is algebraically equivalent to the deterministic imputation estimator, but it also provides consistent estimates for other parameters.
– Variance estimation can be easily carried out by a replication method.
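The multiple imputation combining rule given above is easy to implement. A minimal Python sketch; the example inputs are placeholders, not results:

```python
import numpy as np

# Minimal sketch: combine M point estimates and M naive variance
# estimates into theta_MI and V_MI, per the formulas above.
def mi_combine(theta_k, v_k):
    theta_k, v_k = np.asarray(theta_k), np.asarray(v_k)
    M = len(theta_k)
    theta_mi = theta_k.mean()
    between = np.sum((theta_k - theta_mi) ** 2) / (M - 1)
    v_mi = v_k.mean() + (1 + 1 / M) * between
    return theta_mi, v_mi

print(mi_combine([10.2, 9.8, 10.5, 10.0, 10.1],
                 [0.40, 0.38, 0.42, 0.41, 0.39]))
```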
Chapter 9
Small Area Estimation
9.1 Introduction
• The original sample A is decomposed into G domains such that $A = A_1\cup\cdots\cup A_G$ and $n = n_1+\cdots+n_G$.
• n is large, but $n_g$ can be very small.
• Direct estimator of $Y_g = \sum_{i\in U_g}y_i$:
$$\hat Y_{d,g} = \sum_{i\in A_g}\frac{1}{\pi_i}y_i$$
– Unbiased.
– May have high variance (CV > 30% is unacceptable for official statistics).
• Synthetic estimator of $Y_g$:
$$\hat Y_{s,g} = X_g'\hat\beta$$
where $X_g = \sum_{i\in U_g}x_i$ is the known total of $x_i$ in $U_g$ and $\hat\beta$ is the estimated regression coefficient for the regression of y on x.
– Low variance.
– Could be biased (unless $\sum_{i\in U_g}(y_i - x_i'B) = 0$).
• Composite estimation: consider
$$\hat Y_{c,g} = \alpha_g\hat Y_{d,g} + (1-\alpha_g)\hat Y_{s,g}$$
for some $\alpha_g\in(0,1)$. We are interested in finding the $\alpha_g^*$ that minimizes the MSE of $\hat Y_{c,g}$. The optimal choice is
$$\alpha_g^* \cong \frac{MSE(\hat Y_{s,g})}{MSE(\hat Y_{d,g}) + MSE(\hat Y_{s,g})}$$
– For the direct estimation part, $MSE(\hat Y_{d,g}) = V(\hat Y_{d,g})$ can be estimated.
– For the synthetic estimation part, under the (domain-total) model
$$Y_g = X_g'\beta + N_gu_g,$$
where $u_g\sim N(0,\sigma_u^2)$, we have $MSE(\hat Y_{s,g}) \cong N_g^2V(u_g) = N_g^2\sigma_u^2$.
Thus, use
$$\hat Y_{c,g} = \hat\alpha_g^*\hat Y_{d,g} + \left(1-\hat\alpha_g^*\right)\hat Y_{s,g}$$
where
$$\hat\alpha_g^* \cong \frac{N_g^2\hat\sigma_u^2}{\hat V_g + N_g^2\hat\sigma_u^2}$$
and $\hat V_g$ is an unbiased estimator of $V(\hat Y_{d,g})$ and $\hat\sigma_u^2$ is an unbiased estimator of $\sigma_u^2$.
9.2 Area level estimation
• Parameter of interest: $\bar Y_g = N_g^{-1}\sum_{i\in U_g}y_i$.
• Model:
$$\hat{\bar Y}_{d,g} \sim N\left(\bar Y_g, V_g\right)$$
with $V_g = V(\hat{\bar Y}_{d,g})$, and
$$\bar Y_g = X_g'\beta + u_g$$
with $u_g\sim N(0,\sigma_u^2)$.
• Best unbiased predictor of $\bar Y_g$, assuming that β, $V_g$, and $\sigma_u^2$ are known:
$$\hat{\bar Y}_g^* = E\left\{\bar Y_g\mid\hat{\bar Y}_{d,g}\right\} = X_g'\beta + E\left\{u_g\mid\hat{\bar Y}_{d,g}\right\} = X_g'\beta + \frac{\sigma_u^2}{\sigma_u^2+V_g}\left(\hat{\bar Y}_{d,g}-X_g'\beta\right)$$
Thus,
$$\hat{\bar Y}_g^* = \alpha_g^*\hat{\bar Y}_{d,g} + \left(1-\alpha_g^*\right)X_g'\beta \tag{9.1}$$
where $\alpha_g^* = \sigma_u^2/(V_g+\sigma_u^2)$.
• Alternative derivation: BLUP.
$$\hat{\bar Y}_{d,g} = \bar Y_g + e_g$$
$$X_g'\beta = \bar Y_g - u_g$$
where $e_g$ and $u_g$ are independent error terms with mean zero and variances $V_g$ and $\sigma_u^2$, respectively. The BLUP of $\bar Y_g$ is given by (9.1).
• Remark: the model should be a model for the domain mean of y. If it is about the domain total, the model can be written as
$$Y_g = X_g'\beta + N_gu_g$$
and the composite estimator changes accordingly.
• MSE: if β, $V_g$, and $\sigma_u^2$ are known, then
$$MSE(\hat{\bar Y}_g^*) = V(\hat{\bar Y}_g^*-\bar Y_g) = V\left\{\alpha_g^*\left(\hat{\bar Y}_{d,g}-\bar Y_g\right) + \left(1-\alpha_g^*\right)\left(X_g'\beta-\bar Y_g\right)\right\}$$
$$= \left(\alpha_g^*\right)^2V_g + \left(1-\alpha_g^*\right)^2\sigma_u^2 = \alpha_g^*V_g = \left(1-\alpha_g^*\right)\sigma_u^2.$$
Note that, since $0<\alpha_g^*<1$,
$$MSE(\hat{\bar Y}_g^*) < V_g \quad\text{and}\quad MSE(\hat{\bar Y}_g^*) < \sigma_u^2.$$
• When β is unknown (and $V_g$ and $\sigma_u^2$ are known):
$$\hat\beta = \left(\sum_{g=1}^{G}w_gX_gX_g'\right)^{-1}\sum_{g=1}^{G}w_gX_g\hat{\bar Y}_{d,g}$$
where $w_g = \left(\sigma_u^2+V_g\right)^{-1}$. The EBLUP is
$$\hat{\bar Y}_g^*(\hat\beta) = \alpha_g^*\hat{\bar Y}_{d,g} + \left(1-\alpha_g^*\right)X_g'\hat\beta \tag{9.2}$$
The MSE is
$$MSE\left\{\hat{\bar Y}_g^*(\hat\beta)\right\} = V\left\{\hat{\bar Y}_g^*(\hat\beta)-\bar Y_g\right\} = V\left\{\alpha_g^*\left(\hat{\bar Y}_{d,g}-\bar Y_g\right) + \left(1-\alpha_g^*\right)\left(X_g'\hat\beta-\bar Y_g\right)\right\}$$
$$= \left(\alpha_g^*\right)^2V_g + \left(1-\alpha_g^*\right)^2\left\{\sigma_u^2 + X_g'V(\hat\beta)X_g\right\} = \alpha_g^*V_g + \left(1-\alpha_g^*\right)^2X_g'V(\hat\beta)X_g,$$
where
$$V(\hat\beta) = \left(\sum_{g=1}^{G}w_gX_gX_g'\right)^{-1}.$$
• If β and $\sigma_u^2$ are both unknown:
1. Find consistent estimators of β and $\sigma_u^2$.
2. Use
$$\hat{\bar Y}_g^*(\hat\alpha_g^*,\hat\beta) = \hat\alpha_g^*\hat{\bar Y}_{d,g} + \left(1-\hat\alpha_g^*\right)X_g'\hat\beta \tag{9.3}$$
where $\hat\alpha_g^* = \hat\sigma_u^2/(V_g+\hat\sigma_u^2)$.
• MSE:
$$MSE\left\{\hat{\bar Y}_g^*(\hat\alpha_g^*,\hat\beta)\right\} = V\left\{\hat{\bar Y}_g^*(\hat\alpha_g^*,\hat\beta)-\bar Y_g\right\}$$
$$= \left(\alpha_g^*\right)^2V_g + \left(1-\alpha_g^*\right)^2\left\{\sigma_u^2 + X_g'V(\hat\beta)X_g\right\} + V(\hat\alpha_g)\left(V_g+\sigma_u^2\right)$$
$$= \alpha_g^*V_g + \left(1-\alpha_g^*\right)^2X_g'V(\hat\beta)X_g + V(\hat\alpha_g)\left(V_g+\sigma_u^2\right)$$
• MSE estimation (Prasad and Rao, 1990):
$$\widehat{MSE}\left\{\hat{\bar Y}_g^*(\hat\alpha_g^*,\hat\beta)\right\} = \hat\alpha_g^*V_g + \left(1-\hat\alpha_g^*\right)^2X_g'\hat V(\hat\beta)X_g + 2\hat V(\hat\alpha_g)\left(V_g+\hat\sigma_u^2\right).$$
• Estimation of $\sigma_u^2$: method of moments,
$$\hat\sigma_u^2 = \sum_g k_g\frac{G}{G-p}\left\{\left(\hat{\bar Y}_{d,g}-X_g'\hat\beta\right)^2 - \hat V_{d,g}\right\},$$
where $k_g\propto\left(\sigma_u^2+V_g\right)^{-1}$ and $\sum_{g=1}^{G}k_g = 1$. If $\hat\sigma_u^2$ is negative, then we set it to zero.
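A minimal Python sketch of the area-level composite (EBLUP-type) estimator (9.3), treating $\hat\sigma_u^2$ as given rather than estimating it by the moment formula; all inputs are illustrative.

```python
import numpy as np

# Minimal sketch: area-level composite estimator with beta estimated
# by weighted least squares, then shrinkage toward X_g' beta_hat.
def eblup(ybar_d, V, X, sigma2_u):
    w = 1.0 / (sigma2_u + V)                  # w_g = (sigma_u^2 + V_g)^{-1}
    XtWX = (X * w[:, None]).T @ X
    beta = np.linalg.solve(XtWX, (X * w[:, None]).T @ ybar_d)
    alpha = sigma2_u / (V + sigma2_u)         # shrinkage weights alpha_g*
    return alpha * ybar_d + (1 - alpha) * (X @ beta)

ybar_d = np.array([12.0, 9.5, 14.2, 10.8])    # direct area estimates
V = np.array([4.0, 2.5, 6.0, 3.0])            # known design variances V_g
X = np.column_stack([np.ones(4), [1.0, 0.5, 1.5, 0.8]])
print(eblup(ybar_d, V, X, sigma2_u=2.0))
```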
9.3 Extensions
• Unit level estimation: Battese, Harter, and Fuller (1988).
Use a unit-level model
$$y_{gi} = x_{gi}'\beta + u_g + e_{gi}$$
and
$$Y_g^* = \sum_{i\in U_g}\left(x_{gi}'\beta + u_g\right).$$
It can be shown that
$$\hat{\bar Y}_g^* = \alpha_g^*\hat{\bar Y}_{reg,g} + \left(1-\alpha_g^*\right)\hat{\bar Y}_{s,g}$$
where
$$\hat{\bar Y}_{reg,g} = \hat{\bar Y}_{d,g} + \left(\bar X_g - \hat{\bar X}_{d,g}\right)'\hat\beta$$
and
$$\hat{\bar Y}_{s,g} = \bar X_g'\hat\beta.$$
• Benchmarked small area estimation: Wang, Fuller, and Qu (2009).
– The sum of the small area estimates is not necessarily equal to $\hat Y = \sum_{i\in A}\frac{1}{\pi_i}y_i$.
– It is desirable that the benchmarking condition hold:
$$\sum_{g=1}^{G}N_g\hat{\bar Y}_g^* = \hat Y$$
– Idea: since
$$\hat{\bar Y}_g^* = X_g'\hat\beta + \hat\alpha_g^*\left(\hat{\bar Y}_{d,g}-X_g'\hat\beta\right),$$
we can adjust $\hat\sigma_u^2$ so that
$$\sum_{g=1}^{G}N_g\hat\alpha_g^*\left(\hat{\bar Y}_{d,g}-X_g'\hat\beta\right) = 0.$$
• For other applications, read "Small Area Estimation" by Rao (2003).