STAT 521: Survey Sampling
Jae Kwang Kim
Iowa State University
Contents

1 Introduction
  1.1 Introduction
  1.2 Example
  1.3 Probability Sampling
    1.3.1 Definition & Notation
    1.3.2 Probability sampling
  1.4 Basic Procedures
  1.5 Errors

2 Horvitz-Thompson estimation
  2.1 Introduction
  2.2 Basic setup
  2.3 Simple random sampling
  2.4 Domain estimation

3 Element sampling design
  3.1 Introduction
  3.2 Poisson sampling
  3.3 PPS sampling
  3.4 πps sampling
  3.5 Systematic sampling
  3.6 Stratified sampling
  3.7 Systematic πps

4 Cluster sampling
  4.1 Introduction
  4.2 Single-Stage Cluster Sampling
  4.3 Two-stage Sampling

5 Estimation
  5.1 Introduction
  5.2 Large sample theory
  5.3 Ratio estimation
  5.4 Regression estimation
  5.5 GREG estimation
  5.6 Optimal Estimation

6 Variance estimation
  6.1 Introduction
  6.2 Taylor series linearization
  6.3 Replication method
    6.3.1 Independent Random Group Method
    6.3.2 Non-independent Random Groups
    6.3.3 Jackknife method for variance estimation
    6.3.4 Balanced repeated replication

7 Two-phase sampling
  7.1 Introduction
  7.2 Two-phase sampling for stratification
  7.3 Two-phase regression estimator
  7.4 Repeated survey

8 Nonresponse
  8.1 Introduction
  8.2 Call-backs
  8.3 Nonresponse weighting adjustment
    8.3.1 Weighting class NWA method
    8.3.2 Estimators that use weighting as well as auxiliary variables
  8.4 Imputation
    8.4.1 Introduction
    8.4.2 Deterministic imputation
    8.4.3 Stochastic imputation
    8.4.4 Variance estimation after imputation

9 Small Area Estimation
  9.1 Introduction
  9.2 Area level estimation
  9.3 Extensions
Chapter 1
Introduction
1.1 Introduction
• Population
– Finite population vs. infinite population
– Target population vs. survey population
• Sample: subset of (finite) population
• Sampling: the process of selecting a sample from a (finite) population.
• Why sampling ?
1. To reduce the cost
2. To save the time
3. Sometimes, to get more accurate information about the population.
4. Sometimes, it is the only way of getting information about the
target population.
• Sampling error: An error due to the fact that only a subset of finite
population is selected for observation.
• Two types of sampling
– Probability sampling
– Non-probability sampling
• Roughly speaking, a probability rule is assigned to obtain a sample in
probability sampling.
1.2 Example
Example 1.1. Consider the following artificial population of farms (of size
N = 4)
ID Farm size Farm yield (y)
1 4 1
2 6 3
3 6 5
4 20 15
Instead of selecting the N = 4 farms, suppose that we want to select only
n = 2 sample farms and observe yi in the sample. The parameter of interest
is the average farm yield ((1 + 3 + 5 + 15)/4 = 6). Assume that the farm
size is known for each farm in the population. How to select the samples?
• In this example, there are six possible ways of selecting the sample farms.
Case Sample ID Sample Mean Sampling Error
1 1, 2 2 -4
2 1, 3 3 -3
3 1, 4 8 2
4 2, 3 4 -2
5 2, 4 9 3
6 3, 4 10 4
• We select only one sample from the six possible samples.
• In any situation, sampling error exists.
• How to select one sample from the six possible samples ?
– Non-probability sampling
– Probability sampling
• Simple random sampling: Equal probability of selection for each pos-
sible sample.
Case Sample ID Sample mean (y) Selection Probability
1 1, 2 2 1/6
2 1, 3 3 1/6
3 1, 4 8 1/6
4 2, 3 4 1/6
5 2, 4 9 1/6
6 3, 4 10 1/6
• We can derive the probability distribution of the sample mean from the probability distribution of the sample.
1. What is the probability distribution of the estimator ȳ?
2. Show that the sample mean ȳ is unbiased for the population mean. What is the meaning of the expectation in this case?
3. What is the variance of ȳ in this case?
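As a check on these three questions, here is a minimal Python sketch (not part of the original notes; it simply enumerates the six samples of Example 1.1):

from itertools import combinations

y = {1: 1, 2: 3, 3: 5, 4: 15}              # farm yields from Example 1.1
samples = list(combinations(y, 2))         # all 6 possible samples of size n = 2
p = 1 / len(samples)                       # SRS: each sample has probability 1/6

means = [sum(y[i] for i in s) / 2 for s in samples]
e_mean = sum(p * m for m in means)         # expectation over the 6 samples
var = sum(p * (m - e_mean) ** 2 for m in means)
print(e_mean)                              # 6.0 = population mean, so ybar is unbiased
print(var)                                 # 58/6 ~ 9.67, the design variance of ybar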
Remark
1. No model assumption about yi in the example: totally different frame-
work !
2. Design-based approach: the reference distribution is the sampling dis-
tribution generated by the repeated application of the given sampling
mechanism.
3. Why design-based approach ?
(a) Requires weaker assumptions: robust.
(b) Finding a good model is not easy: Useful for general purpose
estimation.
(c) Get consistent results from different users: Useful for official
statistics.
Example 1.2. Another sample design
• Unequal probability sampling
Sample ID y value Mean Estimator Selection probability
1, 4 1, 15 4.5 1/3
2, 4 3, 15 6 1/3
3, 4 5, 15 7.5 1/3
• What is the probability distribution of the mean estimator ?
• What is the expected value of the sampling error ?
• Compute the variance. Compare it with that of SRS.
1.3 Probability sampling
1.3.1 Definition & Notation
Notation
• U = {1, 2, · · · , N}: index set of the finite population
• A: subset of U, the index set of the sample
• 𝒜 = {A : A ⊂ U, P(A) > 0}: the set of samples under consideration, the sample support
• θ̂ = θ̂(y_i; i ∈ A): statistic (a real-valued function which can be calculated once A is selected)
Definition
1. Probability distribution of samples, or sample distribution: a probability mass function P(·) defined on 𝒜. That is, P(·) satisfies
   (a) P(A) ∈ [0, 1], ∀A ∈ 𝒜
   (b) ∑_{A∈𝒜} P(A) = 1.
   It is also called the sampling design.
2. (Induced) probability distribution of a statistic
   1. Expectation: E(θ̂) = ∑_{A∈𝒜} P(A) θ̂(A)
   2. Variance: Var(θ̂) = ∑_{A∈𝒜} P(A) [θ̂(A) − E(θ̂)]²
   3. Mean squared error:
      MSE(θ̂) = ∑_{A∈𝒜} P(A) [θ̂(A) − θ]² = Var(θ̂) + [E(θ̂) − θ]²
Note:
(a) The sampling design induces the probability distribution of the statistics.
(b) The sampling design P tells us everything we need for design-based inference.
1.3.2 Probability sampling
• Definition
1. Sample distribution is known.
2. Pr (i ∈ A) > 0 for all i ∈ U
• Why probability sampling ?
1. No subjective choice of sample
2. Can remove the sampling bias.
• What is the sampling bias? (θ: true population value, θ̂: an estimator of θ)
• Sampling error of θ̂:
  θ̂ − θ = {θ̂ − E(θ̂)} + {E(θ̂) − θ} = Variation + Bias
• How to find an unbiased estimator of θ = ∑_{i=1}^N y_i ?
  – Probability sampling
  – Horvitz-Thompson estimator
• In non-probability sampling, "Variation = 0" but the bias is not zero.
• In probability sampling, "Bias = 0" but the variation is not zero. The variation is small if the sample size is large.
Additional advantages of probability sampling
1. Many statistical theories are established under probability sampling.
   • Large sample theory under probability sampling:
     (a) Law of large numbers
     (b) Central limit theorem
   • Additional advantages of probability sampling under a large sample size:
     (a) Can reduce the variance of the estimator.
     (b) Can compute confidence intervals.
1.4 Basic procedures for survey sampling
1. Planning
(a) Statement of objectives
(b) Selection of a sampling frame
2. Design and development
(a) Sample design
(b) Questionnaire design
3. Implementation
(a) Data collection
(b) Data capture and coding
(c) Editing and Imputation
(d) Estimation
(e) Data analysis
(f) Data dissemination
4. Evaluation - Documentation
1.5 Errors
1. Errors of nonobservation
• Coverage error (Population ≠ Frame): some elements are not listed.
• Sampling error (Frame ≠ Sample): some listed elements are not sampled.
• Response error (Sample ≠ Respondents): some sampled elements do not respond.
2. Errors of observation
• Measurement error
(a) Interviewer: skill, sex, age.
(b) Respondent: lie, forget, change behavior
(c) Instrument: questionnaire, measuring device
(d) Mode: mail, phone, personal interview
• Processing error: clerical error
Remark
1. If n = N (census), there is no sampling error but we still have non-
sampling error.
2. We can decrease sampling error by increasing n.
3. Because of nonsampling error, a sample is often more accurate than a CENSUS. For example, in labor force surveys, the questionnaire is more detailed and the interviewers are better trained than in the census. Thus, in this case the information about the labor market is more accurate than that of a census (unless the census had the same questionnaire and equally well-trained interviewers, which is almost impossible due to operational costs).
4. Furthermore, sampling is faster, cheaper, and broader in scope.
Chapter 2
Horvitz-Thompson
estimation
2.1 Introduction
• Sampling frame
– list frame
– area frame
• Unit
– Sampling unit
– Reporting unit = Observational unit = Element
• Two types of sampling
– Element sampling
– Cluster sampling
• Parameter of interest: Y = ∑_{i=1}^N y_i (population total of y)
2.2 Basic setup
Definition 2.1.
1. First-order inclusion probability:
   π_i = Pr(i ∈ A) = ∑_{A: i∈A} P(A)
2. Second-order inclusion probability:
   π_ij = Pr(i, j ∈ A) = ∑_{A: i,j∈A} P(A)
3. Probability sampling design: π_i > 0, ∀i ∈ U.
4. Measurable sampling design: π_ij > 0, ∀i, j ∈ U.
5. I_k: indicator random variable with
   I_k = I_k(A) = 1 if k ∈ A, and 0 if k ∉ A.
   Note that
   E(I_k) = π_k
   E(I_k I_l) = π_kl
   V(I_k) = π_k(1 − π_k)
   C(I_k, I_l) = π_kl − π_k π_l ≡ Δ_kl
6. n_A = ∑_{k=1}^N I_k(A): (realized) sample size. If n_A does not depend on A, then the design is of fixed size, in the sense that V(n_A) = 0.
Example 2.1. (Bernoulli sampling)
• Each unit is selected or not selected according to the outcome of a Bernoulli trial with inclusion probability π_k = π.
• Let ϵ_1, ϵ_2, · · · , ϵ_N ~ i.i.d. Uniform(0, 1). If ϵ_k ≤ π, then accept unit k into the sample. If ϵ_k > π, then do not accept unit k.
• Sample size n_A: a binomial random variable,
  Pr(n_A = x) = (N choose x) π^x (1 − π)^{N−x}, x = 0, 1, 2, · · · , N.
  Thus,
  E(n_A) = Nπ
  V(n_A) = Nπ(1 − π)
• Sampling design:
  P(A) = π^{n_A} (1 − π)^{N−n_A}
  where n_A = |A|.
• Inclusion probabilities: π_k = π and, by the independence of the trials, π_kl = π² for k ≠ l.
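A minimal simulation sketch in Python (parameters are illustrative, not from the notes) that checks the moments of n_A and the inclusion probability:

import numpy as np

rng = np.random.default_rng(0)
N, pi, reps = 100, 0.3, 10000
sizes = np.empty(reps)
unit1_selected = 0
for r in range(reps):
    eps = rng.uniform(size=N)      # epsilon_k ~ iid Uniform(0,1)
    A = eps <= pi                  # accept unit k iff epsilon_k <= pi
    sizes[r] = A.sum()
    unit1_selected += A[0]
print(sizes.mean(), N * pi)                  # E(n_A) = N pi
print(sizes.var(), N * pi * (1 - pi))        # V(n_A) = N pi (1 - pi)
print(unit1_selected / reps, pi)             # pi_k = pi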
Lemma 2.1. (Properties of the inclusion probabilities)
1. π_ii = π_i and π_ij = π_ji.
2. For a sampling design with (expected) sample size n,
   ∑_{i=1}^N π_i = n.
3. For a fixed sample size design (n_A ≡ n),
   ∑_{i=1}^N π_ij = n π_j
   and, for Δ_ij = π_ij − π_i π_j,
   ∑_{j=1}^N Δ_ij = 0.
Definition 2.2. Horvitz-Thompson estimator of Y = ∑_{i=1}^N y_i:
Ŷ_HT = ∑_{i∈A} y_i/π_i
It is also called the π-estimator.
Theorem 2.1. (Properties of the HT estimator)
1. Unbiased:
   E(Ŷ_HT) = Y
2. Variance:
   Var(Ŷ_HT) = ∑_{i=1}^N ∑_{j=1}^N (π_ij − π_i π_j)(y_i/π_i)(y_j/π_j)
3. For a fixed sample size design (V(n_A) = 0),
   Var(Ŷ_HT) = −(1/2) ∑_{i=1}^N ∑_{j=1}^N (π_ij − π_i π_j)(y_i/π_i − y_j/π_j)²
   This formula is called the Sen-Yates-Grundy (SYG) variance formula.
Example 2.2. Consider the following sampling design from a finite population U = {1, 2, 3}.

Sample (A)     Pr(A)   HT estimator
A_1 = {1, 2}   0.5
A_2 = {1, 3}   0.25
A_3 = {2, 3}   0.25
1. Compute the first-order inclusion probability of each element in the
population.
2. Find the HT estimator for each sample.
3. Check that the HT estimator is unbiased.
Variance estimation
• Unbiased variance estimator: want to find a statistic V̂ such that E(V̂) = Var(Ŷ_HT).
• Idea:
  If Q = ∑_{i=1}^N ∑_{j=1}^N Ω_ij y_i y_j, then Q̂ = ∑_{i∈A} ∑_{j∈A} π_ij^{−1} Ω_ij y_i y_j is an unbiased estimator of Q.
• HT variance estimator:
  V̂ = ∑_{i∈A} ∑_{j∈A} {(π_ij − π_i π_j)/π_ij}(y_i/π_i)(y_j/π_j)
• SYG variance estimator (for fixed-size designs):
  V̂_SYG = −(1/2) ∑_{i∈A} ∑_{j∈A} {(π_ij − π_i π_j)/π_ij}(y_i/π_i − y_j/π_j)²
• Under what condition does an unbiased variance estimator of the HT estimator exist?
Example 2.2 - Continued
Consider the following sampling design from a finite population U = {1, 2, 3} with y_1 = 16, y_2 = 21, y_3 = 18.

Sample (A)     Pr(A)   HT estimate   HT var. est.   SYG var. est.
A_1 = {1, 2}   0.5
A_2 = {1, 3}   0.25
A_3 = {2, 3}   0.25

Check the unbiasedness of the variance estimates.
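A Python sketch (not from the original notes) that works this example end to end, verifying the unbiasedness of both the point estimator and the SYG variance estimator:

design = {frozenset({1, 2}): 0.5, frozenset({1, 3}): 0.25, frozenset({2, 3}): 0.25}
y = {1: 16, 2: 21, 3: 18}
U = [1, 2, 3]
Y = sum(y.values())                                        # true total = 55

pi = {k: sum(p for A, p in design.items() if k in A) for k in U}
pij = {(k, l): sum(p for A, p in design.items() if k in A and l in A)
       for k in U for l in U}

ht = {A: sum(y[k] / pi[k] for k in A) for A in design}     # HT estimate per sample
print(sum(design[A] * ht[A] for A in design), Y)           # both 55: unbiased

def syg(A):
    """SYG variance estimate; valid here since the design has fixed size n = 2."""
    return -0.5 * sum((pij[k, l] - pi[k] * pi[l]) / pij[k, l]
                      * (y[k] / pi[k] - y[l] / pi[l]) ** 2
                      for k in A for l in A)

true_var = sum(design[A] * (ht[A] - Y) ** 2 for A in design)
print(sum(design[A] * syg(A) for A in design), true_var)   # both ~ 37.67: unbiased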
Remark
1. For unbiased estimation, the HT estimator can be used whenever the sampling design is a probability sampling design.
2. When is the HT estimator efficient (i.e., when does it have small variance)?
   (a) Note
       Var(Ŷ_HT) = −(1/2) ∑_{i=1}^N ∑_{j=1}^N (π_ij − π_i π_j)(y_i/π_i − y_j/π_j)²
       When is the variance small? (That is, how should we choose π_k to make the variance small?)
   (b) We don't know y_k in advance.
   (c) Hence, if x_k is correlated with y_k, then we can choose π_k ∝ x_k.
   (d) On the other hand, the HT estimator works poorly if π_k is not correlated with y_k. (Basu's elephant example)
3. The HT estimator is not location invariant.
   Definition 2.3. Write θ̂ = θ̂(A) = θ̂(y_i; i ∈ A). θ̂ is location invariant if
   θ̂(a + y_i; i ∈ A) = a + θ̂(y_i; i ∈ A)
   for all a.
   Thus, when changing from "Celsius" to "Fahrenheit", the HT estimate is not transformed in the corresponding way.
Basu's elephant example
• A circus with N = 50 elephants. We want to estimate the total weight of the elephants using a sample of size n = 1.
• About three years ago, every elephant was weighed, and "Sambo" was in the middle in terms of weight (and "Jumbo" was the largest one).
• Circus owner's idea: measure Sambo's weight and multiply it by 50.
• Statistician: No! That is not probability sampling.
• Circus owner: Well, what is your sampling scheme?
• Statistician: Let's select Sambo with high probability. Say, select Sambo with probability 99/100, and select each of the other 49 elephants with probability (1/49)(1/100).
• Circus owner: OK. Let's select one with this scheme. (Sambo is selected.) OK. Let's multiply Sambo's weight by 50.
• Statistician: No! You should multiply by the inverse of the first-order inclusion probability. So, you should multiply by 100/99, not by 50.
• Circus owner: ????? What if Jumbo was selected? What number should we multiply by?
• Statistician: Well, it is 4,900.
• Circus owner: What??? You are fired!
2.3 Simple random sampling
• Motivation: choose n units from N units without replacement.
  1. Each subset of n distinct units is equally likely to be selected.
  2. There are (N choose n) samples of size n from N.
  3. Give equal probability of selection to each subset of n units.
Definition 2.4. Sampling design for SRS:
P(A) = 1/(N choose n) if |A| = n, and 0 otherwise.
Lemma 2.2. Under SRS, the inclusion probabilities are
π_i = n/N
π_ij = n(n − 1)/{N(N − 1)} for i ≠ j.
Theorem 2.2. Under the SRS design, the HT estimator
Ŷ_HT = (N/n) ∑_{i∈A} y_i = N ȳ
is unbiased for Y and has variance of the form
V(Ŷ_HT) = (N²/n)(1 − n/N) S²
where
S² = (1/2)(1/N){1/(N − 1)} ∑_{i=1}^N ∑_{j=1}^N (y_i − y_j)² = {1/(N − 1)} ∑_{i=1}^N (y_i − Ȳ)².
Also, the SYG variance estimator is
V̂(Ŷ_HT) = (N²/n)(1 − n/N) s²
where
s² = {1/(n − 1)} ∑_{i∈A} (y_i − ȳ)².
Remark (under SRS)
• 1 − n/N is often called the finite population correction (FPC) term. The FPC term can be ignored (FPC ≈ 1) if the sampling rate n/N is small (≤ 0.05) or for conservative inference.
• For n = 1, the variance of the sample mean is
  (1/n)(1 − n/N) S² = (1/N) ∑_{i=1}^N (y_i − Ȳ)² ≡ σ²_Y
• Central limit theorem: under some conditions,
  (ȳ − Ȳ)/√{(1/n)(1 − n/N) S²} → N(0, 1).
• Sample size determination:
  1. Choose the target variance V* of V(ȳ).
  2. Choose n as the smallest integer satisfying
     (1/n)(1 − n/N) S² ≤ V*.
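The inequality above can be solved in closed form: n ≥ S²/(V* + S²/N). A small Python sketch (illustrative numbers, not from the notes):

import math

def srs_sample_size(S2, N, V_star):
    """Smallest n with (1/n)(1 - n/N) S2 <= V*, i.e. n >= S2 / (V* + S2/N)."""
    return min(N, math.ceil(S2 / (V_star + S2 / N)))

# e.g. a guessed S2 = 4.0, N = 10000, target variance V* = 0.01 for the mean
print(srs_sample_size(4.0, 10000, 0.01))   # 385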
2.4 Domain estimation
Basic setup
• Estimation for domains (subpopulations): we usually want to make inference about subpopulations as well as the whole population.
• Often, we cannot plan the design for every subpopulation of interest => the sample size within a subpopulation is random.
• Denote domain d by U_d ⊂ U. The parameters are
  – N_d = |U_d|: number of elements in U_d
  – P_d = N_d/N: proportion of elements in U_d. Often, N is known but N_d is unknown.
  – t_d = ∑_{i∈U_d} y_i: total of y in domain d
  – Ȳ_d = t_d/N_d: mean of y in domain d
Estimation
• Methods
  1. Direct method: use HT estimation, t̂_d,HT = ∑_{i∈U_d} y_i I_i/π_i.
  2. Model-based method: use a prediction ŷ_i to create a synthetic value t̂_d,syn = ∑_{i∈U_d} ŷ_i.
  3. Small area estimation: a compromise between the two.
Direct estimation
• For k = 1, 2, · · · , N, define
  z_kd = 1 if k ∈ U_d, and 0 if k ∉ U_d.
  Note that z_kd is not a random variable (i.e., it does not depend on the sampling scheme).
• Properties of z_kd:
  1. ∑_{k∈U} z_kd = N_d
  2. Z̄_d = ∑_{k∈U} z_kd/N = N_d/N = P_d
  3. S²_zd = {1/(N − 1)}(∑_{k∈U} z²_kd − N Z̄²_d) = {N/(N − 1)} P_d (1 − P_d)
• HT estimation of N_d:
  N̂_d = ∑_{k∈U} z_kd I_k/π_k
  Under SRS,
  N̂_d = ∑_{k∈U} z_kd I_k/(n/N) = N n_d/n = N p_d
  and
  Var(N̂_d) = (N²/n)(1 − n/N) S²_zd = (N²/n){1 − (n − 1)/(N − 1)} P_d (1 − P_d)
  V̂(N̂_d) = (N²/n)(1 − n/N) s²_zd = N²(1 − n/N) p_d (1 − p_d)/(n − 1).
• HT estimation of t_d = ∑_{k∈U_d} y_k = ∑_{k∈U} y_k z_kd:
  t̂_d = ∑_{k∈U} y_k z_kd I_k/π_k = ∑_{k∈A} y_k z_kd/π_k.
  It is unbiased for t_d.
• HT-type estimator of Ȳ_d = t_d/N_d:
  Ȳ̂_d = t̂_d/N̂_d
  It is probably not unbiased, because it is a nonlinear function of unbiased estimators.
• Generally, we express population parameters as functions of population totals and then apply HT estimation to each total.
• The statistical properties of Ȳ̂_d can be derived from the following approximation:
  Ȳ̂_d = t̂_d/N̂_d = f(N̂_d, t̂_d)
       ≈ f(N_d, t_d) + {∂f(N_d, t_d)/∂t_d}(t̂_d − t_d) + {∂f(N_d, t_d)/∂N_d}(N̂_d − N_d)
       = t_d/N_d + (1/N_d)(t̂_d − t_d) + (−t_d/N²_d)(N̂_d − N_d)
  Thus,
  Var(Ȳ̂_d) ≈ Var{(1/N_d)(t̂_d − Ȳ_d N̂_d)}.
  Under SRS,
  Var(Ȳ̂_d) ≈ {1/E(n_d) − 1/N_d}{1/(N_d − 1)} ∑_{i∈U_d} (y_i − Ȳ_d)².
Chapter 3
Element sampling design
3.1 Introduction
• Element sampling vs. cluster sampling
• Taxonomy
Equal probability sampling Unequal probability sampling
SRS (without replacement) πps sampling
SRS with replacement PPS sampling
Bernoulli sampling Poisson sampling
Systematic sampling Systematic πps sampling
• Why consider unequal probability sampling ?
Example: a population of N = 4 farms
Farm Size y (yield)
A 100 11
B 200 20
C 300 24
D 1,000 245
Select n = 1 unit by
– With equal probability
– With probability proportional to size :
Compare the variances.
3.2 Poisson sampling
• Definition:
  I_i ~ independent Bernoulli(π_i), i = 1, 2, · · · , N.
• If π_i ≡ π, it is called Bernoulli sampling.
• Estimation:
  Ŷ_HT = ∑_{i=1}^N I_i y_i/π_i
• Variance:
  Var(Ŷ_HT) = ∑_{i=1}^N (1/π_i − 1) y²_i
• Optimal design: minimizing Var(Ŷ_HT) subject to ∑_{i=1}^N π_i = n gives
  π_i ∝ y_i.
  To prove this, use the Cauchy-Schwarz inequality
  (∑_{i=1}^n a²_i)(∑_{j=1}^n b²_j) ≥ (∑_{i=1}^n a_i b_i)²
  with equality if and only if a_i ∝ b_i for all i = 1, · · · , n.
• Disadvantage: sample size is random and it can decrease the efficiency
of the HT estimator.
Example 3.2.1
N = 600 students took a test at a university. We want to estimate the passing rate on the test. Use Bernoulli sampling with π = 1/6. A sample of size n_s = 90 is realized. Among the 90 sampled students, 60 are found to have passed. What is a reasonable estimator of the total number of students who passed the test?
• Remedy
  1. Use the alternative estimator
     Ŷ = N (∑_{i=1}^N I_i y_i/π_i)/(∑_{i=1}^N I_i/π_i).
     It is often called the Hajek estimator. Its variance is
     Var(Ŷ) ≈ ∑_{i=1}^N (1/π_i − 1)(y_i − Ȳ)².
  2. Use rejective sampling:
     "the conditional distribution of the Bernoulli sampling distribution given n = n_0 ⟺ simple random sampling without replacement of size n_0"
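Applying both estimators to Example 3.2.1 (a sketch, using the numbers given in the example):

N, pi = 600, 1 / 6          # Example 3.2.1
n_s, passed = 90, 60

Y_ht = passed / pi          # HT estimate: 360; ignores the realized sample size
N_hat = n_s / pi            # HT estimate of N: 540
Y_hajek = N * Y_ht / N_hat  # Hajek estimate: 600 * (60/90) = 400
print(Y_ht, Y_hajek)

The Hajek estimator conditions on the realized sample size (90 instead of the expected 100), which is why it is the more reasonable answer here.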
3.3 PPS sampling
• Basic Setup
  1. Let x_1, · · · , x_N be known characteristics of the population elements such that x_k > 0 for all k. This x_k is called the measure of size (MOS). Examples of MOS include the size of a farm, the number of employees in a company, and the acreage of a county.
  2. We wish to select a sample with selection probability proportional to x_k.
  3. If the sample size is equal to one, then it is easy to select a sample with probability proportional to x_k.
  4. Probability proportional to size (PPS) sampling idea: use m independent selections of a sample of size one with probability proportional to x_k. Thus, it is with-replacement sampling in the sense that, once drawn, an element is replaced into the population so that all N elements participate in each draw.
• With-replacement sampling:
  – Pro: easy to implement and to investigate its properties. It may be a good approximation to without-replacement sampling if n/N is negligible.
  – Con: sample elements can be duplicated. Inefficient.
Example: Simple random sampling with replacement
• Make m independent selections.
• Each element has probability 1/N of selection on each draw.
• On the i-th draw (i = 1, · · · , m), unit k_i is selected and replaced.
• Probability that unit k is drawn r times:
  (m choose r)(1/N)^r (1 − 1/N)^{m−r}
• First-order inclusion probability:
  π_k = 1 − (1 − 1/N)^m
• Second-order inclusion probability:
  π_kl = 1 − 2(1 − 1/N)^m + (1 − 2/N)^m, k ≠ l
• The HT estimator is not useful because the actual sample size (excluding duplicates) is random.
How to select a PPS sample with n = 1?
1. Cumulative total method
   [Step 1] Set T_0 = 0 and compute T_k = T_{k−1} + x_k, k = 1, 2, · · · , N.
   [Step 2] Draw ϵ ~ Unif(0, 1). If ϵ ∈ (T_{k−1}/T_N, T_k/T_N ], element k is selected.
   Very popular. It needs a list of all x_k in the population.
2. Lahiri's method
   [Step 0] Choose M ≥ max{x_1, x_2, · · · , x_N}. Set r = 1.
   [Step 1] Draw k_r by SRS from {1, 2, · · · , N}.
   [Step 2] Draw ϵ_r ~ Unif(0, 1).
   [Step 3] If ϵ_r ≤ x_{k_r}/M, then select element k_r and stop. Otherwise, reject k_r, set r = r + 1, and go to Step 1.
   The basic idea is the rejection algorithm due to von Neumann.
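Both methods in a minimal Python sketch (zero-based indices; a sketch, not a production implementation):

import random

def cumulative_total(x, rng=random):
    """Select one unit with probability proportional to x (cumulative total method)."""
    u = rng.random() * sum(x)          # epsilon * T_N
    T = 0.0
    for k, xk in enumerate(x):
        T += xk                        # T_k = T_{k-1} + x_k
        if u <= T:
            return k

def lahiri(x, rng=random):
    """Lahiri's rejection method: needs only a bound M >= max(x), not the total."""
    M = max(x)
    while True:
        k = rng.randrange(len(x))      # [Step 1] SRS draw from {0, ..., N-1}
        if rng.random() <= x[k] / M:   # [Step 3] accept with probability x_k / M
            return k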
In with-replacement sampling, the order of the sample selection matters.
• Ordered sample:
  OS = (k_1, k_2, · · · , k_m)
  where k_i is the index of the element obtained in the i-th with-replacement draw.
• Sample: S = {k : k = k_i for some i, i = 1, 2, · · · , m}
• Unequal probability sampling with replacement: consider p_1, p_2, · · · , p_N > 0 such that ∑_{i=1}^N p_i = 1. We can construct p_k from x_k (MOS) by p_k = x_k/∑_{i=1}^N x_i. On the i-th draw, label k is selected with probability p_k. That is, Pr(k_i = k) = p_k. Note that
  π_k = Pr(k ∈ S) = 1 − Pr(k ∉ S) = 1 − (1 − p_k)^m
  1. For m = 1, π_k = p_k.
  2. For m > 1 and small p_k's, π_k ≈ m p_k.
• Estimation of t = ∑_{i=1}^N y_i:
  1. First, define
     Z_i = y_{k_i}/p_{k_i} = ∑_{k=1}^N (y_k/p_k) I(k_i = k).
     Note that Z_1, · · · , Z_m are independent random variables since the m draws are independent.
  2. Z_1, · · · , Z_m are identically distributed since the same probabilities are used at each draw, with E(Z_i) = t and V(Z_i) = ∑_{k=1}^N (y_k/p_k − t)² p_k ≡ V_1.
  3. Thus, Z_1, · · · , Z_m are IID with mean t and variance V_1. Use z̄ = ∑_{k=1}^m Z_k/m to estimate t.
  4. Hansen-Hurwitz estimator:
     t̂_pwr ≡ ∑_{k=1}^m Z_k/m
     where Z_i = y_k/p_k if k_i = k.
  5. Properties
     (a) Unbiased estimator of t.
     (b) V(t̂_pwr) = V_1/m by the standard IID result.
     (c) An unbiased estimator of V(t̂_pwr) is
         V̂(t̂_pwr) = {1/m}{1/(m − 1)} ∑_{i=1}^m (z_i − z̄)²
     (d) For large m, t̂_pwr is AN(t, V_1/m).
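A compact Python sketch of the Hansen-Hurwitz estimator and its variance estimator (illustrative, zero-based indices):

import random

def hansen_hurwitz(y, p, m, rng=random):
    """PPS with-replacement sample of m draws; returns (t_pwr, v_hat)."""
    z = []
    for _ in range(m):
        k = rng.choices(range(len(y)), weights=p)[0]   # Pr(k_i = k) = p_k
        z.append(y[k] / p[k])                           # Z_i = y_k / p_k
    zbar = sum(z) / m                                   # unbiased for t
    vhat = sum((zi - zbar) ** 2 for zi in z) / (m * (m - 1))
    return zbar, vhat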
3.4 πps sampling
• Ideally,
  1. The actual selection of the sample is relatively simple.
  2. The first-order inclusion probabilities π_k are strictly proportional to x_k.
  3. The second-order inclusion probabilities satisfy π_kl > 0 for all k ≠ l (measurable sampling design).
  4. The π_kl can be computed without very heavy calculations.
  5. Δ_kl = π_kl − π_k π_l < 0 for all k ≠ l, to guarantee that the SYG variance estimator is always nonnegative.
• Motivation:
  – PPS sampling satisfies the above conditions but it can have duplicated sample elements → inefficient.
  – We want a fixed-size sampling design with π_k ∝ x_k, where x_k > 0 and known. It is called a πps design.
• Remark: for a fixed-size design, π_k ∝ x_k and ∑_{i=1}^N π_i = n lead to
  π_k = n x_k/∑_{i=1}^N x_i,
  which can be contradictory to the requirement that π_k → 1 as n → N. Also, π_k can be greater than 1 if x_k is extremely large.
πps sampling with n = 2:
• Notation
  – θ_i: the probability of selecting i in the first draw.
  – θ_{j|i}: the probability of selecting j in the second draw given that i was selected in the first draw.
• Inclusion probabilities
  – Second-order inclusion probability (for i ≠ j):
    π_ij = θ_i θ_{j|i} + θ_j θ_{i|j}
  – First-order inclusion probability: since ∑_{j≠i} π_ij = π_i,
    π_i = θ_i + ∑_{j≠i} θ_j θ_{i|j}
  – Restriction on θ_i and θ_{j|i}:
    θ_i + ∑_{j≠i} θ_j θ_{i|j} = 2 p_i
    where p_i = x_i/∑_{k=1}^N x_k.
πps sampling with n = 2:
• Brewer (1963) method: use
  θ_i ∝ p_i(1 − p_i)/(1 − 2p_i)
  and
  θ_{j|i} ∝ p_j
• Durbin (1967) method: use
  θ_i ∝ p_i
  and
  θ_{j|i} ∝ p_j {1/(1 − 2p_i) + 1/(1 − 2p_j)}
• The two methods produce the same inclusion probabilities:
  π_ij = {2 p_i p_j/(1 + K)}{1/(1 − 2p_i) + 1/(1 − 2p_j)}
  where K = ∑_{i=1}^N p_i/(1 − 2p_i). Thus,
  π_i = ∑_{j≠i} π_ij = 2 p_i.
3.5 Systematic sampling
• Setup:
  1. Have N elements in a list.
  2. Choose a positive integer a, called the sampling interval. Let n = [N/a]. That is, N = na + c, where c is an integer with 0 ≤ c < a.
  3. Select a random start r from {1, 2, · · · , a} with equal probability.
  4. The final sample is
     S = {r, r + a, r + 2a, · · · , r + (n − 1)a} if c < r ≤ a
     S = {r, r + a, r + 2a, · · · , r + na} if 1 ≤ r ≤ c.
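The selection itself is a one-liner; a Python sketch (labels 1..N):

import random

def systematic_sample(N, a, rng=random):
    """Systematic sample with interval a; returns the selected labels 1..N."""
    r = rng.randint(1, a)              # random start r in {1, ..., a}
    return list(range(r, N + 1, a))    # r, r+a, r+2a, ...; size is n or n+1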
• The sample size can be random:
  n_S = n if c < r ≤ a, and n + 1 if r ≤ c.
• Inclusion probabilities:
  π_k = 1/a for every k ∈ U
  π_kl = 1/a if k and l belong to the same systematic sample, and 0 otherwise.
Remark
• This is very easy to do.
• This is a probability sampling design.
• This is not a measurable sampling design: there is no design-unbiased estimator of the variance (because there is only one random draw).
• Pick one set of elements (which always go together) and measure each one: later, we will call this cluster sampling.
• Divide the population into non-overlapping groups and choose an element in each group: closely related to stratification.
Estimation
• Partition the population into a groups:
  U = S_1 ∪ S_2 ∪ · · · ∪ S_a
  where the S_r are disjoint.
• Population total:
  t = ∑_{i∈U} y_i = ∑_{r=1}^a ∑_{k∈S_r} y_k = ∑_{r=1}^a t_{S_r}
• Think of a finite population with a elements with measurements t_{S_1}, · · · , t_{S_a}.
• HT estimator:
  t̂_HT = t_{S_r}/(1/a) = a t_{S_r}
• Variance: note that we are doing SRS of size 1 from the population of a elements {t_{S_1}, · · · , t_{S_a}}:
  Var(t̂_HT) = (a²/1)(1 − 1/a) S²_t
  where
  S²_t = {1/(a − 1)} ∑_{r=1}^a (t_{S_r} − t̄)²
  and t̄ = ∑_{r=1}^a t_{S_r}/a = t/a.
• When is the variance small?
Estimation - Continued
• Now, assuming N = na,
  V(t̂_HT) = a(a − 1) S²_t = n² a ∑_{r=1}^a (ȳ_{S_r} − ȳ_U)²
  where ȳ_{S_r} = t_{S_r}/n and ȳ_U = t̄/n.
• ANOVA:
  SST = ∑_{k∈U} (y_k − ȳ_U)²
      = ∑_{r=1}^a ∑_{k∈S_r} (y_k − ȳ_{S_r})² + n ∑_{r=1}^a (ȳ_{S_r} − ȳ_U)²
      = SSW + SSB
  Then,
  V(t̂_HT) = na · SSB = N · SSB = N(SST − SSW).
• If SSB is small, then the ȳ_{S_r} are more alike and V(t̂_HT) is small.
• If SSW is small, then V(t̂_HT) is large.
• The intraclass correlation coefficient ρ measures the homogeneity of the clusters:
  ρ = 1 − {n/(n − 1)}(SSW/SST)
  More details about ρ will be covered under cluster sampling.
Comparison between systematic sampling (SY) and SRS
• How does SY compare to SRS when the population is sorted in the following ways?
  1. Random ordering: intuitively they should be the same.
  2. Linear ordering: SY should be better than SRS.
  3. Periodic ordering: if the period = a, SY can be terrible.
  4. Autocorrelated order: successive y_k's tend to lie on the same side of ȳ_U. Thus, SY should be better than SRS.
• How to quantify this?
  V_SRS(t̂_HT) = (N²/n)(1 − n/N){1/(N − 1)} ∑_{k=1}^N (y_k − Ȳ_N)²
  V_SY(t̂_HT) = n² a ∑_{r=1}^a (ȳ_{S_r} − ȳ_U)²
  Cochran (1946) introduced the superpopulation model to deal with this problem (treat y_k as a random variable).
• Example: superpopulation model for a population in random order. Denote the model by ζ: y_k ~ iid (µ, σ²). Then
  E_ζ{V_SRS(t̂_HT)} = (N²/n)(1 − n/N) σ²
  E_ζ{V_SY(t̂_HT)} = (N²/n)(1 − n/N) σ²
  Thus, the model expectations of the design variances are the same under the IID model.
3.6 Stratified sampling
• Stratified sampling:
  1. The finite population is stratified into H subpopulations:
     U = U_1 ∪ · · · ∪ U_H
  2. Within each subpopulation (or stratum), samples are drawn independently across the strata:
     Pr(i ∈ S_h, j ∈ S_g) = Pr(i ∈ S_h) Pr(j ∈ S_g) for h ≠ g,
     where S_h is the index set of the sample in stratum h, h = 1, 2, · · · , H.
• Why stratification?
  1. Control for domains of study
  2. Flexibility in design and estimation
  3. Convenience
  4. Efficiency
• HT estimation for t = ∑_{h=1}^H t_h, where t_h = ∑_{i∈U_h} y_i:
  1. HT estimator:
     t̂_HT = ∑_{h=1}^H t̂_{h,HT}
     where t̂_{h,HT} is unbiased for t_h.
  2. Variance:
     Var(t̂_HT) = ∑_{h=1}^H Var(t̂_{h,HT}) by independence.
  3. Variance estimation:
     V̂(t̂_HT) = ∑_{h=1}^H V̂_h(t̂_{h,HT})
     where V̂_h(t̂_{h,HT}) is unbiased for Var(t̂_{h,HT}).
• Example: stratified SRS
  1. HT estimator:
     t̂_HT = ∑_{h=1}^H N_h ȳ_h
     where ȳ_h = n_h^{−1} ∑_{i∈S_h} y_i.
  2. Variance:
     Var(t̂_HT) = ∑_{h=1}^H (N²_h/n_h)(1 − n_h/N_h) S²_h
     where S²_h = (N_h − 1)^{−1} ∑_{i∈U_h} (y_i − Ȳ_h)².
  3. Variance estimation:
     V̂(t̂_HT) = ∑_{h=1}^H (N²_h/n_h)(1 − n_h/N_h) s²_h
     where s²_h = (n_h − 1)^{−1} ∑_{i∈S_h} (y_i − ȳ_h)².
• Sample allocation: given n = ∑_{h=1}^H n_h, how should we choose the n_h? (A numerical sketch follows this list.)
  1. Proportional allocation: choose n_h ∝ N_h.
  2. Optimal allocation: choose n_h to
     minimize Var(t̂_HT) subject to ∑_{h=1}^H c_h n_h = C,
     where c_h is the cost of observing an element in stratum h and C is a given total cost. The solution (Neyman, 1934) is
     n_h ∝ N_h S_h/√c_h
  3. Properties
     – Under proportional allocation, the weights are all equal.
     – In general,
       V_opt(t̂_HT) ≤ V_prop(t̂_HT) ≤ V_SRS(t̂_HT)
       where V_opt(t̂_HT) is the variance of the stratified estimator under optimal allocation, V_prop(t̂_HT) is the variance of the stratified estimator under proportional allocation, and V_SRS(t̂_HT) is the variance of the SRS estimator.
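The promised sketch of Neyman allocation in Python (illustrative strata, equal costs by default):

def neyman_allocation(n, N, S, c=None):
    """n_h proportional to N_h * S_h / sqrt(c_h); equal costs by default."""
    c = c or [1.0] * len(N)
    w = [Nh * Sh / ch ** 0.5 for Nh, Sh, ch in zip(N, S, c)]
    alloc = [n * wh / sum(w) for wh in w]
    return [round(a) for a in alloc]   # rounding may change the total slightly

print(neyman_allocation(100, N=[500, 300, 200], S=[2.0, 6.0, 10.0]))
# [21, 38, 42]: the small but highly variable stratum gets a large share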
Method of collapsed strata
• If n_h ≡ 1, there is no unbiased estimator of Var(t̂_HT) under stratified sampling.
• Form pairs of strata:
  t̂_1, · · · , t̂_H → (t̂_{j1}, t̂_{j2}), j = 1, 2, · · · , H/2
  where H is even.
• Variance estimator:
  V̂_coll = ∑_{j=1}^{H/2} (t̂_{j1} − t̂_{j2})²
• Property:
  E(V̂_coll) = E[∑_{j=1}^{H/2} {(t̂_{j1} − t_{j1}) − (t̂_{j2} − t_{j2}) + (t_{j1} − t_{j2})}²]
             = ∑_{j=1}^{H/2} {Var(t̂_{j1}) + Var(t̂_{j2}) + (t_{j1} − t_{j2})²}
             = ∑_{h=1}^H Var(t̂_h) + ∑_{j=1}^{H/2} (t_{j1} − t_{j2})²
             ≥ Var(t̂_HT)
  Thus, it is a conservative variance estimator.
3.7 Systematic πps
• Let I = ∑_{i=1}^N x_i/n. Assume x_k < I for all k ∈ U.
• Systematic πps sampling:
  1. Choose R ~ Unif(0, I].
  2. Unit k is selected iff
     ∑_{j=1}^{k−1} x_j < R + l · I ≤ ∑_{j=1}^k x_j
     for some l = 0, 1, · · · , n − 1.
• Inclusion probability:
  Pr(k ∈ S) = Pr(∑_{j=1}^{k−1} x_j < R + l · I ≤ ∑_{j=1}^k x_j for some l) = n x_k/∑_{k∈U} x_k.
• Advantages:
  – Easy to implement.
  – If x_k < I for all k, you get πps.
  – Automatic stratification, as in systematic sampling.
• Disadvantages:
  – No design-unbiased variance estimator.
  – If x_k ≥ I for some k, it is not strictly πps. => Select these units with probability 1.
Chapter 4
Cluster sampling
4.1 Introduction
• Setup:
1. Frame list clusters (disjoint groups of population elements)
2. Select clusters by a probability sampling.
• Why cluster sampling?
1. Frame inadequacy: No way to get a list of elements (very expen-
sive) but relatively easy or cheap to list clusters
2. Convenience: By grouping elements into “close” subgroups, you
can save time or money
• Single stage cluster sampling: Sample clusters and observe all elements
in each selected cluster.
• Two-stage sampling:
[Stage 1] Population is divided into clusters, called Primary Sampling
Units (PSU’s). A probability sample of PSU’s is drawn.
[Stage 2] Each selected PSU is divided into clusters or elements,
called Secondary Sampling Units (SSU’s). A probability sample
of SSU’s is drawn in each selected PSU.
If SSU=cluster, it is called two-stage cluster sampling.
If SSU=element, it is called two-stage element sampling
• Multi-stage sampling: PSU, SSU, ..., USU (Ultimate Sampling Unit)
If USU=cluster, it is called multi-stage cluster sampling.
If USU=element, it is called multi-stage element sampling.
In multi-stage sampling we typically do not know N, so estimation of the population mean ȳ_U is more difficult.
4.2 Single-Stage Cluster Sampling
• Notation (population)
  – U_I = {1, · · · , N_I}: index set of clusters in the population
  – U_i: the set of elements in the i-th cluster, of size M_i (i = 1, 2, · · · , N_I)
  – y_ij: measurement of item y for the j-th element (j = 1, 2, · · · , M_i) in cluster i, i = 1, 2, · · · , N_I
  – Population total: t = ∑_{i=1}^{N_I} ∑_{j=1}^{M_i} y_ij = ∑_{i=1}^{N_I} t_i = ∑_{i=1}^{N_I} M_i Ȳ_i, where t_i = ∑_{j=1}^{M_i} y_ij = M_i Ȳ_i
  – Population size: N = ∑_{i=1}^{N_I} M_i
• Notation (sample)
  – S_I: index set of clusters in the sample
  – n_I = |S_I|: the number of sampled clusters
  – S = ∪_{i∈S_I} U_i: index set of elements in the sample
  – n_S = |S| = ∑_{i∈S_I} M_i: the number of sampled elements
  Usually, n_S is not fixed even if n_I is fixed.
• Single-stage cluster sampling:
  1. Draw a probability sample S_I from U_I via p_I(·).
  2. Observe every element in each selected cluster.
• Cluster inclusion probabilities:
  π_Ii = Pr(i ∈ S_I) = ∑_{S_I: i∈S_I} p_I(S_I)
  π_Iij = Pr(i, j ∈ S_I) = ∑_{S_I: i,j∈S_I} p_I(S_I)
• Element inclusion probabilities:
  π_k = Pr(k ∈ S) = Pr(i ∈ S_I) = π_Ii, where k ∈ U_i
  π_kl = Pr(k, l ∈ S) = π_Ii if k, l ∈ U_i; π_Iij if k ∈ U_i, l ∈ U_j (i ≠ j)
• HT estimation
  1. Point estimator:
     t̂_HT = ∑_{i∈S_I} t_i/π_Ii = ∑_{i∈U_I} t_i I_Ii/π_Ii
  2. Variance:
     Var(t̂_HT) = ∑_{i∈U_I} ∑_{j∈U_I} (t_i/π_Ii)(t_j/π_Ij)(π_Iij − π_Ii π_Ij)
  3. Variance estimation:
     V̂(t̂_HT) = ∑_{i∈S_I} ∑_{j∈S_I} (t_i/π_Ii)(t_j/π_Ij)(Δ_Iij/π_Iij)
     provided π_Iij > 0, where Δ_Iij = π_Iij − π_Ii π_Ij.
Remark
For a fixed-size design (n_{S_I} = n_I),
Var(t̂_HT) = −(1/2) ∑_{i∈U_I} ∑_{j∈U_I} Δ_Iij (t_i/π_Ii − t_j/π_Ij)²
1. If π_Ii ∝ t_i, then t̂_HT = t.
2. If π_Ii ∝ M_i and the cluster means ȳ_{U_i} are constant, then t̂_HT = t.
3. An equal-probability sampling design is generally inefficient (unless M_i ∝ Ȳ_i^{−1}, i.e., the cluster totals are roughly constant).
Example: Simple random cluster sampling (SIC)
• p_I(·): SRS of n_I clusters from N_I
• HT estimation:
  t̂_HT = ∑_{i∈S_I} t_i/π_Ii = (N_I/n_I) ∑_{i∈S_I} t_i = N_I t̄_{S_I}
• Variance:
  Var(t̂_HT) = (N²_I/n_I)(1 − n_I/N_I) S²_t
  where
  S²_t = {1/(N_I − 1)} ∑_{i∈U_I} (t_i − t̄_{U_I})²
Alternative estimation under unequal M_i
• From each sampled cluster i, we observe (M_i, Ȳ_i).
• If we know N = ∑_{i∈U_I} M_i, we can use this information to improve t̂_HT = ∑_{i∈S_I} t_i/π_Ii (the ratio estimation idea):
  t̂_R = N (∑_{i∈S_I} t_i/π_Ii)/(∑_{i∈S_I} M_i/π_Ii)
• Under SIC, for example, the variances are
  V(t̂_HT) = (N²_I/n_I)(1 − n_I/N_I){1/(N_I − 1)} ∑_{i=1}^{N_I} (t_i − t̄_U)²
  and
  V(t̂_R) ≈ (N²_I/n_I)(1 − n_I/N_I){1/(N_I − 1)} ∑_{i=1}^{N_I} (t_i − M_i Ȳ_U)²
Special case of M_i = M under SIC
• Estimation of the mean:
  Ȳ̂_U = t̂_HT/(N_I M) = (1/n_I)(1/M) ∑_{i∈S_I} ∑_{j=1}^M y_ij = (1/n_I) ∑_{i∈S_I} Ȳ_i
• Variance:
  Var(Ȳ̂_U) = (1/n_I)(1 − n_I/N_I){1/(N_I − 1)} ∑_{i=1}^{N_I} (Ȳ_i − Ȳ_U)²
  where Ȳ_i = ∑_{j=1}^M y_ij/M and Ȳ_U = t/(N_I M) = ∑_{i=1}^{N_I} Ȳ_i/N_I.
• ANOVA:

  Source            D.F.         Sum of Squares   Mean S.S.
  Between clusters  N_I − 1      SSB              S²_b
  Within clusters   N_I(M − 1)   SSW              S²_w
  Total             N_I M − 1    SST              S²

  where
  SSB = ∑_{i=1}^{N_I} M (Ȳ_i − Ȳ_U)²
  SSW = ∑_{i=1}^{N_I} ∑_{j=1}^M (y_ij − Ȳ_i)²
  SST = ∑_{i=1}^{N_I} ∑_{j=1}^M (y_ij − Ȳ_U)²
  and MSS = SS/(d.f.). Note that
  S² = {(N_I − 1) S²_b + N_I(M − 1) S²_w}/(N_I M − 1) ≅ {S²_b + (M − 1) S²_w}/M.
  The variance is
  Var(Ȳ̂_U) = {1/(n_I M)}(1 − n_I/N_I) S²_b
Design effect
• We want to compare the current sampling design p(·) with SRS of equal sample size.
• Kish (1965) introduced the design effect:
  deff(p, t̂_HT) = V_p(t̂_HT)/V_SRS(t̂_HT)
• Two uses:
  1. Compare designs:
     – If deff > 1, then p(·) is less efficient than SRS.
     – If deff < 1, then p(·) is more efficient than SRS.
  2. Determine the sample size:
     (a) Have some desired variance V*.
     (b) Under SRS, you can easily find the required sample size n*.
     (c) Choose n*_p = deff · n*.
     Then,
     V_p(t̂_HT | n*_p) = V*.
• n* is often called the effective sample size. It is the sample size required for the given V* if the sample design is SRS.
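A two-line sketch of the sample size use, with an illustrative cluster design whose deff is 1 + (M − 1)ρ (see the next page):

import math

def sample_size_for_design(deff, n_star):
    """n*_p = deff * n*, where n* is the effective (SRS) sample size for V*."""
    return math.ceil(deff * n_star)

# e.g. a cluster design with deff = 1 + (M - 1) rho = 1 + 9 * 0.05 = 1.45
print(sample_size_for_design(1.45, n_star=400))   # 580 elements needed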
Intracluster correlation coefficient
• A measure of within-cluster homogeneity.
• Assume M_i = M.
• Intracluster correlation coefficient:
  ρ = Cov(y_ij, y_ik | j ≠ k)/{√V(y_ij) √V(y_ik)}
    = {∑_{i=1}^{N_I} ∑_{j≠k} (y_ij − Ȳ)(y_ik − Ȳ)/(N_I M(M − 1))} / {∑_{i=1}^{N_I} ∑_{j=1}^M (y_ij − Ȳ)²/(N_I M)}
• Properties:
  1. ρ = 1 − {M/(M − 1)}(SSW/SST)
  2. Since 0 ≤ SSW ≤ SST,
     −1/(M − 1) ≤ ρ ≤ 1.
     For SSW = 0, ρ = 1: perfect homogeneity within clusters.
     For SSW = SST, ρ = −1/(M − 1): perfect heterogeneity within clusters. Each cluster is like the whole population in terms of variability.
• Variance:
  V_SIC(Ȳ̂) = V_SRS(Ȳ̂){1 + (M − 1)ρ}
  Thus,
  deff = 1 + (M − 1)ρ
Homogeneity coefficient
• A measure of within-cluster homogeneity:
  δ = 1 − {∑_{i∈U_I} ∑_{j∈U_i} (y_ij − Ȳ_i)²/(N − N_I)} / {∑_{i∈U_I} ∑_{j∈U_i} (y_ij − Ȳ_U)²/(N − 1)}
    = 1 − {SSW/(N − N_I)}/{SST/(N − 1)}
• For SSW = 0, δ = 1: perfect homogeneity within clusters. For SSW = SST, δ = −(N_I − 1)/(N − N_I): perfect heterogeneity within clusters.
• In practice, δ tends to be positive.
• Using δ, we can compute the deff for SIC.
  – Let M̄ = ∑_{i=1}^{N_I} M_i/N_I = N/N_I and let
    Cov = {1/(N_I − 1)} ∑_{i=1}^{N_I} (M_i − M̄) M_i Ȳ²_i
    be the finite population covariance between M_i and M_i Ȳ²_i. Note that, if M_i ≡ M̄, then Cov = 0. Also, if Ȳ²_i is constant, then Cov reduces to Ȳ² times the finite population variance of the cluster sizes M_i.
  – Some algebra shows that
    V_SIC(t̂_HT) = (N²_I/n_I)(1 − n_I/N_I) S²_t = {1 + (N − N_I)/(N_I − 1) δ} M̄ S²_y K_I + Cov · K_I
    where K_I = (N²_I/n_I)(1 − n_I/N_I).
• To make a fair comparison with SRS, we need the expected number of elements (not clusters) under SIC:
  E_SIC(n_S) = E_SIC(∑_{i∈S_I} M_i) = n_I M̄
• Using n = n_I M̄,
  V_SRS(t̂_HT) = {N²/(n_I M̄)}(1 − n_I M̄/N) S²_y = (N/N_I) S²_y (N²_I/n_I)(1 − n_I/N_I) = M̄ S²_y K_I
• Thus,
  deff = V_SIC(t̂_HT)/V_SRS(t̂_HT) = {1 + (N − N_I)/(N_I − 1) δ} + Cov/(M̄ S²_y)
• Two sources leading to deff > 1:
  1. δ > 0 (homogeneity) reduces efficiency. N may be much bigger than N_I, so even a small positive δ can have a big impact.
  2. Cov > 0 reduces efficiency. (Variability of the cluster sizes reduces efficiency.)
4.3 Two-stage Sampling
• Setup:
  1. Stage 1: Draw S_I ⊂ U_I via p_I(·).
  2. Stage 2: For every i ∈ S_I, draw S_i ⊂ U_i via p_i(· | S_I).
  Sample of elements: S = ∪_{i∈S_I} S_i
• Some simplifying assumptions:
  1. Invariance of the second-stage design: p_i(· | S_I) = p_i(·) for every i ∈ U_I and for every S_I such that i ∈ S_I.
  2. Independence of the second-stage design:
     P(S = ∪_{i∈S_I} s_i | S_I) = ∏_{i∈S_I} Pr(S_i = s_i | S_I)
• Remark: a non-invariant design is a two-phase sampling design.
  1. Phase 1: Select a sample and observe x_i.
  2. Phase 2: Based on the observed values of x_i, the second-phase sampling design is determined. The second-phase sample is selected by the second-phase sampling design.
• Notation: sample sizes
  – n_{S_I}: number of PSUs in the sample. If the first-stage sampling is a fixed-size sampling design, then n_{S_I} = n_I.
  – m_{S_i}: number of sampled elements in S_i. If the second-stage sampling is a fixed-size sampling design, then m_{S_i} = m_i.
  – ∑_{i∈S_I} m_{S_i} = |S|: the number of sampled elements.
• Notation: inclusion probabilities
  – Cluster inclusion probabilities: π_Ii and π_Iij (same as in single-stage cluster sampling).
  – Conditional inclusion probabilities:
    π_{k|i} = Pr[k ∈ S_i | i ∈ S_I]
    π_{kl|i} = Pr[k, l ∈ S_i | i ∈ S_I]
    Δ_{kl|i} = π_{kl|i} − π_{k|i} π_{l|i}
    In general, π_{k|i} is a random variable (in the sense that it is a function of S_I). Under invariance, it is fixed.
  – Element inclusion probabilities:
    ∗ First-order inclusion probability:
      π_k = Pr[k ∈ S] = Pr(k ∈ S_i | i ∈ S_I) Pr(i ∈ S_I) = π_{k|i} π_Ii if k ∈ U_i.
    ∗ Second-order inclusion probability:
      π_kl = π_Ii π_{k|i} if k = l ∈ U_i
      π_kl = π_Ii π_{kl|i} if k, l ∈ U_i, k ≠ l
      π_kl = π_Iij π_{k|i} π_{l|j} if k ∈ U_i, l ∈ U_j (i ≠ j)
• HT estimation for t = ∑_{i∈U_I} t_i = ∑_{i∈U_I} ∑_{k∈U_i} y_k:
  t̂_HT = ∑_{i∈S_I} t̂_i/π_Ii = ∑_{i∈S_I} ∑_{k∈S_i} y_k/(π_{k|i} π_Ii)
• Properties of t̂_HT:
  1. Unbiased.
  2. Variance:
     V(t̂_HT) = V_PSU + V_SSU
     where
     V_PSU = ∑_{i∈U_I} ∑_{j∈U_I} Δ_Iij (t_i/π_Ii)(t_j/π_Ij)
     V_SSU = ∑_{i∈U_I} V_i/π_Ii
     with
     V_i = V(t̂_i | S_I) = ∑_{k∈U_i} ∑_{l∈U_i} Δ_{kl|i} (y_k/π_{k|i})(y_l/π_{l|i}).
• Remark:
  1. If S_I = U_I, then the design is stratified sampling. Note that π_Ii = 1, π_Iij = 1, and Δ_Iij = 0 for all i, j. Thus, V(t̂_HT) = ∑_{i∈U_I} V_i.
  2. If S_i = U_i for every i ∈ S_I, then the design is single-stage cluster sampling and V(t̂_HT) = V_PSU.
• Variance estimation:
  V̂(t̂_HT) = V̂_PSU + V̂_SSU = ∑_{i∈S_I} ∑_{j∈S_I} (Δ_Iij/π_Iij)(t̂_i/π_Ii)(t̂_j/π_Ij) + ∑_{i∈S_I} V̂_i/π_Ii,
  where
  V̂_PSU = ∑_{i∈S_I} ∑_{j∈S_I} (Δ_Iij/π_Iij)(t̂_i/π_Ii)(t̂_j/π_Ij) − ∑_{i∈S_I} (1/π_Ii)(1/π_Ii − 1) V̂_i
  V̂_SSU = ∑_{i∈S_I} V̂_i/π²_Ii
  and V̂_i satisfies E(V̂_i | S_I) = V(t̂_i | S_I). Here, we used the fact that
  E(t̂_i t̂_j | S_I) = t_i t_j if i ≠ j, and V_i + t²_i if i = j,
  by the independence of the second-stage sampling across the clusters.
• Often, ∑_{i∈S_I} V̂_i/π_Ii is ignored (if n_I/N_I ≈ 0).
Example 1: Two-stage SRS cluster sampling with equal sizes M_i = M
• Sampling design:
  1. Stage 1: Select an SRS of n_I clusters from the population of N_I clusters.
  2. Stage 2: Select an SRS of m elements from the M_i = M elements of each selected cluster.
• HT estimator of Ȳ = ∑_{i=1}^{N_I} ∑_{j=1}^M y_ij/(N_I M):
  Ȳ̂ = {1/(n_I m)} ∑_{i∈S_I} ∑_{j∈S_i} y_ij
• Variance:
  V(Ȳ̂) = (1 − n_I/N_I) S²_b/(n_I M) + (1 − m/M) S²_w/(n_I m)
  Or,
  V(Ȳ̂) = (1 − n_I/N_I) S²_1/n_I + (1 − m/M) S²_2/(n_I m)
  where
  S²_1 = {1/(N_I − 1)} ∑_{i=1}^{N_I} (Ȳ_i − Ȳ)² = S²_b/M
  and
  S²_2 = S²_w = {1/(N_I(M − 1))} ∑_{i=1}^{N_I} ∑_{j=1}^M (y_ij − Ȳ_i)².
• Variance components:
  S²_b = S²{1 + (M − 1)ρ}
  S²_w = S²(1 − ρ)
• Ignoring f_1 = n_I/N_I,
  V(Ȳ̂) ≈ {S²/(n_I m)}{1 + (m − 1)ρ}
  Thus, the design effect = 1 + (m − 1)ρ.
• Sample size determination: minimize
  V(Ȳ̂) = (1/n_I){(S²_1 − S²_2/M) + S²_2/m} + constant
  subject to C = c_1 n_I + c_2 n_I m:
  m*_opt = √(c_1/c_2) · S_2/√(S²_1 − S²_2/M)
• Variance estimation:
  V̂(Ȳ̂) = (1 − n_I/N_I) s²_1/n_I + (n_I/N_I)(1 − m/M) s²_2/(n_I m)
  where
  s²_1 = (n_I − 1)^{−1} ∑_{i∈S_I} (ȳ_i − Ȳ̂)²
  s²_2 = n_I^{−1}(m − 1)^{−1} ∑_{i∈S_I} ∑_{j∈S_i} (y_ij − ȳ_i)²
  and ȳ_i = ∑_{j∈S_i} y_ij/m.
• If n_I/N_I ≈ 0, then V̂(Ȳ̂) = s²_1/n_I.
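The variance estimator above, as a Python sketch (equal-size two-stage SRS; data layout is an assumption for illustration):

def two_stage_var_est(samples, N_I, M):
    """samples: {cluster i: list of its m sampled y-values}; equal M_i = M."""
    n_I = len(samples)
    m = len(next(iter(samples.values())))
    means = {i: sum(v) / m for i, v in samples.items()}
    ybar = sum(means.values()) / n_I                    # the estimator of the mean
    s1 = sum((b - ybar) ** 2 for b in means.values()) / (n_I - 1)
    s2 = sum((yij - means[i]) ** 2
             for i, v in samples.items() for yij in v) / (n_I * (m - 1))
    return (1 - n_I / N_I) * s1 / n_I + (n_I / N_I) * (1 - m / M) * s2 / (n_I * m)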
Example 2: Two-stage PPS sampling
• Sampling design:
  1. Stage 1: PPS sampling of n_I clusters with MOS = M_i.
  2. Stage 2: SRS sampling of m elements within each selected cluster.
• Estimation of the mean:
  Ȳ̂_PPS = {1/(n_I m)} ∑_{i∈S_I} ∑_{j∈S_i} y_ij
  This is a self-weighting design: equal weights.
• Variance estimation:
  V̂(Ȳ̂_PPS) = s²_z/n_I
  where
  s²_z = {1/(n_I − 1)} ∑_{k=1}^{n_I} (z_k − z̄_n)²
  and z_k = t̂_i/M_i (the sample mean of the m elements) if cluster i is selected at the k-th PPS draw.
Chapter 5
Estimation
5.1 Introduction
• So far, we have discussed various sampling designs and their unbiased estimators.
• The HT estimator is used for each sampling design (except for PPS sampling). No claim of optimality was made.
Definition
For a parameter θ(y), y = (y_1, y_2, · · · , y_N)′, an estimator θ*(S) is UMVUE (uniformly minimum variance unbiased estimator) if
1. Unbiased: E_y{θ*(S)} = θ(y) for all y.
2. Minimum variance: V_y{θ*(S)} ≤ V_y{θ̂(S)} for every unbiased estimator θ̂(S) and for all y.
Remark
Uniformity is important: suppose that my estimator is θ̂ ≡ 12. If θ = 12, then θ̂ is unbiased and V(θ̂) = 0. That is, it is MVUE at θ = 12. But it is not UMVUE.
Proposition
For a noncensus design with π_k > 0 (k = 1, 2, · · · , N), no UMVUE of t = ∑_{i=1}^N y_i exists.
Proof
Suppose that there exists Q̂ which is UMVUE of t. Fix any y* = (y*_1, · · · , y*_N)′ ∈ R^N. Now, consider
Q*(S) = ∑_{k∈S} (y_k − y*_k)/π_k + ∑_{k=1}^N y*_k.
The new estimator Q*(S) satisfies:
1. It is unbiased.
2. The variance of Q*(S) is zero at y = y*.
Because Q̂ is UMVUE, V_y(Q̂) ≤ V_y(Q*). Since V_y(Q*) = 0 for y = y*, we have V_y(Q̂) = 0 for y = y*. Since y* can be arbitrary, we have V_y(Q̂) = 0 for all y, which means that Q̂ = t for all y; this is impossible for any noncensus design. Therefore, a UMVUE cannot exist.
Remark
1. In the proof of the proposition, Q*(S) is called the difference estimator. The variance of the difference estimator is
   V{Q*(S)} = ∑_{k∈U} ∑_{l∈U} Δ_kl {(y_k − y*_k)/π_k}{(y_l − y*_l)/π_l}.
   The variance is small if y_k ≅ y*_k.
2. The class of (design) unbiased estimators is too big. We cannot find the best one in this class.
3. If we define the class of linear estimators as
   t̂ = ∑_{k∈S} w_k y_k,
   where the w_k are constants fixed in advance, the HT estimator is the only unbiased estimator in this class.
4. We have the following alternative definition of a linear estimator:
   t̂ = ∑_{k∈S} w_k(S) y_k = ∑_{k∈S} w_kS y_k
   where the w_k(S) = w_kS are constants that depend on the realized sample. That is, the w_k(S) = w_kS are random variables.
5. One advantage of a linear estimator is that it is internally consistent. An estimator is internally consistent if
   t̂(y_1 + y_2) = t̂(y_1) + t̂(y_2),
   where t̂(y) is the estimator of the total of item y.
5.2 Large sample theory
Basic Setup
1. Define a sequence of finite populations:
   U_k = {1, 2, · · · , N_k}, k = 1, 2, · · ·
   where N_1 < N_2 < · · · and y_ki is the y-value of the i-th unit in the k-th population.
2. From each finite population U_k, select a sample S_k ⊂ U_k of size n_k. Assume n_k → ∞ and f_k = N_k^{−1} n_k → f as k → ∞.
Definition
1. θ̂_n is (design) consistent for the finite population parameter θ_N if, for every ϵ > 0,
   lim_{k→∞} Pr{|θ̂_{n_k} − θ_{N_k}| > ϵ} = 0.
   Or, more simply, we write
   lim_{n→∞} Pr{|θ̂_n − θ_N| > ϵ} = 0,
   where the distribution is the sampling distribution generated by repeated sampling of size n from the finite population.
2. X_n is bounded in probability by g_n (write X_n = O_p(g_n)) if, for every ϵ > 0, there exists a positive real number M_ϵ such that
   Pr{|X_n| > g_n M_ϵ} < ϵ
   for all n.
3. X_n is of smaller order in probability than g_n (write X_n = o_p(g_n)) if, for every ϵ > 0,
   lim_{n→∞} Pr{g_n^{−1}|X_n| > ϵ} = 0.
Taylor series linearization
• Taylor's theorem:
  Let X_n be a sequence of random variables such that
  X_n = a + O_p(r_n)
  where r_n → 0 as n → ∞. If g(x) is a function with s continuous derivatives at x = a, then
  g(X_n) = g(a) + ∑_{k=1}^{s−1} (1/k!) g^{(k)}(a)(X_n − a)^k + O_p(r^s_n)
  where g^{(k)}(a) is the k-th derivative of g(x) evaluated at x = a.
• For p-dimensional ȳ, if ȳ = Ȳ + O_p(n^{−1/2}), then
  g(ȳ) = g(Ȳ) + ∑_{j=1}^p {∂g(Ȳ)/∂y_j}(ȳ_jn − Ȳ_j) + O_p(n^{−1}).
5.3 Ratio estimation
• Basic setup:
  – Observe x (auxiliary variable) and y (study variable) in the sample.
  – We know X = ∑_{i=1}^N x_i or X̄ = N^{−1} ∑_{i=1}^N x_i in advance.
  – X̂_HT = ∑_{i∈S} π_i^{−1} x_i can be different from X.
• Ratio estimator:
  Ŷ_r = X (Ŷ_HT/X̂_HT) = X R̂
  Ȳ̂_r = X̄ (Ŷ_HT/X̂_HT) = X̄ R̂
• Algebraic properties:
  – Linear in y (thus it is internally consistent).
  – If X̂_HT < X, then Ŷ_HT < Ŷ_r.
  – If X̂_HT > X, then Ŷ_HT > Ŷ_r.
  – If y_i = x_i, then the ratio estimator equals X; i.e., ∑_{i∈S} w_i x_i = X for Ŷ_r = ∑_{i∈S} w_i y_i.
• Statistical properties - Bias
  – It is biased because E(R̂) ≠ R.
  – The bias of R̂ = Ŷ_HT/X̂_HT is called the ratio bias. That is, B(R̂) = E(R̂) − R is called the ratio bias.
  – Definition: the bias of θ̂ is negligible
    ⟺ R.B.(θ̂) = Bias(θ̂)/√{Var(θ̂)} → 0 as n → ∞.
    Note: if the bias of θ̂ is negligible, then
    (θ̂ − θ)/√{Var(θ̂)} = {θ̂ − E(θ̂)}/√{Var(θ̂)} + Bias(θ̂)/√{Var(θ̂)} → N(0, 1)
    by the CLT, and
    MSE(θ̂) = V(θ̂) + {Bias(θ̂)}² = V(θ̂)[1 + {R.B.(θ̂)}²] ≈ V(θ̂).
  – The ratio bias is negligible:
    Cov(R̂, X̂_HT)/X = E(R̂ X̂_HT)/X − E(R̂)E(X̂_HT)/X = Y/X − E(R̂) = −Bias(R̂).
    Thus,
    {R.B.(R̂)}² ≤ V(X̂_HT)/X² = {CV(X̂_HT)}² → 0
• Statistical properties - Variance
  – Taylor expansion:
    Ȳ̂_r = Ȳ + (Ȳ̂_HT − Ȳ) − R(X̄̂_HT − X̄) − X̄^{−1}{(X̄̂_HT − X̄)(Ȳ̂_HT − Ȳ) − R(X̄̂_HT − X̄)²} + o_p(n^{−1})
    where R = X̄^{−1}Ȳ.
  – Variance:
    V(Ŷ_r) ≈ V(∑_{i∈S} E_i/π_i)
    where E_i = y_i − R x_i.
• Variance estimation: use Ê_i = y_i − R̂ x_i in the HT (or SYG) variance estimator.
• Example: SRS
  V(Ŷ_r) ≈ (N²/n)(1 − n/N){1/(N − 1)} ∑_{i=1}^N (y_i − R x_i)²
  For variance estimation, use
  V̂(Ŷ_r) = (N²/n)(1 − n/N){1/(n − 1)} ∑_{i∈S} (y_i − R̂ x_i)².
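The SRS case above in a short Python sketch (a sketch under the formulas just stated, not a general-design implementation):

def ratio_estimate_srs(y, x, N, X):
    """Ratio estimator of the total of y and its variance estimate under SRS."""
    n = len(y)
    R_hat = sum(y) / sum(x)                               # estimated ratio
    Y_r = X * R_hat                                       # point estimate
    s2 = sum((yi - R_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    v = (N ** 2 / n) * (1 - n / N) * s2                   # residual-based variance
    return Y_r, v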
Applications of the ratio estimator
• Hajek estimator: the ratio estimator of the mean using x_i ≡ 1.
• Domain estimation: the parameter of interest can take the form of a ratio,
  Ȳ_d = ∑_{i=1}^N δ_i y_i/∑_{i=1}^N δ_i
  where δ_i = 1 if i ∈ D and δ_i = 0 if i ∉ D. Thus,
  Ȳ̂_d = ∑_{i∈S} π_i^{−1} δ_i y_i/∑_{i∈S} π_i^{−1} δ_i
  is an (approximately) unbiased estimator of Ȳ_d.
5.4 Regression estimation
• Basic setup:
  – Observe x_k = (x_{1k}, · · · , x_{Jk})′ (auxiliary variables) and y_k (study variable) in the sample.
  – We know X = ∑_{i=1}^N x_i or X̄ = N^{−1} ∑_{i=1}^N x_i in advance.
  – Interested in estimating t_y = ∑_{i=1}^N y_i.
• Motivation: use auxiliary information at the estimation stage.
• Use a regression approach:
  1. Suppose we have
     y°_k = ∑_{j=1}^J b_j x_{jk} = b′x_k, k = 1, 2, · · · , N,
     for some known J-dimensional vector b. The y°_k is a proxy for y_k.
  2. Difference estimator:
     t̂_{y,diff} = ∑_{i=1}^N y°_i + ∑_{i∈S} (y_i − y°_i)/π_i
     – Unbiased (regardless of the choice of y°_k).
     – The variance is small if y°_k ≅ y_k.
  3. How to choose y°_k = b′x_k? Let's estimate b from the sample.
  4. Regression estimator:
     t̂_{y,reg} = ∑_{i=1}^N ŷ_i + ∑_{i∈S} (y_i − ŷ_i)/π_i,
     where ŷ_i = b̂′x_i and b̂ is estimated from the sample using the (linear) regression model
     E_ζ(y_i) = x′_i b
     V_ζ(y_i) = σ².
     Note that b and σ² are superpopulation parameters.
  5. How to estimate b?
     (a) Note that, under a census, b could be estimated by solving
         U(b) ≡ ∑_{i=1}^N (y_i − b′x_i) x′_i = 0′.
     (b) Consider an unbiased estimator of U(b):
         Û(b) = ∑_{i∈S} (1/π_i)(y_i − b′x_i) x′_i
     (c) Obtain the solution b̂ by solving Û(b) = 0 for b. The solution is
         b̂ = (∑_{i∈S} x_i x′_i/π_i)^{−1} ∑_{i∈S} x_i y_i/π_i.
• Regression estimator:
  Ŷ_reg = Ŷ_HT + (X − X̂_HT)′b̂
  Ȳ̂_reg = Ȳ̂_HT + (X̄ − X̄̂_HT)′b̂
  where
  b̂ = (∑_{i∈S} π_i^{−1} x_i x′_i)^{−1} ∑_{i∈S} π_i^{−1} x_i y_i,
  and
  (X̄̂′_HT, Ȳ̂_HT) = (1/N)(X̂′_HT, Ŷ_HT) = (1/N) ∑_{i∈S} (x′_i, y_i)/π_i.
• Note that, if 1 is in the column space of x_i, we can write
  Ŷ_reg = ∑_{i=1}^N ŷ_i
  and
  Ȳ̂_reg = (1/N) ∑_{i=1}^N ŷ_i
  where ŷ_i = x′_i b̂.
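The estimator above in a numpy sketch (array shapes are an assumption for illustration):

import numpy as np

def reg_estimator(y, x, pi, X_total):
    """Regression estimator of the total of y.

    y: (n,) responses; x: (n, J) auxiliaries; pi: (n,) inclusion probabilities;
    X_total: (J,) known population totals of x."""
    d = 1.0 / pi                                    # design weights
    b = np.linalg.solve(x.T @ (d[:, None] * x),     # sum of d_i x_i x_i'
                        x.T @ (d * y))              # sum of d_i x_i y_i
    Y_ht = d @ y
    X_ht = d @ x
    return Y_ht + (X_total - X_ht) @ b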
Algebraic properties
• Linear in y:
  Ŷ_reg = ∑_{i∈S} (1/π_i) g_iS y_i
  where
  g_iS = 1 + (X − X̂_HT)′(∑_{i∈S} π_i^{−1} x_i x′_i)^{−1} x_i.
  Also,
  Ȳ̂_reg = (1/N) ∑_{i∈S} (1/π_i) g_iS y_i.
• Calibration property:
  ∑_{i∈S} (1/π_i) g_iS x_i = X. (5.1)
  The property (5.1) is also called the benchmarking property.
• If x_i = (1, x′_{i1})′, then
  Ȳ̂_reg = Ȳ̂_π + (X̄_1 − X̄̂_π1)′b̂_1
  and
  Ŷ_reg = N{Ȳ̂_π + (X̄_1 − X̄̂_π1)′b̂_1},
  where Ȳ̂_π and X̄̂_π1 are the Hajek estimators of the form
  (X̄̂′_π1, Ȳ̂_π) = (∑_{i∈S} π_i^{−1})^{−1} ∑_{i∈S} π_i^{−1}(x′_{i1}, y_i),
  b̂_1 = {∑_{i∈S} π_i^{−1}(x_{i1} − X̄̂_π1)(x_{i1} − X̄̂_π1)′}^{−1} ∑_{i∈S} π_i^{−1}(x_{i1} − X̄̂_π1) y_i,
  and X̄_1 = N^{−1} ∑_{i=1}^N x_{i1}.
• The weights in Ŷ_reg = ∑_{i∈S} w_i y_i can be derived by minimizing
  Q(w) = ∑_{i∈S} π_i (w_i − 1/π_i)²
  subject to (5.1).
Statistical properties
• Taylor expansion: define
  Ĉ = ∑_{k∈S} π_k^{−1} x_k x′_k, d̂ = ∑_{k∈S} π_k^{−1} x_k y_k
  and
  C = ∑_{k=1}^N x_k x′_k, d = ∑_{k=1}^N x_k y_k.
  Using
  b̂ = Ĉ^{−1} d̂ ≈ b + C^{−1}(d̂ − Ĉ b), (5.2)
  we have
  Ŷ_reg ≈ Ŷ_HT + (X − X̂_HT)′b + (X − X̂_HT)′(∑_{i=1}^N x_i x′_i)^{−1} ∑_{i∈S} π_i^{−1} x_i (y_i − x′_i b)
        ≈ Ŷ_HT + (X − X̂_HT)′b
  where b = (∑_{i=1}^N x_i x′_i)^{−1} ∑_{i=1}^N x_i y_i.
• Alternative expression:
  Ŷ_reg ≈ X′b + ∑_{i∈S} π_i^{−1}(y_i − x′_i b)
• Bias: negligible.
• Variance:
  Var(Ŷ_reg) ≈ Var{∑_{i∈S} π_i^{−1}(y_i − x′_i b)}
• Variance estimation:
  V̂(Ŷ_reg) = ∑_{i∈S} ∑_{j∈S} (Δ_ij/π_ij)(Ê_i/π_i)(Ê_j/π_j)
  where Ê_i = y_i − x′_i b̂.
Remark
• The regression estimator is derived using a regression model.
• The validity (i.e. asymptotic unbiasedness) of the regression estimator
does not depend on whether the regression model holds or not.
• However, the variance of the regression estimator is small if the regres-
sion model is good.
• That is, it is model-assisted, not model-dependent.
5.5 GREG estimation
• Recall: the difference estimator.
  – Suppose that y° = (y°_1, y°_2, · · · , y°_N)′ is a guess about y = (y_1, y_2, · · · , y_N)′.
  – Difference estimator of Y = ∑_{i=1}^N y_i:
    Ŷ_diff = ∑_{i=1}^N y°_i + ∑_{i∈S} (1/π_i)(y_i − y°_i)
• Properties of the difference estimator:
  – Unbiased (regardless of y°).
  – Efficient if y° is a good guess about y.
• How to choose y°?
• Superpopulation model:
  – A prior belief about the relationship between y and x.
  – Regard y_1, . . . , y_N as a random sample from an infinite population ζ.
  – An assumption about the distribution of y given x.
• Generalized regression (GREG) model:
  E_ζ(y_i) = x′_i β
  Cov_ζ(y_i, y_j) = c_i σ² if i = j, and 0 if i ≠ j,
  where c_i = c(x_i) and c(x) is a known function of x.
Examples: GREG models
1. Ratio model:
   E_ζ(y_i) = x_i β
   V_ζ(y_i) = x_i σ²
2. Regression model:
   E_ζ(y_i) = β_0 + x_i β_1
   V_ζ(y_i) = σ²
3. Group mean model (or ANOVA model):
   E_ζ(y_i) = µ_g
   V_ζ(y_i) = σ²_g
   for i ∈ U_g and U = U_1 ∪ U_2 ∪ · · · ∪ U_G
• GREG estimator:
  Ŷ_GREG = ∑_{i=1}^N ŷ_i + ∑_{i∈S} (1/π_i)(y_i − ŷ_i)
  where ŷ_i = x′_i β̂ with
  β̂ = (∑_{i∈S} x_i x′_i/(π_i c_i))^{−1} ∑_{i∈S} x_i y_i/(π_i c_i).
• Alternative representation:
  Ŷ_GREG = Ŷ_HT + (X − X̂_HT)′β̂
Examples: GREG estimators
1. Ratio model
YGREG = Yratio =
(N∑i=1
xi
) ∑i∈S
1πiyi∑
i∈S1πixi
5.5. GREG ESTIMATION 81
2. Regression model
YGREG = Yreg =∑i∈S
1
πiyi +
(N∑i=1
xi −∑i∈S
1
πixi
)β
where
β =
∑i∈S
1πi
(xi − xπ) (yi − yπ)∑i∈S
1πi
(xi − xπ)2 .
3. Group mean model (or ANOVA model)
YGREG =
G∑g=1
NgYg
Ng
where Ng =∑N
i=1 xig, Ng =∑
i∈S xig/πi, Yg =∑
i∈S xigyi/πi, and
xig = 1 if i ∈ Ug and xig = 0 otherwise.
82 CHAPTER 5. ESTIMATION
• Algebraic properties:
  – Linear in y:
    Ŷ_GREG = ∑_{i∈S} (1/π_i) g_i(S) y_i
    where
    g_i(S) = 1 + (X − X̂_HT)′(∑_{k∈S} x_k x′_k/(π_k c_k))^{−1}(x_i/c_i).
    Thus, "final weight = design weight × g-factor".
  – Calibration property:
    ∑_{i∈S} (1/π_i) g_i(S) x_i = ∑_{i=1}^N x_i.
    In fact, the final weights are chosen to minimize ∑_{i∈S} c_i π_i (w_i − 1/π_i)² subject to ∑_{i∈S} w_i x_i = ∑_{i=1}^N x_i.
  – Read Deville and Sarndal (1992, JASA).
  – (Result 6.5.1 on p. 231) If c_i = λ′x_i, then ∑_{i∈S} (1/π_i) y_i = ∑_{i∈S} (1/π_i) ŷ_i. Thus,
    Ŷ_GREG = ∑_{i=1}^N x′_i β̂
• Statistical properties:
  – Design consistent.
  – Asymptotic variance:
    Var(Ŷ_GREG) ≈ Var{∑_{i∈S} π_i^{−1}(y_i − x′_i B)}
    where
    B = (∑_{i=1}^N x_i x′_i/c_i)^{−1} ∑_{i=1}^N x_i y_i/c_i.
  – Variance estimation: use Ê_i = y_i − x′_i β̂ instead of E_i = y_i − x′_i B in the HT variance estimator.
Conclusion: three approaches
• Design-based approach: use the HT estimator.
• Model-based approach: use the BLUP estimator.
• Model-assisted approach: use the GREG estimator.
  – Design consistent regardless of whether the model holds or not.
  – The variance is small if the model is true.
Example: Group ratio model
E_ζ(y_i) = β_g x_i
V_ζ(y_i) = σ²_g x_i
for i ∈ U_g and U = U_1 ∪ U_2 ∪ · · · ∪ U_G. The x_i are observed throughout the population. Note that if x_i ≡ 1 then it reduces to the group mean model.
• Let x′_i = (x_{1i}, x_{2i}, · · · , x_{Gi}) where
  x_{gi} = x_i if i ∈ U_g, and 0 otherwise,
  and β = (β_1, β_2, · · · , β_G)′. Then E_ζ(y_i) = x′_i β and
  B = (∑_{i∈U} x_i x′_i/σ²_i)^{−1} ∑_{i∈U} x_i y_i/σ²_i = (∑_{i∈U_1} y_i/∑_{i∈U_1} x_i, · · · , ∑_{i∈U_G} y_i/∑_{i∈U_G} x_i)′
  and
  B̂ = (∑_{i∈S} x_i x′_i/(π_i σ²_i))^{−1} ∑_{i∈S} x_i y_i/(π_i σ²_i) = (∑_{i∈S_1} y_i/π_i / ∑_{i∈S_1} x_i/π_i, · · · , ∑_{i∈S_G} y_i/π_i / ∑_{i∈S_G} x_i/π_i)′
• Since V_ζ(y_i) = λ′x_i for some λ, we have
  Ŷ_GREG = ∑_{i∈U} x′_i B̂ = ∑_{g=1}^G (∑_{i∈U_g} x_i)(∑_{i∈S_g} y_i/π_i)/(∑_{i∈S_g} x_i/π_i).
  This is called the separate ratio estimator. If the groups differ in their ratios but are homogeneous within groups, the separate ratio estimator can work well.
• If x_i ≡ 1, there are two possibilities:
  1. Groups = strata: stratification
  2. Groups ≠ strata: poststratification
Example: Poststratification
• Poststratification estimator:
  Ŷ_post = ∑_{g=1}^G N_g Ŷ_g/N̂_g
• Under SRS, it is
  Ŷ_post = ∑_{g=1}^G N_g (∑_{i∈S_g} y_i/n_g)
  where n_g is the number of sampled elements in group g.
• Asymptotic variance (under SRS):
  V(Ŷ_post) = ∑_{i∈U} ∑_{j∈U} Δ_ij (E_i/π_i)(E_j/π_j) ≈ (N/n)(1 − n/N) ∑_{g=1}^G ∑_{i∈U_g} (y_i − Ȳ_g)².
  Thus, it is essentially equal to the variance under stratified sampling with proportional allocation.
Example: Two-way ANOVA (additive, no interaction)
• Model:
  E_ζ(y_k) = α_i + β_j
  V_ζ(y_k) = σ²
• Setup: we have I × J groups or cells. The cell counts N_ij are not known. The marginal counts N_{i·} = ∑_{j=1}^J N_ij and N_{·j} = ∑_{i=1}^I N_ij are known.
• Example: i = gender, j = age group (I = 2, J = 3).
• Auxiliary variables: let
  δ_ijk = 1 if k ∈ U_ij, and 0 otherwise.
  Unfortunately, we do not observe δ_ijk in the population. Instead, we observe
  x_k = (δ_{1·k}, δ_{2·k}, · · · , δ_{I·k}, δ_{·1k}, δ_{·2k}, · · · , δ_{·Jk})
  throughout the population. Thus, we know
  ∑_{k=1}^N x_k = (N_{1·}, N_{2·}, · · · , N_{I·}, N_{·1}, N_{·2}, · · · , N_{·J})
• GREG estimator:
  Ŷ_GREG = ∑_{i∈S} (1/π_i) g_i(S) y_i
  where
  g_i(S) = 1 + (∑_{k=1}^N x_k − ∑_{k∈S} x_k/π_k)′(∑_{k=1}^N x_k x′_k)^{−1}(x_i/σ²_i).
  Unfortunately, we cannot compute the inverse of ∑_{k=1}^N x_k x′_k: the matrix is singular, because the I row indicators and the J column indicators each sum to 1 for every k.
• Alternative method: raking ratio estimation.
– We want to find $g_{ks} = g_k(S)$ such that
$$\sum_{k\in S}\frac{g_{ks}}{\pi_k}\delta_{i\cdot k} = \sum_{k=1}^{N}\delta_{i\cdot k},\quad i=1,2,\cdots,I \tag{5.3}$$
$$\sum_{k\in S}\frac{g_{ks}}{\pi_k}\delta_{\cdot jk} = \sum_{k=1}^{N}\delta_{\cdot jk},\quad j=1,2,\cdots,J. \tag{5.4}$$
– Do this iteratively (iterative proportional fitting):
1. Start with $g_{ks}^{(0)} = 1$.
2. For $\delta_{i\cdot k} = 1$,
$$g_{ks}^{(t+1)} = g_{ks}^{(t)}\,\frac{\sum_{k=1}^{N}\delta_{i\cdot k}}{\sum_{k\in S}g_{ks}^{(t)}\delta_{i\cdot k}/\pi_k}.$$
It satisfies (5.3), but not necessarily (5.4).
3. For $\delta_{\cdot jk} = 1$,
$$g_{ks}^{(t+2)} = g_{ks}^{(t+1)}\,\frac{\sum_{k=1}^{N}\delta_{\cdot jk}}{\sum_{k\in S}g_{ks}^{(t+1)}\delta_{\cdot jk}/\pi_k}.$$
It satisfies (5.4), but not necessarily (5.3).
4. Set $t\leftarrow t+2$ and go to Step 2. Continue until convergence.
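The iterative proportional fitting steps translate directly into code. The following is a minimal Python sketch on simulated data; the population marginals, tolerance, and iteration cap are illustrative choices, not prescriptions from the notes.

```python
import numpy as np

# Minimal sketch of raking ratio estimation (iterative proportional
# fitting) for an I x J table with known marginals only; data simulated.
rng = np.random.default_rng(1)
I, J, n, N = 2, 3, 200, 10000
row = rng.integers(0, I, n)            # row category of each sampled unit
col = rng.integers(0, J, n)            # column category of each sampled unit
pi = np.full(n, n / N)                 # inclusion probabilities (SRS)
N_row = np.array([6000.0, 4000.0])     # known marginals N_{i.}
N_col = np.array([3000.0, 3000.0, 4000.0])  # known marginals N_{.j}

g = np.ones(n)                         # g-factors, g_ks^(0) = 1
for t in range(100):
    for i in range(I):                 # Step 2: enforce row margins (5.3)
        m = row == i
        g[m] *= N_row[i] / np.sum(g[m] / pi[m])
    for j in range(J):                 # Step 3: enforce column margins (5.4)
        m = col == j
        g[m] *= N_col[j] / np.sum(g[m] / pi[m])
    row_hat = np.array([np.sum(g[row == i] / pi[row == i]) for i in range(I)])
    if np.max(np.abs(row_hat - N_row)) < 1e-6:   # stop when (5.3) also holds
        break

y = rng.normal(10, 2, n)               # study variable (simulated)
Y_rake = np.sum(g * y / pi)            # raking ratio estimator of the total
```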
5.6 Optimal Estimation
• Optimal design & estimation: find a pair of design and estimator $\{p(\cdot),\hat\theta\}$ that minimizes the variance, or MSE, for a given cost (cost ≐ sample size) among a suitable class of estimators.
• Class of linear, design-unbiased estimators: unique solution (HT estimator).
• Non-existence of the UMVUE: Let any noncensus design with $\pi_k > 0$ $(k=1,2,\cdots,N)$ be given. Then no uniformly minimum variance estimator exists in the class of all unbiased estimators of $Y = \sum_{i=1}^{N}y_i$.
• Remedy: change the optimality criterion.
– We don't know the variance before sampling.
– Use an assumption about y: a superpopulation model ζ.
– Anticipated variance: $AV(\hat\theta) = E_\zeta E_p(\hat\theta - \theta_N)^2$.
– Thus, find a pair of design and estimator $\{p(\cdot),\hat\theta\}$ that minimizes the $AV(\hat\theta)$ for a given cost.
Result 1
• If $\hat\theta$ is design-unbiased for $\theta_N$, then
$$AV(\hat\theta) = E_pV_\zeta(\hat\theta) + V_pE_\zeta(\hat\theta) - V_\zeta(\theta_N).$$
• (Godambe and Joshi, 1965) Consider a model ζ with the $y_i$ independent and $V_\zeta(y_i) = \sigma_i^2$. If $p(\cdot)$ is a probability sampling design ($\pi_i > 0$, $i\in U$) and $\hat Y$ is any design-unbiased estimator of the total of y, then
$$AV(\hat Y) \ge \sum_{i\in U}\left(\frac{1}{\pi_i}-1\right)\sigma_i^2.$$
The right side is called the Godambe-Joshi Lower Bound (GJLB).
Result 2
• For fixed-size probability sampling designs, the GJLB is further minimized if and only if
$$\pi_i \propto V_\zeta(y_i)^{1/2}.$$
• (Isaki and Fuller, 1982) Suppose that $p(\cdot)$ is a fixed-size probability sampling design and ζ is a superpopulation model with the $y_i$ independent, $E_\zeta(y_i) = x_i'\beta$, and $V_\zeta(y_i) = c_i\sigma^2$. Then the GREG estimator asymptotically attains the GJLB if $c_i = \lambda'x_i$ for some λ.
Proof of Result 1
Write $\hat Y = \hat Y_{HT} + R$. Since we assume that $\hat Y$ is design unbiased, $E_p(R) = 0$. Thus, for any fixed $j\in U$,
$$0 = E_p(R) = \sum_{S\in\mathcal S}p(S)R(S) = \sum_{S\in\mathcal S;\,j\in S}p(S)R(S) + \sum_{S\in\mathcal S;\,j\notin S}p(S)R(S).$$
Now,
$$V_\zeta(\hat Y) = V_\zeta(\hat Y_{HT}) + V_\zeta(R) + 2\,Cov_\zeta(\hat Y_{HT},R).$$
Thus,
$$E_p\left\{Cov_\zeta(\hat Y_{HT},R)\right\} = E_p\left[E_\zeta\left\{\left(\hat Y_{HT}-E_\zeta(\hat Y_{HT})\right)R\right\}\right] = E_p\left[E_\zeta\left\{\sum_{j\in U}\left(y_j-E_\zeta(y_j)\right)\frac{I_j}{\pi_j}R\right\}\right]$$
$$= \sum_{j\in U}\frac{1}{\pi_j}E_\zeta\left[\left(y_j-E_\zeta(y_j)\right)\sum_{S\in\mathcal S;\,j\in S}R(S)p(S)\right] = -\sum_{j\in U}\frac{1}{\pi_j}E_\zeta\left[\left(y_j-E_\zeta(y_j)\right)\sum_{S\in\mathcal S;\,j\notin S}R(S)p(S)\right] = 0,$$
where the last equality holds because $\sum_{S\in\mathcal S;\,j\notin S}R(S)p(S)$ does not involve $y_j$, so the ζ-expectation factors and $E_\zeta\{y_j - E_\zeta(y_j)\} = 0$. Therefore,
$$E_p\left\{V_\zeta(\hat Y)\right\} = E_p\left\{V_\zeta(\hat Y_{HT})\right\} + E_p\left\{V_\zeta(R)\right\} \ge E_p\left\{V_\zeta(\hat Y_{HT})\right\} = E_p\left\{V_\zeta\left(\sum_{i=1}^{N}\frac{y_iI_i}{\pi_i}\right)\right\} = E_p\left(\sum_{i=1}^{N}\frac{\sigma_i^2I_i}{\pi_i^2}\right) = \sum_{i=1}^{N}\frac{\sigma_i^2}{\pi_i}$$
and
$$AV(\hat Y) = E_pV_\zeta(\hat Y) + V_pE_\zeta(\hat Y) - V_\zeta(Y) \ge E_pV_\zeta(\hat Y) - V_\zeta(Y) \ge \sum_{i=1}^{N}\frac{\sigma_i^2}{\pi_i} - \sum_{i\in U}\sigma_i^2 = \sum_{i\in U}\left(\frac{1}{\pi_i}-1\right)\sigma_i^2.$$
Chapter 6
Variance estimation
6.1 Introduction
• Use of variance estimates in sampling
– Inferential purpose: constructing confidence intervals, hypothesis testing
– Descriptive purpose: evaluation of survey estimates, future survey planning
• What is a good variance estimator?
– Unbiased, or nearly unbiased (a positive bias is conservative)
– Stable: the variance of the variance estimator is low
– Nonnegative
– Simple to calculate
• HT variance estimator (or SYG variance estimator): some problems
1. Can take negative values
2. Requires the joint inclusion probabilities $\pi_{ij}$, which can be cumbersome for large samples
Variance of variance estimator
• Parameter of interest: $V(\hat\theta)$.
• Let $\hat V$ be an (unbiased) estimator of $V(\hat\theta)$.
• We may assume that
$$\frac{d\,\hat V}{V(\hat\theta)} \sim \chi^2(d)$$
for some d (the degrees of freedom of $\hat V$).
• By the properties of the $\chi^2$ distribution,
$$E(\hat V) = V(\hat\theta) \quad\text{and}\quad V(\hat V) = \frac{2\{V(\hat\theta)\}^2}{d}.$$
Thus,
$$CV(\hat V) = \frac{\sqrt{V(\hat V)}}{E(\hat V)} = \sqrt{\frac{2}{d}}.$$
• How to compute d?
1. Method of moments: requires an estimate of $V(\hat V)$.
2. Rule of thumb: use $d = n_{PSU} - H$, where $n_{PSU}$ is the number of sampled PSUs and H is the number of strata.
Alternative to HT variance estimation
• Simplified variance estimator: motivation
1. Consider the variance estimator for PPS sampling:
$$\hat V_0 = \frac{1}{n(n-1)}\sum_{i\in S}\left(\frac{y_i}{p_i}-\frac{1}{n}\sum_{j\in S}\frac{y_j}{p_j}\right)^2,$$
which is always nonnegative and simple to compute.
2. What if we use $\hat V_0$ as an estimator of the variance of $\hat Y_{HT} = \sum_{i\in S}y_i/\pi_i$, by treating $\hat Y_{HT} \cong \hat Y_{PPS} = \frac{1}{n}\sum_{i\in S}y_i/p_i$?
3. Simplified variance estimator: use the PPS sampling variance estimator $\hat V_0$ to estimate the variance of $\hat Y_{HT}$.
Theorem
$$E(\hat V_0) - Var(\hat Y_{HT}) = \frac{n}{n-1}\left\{Var(\hat Y_{PPS}) - Var(\hat Y_{HT})\right\}$$
where $Var(\hat Y_{PPS})$ is the variance of $\hat Y_{PPS}$ using $p_k = \pi_k/n$ as the selection probability of unit k for each PPS draw, and
$$Var(\hat Y_{PPS}) = \frac{1}{n}\sum_{i=1}^{N}p_i\left(\frac{y_i}{p_i}-Y\right)^2.$$
Remark
1. In most cases, the bias is positive (thus, the estimation is conservative).
2. Under SRS, the relative bias of the simplified variance estimator is
$$\frac{E(\hat V_0) - Var(\hat Y_{HT})}{Var(\hat Y_{HT})} = \frac{n}{N-n},$$
and it is negligible if n/N is negligible.
3. Application to multi-stage sampling: express
$$\hat Y_{HT} = \sum_{i\in S_I}\frac{\hat Y_i}{\pi_{Ii}}.$$
The resulting simplified variance estimator can be written
$$\hat V_0 = \frac{1}{n(n-1)}\sum_{i\in S_I}\left(\frac{\hat Y_i}{p_i}-\hat Y_{HT}\right)^2 = \frac{n}{n-1}\sum_{i\in S_I}\left(\frac{\hat Y_i}{\pi_{Ii}}-\frac{1}{n}\hat Y_{HT}\right)^2$$
where $p_i = \pi_{Ii}/n$ and n is the number of sampled PSUs. The bias is negligible if the primary sampling rate is negligible. If the sampling design is also a stratified (multi-stage) sampling such that
$$\hat Y_{HT} = \sum_{h=1}^{H}\sum_{i\in S_{Ih}}w_{hi}\hat Y_{hi},$$
the simplified variance estimator can be written
$$\hat V_0 = \sum_{h=1}^{H}\frac{n_h}{n_h-1}\sum_{i=1}^{n_h}\left(w_{hi}\hat Y_{hi}-\frac{1}{n_h}\sum_{j=1}^{n_h}w_{hj}\hat Y_{hj}\right)^2.$$
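Computationally, $\hat V_0$ for the stratified case needs only the weighted PSU totals $w_{hi}\hat Y_{hi}$ by stratum. A minimal Python sketch, with simulated totals standing in for real PSU data:

```python
import numpy as np

# Minimal sketch of the simplified (with-replacement) variance estimator
# V0 for a stratified multi-stage design.
def simplified_var(weighted_psu_totals):
    """weighted_psu_totals: list of 1-d arrays, one per stratum,
    holding w_{hi} * Yhat_{hi} for the n_h sampled PSUs."""
    v0 = 0.0
    for z in weighted_psu_totals:
        nh = len(z)
        v0 += nh / (nh - 1) * np.sum((z - z.mean()) ** 2)
    return v0

rng = np.random.default_rng(0)
z = [rng.normal(100, 10, 5), rng.normal(200, 25, 8)]  # two strata (simulated)
print(simplified_var(z))
```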
6.2 Taylor series linearization
• Estimate the variance of a nonlinear estimator by approximating the estimator by a linear function.
• First-order Taylor linearization: for p-dimensional $\bar{\mathbf y}$, if $\bar{\mathbf y}_n = \bar{\mathbf Y}_N + O_p(n^{-1/2})$, then
$$g(\bar{\mathbf y}_n) = g(\bar{\mathbf Y}) + \sum_{j=1}^{p}\frac{\partial g(\bar{\mathbf Y})}{\partial y_j}\left(\bar y_{jn}-\bar Y_j\right) + O_p(n^{-1}).$$
• Linearized variance:
$$V\{g(\bar{\mathbf y}_n)\} \doteq \sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial g(\bar{\mathbf Y})}{\partial y_i}\frac{\partial g(\bar{\mathbf Y})}{\partial y_j}\,Cov(\bar y_{in},\bar y_{jn}).$$
• Two methods of obtaining a linearized variance estimator:
1. Direct method: use
$$\hat V\{g(\bar{\mathbf y}_n)\} \doteq \sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial g(\bar{\mathbf y}_n)}{\partial y_i}\frac{\partial g(\bar{\mathbf y}_n)}{\partial y_j}\,\hat C(\bar y_{in},\bar y_{jn}).$$
2. Residual technique:
[Step 1] Obtain a first-order Taylor expansion to get
$$g(\bar{\mathbf y}_n) \doteq g(\bar{\mathbf Y}) + \frac{1}{N}\sum_{i\in S}\frac{1}{\pi_i}e_i$$
for some $e_i$.
[Step 2] The variance of $g(\bar{\mathbf y}_n)$ is then approximated by the variance of $N^{-1}\sum_{i\in S}\pi_i^{-1}e_i$. Obtain a variance estimator of $N^{-1}\sum_{i\in S}\pi_i^{-1}e_i$ and replace $e_i$ by $\hat e_i$.
Example: Ratio
$$\hat R = \frac{\bar y}{\bar x}, \qquad R = \frac{\bar Y}{\bar X}$$
• Taylor expansion:
$$\hat R = R + \bar X^{-1}(\bar y - R\bar x) + O_p(n^{-1})$$
• Method 1:
$$\hat V(\hat R) \doteq \bar x^{-2}\hat V(\bar y) + \bar x^{-2}\hat R^2\hat V(\bar x) - 2\bar x^{-2}\hat R\,\hat C(\bar x,\bar y)$$
• Method 2:
$$\hat V(\hat R) \doteq \frac{1}{N^2}\sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{\hat e_i}{\pi_i}\frac{\hat e_j}{\pi_j}$$
where $\hat e_i = \bar x^{-1}(y_i - \hat Rx_i)$.
• Ratio estimator $\hat{\bar Y}_r = \bar X\hat R$:
$$\hat V_1 \doteq \frac{1}{N^2}\sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_i-\hat Rx_i}{\pi_i}\frac{y_j-\hat Rx_j}{\pi_j}$$
$$\hat V_2 \doteq \frac{1}{N^2}\left(\frac{\bar X}{\bar x}\right)^2\sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_i-\hat Rx_i}{\pi_i}\frac{y_j-\hat Rx_j}{\pi_j}$$
• Which one to use?
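A small simulation makes the comparison concrete. The sketch below computes $\hat V_1$ and $\hat V_2$ for the ratio estimator of the mean under SRS, where the double sums reduce to the familiar $(1-n/N)s_e^2/n$ form; the population is simulated for illustration only.

```python
import numpy as np

# Minimal sketch: the two linearized variance estimators for the ratio
# estimator of the mean under SRS.
rng = np.random.default_rng(2)
N, n = 5000, 100
x = rng.gamma(4.0, 2.0, N)
y = 3.0 * x + rng.normal(0, 2.0, N)
Xbar = x.mean()                              # known population mean of x

s = rng.choice(N, n, replace=False)
xs, ys = x[s], y[s]
R_hat = ys.mean() / xs.mean()
e = ys - R_hat * xs                          # residuals y_i - R_hat * x_i

v_e = (1 - n / N) / n * np.var(e, ddof=1)    # SRS variance of the mean of e
V1 = v_e                                     # treats Xbar / xbar as 1
V2 = (Xbar / xs.mean()) ** 2 * v_e           # conditional (ratio-adjusted)
print(V1, V2)
```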
Variance estimation for the GREG estimator
• For simplicity, assume that $c_i = \lambda'x_i$, so that
$$\hat Y_{GREG} = \sum_{i\in S}\frac{1}{\pi_i}g_iy_i$$
where
$$g_i = X'\left(\sum_{k\in S}\frac{1}{\pi_kc_k}x_kx_k'\right)^{-1}\frac{1}{c_i}x_i.$$
• Two types of variance estimators:
$$\hat V_1 = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{\hat e_i}{\pi_i}\frac{\hat e_j}{\pi_j}, \qquad \hat V_2 = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{g_i\hat e_i}{\pi_i}\frac{g_j\hat e_j}{\pi_j}.$$
They are asymptotically equivalent because $g_i \doteq 1$.
• $\hat V_2$ has good conditional properties.
Variance estimation for the poststratified estimator
• Poststratified estimator:
$$\hat Y_{post} = \sum_{g=1}^{G}\frac{N_g}{\hat N_g}\hat Y_g$$
• Unconditional variance estimator:
$$\hat V_1 = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{\hat e_i}{\pi_i}\frac{\hat e_j}{\pi_j}$$
where $\hat e_i = y_i - \bar y_g$ for $x_{ig} = 1$.
• Conditional variance estimator:
$$\hat V_2 = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{g_i\hat e_i}{\pi_i}\frac{g_j\hat e_j}{\pi_j}$$
where $\hat e_i = y_i - \bar y_g$ and $g_i = N_g/\hat N_g$ for $x_{ig} = 1$.
• Under SRS:
$$\hat V_1 = \frac{N^2}{n}\left(1-\frac{n}{N}\right)\sum_{g=1}^{G}\frac{n_g-1}{n-1}s_g^2$$
$$\hat V_2 = \left(1-\frac{n}{N}\right)\frac{n}{n-1}\sum_{g=1}^{G}\frac{N_g^2}{n_g}\frac{n_g-1}{n_g}s_g^2$$
where $s_g^2 = \sum_{i\in S_g}(y_i-\bar y_g)^2/(n_g-1)$.
6.3 Replication method
• Replication method - Idea
1. Interested in estimating the variance of $\hat\theta$.
2. From the original sample S, generate G resamples $S^{(1)},S^{(2)},\cdots,S^{(G)}$.
3. Based on the observations in resample $S^{(g)}$ $(g=1,2,\cdots,G)$, compute the replicate $\hat\theta^{(g)}$ of $\hat\theta$.
4. The replicate variance estimator of $\hat\theta$ is computed as
$$\hat V = K_G\sum_{g=1}^{G}\left(\hat\theta^{(g)}-\hat\theta^{(\cdot)}\right)^2$$
for some suitable $K_G$, where $\hat\theta^{(\cdot)} = G^{-1}\sum_{g=1}^{G}\hat\theta^{(g)}$.
• Reference: Wolter (2007): Introduction to variance estimation.
Replication method for variance estimation
• Random group method
– Independent random group method: Mahalanobis (1939, 1946), Deming (1946)
– Non-independent random group method
• Balanced repeated replication: Plackett and Burman (1946), McCarthy (1966)
• Jackknife: Quenouille (1949), Tukey (1958)
• Bootstrap: Efron (1979)
6.3.1 Independent Random Group Method
• Procedure
[Step 1] A sample $s_1$ is drawn from the finite population according to the design p. Compute $\hat\theta^{(1)}$ from the observations in $s_1$.
[Step 2] Sample $s_1$ is replaced into the population and a second sample $s_2$ is drawn according to the same sampling design p. Compute $\hat\theta^{(2)}$ from the observations in $s_2$.
[Step 3] This process is repeated $G\ (\ge 2)$ times.
[Step 4] Use
$$\hat\theta_{RG} = \frac{1}{G}\sum_{k=1}^{G}\hat\theta^{(k)} \tag{6.1}$$
as an estimator of θ, and use
$$\hat V(\hat\theta_{RG}) = \frac{1}{G}\frac{1}{G-1}\sum_{k=1}^{G}\left(\hat\theta^{(k)}-\hat\theta_{RG}\right)^2 \tag{6.2}$$
as a variance estimator for $\hat\theta_{RG}$.
• Property: let $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ be uncorrelated random variables with common expectation $E(\hat\theta^{(1)}) = \theta$. Then,
1. $\hat\theta_{RG}$ in (6.1) is unbiased for θ.
2. $\hat V(\hat\theta_{RG})$ in (6.2) is unbiased for $V(\hat\theta_{RG})$.
• Example : Suppose that a sample of households is to be drawn using
a multistage sampling design. Two random groups are desired. An
areal frame exists, and the target population is divided into two strata
(defined, say, on the basis of geography). Stratum 1 contains N1 PSUs
and stratum 2 consists of one PSU that is to be selected with certainty.
G = 2 independent random groups are to be used. Each sample is
selected independently according to the following plan:
– Stratum 1: Two PSUs are selected using some πps sampling de-
sign. From each selected PSU, an equal probability systematic
sample of m1 households is selected.
– Stratum 2: The certainty PSU is divided into city blocks, with
the block size varying between 10 and 15 households. An unequal
probability systematic sample of m2 blocks is selected with prob-
ability proportional to the block sizes. All households in selected
blocks are enumerated.
For point estimation, use
$$\hat\theta = \left(\hat\theta^{(1)}+\hat\theta^{(2)}\right)/2.$$
For variance estimation, use
$$\hat V(\hat\theta) = \frac{1}{2(1)}\sum_{g=1}^{2}\left(\hat\theta^{(g)}-\hat\theta\right)^2 = \left(\hat\theta^{(1)}-\hat\theta^{(2)}\right)^2/4.$$
• The method is conceptually easy and the variance estimator is unbiased, but it is unstable (here d.f. = 1) and is not often used in practice.
6.3.2 Non-independent Random Groups
• Idea
1. Given the sample S, use a random mechanism to divide S into $S = \cup_{g=1}^{G}S^{(g)}$, where $S^{(1)},\cdots,S^{(G)}$ are disjoint.
2. Calculate $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ and treat them as independent.
3. Use
$$\hat V = \frac{1}{G}\frac{1}{G-1}\sum_{k=1}^{G}\left(\hat\theta^{(k)}-\hat\theta_{RG}\right)^2$$
as a variance estimator for $\hat\theta$.
• Requirement: each $S^{(g)}$ should have the same design as S.
• Impractical in some cases; unstable.
• Property: let $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ be random variables with common expectation $E(\hat\theta^{(i)}) = \theta$. Then,
$$E\left\{\hat V(\hat\theta_{RG})\right\} - V(\hat\theta_{RG}) = -\frac{1}{G(G-1)}\mathop{\sum\sum}_{i\neq j}Cov\left(\hat\theta^{(i)},\hat\theta^{(j)}\right)$$
– If $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ are independent, then the RHS is 0.
– If $\hat\theta^{(1)},\cdots,\hat\theta^{(G)}$ are identically distributed, then the RHS equals $-Cov(\hat\theta^{(1)},\hat\theta^{(2)})$.
Example: Use of the non-independent random group method under simple random sampling
• Interested in variance estimation for $\hat\theta = \bar y$ under simple random sampling.
• Partition the sample into G groups of dependent samples $S = \cup_{g=1}^{G}S^{(g)}$, where $S^{(g)}$ is a simple random sample of size $b = n/G$.
• Compute $\hat\theta^{(g)} = \bar y^{(g)}$ from $S^{(g)}$. Note that
$$\hat\theta = \frac{1}{G}\sum_{g=1}^{G}\bar y^{(g)}.$$
• How large is the bias of $\hat V(\hat\theta_{RG})$?
$$Bias(\hat V) = -Cov\left(\bar y^{(1)},\bar y^{(2)}\right) = \frac{1}{N}S^2$$
6.3.3 Jackknife method for variance estimation
• Motivation
– Basic setup: let $(x_i,y_i)$ be IID from a bivariate distribution with mean $(\mu_x,\mu_y)$, and let $\theta = \mu_y/\mu_x$. The standard ratio estimator of θ, $\hat\theta = \bar x^{-1}\bar y$, has bias of order $O(n^{-1})$.
– Quenouille (1949) idea: propose a bias-reduced estimator of θ,
$$\hat\theta_{(\cdot)} = \frac{1}{n}\sum_{k=1}^{n}\hat\theta_{(k)},$$
where $\hat\theta_{(k)} = n\hat\theta - (n-1)\hat\theta^{(-k)}$ and
$$\hat\theta^{(-k)} = \left(\sum_{i\neq k}x_i\right)^{-1}\sum_{i\neq k}y_i.$$
– Tukey (1958): treat $\hat\theta_{(1)},\cdots,\hat\theta_{(n)}$ as an independent random group of size n to get
$$\hat V_{JK}(\hat\theta) \doteq \frac{1}{n}\frac{1}{n-1}\sum_{k=1}^{n}\left(\hat\theta_{(k)}-\hat\theta_{(\cdot)}\right)^2 = \frac{n-1}{n}\sum_{k=1}^{n}\left(\hat\theta^{(-k)}-\hat\theta^{(-\cdot)}\right)^2,$$
where $\hat\theta^{(-\cdot)} = n^{-1}\sum_{k=1}^{n}\hat\theta^{(-k)}$.
Taylor theorem 2:
Let $X_n, W_n$ be sequences of random variables such that
$$X_n = W_n + O_p(r_n)$$
where $r_n \to 0$ as $n\to\infty$. If $g(x)$ is a function with s-th continuous derivatives on the line segment joining $X_n$ and $W_n$, and the s-th order partial derivatives are bounded, then
$$g(X_n) = g(W_n) + \sum_{k=1}^{s-1}\frac{1}{k!}g^{(k)}(W_n)(X_n-W_n)^k + O_p(r_n^s)$$
where $g^{(k)}(a)$ is the k-th derivative of $g(x)$ evaluated at $x = a$.
Properties
• If $\hat\theta_n = \bar y$, then
$$\hat V_{JK} = \frac{1}{n}\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar y)^2 = \frac{1}{n}s_y^2.$$
• Under regularity conditions, $\bar y^{(-k)} - \bar y = O_p(n^{-1})$.
• For $\hat\theta = f(\bar x,\bar y)$, we have
$$\hat\theta^{(-k)} - \hat\theta = \frac{\partial f}{\partial x}(\bar x,\bar y)\left(\bar x^{(-k)}-\bar x\right) + \frac{\partial f}{\partial y}(\bar x,\bar y)\left(\bar y^{(-k)}-\bar y\right) + o_p(n^{-1}).$$
• The jackknife variance estimator defined by
$$\hat V_{JK} = \frac{n-1}{n}\sum_{k=1}^{n}\left(\hat\theta^{(-k)}-\hat\theta\right)^2$$
is asymptotically equivalent to the linearized variance estimator.
Example: Ratio
$$\hat R = \frac{\bar y}{\bar x}$$
• Jackknife replicates for $\hat R$:
$$\hat R^{(-k)} = \frac{\bar y^{(-k)}}{\bar x^{(-k)}} = \frac{n\bar y - y_k}{n\bar x - x_k}, \quad k=1,2,\cdots,n$$
• Taylor expansion (by Taylor theorem 2):
$$\hat R^{(-k)} = \hat R + (\bar x)^{-1}\left(\bar y^{(-k)} - \hat R\bar x^{(-k)}\right) + O_p(n^{-2})$$
• Jackknife variance estimator:
$$\hat V_{JK} = \frac{n-1}{n}\sum_{k=1}^{n}\left(\hat R^{(-k)}-\hat R\right)^2 \doteq \frac{n-1}{n}\sum_{k=1}^{n}(\bar x)^{-2}\left(\bar y^{(-k)}-\hat R\bar x^{(-k)}\right)^2 = \frac{1}{n(n-1)}\frac{1}{\bar x^2}\sum_{k=1}^{n}\left(y_k-\hat Rx_k\right)^2$$
• The jackknife variance estimator for the ratio estimator $\hat{\bar Y}_r = \bar X\hat R$ is asymptotically equivalent to
$$\hat V_{JK} = \left(\frac{\bar X}{\bar x}\right)^2\frac{1}{n(n-1)}\sum_{k=1}^{n}\left(y_k-\hat Rx_k\right)^2,$$
which is often called the conditional variance estimator.
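A minimal Python sketch of the delete-one jackknife for $\hat R$, checked against the closed form derived above; the data are simulated for illustration.

```python
import numpy as np

# Minimal sketch: delete-one jackknife for R_hat = ybar / xbar.
rng = np.random.default_rng(3)
n = 50
x = rng.gamma(4.0, 2.0, n)
y = 3.0 * x + rng.normal(0, 2.0, n)

R_hat = y.mean() / x.mean()
# delete-one replicates R^(-k) = (n*ybar - y_k) / (n*xbar - x_k)
R_del = (y.sum() - y) / (x.sum() - x)
V_jk = (n - 1) / n * np.sum((R_del - R_hat) ** 2)

# closed form: sum (y_k - R_hat x_k)^2 / (n (n-1) xbar^2)
V_lin = np.sum((y - R_hat * x) ** 2) / (n * (n - 1) * x.mean() ** 2)
print(V_jk, V_lin)   # close for moderate n
```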
Example: Post-stratification (under SRS)
• Point estimator:
$$\hat Y_{post} = \sum_{g=1}^{G}N_g\bar y_g = \sum_{g=1}^{G}\frac{N_g}{n_g}\sum_{i\in S_g}y_i$$
• k-th jackknife replicate of $\hat Y_{post}$: for $k\in S_g$,
$$\hat Y_{post}^{(-k)} - \hat Y_{post} = N_g\left(\bar y_g^{(-k)}-\bar y_g\right) = N_g(n_g-1)^{-1}(\bar y_g-y_k)$$
• Jackknife variance estimator:
$$\hat V_{JK}(\hat Y_{post}) = \frac{n-1}{n}\sum_{k=1}^{n}\left(\hat Y_{post}^{(-k)}-\hat Y_{post}\right)^2 \doteq \frac{n-1}{n}\sum_{g=1}^{G}N_g^2(n_g-1)^{-1}s_g^2$$
• Asymptotically equivalent to the conditional variance estimator.
Extension to complex sampling
• Stratified multi-stage cluster sampling design:
$$\hat Y_{HT} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}w_{hi}\hat Y_{hi}$$
• Jackknife replicates: delete the j-th cluster in the g-th stratum,
$$\hat Y_{HT}^{(-gj)} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}w_{hi}^{(-gj)}\hat Y_{hi}$$
where
$$w_{hi}^{(-gj)} = \begin{cases}0 & \text{if } h=g \text{ and } i=j\\ (n_h-1)^{-1}n_hw_{hi} & \text{if } h=g \text{ and } i\neq j\\ w_{hi} & \text{otherwise.}\end{cases}$$
• Jackknife variance estimator:
$$\hat V_{JK}(\hat Y_{HT}) = \sum_{h=1}^{H}\frac{n_h-1}{n_h}\sum_{i=1}^{n_h}\left(\hat Y_{HT}^{(-hi)}-\frac{1}{n_h}\sum_{j=1}^{n_h}\hat Y_{HT}^{(-hj)}\right)^2$$
• Property:
$$\hat V_{JK}(\hat Y_{HT}) = \sum_{h=1}^{H}\frac{n_h}{n_h-1}\sum_{i=1}^{n_h}\left(w_{hi}\hat Y_{hi}-\frac{1}{n_h}\sum_{j=1}^{n_h}w_{hj}\hat Y_{hj}\right)^2 \equiv \hat V_0$$
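A minimal Python sketch of the delete-a-cluster jackknife using the replicate-weight rule above; the per-stratum weights and estimated PSU totals are simulated placeholders.

```python
import numpy as np

# Minimal sketch: stratified delete-a-cluster jackknife.
def jackknife_var(w, ytot):
    """w, ytot: lists of 1-d arrays (one per stratum) holding the
    weights w_{hi} and the estimated PSU totals Yhat_{hi}."""
    theta = sum(np.sum(wh * yh) for wh, yh in zip(w, ytot))
    v = 0.0
    for wg, yg in zip(w, ytot):
        ng = len(wg)
        reps = np.empty(ng)
        for j in range(ng):
            wj = wg * ng / (ng - 1)   # reweight remaining PSUs in stratum
            wj[j] = 0.0               # delete PSU j
            reps[j] = theta - np.sum(wg * yg) + np.sum(wj * yg)
        v += (ng - 1) / ng * np.sum((reps - reps.mean()) ** 2)
    return v

rng = np.random.default_rng(4)
w = [np.full(5, 30.0), np.full(8, 12.0)]
ytot = [rng.normal(100, 10, 5), rng.normal(200, 25, 8)]
print(jackknife_var(w, ytot))
```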
6.3.4 Balanced repeated replication
• Basic set-up: stratified sampling with $n_h = 2$ and $\hat\theta = \sum_{h=1}^{H}W_h\bar y_h$.
• Half-sample replication:
$$\hat\theta^{(v)} = \sum_{h=1}^{H}W_h\left[\delta_h^{(v)}y_{h1} + (1-\delta_h^{(v)})y_{h2}\right], \quad v=1,\cdots,2^H$$
where
$$\delta_h^{(v)} = \begin{cases}1 & \text{if } y_{h1}\in s_v\\ 0 & \text{if } y_{h2}\in s_v.\end{cases}$$
• Balanced condition: there always exist $s_1,\cdots,s_G$ with $H < G \le H+4$ (G a multiple of 4) such that
$$\sum_{v=1}^{G}(2\delta_h^{(v)}-1)(2\delta_{h'}^{(v)}-1) = 0 \quad\text{if } h\neq h'.$$
• Variance estimator:
$$\hat V_{BRR} = \frac{1}{G}\sum_{v=1}^{G}\left(\hat\theta^{(v)}-\hat\theta\right)^2$$
• Property: if the $\delta_h^{(v)}$ satisfy the balanced condition, then
$$\hat V_{BRR} = \sum_h W_h^2(y_{h1}-y_{h2})^2/4,$$
which is equal to the variance estimator of the stratified sampling estimator with $n_h = 2$.
Example: BRR for H = 3
• $M_G$: Hadamard matrix of order G:
– a $G\times G$ matrix of ±1,
– $M_G'M_G = GI_G$,
– if $M_G$ satisfies $M_G'M_G = GI_G$, then
$$M_{2G} = \begin{pmatrix}M_G & M_G\\ M_G & -M_G\end{pmatrix},$$
– for G = 4,
$$M_4 = \begin{pmatrix}1&1&1&1\\ 1&-1&1&-1\\ 1&-1&-1&1\\ 1&1&-1&-1\end{pmatrix}.$$
• The columns of $M_G$ are mutually orthogonal, so they satisfy the balanced condition.
• G = 4 BRR replicates can be constructed as follows:
$$\hat\theta^{(1)} = W_1y_{11} + W_2y_{21} + W_3y_{31}$$
$$\hat\theta^{(2)} = W_1y_{12} + W_2y_{21} + W_3y_{32}$$
$$\hat\theta^{(3)} = W_1y_{12} + W_2y_{22} + W_3y_{31}$$
$$\hat\theta^{(4)} = W_1y_{11} + W_2y_{22} + W_3y_{32}$$
• One can check that
$$\frac{1}{G}\sum_{v=1}^{G}\left(\hat\theta^{(v)}-\hat\theta\right)^2 = \sum_h W_h^2(y_{h1}-y_{h2})^2/4.$$
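A minimal Python sketch that builds balanced replicates from columns of $M_4$ and verifies the closed form; the stratum weights and data are illustrative.

```python
import numpy as np

# Minimal sketch: BRR for H = 3 strata with n_h = 2 via a 4x4 Hadamard
# matrix; verifies V_BRR = sum_h W_h^2 (y_h1 - y_h2)^2 / 4.
W = np.array([0.5, 0.3, 0.2])                        # stratum weights
y = np.array([[3.1, 2.7], [5.0, 4.2], [1.9, 2.4]])   # y_h1, y_h2

M4 = np.array([[ 1,  1,  1,  1],
               [ 1, -1,  1, -1],
               [ 1, -1, -1,  1],
               [ 1,  1, -1, -1]])
H, G = 3, 4
delta = (M4[:, :H] + 1) // 2          # delta_h^(v): 1 -> y_h1, 0 -> y_h2
theta = np.sum(W * y.mean(axis=1))    # full-sample estimator (n_h = 2)
reps = np.array([np.sum(W * np.where(delta[v] == 1, y[:, 0], y[:, 1]))
                 for v in range(G)])
V_brr = np.mean((reps - theta) ** 2)
V_closed = np.sum(W ** 2 * (y[:, 0] - y[:, 1]) ** 2) / 4
print(V_brr, V_closed)                # equal because the columns are balanced
```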
Chapter 7
Two-phase sampling
7.1 Introduction
Motivation
• Use of auxiliary variable x
1. Design stage : stratification, PPS (or πps) sampling
2. Estimation stage : ratio estimation, regression estimation
• Need to observe xi in the population (design stage), or need to know
the total of xi (estimation stage)
• What if x is not available? Use sampling to observe x first.
• Basic structure
[Phase 1] Select a (simple) random sample $S_1$. Observe $x_i$, $i\in S_1$.
[Phase 2] Treat $S_1$ as if it were the population, and choose a suitable sampling design to select a sample $S_2$ from $S_1$ using the information $x_i$ observed in $S_1$. Observe $y_i$, $i\in S_2$.
• Notation
– $\pi_i^{(1)} = Pr(i\in S_1)$: (first-order) inclusion probability under the first-phase sampling
– $\pi_{i|S_1}^{(2)} = Pr(i\in S_2\mid S_1)$: (first-order) conditional inclusion probability under the second-phase sampling, given the first-phase sample
– $\pi_i = Pr(i\in S_2)$: (first-order) inclusion probability under the two-phase sampling,
$$\pi_i = \sum_{S_1;\,i\in S_1}\pi_{i|S_1}^{(2)}P_1(S_1) = E_1\left\{\pi_{i|S_1}^{(2)}I(i\in S_1)\right\}$$
• Features
– If $\pi_{i|S_1}^{(2)} = \pi_i^{(2)}$ (invariance), then $\pi_i = \pi_i^{(1)}\pi_i^{(2)}$.
– If the invariance does not hold, then we cannot compute the first-order inclusion probability $\pi_i$,
– and we cannot use the HT estimator.
• Example:
1. Phase one: SRS of size n.
2. Phase two: πps sampling of size r with $\pi_i\propto x_i$.
Note that
$$\pi_{i|S_1}^{(2)} = \frac{rx_i}{\sum_{k\in S_1}x_k}$$
and $\pi_i$ cannot be computed from one realization of $S_1$.
Remedy
• Use the π*-estimator:
$$\hat Y^* = \sum_{i\in S_2}\frac{y_i}{\pi_i^{(1)}\pi_{i|S_1}^{(2)}} \equiv \sum_{i\in S_2}\frac{y_i}{\pi_i^*}$$
• Properties
– Unbiased for $Y = \sum_{i=1}^{N}y_i$.
– Variance:
$$V(\hat Y^*) = V\left(\sum_{i\in S_1}\frac{y_i}{\pi_i^{(1)}}\right) + E\left\{\sum_{i\in S_1}\sum_{j\in S_1}\left(\pi_{ij|S_1}^{(2)}-\pi_{i|S_1}^{(2)}\pi_{j|S_1}^{(2)}\right)\frac{y_i}{\pi_i^*}\frac{y_j}{\pi_j^*}\right\}$$
7.2 Two-phase sampling for stratification
• Basic setup: we wish to perform stratified sampling, but the stratum indicator variables $x_i = (x_{i1},\cdots,x_{iH})$ are not available in the population frame.
• Two-phase sampling for stratification
1. Perform an SRS of size n from the finite population and obtain $\sum_{i\in S_1}x_i = (n_1,n_2,\cdots,n_H)$, where $n = \sum_{h=1}^{H}n_h$.
2. Among the $n_h$ elements, select $r_h$ elements by SRS, independently across the strata.
• Point estimation:
$$\hat{\bar Y}_{tp} = \sum_{h=1}^{H}w_h\bar y_{h2}$$
where $w_h = n_h/n$ and $\bar y_{h2} = r_h^{-1}\sum_{i\in S_2}x_{ih}y_i$.
• Variance:
$$V(\hat{\bar Y}_{tp}) = \left(\frac{1}{n}-\frac{1}{N}\right)S^2 + E\left\{\sum_{h=1}^{H}\left(\frac{n_h}{n}\right)^2\left(\frac{1}{r_h}-\frac{1}{n_h}\right)s_{h1}^2\right\} \doteq E\left\{n^{-1}\sum_{h=1}^{H}w_h(\bar y_{h1}-\bar y_1)^2 + \sum_{h=1}^{H}r_h^{-1}w_h^2s_{h1}^2\right\}$$
where
$$s_{h1}^2 = \frac{1}{n_h-1}\sum_{i\in S_1}x_{ih}(y_i-\bar y_{h1})^2.$$
• Variance estimation:
$$\hat V(\hat{\bar Y}_{tp}) = n^{-1}\sum_{h=1}^{H}w_h\left(\bar y_{h2}-\hat{\bar Y}_{tp}\right)^2 + \sum_{h=1}^{H}r_h^{-1}w_h^2s_{h2}^2$$
• Variance comparison:
$$V(\hat{\bar Y}_{SRS}) - V(\hat{\bar Y}_{tp}) = E\left\{\left(\frac{1}{r}-\frac{1}{n}\right)\sum_{h=1}^{H}w_h(\bar y_{h1}-\bar y_1)^2 + \sum_{h=1}^{H}\left(\frac{1}{r}-\frac{w_h}{r_h}\right)w_hs_{h1}^2\right\}$$
• Two sources for the gain in efficiency:
1. All n elements are used for the between-stratum variances.
2. An optimal choice of $r_h$ can improve the efficiency.
• Optimal allocation: with $\nu_h = r_h/n_h$, minimize
$$V = \frac{1}{n}\left(S^2-\sum_{h=1}^{H}W_hS_h^2\right) + \frac{1}{n}\sum_{h=1}^{H}W_hS_h^2\frac{1}{\nu_h}$$
subject to $C = n\left(c_1 + \sum_{h=1}^{H}c_{2h}W_h\nu_h\right)$.
• Solution:
$$\frac{r_h^*}{n^*} = W_h\left(\frac{c_1}{c_{2h}}\right)^{1/2}\left(\frac{S_h^2}{S^2-\sum_{h=1}^{H}W_hS_h^2}\right)^{1/2}.$$
If $c_{2h} = c_2$, $S_h = S_w$, and $\varphi = S^2/S_w^2$, then the optimal solution is
$$\frac{r^*}{n^*} = \left(\frac{c_1}{c_2}\right)^{1/2}\left(\frac{1}{\varphi-1}\right)^{1/2}.$$
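A minimal Python sketch of the whole procedure (phase 1 observes the stratum label only; phase 2 subsamples within each observed stratum and observes y), with the point estimator and variance estimator above; the population and the phase 2 rates are simulated, illustrative choices.

```python
import numpy as np

# Minimal sketch: two-phase sampling for stratification under SRS.
rng = np.random.default_rng(4)
N, n, H = 20000, 1000, 3
stratum = rng.integers(0, H, N)
y = 10.0 + 3.0 * stratum + rng.normal(0, 1, N)

s1 = rng.choice(N, n, replace=False)           # phase 1: stratum label only
ytp, vtp, wlist, y2list = 0.0, 0.0, [], []
for h in range(H):
    idx = s1[stratum[s1] == h]
    nh = len(idx)
    rh = max(2, nh // 4)                       # phase 2 rate (chosen here)
    s2 = rng.choice(idx, rh, replace=False)    # phase 2: observe y
    wh = nh / n
    wlist.append(wh); y2list.append(y[s2].mean())
    vtp += wh ** 2 * y[s2].var(ddof=1) / rh    # within-stratum component
ytp = sum(w * m for w, m in zip(wlist, y2list))
vtp += sum(w * (m - ytp) ** 2 for w, m in zip(wlist, y2list)) / n
print(ytp, vtp)
```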
7.3 Two-phase regression estimator
• Basic setup:
1. Phase 1: simple random sampling of size n to get $S_1$; observe $x_i$, $i\in S_1$.
2. Phase 2: from $S_1$, simple random sampling of size r to get $S_2$; observe $(x_i,y_i)$, $i\in S_2$.
• Regression estimator:
$$\bar y_{reg,tp} = \bar y_2 + (\bar x_1-\bar x_2)'\hat B$$
• Properties
1. Taylor linearization:
$$\bar y_{reg,tp} = \bar y_2 + (\bar x_1-\bar x_2)'B + O_p(r^{-1})$$
2. Approximately unbiased.
3. Variance:
$$V(\bar y_{reg,tp}) \doteq \left(\frac{1}{n}-\frac{1}{N}\right)B'S_{xx}B + \left(\frac{1}{r}-\frac{1}{N}\right)S_{ee},$$
where $S_{xx}$ is the population covariance matrix of x and $S_{ee}$ is the population variance of the regression residuals.
7.4 Repeated survey
• Motivation: sampling for the same population over time.
• Several parameters of interest:
1. Difference of the means between occasions: $\bar Y_2 - \bar Y_1$
2. Overall mean: $(\bar Y_1+\bar Y_2)/2$
3. Most recent mean: $\bar Y_2$
• Best sampling design:
1. For $\theta_1 = \bar Y_2 - \bar Y_1$: full overlap sampling
2. For $\theta_2 = (\bar Y_1+\bar Y_2)/2$: no overlap sampling
3. For $\theta_3 = \bar Y_2$: partial replacement sampling
• Partial overlap sampling: let $\bar Y_2$ be the parameter of interest.
1. At time t = 1: select an SRS of size n. Let $S_1$ be the realized sample at t = 1.
2. At time t = 2: partition the population U into two strata, $S_1$ and $U\setminus S_1$. From $S_1$, select an SRS of size $n_m$ to get $S_{2m}$. From $U\setminus S_1$, select an SRS of size $n_u = n - n_m$ to get $S_{2u}$. The final sample at t = 2 is $S_2 = S_{2m}\cup S_{2u}$.

Stratum      Pop'n size   Sample size   Estimator for $\bar Y_2$
Matched      $n$          $n_m$         $\hat{\bar Y}_m$
Unmatched    $N-n$        $n_u$         $\hat{\bar Y}_u$
Total        $N$          $n$           $\alpha\hat{\bar Y}_u + (1-\alpha)\hat{\bar Y}_m$
• Composite estimator:
$$\hat{\bar Y}_\alpha = \alpha\hat{\bar Y}_u + (1-\alpha)\hat{\bar Y}_m$$
for some constant α.
– If $\hat{\bar Y}_u$ and $\hat{\bar Y}_m$ are unbiased, then $\hat{\bar Y}_\alpha$ is unbiased.
– The variance of $\hat{\bar Y}_\alpha$ is minimized at
$$\alpha^* = \frac{V(\hat{\bar Y}_m)-Cov(\hat{\bar Y}_u,\hat{\bar Y}_m)}{V(\hat{\bar Y}_u)+V(\hat{\bar Y}_m)-2Cov(\hat{\bar Y}_u,\hat{\bar Y}_m)}$$
– The minimum variance is
$$V(\hat{\bar Y}_{\alpha^*}) = \frac{V(\hat{\bar Y}_m)V(\hat{\bar Y}_u)-\left\{Cov(\hat{\bar Y}_u,\hat{\bar Y}_m)\right\}^2}{V(\hat{\bar Y}_u)+V(\hat{\bar Y}_m)-2Cov(\hat{\bar Y}_u,\hat{\bar Y}_m)}$$
• How should $\hat{\bar Y}_u$ and $\hat{\bar Y}_m$ be chosen?
• Two-phase sampling approach
– x: observation at t = 1; y: observation at t = 2.
– Estimators:
$$\hat{\bar Y}_u = \frac{1}{n_u}\sum_{i\in S_{2u}}y_i \equiv \bar y_u$$
$$\hat{\bar Y}_m = \bar y_m + (\bar x_1-\bar x_m)\,\hat b$$
– Variances and covariance:
$$V(\hat{\bar Y}_u) = n_u^{-1}S^2$$
$$V(\hat{\bar Y}_m) = n_m^{-1}(1-\rho^2)S^2 + n^{-1}\rho^2S^2$$
$$Cov(\hat{\bar Y}_u,\hat{\bar Y}_m) = 0$$
– Optimal composite estimation:
$$\alpha^* = \frac{nn_u-n_u^2\rho^2}{n^2-n_u^2\rho^2}$$
– Variance of the optimal estimator:
$$V(\hat{\bar Y}_{\alpha^*}) = \frac{n-n_u\rho^2}{n^2-n_u^2\rho^2}S^2$$
– Optimal allocation:
$$\frac{n_u}{n} = \frac{1}{1+\sqrt{1-\rho^2}}, \qquad \frac{n_m}{n} = \frac{\sqrt{1-\rho^2}}{1+\sqrt{1-\rho^2}}$$
– Variance under optimal allocation:
$$V(\hat{\bar Y}_{\alpha^*}) = \frac{S^2}{2n}\left(1+\sqrt{1-\rho^2}\right)$$
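The optimal design quantities are simple closed forms. A minimal Python sketch, taking n, ρ, and $S^2$ as given inputs:

```python
import numpy as np

# Minimal sketch: optimal composite weight and matched/unmatched
# allocation for a two-occasion repeated survey, from the formulas above.
def repeated_survey_design(n, rho, S2=1.0):
    nu = n / (1.0 + np.sqrt(1.0 - rho ** 2))     # optimal unmatched size
    nm = n - nu                                  # matched size
    alpha = (n * nu - nu ** 2 * rho ** 2) / (n ** 2 - nu ** 2 * rho ** 2)
    var_opt = S2 / (2 * n) * (1.0 + np.sqrt(1.0 - rho ** 2))
    return nu, nm, alpha, var_opt

print(repeated_survey_design(n=1000, rho=0.8))
```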
Chapter 8
Nonresponse
8.1 Introduction
• Types of nonresponse
– Unit nonresponse: no information is collected from a sample unit. May be caused by a refusal, a not-at-home, or the inability of the unit to cooperate.
– Item nonresponse: the unit cooperates in the survey but fails to provide answers to some of the questions. May be caused by "don't knows," refusal to answer a question that is too sensitive, or answers that are inconsistent with other answers.
• Approaches to nonresponse
– Unit nonresponse: call-backs, nonresponse weighting adjustment
– Item nonresponse: imputation
• How can we reduce nonresponse?
– Use more incentives: works for refusals
– Call-backs: work for not-at-homes
– What if we use some power to get answers (e.g., a penalty by law)?
• Basic Setup

Stratum             Pop. Size   Pop. Mean     Sample Size   Sample Mean
Respondents         $N_R$       $\bar Y_R$    $n_R$         $\bar y_R$
Nonrespondents      $N_M$       $\bar Y_M$    $n_M$         $\bar y_M$
Entire population   $N$         $\bar Y$      $n$           $\bar y$

SRS from the entire population, but we observe y only on the respondents. Use $\bar y_R$ (the sample mean of the respondents) to estimate the population mean $\bar Y$:
$$Bias(\bar y_R) \doteq \frac{N_M}{N}\left(\bar Y_R-\bar Y_M\right), \qquad Var(\bar y_R) \doteq \frac{1}{n_R}S_R^2$$
• Two problems:
– Biased: $\bar Y_R \neq \bar Y_M$ in general
– Large variance, since $n_R < n$
8.2 Call-backs
• Basic setup: among the $n_M$ nonrespondents, contact $\nu n_M$ $(0<\nu<1)$ of them to get their responses.

Stratum     Pop. Size   Original Sample Size   Final Sample Size   Sample Mean
Resp.       $N_R$       $n_R$                  $r_1 = n_R$         $\bar y_1$
Nonresp.    $N_M$       $n_M$                  $r_2 = \nu n_M$     $\bar y_2$
Total       $N$         $n$                    $r$

– Point estimation:
$$\hat{\bar Y} = \frac{n_R}{n}\bar y_1 + \frac{n_M}{n}\bar y_2$$
– Variance:
$$Var(\hat{\bar Y}) = \frac{1}{n}\left(1-\frac{n}{N}\right)S^2 + \frac{W_2S_2^2}{n}\left(\frac{1}{\nu}-1\right)$$
where $W_2 = N^{-1}N_M$.
– Cost function:
$$C = c_0n + c_1W_1n + c_2W_2\nu n$$
– Optimal sample size n and call-back rate ν: minimize the variance for the given cost,
$$\nu^* = \left\{\frac{c_0+c_1W_1}{c_2}\times\frac{S_2^2}{S^2-W_2S_2^2}\right\}^{1/2}.$$
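A minimal Python sketch of the optimal call-back rate formula; the cost and variance inputs are illustrative placeholders.

```python
import numpy as np

# Minimal sketch: optimal call-back rate nu* from the cost/variance
# expressions above; all inputs are assumed design quantities.
def optimal_callback_rate(c0, c1, c2, W1, W2, S2, S2_2):
    return np.sqrt((c0 + c1 * W1) / c2 * S2_2 / (S2 - W2 * S2_2))

# e.g. unit costs 1/2/8, 70% respondents, within-stratum variance 80
print(optimal_callback_rate(c0=1, c1=2, c2=8, W1=0.7, W2=0.3,
                            S2=100.0, S2_2=80.0))
```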
8.3 Nonresponse weighting adjustment
• Under no missing data:
$$\hat Y_{HT} = \sum_{i\in S}\frac{1}{\pi_i}y_i$$
• Response indicator function:
$$R_i = \begin{cases}1 & \text{if unit } i \text{ responds,}\\ 0 & \text{if unit } i \text{ does not respond.}\end{cases}$$
• Idea: use a two-phase sampling approach:
$$\text{Population } (U) \overset{\text{Phase 1}}{\longrightarrow} \text{Sample } (S) \overset{\text{Phase 2}}{\longrightarrow} \text{Respondents } (S_R)$$
• Estimation: let $\pi_{i|S} = Pr(R_i=1\mid S)$. If $\pi_{i|S}$ were known, then
$$\hat Y_\varphi = \sum_{i\in S}\frac{1}{\pi_i}\frac{1}{\pi_{i|S}}R_iy_i$$
would be conditionally unbiased.
• In practice, we use an estimator $\hat\pi_{i|S}$ of $\pi_{i|S}$. The NWA estimator is
$$\hat Y_{NWA} = \sum_{i\in S}\frac{1}{\pi_i}\frac{1}{\hat\pi_{i|S}}R_iy_i$$
8.3.1 Weighting class NWA method
• A special case of the NWA method; commonly used.
• Partition the sample into G groups: $S = S_1\cup S_2\cup\cdots\cup S_G$.
• Assumption: the response homogeneity group (RHG) model,
$$Pr(R_i=1\mid S) = \pi_{i|S} = \theta_{gs} \quad\text{for all } i\in S_g,$$
$$Pr(R_i=1,R_j=1\mid S) = Pr(R_i=1\mid S)\,Pr(R_j=1\mid S) \quad\text{for all } i\neq j\in S.$$
• Under the RHG model, use
$$\hat\pi_{i|S} = \frac{\sum_{i\in S_g}\pi_i^{-1}R_i}{\sum_{i\in S_g}\pi_i^{-1}} \quad\text{for } i\in S_g.$$
• Weighting class NWA estimator:
$$\hat Y_{NWA} = \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\hat\pi_{i|S}^{-1}R_iy_i = \sum_{g=1}^{G}\hat N_g\bar y_{Rg}$$
where $\hat N_g = \sum_{i\in S_g}\pi_i^{-1}$ and
$$\bar y_{Rg} = \frac{\sum_{i\in S_g}\pi_i^{-1}R_iy_i}{\sum_{i\in S_g}\pi_i^{-1}R_i}.$$
• Define $\hat Y_{Rg} = \sum_{i\in S_g}\pi_i^{-1}\hat\pi_{i|S}^{-1}R_iy_i$ and $\hat N_{Rg} = \sum_{i\in S_g}\pi_i^{-1}\hat\pi_{i|S}^{-1}R_i$. The weighting class NWA estimator can be written
$$\hat Y_{NWA} = \sum_{g=1}^{G}\hat N_g\frac{\hat Y_{Rg}}{\hat N_{Rg}}.$$
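A minimal Python sketch of the weighting class NWA estimator under SRS with a simulated response-homogeneity-group mechanism; class labels, response rates, and data are illustrative only.

```python
import numpy as np

# Minimal sketch: weighting class NWA estimator of the total.
rng = np.random.default_rng(5)
n, G, N = 600, 3, 30000
gidx = rng.integers(0, G, n)               # weighting class of each unit
pi = np.full(n, n / N)                     # SRS inclusion probabilities
y = 5.0 + 2.0 * gidx + rng.normal(0, 1, n)
R = rng.random(n) < np.array([0.9, 0.7, 0.5])[gidx]   # RHG response model

Y_nwa = 0.0
for g in range(G):
    m = gidx == g
    pihat = np.sum(R[m] / pi[m]) / np.sum(1 / pi[m])  # estimated resp. prob.
    Y_nwa += np.sum(R[m] * y[m] / (pi[m] * pihat))
print(Y_nwa, np.sum(y / pi))   # vs. full-sample HT (no nonresponse)
```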
126 CHAPTER 8. NONRESPONSE
• Linearization:
$$\hat Y_{NWA} = \sum_{g=1}^{G}\hat N_g\frac{\hat Y_{Rg}}{\hat N_{Rg}} \doteq \sum_{g=1}^{G}\hat N_g\left(\frac{\hat Y_g}{\hat N_g} + \frac{\hat Y_{Rg}-\bar y_g\hat N_{Rg}}{\hat N_g}\right)$$
where $\bar y_g = \hat Y_g/\hat N_g$ and $\hat Y_g = \sum_{i\in S_g}\pi_i^{-1}y_i$. Thus,
$$\hat Y_{NWA} \doteq \hat Y_{HT} + \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(y_i-\bar y_g).$$
• Approximately unbiased.
• Asymptotic variance:
$$V(\hat Y_{NWA}) \doteq V(\hat Y_{HT}) + V\left\{\sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(y_i-\bar y_g)\right\}$$
$$= V(\hat Y_{HT}) + E\left\{\sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-2}\left(\pi_{i|S}^{-1}-1\right)(y_i-\bar y_g)^2\right\} = V_1 + V_2$$
• Variance estimation:
$$\hat V = \hat V_1 + \hat V_2$$
where $(\hat V_1,\hat V_2)$ is an unbiased estimator of $(V_1,V_2)$. Since
$$V_1 = E\left\{\sum_{i\in S}(1-\pi_i)\frac{y_i^2}{\pi_i^2} + \mathop{\sum\sum}_{i\neq j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_i}{\pi_i}\frac{y_j}{\pi_j}\right\},$$
we can use
$$\hat V_1 = \sum_{i\in S_R}\frac{1-\pi_i}{\hat\pi_{i|S}}\frac{y_i^2}{\pi_i^2} + \mathop{\sum\sum}_{i\neq j\in S_R}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{y_i}{\pi_i\hat\pi_{i|S}}\frac{y_j}{\pi_j\hat\pi_{j|S}}$$
and
$$\hat V_2 = \sum_{g=1}^{G}\sum_{i\in S_{Rg}}\pi_i^{-2}\hat\pi_{i|S}^{-1}\left(\hat\pi_{i|S}^{-1}-1\right)(y_i-\bar y_{Rg})^2.$$
8.3.2 Estimators that use weighting as well as auxiliary variables
• Motivation
– The NWA estimator uses a model for $R_i$.
– In addition to the model for $R_i$, we want to use a model for $y_i$ given $x_i$, where $x_i$ is always observed.
• NWA regression estimator:
$$\hat Y_{reg} = \sum_{i\in S}\frac{1}{\pi_i}\hat y_i + \sum_{i\in S_R}\frac{1}{\pi_i}\frac{1}{\hat\pi_{i|S}}(y_i-\hat y_i)$$
where $\hat y_i = x_i'\hat B_R$ and
$$\hat B_R = \left(\sum_{i\in S_R}\pi_i^{-1}\hat\pi_{i|S}^{-1}x_ix_i'\right)^{-1}\sum_{i\in S_R}\pi_i^{-1}\hat\pi_{i|S}^{-1}x_iy_i.$$
• Properties:
$$\hat Y_{reg} = \hat Y_{NWA} + \left(\hat X_{HT}-\hat X_{NWA}\right)'\hat B_R \doteq \hat Y_{NWA} + \left(\hat X_{HT}-\hat X_{NWA}\right)'B$$
where B is the probability limit of $\hat B_R$ and
$$\hat X_{NWA} = \sum_{i\in S_R}\frac{1}{\pi_i}\frac{1}{\hat\pi_{i|S}}x_i.$$
Furthermore, under the weighting class NWA method, we have
$$\hat Y_{NWA} \doteq \hat Y_{HT} + \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(y_i-\bar y_g)$$
and
$$\hat X_{NWA} \doteq \hat X_{HT} + \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(x_i-\bar x_g).$$
Thus,
$$\hat Y_{reg} \doteq \hat Y_{HT} + \sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-1}\pi_{i|S}^{-1}R_i(e_i-\bar e_g)$$
where $e_i = y_i - x_i'B$ and $\bar e_g = \left(\sum_{i\in S_g}\pi_i^{-1}\right)^{-1}\sum_{i\in S_g}\pi_i^{-1}e_i$.
• Approximately unbiased.
• Asymptotic variance:
$$V(\hat Y_{reg}) = V(\hat Y_{HT}) + E\left\{\sum_{g=1}^{G}\sum_{i\in S_g}\pi_i^{-2}\left(\pi_{i|S}^{-1}-1\right)(e_i-\bar e_g)^2\right\}.$$
• Variance estimation: same $\hat V_1$ as for the NWA estimator. For $V_2$, use
$$\hat V_2 = \sum_{g=1}^{G}\sum_{i\in S_{Rg}}\pi_i^{-2}\hat\pi_{i|S}^{-1}\left(\hat\pi_{i|S}^{-1}-1\right)(\hat e_i-\bar e_{Rg})^2.$$
8.4 Imputation
8.4.1 Introduction
• Meaning: fill in each missing value with a plausible value (or a set of plausible values).
• Why imputation?
– It provides a complete data file, so the standard complete-data methods can be applied.
– By filling in missing values, the analyses from different users can be made consistent.
– By a proper choice of the imputation model, we may reduce the nonresponse bias.
– We do not want to delete records with partial information: imputation makes full use of the observed information (i.e., reduces the variance).
• Basic setup
– $y_i$: study variable, subject to missingness.
– $x_i$: auxiliary variable, always observed.
– $I_i$: sampling indicator function for unit i.
– $R_i$: response indicator function for $y_i$.
• Lemma 1: Let $\hat Y_n = \sum_{i\in S}\pi_i^{-1}y_i$ be an unbiased estimator of $Y = \sum_{i=1}^{N}y_i$ under complete response. If $y_i$ is not observed when $R_i = 0$ and if we can find $y_i^*$ that satisfies
$$E(y_i^*\mid I_i=1,R_i=0) = E(y_i\mid I_i=1,R_i=0) \tag{8.1}$$
then the imputed estimator of the form
$$\hat Y_I = \sum_{i\in S}\frac{1}{\pi_i}\left\{R_iy_i + (1-R_i)y_i^*\right\} \tag{8.2}$$
is unbiased for Y in the sense that $E(\hat Y_I - Y) = 0$.
Proof: Note that
$$E\{\hat Y_n\mid I,R\} = E\left[\sum_{i\in S}\frac{1}{\pi_i}\{R_iy_i+(1-R_i)y_i\}\ \Big|\ I,R\right] = \sum_{i\in S}\frac{1}{\pi_i}\left\{R_iE(y_i\mid I_i=1,R_i=1) + (1-R_i)E(y_i\mid I_i=1,R_i=0)\right\}.$$
Similarly,
$$E\{\hat Y_I\mid I,R\} = \sum_{i\in S}\frac{1}{\pi_i}\left\{R_iE(y_i\mid I_i=1,R_i=1) + (1-R_i)E(y_i^*\mid I_i=1,R_i=0)\right\},$$
and so, by (8.1),
$$E\{\hat Y_n-\hat Y_I\mid I,R\} = 0,$$
which implies $E(\hat Y_I - Y) = 0$.
• How to get $y_i^*$ satisfying (8.1)?
– Deterministic imputation: use an estimator of $E(y_i\mid I_i=1,R_i=0)$.
– Stochastic imputation: generate $y_i^*$ from $f(y_i\mid I_i=1,R_i=0)$.
• Approaches to specifying the conditional distribution $f(y_i\mid I_i=1,R_i=0)$:
– Assume Missing Completely At Random (MCAR):
$$f(y_i\mid I_i=1,R_i=0) = f(y_i\mid I_i=1,R_i=1). \tag{8.3}$$
Under MCAR, we can estimate the parameters using the set of respondents. However, MCAR may not be realistic.
– Assume that there exists an auxiliary vector $x_i$ such that
$$f(y_i\mid x_i,I_i=1,R_i=0) = f(y_i\mid x_i,I_i=1,R_i=1). \tag{8.4}$$
Condition (8.4) is called Missing At Random (MAR). Under MAR, we have
$$E(y_i\mid I_i=1,R_i=0) = E\{E(y_i\mid x_i,I_i=1,R_i=0)\mid I_i=1,R_i=0\} = E\{E(y_i\mid x_i,I_i=1,R_i=1)\mid I_i=1,R_i=0\}.$$
Thus, we have only to generate $y_i^*$ from $f(y_i\mid x_i,I_i=1,R_i=1)$.
• Lemma 2: Let $y_i^*$ be the imputed value of $y_i$. If
$$E(y_i^*\mid x_i,I_i=1,R_i=1) = E(y_i\mid x_i,I_i=1,R_i=1) \tag{8.5}$$
and the MAR condition holds, then the imputed estimator $\hat Y_I$ in (8.2) is unbiased.
Proof: Note that
$$E\{\hat Y_n-\hat Y_I\mid X,I,R\} = \sum_{i\in S}\frac{1}{\pi_i}(1-R_i)\left\{E(y_i\mid x_i,I_i=1,R_i=0) - E(y_i^*\mid x_i,I_i=1,R_i=0)\right\}$$
$$= \sum_{i\in S}\frac{1}{\pi_i}(1-R_i)\left\{E(y_i\mid x_i,I_i=1,R_i=1) - E(y_i^*\mid x_i,I_i=1,R_i=1)\right\},$$
where the second equality follows from the MAR condition. Thus, by (8.5),
$$E\{\hat Y_n-\hat Y_I\mid X,I,R\} = 0,$$
which implies $E(\hat Y_I - Y) = 0$.
• When does the MAR condition hold? If the response mechanism satisfies
$$Pr(R_i=1\mid y_i,x_i,I_i=1) = Pr(R_i=1\mid x_i,I_i=1),$$
then (8.4) holds.
• Commonly used imputation methods
1. Business surveys: ratio, regression, and nearest neighbor imputation.
2. Socio-economic surveys: random donor (within classes), stochastic ratio or regression imputation, fractional imputation, multiple imputation.
8.4.2 Deterministic imputation
• Assumptions
– $E(y_i\mid x_i,I_i=1) = x_i'\beta$ with unknown β.
– $V(y_i\mid x_i,I_i=1) = c_i\sigma^2$ with known $c_i = c(x_i)$ and unknown $\sigma^2$.
– Missing at random.
• Imputed estimator of Y:
$$\hat Y_I = \sum_{i\in S}\frac{1}{\pi_i}\left\{R_iy_i + (1-R_i)x_i'\hat\beta\right\}$$
where $\hat\beta = \left(\sum_{i\in S}c_i^{-1}\pi_i^{-1}R_ix_ix_i'\right)^{-1}\sum_{i\in S}c_i^{-1}\pi_i^{-1}R_ix_iy_i$.
• Examples:
1. Cell mean imputation: $x_i$ is a vector of cell indicator functions. If $x_i = 1$, it reduces to mean imputation.
2. Ratio imputation.
3. Regression imputation.
• Property
1. $E(\hat\beta\mid x,I,R) = \beta$.
2. Let $y_i^* = x_i'\hat\beta$ be the imputed value of $y_i$. Then condition (8.5) holds and the imputed estimator is unbiased by Lemma 2.
3. To discuss the variance and its estimation, we need a Taylor linearization.
• Taylor linearization: writing $\hat Y_I$ as a function of $\hat\beta$ and using
$$\hat Y_I(\hat\beta) \doteq \hat Y_I(\beta) + \frac{\partial\hat Y_I(\beta)}{\partial\beta'}\left(\hat\beta-\beta\right),$$
we have
$$\hat Y_I(\hat\beta) \doteq \sum_{i\in S}\frac{1}{\pi_i}\left\{x_i'\beta + R_i(1+k_i)\left(y_i-x_i'\beta\right)\right\}$$
where
$$k_i = \left\{\sum_{i\in S}\pi_i^{-1}(1-R_i)x_i'\right\}\left\{\sum_{i\in S}c_i^{-1}\pi_i^{-1}R_ix_ix_i'\right\}^{-1}c_i^{-1}x_i.$$
• Asymptotic variance:
$$V(\hat Y_I) \doteq V\left(\sum_{i\in S}\frac{1}{\pi_i}x_i'\beta\right) + E\left\{\sum_{i\in S}\frac{1}{\pi_i^2}R_i(1+k_i)^2c_i\sigma^2\right\}.$$
Note that
$$V(\hat Y_n) \doteq V\left(\sum_{i\in S}\frac{1}{\pi_i}x_i'\beta\right) + E\left(\sum_{i\in S}\frac{1}{\pi_i^2}c_i\sigma^2\right).$$
• Variance estimation:
$$\hat V(\hat Y_I) = \sum_{i\in S}\sum_{j\in S}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_{ij}}\frac{\hat\eta_i}{\pi_i}\frac{\hat\eta_j}{\pi_j}$$
where $\hat\eta_i = x_i'\hat\beta + R_i(1+k_i)\left(y_i-x_i'\hat\beta\right)$.
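A minimal Python sketch of deterministic regression imputation under SRS with a simulated MAR response mechanism and $c_i = 1$; the data-generating choices are illustrative only.

```python
import numpy as np

# Minimal sketch: regression imputation of missing y, then the imputed
# estimator of the population total under SRS.
rng = np.random.default_rng(6)
N, n = 50000, 400
xpop = rng.gamma(3.0, 1.5, N)
s = rng.choice(N, n, replace=False)
x = xpop[s]
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)
R = rng.random(n) < 1 / (1 + np.exp(-(x - 4)))   # response depends on x only

X = np.column_stack([np.ones(n), x])             # design matrix (c_i = 1)
beta = np.linalg.lstsq(X[R], y[R], rcond=None)[0]  # fit on respondents
y_imp = np.where(R, y, X @ beta)                 # fill in x_i' beta_hat
Y_I = N / n * y_imp.sum()                        # imputed estimator (SRS)
print(Y_I)
```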
8.4.3 Stochastic imputation
• Motivation: deterministic imputation may provide an efficient estimator for the mean, but may provide biased estimates for other parameters.
• Example
1. Mean imputation provides a biased estimate of the proportion $Pr(a < Y < b)$.
2. Regression imputation underestimates the population variance.
• Remedy: add more variability to the imputed values.
– Use $y_i^* = \hat y_i + e_i^*$, where $e_i^*$ is randomly selected from $\{\hat e_i = y_i - \hat y_i;\ i\in S_R\}$.
– Generate $y_i^*$ from $\hat f(y\mid x_i,I_i=1)$ so that $E_I(y_i^*) = \hat y_i$, where $E_I$ denotes the expectation over the imputation mechanism.
• Hot deck imputation
– Partition the sample into G groups: $S = S_1\cup S_2\cup\cdots\cup S_G$.
– In group g, we have $n_g$ elements, $r_g$ respondents, and $m_g = n_g - r_g$ nonrespondents.
– For each group $S_g$, select $m_g$ imputed values from the $r_g$ respondents with replacement (or without replacement).
– This hot deck imputation method is often justified under an IID model within groups and is commonly used in household surveys.
Example 1: Hot deck imputation under SRS
• $S_g = S_{Rg}\cup S_{Mg}$ with $S_{Rg} = \{i\in S_g; R_i=1\}$ and $S_{Mg} = \{i\in S_g; R_i=0\}$.
• Imputation mechanism: $y_j^* \overset{i.i.d.}{\sim} \text{Uniform}\{y_i;\ i\in S_{Rg}\}$. That is, $y_j^* = y_i$ with probability $1/r_g$ for $i\in S_{Rg}$ and $j\in S_{Mg}$.
• Imputed estimator of $\bar Y$:
$$\hat{\bar Y}_I = \frac{1}{n}\sum_{i\in S}\left\{R_iy_i + (1-R_i)y_i^*\right\}$$
• Variance:
$$V(\hat{\bar Y}_I) = V\left\{E_I(\hat{\bar Y}_I)\right\} + E\left\{V_I(\hat{\bar Y}_I)\right\} = V\left(n^{-1}\sum_{g=1}^{G}n_g\bar y_{Rg}\right) + E\left\{n^{-2}\sum_{g=1}^{G}m_g\left(1-r_g^{-1}\right)S_{Rg}^2\right\}$$
where $\bar y_{Rg} = r_g^{-1}\sum_{i\in S_{Rg}}y_i$ and $S_{Rg}^2 = (r_g-1)^{-1}\sum_{i\in S_{Rg}}(y_i-\bar y_{Rg})^2$.
Under the model
$$y_i\mid(i\in S_g) \overset{i.i.d.}{\sim} \left(\mu_g,\sigma_g^2\right),$$
the variance can be written
$$V(\hat{\bar Y}_I) = V\left(n^{-1}\sum_{g=1}^{G}n_g\mu_g\right) + E\left\{n^{-2}\sum_{g=1}^{G}\left(n_g + 2m_g + \frac{m_g(m_g-1)}{r_g}\right)\sigma_g^2\right\}.$$
Note that
$$V(\hat{\bar Y}_n) = V\left(n^{-1}\sum_{g=1}^{G}n_g\mu_g\right) + E\left(n^{-2}\sum_{g=1}^{G}n_g\sigma_g^2\right).$$
Thus, the variance is increased.
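A minimal Python sketch of within-group hot deck imputation (donors drawn with replacement); the groups, response rates, and data are simulated placeholders.

```python
import numpy as np

# Minimal sketch: hot deck imputation within groups under SRS.
rng = np.random.default_rng(7)
n, G = 300, 3
g = rng.integers(0, G, n)
y = 4.0 + 2.0 * g + rng.normal(0, 1, n)
R = rng.random(n) < 0.7                       # MCAR within groups (assumed)

y_imp = y.copy()
for k in range(G):
    donors = y[(g == k) & R]                  # respondents in group k
    miss = np.where((g == k) & ~R)[0]         # nonrespondents in group k
    y_imp[miss] = rng.choice(donors, size=len(miss), replace=True)
ybar_I = y_imp.mean()                         # imputed estimator of the mean
print(ybar_I)
```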
8.4.4 Variance estimation after imputation
• The variance is increased after imputation, from two sources:
1. Reduced sample size.
2. Randomness due to the imputation mechanism in stochastic imputation.
• Naive approach: treat the imputed values as if observed and apply the standard variance estimation formula to the imputed data.
• The naive approach underestimates the true variance!
• Example 1 (continued): The naive variance estimator $\hat V_I = n^{-1}S_I^2$ has expectation
$$E\{\hat V_I\} = V\left(n^{-1}\sum_{g=1}^{G}n_g\mu_g\right) + \frac{1}{n(n-1)}E\left\{\sum_{g=1}^{G}\left(n_g - \frac{n_g}{n} - \frac{2m_g}{n} - \frac{m_g(m_g-1)}{nr_g}\right)\sigma_g^2\right\} \doteq V(\hat{\bar Y}_n)$$
• Approach: write
$$V(\hat{\bar Y}_I) = V(\hat{\bar Y}_n) + E\left(\sum_{g=1}^{G}c_g\sigma_g^2\right)$$
for some $c_g$. The (approximately) bias-corrected variance estimator is
$$\hat V = \hat V_I + \sum_{g=1}^{G}\hat c_gS_{Rg}^2$$
• Other approaches:
– Multiple imputation: Rubin (1987)
– Adjusted jackknife: Rao and Shao (1992)
– Fractional imputation: Kim and Fuller (2004)
– Linearization: Shao and Steel (1999), Kim and Rao (2009)
• Multiple imputation method
– Idea: instead of creating a single imputed data set, create $M(>1)$ imputed data sets independently, obtaining M point estimators $\hat\theta_{I(1)},\cdots,\hat\theta_{I(M)}$ and M naive variance estimators $\hat V_{I(1)},\cdots,\hat V_{I(M)}$. The final point estimator is
$$\hat\theta_{MI} = \frac{1}{M}\sum_{k=1}^{M}\hat\theta_{I(k)}$$
and its variance estimator is
$$\hat V_{MI} = \frac{1}{M}\sum_{k=1}^{M}\hat V_{I(k)} + \left(1+\frac{1}{M}\right)\frac{1}{M-1}\sum_{k=1}^{M}\left(\hat\theta_{I(k)}-\hat\theta_{MI}\right)^2.$$
– Multiple imputation is justified under the following condition:
$$V(\hat\theta_I) = V(\hat\theta_n) + V(\hat\theta_I-\hat\theta_n)$$
– Approximate Bayesian bootstrap imputation:
[Step 1] First select $y_j^*\overset{i.i.d.}{\sim}\text{Uniform}\{y_i;\ i\in S_{Rg}\}$, $j\in S_{Rg}$.
[Step 2] The final imputed values are selected as $y_j^{**}\overset{i.i.d.}{\sim}\text{Uniform}\{y_i^*;\ i\in S_{Rg}\}$.
• Adjusted jackknife
– Idea: for the imputed value $y_i^* = \hat y_i + e_i^*$, create the k-th replicate of $y_i^*$ as $y_i^{*(k)} = \hat y_i^{(k)} + e_i^*$, where $\hat y_i^{(k)} = x_i'\hat\beta^{(k)}$ and $\hat\beta^{(k)}$ is computed from the formula for $\hat\beta$ after deleting $(x_k,y_k)$.
– The variance estimation method is justified because
$$V(\hat\theta_I) = V(\hat\theta_{Id}) + V(\hat\theta_I-\hat\theta_{Id})$$
where $\hat\theta_{Id}$ is the imputed estimator under deterministic imputation. The first term is computed using either a jackknife method (Rao and Shao, 1992) or a linearization method (Kim and Rao, 2009).
• Fractional imputation
– Idea: split each record with a missing item into M imputed values $y_{i(k)}^* = \hat y_i + \hat e_{i(k)}^*$ with fractional weights $w_{i(k)}^*$ such that
$$\sum_{k=1}^{M}w_{i(k)}^*\left(1,\hat e_{i(k)}^*\right) = (1,0)$$
holds. The fractional imputation estimator is algebraically equivalent to the deterministic imputation estimator, but it also provides consistent estimates for other parameters.
– Variance estimation can be easily carried out by a replication method.
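The multiple imputation combining rule given above is easy to implement. A minimal Python sketch; the example inputs are placeholders, not results:

```python
import numpy as np

# Minimal sketch: combine M point estimates and M naive variance
# estimates into theta_MI and V_MI, per the formulas above.
def mi_combine(theta_k, v_k):
    theta_k, v_k = np.asarray(theta_k), np.asarray(v_k)
    M = len(theta_k)
    theta_mi = theta_k.mean()
    between = np.sum((theta_k - theta_mi) ** 2) / (M - 1)
    v_mi = v_k.mean() + (1 + 1 / M) * between
    return theta_mi, v_mi

print(mi_combine([10.2, 9.8, 10.5, 10.0, 10.1],
                 [0.40, 0.38, 0.42, 0.41, 0.39]))
```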
Chapter 9
Small Area Estimation
9.1 Introduction
• The original sample A is decomposed into G domains such that $A = A_1\cup\cdots\cup A_G$ and $n = n_1+\cdots+n_G$.
• n is large, but $n_g$ can be very small.
• Direct estimator of $Y_g = \sum_{i\in U_g}y_i$:
$$\hat Y_{d,g} = \sum_{i\in A_g}\frac{1}{\pi_i}y_i$$
– Unbiased.
– May have high variance (CV > 30% is unacceptable for official statistics).
• Synthetic estimator of $Y_g$:
$$\hat Y_{s,g} = X_g'\hat\beta$$
where $X_g = \sum_{i\in U_g}x_i$ is the known total of $x_i$ in $U_g$ and $\hat\beta$ is the estimated regression coefficient for the regression of y on x.
– Low variance.
– Could be biased (unless $\sum_{i\in U_g}(y_i - x_i'B) = 0$).
• Composite estimation: consider
$$\hat Y_{c,g} = \alpha_g\hat Y_{d,g} + (1-\alpha_g)\hat Y_{s,g}$$
for some $\alpha_g\in(0,1)$. We are interested in finding the $\alpha_g^*$ that minimizes the MSE of $\hat Y_{c,g}$. The optimal choice is
$$\alpha_g^* \cong \frac{MSE(\hat Y_{s,g})}{MSE(\hat Y_{d,g}) + MSE(\hat Y_{s,g})}$$
– For the direct estimation part, $MSE(\hat Y_{d,g}) = V(\hat Y_{d,g})$ can be estimated.
– For the synthetic estimation part, under the (domain-total) model
$$Y_g = X_g'\beta + N_gu_g,$$
where $u_g\sim N(0,\sigma_u^2)$, we have $MSE(\hat Y_{s,g}) \cong N_g^2V(u_g) = N_g^2\sigma_u^2$.
Thus, use
$$\hat Y_{c,g} = \hat\alpha_g^*\hat Y_{d,g} + \left(1-\hat\alpha_g^*\right)\hat Y_{s,g}$$
where
$$\hat\alpha_g^* \cong \frac{N_g^2\hat\sigma_u^2}{\hat V_g + N_g^2\hat\sigma_u^2}$$
and $\hat V_g$ is an unbiased estimator of $V(\hat Y_{d,g})$ and $\hat\sigma_u^2$ is an unbiased estimator of $\sigma_u^2$.
9.2 Area level estimation
• Parameter of interest: $\bar Y_g = N_g^{-1}\sum_{i\in U_g}y_i$.
• Model:
$$\hat{\bar Y}_{d,g} \sim N\left(\bar Y_g, V_g\right)$$
with $V_g = V(\hat{\bar Y}_{d,g})$, and
$$\bar Y_g = X_g'\beta + u_g$$
with $u_g\sim N(0,\sigma_u^2)$.
• Best unbiased predictor of $\bar Y_g$, assuming that β, $V_g$, and $\sigma_u^2$ are known:
$$\hat{\bar Y}_g^* = E\left\{\bar Y_g\mid\hat{\bar Y}_{d,g}\right\} = X_g'\beta + E\left\{u_g\mid\hat{\bar Y}_{d,g}\right\} = X_g'\beta + \frac{\sigma_u^2}{\sigma_u^2+V_g}\left(\hat{\bar Y}_{d,g}-X_g'\beta\right)$$
Thus,
$$\hat{\bar Y}_g^* = \alpha_g^*\hat{\bar Y}_{d,g} + \left(1-\alpha_g^*\right)X_g'\beta \tag{9.1}$$
where $\alpha_g^* = \sigma_u^2/(V_g+\sigma_u^2)$.
• Alternative derivation: BLUP.
$$\hat{\bar Y}_{d,g} = \bar Y_g + e_g$$
$$X_g'\beta = \bar Y_g - u_g$$
where $e_g$ and $u_g$ are independent error terms with mean zero and variances $V_g$ and $\sigma_u^2$, respectively. The BLUP of $\bar Y_g$ is given by (9.1).
• Remark: the model should be a model for the domain mean of y. If it is about the domain total, the model can be written as
$$Y_g = X_g'\beta + N_gu_g$$
and the composite estimator changes accordingly.
• MSE: if β, $V_g$, and $\sigma_u^2$ are known, then
$$MSE(\hat{\bar Y}_g^*) = V(\hat{\bar Y}_g^*-\bar Y_g) = V\left\{\alpha_g^*\left(\hat{\bar Y}_{d,g}-\bar Y_g\right) + \left(1-\alpha_g^*\right)\left(X_g'\beta-\bar Y_g\right)\right\}$$
$$= \left(\alpha_g^*\right)^2V_g + \left(1-\alpha_g^*\right)^2\sigma_u^2 = \alpha_g^*V_g = \left(1-\alpha_g^*\right)\sigma_u^2.$$
Note that, since $0<\alpha_g^*<1$,
$$MSE(\hat{\bar Y}_g^*) < V_g \quad\text{and}\quad MSE(\hat{\bar Y}_g^*) < \sigma_u^2.$$
• When β is unknown (and $V_g$ and $\sigma_u^2$ are known):
$$\hat\beta = \left(\sum_{g=1}^{G}w_gX_gX_g'\right)^{-1}\sum_{g=1}^{G}w_gX_g\hat{\bar Y}_{d,g}$$
where $w_g = \left(\sigma_u^2+V_g\right)^{-1}$. The EBLUP is
$$\hat{\bar Y}_g^*(\hat\beta) = \alpha_g^*\hat{\bar Y}_{d,g} + \left(1-\alpha_g^*\right)X_g'\hat\beta \tag{9.2}$$
The MSE is
$$MSE\left\{\hat{\bar Y}_g^*(\hat\beta)\right\} = V\left\{\hat{\bar Y}_g^*(\hat\beta)-\bar Y_g\right\} = V\left\{\alpha_g^*\left(\hat{\bar Y}_{d,g}-\bar Y_g\right) + \left(1-\alpha_g^*\right)\left(X_g'\hat\beta-\bar Y_g\right)\right\}$$
$$= \left(\alpha_g^*\right)^2V_g + \left(1-\alpha_g^*\right)^2\left\{\sigma_u^2 + X_g'V(\hat\beta)X_g\right\} = \alpha_g^*V_g + \left(1-\alpha_g^*\right)^2X_g'V(\hat\beta)X_g,$$
where
$$V(\hat\beta) = \left(\sum_{g=1}^{G}w_gX_gX_g'\right)^{-1}.$$
• If β and $\sigma_u^2$ are both unknown:
1. Find consistent estimators of β and $\sigma_u^2$.
2. Use
$$\hat{\bar Y}_g^*(\hat\alpha_g^*,\hat\beta) = \hat\alpha_g^*\hat{\bar Y}_{d,g} + \left(1-\hat\alpha_g^*\right)X_g'\hat\beta \tag{9.3}$$
where $\hat\alpha_g^* = \hat\sigma_u^2/(V_g+\hat\sigma_u^2)$.
• MSE:
$$MSE\left\{\hat{\bar Y}_g^*(\hat\alpha_g^*,\hat\beta)\right\} = V\left\{\hat{\bar Y}_g^*(\hat\alpha_g^*,\hat\beta)-\bar Y_g\right\}$$
$$= \left(\alpha_g^*\right)^2V_g + \left(1-\alpha_g^*\right)^2\left\{\sigma_u^2 + X_g'V(\hat\beta)X_g\right\} + V(\hat\alpha_g)\left(V_g+\sigma_u^2\right)$$
$$= \alpha_g^*V_g + \left(1-\alpha_g^*\right)^2X_g'V(\hat\beta)X_g + V(\hat\alpha_g)\left(V_g+\sigma_u^2\right)$$
• MSE estimation (Prasad and Rao, 1990):
$$\widehat{MSE}\left\{\hat{\bar Y}_g^*(\hat\alpha_g^*,\hat\beta)\right\} = \hat\alpha_g^*V_g + \left(1-\hat\alpha_g^*\right)^2X_g'\hat V(\hat\beta)X_g + 2\hat V(\hat\alpha_g)\left(V_g+\hat\sigma_u^2\right).$$
• Estimation of $\sigma_u^2$: method of moments,
$$\hat\sigma_u^2 = \sum_g k_g\frac{G}{G-p}\left\{\left(\hat{\bar Y}_{d,g}-X_g'\hat\beta\right)^2 - \hat V_{d,g}\right\},$$
where $k_g\propto\left(\sigma_u^2+V_g\right)^{-1}$ and $\sum_{g=1}^{G}k_g = 1$. If $\hat\sigma_u^2$ is negative, then we set it to zero.
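A minimal Python sketch of the area-level composite (EBLUP-type) estimator (9.3), treating $\hat\sigma_u^2$ as given rather than estimating it by the moment formula; all inputs are illustrative.

```python
import numpy as np

# Minimal sketch: area-level composite estimator with beta estimated
# by weighted least squares, then shrinkage toward X_g' beta_hat.
def eblup(ybar_d, V, X, sigma2_u):
    w = 1.0 / (sigma2_u + V)                  # w_g = (sigma_u^2 + V_g)^{-1}
    XtWX = (X * w[:, None]).T @ X
    beta = np.linalg.solve(XtWX, (X * w[:, None]).T @ ybar_d)
    alpha = sigma2_u / (V + sigma2_u)         # shrinkage weights alpha_g*
    return alpha * ybar_d + (1 - alpha) * (X @ beta)

ybar_d = np.array([12.0, 9.5, 14.2, 10.8])    # direct area estimates
V = np.array([4.0, 2.5, 6.0, 3.0])            # known design variances V_g
X = np.column_stack([np.ones(4), [1.0, 0.5, 1.5, 0.8]])
print(eblup(ybar_d, V, X, sigma2_u=2.0))
```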
9.3 Extensions
• Unit level estimation: Battese, Harter, and Fuller (1988).
Use a unit-level model
$$y_{gi} = x_{gi}'\beta + u_g + e_{gi}$$
and
$$Y_g^* = \sum_{i\in U_g}\left(x_{gi}'\beta + u_g\right).$$
It can be shown that
$$\hat{\bar Y}_g^* = \alpha_g^*\hat{\bar Y}_{reg,g} + \left(1-\alpha_g^*\right)\hat{\bar Y}_{s,g}$$
where
$$\hat{\bar Y}_{reg,g} = \hat{\bar Y}_{d,g} + \left(\bar X_g - \hat{\bar X}_{d,g}\right)'\hat\beta$$
and
$$\hat{\bar Y}_{s,g} = \bar X_g'\hat\beta.$$
• Benchmarked small area estimation: Wang, Fuller, and Qu (2009).
– The sum of the small area estimates is not necessarily equal to $\hat Y = \sum_{i\in A}\frac{1}{\pi_i}y_i$.
– It is desirable that the benchmarking condition hold:
$$\sum_{g=1}^{G}N_g\hat{\bar Y}_g^* = \hat Y$$
– Idea: since
$$\hat{\bar Y}_g^* = X_g'\hat\beta + \hat\alpha_g^*\left(\hat{\bar Y}_{d,g}-X_g'\hat\beta\right),$$
we can adjust $\hat\sigma_u^2$ so that
$$\sum_{g=1}^{G}N_g\hat\alpha_g^*\left(\hat{\bar Y}_{d,g}-X_g'\hat\beta\right) = 0.$$
• For other applications, read "Small Area Estimation" by Rao (2003).