Variance Estimation in Complex Surveys
Drew Hardin
Kinfemichael Gedif
So far...
Variance for the estimated mean and total under
– SRS, stratified, cluster (single-stage, multi-stage), etc.
Variance for estimating a ratio of two means under
– SRS (we used the linearization method)
What about other cases?
Variance for estimators that are not linear combinations of means and totals
– Ratios
Variance for estimating other statistics from complex surveys
– Median, quantiles, functions of EMF, etc.
Other approaches are necessary.
Outline
Variance estimation methods
– Linearization
– Random group methods
– Balanced repeated replication (BRR)
– Resampling techniques: jackknife, bootstrap; adapting them to complex surveys
‘Hot’ research areas
References
Linearization (Taylor Series Methods)
We have seen this before (ratio estimator and other courses).
Suppose our statistic is nonlinear. It can often be approximated using Taylor's theorem.
We know how to calculate variances of linear functions of means and totals.
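Linearization (Taylor Series Methods)
Linearize:
$$h(\hat t_1, \hat t_2, \ldots, \hat t_k) \approx h(t_1, t_2, \ldots, t_k) + \sum_{j=1}^{k} c_j (\hat t_j - t_j), \qquad c_j = \left.\frac{\partial h}{\partial t_j}\right|_{(t_1, \ldots, t_k)}$$
Calculate variance:
$$V[h(\hat t_1, \ldots, \hat t_k)] \approx V\left[\sum_{j=1}^{k} \frac{\partial h}{\partial t_j}\,\hat t_j\right] = \sum_{j=1}^{k}\left(\frac{\partial h}{\partial t_j}\right)^2 V(\hat t_j) + 2\sum_{j=1}^{k}\sum_{i=j+1}^{k}\frac{\partial h}{\partial t_i}\,\frac{\partial h}{\partial t_j}\,\operatorname{Cov}(\hat t_i, \hat t_j)$$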
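The two linearization steps can be sketched numerically. The double sum with the factor 2 is just the quadratic form $c^\top \Sigma c$, where $c$ holds the partial derivatives and $\Sigma$ the covariance matrix of the estimated totals or means. The sketch below assumes SRS without replacement and the ratio of means $h(t_1, t_2) = t_1/t_2$; the function names `linearization_variance` and `ratio_of_means_srs` and the data are hypothetical, chosen for illustration only.

```python
import numpy as np


def linearization_variance(grad, cov):
    """Taylor-series (delta-method) variance: sum_i sum_j c_i c_j Cov(t_i, t_j).

    grad: partial derivatives c_j = dh/dt_j evaluated at the estimates.
    cov:  estimated covariance matrix of the estimated totals or means.
    """
    c = np.asarray(grad, dtype=float)
    return float(c @ np.asarray(cov, dtype=float) @ c)


def ratio_of_means_srs(y, x, N):
    """Ratio B = ybar / xbar and its linearization variance under SRS without replacement."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.size
    ybar, xbar = y.mean(), x.mean()
    B = ybar / xbar
    # Estimated Cov(ybar, xbar) under SRSWOR: (1 - n/N) * S / n, S = sample covariance matrix
    cov = (1 - n / N) * np.cov(np.vstack([y, x]), ddof=1) / n
    # h(t1, t2) = t1 / t2  =>  dh/dt1 = 1 / t2,  dh/dt2 = -t1 / t2**2
    grad = [1 / xbar, -ybar / xbar**2]
    return B, linearization_variance(grad, cov)


y = [3.0, 5.0, 4.0, 8.0, 6.0]
x = [2.0, 4.0, 3.0, 5.0, 5.0]
B, v = ratio_of_means_srs(y, x, N=100)
print(B, v)
```

For the ratio, this quadratic form equals the familiar residual form $(1 - n/N)\, s_e^2 / (n \bar x^2)$ with $e_i = y_i - \hat B x_i$, which is a convenient check.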
Linearization (Taylor Series) Methods
– Pro:
  Can be applied in general sampling designs
  Theory is well developed
  Software is available
– Con:
  Finding partial derivatives may be difficult
  A different method is needed for each statistic
  The function of interest may not be expressible as a smooth function of population totals or means
  The linearization approximation may not be accurate
Random Group Methods
Based on the concept of replicating the survey design.
It is not usually possible to simply go out and replicate the survey.
However, the survey can often be divided into R groups so that each group forms a miniature version of the survey.
Random Group Methods
[Figure: Strata 1 to 5, each with sampled units numbered 1 to 8; a random group drawn across the strata is treated as a miniature sample.]
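Unbiased estimator (average of the group estimates):
$$\tilde\theta = \frac{1}{R}\sum_{r=1}^{R}\hat\theta_r, \qquad \hat V_1(\tilde\theta) = \frac{1}{R}\,\frac{1}{R-1}\sum_{r=1}^{R}(\hat\theta_r - \tilde\theta)^2$$
Slightly biased estimator (all data, with $\hat\theta$ computed from the full sample):
$$\hat V_2(\hat\theta) = \frac{1}{R}\,\frac{1}{R-1}\sum_{r=1}^{R}(\hat\theta_r - \hat\theta)^2$$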
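Both random group estimators can be computed directly from the R group estimates. The sketch below assumes the group estimates are already available; the example splits a simulated SRS at random into R = 4 equal groups and uses the sample mean as the estimator. The function name `random_group_variances` and the simulated data are hypothetical.

```python
import numpy as np


def random_group_variances(theta_r, theta_full):
    """Random group variance estimators from R group estimates.

    theta_r:    estimates computed separately in each of the R random groups.
    theta_full: the same estimator computed from all the data.
    Returns (theta_tilde, V1(theta_tilde), V2(theta_full)).
    """
    theta_r = np.asarray(theta_r, dtype=float)
    R = theta_r.size
    theta_tilde = theta_r.mean()  # average of the group estimates
    v1 = np.sum((theta_r - theta_tilde) ** 2) / (R * (R - 1))
    v2 = np.sum((theta_r - theta_full) ** 2) / (R * (R - 1))
    return theta_tilde, v1, v2


# Hypothetical example: an SRS split at random into R = 4 groups, estimator = mean.
rng = np.random.default_rng(1)
y = rng.normal(50, 10, size=40)
groups = np.array_split(rng.permutation(y), 4)
theta_r = [g.mean() for g in groups]
print(random_group_variances(theta_r, y.mean()))
```

With equal-sized groups and a linear estimator such as the mean, $\tilde\theta$ and the full-sample $\hat\theta$ coincide, so the two estimators agree; they differ for nonlinear statistics.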
Random Group Methods
Pro:
– Easy to calculate
– General method (can also be used for non-smooth functions)
Con:
– Assumes independent groups (a problem when N is small)
– Small number of groups (particularly if a stratum is sampled only a few times)
– The survey design must be replicated in each random group (the same strata and clusters must be present)
Resampling and Replication Methods
Balanced Repeated Replication (BRR)
– Special case when $n_h = 2$
Jackknife (Quenouille (1949); Tukey (1958))
Bootstrap (Efron (1979); Shao and Tu (1995))
These methods
– Extend the idea of the random group method
– Allow replicate groups to overlap
– Are all-purpose methods
– Asymptotic properties?
Balanced Repeated Replication
Suppose we had sampled 2 PSUs per stratum, in H strata.
There are $2^H$ ways to pick 1 PSU from each stratum.
Each combination could be treated as a sample.
Pick R of these samples.
Balanced Repeated Replication
Which samples should we include?
– Assign each PSU a value of either 1 or –1 within its stratum
– Select samples that are orthogonal to one another to create balance
– You can use the design matrix for a fractional factorial
– For each replicate r, specify a vector of 1, –1 values, one entry per stratum
Estimator, where $\hat\theta_r$ is the estimate computed from half-sample r:
$$\hat V_{BRR}(\hat\theta) = \frac{1}{R}\sum_{r=1}^{R}(\hat\theta_r - \hat\theta)^2$$
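The balanced half-samples can be built from a Hadamard matrix, whose columns are mutually orthogonal ±1 vectors. The sketch below is one possible construction under stated assumptions: exactly 2 PSUs per stratum, a Sylvester (power-of-two) Hadamard matrix with its all-ones column dropped, and an estimator that is a function of per-stratum values. The names `sylvester_hadamard`, `brr_variance`, `stratified_mean` and the data are hypothetical.

```python
import numpy as np


def sylvester_hadamard(order):
    """Hadamard matrix of the smallest power-of-two size >= order (Sylvester construction)."""
    m = np.array([[1]])
    while m.shape[0] < order:
        m = np.block([[m, m], [m, -m]])
    return m


def brr_variance(y, W, estimator):
    """BRR variance for a design with exactly 2 PSUs per stratum.

    y:         array of shape (H, 2), the two PSU values in each stratum.
    W:         stratum weights W_h.
    estimator: function(per-stratum values, W) -> estimate (hypothetical interface).
    """
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    H = y.shape[0]
    # Rows = replicates r, columns = strata; dropping the all-ones first column
    # leaves H mutually orthogonal +1/-1 columns that each sum to zero.
    alpha = sylvester_hadamard(H + 1)[:, 1:H + 1]
    theta_hat = estimator(y.mean(axis=1), W)
    # +1 keeps PSU 1 of stratum h in the half-sample, -1 keeps PSU 2
    theta_r = np.array([estimator(np.where(a == 1, y[:, 0], y[:, 1]), W) for a in alpha])
    return theta_hat, float(np.mean((theta_r - theta_hat) ** 2))


def stratified_mean(values, W):
    return float(np.sum(W * values))


y = [[1.0, 3.0], [2.0, 6.0], [5.0, 4.0]]
W = [0.5, 0.3, 0.2]
print(brr_variance(y, W, stratified_mean))
```

For the linear stratified mean, orthogonality cancels the cross-stratum terms, so the BRR estimate equals $\sum_h W_h^2 (y_{h1} - y_{h2})^2 / 4$, the usual two-PSU-per-stratum variance estimate.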
Balanced Repeated Replication
Pro
– Relatively few computations
– Asymptotically equivalent to linearization methods for smooth functions of population totals and for quantiles
– Can be extended to use weights
Con
– Requires 2 PSUs per stratum
  Can be extended with more complex schemes
The Jackknife: SRS with replacement
Quenouille (1949); Tukey (1958); Shao and Tu (1995)
Let $\hat\theta_{(i)}$ be the estimator of $\theta$ after omitting the $i$th observation.
Jackknife estimate:
$$\tilde\theta_J = \frac{1}{n}\sum_{i=1}^{n}\tilde\theta_i, \qquad \text{where } \tilde\theta_i = n\hat\theta - (n-1)\hat\theta_{(i)} \text{ are the pseudovalues}$$
Jackknife estimator of the variance $V(\hat\theta)$:
$$\hat V_J(\hat\theta) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\tilde\theta_i - \tilde\theta_J)^2 = \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\right)^2, \qquad \text{where } \hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{(i)}$$
For stratified SRS without replacement, see Jones (1974).
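The delete-one jackknife can be sketched in a few lines for an i.i.d. sample. This is an illustration under the SRS-with-replacement assumption of the slide; the function name `jackknife` and the data are hypothetical, and `stat` can be any statistic computed from a 1-D array.

```python
import numpy as np


def jackknife(y, stat):
    """Delete-one jackknife for an i.i.d. (SRS with replacement) sample.

    Returns the jackknife estimate (mean of pseudovalues), the jackknife
    variance estimate, and the pseudovalues themselves.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    theta_hat = stat(y)
    theta_del = np.array([stat(np.delete(y, i)) for i in range(n)])  # theta_hat_(i)
    pseudo = n * theta_hat - (n - 1) * theta_del                      # pseudovalues
    theta_J = pseudo.mean()
    v_J = (n - 1) / n * np.sum((theta_del - theta_del.mean()) ** 2)
    return theta_J, v_J, pseudo


y = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
theta_J, v_J, pseudo = jackknife(y, np.mean)
print(theta_J, v_J)
```

For the sample mean the pseudovalues reduce to the observations themselves and $\hat V_J$ reduces to $s^2/n$; for a nonlinear statistic (pass, for example, `np.median`) the same code applies unchanged.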
The Jackknife: stratified multistage design
In stratum h, delete one PSU at a time.
Let $\hat\theta_{(hi)}$ be the estimator of the same form as $\hat\theta$ when PSU $i$ of stratum $h$ is omitted. For $\hat\theta = g(\bar y)$ with $\bar y = \sum_{h=1}^{L} W_h \bar y_h$:
$$\hat\theta_{(hi)} = g(\bar y_{(hi)}), \qquad \bar y_{(hi)} = \sum_{h' \neq h} W_{h'}\bar y_{h'} + W_h \bar y_{h(i)}, \qquad \bar y_{h(i)} = \frac{n_h \bar y_h - y_{hi}}{n_h - 1}$$
Jackknife estimate, using the pseudovalues $\tilde\theta_{hi} = n_h\hat\theta - (n_h - 1)\hat\theta_{(hi)}$:
$$\tilde\theta_{IJ} = \frac{1}{n}\sum_{h=1}^{L}\sum_{i=1}^{n_h}\tilde\theta_{hi}, \qquad \tilde\theta_{IIJ} = \frac{1}{L}\sum_{h=1}^{L}\frac{1}{n_h}\sum_{i=1}^{n_h}\tilde\theta_{hi}, \qquad n = \sum_{h=1}^{L} n_h$$
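The Jackknife: stratified multistage design
Different formulae for $\hat V(\hat\theta)$:
$$\hat V_{J,\text{method}}(\hat\theta) = \sum_{h=1}^{L}\frac{n_h - 1}{n_h}\sum_{i=1}^{n_h}\left(\hat\theta_{(hi)} - \hat\theta_{\text{method}}\right)^2$$
where $\hat\theta_{\text{method}}$ can be $\hat\theta$, $\hat\theta_{h\cdot} = \frac{1}{n_h}\sum_{i=1}^{n_h}\hat\theta_{(hi)}$, $\frac{1}{n}\sum_{h=1}^{L}\sum_{i=1}^{n_h}\hat\theta_{(hi)}$, or $\frac{1}{L}\sum_{h=1}^{L}\hat\theta_{h\cdot}$.
Using the pseudovalues:
$$\hat V_J^{(j)}(\hat\theta) = \sum_{h=1}^{L}\frac{1}{n_h(n_h - 1)}\sum_{i=1}^{n_h}\left(\tilde\theta_{hi} - \tilde\theta_{jJ}\right)^2, \qquad j = I, II$$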
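The delete-one-PSU jackknife for a smooth function of the stratified mean can be sketched as follows. Assumptions: each PSU contributes a single value $y_{hi}$, $n_h \ge 2$ in every stratum, and the estimator has the form $\hat\theta = g(\sum_h W_h \bar y_h)$. The sketch returns two of the variants, the one centred at the stratum means $\hat\theta_{h\cdot}$ and the pseudovalue version centred at $\tilde\theta_{IJ}$. The function name `stratified_jackknife` and the data are hypothetical.

```python
import numpy as np


def stratified_jackknife(strata, W, g):
    """Delete-one-PSU jackknife for theta = g(ybar), ybar = sum_h W_h * ybar_h.

    strata: list of 1-D arrays of PSU values, one array per stratum (n_h >= 2).
    W:      stratum weights W_h.
    g:      smooth function of the stratified mean.
    Returns theta_hat, V_J centred at the stratum means of theta_(hi),
    and the pseudovalue version centred at theta_tilde_IJ.
    """
    strata = [np.asarray(s, dtype=float) for s in strata]
    W = np.asarray(W, dtype=float)
    means = np.array([s.mean() for s in strata])
    ybar = float(np.sum(W * means))
    theta_hat = g(ybar)
    v_J, pseudo_by_stratum = 0.0, []
    for h, s in enumerate(strata):
        n_h = s.size
        # ybar_h(i) = (n_h * ybar_h - y_hi) / (n_h - 1); only stratum h changes
        del_means = (n_h * means[h] - s) / (n_h - 1)
        theta_del = np.array([g(ybar + W[h] * (m - means[h])) for m in del_means])
        v_J += (n_h - 1) / n_h * np.sum((theta_del - theta_del.mean()) ** 2)
        pseudo_by_stratum.append(n_h * theta_hat - (n_h - 1) * theta_del)
    theta_IJ = np.mean(np.concatenate(pseudo_by_stratum))
    v_I = sum(np.sum((p - theta_IJ) ** 2) / (p.size * (p.size - 1)) for p in pseudo_by_stratum)
    return theta_hat, v_J, v_I


strata = [np.array([1.0, 3.0, 5.0]), np.array([2.0, 2.0, 8.0, 4.0])]
W = np.array([0.4, 0.6])
print(stratified_jackknife(strata, W, lambda u: u))
print(stratified_jackknife(strata, W, np.log))
```

When $g$ is the identity, both variants reduce to the customary $\sum_h W_h^2 s_h^2 / n_h$; they generally differ slightly for nonlinear $g$.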
The Jackknife: Asymptotics
Krewski and Rao (1981)
Based on the concept of a sequence of finite populations $\{\Pi_L\}$, with L strata in $\Pi_L$, as $L \to \infty$.
Under conditions C1 to C6 given in the paper:
(i) $\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{d} N(0, \sigma^2)$
(ii) $n\,\hat V_{\text{method}}(\hat\theta) \xrightarrow{p} \sigma^2$
(iii) $(\hat\theta - \theta)/[\hat V_{\text{method}}(\hat\theta)]^{1/2} \xrightarrow{d} N(0, 1)$
where "method" is the variance estimator used (linearization, BRR, or jackknife).
The Bootstrap: naïve bootstrap
Efron (1979); Rao and Wu (1988); Shao and Tu (1995)
Resample $y^*_{h1}, \ldots, y^*_{hn_h}$ with replacement from the $n_h$ sampled values in stratum h, independently across strata.
Estimate: for $b = 1, 2, \ldots, B$,
$$\bar y_h^{*(b)} = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}^{*(b)}, \qquad \bar y^{*(b)} = \sum_{h=1}^{L} W_h \bar y_h^{*(b)}, \qquad \hat\theta^{*(b)} = g(\bar y^{*(b)})$$
Variance, where $E_*$ denotes expectation under bootstrap resampling:
$$\hat V_{NBS}(\hat\theta) = E_*\left(\hat\theta^* - E_*\hat\theta^*\right)^2$$
– Or approximate by
$$\hat V_{NBS}(\hat\theta) \approx \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta^{*(b)} - \hat\theta^{*(\cdot)}\right)^2, \qquad \hat\theta^{*(\cdot)} = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*(b)}$$
This is not a consistent estimator of the variance of a general nonlinear statistic.
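The Monte Carlo approximation to the naïve bootstrap variance can be sketched as below, assuming one value per PSU and $\hat\theta = g(\sum_h W_h \bar y_h)$. The function name `naive_bootstrap_variance`, its `B` and `seed` parameters, and the data are hypothetical choices for illustration.

```python
import numpy as np


def naive_bootstrap_variance(strata, W, g, B=1000, seed=0):
    """Monte Carlo naive bootstrap: resample n_h PSUs with replacement in each stratum.

    Returns (1/(B-1)) * sum_b (theta*_b - mean theta*)^2 for theta = g(sum_h W_h ybar_h).
    """
    rng = np.random.default_rng(seed)
    strata = [np.asarray(s, dtype=float) for s in strata]
    W = np.asarray(W, dtype=float)
    theta_star = np.empty(B)
    for b in range(B):
        # Resample within each stratum, independently across strata
        ybar_star = np.array([rng.choice(s, size=s.size, replace=True).mean() for s in strata])
        theta_star[b] = g(np.sum(W * ybar_star))
    return float(np.var(theta_star, ddof=1))


strata = [[1.0, 3.0, 5.0], [2.0, 2.0, 8.0, 4.0]]
W = [0.4, 0.6]
print(naive_bootstrap_variance(strata, W, lambda u: u, B=2000))
```

For a linear $g$ the Monte Carlo value settles near $\sum_h W_h^2 \frac{n_h - 1}{n_h}\frac{s_h^2}{n_h}$, not the customary $\sum_h W_h^2 s_h^2/n_h$, which illustrates the inconsistency for small $n_h$.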
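The Bootstrap: naïve bootstrap
For $\hat\theta = \bar y = \sum_h W_h \bar y_h$, with bootstrap version $\bar y^* = \sum_h W_h \bar y_h^*$:
$$\operatorname{Var}_*(\bar y^*) = \sum_{h=1}^{L} W_h^2\,\frac{n_h - 1}{n_h}\,\frac{s_h^2}{n_h}$$
Comparing with the customary estimator
$$\widehat{\operatorname{Var}}(\bar y) = \sum_{h=1}^{L} W_h^2\,\frac{s_h^2}{n_h},$$
the ratio $\operatorname{Var}_*(\bar y^*)/\widehat{\operatorname{Var}}(\bar y)$ does not converge to 1 for bounded $n_h$.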
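A short worked case shows the size of the gap. Suppose, purely for illustration, that every stratum has the same sample size $n_h = n_0$. The naïve bootstrap variance of the stratified mean is $\sum_h W_h^2 \frac{n_h-1}{n_h}\frac{s_h^2}{n_h}$ and the customary estimator is $\sum_h W_h^2 s_h^2/n_h$, so the common factor comes out of the sum:

$$\frac{\sum_{h=1}^{L} W_h^2\,\frac{n_0 - 1}{n_0}\,\frac{s_h^2}{n_0}}{\sum_{h=1}^{L} W_h^2\,\frac{s_h^2}{n_0}} = \frac{n_0 - 1}{n_0}$$

With $n_0 = 2$ PSUs per stratum, the naïve bootstrap gives half the customary variance estimate, no matter how many strata there are.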
The Bootstrap: modified bootstrap
Resample $y^*_{h1}, \ldots, y^*_{hm_h}$ with replacement from the $n_h$ sampled values in stratum h, $m_h \ge 1$.
Calculate:
$$\tilde y_{hi} = \bar y_h + m_h^{1/2}(n_h - 1)^{-1/2}\left(y^*_{hi} - \bar y_h\right), \qquad \tilde y_h = \frac{1}{m_h}\sum_{i=1}^{m_h}\tilde y_{hi}, \qquad \tilde y = \sum_{h=1}^{L} W_h \tilde y_h, \qquad \tilde\theta = g(\tilde y)$$
Variance:
$$\hat V_{MBS}(\hat\theta) = E_*\left(\tilde\theta - E_*\tilde\theta\right)^2$$
Can be approximated with Monte Carlo.
For the linear case, it reduces to the customary unbiased variance estimator.
Here $m_h < n_h$.
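The rescaling step is the only change from the naïve version, so a Monte Carlo sketch looks almost the same. Assumptions: one value per PSU, $\hat\theta = g(\sum_h W_h \bar y_h)$, and a default of $m_h = n_h - 1$ when no resample sizes are given (a choice made here for illustration). The function name `rescaling_bootstrap_variance` and its parameters are hypothetical.

```python
import numpy as np


def rescaling_bootstrap_variance(strata, W, g, m=None, B=1000, seed=0):
    """Monte Carlo version of the modified (rescaling) bootstrap.

    In stratum h draw m_h PSUs with replacement and rescale:
        y~_hi = ybar_h + sqrt(m_h / (n_h - 1)) * (y*_hi - ybar_h).
    m: list of resample sizes m_h (defaults to n_h - 1, an assumption here).
    """
    rng = np.random.default_rng(seed)
    strata = [np.asarray(s, dtype=float) for s in strata]
    W = np.asarray(W, dtype=float)
    if m is None:
        m = [s.size - 1 for s in strata]
    theta_tilde = np.empty(B)
    for b in range(B):
        ytilde_h = []
        for s, m_h in zip(strata, m):
            ybar_h = s.mean()
            y_star = rng.choice(s, size=m_h, replace=True)
            # Rescaled values, then their mean within the stratum
            ytilde_h.append(np.mean(ybar_h + np.sqrt(m_h / (s.size - 1)) * (y_star - ybar_h)))
        theta_tilde[b] = g(np.sum(W * np.array(ytilde_h)))
    return float(np.var(theta_tilde, ddof=1))


strata = [[1.0, 3.0, 5.0], [2.0, 2.0, 8.0, 4.0]]
W = [0.4, 0.6]
print(rescaling_bootstrap_variance(strata, W, lambda u: u, B=2000))
```

Because the rescaling multiplies each stratum's bootstrap variance by $m_h/(n_h - 1)$, the linear case lands near $\sum_h W_h^2 s_h^2/n_h$ for any choice of $m_h$.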
More on the bootstrap
The method can be extended to stratified SRS without replacement by simply changing
$$\tilde y_{hi} = \bar y_h + m_h^{1/2}(n_h - 1)^{-1/2}\left(y^*_{hi} - \bar y_h\right) \quad\text{to}\quad \tilde y_{hi} = \bar y_h + m_h^{1/2}(n_h - 1)^{-1/2}(1 - f_h)^{1/2}\left(y^*_{hi} - \bar y_h\right),$$
where $f_h = n_h/N_h$ is the sampling fraction in stratum h.
For $m_h = n_h - 1$, this method reduces to the naïve BS (with resample size $n_h - 1$).
For $n_h = 2$, $m_h = 1$, the method reduces to the random half-sample replication method.
For $n_h > 3$, for the choice of $m_h$ see Rao and Wu (1988).
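The special cases above all follow from the scale factor multiplying $(y^*_{hi} - \bar y_h)$. The tiny sketch below evaluates that factor; the function name `rescale_factor` is hypothetical, and $f_h = 0$ stands for the with-replacement version.

```python
import math


def rescale_factor(m_h, n_h, f_h=0.0):
    """Scale applied to (y*_hi - ybar_h) in the rescaling bootstrap.

    f_h = n_h / N_h is the sampling fraction; f_h = 0 gives the with-replacement version.
    """
    return math.sqrt(m_h * (1 - f_h) / (n_h - 1))


# m_h = n_h - 1 (f_h = 0): factor 1, so y~_hi = y*_hi and no rescaling happens.
print(rescale_factor(4, 5))
# n_h = 2, m_h = 1: factor 1 and each replicate keeps one of the two PSUs (a random half-sample).
print(rescale_factor(1, 2))
# Without replacement, the factor shrinks by sqrt(1 - f_h).
print(rescale_factor(4, 5, f_h=0.2))
```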
Simulation: Rao and Wu (1988)
Jackknife and linearization intervals showed substantial bias for nonlinear statistics in one-sided intervals
The bootstrap performs best for one-sided intervals (especially when $m_h = n_h - 1$)
For two-sided intervals, the three methods have similar coverage probabilities
The jackknife and linearization methods are more stable than the bootstrap
B = 200 bootstrap replicates is sufficient
‘Hot’ topics
Jackknife with non-smooth functions (Rao and Sitter 1996)
Two-phase variance estimation (Graubard and Korn 2002; Rubin-Bleuer and Schiopu-Kratina 2005)
Estimating function (EF) bootstrap method (Rao and Tausi 2004)
Software
OSIRIS – BRR, jackknife
SAS – linearization
Stata – linearization
SUDAAN – linearization, bootstrap, jackknife
WesVar – BRR, jackknife, bootstrap
References:
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26.
Graubard, B. J., and Korn, E. L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73–96.
Krewski, D., and Rao, J. N. K. (1981). Inference from stratified samples: Properties of linearization, jackknife, and balanced replication methods. The Annals of Statistics, 9, 1010–1019.
Quenouille, M. H. (1949). Problems in plane sampling. Annals of Mathematical Statistics, 20, 355–375.
Rao, J. N. K., and Wu, C. F. J. (1988). Resampling inferences with complex survey data. JASA, 83, 231–241.
Rao, J. N. K., and Tausi, M. (2004). Estimating function variance estimation under stratified multistage sampling. Communications in Statistics, 33, 2087–2095.
Rao, J. N. K., and Sitter, R. R. (1996). Discussion of Shao's paper. Statistics, 27, 246–247.
Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005). On the two-phase framework for joint model and design-based framework. Annals of Statistics (to appear).
Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.
Tukey, J. W. (1958). Bias and confidence in not-quite large samples. Annals of Mathematical Statistics, 29, 614.
Not referred to in the presentation:
Wolter, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.
Shao, J. (1996). Resampling methods in sample surveys. Invited paper, Statistics, 27, 203–237, with discussion, 237–254.