Optimal Number of Replicates for Variance Estimation. Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy. Third International Conference on Establishment Surveys (ICES-III), June 21, 2007.
![Page 1: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/1.jpg)
1
Optimal Number of Replicates for Variance Estimation
Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy
Third International Conference on Establishment Surveys (ICES-III)
June 21, 2007
![Page 2: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/2.jpg)
2
Variance Estimation
Two general approaches for variance estimation with weighted data obtained under complex designs:
Linearization
Replication
![Page 3: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/3.jpg)
3
Linearization
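Approximate a complex statistic $\hat{\theta}$ in terms of $L$ linear statistics $\hat{\theta}_l$
Estimate the variance of $\hat{\theta}$ from:
$$E\left[\sum_{l=1}^{L} \hat{a}_l \left(\hat{\theta}_l - E\,\hat{\theta}_l\right)\right]^2$$
where $\hat{a}_l$ denotes the linearization coefficient of $\hat{\theta}_l$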
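The first-order approximation on this slide can be made concrete for one common nonlinear statistic, a weighted ratio of two totals. The sketch below is a minimal illustration, not the SUDAAN implementation: it assumes a single stratum with primary sampling units (PSUs) treated as sampled with replacement and no finite population correction. The function name `linearized_ratio_variance` and its arguments are hypothetical.

```python
import numpy as np

def linearized_ratio_variance(y, x, w, psu):
    """Taylor-linearization variance of the weighted ratio sum(w*y) / sum(w*x).

    Assumption (not from the slides): one stratum, PSUs sampled with replacement.
    """
    y, x, w, psu = map(np.asarray, (y, x, w, psu))
    x_total = np.sum(w * x)
    ratio = np.sum(w * y) / x_total
    # Linearized variable: the first-order Taylor term of the ratio for each unit.
    z = w * (y - ratio * x) / x_total
    # Add the linearized values within each PSU.
    labels, inverse = np.unique(psu, return_inverse=True)
    z_psu = np.bincount(inverse, weights=z)
    n = labels.size
    # With-replacement variance estimator for the total of z across PSUs.
    variance = n / (n - 1) * np.sum((z_psu - z_psu.mean()) ** 2)
    return ratio, variance
```

When `x` is all ones and every unit is its own PSU with equal weights, the result reduces to the familiar sample variance of the mean, which is a convenient sanity check.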
![Page 4: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/4.jpg)
4
Replication
Partition the full sample into R subsamples (replicates)
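Obtain separate estimates of $\theta$ from each replicate: $\hat{\theta}_r,\ r = 1, \ldots, R$
Estimate the variance of $\hat{\theta}$ by:
$$\frac{\sum_{r=1}^{R} \left(\hat{\theta}_r - \hat{\theta}\right)^2}{R}$$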
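The replication estimator on this slide is the average squared deviation of the replicate estimates from the full-sample estimate. The sketch below shows it for a weighted mean, assuming the replicate weights are already available as an R-by-n array. The helper names (`weighted_mean`, `replicate_variance`, `replicate_se`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    return np.sum(w * y) / np.sum(w)

def replicate_variance(theta_full, theta_reps):
    """Average squared deviation of replicate estimates from the full-sample estimate."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    return np.sum((theta_reps - theta_full) ** 2) / theta_reps.size

def replicate_se(y, full_weights, replicate_weights):
    """Standard error of a weighted mean from an (R x n) array of replicate weights."""
    theta_full = weighted_mean(y, full_weights)
    # Re-estimate the statistic once per set of replicate weights.
    theta_reps = [weighted_mean(y, w_r) for w_r in np.asarray(replicate_weights)]
    return np.sqrt(replicate_variance(theta_full, theta_reps))
```

The divisor R follows the formula on the slide; other replication schemes, such as the jackknife, use a different scaling constant.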
![Page 5: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/5.jpg)
5
How Many Replicates?
Recommendations regarding the optimal number of replicates for variance estimation are at variance:
Computational resources required can be intensive
For certain statistics a larger number of replicates might be needed to produce stable estimates of variance
What is the point of diminishing returns?
![Page 6: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/6.jpg)
6
Research Methodology
Relying on two complex establishment surveys, this work presents an array of empirical results regarding the number of bootstrap replicates for variance estimation:
National Study of Postsecondary Faculty (NSOPF:04)
National Postsecondary Student Aid Study (NPSAS:04)
![Page 7: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/7.jpg)
7
General Design Specifications: National Study of Postsecondary Faculty (NSOPF:04)
Survey of about 35,000 faculty and instructional staff
Across a sample of 1,080 institutions
In the 50 States and the District of Columbia
![Page 8: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/8.jpg)
8
Sampling Methodology
Institutions selected with probability proportional to a measure of size to over-represent the following faculty groups:
Hispanic
Non-Hispanic Black
Asian and Pacific Islander
Full-time other female
Used RTI’s cost/variance optimization procedure for sample allocation
![Page 9: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/9.jpg)
9
Institution Sampling Frame
| Degree granting | Carnegie code | Public | Private | Total |
|---|---|---|---|---|
| Doctor’s | 15, 16, 52 | 190 | 110 | 300 |
| Master’s | 21, 22 | 270 | 320 | 590 |
| Bachelor’s | 31, 32, 33 | 90 | 480 | 570 |
| Associate’s | 40, 60 | 1,030 | 150 | 1,180 |
| Other/Unknown | 51, 53 – 59, unclassified | 110 | 620 | 730 |
| Total | | 1,700 | 1,680 | 3,380 |
![Page 10: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/10.jpg)
10
Institution Sample
| Degree granting | Public | Private | Total |
|---|---|---|---|
| Doctor’s | 190 | 110 | 300 |
| Master’s | 120 | 80 | 200 |
| Bachelor’s | 30 | 130 | 160 |
| Associate’s | 340 | 10 | 350 |
| Other | 10 | 60 | 70 |
| Total | 680 | 400 | 1,080 |
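Dividing the sample counts by the frame counts gives the approximate first-stage sampling fraction for each degree level. The short calculation below uses only the totals from the two tables above. Because the published counts are rounded, the fractions are approximate, and the variable names are illustrative.

```python
# Institution counts by degree level, taken from the frame and sample tables.
frame = {"Doctor's": 300, "Master's": 590, "Bachelor's": 570,
         "Associate's": 1180, "Other": 730}
sample = {"Doctor's": 300, "Master's": 200, "Bachelor's": 160,
          "Associate's": 350, "Other": 70}

# Approximate first-stage sampling fraction per degree level.
fractions = {level: sample[level] / frame[level] for level in frame}

for level, fraction in fractions.items():
    print(f"{level:<12} {fraction:.2f}")
```

The fraction for doctoral institutions is 1.0, which suggests (subject to rounding) that they were all included in the sample. Sampling fractions this large are the situation in which a first-stage finite population correction matters.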
![Page 11: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/11.jpg)
11
Expected Faculty Counts From Sampled Institutions by Strata

| NSOPF stratum | Black | Hispanic | Asian | OFTF | OFTM | OPT | Total |
|---|---|---|---|---|---|---|---|
| Public, doctor’s | 10,720 | 8,660 | 32,630 | 58,870 | 115,830 | 51,110 | 277,820 |
| Public, master’s | 4,670 | 3,150 | 4,950 | 14,120 | 20,440 | 22,130 | 69,460 |
| Public, bachelor’s | 810 | 340 | 520 | 1,430 | 2,110 | 3,880 | 9,090 |
| Public, associate’s | 12,250 | 9,240 | 6,100 | 21,100 | 21,700 | 82,570 | 152,960 |
| Public, other | 150 | 80 | 170 | 290 | 630 | 830 | 2,150 |
| Private not-for-profit, doctor’s | 6,060 | 3,760 | 13,110 | 21,490 | 47,370 | 33,280 | 125,080 |
| Private not-for-profit, master’s | 1,110 | 950 | 1,020 | 4,930 | 7,020 | 12,530 | 27,550 |
| Private not-for-profit, bachelor’s | 1,360 | 390 | 670 | 3,920 | 6,270 | 5,440 | 18,050 |
| Private not-for-profit, associate’s | 20 | 20 | 40 | 180 | 450 | 480 | 1,180 |
| Private not-for-profit, other | 330 | 120 | 250 | 790 | 1,680 | 2,700 | 5,880 |
| Total | 37,480 | 26,710 | 59,460 | 127,120 | 223,500 | 214,940 | 689,210 |

OFTF = other full-time female; OFTM = other full-time male; OPT = other part-time.
![Page 12: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/12.jpg)
12
Target Number of Respondents by Institution and Faculty Strata

| Institution stratum | Respondents | Faculty stratum | Respondents |
|---|---|---|---|
| Public doctor’s | 6,200 | Non-Hispanic Black | 1,600 |
| Public master’s | 2,700 | Hispanic | 1,300 |
| Public bachelor’s | 600 | Asian | 900 |
| Public associate’s | 7,500 | Other full-time female | 4,600 |
| Public other | 500 | Other full-time male | 8,300 |
| Private not-for-profit doctor’s | 2,600 | Other part-time | 7,800 |
| Private not-for-profit master’s | 1,900 | | |
| Private not-for-profit bachelor’s | 1,700 | | |
| Private not-for-profit associate’s | 100 | | |
| Private not-for-profit other | 700 | | |
| Total | 24,500 | Total | 24,500 |
![Page 13: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/13.jpg)
13
Distribution of Respondents (by institution and faculty strata)

| Institution stratum | Respondents | Faculty stratum | Respondents |
|---|---|---|---|
| Public doctor’s | 7,460 | Non-Hispanic Black | 2,060 |
| Public master’s | 2,680 | Hispanic | 1,700 |
| Public bachelor’s | 450 | Asian | 1,610 |
| Public associate’s | 6,410 | Other full-time female | 5,850 |
| Public other | 110 | Other full-time male | 8,500 |
| Private not-for-profit doctor’s | 3,160 | Other part-time | 6,380 |
| Private not-for-profit master’s | 2,270 | | |
| Private not-for-profit bachelor’s | 2,520 | | |
| Private not-for-profit associate’s | 190 | | |
| Private not-for-profit other | 850 | | |
| Total | 26,110 | Total | 26,110 |
![Page 14: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/14.jpg)
14
Variance Estimation Methodology (NSOPF:04)
Used methodology developed by Kaufman (2004) to create bootstrap replicate weights:
Reflected finite population correction adjustment for the first stage (institution) selection.
Second stage (faculty selection) finite population correction factors were close to one and not reflected.
Produced 65 bootstrap replicates to meet Data Analysis System (DAS) requirements of NCES.
Calculated standard errors of several statistics using both the above bootstrap replicates and the Taylor linearization method in SUDAAN.
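The slides do not spell out the Kaufman (2004) procedure, so the sketch below substitutes a different and simpler technique, the Rao-Wu rescaling bootstrap, purely to illustrate what a set of bootstrap replicate weights looks like. It is not the method used for NSOPF:04 and it does not reflect any finite population correction. The function name `rao_wu_replicate_weights` and its arguments are hypothetical.

```python
import numpy as np

def rao_wu_replicate_weights(weights, stratum, psu, n_reps=65, seed=0):
    """Rao-Wu rescaling bootstrap weights (a stand-in, NOT the Kaufman method).

    In each stratum with n_h PSUs, draw n_h - 1 PSUs with replacement and
    multiply each unit's weight by n_h / (n_h - 1) * (times its PSU was drawn).
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)
    rep_w = np.empty((n_reps, weights.size))
    for r in range(n_reps):
        factor = np.zeros(weights.size)
        for h in np.unique(stratum):
            in_h = stratum == h
            psus_h = np.unique(psu[in_h])
            n_h = psus_h.size
            if n_h < 2:
                # A single-PSU stratum cannot be resampled; keep its weights unchanged.
                factor[in_h] = 1.0
                continue
            draws = rng.choice(psus_h, size=n_h - 1, replace=True)
            for p in psus_h:
                hits = np.count_nonzero(draws == p)
                factor[in_h & (psu == p)] = n_h / (n_h - 1) * hits
        rep_w[r] = weights * factor
    return rep_w
```

Each row of the returned array is one set of replicate weights. Re-estimating a statistic with every row and averaging the squared deviations from the full-sample estimate gives the bootstrap variance.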
![Page 15: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/15.jpg)
15
Comparisons of Variance Estimates: SE of Percent Teaching as Principal Activity by Rank (Bootstrap vs. Linearization)
[Figure: bar chart of Standard Error (axis range 0.0 to 1.0) by rank: Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title.]
![Page 16: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/16.jpg)
16
Comparisons of Variance Estimates: SE of Percent Research as Principal Activity by Rank (Bootstrap vs. Linearization)
[Figure: bar chart of Standard Error (axis range 0.0 to 1.0) by rank: Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title.]
![Page 17: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/17.jpg)
17
Comparisons of Variance Estimates: SE of Percent Administration as Principal Activity by Rank (Bootstrap vs. Linearization)
[Figure: bar chart of Standard Error (axis range 0.0 to 1.0) by rank: Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title.]
![Page 18: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/18.jpg)
18
Comparisons of Variance Estimates: SE of Percent Full-time by Institution Type (Bootstrap vs. Linearization)
[Figure: bar chart of Standard Error (axis range 0.0 to 20.0) by institution type: Public Ph.D., Public MS, Public BA, Public Asso., Public Other, Private Ph.D., Private MS, Private BA, Private Asso., Private Other.]
![Page 19: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/19.jpg)
19
Revised Variance Estimation Methodology (NSOPF:04)
Used methodology developed by Kaufman (2004) to create 200 bootstrap replicate weights.
Used 10, 11, …, 200 replicates to estimate relative standard error (RSE) of different statistics.
Repeated the above using 9 random permutations of replicates to estimate RSE of the same statistics.
Used Taylor linearization to estimate relative standard error of estimates via SUDAAN.
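The steps on this slide can be sketched as follows: compute the RSE using the first r replicates for r = 10, …, 200, then repeat after randomly reordering the replicates. This is a minimal sketch under stated assumptions. The replicate estimates are simulated rather than taken from NSOPF:04, the function names are hypothetical, and the RSE is taken to be the bootstrap standard error divided by the full-sample estimate.

```python
import numpy as np

def rse_by_replicate_count(theta_full, theta_reps, r_min=10):
    """RSE computed from the first r replicates, for r = r_min, ..., R."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    counts = np.arange(r_min, theta_reps.size + 1)
    sq_dev = (theta_reps - theta_full) ** 2
    # Running mean of squared deviations = variance estimate from the first r replicates.
    variance = np.cumsum(sq_dev)[counts - 1] / counts
    return counts, np.sqrt(variance) / abs(theta_full)

def rse_over_permutations(theta_full, theta_reps, n_perm=9, r_min=10, seed=0):
    """One RSE curve per random reordering of the replicates."""
    rng = np.random.default_rng(seed)
    theta_reps = np.asarray(theta_reps, dtype=float)
    curves = []
    for _ in range(n_perm):
        shuffled = rng.permutation(theta_reps)
        _, rse = rse_by_replicate_count(theta_full, shuffled, r_min)
        curves.append(rse)
    return np.vstack(curves)

# Simulated example (values are made up, not survey results).
rng = np.random.default_rng(2007)
simulated_reps = 50.0 + rng.normal(0.0, 0.5, size=200)
counts, rse = rse_by_replicate_count(50.0, simulated_reps)
curves = rse_over_permutations(50.0, simulated_reps)
print(counts[0], counts[-1], curves.shape)
```

All permutation curves meet at the final point, because every ordering uses the same full set of replicates there. The spread between curves at smaller replicate counts indicates how sensitive the RSE is to which replicates happen to be used.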
![Page 20: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/20.jpg)
20
RSE of Percent Asians by Number of Replicates
[Figure: line chart of Relative Standard Error (axis range 0.60% to 1.00%) against Number of Bootstrap Replicates (0 to 200).]
![Page 21: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/21.jpg)
21
RSE of Percent Asians by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 0.40% to 3.40%) against Number of Bootstrap Replicates (0 to 200).]
![Page 22: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/22.jpg)
22
RSE of Percent Age < 35 by Number of Replicates
[Figure: line chart of Relative Standard Error (axis range 2.00% to 2.60%) against Number of Bootstrap Replicates (0 to 200).]
![Page 23: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/23.jpg)
23
RSE of Percent Age < 35 by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 1.2% to 3.6%) against Number of Bootstrap Replicates (0 to 200).]
![Page 24: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/24.jpg)
24
RSE of Percent Citizen by Number of Replicates
[Figure: line chart of Relative Standard Error (axis range 0.15% to 0.25%) against Number of Bootstrap Replicates (0 to 200).]
![Page 25: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/25.jpg)
25
RSE of Percent Citizen by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 0.10% to 0.25%) against Number of Bootstrap Replicates (0 to 200).]
![Page 26: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/26.jpg)
26
RSE of Percent Full-time by Number of Replicates
[Figure: line chart of Relative Standard Error (axis range 0.0000001% to 0.0000008%) against Number of Bootstrap Replicates (0 to 200).]
![Page 27: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/27.jpg)
27
RSE of Percent Full-time by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 0.00% to 1.20%) against Number of Bootstrap Replicates (0 to 200).]
![Page 28: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/28.jpg)
28
RSE of Percent Master’s by Number of Replicates
[Figure: line chart of Relative Standard Error (axis range 0.80% to 1.60%) against Number of Bootstrap Replicates (0 to 200).]
![Page 29: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/29.jpg)
29
RSE of Percent Master’s by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 0.80% to 1.80%) against Number of Bootstrap Replicates (0 to 200).]
![Page 30: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/30.jpg)
30
RSE of Percent Teaching as Principal Activity by Number of Replicates
[Figure: line chart of Relative Standard Error (axis range 0.35% to 0.55%) against Number of Bootstrap Replicates (0 to 200).]
![Page 31: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/31.jpg)
31
RSE of Percent Teaching as Principal Activity by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 0.25% to 0.55%) against Number of Bootstrap Replicates (0 to 200).]
![Page 32: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/32.jpg)
32
RSE of Mean Income by Number of Replicates
[Figure: line chart of Relative Standard Error (axis range 0.30% to 0.60%) against Number of Bootstrap Replicates (0 to 200).]
![Page 33: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/33.jpg)
33
RSE of Mean Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 0.20% to 1.20%) against Number of Bootstrap Replicates (0 to 200).]
![Page 34: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/34.jpg)
34
RSE of Median Income by Number of Replicates
[Figure: line chart of Relative Standard Error (axis range 1.00% to 1.50%) against Number of Bootstrap Replicates (0 to 200).]
![Page 35: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/35.jpg)
35
RSE of Median Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 0.10% to 2.00%) against Number of Bootstrap Replicates (0 to 200).]
![Page 36: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/36.jpg)
36
RSE of Regression Intercept: Income = Hours + Race + Hours × Race
[Figure: line chart of Relative Standard Error (axis range 0.60% to 1.00%) against Number of Bootstrap Replicates (0 to 200).]
![Page 37: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/37.jpg)
37
RSE of Regression Intercept: Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 0.60% to 1.10%) against Number of Replicates (0 to 200).]
![Page 38: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/38.jpg)
38
RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race
[Figure: line chart of Relative Standard Error (axis range 6.00% to 10.00%) against Number of Bootstrap Replicates (0 to 200).]
![Page 39: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/39.jpg)
39
RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 5.0% to 13.0%) against Number of Replicates (0 to 200).]
![Page 40: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/40.jpg)
40
RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race
[Figure: line chart of Relative Standard Error (axis range 18.0% to 28.0%) against Number of Bootstrap Replicates (0 to 200).]
![Page 41: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/41.jpg)
41
RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 15.0% to 30.0%) against Number of Replicates (0 to 200).]
![Page 42: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/42.jpg)
42
RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race
[Figure: line chart of Relative Standard Error (axis range 30.0% to 60.0%) against Number of Bootstrap Replicates (0 to 200).]
![Page 43: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/43.jpg)
43
RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)
[Figure: line chart of Relative Standard Error (axis range 20.0% to 110.0%) against Number of Replicates (0 to 200).]
![Page 44: Variance Estimation](https://reader031.vdocuments.mx/reader031/viewer/2022012400/568136f1550346895d9e85db/html5/thumbnails/44.jpg)
44
Conclusions (Rough & Interim)
Complex statistics do require more replicates for stable variance estimation
It seems that:
64 replicates might be inadequate
200 replicates seem to be overkill
Somewhere between 100 and 200 replicates might be sufficient