EVALUATION OF THE NSRCG SCHOOL SAMPLE
Donsig Jang and Xiaojing Lin
Third International Conference on Establishment Surveys
Montreal, Canada, June 21, 2007
Outline
Sampling options on repeated establishment surveys
Reasons to keep the same sample in establishment surveys
Issues in keeping the same sample
Example: NSRCG school sample
Summary
Recommendation for 2008 NSRCG School Sample
Sampling options on repeated establishment surveys
Keep the same sample over time with supplemental samples for births
– Efficient change estimates
BUT
– Response burden
– Inefficient “cross-sectional” estimates
An independent sample in each survey round
Sample coordination to maximize overlaps between samples
– Rotation samples (Sigman and Monsour 1995)
– Permanent random number technique (Ohlsson 1995, 2001)
– Keyfitz procedure (Keyfitz 1951)
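As an illustration of the permanent random number idea listed among the coordination options, here is a minimal sketch. It assumes equal-probability sampling with a fixed sample size, which is simpler than the PPS design discussed later in this talk. The function names and the 20-unit frame are hypothetical. Each unit keeps one uniform random number for as long as it is on the frame, and every round selects the units with the smallest numbers, so surviving units tend to stay in the sample.

```python
import random

def assign_prns(units, prns, rng):
    """Give every unit without a permanent random number (PRN) a new uniform draw."""
    for u in units:
        if u not in prns:
            prns[u] = rng.random()
    return prns

def prn_sample(frame, prns, n):
    """Select the n frame units with the smallest PRNs."""
    return set(sorted(frame, key=lambda u: prns[u])[:n])

rng = random.Random(2007)
prns = {}

# Round 1: hypothetical frame of 20 establishments.
frame_1 = [f"unit{i:02d}" for i in range(20)]
assign_prns(frame_1, prns, rng)
sample_1 = prn_sample(frame_1, prns, n=5)

# Round 2: two deaths leave the frame, three births join it.
frame_2 = frame_1[2:] + ["birth1", "birth2", "birth3"]
assign_prns(frame_2, prns, rng)
sample_2 = prn_sample(frame_2, prns, n=5)

overlap = sample_1 & sample_2
print(len(overlap), "of", len(sample_2), "units retained")
```

A unit leaves the second sample only if it died or if a birth drew a smaller number, which is why the overlap is usually large.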
Reasons to keep the same sample in establishment surveys
Difficulty in identifying point of contact
Costly efforts in gaining participation
Often requires nontrivial process to gather information
– previous survey participation would help
Issues in keeping the same sample
Can they be a representative sample of the current cross-sectional population?
– Depending on how dynamic the population is over time
  coverage issues: births vs. deaths
  sample efficiency: distributional changes
Alternatives
– Independent sample from the most up-to-date sample frame
– Coordination of samples
  E.g., Keyfitz procedure to maximize the sample overlap between the current and the previous ones
National Survey of Recent College Graduates (NSRCG)
Repeated every two or three years
Collects education, demographic, and employment information from recent college graduates (bachelor’s and master’s) majoring in science, engineering, and health fields
Two-stage sample design
– 1st stage: select schools and obtain the list of graduates from selected schools
– 2nd stage: select graduates from the list provided by schools
NSF-sponsored survey
NSRCG: List collection from schools
Identify point of contact (usually institutional coordinator)
Gather the list of graduates with key sampling and locating information including:
– degree award dates
– degree level
– field of major
– race/ethnicity
– gender
– date of birth
– SSN
– student ID
– mailing addresses including parent’s addresses
– phone numbers (land line, cell)
– emails, etc.
NSRCG: List collection from schools (continued)
Need a good understanding of the information requested and file format
Time-consuming and costly efforts
– different schools have different issues
A crucial part for the quality of the survey
– strive to get almost perfect cooperation rate (99%)
– Out of 300 schools,
  only four final refusals in 2003
  only five refusals in 2006
NSRCG: School sample selection
For 1995, 1997, 1999, 2001 surveys
– 275 schools initially selected in 1995 and kept with 5 supplemental samples added over three survey rounds (to account for frame coverage)
A new sample of 300 schools selected in 2003:
– To reflect rapid changes of S&E populations in the 1990s
– Health field added to the survey as eligible field of study
NSRCG: School sample selection (continued)
Probability proportional to size (PPS) with composite size measure
Composite size measures calculated to achieve equal weights within each of the NSRCG analytic domains constructed by a combination of:
– degree year, degree level, field of major, race/ethnicity, and gender
Population dynamics
– new schools (births), closed schools (deaths), schools with no S&E graduates (temporarily ineligible), etc.: a coverage issue
– distributions of schools changed (in terms of composite size measures): a potential factor affecting the sample efficiency
2003 NSRCG school sample
In both 2001 and 2003 NSRCG: 170 (57%)
Only in 2003 NSRCG: 130 (43%)
Total: 300
Excessive efforts (time and resources) to achieve a 99% response rate (RR); 4 schools refused
Distribution of list submission dates in 2003 NSRCG
[Figure: density of list submission dates by days (x-axis from 0 to 240), shown separately for schools in both the 2001 and 2003 samples and for schools only in the 2003 sample]
School sample after 2003 NSRCG – 2006 NSRCG
Frame evaluation
2003 frame based on AY2001 IPEDS counts; 2006 frame based on AY2003 and AY2004 IPEDS counts
Frame status | Frame school count | Sample school count | Graduate count AY2001 | AY2002 | AY2003 | AY2004 | AY2005
In 2003 frame but not in 2006 frame | 48 | 0 | 3,077 | 1,092 | 0 | 0 | 23
In both 2003 and 2006 frames | 1,762 | 300 | 624,297 | 639,411 | 671,868 | 702,021 | 722,727
Not in 2003 frame but in 2006 frame | 190 | 0 | 0 | 570 | 4,369 | 7,396 | 6,819
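To put the graduate counts in the 2003-versus-2006 frame comparison in proportion, the short sketch below expresses the dropped and added graduates as shares of the relevant frame population. The counts are copied from the slide. The choice of denominators (dropped plus common for the old frame, added plus common for the new frame) is an assumption made here for illustration, not something stated in the talk.

```python
# Graduate counts by academic year, copied from the 2003-vs-2006 frame comparison.
years = ["AY2001", "AY2002", "AY2003", "AY2004", "AY2005"]
dropped = [3077, 1092, 0, 0, 23]                   # in 2003 frame, not in 2006 frame
common = [624297, 639411, 671868, 702021, 722727]  # in both frames
added = [0, 570, 4369, 7396, 6819]                 # in 2006 frame, not in 2003 frame

def share(part, rest):
    """Percentage that `part` makes up of `part + rest`."""
    return 100.0 * part / (part + rest)

for year, d, c, a in zip(years, dropped, common, added):
    # Share of the old-frame population lost, and of the new-frame population gained.
    print(f"{year}: dropped {share(d, c):.2f}%, added {share(a, c):.2f}%")
```

Under these assumed denominators every share stays below two percent.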
Graduate counts dropped from and added to the population
Domain | Category | Eligible in 2003 but ineligible in 2006: count | Percentage | Newly eligible for 2006 NSRCG: count | Percentage
Degree Level | Bachelor | 3,476 | 0.35 | 5,252 | 0.49
Degree Level | Master | 5,380 | 1.97 | 1,775 | 0.60
Race/Ethnicity | Non-Hispanic White | 5,461 | 0.66 | 4,109 | 0.48
Race/Ethnicity | Asian, Pacific Islander, Nonresident | 2,643 | 1.04 | 1,610 | 0.54
Race/Ethnicity | Hispanic, Black, American Indian | 752 | 0.40 | 1,308 | 0.63
Gender | Male | 3,949 | 0.71 | 3,967 | 0.65
Gender | Female | 4,907 | 0.69 | 3,060 | 0.41
Graduate counts dropped from and added to the population
Field of Major | Eligible in 2003 but ineligible in 2006: count | Percentage | Newly eligible for 2006 NSRCG: count | Percentage
Chemistry | 15 | 0.06 | 109 | 0.46
Physics/Astronomy | 0 | 0.00 | 0 | 0.00
Other Physical Sciences | 30 | 0.23 | 44 | 0.34
Mathematics/Statistics | 36 | 0.10 | 120 | 0.30
Computer Sciences | 687 | 0.56 | 2,329 | 1.53
Environmental, Geological and Agricultural Sciences | 585 | 1.81 | 69 | 0.27
Aerospace Engineering | 0 | 0.00 | 33 | 0.55
Chemical Engineering | 0 | 0.00 | 0 | 0.00
Civil Engineering | 1 | 0.00 | 86 | 0.34
Electrical Engineering | 242 | 0.43 | 723 | 1.06
Industrial Engineering | 0 | 0.00 | 0 | 0.00
Mechanical Engineering | 0 | 0.00 | 112 | 0.31
Other Engineering | 266 | 0.87 | 171 | 0.51
Biological Sciences | 1,165 | 0.80 | 273 | 0.18
Psychology | 182 | 0.09 | 1,079 | 0.52
Economics | 39 | 0.08 | 129 | 0.21
Sociology/Anthropology | 57 | 0.07 | 76 | 0.08
Other Social Sciences | 321 | 0.53 | 398 | 0.59
Political Science | 145 | 0.17 | 251 | 0.24
Health-Related – Nursing | 163 | 0.16 | 741 | 0.71
Health-Related – all else | 4,922 | 3.62 | 284 | 0.23
2006 NSRCG School Sample
No significant change of the population
– Kept the same school sample without any supplemental sample
Distribution of list submission dates in 2006 NSRCG
[Figure: density of list submission dates by days (x-axis from 20 to 230), shown separately for the groups labeled "Both in 2001 and 2006" and "Only in 2006"]
2008 NSRCG?
Evaluate the current sampling strategy (keeping the same sample) by doing
– frame evaluation
– comparisons with other sampling schemes
  Independent PPS
  Keyfitz procedure
2008 NSRCG
Frame evaluation
2003 frame based on AY2001 IPEDS counts; 2008 frame based on AY2006 IPEDS counts
Frame status | Frame school count | Sample school count | Graduate count AY2001 | AY2002 | AY2006
In 2003 frame but not in 2008 frame | 78 | 2 | 4,643 | 2,584 | 0
In both 2003 and 2008 frames | 1,732 | 298 | 622,731 | 637,919 | 744,070
Not in 2003 frame but in 2008 frame | 294 | 0 | 0 | 494 | 11,755
Graduate counts dropped from and added to the population
Sample Evaluation
Three sample selection methods considered
– Keep the 2003 school sample with a supplemental sample of size 4
– Independent PPS with composite size measures based on updated frame information
– Keyfitz procedure
PPS sample selection procedure
Define size measure:
$$S_i = \sum_d \frac{m_d}{M_d} M_{id}$$
where
$m_d$ is the sample size of domain d,
$M_d$ is the population size of domain d,
$M_{id}$ is the population size of domain d in school i, and
domain d is constructed from a combination of: graduate year, degree level, field of major, race/ethnicity, and gender
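The composite size measure can be computed directly from frame counts. The sketch below evaluates S_i as the sum over domains d of (m_d / M_d) times M_id. The three schools, two domains, and target sample sizes are hypothetical values chosen only to keep the arithmetic easy to follow.

```python
def composite_size(school_counts, domain_pop, domain_sample):
    """S_i = sum over domains d of (m_d / M_d) * M_id for one school."""
    return sum(
        domain_sample[d] / domain_pop[d] * count
        for d, count in school_counts.items()
    )

# Hypothetical frame: graduates per school (M_id) in two domains.
frame = {
    "school_A": {"bachelor": 800, "master": 100},
    "school_B": {"bachelor": 150, "master": 300},
    "school_C": {"bachelor": 50, "master": 0},
}
domain_sample = {"bachelor": 100, "master": 80}  # m_d, target sample sizes

# M_d: domain population totals over all schools.
domain_pop = {
    d: sum(counts[d] for counts in frame.values()) for d in domain_sample
}

sizes = {
    school: composite_size(counts, domain_pop, domain_sample)
    for school, counts in frame.items()
}
for school, s in sizes.items():
    print(school, round(s, 2))
```

Each S_i is the number of graduates the school would contribute if every domain were sampled at its overall rate m_d / M_d, so the sizes add up to the total target sample size.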
PPS sample selection procedure (continued)
School i selected with probability (p_i) proportional to size S_i
Achieve equal weight within each domain d
Distributional changes of the NSRCG graduate populations would cause weight variation within domains
Independent PPS with up-to-date frame data is desirable if weight variation is severe
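To see why selecting schools with probability proportional to S_i can equalize weights within a domain, here is a minimal sketch under assumptions that are not stated on the slide: n schools are selected with probability p_i = n S_i / (sum of all S_j), no school is a certainty selection, and graduates of domain d in a selected school are subsampled at rate (m_d / M_d) / p_i. The school sizes and domain rates are hypothetical.

```python
def school_probs(sizes, n_schools):
    """First-stage inclusion probabilities p_i = n * S_i / sum of S_j."""
    total = sum(sizes.values())
    probs = {school: n_schools * s / total for school, s in sizes.items()}
    if max(probs.values()) > 1:
        raise ValueError("a school would need to be a certainty selection")
    return probs

def graduate_weight(domain_rate, p_school):
    """Overall weight = 1 / (p_i * within-school rate)."""
    within_rate = domain_rate / p_school  # assumed second-stage sampling rate
    if within_rate > 1:
        raise ValueError("school too small for the planned domain rate")
    return 1.0 / (p_school * within_rate)

# Hypothetical composite sizes S_i and overall domain rates m_d / M_d.
sizes = {"school_A": 100.0, "school_B": 75.0, "school_C": 40.0, "school_D": 60.0}
rates = {"bachelor": 0.10, "master": 0.20}
probs = school_probs(sizes, n_schools=2)

for domain, rate in rates.items():
    weights = {s: graduate_weight(rate, p) for s, p in probs.items()}
    print(domain, {s: round(w, 6) for s, w in weights.items()})
```

The school probability cancels, leaving a weight of M_d / m_d for every sampled graduate in domain d. When the size measures are out of date, the planned rates no longer match the current M_id, which is one way the weight variation mentioned on the slide can arise.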
Keyfitz procedure
Maximize the overlap between two samples
The first sample (2003 NSRCG) was selected with PPS
The second sample inclusion probability is dependent upon:
– updated size measures
– the first sample inclusion probability
– the actual sample realization in the first sample
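The sketch below illustrates the classical Keyfitz (1951) rule in its simplest setting, one selected unit per stratum. How the procedure was adapted to a PPS sample of 300 schools is not described in these slides, so this is only an illustration of the idea. The function names and the three-unit example are hypothetical. A previously selected unit is kept for certain if its probability did not decrease, is kept with probability q/p if it did, and is otherwise replaced by a unit whose probability increased.

```python
import random

def keyfitz_reselect(old_p, new_p, old_pick, rng):
    """Reselect one unit so that unit j ends up chosen with probability new_p[j].

    old_p, new_p: dicts of selection probabilities (each sums to 1).
    old_pick: the unit selected in the first sample.
    """
    p, q = old_p[old_pick], new_p.get(old_pick, 0.0)
    # Retain the old unit for sure if its probability did not decrease,
    # otherwise retain it with probability q / p.
    if q >= p or rng.random() < q / p:
        return old_pick
    # Replace it by a unit whose probability increased, drawn with
    # probability proportional to the size of the increase.
    gainers = [u for u in new_p if new_p[u] > old_p.get(u, 0.0)]
    gains = [new_p[u] - old_p.get(u, 0.0) for u in gainers]
    return rng.choices(gainers, weights=gains, k=1)[0]

def expected_overlap(old_p, new_p):
    """Probability that the old unit is retained: sum of min(p_i, q_i)."""
    return sum(min(p, new_p.get(u, 0.0)) for u, p in old_p.items())

old_p = {"A": 0.5, "B": 0.3, "C": 0.2}
new_p = {"A": 0.4, "B": 0.3, "C": 0.3}
rng = random.Random(1951)

# Empirical check: draw the first sample with old_p, then reselect.
draws = 20000
counts = {u: 0 for u in new_p}
for _ in range(draws):
    first = rng.choices(list(old_p), weights=list(old_p.values()), k=1)[0]
    counts[keyfitz_reselect(old_p, new_p, first, rng)] += 1

print({u: round(c / draws, 3) for u, c in counts.items()})
print("expected overlap:", expected_overlap(old_p, new_p))
```

The simulated frequencies should be close to the updated probabilities, while the old unit is retained far more often than it would be under an independent redraw.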
Simulation of sampling procedures
Generate 1,000 “independent” school samples for each of the following options
– Keep the same school sample with a supplemental sample of size 4 from the newly eligible schools (“births”)
– Independent PPS sampling using measures of size (MOS) calculated from the 2008 NSRCG frame
– Keyfitz procedure
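The slides do not say which statistic was used to compare the simulated samples across options. One common summary of weight variation within a domain is Kish's approximate design effect, n times the sum of squared weights divided by the squared sum of weights. The sketch below averages it over simulated samples. The helper names and the weight values are hypothetical.

```python
def kish_deff(weights):
    """Kish's approximate design effect: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

def summarize(simulated_samples):
    """Average the design effect over simulated samples (lists of weights)."""
    deffs = [kish_deff(w) for w in simulated_samples]
    return sum(deffs) / len(deffs)

# Hypothetical within-domain weights from two simulated samples per option.
equal = [[10.0] * 8, [10.0] * 8]
uneven = [[5.0, 5.0, 10.0, 20.0], [8.0, 8.0, 12.0, 12.0]]

print(summarize(equal))
print(summarize(uneven))
```

A value of 1 means equal weights. Larger values indicate the loss of precision that comes from weight variation, so an option whose simulated samples average closer to 1 is the more efficient one on this measure.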
Summary
Keeping the same sample is a cost-effective option
Concern about statistical inefficiency due to the dynamic nature of the population
Frame coverage corrected by supplemental sample
Evaluate the NSRCG school sample
– Empirical frame evaluation
– Samples simulated based on two methods
Distribution changes (in terms of composite size measure) would make the final sample inefficient:
– Weight variation within planned domains
– Over- or underestimation of graduates in some domains
Recommendation
Keep the same school sample with supplemental sample of size 4 for 2008 NSRCG