Sample Design on Historical Census Projects at the University of Minnesota
Ron Goeken
Completed historical samples

| Census Year | Target Population | Sample Density | Number of Person Records |
|---|---|---|---|
| 1850 | U.S. Free Population | 1-in-100 | 198,000 |
| 1860 | U.S. Free Population | 1-in-100 | 274,000 |
| 1870 | U.S. Population | 1-in-100 | 383,000 |
| 1880 | U.S. Population | 1-in-100 | 503,000 |
| 1900 | U.S. Population | 1-in-100 | 755,000 |
| 1910 | U.S. Population | 1-in-100 | 917,000 |
| 1920 | U.S. Population | 1-in-100 | 1,050,000 |
Sample design issues
• Goal is to sample entire households (or dwellings)
• Every individual has an equal probability of being sampled
• Practicality
Basic sample design
• Every household is defined as a cluster
• Samples are also stratified
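• A household is included in the sample only if its first-listed person falls on a sample line.
• Probability of selection for a person in a household of n persons, where p is the sampling rate for lines: n × p × (1/n) = p
  3-person household: 3 × .01 × (1/3) = .01
  8-person household: 8 × .01 × (1/8) = .01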
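The equal-probability claim can be checked by brute force. The sketch below is an illustration only: it assumes sample lines are drawn systematically (every 100th line from some starting offset), which the slides do not state, and the household sizes are made up. It counts, for every person, how many of the 100 possible starting offsets would place that person in the sample.

```python
def sampled_people(household_sizes, interval, start):
    """Return the line numbers of everyone selected when every
    `interval`-th line, beginning at offset `start`, is a sample line.
    A household is taken whole only if its first person is on a sample line."""
    selected = []
    line = 0
    for size in household_sizes:
        first_line = line
        if first_line % interval == start:
            selected.extend(range(first_line, first_line + size))
        line += size
    return selected


# Hypothetical enumeration: households of mixed sizes, listed consecutively.
household_sizes = [3, 8, 1, 5, 2, 12, 4, 6, 7, 3] * 50
total_people = sum(household_sizes)
interval = 100  # 1-in-100 sample

# Count how many starting offsets select each person.
counts = [0] * total_people
for start in range(interval):
    for person in sampled_people(household_sizes, interval, start):
        counts[person] += 1

# Every person is selected under exactly one of the 100 offsets,
# so a random offset gives each person probability 1/100.
assert all(c == 1 for c in counts)
```

Household size drops out because the larger chance of a big household touching a sample line is exactly offset by the requirement that the sample line be its first member.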
Group Quarters
• Not practical to sample large institutions in their entirety
• A better approach is to apply individual level sampling when household size exceeds a predetermined threshold.
Sampling rules – dwellings/households
• 1. If the dwelling contains 30 or fewer residents:
  – a) accept the entire dwelling if the sample point falls on the first listed individual in the dwelling.
  – b) reject the entire dwelling if the sample point falls on any other dwelling resident.
• 2. If the dwelling contains 31 or more residents and the household contains 30 or fewer persons:
  – a) accept the entire household if the sample point falls on the household head.
  – b) reject the entire household if the sample point falls on any other household member.
Sampling rules – group quarters
• 3. If the household contains 31 or more persons:
  – accept individuals on sample lines.
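Rules 1 to 3 can be written as one selection routine. This is a minimal sketch, not the project's actual software: the `Person` record, its field names, and `select_sample` are hypothetical, household IDs are assumed to be unique across dwellings, and the household head is assumed to be the first listed person in the household.

```python
from collections import Counter
from dataclasses import dataclass

THRESHOLD = 30  # from the slides: "30 or fewer" versus "31 or more"


@dataclass(frozen=True)
class Person:
    line: int          # position in the enumeration (hypothetical field)
    dwelling_id: int
    household_id: int  # assumed unique across dwellings


def select_sample(people, sample_lines):
    """Return the line numbers of sampled persons under rules 1 to 3."""
    ordered = sorted(people, key=lambda p: p.line)
    dwelling_size = Counter(p.dwelling_id for p in ordered)
    household_size = Counter(p.household_id for p in ordered)

    # First listed line in each dwelling and household
    # (assumption: the household head is listed first).
    first_in_dwelling, first_in_household = {}, {}
    for p in ordered:
        first_in_dwelling.setdefault(p.dwelling_id, p.line)
        first_in_household.setdefault(p.household_id, p.line)

    dwellings, households, individuals = set(), set(), set()
    for p in ordered:
        if p.line not in sample_lines:
            continue
        if dwelling_size[p.dwelling_id] <= THRESHOLD:
            # Rule 1: small dwelling, taken whole via its first person.
            if p.line == first_in_dwelling[p.dwelling_id]:
                dwellings.add(p.dwelling_id)
        elif household_size[p.household_id] <= THRESHOLD:
            # Rule 2: large dwelling, small household, taken via its head.
            if p.line == first_in_household[p.household_id]:
                households.add(p.household_id)
        else:
            # Rule 3: group quarters, individual-level sampling.
            individuals.add(p.line)

    return [
        p.line for p in ordered
        if p.dwelling_id in dwellings
        or p.household_id in households
        or p.line in individuals
    ]


# Made-up example: a 3-person dwelling, then a 40-person dwelling holding
# a 5-person household and a 35-person unit treated as group quarters.
people = (
    [Person(i, 1, 1) for i in range(3)]
    + [Person(i, 2, 2) for i in range(3, 8)]
    + [Person(i, 2, 3) for i in range(8, 43)]
)
print(select_sample(people, {0, 4, 8, 20}))
```

In this example line 0 brings in the whole first dwelling, line 4 is rejected because it is not the head of its household, and lines 8 and 20 are taken as individuals.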
Target and Actual Sample Densities for Completed Historical Samples

| Census Year | Number of Person Records | Target Sample Density | Actual Sample Density |
|---|---|---|---|
| 1850 | 198,000 | 1% | 0.989% |
| 1860 | 274,000 | 1% | 0.997% |
| 1870 | 383,000 | 1% | 0.998% |
| 1880 | 503,000 | 1% | 1.003% |
| 1900 | 755,000 | 1% | 0.993% |
| 1910 | 917,000 | 1% | 0.994% |
| 1920 | 1,050,000 | 1% | 0.992% |
Sample Confidence Interval
• Estimating the number of sample clusters in the total population
• (# of sampled person records) / (total population) = (# of sample clusters) / (total # of clusters)
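Solving that proportion for the unknown gives the estimated number of clusters in the full population. The sketch below is only the arithmetic; the function name and the input figures are hypothetical and are not taken from any of the samples.

```python
def estimate_total_clusters(sampled_persons, sample_clusters, total_population):
    """Solve sampled_persons / total_population
           = sample_clusters / total_clusters
    for total_clusters."""
    return sample_clusters * total_population / sampled_persons


# Hypothetical figures chosen for round numbers.
total_clusters = estimate_total_clusters(
    sampled_persons=500_000,
    sample_clusters=100_000,
    total_population=50_000_000,
)
print(total_clusters)
```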
Sources of under-sampling
• Some enumerator manuscripts were never microfilmed
• Data entry error
• Processing procedures can lead to deleting records, but rarely adding records
• Ambiguity on enumerator manuscripts
Percent of Target Records by Size of Place and Census Year

| Population category | 1880 | 1900 | 1920 |
|---|---|---|---|
| 100K+ | 99.0 | 98.9 | 98.7 |
| 50K – 99999 | 97.3 | 96.3 | 100.4 |
| 25K – 49999 | 100.5 | 97.8 | 99.0 |
| 10K – 24999 | 101.4 | 98.6 | 97.5 |
| 5K – 9999 | 93.7 | 100.1 | 99.6 |
| 2.5K – 4999 | 97.6 | 99.3 | 99.3 |
| Under 2.5K | 100.0 | 98.7 | 99.6 |
Assigning Weights
• Each sampled individual represents X number of individuals in the total population.
• We have typically assigned weights at the national level.
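At the national level, X is the total population divided by the number of sampled records, which is the reciprocal of the actual sample density. As a rough illustration (not necessarily the exact procedure used on the projects), the sketch below derives such weights from the actual densities reported earlier in the talk; the function and variable names are hypothetical.

```python
# Actual sample densities, in percent, as reported in the talk.
actual_density_percent = {
    1850: 0.989,
    1860: 0.997,
    1870: 0.998,
    1880: 1.003,
    1900: 0.993,
    1910: 0.994,
    1920: 0.992,
}


def national_weight(density_percent):
    """Number of people each sampled person represents: 1 / density."""
    return 100.0 / density_percent


weights = {
    year: round(national_weight(density), 2)
    for year, density in actual_density_percent.items()
}

for year, weight in weights.items():
    print(year, weight)
```

A perfectly realized 1-in-100 sample would give a weight of exactly 100; the slightly under-sampled years come out a little above that, and 1880 a little below.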
County Level or SEA Level Weights
• 1. Weight at the county level if the county population exceeds 10,000 and:
  – A. all other counties in the SEA have populations exceeding 10,000, or
  – B. the combined population of the counties with populations under 10,000 is 10,000 or more.
• 2. If the conditions above are not met, weight at the SEA level.
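The decision rule can be expressed as a small function. This is a sketch under stated assumptions: the function name, the dictionary input, and the county names are hypothetical, and the thresholds are applied literally as worded ("exceeds 10,000", "under 10,000"), so a county of exactly 10,000 counts as neither large nor small.

```python
THRESHOLD = 10_000


def weighting_level(county, sea_county_populations):
    """Return "county" or "SEA" for one county.

    `sea_county_populations` maps every county in the SEA (including
    `county` itself) to its population.
    """
    if sea_county_populations[county] <= THRESHOLD:
        return "SEA"

    others = [
        pop for name, pop in sea_county_populations.items() if name != county
    ]
    # Condition A: every other county in the SEA exceeds the threshold.
    all_others_large = all(pop > THRESHOLD for pop in others)
    # Condition B: the small counties together reach the threshold.
    small_total = sum(pop for pop in others if pop < THRESHOLD)

    if all_others_large or small_total >= THRESHOLD:
        return "county"
    return "SEA"


# Made-up SEAs for illustration.
sea_one = {"Adams": 50_000, "Baker": 20_000, "Clark": 4_000}
sea_two = {"Adams": 50_000, "Baker": 6_000, "Clark": 5_000}

print(weighting_level("Adams", sea_one))  # small counties total only 4,000
print(weighting_level("Adams", sea_two))  # small counties total 11,000
```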
Conclusion
• Sample designs are fairly straightforward in theory, but source materials and procedures result in under-sampling bias
• Detailed weights based on county populations or SEA populations theoretically improve precision