of an experiment in administrative records use a register ...3 what was the purpose of arex 2000?...

30
1 Merging Administrative Records in the Absence of a Register: Data Quality Concerns and Outcomes of an Experiment in Administrative Records Use Dean H. Judson Planning, Research and Evaluation Division

Upload: others

Post on 25-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

1

Merging Administrative Records in the Absence of a Register: Data Quality Concerns and Outcomes of an Experiment in Administrative Records Use

Dean H. Judson

Planning, Research and Evaluation Division

Page 2: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

2

Basic Conclusions and Recommendations

• An administrative records census is not currently feasible, but might be for 2020

• Numerous applications of administrative records research for 2010:• Nonresponse Follow-up (NRFU) substitution• Imputation methods improvement• Master Address File (MAF) targeting• Census unduplication confirmation• Large scale data processing and record linkage improvements• Social Security Number (SSN) verification and search• Population estimation, survey improvement

• Four major improvements needed for an Administrative Records Experiment in 2010 (AREX 2010):

• Race and Hispanic origin improvements (underway)• Timeliness of AR data (harder, but doable)• Greater geographic representativeness • Experimental variation on key design dimensions

Page 3: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

3

What Was the Purpose of AREX 2000?

• Test the feasibility of an administrative records census• two counties in Maryland

• 1.4M persons in 558,000 households• three counties in Colorado

• 1.2M persons in 459,000 households

• Test two methods for conducting an administrative records census • top-down method• bottom-up method (match to address list,

additional operations)

Page 4: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

4

The Foundation of AREX: The Statistical Administrative Records System (StARS)

• Prototype based on 1998 vintage files• Census-like structure and content • Final database comparable Census

2000(Flowchart page A)

Page 5: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

5

The Statistical Administrative Records System-1999

TY98 IRS 1040119,946,193

TY98 IRS 1099598,075,971

Medicare56,837,022

Selective Service13,176,234

HUD TRACS3,342,234

Indian Health Service

3,106,821

EditedIRS 1040

243,260,776Edited

IRS 1099Edited

MedicareEdited

Selective Service

EditedHUD TRACS

EditedIndian Health

Service

NUMIDENT676,589,439

CensusNUMIDENT396,185,872

Address Processing795,742,702

Person Characteristics

File (PCF)396,185,872

Hygiene & Unduplication136,154,293

Geocoding102,965,122 (75.6% Coded)33,189,171 (24.4% Uncoded)

Person Processing875,750,973

SSN Validation (PVS)844,945,296 Valid

(96.5%)

Unduplication279,601,038

Remove Deceased/Create

Composite Record257,764,909

Extraction of AREX Test Site Records1,459,760 in Baltimore Site1,229,274 in Colorado Site

InvalidSSNs

30,805,677(3.5%)

RaceModel

GenderModel

MortalityModel

TIGER

Code 1

ABI

? Research

Page A

Page 6: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

6

Administrative Source Files (StARS)

• Internal Revenue Service tax files• Medicare enrollment database• Public housing assistance file• Selective Service registration file• Indian Health Service file• Social Security Number master file

(NUMIDENT--lookup file)(Refer to Tables, pages B,C)

Page 7: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

7

Creating Final StARS Database

• Select best address and demographics based on• geocodability• currency• quality

• Impute missing demographics• Flag records for deceased people• Final database is like the census

Page 8: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

8

Address Processing Results (StARS)

• Almost 800 million addresses at start• About 6 percent identified as potential

businesses• 136 million address records after

unduplication• About 75 percent geocoded

• 85 percent geocoding rate for city-style addresses

Page 9: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

9

Person Processing Results (StARS)

• 875 million records at start• 845 million have valid SSN record (96.5%)• 280 million after unduplication by SSN• 261 million after removal of known deceased• 257 million after removal of known deceased

and persons residing in outlying territories

(vs. Census 2000 population of 281 million)

Page 10: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

10

Additional Operations of AREX 2000

• Clerical geocoding• Request for physical address (for P.O.

Boxes, Etc.)• Match to Decennial Master Address File• Field address verification

(refer to AREX flowchart, page D)

Page 11: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

11

County Results: Total Population

97-15,70491-45,831519,326Jefferson County

99-7,28091-44,642501,533El Paso County

97-5,66085-27,030175,300Douglas County

10211,32891-54,753625,401Baltimore City

99-8,44795-40,469736,652Baltimore County

99-25,76392-212,7252,558,212Total

%A-C Diff%A-C Diff

Bottom-UpTop-DownCensus 2000

Note: A=AREX count; C=Census count

Page 12: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

12

County Results: Race and Hispanic Algebraic Percent Error (A-C)/C

Size of minority population impacted ALPE results.

-35%

-25%

-15%

-5%

5%

15%

25%

35%

BaltimoreCounty

BaltimoreCity

DouglasCounty

El PasoCounty

JeffersonCounty

Bottom-up race and Hispanic results by county

ALP

E

WhiteBlackAIAPIHispanic

Page 13: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

13

County Results: AREX Race Imputation

Arex race imputation greater in CO counties, especially blacks.

0%5%

10%15%20%25%

30%35%

BaltimoreCounty

BaltimoreCity

DouglasCounty

El PasoCounty

JeffersonCounty

Persons with imputed race by county

Perc

en

t o

f p

ers

on

s

Any raceBlacks

Cause: Very high imputation rates

Page 14: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

14

Additional County Results

• Bottom-up performed better than top-down• Children undercounted• Aged overcounted• Evidence of imputation problems

Page 15: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

15

Tracts and Blocks

• MD site: 404 tracts, 17,041 blocks• CO site: 283 tracts, 22,945 blocks

…• Emphasis on ALPE { (A-C)/C } distributions• 5% criterion: AREX is +/-5% of Census• 25% criterion: AREX is +/-25% of Census

Page 16: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

16

Tract results: Total Population

>75% of tracts met 5% criterion, except Baltimore City, 95% met 25% criterion.

0%10%20%30%40%50%60%70%80%90%100%

BaltimoreCounty

BaltimoreCity

DouglasCounty

El PasoCounty

JeffersonCounty

Tract ALPE distributions by county

Per

cen

t o

f tr

acts

5% criterion25% criterion

ALPEs

Results are more stable in Colorado;Highly variable in Baltimore County and City

~50% of AREX tract counts within 5% of Census in Baltimore City

Page 17: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

17

Block Results: Total Population

18-39% of blocks met 5% criterion, 85% met 25% criterion.

0%10%20%30%40%50%60%70%80%90%100%

BaltimoreCounty

BaltimoreCity

DouglasCounty

El PasoCounty

JeffersonCounty

Block ALPE distributions by county

Perc

en

t o

f b

lock

s

5% criterion25% criterion

ALPEs

Results notably less stable at the block level

~25% of AREX block counts within 5% of Census in Baltimore City

Page 18: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

18

Household Evaluation Questions

• Can we use administrative records data to contribute to census operations at the household level?

• Analytic questions:• Do AREX addresses (computer) link to different kinds of Census

addresses at varying rates?• Do AREX addresses that (computer) link to Census addresses

contain the same number of people?• Do AREX addresses that link to Census addresses contain similar

demographic distributions (demographically match)?• Can we predict, using non-Census information, when an address

will demographically match?

Page 19: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

19

Link Rates

• How often did AREX addresses link to Census addresses?• 81.4% of census addresses linked

• ~ 5% more “imperfectly” linked• higher (84.0%) in occupied• lower (46.4%) in vacant

Page 20: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

20

Link Rates and Coverage:By NRFU Status

Coverage by AREX of Census housing units, by NRFU status*

Type of Census housing unit Total

Linked with AREX

housing unitsNRFU 360,914 70.9%non-NRFU 716,450 88.4%Occupied NRFU 289,224 76.7%Occupied non-NRFU 715,115 88.5%Vacant NRFU 71,690 47.6%Vacant non-NRFU 1,335 58.7%

Page 21: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

21

Link Rates and Coverage:By Imputation Status

Coverage by AREX of Census housing units, by imputation status

Type of Census housing unit

Total

Linked with

AREX housing units

Imputed 24,584 62.3% Non-imputed 1,067,876 81.9% Imputed occupied 23,811 63.2% Non-imputed, occupied 993,462 84.5% Imputed vacant 773 34.7% Non-imputed, vacant 74,414 46.5%

Page 22: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

22

Number of People in Linked Addresses

• Do AREX addresses that link to Census addresses contain the same number of people?• Equal number: 51.1% of all linked addresses• Plus/minus one: 79.4% of all linked addresses

Page 23: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

23

Matching Demographic Distributions in Linked Households

• How do the demographic properties of linked households compare?

• Demographic categories:• Sex• Race (4 groups, with Census multirace allocated)• Hispanic origin• Age (5 year categories and 0-17, 18-64, 65+)• ARSH—age, race, sex and Hispanic origin

Page 24: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

24

Demographic Distributions:Overall

Comparisons between AREX and Census for demographic groups, for linked households with the same number of people only.

HH Size

Total linked,

of equal size

Equal for all sex groups

Equal for all race groups

Equal for all Hisp. groups

Equal for all

5-year age groups

Equal for age groups

0-17, 18-64, 65+

Equal for all demographic

groups

All sizes 445,426 91.2% 93.4% 94.8% 81.3% 93.1% 80.5%

1 85.4%

2 84.3%

3 72.2%

4 74.0%

5 69.5%

6 59.2%

7+ 28.7%

Page 25: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

25

Prediction

• Predicting Where An Arex Household Will Be Similar To A Census Household• Goal: Use only information available prior to

Census MO/MB and NRFU operations to predict which addresses will match accurately

• If we can predict well, then we can use that predictive model to tell us which addresses are good candidates for NRFU substitution or imputation

• Exploratory logistic regression model

Page 26: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

26

Prediction:Simple Relationships

Difference

of

proportions

Address contains only persons 65 and older versus demographicmatch/nonmatch status.

All AREX persons age 65 or older? No Yes Total Nonmatch 513,926 33,418 547,344 66.56 28.43 Match 258,150 84,144 342,294 33.44 71.57 Total 772,076 117,562 889,638 86.79 13.21 100

Page 27: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

27

Logistic RegressionModel Results

• Estimated odds ratios• Single unit: 2.6• One or two persons in HH: 3.5 • No AREX imputed race: 2.1• AREX one or more white: 2.1 • All AREX 65 and older: 1.7

• Interaction effects:• Total effect of 65+,nonmulti,nonimputed: 5.2• Total effect of 65+,1+white, 1-2 persons: 19.2

Page 28: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

28

Conclusions, continued

• Top-down results are not sufficient for enumeration at the block level

• Bottom-up results were better• Tract, block results differential and predictably less accurate• AREX covered the universe of Census HUs well• Comparisons of household size and demographic

composition were relatively promising• AREX and Census household level characteristics were less

similar for NRFU HUs• Open questions: Role of vacant addresses, quality of NRFU data.

Page 29: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

29

Recommendations• Continue development now

• Build/train team; build acquisition relationships• Improve race and Hispanic origin data

• Underway—NUMIDENT race enhancement• Continue to update NUMIDENT with ACS

(others?)• Improve timeliness of AR data

• Obtain IRS/other agencies’ data on a flow basis• SSA W-2’s• Birth/death data• Additional data sources

Page 30: of an Experiment in Administrative Records Use a Register ...3 What Was the Purpose of AREX 2000? • Test the feasibility of an administrative records census • two counties in Maryland

30

Recommendations• Improve geographical representativeness

of AREX 2010• Implement experimental variation on key

processing dimensions