of an experiment in administrative records use a register ...3 what was the purpose of arex 2000?...

Post on 25-Sep-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1

Merging Administrative Records in the Absence of a Register: Data Quality Concerns and Outcomes of an Experiment in Administrative Records Use

Dean H. Judson

Planning, Research and Evaluation Division

2

Basic Conclusions and Recommendations

• An administrative records census is not currently feasible, but might be for 2020

• Numerous applications of administrative records research for 2010:• Nonresponse Follow-up (NRFU) substitution• Imputation methods improvement• Master Address File (MAF) targeting• Census unduplication confirmation• Large scale data processing and record linkage improvements• Social Security Number (SSN) verification and search• Population estimation, survey improvement

• Four major improvements needed for an Administrative Records Experiment in 2010 (AREX 2010):

• Race and Hispanic origin improvements (underway)• Timeliness of AR data (harder, but doable)• Greater geographic representativeness • Experimental variation on key design dimensions

3

What Was the Purpose of AREX 2000?

• Test the feasibility of an administrative records census• two counties in Maryland

• 1.4M persons in 558,000 households• three counties in Colorado

• 1.2M persons in 459,000 households

• Test two methods for conducting an administrative records census • top-down method• bottom-up method (match to address list,

additional operations)

4

The Foundation of AREX: The Statistical Administrative Records System (StARS)

• Prototype based on 1998 vintage files• Census-like structure and content • Final database comparable Census

2000(Flowchart page A)

5

The Statistical Administrative Records System-1999

TY98 IRS 1040119,946,193

TY98 IRS 1099598,075,971

Medicare56,837,022

Selective Service13,176,234

HUD TRACS3,342,234

Indian Health Service

3,106,821

EditedIRS 1040

243,260,776Edited

IRS 1099Edited

MedicareEdited

Selective Service

EditedHUD TRACS

EditedIndian Health

Service

NUMIDENT676,589,439

CensusNUMIDENT396,185,872

Address Processing795,742,702

Person Characteristics

File (PCF)396,185,872

Hygiene & Unduplication136,154,293

Geocoding102,965,122 (75.6% Coded)33,189,171 (24.4% Uncoded)

Person Processing875,750,973

SSN Validation (PVS)844,945,296 Valid

(96.5%)

Unduplication279,601,038

Remove Deceased/Create

Composite Record257,764,909

Extraction of AREX Test Site Records1,459,760 in Baltimore Site1,229,274 in Colorado Site

InvalidSSNs

30,805,677(3.5%)

RaceModel

GenderModel

MortalityModel

TIGER

Code 1

ABI

? Research

Page A

6

Administrative Source Files (StARS)

• Internal Revenue Service tax files• Medicare enrollment database• Public housing assistance file• Selective Service registration file• Indian Health Service file• Social Security Number master file

(NUMIDENT--lookup file)(Refer to Tables, pages B,C)

7

Creating Final StARS Database

• Select best address and demographics based on• geocodability• currency• quality

• Impute missing demographics• Flag records for deceased people• Final database is like the census

8

Address Processing Results (StARS)

• Almost 800 million addresses at start• About 6 percent identified as potential

businesses• 136 million address records after

unduplication• About 75 percent geocoded

• 85 percent geocoding rate for city-style addresses

9

Person Processing Results (StARS)

• 875 million records at start• 845 million have valid SSN record (96.5%)• 280 million after unduplication by SSN• 261 million after removal of known deceased• 257 million after removal of known deceased

and persons residing in outlying territories

(vs. Census 2000 population of 281 million)

10

Additional Operations of AREX 2000

• Clerical geocoding• Request for physical address (for P.O.

Boxes, Etc.)• Match to Decennial Master Address File• Field address verification

(refer to AREX flowchart, page D)

11

County Results: Total Population

97-15,70491-45,831519,326Jefferson County

99-7,28091-44,642501,533El Paso County

97-5,66085-27,030175,300Douglas County

10211,32891-54,753625,401Baltimore City

99-8,44795-40,469736,652Baltimore County

99-25,76392-212,7252,558,212Total

%A-C Diff%A-C Diff

Bottom-UpTop-DownCensus 2000

Note: A=AREX count; C=Census count

12

County Results: Race and Hispanic Algebraic Percent Error (A-C)/C

Size of minority population impacted ALPE results.

-35%

-25%

-15%

-5%

5%

15%

25%

35%

BaltimoreCounty

BaltimoreCity

DouglasCounty

El PasoCounty

JeffersonCounty

Bottom-up race and Hispanic results by county

ALP

E

WhiteBlackAIAPIHispanic

13

County Results: AREX Race Imputation

Arex race imputation greater in CO counties, especially blacks.

0%5%

10%15%20%25%

30%35%

BaltimoreCounty

BaltimoreCity

DouglasCounty

El PasoCounty

JeffersonCounty

Persons with imputed race by county

Perc

en

t o

f p

ers

on

s

Any raceBlacks

Cause: Very high imputation rates

14

Additional County Results

• Bottom-up performed better than top-down• Children undercounted• Aged overcounted• Evidence of imputation problems

15

Tracts and Blocks

• MD site: 404 tracts, 17,041 blocks• CO site: 283 tracts, 22,945 blocks

…• Emphasis on ALPE { (A-C)/C } distributions• 5% criterion: AREX is +/-5% of Census• 25% criterion: AREX is +/-25% of Census

16

Tract results: Total Population

>75% of tracts met 5% criterion, except Baltimore City, 95% met 25% criterion.

0%10%20%30%40%50%60%70%80%90%100%

BaltimoreCounty

BaltimoreCity

DouglasCounty

El PasoCounty

JeffersonCounty

Tract ALPE distributions by county

Per

cen

t o

f tr

acts

5% criterion25% criterion

ALPEs

Results are more stable in Colorado;Highly variable in Baltimore County and City

~50% of AREX tract counts within 5% of Census in Baltimore City

17

Block Results: Total Population

18-39% of blocks met 5% criterion, 85% met 25% criterion.

0%10%20%30%40%50%60%70%80%90%100%

BaltimoreCounty

BaltimoreCity

DouglasCounty

El PasoCounty

JeffersonCounty

Block ALPE distributions by county

Perc

en

t o

f b

lock

s

5% criterion25% criterion

ALPEs

Results notably less stable at the block level

~25% of AREX block counts within 5% of Census in Baltimore City

18

Household Evaluation Questions

• Can we use administrative records data to contribute to census operations at the household level?

• Analytic questions:• Do AREX addresses (computer) link to different kinds of Census

addresses at varying rates?• Do AREX addresses that (computer) link to Census addresses

contain the same number of people?• Do AREX addresses that link to Census addresses contain similar

demographic distributions (demographically match)?• Can we predict, using non-Census information, when an address

will demographically match?

19

Link Rates

• How often did AREX addresses link to Census addresses?• 81.4% of census addresses linked

• ~ 5% more “imperfectly” linked• higher (84.0%) in occupied• lower (46.4%) in vacant

20

Link Rates and Coverage:By NRFU Status

Coverage by AREX of Census housing units, by NRFU status*

Type of Census housing unit Total

Linked with AREX

housing unitsNRFU 360,914 70.9%non-NRFU 716,450 88.4%Occupied NRFU 289,224 76.7%Occupied non-NRFU 715,115 88.5%Vacant NRFU 71,690 47.6%Vacant non-NRFU 1,335 58.7%

21

Link Rates and Coverage:By Imputation Status

Coverage by AREX of Census housing units, by imputation status

Type of Census housing unit

Total

Linked with

AREX housing units

Imputed 24,584 62.3% Non-imputed 1,067,876 81.9% Imputed occupied 23,811 63.2% Non-imputed, occupied 993,462 84.5% Imputed vacant 773 34.7% Non-imputed, vacant 74,414 46.5%

22

Number of People in Linked Addresses

• Do AREX addresses that link to Census addresses contain the same number of people?• Equal number: 51.1% of all linked addresses• Plus/minus one: 79.4% of all linked addresses

23

Matching Demographic Distributions in Linked Households

• How do the demographic properties of linked households compare?

• Demographic categories:• Sex• Race (4 groups, with Census multirace allocated)• Hispanic origin• Age (5 year categories and 0-17, 18-64, 65+)• ARSH—age, race, sex and Hispanic origin

24

Demographic Distributions:Overall

Comparisons between AREX and Census for demographic groups, for linked households with the same number of people only.

HH Size

Total linked,

of equal size

Equal for all sex groups

Equal for all race groups

Equal for all Hisp. groups

Equal for all

5-year age groups

Equal for age groups

0-17, 18-64, 65+

Equal for all demographic

groups

All sizes 445,426 91.2% 93.4% 94.8% 81.3% 93.1% 80.5%

1 85.4%

2 84.3%

3 72.2%

4 74.0%

5 69.5%

6 59.2%

7+ 28.7%

25

Prediction

• Predicting Where An Arex Household Will Be Similar To A Census Household• Goal: Use only information available prior to

Census MO/MB and NRFU operations to predict which addresses will match accurately

• If we can predict well, then we can use that predictive model to tell us which addresses are good candidates for NRFU substitution or imputation

• Exploratory logistic regression model

26

Prediction:Simple Relationships

Difference

of

proportions

Address contains only persons 65 and older versus demographicmatch/nonmatch status.

All AREX persons age 65 or older? No Yes Total Nonmatch 513,926 33,418 547,344 66.56 28.43 Match 258,150 84,144 342,294 33.44 71.57 Total 772,076 117,562 889,638 86.79 13.21 100

27

Logistic RegressionModel Results

• Estimated odds ratios• Single unit: 2.6• One or two persons in HH: 3.5 • No AREX imputed race: 2.1• AREX one or more white: 2.1 • All AREX 65 and older: 1.7

• Interaction effects:• Total effect of 65+,nonmulti,nonimputed: 5.2• Total effect of 65+,1+white, 1-2 persons: 19.2

28

Conclusions, continued

• Top-down results are not sufficient for enumeration at the block level

• Bottom-up results were better• Tract, block results differential and predictably less accurate• AREX covered the universe of Census HUs well• Comparisons of household size and demographic

composition were relatively promising• AREX and Census household level characteristics were less

similar for NRFU HUs• Open questions: Role of vacant addresses, quality of NRFU data.

29

Recommendations• Continue development now

• Build/train team; build acquisition relationships• Improve race and Hispanic origin data

• Underway—NUMIDENT race enhancement• Continue to update NUMIDENT with ACS

(others?)• Improve timeliness of AR data

• Obtain IRS/other agencies’ data on a flow basis• SSA W-2’s• Birth/death data• Additional data sources

30

Recommendations• Improve geographical representativeness

of AREX 2010• Implement experimental variation on key

processing dimensions

top related