Combining Estimates from Related Surveys via Bivariate Models

(Application: using ACS estimates to improve estimates from smaller U.S. surveys)

William R. Bell and Carolina Franco, U.S. Census Bureau

2016 Ross-Royall Symposium

February 26, 2016

Bell & Franco () Combining estimates from related surveys February 26, 2016 1 / 17


Disclaimer:

This report is released to inform interested parties of ongoing research and to encourage discussion. The views expressed on statistical, methodological, technical, or operational issues are those of the author(s) and not necessarily those of the U.S. Census Bureau.


Introduction

Investigate the potential of using bivariate models to borrow strength from estimates from a large survey to improve related estimates from smaller surveys.

Motivation: the "large survey" is the Census Bureau's American Community Survey (ACS), the largest U.S. household survey.

The approach is simple and requires no covariates from auxiliary information.

Real examples show that large reductions in standard errors of estimates are possible.


ACS: The Largest U.S. Household Survey

American Community Survey (ACS)

Conducted annually (data collected throughout the year) and has replaced the decennial census long form sample.

Samples approximately 3.5 million addresses each year.

Encompasses a broad range of topics: demographic, income, health insurance, employment, disabilities, occupations, housing, education, veteran status, etc.

Produces estimates annually based on 1 or 5 years of data.


Three Smaller U.S. Surveys

Survey of Income and Program Participation (SIPP) Disability Module

Approx. 37,000 households and 70,000 persons in 2008 panel.
Detailed questions about many different aspects of disability.

National Health Interview Survey (NHIS)

About 110,000 persons in Family Core component, 2013.
Questions about a broad range of health topics asked in personal household interviews.
Estimates used to track health status, health care access, and progress toward achieving national health objectives.

Current Population Survey (CPS) Annual Social and Economic Supplement

Samples about 100,000 addresses.
Provides official national estimates of income and poverty.


Four Applications

1. SIPP estimates of U.S. state disability rates.
ACS variable: Estimate of state disability rates (types of disabilities and the time frames differ from SIPP).

2. NHIS estimates of U.S. state uninsured rates.
ACS variable: Estimate of U.S. state uninsured rates (questions asked and the mode of survey delivery and design differ from NHIS).

3. CPS estimates of per capita expenditure on health insurance premiums by state.
ACS variable: Estimated per capita income by state.

4. ACS 1-yr estimates (of anything! Take county rates of children in poverty to illustrate).
2nd variable: Corresponding previous ACS 5-yr estimates (larger sample size, but less current).


Univariate Gaussian Shrinkage Model for Survey Estimates

For m small areas:

$$y_i = Y_i + e_i, \qquad Y_i = \mu + u_i, \qquad i = 1, \ldots, m$$

$y_i$ is the direct survey estimate of $Y_i$, the population characteristic of interest for area $i$.

$e_i$ is the sampling error in $y_i$, generally assumed to be independent $N(0, v_i)$ with $v_i$ known.

$u_i$ is the area $i$ random effect, usually assumed to be i.i.d. $N(0, \sigma_u^2)$ and independent of the $e_i$.


Shrinkage Estimation (Stein 1956, Carter and Rolph 1974)

Best linear predictor of $Y_i$ ($\mu$ and $\sigma_u^2$ known):

$$\hat{Y}_i = (1 - \gamma_i)\, y_i + \gamma_i \mu, \qquad \text{where } \gamma_i = \frac{v_i}{v_i + \sigma_u^2}$$

The weighted average $\hat{Y}_i$ "shrinks" the direct estimate $y_i$ towards the overall mean $\mu$.

The smaller the sampling variance $v_i$, the more weight is placed on the direct survey estimate $y_i$.

Parameters unknown: estimate by ML or REML, or take a Bayesian approach.

Fay and Herriot (1979) extended the approach to shrink $y_i$ towards a regression mean $\mu_i = x_i' \beta$, and applied this approach to small area estimation.
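A minimal sketch of the univariate shrinkage predictor, assuming $\mu$ and $\sigma_u^2$ are known. The function name and the numeric inputs are hypothetical and chosen only for illustration. The returned MSE uses the identity $\mathrm{Var}(Y_i \mid y_i) = (1 - \gamma_i) v_i$, which follows from the Gaussian model.

```python
def shrinkage_estimate(y, v, mu, sigma2_u):
    """Best linear predictor of Y_i when mu and sigma2_u are known.

    y: direct estimate y_i; v: its sampling variance v_i.
    Returns (prediction, gamma_i, MSE), where MSE = (1 - gamma_i) * v_i.
    """
    gamma = v / (v + sigma2_u)             # weight placed on the overall mean
    pred = (1.0 - gamma) * y + gamma * mu
    mse = (1.0 - gamma) * v                # equals v * sigma2_u / (v + sigma2_u)
    return pred, gamma, mse


# Hypothetical inputs: the same direct estimate with two levels of precision.
mu, sigma2_u = 0.20, 0.0004
precise = shrinkage_estimate(y=0.30, v=0.0001, mu=mu, sigma2_u=sigma2_u)
noisy = shrinkage_estimate(y=0.30, v=0.0016, mu=mu, sigma2_u=sigma2_u)
print(precise)
print(noisy)
```

With these inputs the precise estimate is pulled only slightly towards $\mu$ ($\gamma_i = 0.2$), while the noisy one is pulled most of the way ($\gamma_i = 0.8$).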


Bivariate Gaussian Model

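$$y_{1i} = Y_{1i} + e_{1i} = (\mu_1 + u_{1i}) + e_{1i}, \qquad y_{2i} = Y_{2i} + e_{2i} = (\mu_2 + u_{2i}) + e_{2i}, \qquad i = 1, \ldots, m$$

$$\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$$

$$\begin{pmatrix} e_{1i} \\ e_{2i} \end{pmatrix} \overset{\text{ind.}}{\sim} N(0, V_i), \qquad V_i = \begin{pmatrix} v_{1i} & 0 \\ 0 & v_{2i} \end{pmatrix}$$

$y_{1i}$ is the direct estimate of the quantity of interest $Y_{1i}$, and $y_{2i}$ is the direct estimate from another survey of a related quantity $Y_{2i}$.

Note that $V_i$ assumes the sampling errors $e_{1i}$ and $e_{2i}$ are uncorrelated. This can be generalized.

The alternative of simply including $y_{2i}$ as a regression covariate in the model would ignore the sampling errors in the $y_{2i}$!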


Estimation/Inference for Model Parameters

Unknown parameters: $\mu_1$, $\mu_2$, $\sigma_{11}$, $\sigma_{22}$, and $\sigma_{12}$ or $\rho = \sigma_{12} / \sqrt{\sigma_{11} \sigma_{22}}$.

Sampling variances $v_{1i}$ and $v_{2i}$ are treated as known (really estimated using survey microdata).

Can estimate unknown parameters by ML or REML.

We shall use a Bayesian approach with flat priors on $\mu_1$, $\mu_2$, $\sigma_{11} > 0$, $\sigma_{22} > 0$, and $\rho \in (-1, 1)$.

Approach was implemented in JAGS.
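Integrating out the random effects gives $(y_{1i}, y_{2i})' \sim N(\mu, \Sigma + V_i)$ independently across areas, so with flat priors the posterior is proportional to this marginal likelihood on the allowed parameter range. The sketch below evaluates that log-likelihood in plain Python. It is not the authors' JAGS code; the function name and argument layout are assumptions made for illustration.

```python
import math


def log_likelihood(y1, y2, v1, v2, mu1, mu2, s11, s22, rho):
    """Marginal log-likelihood: (y1i, y2i) ~ N((mu1, mu2), Sigma + V_i).

    y1, y2, v1, v2 are equal-length sequences over areas; V_i = diag(v1i, v2i).
    """
    if s11 <= 0 or s22 <= 0 or not -1 < rho < 1:
        return -math.inf                           # outside the parameter space
    s12 = rho * math.sqrt(s11 * s22)
    total = 0.0
    for a, b, va, vb in zip(y1, y2, v1, v2):
        c11, c22, c12 = s11 + va, s22 + vb, s12    # entries of Sigma + V_i
        det = c11 * c22 - c12 * c12
        d1, d2 = a - mu1, b - mu2
        quad = (c22 * d1 * d1 - 2 * c12 * d1 * d2 + c11 * d2 * d2) / det
        total += -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return total
```

A function of this shape could be handed to a numerical optimizer for ML estimation or to a generic MCMC sampler for the Bayesian approach.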


Prediction When Model Parameters are Known

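In matrix notation, $y_i = Y_i + e_i = (\mu + u_i) + e_i$.

$$\hat{Y}_i^{BP} = E(Y_i \mid y_i) = \mu + \Sigma (\Sigma + V_i)^{-1} (y_i - \mu)$$

$$\mathrm{MSE}(\hat{Y}_i^{BP}) = \mathrm{Var}(Y_i \mid y_i) = \Sigma - \Sigma (\Sigma + V_i)^{-1} \Sigma$$

We are interested in predicting $Y_{1i}$ only, not $Y_{2i}$.

$\hat{Y}_{1i}^{BP}$ is a linear combination of $\mu_1$, $(y_{1i} - \mu_1)$, and $(y_{2i} - \mu_2)$.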
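For the 2 x 2 case with diagonal $V_i$, the first row of $\Sigma(\Sigma + V_i)^{-1}$ can be written out by hand, which makes the two weights explicit. The sketch below assumes all parameters are known; the function name and the numeric inputs are hypothetical.

```python
def bivariate_predict(y1, y2, v1, v2, mu1, mu2, s11, s22, s12):
    """Best predictor of Y_1i and its MSE when all parameters are known.

    Uses the first row of Sigma (Sigma + V_i)^{-1} with V_i = diag(v1, v2).
    """
    c11, c22 = s11 + v1, s22 + v2
    det = c11 * c22 - s12 * s12               # det(Sigma + V_i)
    w1 = (s11 * c22 - s12 * s12) / det        # weight on (y1 - mu1)
    w2 = s12 * v1 / det                       # weight on (y2 - mu2)
    pred = mu1 + w1 * (y1 - mu1) + w2 * (y2 - mu2)
    mse = s11 - (w1 * s11 + w2 * s12)         # (1,1) entry of the MSE matrix
    return pred, mse


# Hypothetical inputs: a noisy small-survey estimate and a precise large-survey one.
s11, s22 = 0.0004, 0.0003
s12 = 0.9 * (s11 * s22) ** 0.5                # rho = 0.9
pred, mse = bivariate_predict(y1=0.30, y2=0.18, v1=0.0009, v2=0.00001,
                              mu1=0.20, mu2=0.12, s11=s11, s22=s22, s12=s12)
print(pred, mse)
```

The weight on $(y_{2i} - \mu_2)$ is proportional to $\sigma_{12} v_{1i}$, so the second survey contributes nothing when $\sigma_{12} = 0$ or when the direct estimate has no sampling error.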


MSE % Reductions from Shrinkage Estimation

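Direct estimation to univariate shrinkage:

$$100 \times \left(1 - \frac{\mathrm{Var}(Y_{1i} \mid y_{1i})}{v_{1i}}\right)$$

(more reduction as $v_{1i}$ increases)

Univariate to bivariate shrinkage:

$$100 \times \left(1 - \frac{\mathrm{Var}(Y_{1i} \mid y_{1i}, y_{2i})}{\mathrm{Var}(Y_{1i} \mid y_{1i})}\right)$$

(more reduction as $v_{2i}$ decreases and as $\rho$ increases)

Direct estimation to bivariate shrinkage:

$$100 \times \left(1 - \frac{\mathrm{Var}(Y_{1i} \mid y_{1i}, y_{2i})}{v_{1i}}\right)$$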
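The three percentages can be computed in closed form when the parameters are known, using $\mathrm{Var}(Y_{1i} \mid y_{1i}) = \sigma_{11} v_{1i} / (\sigma_{11} + v_{1i})$ and the (1,1) entry of $\Sigma - \Sigma(\Sigma + V_i)^{-1}\Sigma$. The sketch below is illustrative only: the function name and inputs are hypothetical, and it ignores the extra uncertainty from estimating the parameters.

```python
def mse_reductions(v1, v2, s11, s22, rho):
    """Percent MSE reductions for one area, treating all parameters as known."""
    s12 = rho * (s11 * s22) ** 0.5
    var_uni = s11 * v1 / (s11 + v1)                      # Var(Y1i | y1i)
    det = (s11 + v1) * (s22 + v2) - s12 ** 2             # det(Sigma + V_i)
    var_biv = v1 * (s11 * (s22 + v2) - s12 ** 2) / det   # Var(Y1i | y1i, y2i)
    return {
        "direct_to_uni": 100 * (1 - var_uni / v1),
        "uni_to_biv": 100 * (1 - var_biv / var_uni),
        "direct_to_biv": 100 * (1 - var_biv / v1),
    }


# Hypothetical inputs for a single area.
r = mse_reductions(v1=0.0016, v2=0.00002, s11=0.0004, s22=0.0004, rho=0.9)
print(r)
```

Because the ratios multiply, the three quantities are linked: one minus the direct-to-bivariate fraction equals the product of one minus each of the other two fractions.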


Application I: 2010 Disability Rates for U.S. States: SIPP borrowing from ACS

$y_{1i}$ = SIPP disability estimate, $y_{2i}$ = ACS disability estimate

Smoothing of SIPP direct sampling variance estimates is applied.

ρ = .82

Univariate shrinkage yields an MSE decrease of 2% to 67% from direct, with a median of 19%.

The MSE decrease from bivariate vs. univariate model is 6% to 59%, with a median of 29%.

The MSE decrease from bivariate vs. direct is 8% to 86%, with a median decrease of 43%.


[Figure: Disability Rates for U.S. States, 2014. Bivariate model for SIPP and ACS estimates. Left panel, "Rate Estimates": bivariate model prediction vs. direct estimate, both axes from 0.10 to 0.30. Right panel, "MSE % Improvement from Bivariate": percent (0 to 100) vs. variance of direct estimate (0.0000 to 0.0030).]


Application II: 2013 Health Insurance Coverage Rates for U.S. States: NHIS Borrowing from ACS

$y_{1i}$ = NHIS estimate of health insurance coverage (from National Center for Health Statistics)
$y_{2i}$ = ACS estimate of health insurance coverage

Estimates published for only 43 states "due to considerations of sample size and precision."

ρ = .96

MSE decrease UNI vs. Direct: 1% to 16%, median = 10%

MSE decrease BIV vs. UNI: 16% to 67%, median = 54%

MSE decrease BIV vs. Direct: 19% to 72%, median = 60%!

Using bivariate model might allow publication of estimates for states that would otherwise be excluded (?)


Application III: 2012 Per Capita Expenditures for Health Insurance for U.S. States: CPS Borrowing from ACS

$y_{1i}$ = CPS estimated per capita expenditure on health insurance premiums
$y_{2i}$ = ACS per capita income estimate

ρ = .65

MSE decrease UNI vs. Direct: 1% to 55%, median = 8%

MSE decrease BIV vs. UNI: −1.5% to 28%, median = 6%

MSE decrease BIV vs. Direct: 2% to 68%, median = 14%

More modest decreases overall, presumably because ρ and $v_{1i}/\sigma_{11}$ are lower than in the previous examples.


Application IV: ACS 1-yr County Poverty Estimates Borrow from Previous ACS 5-yr County Poverty Estimates

$y_{1i}$ = 2012 ACS estimated county rates of children in poverty

$y_{2i}$ = 2007-2011 ACS estimated county child poverty rates

Note: Good covariates are available for modeling, but are not used here.

ρ = .94

MSE decrease UNI vs. Direct: 0.4% to 87%, median = 32%

MSE decrease BIV vs. UNI: 4% to 65%, median = 49%

MSE decrease BIV vs. Direct: 4% to 91%, median = 67%!!


Concluding Remarks

Bivariate model can achieve large MSE decreases by borrowing strength from ACS estimates to improve estimates from smaller surveys, provided ρ is high!

Model is simple; key is the quality of the additional data source (ACS estimates) used for this purpose.

In most of the examples (I, II, IV), the biggest part of the MSE decreases came from the univariate to bivariate shrinkage, not from the univariate shrinkage.

Theoretical and empirical results show not much improvement when a larger survey borrows strength from a smaller one.
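The last remark can be illustrated numerically with the known-parameter variance formulas by swapping which survey plays the role of the target. The sketch below is a toy calculation, not a result from the talk: the function name and all numbers are hypothetical.

```python
def uni_to_biv_reduction(v_target, v_other, s_tt, s_oo, rho):
    """Percent MSE reduction from univariate to bivariate shrinkage.

    v_target, v_other: sampling variances of the target and auxiliary estimates.
    s_tt, s_oo: random-effect variances of the target and auxiliary quantities.
    """
    s_to = rho * (s_tt * s_oo) ** 0.5
    var_uni = s_tt * v_target / (s_tt + v_target)
    det = (s_tt + v_target) * (s_oo + v_other) - s_to ** 2
    var_biv = v_target * (s_tt * (s_oo + v_other) - s_to ** 2) / det
    return 100 * (1 - var_biv / var_uni)


# Hypothetical variances: a small survey (0.0016) and a large one (0.00002).
small_from_large = uni_to_biv_reduction(0.0016, 0.00002, 0.0004, 0.0004, 0.9)
large_from_small = uni_to_biv_reduction(0.00002, 0.0016, 0.0004, 0.0004, 0.9)
print(small_from_large, large_from_small)
```

With these made-up inputs the small survey gains about 73% from the large one, while the large survey gains under 1% from the small one, even though ρ is the same in both directions.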
