fitting a bayesian fay-herriot model2018/10/24  · fitting a bayesian fay-herriot model nathan b....

20
Fitting a Bayesian Fay-Herriot Model Nathan B. Cruze United States Department of Agriculture National Agricultural Statistics Service (NASS) Research and Development Division Washington, DC October 25, 2018 ... providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1

Upload: others

Post on 30-Jan-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

  • Fitting a Bayesian Fay-Herriot Model

    Nathan B. Cruze

    United States Department of AgricultureNational Agricultural Statistics Service (NASS)

    Research and Development Division

    Washington, DCOctober 25, 2018

    “. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1

  • Disclaimer

    The Findings and Conclusions in This Preliminary Presen-tation Have Not Been Formally Disseminated by the U.S.Department of Agriculture and Should Not Be Construed toRepresent Any Agency Determination or Policy.

  • Overview

    I NASS interest in small area estimation (SAE)

    I The Fay and Herriot (1979) modelI Case study: county estimates of planted corn, Illinois 2014

    I Computation in R and JAGS

    GASP 2018–Fitting Bayesian Fay-Herriot Model 3

  • Small Area Estimation (SAE) Literature“A domain is regarded as ‘small’ if the domain-specific sample isnot large enough to support [survey] estimates of adequateprecision.”–Rao and Molina (2015)

    Regression and mixed-modeling approaches in SAE literature

    I Shrinkage–improve estimates with other information

    I Utility of auxiliary data as covariate

    I Variance-bias trade off

    Two common models

    1. Unit-level models, e.g., Battese et al. (1988)I USDA NASS (formerly SRS) as source of data/funding

    2. Area-level models, e.g., Fay and Herriot (1979)

    GASP 2018–Fitting Bayesian Fay-Herriot Model 4

  • NASS Interest In SAE

    Iwig (1996): USDA’s involvement in county estimates in 1917

    Published estimates used by:

    I Agricultural sector

    I Financial institutions

    I Research institutions

    I Government and USDA

    Published estimates used for:

    I County loan rates

    I Crop insurance

    I County-level revenueguarantee

    National Academies of Sciences, Engineering, and Medicine (2017)

    I Consensus estimates: Board review of survey and other data

    I Currently published without measures of uncertainty

    I Recommends transition to system of model-based estimates

    GASP 2018–Fitting Bayesian Fay-Herriot Model 5

  • Fay-Herriot (Area-Level) ModelFay and Herriot (1979)–improved upon per capita incomeestimates with following model

    θ̂j = θj + ej , j = 1, . . . ,m counties (1)

    θj = x′j β + uj (2)

    Adding Eqs. 1 and 2

    θ̂j = x′j β + uj + ej

    I θ̂j , direct estimate

    I E (ej |θj) = 0I V (ej |θj) = σ̂2j , estimated

    variance

    I xj , known covariates

    I uj , area random effect

    I ujiid∼ (0, σ2u)

    GASP 2018–Fitting Bayesian Fay-Herriot Model 6

  • Fay-Herriot Formulated As Bayesian Hierarchical Model‘Recipe’ for hierarchical Bayesian model as in Cressie and Wikle(2011)

    Data model:θ̂j |θj ,β

    ind∼ N(θj , σ̂2j ) (3)

    Process model:θj |β, σ2u

    iid∼ N(x ′j β, σ2u) (4)

    Prior distributions on β and σ2uI Browne and Draper (2006), Gelman (2006): σ2u ∼?I We will specify σ2u ∼ Unif (0, 108), β

    iid∼ MVN(0, 106I )

    Goal: Obtain posterior summaries about county totals, θj

    GASP 2018–Fitting Bayesian Fay-Herriot Model 7

  • County Agricultural Production Survey (CAPS)

    Case study in Cruze et al. (2016)

    Illinois planted corn

    I 9 Ag. Statistics Districts

    I 102 counties

    I a major producer of cornI End-of-season survey

    – Direct estimates of totals– Estimated sampling variances

    Min Median Max

    n reports 2 47 93CV (%) 9.1 19.2 92.3

    Warren 187

    Hende son 071

    McDonough 109

    Stephenson 1n

    Carroll 015

    Ogle 141

    Lee

    Winnebago Boon 201 007

    McHenry

    111

    DeKalb Kane 037 089

    Lake 097

    Cook

    10 103

    ?----��---r----1 Kendall Will 197

    Knox 095

    Fulton 057

    Bureau 011

    LaSalle 099 20

    093

    Grundy 063

    Peoria 143

    Macoupin 117

    Kankakee 091

    Livingston ,,L---- ---.. 105 Iroquois

    075 Woodford 203

    Montgom 135

    McLean 113 50 Ford

    053

    40 __ __ _ _ _, Vermilion L_�-----,- Champaign 183 DeWitt 039

    Macon 115

    019

    Coles 029

    Edgar 045

    Clark 023 70 Cumberland .. ---- -� -- ---I 035

    L_�------'--�- -� Fayette

    051 Effingham

    049 Jasper

    079

    .... --.&....,i,Marion Clay 025

    .... ----- 121

    Jefferson 081

    Franklin 055

    Williamson 199

    https://www.nass.usda.gov/Charts_and_Maps/Crops_

    County/indexpdf.php

    https://www.nass.usda.gov/Charts_and_Maps/Crops_County/indexpdf.phphttps://www.nass.usda.gov/Charts_and_Maps/Crops_County/indexpdf.php

  • Covariate x1: USDA Farm Service Agency (FSA) Acreage

    I FSA administers farmsupport programs

    I Enrollment popular,not compulsory

    I Data self-reported atFSA office

    I Administrative vs.physical county

    https://www.fsa.usda.gov/news-room/efoia/electronic-reading-room/

    frequently-requested-information/crop-acreage-data/index

    https://www.fsa.usda.gov/news-room/efoia/electronic-reading-room/frequently-requested-information/crop-acreage-data/indexhttps://www.fsa.usda.gov/news-room/efoia/electronic-reading-room/frequently-requested-information/crop-acreage-data/index

  • Covariate x2: NOAA Climate Division March Precipitation

    Weather as auxiliary variable

    I March: Planting ‘intentions’

    I April: Illinois planting

    I Could rainfall in Marchaffect planting?

    I One-to-one mapping: ASDand climate division

    I Repeat value for all countieswithin ASD

    ASD Precip (in)

    10 1.0820 1.3530 1.2740 1.6650 1.5060 1.3670 1.4680 1.6990 2.00

    Source: ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdivDetails in Vose et al. (2014)

    GASP 2018–Fitting Bayesian Fay-Herriot Model 10

    ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdiv

  • NASS Official StatisticsFrom prior publication: Illinois 2014, 11.9 million acres of cornplanted

    I Require: State-ASD-county benchmarking of estimates

    State/district: https://quickstats.nass.usda.gov/results/3A17F375-B762-37BD-8C03-D581DC8F7A85County: https://quickstats.nass.usda.gov/results/478D1A7B-E680-3E5E-95E4-9A59F938A256

    https://quickstats.nass.usda.gov/results/3A17F375-B762-37BD-8C03-D581DC8F7A85https://quickstats.nass.usda.gov/results/478D1A7B-E680-3E5E-95E4-9A59F938A256

  • JAGS Model

    I Note data, process, prior structure from earlier slide

    I Note distributions parameterized in terms of precision

    I Read into R script as stored R source code or as text stringOnline resources http://www.sumsar.net/blog/2013/06/three-ways-to-run-bayesian-models-in-r/

    http://www.sumsar.net/blog/2013/06/three-ways-to-run-bayesian-models-in-r/

  • A Pseudo-Code R Script

    GASP 2018–Fitting Bayesian Fay-Herriot Model 13

  • Analysis of JAGS Model OutputPosterior summaries of parameters–based on 3,000 saved iterates

    I Posterior means, standard deviations, quantiles, potentialscale reduction factors, effective sample sizes, pD, DIC

    I Transform back to acreage scale

    I Ratio benchmarking–inject benchmarking factor back intochains as in Erciulescu et al. (2018)

    GASP 2018–Fitting Bayesian Fay-Herriot Model 14

  • Results: Models With and Without BenchmarkingI Modeled estimates (ME) may not satisfy benchmarkingI Ratio-benchmarked estimates (MERB) are consistent with

    state targets and improve agreement with external sources

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    County Comparisons of Model and FSA Acreage

    FSA Planted Area (Acres of Corn)

    Mod

    eled

    Est

    imat

    es o

    f Pla

    nted

    Are

    a (A

    cres

    ) ● MERB estimateME estimate

    ●●

    ●●

    ASD Comparisons of Model and FSA Acreage

    FSA Planted Area (Acres of Corn)M

    odel

    ed E

    stim

    ates

    of P

    lant

    ed A

    rea

    (Acr

    es) ● MERB estimate

    ME estimate

    GASP 2018–Fitting Bayesian Fay-Herriot Model 15

  • Results: Posterior Distributions of ASD-Level AcreagesUsed county-level inputs to produce county-level estimates

    I Idea: derive ASD-level estimates from Monte Carlo iteratesI Sum corresponding draws from county posterior distributions

    – Compute means and variances from aggregated chains

    500 1000 1500 2000

    Planted Area (1,000 Acres)

    ASD 80 ASD 90

    ASD 20 ASD 30

    ASD 70

    ASD 40

    ASD 50

    ASD 60

    ASD 10

    MERBNASS OFFICIAL

    GASP 2018–Fitting Bayesian Fay-Herriot Model 16

  • Results: Relative Variability of Survey Versus ModelObtain estimates and measures of uncertainty for counties anddistricts

    I Recall the goal of SAE–increased precision!

    CV (%) of CAPS Survey Estimates

    Min Q1 Median Mean Q3 MaxCounty 9.1 16.6 19.2 22.2 23.5 92.3District 4.4 5.6 6.8 6.6 7.2 8.7

    CV (%) of MERB Estimates

    Min Q1 Median Mean Q3 MaxCounty 3.6 5.6 7.2 9.0 10.5 31.2District 1.7 2.0 2.1 2.5 2.3 4.4

    GASP 2018–Fitting Bayesian Fay-Herriot Model 17

  • Results: Comparison to Other SourcesFor counties and districts, compute ‘standard score’

    I (model estimate-other source)/model standard errorI Direct Estimates, Cropland Data Layer, Battese-Fuller, FSA

    ● ●●●●●

    ● ●● ●●

    ● ●●

    ● ●

    −6 −4 −2 0 2 4 6

    County−Level Comparisons

    Number of Posterior Standard Deviations

    FS

    AB

    atte

    se_F

    ulle

    rC

    DL

    DE

    *

    ●●

    FS

    AB

    atte

    se_F

    ulle

    rC

    DL

    DE

    −2 0 2 4 6

    ASD−Level Comparisons

    Number of Posterior Standard Deviations

    GASP 2018–Fitting Bayesian Fay-Herriot Model 18

  • Conclusions

    Discussed Bayesian formulation of Fay-Herriot model motivated byNASS applications

    Other R packages facilitate Bayesian small area estimation

    I ‘BayesSAE’ by Chengchun Shi

    I ‘hbsae’ by Harm Jan Boonstra

    I May be bound by limited choice of prior distributions

    I Transformations of data may be needed

    Proc MCMC in SAS added ‘Random’ statement as of version 9.3

    Thanks to Andreea Erciulescu (NISS) and Balgobin Nandram(WPI) for three years of adventures in small area estimation!

    GASP 2018–Fitting Bayesian Fay-Herriot Model 19

  • ReferencesBattese, G. E., Harter, R. M., and Fuller, W. A. (1988). An error-components model for prediction of county crop

    areas using survey and satellite data. Journal of the American Statistical Association, 83(401):28–36.

    Browne, W. J. and Draper, D. (2006). A comparison of bayesian and likelihood-based methods for fitting multilevelmodels. Bayesian Analysis, 1(3):473–514.

    Cressie, N. and Wikle, C. (2011). Statistics for Spatio-Temporal Data. Wiley, Hoboken, NJ.

    Cruze, N., Erciulescu, A., Nandram, B., Barboza, W., and Young, L. (2016). Developments in Model-BasedEstimation of County-Level Agricultural Estimates. In Proceedings of the Fifth International Congress onEstablishment Surveys. American Statistical Association, Geneva.

    Erciulescu, A. L., Cruze, N. B., and Nandram, B. (2018). Model-based county level crop estimates incorporatingauxiliary sources of information. Journal of the Royal Statistical Society: Series A (Statistics in Society).doi:10.1111/rssa.12390.

    Fay, R. E. and Herriot, R. A. (1979). Estimates of income for small places: An application of james-steinprocedures to census data. Journal of the American Statistical Association, 74(366):269–277.

    Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article bybrowne and draper). Bayesian Analysis, 1(3):515–534.

    Iwig, W. (1996). The National Agricultural Statistics Service County Estimates Program. In Schaible, W., editor,Indirect Estimators in U.S. Federal Programs, chapter 7, pages 129–144. Springer, New York.

    National Academies of Sciences, Engineering, and Medicine (2017). Improving Crop Estimates by IntegratingMultiple Data Sources. The National Academies Press, Washington, DC.

    Rao, J. and Molina, I. (2015). Small Area Estimation. In Wiley Online Library: Books. Wiley, 2nd edition.

    Vose, R. S., Applequist, S., Squires, M., Durre, I., Menne, M. J., Williams, C. N., Fenimore, C., Gleason, K., andArndt, D. (2014). Improved Historical Temperature and Precipitation Time Series for U.S. Climate Divisions.Journal of Applied Meteorology and Climatology, 53(5):1232–1251.

    GASP 2018–Fitting Bayesian Fay-Herriot Model 20