meps workshop household component survey estimation issues household component survey estimation...
TRANSCRIPT
MEPS WORKSHOPMEPS WORKSHOP
Household Component Survey Estimation Household Component Survey Estimation IssuesIssues
Steve Machlin, Agency for Healthcare Research and QualitySteve Machlin, Agency for Healthcare Research and QualityPaul Gorrell, Social and Scientific SystemsPaul Gorrell, Social and Scientific Systems
Overview
Annual person-level estimates– Overlapping panels
Estimation variables– Weights– Variance
Pooling multiple years of annual data Longitudinal analysis of MEPS panels
– Two-year period
Family-level estimation Other miscellaneous issues
Annual Person-Level Files
MEPS Annual FilesMEPS Annual Files
YearYear
PanelPanel
19971997 19981998 19991999 20002000 20012001
1 1 (96-97)(96-97) Yr. 2Yr. 2
2 2 (97-98)(97-98) Yr. 1Yr. 1 Yr. 2Yr. 2
3 3 (98-99)(98-99) Yr. 1Yr. 1 Yr. 2Yr. 2
4 4 (99-00)(99-00) Yr. 1Yr. 1 Yr. 2Yr. 2
5 5 (00-01)(00-01) Yr. 1Yr. 1 Yr. 2Yr. 2
6 6 (01-02)(01-02) Yr. 1Yr. 1
MEPS Annual FilesMEPS Annual Files
YearYear
PanelPanel
20022002 20032003 20042004
6 (01-02)6 (01-02) Yr. 2Yr. 2
7 (02-03)7 (02-03) Yr. 1Yr. 1 Yr. 2Yr. 2
8 (03-04)8 (03-04) Yr. 1Yr. 1 Yr. 2Yr. 2
9 (04-05)9 (04-05) Yr. 1Yr. 1
MEPS Annual Person Level MEPS Annual Person Level Estimation Estimation
19971997 19981998 19991999 20002000 20012001
File NumberFile Number HC-HC-020020
HC-HC-028028
HC-HC-038038
HC-HC-050050
HC-HC-060060
Persons with Persons with weight > 0 weight > 0
32,63632,636 22,95322,953 23,56523,565 23,83923,839 32,12232,122
WeightedWeighted
Persons: AllPersons: All271.3 271.3 millionmillion
273.5 273.5 millionmillion
276.4 276.4 millionmillion
278.4 278.4 millionmillion
284.2 284.2 millionmillion
INSC1231=1INSC1231=1
(in target pop. (in target pop. at end of year)at end of year)
267.7 267.7 millionmillion
270.1 270.1 millionmillion
273.0 273.0 millionmillion
275.2 275.2 millionmillion
280.8 280.8 millionmillion
MEPS Annual Person Level MEPS Annual Person Level Estimation (continued) Estimation (continued)
20022002 20032003 20042004
File NumberFile Number HC-070HC-070 HC-079HC-079 HC-089HC-089
Persons with Persons with weight > 0 weight > 0
37,41837,418 32,68132,681 32,73732,737
WeightedWeighted
Persons: AllPersons: All
288.2 288.2 millionmillion
290.6 290.6 millionmillion
293.5293.5
millionmillionINSC1231=1INSC1231=1
(in target pop. at (in target pop. at end of year)end of year)
284.6 284.6 millionmillion
286.8 286.8 millionmillion
289.7289.7
millionmillion
Weights and Variance Estimation Variables
MEPS Sample Design
Each panel is sub-sample of household respondents for the previous year’s National Health Interview Survey (NHIS) – NHIS sponsor is National Center for Health Statistics
NHIS sample based on complex stratified multi-stage probability design
Civilian non-institutionalized population
NHIS Sample DesignNHIS Sample Design (1995-2004) (1995-2004)
U.S. partitioned into 1,995 Primary Sampling Units (Counties or groups of adjacent counties)
PSU’s grouped into 237 design strata– 358 PSU’s sampled across strata
Second Stage Units (SSU’s)– Clusters of housing units
– Oversample of SSU’s with large Black/Hispanic populations
MEPS based on subsample of about 200 PSU’s from NHIS
Oversampling in MEPSOversampling in MEPS
Every year: Blacks and Hispanics– Carryover from NHIS
1997: Selected subpopulations– Functionally impaired adults– Children with activity limitations– Adults 18-64 predicted to have high medical expenditures– Low income– Adults with other impairments
2002 and beyond:– Asians – Low income– Additional oversampling of blacks in 2004
Estimation from Complex Surveys
Estimates need to be weighted to reflect sample design and survey nonresponse– Unweighted estimates are biased
Use appropriate method to compute standard errors to account for complex design– Assuming simple random sampling usually
underestimates sampling error
Development of Person Development of Person WeightsWeights
Base Weight (NHIS)Base Weight (NHIS)– Compensates for oversampling and nonresponseCompensates for oversampling and nonresponse
Adjustments forAdjustments for– Household nonresponse (MEPS Round 1)Household nonresponse (MEPS Round 1)
– Attrition of persons (Subsequent Rounds)Attrition of persons (Subsequent Rounds)
– Poststratification (Census Population Estimates)Poststratification (Census Population Estimates)
– Trimming of extreme weightsTrimming of extreme weights
Final Person WeightFinal Person Weight– Weight > 0: person selected and in-scope for surveyWeight > 0: person selected and in-scope for survey
– Weight = 0 (about 4% in 2002): person not in-scope for Weight = 0 (about 4% in 2002): person not in-scope for survey but living in household with in-scope person(s)survey but living in household with in-scope person(s)
Distribution of MEPS Sample Distribution of MEPS Sample Person Final WeightsPerson Final Weights
19971997 19981998 19991999 20002000 20012001
AverageAverage 8,3128,312 11,91711,917 11,73011,730 11,67911,679 8,8498,849
MinimumMinimum 299299 321321 307307 454454 336336
MaximumMaximum 68,51868,518 84,58784,587 80,06280,062 78,15778,157 67,53767,537
Variable NameVariable Name WTDPER97WTDPER97 WTDPER98WTDPER98 PERWT99FPERWT99F PERWT00FPERWT00F PERWT01FPERWT01F
Distribution of Sample Person Distribution of Sample Person Final Weights (continued)Final Weights (continued)
20022002 20032003 20042004
AverageAverage 7,7027,702 8,8928,892 8,9668,966
MinimumMinimum 367367 401401 425425
MaximumMaximum 46,76646,766 60,27360,273 63,72863,728
Variable NameVariable Name PERWT02FPERWT02F PERWT03FPERWT03F PERWT04FPERWT04F
Types of Basic Point Types of Basic Point EstimatesEstimates
MeansMeans ProportionsProportions TotalsTotals Differences between subgroupsDifferences between subgroups
Variance EstimationVariance Estimation
Basic software procedures assume simple random sampling (SRS)Basic software procedures assume simple random sampling (SRS)– MEPS not SRSMEPS not SRS
– Point estimates correct (if weighted)Point estimates correct (if weighted)
– Standard errors usually too smallStandard errors usually too small
Software to account for complex design using Taylor Series approachSoftware to account for complex design using Taylor Series approach– SUDAAN (stand-alone or callable within SAS)SUDAAN (stand-alone or callable within SAS)
– STATA (svy commands)STATA (svy commands)
– SAS 8.2 (survey procedures)SAS 8.2 (survey procedures)
– SPSS (new complex survey features in 13.0)SPSS (new complex survey features in 13.0)
Estimation Example: Estimation Example: Average Total Expenditures, 2001Average Total Expenditures, 2001
Weighted mean = $2,555 per capitaWeighted mean = $2,555 per capita– Unweighted mean of $2,400 is biasedUnweighted mean of $2,400 is biased
SE based on Taylor Series = 55SE based on Taylor Series = 55– SAS V8.2: PROC SURVEYMEANS SAS V8.2: PROC SURVEYMEANS
– SUDAAN: PROC DESCRIPTSUDAAN: PROC DESCRIPT
– Stata: svymeanStata: svymean
SE assuming SRS = 41 (too low)SE assuming SRS = 41 (too low)– SAS V8.2: PROC UNIVARIATE or MEANSSAS V8.2: PROC UNIVARIATE or MEANS
Computing Standard Errors for Computing Standard Errors for MEPS EstimatesMEPS Estimates
Document on MEPS websiteDocument on MEPS website http://www.meps.ahrq.gov/mepsweb/surhttp://www.meps.ahrq.gov/mepsweb/sur
vey_comp/standard_errors.jspvey_comp/standard_errors.jsp
Example (Point estimates and SEs): Example (Point estimates and SEs): SAS V8.2SAS V8.2
proc surveymeans data=work.h60 proc surveymeans data=work.h60 mean; stratum varstr01;mean; stratum varstr01;cluster varpsu01;cluster varpsu01;weight perwt01f;weight perwt01f;var totexp01;var totexp01;
Example (Point estimates and SEs): Example (Point estimates and SEs): SUDAAN (SAS-callable)SUDAAN (SAS-callable)
First need to sort file by varstr01 & varpsu01First need to sort file by varstr01 & varpsu01 proc descript data=work.h60 filetype=SAS proc descript data=work.h60 filetype=SAS
design=wr;design=wr;nest varstr01 varpsu01;nest varstr01 varpsu01;weight perwt01f;weight perwt01f;var totexp01;var totexp01;
Example (Point estimates and SEs): Example (Point estimates and SEs): StataStata
svyset [pweight=perwt01f], strata(varstr01) psu svyset [pweight=perwt01f], strata(varstr01) psu (varpsu01)(varpsu01)
svymean(totexp01)svymean(totexp01)
Analysis of SubpopulationsAnalysis of Subpopulations
Analyzing files that contain only a subset of MEPS Analyzing files that contain only a subset of MEPS sample may produce error messages or incorrect sample may produce error messages or incorrect standard errors standard errors
Each software package has capability to produce Each software package has capability to produce subpopulation estimates from entire person-level subpopulation estimates from entire person-level filefile
See “Computing Standard Errors for MEPS Estimates”See “Computing Standard Errors for MEPS Estimates”
– http://http://www.meps.ahrq.gov/mepsweb/survey_comwww.meps.ahrq.gov/mepsweb/survey_comp/standard_errors.jspp/standard_errors.jsp
Assessing Precision/ReliabilityAssessing Precision/Reliability of Estimates of Estimates
Sample SizesSample Sizes Standard Errors/Confidence IntervalsStandard Errors/Confidence Intervals Relative Standard ErrorsRelative Standard Errors
– standard error of estimate standard error of estimate estimate estimate
Example: Average total Example: Average total expenses per capita, 2001expenses per capita, 2001
Sample Size = 32,122Sample Size = 32,122 Estimate = $2,555Estimate = $2,555 Standard Error = 55Standard Error = 55 95% Confidence Interval: (2447, 2663) 95% Confidence Interval: (2447, 2663) Relative Standard Error (RSE) or Coefficient Relative Standard Error (RSE) or Coefficient
of Variation (CV) = 55 of Variation (CV) = 55 ÷ ÷ 2555 = .021 = 2.1%2555 = .021 = 2.1%
Types of Basic Point Types of Basic Point Estimates: ExamplesEstimates: Examples
MeansMeans– Annual per capita expenses in 2001 = $2,555 Annual per capita expenses in 2001 = $2,555
ProportionsProportions– Percent with some health expenses in 2001 = 85.4%Percent with some health expenses in 2001 = 85.4%– Two methods to generate estimates: Two methods to generate estimates:
percents obtained from frequency tablespercents obtained from frequency tables means of dichotomous variablemeans of dichotomous variable
TotalsTotals– Total expenses in 2001 = $726.4 billion Total expenses in 2001 = $726.4 billion – Total number of persons (sum of weights) Total number of persons (sum of weights)
Differences between subgroupsDifferences between subgroups
Pooling Multiple Years of MEPS Data
Reasons for PoolingReasons for Pooling
Reduce standard error of estimate(s)Reduce standard error of estimate(s) Stabilize trend analyzesStabilize trend analyzes Enhance ability to analyze small Enhance ability to analyze small
subgroupssubgroups
Minimum Sample SizesMinimum Sample Sizes
CFACT StandardsCFACT Standards– Minimum unweighted sample of 100Minimum unweighted sample of 100– Flag estimates with RSE > 30%Flag estimates with RSE > 30%
Confidence intervals become problematic with small samples Confidence intervals become problematic with small samples and/or highly skewed dataand/or highly skewed data– Consider larger minimum sample sizes for highly skewed variablesConsider larger minimum sample sizes for highly skewed variables– Analysts may be comfortable with smaller minimums for less Analysts may be comfortable with smaller minimums for less
skewed variablesskewed variables– ASA Paper: Yu and Machlin (Skewness)ASA Paper: Yu and Machlin (Skewness)
http://www.meps.ahrq.gov/mepsweb/data_files/publications/workinhttp://www.meps.ahrq.gov/mepsweb/data_files/publications/workingpapers/wp_04002.pdfgpapers/wp_04002.pdf
Example: Annual Sample Example: Annual Sample Sizes (Unpooled)Sizes (Unpooled)
YearYear Total Total PopulationPopulation
Children Children 0-50-5
Asian/PI Asian/PI Children* 0-5Children* 0-5
19961996 21,57121,571 2,0182,018 585819971997 32,63632,636 3,0823,082 787819981998 22,95322,953 2,1142,114 828219991999 23,56523,565 2,1562,156 9393
* Sample sizes do not meet AHRQ minimum * Sample sizes do not meet AHRQ minimum requirement (n=100) to publish estimates.requirement (n=100) to publish estimates.
Pooled Sample SizesPooled Sample Sizes
YearsYears
Total Total
SampleSample
Children Children
0-50-5
Asian/PI Asian/PI Children 0-5Children 0-5
1996-19971996-1997 54,20754,207 5,1005,100 136136
1998-19991998-1999 46,51846,518 4,2704,270 175175
1996-19991996-1999 100,725100,725 9,3709,370 311311
Relative Standard Errors for Relative Standard Errors for Estimated Mean Expenditures: Estimated Mean Expenditures:
Asian/PI Children 0-5Asian/PI Children 0-5
0%
10%
20%
30%
40%
50%
96 97 98 99 96-97 98-99 96-99
Rel
ativ
e S
tan
dar
d E
rro
r
Annual 2 year 4 year
Creating a Pooled FileCreating a Pooled File for Analysis (1996-2002) for Analysis (1996-2002)
Need to work with Pooled Estimation File (HC-036) when 1+ Need to work with Pooled Estimation File (HC-036) when 1+ years being pooled include any year from 1996 through 2001years being pooled include any year from 1996 through 2001– Stratum and PSU variables obtained from HC-036 for Stratum and PSU variables obtained from HC-036 for
1996-20041996-2004– Documentation for HC-036 provides instructions on how to Documentation for HC-036 provides instructions on how to
properly create pooled analysis fileproperly create pooled analysis file Stratum and PSU variables properly standardized for pooling Stratum and PSU variables properly standardized for pooling
years from 2002 onward (i.e., do not need HC-036)years from 2002 onward (i.e., do not need HC-036)
Creating Pooled Files: Creating Pooled Files: Summary of Important StepsSummary of Important Steps
Rename analytic and weight variables from different years to common names. Rename analytic and weight variables from different years to common names. – Expenditures: TOTEXP99 & TOTEXP00 = TOTEXPExpenditures: TOTEXP99 & TOTEXP00 = TOTEXP
– Weights: PERWT99F & PERWT00F = POOLWTWeights: PERWT99F & PERWT00F = POOLWT
Divide weight variable by number of years pooled to produce estimates for “an Divide weight variable by number of years pooled to produce estimates for “an average year” during the period.average year” during the period.– Keep original weight value if estimating total for periodKeep original weight value if estimating total for period
Concatenate annual files Concatenate annual files Merge variance estimation variables from HC-036 onto file (only if 1+ years prior Merge variance estimation variables from HC-036 onto file (only if 1+ years prior
to 2002)to 2002)– Strata variable: STRA9603Strata variable: STRA9603
– PSU variable: PSU9603PSU variable: PSU9603
Estimates from Pooled FilesEstimates from Pooled Files
Produce estimates in analogous fashion as for Produce estimates in analogous fashion as for individual yearsindividual years
Estimates interpreted as “average annual” for Estimates interpreted as “average annual” for pooled periodpooled period
Example: Pooled 1996-99 dataExample: Pooled 1996-99 data– The average annual total health care expenditures for The average annual total health care expenditures for
Asian/Pacific Islander children under 6 years of age Asian/Pacific Islander children under 6 years of age during the period from 1996-1999 was $525 (SE=97). during the period from 1996-1999 was $525 (SE=97).
Pooling Annual Data: Pooling Annual Data: Lack of Independence Across YearsLack of Independence Across Years
Legitimate to pool data for persons in consecutive Legitimate to pool data for persons in consecutive yearsyears– Each yr. constitutes nationally representative Each yr. constitutes nationally representative
sample sample – Pooling produces average annual estimatesPooling produces average annual estimates– Stratum & PSU variables sufficient to account Stratum & PSU variables sufficient to account
for lack of independence between years for lack of independence between years Lack of independence actually begins with first Lack of independence actually begins with first
stage of sample selectionstage of sample selection– Same PSUs are used to select each MEPS Same PSUs are used to select each MEPS
panelpanel See HC-036 documentationSee HC-036 documentation
Longitudinal Analysis of MEPS Panels
MEPS Longitudinal Analysis:MEPS Longitudinal Analysis:Panel 4: 1999-2000Panel 4: 1999-2000
Panel 4: 1999-2000
Round 2
Round 3 Round 4 Round 5
19991/1/1999
2000
Round 1
12/31/2000
MEPS Longitudinal AnalysisMEPS Longitudinal Analysis
National estimates of person-level changes National estimates of person-level changes over two-year periodover two-year period
– two-year period is relatively shorttwo-year period is relatively short Examine characteristics associated with Examine characteristics associated with
changeschanges
– mainly round 1 datamainly round 1 data
Variables that may change Variables that may change between years or roundsbetween years or rounds
Insurance coverageInsurance coverage– Monthly indicators (24 measures)Monthly indicators (24 measures)
– Annual summary (2 measures per person)Annual summary (2 measures per person)
Health statusHealth status– Each round (5 measures)Each round (5 measures)
Having a usual source of careHaving a usual source of care– Rounds 2 & 4 (2 measures)Rounds 2 & 4 (2 measures)
Use and expendituresUse and expenditures– Annual (2 measures per person)Annual (2 measures per person)
MEPS Longitudinal Weight Files MEPS Longitudinal Weight Files Currently Available (Oct., 2006)Currently Available (Oct., 2006)
MEPS MEPS PanelPanel
Years Years CoveredCovered
PUF PUF NumberNumber
11 1996-971996-97 HC-023HC-023
22 1997-981997-98 HC-035HC-035
33 1998-991998-99 HC-048HC-048
44 1999-001999-00 HC-058HC-058
55 2000-012000-01 HC-065HC-065
66 2001-022001-02 HC-071HC-071
77 2002-032002-03 HC-080HC-080
Creating Longitudinal Files (Panel 4) : Creating Longitudinal Files (Panel 4) : Summary of Important StepsSummary of Important Steps
Select Panel 4 records from annual filesSelect Panel 4 records from annual files– 1999 (PUF HC-038)1999 (PUF HC-038)
– 2000 (PUF HC-050)2000 (PUF HC-050)
Obtain MEPS Longitudinal File (HC-058)Obtain MEPS Longitudinal File (HC-058)– Contains weight and variance estimation Contains weight and variance estimation
variablesvariables
– Contains variable indicating whether complete Contains variable indicating whether complete data are available for 1 or both years of paneldata are available for 1 or both years of panel
Link using DUPERSIDLink using DUPERSID
Longitudinal WeightLongitudinal Weight
Variable Name: LONGWTP#Variable Name: LONGWTP# Produces estimates for persons in civilian Produces estimates for persons in civilian
noninstitutionalized population in two noninstitutionalized population in two consecutive years when applied to consecutive years when applied to persons participating in both years of a persons participating in both years of a given panel (YRINDP# = 1)given panel (YRINDP# = 1)
Examples: Longitudinal EstimatesExamples: Longitudinal Estimates
Of those without insurance at any time in 1999, Of those without insurance at any time in 1999, estimated 76.9% (SE=1.6) also uninsured estimated 76.9% (SE=1.6) also uninsured throughout 2000throughout 2000
Estimated 8.2% (SE=0.4) of the population had Estimated 8.2% (SE=0.4) of the population had no insurance throughout 1999-2000no insurance throughout 1999-2000
Of those with no expenses in 1999, estimated Of those with no expenses in 1999, estimated 47.6% (SE=1.3) had some expenses in 200047.6% (SE=1.3) had some expenses in 2000
Of top 5% of spenders in 1996, 30% retain this Of top 5% of spenders in 1996, 30% retain this position in 1997. position in 1997.
Family-Level Estimation
Family-Level EstimationFamily-Level Estimation
Need to roll up persons to familiesNeed to roll up persons to families– MEPS vs. CPS definitionsMEPS vs. CPS definitions– Any time during year or December 31Any time during year or December 31– Instructions in person file documentationInstructions in person file documentation
Avg. number of persons per family = 2.4Avg. number of persons per family = 2.4 Use appropriate family weight variableUse appropriate family weight variable
– Family weight = 0 if full-year data not obtained for all in-scope Family weight = 0 if full-year data not obtained for all in-scope family members (about 2% of cases in 2002) family members (about 2% of cases in 2002)
MEPS Annual Files: MEPS Annual Files: Annualized Family Sample SizesAnnualized Family Sample Sizes
19971997 19981998 19991999 20002000 20012001
File File NumberNumber
HC-HC-020020
HC-HC-028028
HC-HC-038038
HC-HC-050050
HC-HC-060060
FamiliesFamilies
(unwtd)(unwtd)
13,08713,087 9,0239,023 9,3459,345 9,5159,515 12,85212,852
WeightedWeighted 112.2 112.2 millionmillion
113.4 113.4 millionmillion
114.6 114.6 millionmillion
116.3 116.3 millionmillion
118.8 118.8 millionmillion
Family Weight Family Weight Variable NameVariable Name
WTFAMF97WTFAMF97 WTFAMF98WTFAMF98 FAMWT99FFAMWT99F FAMWT00FFAMWT00F FAMWT01FFAMWT01F
MEPS Annual Files: MEPS Annual Files: Annualized Family Sample SizesAnnualized Family Sample Sizes (con.) (con.)
20022002 20032003 20042004
File File NumberNumber
HC-070HC-070 HC-079HC-079 HC-089HC-089
FamiliesFamilies
(unwtd)(unwtd)
14,82814,828 12,86012,860 13,01813,018
WeightedWeighted 121.0 121.0 millionmillion
121.8 121.8 millionmillion
123.0 123.0 millionmillion
Family Weight Family Weight Variable NameVariable Name
FAMWT02FFAMWT02F FAMWT03FFAMWT03F FAMWT04FFAMWT04F
Family-Level ExampleFamily-Level Example
2001 average total 2001 average total expenses per familyexpenses per family
Estimates based on Estimates based on families in scope at families in scope at any time during yearany time during year
Family Family sizesize EstimateEstimate SESE
AllAll $6,029$6,029 13113111 $4,191$4,191 21521522 $7,405$7,405 27727733 $6,616$6,616 26826844 $6,075$6,075 278278
5+5+ $7,518$7,518 389389
Other Miscellaneous Estimation Issues
Medical Event as Unit of Medical Event as Unit of AnalysisAnalysis
Can use event files to estimate average Can use event files to estimate average expense per event expense per event
Examples: In 2001,Examples: In 2001,– mean facility expense per inpatient stay was mean facility expense per inpatient stay was
$6,629 (SE=263).$6,629 (SE=263).– mean expense per office visit to a medical mean expense per office visit to a medical
provider was $114 (SE=2)provider was $114 (SE=2)
Special SupplementsSpecial Supplements
Self Administered Questionnaire (SAQ)Self Administered Questionnaire (SAQ)
– Use SAQ weightUse SAQ weight Parent Administered Questionnaire (PAQ) Parent Administered Questionnaire (PAQ)
– 2000 only2000 only
– Use PAQ weightUse PAQ weight Diabetes Care Survey (DCS)Diabetes Care Survey (DCS)
– Use DCS weightUse DCS weight Variables on person-level filesVariables on person-level files
– Consult documentation for appropriate weightConsult documentation for appropriate weight