1
Methodological Improvements To The BRFSS: Weighting, Cell
Phones, And Analytical Advances
Martin R. Frankel, Baruch College, CUNY and Abt Associates Inc.
Michael P. Battaglia, Abt Associates Inc.
2
Revised Weighting Procedures
• Revised weighting procedures
– Recommended by BRFSS Scientific Advisory Groups (2002 on)
– Take advantage of new weighting technology (raking) to include more post-stratification factors related to health access and behavior
– Proper compensation for the differential impact of lower response rates
3
Revised Weighting Procedures
• Current Weighting: Age by Gender by 2-category Race/Ethnicity (possibly by geographic region)
• Revised Weighting:
– Age by Gender
– Comprehensive Race/Ethnicity
– Marital Status
– Education
4
Revised Weighting Procedures
– Non-telephone adjustment using interruption in telephone service
– Gender by race/ethnicity
– Age group by race/ethnicity
– If geographic region weighting used:
  • Region
  • Region by age group
  • Region by gender
  • Region by race/ethnicity
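Raking (iterative proportional fitting) adjusts the weights one margin at a time and repeats until the weighted sample totals agree with every control total. The following is a minimal sketch of that idea, not the BRFSS production program: the function name `rake`, the record layout, the two margins, and the control totals are all hypothetical.

```python
def rake(records, weights, margins, max_iter=100, tol=1e-8):
    """Iterative proportional fitting (raking).

    records: list of dicts, one per respondent
    weights: list of starting weights, same order as records
    margins: dict mapping variable name -> {category: control total}
    Assumes every category has at least one respondent.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this margin.
            totals = {cat: 0.0 for cat in targets}
            for rec, wi in zip(records, w):
                totals[rec[var]] += wi
            # Ratio-adjust each weight so this margin matches its controls.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        # Stop once a full pass over all margins leaves the weights unchanged.
        if max_change < tol:
            break
    return w


# Hypothetical toy sample and control totals (not BRFSS data).
sample = [
    {"gender": "M", "educ": "HS"},
    {"gender": "M", "educ": "College"},
    {"gender": "F", "educ": "HS"},
    {"gender": "F", "educ": "College"},
]
controls = {
    "gender": {"M": 48.0, "F": 52.0},
    "educ": {"HS": 60.0, "College": 40.0},
}
raked = rake(sample, [25.0] * 4, controls)
```

Adding a margin such as marital status or region by age group only means adding another entry to `controls`; the loop itself does not change.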
5
Revised Weighting Procedures
• Weight trimming is used to avoid extreme weights
– Weight trimming refers to increasing the value of extremely low weights and decreasing the value of extremely high weights to reduce their impact on the variance of the estimates
• The IGCV (Individual and Global Cap Value) method is based on the specification of global low and high weight cap factors, and individual low and high weight cap values.
6
Revised Weighting Procedures
• The MCV (Margin Cap Value) method takes each margin (control variable) and independently ratio adjusts the input weights so that the weighted sample totals are in exact agreement with the control totals. This process takes place before the raking iterations start. For each survey respondent the program then looks across all the raking margins and determines the minimum value of the ratio-adjusted input weight and the maximum value of the ratio-adjusted input weight.
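The per-respondent minimum and maximum described for the MCV method can be sketched as below. This covers only the step stated on the slide (independent ratio adjustment per margin, then the minimum and maximum across margins); how those two values are then used as caps during raking is not described here, so it is left out. The function name and toy data are hypothetical.

```python
def margin_cap_bounds(records, weights, margins):
    """For each respondent, return (low, high): the smallest and largest
    ratio-adjusted input weight across all raking margins."""
    bounds = [(float("inf"), float("-inf"))] * len(records)
    for var, targets in margins.items():
        # Weighted total of the input weights in each category of this margin.
        totals = {cat: 0.0 for cat in targets}
        for rec, wi in zip(records, weights):
            totals[rec[var]] += wi
        # Ratio-adjust the input weights for this margin alone.
        for i, (rec, wi) in enumerate(zip(records, weights)):
            adjusted = wi * targets[rec[var]] / totals[rec[var]]
            low, high = bounds[i]
            bounds[i] = (min(low, adjusted), max(high, adjusted))
    return bounds


# Hypothetical toy sample and control totals (not BRFSS data).
sample = [
    {"gender": "M", "educ": "HS"},
    {"gender": "M", "educ": "College"},
    {"gender": "F", "educ": "HS"},
    {"gender": "F", "educ": "College"},
]
controls = {
    "gender": {"M": 48.0, "F": 52.0},
    "educ": {"HS": 60.0, "College": 40.0},
}
caps = margin_cap_bounds(sample, [25.0] * 4, controls)
```

For the first respondent (male, high school), the gender margin alone gives an adjusted weight of 24 and the education margin alone gives 30, so the pair is (24, 30).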
7
Impact of New Weighting
• Measures of Health Risk and Chronic Diseases generally increase
• Measures of Health Access generally decrease
8
Impact of New Weighting
• Groups with Higher Risk and Lower Access levels are generally “under-represented” in Telephone Samples
• New Weighting is able to apply more “compensation” for this differential under-representation
9
New vs. Current Weight (US)
10
11
12
13
14
15
16
17
18
Revised Weighting Procedures
• Raking Algorithm Allows for More Control to make samples more “representative” of population
• Changes in estimates are supported by extensive analyses based on lower MSE
• New Procedures can be easily adapted to work with the inclusion of “cell phone only” interviews
19
Methodological Improvements To The BRFSS: Weighting, Cell Phones, And Analytical
Advances
20
Decline in Landline Telephones
• Most RDD telephone samples have only included landline telephone numbers
• The exclusion of cell phone telephone exchanges from RDD health surveys may be producing invalid conclusions about level of risk behaviors and health care needs
• Cell-only adults: grew from 4.4% in early 2004 to 16.1% in early 2008
21
Problem and a Solution
• Today, RDD samples exclude a substantial and differential proportion of the adult population
– younger, renters, lower income
• Some journal papers and newspaper stories have questioned the viability of RDD surveys
• New Approach: Dual frame sample design that includes a sufficiently large sample of cell phone-only adults
– List-assisted landline RDD sample plus a sample of telephone numbers drawn from dedicated cellular 1,000 banks
  • Sorted by area code, exchange, and 1,000-bank within a state. Equal probability sample of numbers.
22
BRFSS Has Been Leading the Way
• 2007 pilot study in 3 states
• 2008 pilot study in 18 states (also included cell numbers identified by Genesys-CSS)
• 2009 landline RDD and cell phone dual frame sample in all states
– Minimum of 250 cell-only interviews.
– Maximum of up to 10% of total landline completed interviews in a state.
• 2010 BRFSS estimates based on dual frame design in all states
23
What We Learned from the 2007 and 2008 Pilot Studies
• It is feasible to screen for cell-only adults
• Design weights can be developed for the sample of cell-only adults
• At the national level the NHIS can be used to size the three key usage groups for use in weighting:
– Cell-only, landline-only, and dual service
• For state level surveys, attempts to use internal sample information to size the groups have not proven successful
24
New Approach for the BRFSS
• Used multinomial logistic regression to predict 3 telephone usage groups in the NHIS using 13 socio-demographic variables that are also present in the ACS
– Region
– Type of living quarters
– Number of persons in the HH
– Number of children in the HH
– Number of elderly adults in the HH
– Highest education level among the adults in the HH
– Home tenure status
– Number of adult males in the HH
– Number of adult females in the HH
– Number of adult Hispanics in the HH
– Number of adult non-Hispanic blacks in the HH
– Number of married adults in the HH
– Number of never married adults in the HH
25
New Approach for the BRFSS
• Apply model coefficients to households in ACS to obtain predicted probability that they belong to each of the three telephone usage groups
• Assign three predicted probabilities to all adults in the household
• Do some further weight adjustments to bring the ACS weighted distributions into agreement with Claritas age x gender x race/ethnicity state population control totals and NHIS estimates of telephone usage by Census Region
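Applying multinomial logistic regression coefficients to a household amounts to computing one linear score per telephone usage group and normalizing the exponentiated scores so they sum to 1. The sketch below illustrates that step only. The coefficient values, the two predictor names (`renter`, `n_elderly`), and the choice of dual service as the reference group are assumptions for illustration, not the fitted NHIS model.

```python
import math

GROUPS = ("cell_only", "landline_only", "dual_service")


def predict_usage_probs(features, coefs):
    """Multinomial-logit probabilities for one household.

    features: dict of predictor name -> numeric value
    coefs: dict of group -> {"intercept": b0, predictor name: b, ...}
           The reference group has no coefficients (all treated as zero).
    """
    scores = {}
    for group in GROUPS:
        b = coefs[group]
        scores[group] = b.get("intercept", 0.0) + sum(
            b.get(name, 0.0) * value for name, value in features.items()
        )
    # Softmax, shifted by the largest score for numerical stability.
    top = max(scores.values())
    exp_scores = {g: math.exp(s - top) for g, s in scores.items()}
    denom = sum(exp_scores.values())
    return {g: e / denom for g, e in exp_scores.items()}


# Hypothetical coefficients and household (not fitted NHIS values).
coefs = {
    "cell_only": {"intercept": -1.0, "renter": 1.2, "n_elderly": -0.9},
    "landline_only": {"intercept": -0.5, "renter": -0.2, "n_elderly": 0.6},
    "dual_service": {},  # reference group (assumed)
}
household_features = {"renter": 1, "n_elderly": 0}
probs = predict_usage_probs(household_features, coefs)

# Every adult in the household receives the same three probabilities.
adults = {adult_id: probs for adult_id in ("adult_1", "adult_2")}
```

The later weight adjustments to the Claritas and NHIS control totals are a separate step and are not shown.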
26
Percent Change in Estimates from Including Cell-Only Adults
[Bar chart. X-axis: 6 risk factors and health conditions (smoking, no insurance, asthma, binge drinking, delayed medical care due to cost, diabetes). Y-axis: percent difference, -10.0 to 25.0.]
27
Percent Change in Estimates from Including Cell-Only Adults Aged 18-24 Years
[Bar chart. X-axis: 6 risk factors and health conditions (smoking, no insurance, asthma, binge drinking, delayed medical care due to cost, diabetes). Y-axis: percent difference, -50.0 to 60.0.]
28
Current Work
• Fit multinomial logistic regression model using 2007 NHIS and applied it to the 2006 – 2007 ACS
• Compared model-based estimates of telephone usage with independent proprietary direct state estimates
• Weighting methodology for combining the 2008 RDD state samples with the cell phone samples in the 18 states
29
Cell-Only State Estimates Based on 2007 NHIS and 2006-2007 ACS
[Chart: one point per state, labeled by numeric state code. X-axis: state. Y-axis: 0.0 to 24.0.]
30
Current Work
• For 2009, a combined weighted data set will also be produced
• 2010 sample size considerations – number of cell phone-only interviews relative to total number of landline interviews in state
31
Methodological Improvements To The BRFSS: Weighting, Cell Phones, And Analytical
Advances
32
Optional Modules
• Sets of questions on specific topics that states elect to use in their questionnaires
• In 2007, 19 optional modules were supported by CDC
• Not feasible for a state to use all modules
• Unlike national surveys such as NHIS, not possible to release national estimates for key prevalence estimates from an optional module
33
Mental Illness And Stigma Module
• Kessler 6
• 6 questions, 0-4 scale
• Total K6 score between 0 and 24
• Score of 13 or higher indicates serious psychological distress (SPD)
• SPD coded as 1 if score GE 13 or 0 if LE 12 – dichotomous outcome variable
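The scoring rule above can be written as a short function. This is a sketch of the rule as stated on the slide (six items scored 0 to 4, summed, with 13 or higher coded as SPD = 1); the function name and the input checks are illustrative additions.

```python
def k6_spd(item_scores):
    """Return (total K6 score, SPD indicator) for six item scores of 0-4."""
    if len(item_scores) != 6:
        raise ValueError("K6 needs exactly six item scores")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("each item score must be an integer from 0 to 4")
    total = sum(item_scores)          # ranges from 0 to 24
    spd = 1 if total >= 13 else 0     # GE 13 -> 1, LE 12 -> 0
    return total, spd
```

For example, `k6_spd([2, 2, 2, 2, 2, 3])` gives a total of 13 and SPD = 1, while `k6_spd([2, 2, 2, 2, 2, 2])` gives a total of 12 and SPD = 0.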
34
Module Use By State
• 25 states administered module to all respondents
• 7 split sample states (QSTVER = 1)
• 4 split sample states (QSTVER = 2)
• SPD item nonresponse in above 36 states
• 15 states did not administer the module
• No SPD prevalence estimates for the 15 states and therefore no national SPD prevalence estimate
35
423,783 Adults (9 Categories)
1. 137,614 module not used
2. 12,453 module used for all, SPD missing (8.2%)
3. 139,783 module used for all, SPD not missing
4. 3,482 QSTVER = 1, SPD missing
5. 43,614 QSTVER = 1, SPD not missing
6. 40,966 QSTVER = 2, module not used
7. 1,259 QSTVER = 2, SPD missing
8. 15,923 QSTVER = 2, SPD not missing
9. 28,689 QSTVER = 1, module not used
36
Overall Statistical Approach
• Logistic regression to model Y = SPD using adults with non-missing SPD
• For missing SPD adults, obtain predicted probability that Y = 1 (e.g., 0.358)
• Use rounding method to convert predicted probability to 1 or 0 value
• Validate accuracy of imputations
• Form weighted state and national estimates of SPD prevalence
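The steps above can be sketched end to end: compute a predicted probability from a fitted logistic model, convert it to 1 or 0, and form a weighted prevalence. The slide does not spell out the rounding method, so the 0.5 cutoff below is an assumption, as are the function names, the coefficient layout, and the toy weights.

```python
import math


def predicted_probability(intercept, coefs, x):
    """Logistic-regression probability that Y = 1 for one adult."""
    z = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def impute_spd(prob, cutoff=0.5):
    """Convert a predicted probability to 1 or 0.

    The 0.5 cutoff is an assumed reading of the "rounding method".
    """
    return 1 if prob >= cutoff else 0


def weighted_prevalence(spd_values, weights):
    """Weighted percent of adults with SPD = 1."""
    total = sum(weights)
    return 100.0 * sum(s * w for s, w in zip(spd_values, weights)) / total


# The slide's example probability of 0.358 rounds to 0 under this cutoff.
example = impute_spd(0.358)

# Hypothetical observed-plus-imputed SPD values and final weights.
prevalence = weighted_prevalence([1, 0, 0, 0], [10.0, 30.0, 30.0, 30.0])
```

With these toy values, one adult with SPD = 1 carries 10 of the 100 weighted units, so `prevalence` is 10 percent.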
37
Overall Statistical Approach
• First, impute SPD for adults in categories 2, 4 and 7 using state-specific models based on categories 3, 5 and 8, respectively
• Second, impute SPD for adults in categories 6 and 9 using state-specific models based on categories 5 and 8, respectively
• Third, impute SPD for adults in category 1 using national model based on categories 3, 5 and 8
– Also impute SPD for adults in category 1 using national model based on categories 2, 3, 4, 5, 7 and 8
38
Predictor Variables
• Core BRFSS socio-demographic, health, and risk factor variables were used in the logistic regression models that impute missing SPD values in states that used the module
• The national model used the above variables plus state-level variables from ACS, NCHS, and other sources
• Step-wise logistic regression
39
Predictor: Physical Health
           0 days   30 days   Missing
SPD = 0    98%      80%       92%
SPD = 1    2%       20%       8%
40
Predictor: Mental Health
           0 days   30 days   Missing
SPD = 0    99%      66%       90%
SPD = 1    1%       34%       10%
41
Predictor: Poor Health
           0 days   30 days   Missing
SPD = 0    97%      69%       99%
SPD = 1    3%       31%       1%
42
36 States: SPD Estimates Before Versus After Imputation
[Chart. Axis: 0.0 to 8.0. Series: SPD without imputed; SPD with imputed.]
43
Validation Of State-Level Imputation Models
• For imputation of missing SPD values in categories 2, 4 and 7 in the 36 states, compare imputed value with adjusted total K6 score for those who answered at least one of the six questions
44
[Chart. Axis: 0.0 to 20.0. Series: K6 score for SPD imputed to 1; K6 score for SPD imputed to 0.]
45
Validation of National Imputation Model
• Divide adults in categories 3, 5 and 8 in each of the 36 states into two random halves
– Training sample and validation sample
• Use training sample to fit national model
• Apply model to validation sample to obtain imputed SPD values
• Use validation sample to compare imputed SPD prevalence estimates with actual SPD prevalence estimates
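The split-half validation can be sketched with two small helpers: one that divides respondents into random training and validation halves, and one that computes the Pearson correlation between actual and imputed state-level estimates. The function names and the fixed seed are assumptions for illustration; the model fitting step itself is not shown.

```python
import random


def split_halves(ids, seed=0):
    """Randomly split ids into a training half and a validation half."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical respondent ids for one state.
training, validation = split_halves(range(1000))
```

In use, `split_halves` would be called once per state, the national model fitted on the pooled training halves, and `pearson` applied to the per-state actual and imputed prevalence estimates from the validation halves.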
46
Correlation = 0.88
[Scatter plot. X-axis: actual SPD estimates, 0 to 8. Y-axis: imputed SPD estimates, 0 to 7.]
47
SPD Prevalence Estimates For the 15 States and the National
Prevalence Estimate
48
State             SPD prevalence (%)
Alabama           6.0
Arizona           3.2
Delaware          3.3
Florida           4.1
Idaho             2.7
Maryland          3.7
New Jersey        3.6
New York          5.2
North Carolina    5.4
North Dakota      3.9
Pennsylvania      3.7
South Dakota      3.9
Tennessee         5.2
Utah              2.5
West Virginia     5.8
U.S. Total        4.3
49
50
51
Next Steps And Applicability To Other Modules
• Calculate variance of national SPD estimate using multiple imputation
• Approach generalizes best to optional modules used in at least half of the states
• Approach generalizes best to optional module outcome variables that are highly correlated with a question used in the core BRFSS
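For the multiple-imputation variance, the slides do not say which combining rule is planned. Rubin's rules are the standard choice, so the sketch below assumes them: the point estimate is the mean over the m completed data sets, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the input numbers are hypothetical.

```python
def rubin_combine(estimates, variances):
    """Combine m completed-data estimates using Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: sampling variance of each of those estimates
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = sum(estimates) / m                       # combined point estimate
    within = sum(variances) / m                      # average within variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical estimates (percent) and variances from m = 3 imputations.
combined, total_var = rubin_combine([4.0, 4.5, 5.0], [0.04, 0.04, 0.04])
```

Here the within variance is 0.04 and the between variance is 0.25, so the total variance is 0.04 + (4/3)(0.25), about 0.373.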