nicholas jewell medicres world congress 2011
TRANSCRIPT
Good Biostatistical Report Practices
Being Honest in Data Analysis
Nicholas P. Jewell Departments of Statistics &
School of Public Health (Biostatistics) University of California, Berkeley
March 26, 2011
1
2
3
References
• Bailar, J.C. and Mosteller, F. “Guidelines for statistical reporting in articles for medical journals: Amplifications and explanations,” Annals of Internal Medicine, 108, 1988, 266-273.
• Lang, T.A. and Secic M. “How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers,” 2nd Edition, Amrican College of Physicians, 2006.
• Altman, D. “Statistical reviewing for medical journals, Statistics in Medicine, 17, 1998, 2661-2674.
© Nicholas P. Jewell, 2011
4
Overview • Opportunistic Data Torturing
– Multiple comparisons – Multiple analyses – Subgroup analyses
• Procustean Data Torturing – Subgroup analyses (or lack thereof) – Meta-analyses
• Sensitivity Analyses/Proxies/Assumptions/Honesty • Blinding the Statistician (who funds the statistician?) • Pre-specified Data Driven Analyses • Dissemination and Publication
5
© Nicholas P. Jewell, 2011
What is the Recipe for a Good Biostatistical Review?
The Four Hs • What basic questions should be asked?
– GBRS Questions provide an excellent checklist (guidelines etc)
– How was the Data Collected (e.g. design/sample size issues/loss to follow up)
– How were the variables measured? – How was the data analyzed? (e.g. methods/
assumptions, software, pre-specified methods) – How were the results reported? (e.g. sensitivity,
interpretation, uncertainty, conflicts of interest) 6
© Nicholas P. Jewell, 2011
Deeper Questions The Five Ws
• What is the population of interest? • What is the data-generating mechanism (e.g.
assumptions, missingness or data filtering) • What are the parameters of interest that describe the
data-generating mechanism? (e.g. causal inference) • Which of these parameters are identifiable and how can
they be estimated effectively? • Where can I find the data and the software?
7
© Nicholas P. Jewell, 2011
The MIRA Trial • Gates Foundation study to determine the
effectiveness of a latex diaphragm in the reduction of heterosexual acquisition of HIV among women
• Two arm, randomized, controlled trial • Primary intervention: diaphragm and gel provision
to diaphragm arm (nothing to control arm). • Secondary Intervention: Intensive condom provision
and counseling given to both arms, plus treatment of STIs
• Trial is not blinded • 5000 women seen for 18 months in three sites in
Zimbabwe and South Africa 8
© Nicholas P. Jewell, 2009
MIRA Trial: Basic Intention to Treat Results
• Basic Intent-to-Treat Analysis: – 158 new HIV infections in Diaphragm Arm – 151 new HIV infections in Control Arm
• ITT estimate of Relative Risk is 1.05 with a 95% CI of (0.84, 1.30)
• End of story . . . . .?
9
© Nicholas P. Jewell, 2009
MIRA Trial: Basic Intention to Treat Results
• However condom use differed between the two arms: – 53.5% in Diaphragm Arm (by visit) – 85.1% in Control Arm (by visit)
• Could this mean that the diaphragm was more effective than it appeared from the basic analysis?
• To make sense of this—we’d like to understand the role of condom use in mediating the effect of treatment assignment on HIV infection. 10
© Nicholas P. Jewell, 2009
Most Important Public Health Questions
1. What is the effectiveness of providing study product in environment of country-level standard condom counseling?
(in environment of no condom counseling?) 2. How does providing study product alone compare to
consistent condom use alone in reducing HIV transmission?
3. How does providing the study product alone compare to unprotected sex, in terms of risk of HIV infection?
None of these questions are answered by basic ITT analysis 11
© Nicholas P. Jewell, 2009
Reasons effectiveness may be different in environment of intensive condom counseling vs. environment of
country-level standard condom counseling
A. In trials without blinding, effect of intensive condom counseling may be different for treatment arm and control arm.
B. Effects of intensive condom counseling may be different for those who adhere to assigned treatment and those who don’t.
C. Intensive condom counseling may itself affect adherence to assigned treatment.
12
© Nicholas P. Jewell, 2009
Reason A: In unblinded trials, effect of intensive condom counseling may be different for treatment arm and control arm.
Treatment Arm
Control Arm
Non-adherers (25%) Adherers (75%)
50 infections 40
50
Treatment Arm Control Arm
Non-adherers (25%) Adherers (75%)
50 40
50 10
RR:170/200=0.85
RR:106/80=1.33
Study Product (or placebo) Users Condom Users No Condom Counseling
Intensive Condom Counseling
Assume: Condoms 80% protective, Study Product 20% efficacy, and Intensive Condom Counseling less effect in treatment arm
8
50 infections
40 40
50 50
8
10 10
MIRA Trial—Randomization of Diaphragm
HIV Infection
Tx Assignment (Diaphragm use)
Condom Use
14
© Nicholas P. Jewell, 2009
Estimating Direct Effects: Adjusting for a Mediator (condom use)
HIV Infection
Treatment Group (Diaphragm use)
Condom Use
DIRECT EFFECT
• We want to estimate the direct effect of diaphragm provision, at a set level of condom use. (Petersen et al. 2006, Robins and Greenland 1992, Pearl 2000, Rosenblum et al. 2009) • Still ITT interpretation • Requires Stronger Assumptions than basic ITT
15
© Nicholas P. Jewell, 2009
Estimating Direct Effects: Stratify on condom use?
(danger: condom use is not randomized)
HIV Infection
Treatment Group (Diaphragm use)
Condom Use
Confounders
after stratification on condom use
HIV Infection
Treatment Group (Diaphragm use)
Confounders randomization hasn’t ruled out confounding of direct effect!
Now have to adjust for confounders (but we are still ITT)
Why Stratification/Regression Gives Biased Estimates When There Are
Confounders as Causal Intermediates
Treatment Group (R)
HIV Status (H)
Condom Use (C)
Diaphragm Use (D) (confounder)
Using Regression, if we control for D, we don’t get the direct effect that we want. © Nicholas P. Jewell, 2009 © Nicholas P. Jewell, 2009
Results of Direct Effects Analysis
• Relative Risk of HIV infection between Diaphragm arm and Control arm by end of Trial, with Condom Use Fixed at “Never”: 0.59 (95% CI: 0.26, 4.56)
• Relative Risk of HIV infection between Diaphragm arm and Control arm by end of Trial, with Condom Use Fixed at “Always”: 0.96 (95% CI: 0.59, 1.45)
Conclusion: No definitive evidence from direct effects analysis that diaphragms prevent (or don’t prevent) HIV.
© Nicholas P. Jewell, 2009
18
© Nicholas P. Jewell, 2009
Pain Trials with Self-Reporting • Pain is the most disturbing symptom of peripheral
neuropathy among diabetic patients
• As many as 45% of patients with diabetes develop peripheral neuropathies
• Gabapentin was suggested as a treatment option
• To evaluate the effect of Gabapentin, a randomized, double-blind, placebo-controlled trial was conducted
• 165 patients with a 1- to 5-year history of pain attributed to diabetic neuropathy enrolled at 20 different sites
© Nicholas P. Jewell, 2011
19
Backonja et al. Trial in JAMA • The main outcome was daily pain severity as measured on an
11-point Likert scale (0 no pain- 10 worst possible pain)
• Eighty-four patients received gabapentin, 81 received placebo
• By intention-to-treat analysis, gabapentin-treated patients' mean daily pain score (baseline 6.4, end point 3.9) was significantly lower (P<.001) than the placebo-treated patients' score (baseline 6.5, end point 5.1)
• Concluded that gabapentin appears to be efficacious for the treatment of pain associated with diabetic peripheral neuropathy
© Nicholas P. Jewell, 2011
20
Handling Treatment-Related Side Effects
• Treat side effects singly by removing those individuals from the data analysis and seeing if that changed the results
© Nicholas P. Jewell, 2011
21
Perception Effect • Patients have a perception about the treatment they
receive • In general we may think of the patients assigning a
degree of certainty (probability) to receiving the active treatment, measured by a variable P
• In most cases we do not observe the patient’s perception on a continuous scale
© Nicholas P. Jewell, 2011
€
P =
10−1
⎧
⎨ ⎪
⎩ ⎪
convinced on treatmentnot sure if on treatment or placebo
convinced on placebo
22
Application to Unmasking of Blinded Treatment Assignment Due to Side Effects
Outcome (pain score)
Treatment Group
Side effects Confounders
• side effects are treatment-related and therefore potentially unmasking
• outcome is measured subjectively
• actual use of the drug (adherence) is a potential confounder 23
© Nicholas P. Jewell, 2011
Data • Outcome:
mean pain score for the last 7 diary entries
• Baseline covariates: age, sex, race, height, weight, baseline pain, baseline sleep
• Treatment: gabapentin, placebo
• Perception: changes from 0 to 1 when a treatment-related side effect occurs
© Nicholas P. Jewell, 2011
24
Parameters of Interest • What is the treatment effect if all the patients
remained unknowledgeable about their treatment? (Perception fixed at 0)
• What is the treatment effect if all the patients thought they were receiving the active treatment? (Perception fixed at 1)
• (The difference between these two parameters can be thought of as a perception bias)
© Nicholas P. Jewell, 2011
preferred ITT parameter
25
Estimates
© Nicholas P. Jewell, 2011
26
Subgroups—Interaction?
27
© Nicholas P. Jewell, 2011
28
© Nicholas P. Jewell, 2011
29
© Nicholas P. Jewell, 2011
30
© Nicholas P. Jewell, 2011
Reviewer Comments
• Figure 4- The steroid finding is worthy of comment, even if the interaction is not statistically significant (the power for interactions is low). It may well be that the new agent has little benefit for those not treated with steroids,”
• Figure 4 has been deleted as suggested by the editor. While there was not a significant reduction in relative risk of confirmed PUBS in the subgroup not taking steroids at baseline, VIGOR was only designed to detect a significant reduction overall, and not within any specific subgroup. There were, however, reductions in relative risk in this subgroup albeit not significant. Whenever many subgroups are examined, it becomes likely that there will be no apparent effect in some of them purely by chance. This is why the test for interaction, although underpowered as you rightly pointed out, is the appropriate way to assess subgroup effects. When there is no treatment-by-subgroup interaction, the more reliable estimate of treatment effect within a subgroup is the effect in the entire cohort [1].”
31
© Nicholas P. Jewell, 2011
Meta-Analyses
• Random or Fixed Effects – heterogeneity – Measure of association (RR or ER)
• Zero-event trials/use of all evidence – Drug safety studies (Vioxx, Celebrex, Avandia, etc) – Continuity corrections – Study duration
32
© Nicholas P. Jewell, 2009
Meta-Analyses and No Event Studies
• One study: 30 events under Tx, 3 under placebo (5000 individuals in both arms) – Risk difference of 0.0054 with exact 95%
confidence interval of (0.0032, 0.0076), p < 0.0001 • Now add three (short duration) studies,all with
5000 in both arms, no events in either arms in all cases – exact 95% confidence interval of (-0.0044, 0.0001),
p = 0.46
33
© Nicholas P. Jewell, 2011
Multiplicity
34
© Nicholas P. Jewell, 2011
References • Ioannidis, J.P.A. “Why most published research findings are false,” PLoS Med, 2(8),
2005, e124. • Siegfried, T. “Odds are, it’s wrong,” Science News, March 27, 2010, 26-29. • Bailar, J.C. III., ”Science. Statistics, and deception," Annals of Internal Medicine, 104,
1986, 259-260. • Mills. J.S., “Data torturing,” New England Journal of Medicine, 329, 1993, 1196-1199. • Young, J., Hubbard, A.E., Eskenazi, B. and Jewell, N. P., “A machine-learning
algorithm for estimating and ranking the impact of environmental risk factors in exploratory epidemiological studies”, to appear, 2011.
• A.E. Hubbard, Ahern, J., Fleischer, N.L., Lippman, S.A., Jewell, N.P., Bruckner, T., Satariano, W.A., “To GEE or not to GEE: Comparing estimating function and likelihood-based methods for estimating the associations between neighboring risk factors and health,” Epidemiology, 21, 2010, 467-474.
• Van der Laan, M., Hubbard, A., Jewell, N.P., ”Semi-parametric models versus faith-based inference," to appear, Epidemiology, 21, 2010, 479-481.
35
© Nicholas P. Jewell, 2011