nicholas jewell medicres world congress 2011

Good Biostatistical Report Practices

Being Honest in Data Analysis

Nicholas P. Jewell

Departments of Statistics & School of Public Health (Biostatistics)

University of California, Berkeley

March 26, 2011


References

•  Bailar, J.C. and Mosteller, F. “Guidelines for statistical reporting in articles for medical journals: Amplifications and explanations,” Annals of Internal Medicine, 108, 1988, 266-273.

•  Lang, T.A. and Secic, M. “How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers,” 2nd Edition, American College of Physicians, 2006.

•  Altman, D. “Statistical reviewing for medical journals,” Statistics in Medicine, 17, 1998, 2661-2674.

© Nicholas P. Jewell, 2011


Overview

•  Opportunistic Data Torturing
   –  Multiple comparisons
   –  Multiple analyses
   –  Subgroup analyses
•  Procrustean Data Torturing
   –  Subgroup analyses (or lack thereof)
   –  Meta-analyses
•  Sensitivity Analyses/Proxies/Assumptions/Honesty
•  Blinding the Statistician (who funds the statistician?)
•  Pre-specified Data Driven Analyses
•  Dissemination and Publication


What is the Recipe for a Good Biostatistical Review?

The Four Hs

•  What basic questions should be asked?
   –  GBRS Questions provide an excellent checklist (guidelines etc.)
   –  How was the data collected? (e.g. design, sample size issues, loss to follow-up)
   –  How were the variables measured?
   –  How was the data analyzed? (e.g. methods/assumptions, software, pre-specified methods)
   –  How were the results reported? (e.g. sensitivity, interpretation, uncertainty, conflicts of interest)


Deeper Questions: The Five Ws

•  What is the population of interest?
•  What is the data-generating mechanism? (e.g. assumptions, missingness or data filtering)
•  What are the parameters of interest that describe the data-generating mechanism? (e.g. causal inference)
•  Which of these parameters are identifiable and how can they be estimated effectively?
•  Where can I find the data and the software?


The MIRA Trial

•  Gates Foundation study to determine the effectiveness of a latex diaphragm in the reduction of heterosexual acquisition of HIV among women
•  Two-arm, randomized, controlled trial
•  Primary intervention: diaphragm and gel provision to diaphragm arm (nothing to control arm)
•  Secondary intervention: intensive condom provision and counseling given to both arms, plus treatment of STIs
•  Trial is not blinded
•  5000 women seen for 18 months in three sites in Zimbabwe and South Africa


MIRA Trial: Basic Intention to Treat Results

•  Basic Intent-to-Treat Analysis:
   –  158 new HIV infections in Diaphragm Arm
   –  151 new HIV infections in Control Arm

•  ITT estimate of Relative Risk is 1.05 with a 95% CI of (0.84, 1.30)

•  End of story . . . . .?
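The reported interval can be approximately reproduced with a standard log-scale Wald calculation. This is a minimal sketch, not the trial's own analysis: the arm size of 2,500 women is an assumption (the slides give only a total of about 5,000 women), and the function name `risk_ratio_ci` is hypothetical.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of arm A versus arm B with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


ASSUMED_ARM_SIZE = 2500  # assumption: equal arms; only the total (~5000) is reported

rr, lower, upper = risk_ratio_ci(158, ASSUMED_ARM_SIZE, 151, ASSUMED_ARM_SIZE)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Under the equal-arm assumption this prints `RR = 1.05, 95% CI (0.84, 1.30)`, in line with the ITT estimate reported on the slide.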



MIRA Trial: Basic Intention to Treat Results

•  However, condom use differed between the two arms:
   –  53.5% in Diaphragm Arm (by visit)
   –  85.1% in Control Arm (by visit)

•  Could this mean that the diaphragm was more effective than it appeared from the basic analysis?

•  To make sense of this, we’d like to understand the role of condom use in mediating the effect of treatment assignment on HIV infection.


Most Important Public Health Questions

1. What is the effectiveness of providing study product in an environment of country-level standard condom counseling? (In an environment of no condom counseling?)

2. How does providing study product alone compare to consistent condom use alone in reducing HIV transmission?

3. How does providing the study product alone compare to unprotected sex, in terms of risk of HIV infection?

None of these questions is answered by the basic ITT analysis.


Reasons effectiveness may be different in environment of intensive condom counseling vs. environment of country-level standard condom counseling

A. In trials without blinding, effect of intensive condom counseling may be different for treatment arm and control arm.

B. Effects of intensive condom counseling may be different for those who adhere to assigned treatment and those who don’t.

C. Intensive condom counseling may itself affect adherence to assigned treatment.



Reason A: In unblinded trials, effect of intensive condom counseling may be different for treatment arm and control arm.

Assume: Condoms 80% protective, Study Product 20% efficacy, and Intensive Condom Counseling has less effect in treatment arm

[Figure: expected infections by arm, with each arm split into Non-adherers (25%) and Adherers (75%) and marked as Study Product (or placebo) Users or Condom Users]

•  No Condom Counseling
   –  Treatment Arm: 50 + 40 + 40 + 40 = 170 infections
   –  Control Arm: 50 + 50 + 50 + 50 = 200 infections
   –  RR: 170/200 = 0.85

•  Intensive Condom Counseling
   –  Treatment Arm: 50 + 40 + 8 + 8 = 106 infections
   –  Control Arm: 50 + 10 + 10 + 10 = 80 infections
   –  RR: 106/80 = 1.33
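The arithmetic behind the two relative risks in the Reason A figure can be checked with a few lines of code. This is a sketch of one reading of the figure: each arm is treated as four equal quarters (one of non-adherers, three of adherers) with 50 infections per unprotected quarter, and the number of adherer quarters using condoms under intensive counseling (two in the treatment arm, three in the control arm) is inferred from the counts shown. The names `infections`, `arm_total`, and `scenarios` are hypothetical.

```python
BASELINE = 50             # infections in an unprotected quarter of an arm (from the figure)
CONDOM_PROTECTION = 0.80  # stated on the slide
PRODUCT_EFFICACY = 0.20   # stated on the slide


def infections(uses_product, uses_condom):
    """Expected infections in one quarter of an arm."""
    expected = BASELINE
    if uses_product:
        expected *= 1 - PRODUCT_EFFICACY
    if uses_condom:
        expected *= 1 - CONDOM_PROTECTION
    return expected


def arm_total(quarters):
    """Sum expected infections over the (uses_product, uses_condom) quarters."""
    return sum(infections(p, c) for p, c in quarters)


NON_ADHERER = (False, False)  # no product, no condoms

# One non-adherer quarter plus three adherer quarters per arm.
scenarios = {
    "No Condom Counseling": {
        "treatment": [NON_ADHERER] + [(True, False)] * 3,
        "control": [NON_ADHERER] + [(False, False)] * 3,
    },
    "Intensive Condom Counseling": {
        # Assumed reading: counseling moves fewer treatment-arm women to condoms.
        "treatment": [NON_ADHERER, (True, False), (True, True), (True, True)],
        "control": [NON_ADHERER] + [(False, True)] * 3,
    },
}

for name, arms in scenarios.items():
    rr = arm_total(arms["treatment"]) / arm_total(arms["control"])
    print(f"{name}: RR = {rr:.3f}")
```

The same product efficacy yields an apparent benefit without counseling and an apparent harm with it, purely because counseling shifts condom use unequally between arms.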

MIRA Trial: Randomization of Diaphragm

[Causal diagram with nodes: Tx Assignment (Diaphragm use), Condom Use, HIV Infection]


Estimating Direct Effects: Adjusting for a Mediator (condom use)

[Causal diagram with nodes: Treatment Group (Diaphragm use), Condom Use, HIV Infection; one path is labeled DIRECT EFFECT]

•  We want to estimate the direct effect of diaphragm provision, at a set level of condom use. (Petersen et al. 2006, Robins and Greenland 1992, Pearl 2000, Rosenblum et al. 2009)
•  Still ITT interpretation
•  Requires stronger assumptions than basic ITT


Estimating Direct Effects: Stratify on condom use?

(danger: condom use is not randomized)

[Causal diagram with nodes: HIV Infection, Condom Use, Confounders]

[Causal diagram after stratification on condom use, with nodes: HIV Infection, Confounders]

Randomization hasn’t ruled out confounding of the direct effect!

Now we have to adjust for confounders (but we are still ITT)

Why Stratification/Regression Gives Biased Estimates When There Are Confounders as Causal Intermediates

[Causal diagram with nodes: Treatment Group (R), Diaphragm Use (D) (confounder), Condom Use (C), HIV Status (H)]

Using regression, if we control for D, we don’t get the direct effect that we want.

© Nicholas P. Jewell, 2009
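A small worked example can make this bias concrete. The sketch below uses a made-up structural model in which assignment R changes diaphragm use D, D changes condom use C, and both D and C change HIV risk H. Every probability is a hypothetical assumption, not a MIRA estimate, and the quantities are computed exactly by enumeration on the risk-difference scale rather than by fitting a regression.

```python
from itertools import product

# Hypothetical model (illustrative assumptions only, not MIRA estimates).
P_D_GIVEN_R = {0: 0.0, 1: 0.7}  # P(D=1 | R): diaphragm use only when assigned


def p_c(d):
    """P(C=1 | D=d): diaphragm users are assumed to use condoms less."""
    return 0.8 - 0.3 * d


def p_h(d, c):
    """P(H=1 | D=d, C=c): both methods are assumed to lower HIV risk."""
    return 0.12 - 0.04 * d - 0.06 * c


def joint(r):
    """Yield (d, c, P(D=d, C=c | R=r)) over all four (d, c) cells."""
    for d, c in product((0, 1), repeat=2):
        pd = P_D_GIVEN_R[r] if d == 1 else 1 - P_D_GIVEN_R[r]
        pc = p_c(d) if c == 1 else 1 - p_c(d)
        yield d, c, pd * pc


def observed_risk(r, c, d=None):
    """E[H | R=r, C=c], or E[H | R=r, C=c, D=d] when d is given."""
    cells = [(dd, p) for dd, cc, p in joint(r)
             if cc == c and (d is None or dd == d)]
    total = sum(p for _, p in cells)
    return sum(p * p_h(dd, c) for dd, p in cells) / total


def intervened_risk(r, c):
    """E[H | do(R=r, C=c)]: D responds to R while C is set to c."""
    pd = P_D_GIVEN_R[r]
    return pd * p_h(1, c) + (1 - pd) * p_h(0, c)


for c in (0, 1):
    true_direct = intervened_risk(1, c) - intervened_risk(0, c)
    stratify_c = observed_risk(1, c) - observed_risk(0, c)
    control_d = observed_risk(1, c, d=0) - observed_risk(0, c, d=0)
    print(f"C={c}: true {true_direct:+.4f}, "
          f"stratify on C {stratify_c:+.4f}, control for D {control_d:+.4f}")
```

In this toy model the true direct effect of assignment at fixed condom use is a risk difference of -0.028. Stratifying on C alone gives roughly -0.034 (C=0) and -0.024 (C=1), because D confounds the C to H relationship, while additionally conditioning on D removes the effect entirely, because D lies on the pathway from R to H.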

Results of Direct Effects Analysis

•  Relative Risk of HIV infection between Diaphragm arm and Control arm by end of Trial, with Condom Use Fixed at “Never”: 0.59 (95% CI: 0.26, 4.56)

•  Relative Risk of HIV infection between Diaphragm arm and Control arm by end of Trial, with Condom Use Fixed at “Always”: 0.96 (95% CI: 0.59, 1.45)

Conclusion: No definitive evidence from direct effects analysis that diaphragms prevent (or don’t prevent) HIV.




Pain Trials with Self-Reporting

•  Pain is the most disturbing symptom of peripheral neuropathy among diabetic patients

•  As many as 45% of patients with diabetes develop peripheral neuropathies

•  Gabapentin was suggested as a treatment option

•  To evaluate the effect of gabapentin, a randomized, double-blind, placebo-controlled trial was conducted

•  165 patients with a 1- to 5-year history of pain attributed to diabetic neuropathy enrolled at 20 different sites

Backonja et al. Trial in JAMA

•  The main outcome was daily pain severity as measured on an 11-point Likert scale (0 = no pain, 10 = worst possible pain)

•  Eighty-four patients received gabapentin, 81 received placebo

•  By intention-to-treat analysis, gabapentin-treated patients' mean daily pain score (baseline 6.4, end point 3.9) was significantly lower (P<.001) than the placebo-treated patients' score (baseline 6.5, end point 5.1)

•  Concluded that gabapentin appears to be efficacious for the treatment of pain associated with diabetic peripheral neuropathy

Handling Treatment-Related Side Effects

•  Treat side effects one at a time: remove the individuals who experienced a given side effect from the data analysis and see whether the results change

Perception Effect

•  Patients have a perception about the treatment they receive

•  In general we may think of the patients assigning a degree of certainty (probability) to receiving the active treatment, measured by a variable P

•  In most cases we do not observe the patient’s perception on a continuous scale


$$
P = \begin{cases}
1 & \text{convinced on treatment} \\
0 & \text{not sure if on treatment or placebo} \\
-1 & \text{convinced on placebo}
\end{cases}
$$

Application to Unmasking of Blinded Treatment Assignment Due to Side Effects

[Causal diagram with nodes: Treatment Group, Side effects, Confounders, Outcome (pain score)]

•  Side effects are treatment-related and therefore potentially unmasking

•  Outcome is measured subjectively

•  Actual use of the drug (adherence) is a potential confounder


Data

•  Outcome: mean pain score for the last 7 diary entries

•  Baseline covariates: age, sex, race, height, weight, baseline pain, baseline sleep

•  Treatment: gabapentin, placebo

•  Perception: changes from 0 to 1 when a treatment-related side effect occurs



Parameters of Interest

•  What is the treatment effect if all the patients remained unsure which treatment they were receiving? (Perception fixed at 0)

•  What is the treatment effect if all the patients thought they were receiving the active treatment? (Perception fixed at 1)

•  (The difference between these two parameters can be thought of as a perception bias)


preferred ITT parameter
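One way to write these parameters down is with potential-outcome notation, which is an assumption here since the slides do not introduce it. Let $Y(a, p)$ be the pain score a patient would report under treatment $a$ (1 = gabapentin, 0 = placebo) with perception held at $p$. The symbols $\psi_p$ and $Y(a, p)$ are hypothetical labels for illustration.

```latex
\psi_p = E[Y(1, p)] - E[Y(0, p)], \qquad p \in \{0, 1\}

\text{perception bias} = \psi_1 - \psi_0
```

Here $\psi_0$ is the treatment effect with perception fixed at 0 and $\psi_1$ is the effect with perception fixed at 1, matching the two questions above.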


Estimates



Subgroups—Interaction?



Reviewer Comments

•  “Figure 4: The steroid finding is worthy of comment, even if the interaction is not statistically significant (the power for interactions is low). It may well be that the new agent has little benefit for those not treated with steroids,”

•  “Figure 4 has been deleted as suggested by the editor. While there was not a significant reduction in relative risk of confirmed PUBS in the subgroup not taking steroids at baseline, VIGOR was only designed to detect a significant reduction overall, and not within any specific subgroup. There were, however, reductions in relative risk in this subgroup, albeit not significant. Whenever many subgroups are examined, it becomes likely that there will be no apparent effect in some of them purely by chance. This is why the test for interaction, although underpowered as you rightly pointed out, is the appropriate way to assess subgroup effects. When there is no treatment-by-subgroup interaction, the more reliable estimate of treatment effect within a subgroup is the effect in the entire cohort [1].”
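The chance argument in the response above can be made concrete with a short calculation. This is a sketch under simplifying assumptions that are not in the source: the subgroups are independent (non-overlapping) and each has the same power to show a significant effect. The function name and the power values are hypothetical.

```python
def prob_some_subgroup_not_significant(power, n_subgroups):
    """P(at least one of n independent subgroups fails to reach significance).

    Assumes a real effect in every subgroup, detected with the same
    probability `power` in each, independently of the others.
    """
    return 1 - power ** n_subgroups


for power in (0.5, 0.8):          # hypothetical per-subgroup power
    for k in (2, 5, 10):          # hypothetical number of subgroups
        p = prob_some_subgroup_not_significant(power, k)
        print(f"power={power:.1f}, subgroups={k:2d}: P(some null finding) = {p:.3f}")
```

Even with 80% power in every subgroup, ten independent subgroups leave roughly an 89% chance that at least one shows no significant effect, which is why an isolated null subgroup is weak evidence on its own.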



Meta-Analyses

•  Random or Fixed Effects
   –  Heterogeneity
   –  Measure of association (RR or ER)

•  Zero-event trials/use of all evidence
   –  Drug safety studies (Vioxx, Celebrex, Avandia, etc.)
   –  Continuity corrections
   –  Study duration


Meta-Analyses and No Event Studies

•  One study: 30 events under Tx, 3 under placebo (5000 individuals in each arm)
   –  Risk difference of 0.0054 with exact 95% confidence interval of (0.0032, 0.0076), p < 0.0001

•  Now add three (short duration) studies, all with 5000 in each arm and no events in either arm in all cases
   –  Exact 95% confidence interval of (-0.0044, 0.0001), p = 0.46
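The single-study numbers are easy to check, and a simple pooled estimate shows how zero-event studies dilute the signal. This is a sketch only: it uses a Wald interval and a Mantel-Haenszel pooled risk difference as stand-ins, not the exact method behind the intervals quoted above, so the combined exact interval is not reproduced. The function names are hypothetical.

```python
import math


def risk_difference_wald(events_tx, n_tx, events_pl, n_pl, z=1.96):
    """Risk difference (Tx minus placebo) with a Wald 95% interval."""
    p_tx, p_pl = events_tx / n_tx, events_pl / n_pl
    rd = p_tx - p_pl
    se = math.sqrt(p_tx * (1 - p_tx) / n_tx + p_pl * (1 - p_pl) / n_pl)
    return rd, rd - z * se, rd + z * se


def mantel_haenszel_rd(studies):
    """Mantel-Haenszel pooled risk difference.

    `studies` is a list of (events_tx, n_tx, events_pl, n_pl) tuples.
    """
    num = sum((a * n2 - c * n1) / (n1 + n2) for a, n1, c, n2 in studies)
    den = sum(n1 * n2 / (n1 + n2) for _, n1, _, n2 in studies)
    return num / den


first_study = (30, 5000, 3, 5000)
zero_event_study = (0, 5000, 0, 5000)

rd, lower, upper = risk_difference_wald(*first_study)
print(f"One study: RD = {rd:.4f}, Wald 95% CI ({lower:.4f}, {upper:.4f})")

pooled = mantel_haenszel_rd([first_study] + [zero_event_study] * 3)
print(f"Four studies pooled: RD = {pooled:.5f}")
```

For the single study the Wald interval rounds to (0.0032, 0.0076), matching the exact interval on the slide. Adding the three zero-event studies leaves the event counts unchanged but quadruples the denominators, so the pooled risk difference falls to 0.00135.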



Multiplicity



References

•  Ioannidis, J.P.A. “Why most published research findings are false,” PLoS Med, 2(8), 2005, e124.

•  Siegfried, T. “Odds are, it’s wrong,” Science News, March 27, 2010, 26-29.

•  Bailar, J.C. III. “Science, statistics, and deception,” Annals of Internal Medicine, 104, 1986, 259-260.

•  Mills, J.L. “Data torturing,” New England Journal of Medicine, 329, 1993, 1196-1199.

•  Young, J., Hubbard, A.E., Eskenazi, B. and Jewell, N.P. “A machine-learning algorithm for estimating and ranking the impact of environmental risk factors in exploratory epidemiological studies,” to appear, 2011.

•  Hubbard, A.E., Ahern, J., Fleischer, N.L., Lippman, S.A., Jewell, N.P., Bruckner, T. and Satariano, W.A. “To GEE or not to GEE: Comparing estimating function and likelihood-based methods for estimating the associations between neighboring risk factors and health,” Epidemiology, 21, 2010, 467-474.

•  Van der Laan, M., Hubbard, A. and Jewell, N.P. “Semi-parametric models versus faith-based inference,” Epidemiology, 21, 2010, 479-481.

