practical statistical reasoning in clinical trials for non-statisticians

"This training has been funded in whole or in part with Federal funds from the National Institute on Drug Abuse, National Institutes of Health, Department of Health and Human Services, under Contract No.HHSN271201000024C."

Produced by: NIDA CTN CCC Training Office

2012 Web Seminar Series

PRACTICAL STATISTICAL REASONING IN CLINICAL TRIALS FOR NON-STATISTICIANS

Presented on November 14, 2012 by:

Paul Wakim, PhD
Abigail G. Matthews, PhD
Clinical Trials Network

National Institute on Drug Abuse ─ National Institutes of Health ─ U.S. Department of Health and Human Services


Presenters

• Abigail G. Matthews, PhD, Biostatistician, NIDA CTN Data and Statistics Center, EMMES Corporation

• Paul Wakim, PhD, Senior Mathematical Statistician, NIDA CCTN


Outline:

• Introduction
• Trial Design
• Q&A
• Analysis Plan
• Trial Monitoring and Interim Analyses
• Q&A
• Primary Analysis
• Subgroup Analyses
• Q&A


Goals

• Improve communication between researchers and biostatisticians
– Importance of collaboration
– Role of the biostatistician in clinical trials research
– Basic statistical concepts

• Discussion with participants from all backgrounds

NO technical information, and NO formulas


Lack of Communication


Why is Communication So Important?

• Biostatisticians cannot:
– Propose research questions
– Be subject-matter experts
– Design a study without clinical input
– Design statistical analyses without clinical input
– Interpret results and place them in clinical context

• Investigators cannot:
– Be knowledgeable about all statistical issues involved in sample size estimation and development of analysis plans
– Implement the often complex statistical analyses involved in clinical trials
– Interpret statistical analyses

» Without communication, neither can do their jobs


Role of a Biostatistician

• Work with investigators on trial design
– Ensure the design will yield results that answer the research question of interest
– Aid in defining primary outcome
– Conduct sample size calculations
– Write appropriate sections of protocol

• Develop analysis plan
– Identify interim analyses and procedures for trial monitoring
– Design primary analysis
– Specify methods for subset analyses, sensitivity analyses and other exploratory analyses


Role of a Biostatistician (cont’d)

• Implement trial monitoring and interim analyses
– Develop monitoring reports for investigators, site staff, and sponsor, for example:
  • Recruitment rates
  • Demographics
  • Availability of primary outcome

– Prepare and present DSMB reports for open and closed sessions

– Conduct interim analyses such as efficacy, futility and sample size re-estimation

– Aid in preparation of IND Annual Reports


Role of a Biostatistician (cont’d)

• Implement analysis plan
– Aid in creation of the final/clinical study report
  • Tables
  • Figures
  • Interpretation
– Perform any additional analyses for manuscripts

• Contribute to IND reports as necessary

• Develop novel statistical methodologies to analyze clinical trial data more appropriately (if necessary)

TRIAL DESIGN


Trial Design



Basic designs

Primary outcome measure (a.k.a. primary endpoint)

Sample size and power analysis

Basic Design: Superiority

Clinical hypothesis: Experimental treatment is more effective than the control treatment

Statistical hypotheses:
Null hypothesis H0: Experimental – Control = 0
Alternative hypothesis H1: Experimental – Control ≠ 0

We expect (hope) to reject H0 in favor of H1


[Figure: Basic Design: Superiority. 95% confidence intervals around the difference Experimental – Control; high numbers (on the right) represent a good outcome. Reference line at Diff. = 0; intervals labeled Superior, Inconclusive, Inconclusive and Inferior.]

Based on Piaggio 2006
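For readers who want to see the decision rule behind the figure, here is a minimal Python sketch. It assumes a normal (large-sample) approximation, a known standard error for the difference, and that higher values are better. The function names and the summary numbers in the example are hypothetical, not taken from any CTN trial.

```python
from statistics import NormalDist


def diff_ci(mean_exp, mean_ctl, se_diff, level=0.95):
    """Normal-approximation CI for the difference Experimental - Control."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    diff = mean_exp - mean_ctl
    return diff - z * se_diff, diff + z * se_diff


def superiority_verdict(lower, upper):
    """Classify a CI against Diff. = 0 (higher values = better outcome)."""
    if lower > 0:
        return "superior"      # whole interval above 0: H0 rejected
    if upper < 0:
        return "inferior"      # whole interval below 0
    return "inconclusive"      # interval includes 0: H0 not rejected


lo, hi = diff_ci(mean_exp=12.0, mean_ctl=9.0, se_diff=1.2)
print(round(lo, 2), round(hi, 2), superiority_verdict(lo, hi))
```

An "inconclusive" verdict means only that the data cannot rule out a difference of 0; it is not evidence that the treatments are the same.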

Basic Design: Non-Inferiority

Clinical hypothesis: Experimental treatment is not less effective than the control treatment

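Statistical hypotheses (M > 0 is the non-inferiority margin, the largest loss of effectiveness relative to the control treatment that is judged clinically acceptable):
Null hypothesis H0: Experimental – Control < – M
Alternative hypothesis H1: Experimental – Control ≥ – M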

We expect (hope) to reject H0 in favor of H1


[Figure: Basic Design: Non-Inferiority. 95% confidence intervals around the difference Experimental – Control; high numbers (on the right) represent a good outcome. Reference lines at Diff. = 0 and Diff. = -M; intervals labeled Superior, Non-inferior, Non-inferior, Non-inferior(?), Inconclusive, Inconclusive(?) and Inferior.]

Based on Piaggio 2006
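The same idea for non-inferiority, as a minimal sketch. The function below is one reading of the interval labels in the figure, including the "(?)" cases where the interval lies entirely below 0. It assumes a 95% CI for Experimental – Control is already available and that M > 0; the function name is hypothetical.

```python
def noninferiority_verdict(lower, upper, margin):
    """Classify a CI for Experimental - Control against -M and 0.

    margin is M (> 0), the non-inferiority margin; higher values = better.
    """
    if margin <= 0:
        raise ValueError("the margin M must be positive")
    if lower > 0:
        return "superior"                # also non-inferior
    if lower >= -margin:
        # H0 (difference < -M) rejected; "(?)" if the CI is also entirely below 0
        return "non-inferior(?)" if upper < 0 else "non-inferior"
    if upper < -margin:
        return "inferior"                # whole interval below -M
    # Interval straddles -M; "(?)" if it is also entirely below 0
    return "inconclusive(?)" if upper < 0 else "inconclusive"


print(noninferiority_verdict(-0.5, 1.0, margin=1.0))
```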

Basic Design: Equivalence

Clinical hypothesis: Experimental treatment is as effective as the control treatment

Statistical hypotheses (M > 0 is the equivalence margin):
Null hypothesis H0: Experimental – Control < – M or Experimental – Control > + M
Alternative hypothesis H1: – M ≤ Experimental – Control ≤ + M

We expect (hope) to reject H0 in favor of H1

[Figure: Basic Design: Equivalence. 95% confidence intervals around the difference Experimental – Control; high numbers (on the right) represent a good outcome. Reference lines at Diff. = -M, Diff. = 0 and Diff. = +M; intervals labeled Superior, Inconclusive(?), Equivalent(?), Equivalent, Equivalent(?), Inconclusive(?), Inferior and Inconclusive.]

Based on Piaggio 2006
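A matching sketch for equivalence, under the same assumptions (a 95% CI for Experimental – Control, a margin M > 0, higher values better). The "(?)" labels follow one reading of the figure: the interval meets or fails the equivalence criterion but also lies entirely on one side of 0. The function name is hypothetical.

```python
def equivalence_verdict(lower, upper, margin):
    """Classify a CI for Experimental - Control against the margins -M and +M."""
    if margin <= 0:
        raise ValueError("the margin M must be positive")
    excludes_zero = lower > 0 or upper < 0      # CI entirely on one side of 0
    if -margin <= lower and upper <= margin:
        # H0 rejected: the whole CI lies within [-M, +M]
        return "equivalent(?)" if excludes_zero else "equivalent"
    if lower > margin:
        return "superior"                       # whole CI above +M
    if upper < -margin:
        return "inferior"                       # whole CI below -M
    # CI crosses -M or +M
    return "inconclusive(?)" if excludes_zero else "inconclusive"


print(equivalence_verdict(-0.5, 0.5, margin=1.0))
```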

Primary Outcome Measure (aka primary endpoint)

• Clinically meaningful

• Simple vs. composite



Three “Deadly Sins” in Measuring Clinical Trial Outcomes

1. Treating ordinal data as categorical

2. Creating dichotomies from continuous data

3. Using change from baseline

From Stephen Senn, 2011
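Sin 2 can be made concrete with a back-of-the-envelope calculation. The sketch compares approximate power for a normally distributed outcome analyzed as continuous versus split at the control-group median and analyzed as a proportion. It assumes an SD of 1 (so the effect is in SD units), equal arms, a two-sided α of 0.05 and large-sample normal approximations; the 0.4 SD effect is illustrative.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def power_continuous(delta, n_per_group, alpha=0.05):
    """Approximate power for a shift of delta SDs, two-sided z-test on means."""
    z_crit = N01.inv_cdf(1 - alpha / 2)
    ncp = delta / sqrt(2 / n_per_group)        # expected z-statistic
    return N01.cdf(ncp - z_crit)               # ignores the wrong-direction tail


def power_dichotomized(delta, n_per_group, alpha=0.05):
    """Same outcome cut at the control median (0 here), then two proportions."""
    z_crit = N01.inv_cdf(1 - alpha / 2)
    p_ctl = 0.5                                # half of controls lie above their median
    p_exp = N01.cdf(delta)                     # share of experimental arm above the cut
    p_bar = (p_ctl + p_exp) / 2
    ncp = (p_exp - p_ctl) / sqrt(p_bar * (1 - p_bar) * 2 / n_per_group)
    return N01.cdf(ncp - z_crit)


for n in (50, 100, 200):
    print(n, round(power_continuous(0.4, n), 2), round(power_dichotomized(0.4, n), 2))
```

For small effects, a median split has roughly the efficiency of the continuous analysis run on only 2/π ≈ 64% of the participants, so dichotomizing is close to discarding a third of the sample.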

Sample Size & Power Analysis



What the biostatistician needs and why:
• Number of treatment groups
• Superiority or non-inferiority or equivalence
• One-sided or two-sided
• Expected drop-out rate

Expected Drop-Out Rate (amount of missing primary data)

• Higher expected drop-out rate → larger sample size
• Lower expected drop-out rate → smaller sample size

Increase the sample size to account for the expected amount of missing data in the primary analysis.
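A common way to make this adjustment is to divide the number of participants needed with complete primary data by the expected proportion retained. A minimal sketch, assuming drop-out simply shrinks the analyzable sample (it does nothing about bias from informative drop-out); the function name is hypothetical.

```python
import math


def inflate_for_dropout(n_evaluable, dropout_rate):
    """Number to randomize so that about n_evaluable have primary outcome data.

    Simple rule n / (1 - dropout); it corrects sample size, not bias.
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_evaluable / (1 - dropout_rate))


print(inflate_for_dropout(100, 0.25))  # 100 / 0.75 = 133.3, rounded up
```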




What the biostatistician needs and why:
• Number of treatment groups
• Superiority or non-inferiority or equivalence
• One-sided or two-sided
• Expected drop-out rate
• Smallest meaningful clinical difference to detect

Smallest Meaningful Clinical Difference to Detect

• Larger difference to detect → smaller sample size
• Smaller difference to detect → larger sample size
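The arrows above, and those on the alpha, power and variability slides, can all be read off the standard normal-approximation formula for comparing two means: n per group = 2 × (SD × (z_alpha + z_beta) / Δ)², where Δ is the difference to detect. A minimal sketch, assuming equal arms, a two-sided test and a known SD; the numbers in the example are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per arm to detect a mean difference delta (two-sided z-test)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # 1.96 for alpha = 0.05, two-sided
    z_beta = z(power)            # 0.84 for 80% power
    return math.ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)


for delta in (2.0, 4.0, 8.0):
    print(delta, n_per_group(delta, sd=8.0))
```

Halving the difference to detect roughly quadruples the sample size, because Δ enters the formula squared; the SD enters the same way, in the opposite direction.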




What the biostatistician needs and why:
• Number of treatment groups
• Superiority or non-inferiority or equivalence
• One-sided or two-sided
• Expected drop-out rate
• Smallest meaningful clinical difference to detect
• Alpha, aka chance of Type I error, e.g. 5%

Alpha (aka probability of making a Type I error)



Non-technical definition (superiority trial): Chance of concluding that the experimental treatment is (more) effective when in fact it is not

Technical definition: Probability of rejecting H0 when H0 is true

Different perspectives: FDA, pharmaceutical company

Bottom line: Most commonly used value for α: 0.05 (two-sided)



• Smaller alpha → larger sample size
• Larger alpha → smaller sample size
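Alpha can also be seen by simulation: generate many trials in which the two treatments are truly identical and count how often a two-sided test at α = 0.05 declares a difference. A minimal sketch, assuming normally distributed outcomes with a known SD of 1 and a z-test; the seed, arm size and number of simulated trials are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_type_i_error(n_per_group=50, alpha=0.05, n_trials=4000, seed=2012):
    """Share of simulated null trials (no true difference) that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        exp_arm = [rng.gauss(0, 1) for _ in range(n_per_group)]  # same distribution
        ctl_arm = [rng.gauss(0, 1) for _ in range(n_per_group)]  # in both arms
        diff = sum(exp_arm) / n_per_group - sum(ctl_arm) / n_per_group
        z = diff / sqrt(2 / n_per_group)                         # SD known to be 1
        rejections += abs(z) > z_crit
    return rejections / n_trials


print(simulated_type_i_error())             # should be close to 0.05
print(simulated_type_i_error(alpha=0.01))   # should be close to 0.01
```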




What the biostatistician needs and why:
• Number of treatment groups
• Superiority or non-inferiority or equivalence
• One-sided or two-sided
• Expected drop-out rate
• Smallest meaningful clinical difference to detect
• Alpha, aka chance of Type I error, e.g. 5%
• Power to detect an effect, e.g. 80% or 90%

Power to Detect an Effect



Non-technical definition (superiority trial): Chance of concluding that the experimental treatment is (more) effective when in fact it is (more) effective

Technical definition: Probability of rejecting H0 when H0 is false (i.e. when H1 is true), evaluated at the assumed difference to detect

Different perspectives: FDA, pharmaceutical company

Bottom line: Most commonly used value for power: between 0.80 and 0.90



• Higher power → larger sample size
• Lower power → smaller sample size
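Power can be computed directly under the same normal approximation used for sample size. A minimal sketch, assuming a two-sided z-test with known SD and equal arms; with a difference of 4 and an SD of 8, about 63 per group gives roughly 80% power. The function name and numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a true difference delta."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)
    shift = abs(delta) / se
    # Probability that the test statistic falls beyond either critical value
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


for n in (20, 40, 63, 100):
    print(n, round(approx_power(4.0, 8.0, n), 3))
```

When delta is 0 the function returns alpha, which is a useful sanity check: with no true effect, "power" is just the Type I error rate.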




What the biostatistician needs and why:
• Number of treatment groups
• Superiority or non-inferiority or equivalence
• One-sided or two-sided
• Expected drop-out rate
• Smallest meaningful clinical difference to detect
• Alpha, aka chance of Type I error, e.g. 5%
• Power to detect an effect, e.g. 80% or 90%
• Variability of primary outcome measure

Variability of Primary Outcome Measure



• More variability → larger sample size
• Less variability → smaller sample size




What the biostatistician needs and why:
• Number of treatment groups
• Superiority or non-inferiority or equivalence
• One-sided or two-sided
• Expected drop-out rate
• Smallest meaningful clinical difference to detect
• Alpha, aka chance of Type I error, e.g. 5%
• Power to detect an effect, e.g. 80% or 90%
• Variability of primary outcome measure
• Correlation between measurements within the same cluster (aka Intra-Class Correlation or ICC)



Correlation Between Measurements within the Same Cluster (e.g. repeated measures)

[Figure illustrating intra-class correlation. From Wikipedia]

• Higher intra-class correlation → larger sample size
• Lower intra-class correlation → smaller sample size
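For clustered data with equal cluster sizes, the usual adjustment multiplies the sample size computed for independent observations by the design effect 1 + (m − 1) × ICC, where m is the number of measurements per cluster. A minimal sketch under that simple assumption (real repeated-measures designs are usually planned with a model-based calculation). The counts are measurements, not clusters, and the numbers are illustrative.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation for equal clusters of size m: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def n_with_clustering(n_independent, cluster_size, icc):
    """Measurements needed once correlation within clusters is accounted for."""
    return math.ceil(n_independent * design_effect(cluster_size, icc))


for icc in (0.0, 0.01, 0.05, 0.10):
    print(icc, n_with_clustering(126, cluster_size=20, icc=icc))
```

Even a small ICC matters when clusters are large: with 20 measurements per cluster, an ICC of 0.05 nearly doubles the required number of measurements.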





One Final Note About Sample Size



Cost, which has nothing to do with biostatistics, is most often a key factor in the final decision on sample size.

QUESTIONS?

ANALYSIS PLAN


Purpose

• Identify primary outcome measure a priori
• Spell out analytic methods a priori
• Remove criticism of data-driven analyses

In CTN:
• Analysis plan must be finalized before data lock
• Developed by DSC, but approved by Lead Node

Key Components of an Analysis Plan

1) Population to analyze: Intent-to-Treat (ITT) vs. per-protocol (PP) analysis

2) Statistical test or model for primary outcome

3) Adjustment for multiple comparisons

4) Handling of missing data

5) Handling of outliers

6) Interim analyses

7) Sensitivity analyses

8) Secondary and subgroup analyses


1. Population Analyzed

Intent-to-Treat (ITT)
• ALL randomized participants are analyzed
• “Once randomized, analyzed”
• Participants with completely missing data are included

Per-Protocol (PP)
• Analyze a select subset of randomized participants as stated in protocol
• For example,
– Only participants who had at least 80% of study medication
– Only participants who attended at least 50% of the expected TAU sessions
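A minimal sketch of how the two analysis populations differ in practice, using a hypothetical `Participant` record with a medication-adherence fraction and the 80% example rule from the slide. The field names are assumptions, not a CTN data standard.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str
    arm: str
    medication_adherence: float   # fraction of study medication taken, 0 to 1


def itt_population(randomized):
    """Intent-to-treat: every randomized participant, in the arm they were randomized to."""
    return list(randomized)


def per_protocol_population(randomized, min_adherence=0.80):
    """Per-protocol (example rule): only participants who took >= 80% of study medication."""
    return [p for p in randomized if p.medication_adherence >= min_adherence]


randomized = [
    Participant("001", "experimental", 0.95),
    Participant("002", "control", 0.40),
    Participant("003", "experimental", 0.10),
    Participant("004", "control", 0.85),
]
print(len(itt_population(randomized)), len(per_protocol_population(randomized)))
```

The per-protocol subset is selected on post-randomization behavior, so the arms it compares may no longer be balanced the way randomization made them.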


2. Statistical Test or Model

Test
• What statistical test should be used?
• What time points are of interest?
• Measure of treatment effect

Modeling
• Must have parameter(s) to test primary outcome and hypothesis
• Longitudinal model/repeated measures, single time point or composite score
• Consider inclusion of stratification factors, time-by-treatment interactions, additional covariates (e.g. level of baseline substance use)
• Potential site effects


3. Adjustment for Multiple Comparisons

Why?
• Need to control the study-wise false positive rate (type I error)
• If 100 independent tests are performed at α = 0.05 and there are no true effects, about 5 are expected to be significant by chance alone

When?
• More than one primary outcome
• Multiple treatment comparisons (e.g. multiple doses vs. placebo)
• Multiple time points of interest, but not longitudinal model


3. Adjustment for Multiple Comparisons (cont’d)

How?
• Bonferroni
– Very conservative, but simple
– Split type I error rate equally between all statistical tests
• Stepwise procedures
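A minimal sketch of the arithmetic: the chance of at least one false positive among k independent tests, the Bonferroni rule, and Holm's step-down procedure as one example of a stepwise method (the slide does not say which stepwise procedure is used). The p-values are hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among independent tests when all H0 are true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni(p_values, alpha=0.05):
    """Reject H0 for test i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...) with alpha / (m - k)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break                  # stop at the first non-rejection
    return reject


p = [0.010, 0.015, 0.030, 0.200]
print(round(familywise_error(0.05, 4), 3), bonferroni(p), holm(p))
```

With 100 independent tests and no true effects, the chance of at least one significant result is about 99%. In the example, Holm rejects two hypotheses where Bonferroni rejects one, because only the smallest p-value is held to the full α/m threshold.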


4. Handling of Missing Data

Based on the first 24 multi-site CTN trials on substance abuse conducted between 2001 and 2010, the percent of missing data for the primary outcome measure ranged from 2% to 60% (Wakim 2011).

There are many methods of handling missing data with varying levels of complexity, e.g.,
– Simple: imputing missing abstinence data as positive
– Complex: pattern mixture models
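The simple approach can be sketched in a few lines, assuming urine screen results are coded as "negative", "positive", or None for a missed visit. The coding and function names are hypothetical.

```python
def impute_missing_as_positive(results):
    """Replace missing urine-screen results (None) with 'positive'.

    Encodes the common assumption that a missed visit means substance use.
    """
    return ["positive" if r is None else r for r in results]


def abstinence_rate(results):
    """Share of scheduled visits with a negative screen after imputation."""
    filled = impute_missing_as_positive(results)
    return filled.count("negative") / len(filled)


visits = ["negative", None, "negative", "positive", None, "negative"]
print(abstinence_rate(visits))  # 3 negatives out of 6 visits -> 0.5
```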


Types of Missing Data

1. Missing Completely at Random (MCAR)
– Whether an observation is missing or not is completely random
– Example: participant does not attend visit due to snow storm

2. Missing at Random (MAR)
– Whether an observation is missing can be explained by observed data
– Most common statistical methods will yield valid results under MAR

3. Missing Not at Random (MNAR)
– Whether an observation is missing depends on unobserved data, even after accounting for observed data
– Example: participant does not attend study visit because they were using substances
– Standard statistical methods cannot be used
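A small simulation shows why the type of missingness matters, assuming a normally distributed outcome (higher is better) with a true mean of 0. Under MCAR the complete-case mean stays close to the truth; under an MNAR mechanism in which the worst values go missing, it is biased upward. The seed, sample size, missingness rate and cutoff are arbitrary.

```python
import random
from statistics import mean


def complete_case_means(n=10_000, seed=11):
    """Complete-case means of one outcome under MCAR and MNAR missingness."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n)]   # true mean is 0

    # MCAR: each value is missing with probability 0.3, regardless of its size
    mcar = [y for y in outcome if rng.random() >= 0.3]

    # MNAR: values below -0.5 (worse outcomes) are always missing
    mnar = [y for y in outcome if y >= -0.5]

    return mean(outcome), mean(mcar), mean(mnar)


full, mcar_mean, mnar_mean = complete_case_means()
print(round(full, 3), round(mcar_mean, 3), round(mnar_mean, 3))
```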


5. Handling of Outliers

An outlier is a value that is so far from the others that it appears to have come from a different population.

The presence of outliers can invalidate many statistical analyses.

Motulsky 2010
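The slide does not prescribe an outlier rule. As one common screening device, Tukey's fences flag values more than 1.5 interquartile ranges outside the quartiles. A minimal sketch using Python's `statistics.quantiles` (default "exclusive" method); the data are hypothetical, and a flagged value should be checked (e.g. for a data-entry error), not automatically removed.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (low, high) fences: quartiles -/+ k times the interquartile range."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Values outside Tukey's fences, to be reviewed before any analysis decision."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


values = [0, 1, 2, 2, 3, 3, 4, 5, 5, 6, 28]   # hypothetical outcome values
print(flag_outliers(values))
```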


6. Interim Analyses

• Specify type of interim analyses to be performed
– Sample size re-estimation
– Futility
– Efficacy

• Specify when analyses will be performed
– e.g., sample size re-estimation when 50% of participants have completed active treatment

• Specify frequency of these analyses
– e.g., DSMB meetings every six months


7. Sensitivity Analyses

Essence: Determine how sensitive the study results are to various aspects of the analysis

• Common to assess different methods of handling missing data

• Compare alternative statistical methods


8. Secondary and Subgroup Analyses

• Specify secondary analyses of primary outcome(s)

• Describe secondary outcomes

• Identify exploratory analyses

• Subgroup analyses:
– Gender
– Race
– Ethnicity

TRIAL MONITORING AND INTERIM ANALYSES

Trial Monitoring

1) Adverse events (AEs) and Serious Adverse Events (SAEs)

2) Regulatory compliance

3) Recruitment

4) Availability of primary outcome

5) Treatment exposure

6) Retention (follow-up visits)

7) Data quality


Interim Analyses

• Analysis of outcome variable(s) during conduct of the trial » may need to adjust for these multiple “looks”

• Evaluate whether study should be concluded early; possible reasons:
– Current sample yields sufficient power
– Avoid exposing participants to an unsafe treatment
– Prevent treatment of participants with a clearly inferior therapy
– Insurmountable logistical issues, such as extremely poor data quality or recruitment


Types of Interim Analyses

1. Sample size re-estimation

2. Efficacy

3. Futility

4. Harm


Sample Size Re-estimation

Why?
• Uncertainty in parameter estimates and assumptions used in original calculations

How? (example)
• Only analyze one treatment arm (placebo) and compute sample size needed to detect clinically meaningful effect

• Not estimating treatment effect » no impact on study-wide type I error rate
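A minimal sketch of the example above for a binary outcome, using the standard normal-approximation formula for comparing two proportions. It assumes the clinically meaningful difference (15 percentage points here) is held fixed and only the placebo response rate is updated from the interim data; all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p_control, difference, alpha=0.05, power=0.80):
    """Approximate n per arm to detect p_experimental - p_control = difference."""
    z = NormalDist().inv_cdf
    p_exp = p_control + difference
    variance = p_control * (1 - p_control) + p_exp * (1 - p_exp)
    return math.ceil((z(1 - alpha / 2) + z(power)) ** 2 * variance / difference ** 2)


planned = n_per_group_two_proportions(p_control=0.20, difference=0.15)
# Interim: only the placebo arm is analyzed, and its response rate looks higher
reestimated = n_per_group_two_proportions(p_control=0.30, difference=0.15)
print(planned, reestimated)
```

Because the experimental arm is never examined, the interim look reveals nothing about the treatment effect itself.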


Efficacy

Question: Is one treatment arm clearly inferior or superior?

• Analyze data as specified for final data analysis
• Specify stopping rules a priori
• Advantages:
– Can be used to drop a treatment arm if clearly inferior to others
– Prevents exposure of participants to an ineffective treatment
• Disadvantages:
– Requires unblinding
– Must adjust for multiple “looks” at the data
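Why multiple looks need adjusting can be shown by simulation: if H0 is true and the usual 1.96 cut-off is applied at each of five equally spaced looks, the overall false-positive rate is well above 5% (about 14% in theory). A minimal sketch, assuming normal outcomes with a known SD of 1; group sequential boundaries such as O'Brien-Fleming or Pocock (see Jennison & Turnbull 2000) are the standard remedy and are not implemented here.

```python
import random
from math import sqrt


def naive_repeated_looks_error(n_looks=5, per_look=20, n_trials=4000, seed=7):
    """Overall type I error when H0 is true and |z| > 1.96 is used at every look."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_trials):
        total_diff = 0.0                        # running sum of (experimental - control)
        for look in range(1, n_looks + 1):
            for _ in range(per_look):           # per_look new participants per arm
                total_diff += rng.gauss(0, 1) - rng.gauss(0, 1)
            n = look * per_look                 # cumulative participants per arm
            z = (total_diff / n) / sqrt(2 / n)  # outcome SD is 1 in both arms
            if abs(z) > 1.96:
                false_positives += 1            # trial would stop "for efficacy"
                break
    return false_positives / n_trials


print(naive_repeated_looks_error(n_looks=1), naive_repeated_looks_error(n_looks=5))
```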


Futility

Question: Based on the data observed thus far, is there clear evidence of no difference between the two treatment conditions?

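• Compute the conditional power (probability that the final analysis will show a statistically significant treatment effect, given the data observed so far and an assumed effect for the rest of the trial)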

• A priori, specify an unacceptable value of conditional power
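A minimal sketch of conditional power using the B-value formulation of Lan & Wittes (1988). With interim z-statistic Z at information fraction t, B = Z√t, and the final test is significant with probability 1 − Φ((z_crit − B − θ(1 − t)) / √(1 − t)), where θ is the assumed drift (by default the current trend, B/t). It only considers the favorable direction; the futility threshold itself is a protocol decision and is not chosen here.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, drift=None, alpha=0.05):
    """Chance the final two-sided test ends significant in the favorable direction.

    z_interim: interim z-statistic; info_fraction: share of total information so far.
    drift: assumed expected final z-statistic; defaults to the current trend.
    """
    t = info_fraction
    if not 0 < t < 1:
        raise ValueError("info_fraction must be strictly between 0 and 1")
    nd = NormalDist()
    b = z_interim * sqrt(t)                    # B-value at time t
    theta = b / t if drift is None else drift  # current-trend estimate by default
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return 1 - nd.cdf((z_crit - b - theta * (1 - t)) / sqrt(1 - t))


print(round(conditional_power(0.2, 0.5), 3), round(conditional_power(2.0, 0.5), 3))
```

Passing `drift` explicitly (for example, the effect assumed in the original sample size calculation) gives the more optimistic "conditional power under the design alternative", which is often reported alongside the current-trend value.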


Harm

Question: Is one treatment arm unsafe, or less safe than another arm?

• Compare occurrence of AEs and/or SAEs with acceptable limits

• Test whether frequency and/or type of AE/SAE differs across treatment arms

• Advantages:– No impact on study-wide type I error rate

• Disadvantages:– May require unblinding

QUESTIONS?

PRIMARY ANALYSIS

Primary Analysis



General key points

Interaction: what does it mean?

N=11 pairs of measures (x,y) produce the following statistical results:

Property Value

Average of x 9.0

Variance of x 10.0

Average of y 7.5

Variance of y 3.75

Correlation between x and y 0.816

Regression line y = 3 + 0.5x

Anscombe’s Quartet


From Wikipedia

[Figure: scatter plots of the four data sets (y1 vs. x1, y2 vs. x2, y3 vs. x3, y4 vs. x4); x axes from 0 to 20, y axes from 0 to 14.]

Always start with a simple graph of the primary outcome, over time if applicable, and by treatment group
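The table can be checked numerically. A minimal sketch that recomputes the summary statistics for all four data sets (values as reproduced on Wikipedia), using variances with divisor n, which is what reproduces 10.0 and 3.75. Every set gives nearly the same numbers, which is exactly why the graph comes first.

```python
from statistics import mean, pvariance

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8] * 7 + [19] + [8] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, variances (divisor n), correlation and least-squares line y = a + b x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx, "var_x": pvariance(xs),
        "mean_y": my, "var_y": pvariance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,
        "intercept": my - slope * mx, "slope": slope,
    }


for name, (xs, ys) in quartet.items():
    s = summarize(xs, ys)
    print(name, {k: round(v, 2) for k, v in s.items()})
```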



General Key Point # 1

If the primary research question is important, the answer (result) is important, regardless of whether it is positive, negative or null, as long as it is valid.

A well-designed and well-conducted clinical trial that produces a null result is not a “failed study”.

A null result advances scientific knowledge by eliminating an ineffective treatment from the list of possibly effective treatments, thus shortening that list.




Sensitivity Analysis

As part of the analysis for the primary manuscript, present the results with at least one variation of the primary analysis, e.g., a slightly modified outcome, a different statistical model, or a different assumption.




Woody et al. JAMA 2008

Convert the statistical results to the original scale, with point estimates and corresponding confidence intervals for:

• The primary outcome for each treatment group

• The treatment effect (or effect size, i.e., the difference of the primary outcome between control and experimental treatment groups)
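A minimal sketch of converting results to the original scale for a binary outcome, with simple large-sample (Wald) intervals for each group's proportion and for the difference. The counts are hypothetical, not the Woody et al. results; in practice the estimates would come from the pre-specified model, and better-behaved intervals (e.g. Wilson) exist for small samples.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(events, n, level=0.95):
    """Point estimate and Wald CI for a proportion (large-sample approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = events / n
    half = z * sqrt(p * (1 - p) / n)
    return p, p - half, p + half


def difference_ci(events_exp, n_exp, events_ctl, n_ctl, level=0.95):
    """Experimental minus control, with a Wald CI for the difference."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_e, p_c = events_exp / n_exp, events_ctl / n_ctl
    se = sqrt(p_e * (1 - p_e) / n_exp + p_c * (1 - p_c) / n_ctl)
    d = p_e - p_c
    return d, d - z * se, d + z * se


# Hypothetical counts: participants with an opioid-negative urine test at week 12
for label, est in [("experimental", proportion_ci(45, 75)),
                   ("control", proportion_ci(30, 75)),
                   ("difference", difference_ci(45, 75, 30, 75))]:
    print(label, " ".join(f"{v:.1%}" for v in est))
```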




Understand in simple English, not in statistical jargon, what the primary results mean, e.g.,

• Reject H0 vs. Do not reject H0

• p-value

• Interaction






Interaction

What does it mean?

1) Treatment effect

2) Site effect

3) Treatment-by-site interaction

4) Quantitative vs. qualitative interaction



Interaction - What does it mean?



1) Treatment Effect



2) Site Effect



3) Treatment-by-Site Interaction (same as site-by-treatment interaction)

4) Treatment-by-Site Interaction: Quantitative vs. Qualitative


So what’s the bottom line?

• There is no major downside to including a site effect in the primary analysis. In fact, it may increase power.

• Testing for a treatment-by-site interaction is important.

• A significant treatment-by-site interaction affects the interpretation of the overall treatment effect and the generalizability of the conclusions; but if explained, it may shed light on important factors that modify treatment response.
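A minimal sketch of the quantitative vs. qualitative distinction, assuming site-level point estimates of the treatment effect (Experimental – Control) are already available. It ignores sampling error entirely; a formal test for qualitative interaction, such as the Gail-Simon test, accounts for it. Site names and effects are hypothetical.

```python
def interaction_type(site_effects, tol=0.0):
    """Label site-level treatment effects (Experimental - Control).

    'none apparent' if all effects are (nearly) equal, 'quantitative' if they
    differ in size but not direction, 'qualitative' if the direction flips.
    Uses point estimates only.
    """
    effects = list(site_effects.values())
    if max(effects) - min(effects) <= tol:
        return "none apparent"
    if all(e >= 0 for e in effects) or all(e <= 0 for e in effects):
        return "quantitative"
    return "qualitative"


print(interaction_type({"site A": 4.0, "site B": 1.5, "site C": 2.5}))   # quantitative
print(interaction_type({"site A": 4.0, "site B": -2.0, "site C": 2.5}))  # qualitative
```

A quantitative interaction changes how large the benefit is; a qualitative one means the treatment may help at some sites and harm at others, which is far harder to reconcile with a single overall conclusion.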


SUBGROUP ANALYSES


What are Subgroup Analyses?

Special type of secondary analyses that focus on differences in treatment effect among subgroups of trial participants
• Protocol or analysis plan usually specifies some subgroup analyses
• Can also be ad hoc (i.e. exploratory), but this is not preferable
• Examples:
– Gender, race, ethnicity (required by NIH)
– Age group
– Socioeconomic status
– Severity of disease/disorder


Key Points

• Subgroups defined on pre-randomization characteristics

• Number of subgroup analyses should be kept to a minimum

• Two approaches:
1. Perform analysis within each subgroup
2. Use interaction terms

• Caution: statistical significance in subgroup analysis does not imply overall treatment effect
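A minimal sketch contrasting the two approaches, assuming each subgroup's treatment effect and standard error have been estimated independently. In the hypothetical example, the effect is significant in one subgroup (z = 2.5) and not in the other (z = 1.5), yet the interaction test, which compares the two effects directly, is far from significant.

```python
from math import sqrt
from statistics import NormalDist


def interaction_test(effect_a, se_a, effect_b, se_b):
    """Compare treatment effects between two independent subgroups.

    Returns the difference in effects, its z-statistic and two-sided p-value.
    """
    diff = effect_a - effect_b
    se = sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Hypothetical subgroup estimates: women 5.0 (SE 2.0), men 3.0 (SE 2.0)
diff, z, p = interaction_test(effect_a=5.0, se_a=2.0, effect_b=3.0, se_b=2.0)
print(round(diff, 2), round(z, 2), round(p, 3))
```

"Significant in one subgroup but not the other" is not evidence that the subgroups respond differently; only the comparison of the two effects addresses that question.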

QUESTIONS?

THANK YOU

References

Bassler D, Briel M, Montori VM, Lane M et al., Stopping Randomized Trials Early for Benefit and Estimation of Treatment Effects: Systematic Review and Meta-regression Analysis, JAMA, 2010, 303(12):1180-1187.

Briel M, Lane M, Montori VM et al., Stopping randomized trials early for benefit: a protocol of the Study Of Trial Policy Of Interim Truncation-2 (STOPIT-2), Trials, 2009, 10:49-58.

Briggs M, Why do statisticians answer silly questions that no one ever asks?, Significance, February 2012, Volume 9, Issue 1, pp. 30-31.

Committee for Proprietary Medicinal Products (CPMP), Points to Consider on Adjustment for Baseline Covariates, Statistics in Medicine, 2004, 23:701-709.

Dmitrienko A, Molenberghs G, Chuang-Stein C & Offen W, Analysis of Clinical Trials Using SAS: A Practical Guide, 2005, SAS Institute Inc.

Dmitrienko A et al. (editors), Multiple Testing Problems in Pharmaceutical Statistics, Chapman and Hall/CRC Biostatistics Series, 2009.

Dmitrienko A, Key Multiplicity Problems in Clinical Trials, presentation at the 2011 FDA/Industry Statistics Workshop, Washington, DC.


European Agency for the Evaluation of Medicinal Products (EMEA), Committee for Proprietary Medicinal Products (CPMP), Points to Consider on Multiplicity Issues in Clinical Trials, 19 September 2002.

FDA/ICH, Guidance for Industry: E09 Statistical Principles for Clinical Trials, September 1998.

Friedman LM, Furberg CD & DeMets DL, Fundamentals of Clinical Trials, 4th Edition, Springer, 2010.

Graham JW, Missing Data Analysis: Making It Work in the Real World, Annual Review of Psychology, 2009, 60: 549-576.

Jennison C & Turnbull BW, Group Sequential Methods with Applications to Clinical Trials, Chapman & Hall/CRC, 2000.

Lachin JM, A review of methods for futility stopping based on conditional power, Statistics in Medicine, 2005, 24:2747-2764.

Lan KKG & Wittes J, The B-Value: A Tool for Monitoring Data, Biometrics, 1988, 44:579-585.

Motulsky H, Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking, Second Edition, Oxford, 2010.


Moyé LA, Statistical Monitoring of Clinical Trials: Fundamentals for Investigators, Springer, 2006.

Moyé LA, Rudiments of Subgroup Analyses, Progress in Cardiovascular Diseases, 2012, 54:338-342.

Petticrew M et al., Damned if you do, damned if you don’t: subgroup analysis and equity, Journal of Epidemiology and Community Health 2012, 66:95-98.

Piaggio G et al., Reporting of Noninferiority and Equivalence Randomized Trials: An Extension of the CONSORT Statement, JAMA, 2006, 295:1152-1160.

Proschan MA, Lan KKG & Wittes JT, Statistical Monitoring of Clinical Trials: A Unified Approach, Springer, 2006.

Proschan MA, Sample size re-estimation in clinical trials, Biometrical Journal, 2009, 51(2):348-357.

Senn S, Statistical Issues in Drug Development, presentation at the 2011 FDA/Industry Statistics Workshop, Washington, DC.


Sun X et al., Credibility of claims of subgroup effects in randomised controlled trials: systematic review, British Medical Journal, 2012, 344:e1553 (Published 15 March 2012).

Underwood D, The Profitable Pause, International Clinical Trials, August 2011, Issue 21, 56-60.

Wainer H et al., Finding what is not there through the unfortunate binning of results: The Mendel effect, Chance, 2006, 19:49-52.

Wakim P et al., Relation of study design to recruitment and retention in CTN trials, American Journal of Drug and Alcohol Abuse, 2011, 37:426-433.

Wikipedia, Intraclass correlation, accessed on 10/5/2011.

Wikipedia, Anscombe’s Quartet, accessed on 11/30/2011.

Woody GE et al., Extended vs Short-term Buprenorphine-Naloxone for Treatment of Opioid-Addicted Youth: A Randomized Trial, JAMA, 2008, 300(17):2003-2011.

Zhu L, Ni L & Yao B, Group Sequential Methods and Software Applications, The American Statistician, 2011, 65(2):127-135.


A copy of this presentation will be available electronically after the meeting

http://ctndisseminationlibrary.org


Upcoming Webinars

DEC 19: Helping Patients with Substance Use Disorders and Pain

Survey Reminder

The CCC encourages all to complete the survey issued to participants directly following the webinar session, as this is the primary collective tool for rating your experience with this and other webinars, and communicating the interests and needs of CTN members and associates.
