
Paths to precision medicine: Subgroup Identification in Clinical Trials

Xin Huang, Yan Sun, V. Devanarayan

Exploratory Statistics

AbbVie Inc.

April 15, 2015

Subgroup Identification in Clinical Trials | April 2015 | Copyright © 2015 AbbVie

Why Personalized Medicine?

Edward Abrahams and Mike Silver (2009). The Case for Personalized Medicine. Journal of Diabetes Science and Technology, Vol. 3, Issue 4.


Some statistical challenges

• For ease of implementation in clinical practice, we need cut-points on biomarkers for predicting responders and non-responders, i.e., threshold-based biomarker signatures.

– E.g., patients with Gene X1 > …, Gene X2 < …, are likely responders.

• The signature should be multivariate.

– It is typically derived from a large panel of candidate markers, and often from full genome data (e.g., > 30,000 genes).

– It needs to account for linear and nonlinear trends.

• After a promising threshold-based signature is identified, we need to predict its performance in a future dataset.

– i.e., predict the treatment effect in the “responder” subgroup, or predict the signature effect among patients receiving treatment.


Biomarker signatures for subgroup identification

• Prognostic Signature: identifies a subgroup of patients that are more likely to experience an outcome of interest (efficacy, toxicity, disease progression, etc.), independent of treatment.

• Predictive Signature: identifies a subgroup of patients that respond better to a specific treatment.


Some existing methods

Prognostic signatures (predict the disease outcome irrespective of the treatment):

• CART (Breiman et al. 1984)

• MARS (Friedman 1991)

• RuleFit (Friedman and Popescu 2008)

Predictive signatures (predict the response to a specific treatment compared to other treatments):

• Interaction Trees (Su et al. 2008, 2009)

• Virtual Twins (Foster et al. 2011)

• SIDES method (Lipkovich et al. 2011, 2014)

• Bayesian approaches (Berger et al. 2014)


Objective functions

Consider a supervised learning problem with data $(x_i, y_i)$, $i = 1, \dots, n$, where $x_i$ is a p-vector of predictors and $y_i$ is an outcome variable.

Consider three major applications:

• Linear regression for a continuous response

• Logistic regression for a binary response, where $y_i \in \{0, 1\}$

• Cox regression for a survival response: $y_i = (T_i, \delta_i)$, where $T_i$ is a right-censored survival time and $\delta_i$ is the censoring indicator

Denote the log likelihood or log partial likelihood by $\ell(\eta; x, y)$, where $\eta$ is the usual linear combination of predictors. It corresponds to:

• the continuous response in simple linear regression

• the log odds in logistic regression

• the log hazard ratio in proportional hazards regression


Objective functions, contd.

Consider the following model for prognostic signatures (predict the outcome, irrespective of the treatment):

$\eta(x) = \alpha + \beta \cdot \omega(x)$, (1)

where $\omega(x)$ is the signature rule returning a grouping indicator for each subject.

Consider the following model for predictive signatures (predict the response to a specific treatment compared to the other treatment):

$\eta(x) = \alpha + \beta \cdot [\omega(x) \times r] + \gamma \cdot r$, (2)

where r is the treatment indicator.

Our algorithms derive signature rules, $\omega(x)$, with the objective of searching for the grouping that optimizes the significance of $\beta$ in (1) and (2).
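As a rough illustration of what optimizing the significance of the group coefficient can look like, the sketch below runs a likelihood ratio test for the prognostic model with a continuous outcome. It compares an intercept-only fit against a fit with a separate mean in each signature group. The function name `lr_test_group_effect`, the Gaussian working model, and the toy data are assumptions made for illustration; the slides do not specify an implementation.

```python
import math


def lr_test_group_effect(y, group):
    """Likelihood ratio test of 'no group effect' for a Gaussian outcome.

    y     : list of continuous outcomes
    group : list of 0/1 grouping indicators (1 = Sig+, 0 = Sig-)
    Returns (test statistic, p-value from a chi-square with 1 df).
    """
    n = len(y)
    mean_all = sum(y) / n
    rss_null = sum((v - mean_all) ** 2 for v in y)

    rss_group = 0.0
    for g in (0, 1):
        ys = [v for v, w in zip(y, group) if w == g]
        if ys:
            mean_g = sum(ys) / len(ys)
            rss_group += sum((v - mean_g) ** 2 for v in ys)

    if rss_group == 0.0:
        # Degenerate case: the grouping explains the outcome exactly.
        return float("inf"), 0.0

    # For a Gaussian model the LR statistic is n * log(RSS_null / RSS_group).
    stat = max(n * math.log(rss_null / rss_group), 0.0)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Toy data (hypothetical): one grouping matches the outcome, one does not.
outcome = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
good_stat, good_p = lr_test_group_effect(outcome, [0, 0, 0, 0, 1, 1, 1, 1])
poor_stat, poor_p = lr_test_group_effect(outcome, [0, 1, 0, 1, 0, 1, 0, 1])
```

A signature search would evaluate this p-value for each candidate grouping and keep the grouping with the smallest value. Binary and survival outcomes would need the logistic and Cox likelihoods instead.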


Bootstrapping & Aggregating of Thresholds from Trees (BATTing)

(Devanarayan, 1999)

1. Bootstrapping: draw B datasets (Data 1, Data 2, …, Data B) from the original data by sampling with replacement.

2. Fit a single-split tree to each bootstrap dataset: Tree b splits at threshold Cb (>= Cb vs. < Cb), for b = 1, …, B.

3. Aggregate the thresholds (C1, C2, …, CB).

4. The BATTing threshold is the median of the aggregated thresholds.

The resulting threshold is robust to small perturbations in the data, outliers, etc.
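The BATTing steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the single-split tree is approximated by a stump that minimizes the within-group sum of squares of a continuous outcome, whereas the slides describe optimizing the significance of the group effect. The names `stump_threshold` and `batting_threshold` and the defaults `n_boot` and `seed` are hypothetical.

```python
import random
import statistics


def stump_threshold(x, y):
    """Cut-point of a single-split regression tree on one marker.

    Picks the midpoint between adjacent marker values that minimizes the
    within-group sum of squares of y (an assumed split criterion).
    """
    pairs = sorted(zip(x, y))
    best_cut, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between tied marker values
        sse = 0.0
        for side in (pairs[:i], pairs[i:]):
            mean = sum(v for _, v in side) / len(side)
            sse += sum((v - mean) ** 2 for _, v in side)
        if sse < best_sse:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse = sse
    return best_cut


def batting_threshold(x, y, n_boot=100, seed=0):
    """Median of stump thresholds over bootstrap resamples of (x, y)."""
    rng = random.Random(seed)
    n = len(x)
    cuts = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        cut = stump_threshold([x[i] for i in idx], [y[i] for i in idx])
        if cut is not None:  # skip resamples with a single distinct x value
            cuts.append(cut)
    return statistics.median(cuts)


# Toy data (hypothetical): the outcome jumps once the marker reaches 20.
marker = list(range(40))
outcome = [0.0] * 20 + [1.0] * 20
threshold = batting_threshold(marker, outcome, n_boot=100, seed=1)
```

Taking the median rather than the threshold from a single tree is what gives the robustness described above: a resample that happens to miss the observations near the true cut-point shifts only one of the B thresholds.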


BATTing, contd.


Sequential BATTing

Model growing within the potential Sig+ group:

• Get the BATTing threshold for each unused marker.

• The best marker is selected to split the current Sig+ group.

• This procedure continues in the new Sig+ group.

Stopping rule:

• Each newly added predictor is tested for significance with a likelihood ratio test.

[Figure: the whole population (Sig+) is split by Marker 7 into Sig+ and Sig-; the new Sig+ group is split by Marker 3, and the resulting Sig+ group is split by Marker 9. Each split sets aside a Sig- group.]
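The sequential growing procedure can be sketched as a greedy loop. To keep the example short and self-contained, several simplifications are assumed: each marker's threshold comes from an exhaustive search over midpoints rather than from BATTing, the outcome is continuous and the setting is prognostic, the side with the higher mean outcome is called Sig+, and the likelihood ratio test compares the two sides of the new split. The names `lr_pvalue`, `best_split`, and `sequential_signature`, and the parameters `alpha` and `min_size`, are hypothetical.

```python
import math


def lr_pvalue(y_a, y_b):
    """Gaussian likelihood ratio test p-value for a difference in group means."""
    def rss(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    rss0, rss1 = rss(y_a + y_b), rss(y_a) + rss(y_b)
    if rss0 == 0.0:
        return 1.0  # constant outcome: no evidence of a group effect
    if rss1 == 0.0:
        return 0.0  # the split separates the outcome perfectly
    stat = max((len(y_a) + len(y_b)) * math.log(rss0 / rss1), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


def best_split(X, y, members, j, min_size):
    """Most significant cut on marker j within the current Sig+ members."""
    best = None
    values = sorted({X[i][j] for i in members})
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        upper = [y[i] for i in members if X[i][j] >= cut]
        lower = [y[i] for i in members if X[i][j] < cut]
        if min(len(upper), len(lower)) < min_size:
            continue
        p = lr_pvalue(upper, lower)
        # Assumption: the side with the higher mean outcome becomes Sig+.
        higher_upper = sum(upper) / len(upper) > sum(lower) / len(lower)
        direction = ">=" if higher_upper else "<"
        if best is None or p < best[0]:
            best = (p, cut, direction)
    return best


def sequential_signature(X, y, alpha=0.05, min_size=5):
    """Grow a conjunction of threshold rules, one marker at a time."""
    members = list(range(len(y)))
    unused = set(range(len(X[0])))
    rules = []
    while unused:
        found = [(best_split(X, y, members, j, min_size), j) for j in sorted(unused)]
        found = [(split, j) for split, j in found if split is not None]
        if not found:
            break
        (p, cut, direction), j = min(found, key=lambda item: item[0][0])
        if p >= alpha:
            break  # stopping rule: the new predictor is not significant
        rules.append((j, direction, cut))
        members = [i for i in members if (X[i][j] >= cut) == (direction == ">=")]
        unused.remove(j)
    return rules
```

Each returned rule is a `(marker index, direction, cut)` triple, and a patient is in the final Sig+ group only when every rule holds. Because the cut is chosen to minimize the p-value, that p-value is optimistic, which is one reason the performance evaluation later in the deck relies on cross-validation.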


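Adaptive Index Model

AIM (Tian & Tibshirani, 2010) can be used for selecting markers and thresholds.

• Output: AIM Score

• An index predictor: # of satisfied rules, $s(x) = \sum_{j=1}^{k} I_j(x)$, where each rule $I_j(x)$ is of the form $I(x_j \ge c_j)$ or $I(x_j \le c_j)$

• Model to get the AIM score:

Prognostic: $\eta(x) = \alpha + \beta \cdot s(x)$

Predictive: $\eta(x) = \alpha + \beta \cdot [s(x) \times r] + \gamma \cdot r$

• A fast algorithm based on the information matrix performs the score test used to select the threshold for each marker

• Markers are selected one at a time (forward selection)

• The optimal # of markers is determined via cross-validation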


AIM-BATTing

1. Obtain the AIM Score.

2. Use BATTing to derive an optimal AIM Score threshold based on Models (1) and (2). The threshold is then used to stratify the population.

[Figure: Step 1 (AIM) maps each patient (Patient 1, Patient 2, …, Patient n) to a score (Score 1, Score 2, …, Score n), where Score = I(X1≥c1) + I(X2≤c2) + … + I(Xk≥ck). Step 2 (BATTing) applies the rule I(Score ≥ j) to assign each patient to the Sig+ group or the Sig- group.]
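The two steps can be illustrated with a small sketch: the AIM score counts how many threshold rules a patient satisfies, and a cutoff j on that score assigns the patient to Sig+ or Sig-. The rules, marker names, patient values, and cutoff below are hypothetical, and the cutoff is simply passed in rather than derived by BATTing.

```python
import operator

# Comparison operators allowed in a threshold rule.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def aim_score(patient, rules):
    """Number of threshold rules (marker, op, cut) that the patient satisfies."""
    return sum(1 for marker, op, cut in rules if OPS[op](patient[marker], cut))


def stratify(patients, rules, cutoff):
    """Assign Sig+ when the AIM score is at least `cutoff`, otherwise Sig-."""
    return ["Sig+" if aim_score(p, rules) >= cutoff else "Sig-" for p in patients]


# Hypothetical rules and patients, for illustration only.
example_rules = [("X1", ">=", 1.0), ("X2", "<=", 0.5), ("X3", ">=", 2.0)]
example_patients = [
    {"X1": 1.5, "X2": 0.2, "X3": 3.0},  # satisfies all three rules
    {"X1": 0.5, "X2": 0.9, "X3": 2.5},  # satisfies only the X3 rule
]
groups = stratify(example_patients, example_rules, cutoff=2)
```

Working with the score rather than the raw markers reduces the stratification problem to a single integer-valued variable, so only one threshold has to be chosen.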


Some Refinements to the AIM-BATTing algorithm

• MC-AIM-BATTing:

– A Monte Carlo procedure to get a more stable estimate of the “optimal # of markers”.

– i.e., use the median of the estimated “optimal # of markers” across multiple cross-validation runs with different random seeds.

• MC-AIM-RULE-BATTing:

– Use BATTing directly on the rules (Xi > c), instead of scores, and get a cutoff on the rule list.

– Patients meeting all the rules within the cutoff are assigned to the Sig+ group.
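The Monte Carlo step of MC-AIM-BATTing amounts to repeating the cross-validated choice of the number of markers under different seeds and taking the median. The sketch below assumes a caller-supplied function `select_count(seed)` that runs one cross-validation and returns its chosen number of markers; that function, the name `mc_optimal_marker_count`, and the use of the lower median to keep the result an integer are all assumptions.

```python
import statistics


def mc_optimal_marker_count(select_count, seeds):
    """Median of the cross-validated marker counts over several seeds.

    select_count : callable taking a seed and returning an integer count
                   (hypothetical stand-in for one cross-validation run)
    seeds        : iterable of random seeds, one per run
    """
    counts = [select_count(seed) for seed in seeds]
    # median_low returns an observed value, so the result stays an integer.
    return statistics.median_low(counts)


# Hypothetical outcomes of five cross-validation runs.
toy_runs = {0: 3, 1: 2, 2: 3, 3: 5, 4: 3}
stable_count = mc_optimal_marker_count(toy_runs.get, seeds=range(5))
```

A single cross-validation run can land on an unusually small or large model purely because of how the folds were drawn; the median across seeds dampens that variability.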


Performance evaluation: Common mistakes in practice

• Using an entire dataset to build a model

• Selecting “important” variables by associating markers with outcomes (e.g., stepwise regression)

• Testing and relying on a lack-of-fit assessment of the resulting model

• Assuming the resulting model is correct, and making inferences using the same dataset

Result: over-fitting


Predictive significance via cross-validation

1. Split the data into training and test folds. Derive the signature (Sig.) on each training set and apply it to the corresponding test set to obtain a group label for every test patient.

2. Combine the group labels across folds and compute the cross-validated p-value pi.

3. Repeat multiple times and aggregate the cross-validated p-values from the M iterations (p1, p2, …, pM).

4. The predictive significance is the median of these p-values.

Note: other performance statistics, e.g., sensitivity, specificity, PPV, NPV, hazard ratio, and odds ratio, can be calculated similarly.
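A minimal sketch of this procedure follows. It is written against two caller-supplied functions, both hypothetical: `fit_rule(train_rows)` derives a signature from training rows and returns a function that labels one row, and `p_value(rows, labels)` tests the group effect using the cross-validated labels. The fold count, iteration count, and seed are illustrative defaults, not values taken from the slides.

```python
import random
import statistics


def cv_predictive_significance(data, fit_rule, p_value,
                               n_folds=5, n_iter=100, seed=0):
    """Median cross-validated p-value over repeated K-fold splits.

    data     : list of rows (any structure understood by the callables)
    fit_rule : callable(train_rows) -> callable(row) -> group label
    p_value  : callable(rows, labels) -> p-value for the group effect
    """
    rng = random.Random(seed)
    n = len(data)
    p_values = []
    for _ in range(n_iter):
        order = list(range(n))
        rng.shuffle(order)
        labels = [None] * n
        for k in range(n_folds):
            test_idx = order[k::n_folds]
            held_out = set(test_idx)
            train_rows = [data[i] for i in order if i not in held_out]
            rule = fit_rule(train_rows)  # the signature never sees the test fold
            for i in test_idx:
                labels[i] = rule(data[i])
        # One p-value per iteration, computed on all patients with CV labels.
        p_values.append(p_value(data, labels))
    return statistics.median(p_values)
```

Because every patient's label comes from a signature built without that patient, the resulting p-value reflects how the whole signature-derivation procedure would perform on new data, rather than how well one fitted rule describes the data it was tuned on.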


Simulation Design

• Simulation model similar to that of Lipkovich et al. (2011, 2014), but with continuous rather than dichotomized predictors

• Small to large trials (n = 100, 300, 500)

• Number of candidate predictors: k = 10 and 18, with different correlation structures

• Effect size: 0.2 (low), 0.5 (medium), 0.8 (high)

[Figure annotation: Effect size = E(Y|Trt, sig+) - E(Y|ctrl, sig+) = 0.5]
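The effect size used in the simulation design is the difference in mean outcome between treated and control patients within the Sig+ subgroup. The sketch below computes its empirical counterpart from data; the function name `effect_size_in_sig_plus` and the toy values are hypothetical.

```python
def effect_size_in_sig_plus(y, treated, sig_plus):
    """Empirical E(Y | Trt, sig+) - E(Y | ctrl, sig+).

    y        : list of outcomes
    treated  : list of 0/1 treatment indicators
    sig_plus : list of 0/1 indicators of Sig+ membership
    """
    trt = [v for v, t, s in zip(y, treated, sig_plus) if s and t]
    ctrl = [v for v, t, s in zip(y, treated, sig_plus) if s and not t]
    if not trt or not ctrl:
        raise ValueError("Sig+ group needs both treated and control patients")
    return sum(trt) / len(trt) - sum(ctrl) / len(ctrl)


# Toy data (hypothetical): four Sig+ patients and two Sig- patients.
outcomes = [1.0, 2.0, 0.5, 1.5, 9.0, 9.0]
treatment = [1, 1, 0, 0, 1, 0]
in_sig_plus = [1, 1, 1, 1, 0, 0]
observed = effect_size_in_sig_plus(outcomes, treatment, in_sig_plus)
```

Patients outside Sig+ do not enter the calculation, so their outcomes, however extreme, leave the estimate unchanged.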


Simulation Results

• For the small effect size, none of the methods yields many testing p values below 0.05 for sample sizes from 100 to 500.

• Our proposed methods outperform SIDES in terms of selection accuracy: the accuracy of SIDES is around 50%, while that of our proposed algorithms ranges from 60% to 70% for large sample sizes.

• For effect size greater than medium (0.5) and sample size larger than 300, our proposed methods have most of the testing p values below 0.05 and accuracy around 90%. The SIDES method underperforms in all scenarios.


Clinical Trial Case Study

• Data simulated based on a Phase III clinical trial

• Efficacy of a novel treatment is compared to the standard of care (Control) in patients with severe sepsis

• Treatment arm (n = 317) vs. Control arm (n = 153)

• Outcome: Binary (survival)

• Available markers:

– demographic and clinical covariates, i.e., age, time from first sepsis-organ failure to start of drug, sum of baseline SOFA scores (cardiovascular, hematology, hepatic, renal, and respiration scores), number of baseline organ failures, pre-infusion APACHE II score, baseline Glasgow Coma Scale score, baseline activity of daily living score

– laboratory markers, i.e., baseline local platelets, creatinine, serum IL-6 concentration, local bilirubin

• The overall treatment effect was not statistically significant (1-tailed p value = 0.08), with survival rates of 40.7% and 34% in the treatment and control arms, respectively

• The original data were randomly split into two parts (training + testing)


Clinical Trial Case Study, contd.

Signature rules for positive subgroup:

• Sequential-BATTing and AIM-RULE: pre-infusion APACHE II score <= 27

• AIM: meet at least two out of the three thresholds: (1) pre-infusion APACHE II score < 27; (2) age < 54; (3) local bilirubin > 0.8

• SIDES: creatinine <= 1.1 & baseline Glasgow Coma Scale score > 11

Table: 1-tailed p-values for sepsis trial example

Dataset                                      Method    Treatment Difference  Group Difference    Interaction
                                                       Sig+       Sig-       Trt.      Ctrl.
Training dataset                             AIM       0.019      0.001      0.000     0.462     0.000
                                             AIM.Rule  0.126      0.000      0.000     0.692     0.000
                                             Seq.BT    0.126      0.000      0.000     0.692     0.000
                                             SIDES     0.018      0.041      0.002     0.304     0.003
Cross-validation (median of 100 iterations)  AIM       0.267      0.007      0.000     0.560     0.006
                                             AIM.Rule  0.495      0.011      0.000     0.677     0.018
                                             Seq.BT    0.105      0.005      0.000     0.816     0.002
                                             SIDES     0.313      0.209      0.003     0.059     0.641
Validation dataset                           AIM       0.952      0.050      0.000     0.150     0.164
                                             AIM.Rule  0.173      0.001      0.000     0.957     0.000
                                             Seq.BT    0.173      0.001      0.000     0.957     0.000
                                             SIDES     0.885      0.299      0.871     0.599     0.742

Seq-BATTing has the most promising CV performance, and its signature is validated in the test dataset.


Summary

• The proposed subgroup identification algorithms perform well in simulations and in the case-study illustration.

• These algorithms provide threshold-based multivariate biomarker signatures.

– Variable selection is automatically built into these algorithms.

• Personalized medicine is a paradigm shift in drug development, which requires:

– Advanced subgroup identification and subgroup analysis methods

– Enrichment design and simulations

– Smart diagnostic test development and clinical development strategy to overcome operational challenges

– Collaboration between functional areas


References

1. Hastie T, Tibshirani R, Friedman J (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Corr. 7th printing 2013. Springer

2. Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and Regression Trees, 1st ed. Chapman and Hall/CRC

3. Friedman JH (1991) Multivariate Adaptive Regression Splines. Ann Stat 19:1–67. doi: 10.1214/aos/1176347963

4. Friedman JH, Popescu BE (2008) Predictive learning via rule ensembles. Ann Appl Stat 2:916–954. doi: 10.1214/07-AOAS148

5. Liu X, Minin V, Huang Y, et al. (2004) Statistical methods for analyzing tissue microarray data. J Biopharm Stat 14:671–685. doi: 10.1081/BIP-200025657

6. Chen G, Zhong H, Belousov A, Devanarayan V (2015) A PRIM approach to predictive-signature development for patient stratification. Stat Med 34:317–342. doi: 10.1002/sim.6343

7. Su X, Zhou T, Yan X, et al. (2008) Interaction Trees with Censored Survival Data. Int J Biostat. doi: 10.2202/1557-4679.1071

8. Su X, Tsai C-L, Wang H, et al. (2009) Subgroup Analysis via Recursive Partitioning. J Mach Learn Res 10:141–158

9. Lipkovich I, Dmitrienko A, Denne J, Enas G (2011) Subgroup identification based on differential effect search: a recursive partitioning method for establishing response to treatment in patient subpopulations. Stat Med 30:2601–2621. doi: 10.1002/sim.4289

10. Lipkovich I, Dmitrienko A (2014) Strategies for identifying predictive biomarkers and subgroups with enhanced treatment effect in clinical trials using SIDES. J Biopharm Stat 24:130–153. doi: 10.1080/10543406.2013.856024

11. Berger JO, Wang X, Shen L (2014) A Bayesian approach to subgroup identification. J Biopharm Stat 24:110–129. doi: 10.1080/10543406.2013.856026

12. Devanarayan V, Cummins D, Tanzer L (1999) Application of GAM and tree models for assessing the role of drug resistance proteins in leukemia chemotherapy.

13. Tian L, Tibshirani R (2011) Adaptive index models for marker-based risk stratification. Biostatistics 12:68–86. doi: 10.1093/biostatistics/kxq047

14. Tian L, Alizadeh A, Gentles A, Tibshirani R (2012) A Simple Method for Detecting Interactions between a Treatment and a Large Number of Covariates. arXiv

15. Tibshirani R, Efron B (2002) Pre-validation and inference in microarrays. Stat Appl Genet Mol Biol. doi: 10.2202/1544-6115.1000

16. Foster JC, Taylor JM, Ruberg SJ (2011) Subgroup identification from randomized clinical trial data. Stat Med 30:2867–2880
