stabilized dynamic treatment regimes · current diabetes guidelines: tight control of glycosylated...

26
Motivation Methodology Simulation Studies and Data Analysis Stabilized Dynamic Treatment Regimes Yingqi Zhao Fred Hutchinson Cancer Research Center IMA Innovative Statistics and Machine Learning for Precision Medicine, Minneapolis, Sep 14, 2017 1/ 26

Upload: others

Post on 26-Jun-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Stabilized Dynamic Treatment Regimes

Yingqi Zhao

Fred Hutchinson Cancer Research Center

IMA Innovative Statistics and Machine Learning for PrecisionMedicine, Minneapolis,

Sep 14, 2017

1/ 26

Page 2: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Outline

1 Motivation

2 Methodology: stabilized dynamic treatment regimes

3 Simulation

4 Discussion

2/ 26

Page 3: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Active Surveillance in Prostate cancer

Active surveillance represents a strategy to address theovertreatment of prostate cancer.

Delay intervention in patients whose tumors initially havefeatures consistent with a low risk cancer and treat only whena more clinically significant malignancy is identified.

Patients treated with active surveillance undergo serialmonitoring with serum prostate specific antigen (PSA)measurements, clinical examinations and repeat biopsies.

When to recommend intervention (surgery)?

3/ 26

Page 4: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Canary Prostate cancer Active Surveillance Study (PASS)

A multi-institutional active surveillance cohort established in2008.

Broad eligibility criteria were used to sample the full spectrumof men using active surveillance; 905 participants enrolled.

Participants were followed with serum PSA measurementsevery 3 months, clinical and digital rectal examination every 6months, and repeat prostate biopsy 6 to 12, 24, 48 and 72months after diagnosis.

Outcome: disease reclassification

Can we identify a *fixed* rule over time to recommendintervention as needed?

4/ 26

Page 5: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Diabetes in Complex Patients

Current diabetes guidelines: tight control of glycosylatedhemoglobin (A1c) (< 7 %)

Healthy patients.

Based on trials of younger patients without severe diabetescomplications or other comorbidities.

Relatively low risk of tight control; significant benefits inreducing incidence of vascular events

Tight control of BP (<130/80 mm Hg) and LDL cholesterol(<100 mg/dl) for patients with diabetes

5/ 26

Page 6: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Diabetes in Complex Patients

Inappropriate for complex diabetes patients, i.e., older patients(age > 65 years) and/or those with comorbid conditions.

Evidence for these guidelines was mainly obtained from theresults of randomized clinical trials (RCTs)

Complex patients usually meet the exclusion criteria of clinicaltrials

Increased risk of drug-related morbidity, e.g., hypoglycemia,hypotension

How can we strengthen the current guidelines for complexpatients?

6/ 26

Page 7: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

A1c Control Observational Study

Linked claims and EHR data for Medicare beneficiaries in theUniversity of Wisconsin Medical Foundation (UWMF) system.

8,304 diabetes patients active during 2003-2011, recordedeach 90-day quarter in which they were alive at the start ofthe quarter

Outcome: adverse outcomes occurring during these quarters(e.g., emergency department use or hospitalizations, death),documented from claims information.

Covariates: sociodemographics, and indicators forcomorbidities; time varying patient complexity

Can we identify a *fixed* rule to target tight control of A1c?

7/ 26

Page 8: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Dynamic Treatment Regime

At any decision point

Input: available historical information on the patient to thatpoint.Output: next treatment.

Dynamic treatment regimes (DTRs) are sequential decisionrules for individual patients that can adapt over time to anevolving illness.

One decision rule for each time point.Each rule: recommends the treatment at that point as afunction of accrued historical information.An algorithm for treating any patient.Aim to optimize some cumulative clinical outcome.

8/ 26

Page 9: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

DTR Goals

Learn adaptive treatment strategies: tailor (sequences of)treatments based on patient characteristics.

MotivationsTailoring Therapies and Delayed EffectsDynamic Treatment Regime & Biomarker Adaptive Designs

Tailored Therapies

Concepts & Tools

SymptomsDemographicsDisease historyBiomarkersImagingBioinformaticsPharmacogenomics

4

MotivationsTailoring Therapies and Delayed EffectsDynamic Treatment Regime & Biomarker Adaptive Designs

Tailored Therapies

Concepts & Tools

SymptomsDemographicsDisease historyBiomarkersImagingBioinformaticsPharmacogenomics

4

Maximize the benefit of dynamic treatment regimes:

Well chosen tailoring variables.

Well devised decision rules.

9/ 26

Page 10: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Dynamic Treatment Regimes (DTR)

Observe data on n individuals, T stages for each individual,

X1,A1,R1,X2,A2, . . . ,XT ,AT ,RT ,XT+1

Xj : Patient covariates available at stage j .Aj : Treatment at stage j , Aj ∈ {−1, 1}.Rj : Outcome following stage j .Hj : History available at stage j , Hj = {X1,A1,R1, . . . ,Aj−1,Rj−1,Xj}.

A DTR is a sequence of decision rules:

d = (d1(H1), . . . , dT (HT )), dj(Hj) ∈ {−1, 1}.

Goal

Maximize the expected sum of outcomes if the DTR isimplemented in the future.

10/ 26

Page 11: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Value Function and Optimal DTR for Multiple Stages

The value function: V(d) = Ed(R1 + . . .+ RT ).

Optimal DTR: d∗ = argmaxd V(d).

Two main challenges in developing optimal DTRs:

Taking individual information into account in decision making.

Incorporating long-term benefits and risks of treatment due todelayed effects.

11/ 26

Page 12: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Q-learning

Backwards and recursively estimates the following Q-function:

Qj(hj , aj) = E (Rj+ maxaj+1∈{−1,1}

Qj+1(Hj+1, aj+1)|Hj = hj ,Aj = aj),

where QT+1 = 0, and hj ∈ Oj , aj ∈ Aj , j = 1, . . . ,T .

The estimated optimal sequence of decision rules

d̂j(hj) = argmaxaj∈{−1,1}

Q̂j(hj , aj).

Q learning with regression: estimate the Q-functions fromdata using regression and then find the optimal DTR.

Decision not shared.

12/ 26

Page 13: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Current methods

Simultaneous g-estimation: the empirical performance islargely unknown

Q learning with shared parameters: simultaneous estimationprocedure for the shared parameters based on Q-learning

Rely on the assumption that models are correct

13/ 26

Page 14: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Stabilized Dynamic Treatment Regimes

Learn the optimal regimes at all stages simultaneously.

Any dj(hj) can be written as dj(hj) = sign{fj(hj)};Aj = dj(hj) is equivalent to Aj fj(hj) > 0.

Decompose Hj into H(s)j and H

(u)j , which represent variables

that are shared or not shared.

Let βj = (β(u)j

ᵀ, β(s)

ᵀ)ᵀ, where β

(u)j are the unshared

parameters, and β(s) are the shared parameters across stages.

14/ 26

Page 15: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Stabilized Dynamic Treatment Regimes

An inverse probability of treatment estimator (IPWE) of V(d)

V̂ IPWE (f ) = Pn

(∑j Rj I{A1f1(H1) > 0, . . . ,AT fT (HT ) > 0}∏T

j=1 pj(Aj |Hj)

)

= Pn

(∑j Rj I [minj=1,...,T{Aj fj(Hj)} > 0]∏T

j=1 pj(Aj |Hj)

),

where fj(hj) = h(s)j β(s) + h

(u)j β

(u)j , and Pn denotes empirical

averages.

15/ 26

Page 16: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Stabilized Dynamic Treatment Regimes

Replace a concave surrogate for the indicator to alleviate thecomputation difficulties.

maxβ(u)j ,β(s)

Pn

Rψ[minj=1,...,T{Aj(H(u)j

ᵀβ(u)j + H

(s)j

ᵀβ(s))}]∏T

j=1 pj(Aj |Hj)

,

denoted by Φn(β(u)j , β(s)) and ψ is a concave function.

Use ψ(t) = − log(1 + e−t)

Use soft minimum to approximate the min function.

pj(Aj |Hj) can be estimated using e.g., logistic regression

Apply the LASSO penalty for sparsity, i.e., maximize

max Φn(β(u)j , β(s))− λn

(∑j

|β(u)j |+∑|β(s)|

),

where ψ(t) = − log(1 + e−t) and L1 penalties are imposed on

β(u)j and β(s).

16/ 26

Page 17: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Stabilized Dynamic Treatment Regimes

The IPWE V̂ IPWE (f ) has potentially high variance.

Consider a doubly robust method – construct augmentedIPWE for V(d).

V̂ AIPWE (f ) = Pn

[ ∑a1,...,aT

I [ minj=1,...,T

{aj fj(Hj)} ≥ 0]Wa(Y ,HT ;λ, L)

]

Incorporate contributions from the subjects who did notreceive the specified treatment assignments across all stagesvia the weighting function Wa(Y ,HT ;λ, L).

Weighting functions involves propensity scores, λ, andregression functions, L, at each stage.

Robust against misspecification.

17/ 26

Page 18: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Properties and Computation

Fisher consistency

Dervatives can be derived and we implement Orthant-WiseLimited-memory Quasi- Newton algorithm to find the solution.

Posit parametric models to approximate propensity scores andregression functions.

18/ 26

Page 19: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Simulation Studies: Generative Model

Baseline covariates X1,1, . . . ,X1,p ∼ N(0, 1)

4 stages.

Xj ,1 = Xj−1,1 + N(0, 1), 2 ≤ j ≤ 4.

Aj ∈ {−1, 1} w.p. 0.5 in Scenarios 1, 2

Aj ∈ {−1, 1} and logit{P(Aj = 1)} = Xj ,1 + Xj ,2 − 0.5 inScenarios 3, 4

19/ 26

Page 20: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Simulation Studies: Generative Model

Scenario 1/3: covariates shared throughout

Y ∼ 10 + X1,1X1,2 + X1,3 −∑4

j=1 |Xj ,1 − 1|{I (Aj >

0)− I (Xj ,1 − 1 > 0)}2 + N(0, 1).

Scenario 2/4: 25 covariates in stage 1 unshared, covariatesshared from stages 2 to 4

Y ∼ 10 + X1,1X1,2 + X1,3 − |X1,1 + X1,2 − 2|{I (A1 >0)− I (X1,1 + X1,2 − 2 > 0)}2 −

∑4j=2 |Xj ,1 − 1|{I (Aj >

0)− I (Xj ,1 − 1 > 0)}2 + N(0, 1).

20/ 26

Page 21: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Simulation Studies

Optimal decisions:

Scenario 1/3: d∗j (hj) = sign(Xj − 1)Scenario 2/4:d∗1 (h1) = sign(X1 + X2 − 2), d∗j (hj) = sign(Xj − 1), j = 2, 3, 4

Training data sample size n = 300.

Testing data sample size 10000.

p = 5, 10.

500 replications; tuning parameters selected via crossvalidation.

Evaluate using the values of the estimated DTRs.

21/ 26

Page 22: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Simulation Studies

Table 1: Results

Scenario 1 2 3 4

p=5SDTR-IPW 9.92 9.40 9.38 8.86SDTR-AIPW 9.89 9.48 9.68 9.15Q learning 8.12 8.15 8.65 8.82Shared Q learning 9.74 9.25 9.40 9.19

p=10SDTR-IPW 9.87 9.38 9.21 8.71SDTR-AIPW 9.85 9.56 9.63 9.04Q learning 8.14 8.12 8.53 8.65Shared Q learning 9.58 9.07 9.30 9.02

22/ 26

Page 23: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Simulation Studies

Mimic PASS study

Baseline variables: (PSA0,Coreratio0) ∼ a multivariate normaldistribution with mean (5.2,13) and the covariance matrixdiag(12.5,49); 5 other variables ∼ N(0, 1)

If PSAj−1 < 8, do not intervene at stage j ; otherwise,intervene, i.e., Aj = 1 with probability

exp(4− 0.2PSAj − 0.01Coreratioj−11 + exp(4− 0.2PSAj − 0.01Coreratioj−1))

The probability of reclassification at stage j is generated from

exp(−2 + 0.08(PSAj ≥ 10)PSAj − 0.5Aj)

1 + exp(−2 + 0.08(PSAj ≥ 10)PSAj − 0.5Aj)

23/ 26

Page 24: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Simulation Studies

Table 2: Results

SDTR-IPW SDTR-AIPW Q learning Shared Q learning

35% 35% 42% 40%

Decision rule by SDTR-IPW:d̂(xj) = 26.7− 3.7PSAj − 0.12Coreratioj

24/ 26

Page 25: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Discussion

Survival outcomes.

Multiple treaments.

Incorporating Cost.

Landmark analysis.

25/ 26

Page 26: Stabilized Dynamic Treatment Regimes · Current diabetes guidelines: tight control of glycosylated hemoglobin (A1c) (

MotivationMethodology

Simulation Studies and Data Analysis

Acknowledgement

IMA workshop organizersRuoqing ZhuYingye ZhengGuanhua Chen

26/ 26