stabilized dynamic treatment regimes · current diabetes guidelines: tight control of glycosylated...

MotivationMethodology

Simulation Studies and Data Analysis

Stabilized Dynamic Treatment Regimes

Yingqi Zhao

Fred Hutchinson Cancer Research Center

IMA Innovative Statistics and Machine Learning for PrecisionMedicine, Minneapolis,

Sep 14, 2017

1/ 26



Outline

1 Motivation

2 Methodology: stabilized dynamic treatment regimes

3 Simulation

4 Discussion

2/ 26



Active Surveillance in Prostate cancer

Active surveillance represents a strategy to address theovertreatment of prostate cancer.

Delay intervention in patients whose tumors initially havefeatures consistent with a low risk cancer and treat only whena more clinically significant malignancy is identified.

Patients treated with active surveillance undergo serialmonitoring with serum prostate specific antigen (PSA)measurements, clinical examinations and repeat biopsies.

When to recommend intervention (surgery)?

3/ 26



Canary Prostate cancer Active Surveillance Study (PASS)

A multi-institutional active surveillance cohort established in2008.

Broad eligibility criteria were used to sample the full spectrumof men using active surveillance; 905 participants enrolled.

Participants were followed with serum PSA measurementsevery 3 months, clinical and digital rectal examination every 6months, and repeat prostate biopsy 6 to 12, 24, 48 and 72months after diagnosis.

Outcome: disease reclassification

Can we identify a *fixed* rule over time to recommendintervention as needed?

4/ 26



Diabetes in Complex Patients

Current diabetes guidelines: tight control of glycosylatedhemoglobin (A1c) (< 7 %)

Healthy patients.

Based on trials of younger patients without severe diabetescomplications or other comorbidities.

Relatively low risk of tight control; significant benefits inreducing incidence of vascular events

Tight control of BP (<130/80 mm Hg) and LDL cholesterol(<100 mg/dl) for patients with diabetes

5/ 26



Diabetes in Complex Patients

Inappropriate for complex diabetes patients, i.e., older patients(age > 65 years) and/or those with comorbid conditions.

Evidence for these guidelines was mainly obtained from theresults of randomized clinical trials (RCTs)

Complex patients usually meet the exclusion criteria of clinicaltrials

Increased risk of drug-related morbidity, e.g., hypoglycemia,hypotension

How can we strengthen the current guidelines for complexpatients?

6/ 26



A1c Control Observational Study

Linked claims and EHR data for Medicare beneficiaries in theUniversity of Wisconsin Medical Foundation (UWMF) system.

8,304 diabetes patients active during 2003-2011, recordedeach 90-day quarter in which they were alive at the start ofthe quarter

Outcome: adverse outcomes occurring during these quarters(e.g., emergency department use or hospitalizations, death),documented from claims information.

Covariates: sociodemographics, and indicators forcomorbidities; time varying patient complexity

Can we identify a *fixed* rule to target tight control of A1c?

7/ 26



Dynamic Treatment Regime

At any decision point

Input: available historical information on the patient to thatpoint.Output: next treatment.

Dynamic treatment regimes (DTRs) are sequential decisionrules for individual patients that can adapt over time to anevolving illness.

One decision rule for each time point.Each rule: recommends the treatment at that point as afunction of accrued historical information.An algorithm for treating any patient.Aim to optimize some cumulative clinical outcome.

8/ 26



DTR Goals

Learn adaptive treatment strategies: tailor (sequences of)treatments based on patient characteristics.

MotivationsTailoring Therapies and Delayed EffectsDynamic Treatment Regime & Biomarker Adaptive Designs

Tailored Therapies

Concepts & Tools

SymptomsDemographicsDisease historyBiomarkersImagingBioinformaticsPharmacogenomics

4

MotivationsTailoring Therapies and Delayed EffectsDynamic Treatment Regime & Biomarker Adaptive Designs

Tailored Therapies

Concepts & Tools

SymptomsDemographicsDisease historyBiomarkersImagingBioinformaticsPharmacogenomics

4

Maximize the benefit of dynamic treatment regimes:

Well chosen tailoring variables.

Well devised decision rules.

9/ 26



Dynamic Treatment Regimes (DTR)

Observe data on n individuals, T stages for each individual,

X1,A1,R1,X2,A2, . . . ,XT ,AT ,RT ,XT+1

Xj : Patient covariates available at stage j .Aj : Treatment at stage j , Aj ∈ {−1, 1}.Rj : Outcome following stage j .Hj : History available at stage j , Hj = {X1,A1,R1, . . . ,Aj−1,Rj−1,Xj}.

A DTR is a sequence of decision rules:

d = (d1(H1), . . . , dT (HT )), dj(Hj) ∈ {−1, 1}.

Goal

Maximize the expected sum of outcomes if the DTR isimplemented in the future.

10/ 26



Value Function and Optimal DTR for Multiple Stages

The value function: V(d) = Ed(R1 + . . .+ RT ).

Optimal DTR: d∗ = argmaxd V(d).

Two main challenges in developing optimal DTRs:

Taking individual information into account in decision making.

Incorporating long-term benefits and risks of treatment due todelayed effects.

11/ 26



Q-learning

Backwards and recursively estimates the following Q-function:

Qj(hj , aj) = E (Rj+ maxaj+1∈{−1,1}

Qj+1(Hj+1, aj+1)|Hj = hj ,Aj = aj),

where QT+1 = 0, and hj ∈ Oj , aj ∈ Aj , j = 1, . . . ,T .

The estimated optimal sequence of decision rules

d̂j(hj) = argmaxaj∈{−1,1}

Q̂j(hj , aj).

Q learning with regression: estimate the Q-functions fromdata using regression and then find the optimal DTR.

Decision not shared.

12/ 26



Current methods

Simultaneous g-estimation: the empirical performance islargely unknown

Q learning with shared parameters: simultaneous estimationprocedure for the shared parameters based on Q-learning

Rely on the assumption that models are correct

13/ 26




Learn the optimal regimes at all stages simultaneously.

Any dj(hj) can be written as dj(hj) = sign{fj(hj)};Aj = dj(hj) is equivalent to Aj fj(hj) > 0.

Decompose Hj into H(s)j and H

(u)j , which represent variables

that are shared or not shared.

Let βj = (β(u)j

ᵀ, β(s)

ᵀ)ᵀ, where β

(u)j are the unshared

parameters, and β(s) are the shared parameters across stages.

14/ 26




An inverse probability of treatment estimator (IPWE) of V(d)

V̂ IPWE (f ) = Pn

(∑j Rj I{A1f1(H1) > 0, . . . ,AT fT (HT ) > 0}∏T

j=1 pj(Aj |Hj)

)

= Pn

(∑j Rj I [minj=1,...,T{Aj fj(Hj)} > 0]∏T

j=1 pj(Aj |Hj)

),

where fj(hj) = h(s)j β(s) + h

(u)j β

(u)j , and Pn denotes empirical

averages.

15/ 26




Replace a concave surrogate for the indicator to alleviate thecomputation difficulties.

maxβ(u)j ,β(s)

Pn

Rψ[minj=1,...,T{Aj(H(u)j

ᵀβ(u)j + H

(s)j

ᵀβ(s))}]∏T

j=1 pj(Aj |Hj)

,

denoted by Φn(β(u)j , β(s)) and ψ is a concave function.

Use ψ(t) = − log(1 + e−t)

Use soft minimum to approximate the min function.

pj(Aj |Hj) can be estimated using e.g., logistic regression

Apply the LASSO penalty for sparsity, i.e., maximize

max Φn(β(u)j , β(s))− λn

(∑j

|β(u)j |+∑|β(s)|

),

where ψ(t) = − log(1 + e−t) and L1 penalties are imposed on

β(u)j and β(s).

16/ 26




The IPWE V̂ IPWE (f ) has potentially high variance.

Consider a doubly robust method – construct augmentedIPWE for V(d).

V̂ AIPWE (f ) = Pn

[ ∑a1,...,aT

I [ minj=1,...,T

{aj fj(Hj)} ≥ 0]Wa(Y ,HT ;λ, L)

]

Incorporate contributions from the subjects who did notreceive the specified treatment assignments across all stagesvia the weighting function Wa(Y ,HT ;λ, L).

Weighting functions involves propensity scores, λ, andregression functions, L, at each stage.

Robust against misspecification.

17/ 26



Properties and Computation

Fisher consistency

Dervatives can be derived and we implement Orthant-WiseLimited-memory Quasi- Newton algorithm to find the solution.

Posit parametric models to approximate propensity scores andregression functions.

18/ 26



Simulation Studies: Generative Model

Baseline covariates X1,1, . . . ,X1,p ∼ N(0, 1)

4 stages.

Xj ,1 = Xj−1,1 + N(0, 1), 2 ≤ j ≤ 4.

Aj ∈ {−1, 1} w.p. 0.5 in Scenarios 1, 2

Aj ∈ {−1, 1} and logit{P(Aj = 1)} = Xj ,1 + Xj ,2 − 0.5 inScenarios 3, 4

19/ 26



Simulation Studies: Generative Model

Scenario 1/3: covariates shared throughout

Y ∼ 10 + X1,1X1,2 + X1,3 −∑4

j=1 |Xj ,1 − 1|{I (Aj >

0)− I (Xj ,1 − 1 > 0)}2 + N(0, 1).

Scenario 2/4: 25 covariates in stage 1 unshared, covariatesshared from stages 2 to 4

Y ∼ 10 + X1,1X1,2 + X1,3 − |X1,1 + X1,2 − 2|{I (A1 >0)− I (X1,1 + X1,2 − 2 > 0)}2 −

∑4j=2 |Xj ,1 − 1|{I (Aj >

0)− I (Xj ,1 − 1 > 0)}2 + N(0, 1).

20/ 26



Simulation Studies

Optimal decisions:

Scenario 1/3: d∗j (hj) = sign(Xj − 1)Scenario 2/4:d∗1 (h1) = sign(X1 + X2 − 2), d∗j (hj) = sign(Xj − 1), j = 2, 3, 4

Training data sample size n = 300.

Testing data sample size 10000.

p = 5, 10.

500 replications; tuning parameters selected via crossvalidation.

Evaluate using the values of the estimated DTRs.

21/ 26



Simulation Studies

Table 1: Results

Scenario 1 2 3 4

p=5SDTR-IPW 9.92 9.40 9.38 8.86SDTR-AIPW 9.89 9.48 9.68 9.15Q learning 8.12 8.15 8.65 8.82Shared Q learning 9.74 9.25 9.40 9.19

p=10SDTR-IPW 9.87 9.38 9.21 8.71SDTR-AIPW 9.85 9.56 9.63 9.04Q learning 8.14 8.12 8.53 8.65Shared Q learning 9.58 9.07 9.30 9.02

22/ 26



Simulation Studies

Mimic PASS study

Baseline variables: (PSA0,Coreratio0) ∼ a multivariate normaldistribution with mean (5.2,13) and the covariance matrixdiag(12.5,49); 5 other variables ∼ N(0, 1)

If PSAj−1 < 8, do not intervene at stage j ; otherwise,intervene, i.e., Aj = 1 with probability

exp(4− 0.2PSAj − 0.01Coreratioj−11 + exp(4− 0.2PSAj − 0.01Coreratioj−1))

The probability of reclassification at stage j is generated from

exp(−2 + 0.08(PSAj ≥ 10)PSAj − 0.5Aj)

1 + exp(−2 + 0.08(PSAj ≥ 10)PSAj − 0.5Aj)

23/ 26



Simulation Studies

Table 2: Results

SDTR-IPW SDTR-AIPW Q learning Shared Q learning

35% 35% 42% 40%

Decision rule by SDTR-IPW:d̂(xj) = 26.7− 3.7PSAj − 0.12Coreratioj

24/ 26



Discussion

Survival outcomes.

Multiple treaments.

Incorporating Cost.

Landmark analysis.

25/ 26



Acknowledgement

IMA workshop organizersRuoqing ZhuYingye ZhengGuanhua Chen

26/ 26

stabilized dynamic treatment regimes · current diabetes guidelines: tight control of glycosylated...

Documents