stabilized dynamic treatment regimes · current diabetes guidelines: tight control of glycosylated...
TRANSCRIPT
MotivationMethodology
Simulation Studies and Data Analysis
Stabilized Dynamic Treatment Regimes
Yingqi Zhao
Fred Hutchinson Cancer Research Center
IMA Innovative Statistics and Machine Learning for PrecisionMedicine, Minneapolis,
Sep 14, 2017
1/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Outline
1 Motivation
2 Methodology: stabilized dynamic treatment regimes
3 Simulation
4 Discussion
2/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Active Surveillance in Prostate cancer
Active surveillance represents a strategy to address theovertreatment of prostate cancer.
Delay intervention in patients whose tumors initially havefeatures consistent with a low risk cancer and treat only whena more clinically significant malignancy is identified.
Patients treated with active surveillance undergo serialmonitoring with serum prostate specific antigen (PSA)measurements, clinical examinations and repeat biopsies.
When to recommend intervention (surgery)?
3/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Canary Prostate cancer Active Surveillance Study (PASS)
A multi-institutional active surveillance cohort established in2008.
Broad eligibility criteria were used to sample the full spectrumof men using active surveillance; 905 participants enrolled.
Participants were followed with serum PSA measurementsevery 3 months, clinical and digital rectal examination every 6months, and repeat prostate biopsy 6 to 12, 24, 48 and 72months after diagnosis.
Outcome: disease reclassification
Can we identify a *fixed* rule over time to recommendintervention as needed?
4/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Diabetes in Complex Patients
Current diabetes guidelines: tight control of glycosylatedhemoglobin (A1c) (< 7 %)
Healthy patients.
Based on trials of younger patients without severe diabetescomplications or other comorbidities.
Relatively low risk of tight control; significant benefits inreducing incidence of vascular events
Tight control of BP (<130/80 mm Hg) and LDL cholesterol(<100 mg/dl) for patients with diabetes
5/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Diabetes in Complex Patients
Inappropriate for complex diabetes patients, i.e., older patients(age > 65 years) and/or those with comorbid conditions.
Evidence for these guidelines was mainly obtained from theresults of randomized clinical trials (RCTs)
Complex patients usually meet the exclusion criteria of clinicaltrials
Increased risk of drug-related morbidity, e.g., hypoglycemia,hypotension
How can we strengthen the current guidelines for complexpatients?
6/ 26
MotivationMethodology
Simulation Studies and Data Analysis
A1c Control Observational Study
Linked claims and EHR data for Medicare beneficiaries in theUniversity of Wisconsin Medical Foundation (UWMF) system.
8,304 diabetes patients active during 2003-2011, recordedeach 90-day quarter in which they were alive at the start ofthe quarter
Outcome: adverse outcomes occurring during these quarters(e.g., emergency department use or hospitalizations, death),documented from claims information.
Covariates: sociodemographics, and indicators forcomorbidities; time varying patient complexity
Can we identify a *fixed* rule to target tight control of A1c?
7/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Dynamic Treatment Regime
At any decision point
Input: available historical information on the patient to thatpoint.Output: next treatment.
Dynamic treatment regimes (DTRs) are sequential decisionrules for individual patients that can adapt over time to anevolving illness.
One decision rule for each time point.Each rule: recommends the treatment at that point as afunction of accrued historical information.An algorithm for treating any patient.Aim to optimize some cumulative clinical outcome.
8/ 26
MotivationMethodology
Simulation Studies and Data Analysis
DTR Goals
Learn adaptive treatment strategies: tailor (sequences of)treatments based on patient characteristics.
MotivationsTailoring Therapies and Delayed EffectsDynamic Treatment Regime & Biomarker Adaptive Designs
Tailored Therapies
Concepts & Tools
SymptomsDemographicsDisease historyBiomarkersImagingBioinformaticsPharmacogenomics
4
MotivationsTailoring Therapies and Delayed EffectsDynamic Treatment Regime & Biomarker Adaptive Designs
Tailored Therapies
Concepts & Tools
SymptomsDemographicsDisease historyBiomarkersImagingBioinformaticsPharmacogenomics
4
Maximize the benefit of dynamic treatment regimes:
Well chosen tailoring variables.
Well devised decision rules.
9/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Dynamic Treatment Regimes (DTR)
Observe data on n individuals, T stages for each individual,
X1,A1,R1,X2,A2, . . . ,XT ,AT ,RT ,XT+1
Xj : Patient covariates available at stage j .Aj : Treatment at stage j , Aj ∈ {−1, 1}.Rj : Outcome following stage j .Hj : History available at stage j , Hj = {X1,A1,R1, . . . ,Aj−1,Rj−1,Xj}.
A DTR is a sequence of decision rules:
d = (d1(H1), . . . , dT (HT )), dj(Hj) ∈ {−1, 1}.
Goal
Maximize the expected sum of outcomes if the DTR isimplemented in the future.
10/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Value Function and Optimal DTR for Multiple Stages
The value function: V(d) = Ed(R1 + . . .+ RT ).
Optimal DTR: d∗ = argmaxd V(d).
Two main challenges in developing optimal DTRs:
Taking individual information into account in decision making.
Incorporating long-term benefits and risks of treatment due todelayed effects.
11/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Q-learning
Backwards and recursively estimates the following Q-function:
Qj(hj , aj) = E (Rj+ maxaj+1∈{−1,1}
Qj+1(Hj+1, aj+1)|Hj = hj ,Aj = aj),
where QT+1 = 0, and hj ∈ Oj , aj ∈ Aj , j = 1, . . . ,T .
The estimated optimal sequence of decision rules
d̂j(hj) = argmaxaj∈{−1,1}
Q̂j(hj , aj).
Q learning with regression: estimate the Q-functions fromdata using regression and then find the optimal DTR.
Decision not shared.
12/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Current methods
Simultaneous g-estimation: the empirical performance islargely unknown
Q learning with shared parameters: simultaneous estimationprocedure for the shared parameters based on Q-learning
Rely on the assumption that models are correct
13/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Stabilized Dynamic Treatment Regimes
Learn the optimal regimes at all stages simultaneously.
Any dj(hj) can be written as dj(hj) = sign{fj(hj)};Aj = dj(hj) is equivalent to Aj fj(hj) > 0.
Decompose Hj into H(s)j and H
(u)j , which represent variables
that are shared or not shared.
Let βj = (β(u)j
ᵀ, β(s)
ᵀ)ᵀ, where β
(u)j are the unshared
parameters, and β(s) are the shared parameters across stages.
14/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Stabilized Dynamic Treatment Regimes
An inverse probability of treatment estimator (IPWE) of V(d)
V̂ IPWE (f ) = Pn
(∑j Rj I{A1f1(H1) > 0, . . . ,AT fT (HT ) > 0}∏T
j=1 pj(Aj |Hj)
)
= Pn
(∑j Rj I [minj=1,...,T{Aj fj(Hj)} > 0]∏T
j=1 pj(Aj |Hj)
),
where fj(hj) = h(s)j β(s) + h
(u)j β
(u)j , and Pn denotes empirical
averages.
15/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Stabilized Dynamic Treatment Regimes
Replace a concave surrogate for the indicator to alleviate thecomputation difficulties.
maxβ(u)j ,β(s)
Pn
Rψ[minj=1,...,T{Aj(H(u)j
ᵀβ(u)j + H
(s)j
ᵀβ(s))}]∏T
j=1 pj(Aj |Hj)
,
denoted by Φn(β(u)j , β(s)) and ψ is a concave function.
Use ψ(t) = − log(1 + e−t)
Use soft minimum to approximate the min function.
pj(Aj |Hj) can be estimated using e.g., logistic regression
Apply the LASSO penalty for sparsity, i.e., maximize
max Φn(β(u)j , β(s))− λn
(∑j
|β(u)j |+∑|β(s)|
),
where ψ(t) = − log(1 + e−t) and L1 penalties are imposed on
β(u)j and β(s).
16/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Stabilized Dynamic Treatment Regimes
The IPWE V̂ IPWE (f ) has potentially high variance.
Consider a doubly robust method – construct augmentedIPWE for V(d).
V̂ AIPWE (f ) = Pn
[ ∑a1,...,aT
I [ minj=1,...,T
{aj fj(Hj)} ≥ 0]Wa(Y ,HT ;λ, L)
]
Incorporate contributions from the subjects who did notreceive the specified treatment assignments across all stagesvia the weighting function Wa(Y ,HT ;λ, L).
Weighting functions involves propensity scores, λ, andregression functions, L, at each stage.
Robust against misspecification.
17/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Properties and Computation
Fisher consistency
Dervatives can be derived and we implement Orthant-WiseLimited-memory Quasi- Newton algorithm to find the solution.
Posit parametric models to approximate propensity scores andregression functions.
18/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Simulation Studies: Generative Model
Baseline covariates X1,1, . . . ,X1,p ∼ N(0, 1)
4 stages.
Xj ,1 = Xj−1,1 + N(0, 1), 2 ≤ j ≤ 4.
Aj ∈ {−1, 1} w.p. 0.5 in Scenarios 1, 2
Aj ∈ {−1, 1} and logit{P(Aj = 1)} = Xj ,1 + Xj ,2 − 0.5 inScenarios 3, 4
19/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Simulation Studies: Generative Model
Scenario 1/3: covariates shared throughout
Y ∼ 10 + X1,1X1,2 + X1,3 −∑4
j=1 |Xj ,1 − 1|{I (Aj >
0)− I (Xj ,1 − 1 > 0)}2 + N(0, 1).
Scenario 2/4: 25 covariates in stage 1 unshared, covariatesshared from stages 2 to 4
Y ∼ 10 + X1,1X1,2 + X1,3 − |X1,1 + X1,2 − 2|{I (A1 >0)− I (X1,1 + X1,2 − 2 > 0)}2 −
∑4j=2 |Xj ,1 − 1|{I (Aj >
0)− I (Xj ,1 − 1 > 0)}2 + N(0, 1).
20/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Simulation Studies
Optimal decisions:
Scenario 1/3: d∗j (hj) = sign(Xj − 1)Scenario 2/4:d∗1 (h1) = sign(X1 + X2 − 2), d∗j (hj) = sign(Xj − 1), j = 2, 3, 4
Training data sample size n = 300.
Testing data sample size 10000.
p = 5, 10.
500 replications; tuning parameters selected via crossvalidation.
Evaluate using the values of the estimated DTRs.
21/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Simulation Studies
Table 1: Results
Scenario 1 2 3 4
p=5SDTR-IPW 9.92 9.40 9.38 8.86SDTR-AIPW 9.89 9.48 9.68 9.15Q learning 8.12 8.15 8.65 8.82Shared Q learning 9.74 9.25 9.40 9.19
p=10SDTR-IPW 9.87 9.38 9.21 8.71SDTR-AIPW 9.85 9.56 9.63 9.04Q learning 8.14 8.12 8.53 8.65Shared Q learning 9.58 9.07 9.30 9.02
22/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Simulation Studies
Mimic PASS study
Baseline variables: (PSA0,Coreratio0) ∼ a multivariate normaldistribution with mean (5.2,13) and the covariance matrixdiag(12.5,49); 5 other variables ∼ N(0, 1)
If PSAj−1 < 8, do not intervene at stage j ; otherwise,intervene, i.e., Aj = 1 with probability
exp(4− 0.2PSAj − 0.01Coreratioj−11 + exp(4− 0.2PSAj − 0.01Coreratioj−1))
The probability of reclassification at stage j is generated from
exp(−2 + 0.08(PSAj ≥ 10)PSAj − 0.5Aj)
1 + exp(−2 + 0.08(PSAj ≥ 10)PSAj − 0.5Aj)
23/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Simulation Studies
Table 2: Results
SDTR-IPW SDTR-AIPW Q learning Shared Q learning
35% 35% 42% 40%
Decision rule by SDTR-IPW:d̂(xj) = 26.7− 3.7PSAj − 0.12Coreratioj
24/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Discussion
Survival outcomes.
Multiple treaments.
Incorporating Cost.
Landmark analysis.
25/ 26
MotivationMethodology
Simulation Studies and Data Analysis
Acknowledgement
IMA workshop organizersRuoqing ZhuYingye ZhengGuanhua Chen
26/ 26