machine learning for healthcare - mit opencourseware · 2020. 12. 30. · sh and protein-if . her2...

48
Machine Learning for Healthcare HST.956, 6.S897 Lecture 14: Causal Inference Part 1 David Sontag 1

Upload: others

Post on 28-Feb-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Machine Learning for Healthcare HST.956, 6.S897

Lecture 14: Causal Inference Part 1

David Sontag

1

Page 2: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Course announcements • Please fill out mid-semester survey • Project proposals

– You will receive e-mail feedback this week • Problem sets

– PS1-4 graded – PS5 out tonight, due next Tuesday, – Last problem set, PS6, released in ~2 weeks

• Recitation this week will be a discussion of – Brat et al., Postsurgical prescriptions for opioid naïve

patients and association with overdose and misuse, BMJ2018

– Bertsimas et al., Personalized diabetes management using electronic medical records, Diabetes Care 2017

2

Page 3: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Does gastric bypass surgery prevent onset of diabetes?

1994 2000 2013

<4.5% 4.5%–5.9% 6.0%–7.4% 7.5%–8.9% >9.0%

• In Lecture 4 & PS2 we used machine learning for early detection of Type 2 diabetes

• Health system doesn’t want to know how to predict diabetes – they want to know how to prevent it

• Gastric bypass surgery is the highest negative weight (9th most predictive feature) – Does this mean it would be a good intervention?

3

Page 4: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

What is the likelihood this patient, with breast cancer, will survive 5 years?

© Cancer Network. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/

• Such predictive models widely used to stage patients.Should we initiate treatment? How aggressive?

• What could go wrong if we trained to predict survival,and then used to guide patient care?

� �

Diagnosis Death Time

“Mary”

Treatment

A long survival time may be because of treatment! 4

Page 5: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

ent should we gn Pathology

SH and Protein-IF HER2 Amplified

Expansion pathology (image from Andy Beck)

• People respond differently to treatment• Goal: use data from other patients and their journeysto guide future treatment decisions

• What could go wrong if we trained to predict (past)treatment decisions?

Best this can do is Treatment A “David” match current

“John” Treatment B medical practice! Treatment A “Juana”

5

Whattreatmentshouldwegivethispatient?

Page 6: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Does smoking cause lung cancer?

• Doing a randomized control trial is unethical• Could we simply answer this question by comparingPr(lung cancer | smoker) vs Pr(lung cancer | nonsmoker)?

• No! Answering such questions from observational data isdifficult because of confounding

6

Page 7: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

To properly answer, need to formulate as causal questions:

Patient, � Intervention, �

(including all (e.g. medication, confounding ? procedure) factors)

Outcome, �

High dimensional Observational data

7

Page 8: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Potential Outcomes Framework (Rubin-Neyman Causal Model)

• Each unit (individual) �' has two potential outcomes: – �((�') is the potentialoutcomehad the unit not been treated: “control outcome”

– �+(�') is the potential outcome had the unit been treated: “treated outcome”

• Conditional average treatment effect for unit �: ���� �' = �23~5(23|78) [�+|�'] − �2=~5(2=|78)[�(|�']

• Average Treatment Effect:���: = � �+ − �( = �7~5(7) ���� �

8

Page 9: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Potential Outcomes Framework (Rubin-Neyman Causal Model)

• Each unit (individual) �' has two potential outcomes: – �((�') is the potentialoutcomehad the unit not been treated: “control outcome”

– �+(�') is the potential outcome had the unit been treated: “treated outcome”

• Observed factual outcome: �' = �'�+ �' + 1 − �' �((�')

• Unobserved counterfactual outcome: �'CD = (1 − �')�+ �' + �'�((�')

9

Page 10: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

The fundamental problem of causal inference“The fundamental problem of

causal inference”

We only ever observe one of the two outcomes

10

Page 11: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Treated

Example – Blood pressure and age

� = �����_����.

�+ �

�( �

� = ���

11

Page 12: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Treated

Blood pressure and age

� = �����_����.

�+ � ����(�)

�( �

� = ���

12

Page 13: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Treated

U8AA@)R1-7701-)6:@)6B-

? 0 HIJJKLMNGOP

$+ &

$( &

.#/ .

& 0 EFG

13

Page 14: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Treated

Blood pressure and age

� = �����_����.

�+ �

�( �

Treated

Control � = ���

14

Page 15: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Treated

U8AA@)R1-7701-)6:@)6B-$+ &

? 0 HIJJKLMNGOP

$( &#1-6/-@

5A:/1A8 & 0 EFG 5A0:/-1;6./068)/1-6/-@

5A0:/-1;6./068).A:/1A8 15

Page 16: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Sugar levelshadtheyreceived

medication A

Sugar levelshadtheyreceived

medication B6 5.57 6.57 69 88.5 87.5 710 98 7

(age, gender, Observed exercise,treatment) sugar levels

(45, F, 0, A) 6

(45, F, 1, B) 6.5

(55, M, 0, A) 7

(55, M, 1, B) 8

(65, F, 0, B) 8

(65,F, 1, A) 7.5

(75,M, 0, B) 9

(75,M, 1, A) 8

(Example from Uri Shalit) 16

Page 17: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Sugar levelshadtheyreceived

medication A

Sugar levelshadtheyreceived

medication B6 5.57 6.57 69 88.5 87.5 710 98 7

(age, gender, Observed exercise) sugar levels

(45, F, 0) 6

(45, F, 1) 6.5

(55, M, 0) 7

(55, M, 1) 8

(65, F, 0) 8

(65,F, 1) 7.5

(75,M, 0) 9

(75,M, 1) 8

(Example from Uri Shalit) 17

Page 18: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

(age, gender, Y0: Sugar levels Y1: Sugar levels Observed exercise) had they had they sugar levels

received received medication A medication B

(45, F, 0) 6 5.5 6

(45, F, 1) 7 6.5 6.5

(55, M, 0) 7 6 7

(55, M, 1) 9 8 8

(65, F, 0) 8.5 8 8

(65,F, 1) 7.5 7 7.5

(75,M, 0) 10 9 9

(75,M, 1) 8 7 8

(Example from Uri Shalit) 18

Page 19: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

(age,gender, exercise)

Sugar levels had they received

Sugar levels had they received

Observed sugar levels

medication medication A B

(45, F, 0) 6 5.5 6

(45, F, 1) 7 6.5 6.5

(55, M, 0) 7 6 7

(55, M, 1) 9 8 8

(65, F, 0) 8.5 8 8

(65,F, 1) 7.5 7 7.5

(75,M, 0) 10 9 9

(75,M, 1) 8 7 8

mean(sugar|medication B) – mean(sugar|medicaton A) = ?

mean(sugar|had they received B) – mean(sugar|had they received A) = ?

(Example from Uri Shalit) 19

Page 20: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

(age,gender, exercise)

Sugar levels had they received

Sugar levels had they received

Observed sugar levels

medication medication A B

(45, F, 0) 6 5.5 6

(45, F, 1) 7 6.5 6.5

(55, M, 0) 7 6 7

(55, M, 1) 9 8 8

(65, F, 0) 8.5 8 8

(65,F, 1) 7.5 7 7.5

(75,M, 0) 10 9 9

(75,M, 1) 8 7 8

mean(sugar|medication B) – mean(sugar|medicaton A) = 7.875 - 7.125 = 0.75

mean(sugar|had they received B) – mean(sugar|had they received A) = 7.125 - 7.875 = -0.75

(Example from Uri Shalit) 20

Page 21: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Typical assumption – no unmeasured confounders

�(, �+: potential outcomes for control and treated

�: unit covariates (features) T: treatment assignment

We assume: (�(, �+) ⫫ � | �

The potential outcomes are independent of treatment assignment, conditioned on covariates �

21

Page 22: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Typical assumption – no unmeasured confounders

�(, �+: potential outcomes for control and treated

�: unit covariates (features) T: treatment assignment

We assume: (�(, �+) ⫫ � | �

Ignorability

22

Page 23: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Ignorability

covariates � �

�� ��

treatment (features)

Potential outcomes

(�(, �+) ⫫ � | � 23

Page 24: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Ignorability anti-hypertensive medication

age, gender, weight, diet, heart rate at rest,…

blood pressure blood pressure after medication after

� �

�� ��

A medication B

(�(, �+) ⫫ � | � 24

Page 25: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

No Ignorability anti-hypertensive medication

age, gender, weight, diet, heart rate at diabetic rest,…

blood pressure blood pressure after medication after

�� ��

A medication B

(�(, �+) ⫫ � | � 25

Page 26: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Typical assumption – common support

Y(, �+: potential outcomes for control and treated

�: unit covariates (features) �: treatment assignment

We assume: � � = � � = � > 0 ∀�, �

26

Page 27: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Framing the question

1. Where could we go to for data to answer these questions?

2. What should X, T, and Y be to satisfy ignorability?

3. What is the specific causal inference question that we are interested in?

4. Are you worried about common support?

27

Page 28: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Outline for lecture

• How to recognize a causal inference problem

• Potential outcomes framework – Average treatment effect (ATE) – Conditional average treatment effect (CATE)

• Algorithms for estimating ATE and CATE

28

Page 29: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Average Treatment Effect

The expected causal effect of � on �: AT E := E [Y1 � Y0]

29

Page 30: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Average Treatment Effect – the adjustment formula

• Assuming ignorability, we will derive the adjustment formula (Hernán & Robins 2010, Pearl 2009)

• The adjustment formula is extremely useful in causal inference

• Also called G-formula

30

Page 31: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Average Treatment Effect

The expected causal effect of � on �: AT E := E [Y1 � Y0]

31

Page 32: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Average Treatment Effect

The expected causal effect of � on �: AT E := E [Y1 � Y0]

law of total E [Y1] = expectation

⇥ ⇤ Ex⇠p(x) EY1⇠p(Y1|x) [Y1|x] =

⇥ ⇤

32

Page 33: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Average Treatment Effect

The expected causal effect of � on �: AT E := E [Y1 � Y0]

E [Y1] = ignorability ⇥ ⇤ Ex⇠p(x) EY1⇠p(Y1|x) [Y1|x] = (�(, �+) ⫫ � | �

⇥ ⇤ Ex⇠p(x) EY1⇠p(Y1|x) [Y1|x, T = 1] =

33

Page 34: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Average Treatment Effect

The expected causal effect of � on �: AT E := E [Y1 � Y0]

E [Y1] = ⇥ ⇤

Ex⇠p(x) EY1⇠p(Y1|x) [Y1|x] =

⇥ ⇤ Ex⇠p(x) EY1⇠p(Y1|x) [Y1|x, T = 1] =

Ex⇠p(x) [E [Y1|x, T = 1]] shorter notation

34

Page 35: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Average Treatment Effect

The expected causal effect of � on �: AT E := E [Y1 � Y0]

E [Y0] = ⇥ ⇤

Ex⇠p(x) EY0⇠p(Y0|x) [Y0|x] =

⇥ ⇤ Ex⇠p(x) EY0⇠p(Y0|x) [Y0|x, T = 1] =

Ex⇠p(x) [E [Y0|x, T = 0]]

35

Page 36: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

The adjustment formula

Under the assumption of ignorability, we have that: AT E = E [Y1 � Y0] =

Ex⇠p(x)[ E [Y1|x, T = 1]�E [Y0|x, T = 0] ]

E [Y1|x, T = 1] Quantities we can estimate E [Y0|x, T = 0] from data (

36

Page 37: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

The adjustment formula

Under the assumption of ignorability, we have that: AT E = E [Y1 � Y0] =

Ex⇠p(x)[ E [Y1|x, T = 1]�E [Y0|x, T = 0] ]

E [Y0|x, T = 1]

E [Y1|x, T = 0] Quantities we cannot directly

E [Y0|x] estimate from data

E [Y1|x] (

37

Page 38: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

The adjustment formula

Under the assumption of ignorability, we have that: AT E = E [Y1 � Y0] =

Ex⇠p(x)[ E [Y1|x, T = 1]�E [Y0|x, T = 0] ]

E [Y1|x, T = 1] Quantities we can estimate E [Y0|x, T = 0] from data (

Empirically we have samples from �(�|� = 1) or � � � = 0 . Extrapolate to �(�)

38

Page 39: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Many methods!

Covariate adjustment Propensity score re-weighting Doubly robust estimators Matching …

39

Page 40: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Covariate adjustment

• Explicitly model the relationship between treatment, confounders, and outcome

• Also called “Response Surface Modeling” • Used for both ITE and ATE

• Aregression problem

40

Page 41: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Covariates Regression Outcome (Features) model

�+

�\

�]

�(�, �)

41

Page 42: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Nuisance Regression Outcome Parameters model

�+

�\

�]

Parameter of interest

�(�, �)

42

Page 43: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently
Page 44: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Covariate adjustment (parametric g-formula)

• Explicitly model the relationship between treatment, confounders, and outcome

• Under ignorability, the expected causal effect of � on �:

� = 1, � − � �( � = 0, � �7~5 7 � �+

• Fit a model � �, � ≈ � � � = �, � `

a���� �' = � �', 1 − �(�', 0) 44

Page 45: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Treated

Covariate adjustment

� = �����_����.

�+ �

�( �

Treated

Control � = ���

45

Page 46: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Treated

5A>61?6/-)6@e07/C-:/ $+ &

? 0 HIJJKLMNGOP

$( &#1-6/-@

5A:/1A8

f

& 0 EFG 5A0:/-1;6./068)/1-6/-@

5A0:/-1;6./068).A:/1A8 46

Page 47: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

Treated

fK6CR8-)A;)IAE).A>61?6/-)6@e07/C-:/) ;6?87)EI-:)/I-1-)?7):A)A>-186R

? 0 HIJJKLMNGOP

$+ &

$( &

#1-6/-@

& 0 EFG 5A:/1A8 47

Page 48: Machine Learning for Healthcare - MIT OpenCourseWare · 2020. 12. 30. · SH and Protein-IF . HER2 Amplified. Expansion pathology (image from Andy Beck) • People respond differently

MIT OpenCourseWare https://ocw.mit.edu

6.S897 / HST.956 Machine Learning for Healthcare Spring 2019

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms

48