Propensity Score Analysis
A tool for causal inference in non-randomized studies
Summer Statistics Workshop 2010
Felix Thoemmes
Texas A&M University
Agenda
• Motivating examples
• Definition of causal effects with potential outcomes
• Definition of propensity scores
• Applied example of propensity scores
• Hands-on example in R
• Advanced Topics
Motivation
• Randomized experiment is considered “gold standard” for causal inference
• Randomization not always possible
– Trials where treatment is offered to the community at large
– Participants do not permit randomization
– Ethical or legal considerations
– Naturally occurring phenomena
– Broken randomized experiments (attrition, non-compliance, treatment diffusion)
Broken randomized experiments
Motivation
• Non-random assignment leads to group imbalances at pretest
• Selection bias
• Confounding of treatment effects due to imbalanced covariates
Hormone Replacement therapy
• 1968: “Feminine Forever”
• 2002: Women's Health Initiative trial on hormone replacement therapy
Motivation
• Adjustment methods
– ANCOVA / Regression Adjustment
– Matching
– Stratification
• Many covariates are needed to control for potentially confounding influences
Motivation
• Assumptions of classic ANCOVA model
– linearity
– no baseline by treatment interactions
• Region of common support in multi-dimensional space hard to assess
– extrapolation beyond data is sensitive to model adequacy
Motivation – Key Issues
1. Non-randomized studies are necessary
2. Many covariates should be assessed to control for confounding influences
3. High-dimensional regression adjustment has strong assumptions and distributional overlap is hard to check
Defining causal effects
• Definition of causal effect is often lacking in applied social science
• Parameter estimates from any model (ANOVA, regression, structural equation model) may or may not be causally interpretable
Rubin Causal Model
TREATMENT
CONTROL
τi = Yi1 - Yi0     Unit-level Causal Effect
Rubin Causal Model
TREATMENT
CONTROL
τ = (1/n) Σi Yi1 - (1/n) Σi Yi0

Average Causal Effect
Rubin Causal Model
TREATMENT
CONTROL
τ* = (1/n1) Σi (Yi1 | Zi = 1) - (1/n0) Σi (Yi0 | Zi = 0)

Estimate of the Average Causal Effect
Rubin Causal Model
TREATMENT    CONTROL

Randomized assignment:
E(Yi1) = E(Yi1 | zi = 1)    E(Yi0) = E(Yi0 | zi = 0)

Non-random assignment:
E(Yi1) ≠ E(Yi1 | zi = 1)    E(Yi0) ≠ E(Yi0 | zi = 0)
Rubin Causal Model
         Potential Outcomes    Observed Outcomes
Unit        T       C             T       C
1          10      10            10       –
2          11      11             –      11
3          12      12            12       –
4          12      16             –      16
5          11      13             –      13
6          12      15            12       –
Mean    11.33   12.83         11.33   13.33

τ = 11.33-12.83 = -1.5      τ* = 11.33-13.33 = -2.0
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)
Source: West and Thoemmes (2008)
Rubin Causal Model
         Potential Outcomes    Observed Outcomes
Unit        T       C             T       C
1          10      10             –      10
2          11      11             –      11
3          12      12             –      12
4          12      16            12       –
5          11      13            11       –
6          12      15            12       –
Mean    11.33   12.83         11.66   11.00

τ = 11.33-12.83 = -1.5      τ* = 11.66-11.00 = .66
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)
Source: West and Thoemmes (2008)
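The arithmetic in the two tables above can be verified with a short script. (A Python sketch for illustration; the hands-on portion of this workshop uses R.)

```python
# Potential outcomes for the six units in the tables above
y1 = [10, 11, 12, 12, 11, 12]  # outcome if treated
y0 = [10, 11, 12, 16, 13, 15]  # outcome if control

def mean(v):
    return sum(v) / len(v)

# True average causal effect: compares the SAME units under both conditions
tau = mean(y1) - mean(y0)  # 11.33 - 12.83 = -1.5

# Randomized assignment (first table): units 1, 3, 6 treated
z_random = [1, 0, 1, 0, 0, 1]
# Self-selected assignment (second table): units 4, 5, 6 treated
z_selected = [0, 0, 0, 1, 1, 1]

def prima_facie(z):
    """Naive estimate: mean of observed treated minus mean of observed controls."""
    treated = [y1[i] for i in range(6) if z[i] == 1]
    control = [y0[i] for i in range(6) if z[i] == 0]
    return mean(treated) - mean(control)

print(round(tau, 2))                     # -1.5
print(round(prima_facie(z_random), 2))   # -2.0
print(round(prima_facie(z_selected), 2)) # 0.67: biased (the true effect is negative)
```

Under self-selection the naive comparison even flips the sign of the effect, which is exactly the point of the second table.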
Obtaining unbiased estimates

Randomized experiment:
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)

Non-randomized experiment:
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)

Non-randomized experiment with unconfoundedness assumption:
E(Yi1) = Ex{E(Yi1 | zi = 1, x)}
E(Yi0) = Ex{E(Yi0 | zi = 0, x)}

X contains all confounding covariates
Randomization
• Randomized experiment is gold standard for causal inference
– Covariate balance ensures that confounders cannot bias the treatment effect
• Few assumptions
– Compliance
– No missing data
– No hidden treatment variations
– Independence of units (assignment of one unit does not influence the outcome of another unit)
Non-randomized trial
• Lack of randomization can create imbalance PRIOR to treatment assignment
• Confounding occurs due to imbalance and relationship with outcome
• Bias can be corrected, but all confounders must be assessed: no unique influence of a confounder can be left out if the effect estimate is to be unbiased
Increasing use of Propensity Scores
Source: Web of Science

[Figure: cumulative citations, 1983-2003; y-axis 0 to 1,800; series: Cumulative Citations, and Cumulative Citations Social Sciences (rescaled)]
Propensity scores
e(x) = p (z=1 | x)
e(x): propensity score
p: probability
z: treatment assignment (1 = treatment group, 0 = control group)
x: vector of covariates
| : conditional on / controlled for
Propensity scores
A single-number summary of all available covariates that expresses the probability that a given subject is assigned to the treatment condition, given the values of the observed covariates
e(x) = p (z=1 | x)
Propensity scores

[Figure, shown on two slides: likelihood of receiving treatment (y-axis) plotted against actual assignment, Control vs. Treatment (x-axis)]
Example of balance property
a b z e(x)
0 0 0 .5
0 0 1 .5
1 0 0 .33
1 0 0 .33
1 0 1 .33
0 1 0 .66
0 1 1 .66
0 1 1 .66
1 1 1 1
1 1 1 1
original sample

e(x) = p(z=1 | x={0 0}) = .5
e(x) = p(z=1 | x={1 0}) = .33
e(x) = p(z=1 | x={0 1}) = .66
e(x) = p(z=1 | x={1 1}) = 1

p(a=1 | z=0) = .5     p(a=1 | z=1) = .5
p(b=1 | z=0) = .25    p(b=1 | z=1) = .5
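These propensity scores can be computed directly as the proportion of treated units within each covariate pattern of the table above (a Python sketch; the workshop itself works in R):

```python
# Original sample from the table above: rows of (a, b, z)
rows = [(0,0,0), (0,0,1), (1,0,0), (1,0,0), (1,0,1),
        (0,1,0), (0,1,1), (0,1,1), (1,1,1), (1,1,1)]

# e(x) = p(z=1 | x): proportion treated among units sharing pattern x = (a, b)
e = {}
for x in {(0,0), (1,0), (0,1), (1,1)}:
    cell = [z for a, b, z in rows if (a, b) == x]
    e[x] = sum(cell) / len(cell)

print(e[(0, 0)])  # 0.5
print(e[(1, 0)])  # 0.333...
print(e[(0, 1)])  # 0.666...
print(e[(1, 1)])  # 1.0
```

Note that the pattern {1 1} has e(x) = 1: it contains no control units at all, so those units have no counterparts to be matched to.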
Example of balance property
a b z e*(x)
0 0 0 .5
0 0 1 .5
1 0 0 .5
1 0 1 .5
0 1 0 .5
0 1 1 .5
matched sample

p(a=1 | z=0) = .33    p(a=1 | z=1) = .33
p(b=1 | z=0) = .33    p(b=1 | z=1) = .33
p(z, x|e(x)) = p(z |e(x)) p(x |e(x))
Examples for z=1 and x = {0 1}
p(z=1, x={0 1}|e(x)) = 1/6
p(z=1 | e(x)) = .5
p(x={0 1} | e(x)) = .33
p(z |e(x)) p(x |e(x)) = (.5)(.33) = 1/6
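The factorization p(z, x | e(x)) = p(z | e(x)) p(x | e(x)) can be checked numerically on the matched sample shown above (a Python sketch):

```python
# Matched sample from the table above: rows of (a, b, z);
# every retained unit has e*(x) = .5
matched = [(0,0,0), (0,0,1), (1,0,0), (1,0,1), (0,1,0), (0,1,1)]
n = len(matched)

# Check p(z, x | e(x)) = p(z | e(x)) * p(x | e(x)) for z = 1, x = {0 1}
p_joint = sum(1 for a, b, z in matched if z == 1 and (a, b) == (0, 1)) / n
p_z = sum(1 for a, b, z in matched if z == 1) / n
p_x = sum(1 for a, b, z in matched if (a, b) == (0, 1)) / n
print(p_joint, p_z * p_x)  # both equal 1/6

# Balance property: each covariate has the same distribution in both groups
def prop(idx, group):
    units = [r for r in matched if r[2] == group]
    return sum(r[idx] for r in units) / len(units)

print(prop(0, 0) == prop(0, 1))  # True: a is balanced across groups
print(prop(1, 0) == prop(1, 1))  # True: b is balanced across groups
```

Within a stratum of constant e(x), treatment assignment and covariates are independent, which is exactly what the balance property claims.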
Propensity scores
• Balance on the propensity score implies, on average, balance on all observed covariates
• Two units in the treatment and the control group that have the same propensity score are similar on all covariates; they differ only in the treatment received
Propensity score
• The propensity score models the influence of confounders on treatment assignment
• In comparison, ANCOVA models the influence of confounders on the outcome
[Diagram: a confounder influences both Treatment and Outcome]
Comparison: Propensity scores vs. Regression adjustment

Propensity scores | Regression adjustment
Tool to strengthen causal conclusions | Tool to strengthen causal conclusions
Models relationship between confounders and treatment | Models relationship between confounders and outcome
Assessment of overlap | No assessment of overlap
No assumption about functional form of propensity score | Classic ANCOVA assumes linearity and absence of interactions, but can be extended
Non-parametric conditioning (e.g., matching) | Parametric conditioning, functional form of regression adjustment
Outcome variable unknown during propensity score analysis | Outcome variable always part of the adjustment
Sample size can be diminished, loss of power | Sample size stays constant, power can increase due to covariates
Hard, time-consuming | Easy, widely implemented in software
Subjective choices in modeling | Widely accepted procedure
Propensity Score Workflow

Selection → Estimation → Conditioning → Model Checks → Effect Estimation
Propensity score analysis is a multi-step process
The researcher has choices at each step of the analysis
[Diagram: the Confounder predicts Treatment (Predicting Selection) and predicts the Outcome (Predicting Outcome)]

Select true confounders and covariates predictive of outcome
• Estimation of propensity scores can be achieved in numerous ways
– Logistic regression
– Discriminant analysis
– (Boosted) regression trees
• Logistic regression model
– Outcome is treatment assignment
– Predictors are covariates
– Can be overfitted to the sample, e.g., include interactions, higher-order terms
– Only interest is prediction and covariate balance
log[ e(x) / (1 - e(x)) ] = β0 + βi Xi
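The logistic model above can be sketched end to end on simulated data. (A Python toy example; in practice one would use R's glm() or let matchit() fit this model internally. The data-generating values 0.5 and 1.0 are made up for the simulation.)

```python
import math
import random

random.seed(1)

# Simulate one confounder x; true assignment model: logit(e) = 0.5 + 1.0 * x
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
z = [1 if random.random() < 1 / (1 + math.exp(-(0.5 + 1.0 * xi))) else 0
     for xi in x]

# Fit log(e(x) / (1 - e(x))) = b0 + b1 * x by gradient ascent on the
# mean log-likelihood (a minimal stand-in for a packaged glm fit)
b0 = b1 = 0.0
for _ in range(300):
    g0 = g1 = 0.0
    for xi, zi in zip(x, z):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
        g0 += zi - p
        g1 += (zi - p) * xi
    b0 += 2.0 * g0 / n
    b1 += 2.0 * g1 / n

# Estimated propensity scores: fitted probabilities of treatment given x
e_hat = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
print(round(b0, 1), round(b1, 1))  # near the true values 0.5 and 1.0
```

The fitted probabilities e_hat are what all later steps (matching, stratification, weighting) condition on; the coefficients themselves are of no substantive interest, which is why overfitting the model is acceptable here.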
• Conditioning strategies
– Matching
– Weighting
– Regression adjustment
• Check of covariate balance
– t-test (not recommended) or standardized difference
– Graphical assessment (e.g., Q-Q plot)
• Region of common support (distributional overlap)
– Graphical assessment (e.g., histograms)
[Figure: Q-Q plots of the logit propensity score, treatment group vs. control group, before and after matching]

Quantiles of both distributions are plotted against each other
[Figure: Before Matching vs. After Matching]
• Estimate of treatment effect
– Mean difference
– Standard error dependent on conditioning scheme
Applied Example
Braver, Thoemmes, Moser, & Baham (in progress)
• Can random invitation designs yield the same results as randomized controlled trials?
• Evaluation of a math treatment to teach rules of exponents, administered either as a randomized experiment or as a random invitation design
• Currently in progress
– Pilot data available
Design
[Diagram: Overall Sample split into RCT, RI-Treatment, and RI-Control arms; the RCT arm is further randomized into RCT-T and RCT-C; effect estimates d and d* are compared; attrition paths marked]
Example
• Pretest
• General attitude towards math
• Altruism scale
• Available time
• 19 covariates after forming factor scores
Results

                                RCT-T vs. RCT-C             RI-T vs. RI-C
                                Effect  95% CI        n     Effect  95% CI        n
Linear regression adjustment    .146*   .077 - .215   176   .176*   .111 - .242   193
Propensity score adjustment     .148*   .064 - .231   122   .146*   .094 - .198   193
Mechanics of propensity score analysis
• Can all of this be done in SPSS / SAS?
– Only parts of the analysis can be performed
– Mostly based on self-written macros
• R packages MatchIt and PSAgraphics offer best solutions
– Some experience / learning of R required
– Packages automate most of the analysis
Estimation of propensity score
• Put covariates in the model that are
– Theoretically important confounders
– Significantly related with treatment selection (unbalanced)
• Iterative process of including covariates and potentially higher-order terms (interactions, polynomials)
Estimation of propensity score
• Estimate PS
– Logistic regression
– Generalized additive model
– Classification tree / Regression tree / Recursive Partitioning
Estimation of propensity score
• Generalized additive model
– Instead of a regular regression weight, a smoother is applied
Graphic from SAS PROC GAM
Imagine a lowess smoother for each coefficient
Estimation of propensity score
• Regression tree
• Splits sample at predictors that maximally separate groups
• Final nodes are balanced on all variables

Picture taken from XLMiner
Estimation choices
• GAMs are useful because they model non-linear relationships automatically
• Regression trees automatically detect non-linear relationships and interactions
• Little research on performance of these methods
Estimation choices
• Shadish et al. report that single regression trees do not outperform regular logistic regression
• Regression trees need to be pruned – unclear what kind of pruning achieves good overall balance in dataset
• Work on boosted regression trees (McCaffrey et al.)
Conditioning
• Choices are:
– Matching (in many different variations)
– Stratification / Subclassification
– Weighting (done outside MatchIt)
Conditioning
• Stratification
– Straightforward method to classify the sample into strata based on the estimated PS
– Within each stratum covariates should be approximately balanced
– Number of strata can be varied; Cochran suggested 5 strata to remove 90% of the bias of a single confounding covariate
– Stratification is easy to implement
– Residual bias can occur
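Subclassification can be sketched as follows. (A Python toy example with simulated data; the true effect is set to 2.0 so the stratified estimate can be compared against the naive, unadjusted one. All data-generating values are made up.)

```python
import random

random.seed(7)

# Simulated data: the propensity score ps confounds treatment and outcome;
# the true treatment effect is 2.0
n = 2000
units = []
for _ in range(n):
    ps = random.random()                 # stand-in for an estimated PS
    z = 1 if random.random() < ps else 0
    y = 2.0 * z + 3.0 * ps + random.gauss(0, 1)
    units.append((ps, z, y))

def mean(v):
    return sum(v) / len(v)

# Naive comparison ignores the confounder and is biased upward
naive = (mean([y for ps, z, y in units if z == 1])
         - mean([y for ps, z, y in units if z == 0]))

# Stratify on PS quintiles (Cochran: 5 strata remove ~90% of the bias),
# estimate within each stratum, and pool weighted by stratum size
units.sort()
k = 5
est, total = 0.0, 0
for j in range(k):
    s = units[j * n // k:(j + 1) * n // k]
    t = [y for ps, z, y in s if z == 1]
    c = [y for ps, z, y in s if z == 0]
    if t and c:
        est += (mean(t) - mean(c)) * len(s)
        total += len(s)
est /= total

print(round(naive, 2))  # roughly 3.0: effect plus confounding bias
print(round(est, 2))    # much closer to the true effect of 2.0
```

The small remaining gap to 2.0 is the residual bias mentioned above: within a stratum the propensity score still varies a little.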
Conditioning
• Matching
– Exact matching
– Nearest Neighbor matching
– Optimal matching
– Full matching
– Other matching algorithms possible
Matching
• Exact matching
Only units that are identical are matched with each other
Control Treated
.124 .226
.126 .289
.211 .365
.389 .389
.415 .517
.566 .656
.656 .789
.733 .856
.821 .997
Matching
• Nearest Neighbor
Control Treated
.124 .226
.126 .289
.211 .365
.389 .389
.415 .517
.566 .656
.656 .789
.733 .856
.821 .997
• Match units that are approximately the same
• order
• caliper
• replacement
• ratio
Matching
• order
– start at largest, smallest, or random
– matched unit will not be matched to another unit
– local minimum
• caliper
– define the maximum distance between two neighbors, e.g., .1 of a standard deviation of the PS
• replacement
– unit can be recycled to be matched, weights necessary
• ratio
– one treated unit can be matched to more than one control (e.g., 1:2, 1:3)
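These options can be illustrated with a greedy sketch of 1:1 nearest-neighbor matching without replacement, using the propensity scores from the nearest-neighbor table above. (A simplified Python sketch; MatchIt's actual implementation offers many more variations.)

```python
# Propensity scores from the nearest-neighbor slide
control = [.124, .126, .211, .389, .415, .566, .656, .733, .821]
treated = [.226, .289, .365, .389, .517, .656, .789, .856, .997]

def nearest_neighbor(treated, control, caliper=0.1):
    """Greedy 1:1 matching: order = start at largest treated PS,
    no replacement, discard treated units with no control inside the caliper."""
    available = list(control)
    pairs = []
    for t in sorted(treated, reverse=True):
        if not available:
            break
        c = min(available, key=lambda v: abs(v - t))   # closest remaining control
        if abs(c - t) <= caliper:                      # caliper: max allowed distance
            pairs.append((t, c))
            available.remove(c)                        # without replacement
    return pairs

pairs = nearest_neighbor(treated, control)
# .997 finds no control within the caliper and stays unmatched
for t, c in pairs:
    print(t, c)
```

Running this matches 8 of the 9 treated units; the treated unit at .997 is discarded, which is the bias/variance tradeoff of the caliper in miniature.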
Matching
• Optimal Matching
Control Treated
.124 .226
.126 .289
.211 .365
.389 .389
.415 .517
.566 .656
.656 .789
.733 .856
.821 .997
• Match units that are approximately the same, but allow matches to be broken if a better match is possible
• Find global minimum
• ratio
Matching
• Full Matching
Control Treated
.124 .226
.126 .289
.211 .365
.389 .389
.415 .517
.566 .656
.656 .789
.733 .856
.821 .997
• Allows several controls to be matched to one treated unit in some regions of the propensity score, and several treated units to one control in other regions
• Useful if distributions are highly imbalanced
Summary of matching choices
• Methods offer tradeoffs between bias and variance
• Incomplete vs. Inexact matching
• Exact matching has (in theory) no bias, but can have large variance due to diminished sample size
• Less exact matching methods may have small residual bias, but less variance due to inclusion of more subjects
Other issues
• Discarding units prior to matching
• Usually not necessary if a caliper is defined
• Excluding units on only one side changes the quantity of interest
Overlap
Graphic from T. Love, ASA Workshop
Annotated R code (reference)

library(foreign)
library(MatchIt)
library(PSAgraphics)
library(Rcmdr)    # provides bin.var(), used below to create strata
#read in dataset using Rcmdr#
psa <- read.spss("C:/Users/fthoemmes/Desktop/ps workshop/testdatapsa.sav",
use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)
#prima facie effect#
pf <- lm(y~z, data=psa)
summary(pf)
##matching#
##model to be used to predict z#
#matchit(z ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10,
##type of matching#
# method = "nearest",
##discard option#
# discard = "none",
##caliper option#
# caliper = .1,
##dataset#
# data = psa)
#same code in one single line#
match1 <- matchit(z ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10 , method= "nearest", discard="none", caliper = .1, data = psa)
#summary of the matched sample#
summary(match1)
plot(match1,type="QQ")
plot(match1,type="hist")
plot(match1,type="jitter")
#additional plot of standardized differences
smatch1<-summary(match1,standardize=TRUE)
plot(smatch1)
#write out data
dmatch1 <-match.data(match1)
#additional graphics#
#put variables in objects
continuous<-dmatch1$X1
treatment<-dmatch1$z
#create strata from estimated PS
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions',
labels=FALSE)
strata<-dmatch1$strata
#box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)
#treatment effect of matched sample
m1 <- lm(y~z, data=dmatch1)
summary(m1)
Annotated R output (reference)
> pf <- lm(y~z, data=psa)
> summary(pf)
Call:
lm(formula = y ~ z, data = psa)

Residuals:
    Min      1Q  Median      3Q     Max
-10.167  -1.660   0.106   1.658   8.143

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.635      0.130   -4.88  1.3e-06 ***
z              1.476      0.197    7.49  2.3e-13 ***
• OLS regression
• Outcome regressed on treatment
• Prima facie treatment effect
• 1.476, p < .01
Annotated R output (reference)

Summary of balance for all data:
          Means Treated  Means Control  SD Control  Mean Diff  eQQ Med  eQQ Mean  eQQ Max
distance          0.527          0.364       0.184      0.163    0.177     0.164    0.193
X1                0.483         -0.496       1.636      0.979    0.990     0.989    1.427
X2                0.234         -0.204       1.764      0.438    0.460     0.450    0.798
X3               -0.042         -0.035       1.413     -0.007    0.131     0.182    1.690
X4                0.106         -0.093       1.413      0.199    0.202     0.209    0.565
X5                0.171          0.178       1.976     -0.007    0.099     0.135    2.982
X6               -0.038         -0.177       1.215      0.139    0.145     0.154    0.705
X7                0.342         -0.013       2.138      0.355    0.326     0.370    2.013
X8                0.056         -0.134       1.286      0.190    0.188     0.205    0.789
X9                0.197         -0.286       1.512      0.483    0.516     0.504    1.434
X10               0.365         -0.185       1.629      0.550    0.487     0.562    1.229

Summary of balance for matched data:
          Means Treated  Means Control  SD Control  Mean Diff  eQQ Med  eQQ Mean  eQQ Max
distance          0.472          0.463       0.163      0.009    0.008     0.010    0.020
X1                0.239          0.104       1.516      0.135    0.123     0.149    0.577
X2                0.087          0.061       1.763      0.027    0.142     0.155    0.582
X3               -0.146         -0.053       1.444     -0.093    0.188     0.215    1.927
X4                0.050          0.059       1.378     -0.009    0.096     0.109    1.120
X5                0.154          0.189       1.972     -0.035    0.123     0.181    2.982
X6               -0.101         -0.105       1.243      0.004    0.079     0.098    0.726
X7                0.162          0.231       2.164     -0.069    0.124     0.222    1.610
X8               -0.019          0.020       1.338     -0.040    0.109     0.124    0.789
X9                0.013         -0.017       1.464      0.030    0.178     0.191    1.670
X10               0.138          0.198       1.496     -0.060    0.128     0.142    0.684
• Balance on all covariates
• Unmatched Sample and matched sample
• Means, SD, Mean Diff, differences based on Q-Q plot (mean, median, max)
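The standardized difference recommended earlier over t-tests is simple to compute; a common formula divides the mean difference by the pooled standard deviation. (A Python sketch; the covariate values below are made up for illustration, not taken from the workshop data.)

```python
import math

def smd(treated, control):
    """Standardized mean difference with a pooled-SD denominator."""
    mt = sum(treated) / len(treated)
    mc = sum(control) / len(control)
    vt = sum((x - mt) ** 2 for x in treated) / (len(treated) - 1)
    vc = sum((x - mc) ** 2 for x in control) / (len(control) - 1)
    return (mt - mc) / math.sqrt((vt + vc) / 2)

# Hypothetical covariate values for one variable
before_t = [0.9, 1.4, 1.1, 1.6, 1.2]
before_c = [0.2, 0.5, 0.1, 0.6, 0.3]
after_t  = [0.9, 1.1, 1.2]
after_c  = [0.9, 1.0, 1.3]

print(round(smd(before_t, before_c), 2))  # large imbalance
print(round(smd(after_t, after_c), 2))    # near zero: balanced
```

A common rule of thumb treats |SMD| < .1 as acceptable balance; unlike a t-test, the SMD does not shrink mechanically when matching reduces the sample size.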
Annotated R output (reference)
Percent Balance Improvement:
          Mean Diff.  eQQ Med  eQQ Mean   eQQ Max
distance       94.49    95.25     94.13    89.747
X1             86.20    87.53     84.89    59.582
X2             93.90    69.12     65.58    27.080
X3          -1324.83   -43.21    -18.15   -14.002
X4             95.29    52.59     47.77   -98.143
X5           -409.06   -24.54    -33.84     0.000
X6             97.43    45.17     36.23    -2.878
X7             80.66    61.94     40.07    20.052
X8             79.17    42.15     39.37     0.000
X9             93.78    65.39     62.07   -16.423
X10            89.17    73.63     74.71    44.328

Sample sizes:
           Control  Treated
All            368      283
Matched        217      217
Unmatched      151       66
Discarded        0        0
• Percent balance improvement
• Measure of balance
• Sample sizes before and after matching
Graphics
• Q-Q plot for each variable to check balance
• Straight line following the 45° diagonal is desired
• plot(match1, type="QQ")
Graphics
• Histograms to examine distribution of propensity score
• Split by treatment and control
• Before and after matching
• plot(match1,type="hist")
Graphics
• Jittered dotplot
• Shows regions of propensity score that were matched
• plot(match1,type="jitter")
Graphics
• Plot of standardized differences pre- and post-matching
smatch1<-summary(match1,standardize=TRUE)
plot(smatch1)
Graphics
#write out data
dmatch1 <- match.data(match1)

#additional graphics#
#put variables in objects
continuous <- dmatch1$X1
treatment <- dmatch1$z
#create strata from estimated PS
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata <- dmatch1$strata
#box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)
Annotated R output (reference)
Call:
lm(formula = y ~ z, data = dmatch1)

Residuals:
   Min     1Q Median     3Q    Max
-9.868 -1.742  0.052  1.630  6.956

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.246      0.163   -1.51  0.13117
z              0.789      0.230    3.42  0.00068 ***
Treatment effect after matching now .789, p < .05
Treatment effect still in same direction but greatly diminished
Advantages of Propensity Scores
• Collapses a multivariate problem into a single-dimensional problem
• No stringent assumptions about functional form
• Model checks allow easy assessment of balance
• Clearly defined region of common support (no extrapolation)
Limitations
• Unmeasured covariates can still bias effect estimates
• Propensity score function can be challenging to estimate
• If assumptions of ANCOVA are fully met, propensity scores offer little gain
Propensity scores
• Method is another tool for applied researchers to adjust for confounding influences
• Propensity scores have some advantages and disadvantages over traditional regression adjustment
• In applied contexts, the choice of confounding variables and the reliability of their measurement will be more critical than the choice of adjustment method!
Thank you
Summer Statistics Workshop 2010
Felix Thoemmes
Texas A&M University