Propensity Score Analysis A tool for causal inference in non-randomized studies Summer Statistics Workshop 2010 Felix Thoemmes Texas A&M University

Post on 19-Dec-2015

Page 1

Propensity Score Analysis

A tool for causal inference in non-randomized studies

Summer Statistics Workshop 2010

Felix Thoemmes

Texas A&M University

Page 2

Agenda

• Motivating examples

• Definition of causal effects with potential outcomes

• Definition of propensity scores

• Applied example of propensity scores

• Hands-on example in R

• Advanced Topics

Page 3

Motivation

• Randomized experiment is considered the “gold standard” for causal inference

• Randomization is not always possible
– Trials where treatment is offered to the community at large
– Participants do not permit randomization
– Ethical or legal considerations
– Naturally occurring phenomena
– Broken randomized experiments (attrition, non-compliance, treatment diffusion)

Page 4

Broken randomized experiments

Page 5

Motivation

• Non-random assignment leads to group imbalances at pretest

• Selection bias

• Confounding of treatment effects due to imbalanced covariates

Page 6

Hormone replacement therapy

• 1968 “Feminine Forever”

• 2002 Women's Health Initiative trial on hormone replacement therapy

Page 7

Motivation

• Adjustment methods
– ANCOVA / Regression Adjustment
– Matching
– Stratification

• Many covariates are needed to control for potentially confounding influences

Page 8

Motivation

• Assumptions of the classic ANCOVA model
– linearity
– no baseline by treatment interactions

• Region of common support in multi-dimensional space is hard to assess; extrapolation beyond the data is sensitive to model adequacy

Page 9

Motivation – Key Issues

1. Non-randomized studies are necessary

2. Many covariates should be assessed to control for confounding influences

3. High-dimensional regression adjustment has strong assumptions and distributional overlap is hard to check

Page 10

Defining causal effects

• Definition of causal effect is often lacking in applied social science

• Parameter estimates from any model (ANOVA, regression, structural equation model) may or may not be causally interpretable

Page 11

Rubin Causal Model

[Figure: panels labeled TREATMENT and CONTROL]

Unit-level causal effect: $\tau_i = Y_{i1} - Y_{i0}$

Page 12

Rubin Causal Model

[Figure: panels labeled TREATMENT and CONTROL]

Average causal effect: $\tau = \frac{1}{n}\sum_{i=1}^{n} Y_{i1} - \frac{1}{n}\sum_{i=1}^{n} Y_{i0}$

Page 13

Rubin Causal Model

[Figure: panels labeled TREATMENT and CONTROL]

Estimate of the average causal effect: $\tau^* = \frac{1}{n_1}\sum_{i=1}^{n_1} (Y_{i1} \mid z_i = 1) - \frac{1}{n_0}\sum_{i=1}^{n_0} (Y_{i0} \mid z_i = 0)$

Page 14

Rubin Causal Model

[Figure: two panels, each showing TREATMENT and CONTROL groups]

E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)

E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)

Page 15

Rubin Causal Model

        Potential Outcomes    Observed Outcomes
        T        C            T        C
        10       10           10       -
        11       11           -        11
        12       12           12       -
        12       16           -        16
        11       13           -        13
        12       15           12       -
Mean    11.33    12.83        11.33    13.33

τ = 11.33 - 12.83 = -1.5
τ* = 11.33 - 13.33 = -2.0

E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)

Source: West and Thoemmes (2008)

Page 16

Rubin Causal Model

        Potential Outcomes    Observed Outcomes
        T        C            T        C
        10       10           -        10
        11       11           -        11
        12       12           -        12
        12       16           12       -
        11       13           11       -
        12       15           12       -
Mean    11.33    12.83        11.66    11

τ = 11.33 - 12.83 = -1.5
τ* = 11.66 - 11.00 = .66

E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)

Source: West and Thoemmes (2008)
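The two tables from West and Thoemmes (2008) can be recomputed in a few lines of R. This is only a minimal sketch: the names `z_first_table` and `z_second_table` are chosen here for illustration, and the assignment vectors encode which potential outcome is observed for each of the six units in each table.

```r
# Potential outcomes for the six units shown in the tables
y1 <- c(10, 11, 12, 12, 11, 12)  # outcome under treatment (T)
y0 <- c(10, 11, 12, 16, 13, 15)  # outcome under control (C)

# True average causal effect: requires both potential outcomes for every unit
tau <- mean(y1) - mean(y0)

# Estimate from observed data: only one potential outcome is seen per unit
estimate_effect <- function(z) {
  mean(y1[z == 1]) - mean(y0[z == 0])
}

z_first_table  <- c(1, 0, 1, 0, 0, 1)  # assignment in the first table
z_second_table <- c(0, 0, 0, 1, 1, 1)  # assignment in the second table

tau_star_first  <- estimate_effect(z_first_table)
tau_star_second <- estimate_effect(z_second_table)

print(round(c(tau = tau, first = tau_star_first, second = tau_star_second), 2))
```

The same potential outcomes give estimates of opposite sign depending only on who ends up in which group, which is the point the two tables make.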

Page 17

Obtaining unbiased estimates

Randomized experiment:
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)

Non-randomized experiment:
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)

Non-randomized experiment with unconfoundedness assumption:
E(Yi1) = Ex{E(Yi1 | zi = 1, x)}
E(Yi0) = Ex{E(Yi0 | zi = 0, x)}

X contains all confounding covariates
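For a discrete covariate vector, the unconfoundedness case can be written out as an average of within-stratum comparisons. The lines below are a sketch in the notation of the slides; $p(x)$, the marginal distribution of the covariates, is a symbol introduced here and does not appear in the original.

```latex
\begin{align*}
\tau &= E(Y_{i1}) - E(Y_{i0}) \\
     &= E_x\{E(Y_{i1} \mid z_i = 1, x)\} - E_x\{E(Y_{i0} \mid z_i = 0, x)\} \\
     &= \sum_x p(x)\,\bigl[E(Y_{i1} \mid z_i = 1, x) - E(Y_{i0} \mid z_i = 0, x)\bigr]
\end{align*}
```

The last line only makes sense if every value of x contains both treated and control units, which is why overlap has to be checked.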

Page 18

Randomization

• Randomized experiment is the gold standard for causal inference
– Covariate balance ensures that confounders cannot bias the treatment effect

• Few assumptions
– Compliance
– No missing data
– No hidden treatment variations
– Independence of units (assignment of one unit does not influence the outcome of another unit)

Page 19

Non-randomized trial

• Lack of randomization can create imbalance PRIOR to treatment assignment

• Confounding occurs when a covariate is imbalanced and also related to the outcome

• Bias can be corrected, but all confounders must be assessed: for an unbiased effect estimate, no confounder with a unique influence can be left out

Page 20

Increasing use of Propensity Scores

[Figure: cumulative citations by year (x-axis ticks at 1983, 1993, 2003; y-axis from 0 to 1800). Series: Cumulative Citations; Cumulative Citations Social Sciences (rescaled)]

Source: Web of Science

Page 21

Propensity scores

e(x) = p(z = 1 | x)

• e(x) = propensity score
• p = probability
• z = treatment assignment (1 = treatment group, 0 = control group)
• | = conditional on, controlled for
• x = vector of covariates

Page 22

Propensity scores

e(x) = p(z = 1 | x)

A single-number summary of all observed covariates: the probability that a given subject is assigned to the treatment condition, given that subject's values on the observed covariates

Page 23

Propensity scores

[Figure: two panels plotting likelihood of receiving treatment (y-axis) against actual assignment (x-axis: Control, Treatment)]

Page 24

Example of balance property

Original sample

a    b    z    e(x)
0    0    0    .5
0    0    1    .5
1    0    0    .33
1    0    0    .33
1    0    1    .33
0    1    0    .66
0    1    1    .66
0    1    1    .66
1    1    1    1
1    1    1    1

e(x) = p(z=1 | x={0 0}) = .5
e(x) = p(z=1 | x={1 0}) = .33
e(x) = p(z=1 | x={0 1}) = .66
e(x) = p(z=1 | x={1 1}) = 1

p(a=1 | z=0) = .5      p(a=1 | z=1) = .5
p(b=1 | z=0) = 1/4     p(b=1 | z=1) = 4/6

Page 25

Example of balance property

Matched sample

a    b    z    e*(x)
0    0    0    .5
0    0    1    .5
1    0    0    .5
1    0    1    .5
0    1    0    .5
0    1    1    .5

p(a=1 | z=0) = .33     p(a=1 | z=1) = .33
p(b=1 | z=0) = .33     p(b=1 | z=1) = .33

p(z, x | e(x)) = p(z | e(x)) p(x | e(x))

Example for z=1 and x={0 1}

p(z=1, x={0 1} | e(x)) = 1/6
p(z=1 | e(x)) = .5
p(x={0 1} | e(x)) = .33
p(z | e(x)) p(x | e(x)) = (.5)(.33) = 1/6
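The quantities in the two balance tables can be recomputed directly from the rows. The sketch below assumes the rows exactly as printed; `balance()` is a small helper defined here for illustration, not a function from any package.

```r
# Original sample: covariates a, b and treatment indicator z
d <- data.frame(
  a = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1),
  b = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  z = c(0, 1, 0, 0, 1, 0, 1, 1, 1, 1)
)

# With discrete covariates, e(x) is the share of treated units in each (a, b) cell
d$e <- ave(d$z, d$a, d$b, FUN = mean)

# Proportion of units with a = 1 and with b = 1, by treatment group
balance <- function(data) {
  aggregate(cbind(a, b) ~ z, data = data, FUN = mean)
}
balance_original <- balance(d)

# Matched sample: one control and one treated unit per cell;
# cell {1 1} is dropped because it contains no control unit
matched <- data.frame(
  a = c(0, 0, 1, 1, 0, 0),
  b = c(0, 0, 0, 0, 1, 1),
  z = c(0, 1, 0, 1, 0, 1)
)
balance_matched <- balance(matched)

print(balance_original)
print(balance_matched)
```

In the matched sample the proportions of a and b are identical across the two groups, which is the balance property the example illustrates.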

Page 26

Propensity scores

• Balance on the propensity score implies balance, on average, on all observed covariates

• A treated unit and a control unit that have the same propensity score are, in expectation, similar on all observed covariates. They differ only in terms of treatment received

Page 27

Propensity score

• Propensity score models the influence of confounders on treatment assignment

• In comparison, ANCOVA models the influence of confounders on the outcome

[Diagram with nodes Confounder, Treatment, and Outcome]

Page 28

Comparison

Propensity scores | Regression adjustment
Tool to strengthen causal conclusions | Tool to strengthen causal conclusions
Models relationship between confounders and treatment | Models relationship between confounders and outcome
Assessment of overlap | No assessment of overlap
No assumption about functional form of propensity score | Classic ANCOVA assumes linearity and absence of interaction, but can be extended
Non-parametric conditioning (e.g., matching) | Parametric conditioning, functional form of regression adjustment
Outcome variable unknown during propensity score analysis | Outcome variable always part of the adjustment
Sample size can be diminished, loss of power | Sample size stays constant, power can increase due to covariates
Hard, time-consuming | Easy, widely implemented in software
Subjective choices in modeling | Widely accepted procedure

Page 29

Propensity Score Workflow

Selection → Estimation → Conditioning → Model Checks → Effect Estimation

Propensity score analysis is a multi-step process

Researcher has choices at each step of the analysis

Page 30

Propensity Score Workflow

[Diagram with nodes Confounder, Treatment, and Outcome; paths labeled Predicting Selection and Predicting Outcome]

Select true confounders and covariates predictive of outcome

Page 31

Propensity Score Workflow

• Estimation of propensity scores can be achieved in numerous ways
– Logistic regression
– Discriminant analysis
– (Boosted) regression trees

Page 32

Propensity Score Workflow

• Logistic regression model
– Outcome is treatment assignment
– Predictors are covariates
– can be overfitted to the sample, e.g., include interactions, higher order terms
– only interest is prediction and covariate balance

$\log \frac{e(x)}{1 - e(x)} = \beta_0 + \beta_i X$
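A logistic propensity score model of this kind can be sketched with base R's `glm()`. Everything about the data below is an assumption made for illustration: the covariates `x1` and `x2`, the selection coefficients, and the seed are simulated and are not taken from the workshop data.

```r
set.seed(2010)  # arbitrary seed so the simulated example is reproducible

# Simulated data (hypothetical): two covariates drive treatment selection
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
z  <- rbinom(n, size = 1, prob = plogis(-0.3 + 0.8 * x1 - 0.5 * x2))
sim <- data.frame(z, x1, x2)

# Treatment assignment is the outcome, covariates are the predictors;
# an interaction and a squared term are included on purpose
ps_model <- glm(z ~ x1 * x2 + I(x1^2), family = binomial, data = sim)

sim$ps       <- fitted(ps_model)                  # estimated e(x)
sim$logit_ps <- predict(ps_model, type = "link")  # log(e(x) / (1 - e(x)))

print(summary(sim$ps))
```

The coefficients themselves are of little interest here; what is carried forward is the column of estimated scores (or their logits) for the conditioning step.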

Page 33

Propensity Score Workflow

• Conditioning strategies
– Matching
– Weighting
– Regression adjustment

Page 34

Propensity Score Workflow

• Check of covariate balance
– t-test (not recommended) or standardized difference
– graphical assessment (e.g., Q-Q plot)

• Region of common support (distributional overlap)
– graphical assessment (e.g., histograms)
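A standardized difference can be computed by hand as a quick balance check. The sketch below uses one common definition, the difference in group means divided by the standard deviation in the treated group; a pooled standard deviation is another frequent choice, so the denominator here is an assumption. The data values are made up.

```r
# Standardized mean difference for one covariate:
# difference in group means divided by the SD of the treated group
std_diff <- function(x, z) {
  (mean(x[z == 1]) - mean(x[z == 0])) / sd(x[z == 1])
}

# Tiny made-up example (hypothetical values)
x <- c(2, 4, 6, 1, 2, 3)
z <- c(1, 1, 1, 0, 0, 0)

d_x <- std_diff(x, z)
print(d_x)
```

Unlike a t-test, this measure does not shrink merely because matching reduced the sample size, which is one reason it is preferred for balance checks.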

Page 35

Propensity Score Workflow

[Figure: Q-Q plots of the logit propensity score, treatment group against control group, before matching and after matching]

Quantiles of both distributions are plotted against each other

Page 36

Propensity Score Workflow

[Figure: two panels, Before Matching and After Matching]

Page 37

Propensity Score Workflow

Page 38

Propensity Score Workflow

Page 39

Propensity Score Workflow

• Estimate of treatment effect
– Mean difference
– Standard error dependent on conditioning scheme

Page 40

Applied Example

Braver, Thoemmes, Moser, & Baham (in progress)

• Can random invitation designs yield the same results as randomized controlled trials?

• Evaluation of a math treatment to teach rules of exponents, administered either as a randomized experiment or as a random invitation design

• Currently in progress
– Pilot data available

Page 41

Design

[Design diagram. Labels: Overall Sample; RCT, RI - Treatment, RI - Control; RCT - T, RCT - C; RI - T, RI - C; effect estimates d and d*; legend entry: Attrition]

Page 42

Example

• Pretest

• General attitude towards math

• Altruism scale

• Available time

• 19 covariates after forming factor scores

Page 46

Results

Panel labels on the slide: RCT - T / RCT - C; RI - T / RI - C; Linear regression adjustment; Propensity score adjustment

Effect    95% CI         Sample Size
.146*     .077 - .215    176
.176*     .111 - .242    193
.148*     .064 - .231    122
.146*     .094 - .198    193

Page 47

Mechanics of propensity score analysis

• Can all of this be done in SPSS / SAS?
– Only parts of the analysis can be performed
– Mostly based on self-written macros

• R packages MatchIt and PSAgraphics offer the best solutions
– Some experience / learning of R required
– Packages automate most of the analysis

Page 48

Estimation of propensity score

• Put covariates in the model that are
– Theoretically important confounders
– Significantly related to treatment selection (unbalanced)

• Iterative process of including covariates and potentially higher order terms (interactions, polynomials)

Page 49

Estimation of propensity score

• Estimate PS
– Logistic regression
– Generalized additive model
– Classification tree / Regression tree / Recursive Partitioning

Page 50

Estimation of propensity score

• Generalized additive model
– Instead of a regular regression weight, a smoother is applied
– Imagine a lowess smoother for each coefficient

Graphic from SAS PROC GAM

Page 51

Estimation of propensity score

• Regression tree

• Splits sample at predictors that maximally separate groups

• Final nodes are balanced on all variables

Picture taken from XLMiner

Page 52

Estimation choices

• GAMs are useful because they model non-linear relationships automatically

• Regression trees automatically detect non-linear relationships and interactions

• Little research on performance of these methods

Page 53

Estimation choices

• Shadish et al. report that single regression trees do not outperform regular logistic regression

• Regression trees need to be pruned; it is unclear what kind of pruning achieves good overall balance in the dataset

• Work on boosted regression trees (McCaffrey et al.)

Page 54

Conditioning

• Choices are:
– Matching (in many different variations)
– Stratification / Subclassification
– Weighting (done outside MatchIt)
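Since weighting is done outside MatchIt, a hand-rolled sketch may help. It reuses the toy rows from the balance example, restricted to the cells that contain both treated and control units, and applies inverse probability weights (1 / e(x) for treated units, 1 / (1 - e(x)) for controls). The helper `weighted_prop()` and the decision to drop cell {1 1} are choices made here for illustration.

```r
# Toy rows from the balance example; cell {1 1} is omitted because it has
# no control units (e(x) = 1) and so lies outside the region of common support
d <- data.frame(
  a = c(0, 0, 1, 1, 1, 0, 0, 0),
  b = c(0, 0, 0, 0, 0, 1, 1, 1),
  z = c(0, 1, 0, 0, 1, 0, 1, 1)
)
d$e <- ave(d$z, d$a, d$b, FUN = mean)  # propensity score per covariate cell

# Inverse probability weights: 1 / e(x) for treated, 1 / (1 - e(x)) for controls
d$w <- ifelse(d$z == 1, 1 / d$e, 1 / (1 - d$e))

# Weighted proportion of a binary covariate within one treatment group
weighted_prop <- function(x, w) sum(w * x) / sum(w)

a_treated <- weighted_prop(d$a[d$z == 1], d$w[d$z == 1])
a_control <- weighted_prop(d$a[d$z == 0], d$w[d$z == 0])
b_treated <- weighted_prop(d$b[d$z == 1], d$w[d$z == 1])
b_control <- weighted_prop(d$b[d$z == 0], d$w[d$z == 0])

print(c(a_treated = a_treated, a_control = a_control,
        b_treated = b_treated, b_control = b_control))
```

After weighting, both covariates have the same weighted proportion in the two groups, so weighting achieves balance here without discarding any unit inside the region of common support.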

Page 55

Conditioning

• Stratification
– straightforward method to classify the sample into strata based on the estimated PS
– in each stratum, covariates should be approximately balanced
– number of strata can be varied; Cochran suggested 5 strata to remove 90% of the bias of a single confounding covariate
– Stratification is easy to implement
– Residual bias can occur
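Stratification on the estimated PS can be sketched in base R as follows. The data are simulated under assumptions made here (one confounder `x`, a true treatment effect of 1, an arbitrary seed); none of it comes from the workshop dataset.

```r
set.seed(55)  # arbitrary seed for a reproducible simulation

# Simulated data (hypothetical): x confounds treatment z and outcome y;
# the true treatment effect is set to 1
n <- 2000
x <- rnorm(n)
z <- rbinom(n, size = 1, prob = plogis(x))
y <- 1 * z + 2 * x + rnorm(n)

ps <- fitted(glm(z ~ x, family = binomial))

# Five strata with equal numbers of units, based on quintiles of the estimated PS
breaks <- quantile(ps, probs = seq(0, 1, by = 0.2))
strata <- cut(ps, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Mean difference within each stratum, then an average weighted by stratum size
diff_by_stratum <- sapply(1:5, function(s) {
  mean(y[strata == s & z == 1]) - mean(y[strata == s & z == 0])
})
stratum_share <- as.numeric(table(strata)) / n

naive_effect      <- mean(y[z == 1]) - mean(y[z == 0])
stratified_effect <- sum(diff_by_stratum * stratum_share)

print(c(naive = naive_effect, stratified = stratified_effect))
```

The stratified estimate should sit much closer to the simulated effect than the naive mean difference, though typically not exactly on it, which is the residual bias mentioned in the last bullet.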

Page 56

Conditioning

• Matching
– Exact matching
– Nearest Neighbor matching
– Optimal matching
– Full matching
– other matching algorithms possible

Page 57

Matching

• Exact matching

Only units that are identical are matched with each other

Control    Treated
.124       .226
.126       .289
.211       .365
.389       .389
.415       .517
.566       .656
.656       .789
.733       .856
.821       .997

Page 58

Matching

• Nearest Neighbor

Control    Treated
.124       .226
.126       .289
.211       .365
.389       .389
.415       .517
.566       .656
.656       .789
.733       .856
.821       .997

• Match units that are approximately the same
– order
– caliper
– replacement
– ratio

Page 59

Matching

• order
– start at largest, smallest, or random
– matched unit will not be matched to another unit
– local minimum

• caliper
– define the maximum distance between two neighbors, for example .1 of a standard deviation of the PS

• replacement
– unit can be recycled to be matched, weights necessary

• ratio
– one treated unit can be matched to more than one control (e.g., 1:2, 1:3)
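The effect of order and caliper can be seen with a small greedy matcher applied to the propensity scores from the matching tables. This is a hand-written sketch, not MatchIt's implementation: `nn_match()` and its arguments are hypothetical, matching is 1:1 without replacement, and the caliper is taken as a fraction of the standard deviation of the pooled scores, which is an assumption about how the SD is defined.

```r
# Propensity scores from the matching tables
control <- c(.124, .126, .211, .389, .415, .566, .656, .733, .821)
treated <- c(.226, .289, .365, .389, .517, .656, .789, .856, .997)

# Greedy 1:1 nearest neighbor matching without replacement.
# Returns, for each treated unit, the index of its matched control (NA if none).
nn_match <- function(treated, control, caliper_sd = 0.1, decreasing = FALSE) {
  width <- caliper_sd * sd(c(treated, control))  # caliper in PS units
  match_index <- rep(NA_integer_, length(treated))
  available <- rep(TRUE, length(control))
  for (i in order(treated, decreasing = decreasing)) {
    if (!any(available)) break
    distance <- abs(control - treated[i])
    distance[!available] <- Inf  # matched controls cannot be reused
    j <- which.min(distance)
    if (distance[j] <= width) {
      match_index[i] <- j
      available[j] <- FALSE
    }
  }
  match_index
}

m_smallest_first <- nn_match(treated, control, decreasing = FALSE)
m_largest_first  <- nn_match(treated, control, decreasing = TRUE)

print(data.frame(treated,
                 control_smallest_first = control[m_smallest_first],
                 control_largest_first  = control[m_largest_first]))
```

Comparing the two columns shows where the processing order changes which control a treated unit receives, or whether it is matched at all; this is the local-minimum behavior listed under order.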

Page 60

Matching

• Optimal Matching

Control    Treated
.124       .226
.126       .289
.211       .365
.389       .389
.415       .517
.566       .656
.656       .789
.733       .856
.821       .997

• Match units that are approximately the same, but allow matches to be broken if a better match is possible
– Find global minimum
– ratio

Page 61

Matching

• Full Matching

Control    Treated
.124       .226
.126       .289
.211       .365
.389       .389
.415       .517
.566       .656
.656       .789
.733       .856
.821       .997

• In some regions of the propensity score, several controls can be matched to one treated unit; in other regions, several treated units can be matched to one control

• Useful if distributions are highly imbalanced

Page 62

Summary of matching choices

• Methods offer tradeoffs between bias and variance

• Incomplete vs. Inexact matching

• Exact matching has (in theory) no bias, but can have large variance due to diminished sample size

• Less exact matching methods may have small residual bias, but less variance due to inclusion of more subjects

Page 63

Other issues

• Discarding units prior to matching

• Usually not necessary if a caliper is defined

• Excluding only one side changes the quantity of interest

Page 64

Overlap

Graphic from T. Love, ASA Workshop

Page 65

Annotated R code (reference)

library(foreign)
library(MatchIt)
library(PSAgraphics)
# bin.var() used below is provided by Rcmdr (RcmdrMisc in newer releases);
# it must be available before the strata step is run

# read in dataset (done via Rcmdr in the workshop)
psa <- read.spss("C:/Users/fthoemmes/Desktop/ps workshop/testdatapsa.sav",
                 use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)

# prima facie effect
pf <- lm(y~z, data=psa)
summary(pf)

# matching: annotated template of the arguments to matchit()
# matchit(z ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+x16,  # model to be used to predict z
#         method = "nearest",   # type of matching
#         discard = "both",     # discard option
#         caliper = .2,         # caliper option
#         data = psa)           # dataset

# call used in this example, written on one line
# (10 covariates, no discarding, caliper of .1)
match1 <- matchit(z ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10, method="nearest", discard="none", caliper=.1, data=psa)

# summary of the matched sample
summary(match1)
plot(match1, type="QQ")
plot(match1, type="hist")
plot(match1, type="jitter")

# additional plot of standardized differences
smatch1 <- summary(match1, standardize=TRUE)
plot(smatch1)

# write out matched data
dmatch1 <- match.data(match1)

# additional graphics
# put variables in objects
continuous <- dmatch1$X1
treatment <- dmatch1$z

# create strata from estimated PS
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata <- dmatch1$strata

# box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)

# treatment effect of matched sample
m1 <- lm(y~z, data=dmatch1)
summary(m1)

Page 66

Annotated R output (reference)

> pf <- lm(y~z, data=psa)
> summary(pf)

Call:
lm(formula = y ~ z, data = psa)

Residuals:
    Min      1Q  Median      3Q     Max
-10.167  -1.660   0.106   1.658   8.143

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.635      0.130   -4.88  1.3e-06 ***
z              1.476      0.197    7.49  2.3e-13 ***

• OLS regression

• Outcome regressed on treatment

• Prima facie treatment effect: 1.476, p < .01

Page 67

Annotated R output (reference)

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance         0.527         0.364      0.184     0.163   0.177    0.164   0.193
X1               0.483        -0.496      1.636     0.979   0.990    0.989   1.427
X2               0.234        -0.204      1.764     0.438   0.460    0.450   0.798
X3              -0.042        -0.035      1.413    -0.007   0.131    0.182   1.690
X4               0.106        -0.093      1.413     0.199   0.202    0.209   0.565
X5               0.171         0.178      1.976    -0.007   0.099    0.135   2.982
X6              -0.038        -0.177      1.215     0.139   0.145    0.154   0.705
X7               0.342        -0.013      2.138     0.355   0.326    0.370   2.013
X8               0.056        -0.134      1.286     0.190   0.188    0.205   0.789
X9               0.197        -0.286      1.512     0.483   0.516    0.504   1.434
X10              0.365        -0.185      1.629     0.550   0.487    0.562   1.229

Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance         0.472         0.463      0.163     0.009   0.008    0.010   0.020
X1               0.239         0.104      1.516     0.135   0.123    0.149   0.577
X2               0.087         0.061      1.763     0.027   0.142    0.155   0.582
X3              -0.146        -0.053      1.444    -0.093   0.188    0.215   1.927
X4               0.050         0.059      1.378    -0.009   0.096    0.109   1.120
X5               0.154         0.189      1.972    -0.035   0.123    0.181   2.982
X6              -0.101        -0.105      1.243     0.004   0.079    0.098   0.726
X7               0.162         0.231      2.164    -0.069   0.124    0.222   1.610
X8              -0.019         0.020      1.338    -0.040   0.109    0.124   0.789
X9               0.013        -0.017      1.464     0.030   0.178    0.191   1.670
X10              0.138         0.198      1.496    -0.060   0.128    0.142   0.684

• Balance on all covariates

• Unmatched sample and matched sample

• Means, SD, Mean Diff, differences based on Q-Q plot (median, mean, max)

Page 68

Annotated R output (reference)

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance      94.49   95.25    94.13  89.747
X1            86.20   87.53    84.89  59.582
X2            93.90   69.12    65.58  27.080
X3         -1324.83  -43.21   -18.15 -14.002
X4            95.29   52.59    47.77 -98.143
X5          -409.06  -24.54   -33.84   0.000
X6            97.43   45.17    36.23  -2.878
X7            80.66   61.94    40.07  20.052
X8            79.17   42.15    39.37   0.000
X9            93.78   65.39    62.07 -16.423
X10           89.17   73.63    74.71  44.328

Sample sizes:
          Control Treated
All           368     283
Matched       217     217
Unmatched     151      66
Discarded       0       0

• Percent balance improvement

• Measure of balance

• Sample sizes before and after matching
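The percent balance improvement figures appear to be the relative reduction in absolute imbalance, 100 × (|before| − |after|) / |before|. The sketch below checks that reading against three rows of the printed mean differences; the formula is an inference from the output rather than something stated on the slides, and small discrepancies are expected because the printed inputs are rounded to three decimals.

```r
# Percent balance improvement: relative reduction in absolute imbalance
percent_improvement <- function(before, after) {
  100 * (abs(before) - abs(after)) / abs(before)
}

# Mean differences for distance, X1, and X2 as printed in the balance tables
before <- c(distance = 0.163, X1 = 0.979, X2 = 0.438)
after  <- c(distance = 0.009, X1 = 0.135, X2 = 0.027)

pbi <- percent_improvement(before, after)
print(round(pbi, 1))
```

Under this reading, the large negative values for X3 and X5 arise because their mean differences were already close to zero before matching, so any small increase looks large in relative terms.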

Page 69

Graphics

• Q-Q plot for each variable to check balance

• Straight line following the 45° diagonal is desired

• plot(match1,type="QQ")

Page 70

Graphics

• Histograms to examine distribution of propensity score

• Split by treatment and control

• Before and after matching

• plot(match1,type="hist")

Page 71

Graphics

• Jittered dotplot

• Shows regions of propensity score that were matched

• plot(match1,type="jitter")

Page 72

Graphics

• Plot of standardized differences pre- and post-matching

smatch1<-summary(match1,standardize=TRUE)

plot(smatch1)

Page 73

Graphics

# write out matched data
dmatch1 <- match.data(match1)

# additional graphics
# put variables in objects
continuous <- dmatch1$X1
treatment <- dmatch1$z

# create strata from estimated PS
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata <- dmatch1$strata

# box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)

Page 74

Annotated R output (reference)

Call:
lm(formula = y ~ z, data = dmatch1)

Residuals:
   Min     1Q Median     3Q    Max
-9.868 -1.742  0.052  1.630  6.956

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.246      0.163   -1.51  0.13117
z              0.789      0.230    3.42  0.00068 ***

• Treatment effect after matching is now .789, p < .001

• Treatment effect is still in the same direction but greatly diminished (prima facie estimate: 1.476)

Page 75

Advantages of Propensity Scores

• Collapses a multivariate problem into a single-dimensional problem

• No stringent assumptions about functional form

• Model checks allow easy assessment of balance

• Clearly defined region of common support (no extrapolation)

Page 76

Limitations

• Unmeasured covariates can still bias effect estimates

• Propensity score function can be challenging to estimate

• If assumptions of ANCOVA are fully met, propensity scores offer little gain

Page 77

Propensity scores

• Method is another tool for applied researchers to adjust for confounding influences

• Propensity scores have some advantages and some disadvantages compared with traditional regression adjustment

• In applied contexts, the choice of confounding variables and the reliability of measurement will be more critical than the choice of adjustment method!

Page 78

Thank you

Summer Statistics Workshop 2010

Felix Thoemmes

Texas A&M University